Shift-Left Security Scanning With Claude Code & Semgrep

`01` Introduction

Agentic engineering has changed how we build software – and, in the process, exposed a critical security weak point. As teams adopt agentic tools like Claude Code, the security review infrastructure, including shift-left frameworks that are built around human-paced development, becomes less reliable. The unit of production is no longer a function but a complete, deployable module generated in seconds. This acceleration has introduced the Verification Tax: a phenomenon where the labor of verification replaces the labor of creation.

BaxBench, a 2025 benchmark study from ETH Zurich published at ICML, evaluated 11 leading LLMs on 392 real-world backend generation tasks across 14 frameworks and six programming languages. No model exceeded 37% correct and secure generation. In every tested model, end-to-end attacks successfully exploited roughly half of the programs that passed all functional tests, demonstrating how important integrated security checks are.

BaxBench: Can LLMs Generate Correct and Secure Backends?

More recently, DryRun Security’s Agentic Coding Security Report (published in March 2026) moved beyond benchmarks to production-representative development. Claude Code (Sonnet 4.6), OpenAI Codex (GPT-5.2), and Google Gemini (2.5 Pro) were each tasked with building two full applications from identical specifications, introducing features via pull requests as a real engineering team would. The results: 143 security issues across 38 scans, with 26 of 30 pull requests (87%) containing at least one vulnerability. Not a single agent produced a fully secure application.

For more than a decade, shift-left security functioned as the industry’s answer to catching vulnerabilities early, moving validation from post-deployment audits to the point of code creation. For human-paced development, it worked. The problem is that the tooling suite built to implement it has not kept up with the evolution of software development.

This article will guide you through a modernized shift-left architecture using Claude Code and Semgrep via MCP.

What This Article Covers:

Why conventional shift-left breaks down in AI-native workflows
How Claude Code, Semgrep, and MCP combine into a “scan-on-generate” architecture
Step-by-step implementation with real terminal commands and configuration
What the integration catches, its limits, and how to measure impact

`02` What Is Shift-Left Security & Why It Is Essential

Shift-left security is the principle of moving vulnerability detection from post-deployment audits directly to the point of creation.

In an AI-native workflow, however, “left” has moved further than ever: it now refers to the moment of generation, before code is shipped.

The economics behind it are well-established: a vulnerability caught at the coding stage costs minutes to remediate. The same vulnerability detected in a production incident is far more expensive in engineering time, compliance exposure, and reputational cost.

The Verification Tax necessitates this shift. While AI tools like Claude Code can generate an entire codebase, 64% of development teams report that manually verifying this output takes as long as, or longer than, writing it from scratch. Senior developers are the hardest hit, spending an average of 4.3 minutes reviewing each AI suggestion compared to 1.2 minutes for juniors. When this verification tax is not addressed, organizations accumulate technical debt at an unsustainable rate. This is measured by the Technical Debt Ratio (TDR):

TDR = (Remediation Cost / Development Cost) × 100%

A TDR above 5% is a high-risk indicator for systemic collapse. Traditional SAST tools fail at agentic development due to the speed of code generation. They flag issues after the developer has already moved to the next prompt, increasing the cognitive cost of refactoring.
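As a worked example of the ratio, the sketch below assumes the common definition of TDR as remediation cost divided by development cost, expressed as a percentage. The function name and the hour figures are hypothetical and chosen only to show the arithmetic against the 5% threshold.

```python
def technical_debt_ratio(remediation_hours: float, development_hours: float) -> float:
    """Return TDR as a percentage: remediation cost / development cost * 100."""
    if development_hours <= 0:
        raise ValueError("development_hours must be positive")
    return remediation_hours / development_hours * 100


HIGH_RISK_THRESHOLD = 5.0  # percent, the threshold cited in the text

# Hypothetical figures: 60 hours of outstanding fixes against 1,000 hours of development.
tdr = technical_debt_ratio(remediation_hours=60, development_hours=1000)
print(f"TDR = {tdr:.1f}% (high risk: {tdr > HIGH_RISK_THRESHOLD})")
```

Cost can be measured in hours or currency, as long as both inputs use the same unit.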

A true shift-left architecture for Claude Code must run within the agent loop itself – in the same terminal session, against the code just produced – before the agent moves on. Using the Semgrep plugin, this setup applies consistent checks for pattern-based vulnerabilities and secrets at the point of generation. It does not catch logic flaws or zero-days, which remain challenging in AI-native workflows.

The benefits of an “In-Loop” Shift-Left approach include:

Deterministic Anchors: LLMs are probabilistic; Semgrep is deterministic. Integrating the two provides a “ground truth” that captures the AI-generated code that fails security tests even though it appears “functionally clean.”
The Regenerate Loop: By using MCP hooks, the agent is automatically prompted to fix findings such as SQL injection or hardcoded secrets in real time.
Developer Flow: A true shift-left framework runs checks quietly in the background of the session and handles findings inside the loop rather than queuing them for later triage. This helps avoid alert fatigue and keeps security checks integrated into the workflow, so developers can stay focused on development.

This architecture transforms Claude Code from a speed-optimized code generator into a security-aware agent, closing the gap between functional correctness and production readiness.

`03` The Integration Architecture

Claude + Semgrep via MCP

These three components provide the base layer of the scan-on-generate security architecture:

#1. Semgrep: The Rules Engine

Semgrep is an open-source SAST tool that analyzes code structure to detect vulnerabilities such as injection flaws, secrets, and insecure patterns, with high precision and low latency – fast enough to run inline on every generation event without disrupting the development loop.

The three capabilities that matter for this architecture:

Code analysis. Over 5,000 rules covering the OWASP Top 10, language-specific CVEs, and common insecure patterns across 30+ languages, with guidance on tuning rules to minimize false positives and negatives based on data from 127 million findings across 340,000 repositories.
Supply chain analysis. Dataflow reachability analysis identifies which vulnerable dependencies are actually exploitable in the specific codebase – not just which vulnerable versions are listed in the manifest. This reduces false positives for high- and critical-severity findings by up to 98% across 10 supported languages.
Secrets detection. Semantic analysis combined with entropy analysis to detect hardcoded credentials, API keys, and tokens in generated code before they reach version control. The DryRun study found poor JWT secret management and insecure default credentials in every final codebase tested – this is the layer that catches them at the generation stage.
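To make the entropy half of secrets detection concrete, here is a minimal sketch of Shannon entropy scoring over a string. It illustrates the general technique only and is not Semgrep's implementation; the function names, the length cutoff, and the 4.0-bit threshold are assumptions picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(value: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long, high-entropy strings. Both cutoffs are illustrative assumptions."""
    return len(value) >= min_length and shannon_entropy(value) >= threshold


print(looks_like_secret("password"))                  # too short to be flagged
print(looks_like_secret("a" * 24))                    # long, but a single repeated character
print(looks_like_secret("q7Lx2VbT9mKe4RzW8pYc3HdN"))  # long and random-looking
```

Entropy alone is noisy (hashes and UUIDs also score high), which is why it is typically paired with semantic context such as the variable name or the call the string is passed to.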

#2. Claude Code: The Reasoning Layer

Claude Code handles intent analysis and contextual risk assessment. When Semgrep flags a pattern, Claude evaluates its exploitability and generates a logic-preserving patch.

For example, if Semgrep finds a potential SQL injection path in a newly generated repository layer, Claude can:

Evaluate whether it is a true positive or a false positive, in the context of the specific application’s architecture
Determine the actual risk severity given the data flows it has already generated
Generate a fix that preserves the original logic and coding style
Identify if the vulnerability is in auto-generated output that should be fixed at the source generator, not the output file
Explain the underlying secure coding principle so the developer learns, not just patches

The BaxBench study found that when models were given security-specific prompting – explicit instructions to reason about vulnerabilities – the rates of correct and secure generation improved significantly, particularly for reasoning models.

#3. MCP: The Integration Bridge

The Model Context Protocol provides the standardized interface that enables the loop to be automatic.

With it, Semgrep is registered as a tool in Claude Code’s environment, a post-write hook triggers a scan on every generated file, and findings return as structured context within the same session, before the developer has moved on.

The official Semgrep MCP server exposes the following tools to Claude Code:

security_check – runs predefined security rule sets against generated code
semgrep_scan – scans code files with a specified Semgrep config string
semgrep_scan_with_custom_rule – applies a custom YAML rule to generated code
get_abstract_syntax_tree – returns the AST of code, enabling Claude to reason about structure
semgrep_findings – fetches findings from the Semgrep AppSec Platform API for enterprise deployments
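For orientation, the sketch below shows roughly what a tool invocation looks like on the wire. MCP tool calls are JSON-RPC 2.0 requests using the tools/call method; the argument names passed to semgrep_scan here (config, code_files, filename, content) are assumptions for illustration and should be checked against the schema the server advertises.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Assumption: these argument names are illustrative, not taken from the server schema.
request = build_tool_call(
    request_id=1,
    tool="semgrep_scan",
    arguments={
        "config": "p/security-audit",
        "code_files": [{"filename": "app.py", "content": "eval(user_input)"}],
    },
)
print(request)
```

In practice Claude Code constructs these requests itself; the example is only meant to show that each tool in the list is addressed by name with a structured argument object.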

`04` Implementation

The following steps build the full architecture from scratch in Claude Code.

Prerequisites

Claude Code version 2.1.7 or higher (run claude --version to verify)
Python 3.10 or later
A Semgrep account (free tier covers the core SAST and secrets capabilities)

Step 1 – Install and Verify Semgrep

# Via Homebrew (macOS/Linux)
brew install semgrep

# Via pip
python3 -m pip install semgrep

# Verify installation — must be 1.146.0 or higher
semgrep --version

# Log in to Semgrep to enable Pro rules and Supply Chain analysis
semgrep login

Tip: The authenticated Pro engine enables cross-file and cross-function dataflow reachability, reducing false positives by an additional 25% and increasing true positive detection by 250% compared to the Community Edition running locally without authentication.
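If the version check in Step 1 is automated, it needs a numeric comparison rather than a string one. The sketch below is a minimal example with hypothetical helper names; it assumes only that the version output contains a MAJOR.MINOR.PATCH triple.

```python
import re


def parse_version(output: str) -> tuple[int, int, int]:
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


MINIMUM_SEMGREP = (1, 146, 0)  # minimum version stated in Step 1


def meets_minimum(output: str, minimum: tuple[int, int, int] = MINIMUM_SEMGREP) -> bool:
    """Compare versions as integer tuples, not as strings."""
    return parse_version(output) >= minimum


print(meets_minimum("1.146.0"))
print(meets_minimum("1.99.3"))
```

Comparing as strings would rank "1.99.3" above "1.146.0", which is the mistake the tuple comparison avoids.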

Step 2 – Install the Semgrep Plugin in Claude Code

# Start a Claude Code instance
claude

# Open the plugin browser
/plugin

# Navigate to Discover, search for Semgrep, and install
# Then run the setup skill — this configures the MCP server and hooks
/setup-semgrep-plugin

This single command installs three components simultaneously:

MCP server – registers Semgrep’s scanning tools in Claude Code’s tool environment
Post-write hooks – trigger an automatic scan on every file Claude generates
Skills – instruct Claude on how to reason over Semgrep findings, not just surface them

Alternatively, for enterprise deployments where you want explicit control over the MCP configuration, add the following to your Claude Code MCP configuration directly (for example, a project-level .mcp.json file):

{
  "mcpServers": {
    "semgrep": {
      "type": "stdio",
      "command": "uvx",
      "args": ["semgrep-mcp"],
      "env": {
        "SEMGREP_APP_TOKEN": "<your-token>"
      }
    }
  }
}

Enterprise deployment note: Use stdio transport (local, communicates via standard input/output) rather than the remote streamable-http endpoint (mcp.semgrep.ai), so that scans run on the developer’s machine and source code is not sent to a hosted service.
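A hand-edited configuration is easy to commit with the placeholder token still in place. The sketch below is a hypothetical pre-flight check, not part of Semgrep or Claude Code: it validates the shape of the configuration shown above and reports anything that would stop the server from starting as intended.

```python
import json

PLACEHOLDER_TOKENS = {"", "<your-token>"}


def check_mcp_config(config: dict, server: str = "semgrep") -> list[str]:
    """Return a list of problems found in an mcpServers-style config."""
    entry = config.get("mcpServers", {}).get(server)
    if entry is None:
        return [f"no '{server}' entry under mcpServers"]
    problems = []
    if entry.get("type") != "stdio":
        problems.append("transport is not stdio")
    if not entry.get("command"):
        problems.append("missing command")
    token = entry.get("env", {}).get("SEMGREP_APP_TOKEN", "")
    if token in PLACEHOLDER_TOKENS:
        problems.append("SEMGREP_APP_TOKEN is empty or still a placeholder")
    return problems


example = json.loads("""
{
  "mcpServers": {
    "semgrep": {
      "type": "stdio",
      "command": "uvx",
      "args": ["semgrep-mcp"],
      "env": {"SEMGREP_APP_TOKEN": "<your-token>"}
    }
  }
}
""")
print(check_mcp_config(example))
```

Since the token is itself a secret, it is worth keeping the real value out of any file that is committed to version control.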

Step 3 – Configure High-Signal Rule Sets

The default Semgrep rule set covers everything, which means it includes too much for inline development. Alert fatigue is the primary reason shift-left implementations fail in practice. The goal is not comprehensive coverage but high-confidence, low-noise coverage of the vulnerability classes that AI-generated code introduces consistently.

A recommended starting configuration for Claude Code workflows:

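# .semgrep.yml: project-specific custom rules, placed in the project root
# Illustrative rule only; replace it with patterns specific to your codebase.
rules:
  - id: no-eval
    pattern: eval(...)
    message: "eval() runs arbitrary code if its argument is attacker-controlled"
    languages: [python, javascript, typescript]
    severity: ERROR

# Registry rule sets are selected on the command line, not in this file:
# semgrep scan --config=p/security-audit --config=p/secrets --config=.semgrep.yml --severity=ERROR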

For CI/CD enforcement, use the full suite. For the Claude Code inline loop, restrict to:

p/security-audit – OWASP Top 10 focused, high signal
p/secrets – hardcoded credentials and API keys
p/supply-chain – reachable vulnerable dependencies

Step 4 – Prompt Engineering for Security Reasoning

To maximize this shift-left approach, use prompts that force Claude to reason over the Semgrep output, for example:

“Scan this module with Semgrep. For any findings, differentiate between true positives and test placeholders. For real vulnerabilities, implement a secure fix that preserves our current architecture.”
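When the prompt is assembled by tooling rather than typed by hand, the same instruction can be paired with the concrete findings. The sketch below is one hypothetical way to do that; the finding keys (rule, path, line) and the function name are assumptions for the example.

```python
TRIAGE_INSTRUCTIONS = (
    "For each finding below, state whether it is a true positive or a test "
    "placeholder. For real vulnerabilities, implement a secure fix that "
    "preserves our current architecture."
)


def build_triage_prompt(findings: list[dict]) -> str:
    """Combine the triage instructions with a numbered list of findings."""
    if not findings:
        return "Semgrep reported no findings for this module."
    lines = [TRIAGE_INSTRUCTIONS, ""]
    for index, finding in enumerate(findings, start=1):
        lines.append(f"{index}. {finding['rule']} at {finding['path']}:{finding['line']}")
    return "\n".join(lines)


print(build_triage_prompt([
    {"rule": "hardcoded-jwt-secret", "path": "auth/tokens.py", "line": 8},
    {"rule": "sql-injection", "path": "repo/users.py", "line": 41},
]))
```

Numbering the findings gives the model a stable way to refer back to each one when it explains its triage decision.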

Step 5 – The CI/CD Enforcement Gate

The scan-on-generate loop is the developer assist layer. It is meant to catch most pattern-based issues at generation, before a PR is ever opened. It is not the enforcement gate, and it should not be treated as one.

Configure the CI pipeline to run Semgrep independently on every PR and block merges on critical-severity findings:

# .github/workflows/semgrep.yml
name: Semgrep Security Gate

on:
  pull_request:
    branches: [main, develop]

jobs:
  semgrep:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write

    steps:
      - uses: actions/checkout@v4

      - name: Run Semgrep
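        run: |
          pip install semgrep
          # --error makes Semgrep exit non-zero on findings, so the job fails and can block the merge
          semgrep scan \
            --config=p/security-audit \
            --config=p/secrets \
            --severity=ERROR \
            --error \
            --sarif \
            --output=semgrep.sarif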

      - name: Upload SARIF findings
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: semgrep.sarif
        if: always()

The two layers serve two different functions.

The Claude Code + Semgrep MCP loop removes cognitive overhead for developers – security feedback arrives before they move on to the next prompt.
The CI gate provides the compliance record, the merge enforcement, and the audit trail that security teams and enterprise governance require. If a vulnerability survives the local loop, the CI gate catches it and surfaces a feedback signal: the local prompt engineering needs to be tightened.
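Teams that want the gate decision to be explicit and auditable can derive it from the SARIF file rather than from the scanner's exit code alone. The sketch below is a minimal example with a hypothetical function name; it relies only on the standard SARIF layout of runs, results, level, and ruleId.

```python
def blocking_results(sarif: dict, blocking_levels: tuple[str, ...] = ("error",)) -> list[str]:
    """Collect rule IDs of SARIF results whose level should block a merge."""
    blocked = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: treat a result without an explicit level as a warning.
            level = result.get("level", "warning")
            if level in blocking_levels:
                blocked.append(result.get("ruleId", "<unknown rule>"))
    return blocked


# Hand-written SARIF fragment standing in for semgrep.sarif.
sample_sarif = {
    "version": "2.1.0",
    "runs": [{
        "results": [
            {"ruleId": "python.sqlalchemy.sql-injection", "level": "error"},
            {"ruleId": "generic.secrets.low-confidence", "level": "warning"},
        ]
    }],
}
blocked = blocking_results(sample_sarif)
print(f"{len(blocked)} blocking finding(s): {blocked}")
```

A CI step could load the real file with json.load and exit non-zero when the returned list is not empty.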

`05` What Semgrep Catches

Every security tool has a defined surface area. Here is where this architecture thrives and where it struggles.

What Semgrep reliably catches

Pattern-based vulnerability classes – SQL injection, XSS, command injection, path traversal, insecure deserialization, use of dangerous functions (eval, exec, pickle.loads). These have clear structural signatures in the AST, are well-represented in Semgrep’s 5,000+ rule library, and map directly to the OWASP Top 10.
Hardcoded secrets and credentials – API keys, database passwords, private keys, and OAuth tokens embedded in generated code. Semgrep’s semantic analysis and entropy-based detection flags these attack vectors before they reach version control.
Reachable vulnerable dependencies – When Claude Code generates code that imports third-party packages, Semgrep Supply Chain’s dataflow reachability analysis determines whether the vulnerable function in that dependency is actually called in a vulnerable way, not just whether a vulnerable version is listed.
Insecure defaults in generated configurations – Missing HTTPS enforcement, permissive CORS configurations, and disabled security headers. These are pattern-matchable against Semgrep’s infrastructure and configuration rules.
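To show what a pattern-based finding looks like in practice, the sketch below contrasts a string-built SQL query with its parameterized fix, using Python's built-in sqlite3 module. The table and function names are hypothetical. The first function is the kind of structure a static rule can match, and the second is the shape of the fix.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: user input is interpolated into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Safe: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # the injected OR clause matches every row
print(len(find_user_safe(conn, payload)))    # no user has that literal name
```

Both functions pass a functional test that looks up "alice", which is why code like the first version can appear clean until it is scanned or attacked.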

What Semgrep does not catch

Logic and authorization flaws – The biggest issue found in AI-generated code is the failure to apply security measures consistently. Common problems include authentication middleware used for REST APIs but not for WebSocket endpoints, rate limiting defined but never applied, and validation performed on the client instead of the server. Each piece of code looks correct in isolation; the flaw is in what is missing. Static analysis tools can’t check all these policies, as doing so requires a deeper understanding of the system’s architecture.
Zero-day vulnerabilities – Semgrep’s rules address known vulnerability patterns. A novel attack vector with no existing rule signature will not be detected. This is a property of all rule-based systems, not a limitation specific to Semgrep, and it is why automated checks should be supplemented with human oversight.
Multi-file architectural flaws – When a vulnerability emerges from the interaction of two modules (an authentication bypass that only exists when a specific combination of middleware ordering occurs across three files), Semgrep’s intra-file analysis may miss it. Semgrep Pro’s cross-file dataflow analysis significantly narrows this gap, but complex distributed-logic flaws remain on the boundary of what static analysis can reliably detect.
Infrastructure and cloud misconfigurations – Semgrep has rules for IaC (Terraform, CloudFormation, Kubernetes manifests). Still, misconfigurations in cloud networking, IAM policy design, and runtime environments sit outside the application code layer that the Claude Code + Semgrep MCP integration covers.
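A small, hypothetical example shows why logic flaws fall outside pattern matching. Neither function below contains a dangerous call or a string-built query; the only difference is an ownership check that the first one omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner: str
    amount: float


INVOICES = {1: Invoice(1, "alice", 120.0), 2: Invoice(2, "bob", 75.5)}


def get_invoice_flawed(current_user: str, invoice_id: int) -> Invoice:
    # Nothing here for a pattern rule to match. The flaw is the missing
    # ownership check: any authenticated user can read any invoice.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user: str, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    if invoice.owner != current_user:
        raise PermissionError("invoice belongs to another user")
    return invoice


leaked = get_invoice_flawed("alice", 2)
print(f"alice read an invoice owned by {leaked.owner}")
```

Catching this requires knowing the policy that invoices are private to their owner, which lives in the system's design rather than in the syntax of either function.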

`06` Measuring the Security Impact

If the integration is not measured, it cannot be optimized.

Engineering leaders must move beyond vanity metrics: raw vulnerability counts without severity weighting tell you nothing about actual risk reduction. The signals that matter are those that demonstrate a measurable reduction in developer toil and security debt.

#1. Mean Time to Remediation (MTTR) in the AI Loop

The traditional security lifecycle suffers from a triage delay where CI/CD findings sit in a backlog for days or weeks. In a properly configured Claude Code + Semgrep loop, detection and remediation happen within the same session, so MTTR should drop to minutes. A rising MTTR is a diagnostic signal: either prompts are ineffective, Claude is generating fixes that require significant manual rework, or the rule set is producing too many false positives for developers to engage with.
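MTTR is simply the average gap between detection and fix. The sketch below computes it from pairs of timestamps; the function name and the sample times are hypothetical, and how the timestamps are collected (scan logs, commit times) is left open.

```python
from datetime import datetime, timedelta


def mean_time_to_remediation(findings: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (fixed_at - detected_at) over all remediated findings."""
    if not findings:
        raise ValueError("no remediated findings to average")
    total = sum((fixed - detected for detected, fixed in findings), timedelta())
    return total / len(findings)


# Hypothetical in-session findings: one fixed after 4 minutes, one after 10.
session = [
    (datetime(2026, 3, 2, 10, 0), datetime(2026, 3, 2, 10, 4)),
    (datetime(2026, 3, 2, 11, 30), datetime(2026, 3, 2, 11, 40)),
]
print(mean_time_to_remediation(session))
```

Tracking the same figure separately for findings fixed in the local loop and findings fixed after the CI gate makes the contrast between the two paths visible.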

#2. The Verification Tax

Roughly 40% of the time saved by AI code generation is immediately consumed by reviewing, correcting, and verifying the output. This burden falls disproportionately on senior engineers, who spend high-value time auditing AI-generated code rather than focusing on architecture. The scan-on-generate loop is designed to absorb baseline verification automatically, shifting it from human review to deterministic tooling.

#3. Technical Debt Ratio

Security debt is a particularly high-interest form of technical debt. Defects caught at generation cost minutes to fix; the same defects discovered post-deployment carry weeks of engineering time, compliance exposure, and incident cost. The scan-on-generate architecture does not add a new security budget line. It reallocates existing remediation costs to the phase where they are much cheaper to absorb.

#4. Vulnerability Density at the CI Gate

Establish a baseline over two weeks of instrumentation before activating the local MCP loop, then measure the reduction after activation. This is the cleanest before/after signal for scan-on-generate impact, because it isolates the generation-time intervention from other security program changes. A successful integration shows a corresponding decline in CI gate findings as fewer issues survive the local loop.
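The before/after comparison can be reduced to two small calculations. The sketch below assumes density is normalized as CI findings per 1,000 changed lines, which is one common choice and not something the measurement above prescribes; the counts are hypothetical.

```python
def findings_per_kloc(findings: int, lines_changed: int) -> float:
    """CI gate findings per 1,000 changed lines (an assumed normalization)."""
    if lines_changed <= 0:
        raise ValueError("lines_changed must be positive")
    return findings / (lines_changed / 1000)


def percent_reduction(baseline: float, current: float) -> float:
    """Relative drop from the baseline density, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100


# Hypothetical counts: two-week baseline vs. two weeks with the local loop active.
before = findings_per_kloc(findings=42, lines_changed=12_000)
after = findings_per_kloc(findings=18, lines_changed=15_000)
print(f"before={before:.2f}/KLOC after={after:.2f}/KLOC "
      f"reduction={percent_reduction(before, after):.1f}%")
```

Normalizing by changed lines matters because AI-assisted teams often ship more code after adoption, so raw finding counts can rise even while density falls.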

`07` Conclusion

The transition to AI-driven development has transformed the economics of software security. The main bottleneck is no longer code generation; it is verification. As AI agents generate sophisticated, production-ready modules at speeds no human review process can match, verification debt accumulates silently – a problem that compounds with every PR merged and every deployment shipped.

The deployment of Claude Code with Semgrep via MCP represents a strategic move towards a verification-first SDLC. Fast, deterministic pattern-matching handles the known vulnerability classes that AI introduces structurally. Claude’s reasoning layer handles the contextual judgment that rules cannot replicate. MCP removes the manual step that breaks down under velocity pressure. Together, they form a defense that addresses both the surface-level risks that static analysis was designed to detect and the logic-level risks that require semantic understanding.

The goal is not to eliminate all risks; this is nearly impossible, as zero-day threats and architectural flaws will always require human judgment. Instead, it is to minimize the cost of security remediation by identifying vulnerabilities early. This approach helps reduce the Mean Time to Remediation (MTTR) from days to minutes, ensuring that the efficiency gains from AI translate into genuine quality improvements rather than an increasing burden of technical debt.
