Using AI to Find Security Bugs in Your Code

By SumGuy · 6 min read

The AI Security Audit That Changed Things

In 2026, researchers used Claude Code to systematically audit open source C codebases — not looking for anything in particular, just running repeatable prompts against suspicious-looking functions. They found a 23-year-old vulnerability in a widely-used Linux utility. Twenty-three years. Thousands of security researchers, static analysis tools, and code reviews had missed it. An AI model — trained on billions of lines of code, CVE disclosures, and security research papers — spotted the pattern in minutes.

Here’s the thing: that’s not magic. That’s pattern matching at scale. And that’s exactly what your code needs.

Why AI Code Auditing Actually Works

LLMs like Claude are trained on enormous datasets of real vulnerabilities and how they’re fixed. They’ve ingested CVE databases, security research papers, GitHub issues, Stack Overflow conversations where people ask “why is this code vulnerable?” They’ve seen SQL injection 50,000 times. They recognize command injection patterns. They know which functions are footguns and which aren’t.

Static analysis tools are rule-based. They catch what you tell them to look for. AI models catch patterns they’ve learned from examples. A linter flags eval() use in Python — reasonable. But an LLM understands why eval(user_input) is catastrophic, and can reason about whether that input is actually validated somewhere upstream.

Static analysis also generates noise — lots of false positives that make developers tune out warnings. AI auditing is conversational. You ask specific questions. You get reasoned answers with examples and explanations. It’s not a red/yellow/green dashboard; it’s a code review from someone who’s read every security paper ever written.
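To make the eval() point concrete, here's a minimal sketch (the payload string is hypothetical) of why context matters: a linter sees one risky call, but the difference between eval() and a literal-only parser is what happens when the input is attacker-controlled.

```python
import ast

# Hypothetical attacker-controlled string: a linter flags the eval() call,
# but only reasoning about where the input comes from tells you it's fatal.
user_input = "__import__('os').system('echo pwned')"

# eval(user_input) would execute the payload as Python and run the shell
# command, so it stays commented out here.

# ast.literal_eval() parses only Python literals (numbers, strings, tuples,
# lists, dicts), so the same payload is rejected instead of executed.
try:
    ast.literal_eval(user_input)
except (ValueError, SyntaxError):
    print("rejected")  # → rejected
```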

Approach 1: Audit a Suspicious Function

The simplest technique: paste a function and ask directly.

Security Audit Prompt Template
Review this function for security vulnerabilities. Look specifically for:
- Buffer overflows or memory safety issues
- SQL injection or command injection
- Unvalidated or unsanitized user input
- Missing access control checks
- Path traversal vulnerabilities
- Hardcoded secrets or credentials
- Race conditions (if applicable)
- Use of dangerous functions (eval, exec, system, etc.)
For each issue found, explain:
1. What the vulnerability is
2. An exploit scenario
3. A specific fix
Function:
[paste code here]

This works. Seriously. Most developers don’t think systematically about security. They write code. They test it. They ship it. An AI takes 10 seconds to think through 15 different attack vectors.
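If you'd rather run the template programmatically than paste it into a chat window, a stdlib-only sketch might look like this. The endpoint and headers follow the public Anthropic Messages API; the model name and the abridged checklist are illustrative, and ANTHROPIC_API_KEY must be set in the environment to actually send a request.

```python
import json
import os
import urllib.request

# Abridged version of the audit template above (illustrative, not exhaustive).
AUDIT_PROMPT = """Review this function for security vulnerabilities. Look specifically for:
- SQL injection or command injection
- Unvalidated or unsanitized user input
- Use of dangerous functions (eval, exec, system, etc.)
For each issue found, explain the vulnerability, an exploit scenario, and a specific fix.

Function:
{code}"""

def build_audit_prompt(code: str) -> str:
    """Fill the template with the function under review."""
    return AUDIT_PROMPT.format(code=code)

def audit_function(code: str, model: str = "claude-opus-4-1-20250805") -> str:
    """POST the audit prompt to the Anthropic Messages API, return the reply text."""
    req = urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps({
            "model": model,
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": build_audit_prompt(code)}],
        }).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```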

Approach 2: Systematic Codebase Audit

For a whole repository or microservice, create an audit session. Start with a broad security review, then drill into specific functions.

Codebase Security Audit Prompt
I'm auditing a Python/Node/Go/[language] project for security vulnerabilities.
Repository overview:
- Purpose: [what does it do?]
- Takes user input? [yes/no - if yes, from where?]
- Manages sensitive data? [what kind?]
- Network-facing? [yes/no]
- Runs as root? [yes/no]
First pass: scan the codebase conceptually. What are the highest-risk attack surfaces?
1. User input handling (forms, APIs, CLI args)
2. Database queries
3. File system access
4. External process execution
5. Authentication and authorization
Identify the 5 riskiest functions or code sections.
Then let's audit them one by one.

Then, for each function the AI flags:

Review this function in detail:
[paste function]
Is the user input validated? Where?
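To feed that drill-down with candidates, a crude pre-ranking heuristic (my own sketch, not from any particular scanner) can grep for risky call sites and surface the hottest files before the audit session starts:

```python
import pathlib
import re

# Call sites worth a closer look: dangerous execution primitives.
RISKY = re.compile(r"\b(eval|exec|system|popen|subprocess|execute)\s*\(")

def rank_files(root: str = ".", limit: int = 5):
    """Return up to `limit` Python files sorted by count of risky call sites."""
    scores = {}
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        hits = len(RISKY.findall(text))
        if hits:
            scores[str(path)] = hits
    return sorted(scores.items(), key=lambda kv: -kv[1])[:limit]
```

The regex is deliberately noisy; the model, not the regex, does the actual reasoning once you paste the top hits into the audit prompt.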

Approach 3: Claude Code for Interactive Audits

If you’re using Claude Code (or Cursor), you can run an entire audit session on your codebase:

  1. Load the repo — point Claude Code at your project directory
  2. Ask targeted questions — “Show me everywhere we execute shell commands. Is the input sanitized?”
  3. Get file-specific audits — Claude Code reads your actual codebase instead of guessing from pasted snippets
  4. Interactive fixes — “Write a patch that fixes the SQL injection in auth.py”

This is faster than copy-pasting snippets one at a time, and Claude Code keeps the whole codebase in context.

What AI Auditing Actually Catches (Really Well)

AI models are strongest on pattern-shaped bugs: injection flaws (SQL, command, path traversal), dangerous function use (eval, exec, system), hardcoded secrets, and unvalidated input flowing into queries or shell commands. These are exactly the categories in the audit template above, and they map onto the perennial OWASP Top 10 entries.

What AI Misses (Use Tools Together)

AI is good, but it's not a silver bullet: it can't execute your code, so runtime-only issues slip past it; it reasons over the code you show it, so cross-file logic flaws and business-logic authorization gaps are easy to miss; and it occasionally hallucinates issues that aren't there.

Pair AI auditing with: static analysis (Bandit, Semgrep), dependency and secret scanners, fuzzing for anything that parses untrusted input, and a human review for high-stakes changes.

Pre-Commit Security Audits with LLMs

Want AI reviewing security-sensitive code automatically?

.git/hooks/pre-commit
#!/bin/bash
# Collect staged source files by extension
DANGEROUS=$(git diff --cached --name-only | grep -E "\.(py|js|go|rs)$")
for file in $DANGEROUS; do
  DIFF=$(git diff --cached "$file")
  # Build the request body with jq so quotes and newlines in the diff
  # are escaped into valid JSON (raw interpolation would break the payload)
  BODY=$(jq -n --arg diff "$DIFF" '{
    model: "claude-opus-4-1-20250805",
    max_tokens: 1024,
    messages: [{
      role: "user",
      content: ("Review this code change for security issues. If critical issues found, respond with REJECT. Otherwise, approve.\n\n" + $diff)
    }]
  }')
  # Send to the Anthropic Messages API; block the commit on REJECT
  curl -s -X POST https://api.anthropic.com/v1/messages \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$BODY" | grep -q "REJECT" && exit 1
done
exit 0

(Obviously, don’t block on LLM latency in real life — async this or use it as a warning, not a blocker.)

Real Example: Flask App with SQL Injection

Here’s a vulnerable Flask route:

app.py
from flask import Flask
import sqlite3

app = Flask(__name__)

@app.route('/user/<username>')
def get_user(username):
    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()
    # This is vulnerable!
    query = f"SELECT email, role FROM users WHERE username = '{username}'"
    cursor.execute(query)
    user = cursor.fetchone()
    return {"email": user[0], "role": user[1]}

Ask Claude Code:

Review this Flask endpoint for SQL injection. What's the attack?

It immediately flags: username parameter is unsanitized. Attacker passes admin' OR '1'='1 and dumps the whole table. Fix: use parameterized queries.

app.py (fixed)
    query = "SELECT email, role FROM users WHERE username = ?"
    cursor.execute(query, (username,))

Done. AI found it. You fixed it. Before it shipped.
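Both behaviors are easy to verify end to end. Here's a self-contained sketch using an in-memory SQLite database (the table and rows are invented for the demo):

```python
import sqlite3

# Throwaway in-memory database with two made-up users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    ("alice", "alice@example.com", "admin"),
    ("bob", "bob@example.com", "user"),
])

payload = "nobody' OR '1'='1"

# Vulnerable: string formatting lets the payload rewrite the WHERE clause,
# which now matches every row.
vulnerable = conn.execute(
    f"SELECT email, role FROM users WHERE username = '{payload}'").fetchall()
print(len(vulnerable))  # → 2

# Fixed: a parameterized query treats the payload as an ordinary string,
# and no user is literally named "nobody' OR '1'='1".
fixed = conn.execute(
    "SELECT email, role FROM users WHERE username = ?", (payload,)).fetchall()
print(len(fixed))  # → 0
```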

The Real Outcome

You’re not replacing security researchers or static analysis. You’re adding a high-powered code reviewer who never sleeps, doesn’t get distracted, and has read every security paper ever written. Run an audit session before you deploy. Ask “what am I missing?” Ask it again on a refactor. Ask it before you merge that third-party dependency.

The Linux maintainers weren’t careless. They just didn’t run this particular audit. You should.

