Pull Requests Have Changed

The pull request you are reviewing today was probably written by AI. Maybe entirely. Maybe partially. Either way, the review process needs to evolve because AI-generated code has different failure patterns than human-written code, and the traditional approach of skimming the diff and approving is now actively dangerous.

I review AI-generated PRs every day. Here is the framework I have built for catching the bugs that slip through standard review processes.

Why Traditional Code Review Falls Short

Traditional code review evolved for human-written code. Reviewers look for:

  • Style consistency
  • Obvious logic errors
  • Missing error handling
  • Performance concerns
  • Architecture violations

These are still important, but AI-generated code creates a new problem: it passes the smell test. The code looks clean, follows conventions, uses reasonable variable names, and reads like it was written by a competent developer. This surface-level quality makes reviewers less vigilant, which is exactly when bugs slip through.

The AI Code Review Framework

Layer 1: Intent Verification

Before reading any code, answer this question: what is this PR supposed to do?

Read the PR description. Read the ticket or issue it references. Understand the business requirement. Then, as you read the code, constantly check whether the implementation actually achieves that intent.

AI is excellent at generating code that does something. It is less reliable at generating code that does the right thing. The gap between "this code works" and "this code solves the problem" is where the most expensive bugs hide.

Layer 2: Assumption Audit

Every piece of code makes assumptions. Human developers make assumptions consciously and can explain them when asked. AI makes assumptions based on pattern matching and cannot explain why.

For every AI-generated function, identify the assumptions:

  • What does it assume about input types and ranges?
  • What does it assume about the state of the database or external services?
  • What does it assume about the order of operations?
  • What does it assume about concurrent access?

Then verify each assumption against your actual system. This is where context mismatch bugs surface.
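One way to make this audit concrete is to turn each assumption into an enforced guard clause. A minimal sketch, using a hypothetical apply_discount helper (the function and its rules are illustrative, not from any real codebase):

```python
def apply_discount(price_cents: int, discount_pct: float) -> int:
    """Hypothetical checkout helper: guard clauses turn the implicit
    assumptions about inputs into explicit, reviewable statements."""
    # Assumption about input ranges, stated and enforced rather than implied.
    if price_cents < 0:
        raise ValueError("price_cents must be non-negative")
    if not 0.0 <= discount_pct <= 100.0:
        raise ValueError("discount_pct must be between 0 and 100")
    # Assumption about rounding behavior, documented rather than silent.
    return round(price_cents * (1 - discount_pct / 100))
```

When a reviewer asks "what does this assume?", the answer is now in the code instead of in someone's head.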

Layer 3: Dependency Validation

AI frequently references libraries, methods, and APIs that either do not exist or behave differently than assumed. For every external reference in the PR:

  • Verify the import is a real package in your dependency list
  • Verify the method exists in your installed version
  • Verify the method signature matches how it is being called
  • Verify the return type matches how the result is being used

This takes time but catches a category of bug that is nearly impossible to find through logic review alone.
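Parts of this check can be scripted. A sketch in Python, using the standard importlib and inspect modules to confirm that a referenced callable actually exists in the installed version and to surface its real signature for comparison against the call site:

```python
import importlib
import inspect

def verify_callable(module_name: str, attr_path: str) -> inspect.Signature:
    """Confirm that module.attr exists as installed, and return its
    signature so the reviewer can compare it to how the PR calls it."""
    module = importlib.import_module(module_name)  # raises if not installed
    obj = module
    for part in attr_path.split("."):
        obj = getattr(obj, part)  # raises AttributeError if hallucinated
    return inspect.signature(obj)

# Example: confirm json.dumps really takes the keyword the PR passes.
sig = verify_callable("json", "dumps")
assert "indent" in sig.parameters
```

This does not replace reading the docs for behavioral differences between versions, but it mechanically rules out the "method does not exist" class of hallucination.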

Layer 4: Error Path Analysis

Read the code backward from every error handler. AI generates optimistic code — it builds the happy path fluently and handles errors as an afterthought. Check:

  • Are errors caught at the right granularity? (Catching Error when you should catch NetworkError)
  • Do catch blocks actually handle the error or just log and continue?
  • Are error messages specific enough to debug in production?
  • Does error handling clean up resources (close connections, release locks)?
  • Are errors propagated to callers who need to know about them?
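The granularity and cleanup points above can be sketched in a few lines. This is an illustrative example (the function and host checks are hypothetical), showing specific exception types, a debuggable message, and cleanup that runs on every path:

```python
import socket

def fetch_status(host: str, port: int, timeout: float = 2.0) -> str:
    """Illustrative sketch: catch at the right granularity and always
    release the resource, instead of a blanket `except Exception`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "reachable"
    except (ConnectionRefusedError, TimeoutError, OSError) as exc:
        # Specific types and a specific message: debuggable in production.
        return f"unreachable: {type(exc).__name__}"
    finally:
        sock.close()  # cleanup runs on success and failure alike
```

Reading this backward from the except and finally blocks, a reviewer can answer every question in the checklist without running the code.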

Layer 5: Test Coverage Assessment

If the PR includes tests, evaluate whether the tests actually verify behavior or just confirm the code runs. AI-generated tests often:

  • Test that a function returns something without verifying it returns the right thing
  • Mock so aggressively that the test does not exercise any real logic
  • Cover the happy path thoroughly but skip edge cases
  • Assert on implementation details rather than behavior

A test suite that passes but does not catch regressions is worse than no tests because it creates false confidence.
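The difference between "confirms the code runs" and "verifies behavior" is easiest to see side by side. A toy example with a hypothetical normalize_email function:

```python
def normalize_email(raw: str) -> str:
    """Toy function under test (hypothetical)."""
    return raw.strip().lower()

# Weak: only proves the function returns *something*.
def test_weak():
    assert normalize_email("  Ada@Example.COM ") is not None

# Stronger: pins the actual behavior, including an edge case
# the happy path would skip.
def test_behavior():
    assert normalize_email("  Ada@Example.COM ") == "ada@example.com"
    assert normalize_email("") == ""
```

The weak test would still pass if normalize_email returned the input unchanged, which is exactly the false confidence described above.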

Red Flags in AI-Generated PRs

Over hundreds of reviews, I have identified patterns that reliably indicate problems:

Overly Generic Variable Names

When AI does not fully understand the domain, it falls back to generic names like data, result, items, or response. If a PR is full of generic names, the AI probably did not have enough context about the specific domain.
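A small before-and-after sketch (both functions are hypothetical) shows why this matters for review speed:

```python
# Generic names hide intent: what is in `data`? What does `process` do?
def process(data):
    result = [d for d in data if d["active"]]
    return result

# Domain names make the same logic reviewable at a glance.
def active_subscribers(subscribers):
    return [s for s in subscribers if s["active"]]
```

The logic is identical; only the second version tells the reviewer what correct behavior would even look like.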

Suspiciously Clean Code

Real-world code has rough edges — workarounds for known issues, comments explaining non-obvious decisions, handling for legacy data formats. AI-generated code that is too clean probably does not account for the messy realities of your system.

Copy-Paste Patterns

AI sometimes generates multiple similar functions that should be abstracted into a single parameterized function. If you see the same logic repeated with minor variations, the AI was pattern-matching rather than thinking about structure.
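A contrived but representative sketch of the smell and its fix:

```python
# Repetition with a minor variation -- a classic pattern-match smell.
def users_report(rows):
    return "\n".join(",".join(r) for r in rows)

def orders_report(rows):
    return "\n".join(";".join(r) for r in rows)

# One parameterized function makes the variation explicit and testable.
def report(rows, delimiter=","):
    return "\n".join(delimiter.join(r) for r in rows)
```

When you spot this pattern in a PR, the right review comment is usually "abstract this," not a line-by-line critique of each copy.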

Missing Validation

AI often trusts input data implicitly. If a function accepts user input and processes it without validation, that is a security concern regardless of how clean the processing logic looks.
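The fix is usually a few lines of explicit validation at the boundary. A sketch with a hypothetical page-size parameter:

```python
def set_page_size(raw: str) -> int:
    """Hypothetical request parameter: never trust user input implicitly."""
    try:
        size = int(raw)
    except ValueError:
        raise ValueError("page size must be an integer")
    # Bound the value so a single request cannot demand unbounded work.
    if not 1 <= size <= 100:
        raise ValueError("page size must be between 1 and 100")
    return size
```

If a PR's handlers consume user input without anything resembling this, flag it regardless of how clean the downstream logic looks.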

Hardcoded Configuration

AI embeds configuration values directly in code more often than humans do. API URLs, timeouts, retry counts, and feature flags should be configurable, not hardcoded.
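A minimal sketch of the alternative, reading tunables from the environment with explicit defaults (the variable names and defaults are illustrative):

```python
import os

def load_config(env=os.environ):
    """Sketch: pull tunables from the environment with visible defaults,
    instead of freezing them into the call sites."""
    return {
        "api_url": env.get("API_URL", "https://api.example.com"),
        "timeout_s": float(env.get("API_TIMEOUT_SECONDS", "5.0")),
        "max_retries": int(env.get("API_MAX_RETRIES", "3")),
    }
```

Accepting the mapping as a parameter also makes the configuration itself testable, which hardcoded constants never are.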

The Review Conversation

Reviewing AI code changes how you give feedback. You are not giving feedback to the original author — you are giving feedback to the person who prompted the AI. Frame comments around:

  • Missing context: "The AI probably did not know that our database has a unique constraint on this field."
  • Unstated requirements: "This handles the API call but does not account for our rate limiting middleware."
  • Verification requests: "Can you confirm that this library method exists in the version we are using?"

This framing is more productive than traditional review comments because it addresses the root cause (insufficient context) rather than the symptom (incorrect code).

Building Review Muscle

Effective AI code review is a skill that develops with practice. Start by:

  1. Keeping a bug journal. Track every bug that escaped review. Note the category (logic error, context mismatch, hallucinated API, etc.) and what would have caught it.
  2. Reviewing your own AI-generated code first. Before submitting a PR, review it as if someone else wrote it. This builds the habit of critical reading.
  3. Budgeting review time. Spend at least five minutes per hundred lines of AI-generated code. Rushing through reviews defeats the purpose.
  4. Automating what you can. Linters, type checkers, and dependency validators catch mechanical errors so you can focus on logic and architecture.

FAQ

Should AI-generated PRs be labeled as such?

Yes. Knowing that code was AI-generated changes how you review it. The reviewer should know to watch for AI-specific failure patterns rather than human-specific ones.

How do I review a PR when I do not understand the AI tool that generated it?

Focus on the output, not the tool. You do not need to understand how the AI works to evaluate whether the code is correct. Apply the same framework: verify intent, audit assumptions, validate dependencies, analyze error paths, and assess test coverage.

Is AI-generated code less reliable than human-written code?

Not necessarily less reliable, but differently reliable. AI code is more consistent in style and less likely to contain typos, but more likely to contain subtle logic errors and context mismatches. The reliability depends entirely on the quality of the review process.

How long should an AI code review take compared to a human code review?

Expect to spend roughly the same time or slightly more. The code reads faster because it is clean and consistent, but the verification steps (checking dependencies, testing edge cases, auditing assumptions) add time that traditional reviews do not require.

Written by Atticus Li

Revenue & experimentation leader — behavioral economics, CRO, and AI. CXL & Mindworx certified. $30M+ in verified impact.