Why AI Code Reviews Don't Replace Targeted Error Checking (and How to Use Both)

AI code reviewers like Claude Code, GitHub Copilot, Cursor, and CodeRabbit are good at catching logic bugs, naming problems, missing tests, and architectural issues. They are not good at consistently finding every unhandled error case in the npm packages your TypeScript project depends on, such as axios, prisma, stripe, and redis. AI reviewers are probabilistic: ask the same question twice and you can get different answers. A deterministic scanner like Nark produces the same result every time. It checks your code against 165+ Nark Profiles built from npm changelogs, CVEs, and package documentation.

Quick Answer: Use AI code review for logic, architecture, and intent. Use Nark for dependency error completeness. They solve different problems. Run both: npx nark --tsconfig ./tsconfig.json

What Are AI Code Reviews Good At?

AI code reviewers have genuine strengths. They read your code contextually, understanding intent, not just syntax. A tool like Claude Code or CodeRabbit can look at a function and tell you:

The variable name data should be userProfile for clarity
This function does two things and should be split
You're missing a test for the edge case where items is empty
This architecture will cause circular imports when you add the next feature

These are judgment calls. They require understanding what the code is trying to do, not just what it literally does. AI reviewers handle this well because they process code the way a senior engineer reads a PR: holistically, with context about patterns and intent.

// An AI reviewer catches this kind of problem:
async function processOrder(order: Order) {
  const user = await getUser(order.userId);
  const payment = await chargeCard(user.cardId, order.total);
  const shipment = await createShipment(order.items, user.address);
  // AI reviewer: "If chargeCard succeeds but createShipment fails,
  // you've charged the user without shipping. Consider a saga pattern
  // or compensating transaction."
  return { payment, shipment };
}

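That's a real architectural insight. A rule-based static analysis tool won't catch it, because the problem lies in what the calls mean for the business, not in how they are written. AI reviewers earn their place in the workflow here.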
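Acting on that review comment usually means adding a compensating action: if a later step fails, undo the earlier one. Below is a minimal sketch under stated assumptions, not a full saga implementation. The `Payment` and `Shipment` types and the `chargeCard`, `refundCharge`, and `createShipment` dependencies are hypothetical. They are injected so the logic can run against in-memory stubs instead of a real payment provider.

```typescript
// Hypothetical domain types; not from any real payment or shipping SDK.
interface Payment {
  id: string;
  amount: number;
}
interface Shipment {
  id: string;
}

// Dependencies are injected so the logic can run against in-memory stubs.
interface OrderDeps {
  chargeCard: (cardId: string, amount: number) => Promise<Payment>;
  refundCharge: (paymentId: string) => Promise<void>;
  createShipment: (items: string[], address: string) => Promise<Shipment>;
}

// Charge, then ship. If shipping fails, refund the charge (the compensating
// action) and rethrow so the caller still sees the original failure.
async function chargeThenShip(
  deps: OrderDeps,
  cardId: string,
  amount: number,
  items: string[],
  address: string,
): Promise<{ payment: Payment; shipment: Shipment }> {
  const payment = await deps.chargeCard(cardId, amount);
  try {
    const shipment = await deps.createShipment(items, address);
    return { payment, shipment };
  } catch (err) {
    await deps.refundCharge(payment.id);
    throw err;
  }
}
```

Rethrowing after the refund keeps the failure visible to the caller. If `refundCharge` can itself fail, a production version would also need retries or a durable record of the pending refund. This sketch leaves that out.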

What Do AI Code Reviews Miss?

AI reviewers are generalists. They know a lot about a lot of packages, but they don't carry a structured database of every error case for every npm package version. They miss three things consistently.

Package-specific error cases

The axios documentation states that axios.get() throws an AxiosError on 4xx and 5xx responses (with the default validateStatus) and on network failures. The Prisma documentation states that prisma.user.create() throws PrismaClientKnownRequestError with code P2002 on a unique constraint violation. The Stripe SDK throws StripeCardError when a card is declined.
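Knowing the specific error lets you handle the recoverable case and let everything else propagate. With Prisma, the usual check is `err instanceof Prisma.PrismaClientKnownRequestError && err.code === 'P2002'`. To keep the sketch below runnable without a database, it swaps in a hypothetical `KnownRequestError` class with the same `code` field. `createUserSafely` is an illustrative name and is not part of any library.

```typescript
// Hypothetical stand-in for Prisma's PrismaClientKnownRequestError, which
// exposes a `code` such as "P2002" (unique constraint violation).
class KnownRequestError extends Error {
  readonly code: string;
  constructor(message: string, code: string) {
    super(message);
    this.name = 'KnownRequestError';
    this.code = code;
    // Keep `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

type CreateResult = { ok: true; id: number } | { ok: false; reason: 'duplicate' };

// Map the one documented, recoverable failure (P2002) to a typed result.
// Every other error propagates unchanged.
async function createUserSafely(
  create: (email: string) => Promise<{ id: number }>,
  email: string,
): Promise<CreateResult> {
  try {
    const user = await create(email);
    return { ok: true, id: user.id };
  } catch (err) {
    if (err instanceof KnownRequestError && err.code === 'P2002') {
      return { ok: false, reason: 'duplicate' };
    }
    throw err;
  }
}
```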

An AI reviewer might know some of this. It might not. It depends on what was in the training data, the context window, the prompt, and the specific model version. Ask Claude to review an axios call on Monday and it might flag missing error handling. Ask again on Tuesday with different surrounding context and it might not.

Changelog-level knowledge

When Prisma v5 changed PrismaClientKnownRequestError to include a meta property, the Nark Profile was updated. When axios v1.6.0 added AxiosError.status as a convenience accessor, the Profile documented it. AI reviewers have a training cutoff. They don't read changelogs after that date. They don't know about breaking changes in versions released last month.

Consistency across a codebase

This is the critical gap. An AI reviewer sees one file at a time (or a PR diff). It might flag an unhandled axios call in the file it's reviewing. But it won't scan your entire codebase and find every instance of the same pattern across every file, every module, every package.

Real Example: 134 Unguarded Axios Calls in Botpress

We ran Nark against Botpress, a popular open-source chatbot platform. Across 2,348 files, the scan found 134 unguarded axios calls in 35 integrations. Not one or two: 134 separate call sites where axios.get(), axios.post(), or axios.request() was called without a try-catch.

nark v1.2.0 scan results:
  Files analyzed: 2,348
  Axios call sites found: 194
  Unguarded calls: 134
  Across: 35 integrations

Here's the question: would an AI reviewer catch all 134?

If you send a single file to Claude Code or CodeRabbit, it would probably flag the missing try-catch. AI reviewers are good at spotting obvious error handling gaps in isolation. But Botpress has 35 separate integration packages. Each has its own src/client.ts or src/index.ts. Each makes axios calls in slightly different patterns. Some wrap calls in helpers. Some use interceptors. Some don't handle errors at all.

An AI reviewer processes PRs one at a time. To match a full scan, it would need to review all 35 integration packages, remember the pattern from the first one, and apply it consistently to the 35th. That's not how PR review works: most PRs touch a handful of files, so the AI sees a narrow window.

Nark sees the whole codebase at once. It doesn't get tired. It doesn't forget. It applies the same 21 postconditions from the axios Profile to every call site, every time.
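To make "sees the whole codebase" concrete, here is a toy version of the idea built on the TypeScript compiler API (the `typescript` npm package). It is an illustration, not Nark's implementation. It only flags `axios.<method>(...)` calls that are not lexically inside a `try` block. It ignores `.catch()` chains, wrapper helpers, and interceptors, all of which a real checker has to account for. `findUnguardedAxiosCalls` and `scanProject` are hypothetical names.

```typescript
import * as ts from 'typescript';

// Illustrative subset of axios methods; not Nark's axios Profile.
const AXIOS_METHODS = new Set(['get', 'post', 'put', 'delete', 'patch', 'request']);

interface Finding {
  file: string;
  line: number;
  call: string;
}

// Flag axios.<method>(...) calls that are not lexically inside a try block.
function findUnguardedAxiosCalls(file: string, code: string): Finding[] {
  const source = ts.createSourceFile(file, code, ts.ScriptTarget.Latest, true);
  const findings: Finding[] = [];

  const visit = (node: ts.Node, inTry: boolean): void => {
    if (ts.isTryStatement(node)) {
      // Only the try block is guarded; catch and finally keep the outer state.
      visit(node.tryBlock, true);
      if (node.catchClause) visit(node.catchClause, inTry);
      if (node.finallyBlock) visit(node.finallyBlock, inTry);
      return;
    }
    if (
      !inTry &&
      ts.isCallExpression(node) &&
      ts.isPropertyAccessExpression(node.expression) &&
      ts.isIdentifier(node.expression.expression) &&
      node.expression.expression.text === 'axios' &&
      AXIOS_METHODS.has(node.expression.name.text)
    ) {
      const { line } = source.getLineAndCharacterOfPosition(node.getStart(source));
      findings.push({ file, line: line + 1, call: `axios.${node.expression.name.text}()` });
    }
    ts.forEachChild(node, (child) => visit(child, inTry));
  };

  visit(source, false);
  return findings;
}

// Run the identical check over every file in the project.
function scanProject(files: Record<string, string>): Finding[] {
  return Object.entries(files).flatMap(([file, code]) => findUnguardedAxiosCalls(file, code));
}
```

Because the same function runs over every file, the 35th integration is checked exactly like the first, and the same input always yields the same findings.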

Real Example: zod.parse() vs zod.safeParse()

Here's a subtler case. The zod library has two ways to validate data:

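import { z } from 'zod';

// Untrusted data, e.g. a parsed request body
const input: unknown = { email: 'ada@example.com', name: 'Ada' };
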
const UserSchema = z.object({
  email: z.string().email(),
  name: z.string().min(1),
});

// Option 1: parse() - throws ZodError on invalid input
const user = UserSchema.parse(input);

// Option 2: safeParse() - never throws, returns discriminated union
const result = UserSchema.safeParse(input);
if (result.success) {
  const user = result.data;
} else {
  const errors = result.error;
}

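parse() throws ZodError on validation failure. safeParse() never throws. If you call parse() without a try-catch, invalid user input throws straight out of your request handler. Depending on the framework, that means a 500 response or an unhandled promise rejection instead of a 400 with validation details.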
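safeParse() works by turning the thrown error into a value whose type forces the caller to check `success` before touching `data`. The sketch below reproduces that discriminated-union pattern without zod, so it runs on its own. `ValidationError`, `parseEmail`, and `toSafe` are hypothetical stand-ins for a throwing `parse()`-style validator and a wrapper around it.

```typescript
// The discriminated-union shape that safeParse-style APIs return.
type SafeResult<T, E> = { success: true; data: T } | { success: false; error: E };

// Hypothetical expected-failure error, standing in for ZodError.
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ValidationError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical throwing validator, standing in for a parse()-style call.
function parseEmail(input: unknown): string {
  if (typeof input !== 'string' || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input)) {
    throw new ValidationError('Invalid email');
  }
  return input;
}

// Wrap a throwing parser so callers get a value instead of an exception.
// Only the expected ValidationError becomes { success: false }; any other
// error is treated as a bug and rethrown.
function toSafe<T>(parse: (input: unknown) => T) {
  return (input: unknown): SafeResult<T, ValidationError> => {
    try {
      return { success: true, data: parse(input) };
    } catch (err) {
      if (err instanceof ValidationError) return { success: false, error: err };
      throw err;
    }
  };
}

const safeParseEmail = toSafe(parseEmail);

// Usage: the compiler only allows .data after checking .success.
const ok = safeParseEmail('ada@example.com');
if (ok.success) console.log(ok.data.toLowerCase());
const bad = safeParseEmail('not-an-email');
if (!bad.success) console.log(bad.error.message);
```

`toSafe` converts only the expected validation error into a result. Anything else still throws, which preserves the difference between a validation error and a programmer error.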

An AI reviewer might flag this. It might not. It depends on whether the model has seen enough zod-specific examples, whether the surrounding code provides enough context, and whether the prompt or system instructions mention error handling. Run the same review three times and you might get three different results.

Nark's zod Profile is explicit: parse() throws ZodError. parseAsync() throws ZodError. safeParse() never throws. safeParseAsync() never throws. The Profile even documents that parse() with async refinements throws $ZodAsyncError (a programmer error, not a validation error). Every parse() call without a try-catch is flagged. Every time. No variation.

// nark output:
// ERROR  zod  UserSchema.parse() called without try-catch
//        src/routes/signup.ts:18  in handleSignup()
//        zod parse() throws ZodError on validation failure
//        Fix: wrap in try-catch or use safeParse() instead

The Pre-Flight Checklist Analogy

You wouldn't skip a pre-flight checklist just because you have an experienced pilot. The pilot knows the aircraft. They've flown hundreds of times. They'll catch most problems instinctively. But the checklist catches what experience misses, consistently, every time. It doesn't have good days and bad days. It doesn't skip step 14 because step 13 looked fine.

AI code review is the experienced pilot. It brings judgment, context, and pattern recognition that no checklist can replicate. Nark is the pre-flight checklist. It checks the same 165+ Profiles, the same postconditions, the same error cases, on every scan.

Aviation learned this decades ago: checklists and expertise are complementary, not competing. Software is catching up.

How to Use Both: The Complementary Workflow

The highest-coverage setup runs AI review and Nark together. Each catches what the other misses.

What AI review handles

Logic errors: "This conditional is inverted"
Architecture: "This creates a circular dependency"
Naming: "This variable name doesn't reflect its contents"
Missing tests: "This branch has no test coverage"
Intent mismatch: "The function name says 'delete' but it archives"

What Nark handles

Dependency error completeness: "axios.get() on line 42 has no try-catch"
Package-specific error types: "Catch StripeCardError, not generic Error"
API surface coverage: "You call 6 prisma methods but only handle errors on 2"
Consistency: Same check across every file, every scan, every CI run
Changelog knowledge: Profiles updated when packages change throwing behavior

A practical CI setup

# .github/workflows/quality.yml
name: Code Quality

on: [pull_request]

# AI review (logic, architecture, naming) usually runs outside this file:
# CodeRabbit is installed as a GitHub App, and Copilot code review is
# requested as a PR reviewer. Both comment on the PR directly. If your
# AI reviewer ships a GitHub Action instead, add it as a separate job.

jobs:
  nark-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Deterministic error handling check; the job fails if nark exits non-zero
      - run: npx nark ci --tsconfig ./tsconfig.json

The AI reviewer comments on your PR with suggestions about code quality. Nark fails the build if you have unhandled error cases. Different tools, different jobs, and broader coverage than either gives alone.

Why Not Just Ask the AI Reviewer to Check Error Handling?

You can. You can prompt Claude Code or Copilot to focus on error handling. It will do a reasonable job for the files it sees. Three problems remain:

1. Scope. AI reviewers see a PR diff. Nark sees the full codebase. The unhandled axios call might be in a file nobody touched this sprint.

2. Consistency. Prompt an AI to check for missing error handling on ten different PRs. Compare the results. The depth and specificity will vary. Nark checks the same postconditions every time because they're defined in YAML, not generated on the fly.

3. Package knowledge depth. Nark's axios Profile has 21 postconditions covering get(), post(), put(), delete(), patch(), head(), request(), postForm(), putForm(), and patchForm(). Each has specific error types: 4xx/5xx responses, network failures, rate limiting (429), request setup errors, payload too large (413), and serialization errors. An AI reviewer won't enumerate all 21 unless you paste the axios documentation into the prompt.
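Most of those error types come from the three-way check that the axios documentation recommends. `error.response` is set when the server replied with a non-2xx status. `error.request` is set when no response arrived. Neither is set when the request could not be built. The sketch below uses a hypothetical structural `AxiosLikeError` type so it runs without axios. In real code you would first narrow an unknown error with `axios.isAxiosError(err)`. The `classify` function and its `FailureKind` labels are illustrative and are not part of axios or Nark.

```typescript
// Structural stand-in for the AxiosError fields this sketch reads.
interface AxiosLikeError {
  message: string;
  response?: { status: number; headers?: Record<string, string> };
  request?: unknown;
}

type FailureKind =
  | { kind: 'rate-limited'; retryAfterSeconds?: number }
  | { kind: 'payload-too-large' }
  | { kind: 'client-error'; status: number }
  | { kind: 'server-error'; status: number }
  | { kind: 'network' }
  | { kind: 'setup'; message: string };

function classify(err: AxiosLikeError): FailureKind {
  if (err.response) {
    // The server answered with a non-2xx status.
    const { status, headers } = err.response;
    if (status === 429) {
      // Retry-After may be seconds or an HTTP date; only numeric seconds are kept here.
      const retryAfter = Number(headers?.['retry-after']);
      return Number.isFinite(retryAfter)
        ? { kind: 'rate-limited', retryAfterSeconds: retryAfter }
        : { kind: 'rate-limited' };
    }
    if (status === 413) return { kind: 'payload-too-large' };
    return status >= 500 ? { kind: 'server-error', status } : { kind: 'client-error', status };
  }
  // The request was sent but no response arrived (DNS, timeout, connection reset).
  if (err.request) return { kind: 'network' };
  // The request could not be built (bad config, serialization failure).
  return { kind: 'setup', message: err.message };
}
```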

The AI reviewer is better at telling you why your error handling logic is wrong. Nark is better at telling you where your error handling is missing. Use both.

Frequently Asked Questions

Can Claude Code or Copilot replace static analysis tools?

No. AI code reviewers are complementary to static analysis, not replacements. Static analysis tools like ESLint, Semgrep, and Nark run deterministically. They produce the same output on the same input every time. AI reviewers are probabilistic. They're better at judgment calls (architecture, naming, logic) and worse at exhaustive checking (did you handle every error case for every package call across every file).

Is Nark an AI tool?

Nark uses the TypeScript compiler to parse your code into an AST, then checks it against YAML-defined Profiles. There is no LLM in the scanning path. The Profiles themselves were researched with AI assistance (reading changelogs, CVEs, documentation), but the scanner is deterministic. Same code, same Profiles, same results.
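"Same code, same Profiles, same results" follows from keeping the rules as plain data and the check as a pure function. The sketch below assumes a simplified, hypothetical Profile shape. Nark's actual YAML schema is not shown here and may differ. The sketch fills the Profile with the zod behavior described earlier and sorts findings so the output does not depend on the order in which files were scanned. `CallSite` stands for whatever an AST pass would record. `check` is an illustrative name.

```typescript
// Hypothetical, simplified Profile shape; Nark's real YAML schema may differ.
interface Profile {
  package: string;
  throws: { method: string; error: string }[];
  neverThrows: string[]; // documented for readers; check() only needs `throws`
}

const zodProfile: Profile = {
  package: 'zod',
  throws: [
    { method: 'parse', error: 'ZodError' },
    { method: 'parseAsync', error: 'ZodError' },
  ],
  neverThrows: ['safeParse', 'safeParseAsync'],
};

// A call site as an earlier AST pass might record it.
interface CallSite {
  file: string;
  line: number;
  pkg: string;
  method: string;
  guarded: boolean;
}

interface Violation {
  file: string;
  line: number;
  message: string;
}

// Pure function: no clock, no randomness, no network. Sorting by plain string
// comparison (not locale-aware) makes the output independent of input order.
function check(profiles: Profile[], sites: CallSite[]): Violation[] {
  const byPackage = new Map(profiles.map((p): [string, Profile] => [p.package, p]));
  const out: Violation[] = [];
  for (const site of sites) {
    const rule = byPackage.get(site.pkg)?.throws.find((t) => t.method === site.method);
    if (rule && !site.guarded) {
      out.push({
        file: site.file,
        line: site.line,
        message: `${site.method}() can throw ${rule.error} and has no try-catch`,
      });
    }
  }
  return out.sort((a, b) => (a.file < b.file ? -1 : a.file > b.file ? 1 : a.line - b.line));
}
```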

What if the AI reviewer and Nark disagree?

They check different things, so they rarely conflict. If an AI reviewer says "this error handling is fine" but Nark flags a violation, check the Nark Profile. It's citing a specific postcondition from the package documentation. If Nark says the code is clean but the AI reviewer spots a logic issue in the catch block, fix the logic. Both can be right at the same time.

How many packages does Nark cover?

Nark ships with 165+ Profiles covering the most common npm packages: axios, prisma, stripe, openai, anthropic, redis, pg, zod, twilio, and many more. Each Profile encodes the throwing behavior documented in the package's official docs, changelogs, and CVE history.

Try It Now

npx nark --tsconfig ./tsconfig.json

Nark scans your TypeScript project against 165+ Nark Profiles. It finds the unhandled error cases that AI reviewers catch sometimes and miss sometimes. It finds them every time.

Run it alongside your AI code reviewer. Let the AI handle judgment. Let Nark handle completeness.