The Real Cost of One Unhandled Error in Production

By Nark Team

A single unhandled error in production costs between $5,000 and $150,000 depending on where it happens, how long it takes to detect, and which customers it affects. Most engineering teams underestimate this because they only count the fix. The fix is the cheapest part. The real cost is the incident response, the lost revenue during downtime, the customer trust that never comes back, the engineer who was woken up at 2am and is now useless the next day, and the three meetings that follow to make sure it never happens again. All of that from a missing try-catch.

Quick Answer: The average production incident costs $5,000-$50,000 when you account for response time, lost revenue, engineering hours, and customer churn. A missing try-catch around axios.get() or an unhandled Prisma P2002 is the most common root cause. Preventing one incident per quarter pays for any tooling investment many times over. Check your codebase now: npx nark --tsconfig ./tsconfig.json


The Anatomy of a $25,000 Bug

Here is a composite incident based on patterns common across SaaS companies. The specifics change. The cost structure does not.

Thursday, 2:17 AM. PagerDuty fires. The payments endpoint is returning 500 errors. A Stripe webhook that processes subscription renewals is crashing because stripe.subscriptions.retrieve() throws StripeInvalidRequestError when a subscription was deleted on Stripe's side. No try-catch. The error propagates up, crashes the webhook handler, and Stripe marks the webhook as failing.

2:17 - 2:45 AM. The on-call engineer wakes up, opens their laptop, reads the alert, pulls up logs, identifies the error, and starts investigating.

2:45 - 3:30 AM. The fix is a try-catch with a check for deleted subscriptions. The engineer writes it, tests locally, pushes a PR, waits for CI, merges, and deploys.

3:30 - 4:00 AM. The engineer verifies the fix in production. Retries the failed webhooks. Confirms no data was lost. Writes a brief incident note. Goes back to bed.

The next morning. Three customers emailed support because their dashboard showed "subscription inactive" for 90 minutes. The CS team spends 45 minutes responding to each one. The engineering manager schedules a post-mortem. The on-call engineer is tired and unproductive for the rest of the day.

The following week. 30-minute post-mortem meeting with 5 engineers. Action items are created. One customer on the enterprise plan asks for a formal incident report for their compliance team. The account manager spends 2 hours preparing it.
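The fix from the 2:45 AM step, a try-catch with a check for deleted subscriptions, might look roughly like the sketch below. It is not the Stripe SDK: SubscriptionClient, handleRenewal, and the error shape (a type of "StripeInvalidRequestError" with a code of "resource_missing") are stand-ins assumed for illustration, so check the field names against the SDK version you actually run.

```typescript
// Fields this sketch reads from a Stripe-style error. The real error classes
// come from the `stripe` package; these field names are assumptions.
interface StripeLikeError {
  type: string;
  code?: string;
}

interface Subscription {
  id: string;
  status: string;
}

// Hypothetical narrow interface covering the single call the handler makes.
interface SubscriptionClient {
  retrieve(id: string): Promise<Subscription>;
}

type RenewalResult =
  | { outcome: "processed"; subscription: Subscription }
  | { outcome: "skipped"; reason: string };

// True when the error says the subscription no longer exists upstream.
function isMissingSubscription(err: unknown): err is StripeLikeError {
  if (typeof err !== "object" || err === null) return false;
  const e = err as Partial<StripeLikeError>;
  return e.type === "StripeInvalidRequestError" && e.code === "resource_missing";
}

async function handleRenewal(
  client: SubscriptionClient,
  subscriptionId: string
): Promise<RenewalResult> {
  try {
    const subscription = await client.retrieve(subscriptionId);
    return { outcome: "processed", subscription };
  } catch (err) {
    if (isMissingSubscription(err)) {
      // Expected case: acknowledge the webhook instead of crashing the handler.
      return {
        outcome: "skipped",
        reason: `subscription ${subscriptionId} no longer exists`,
      };
    }
    // Anything else is unexpected: rethrow so it is logged and retried.
    throw err;
  }
}
```

The design choice that matters is the rethrow. Only the one known, expected failure is swallowed, and everything else still surfaces.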

Total elapsed time from the first alert to full resolution: 6 days. The total cost breaks down as follows.


The Cost Breakdown

1. Direct Engineering Time: $1,650 - $5,000

| Activity | Time | Cost (at $150/hr fully loaded) |
| --- | --- | --- |
| On-call response (2am investigation) | 2 hours | $300 |
| Fix, test, deploy | 1 hour | $150 |
| Post-incident verification | 0.5 hours | $75 |
| Post-mortem preparation | 1 hour | $150 |
| Post-mortem meeting (5 engineers) | 2.5 hours total | $375 |
| Action item implementation | 3 hours | $450 |
| Code review of fix + action items | 1 hour | $150 |
| Subtotal | ~11 hours | $1,650 |

This is just the direct time. It does not include the productivity loss from context switching, the interrupted sleep, or the cognitive overhead of "we need to be more careful."

2. Lost Productivity (The Invisible Tax): $1,500 - $3,000

The on-call engineer loses the next day. Not completely, but they are running at 50% capacity after a 2am wake-up. That is 4 hours of lost productive output.

The post-mortem meeting pulls 5 engineers out of their flow for 30 minutes each. Research on maker schedules shows that a 30-minute meeting costs roughly 2 hours of deep work due to context switching. That is 10 hours of lost deep work across the team.

Every engineer on the team now spends a few minutes wondering "could this happen in my code?" Some of them spend 30 minutes auditing their own endpoints. This is productive work, but it is unplanned and displaces planned work.

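Estimated lost productivity: $1,500 - $3,000. At the same $150/hr rate, the 14 hours above (4 for the on-call engineer, 10 across the post-mortem attendees) come to $2,100 before counting the ad hoc audits.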

3. Customer Support: $500 - $2,000

| Activity | Time | Cost |
| --- | --- | --- |
| 3 customer emails, 45 min each | 2.25 hours | $180 |
| Enterprise incident report | 2 hours | $200 |
| Account manager follow-up calls | 1 hour | $150 |
| Subtotal | ~5 hours | $530 |

For B2B SaaS with enterprise customers, the incident report alone can take a full day if the customer requires a specific format or a root cause analysis with remediation steps.

4. Lost Revenue: $0 - $50,000+

This is the variable that makes the difference between a $5,000 incident and a $150,000 incident.

Scenario A: Internal tool, no customer impact. $0 in lost revenue. The bug affected an internal dashboard. Nobody noticed except the on-call engineer.

Scenario B: 90 minutes of degraded payments. Some renewal webhooks failed. Stripe retries them after the fix. No revenue permanently lost, but 3 customers saw incorrect subscription status. One of them is on a $2,000/month plan and sends an angry email.

Scenario C: Critical path failure during peak hours. The signup endpoint crashes for 4 hours during a marketing campaign. Conversion tracking shows 200 signups were attempted during that window. At a 30% conversion rate and $50/month average revenue, that is $3,000/month in recurring revenue lost permanently. Over 12 months: $36,000.

Scenario D: Data corruption. The unhandled error caused a partial write. Customer data is inconsistent. The engineering team spends 3 days writing a migration script to fix affected records. Two customers leave because they lost trust. Annual contract value: $24,000 each.
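The arithmetic behind Scenarios C and D is easy to rerun with your own numbers. The sketch below is a back-of-the-envelope model, not a forecasting tool. The names (OutageInputs, estimateSignupOutageLoss) are made up for illustration, and it assumes every lost signup would have stayed for the full horizon, which is the same simplification the scenario makes.

```typescript
interface OutageInputs {
  attemptedSignups: number; // signups attempted during the outage window
  conversionRate: number; // fraction that would have become paying customers
  monthlyRevenuePerCustomer: number; // USD per month
  horizonMonths: number; // how far out to count the lost revenue
}

interface OutageLoss {
  customersLost: number;
  monthlyRevenueLost: number;
  revenueLostOverHorizon: number;
}

function estimateSignupOutageLoss(inputs: OutageInputs): OutageLoss {
  // Round to whole customers so floating-point noise cannot leak into totals.
  const customersLost = Math.round(inputs.attemptedSignups * inputs.conversionRate);
  const monthlyRevenueLost = customersLost * inputs.monthlyRevenuePerCustomer;
  return {
    customersLost,
    monthlyRevenueLost,
    revenueLostOverHorizon: monthlyRevenueLost * inputs.horizonMonths,
  };
}

// Scenario C: 200 attempted signups, 30% conversion, $50/month, 12 months.
const scenarioC = estimateSignupOutageLoss({
  attemptedSignups: 200,
  conversionRate: 0.3,
  monthlyRevenuePerCustomer: 50,
  horizonMonths: 12,
});

// Scenario D: two churned customers at $24,000 annual contract value each.
const scenarioDChurn = 2 * 24000;

console.log(
  scenarioC.monthlyRevenueLost,
  scenarioC.revenueLostOverHorizon,
  scenarioDChurn
); // 3000 36000 48000
```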

5. Customer Trust and Churn: $5,000 - $100,000+

This is the cost that nobody tracks but everyone feels.

A customer who experiences a production bug does not forget it. They remember it every time they evaluate whether to renew. Research from Bain & Company shows that the cost of acquiring a new customer is 5-25x the cost of retaining an existing one.

If one enterprise customer churns because of accumulated trust erosion from incidents, the cost is their annual contract value plus the cost of replacing them:

  • Lost annual revenue: $24,000 - $120,000
  • Sales cost to replace: $5,000 - $15,000 (rep time, demos, negotiation)
  • Onboarding cost for new customer: $2,000 - $5,000

One churned enterprise customer can cost more than an engineer's annual salary.

6. Liability and Compliance: $0 - $50,000+

For companies handling financial data, health data, or operating under SOC 2, HIPAA, or GDPR:

  • A production error that exposes data or causes incorrect financial calculations can trigger audit findings
  • Audit remediation costs $10,000 - $50,000+ depending on severity
  • Customer contracts with SLA commitments may include financial penalties for downtime
  • Some enterprise contracts require formal incident reports that take 4-8 hours to prepare

7. Team Morale: Unmeasurable but Real

Engineers who get paged at 2am for preventable bugs burn out faster. On-call fatigue is widely cited as a driver of engineering turnover. Replacing a senior engineer costs $50,000 - $100,000 in recruiting, onboarding, and lost institutional knowledge.

Nobody quits over one incident. But every preventable incident adds weight. The engineer who gets paged three times in a quarter for missing try-catch blocks starts updating their LinkedIn.


Total Cost Summary

| Cost Category | Low Estimate | High Estimate |
| --- | --- | --- |
| Direct engineering time | $1,650 | $5,000 |
| Lost productivity | $1,500 | $3,000 |
| Customer support | $500 | $2,000 |
| Lost revenue | $0 | $50,000 |
| Customer churn | $0 | $100,000 |
| Liability/compliance | $0 | $50,000 |
| Total per incident | $3,650 | $210,000 |

The median SaaS production incident from an unhandled error costs roughly $5,000 - $25,000 when you include all direct and indirect costs. Critical path failures during business hours push this above $50,000.
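If you want to plug in your own rates, the summary is simple addition. The sketch below reproduces the direct engineering subtotal from the hours listed earlier and then sums the low and high columns. The type and function names are illustrative, and the figures are the ones from this post, not measurements of your team.

```typescript
interface CostRange {
  category: string;
  low: number; // USD
  high: number; // USD
}

const HOURLY_RATE = 150; // fully loaded engineering rate used in this post

// Hours from the direct engineering time breakdown, in the order listed there.
const directHours = [2, 1, 0.5, 1, 2.5, 3, 1];
const directSubtotal =
  directHours.reduce((sum, hours) => sum + hours, 0) * HOURLY_RATE;

const perIncident: CostRange[] = [
  { category: "Direct engineering time", low: directSubtotal, high: 5000 },
  { category: "Lost productivity", low: 1500, high: 3000 },
  { category: "Customer support", low: 500, high: 2000 },
  { category: "Lost revenue", low: 0, high: 50000 },
  { category: "Customer churn", low: 0, high: 100000 },
  { category: "Liability/compliance", low: 0, high: 50000 },
];

function totalRange(ranges: CostRange[]): { low: number; high: number } {
  return ranges.reduce(
    (acc, r) => ({ low: acc.low + r.low, high: acc.high + r.high }),
    { low: 0, high: 0 }
  );
}

const total = totalRange(perIncident);
console.log(directSubtotal, total.low, total.high); // 1650 3650 210000
```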


The Math on Prevention

A tool that prevents one production incident per quarter saves $20,000 - $100,000 per year: four incidents at the median $5,000 - $25,000 each.

Nark scans your TypeScript codebase against 160+ Nark Profiles and reports every unhandled error case. It catches the exact bugs that cause these incidents: missing try-catch on axios.get(), unhandled PrismaClientKnownRequestError P2002, uncaught StripeCardError, Redis clients without .on('error') handlers.

npx nark --tsconfig ./tsconfig.json

The scan takes 30 seconds. The violations it finds are the specific code paths that will page someone at 2am.


The Bugs That Actually Cause Incidents

These are not theoretical. These are the patterns that page engineers:

1. axios without try-catch. The upstream API returns 503 during a deployment. Your endpoint crashes. Every user hitting that feature sees a 500 for the duration of the outage.

2. Prisma P2002 unhandled. A user signs up with an email that already exists. Instead of "email already taken," they see "Internal Server Error." They try again. Same error. They leave.

3. Stripe StripeCardError uncaught. A customer's card is declined. Instead of showing "please update your payment method," the charge function crashes. The payment silently fails. The customer's subscription lapses without notification.

4. Redis client without error handler. The Redis connection drops during a deployment. Node.js treats the unhandled error event as an uncaught exception. The process crashes. Every request fails until the process restarts.

5. OpenAI RateLimitError unhandled. Traffic spikes and you hit the rate limit. Instead of queuing or returning a graceful fallback, your AI feature throws a 500. Users see broken functionality and assume the product is down.
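Most of these patterns share one remedy: catch the error, translate the cases you expect into a useful response, and rethrow the rest. Here is a minimal sketch of that idea. It does not import any of the real libraries; it only inspects fields that their errors are assumed to carry (a code of "P2002" for the Prisma unique-constraint error, a type of "StripeCardError" for a declined card, a status of 429 for a rate limit). The helper names and the response shape are hypothetical.

```typescript
interface HttpResponse {
  status: number;
  body: string;
}

// Read a field from an unknown thrown value without assuming its class.
function field(err: unknown, name: string): unknown {
  return typeof err === "object" && err !== null
    ? (err as Record<string, unknown>)[name]
    : undefined;
}

// Translate expected failures into responses; return undefined for the rest.
function mapKnownError(err: unknown): HttpResponse | undefined {
  if (field(err, "code") === "P2002") {
    return { status: 409, body: "Email already taken" };
  }
  if (field(err, "type") === "StripeCardError") {
    return { status: 402, body: "Please update your payment method" };
  }
  if (field(err, "status") === 429) {
    return { status: 503, body: "Busy right now, please retry shortly" };
  }
  return undefined;
}

async function withErrorHandling(
  work: () => Promise<HttpResponse>
): Promise<HttpResponse> {
  try {
    return await work();
  } catch (err) {
    const mapped = mapKnownError(err);
    if (mapped) return mapped;
    // Unknown errors still surface so monitoring can see them.
    throw err;
  }
}
```

A signup handler would wrap its database call in withErrorHandling, so a duplicate email produces a 409 with a readable message instead of "Internal Server Error."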

Every one of these is a missing try-catch or a missing .on('error'). Every one of these is caught by Nark before it ships.
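The missing .on('error') case deserves its own demonstration, because a try-catch around your request handler does not help. The sketch below uses a bare Node.js EventEmitter as a stand-in for a Redis client. It is not a real client, and the try-catch around emit exists only to make the throw observable. In a real server the event fires inside the client's own socket handling, where none of your code is on the stack to catch it.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for a client with no 'error' listener attached.
const unguarded = new EventEmitter();
let crashed = false;
try {
  // With no 'error' listener, emit() throws the error it was given.
  unguarded.emit("error", new Error("connection lost"));
} catch {
  crashed = true;
}

// Same event, but a listener is registered, so nothing is thrown.
const guarded = new EventEmitter();
const seen: string[] = [];
guarded.on("error", (err: Error) => {
  seen.push(err.message); // log and let the client reconnect instead of dying
});
guarded.emit("error", new Error("connection lost"));

// crashed is true; seen holds the single message "connection lost"
console.log({ crashed, seen });
```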


Prevention vs. Response

| | Preventing the bug | Responding to the incident |
| --- | --- | --- |
| Time | 5 minutes (add try-catch) | 2-8 hours (investigate, fix, deploy, verify) |
| People involved | 1 engineer | 3-7 people (on-call, manager, support, affected engineers) |
| Customer impact | None | Minutes to hours of degraded service |
| Cost | $12.50 (5 min at $150/hr) | $5,000 - $25,000+ |
| Stress | Zero | High (2am page, time pressure, customer complaints) |

The ratio is roughly 400:1 to 2000:1 ($5,000 / $12.50 = 400 and $25,000 / $12.50 = 2,000). Five minutes of prevention saves $5,000-$25,000 of response.


Frequently Asked Questions

How many unhandled errors does a typical TypeScript project have?

Based on scanning 6,283 open-source TypeScript repositories, the median project has 4-8 unhandled npm package errors. Projects with 50+ dependencies typically have 10-20. Each one is a potential production incident.

What if we have good monitoring?

Monitoring reduces the detection time (MTTD). It does not reduce the response time (MTTR), the customer impact, or the engineering cost. Monitoring tells you the house is on fire. Nark prevents the fire.

What about unit tests?

Unit tests cover the scenarios someone thought to test. Unhandled errors are by definition the scenarios nobody thought about. You do not write a test for a try-catch you forgot to add. Nark checks for the try-catch itself.

Is this relevant for early-stage startups?

More so. A startup with 5 engineers has no dedicated on-call rotation. A 2am incident means the CTO or a founding engineer is investigating. Their time is the most expensive in the company. A single prevented incident pays for the entire engineering tooling budget.


Try It Now

npx nark --tsconfig ./tsconfig.json

Nark checks 160+ packages — including axios, prisma, stripe, and redis — for correct error handling. Every violation it finds is a production incident that has not happened yet. Fix them now for $12.50 each, or fix them later for $5,000-$25,000 each.