I hit my Claude Code limit last week. Middle of regular work. Just a couple of coding sessions. Came out of nowhere.
Then I saw Thariq's post:

During peak hours (5am-11am PT), your 5-hour limit burns faster. About 7% of users will hit limits they didn't before. The advice: shift heavy work to off-peak hours.
That's the whole explanation. No formula. No dollar figure. Just "you'll hit limits faster" and a suggestion to rearrange your workday.
I pay $200/month for Max 20x. I want to know what I'm buying.
Tools like ccusage help with part of this. ccusage reads your local Claude Code session logs. It tells you how many tokens you used, which model served each request, and what it would cost at API rates.
Useful. But it answers a different question.
It tells you what you spent. Not what your budget is. Not where you stand against your 5-hour limit. Not when that window resets. That data isn't in the local logs.
It lives in the HTTP headers Anthropic sends back on every API response. Claude Code gets them and throws them away.
So I built a proxy to catch them.
Every API response includes headers called anthropic-ratelimit-unified-*. Right there in the HTTP layer. Here's what's in them:
Utilization per window. A number between 0.0 and 1.0. Not a vague bucket. An exact float. Anthropic tracks three windows: 5h, 7d, and 7d_sonnet. Each has its own quota.
Status per window. allowed, exceeded, or rate_limited. The 5-hour and 7-day windows are independent. You can be fine on one and throttled on the other.
Reset timestamps. Unix epoch. You can calculate exactly when your window resets. To the second.
Surpassed threshold. A boolean that flips when you cross the limit.
All of this is sent to your machine on every API call. The percentage bar in Claude Code is a lossy summary.
How I intercepted it
I built a proxy in Go. Sits between Claude Code and api.anthropic.com. Forwards every request unchanged. Reads the response headers and token counts. Writes them to a log. Passes everything through.
git clone https://github.com/abhishekray07/claude-meter
cd claude-meter && go build -o claude-meter ./cmd/claude-meter
./claude-meter setup # points Claude Code at localhost:7735
./claude-meter start

Two minutes to set up. 1-2ms of added latency.
Claude Code → Proxy (:7735) → api.anthropic.com
↓
Raw JSONL capture
↓
Background normalizer
↓
Offline analysis + live dashboard

143 lines of Go. Standard library only. Zero external dependencies. I wanted something I could trust in my API path for months. Like a water meter on a pipe: it reads the flow, it doesn't change it.
What made this harder than expected
The proxy is simple. The normalization is where things get messy.
Claude Code streams responses using Server-Sent Events. Token counts arrive in two separate events. message_start has input and cache counts. message_delta has the final output count. You have to parse the stream, find both, and merge them.
Responses are also gzip-compressed. Some streams are truncated because the proxy grabbed what it could while forwarding. The parser handles partial gzip. Reads what it can. Moves on.
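A sketch of the merge step, assuming the two event shapes described above (the helper is my illustration, not the claude-meter parser; gzip handling omitted):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// Usage mirrors the token counts in the API's JSON.
type Usage struct {
	InputTokens              int `json:"input_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
	CacheReadInputTokens     int `json:"cache_read_input_tokens"`
	OutputTokens             int `json:"output_tokens"`
}

// mergeSSEUsage walks an SSE stream: input and cache counts come from
// message_start, the final output count from message_delta. Truncated
// or unparseable lines are skipped, so a partial stream yields
// whatever was seen.
func mergeSSEUsage(stream string) Usage {
	var u Usage
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		var ev struct {
			Type    string `json:"type"`
			Message struct {
				Usage Usage `json:"usage"`
			} `json:"message"`
			Usage Usage `json:"usage"`
		}
		if json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &ev) != nil {
			continue
		}
		switch ev.Type {
		case "message_start":
			u.InputTokens = ev.Message.Usage.InputTokens
			u.CacheCreationInputTokens = ev.Message.Usage.CacheCreationInputTokens
			u.CacheReadInputTokens = ev.Message.Usage.CacheReadInputTokens
		case "message_delta":
			u.OutputTokens = ev.Usage.OutputTokens
		}
	}
	return u
}

func main() {
	stream := "data: {\"type\":\"message_start\",\"message\":{\"usage\":{\"input_tokens\":1210,\"cache_read_input_tokens\":156027}}}\n" +
		"data: {\"type\":\"message_delta\",\"usage\":{\"output_tokens\":310}}\n"
	fmt.Printf("%+v\n", mergeSSEUsage(stream))
}
```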
Then there are the headers. Not a flat list. A set of prefixed keys:
anthropic-ratelimit-unified-status: allowed
anthropic-ratelimit-unified-representative-claim: five_hour
anthropic-ratelimit-unified-fallback-percentage: 0.5
anthropic-ratelimit-unified-5h-status: allowed
anthropic-ratelimit-unified-5h-reset: 1774933200
anthropic-ratelimit-unified-5h-utilization: 0.07
anthropic-ratelimit-unified-7d-status: allowed
anthropic-ratelimit-unified-7d-utilization: 0.53

The parser groups these by window (5h, 7d, 7d_sonnet). Pulls out status, reset time, utilization, and threshold. A normalized record:
{
"request_model": "claude-opus-4-6",
"usage": {
"input_tokens": 1210,
"cache_creation_input_tokens": 4723,
"cache_read_input_tokens": 156027,
"output_tokens": 310
},
"ratelimit": {
"status": "allowed",
"representative_claim": "five_hour",
"fallback_percentage": 0.5,
"windows": {
"5h": { "status": "allowed", "utilization": 0.07 },
"7d": { "status": "allowed", "utilization": 0.53 }
}
}
}

The log writer strips authorization, cookie, and x-api-key headers before anything touches disk. The capture channel is non-blocking with a 256-item buffer. If the normalizer falls behind, it drops exchanges instead of blocking the API call.
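The non-blocking capture is a one-select Go idiom. A sketch with a hypothetical Exchange type and a tiny buffer so the drop is easy to see:

```go
package main

import "fmt"

// Exchange stands in for one captured request/response pair.
type Exchange struct{ ID int }

// tryCapture sends on a buffered channel without ever blocking the
// API call: if the buffer is full, the exchange is dropped.
func tryCapture(ch chan Exchange, ex Exchange) bool {
	select {
	case ch <- ex:
		return true
	default:
		return false // buffer full: drop instead of stalling the proxy
	}
}

func main() {
	ch := make(chan Exchange, 2) // claude-meter uses 256
	for i := 0; i < 3; i++ {
		fmt.Println(tryCapture(ch, Exchange{ID: i}))
	}
	// With no receiver draining, the third send is dropped.
}
```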
25,000+ requests over a week. Zero drops.
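The window grouping can be sketched like this. Window names and fields are the ones I observed; the function itself is illustrative (surpassed-threshold handling omitted):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Window holds the normalized per-window fields.
type Window struct {
	Status      string
	Utilization float64
	Reset       int64
}

// groupWindows splits the flat anthropic-ratelimit-unified-* header
// list into per-window records. Keys without a window prefix
// (status, representative-claim, ...) are top-level and skipped here.
func groupWindows(headers map[string]string) map[string]Window {
	const prefix = "anthropic-ratelimit-unified-"
	windows := map[string]Window{}
	for key, val := range headers {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		name, field, ok := strings.Cut(strings.TrimPrefix(key, prefix), "-")
		if !ok || (name != "5h" && name != "7d" && name != "7d_sonnet") {
			continue
		}
		w := windows[name]
		switch field {
		case "status":
			w.Status = val
		case "utilization":
			w.Utilization, _ = strconv.ParseFloat(val, 64)
		case "reset":
			w.Reset, _ = strconv.ParseInt(val, 10, 64)
		}
		windows[name] = w
	}
	return windows
}

func main() {
	fmt.Printf("%+v\n", groupWindows(map[string]string{
		"anthropic-ratelimit-unified-5h-status":      "allowed",
		"anthropic-ratelimit-unified-5h-utilization": "0.07",
		"anthropic-ratelimit-unified-5h-reset":       "1774933200",
		"anthropic-ratelimit-unified-7d-utilization": "0.53",
	}))
}
```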
Raw logs go to ~/.claude-meter/raw/ as daily JSONL files. Normalized records to ~/.claude-meter/normalized/. If you change the parsing logic later, claude-meter backfill-normalized re-runs everything from raw.
Reverse engineering the budget
25,477 API calls over a week. The question: how much is my 5-hour budget worth in dollars?
How the meter probably works
Anthropic's API pricing is public:
Opus 4.6 (per million tokens):
Input: $5.00
Output: $25.00
Cache write: $6.25
Cache read: $0.50

My assumption: the quota meter uses these same weights. Each API call has a cost. The meter tracks that against a ceiling.
Easy to test. I processed 2.27 billion tokens over the week. 95.2% were cache reads. If the meter counted raw tokens, I'd have blown past the limit on day one. Didn't happen. Utilization barely moved during cache-heavy sessions.
Cache reads cost $0.50/MTok. Fresh input costs $5.00/MTok. Output costs $25.00/MTok. That's a 50x spread between the cheapest and most expensive token type. It explains everything about when you hit the limit and when you don't.
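Under that assumption, scoring a call is a four-term weighted sum. A sketch using the published Opus prices; the meter's real internal weights are unknown:

```go
package main

import "fmt"

// Opus 4.6 API list prices, dollars per million tokens.
const (
	inputPerM      = 5.00
	outputPerM     = 25.00
	cacheWritePerM = 6.25
	cacheReadPerM  = 0.50
)

// priceWeightedCost scores one API call the way the meter plausibly does.
func priceWeightedCost(input, cacheWrite, cacheRead, output int) float64 {
	return float64(input)*inputPerM/1e6 +
		float64(cacheWrite)*cacheWritePerM/1e6 +
		float64(cacheRead)*cacheReadPerM/1e6 +
		float64(output)*outputPerM/1e6
}

func main() {
	// The normalized record from earlier: mostly cache reads.
	fmt.Printf("$%.4f\n", priceWeightedCost(1210, 4723, 156027, 310))
}
```

Run on the example record, the 156,027 cache-read tokens contribute about 8 cents, while the 310 output tokens alone contribute nearly a cent. That asymmetry is the whole game.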
The hard part: the ceiling
The weights are step one. The actual budget ceiling is harder.
Headers give you utilization (0.0 to 1.0). Never the dollar amount. So I backed into it:
Watch for utilization drops. When the 5h window resets, utilization falls toward zero. That's a boundary.
Between resets, add up the price-weighted cost of every API call.
Divide cost by the utilization delta.
Say I spent $50 and utilization went from 0.0 to 0.30. Implied budget: ~$167.
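The back-calculation itself, with the worked numbers:

```go
package main

import "fmt"

// impliedBudget backs the window ceiling out of one interval: if
// spendDollars of price-weighted usage moved utilization by delta,
// the full window is spend / delta.
func impliedBudget(spendDollars, utilizationDelta float64) float64 {
	if utilizationDelta <= 0 {
		return 0 // unusable interval: utilization didn't move
	}
	return spendDollars / utilizationDelta
}

func main() {
	// $50 spent, utilization 0.0 -> 0.30.
	fmt.Printf("implied 5h budget: $%.0f\n", impliedBudget(50, 0.30)) // $167
}
```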
The challenge is noise. Some sessions are tiny: a quick question, $3 of usage. Some mix models. Some have gaps where utilization didn't update cleanly.
So the analysis filters hard. Single-model intervals only. Max 10 requests. Max 20 minutes. It throws away the first interval in each group because the anchor point is unreliable.
After filtering, 63 reset cycles on the 5-hour window:
5h window:
Range: $3 - $705
Median: $164
P25-P75: $104 - $272
Sessions: 63
7d window:
Range: $718 - $2,659
Median: $1,378
P25-P75: $1,189 - $1,644
Sessions: 8

Why the range is so wide
There are assumptions that could be wrong:
The weights might not match. I'm using Anthropic's published API pricing. They might use internal compute costs instead. If GPU cost doesn't track the public price sheet, my weights are off.
The ceiling might move. Thariq said limits are tighter during peak hours. The 5h budget could change by time of day. My analysis doesn't split peak vs off-peak yet. Some variance might be a real moving target, not noise.
Thinking tokens might count differently. They're billed as output. But the proxy can't see them. They don't show up in the usage response. Maybe the meter weights them differently than visible output.
Utilization might not update per-call. The headers could batch updates or report stale values. A utilization of 0.07 might come from a slightly different set of calls than what I captured.
The median is the useful number. $164 per 5-hour window. $1,378 per week. This is one person on one plan. More data from more people makes the estimates sharper.
I pay $200/month. Estimated weekly budget: ~$1,400 in API value. Roughly $5,600/month. If these numbers are right, that's about a 28x multiplier on what I pay. But the variance is wide enough that I wouldn't bet on it. Rough order of magnitude, not a measurement.
Other things in the headers
Every response has a representative_claim field. Mine always says five_hour. That's which window is binding right now.
There's fallback_percentage (0.5 in my data). Might be how Anthropic degrades service near the limit. And overage_status (always rejected for me, since I haven't turned on extra usage). The system checks if you've opted into overflow before throttling.
Some records skip the 5h utilization entirely. Especially Haiku calls. The meter tracks model tiers separately. 7d_sonnet has its own window with its own budget ($843 in my single observed session). Makes sense. Sonnet is cheaper to run.
The 5-hour window is the bottleneck. Simple coding sessions last all day. One deep architecture conversation throttles me by lunch.
Then the source code leaked
On March 31, Anthropic accidentally shipped a source map file in their npm package. A 60MB .map file that Bun generates by default. Someone forgot to add *.map to .npmignore. Within hours, 512,000 lines of TypeScript were mirrored across GitHub.
I went straight to the rate limit code. Here's what it confirmed and what it revealed.
What it confirmed
Everything I found from the proxy is correct. The source (claudeAiLimits.ts) reads the exact same anthropic-ratelimit-unified-* headers I was capturing. Same windows: five_hour, seven_day, seven_day_opus, seven_day_sonnet. Same status values. Same utilization float.
No hardcoded limits anywhere in the client. Everything comes from headers.
The representative_claim header picks the window with the furthest reset time. When multiple limits are hit, it shows whichever takes longest to recover. Mine always said five_hour because that was always the binding constraint.
What I couldn't see from outside
The source revealed things the proxy never could.
Warnings only show at 70%. Below that, Claude Code suppresses the warning as "potentially stale." This is why users feel blindsided. You're at 65% utilization and see nothing. Then suddenly "approaching limit." No middle ground.
Early warnings compare burn rate to clock time. The 5h window warns at 90% utilization, but only if less than 72% of the window has elapsed. Burning fast early? Warns sooner. Burning slow late? Stays quiet. The 7d window has three tiers:
75% utilization when 60% of time passed
50% utilization when 35% of time passed
25% utilization when 15% of time passed
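The reconstructed logic as a sketch. Thresholds come from the leaked client; the function is my paraphrase, not the shipped code:

```go
package main

import "fmt"

// tier pairs a utilization threshold with the elapsed-time fraction
// below which it triggers an early warning.
type tier struct{ utilization, elapsed float64 }

// Thresholds reconstructed from the leaked client logic.
var fiveHourTiers = []tier{{0.90, 0.72}}
var sevenDayTiers = []tier{{0.75, 0.60}, {0.50, 0.35}, {0.25, 0.15}}

// shouldWarnEarly fires when utilization has crossed a threshold
// before that much of the window's wall-clock time has elapsed,
// i.e. burn rate is running ahead of the clock.
func shouldWarnEarly(tiers []tier, utilization, elapsedFraction float64) bool {
	for _, t := range tiers {
		if utilization >= t.utilization && elapsedFraction < t.elapsed {
			return true
		}
	}
	return false
}

func main() {
	// Burning fast early: 30% of the week used after 10% of the time.
	fmt.Println(shouldWarnEarly(sevenDayTiers, 0.30, 0.10)) // true
	// Burning slow late: 80% used with 90% of the week gone.
	fmt.Println(shouldWarnEarly(sevenDayTiers, 0.80, 0.90)) // false
}
```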
Opus-to-Sonnet fallback is silent. When you hit an Opus limit and anthropic-ratelimit-unified-fallback is available, Claude Code catches the 429 and switches to Sonnet. No notification. Your next response comes from a cheaper model. You won't notice unless you check /model.
429s aren't retried for subscribers. If you're on Pro or Max and hit a 429, Claude Code doesn't retry. It shows you a menu: upgrade, enable extra usage, or wait. API users get automatic retries. Subscribers don't. Exception: Enterprise with pay-as-you-go.
Two smaller things. The server sends fallback_percentage (0.5 in my data) but the client never reads it. Dead code, or a feature that hasn't shipped. And Anthropic has 19 internal test scenarios in a /mock-limits command, with presets like session-limit-reached, approaching-weekly-limit, opus-limit, fast-mode-limit. The size of that test matrix tells you how complex this system is.
The percentage bar uses the same float. The /usage panel calls GET /api/oauth/usage, gets a number 0-100, divides by 100, renders a Unicode block-character bar (▏▎▍▌▋▊▉█). The exact number is right there. They chose to show blocks instead.
The dashboard

The proxy serves a dashboard at the same port. Open localhost:7735 in a browser:
Utilization gauges for 5-hour and 7-day, with peak markers
Token breakdown. 2.16 billion cache reads vs 15 million fresh input. That ratio is the whole story.
Budget estimates with range, median, and IQR per window
Utilization over time. The 5h window is spiky. The 7d is a slow climb.
Per-model breakdown. Opus: 12,046 calls (47%). Sonnet: 8,362. Haiku: 2,644.
Refreshes every 5 seconds. No build step. No npm.
What to do with this
Stay in sessions. Cache reads barely count. Every cold start repays the cache write cost at full weight. One long session is cheaper than four short ones.
Use medium effort. High effort makes more thinking tokens. Thinking tokens are output. Output is the most expensive thing on the meter.
Space out big sessions. The 5-hour window is the real constraint. If you have a big refactor later, don't burn the window on a deep conversation in the morning.
The obvious counterargument
Anthropic has reasons to keep this opaque. Dynamic limits let them manage load across millions of users. Publishing a formula invites gaming. I get it.
You can also run /cost or /stats in Claude Code to see token counts for your session. Not nothing.
But the leaked source tells a different story. The utilization float is already in the client. The /usage panel already fetches it and renders a bar. The statusline hook already reads rawUtilization from every response. The data flows through the client and gets displayed as a vague bar instead of a number.
Showing "You've used 73% of your 5-hour limit, resets in 2h14m" doesn't reveal the formula. It just shows what you've used. Like a gas gauge. You don't need to know the tank size to the milliliter. You need to know when you're running low.
Every cloud provider does this. AWS shows your service quotas. GCP shows your API usage. Anthropic computes the number, sends it to your machine, renders it as a bar, and hides the number behind it.
I'm on Max 20x. I plan my day around Claude Code. When the limit hits at 2pm on a Tuesday, that's not minor. I need to see it coming.
Claude Meter makes the invisible visible. If you're on Pro or a different Max tier, your data helps. More data points across plans, sharper estimates.
Run the proxy. Watch the numbers.
