Most people think Claude Code pricing is about tokens. You use the model. You pay for the tokens. The bill arrives.
That's not wrong. But it misses the big picture.
I tracked a 170-turn Opus session. Without caching, it would have cost $168. With caching: $21. Over 98% of the input tokens were cache reads, billed at $0.50 per million instead of $5.
I've been running Opus 4.6 daily for a few months now. I built Claude Code Camp with it. For this deep dive series, I ran 30 API calls to measure thinking costs. I tracked session tokens turn by turn. I pulled cost data from public case studies.
Claude Code needs a paid plan. Pro at minimum. There are four subscription tiers (Pro, Max 5x, Max 20x, Team) plus a direct API path. They differ in price, usage limits, and default model. Max and Team Premium run Opus 4.6 by default. Pro and Team Standard run Sonnet 4.6.
Claude Code pricing: every plan at a glance
| Plan | Monthly Price | Default Model | Usage Limit | Best For |
|---|---|---|---|---|
| Pro | $20 ($17 annual) | Sonnet 4.6 | Standard limits | Daily development |
| Max 5x | $100 | Opus 4.6 | 5x Pro limits | Power users, multiple long sessions daily |
| Max 20x | $200 | Opus 4.6 | 20x Pro limits | 8+ hour daily use |
| Team Standard | $25/seat ($20 annual) | Sonnet 4.6 | Standard limits | Teams needing admin controls |
| Team Premium | $125/seat ($100 annual) | Opus 4.6 | 5x Standard limits | Teams needing heavy usage |
| Enterprise | $20/seat + usage at API rates | Opus available | No per-seat limits | Large orgs, compliance needs |
| API | Pay-as-you-go | Your choice | None (pure metered billing) | Tool builders, tight cost control |
One caveat: Anthropic doesn't say what "5x Pro limits" means in exact message counts. We know the multiplier and the price. Not the ceiling.
The model default matters. Pro gives you Sonnet 4.6. Max gives you Opus 4.6. Both work well. Opus reasons deeper on hard tasks. You can change the default with --model or ANTHROPIC_MODEL, but your plan's usage limits still apply.
Does Claude Code require a paid plan?
Yes. The free Claude.ai tier does not include Claude Code. You need at least Pro ($20/month) or a Console account with API credits.
Pro is the starting point. You get Claude Code with Sonnet 4.6, standard usage limits, and the full feature set: slash commands, MCP servers, CLAUDE.md, the agentic loop. The limit is on how much you use it. Not on what it can do.
What happens every time you hit enter
Before we talk about costs, here's what you're paying for. Every message in Claude Code triggers five steps:
1. **System prompt + tools load (~18,000 tokens).** These are Claude Code's built-in instructions and tool definitions. Same every turn. Caches perfectly.
2. **Your CLAUDE.md and MCP tools load.** Each MCP server adds tokens. Playwright adds 3,442. A Jira integration adds 17,000. Your CLAUDE.md rules load too.
3. **The full conversation history replays.** Turn 1 sends just your message. Turn 20 sends all 19 previous exchanges plus your new one. Turn 50 sends everything from the start. This part grows every turn. Caching makes most of it nearly free.
4. **Claude thinks.** With extended thinking on, Claude reasons before responding. These are output tokens. You're billed for them even though you only see a summary.
5. **Claude responds.** Text, tool calls, code edits. Also output tokens.
Steps 1-3 are input. Steps 4-5 are output.
Input is cheap because of caching. Output is expensive because of how GPUs work. That's the split that drives everything on this page.
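To make that split concrete, here's a minimal sketch of one turn's bill at the Opus 4.6 rates quoted in this article ($5/MTok fresh input, $0.50 cache reads, $6.25 cache writes, $25 output). The token counts are illustrative, not measured:

```python
# One turn's bill at Opus 4.6 rates (dollars per million tokens).
RATES = {"input": 5.00, "cache_read": 0.50, "cache_write": 6.25, "output": 25.00}

def turn_cost(fresh, cache_read, cache_write, output):
    """Cost in dollars for one turn; arguments are raw token counts."""
    tokens = {"input": fresh, "cache_read": cache_read,
              "cache_write": cache_write, "output": output}
    return sum(RATES[k] * n / 1_000_000 for k, n in tokens.items())

# Illustrative mid-session turn: a short message, a large cached
# history, a little new cache, a modest response.
cost = turn_cost(fresh=200, cache_read=180_000, cache_write=3_000, output=1_500)
print(f"${cost:.3f}")
```

Even in this toy turn, the 1,500 output tokens are under 1% of the tokens but about a quarter of the cost.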
Why output costs 5x input
This isn't random markup. It reflects how the hardware works.
Input tokens are read in parallel. The GPU handles thousands at once. That makes each one cheap.
Output tokens are written one at a time. Each token depends on the one before it. The GPU generates one, feeds it back, generates the next. It's like a highway used for single-file traffic.
Thinking tokens are output tokens too. When Claude reasons through a problem, every reasoning token costs the output rate. On Opus 4.6, that's $25 per million tokens. Same price whether you see them in your terminal or not.
Caching works on the input side because reading is already cheap. It makes a cheap operation 10x cheaper. Output can't be cached. There's no shortcut for writing one token at a time.
Claude Code API pricing (March 2026)
These are the per-token prices if you're on the API. They also show what drives subscription costs under the hood.
| Model | Input | Output | Cache Read | Cache Write |
|---|---|---|---|---|
| Claude Opus 4.6 | $5/MTok | $25/MTok | $0.50/MTok | $6.25/MTok |
| Claude Sonnet 4.6 | $3/MTok | $15/MTok | $0.30/MTok | $3.75/MTok |
| Claude Haiku 4.5 | $1/MTok | $5/MTok | $0.10/MTok | $1.25/MTok |
Opus 4.6 was a 3x price cut from Opus 4.1 ($15/$75). If you avoided Opus before because of cost, recheck.
The cache read column matters most. At $0.50 per million tokens for Opus, your conversation history costs almost nothing to re-send each turn.
Cache writes cost 25% more than normal input. You pay that once when the cache is created. After that, every read is 90% off. Stay in one session and the math works. Restart often and you keep paying the write cost.
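A quick sanity check on that trade-off, using the Opus rates above. This is a sketch; the 18K-token prefix mirrors the system-prompt size mentioned earlier:

```python
# Sending an 18K-token prefix repeatedly: plain input every turn vs
# cache write once, then reads. Opus 4.6 rates: $5 input, $6.25
# cache write, $0.50 cache read (per MTok).
PREFIX_MTOK = 0.018  # ~18K tokens, roughly the system prompt + tools

def uncached(n_turns):
    return 5.00 * PREFIX_MTOK * n_turns

def cached(n_turns):
    # Pay the write premium once, then the 90%-off read rate.
    return 6.25 * PREFIX_MTOK + 0.50 * PREFIX_MTOK * (n_turns - 1)

print(f"2 turns:  ${uncached(2):.4f} uncached vs ${cached(2):.4f} cached")
print(f"50 turns: ${uncached(50):.3f} uncached vs ${cached(50):.3f} cached")
```

A single reuse already beats re-sending the prefix; by turn 50 the cached path is roughly 8x cheaper. Restart before the first reuse and you ate the write premium for nothing.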
Flat pricing across 1M context
Opus 4.6 and Sonnet 4.6 have flat pricing across the full 1M context window. A 900K-token request costs the same per token as a 9K request.
Older models (Sonnet 4.5, Sonnet 4) charged 2x above 200K tokens. That doubled your cost when sessions got long. Current models don't have that cliff.
This means long sessions don't suddenly get more expensive as they grow. The cost grows in a straight line. With caching, a 170-turn Opus session can cost $21 instead of $168.
Which model does Claude Code use?
It depends on your plan.
Max and Team Premium default to Opus 4.6. Pro and Team Standard default to Sonnet 4.6. You can override this with --model or ANTHROPIC_MODEL. Claude Code may also fall back to Sonnet if you hit usage limits on Opus.
On the API, model choice hits your wallet directly. Sonnet cache reads are $0.30/MTok versus $0.50 for Opus. That's 40% less on the biggest cost. Big refactors benefit from Opus. Simple edits and searches work fine on Sonnet.
Haiku 4.5 is fast and cheap. But Claude Code needs strong instruction-following to run its agentic loop, so most people only use Haiku in wrapper tools.
The invisible thinking tax
Claude's thinking tokens are summarized before you see them. But you're billed for the full amount.
I measured this across 30 API calls:
Sonnet 4.6:

- Billed output: 2,026 tokens (average)
- Visible output: 1,776 tokens (estimated)
- Invisible gap: 250 tokens → 1.14x what you see

Opus 4.6:

- Billed output: 1,338 tokens (average)
- Visible output: 1,140 tokens (estimated)
- Invisible gap: 198 tokens → 1.17x what you see

You're paying for 15-17% more output tokens than you can see. At Opus rates, that's about $0.005 per request. Small per request. Adds up over a long session.
Users in GitHub issue #31585 report the gap reaching 3-10x on hard reasoning tasks at higher effort. A code review is moderate. A multi-step architecture decision probably generates much more invisible reasoning.
You can see it in your logs. Check ~/.claude/projects/ for JSONL files:

```json
{ "type": "thinking", "thinking": "", "signature": "EuUBCkYICxgCKkD..." }
```

The thinking field is empty. The signature proves tokens were used. You paid for reasoning you can't read, can't audit, and whose content never made it into your logs.
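If you want to count these yourself, a short scan works. This assumes the flat JSONL shape shown in the snippet (objects with `type` and `thinking` fields); real session files may nest these inside message records, so treat it as a starting point, not a spec:

```python
# Sketch: count thinking entries with no readable content in Claude
# Code's session logs. Field names are taken from the log snippet
# above, not from official documentation.
import json
from pathlib import Path

def empty_thinking_entries(jsonl_text: str) -> int:
    count = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that aren't valid JSON
        if entry.get("type") == "thinking" and not entry.get("thinking"):
            count += 1
    return count

# Run it over every session log under ~/.claude/projects/:
total = sum(
    empty_thinking_entries(p.read_text())
    for p in Path.home().glob(".claude/projects/**/*.jsonl")
)
print(f"{total} thinking blocks with no readable content")
```

It won't tell you how many tokens each empty block consumed — that number isn't in the logs at all — but it shows how often you're billed for reasoning you can't inspect.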
This isn't a bug. It's how summarized thinking works. But it means your real cost per turn is higher than what you see.
Token usage visibility is a highly requested feature in Claude Code's issue tracker.
You can't fix what you can't measure. Right now, you can't measure this.
The effort level lever
Claude Code has an effort setting: low, medium, or high. It controls how much Claude thinks before answering. Toggle it with /effort.
Here's what each level costs. I measured across 30 API calls:
| Effort | Time | Cost/request | Thinking tokens |
|---|---|---|---|
| Low | 17 sec | $0.015 | ~23 |
| Medium | 20 sec | $0.016 | ~26 |
| High | 60 sec | $0.016 | ~47 |
The cost per request barely changes. The speed changes by 3.5x: high takes three and a half times as long as low for almost the same bill.
Quality? Same across all three in my tests. Anthropic's SWE-bench data shows 76% fewer output tokens at medium versus high. Same task completion. The model just thinks more efficiently at medium.
Until v2.1.68, Claude Code defaulted to high on Opus 4.6. Users on Reddit reported burning through Max limits 10x faster than on Opus 4.5. Anthropic changed the default to medium.
If you're on a subscription, effort is the biggest knob you have. Not because of cost per request. Because of how fast you burn through your ceiling.
What breaks the prompt cache
Caching keeps Claude Code cheap. Breaking the cache makes turns expensive. Here's what breaks it.
The cache has layers: tools → system prompt → messages. A change at any layer breaks that layer and everything below.
Breaks the entire cache:

- Switching models (Opus → Sonnet or back)
- Adding or removing an MCP server

Breaks system + message cache:

- Editing CLAUDE.md mid-session
- Changing the effort setting

Breaks on time:

- A 5-minute gap between messages. The cache expires.

Does NOT break the cache:

- Normal flow. Messages, responses, edits. The cache holds.
Don't switch models mid-session. Don't toggle MCP servers while working. Edit CLAUDE.md between sessions, not during. If you step away for five minutes, the cache expires on its own.
The full details (KV cache, attention layers, prefix ordering) are in the prompt caching deep dive.
Subscription vs. API: the actual math
Two paths.
Subscription (Pro / Max): flat monthly fee, usage ceiling. Anthropic absorbs the cost swings.
API: every token billed, no ceiling on usage or your bill.
Some users go through OpenRouter. That's API billing through a middleman. You don't get subscription pricing.
Picking wrong costs real money. Here's how to tell.
Real session data: what 170 turns actually costs
Full breakdown from one of my Opus 4.6 sessions — 170 turns of brainstorming and writing:
| Metric | Value |
|---|---|
| Input (fresh, uncached) | 239 tokens |
| Cache reads | 32,868,011 tokens |
| Cache creates | 598,778 tokens |
| Output | 31,559 tokens |
| Cache hit rate | 98.2% |
| Cost with caching | $20.97 |
| Cost without caching | $168.12 |
| Savings from caching | $147.16 (88%) |
33 million input tokens versus 31 thousand output. That's 1,000 to 1.
Even at 90% off, input dominated: $20.18 out of $20.97 (96.2%). Output for the whole session was $0.79. Less than a dollar.
Input per turn grew 5.7x from the first five turns to the last five. Every turn re-sends the whole history. But at 98% cache hits, the growth barely showed up on the bill.
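The table checks out against the published rates. Here's the arithmetic, plugging the raw counts above into the Opus 4.6 prices:

```python
# Reproduce the session table from raw token counts and Opus 4.6
# rates: $5 input, $0.50 cache read, $6.25 cache write, $25 output
# (all per MTok).
fresh, reads, creates, output = 239, 32_868_011, 598_778, 31_559

with_cache = (fresh * 5 + reads * 0.50 + creates * 6.25 + output * 25) / 1e6
without_cache = ((fresh + reads + creates) * 5 + output * 25) / 1e6

print(f"with caching:    ${with_cache:.2f}")     # matches the $20.97 above
print(f"without caching: ${without_cache:.2f}")  # matches the $168.12 above
```

Note that cache writes alone were $3.74 of the $20.97 — the one-time premium for a 98% hit rate afterward.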
Turn-by-turn averages across multiple sessions:
- Mid-session: $0.11 per turn
- After a compact restart: $0.29 per turn
- Mega-sessions (80+ turns): 91% of total spend
Compacts are expensive because they break the cache. The conversation resets, a new cache gets written, and the next few turns pay full input price.
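A toy model reproduces the shape of that growth. The per-turn history growth and output size here are assumptions for illustration, not measured values:

```python
# Per-turn cost growth under caching, Opus 4.6 rates ($0.50 cache
# read, $6.25 cache write, $25 output, per MTok). Assumes each turn
# adds ~4K tokens of history (assumption), all replayed as cache
# reads, plus a fixed ~1.5K-token response.
BASE, GROWTH, OUT = 18_000, 4_000, 1_500  # tokens

def turn_cost(turn):
    history = BASE + GROWTH * (turn - 1)  # everything before this turn
    return (history * 0.50 + GROWTH * 6.25 + OUT * 25.00) / 1e6

early = sum(turn_cost(t) for t in range(1, 6)) / 5
late = sum(turn_cost(t) for t in range(166, 171)) / 5
print(f"avg of first 5 turns: ${early:.3f}, avg of last 5: ${late:.3f}")
```

Cost per turn grows linearly, no cliffs — in the same ballpark as the 5.7x input growth measured in the real session.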
The @burkov calculation
Andriy Burkov shared his API usage on X: about $0.80 per request, around a hundred requests a day. That's roughly $80/day, or $2,400/month. Max 5x costs $100/month. 24x cheaper for his usage level.
That's extreme. Burkov is a heavy ML researcher. But it shows the ceiling structure. Once your API bill passes ~$100/month, Max 5x saves money.
The ksred.com data point
The blog ksred.com tracked a project that used about 10 billion tokens. API value: $15,000. Cost on Max: $800. That's 93% less.
The $800 is several months of Max. The $15,000 is what the same work would cost at API rates. For big software projects, subscription pricing isn't a little better. It's a different scale.
A worked example
Say you run three sessions per day: morning (30 turns), afternoon (60 turns), evening (20 turns). That's 110 turns/day, about 3,300/month.
At $0.11/turn mid-session: $363/month. Each session restart adds about $0.60 of warm-up cost. That's ~$50/month extra.
Total API cost: ~$413/month. Max 5x: $100/month. You save $313/month. That's $3,700/year.
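The same arithmetic as code, using the per-turn and warm-up figures from the session data (the total lands a few dollars above the ~$413 in the text, which rounds the restart overhead down to $50/month):

```python
# Monthly cost of the three-sessions-a-day pattern at API rates:
# $0.11 per mid-session turn, ~$0.60 warm-up per session start.
sessions_per_day = [30, 60, 20]  # turns per session
days = 30

turns = sum(sessions_per_day) * days  # 3,300 turns/month
api_cost = turns * 0.11 + len(sessions_per_day) * days * 0.60

print(f"API: ${api_cost:.0f}/month vs Max 5x: $100/month "
      f"(saves ${api_cost - 100:.0f}/month)")
```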
That's not a power user. That's a normal day of work.
Breakeven
The rough threshold is ~$100/month of API usage. Above that, Max 5x probably pays for itself. We can't know exactly, because Anthropic doesn't publish the hard limits behind "5x." Below $40, Pro is enough.
Anthropic's own data: the average developer spends ~$6/day on API. That's ~$180/month. Well above the Max 5x price.
Extra usage: the hybrid option
Anthropic now offers extra usage on Pro and Max. When you hit your limit, you don't get blocked. Instead, you keep going at API rates.
You prepay funds and set a spending cap. Usage beyond your plan limit bills at the same prices in the API table above.
This changes the decision. You can start on Pro. Stay within limits most days. Pay API rates only when you overflow. No need to commit to Max or API upfront. Works for both Claude.ai and Claude Code.
The honest caveat
We don't know what Anthropic pays for subscription usage on their own hardware. The "5x Pro limits" label suggests a soft usage model, not a hard token budget. If a lot of users started hitting 20x Max every single day, pricing could change. These numbers are from March 2026.
How to track what you're spending
API users: run /cost to see token usage and estimated cost for the current session.
Subscribers (Pro, Max, Team): run /stats instead. /cost tracks API billing, not subscription usage.
Three things affect your costs beyond your plan:
CLAUDE.md placement
There's a 10x cost difference between CLAUDE.md at the project root versus .claude/rules/. Root placement gets injected into every tool call. It can eat 46% of your context window on rules alone. The .claude/rules/ path only loads when needed.
Compaction timing
The most expensive moment in a session is right after a compact. The new summary gets re-cached over several turns. On the API, each compact costs about $0.50-$1.00 in restart overhead. Finish the task first, then compact.
Session length
Caching rewards long sessions. Each cold start re-pays the cache write cost. One 80-turn session costs less per turn than four 20-turn sessions doing the same work.
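Here's that comparison with the per-turn and warm-up figures from the session data above. A rough sketch: real warm-up cost varies with context size.

```python
# Same 80 turns of work: one long session vs four cold starts.
# Figures from the session data: $0.11 per warm turn, ~$0.60
# warm-up per session start.
def total_cost(turns_per_session, n_sessions):
    return n_sessions * (0.60 + turns_per_session * 0.11)

print(f"one 80-turn session:   ${total_cost(80, 1):.2f}")
print(f"four 20-turn sessions: ${total_cost(20, 4):.2f}")
```

On a subscription the dollars are invisible, but the same warm-up overhead counts against your usage ceiling.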
Which plan should you get?
Pro ($20/month)
You use Claude Code for daily work. Maybe an hour a day, some longer sessions mixed in. You sometimes hit limits but not always. Default model is Sonnet 4.6. Right entry point for most developers. Turn on extra usage if you want overflow protection.
Max 5x ($100/month)
Multiple long sessions per day. Claude Code is core to your work, not a side tool. Default model is Opus 4.6. The extra $80/month buys you the freedom to stop worrying about limits.
Max 20x ($200/month)
Claude Code 8+ hours a day. Or heavy tasks — big migrations, long research sessions — that burn through 5x limits.
API
You're building on top of Claude Code. A script, an automation, a custom tool. You want to see exactly what each session costs. The trade-off: no ceiling. A runaway script can rack up a real bill.
Team and Enterprise pricing
Both Team tiers include Claude Code.
Team Standard: $25/seat/month ($20 annual). Sonnet 4.6 default. Standard usage limits. Same features as Pro, plus admin controls, SSO, and org billing.
Team Premium: $125/seat/month ($100 annual). Opus 4.6 default. 5x Standard usage. For teams where Claude Code is a daily tool.
You can mix Standard and Premium seats. Minimum 5 seats, up to 150.
For a 5-person team needing heavy usage: Team Premium costs $625/month ($500/month on annual billing). Five individual Max 5x subs cost $500/month. Premium costs about the same but adds the admin controls companies need.
Enterprise is usage-based. $20/seat/month billed annually, plus usage at standard API rates. No per-seat limits. You pay for what you use. Available self-serve or through sales. Admins can set spending limits per org and per user.
Frequently asked questions about Claude Code pricing
Is Claude Code free?
No. You need at least Pro ($20/month) or an Anthropic Console account with API credits. The free Claude.ai tier does not include Claude Code.
What's the difference between Claude Pro and Claude Max?
Pro is $20/month with Sonnet 4.6. Max is $100/month (5x) or $200/month (20x) with Opus 4.6. The difference: usage limits and default model.
How much does Claude Code cost per month?
Pro: $20 ($17 annual). Max 5x: $100. Max 20x: $200. Team Standard: $25/seat ($20 annual). Team Premium: $125/seat ($100 annual). Enterprise: $20/seat + usage at API rates. On the API, Anthropic says the average developer spends ~$6/day. That's ~$180/month.
Is Claude Max worth $100 a month?
For heavy users, yes. The ksred.com case showed 93% savings versus API pricing. Burkov's data showed 24x savings. The rough breakeven is ~$100/month in API usage. If you're hitting Pro limits often, the upgrade pays for itself.
Claude Code API vs subscription: which is cheaper?
Under ~$50/month API usage: Pro is cheaper. $50-100: Pro with extra usage, or Max 5x. Over $100: Max almost always wins.
Does Claude Pro include Claude Code?
Yes. $20/month includes full Claude Code access with standard usage limits.
What is the difference between Max 5x and Max 20x?
Usage limits and price. 5x at $100/month, 20x at $200/month. Max 20x is for all-day sessions or heavy automation.
Can I use Claude Code with OpenRouter?
You can reach Claude models through OpenRouter. But that's API billing through a proxy. You don't get subscription pricing. You still need a Claude subscription or Console key to run Claude Code itself.
Is Claude Code included in the Team plan?
Yes, both tiers. Team Standard ($25/seat) includes Claude Code with Sonnet 4.6. Team Premium ($125/seat) includes Claude Code with Opus 4.6 and 5x the usage.
I write about Claude Code internals every week: context windows, hooks, MCPs, how things actually work.
