If you use Claude Code, you've seen it in the bottom right corner of your terminal: medium · /effort. You've probably toggled it, or typed "ultrathink" because someone on Reddit said to. And at some point, you've watched your Max plan limits evaporate and wondered what happened.
I've been running Opus 4.6 daily for the past few months, building Claude Code Camp and digging into the internals for this series. For this post, I ran a MITM proxy between Claude Code and Anthropic's servers and captured the raw API traffic, then ran controlled experiments on both Sonnet 4.6 and Opus 4.6.
The short version: the default is right, and most people should stop changing it. Here's the data.
What thinking is
Every time Claude responds, it can write to a scratchpad before producing output — working through the problem, checking approaches, catching errors. Same as a developer who talks through the problem on a whiteboard before opening their editor. The whiteboard takes time and costs money. But for hard problems, it produces better code.
Claude Code uses adaptive thinking by default. The model decides whether to think and how much. You steer it with the effort parameter — low, medium, or high.
The catch: starting with the Claude 4 models, you only see a condensed summary of the thinking. Claude might reason for 2,000 tokens internally; you'll see a 400-token digest. usage.output_tokens includes the full reasoning (the tokens you're billed for), but you only see the short version.
The experiment: same code, three effort levels
I ran the same task: add error handling and SQL injection protection to a vulnerable function at all three effort levels, three times each, on Sonnet 4.6:
| Effort | Avg output tokens | Avg thinking (est) | Avg time | Avg cost |
|--------|-------------------|--------------------|----------|----------|
| low    | 1,012             | 23 tokens          | 17.0s    | $0.015   |
| medium | 1,049             | 26 tokens          | 19.9s    | $0.016   |
| high   | 1,051             | 47 tokens          | 60.0s    | $0.016   |

Same output. Same code quality. The model knew the answer and it didn't need to think harder.
Cost difference: negligible. $0.015 vs $0.016.
Time difference: massive. high effort averaged 60 seconds — 3.5x slower than low at 17 seconds. One high run spiraled to 140 seconds. On a Max plan with a 5-hour rolling window, that's 200 turns per hour vs 60.
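The rate-limit arithmetic above is simple enough to sketch. A rough back-of-the-envelope helper (it counts only model latency and ignores your own reading and typing time, so real throughput is lower):

```python
def turns_per_hour(avg_seconds_per_turn: float) -> int:
    """Upper bound on model turns per hour at a given average turn latency."""
    return int(3600 // avg_seconds_per_turn)

print(turns_per_hour(17))  # low effort at ~17s/turn  -> 211
print(turns_per_hour(60))  # high effort at ~60s/turn -> 60
```

That factor of ~3.5 is where a 5-hour rolling window goes: the same number of turns simply takes far longer to replenish your perceived capacity.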
Effort is primarily a time and rate-limit cost, not a token cost. For routine coding tasks, medium produces the same quality as high in a fraction of the time.
Anthropic changed the default from high to medium in v2.1.68. Users on Reddit say medium actually outperforms high:
"Only people that don't know what they're doing put Claude on max thinking and leave it there. I even prefer medium over high with Opus 4.6. 4.6 definitely overthinks sometimes."
Does adaptive thinking actually adapt? I tested with three prompts of increasing complexity at medium:
simple (rename a variable): 255 tokens, 4 thinking tokens, 5s
medium (add error handling): 1,291 tokens, 26 thinking tokens, 21s
complex (repository pattern): 3,039 tokens, 43 thinking tokens, 39s

4 tokens of thinking for a rename. 43 for an architectural refactor. The model self-allocates. You don't need to micromanage the budget.
One caveat: I tested one task type (code edits). For genuinely hard reasoning (novel algorithms, multi-file architecture decisions), high effort probably earns its keep. My data says medium is the right default, not that high is never worth it.
What I found on the wire
I ran a MITM proxy between Claude Code and Anthropic's API and captured full request/response payloads across a real session.
Opus redacts thinking completely; Sonnet doesn't
When you send a prompt, Claude Code includes a set of beta flags in the request headers. Opus sends one that Sonnet doesn't: redact-thinking-2026-02-12.
The effect at the wire level:
Sonnet response content_blocks: ["thinking", "text"] ← thinking visible
Opus response content_blocks: ["text", "tool_use"] ← thinking gone

Sonnet returns a thinking block you can read. Opus strips it entirely. The thinking happened, and you're billed for it in output_tokens. But the content never reaches your client.
Anthropic has documented that thinking is summarized on Claude 4+ models. What the sniffer reveals is the mechanism: a per-model beta flag. On Opus, the content doesn't get summarized. It gets deleted.
You can see the evidence in your own session files. Check ~/.claude/projects/:
{"type": "thinking", "thinking": "", "signature": "EuUBCkYICxgCKkD..."}

Empty thinking field. Valid signature. Proof the model thought, proof you were billed, zero content. (GitHub #31143)
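You can automate that check. Here's a small Python sketch that scans JSONL session files for thinking blocks matching the redacted shape above. The entry shape mirrors the line quoted from the session file, but the exact nesting may differ across Claude Code versions, so treat the matcher as an assumption:

```python
import json
from pathlib import Path

def find_redacted_thinking(jsonl_text):
    """Return entries whose thinking content is empty but which still carry
    a signature -- the fingerprint of a billed-but-invisible thinking block."""
    redacted = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines rather than crashing
        if (entry.get("type") == "thinking"
                and entry.get("thinking") == ""
                and entry.get("signature")):
            redacted.append(entry)
    return redacted

# Scan every session file under the projects directory (path from above;
# the layout may vary between Claude Code versions).
for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    hits = find_redacted_thinking(path.read_text())
    if hits:
        print(f"{path}: {len(hits)} redacted thinking block(s)")
```

Run it after an Opus session and you should see matches; after a Sonnet session, the thinking fields carry actual content and nothing is flagged.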
I ran the same code review prompt on both models to measure the gap:
Sonnet 4.6: avg 2,026 billed tokens, ~1,776 visible (1.14x)
Opus 4.6: avg 1,338 billed tokens, ~1,140 visible (1.17x)

A ~15-17% gap on a moderate task. The API returns no separate thinking_tokens field — it's all lumped into output_tokens. Token usage visibility is the #1 most-requested feature in Claude Code issues. GitHub #31585 reports the gap can reach 3-10x on complex reasoning.
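You can reproduce the billed-vs-visible measurement yourself. A minimal sketch, assuming a dict shaped like a Messages API response (content blocks plus usage.output_tokens) and using the rough ~4-characters-per-token heuristic, since the API exposes no separate thinking-token count:

```python
def hidden_thinking_ratio(response):
    """Estimate the billed-vs-visible gap for one API response.

    `response` is a dict shaped like a Messages API payload. Visible tokens
    are approximated at ~4 characters per token; that's a heuristic, not an
    exact tokenizer, so treat the ratio as an estimate.
    """
    visible_chars = sum(
        len(block.get("text", "")) + len(block.get("thinking", ""))
        for block in response["content"]
    )
    visible_tokens = max(1, visible_chars // 4)  # avoid division by zero
    billed = response["usage"]["output_tokens"]
    return billed / visible_tokens

# Synthetic example: ~1,000 visible tokens, 1,170 billed.
resp = {
    "content": [{"type": "text", "text": "x" * 4000}],
    "usage": {"output_tokens": 1170},
}
print(f"billed/visible ratio: {hidden_thinking_ratio(resp):.2f}")  # 1.17
```

A ratio near 1.0 means nearly everything you paid for reached your client; the further it climbs above 1.0, the more reasoning was billed but never shown.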
Thinking and effort are separate controls
Claude Code sends two independent parameters:
{
"thinking": {"type": "adaptive"},
"output_config": {"effort": "medium"}
}

thinking controls whether the model can think. output_config.effort controls how hard. Two separate fields, two independent levers.
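For illustration, here's how those two levers would sit in a full request body. The thinking and output_config field names come from the traffic captured for this post; the model name and the surrounding fields are placeholders, and none of this should be read as a stable, documented contract:

```python
def build_request_body(prompt, model="claude-sonnet-4-6", effort="medium"):
    """Assemble a Messages-API-style body with both levers observed on the
    wire: `thinking` (can the model think at all) and `output_config.effort`
    (how hard). Field names are from captured traffic; treat as a sketch."""
    return {
        "model": model,
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},       # lever 1: thinking on/off/adaptive
        "output_config": {"effort": effort},    # lever 2: low / medium / high
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request_body("Rename `tmp` to `buffer` in utils.py", effort="low")
print(body["thinking"])       # {'type': 'adaptive'}
print(body["output_config"])  # {'effort': 'low'}
```

Because the levers are independent, you could in principle leave adaptive thinking on while dialing effort down for a simple edit, which is exactly what the /effort toggle does for you.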
What to do with this
Reserve high effort for architecture and complex debugging. Use medium for everything else — it's the default now for a reason. Use low for simple edits where thinking adds nothing.
Set it per-project in .claude/settings.json inside each repo so complex codebases get high and greenfield projects get medium automatically. Type "ultrathink" when you hit a task that needs deeper reasoning for one turn.
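A sketch of what that per-project file might look like. I haven't verified the exact key Claude Code reads for effort in your version, so treat "effort" below as a placeholder for whatever the current settings schema uses, and check the docs for your release:

```json
{
  "effort": "high"
}
```

Committed to the repo, this would make the setting travel with the codebase instead of living in one developer's global config.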
Things that will bite you: effort persists across sessions, so if you forget to switch back, every session runs high. max was removed from Claude Code in v2.1.72 (API-only now). And all subagents inherit the session default; there is no way to set the main agent to high and subagents to low (GitHub #25669).
Medium effort produces the same code as high, 3.5x faster. The default is right. Stop changing it.
