Deep dive on prompt caching

I ran experiments against the Anthropic API this week to understand prompt caching - the system that makes Claude Code 80% cheaper.

Also this week: you can now control Claude Code from your phone, and they banned third-party tools from using subscription auth.

How prompt caching actually works in Claude Code

Every time you send a message, the entire conversation gets re-sent to the API. Your fiftieth message sends everything from the beginning plus 49 rounds of history. Without caching, a long Opus session costs $50-100 in input tokens. With it, $10-19.

I wanted to see the numbers myself. The full post is here: How Prompt Caching Actually Works in Claude Code

Here's what I found.

Two capital letters broke the entire cache. I changed "senior software engineer" to "Senior Software Engineer" in the system prompt. 2,727 tokens of stored computation - gone. The model reprocessed everything from scratch. Prefix matching is byte-exact. If any byte changes, everything after it gets invalidated.

Real Claude Code sessions hit 96% because they cache more aggressively - not just the system prompt, but tool definitions, CLAUDE.md, and conversation history.

Things you're probably doing that break it:

Putting timestamps or changing data at the top of your system prompt. A dev on HN found this was busting their cache on every request. Moving it to the bottom recovered 20+ percentage points.
Adding or removing MCP tools mid-session. Tool definitions are part of the cached prefix. One new tool invalidates the cache for your entire conversation history.
Switching models mid-session. Caches are per-model. Going from Opus to Haiku for a "quick question" forces Haiku to build its own cache from scratch.

The post covers the KV cache internals, how compaction interacts with caching, the Claude Code team's seven architectural rules, and all four experiments with raw API output you can reproduce. Read it here.

Come hang out

I'm running the next Claude Code Camp meetup next week (Friday, Mar 6). We will do a few demos, then open discussion. RSVP on Luma.

I also started a Discord if you want to keep the conversation going between newsletters.

Plan first, then let go

Boris Tane wrote "How I use Claude Code: Separation of planning and execution" and it blew up this week.

His argument is simple: Claude Code does its best work when you do the thinking and it does the typing.

Plan in a separate session. Tell Claude you're planning, not coding. Describe what you want. Push back. Iterate. Get a step-by-step plan in a markdown file.
Execute in a new session. Feed it the plan. Tell it to follow it and not deviate. Step away.

Two sessions, two modes. He says it eliminates the drift problem — where Claude starts well, then slowly goes off-track as the conversation grows.

The community variations are worth noting:

Pin the plan to CLAUDE.md. It survives compaction and every follow-up session starts with the same context.
Use /clear between phases instead of two terminals. Plan, /clear, paste the plan back.
Checkpoint after each step. Commit after every completed plan step. If step 4 goes wrong, revert to step 3 and re-run with more specific instructions.
Subagents for parallel steps. If the plan has independent pieces (backend + frontend + tests), dispatch them to separate subagents.

claude remote-control

New in v2.1.51: you can now control a local Claude Code session from your phone or any browser.

Run claude remote-control in your project directory. It gives you a URL and a QR code. Open that URL on your phone, tablet, or another computer and you get the full session with your local filesystem, MCP servers, and project config. Everything still runs on your machine.

You can also type /remote-control (or /rc) inside an existing session to make it available remotely mid-conversation. And there's a setting to enable it for all sessions by default — run /config and toggle "Enable Remote Control for all sessions."

It's a research preview on Pro and Max plans. Docs here.

The auth ban

Anthropic updated their legal docs to ban OAuth tokens — including Agent SDK tokens in third-party tools.

If you built a tool that authenticates through Claude's subscription, you need to switch to API keys with your own billing. This affects wrappers, orchestrators, and anything piggybacking on Max plan auth.

If you're an API user, nothing changes. If you're building tools that use Claude Code's auth flow, read the updated docs.

The memes

"How Claude Opus 4.6 pulls up to your code base to clean up the slop left behind by other dumber models"

— @beffjezos

That's the week.

Abhishek