MCP and the Token Tax
Letting your AI agents think

There is an old bit of wisdom, often attributed to Sherlock Holmes, that a man should be careful what furniture he keeps in the attic of his brain. The attic, you see, is not infinite. Cram it with trivia and there's no room left for the thing you actually need—the one deduction that solves the case.
A year ago, Anthropic introduced something called MCP—the Model Context Protocol. The pitch was sensible: a universal standard for connecting AI agents to tools. The USB port for AI, people called it. Plug in and go. The industry agreed. OpenAI adopted it. Microsoft adopted it. Thousands of servers sprang up.
Then, recently, Anthropic published a rather honest paper explaining that MCP has a scaling problem.
The people who invented the thing are now showing us where it breaks. I find this admirable. It is also worth paying attention to.
The token tax
Here is the issue. When you hand an AI agent a set of MCP tools, it doesn't just receive the tools. It receives descriptions of the tools. Schemas. Parameters. Documentation. The full inventory, loaded upfront—even tools it will never touch.
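To make that concrete, here is roughly what one entry in that inventory looks like. The tool name and fields below are invented for illustration, but the shape of it, a name, a description, a JSON Schema for the parameters, is what gets loaded for every tool, whether or not it is ever called.

```typescript
// Illustrative only: a hypothetical MCP-style tool definition, roughly the
// shape a server advertises in its tool list. Multiply this by dozens or
// hundreds of tools and the context fills before the conversation starts.
const getDocumentTool = {
  name: "gdrive_get_document",
  description:
    "Retrieve a document from Google Drive by ID and return its text content.",
  inputSchema: {
    type: "object",
    properties: {
      documentId: { type: "string", description: "The Drive document ID" },
      format: {
        type: "string",
        enum: ["text", "html", "markdown"],
        description: "Output format for the document body",
      },
    },
    required: ["documentId"],
  },
} as const;

// A rough token estimate for this single definition (about 4 characters per token).
console.log(Math.ceil(JSON.stringify(getDocumentTool).length / 4), "tokens");
```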
One example from Anthropic's paper: an agent consumed 150,000 tokens before it read a single word of the user's actual request.
This is the token tax. And it is steep.
Tokens are not merely a billing abstraction. They are the agent's thinking space. Its working memory. Fill that space with tool catalogues and you have not prepared the agent. You have burdened it.
The Swiss army knife
There is another approach, and it is almost embarrassingly simple.
Do not hand the agent a toolbox. Hand it a Swiss army knife.
MCP assumes the agent needs to be taught what tools exist and how to use them. But modern AI models increasingly don't need a catalogue. They know how to solve problems. What they need are specifics—endpoints, credentials, the address of the actual door. Not a pamphlet explaining what doors are.
Code execution lets the agent discover tools on demand, use only what it needs, and write code to handle the rest. That 150,000-token workflow? It drops to 2,000. Same result. The attic, suddenly, has room to think.
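What does "discover tools on demand" look like in practice? Here is a minimal sketch, assuming a hypothetical layout where each tool lives as a small single-function module in a ./servers directory. The directory name and the getDocument wrapper are assumptions made up for this example, not a prescribed structure.

```typescript
// A minimal sketch of on-demand tool discovery. Assumes an illustrative
// ./servers directory where each tool is a module exporting one default
// async function, e.g. ./servers/getDocument.ts. Run under a
// TypeScript-aware runtime such as tsx.
import { readdirSync } from "node:fs";
import { resolve } from "node:path";
import { pathToFileURL } from "node:url";

const SERVERS_DIR = "./servers";

// Discovery costs almost nothing: file names only, no schemas, no docs.
const available = readdirSync(SERVERS_DIR).map((f) => f.replace(/\.(ts|js)$/, ""));
console.log("available tools:", available);

// Load only the tool the task actually needs, only when it is needed.
async function useTool(name: string, args: unknown): Promise<unknown> {
  const mod = await import(pathToFileURL(resolve(SERVERS_DIR, `${name}.ts`)).href);
  return mod.default(args);
}

// One call, one import. None of the other tools ever enter the context window.
const doc = await useTool("getDocument", { documentId: "abc123" });
console.log(doc);
```

The point is not this particular layout. The point is that the catalogue never has to ride along in the prompt.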
But there is something more important than efficiency.
When the agent writes code, it can see what it is doing. If something fails, it knows where. It can adapt. It can try a different approach. It can, in a word, learn.
MCP tools are black boxes—you send a request into the dark and hope. Code execution is a workbench. The difference is the difference between delegating and doing.
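As a sketch of that difference, here is what a few lines of agent-written code buy you. The endpoints are hypothetical; what matters is that the failure is visible at the exact line where it happens, and the fallback sits right next to it.

```typescript
// Hypothetical endpoints, purely to illustrate the "workbench" point:
// agent-written code sees a failure where it happens and can adapt in place.
async function fetchReport(id: string): Promise<unknown> {
  try {
    const res = await fetch(`https://api.example.com/reports/${id}`);
    if (!res.ok) throw new Error(`reports API returned ${res.status}`);
    return await res.json();
  } catch (err) {
    // The error is visible here, not swallowed inside an opaque tool call,
    // so the same script can try a different route instead of giving up.
    console.error("primary route failed, falling back:", err);
    const res = await fetch(`https://api.example.com/exports/${id}.json`);
    if (!res.ok) throw new Error(`fallback also failed: ${res.status}`);
    return await res.json();
  }
}

console.log(await fetchReport("q3-2025"));
```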
Knowing kung fu
There is a scene in The Matrix—you know the one—where Neo has kung fu uploaded directly into his mind. No study. No years of practice. Just the precise knowledge he needs, delivered at the precise moment he needs it.
This is what good AI skills feel like.
Claude Code has a system for this. A skill is a small file—a bit of metadata, then instructions. Maybe a hundred tokens. The agent reads it, understands the task, writes code, executes.
No catalogues. No ceremony. Just the knowledge required to throw the punch.
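For the curious, a skill file looks roughly like this. The name and the steps are invented for illustration; the pattern is a short block of metadata followed by plain instructions, and that is the whole ceremony.

```markdown
---
name: quarterly-report
description: Pull the quarter's figures and produce a one-page summary
---

# Quarterly report

1. Fetch the raw figures from the team's reporting endpoint.
2. Write a short script to aggregate totals and flag anything unusual.
3. Return a one-page summary; keep the raw data out of the reply.
```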
And unlike whatever is running behind an MCP server, the code is visible. You can version it. Test it. Audit it. When the agent gets better at something, that improvement lives in your repository—not behind someone else's curtain.

The bottom line
MCP is not dead. It solved real problems and it is not going anywhere.
But here is what I have come to believe: the agents that perform best are not the ones with access to the most tools. They are the ones with the most room to think.
Context windows are not filing cabinets. They are not storage. They are attention—the bright, limited space where reasoning actually happens.
Sherlock Holmes knew that the mind works best when it is uncluttered. He was, of course, fictional. But the principle is not.
Keep the attic clean. The mystery will thank you.