7 min read

The Memento Problem

Why your AI agents need a system, not a bigger brain


There's a scene in Christopher Nolan's Memento that I think about constantly when working with AI agents.

Leonard, the protagonist, wakes up in a motel room. He doesn't know how he got there. He doesn't remember checking in, doesn't remember the drive, doesn't remember anything from the past few hours. But he's not confused. He's not panicking. He calmly checks his pockets, reads the notes he left himself, examines the Polaroids with handwritten captions, and pieces together enough context to act.

Leonard has anterograde amnesia—he can't form new memories. Every few minutes, his slate is wiped clean. And yet he functions. He pursues complex goals across days and weeks. He adapts to new information. He makes progress.

How?

Not by having a better brain. By having a better system.

The Amnesia Everyone Ignores

Here's the uncomfortable truth about AI agents: they're all Leonard. Every single one of them wakes up with no memory of what happened five minutes ago. They have tremendous capability and absolutely no continuity.

Most teams respond to this by trying to give their agents bigger brains. Longer context windows. More parameters. Better reasoning. This is like trying to cure Leonard's condition by teaching him speed-reading. It misses the point entirely.

Anthropic recently published research that confirms what anyone who's built production agents already knows. Their engineers, attempting to build a clone of claude.ai using AI agents, documented the exact failure modes you'd expect from an amnesiac with a tool belt.

First failure mode: the agent tries to do everything at once. It attempts to complete the entire project in a single session, like Leonard trying to solve the murder in one caffeine-fueled night. It runs out of context mid-implementation, leaving the next session to inherit half-finished code and no documentation. The next agent wakes up in that motel room, looks around at the mess, and has no idea what any of it means.

Second failure mode: the agent declares victory prematurely. It looks around, sees that some progress has been made, and announces that the job is complete. Without external verification, it has no way to know whether it's actually done or just feels done. Leonard would recognize this problem—it's why he tattooed the crucial facts on his body. You can't trust your own judgment when your memory resets.

waking up, while forgetting everything

Polaroids and Tattoos

Leonard's system has layers. Some information is ephemeral—written on Polaroids that might get lost or destroyed. Other information is permanent—literally tattooed on his skin. The hierarchy matters. The tattoos are facts he's verified and committed to. The Polaroids are working memory, context that helps him navigate but isn't gospel.

The Anthropic team discovered they needed the same architecture. They call it the "initializer and coding agent" pattern, but it's really just Leonard's system translated into software.

The initializer runs once at the start of a project. It creates the equivalent of Leonard's tattoos—a comprehensive feature list with over 200 items, each marked as "failing" until proven otherwise. It establishes a progress file. It writes the boot-up script. It creates the initial commit. These artifacts are permanent. They persist across sessions. They are the source of truth that no individual agent can contradict.
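
Here's a minimal sketch of what that initializer might produce. The filenames, commands, and feature entries (features.json, progress.md, init.sh) are my assumptions for illustration, not the actual artifacts from Anthropic's harness:

```python
import json
import subprocess
from pathlib import Path

# Hypothetical artifacts; the real project used its own names and contents.
FEATURES = [
    {"id": 1, "description": "User can create a new conversation", "status": "failing"},
    {"id": 2, "description": "Messages stream token by token", "status": "failing"},
    # ...in practice, 200+ entries, every one of them starting as "failing"
]

def initialize_project(root: Path) -> None:
    """Run once per project. Creates the permanent artifacts every later session relies on."""
    # The tattoos: the feature list, everything marked failing until proven otherwise.
    (root / "features.json").write_text(json.dumps(FEATURES, indent=2))

    # The first Polaroid: an empty progress log for session handoffs.
    (root / "progress.md").write_text("# Progress log\n\n(no sessions yet)\n")

    # A boot-up script each session runs to get a working environment.
    (root / "init.sh").write_text("#!/bin/sh\nnpm install\n")

    # The initial commit: the source of truth that persists across sessions.
    subprocess.run(["git", "init"], cwd=root, check=True)
    subprocess.run(["git", "add", "-A"], cwd=root, check=True)
    subprocess.run(["git", "commit", "-m", "initializer: project scaffold"], cwd=root, check=True)
```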

The coding agent is Leonard waking up in the motel. Every session, it performs the same ritual:

  1. Check the current directory (where am I?)
  2. Read the git logs and progress files (what happened before I woke up?)
  3. Read the feature list (what's the mission?)
  4. Run the boot-up script (get the environment working)
  5. Run a basic test (is everything still functional?)

Only after this orientation does it pick a single task and execute. Then it updates the progress file, commits its changes, and exits. It doesn't try to solve everything. It solves one thing, documents what it did, and hands off cleanly.
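
In code, the ritual might look something like this. The file names follow the initializer sketch above, the boot and test commands are placeholders, and the agent call is a stand-in for whatever model invocation your harness uses:

```python
import json
import os
import subprocess
from pathlib import Path

def run(cmd: str) -> str:
    """Run a shell command and return its output (sketch: no real error handling)."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def session(root: Path, agent) -> None:
    os.chdir(root)                                             # 1. where am I?
    git_log = run("git log --oneline -20")                     # 2. what happened before I woke up?
    progress = Path("progress.md").read_text()
    features = json.loads(Path("features.json").read_text())   # 3. what's the mission?
    run("sh ./init.sh")                                        # 4. get the environment working
    run("npm test")                                            # 5. is everything still functional?

    # Only now: pick exactly one failing feature. Never the whole project.
    task = next(f for f in features if f["status"] == "failing")
    agent.work_on(task, git_log=git_log, progress=progress)    # stand-in for the model call

    # Clean handoff: document what happened, commit, exit.
    with open("progress.md", "a") as log:
        log.write(f"\n- Session worked on feature {task['id']}: {task['description']}\n")
    run("git add -A && git commit -m 'session: one feature, clean handoff'")
```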

This is the Polaroid system. Each session creates a snapshot: here's what I found, here's what I did, here's what's next. The next agent will wake up, check the Polaroids, and know exactly where to pick up.

the reminder you can't remove

The Tattoo You Can't Remove

Anthropic uses a phrase that made me laugh out loud: "It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality."

This is the tattoo. This is the fact so important that it's permanently inscribed, immune to the whims of any individual session.

They use JSON instead of Markdown for the feature list specifically because the model is less likely to inappropriately modify JSON. They've learned, probably painfully, that an agent will rationalize editing its own constraints if given the chance. "This feature isn't really necessary." "We can skip this test." "The requirements have changed."

Leonard has the same problem. In Memento, he eventually discovers that he's been manipulating his own system—destroying Polaroids, writing misleading notes, even tattooing lies on himself. The system only works if it's tamper-proof. The agent only works if its core constraints are inviolable.

The feature list can only be modified in one direction: from failing to passing. Features cannot be removed. Descriptions cannot be changed. The project is done when every feature passes—not when the agent decides it's done, but when the external, immutable checklist says so.
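
That rule belongs in the harness, not in the prompt. A sketch of enforcing it, reusing the features.json shape from the earlier example:

```python
def apply_update(features: list[dict], feature_id: int, new_status: str) -> None:
    """The tattoo: a feature's status may only move from 'failing' to 'passing'."""
    by_id = {f["id"]: f for f in features}
    if feature_id not in by_id:
        raise ValueError("Unknown feature: sessions cannot add or remove entries.")
    if new_status != "passing" or by_id[feature_id]["status"] != "failing":
        raise ValueError("Only failing -> passing transitions are allowed.")
    by_id[feature_id]["status"] = "passing"   # descriptions are never touched

def project_done(features: list[dict]) -> bool:
    """Done when the external checklist says so, not when the agent feels done."""
    return all(f["status"] == "passing" for f in features)
```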

The Clean Handoff

There's a principle in Memento that doesn't get discussed enough: Leonard is meticulous about leaving things in a state his future self can understand. The Polaroids have captions. The notes are clear. The system depends on each "session" leaving good documentation for the next.

Anthropic articulates this as: "Leave the codebase in a state that would be appropriate for merging to a main branch." No major bugs. Orderly code. Good documentation. A developer—or another agent—could begin work immediately without first having to clean up an unrelated mess.

This isn't just good hygiene. This is the foundation of trust between sessions. Leonard's system falls apart if past-Leonard leaves chaos for future-Leonard. Agent systems fall apart the same way.

Every session is a guest in the codebase. Leave it cleaner than you found it.

The Real Moat

Here's the strategic insight that I think most people miss.

The AI models are becoming commoditized. They'll keep improving, but they'll improve for everyone. GPT-5, Claude 4, Gemini Ultra—they'll all be available to your competitors. The model is not your competitive advantage.

The system is your competitive advantage.

Leonard's brain isn't special. What's special is the infrastructure he's built around it—the Polaroids, the tattoos, the rituals, the verification systems. Another amnesiac with the same brain but no system would be helpless. Leonard, despite his condition, is functional and even dangerous.

Your agents are the same. A generic model with generic prompts will produce generic results. The value is in the specific artifacts, the specific workflows, the specific rituals that allow your agents to function as a coherent system rather than a collection of amnesiacs waking up confused in motel rooms.

Anthropic puts it plainly: the magic is not in the model. The magic is in the harness.

Memory systems

Remember Sammy Jankis

In Memento, Leonard has a mantra he repeats to himself: "Remember Sammy Jankis." Sammy was another amnesia patient who, unlike Leonard, never developed a system. He just drifted, helpless, making the same mistakes over and over. He's a warning.

I've watched teams build Sammy Jankis agents. Powerful models with no structure. Long context windows filled with noise. Sophisticated reasoning applied to the wrong problems because no one established what the problems actually were. These agents drift. They make the same mistakes over and over. They declare victory when they haven't won. They forget what they did five minutes ago.

Don't build Sammy. Build Leonard.

Create the tattoos—the immutable constraints, the feature lists that can't be edited, the definitions of done that can't be negotiated. Create the Polaroids—the session logs, the progress files, the handoff documentation. Create the rituals—the boot-up sequence, the verification checks, the clean exit.

The condition can't be cured. Your agents will always be amnesiacs. But with the right system, that doesn't matter. Leonard solved a murder. Your agents can ship software.

The secret isn't in their heads. It's in what they write down.

agents · orchestration · architecture
4 min read

MCP and the Token Tax

Letting your AI agents think


There is an old bit of wisdom, often attributed to Sherlock Holmes, that a man should be careful what furniture he keeps in the attic of his brain. The attic, you see, is not infinite. Cram it with trivia and there's no room left for the thing you actually need—the one deduction that solves the case.

A year ago, Anthropic introduced something called MCP—the Model Context Protocol. The pitch was sensible: a universal standard for connecting AI agents to tools. The USB port for AI, people called it. Plug in and go. The industry agreed. OpenAI adopted it. Microsoft adopted it. Thousands of servers sprouted up.

Then, recently, Anthropic published a rather honest paper explaining that MCP has a scaling problem.

The people who invented the thing are now showing us where it breaks. I find this admirable. It is also worth paying attention to.

The Token Tax

Here is the issue. When you hand an AI agent a set of MCP tools, it doesn't just receive the tools. It receives descriptions of the tools. Schemas. Parameters. Documentation. The full inventory, loaded upfront—even tools it will never touch.

One example from Anthropic's paper: an agent consumed 150,000 tokens before it read a single word of the user's actual request.

This is the token tax. And it is steep.

Tokens are not merely a billing abstraction. They are the agent's thinking space. Its working memory. Fill that space with tool catalogues and you have not prepared the agent. You have burdened it.
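
To make the tax concrete: an MCP tool definition carries a name, a description, and a JSON Schema for its inputs, and every connected server contributes a stack of them before the conversation even starts. The tool below is invented, and the token math is a rough four-characters-per-token guess:

```python
# One invented tool definition, shaped like an MCP tool (name, description, inputSchema).
drive_search_tool = {
    "name": "drive_search",
    "description": "Search files in the user's drive by keyword, owner, or date range. "
                   "Returns up to 50 results with id, title, and last-modified time.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords to search for"},
            "owner": {"type": "string", "description": "Filter by file owner email"},
            "modified_after": {"type": "string", "format": "date-time"},
            "limit": {"type": "integer", "minimum": 1, "maximum": 50},
        },
        "required": ["query"],
    },
}

# Even this small definition costs a few hundred tokens. Multiply by a few
# hundred tools across several servers and the agent has spent six figures of
# context before it reads the user's request.
approx_tokens = len(str(drive_search_tool)) // 4
print(approx_tokens)
```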

The Swiss army knife

There is another approach, and it is almost embarrassingly simple.

Do not hand the agent a toolbox. Hand it a Swiss army knife.

MCP assumes the agent needs to be taught what tools exist and how to use them. But modern AI models increasingly don't need a catalogue. They know how to solve problems. What they need are specifics—endpoints, credentials, the address of the actual door. Not a pamphlet explaining what doors are.

Code execution lets the agent discover tools on demand, use only what it needs, and write code to handle the rest. That 150,000-token workflow? It drops to 2,000. Same result. The attic, suddenly, has room to think.
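
One way to picture it: tools exposed as ordinary code files the agent can list and import only when it needs them. The directory layout and wrapper functions below are invented for illustration:

```python
import importlib
from pathlib import Path

TOOLS_DIR = Path("tools")   # assumed layout: one thin wrapper module per tool

def discover_tools() -> list[str]:
    """Cheap discovery: a directory listing costs a handful of tokens, not 150,000."""
    return sorted(p.stem for p in TOOLS_DIR.glob("*.py"))

# The agent lists what exists, reads only the wrapper it needs,
# then writes ordinary code against it.
print(discover_tools())                               # e.g. ['crm', 'drive', 'sheets']
drive = importlib.import_module("tools.drive")        # load one module, not a catalogue
results = drive.search(query="Q3 roadmap", limit=5)   # hypothetical wrapper function
```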

But there is something more important than efficiency.

When the agent writes code, it can see what it is doing. If something fails, it knows where. It can adapt. It can try a different approach. It can, in a word, learn.

MCP tools are black boxes—you send a request into the dark and hope. Code execution is a workbench. The difference is the difference between delegating and doing.
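
Here's the same invented wrapper again, this time showing why the workbench matters: when a step fails, the failure is local and legible, and the agent can adapt in place rather than resubmitting a request into the dark:

```python
# Agent-written glue code: every intermediate step is visible and inspectable.
try:
    results = drive.search(query="Q3 roadmap", limit=5)
except drive.RateLimitError:                # hypothetical error type on the wrapper
    # The failure names itself, so the agent can adjust on the spot:
    # narrow the query, back off, or ask for less.
    results = drive.search(query="roadmap", limit=1)

for doc in results:
    print(doc["title"], doc["modified"])    # only the fields it needs ever enter context
```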

Knowing kung fu

There is a scene in The Matrix—you know the one—where Neo has kung fu uploaded directly into his mind. No study. No years of practice. Just the precise knowledge he needs, delivered at the precise moment he needs it.

This is what good AI skills feel like.

Claude Code has a system for this. A skill is a small file—a bit of metadata, then instructions. Maybe a hundred tokens. The agent reads it, understands the task, writes code, executes.

No catalogues. No ceremony. Just the knowledge required to throw the punch.
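
For a sense of scale, a skill can be as small as the made-up example below: a few lines of metadata, a few lines of instructions, and a token count that rounds to nothing:

```python
# A made-up skill, shown as the text an agent would read: a little metadata,
# then instructions. The file, its name, and its steps are illustrative only.
SKILL = """\
---
name: pdf-report
description: Turn a CSV of monthly metrics into a one-page PDF summary.
---
1. Load the CSV with pandas and compute month-over-month deltas.
2. Render a single-page PDF with a table and one trend chart.
3. Save it as report.pdf and print the absolute path.
"""

# Rough 4-chars-per-token guess: a double-digit number,
# versus the 150,000-token catalogue from earlier.
print(len(SKILL) // 4)
```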

And unlike whatever is running behind an MCP server, the code is visible. You can version it. Test it. Audit it. When the agent gets better at something, that improvement lives in your repository—not behind someone else's curtain.

I know kung fu

The Bottom Line

MCP is not dead. It solved real problems and it is not going anywhere.

But here is what I have come to believe: the agents that perform best are not the ones with access to the most tools. They are the ones with the most room to think.

Context windows are not filing cabinets. They are not storage. They are attention—the bright, limited space where reasoning actually happens.

Sherlock Holmes knew that the mind works best when it is uncluttered. He was, of course, fictional. But the principle is not.

Keep the attic clean. The mystery will thank you.

mcp · context
