A developer at Anthropic made a packaging reference error. Researchers spotted the stray package reference, recovered the source code, and released it online. People cloned the repository. Anthropic issued DMCA takedowns across GitHub, and every cloned repo vanished. The downloaded zip archives did not. Welcome to the world of human mistakes at AI speeds.
The leak itself will be old news soon enough. What won't age out is what the architecture reveals. Claude Code is a production agentic system handling millions of active sessions, and its internals offer the clearest public example we've had of how a well-resourced team solves the engineering problems that every AI agent builder is wrestling with right now.
I've spent the past several days doing a detailed analysis of the recovered codebase, cross-referencing against independent teardowns from Blake Crosley, Verdent AI, Alex Kim, Apidog, and Trevor Fox. What follows is a synthesis of the generalizable engineering patterns, organized by architectural concern. These are infrastructure patterns, not prompt tricks.
Tomorrow I'll publish a companion piece with my own observations and opinions on where these patterns succeed, where they fall short, and what's still missing from the way we build agents. Today is just the analysis.
State management and memory
Production agents need more than a chat transcript. Claude Code's architecture separates state into distinct layers, each with its own lifecycle and purpose.
Sessions are persisted as JSONL files containing messages, token usage, and permission decisions. This isn't just conversation history. It's recoverable state. The engine can fully reconstruct a functional session after a crash, picking up where it left off with the complete decision context intact. That distinction matters enormously for long-running agentic tasks where a dropped connection shouldn't mean lost work.
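The idea is easiest to see in miniature. Here is a minimal sketch of an append-only JSONL session log that can be replayed after a crash; the event schema (`kind`, `content`, `tokens`, and so on) is hypothetical, since the real file layout isn't public beyond what the teardowns describe:

```python
import json
from pathlib import Path

class Session:
    """Append-only JSONL session log that can be replayed after a crash."""

    def __init__(self, path: Path):
        self.path = path
        self.messages = []
        self.tokens_used = 0
        self.permissions = {}
        if path.exists():  # crash recovery: replay the log line by line
            for line in path.read_text().splitlines():
                self._apply(json.loads(line))

    def _apply(self, event: dict) -> None:
        kind = event["kind"]
        if kind == "message":
            self.messages.append(event["content"])
        elif kind == "usage":
            self.tokens_used += event["tokens"]
        elif kind == "permission":
            self.permissions[event["tool"]] = event["decision"]

    def record(self, event: dict) -> None:
        self._apply(event)
        with self.path.open("a") as f:  # one JSON object per line
            f.write(json.dumps(event) + "\n")
```

Because every write is an append of one self-contained JSON object, a crash can corrupt at most the final line; everything before it replays cleanly into a functional session.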
Workflow state is modeled independently from chat history. Long-running tasks carry explicit states like "planned," "executing," and "awaiting approval" that exist as their own data structures. This prevents a subtle but serious failure mode. If the system crashes mid-tool-execution, the explicit workflow state prevents destructive double-writes that would occur if the agent relied on chat history alone to determine what had already happened.
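A hedged sketch of the guard this enables, with hypothetical phase names modeled on the ones described above. The point is that the explicit `phase` field, not the transcript, decides whether a mutating action may run:

```python
from enum import Enum

class Phase(Enum):
    PLANNED = "planned"
    EXECUTING = "executing"
    AWAITING_APPROVAL = "awaiting_approval"

class WorkflowStep:
    """Workflow state kept separate from chat history."""

    def __init__(self, action):
        self.phase = Phase.PLANNED
        self.action = action

    def execute(self):
        if self.phase is not Phase.PLANNED:
            # After a crash mid-execution, phase is still EXECUTING:
            # refuse to re-run and risk a destructive double-write.
            raise RuntimeError(f"refusing to run step in phase {self.phase}")
        self.phase = Phase.EXECUTING
        result = self.action()
        self.phase = Phase.AWAITING_APPROVAL
        return result
```

If the process dies between the two phase transitions, a recovered step sits in `EXECUTING` and the guard refuses to repeat the side effect, which is exactly the double-write protection chat history alone can't provide.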
Memory follows a three-tier architecture. Working context is lightweight and always loaded, typically via a MEMORY.md file that acts as the agent's active scratchpad. Project notes are structured files loaded on demand when relevant context is needed. Session history captures past conversations and is selectively searched rather than wholesale injected. The tiering prevents the classic failure mode of stuffing everything the agent has ever seen into the context window and degrading both performance and accuracy.
Strategic forgetting is treated as a first-class architectural concern. Five distinct compaction strategies handle context growth. Micro-compacting replaces old tool results with compact summaries. Context collapsing condenses broader sections into shorter representations. Session memory extraction offloads important details to persistent files. Full history compaction rewrites the entire conversation into a compressed form. And as a last resort, the oldest messages are simply dropped. The ordering is deliberate. The system preserves the most valuable information and degrades gracefully rather than hitting a wall.
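The ordering is the interesting part, and it can be sketched with just two of the five strategies: a gentle one tried first and the destructive one held in reserve. Everything here is illustrative; character counts stand in for a real token counter, and the stub text is invented:

```python
def size(msgs):
    # Character count as a crude stand-in for a token counter.
    return sum(len(m["text"]) for m in msgs)

def micro_compact(msgs):
    # Gentle strategy: replace bulky tool results with stub summaries.
    return [{**m, "text": "[tool result compacted]"}
            if m["role"] == "tool" and len(m["text"]) > 24 else m
            for m in msgs]

def drop_oldest(msgs):
    # Last resort: discard the oldest message outright.
    return msgs[1:]

def compact(msgs, budget, strategies=(micro_compact,)):
    # Try the gentler strategies in order; drop history only at the end.
    for strategy in strategies:
        if size(msgs) <= budget:
            return msgs
        msgs = strategy(msgs)
    while size(msgs) > budget and len(msgs) > 1:
        msgs = drop_oldest(msgs)
    return msgs
```

The real system slots its other three strategies between these two extremes, but the shape is the same: each pass destroys a little more information, and the pipeline stops as soon as the budget is met.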
Token budgeting ties the memory system to economics. The system projects token costs on every turn and, if the budget would be exceeded, halts cleanly with a structured reason before any API call is made. No runaway loops. No surprise bills. If the budget can't accommodate the next turn, the system stops and explains why rather than silently failing or burning through resources.
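A minimal sketch of that halt-before-the-call discipline, with the API client and token estimator injected as plain callables (both hypothetical stand-ins for the real thing):

```python
class BudgetExceeded(Exception):
    """Structured halt reason: raised before any API call is made."""
    def __init__(self, projected, budget):
        super().__init__(f"projected {projected} tokens exceeds budget {budget}")
        self.projected = projected
        self.budget = budget

class BudgetedClient:
    def __init__(self, budget, call_api, estimate):
        self.budget = budget
        self.spent = 0
        self._call = call_api      # real API call, injected
        self._estimate = estimate  # token estimator, injected

    def turn(self, prompt):
        projected = self.spent + self._estimate(prompt)
        if projected > self.budget:
            raise BudgetExceeded(projected, self.budget)  # halt BEFORE the call
        used, reply = self._call(prompt)
        self.spent += used
        return reply
```

The key design choice is that `BudgetExceeded` carries structured data rather than just a message, so the caller can report exactly how far over budget the next turn would have gone.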
Tool registry design
The tool layer defines, constrains, and executes the agent's capabilities. Claude Code's approach here is among the most instructive patterns in the codebase.
Tools are defined as metadata first. Each tool is a data structure declaring its name, source hint, and responsibility description before any implementation code runs. The agent can filter and introspect its available tools without triggering side effects. This means the system can reason about what it's capable of, assemble appropriate toolsets for specific tasks, and present accurate capability descriptions to the model, all without executing anything.
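A sketch of what metadata-first tool definitions might look like; the field names here are assumptions, not the leaked schema:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class ToolSpec:
    """Declarative tool definition: inspectable without running anything."""
    name: str
    description: str
    mutating: bool                       # used later for execution partitioning
    permissions: frozenset = frozenset() # checked before execution
    run: Optional[Callable] = None       # implementation attached separately

REGISTRY: dict[str, ToolSpec] = {}

def register(spec: ToolSpec) -> ToolSpec:
    REGISTRY[spec.name] = spec
    return spec

register(ToolSpec("read_file", "Read a file from disk", mutating=False))
register(ToolSpec("write_file", "Write a file to disk", mutating=True,
                  permissions=frozenset({"fs_write"})))

# Introspection with zero side effects: no tool code has run yet.
read_only = [s.name for s in REGISTRY.values() if not s.mutating]
```

Because the spec is pure data, the system can build prompt-facing capability descriptions, filter by mutation risk, or diff toolsets across sessions without ever invoking an implementation.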
Rather than loading every tool for every session, the system performs dynamic pool assembly. A session-specific subset of tools is assembled based on mode flags, deny lists, and the current permission context. This serves two purposes. It reduces the prompt footprint, keeping the tool descriptions that the model sees focused and relevant. And it limits the attack surface by ensuring tools that aren't needed for a given session simply aren't available.
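Pool assembly falls out naturally once tools are data. A sketch, with invented tool records and filter names standing in for whatever the real mode flags are called:

```python
# Hypothetical tool records; "needs" lists required permission grants.
TOOLS = [
    {"name": "read_file",  "mutating": False, "needs": set()},
    {"name": "grep",       "mutating": False, "needs": set()},
    {"name": "write_file", "mutating": True,  "needs": {"fs_write"}},
    {"name": "bash",       "mutating": True,  "needs": {"shell"}},
]

def assemble_pool(tools, *, read_only, deny=frozenset(), granted=frozenset()):
    """Session-specific subset from mode flags, deny lists, and permissions."""
    return [t for t in tools
            if t["name"] not in deny                 # explicit deny list
            and not (read_only and t["mutating"])    # mode flag filter
            and t["needs"] <= granted]               # permission context
```

A tool filtered out here never appears in the model's prompt and can never be called, which is what shrinks both the prompt footprint and the attack surface at once.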
Execution is partitioned by mutation risk. Read-only tools like file reads and searches run concurrently, taking advantage of parallelism for speed. Mutating tools like file writes and shell commands are serialized, preventing race conditions and conflicting state changes. This distinction is explicit in the tool metadata itself, not left to the model's judgment. The system knows which tools are safe to run in parallel because the tools declare it.
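One plausible way to implement that partitioning, sketched here with a thread pool for the read-only fan-out; the batching logic is my own illustration, not the recovered scheduler:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(calls):
    """calls: list of (mutating, fn) pairs in request order.
    Read-only calls fan out in parallel; mutating ones run one at a time."""
    results = []
    reads = []

    def flush_reads():
        # Run any queued read-only calls concurrently, preserving order.
        if reads:
            with ThreadPoolExecutor() as pool:
                results.extend(pool.map(lambda f: f(), reads))
            reads.clear()

    for mutating, fn in calls:
        if mutating:
            flush_reads()          # reads queued so far finish first
            results.append(fn())   # mutating calls are strictly serialized
        else:
            reads.append(fn)
    flush_reads()
    return results
```

The `mutating` flag comes from the tool metadata, so the scheduler never has to guess: a write can never race a write, and reads between writes still get their parallelism.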
Permission gates live at the tool layer rather than at the prompt layer. Each tool carries explicit permission requirements that are checked before execution. The model decides what to attempt. The tool system decides what is permitted. This is a critical architectural separation. Security enforcement doesn't depend on the LLM following instructions, which is the right place to draw that line in any system where the model has access to consequential actions.
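The separation is small in code but large in consequence. A minimal sketch, assuming a set-of-grants permission model:

```python
class PermissionDenied(Exception):
    pass

class Tool:
    def __init__(self, name, required, fn):
        self.name = name
        self.required = required  # permissions this tool demands
        self.fn = fn

    def execute(self, granted, *args):
        # The gate lives here, in the tool layer: it runs regardless of
        # what the model was instructed, persuaded, or tricked into asking.
        missing = self.required - granted
        if missing:
            raise PermissionDenied(f"{self.name} needs {sorted(missing)}")
        return self.fn(*args)
```

A prompt injection can make the model *request* anything, but the check in `execute` doesn't read the prompt, so the request still fails at the boundary where it matters.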
Permission and security architecture
Claude Code implements trust as a layered system rather than a binary gate.
Trust is tiered across three levels. Built-in tools receive the highest trust and the lightest permission requirements. Plugins sit at a medium trust level with additional checks. User-defined skills receive the lowest trust, with the most restrictive loading behaviors and permission requirements. Each tier's constraints are enforced structurally, not just through prompting.
The shell execution security layer is extensive. Twenty-three numbered security checks span Zsh-specific attack vectors, Unicode injection, heredoc injection, and compound attacks across over 2,500 lines of dedicated security code. The specificity and depth of these checks suggest real exploitation history. They aren't theoretical threat modeling. They read like responses to actual attack patterns encountered in production.
A predictive auto-permissions system reduces the friction of constant permission prompts. A separate LLM classifier, running Sonnet 4.6, evaluates each tool call against the user's stated intent and automatically approves safe operations while pausing for human approval on potentially dangerous ones. The intent is to reduce alert fatigue while maintaining meaningful security boundaries.
Permission decisions are first-class state objects with a queryable audit trail. The system doesn't just grant or deny. It records what was requested, what was decided, and why, making permission behavior auditable after the fact. Separate permission handlers exist for different operational contexts: human-in-the-loop for interactive sessions, orchestrator agents for coordinated multi-agent work, and autonomous workers for background tasks. Each handler applies permission logic appropriate to its trust context.
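A sketch of decisions-as-state-objects, with an invented record shape; the queryable part is what distinguishes this from a grant/deny function that returns a boolean and forgets:

```python
import time
from dataclasses import dataclass, field

@dataclass
class PermissionDecision:
    tool: str
    request: str
    decision: str   # "allow" or "deny"
    reason: str
    at: float = field(default_factory=time.time)

class AuditLog:
    def __init__(self):
        self._entries: list[PermissionDecision] = []

    def record(self, d: PermissionDecision) -> bool:
        # Grant or deny AND remember what, why, and when.
        self._entries.append(d)
        return d.decision == "allow"

    def query(self, tool=None, decision=None):
        return [e for e in self._entries
                if (tool is None or e.tool == tool)
                and (decision is None or e.decision == decision)]
```

The different handlers the section describes (interactive, orchestrator, autonomous) would each construct `PermissionDecision` records their own way, but they'd all feed the same queryable trail.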
Multi-agent orchestration
Claude Code uses a lead-agent-plus-subagents pattern for complex work, and the implementation reveals several patterns worth studying.
Context isolation is enforced at the architectural level. Each subagent runs in its own context window with a restricted tool set defined at spawn time. When a subagent completes its work, it returns only its output. Its working context, intermediate reasoning, and any exploratory dead ends stay contained. This keeps the main agent's context clean and focused while allowing subagents to take exploratory or risky approaches without contaminating the primary workspace.
The most distinctive pattern here is what could be called "prompts as architecture." The multi-agent coordinator is not implemented as code-level orchestration logic. It's implemented entirely as system prompt instructions. The orchestrator model receives a prompt that describes how to delegate tasks, aggregate results, and synthesize findings. Worker agents are simply Claude instances with different system prompts and different tool sets. They aren't separate processes or specialized code paths.
Quality gates are encoded directly in the coordinator prompt. Directives like "Do not rubber-stamp weak work" and "You must understand findings before directing follow-up work" require the orchestrator to actually engage with subagent results before deciding next steps. The orchestrator must synthesize, not relay.
Workers are constrained by type. An "explore" agent can read files and search but cannot edit. A "plan" agent can outline approaches but cannot execute code. A "verify" agent can run tests but not modify source. These constraints are enforced through tool set restriction at spawn time, not through instructions to the model. The agent literally cannot access tools outside its type because they aren't in its tool set.
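The structural enforcement amounts to set intersection at spawn time. A sketch with hypothetical type-to-toolset mappings mirroring the examples above:

```python
WORKER_TOOLSETS = {
    "explore": {"read_file", "grep"},          # can look, cannot touch
    "plan":    {"read_file"},                  # can outline, cannot execute
    "verify":  {"read_file", "run_tests"},     # can test, cannot modify
}

def spawn_worker(worker_type, all_tools):
    """Restriction is structural: tools outside the type's set are simply
    absent from the worker's pool, not forbidden by instruction."""
    allowed = WORKER_TOOLSETS[worker_type]
    return {name: fn for name, fn in all_tools.items() if name in allowed}
```

There is no instruction for the model to ignore: an "explore" worker asking to edit a file fails tool lookup, because `edit_file` was never in its dictionary to begin with.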
Prompt cache sharing makes this parallelism economical. Multiple subagents can share the same prompt cache, which means the cost of spinning up additional workers scales sublinearly rather than growing in full with each new context window.
Git worktree isolation provides a final layer of separation. Worker agents get their own isolated Git worktrees, giving each one a branch that can't conflict with the other workers or the main workspace. Changes are merged back only when the orchestrator approves them.
Observability and reliability
The observability patterns in Claude Code reflect hard-won lessons about operating agentic systems at scale.
Structured streaming events allow the model to communicate its internal state during execution. Typed events for tool matches, message starts, and crash reasons are emitted during streaming, before the model acts. This gives the consuming application real-time visibility into what the agent is considering, not just what it produces.
System event logging is deliberately separated from conversation transcripts. A structured history log captures routing decisions, context loading events, and permission outcomes independently. This makes it possible to reconstruct exactly what happened during an agentic run without parsing conversation text for clues, which anyone who's tried to debug a multi-step agent failure will appreciate.
Circuit breakers protect against runaway costs. The most illustrative example from the codebase is an autocompact failure that was burning 250,000 wasted API calls per day before someone added a MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 check. Three lines of code, a quarter million wasted calls a day. At scale, any retry loop without a circuit breaker is a potential cost bomb. The fix is simple. The cost of not having it is not.
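The pattern really is about three lines of logic. A generic sketch (the constant name below follows the one reported from the codebase; everything else is illustrative):

```python
MAX_CONSECUTIVE_FAILURES = 3  # mirrors the reported
                              # MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3

class CircuitOpen(Exception):
    pass

class Breaker:
    def __init__(self, limit=MAX_CONSECUTIVE_FAILURES):
        self.limit = limit
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.limit:
            # Stop retrying entirely instead of burning calls forever.
            raise CircuitOpen("giving up after repeated failures")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success resets the streak
        return result
```

Wrapping any retried operation this way converts an unbounded failure loop into a bounded one, which is the whole fix.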
Cache economics drive real architectural decisions. Prompt caching tracks fourteen break vectors with "sticky latches" that prevent unnecessary cache invalidation. When you're paying per token, cache invalidation becomes an accounting problem as much as an engineering one. This concern drives specific architectural choices like keeping system prompts stable across turns, segregating mutable configuration from immutable configuration, and designing tool definitions to minimize prompt churn. These aren't optimizations added after launch. They're architectural decisions that shaped the system's design from the ground up.
What practitioners can take from this
Not all of these patterns are necessary for smaller-scale agent systems, but the architectural instincts behind them apply broadly.
Separating workflow state from conversation state prevents a class of bugs that are nearly impossible to diagnose when they occur. Defining tools as data before code enables introspection and dynamic assembly that becomes essential as tool sets grow. Enforcing permissions at the tool layer rather than the prompt layer means security doesn't depend on the model behaving correctly. Budgeting tokens explicitly and halting cleanly prevents the cost surprises that have killed more than a few agent projects. Adding circuit breakers to retry loops is basic reliability engineering that's easy to forget until it's expensive. Restricting subagent tool sets by type prevents cross-contamination between agents with different responsibilities. And treating cache invalidation as a first-class engineering concern saves real money at any non-trivial scale.
These patterns emerged from a system handling millions of sessions. The scale is unusual, but the problems aren't. Anyone building agents that persist state, execute tools, coordinate work across multiple contexts, or operate under budget constraints will recognize these challenges. The value of the Claude Code architecture isn't that it invented new solutions. It's that it validated and refined patterns that the broader community has been developing independently, and it did so under the pressure of production scale.
Tomorrow I'll share my own observations on where these patterns succeed, where they fall short, and what I think is still fundamentally missing from the way we build agentic systems.
