Dive into Claude Code: Design Space of Today’s and Future AI Agent Systems - FeynmanWiki

CONTENTS

Bookmark this paper

Save for later reading

Dive into Claude Code: Design Space of Today’s and Future AI Agent Systems

1. From autocomplete to autonomous coding agents

We can now make the central shift more concrete: the leap from prediction to agency is not just a matter of adding a better prompt. It is a change in what the system is responsible for. An autocomplete model is optimized to complete the next token or a local span; an IDE copilot expands that into a suggestion interface; but an autonomous coding agent must own a task from start to finish. In other words, the unit of work stops being “the next likely continuation” and becomes something closer to a goal LLL, such as fix the failing test in auth.test.ts.
That distinction matters because once the model is allowed to act through tools, its output is no longer only text. It becomes part of a control loop:
L→  Q  M→p→TL \xrightarrow{\;Q\;} M \rightarrow p \rightarrow TLQ​M→p→T
Here, the task LLL enters an agent loop QQQ, which assembles context, invokes the model MMM, produces a plan or action proposal ppp, and then executes tool calls TTT that may edit files, run tests, inspect logs, or search the codebase. The key point is that the model is not “done” after one generation. Its output changes the environment, and the environment feeds back into the next cycle. That feedback is what turns isolated language modeling into iterative problem solving.
This is also why the old question, “How do we prompt the model?”, is too narrow. Prompting matters, but it is only one component in a much larger architecture. In practice, a coding agent must decide:
What context to collect and retain,
Which tools it may call and when,
How much autonomy to grant before asking for help,
How to delegate subtasks or parallel work,
How to persist state across turns or sessions,
How to enforce safety when actions can modify real code.
A useful way to think about the system is that the raw model output ppp is not yet an action. It becomes useful only after a policy layer interprets it against the current resources and history:
P(p,R,H)→u\mathcal{P}(p, \mathcal{R}, \mathcal{H}) \rightarrow uP(p,R,H)→u
where R\mathcal{R}R denotes available resources and tools, H\mathcal{H}H denotes history or memory, and P\mathcal{P}P is the policy that turns a proposal into an executed update uuu. This is the architectural seam where most of the interesting design trade-offs live. Two agents can use the same underlying model and behave very differently depending on how they constrain or enrich this policy.
That is why “autonomy” is best understood as a systems property, not a model property. If the loop is too loose, the agent becomes brittle: it may over-edit, waste tokens, or drift from the task. If the loop is too tight, it degenerates into a fancy autocomplete that cannot pursue longer-horizon goals. Production systems therefore have to balance initiative with governance. The failure modes are familiar:
too little context, and the agent makes locally plausible but globally wrong changes;
too much context, and the agent becomes slow, noisy, or distracted;
too much freedom, and it may perform unsafe actions;
too much supervision, and it loses the very advantage of agency.
The running example — “Fix the failing test in auth.test.ts” — is especially illustrative because it forces every part of the stack to matter. The agent needs to inspect the failure, infer whether the bug is in the test or in the implementation, edit files, rerun the test, and decide whether the result is actually resolved. That task cannot be solved by token completion alone. It requires a loop that can observe, act, and revise. In that sense, a coding agent is less like a text generator and more like a small operating system for model-driven work.
The visual below compresses this progression into three stages. On the left, autocomplete is shown as a thin assistive layer: useful, but fundamentally local. In the middle, the IDE copilot broadens the interface to suggestions and edits, yet the human still steers the process. On the right, the agentic loop makes the shift explicit: a task enters QQQ, the model proposes ppp, tools TTT act on the world, and the result feeds back into the loop. The diagram is not merely illustrative; it captures the architectural claim that the rest of the lecture will unpack.
The small callouts about safety, context, extensibility, delegation, and persistence are not side issues. They are the surrounding design space that determines whether a loop like L→  Q  M→p→TL \xrightarrow{\;Q\;} M \rightarrow p \rightarrow TLQ​M→p→T is useful in practice or merely impressive in a demo. Once you see that structure, the rest of the lecture becomes a study of how Claude Code organizes those controls around the agent loop — and what that implies for future systems.

CONTENTS

Bookmark this paper

Save for later reading

Dive into Claude Code: Design Space of Today’s and Future AI Agent Systems

1. From autocomplete to autonomous coding agents

We can now make the central shift more concrete: the leap from prediction to agency is not just a matter of adding a better prompt. It is a change in what the system is responsible for. An autocomplete model is optimized to complete the next token or a local span; an IDE copilot expands that into a suggestion interface; but an autonomous coding agent must own a task from start to finish. In other words, the unit of work stops being “the next likely continuation” and becomes something closer to a goal LLL, such as fix the failing test in auth.test.ts.
That distinction matters because once the model is allowed to act through tools, its output is no longer only text. It becomes part of a control loop:
L→  Q  M→p→TL \xrightarrow{\;Q\;} M \rightarrow p \rightarrow TLQ​M→p→T
Here, the task LLL enters an agent loop QQQ, which assembles context, invokes the model MMM, produces a plan or action proposal ppp, and then executes tool calls TTT that may edit files, run tests, inspect logs, or search the codebase. The key point is that the model is not “done” after one generation. Its output changes the environment, and the environment feeds back into the next cycle. That feedback is what turns isolated language modeling into iterative problem solving.
This is also why the old question, “How do we prompt the model?”, is too narrow. Prompting matters, but it is only one component in a much larger architecture. In practice, a coding agent must decide:
What context to collect and retain,
Which tools it may call and when,
How much autonomy to grant before asking for help,
How to delegate subtasks or parallel work,
How to persist state across turns or sessions,
How to enforce safety when actions can modify real code.
A useful way to think about the system is that the raw model output ppp is not yet an action. It becomes useful only after a policy layer interprets it against the current resources and history:
P(p,R,H)→u\mathcal{P}(p, \mathcal{R}, \mathcal{H}) \rightarrow uP(p,R,H)→u
where R\mathcal{R}R denotes available resources and tools, H\mathcal{H}H denotes history or memory, and P\mathcal{P}P is the policy that turns a proposal into an executed update uuu. This is the architectural seam where most of the interesting design trade-offs live. Two agents can use the same underlying model and behave very differently depending on how they constrain or enrich this policy.
That is why “autonomy” is best understood as a systems property, not a model property. If the loop is too loose, the agent becomes brittle: it may over-edit, waste tokens, or drift from the task. If the loop is too tight, it degenerates into a fancy autocomplete that cannot pursue longer-horizon goals. Production systems therefore have to balance initiative with governance. The failure modes are familiar:
too little context, and the agent makes locally plausible but globally wrong changes;
too much context, and the agent becomes slow, noisy, or distracted;
too much freedom, and it may perform unsafe actions;
too much supervision, and it loses the very advantage of agency.
The running example — “Fix the failing test in auth.test.ts” — is especially illustrative because it forces every part of the stack to matter. The agent needs to inspect the failure, infer whether the bug is in the test or in the implementation, edit files, rerun the test, and decide whether the result is actually resolved. That task cannot be solved by token completion alone. It requires a loop that can observe, act, and revise. In that sense, a coding agent is less like a text generator and more like a small operating system for model-driven work.
The visual below compresses this progression into three stages. On the left, autocomplete is shown as a thin assistive layer: useful, but fundamentally local. In the middle, the IDE copilot broadens the interface to suggestions and edits, yet the human still steers the process. On the right, the agentic loop makes the shift explicit: a task enters QQQ, the model proposes ppp, tools TTT act on the world, and the result feeds back into the loop. The diagram is not merely illustrative; it captures the architectural claim that the rest of the lecture will unpack.
The small callouts about safety, context, extensibility, delegation, and persistence are not side issues. They are the surrounding design space that determines whether a loop like L→  Q  M→p→TL \xrightarrow{\;Q\;} M \rightarrow p \rightarrow TLQ​M→p→T is useful in practice or merely impressive in a demo. Once you see that structure, the rest of the lecture becomes a study of how Claude Code organizes those controls around the agent loop — and what that implies for future systems.

2. The core tension: autonomy versus human control

Building on the autonomous loop, the central question is no longer whether an agent can act, but how much control the user should retain over the action path. That distinction matters because “autonomous” systems are only useful when they can keep making progress without micromanagement, yet “fully autonomous” behavior is unacceptable if it can quietly cross boundaries around safety, privacy, or intent. The design problem, then, is not a binary choice between control and freedom; it is a mechanism design problem for agent behavior.
A good way to frame the tension is to think of Claude Code as operating inside two coupled objectives:
autonomyandhuman control\text{autonomy} \quad \text{and} \quad \text{human control}autonomyandhuman control
These are not opposites in the abstract. In practice, they conflict along specific axes: when should the model proceed without interruption, when should it pause, what must it ask permission for, and how much context should the user need to inspect before deciding? A system that over-optimizes autonomy becomes brittle or risky; a system that over-optimizes control degenerates into a verbose assistant that constantly asks for approval and loses momentum. The interesting part of the architecture is therefore the boundary layer between model initiative and user oversight.
Claude Code’s motivating values make that boundary explicit. The paper grounds the system in five priorities: human decision authority, safety, security, and privacy, reliable execution, capability amplification, and contextual adaptability. Notice that these are not merely ethical slogans; they imply concrete engineering constraints. For example, if human authority matters, then the system must preserve meaningful opportunities for intervention. If reliable execution matters, then it must continue making progress across long tasks. If contextual adaptability matters, then the control policy cannot be a fixed on/off switch, because the right degree of supervision depends on the task, the environment, and the user’s current attention.
This is where naive permission gating breaks down. The paper reports an approval rate of about 93% for tool requests, which is surprisingly high but also revealing: most confirmations are eventually accepted. If a system surfaces a dialog for every meaningful step, the user is not really exercising fine-grained oversight; instead, they are being converted into a repetitive rubber stamp. That produces approval fatigue, and approval fatigue is a subtle failure mode because it looks like oversight while actually weakening it. The user learns to click through prompts, which reduces the signal value of the permission boundary exactly when it is most needed.
A more robust architecture therefore has to distinguish between two kinds of control:
continuous autonomy for routine progress, gathering context, and low-risk actions;
selective human intervention for actions that are irreversible, sensitive, or outside the user’s implicit intent.
This separation is important because it reframes control as a policy over action types, not as a blanket interruption mechanism. The agent should keep moving when the user is inattentive, but it should still enforce meaningful boundaries on action classes that matter. In other words, the system needs to be both fast and constrained, and those constraints must be enforced at the level of tool use, permissions, and execution flow rather than through occasional admonitions from the model.
Mathematically, the tension can be read as an optimization problem with constraints: maximize useful progress subject to a control policy that preserves user authority and system safety. If the policy is too permissive, the agent becomes unsafe; if it is too strict, the agent loses the very autonomy that makes it valuable. So the real design space is not “can the model act?” but what controls shape the action path. That includes permission prompts, escalation rules, context visibility, and the exact moments when the loop pauses versus continues.
The visual below compresses that argument into a single glance. The scale makes the core trade-off legible: one side is autonomy, the other is human control, and Claude Code sits in the middle because its usefulness depends on holding both in tension rather than collapsing into either extreme. The compact list of motivating values explains why the system cannot simply pick one side of the scale, while the approval-fault loop on the right turns the abstract concern into a concrete failure mode: if almost every request is approved, then too many prompts become noise, and control degrades into fatigue.
Seen that way, the diagram is not just decorative summary. It is a compact statement of the architectural thesis that follows: Claude Code must be designed so the agent can keep acting, but every important action still passes through mechanisms that preserve meaningful, not merely ceremonial, human control.

3. Five values, thirteen design principles

To understand Claude Code’s architecture, it helps to begin one level above implementation and ask a more basic question: what values should an agent system optimize for in the first place? The answer matters because agent design is not just a matter of stacking models, tools, and prompts. Every concrete mechanism—how much autonomy the system has, how often it asks permission, how it preserves context, when it delegates, and what it remembers—quietly reflects a value judgment about the kind of collaborator we want the system to be.
Claude Code’s design starts from a small set of five values that act like policy constraints on the rest of the system. In practice, these values are meant to keep the agent useful without becoming brittle, reckless, or opaque. The important thing is that these values are not just branding language; they are meant to compile down into engineering decisions. A system that values user control will look very different from one that values raw autonomy. A system that values clarity will structure its memory and tool calls differently from one that values maximal throughput. A system that values safety will spend more of its budget on checks, confirmations, and constrained actions.
From those values, Claude Code derives thirteen design principles. You can think of these as the bridge between abstract aspiration and concrete behavior. A principle is more operational than a value: it tells the designer what to do when values collide. For example, if the agent can either proceed quickly or pause to reduce risk, a principle might favor default caution with explicit escalation. If it can either hide intermediate steps or surface them, a principle might favor transparency of reasoning and action. These principles are the reason the architecture feels cohesive rather than ad hoc.
This is an important distinction: values are the why, principles are the how. Without values, the architecture risks becoming a bag of heuristics optimized for benchmark scores. Without principles, the values stay inspirational but toothless. The real design work happens in the mapping between them. That mapping determines whether the agent is allowed to act independently, how it negotiates uncertainty, and how it maintains the user’s mental model over long tasks.
A useful way to read the system is to see the principles as constraints along several recurring axes:
Agency: when the model may act on its own versus when it must ask.
Visibility: how much of the plan, tool use, and state is exposed to the user.
Recoverability: how easily the system can correct mistakes or roll back.
Continuity: how it preserves context across steps and sessions.
Extensibility: how new tools or workflows get added without breaking the core.
Delegation: when a subtask should be handed off to a separate process or agent.
These axes matter because agent systems fail in characteristic ways. Too much autonomy produces silent errors that propagate. Too much control produces a system that behaves like a glorified autocomplete and never reaches the interesting parts of the task. Too much hidden context makes the agent seem magical until it suddenly contradicts itself. Too little persistence forces the system to relearn the same local facts again and again. The principles are an attempt to balance these failure modes rather than eliminate them outright.
What makes this especially relevant for production coding agents is that code is a hostile domain for vague design. Code has syntax, state, dependencies, side effects, and irreversible operations. A coding agent that is merely “smart” but not principled will eventually surprise its user in the worst way: by making changes that are locally plausible but globally wrong. So the system’s values need to shape mechanisms like permission gates, context assembly, tool invocation, and memory updates. Otherwise, the architecture may look agentic while behaving unpredictably.
Another subtle point is that these principles are not independent. They interact. For example, stronger permissioning can support safety, but if it is too coarse it can also destroy flow and lead to user fatigue. Better context pipelines improve autonomy because the model sees more relevant state, but they can also introduce noise and stale information. More aggressive delegation can improve parallelism, but only if the system can merge results coherently. A good agent design is therefore not a single best setting; it is a carefully negotiated compromise among interacting controls.
That is why the values-and-principles layer is worth pausing on before diving into components. It tells us that Claude Code is not merely a model wrapped around tools. It is a normative system: a set of preferences about how an AI collaborator should behave when everything is messy, incomplete, or risky. Once you see that, the rest of the architecture becomes easier to interpret. The agent loop, permission system, context pipeline, extensibility stack, delegation, and persistence are not isolated features; they are implementation answers to a prior question about what kind of agent this should be.
The visual below compresses that logic into a compact hierarchy. It is meant to make the causal chain feel obvious: values at the top, principles in the middle, and system behavior at the bottom. If the diagram reads as simple, that is the point—the complexity is not in the drawing itself, but in how much architectural judgment is hidden inside each arrow from value to principle to mechanism.

4. The seven-component system view

After the values and principles, the next question is not what should the system want? but what pieces must exist for those values to become real behavior? That is where Claude Code becomes easier to reason about as a system rather than as a prompt. The important shift is to stop imagining a single monolithic “agent” and instead separate the moving parts that jointly produce one turn of behavior.
A useful abstraction is that Claude Code decomposes into seven components with a strict dataflow. The user’s task LLL enters through an interface III, which may be an interactive CLI, a headless CLI, an SDK, or an IDE/browser surface. That interface does not itself do the reasoning; it mainly determines how the request is presented, how results are shown, and what operational affordances are available. From there the request enters the shared query loop QQQ, which is the recurring orchestration mechanism for a single turn.
Inside that loop sits the model call MMM, but the model is only one stage in a larger pipeline. A compact way to express the turn is
Q(L,s,h,T)↦(c,p,s′,h′)Q(L, s, h, T) \mapsto (c, p, s', h')Q(L,s,h,T)↦(c,p,s′,h′)
where the loop consumes the user task LLL, the current session state sss, the transcript history hhh, and the available toolset TTT, then emits a candidate control decision ccc, proposed tool actions ppp, and updated state and history (s′,h′)(s', h')(s′,h′). The exact symbols are less important than the architectural message: the loop is stateful, tool-aware, and iterative, not a one-shot prompt wrapped around a model completion.
What happens after the model proposes an action is where the design becomes production-grade. The proposal ppp is not executed directly. It passes through the permission system P\mathcal{P}P, which filters or gates tool usage before any side effect occurs. Only approved actions reach the tool pool TTT, and only then do they interact with the execution environment to produce outcomes xxx. This is the crucial control point: the agent is not defined by what it wants to do, but by the fact that desire, proposal, and execution are separated by policy. In practice, this is what makes the system safer, more legible, and more recoverable.
That separation also clarifies a subtle failure mode in agent design. If the model, tools, and execution environment are collapsed into one opaque loop, then every mistake looks like “the model failed,” when in reality the failure may come from permission mismatch, stale context, poor tool composition, or brittle recovery logic. By splitting the architecture into components, Claude Code makes these failure modes diagnosable. The loop can be too eager, the permissions too strict, the toolset too narrow, or the persistence layer too lossy — and those are different engineering problems with different fixes.
The persistence layer matters for the same reason. The state object sss and append-only transcript hhh are not decorative bookkeeping; they are the system’s memory substrate. They allow the loop to resume after interruptions, preserve a trace of prior decisions, and reconstruct why a tool call happened. In an agentic system, memory is not just about recall; it is about continuity of control. Without reliable persistence, a system may appear intelligent in one turn but become incoherent across turns because its own prior commitments vanish.
This is also why Claude Code’s architecture supports multiple interfaces without fragmenting into multiple agent engines. The design choice is to keep one shared loop QQQ and let different surfaces feed into it. That means the core behavior is consistent whether the request arrives from an interactive terminal, an SDK call, or an IDE integration. The interface can change the ergonomics, but the underlying control structure stays the same. In other words, Claude Code is not “a prompt for the CLI” plus “a different prompt for the SDK”; it is one agent loop wrapped in different operational skins.
A few consequences follow naturally:
Reasoning is thin, orchestration is thick. The model is central but not sufficient.
Permissions are first-class. Safety and usability emerge from gating, not after-the-fact supervision.
Persistence is architectural, not incidental. The transcript and session state are part of the control system.
Interfaces converge. Surface diversity does not imply a different agent core.
The visual below compresses exactly this argument into a single left-to-right flow. The important thing to look for is not just the arrows, but the boundaries: request, interface, query loop, model, permission gate, tools, and execution are distinct stages, while state and history sit underneath as the memory that stabilizes the whole process. That arrangement makes the claim visible: Claude Code is best understood as one shared loop surrounded by operational infrastructure, not as one giant prompt pretending to be a product.

5. The reactive agent loop

Claude Code’s central operational principle is surprisingly simple: think, act, observe, repeat. But in an agentic coding system, that loop is not just a control-flow convenience; it is the organizing abstraction that determines how the model spends context, when it can safely modify files, how it reacts to tool failures, and how much autonomy it can be trusted to exercise. Once you shift from a one-shot completion model to a reactive agent, the interesting question is no longer “can the model generate code?” but rather “how does it remain coherent while continuously revising its plan under partial information?”
The key idea is that the loop is reactive rather than deliberative in the classical planning sense. A fully deliberative planner would try to construct an exhaustive plan before acting, but software tasks are too underspecified, too environment-dependent, and too brittle for that to work well in practice. A reactive loop instead treats each model step as a local decision conditioned on the current state of the workspace: what files exist, what edits have been made, what commands just failed, what the user clarified, and what the tool outputs revealed. In other words, the agent does not merely execute a plan; it continually re-derives the next move from updated evidence.
This matters because coding is full of hidden state. The repository may contain non-obvious build rules, test fixtures may fail for reasons unrelated to the target bug, and a seemingly local change can have cascading effects elsewhere. A reactive loop is therefore designed around closed-loop feedback, where every action is immediately turned into new context. Formally, you can think of the agent as operating over a state sts_tst​ that includes the prompt, recent messages, tool outputs, and workspace observations, choosing an action ata_tat​, then receiving an updated observation ot+1o_{t+1}ot+1​ that reshapes the next state. The behavior is less like compiling a static plan and more like navigating with a live map.
That feedback structure also explains why the loop must be lightweight. If the agent spends too much time elaborating an internal plan, it can become stale before it is used. If it acts too quickly without reflection, it risks thrashing: editing files repeatedly, running the wrong command, or overfitting to noisy tool outputs. Production systems sit in the middle and rely on a few stable habits:
inspect before edit when the task is ambiguous,
test after meaningful change to confirm the effect,
replan after surprises rather than forcing the original intention,
stop early when the uncertainty is already resolved.
A subtle but crucial assumption is that the environment is partially observable. The agent never has perfect knowledge of the codebase; it only sees what it has searched, opened, or executed. That means the loop must preserve enough state to avoid re-discovering the same facts every turn, while also respecting context limits so the prompt does not balloon indefinitely. In practice, this creates a tension between memory of the recent past and freshness of the next decision. A good reactive loop carries forward just enough trace of prior actions to maintain coherence, but not so much that irrelevant history dominates the next step.
Failure modes emerge precisely when that balance breaks. If the loop overweights recent tool output, it can get stuck in a local patching pattern—fixing one error only to generate another. If it overweights its earlier intent, it may ignore evidence that the task has changed. And if the system does not distinguish between observation and commitment, then a speculative idea can be treated as if it were already validated. The architecture therefore needs a disciplined distinction between three roles: the model’s internal proposal, the tool’s external evidence, and the agent’s next committed action.
This is also where permissions and the reactive loop become inseparable. A loop that can inspect, edit, and execute commands must be able to decide when to request user approval and when to proceed autonomously. The agent’s turn is not just “what should I do next?” but also “is this action safe, reversible, or high-impact enough to require confirmation?” That permission boundary is part of the control policy itself, not an afterthought. Without it, the loop either becomes too timid to be useful or too aggressive to be trusted.
The visual below compresses that logic into a compact control structure: a task enters the loop, the model proposes an action, tools return observations, and the resulting state feeds the next turn. The point of the diagram is not simply to show a cycle, but to emphasize that each iteration is a reconstruction of intent under new evidence. The repetition is what gives the system robustness; the re-evaluation is what keeps it from becoming blind automation.
Seen this way, the diagram is really a summary of the most important design claim in agentic coding systems: intelligence is not only in the quality of a single answer, but in the quality of the transition between answers. That transition is where Claude Code spends its effort—updating context, deciding whether to act, and turning each tool result into a better next move.

6. queryLoop(): the agentic turn engine

Once we move from the high-level reactive agent loop to the mechanics of a production system, the central question becomes: what exactly happens during one agentic turn? Claude Code’s answer is not “run a model and hope for the best,” but a carefully staged control routine—queryLoop()—that turns user intent, current context, tool state, and policy into a single decision about the next action.
At a conceptual level, queryLoop() is the turn engine. It is the piece that repeatedly asks: given the current conversation, the repository state, the previously observed tool outputs, and the active permissions, what should the agent do next? That “what next?” is deceptively simple. In practice it hides several coupled subproblems: deciding whether the model has enough context to respond, whether it should call a tool, whether it should ask for confirmation, and whether it should stop and yield control back to the user. The turn engine is therefore less about text generation than about control-flow arbitration.
A useful way to think about this is that the loop maintains a latent state sts_tst​ representing the agent’s working situation at turn ttt: task progress, relevant memory, tool results, and policy constraints. The model proposes an action ata_tat​, but the system is the one that commits to the transition
st+1=f(st,at,environment feedback).s_{t+1} = f(s_t, a_t, \text{environment feedback}).st+1​=f(st​,at​,environment feedback).
This distinction matters. Many failures in agent systems come from conflating proposal with execution. The model may be capable of suggesting a useful edit, but the runtime still has to decide whether that edit is safe, whether it should be streamed incrementally, whether it triggers a follow-up search, and whether the current turn should continue or terminate. queryLoop() is the place where those commitments are made.
That also means the loop is doing more than planning. It is the boundary where reasoning meets policy. A turn can only proceed if the system can justify spending more context and more actions on the problem. If the task is underspecified, the loop may deliberately ask a clarifying question. If the model is confident but the action is sensitive, the loop may route through permission checks. If tool output has already resolved the uncertainty, the loop may end early. In other words, queryLoop() is a small but crucial example of a broader design principle in production agents: the model is not the orchestrator; the runtime is.
This design has an important implication for robustness. A naive agent often fails by entering either of two modes:
Over-eager execution: it keeps calling tools even when the answer is already available.
Premature termination: it stops before the task is genuinely complete.
queryLoop() exists to balance these modes. It repeatedly reassesses whether the current state still warrants more action. That reassessment is especially valuable in coding workflows, where a single tool call can radically change the situation—searching a file may uncover a symbol definition, an edit may invalidate a previous plan, or a test failure may reveal a deeper dependency. The loop must therefore be adaptive, not just iterative.
One subtle but important assumption is that the agent’s state is partially observable. The model never sees the whole repository or the whole environment at once; it sees a curated context window and a stream of tool observations. That means queryLoop() is not merely managing computation, but also managing attention budget. Each turn decides what information to surface, what to preserve, and what to omit. This is why the turn engine sits downstream of the context pipeline yet upstream of execution: it is the point where selected evidence becomes an actionable next step.
It is also where Claude Code’s design philosophy becomes visible. Rather than treating the agent as an autonomous black box, the system treats agency as a sequence of auditable turns. That makes the behavior easier to inspect, easier to interrupt, and easier to integrate with permissions and recovery. In practice, this modularity is what lets a coding agent remain useful in the messy real world: the loop can pause, resume, stream, backtrack, or hand control back to the human without collapsing the entire interaction model.
The visual below is helpful because it compresses this control logic into a few moving parts. Once you have the mental model of queryLoop() as the turn engine, the arrows and boxes become less like generic workflow decoration and more like evidence for a specific claim: each agentic turn is a disciplined cycle of observe, decide, act, and re-evaluate. The diagram makes the orchestration visible—especially the way context, permissions, and tool feedback all converge before the next action is chosen.
It also sets up the next step naturally. If queryLoop() decides what kind of action should happen, then the next question is how that action is actually carried out: how tool calls are dispatched, how outputs are streamed back, and how the system recovers when an execution path fails. That is where the control engine hands off to the execution machinery.

7. Tool dispatch, streaming execution, and recovery

Once the agent loop has produced a candidate action, the key question is no longer what to do, but how to dispatch it without stalling the whole system. In Claude Code, tool execution is deliberately streaming-first: the moment the model emits a tool plan ppp, the harness tries to start useful work immediately rather than waiting for the full turn to finish. That design matters because the dominant cost in real agentic coding is often not model reasoning alone, but the latency introduced by serializing every tool call behind a monolithic “wait until complete” boundary.
The first subtlety is that not all tool calls deserve the same scheduling policy. Claude Code partitions the tool set into two broad classes:
T=Tsafe∪Texclusive.T = T_{\mathrm{safe}} \cup T_{\mathrm{exclusive}}.T=Tsafe​∪Texclusive​.
Here, concurrent-safe tools can overlap in time because they do not interfere with one another’s effects, while exclusive tools must be serialized because they contend for shared state, shared filesystem regions, or other mutable resources. This is not merely an optimization detail; it is a correctness boundary. If we treated every call as parallelizable, we would gain throughput but lose determinism and risk self-induced races. If we treated every call as exclusive, we would preserve safety but give up much of the responsiveness that makes a coding agent feel interactive.
The streaming executor therefore behaves like a small online scheduler. When a compatible tool call arrives, it can be launched immediately, and later calls may be admitted in parallel only if their interference profile permits it. When a tool is classified as exclusive, the executor serializes it even if other work is available. The practical effect is that the system is trying to maximize overlap subject to a safety relation, not simply maximize concurrency. A concise way to think about it is:
safe tools: overlap is allowed, so latency can be hidden;
exclusive tools: overlap is forbidden, so order must be respected;
mixed batches: the batch must be partitioned before dispatch.
That partitioning step also reveals why Claude Code needs two execution paths. The ideal path uses StreamingToolExecutor, which begins dispatch as soon as the model’s partial output ppp is available. But streaming is not always possible: the model backend, the transport, or the current turn state may force the system to collect a full tool plan before execution. In that case, the harness falls back to partitionToolCalls() and runTools(), which execute the same logical set of calls synchronously. This fallback is important because it preserves correctness even when the low-latency path is unavailable; the architecture is designed so that “streaming-first” is an optimization, not a requirement for progress.
Coordination becomes more interesting once the executor is allowed to overlap work. Two signals govern the control flow. The first is a sibling abort controller, which lets the system cancel overlapping work when one branch makes another branch obsolete or risky. The second is a progress-available signal, which tells dependent steps that enough upstream output has accumulated for them to continue. Together, these signals encode a common agentic pattern: some tasks should be killed when they become redundant, while others should be unblocked as soon as the world state is sufficiently known. This prevents the executor from getting trapped in a brittle “fire and forget” regime where everything keeps running even after the logical branch has changed.
The other source of fragility is not tool contention but output pressure. If the model runs out of room while producing the turn, Claude Code does not simply fail the request and ask the user to retry. Instead, it first escalates max_output_tokens, trying to give the model more space to finish the current reasoning or tool plan. If that is still insufficient, the harness invokes the compaction operator C(c,h,W)→c′\mathcal{C}(c, h, W) \rightarrow c'C(c,h,W)→c′, which rewrites the conversation state under the working window WWW into a smaller but still usable context c′c'c′. The important point is that compaction is not just truncation; it is a controlled state transformation intended to preserve the information most relevant to continuing execution.
When the system encounters prompt_too_long, it still does not stop at a single recovery action. It can compact and retry, switch to a streaming fallback, or even hand control to a fallback model if the current route cannot sustain progress. That layered response makes the loop robust under two distinct pressures: tool pressure, where multiple actions compete for execution, and context pressure, where the model cannot fit its next move into the available window. In both cases, the architecture prefers progress-preserving adaptation over hard failure.
This is why the overall loop should be read as progress-oriented orchestration rather than simple event handling. Claude Code is not just reacting to model output; it is continuously reshaping that output into an execution plan that can survive partial availability, interference, and memory limits. The design trades a little complexity in the harness for a lot more resilience in practice.
The visual below compactly summarizes that strategy. On the left, the main execution lane makes the central idea concrete: p∈Tp \in Tp∈T, then tools are split into the safe and exclusive bands, with the streaming executor trying to start work early and the fallback path preserving correctness when streaming cannot be used. The coordination markers—the abort controller and the progress signal—show that concurrency is managed, not merely enabled.
On the right, the recovery stack captures the second half of the story: when the bottleneck is not tool interference but context exhaustion, the system escalates token budget, compacts via C\mathcal{C}C, retries on prompt_too_long, and finally switches models if needed. Read together, the two columns show a single architectural principle: Claude Code keeps moving by treating both execution and recovery as first-class parts of the agent loop.

8. Theorem: permission and safety are enforced in layers

Building on the execution path, the key security idea is that a tool request is not an action. It is only a proposal emitted by the model inside the shared query loop, and the runtime decides whether that proposal ever becomes a real side effect. This distinction matters because the model is optimized for producing useful continuations, not for being a security boundary. In other words, Claude Code treats model output as untrusted intent until the harness has explicitly authorized it.
Mathematically, we can think of a proposed invocation ppp as flowing through an authorization pipeline P\mathcal{P}P, which maps it to one of a few outcomes:
p→Pu∈{allow,ask,deny}p \xrightarrow{\mathcal{P}} u \in \{\text{allow}, \text{ask}, \text{deny}\}pP​u∈{allow,ask,deny}
The important part is not just the output label, but the fact that uuu is decided by the pipeline, not by the model call MMM. So even if the model emits a perfectly well-formed tool-use block, that block still has to survive the surrounding control system before it can reach execution.
This layered design is a direct response to a familiar failure mode in agent systems: single-point authorization. If one component is asked to decide everything, then any bug, prompt injection, or overly permissive policy can collapse the whole safety story. Claude Code avoids that by making authorization compositional. Each layer is narrower than the whole, but together they enforce a stronger invariant:
p⇏x unless p passes all applicable layers in Pp \not\Rightarrow x \text{ unless } p \text{ passes all applicable layers in } \mathcal{P}p⇒x unless p passes all applicable layers in P
Here xxx is the external effect — the actual approved tool action — and the arrow is intentionally conditional. The agent may want to act, but the system only lets it act if every relevant gate agrees.
The first gate is tool pre-filtering over the assembled tool pool TTT. This is easy to underestimate, but it eliminates an entire class of failures before “authorization” even begins: if a tool is unavailable, hidden, or not assembled into the current context, then the model cannot reliably invoke it in the first place. That means safety is not only about saying “no” at runtime; it also includes shrinking the action space so the model never sees certain paths as executable options.
Next comes the deny-first rule layer R\mathcal{R}R. This is subtle and important: deny-first means the system does not start from a blanket assumption that actions are allowed and then try to catch a few bad cases. It starts from caution, and any matching restriction blocks approval immediately. That gives the policy language real force. A declarative rule is not advisory text for the model; it is an operational constraint that can terminate the request before the agent ever reaches execution.
After that, the current permission mode encodes the trust posture γ\gammaγ. This is where the user’s chosen workflow matters: even a semantically acceptable request may still require confirmation, especially when the posture is conservative. So the system can return ask rather than allow, preserving a human-in-the-loop checkpoint when the risk model demands it. If auto-mode is enabled, a classifier may add another rejection surface, which is useful precisely because it catches cases that are too dynamic or context-dependent for static policy alone. The result is not redundancy for its own sake, but defense in depth.
Two more layers make the boundary even harder to bypass. Shell sandboxing constrains the effect of an approved command, so the capability is still limited at execution time. And resume logic does not silently restore permissions that were previously absent, which prevents “state drift” from accidentally reintroducing authority later in the session. Finally, hooks through H\mathcal{H}H can intercept at multiple phases — before tool use, during permission requests, after denials, and after use — which means the enforcement surface is not a single decision point but a programmable interception stack.
This is why the theorem is best read as an architectural claim, not just a safety slogan. Claude Code separates proposal generation from authorization, and then further decomposes authorization into layers that can each veto, defer, or restrict the action. That separation explains both the robustness of the system and the trade-off it embraces: the agent becomes less “magically autonomous,” but much more predictable and governable.
The visual below compresses that logic into a security-stack diagram. The left side represents the model’s output ppp: a request that exists only as intent. The middle layers show how that request is screened by pre-filtering, deny-first policy, permission posture, auto-mode checks, sandboxing, resume state, and hooks. The right side appears only if the entire pipeline clears the action, making the final xxx feel earned rather than assumed.
Read that picture as a compact proof sketch. The stacked bands are not merely a list; they are the reason the equation p→Pup \xrightarrow{\mathcal{P}} upP​u has semantic weight. If any layer says deny or ask, the proposal stops being a direct action and reverts to a controlled decision. That layered structure is the foundation for everything that follows, especially the stronger deny-first argument in the next section.

9. Proof of layered safety and deny-first control

Having established the theorem, the remaining task is to make the safety claim operationally believable: not just that access control exists, but that it is layered in a way that composes correctly under failure. That is the key distinction in agent systems. A single monolithic “permission check” is brittle because it assumes one place in the stack can see every relevant fact, while in practice the model, the policy engine, the hook pipeline, and the execution environment each observe different parts of the action.
The proposed tool call ppp is the useful object to track because it makes the argument concrete. We begin with the weakest possible statement: if p∈Tp \in Tp∈T, then it is only a candidate for execution, not an entitlement. In other words,
p∈T  ⇒  p survives only if it passes P.p \in T \;\Rightarrow\; p \text{ survives only if it passes } \mathcal{P}.p∈T⇒p survives only if it passes P.
That implication already encodes an important design principle: the model may be able to mention a tool, but the system does not yet owe it execution. The tool pool TTT is itself filtered before the model ever reasons over it, so a blanket-denied tool can be removed from consideration entirely. This is stronger than post-hoc rejection, because it prevents the agent from even constructing a plan around an unavailable capability.
The next layer is where deny-first control matters. Permission evaluation over R\mathcal{R}R is not a symmetric vote between allow and deny rules; it is intentionally asymmetric. If any rule returns deny, that decision dominates. This avoids a subtle but common failure mode in policy composition: if an allow rule can “outweigh” a deny rule, then a local exception can accidentally override a global safeguard. The correct mental model is therefore not “does the policy approve?” in the abstract, but rather “is there any earlier reason to stop?” Formally,
P(p;R,H,γ)=allow  only if no earlier layer returns deny.\mathcal{P}(p; \mathcal{R}, \mathcal{H}, \gamma) = \text{allow} \;\text{only if no earlier layer returns deny}.P(p;R,H,γ)=allowonly if no earlier layer returns deny.
That phrasing is important because it makes the layering explicit: P\mathcal{P}P is not a single test, but an ordered pipeline of veto points.
After policy evaluation come the interceptors: the hook pipeline H\mathcal{H}H and the auto-mode classifier. These are not redundant with R\mathcal{R}R; they protect against different classes of risk. Policy rules express relatively stable, declarative intent, while hooks can inspect richer runtime context, and the classifier can act as a dynamic guardrail when the model’s behavior looks risky or ambiguous. In some cases they do not merely deny; they can also rewrite ppp, changing the action into a safer equivalent. That distinction matters because real systems often need both blocking and shaping. A denial alone is a hard stop, but a rewrite can preserve utility while reducing exposure.
Then comes the execution boundary, which is where many toy explanations become misleading. Even after a tool call is approved, the actual shell action xxx still runs inside a sandbox. That means permission is not the same as full ambient authority. A successful approval only says the action is eligible; the sandbox determines what the action can actually touch. This separation is a classic defense-in-depth move: if any earlier layer misses something, the final environment can still limit blast radius. In practice, this is what keeps a mistaken approval from becoming a system-wide failure.
The session boundary closes a particularly easy-to-overlook loophole. When a session is resumed, prior session-scoped permissions are not simply restored as if trust were permanent. The trust posture γ\gammaγ does not leak across sessions, which prevents a one-time grant from silently becoming durable authority. This is especially important for interactive coding agents, where “temporary convenience” can quickly become de facto persistence if the system is not careful. The security property here is not only what happens now, but also what is forgotten later.
What emerges is a proof by execution path: the model proposes, but multiple layers can independently stop the action before or after proposal. That is why the architecture is robust even when individual components are imperfect. The guarantee does not depend on one flawless decision-maker; it depends on the fact that no single code path both reasons about and enforces access. Each layer sees only a slice of the problem, and each layer is empowered to veto.
The visual below compresses that logic into a compact pipeline. The stacked checkpoints are not decorative—they are the argument. Reading top to bottom, they show how a tool call can be eliminated before the model reasons about it, rejected by deny-first policy, intercepted by runtime guards, constrained in the sandbox, or invalidated at session resume. The green path matters only as the exceptional case: it appears if and only if every gate remains silent.
Seen this way, the displayed equations are not merely notation; they are the distilled summary of the whole proof. The first says that membership in the tool pool is insufficient without surviving P\mathcal{P}P. The second sharpens the conclusion: approval is the end of a chain of potential denials, not the outcome of a single optimistic check.

10. Permission modes, rules, and hooks

Once we move from abstract safety claims to an actual coding agent, the interesting question is no longer whether the system is “aligned” in the abstract, but how the agent is allowed to act in the world. For Claude Code, that means asking what kinds of operations should happen automatically, what should be gated behind confirmation, and what should be impossible unless the user explicitly changes policy. This is where permission modes, rules, and hooks become the core control surface.
The easiest way to think about this layer is as a hierarchy of decision points. Every external effect produced by the agent—editing files, running shell commands, network access, invoking tools—can be classified along two axes:
Intrinsic risk: how destructive or irreversible the action could be.
Contextual trust: whether the user has already authorized that class of action in the current session or repository.
A deny-first system does not try to prove every action safe. Instead, it assumes actions are unsafe unless policy says otherwise. That matters because coding agents operate in a messy environment: repositories contain secrets, scripts have side effects, package managers can mutate the filesystem, and “just run this command” can mean anything from harmless inspection to a production outage.
Claude Code’s permission design is therefore not just a UI feature; it is a control plane for action selection. The agent loop proposes an action, but the permission layer decides whether the proposal may proceed, must be escalated, or must be rejected. In formal terms, if the policy function is P(a,c)P(a, c)P(a,c) for action aaa under context ccc, then the agent’s autonomy is bounded by a check like:
execute(a)  ⟺  P(a,c)=allow\text{execute}(a) \iff P(a, c) = \text{allow}execute(a)⟺P(a,c)=allow
That simple gate hides a lot of nuance. The hard part is defining ccc: current directory, repository trust, command type, file path, user preferences, and session state all matter. A command like ls in a sandboxed repo is fundamentally different from rm -rf in a writable workspace, even if both are just “shell commands.” The permission system has to encode these distinctions without becoming so brittle that the agent is unusable.
This is where permission modes and rules separate. Permission modes are coarse-grained posture settings: they define the overall level of autonomy the agent starts with. Rules are finer-grained exceptions or constraints that override the default posture for specific patterns. In practice, that means the agent can be configured to behave conservatively by default while still granting targeted allowances for repeatable, low-risk workflows. The result is a policy stack that is both usable and auditable.
A subtle failure mode appears when permission systems become too modal. If the system asks the user for confirmation too often, the agent loses its productivity advantage and the user starts rubber-stamping prompts. If it asks too rarely, the system becomes indistinguishable from an unchecked script runner. Good design therefore aims for a middle regime where the most dangerous actions are blocked early, but common benign operations are streamlined. The key idea is not “maximize autonomy” but allocate autonomy where the risk/benefit ratio is favorable.
Hooks extend this same philosophy into the lifecycle of events. Instead of treating the agent as a sealed black box, hooks let the environment react to important transitions: before a tool call, after a tool call, on permission checks, on file changes, or at other integration points. Conceptually, hooks are policy-adjacent automation: they do not replace the permission system, but they can annotate, log, block, enrich, or redirect behavior. That makes them useful for teams that want to enforce local conventions, repository-specific guardrails, or lightweight observability.
There is also an important systems lesson here. A permission system alone is reactive: it can say yes or no at the moment of execution. Hooks make the system programmable around the decision boundary. That matters because many real failures are not single forbidden actions; they are sequences of individually permitted actions that collectively produce an unsafe outcome. Hooks give the architecture a chance to inspect those sequences, add context, or encode organizational policy without modifying the core model loop.
Taken together, these mechanisms create a layered control story:
Permission modes set the default autonomy level.
Rules add scoped exceptions or constraints.
Hooks integrate environment-specific checks and reactions.
The agent loop remains capable of planning and proposing actions, but never bypasses policy.
The visual below condenses that layered structure into a compact flow: the model proposes, policy evaluates, hooks intervene at the edges, and only then does execution happen. Read it as a summary of the key architectural claim of this section: safety is not a single guardrail but a set of interacting mechanisms that shape what the agent can do, when it must ask, and how it can be customized without rewriting the core system.

11. Extensibility at different context costs

Building on the loop and permission pipeline, the next question is not whether Claude Code can be extended, but where an extension should enter the system. That choice matters because every integration point pays a different price: some mechanisms are almost free in context, while others reshape the visible action space of the agent itself. In production agents, extensibility is never just a feature list; it is a budget allocation problem across prompt space, tool space, and execution space.
A useful way to think about this is to separate three resources the agent consumes:
context c,tool set T,execution path\text{context } c,\quad \text{tool set } T,\quad \text{execution path}context c,tool set T,execution path
An extension can inject instructions into ccc, add or alter tools in TTT, or intervene at execution time without changing either. These are qualitatively different interventions. They affect not only capability, but also reliability, safety, and how much of the system’s “brain” is exposed to the model at once. Claude Code’s design is notable because it does not collapse these into one generic plugin mechanism; instead, it deliberately spreads them across layers with increasing cost.
At the cheapest end are hooks. Hooks operate at execution time: they can observe, block, annotate, or redirect actions after the agent has already formed an intent. Because they do not need to be loaded into the model’s prompt, they have essentially zero context cost:
κ(hooks)=zero\kappa(\text{hooks}) = \text{zero}κ(hooks)=zero
This is a powerful property. Hooks can enforce operational policies, trigger logging, or mediate approvals without polluting the working context. The trade-off is that they are mostly reactive rather than generative: they shape what happens, but they do not directly teach the model new knowledge or new procedures.
Skills sit one layer higher. They are instruction bundles injected into the current context ccc, so they consume prompt budget, but only modestly. Their role is to enrich the agent with task-specific know-how: conventions, workflows, and local guidance that should be “in mind” while the agent reasons. That is why their cost is low rather than zero:
κ(skills)=low\kappa(\text{skills}) = \text{low}κ(skills)=low
This is often the sweet spot for reusable expertise. Skills are expressive enough to change the model’s behavior in a durable way during a task, yet lightweight enough to remain practical. Still, they are not free. As with any context injection, too many skills can crowd out the task itself, and the benefit depends on the model actually attending to the added guidance rather than treating it as background noise.
Plugins move beyond pure instruction injection into packaging-oriented extensibility. They contribute capabilities as a more structured bundle, which means more overhead than a skill but less exposure than exposing a large external tool universe. In the design space, this is a medium-cost layer:
κ(plugins)=medium\kappa(\text{plugins}) = \text{medium}κ(plugins)=medium
The conceptual difference is subtle but important. A plugin is not just “more text”; it is a reusable unit that may combine configuration, metadata, and capability wiring. That makes it more powerful for distribution and composition, but also more expensive in the engineering and context sense. Plugins are attractive when you want a coherent extension surface rather than a one-off instruction snippet.
At the expensive end are MCP servers, which contribute tool schemas into the reachable tool set TTT. Unlike hooks or skills, MCP changes what actions the agent can even attempt. That makes it the highest-cost mechanism here:
κ(MCP)=high\kappa(\text{MCP}) = \text{high}κ(MCP)=high
The reason is not just that tools are “bigger” than instructions. It is that tools alter the agent’s action manifold: they create new callable operations, new preconditions, new failure modes, and new permission surfaces. If a tool is reachable, it can be selected; if it is selected, it may need permissions, error handling, and feedback integration. This is capability gain, but it also increases the complexity of planning and safety management.
This layering becomes especially clear in the way Claude Code assembles its tool pool. The reachable tools are not simply “all available tools”; they are filtered and normalized through a pipeline:
T=dedup(MCP(filter(base tools,R)))T = \text{dedup}\big(\text{MCP}(\text{filter}(\text{base tools}, \mathcal{R}))\big)T=dedup(MCP(filter(base tools,R)))
The order matters. First, the base tools are reduced according to mode-specific constraints and deny rules R\mathcal{R}R. Then MCP integrations expand or refine the candidate set. Finally, duplicate entries are removed so the final action space remains coherent. This is a classic systems insight: extensibility must be designed together with gating and deduplication, or else “more capability” becomes “more ambiguity.”
There is another distinction that is easy to miss but crucial for understanding the architecture. Some mechanisms shape the current context ccc: CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md, skills, and plugins all influence what the model sees and how it reasons. But an Agent tool call does something else entirely: it creates a delegation request ddd and launches a separate subagent SSS with its own working window WWW.
d⇒S with its own Wd \Rightarrow S \text{ with its own } Wd⇒S with its own W
That means delegation is not merely another extension mechanism; it is a form of execution isolation. The subagent has its own local context and can pursue a subtask without consuming the parent agent’s entire working memory. This is why delegation belongs in the same architectural conversation as extensibility, even though it is not an extension in the narrow sense. It gives the system a way to scale work without flattening everything into one giant prompt.
So the key design principle is not “support as many extensions as possible.” It is to assign each extension mechanism a different role and a different cost profile:
Hooks: intervene without context overhead
Skills: add lightweight task guidance
Plugins: package structured capability with moderate overhead
MCP: expand the tool universe, but at high cost
Delegation: spawn isolated work units rather than inflating the parent context
That division of labor is exactly what makes the architecture interesting. It gives Claude Code a way to remain extensible without turning every extension into prompt bloat or every capability into a global tool. The visual below is useful because it compresses this entire argument into a single comparison: one axis of where the extension enters, one axis of how much context it costs, and one axis of what it changes. The table makes the qualitative hierarchy explicit, while the small tool-assembly strip reminds us that MCP tools are not inserted naively; they pass through filtering and deduplication before they become available.
Just as importantly, the compact callout distinguishes context injectors from delegated execution. That difference will matter in the next section, where context construction, compaction, and persistence determine how much of the agent’s working state survives over time, and how delegation can remain tractable instead of exploding the prompt.

12. Context construction, compaction, delegation, and persistence

Building on the shared loop QQQ and the extension stack τ\tauτ, the next question is no longer which capabilities the agent has, but what state it is actually allowed to hold onto while it reasons and acts. In Claude Code, that state is not an amorphous “prompt”; it is a bounded working set ccc living inside a fixed context window WWW, with the relationship c⊆Wc \subseteq Wc⊆W. That inequality is not just bookkeeping. It encodes the central systems constraint of production agents: every additional token is a trade-off against recall, latency, cost, and the risk of burying the truly relevant evidence under accumulated chatter.
The first subtlety is that context is constructed, not passively received. Claude Code assembles the live window from multiple sources with different trust and persistence profiles: the system prompt, environment information, the CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md hierarchy, path-scoped rules, auto memory, tool metadata, the conversation history h=⟨x1,x2,…,xℓ⟩h=\langle x_1,x_2,\dots,x_\ell\rangleh=⟨x1​,x2​,…,xℓ​⟩, raw tool results xxx, and compact summaries. These ingredients are not interchangeable. Some are instructions, some are observations, some are derived artifacts, and some are long-lived policy hints. The agent’s job is to merge them into a coherent short-term state without letting any one class of information dominate simply because it was mentioned recently.
That distinction matters because recency is not relevance. A coding agent frequently sees long tool traces, intermediate failures, repeated confirmations, and stale hypotheses that no longer belong in the current reasoning frame. If all of that were left untouched, the model would pay for it in two ways: the important facts would become harder to retrieve from attention, and the prompt would become increasingly brittle to minor changes in the task. So Claude Code treats context as an actively managed resource, not a dump. The design goal is to preserve decision-critical structure while discarding or compressing everything that has already served its purpose.
Once the live window approaches its limit, Claude Code does not simply truncate indiscriminately. It applies a staged compaction pipeline,
C=budget reduction→snip→microcompact→context collapse→auto-compact.\mathcal{C} = \text{budget reduction} \rightarrow \text{snip} \rightarrow \text{microcompact} \rightarrow \text{context collapse} \rightarrow \text{auto-compact}.C=budget reduction→snip→microcompact→context collapse→auto-compact.
This ordering is important because it reflects a gradient from cheap, local cleanup to more aggressive semantic summarization. Budget reduction trims easy excess first. Snip removes low-value spans. Microcompact condenses smaller stretches of history. Context collapse performs a more global reduction when the window is under genuine pressure. Auto-compact is the last resort, producing a broader summary that keeps the trajectory of the session while abandoning fine-grained detail.
A useful way to think about this pipeline is that it tries to preserve the invariants of the session while sacrificing the accidents of the transcript. The invariants are things like the current task goal, the active file, the failure mode being debugged, and any constraints that would materially change the next action. The accidents are redundant confirmations, now-irrelevant tool outputs, or exploratory branches that did not alter the final state. Of course, this is imperfect: compaction can oversummarize, lose a subtle precondition, or erase evidence needed to debug a later mistake. That failure mode is why compaction must be explicit and auditable rather than hidden inside a proprietary memory layer.
Delegation adds a second axis of control. A delegation request ddd maps to a subagent SSS,
d↦S,d \mapsto S,d↦S,
and the key design choice is that SSS receives its own isolated context window and tool set. This is not merely parallelism for speed; it is a way to localize reasoning so that a subtask can explore without contaminating the parent’s working state. The parent loop gets back only a summary, not a full replay. That “summary-only” return is a structural boundary: it keeps the main agent from inheriting a flood of transient subagent chatter, but it also means that delegation is inherently lossy. The subagent may discover useful details that never make it back unless they are intentionally distilled.
That lossy boundary is actually part of the architecture’s discipline. Claude Code prefers small, explicit transfers of meaning over implicit sharing of entire traces. In practice, that makes delegation resemble a controlled compression protocol: a child agent can spend context generously on one subproblem, while the parent remains compact and task-directed. The trade-off is familiar from distributed systems:
Benefit: local exploration without bloating the main loop.
Cost: information can be dropped or flattened in the handoff.
Risk: the summary may omit exactly the nuance needed for the next step.
Persistence closes the loop by separating the ephemeral working window from the durable record. The live context is temporary, but the session transcript hhh is stored as append-only JSONL, which means the system can resume or fork from an audit-friendly log rather than from an opaque internal database. This is a very deliberate choice. It makes the agent’s history inspectable, replayable, and easier to reason about when something goes wrong. At the same time, persistence is not identity: resuming from the log does not restore session-scoped permissions P\mathcal{P}P or trust posture γ\gammaγ. That boundary is crucial, because safety state and authorization state should not be inferred from transcript text alone.
The phrase “plain text and user-visible” is doing real architectural work here. CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md can shape behavior because it is part of the constructed context, but it does not function like hidden memory. This means the system’s operating assumptions remain legible to the user, while the short-term window remains bounded by policy rather than by whatever happened to be in the last conversation. The result is a layered memory model: visible rules, ephemeral working state, compacted summaries, and durable logs all coexist, but they do not all carry the same authority.
The visual below condenses that whole story into three interacting mechanisms: construction, compaction, and control transfer. The left side gathers heterogeneous sources into c⊆Wc \subseteq Wc⊆W, making the bounded-window idea concrete. The middle ladder makes the reduction pipeline C\mathcal{C}C feel like a sequence of increasingly aggressive filters rather than a single opaque truncation step. And the right side separates delegation from persistence: the subagent SSS receives a fresh local workspace via ddd, while the append-only hhh reminds us that durable history lives outside the live window. Together, those panels summarize the core lesson: Claude Code does not merely “remember less”; it manages context as an explicit, layered, and auditable resource.

13. Claude Code versus OpenClaw

After tracing how Claude Code builds context, decides when to ask for permission, and survives long-running work through compaction and persistence, the natural next question is: what kind of system is this, architecturally? One useful way to answer that is to compare Claude Code with OpenClaw, a representative agent framework that makes many design choices more explicit and, in some dimensions, more configurable. The point is not that one is “better” in a vacuum. Rather, they sit at different points in the design space, and the contrast clarifies which decisions are policy versus mechanism.
At a high level, Claude Code behaves like a productized coding agent: it is optimized for a smooth developer experience, strong default behavior, and tight integration across the whole workflow. OpenClaw is closer to a research and engineering framework: it exposes more of the control surface, making it easier to inspect, swap, or extend components. That difference matters because agent systems are not just a model plus tools; they are a stack of coupled choices about agency, memory, tool use, user control, and extensibility. The same underlying model can feel radically different depending on where those choices are anchored.
A helpful way to compare the two systems is along recurring dimensions that show up in nearly every production agent design:
Control loop: who owns the cadence of reasoning, tool use, and stopping?
Permissions: when can the agent act autonomously versus when must it ask?
Context pipeline: how is the working set of information assembled, compressed, and refreshed?
Extensibility: how easy is it to add tools, hooks, or custom workflows?
Delegation: can the system split work across subtasks or specialized agents?
Persistence: what survives across turns, sessions, or tasks?
Claude Code tends to fuse these dimensions into a cohesive product experience. OpenClaw, by contrast, more often surfaces them as modular primitives. That difference creates a classic trade-off: integration versus exposure. Integration gives you fewer seams to manage and often better ergonomics. Exposure gives you more room to experiment, but also more ways for the system to become inconsistent or brittle.
The most important architectural distinction is probably the agent loop. In Claude Code, the loop is strongly shaped by the product’s values: the model should be useful, but it should also remain legible and safe enough for a developer to trust. The loop is therefore not a free-running planner with unrestricted action; it is a guided cycle of reading context, proposing a next step, checking whether the step is allowed, and then revising as needed. In OpenClaw-like systems, the loop is often easier to customize directly, which is powerful if you want to study alternative planning or scheduling strategies. But that flexibility comes with a subtle cost: if the loop is too configurable, the system may stop feeling like one agent and start feeling like a collection of loosely coordinated components.
That same pattern appears in permissioning. Claude Code’s permission system is part of the product’s core safety and UX story, so it is tightly interwoven with execution. The agent does not merely “know” what to do; it knows when it is allowed to do it. OpenClaw-style systems often separate those concerns more explicitly, making permission checks an external policy layer or a replaceable module. This can be excellent for research, because it lets you test different trust models. But it also makes it easier to accidentally create gaps between what the planner believes is possible and what the runtime will actually permit.
The context pipeline is another sharp point of comparison. Claude Code uses a curated pipeline: retrieve relevant history, compact what matters, keep the working set within budget, and preserve enough structure that subsequent reasoning remains coherent. OpenClaw frameworks often make the individual stages more visible or swappable, which is ideal if you want to test retrieval strategies, summary operators, or memory policies. The trade-off is that context management is not just an implementation detail; it is part of the agent’s intelligence. If the pipeline is weak, even a strong model will behave forgetfully or myopically.
Extensibility and delegation separate the two systems even further. Claude Code emphasizes a curated surface area: enough extension points to be useful, but not so many that the user has to assemble an agent platform by hand. OpenClaw is typically better suited to composability: you can wire together custom tools, policy layers, and delegated subtasks more directly. That makes it attractive when the research question is about how to decompose work, not just whether the final answer is good. But a more composable stack can also become more fragile, because each extra abstraction introduces another place where state, permissions, and context can drift apart.
Persistence is where the long-horizon nature of these systems becomes especially visible. Claude Code treats persistence as part of the lived workflow: prior decisions, summaries, and task state matter because coding is rarely a one-shot activity. OpenClaw often makes persistence a more explicit design variable, which is useful for experiments on memory architectures. The catch is that persistence is not only about storage; it is about what the system believes is worth remembering. If that policy is under-specified, the agent may retain noise, lose commitments, or repeatedly rediscover the same local facts.
The upshot is that Claude Code and OpenClaw are best understood as different answers to the same question: how much of the agent should be opinionated product design, and how much should be programmable infrastructure? Claude Code pushes toward a reliable default experience with carefully chosen constraints. OpenClaw pushes toward a more inspectable and modifiable substrate. Neither removes the underlying difficulty of agent design; each simply places the difficulty in a different layer of the stack.
The visual below condenses that comparison into a compact architectural map. It is useful not because it lists every feature, but because it makes the recurring dimensions visible at once: where Claude Code tends to integrate policy into the product, and where OpenClaw tends to expose the underlying machinery as modules or knobs. That bird’s-eye view is the real point of the section, because once you see the system along these six axes, the later discussion of predictions and evidence becomes much easier to interpret.

14. What the architecture predicts and what the evidence suggests

Up to this point, the architecture has looked like a bundle of engineering choices: a loop that plans and acts, a permission model that gates side effects, a context pipeline that decides what the model can “see,” and layers for tools, delegation, and persistence. The interesting question now is not just what Claude Code does, but what those choices imply about the behavior we should expect from a production coding agent.
That matters because architecture is a theory about failure modes. If a system is built around short, repeated model calls with careful state management, we should expect better controllability than from an unconstrained, monolithic agent. If it also supports long-lived memory and tool-driven edits, we should expect more continuity across tasks—but also more opportunities for stale context, partial execution, or accidental overreach. In other words, the design does not merely enable capability; it predicts a characteristic envelope of strengths and weaknesses.
A useful way to reason about this is to separate the agent’s behavior into a few coupled dynamics:
Local competence: can it solve the next step with the context it has?
Global coherence: can it preserve intent across many steps?
Safety and reversibility: can it avoid or contain bad side effects?
Scalability of supervision: can a human still intervene without micromanaging?
Claude Code’s architecture pushes hard on the first three while preserving the fourth. The agent loop gives it repeated opportunities to re-evaluate, the permission system constrains risky actions, and the context pipeline filters what matters into each step. The result is not “perfect autonomy,” but something more practically interesting: an agent that can behave like a competent collaborator under bounded trust. That is exactly the kind of system the production coding setting rewards.
The architecture also suggests where performance should degrade. Once tasks become too long, too branching, or too dependent on undocumented project conventions, the context pipeline becomes the bottleneck. Even with retrieval, summarization, and persistence, the model can still lose latent constraints—details that were never written down cleanly enough to be recovered later. Likewise, delegation helps with decomposition, but every handoff introduces the risk of inconsistent subgoals. So the system’s reliability is not just about model quality; it is about how well the surrounding mechanisms preserve the task manifold as work expands over time.
This leads to a deeper claim: good agent architecture is less about maximizing raw autonomy than about shaping the probability distribution over actions. In a coding agent, the most important event is often not the final answer, but the sequence of intermediate moves—read, inspect, patch, test, ask, retry, summarize. A well-designed system makes the “safe and useful” trajectory more likely and the “irreversible mistake” trajectory less likely. That is why permission boundaries, state checkpoints, and explicit tool routing are not peripheral details; they are the core mechanism by which the system’s values become operational behavior.
The evidence we have from systems like Claude Code tends to support this view. In practice, users often report that constrained, tool-aware agents are less flashy than fully autonomous ones, but more dependable on real repositories. They are better at staying aligned with the current codebase, less likely to hallucinate file contents when they can inspect them directly, and more effective when they can iterate through a tight loop of edit-and-verify. At the same time, they still fail in familiar ways: they can overfit to recent context, miss project-wide invariants, or drift when the task requires long-range planning without enough explicit scaffolding.
So the architecture predicts a very specific pattern of outcomes:
Strong at iterative, inspectable coding tasks
Strong at bounded delegation and partial automation
Weak at deeply open-ended goals with sparse grounding
Weak when context growth outpaces summarization quality
Strongest when human oversight is available but not intrusive
That is the practical meaning of “design space” here. The system is not one point on a capability curve; it is a bundle of trade-offs that place it in a region of high usefulness for software work. If the earlier sections explained how the pieces fit, the natural next question is whether the observed behavior matches the theory. The visual below is useful precisely because it compresses that argument into a compact map: one side summarizes the architectural mechanisms, the other side summarizes the behavioral predictions they imply.
Read that image as a bridge between mechanism and evidence. It is not just a summary of components; it is a hypothesis about how those components should show up in practice. The arrows and grouped labels help make the causal story explicit: loop → iteration, permissions → safety, context pipeline → grounding, delegation/persistence → scale. That causal structure is the real takeaway, because it is what lets us compare today’s agent systems and ask which trade-offs are likely to matter for the next generation.

15. Unifying summary: design questions, answers, and trade-offs

After tracing Claude Code’s loop, permissions, context handling, delegation, and persistence as separate mechanisms, the natural question is how these pieces fit together as a system-level philosophy. The answer is that they are not independent tricks; they are coordinated responses to a small set of recurring design questions that every production coding agent must answer one way or another.
At the highest level, Claude Code pushes reasoning into the model call while keeping control in the harness. If we write the interaction abstractly, the agent receives a context ccc assembled from long-lived state LLL, current instructions III, and session state sss, then invokes the model MMM, which returns proposed actions and artifacts:
Q(L,I,s)→M(c)→{p,x}.Q(L, I, s) \rightarrow M(c) \rightarrow \{p, x\}.Q(L,I,s)→M(c)→{p,x}.
Here ppp is the policy-relevant action proposal and xxx is the tool-produced or environment-produced result. The architectural point is subtle: Claude Code does not try to build a fully explicit planner outside the model, but it also does not let the model directly own execution. Instead, the surrounding loop mediates what the model sees, what it may do, and what gets written back.
That choice creates a recurring pattern across the stack. The turn structure is deliberately thin and reactive: the system alternates between prompting, tool execution, and observation, rather than elaborating a heavyweight search process in the harness. In other words, the agent behaves like a disciplined ReAct loop, where the conversation and tool outcomes accumulate in an append-only history hhh, and each new step is formed from the current bounded context rather than from a global symbolic plan:
c⊆W,h=h∪{x}.c \subseteq W, \qquad h = h \cup \{x\}.c⊆W,h=h∪{x}.
This keeps the control flow simple and robust, but it also means the surrounding infrastructure must do more work to preserve useful information, manage drift, and recover from mistakes.
Safety follows the same philosophy. Claude Code is deny-first by default: permissions P\mathcal{P}P and human escalation H\mathcal{H}H sit between the model’s intent and the system’s effects. The model can ask, suggest, and justify, but the harness decides whether the action crosses the boundary. That arrangement is a practical form of defense in depth. It reduces the risk that a mistaken or manipulated prompt can directly cause irreversible side effects, but the price is friction—especially in workflows that require frequent filesystem, network, or shell operations.
Extensibility is also partitioned rather than centralized. Hooks, skills, plugins, and MCP integrations live in different places for a reason: each one answers a different architectural question about where policy belongs, where behavior is customized, and what should remain external. This modularity makes the system more composable and easier to integrate into real developer workflows, yet it increases the number of moving parts a user or platform engineer must understand. The result is not a single monolithic agent runtime, but a stack of narrowly scoped mechanisms that cooperate through stable interfaces.
Context management tells the same story in miniature. Claude Code treats context as a scarce resource bounded by WWW, then uses compaction C\mathcal{C}C, retrieval-like prioritization, and repository guidance such as CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md to keep the window aligned with the task. The virtue of this design is clarity: the model is fed a curated working set rather than an ever-growing transcript. The failure mode is equally clear: compression can discard details that later matter, so the system must trade completeness for relevance and rely on the user or the harness to reintroduce missing structure when needed.
Delegation and persistence round out the picture. When Claude Code spawns a subagent, it isolates work by delegating ddd to a separate SSS, which helps contain errors and enables parallelism. But isolation also means weaker shared memory; subagents often return summaries rather than fully merged internal state, so the system sacrifices some global coherence for containment and speed. Persistence is similarly pragmatic: append-only transcripts, session state, and resume/fork behavior provide durability and auditability without requiring a heavyweight database-backed planner. That makes sessions recoverable and inspectable, though not infinitely rich.
Seen together, these are not six unrelated features but six answers to the same design problem. In compact form, Claude Code’s architecture says:
Reason in the model, control in the harness
Keep turns shallow and reactive
Treat permissions as a first-class safety layer
Expose extensibility through explicit boundaries
Manage context as a bounded, curated resource
Isolate subagents and persist sessions with lightweight state
The visual below is useful because it compresses that argument into a single grid of questions, answers, principles, and trade-offs. Rather than memorizing each mechanism separately, the table makes the pattern visible: every design choice buys a concrete operational benefit and simultaneously introduces a corresponding cost. That symmetry is the real lesson.
The final takeaway is therefore architectural, not merely descriptive. Claude Code maximizes model autonomy inside a rich deterministic harness, and that trade is what lets it feel capable in practice while remaining governable in production. The table serves as a compact summary of that thesis: a map of the recurring design questions, the answers Claude Code chooses, and the price paid for each choice.

2. The core tension: autonomy versus human control

Building on the autonomous loop, the central question is no longer whether an agent can act, but how much control the user should retain over the action path. That distinction matters because “autonomous” systems are only useful when they can keep making progress without micromanagement, yet “fully autonomous” behavior is unacceptable if it can quietly cross boundaries around safety, privacy, or intent. The design problem, then, is not a binary choice between control and freedom; it is a mechanism design problem for agent behavior.
A good way to frame the tension is to think of Claude Code as operating inside two coupled objectives:
autonomyandhuman control\text{autonomy} \quad \text{and} \quad \text{human control}autonomyandhuman control
These are not opposites in the abstract. In practice, they conflict along specific axes: when should the model proceed without interruption, when should it pause, what must it ask permission for, and how much context should the user need to inspect before deciding? A system that over-optimizes autonomy becomes brittle or risky; a system that over-optimizes control degenerates into a verbose assistant that constantly asks for approval and loses momentum. The interesting part of the architecture is therefore the boundary layer between model initiative and user oversight.
Claude Code’s motivating values make that boundary explicit. The paper grounds the system in five priorities: human decision authority, safety, security, and privacy, reliable execution, capability amplification, and contextual adaptability. Notice that these are not merely ethical slogans; they imply concrete engineering constraints. For example, if human authority matters, then the system must preserve meaningful opportunities for intervention. If reliable execution matters, then it must continue making progress across long tasks. If contextual adaptability matters, then the control policy cannot be a fixed on/off switch, because the right degree of supervision depends on the task, the environment, and the user’s current attention.
This is where naive permission gating breaks down. The paper reports an approval rate of about 93% for tool requests, which is surprisingly high but also revealing: most confirmations are eventually accepted. If a system surfaces a dialog for every meaningful step, the user is not really exercising fine-grained oversight; instead, they are being converted into a repetitive rubber stamp. That produces approval fatigue, and approval fatigue is a subtle failure mode because it looks like oversight while actually weakening it. The user learns to click through prompts, which reduces the signal value of the permission boundary exactly when it is most needed.
A more robust architecture therefore has to distinguish between two kinds of control:
continuous autonomy for routine progress, gathering context, and low-risk actions;
selective human intervention for actions that are irreversible, sensitive, or outside the user’s implicit intent.
This separation is important because it reframes control as a policy over action types, not as a blanket interruption mechanism. The agent should keep moving when the user is inattentive, but it should still enforce meaningful boundaries on action classes that matter. In other words, the system needs to be both fast and constrained, and those constraints must be enforced at the level of tool use, permissions, and execution flow rather than through occasional admonitions from the model.
Mathematically, the tension can be read as an optimization problem with constraints: maximize useful progress subject to a control policy that preserves user authority and system safety. If the policy is too permissive, the agent becomes unsafe; if it is too strict, the agent loses the very autonomy that makes it valuable. So the real design space is not “can the model act?” but what controls shape the action path. That includes permission prompts, escalation rules, context visibility, and the exact moments when the loop pauses versus continues.
The visual below compresses that argument into a single glance. The scale makes the core trade-off legible: one side is autonomy, the other is human control, and Claude Code sits in the middle because its usefulness depends on holding both in tension rather than collapsing into either extreme. The compact list of motivating values explains why the system cannot simply pick one side of the scale, while the approval-fault loop on the right turns the abstract concern into a concrete failure mode: if almost every request is approved, then too many prompts become noise, and control degrades into fatigue.
Seen that way, the diagram is not just decorative summary. It is a compact statement of the architectural thesis that follows: Claude Code must be designed so the agent can keep acting, but every important action still passes through mechanisms that preserve meaningful, not merely ceremonial, human control.

3. Five values, thirteen design principles

To understand Claude Code’s architecture, it helps to begin one level above implementation and ask a more basic question: what values should an agent system optimize for in the first place? The answer matters because agent design is not just a matter of stacking models, tools, and prompts. Every concrete mechanism—how much autonomy the system has, how often it asks permission, how it preserves context, when it delegates, and what it remembers—quietly reflects a value judgment about the kind of collaborator we want the system to be.
Claude Code’s design starts from a small set of five values that act like policy constraints on the rest of the system. In practice, these values are meant to keep the agent useful without becoming brittle, reckless, or opaque. The important thing is that these values are not just branding language; they are meant to compile down into engineering decisions. A system that values user control will look very different from one that values raw autonomy. A system that values clarity will structure its memory and tool calls differently from one that values maximal throughput. A system that values safety will spend more of its budget on checks, confirmations, and constrained actions.
From those values, Claude Code derives thirteen design principles. You can think of these as the bridge between abstract aspiration and concrete behavior. A principle is more operational than a value: it tells the designer what to do when values collide. For example, if the agent can either proceed quickly or pause to reduce risk, a principle might favor default caution with explicit escalation. If it can either hide intermediate steps or surface them, a principle might favor transparency of reasoning and action. These principles are the reason the architecture feels cohesive rather than ad hoc.
This is an important distinction: values are the why, principles are the how. Without values, the architecture risks becoming a bag of heuristics optimized for benchmark scores. Without principles, the values stay inspirational but toothless. The real design work happens in the mapping between them. That mapping determines whether the agent is allowed to act independently, how it negotiates uncertainty, and how it maintains the user’s mental model over long tasks.
A useful way to read the system is to see the principles as constraints along several recurring axes:
Agency: when the model may act on its own versus when it must ask.
Visibility: how much of the plan, tool use, and state is exposed to the user.
Recoverability: how easily the system can correct mistakes or roll back.
Continuity: how it preserves context across steps and sessions.
Extensibility: how new tools or workflows get added without breaking the core.
Delegation: when a subtask should be handed off to a separate process or agent.
These axes matter because agent systems fail in characteristic ways. Too much autonomy produces silent errors that propagate. Too much control produces a system that behaves like a glorified autocomplete and never reaches the interesting parts of the task. Too much hidden context makes the agent seem magical until it suddenly contradicts itself. Too little persistence forces the system to relearn the same local facts again and again. The principles are an attempt to balance these failure modes rather than eliminate them outright.
What makes this especially relevant for production coding agents is that code is a hostile domain for vague design. Code has syntax, state, dependencies, side effects, and irreversible operations. A coding agent that is merely “smart” but not principled will eventually surprise its user in the worst way: by making changes that are locally plausible but globally wrong. So the system’s values need to shape mechanisms like permission gates, context assembly, tool invocation, and memory updates. Otherwise, the architecture may look agentic while behaving unpredictably.
Another subtle point is that these principles are not independent. They interact. For example, stronger permissioning can support safety, but if it is too coarse it can also destroy flow and lead to user fatigue. Better context pipelines improve autonomy because the model sees more relevant state, but they can also introduce noise and stale information. More aggressive delegation can improve parallelism, but only if the system can merge results coherently. A good agent design is therefore not a single best setting; it is a carefully negotiated compromise among interacting controls.
That is why the values-and-principles layer is worth pausing on before diving into components. It tells us that Claude Code is not merely a model wrapped around tools. It is a normative system: a set of preferences about how an AI collaborator should behave when everything is messy, incomplete, or risky. Once you see that, the rest of the architecture becomes easier to interpret. The agent loop, permission system, context pipeline, extensibility stack, delegation, and persistence are not isolated features; they are implementation answers to a prior question about what kind of agent this should be.
The visual below compresses that logic into a compact hierarchy. It is meant to make the causal chain feel obvious: values at the top, principles in the middle, and system behavior at the bottom. If the diagram reads as simple, that is the point—the complexity is not in the drawing itself, but in how much architectural judgment is hidden inside each arrow from value to principle to mechanism.

4. The seven-component system view

After the values and principles, the next question is not what should the system want? but what pieces must exist for those values to become real behavior? That is where Claude Code becomes easier to reason about as a system rather than as a prompt. The important shift is to stop imagining a single monolithic “agent” and instead separate the moving parts that jointly produce one turn of behavior.
A useful abstraction is that Claude Code decomposes into seven components with a strict dataflow. The user’s task LLL enters through an interface III, which may be an interactive CLI, a headless CLI, an SDK, or an IDE/browser surface. That interface does not itself do the reasoning; it mainly determines how the request is presented, how results are shown, and what operational affordances are available. From there the request enters the shared query loop QQQ, which is the recurring orchestration mechanism for a single turn.
Inside that loop sits the model call MMM, but the model is only one stage in a larger pipeline. A compact way to express the turn is
Q(L,s,h,T)↦(c,p,s′,h′)Q(L, s, h, T) \mapsto (c, p, s', h')Q(L,s,h,T)↦(c,p,s′,h′)
where the loop consumes the user task LLL, the current session state sss, the transcript history hhh, and the available toolset TTT, then emits a candidate control decision ccc, proposed tool actions ppp, and updated state and history (s′,h′)(s', h')(s′,h′). The exact symbols are less important than the architectural message: the loop is stateful, tool-aware, and iterative, not a one-shot prompt wrapped around a model completion.
What happens after the model proposes an action is where the design becomes production-grade. The proposal ppp is not executed directly. It passes through the permission system P\mathcal{P}P, which filters or gates tool usage before any side effect occurs. Only approved actions reach the tool pool TTT, and only then do they interact with the execution environment to produce outcomes xxx. This is the crucial control point: the agent is not defined by what it wants to do, but by the fact that desire, proposal, and execution are separated by policy. In practice, this is what makes the system safer, more legible, and more recoverable.
That separation also clarifies a subtle failure mode in agent design. If the model, tools, and execution environment are collapsed into one opaque loop, then every mistake looks like “the model failed,” when in reality the failure may come from permission mismatch, stale context, poor tool composition, or brittle recovery logic. By splitting the architecture into components, Claude Code makes these failure modes diagnosable. The loop can be too eager, the permissions too strict, the toolset too narrow, or the persistence layer too lossy — and those are different engineering problems with different fixes.
The persistence layer matters for the same reason. The state object sss and append-only transcript hhh are not decorative bookkeeping; they are the system’s memory substrate. They allow the loop to resume after interruptions, preserve a trace of prior decisions, and reconstruct why a tool call happened. In an agentic system, memory is not just about recall; it is about continuity of control. Without reliable persistence, a system may appear intelligent in one turn but become incoherent across turns because its own prior commitments vanish.
This is also why Claude Code’s architecture supports multiple interfaces without fragmenting into multiple agent engines. The design choice is to keep one shared loop QQQ and let different surfaces feed into it. That means the core behavior is consistent whether the request arrives from an interactive terminal, an SDK call, or an IDE integration. The interface can change the ergonomics, but the underlying control structure stays the same. In other words, Claude Code is not “a prompt for the CLI” plus “a different prompt for the SDK”; it is one agent loop wrapped in different operational skins.
A few consequences follow naturally:
Reasoning is thin, orchestration is thick. The model is central but not sufficient.
Permissions are first-class. Safety and usability emerge from gating, not after-the-fact supervision.
Persistence is architectural, not incidental. The transcript and session state are part of the control system.
Interfaces converge. Surface diversity does not imply a different agent core.
The visual below compresses exactly this argument into a single left-to-right flow. The important thing to look for is not just the arrows, but the boundaries: request, interface, query loop, model, permission gate, tools, and execution are distinct stages, while state and history sit underneath as the memory that stabilizes the whole process. That arrangement makes the claim visible: Claude Code is best understood as one shared loop surrounded by operational infrastructure, not as one giant prompt pretending to be a product.

5. The reactive agent loop

Claude Code’s central operational principle is surprisingly simple: think, act, observe, repeat. But in an agentic coding system, that loop is not just a control-flow convenience; it is the organizing abstraction that determines how the model spends context, when it can safely modify files, how it reacts to tool failures, and how much autonomy it can be trusted to exercise. Once you shift from a one-shot completion model to a reactive agent, the interesting question is no longer “can the model generate code?” but rather “how does it remain coherent while continuously revising its plan under partial information?”
The key idea is that the loop is reactive rather than deliberative in the classical planning sense. A fully deliberative planner would try to construct an exhaustive plan before acting, but software tasks are too underspecified, too environment-dependent, and too brittle for that to work well in practice. A reactive loop instead treats each model step as a local decision conditioned on the current state of the workspace: what files exist, what edits have been made, what commands just failed, what the user clarified, and what the tool outputs revealed. In other words, the agent does not merely execute a plan; it continually re-derives the next move from updated evidence.
This matters because coding is full of hidden state. The repository may contain non-obvious build rules, test fixtures may fail for reasons unrelated to the target bug, and a seemingly local change can have cascading effects elsewhere. A reactive loop is therefore designed around closed-loop feedback, where every action is immediately turned into new context. Formally, you can think of the agent as operating over a state sts_tst​ that includes the prompt, recent messages, tool outputs, and workspace observations, choosing an action ata_tat​, then receiving an updated observation ot+1o_{t+1}ot+1​ that reshapes the next state. The behavior is less like compiling a static plan and more like navigating with a live map.
That feedback structure also explains why the loop must be lightweight. If the agent spends too much time elaborating an internal plan, it can become stale before it is used. If it acts too quickly without reflection, it risks thrashing: editing files repeatedly, running the wrong command, or overfitting to noisy tool outputs. Production systems sit in the middle and rely on a few stable habits:
inspect before edit when the task is ambiguous,
test after meaningful change to confirm the effect,
replan after surprises rather than forcing the original intention,
stop early when the uncertainty is already resolved.
A subtle but crucial assumption is that the environment is partially observable. The agent never has perfect knowledge of the codebase; it only sees what it has searched, opened, or executed. That means the loop must preserve enough state to avoid re-discovering the same facts every turn, while also respecting context limits so the prompt does not balloon indefinitely. In practice, this creates a tension between memory of the recent past and freshness of the next decision. A good reactive loop carries forward just enough trace of prior actions to maintain coherence, but not so much that irrelevant history dominates the next step.
Failure modes emerge precisely when that balance breaks. If the loop overweights recent tool output, it can get stuck in a local patching pattern—fixing one error only to generate another. If it overweights its earlier intent, it may ignore evidence that the task has changed. And if the system does not distinguish between observation and commitment, then a speculative idea can be treated as if it were already validated. The architecture therefore needs a disciplined distinction between three roles: the model’s internal proposal, the tool’s external evidence, and the agent’s next committed action.
This is also where permissions and the reactive loop become inseparable. A loop that can inspect, edit, and execute commands must be able to decide when to request user approval and when to proceed autonomously. The agent’s turn is not just “what should I do next?” but also “is this action safe, reversible, or high-impact enough to require confirmation?” That permission boundary is part of the control policy itself, not an afterthought. Without it, the loop either becomes too timid to be useful or too aggressive to be trusted.
The visual below compresses that logic into a compact control structure: a task enters the loop, the model proposes an action, tools return observations, and the resulting state feeds the next turn. The point of the diagram is not simply to show a cycle, but to emphasize that each iteration is a reconstruction of intent under new evidence. The repetition is what gives the system robustness; the re-evaluation is what keeps it from becoming blind automation.
Seen this way, the diagram is really a summary of the most important design claim in agentic coding systems: intelligence is not only in the quality of a single answer, but in the quality of the transition between answers. That transition is where Claude Code spends its effort—updating context, deciding whether to act, and turning each tool result into a better next move.

6. queryLoop(): the agentic turn engine

Once we move from the high-level reactive agent loop to the mechanics of a production system, the central question becomes: what exactly happens during one agentic turn? Claude Code’s answer is not “run a model and hope for the best,” but a carefully staged control routine—queryLoop()—that turns user intent, current context, tool state, and policy into a single decision about the next action.
At a conceptual level, queryLoop() is the turn engine. It is the piece that repeatedly asks: given the current conversation, the repository state, the previously observed tool outputs, and the active permissions, what should the agent do next? That “what next?” is deceptively simple. In practice it hides several coupled subproblems: deciding whether the model has enough context to respond, whether it should call a tool, whether it should ask for confirmation, and whether it should stop and yield control back to the user. The turn engine is therefore less about text generation than about control-flow arbitration.
A useful way to think about this is that the loop maintains a latent state sts_tst​ representing the agent’s working situation at turn ttt: task progress, relevant memory, tool results, and policy constraints. The model proposes an action ata_tat​, but the system is the one that commits to the transition
st+1=f(st,at,environment feedback).s_{t+1} = f(s_t, a_t, \text{environment feedback}).st+1​=f(st​,at​,environment feedback).
This distinction matters. Many failures in agent systems come from conflating proposal with execution. The model may be capable of suggesting a useful edit, but the runtime still has to decide whether that edit is safe, whether it should be streamed incrementally, whether it triggers a follow-up search, and whether the current turn should continue or terminate. queryLoop() is the place where those commitments are made.
That also means the loop is doing more than planning. It is the boundary where reasoning meets policy. A turn can only proceed if the system can justify spending more context and more actions on the problem. If the task is underspecified, the loop may deliberately ask a clarifying question. If the model is confident but the action is sensitive, the loop may route through permission checks. If tool output has already resolved the uncertainty, the loop may end early. In other words, queryLoop() is a small but crucial example of a broader design principle in production agents: the model is not the orchestrator; the runtime is.
This design has an important implication for robustness. A naive agent often fails by entering either of two modes:
Over-eager execution: it keeps calling tools even when the answer is already available.
Premature termination: it stops before the task is genuinely complete.
queryLoop() exists to balance these modes. It repeatedly reassesses whether the current state still warrants more action. That reassessment is especially valuable in coding workflows, where a single tool call can radically change the situation—searching a file may uncover a symbol definition, an edit may invalidate a previous plan, or a test failure may reveal a deeper dependency. The loop must therefore be adaptive, not just iterative.
One subtle but important assumption is that the agent’s state is partially observable. The model never sees the whole repository or the whole environment at once; it sees a curated context window and a stream of tool observations. That means queryLoop() is not merely managing computation, but also managing attention budget. Each turn decides what information to surface, what to preserve, and what to omit. This is why the turn engine sits downstream of the context pipeline yet upstream of execution: it is the point where selected evidence becomes an actionable next step.
It is also where Claude Code’s design philosophy becomes visible. Rather than treating the agent as an autonomous black box, the system treats agency as a sequence of auditable turns. That makes the behavior easier to inspect, easier to interrupt, and easier to integrate with permissions and recovery. In practice, this modularity is what lets a coding agent remain useful in the messy real world: the loop can pause, resume, stream, backtrack, or hand control back to the human without collapsing the entire interaction model.
The visual below is helpful because it compresses this control logic into a few moving parts. Once you have the mental model of queryLoop() as the turn engine, the arrows and boxes become less like generic workflow decoration and more like evidence for a specific claim: each agentic turn is a disciplined cycle of observe, decide, act, and re-evaluate. The diagram makes the orchestration visible—especially the way context, permissions, and tool feedback all converge before the next action is chosen.
It also sets up the next step naturally. If queryLoop() decides what kind of action should happen, then the next question is how that action is actually carried out: how tool calls are dispatched, how outputs are streamed back, and how the system recovers when an execution path fails. That is where the control engine hands off to the execution machinery.

7. Tool dispatch, streaming execution, and recovery

Once the agent loop has produced a candidate action, the key question is no longer what to do, but how to dispatch it without stalling the whole system. In Claude Code, tool execution is deliberately streaming-first: the moment the model emits a tool plan ppp, the harness tries to start useful work immediately rather than waiting for the full turn to finish. That design matters because the dominant cost in real agentic coding is often not model reasoning alone, but the latency introduced by serializing every tool call behind a monolithic “wait until complete” boundary.
The first subtlety is that not all tool calls deserve the same scheduling policy. Claude Code partitions the tool set into two broad classes:
T=Tsafe∪Texclusive.T = T_{\mathrm{safe}} \cup T_{\mathrm{exclusive}}.T=Tsafe​∪Texclusive​.
Here, concurrent-safe tools can overlap in time because they do not interfere with one another’s effects, while exclusive tools must be serialized because they contend for shared state, shared filesystem regions, or other mutable resources. This is not merely an optimization detail; it is a correctness boundary. If we treated every call as parallelizable, we would gain throughput but lose determinism and risk self-induced races. If we treated every call as exclusive, we would preserve safety but give up much of the responsiveness that makes a coding agent feel interactive.
The streaming executor therefore behaves like a small online scheduler. When a compatible tool call arrives, it can be launched immediately, and later calls may be admitted in parallel only if their interference profile permits it. When a tool is classified as exclusive, the executor serializes it even if other work is available. The practical effect is that the system is trying to maximize overlap subject to a safety relation, not simply maximize concurrency. A concise way to think about it is:
safe tools: overlap is allowed, so latency can be hidden;
exclusive tools: overlap is forbidden, so order must be respected;
mixed batches: the batch must be partitioned before dispatch.
That partitioning step also reveals why Claude Code needs two execution paths. The ideal path uses StreamingToolExecutor, which begins dispatch as soon as the model’s partial output ppp is available. But streaming is not always possible: the model backend, the transport, or the current turn state may force the system to collect a full tool plan before execution. In that case, the harness falls back to partitionToolCalls() and runTools(), which execute the same logical set of calls synchronously. This fallback is important because it preserves correctness even when the low-latency path is unavailable; the architecture is designed so that “streaming-first” is an optimization, not a requirement for progress.
Coordination becomes more interesting once the executor is allowed to overlap work. Two signals govern the control flow. The first is a sibling abort controller, which lets the system cancel overlapping work when one branch makes another branch obsolete or risky. The second is a progress-available signal, which tells dependent steps that enough upstream output has accumulated for them to continue. Together, these signals encode a common agentic pattern: some tasks should be killed when they become redundant, while others should be unblocked as soon as the world state is sufficiently known. This prevents the executor from getting trapped in a brittle “fire and forget” regime where everything keeps running even after the logical branch has changed.
The other source of fragility is not tool contention but output pressure. If the model runs out of room while producing the turn, Claude Code does not simply fail the request and ask the user to retry. Instead, it first escalates max_output_tokens, trying to give the model more space to finish the current reasoning or tool plan. If that is still insufficient, the harness invokes the compaction operator C(c,h,W)→c′\mathcal{C}(c, h, W) \rightarrow c'C(c,h,W)→c′, which rewrites the conversation state under the working window WWW into a smaller but still usable context c′c'c′. The important point is that compaction is not just truncation; it is a controlled state transformation intended to preserve the information most relevant to continuing execution.
When the system encounters prompt_too_long, it still does not stop at a single recovery action. It can compact and retry, switch to a streaming fallback, or even hand control to a fallback model if the current route cannot sustain progress. That layered response makes the loop robust under two distinct pressures: tool pressure, where multiple actions compete for execution, and context pressure, where the model cannot fit its next move into the available window. In both cases, the architecture prefers progress-preserving adaptation over hard failure.
This is why the overall loop should be read as progress-oriented orchestration rather than simple event handling. Claude Code is not just reacting to model output; it is continuously reshaping that output into an execution plan that can survive partial availability, interference, and memory limits. The design trades a little complexity in the harness for a lot more resilience in practice.
The visual below compactly summarizes that strategy. On the left, the main execution lane makes the central idea concrete: p∈Tp \in Tp∈T, then tools are split into the safe and exclusive bands, with the streaming executor trying to start work early and the fallback path preserving correctness when streaming cannot be used. The coordination markers—the abort controller and the progress signal—show that concurrency is managed, not merely enabled.
On the right, the recovery stack captures the second half of the story: when the bottleneck is not tool interference but context exhaustion, the system escalates token budget, compacts via C\mathcal{C}C, retries on prompt_too_long, and finally switches models if needed. Read together, the two columns show a single architectural principle: Claude Code keeps moving by treating both execution and recovery as first-class parts of the agent loop.

8. Theorem: permission and safety are enforced in layers

Building on the execution path, the key security idea is that a tool request is not an action. It is only a proposal emitted by the model inside the shared query loop, and the runtime decides whether that proposal ever becomes a real side effect. This distinction matters because the model is optimized for producing useful continuations, not for being a security boundary. In other words, Claude Code treats model output as untrusted intent until the harness has explicitly authorized it.
Mathematically, we can think of a proposed invocation ppp as flowing through an authorization pipeline P\mathcal{P}P, which maps it to one of a few outcomes:
p→Pu∈{allow,ask,deny}p \xrightarrow{\mathcal{P}} u \in \{\text{allow}, \text{ask}, \text{deny}\}pP​u∈{allow,ask,deny}
The important part is not just the output label, but the fact that uuu is decided by the pipeline, not by the model call MMM. So even if the model emits a perfectly well-formed tool-use block, that block still has to survive the surrounding control system before it can reach execution.
This layered design is a direct response to a familiar failure mode in agent systems: single-point authorization. If one component is asked to decide everything, then any bug, prompt injection, or overly permissive policy can collapse the whole safety story. Claude Code avoids that by making authorization compositional. Each layer is narrower than the whole, but together they enforce a stronger invariant:
p⇏x unless p passes all applicable layers in Pp \not\Rightarrow x \text{ unless } p \text{ passes all applicable layers in } \mathcal{P}p⇒x unless p passes all applicable layers in P
Here xxx is the external effect — the actual approved tool action — and the arrow is intentionally conditional. The agent may want to act, but the system only lets it act if every relevant gate agrees.
The first gate is tool pre-filtering over the assembled tool pool TTT. This is easy to underestimate, but it eliminates an entire class of failures before “authorization” even begins: if a tool is unavailable, hidden, or not assembled into the current context, then the model cannot reliably invoke it in the first place. That means safety is not only about saying “no” at runtime; it also includes shrinking the action space so the model never sees certain paths as executable options.
Next comes the deny-first rule layer R\mathcal{R}R. This is subtle and important: deny-first means the system does not start from a blanket assumption that actions are allowed and then try to catch a few bad cases. It starts from caution, and any matching restriction blocks approval immediately. That gives the policy language real force. A declarative rule is not advisory text for the model; it is an operational constraint that can terminate the request before the agent ever reaches execution.
After that, the current permission mode encodes the trust posture γ\gammaγ. This is where the user’s chosen workflow matters: even a semantically acceptable request may still require confirmation, especially when the posture is conservative. So the system can return ask rather than allow, preserving a human-in-the-loop checkpoint when the risk model demands it. If auto-mode is enabled, a classifier may add another rejection surface, which is useful precisely because it catches cases that are too dynamic or context-dependent for static policy alone. The result is not redundancy for its own sake, but defense in depth.
Two more layers make the boundary even harder to bypass. Shell sandboxing constrains the effect of an approved command, so the capability is still limited at execution time. And resume logic does not silently restore permissions that were previously absent, which prevents “state drift” from accidentally reintroducing authority later in the session. Finally, hooks through H\mathcal{H}H can intercept at multiple phases — before tool use, during permission requests, after denials, and after use — which means the enforcement surface is not a single decision point but a programmable interception stack.
This is why the theorem is best read as an architectural claim, not just a safety slogan. Claude Code separates proposal generation from authorization, and then further decomposes authorization into layers that can each veto, defer, or restrict the action. That separation explains both the robustness of the system and the trade-off it embraces: the agent becomes less “magically autonomous,” but much more predictable and governable.
The visual below compresses that logic into a security-stack diagram. The left side represents the model’s output ppp: a request that exists only as intent. The middle layers show how that request is screened by pre-filtering, deny-first policy, permission posture, auto-mode checks, sandboxing, resume state, and hooks. The right side appears only if the entire pipeline clears the action, making the final xxx feel earned rather than assumed.
Read that picture as a compact proof sketch. The stacked bands are not merely a list; they are the reason the equation p→Pup \xrightarrow{\mathcal{P}} upP​u has semantic weight. If any layer says deny or ask, the proposal stops being a direct action and reverts to a controlled decision. That layered structure is the foundation for everything that follows, especially the stronger deny-first argument in the next section.

9. Proof of layered safety and deny-first control

Having established the theorem, the remaining task is to make the safety claim operationally believable: not just that access control exists, but that it is layered in a way that composes correctly under failure. That is the key distinction in agent systems. A single monolithic “permission check” is brittle because it assumes one place in the stack can see every relevant fact, while in practice the model, the policy engine, the hook pipeline, and the execution environment each observe different parts of the action.
The proposed tool call ppp is the useful object to track because it makes the argument concrete. We begin with the weakest possible statement: if p∈Tp \in Tp∈T, then it is only a candidate for execution, not an entitlement. In other words,
p∈T  ⇒  p survives only if it passes P.p \in T \;\Rightarrow\; p \text{ survives only if it passes } \mathcal{P}.p∈T⇒p survives only if it passes P.
That implication already encodes an important design principle: the model may be able to mention a tool, but the system does not yet owe it execution. The tool pool TTT is itself filtered before the model ever reasons over it, so a blanket-denied tool can be removed from consideration entirely. This is stronger than post-hoc rejection, because it prevents the agent from even constructing a plan around an unavailable capability.
The next layer is where deny-first control matters. Permission evaluation over R\mathcal{R}R is not a symmetric vote between allow and deny rules; it is intentionally asymmetric. If any rule returns deny, that decision dominates. This avoids a subtle but common failure mode in policy composition: if an allow rule can “outweigh” a deny rule, then a local exception can accidentally override a global safeguard. The correct mental model is therefore not “does the policy approve?” in the abstract, but rather “is there any earlier reason to stop?” Formally,
P(p;R,H,γ)=allow  only if no earlier layer returns deny.\mathcal{P}(p; \mathcal{R}, \mathcal{H}, \gamma) = \text{allow} \;\text{only if no earlier layer returns deny}.P(p;R,H,γ)=allowonly if no earlier layer returns deny.
That phrasing is important because it makes the layering explicit: P\mathcal{P}P is not a single test, but an ordered pipeline of veto points.
After policy evaluation come the interceptors: the hook pipeline H\mathcal{H}H and the auto-mode classifier. These are not redundant with R\mathcal{R}R; they protect against different classes of risk. Policy rules express relatively stable, declarative intent, while hooks can inspect richer runtime context, and the classifier can act as a dynamic guardrail when the model’s behavior looks risky or ambiguous. In some cases they do not merely deny; they can also rewrite ppp, changing the action into a safer equivalent. That distinction matters because real systems often need both blocking and shaping. A denial alone is a hard stop, but a rewrite can preserve utility while reducing exposure.
Then comes the execution boundary, which is where many toy explanations become misleading. Even after a tool call is approved, the actual shell action xxx still runs inside a sandbox. That means permission is not the same as full ambient authority. A successful approval only says the action is eligible; the sandbox determines what the action can actually touch. This separation is a classic defense-in-depth move: if any earlier layer misses something, the final environment can still limit blast radius. In practice, this is what keeps a mistaken approval from becoming a system-wide failure.
The session boundary closes a particularly easy-to-overlook loophole. When a session is resumed, prior session-scoped permissions are not simply restored as if trust were permanent. The trust posture γ\gammaγ does not leak across sessions, which prevents a one-time grant from silently becoming durable authority. This is especially important for interactive coding agents, where “temporary convenience” can quickly become de facto persistence if the system is not careful. The security property here is not only what happens now, but also what is forgotten later.
What emerges is a proof by execution path: the model proposes, but multiple layers can independently stop the action before or after proposal. That is why the architecture is robust even when individual components are imperfect. The guarantee does not depend on one flawless decision-maker; it depends on the fact that no single code path both reasons about and enforces access. Each layer sees only a slice of the problem, and each layer is empowered to veto.
The visual below compresses that logic into a compact pipeline. The stacked checkpoints are not decorative—they are the argument. Reading top to bottom, they show how a tool call can be eliminated before the model reasons about it, rejected by deny-first policy, intercepted by runtime guards, constrained in the sandbox, or invalidated at session resume. The green path matters only as the exceptional case: it appears if and only if every gate remains silent.
Seen this way, the displayed equations are not merely notation; they are the distilled summary of the whole proof. The first says that membership in the tool pool is insufficient without surviving P\mathcal{P}P. The second sharpens the conclusion: approval is the end of a chain of potential denials, not the outcome of a single optimistic check.

10. Permission modes, rules, and hooks

Once we move from abstract safety claims to an actual coding agent, the interesting question is no longer whether the system is “aligned” in the abstract, but how the agent is allowed to act in the world. For Claude Code, that means asking what kinds of operations should happen automatically, what should be gated behind confirmation, and what should be impossible unless the user explicitly changes policy. This is where permission modes, rules, and hooks become the core control surface.
The easiest way to think about this layer is as a hierarchy of decision points. Every external effect produced by the agent—editing files, running shell commands, network access, invoking tools—can be classified along two axes:
Intrinsic risk: how destructive or irreversible the action could be.
Contextual trust: whether the user has already authorized that class of action in the current session or repository.
A deny-first system does not try to prove every action safe. Instead, it assumes actions are unsafe unless policy says otherwise. That matters because coding agents operate in a messy environment: repositories contain secrets, scripts have side effects, package managers can mutate the filesystem, and “just run this command” can mean anything from harmless inspection to a production outage.
Claude Code’s permission design is therefore not just a UI feature; it is a control plane for action selection. The agent loop proposes an action, but the permission layer decides whether the proposal may proceed, must be escalated, or must be rejected. In formal terms, if the policy function is P(a,c)P(a, c)P(a,c) for action aaa under context ccc, then the agent’s autonomy is bounded by a check like:
execute(a)  ⟺  P(a,c)=allow\text{execute}(a) \iff P(a, c) = \text{allow}execute(a)⟺P(a,c)=allow
That simple gate hides a lot of nuance. The hard part is defining ccc: current directory, repository trust, command type, file path, user preferences, and session state all matter. A command like ls in a sandboxed repo is fundamentally different from rm -rf in a writable workspace, even if both are just “shell commands.” The permission system has to encode these distinctions without becoming so brittle that the agent is unusable.
This is where permission modes and rules separate. Permission modes are coarse-grained posture settings: they define the overall level of autonomy the agent starts with. Rules are finer-grained exceptions or constraints that override the default posture for specific patterns. In practice, that means the agent can be configured to behave conservatively by default while still granting targeted allowances for repeatable, low-risk workflows. The result is a policy stack that is both usable and auditable.
A subtle failure mode appears when permission systems become too modal. If the system asks the user for confirmation too often, the agent loses its productivity advantage and the user starts rubber-stamping prompts. If it asks too rarely, the system becomes indistinguishable from an unchecked script runner. Good design therefore aims for a middle regime where the most dangerous actions are blocked early, but common benign operations are streamlined. The key idea is not “maximize autonomy” but allocate autonomy where the risk/benefit ratio is favorable.
Hooks extend this same philosophy into the lifecycle of events. Instead of treating the agent as a sealed black box, hooks let the environment react to important transitions: before a tool call, after a tool call, on permission checks, on file changes, or at other integration points. Conceptually, hooks are policy-adjacent automation: they do not replace the permission system, but they can annotate, log, block, enrich, or redirect behavior. That makes them useful for teams that want to enforce local conventions, repository-specific guardrails, or lightweight observability.
There is also an important systems lesson here. A permission system alone is reactive: it can say yes or no at the moment of execution. Hooks make the system programmable around the decision boundary. That matters because many real failures are not single forbidden actions; they are sequences of individually permitted actions that collectively produce an unsafe outcome. Hooks give the architecture a chance to inspect those sequences, add context, or encode organizational policy without modifying the core model loop.
Taken together, these mechanisms create a layered control story:
Permission modes set the default autonomy level.
Rules add scoped exceptions or constraints.
Hooks integrate environment-specific checks and reactions.
The agent loop remains capable of planning and proposing actions, but never bypasses policy.
The visual below condenses that layered structure into a compact flow: the model proposes, policy evaluates, hooks intervene at the edges, and only then does execution happen. Read it as a summary of the key architectural claim of this section: safety is not a single guardrail but a set of interacting mechanisms that shape what the agent can do, when it must ask, and how it can be customized without rewriting the core system.

11. Extensibility at different context costs

Building on the loop and permission pipeline, the next question is not whether Claude Code can be extended, but where an extension should enter the system. That choice matters because every integration point pays a different price: some mechanisms are almost free in context, while others reshape the visible action space of the agent itself. In production agents, extensibility is never just a feature list; it is a budget allocation problem across prompt space, tool space, and execution space.
A useful way to think about this is to separate three resources the agent consumes:
context c,tool set T,execution path\text{context } c,\quad \text{tool set } T,\quad \text{execution path}context c,tool set T,execution path
An extension can inject instructions into ccc, add or alter tools in TTT, or intervene at execution time without changing either. These are qualitatively different interventions. They affect not only capability, but also reliability, safety, and how much of the system’s “brain” is exposed to the model at once. Claude Code’s design is notable because it does not collapse these into one generic plugin mechanism; instead, it deliberately spreads them across layers with increasing cost.
At the cheapest end are hooks. Hooks operate at execution time: they can observe, block, annotate, or redirect actions after the agent has already formed an intent. Because they do not need to be loaded into the model’s prompt, they have essentially zero context cost:
κ(hooks)=zero\kappa(\text{hooks}) = \text{zero}κ(hooks)=zero
This is a powerful property. Hooks can enforce operational policies, trigger logging, or mediate approvals without polluting the working context. The trade-off is that they are mostly reactive rather than generative: they shape what happens, but they do not directly teach the model new knowledge or new procedures.
Skills sit one layer higher. They are instruction bundles injected into the current context ccc, so they consume prompt budget, but only modestly. Their role is to enrich the agent with task-specific know-how: conventions, workflows, and local guidance that should be “in mind” while the agent reasons. That is why their cost is low rather than zero:
κ(skills)=low\kappa(\text{skills}) = \text{low}κ(skills)=low
This is often the sweet spot for reusable expertise. Skills are expressive enough to change the model’s behavior in a durable way during a task, yet lightweight enough to remain practical. Still, they are not free. As with any context injection, too many skills can crowd out the task itself, and the benefit depends on the model actually attending to the added guidance rather than treating it as background noise.
Plugins move beyond pure instruction injection into packaging-oriented extensibility. They contribute capabilities as a more structured bundle, which means more overhead than a skill but less exposure than exposing a large external tool universe. In the design space, this is a medium-cost layer:
κ(plugins)=medium\kappa(\text{plugins}) = \text{medium}κ(plugins)=medium
The conceptual difference is subtle but important. A plugin is not just “more text”; it is a reusable unit that may combine configuration, metadata, and capability wiring. That makes it more powerful for distribution and composition, but also more expensive in the engineering and context sense. Plugins are attractive when you want a coherent extension surface rather than a one-off instruction snippet.
At the expensive end are MCP servers, which contribute tool schemas into the reachable tool set TTT. Unlike hooks or skills, MCP changes what actions the agent can even attempt. That makes it the highest-cost mechanism here:
κ(MCP)=high\kappa(\text{MCP}) = \text{high}κ(MCP)=high
The reason is not just that tools are “bigger” than instructions. It is that tools alter the agent’s action manifold: they create new callable operations, new preconditions, new failure modes, and new permission surfaces. If a tool is reachable, it can be selected; if it is selected, it may need permissions, error handling, and feedback integration. This is capability gain, but it also increases the complexity of planning and safety management.
This layering becomes especially clear in the way Claude Code assembles its tool pool. The reachable tools are not simply “all available tools”; they are filtered and normalized through a pipeline:
T=dedup(MCP(filter(base tools,R)))T = \text{dedup}\big(\text{MCP}(\text{filter}(\text{base tools}, \mathcal{R}))\big)T=dedup(MCP(filter(base tools,R)))
The order matters. First, the base tools are reduced according to mode-specific constraints and deny rules R\mathcal{R}R. Then MCP integrations expand or refine the candidate set. Finally, duplicate entries are removed so the final action space remains coherent. This is a classic systems insight: extensibility must be designed together with gating and deduplication, or else “more capability” becomes “more ambiguity.”
There is another distinction that is easy to miss but crucial for understanding the architecture. Some mechanisms shape the current context ccc: CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md, skills, and plugins all influence what the model sees and how it reasons. But an Agent tool call does something else entirely: it creates a delegation request ddd and launches a separate subagent SSS with its own working window WWW.
d⇒S with its own Wd \Rightarrow S \text{ with its own } Wd⇒S with its own W
That means delegation is not merely another extension mechanism; it is a form of execution isolation. The subagent has its own local context and can pursue a subtask without consuming the parent agent’s entire working memory. This is why delegation belongs in the same architectural conversation as extensibility, even though it is not an extension in the narrow sense. It gives the system a way to scale work without flattening everything into one giant prompt.
So the key design principle is not “support as many extensions as possible.” It is to assign each extension mechanism a different role and a different cost profile:
Hooks: intervene without context overhead
Skills: add lightweight task guidance
Plugins: package structured capability with moderate overhead
MCP: expand the tool universe, but at high cost
Delegation: spawn isolated work units rather than inflating the parent context
That division of labor is exactly what makes the architecture interesting. It gives Claude Code a way to remain extensible without turning every extension into prompt bloat or every capability into a global tool. The visual below is useful because it compresses this entire argument into a single comparison: one axis of where the extension enters, one axis of how much context it costs, and one axis of what it changes. The table makes the qualitative hierarchy explicit, while the small tool-assembly strip reminds us that MCP tools are not inserted naively; they pass through filtering and deduplication before they become available.
Just as importantly, the compact callout distinguishes context injectors from delegated execution. That difference will matter in the next section, where context construction, compaction, and persistence determine how much of the agent’s working state survives over time, and how delegation can remain tractable instead of exploding the prompt.

12. Context construction, compaction, delegation, and persistence

Building on the shared loop QQQ and the extension stack τ\tauτ, the next question is no longer which capabilities the agent has, but what state it is actually allowed to hold onto while it reasons and acts. In Claude Code, that state is not an amorphous “prompt”; it is a bounded working set ccc living inside a fixed context window WWW, with the relationship c⊆Wc \subseteq Wc⊆W. That inequality is not just bookkeeping. It encodes the central systems constraint of production agents: every additional token is a trade-off against recall, latency, cost, and the risk of burying the truly relevant evidence under accumulated chatter.
The first subtlety is that context is constructed, not passively received. Claude Code assembles the live window from multiple sources with different trust and persistence profiles: the system prompt, environment information, the CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md hierarchy, path-scoped rules, auto memory, tool metadata, the conversation history h=⟨x1,x2,…,xℓ⟩h=\langle x_1,x_2,\dots,x_\ell\rangleh=⟨x1​,x2​,…,xℓ​⟩, raw tool results xxx, and compact summaries. These ingredients are not interchangeable. Some are instructions, some are observations, some are derived artifacts, and some are long-lived policy hints. The agent’s job is to merge them into a coherent short-term state without letting any one class of information dominate simply because it was mentioned recently.
That distinction matters because recency is not relevance. A coding agent frequently sees long tool traces, intermediate failures, repeated confirmations, and stale hypotheses that no longer belong in the current reasoning frame. If all of that were left untouched, the model would pay for it in two ways: the important facts would become harder to retrieve from attention, and the prompt would become increasingly brittle to minor changes in the task. So Claude Code treats context as an actively managed resource, not a dump. The design goal is to preserve decision-critical structure while discarding or compressing everything that has already served its purpose.
Once the live window approaches its limit, Claude Code does not simply truncate indiscriminately. It applies a staged compaction pipeline,
C=budget reduction→snip→microcompact→context collapse→auto-compact.\mathcal{C} = \text{budget reduction} \rightarrow \text{snip} \rightarrow \text{microcompact} \rightarrow \text{context collapse} \rightarrow \text{auto-compact}.C=budget reduction→snip→microcompact→context collapse→auto-compact.
This ordering is important because it reflects a gradient from cheap, local cleanup to more aggressive semantic summarization. Budget reduction trims easy excess first. Snip removes low-value spans. Microcompact condenses smaller stretches of history. Context collapse performs a more global reduction when the window is under genuine pressure. Auto-compact is the last resort, producing a broader summary that keeps the trajectory of the session while abandoning fine-grained detail.
A useful way to think about this pipeline is that it tries to preserve the invariants of the session while sacrificing the accidents of the transcript. The invariants are things like the current task goal, the active file, the failure mode being debugged, and any constraints that would materially change the next action. The accidents are redundant confirmations, now-irrelevant tool outputs, or exploratory branches that did not alter the final state. Of course, this is imperfect: compaction can oversummarize, lose a subtle precondition, or erase evidence needed to debug a later mistake. That failure mode is why compaction must be explicit and auditable rather than hidden inside a proprietary memory layer.
Delegation adds a second axis of control. A delegation request ddd maps to a subagent SSS,
d↦S,d \mapsto S,d↦S,
and the key design choice is that SSS receives its own isolated context window and tool set. This is not merely parallelism for speed; it is a way to localize reasoning so that a subtask can explore without contaminating the parent’s working state. The parent loop gets back only a summary, not a full replay. That “summary-only” return is a structural boundary: it keeps the main agent from inheriting a flood of transient subagent chatter, but it also means that delegation is inherently lossy. The subagent may discover useful details that never make it back unless they are intentionally distilled.
That lossy boundary is actually part of the architecture’s discipline. Claude Code prefers small, explicit transfers of meaning over implicit sharing of entire traces. In practice, that makes delegation resemble a controlled compression protocol: a child agent can spend context generously on one subproblem, while the parent remains compact and task-directed. The trade-off is familiar from distributed systems:
Benefit: local exploration without bloating the main loop.
Cost: information can be dropped or flattened in the handoff.
Risk: the summary may omit exactly the nuance needed for the next step.
Persistence closes the loop by separating the ephemeral working window from the durable record. The live context is temporary, but the session transcript hhh is stored as append-only JSONL, which means the system can resume or fork from an audit-friendly log rather than from an opaque internal database. This is a very deliberate choice. It makes the agent’s history inspectable, replayable, and easier to reason about when something goes wrong. At the same time, persistence is not identity: resuming from the log does not restore session-scoped permissions P\mathcal{P}P or trust posture γ\gammaγ. That boundary is crucial, because safety state and authorization state should not be inferred from transcript text alone.
The phrase “plain text and user-visible” is doing real architectural work here. CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md can shape behavior because it is part of the constructed context, but it does not function like hidden memory. This means the system’s operating assumptions remain legible to the user, while the short-term window remains bounded by policy rather than by whatever happened to be in the last conversation. The result is a layered memory model: visible rules, ephemeral working state, compacted summaries, and durable logs all coexist, but they do not all carry the same authority.
The visual below condenses that whole story into three interacting mechanisms: construction, compaction, and control transfer. The left side gathers heterogeneous sources into c⊆Wc \subseteq Wc⊆W, making the bounded-window idea concrete. The middle ladder makes the reduction pipeline C\mathcal{C}C feel like a sequence of increasingly aggressive filters rather than a single opaque truncation step. And the right side separates delegation from persistence: the subagent SSS receives a fresh local workspace via ddd, while the append-only hhh reminds us that durable history lives outside the live window. Together, those panels summarize the core lesson: Claude Code does not merely “remember less”; it manages context as an explicit, layered, and auditable resource.

13. Claude Code versus OpenClaw

After tracing how Claude Code builds context, decides when to ask for permission, and survives long-running work through compaction and persistence, the natural next question is: what kind of system is this, architecturally? One useful way to answer that is to compare Claude Code with OpenClaw, a representative agent framework that makes many design choices more explicit and, in some dimensions, more configurable. The point is not that one is “better” in a vacuum. Rather, they sit at different points in the design space, and the contrast clarifies which decisions are policy versus mechanism.
At a high level, Claude Code behaves like a productized coding agent: it is optimized for a smooth developer experience, strong default behavior, and tight integration across the whole workflow. OpenClaw is closer to a research and engineering framework: it exposes more of the control surface, making it easier to inspect, swap, or extend components. That difference matters because agent systems are not just a model plus tools; they are a stack of coupled choices about agency, memory, tool use, user control, and extensibility. The same underlying model can feel radically different depending on where those choices are anchored.
A helpful way to compare the two systems is along recurring dimensions that show up in nearly every production agent design:
Control loop: who owns the cadence of reasoning, tool use, and stopping?
Permissions: when can the agent act autonomously versus when must it ask?
Context pipeline: how is the working set of information assembled, compressed, and refreshed?
Extensibility: how easy is it to add tools, hooks, or custom workflows?
Delegation: can the system split work across subtasks or specialized agents?
Persistence: what survives across turns, sessions, or tasks?
Claude Code tends to fuse these dimensions into a cohesive product experience. OpenClaw, by contrast, more often surfaces them as modular primitives. That difference creates a classic trade-off: integration versus exposure. Integration gives you fewer seams to manage and often better ergonomics. Exposure gives you more room to experiment, but also more ways for the system to become inconsistent or brittle.
The most important architectural distinction is probably the agent loop. In Claude Code, the loop is strongly shaped by the product’s values: the model should be useful, but it should also remain legible and safe enough for a developer to trust. The loop is therefore not a free-running planner with unrestricted action; it is a guided cycle of reading context, proposing a next step, checking whether the step is allowed, and then revising as needed. In OpenClaw-like systems, the loop is often easier to customize directly, which is powerful if you want to study alternative planning or scheduling strategies. But that flexibility comes with a subtle cost: if the loop is too configurable, the system may stop feeling like one agent and start feeling like a collection of loosely coordinated components.
That same pattern appears in permissioning. Claude Code’s permission system is part of the product’s core safety and UX story, so it is tightly interwoven with execution. The agent does not merely “know” what to do; it knows when it is allowed to do it. OpenClaw-style systems often separate those concerns more explicitly, making permission checks an external policy layer or a replaceable module. This can be excellent for research, because it lets you test different trust models. But it also makes it easier to accidentally create gaps between what the planner believes is possible and what the runtime will actually permit.
The context pipeline is another sharp point of comparison. Claude Code uses a curated pipeline: retrieve relevant history, compact what matters, keep the working set within budget, and preserve enough structure that subsequent reasoning remains coherent. OpenClaw frameworks often make the individual stages more visible or swappable, which is ideal if you want to test retrieval strategies, summary operators, or memory policies. The trade-off is that context management is not just an implementation detail; it is part of the agent’s intelligence. If the pipeline is weak, even a strong model will behave forgetfully or myopically.
Extensibility and delegation separate the two systems even further. Claude Code emphasizes a curated surface area: enough extension points to be useful, but not so many that the user has to assemble an agent platform by hand. OpenClaw is typically better suited to composability: you can wire together custom tools, policy layers, and delegated subtasks more directly. That makes it attractive when the research question is about how to decompose work, not just whether the final answer is good. But a more composable stack can also become more fragile, because each extra abstraction introduces another place where state, permissions, and context can drift apart.
Persistence is where the long-horizon nature of these systems becomes especially visible. Claude Code treats persistence as part of the lived workflow: prior decisions, summaries, and task state matter because coding is rarely a one-shot activity. OpenClaw often makes persistence a more explicit design variable, which is useful for experiments on memory architectures. The catch is that persistence is not only about storage; it is about what the system believes is worth remembering. If that policy is under-specified, the agent may retain noise, lose commitments, or repeatedly rediscover the same local facts.
The upshot is that Claude Code and OpenClaw are best understood as different answers to the same question: how much of the agent should be opinionated product design, and how much should be programmable infrastructure? Claude Code pushes toward a reliable default experience with carefully chosen constraints. OpenClaw pushes toward a more inspectable and modifiable substrate. Neither removes the underlying difficulty of agent design; each simply places the difficulty in a different layer of the stack.
The visual below condenses that comparison into a compact architectural map. It is useful not because it lists every feature, but because it makes the recurring dimensions visible at once: where Claude Code tends to integrate policy into the product, and where OpenClaw tends to expose the underlying machinery as modules or knobs. That bird’s-eye view is the real point of the section, because once you see the system along these six axes, the later discussion of predictions and evidence becomes much easier to interpret.

14. What the architecture predicts and what the evidence suggests

Up to this point, the architecture has looked like a bundle of engineering choices: a loop that plans and acts, a permission model that gates side effects, a context pipeline that decides what the model can “see,” and layers for tools, delegation, and persistence. The interesting question now is not just what Claude Code does, but what those choices imply about the behavior we should expect from a production coding agent.
That matters because architecture is a theory about failure modes. If a system is built around short, repeated model calls with careful state management, we should expect better controllability than from an unconstrained, monolithic agent. If it also supports long-lived memory and tool-driven edits, we should expect more continuity across tasks—but also more opportunities for stale context, partial execution, or accidental overreach. In other words, the design does not merely enable capability; it predicts a characteristic envelope of strengths and weaknesses.
A useful way to reason about this is to separate the agent’s behavior into a few coupled dynamics:
Local competence: can it solve the next step with the context it has?
Global coherence: can it preserve intent across many steps?
Safety and reversibility: can it avoid or contain bad side effects?
Scalability of supervision: can a human still intervene without micromanaging?
Claude Code’s architecture pushes hard on the first three while preserving the fourth. The agent loop gives it repeated opportunities to re-evaluate, the permission system constrains risky actions, and the context pipeline filters what matters into each step. The result is not “perfect autonomy,” but something more practically interesting: an agent that can behave like a competent collaborator under bounded trust. That is exactly the kind of system the production coding setting rewards.
The architecture also suggests where performance should degrade. Once tasks become too long, too branching, or too dependent on undocumented project conventions, the context pipeline becomes the bottleneck. Even with retrieval, summarization, and persistence, the model can still lose latent constraints—details that were never written down cleanly enough to be recovered later. Likewise, delegation helps with decomposition, but every handoff introduces the risk of inconsistent subgoals. So the system’s reliability is not just about model quality; it is about how well the surrounding mechanisms preserve the task manifold as work expands over time.
This leads to a deeper claim: good agent architecture is less about maximizing raw autonomy than about shaping the probability distribution over actions. In a coding agent, the most important event is often not the final answer, but the sequence of intermediate moves—read, inspect, patch, test, ask, retry, summarize. A well-designed system makes the “safe and useful” trajectory more likely and the “irreversible mistake” trajectory less likely. That is why permission boundaries, state checkpoints, and explicit tool routing are not peripheral details; they are the core mechanism by which the system’s values become operational behavior.
The evidence we have from systems like Claude Code tends to support this view. In practice, users often report that constrained, tool-aware agents are less flashy than fully autonomous ones, but more dependable on real repositories. They are better at staying aligned with the current codebase, less likely to hallucinate file contents when they can inspect them directly, and more effective when they can iterate through a tight loop of edit-and-verify. At the same time, they still fail in familiar ways: they can overfit to recent context, miss project-wide invariants, or drift when the task requires long-range planning without enough explicit scaffolding.
So the architecture predicts a very specific pattern of outcomes:
Strong at iterative, inspectable coding tasks
Strong at bounded delegation and partial automation
Weak at deeply open-ended goals with sparse grounding
Weak when context growth outpaces summarization quality
Strongest when human oversight is available but not intrusive
That is the practical meaning of “design space” here. The system is not one point on a capability curve; it is a bundle of trade-offs that place it in a region of high usefulness for software work. If the earlier sections explained how the pieces fit, the natural next question is whether the observed behavior matches the theory. The visual below is useful precisely because it compresses that argument into a compact map: one side summarizes the architectural mechanisms, the other side summarizes the behavioral predictions they imply.
Read that image as a bridge between mechanism and evidence. It is not just a summary of components; it is a hypothesis about how those components should show up in practice. The arrows and grouped labels help make the causal story explicit: loop → iteration, permissions → safety, context pipeline → grounding, delegation/persistence → scale. That causal structure is the real takeaway, because it is what lets us compare today’s agent systems and ask which trade-offs are likely to matter for the next generation.

15. Unifying summary: design questions, answers, and trade-offs

After tracing Claude Code’s loop, permissions, context handling, delegation, and persistence as separate mechanisms, the natural question is how these pieces fit together as a system-level philosophy. The answer is that they are not independent tricks; they are coordinated responses to a small set of recurring design questions that every production coding agent must answer one way or another.
At the highest level, Claude Code pushes reasoning into the model call while keeping control in the harness. If we write the interaction abstractly, the agent receives a context ccc assembled from long-lived state LLL, current instructions III, and session state sss, then invokes the model MMM, which returns proposed actions and artifacts:
Q(L,I,s)→M(c)→{p,x}.Q(L, I, s) \rightarrow M(c) \rightarrow \{p, x\}.Q(L,I,s)→M(c)→{p,x}.
Here ppp is the policy-relevant action proposal and xxx is the tool-produced or environment-produced result. The architectural point is subtle: Claude Code does not try to build a fully explicit planner outside the model, but it also does not let the model directly own execution. Instead, the surrounding loop mediates what the model sees, what it may do, and what gets written back.
That choice creates a recurring pattern across the stack. The turn structure is deliberately thin and reactive: the system alternates between prompting, tool execution, and observation, rather than elaborating a heavyweight search process in the harness. In other words, the agent behaves like a disciplined ReAct loop, where the conversation and tool outcomes accumulate in an append-only history hhh, and each new step is formed from the current bounded context rather than from a global symbolic plan:
c⊆W,h=h∪{x}.c \subseteq W, \qquad h = h \cup \{x\}.c⊆W,h=h∪{x}.
This keeps the control flow simple and robust, but it also means the surrounding infrastructure must do more work to preserve useful information, manage drift, and recover from mistakes.
Safety follows the same philosophy. Claude Code is deny-first by default: permissions P\mathcal{P}P and human escalation H\mathcal{H}H sit between the model’s intent and the system’s effects. The model can ask, suggest, and justify, but the harness decides whether the action crosses the boundary. That arrangement is a practical form of defense in depth. It reduces the risk that a mistaken or manipulated prompt can directly cause irreversible side effects, but the price is friction—especially in workflows that require frequent filesystem, network, or shell operations.
Extensibility is also partitioned rather than centralized. Hooks, skills, plugins, and MCP integrations live in different places for a reason: each one answers a different architectural question about where policy belongs, where behavior is customized, and what should remain external. This modularity makes the system more composable and easier to integrate into real developer workflows, yet it increases the number of moving parts a user or platform engineer must understand. The result is not a single monolithic agent runtime, but a stack of narrowly scoped mechanisms that cooperate through stable interfaces.
Context management tells the same story in miniature. Claude Code treats context as a scarce resource bounded by WWW, then uses compaction C\mathcal{C}C, retrieval-like prioritization, and repository guidance such as CLAUDE.md\mathrm{CLAUDE.md}CLAUDE.md to keep the window aligned with the task. The virtue of this design is clarity: the model is fed a curated working set rather than an ever-growing transcript. The failure mode is equally clear: compression can discard details that later matter, so the system must trade completeness for relevance and rely on the user or the harness to reintroduce missing structure when needed.
Delegation and persistence round out the picture. When Claude Code spawns a subagent, it isolates work by delegating ddd to a separate SSS, which helps contain errors and enables parallelism. But isolation also means weaker shared memory; subagents often return summaries rather than fully merged internal state, so the system sacrifices some global coherence for containment and speed. Persistence is similarly pragmatic: append-only transcripts, session state, and resume/fork behavior provide durability and auditability without requiring a heavyweight database-backed planner. That makes sessions recoverable and inspectable, though not infinitely rich.
Seen together, these are not six unrelated features but six answers to the same design problem. In compact form, Claude Code’s architecture says:
Reason in the model, control in the harness
Keep turns shallow and reactive
Treat permissions as a first-class safety layer
Expose extensibility through explicit boundaries
Manage context as a bounded, curated resource
Isolate subagents and persist sessions with lightweight state
The visual below is useful because it compresses that argument into a single grid of questions, answers, principles, and trade-offs. Rather than memorizing each mechanism separately, the table makes the pattern visible: every design choice buys a concrete operational benefit and simultaneously introduces a corresponding cost. That symmetry is the real lesson.
The final takeaway is therefore architectural, not merely descriptive. Claude Code maximizes model autonomy inside a rich deterministic harness, and that trade is what lets it feel capable in practice while remaining governable in production. The table serves as a compact summary of that thesis: a map of the recurring design questions, the answers Claude Code chooses, and the price paid for each choice.