Machine-Checkable Termination Guarantees for Bayesian Trust in Multi-Agent Systems - FeynmanWiki

Machine-Checkable Termination Guarantees for Bayesian Trust in Multi-Agent Systems

1. Production Agents Act Before Humans Can Inspect

A useful place to begin is with a shift in timing. Many safety discussions implicitly assume that an AI system produces something, a human or another process inspects it, and only then does the world change. That assumption is increasingly false for production agents. Modern agents do not merely draft recommendations; they may initiate payments, modify cloud infrastructure, invoke privileged tools, open tickets, merge code, rotate credentials, schedule jobs, or delegate subtasks to other agents.
The important point is not that every one of these actions is catastrophic. Most are routine. The problem is that they are external effects: once executed, they may create obligations, spend money, mutate state, leak information, trigger downstream workflows, or grant access. In a production setting, the question is no longer only “Did the model say something unsafe?” but “Did an automated transition occur that changed the operational state of the system?”
This changes the governance problem from a slow, retrospective one into a machine-speed control problem. If an agent can call a payment API, terminate a server, or delegate authority in milliseconds, then a human review process that runs minutes later is not a guardrail in the relevant sense. It may be useful for auditing, diagnosis, or remediation, but it is not preventing the transition. By the time the operator notices the anomaly, the action may already have completed, propagated, or become difficult to reverse.
So the governance question shifts:
Post-hoc monitoring: “Did we notice the failure?”
Pre-execution control: “Was this action allowed before it ran?”
Both matter, but they are not interchangeable. Post-hoc monitoring can tell us what happened and support recovery. Pre-execution control decides whether a proposed action is permitted to cross a boundary in the first place. For production agents, that boundary is often the only moment where a system can still cheaply prevent harm.
This lecture is therefore not about proving arbitrary “agent safety” in the broadest behavioral sense. That goal is too large, too underspecified, and usually not machine-checkable. Agents operate in open environments, with incomplete specifications, uncertain observations, and changing tool semantics. A claim such as “the agent will behave safely” is not a bounded technical invariant; it is a broad deployment hope unless it is reduced to precise transition properties.
Instead, the focus is on bounded invariants at governance transition points. A transition point is a place where an agent attempts to move from internal computation to an externally meaningful action: spending budget, calling a tool, assigning trust, escalating privileges, delegating work, or mutating infrastructure. At these points, we can sometimes write down exact conditions that must hold before execution. Those conditions can be checked automatically, and in some cases proved correct in a theorem prover.
This narrower framing is what makes machine-checkable governance possible. We are not asking the system to prove that an agent is wise, honest, aligned, or globally harmless. We are asking whether a particular transition satisfies a formally stated guard. For example: does this financial action preserve a budget invariant? Does this trust-update process terminate once a threshold is inevitably crossed? Does this coordination rule prevent a forbidden transition under the assumptions stated?
That distinction matters because production deployments are outpacing the controls around them. It is easy to add tool access and agent-to-agent delegation faster than we add formal guardrails. But each new capability increases the number of transitions where a model-generated decision can become an operational fact. The governance layer must therefore operate at the same speed as the agent: before the effect, not merely after the log entry.
The visual below compresses this motivation into a single pipeline: a production agent proposes an action, the system reaches a pre-execution gate, and only then does the action either proceed to an external effect or get refused and quarantined. The faded human-review element is deliberately placed after the effect because that is the timing problem: human inspection remains valuable, but it is often too late to serve as the primary safety mechanism.
The central lesson is the banner at the bottom: the goal is machine-checkable bounded invariants at transition points. Everything that follows in the lecture should be read through that lens. We will distinguish proved claims from tested or conjectured ones, and we will prefer precise transition guarantees over sweeping behavioral assurances.

2. Failure Case: Compounding Errors and Specification Boundaries

Once agents are allowed to act at production speed, the core safety problem changes shape. The question is no longer merely “is each agent usually good?” but “what happens when many usually-good agents are coupled together, each making decisions that can trigger downstream actions before a human can intervene?” In that setting, even small local error rates can become large system-level risks.
A simple idealized model makes the danger visible. Suppose a task requires $n$ agents or agentic steps to all behave correctly, and suppose each step succeeds independently with probability $R$. Then the reliability of the whole chain is
$$\text{system reliability} = R^n.$$
This is intentionally simplified: real agents are not perfectly independent, tasks are not always symmetric, and some failures are recoverable. But the model captures an important monotonic fact. If $R<1$, then $R^n$ decreases as $n$ grows. Adding more agents, tools, approvals, or handoffs can decrease reliability unless the architecture also adds coordination, containment, or recovery mechanisms.
For example, if each step is $99\%$ reliable, then a single step looks excellent. But $50$ all-must-succeed steps have reliability $0.99^{50}\approx 0.605$. At $95\%$ per step, the same chain gives $0.95^{50}\approx 0.077$. The point is not that production systems literally follow this exact formula; the point is that local competence does not compose automatically into global safety.
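The two figures above are easy to reproduce. The following is a minimal sketch of the idealized independent, all-must-succeed model only; the function name `chain_reliability` is an illustrative assumption and does not come from the paper.

```python
def chain_reliability(per_step: float, steps: int) -> float:
    """Reliability of a chain in which every step must succeed independently."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Reproduce the two worked figures from the text (n = 50).
    for r in (0.99, 0.95):
        print(f"R={r}, n=50 -> {chain_reliability(r, 50):.3f}")
```

Because the model ignores correlation and recovery, it is best read as an illustration of how quickly per-step error compounds, not as a forecast for any particular deployment.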
This is one reason multi-agent governance cannot rest on informal claims like “the agents are aligned,” “the policy says not to overspend,” or “we tested the workflow.” Those statements may be useful engineering evidence, but they are not machine-checkable guarantees about all executions. In a live system, failures can compound through several mechanisms:
handoff ambiguity, where one agent misreads another agent’s intermediate output;
duplicated authority, where multiple agents independently trigger related actions;
delayed feedback, where corrective signals arrive after irreversible effects;
specification drift, where agents optimize locally reasonable goals that conflict globally;
correlated failures, where the same prompt, model weakness, data error, or tool bug affects many agents at once.
The independence model $R^n$ is actually optimistic in many of these cases. If failures are correlated, then adding agents may not diversify risk at all; it may amplify a shared blind spot. Conversely, coordination layers can improve the situation by preventing every local error from becoming an external action. Centralized planning, layered review, rate limits, budgets, trust thresholds, and typed action contracts all serve the same architectural purpose: they interrupt error propagation before it reaches the world.
But this leads to a crucial specification boundary. We should not pretend to prove arbitrary facts about arbitrary agent behavior. A modern agent is usually a composition of a language model, prompts, tools, memory, retrieval, external APIs, and runtime state. Asking a verifier to decide a nontrivial semantic property of such an arbitrary program quickly runs into the classical Rice-style boundary: in general, nontrivial semantic properties of programs are undecidable. There is no universal procedure that can inspect an arbitrary agent and decide whether it will “always behave safely” in every meaningful future context.
So the paper’s strategy is deliberately narrower and stronger. Instead of claiming unbounded behavioral safety, it proves bounded invariants at explicit governance interfaces. That shift matters. A statement like “the agent will never make a bad decision” is too broad to verify in general. A statement like “this guarded transition cannot execute a financial action that exceeds the remaining budget” is much more precise. It has a defined state, a transition rule, a precondition, and a postcondition.
This is the distinction between trying to verify the whole mind of the agent and verifying the gates through which consequential actions must pass. The paper focuses on boundaries such as:
transition rules, which define when state may change;
thresholds, which determine when trust is sufficient or insufficient;
budgets, which bound financial exposure;
contracts, which specify permitted coordination behavior.
These are not complete guarantees about the moral quality of every model output. They are machine-checkable claims about the behavior of a formalized governance mechanism. That is exactly why they are useful: they remain meaningful even when the agents themselves are messy, learned, probabilistic, or only empirically characterized.
The visual below compresses this motivation into one picture. On the left is the compounding-error intuition: an all-must-succeed chain becomes less reliable as the number of required successful steps grows, summarized by $\text{system reliability}=R^n$. On the right is the specification boundary: arbitrary agent semantics sit outside what we can generally decide, while the governance boundary contains the smaller objects we can actually verify.
The important takeaway is the contrast. Uncoordinated multi-agent systems can amplify errors, especially when production actions happen faster than human inspection. The paper therefore does not attempt an impossible proof about arbitrary agent behavior. It proves bounded safety properties at the interfaces where actions are authorized, trust is updated, budgets are consumed, and contracts constrain coordination.

3. Claim Map: What Is Proved, Tested, or Assumed?

The failure case we just examined is a useful warning: when agents interact through language, tools, budgets, and other agents, errors do not merely add up locally. They can compound across handoffs, reinterpretations, and retries. That is exactly why this lecture does not try to prove an unrestricted statement like “the multi-agent system behaves safely.” Such a claim would require formalizing too much of the world: user intent, semantic correctness, adversarial context, model behavior, deployment configuration, and every downstream side effect.
Instead, the paper’s strategy is narrower and stronger: identify bounded transition properties that can be stated precisely, checked mechanically, and composed into a governance architecture. A machine-checked theorem is valuable not because it magically covers all behavior, but because it removes ambiguity about a specific claim. If the theorem says a guarded financial transition cannot exceed a budget, then that is what is proved—under the modeled assumptions, for that transition, with the stated preconditions.
This distinction is the organizing principle for the rest of the lecture. Every result should be read with an evidence label. Some claims are proved in Lean 4. Others are tested empirically, implemented as runtime mechanisms, simulated under bounded scenarios, or conjectured because they are specified but not yet closed as theorems. These categories are not interchangeable. A tested latency result is useful, but it is not a formal invariant. An implemented circuit breaker is important engineering infrastructure, but the proof may cover only the abstract transition model, not every line of production code or every deployment condition.
The highest-confidence category is Proved. In this lecture, that means a Lean 4 theorem with no sorry, proving the stated bounded property. Examples include trust termination, budget safety, constrained $\mathrm{PoA}$, and resource bounds in the RLM setting. The key phrase is the stated bounded property. A proof of trust termination does not prove that an agent will always make good decisions; it proves that, under the formal trust update and threshold assumptions, the modeled process reaches a terminating condition. Likewise, a budget theorem does not prove that all financial behavior is safe in a colloquial sense; it proves that a particular guarded transition refuses actions that would exceed the limit.
A recurring pattern is the pre-execution guard. Before an action mutates state, the system checks whether the action is permitted. For a budget state $b$, action $a$, current committed cost $C$, proposed increment $\delta$, and limit $L$, the transition has the form
$$\mathrm{applyAction}(b,a)=\begin{cases}\mathrm{some}(b') & C+\delta \le L,\\ \mathrm{none} & C+\delta > L.\end{cases}$$
This small equation captures a major governance idea. The action either produces a new state $b'$, or it refuses to execute. The refusal case is not an error-handling afterthought; it is part of the formal transition relation. That matters because safety is enforced before the side effect, not discovered after the budget has already been violated.
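The guarded transition can be sketched in a few lines. This is an illustrative model, not the paper's Lean definition: the names `BudgetState` and `apply_action` are assumptions, the action is reduced to its cost increment $\delta$, and costs are taken to be non-negative integers in some minor unit so that the comparison is exact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetState:
    committed: int  # C: cost already committed (integer minor units, assumed)
    limit: int      # L: hard budget limit, same units


def apply_action(state: BudgetState, delta: int) -> Optional[BudgetState]:
    """Return the successor state if C + delta <= L, otherwise None (refusal)."""
    if delta < 0:
        raise ValueError("delta must be non-negative in this sketch")
    if state.committed + delta <= state.limit:
        # Guard passed: produce the new state b'.
        return BudgetState(state.committed + delta, state.limit)
    # Guard failed: refuse before any side effect occurs.
    return None


if __name__ == "__main__":
    b0 = BudgetState(committed=90, limit=100)
    print(apply_action(b0, 10))  # allowed: lands exactly on the limit
    print(apply_action(b0, 11))  # refused: would exceed the limit
```

Returning `None` mirrors the $\mathrm{none}$ branch: the caller cannot obtain a successor state without passing the guard, so refusal is part of the transition's type, not a separate error path.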
Other labels sit lower on the evidence ladder, but they are still meaningful when interpreted correctly. Tested claims refer to empirical validation of an implementation, such as latency measurements or behavior under adversarial blocking. Implemented claims mean the runtime mechanism exists (for example, a circuit breaker with states like $\mathrm{CLOSED}$, $\mathrm{OPEN}$, $\mathrm{HALF\text{-}OPEN}$, and $\mathrm{TERMINATED}$), but the proof usually applies to the modeled transition, not automatically to every implementation detail. Simulated claims explore bounded scenarios, such as coordination outcomes under $B=25n$. Conjectured claims are specified or axiomatized but not yet discharged as closed theorems, such as isolation properties involving $\mathrm{canAccess}(d,p,\mathit{pos})$.
This classification also prevents a common failure mode in reading formal-methods papers: treating a formal component as if it sanctifies the entire system. A theorem can be perfectly correct and still leave important deployment assumptions outside the proof boundary. For example:
the implementation must faithfully realize the modeled transition;
the guard must run before the side effect;
the parameters $C$, $\delta$, and $L$ must represent the relevant resource accurately;
external services must not bypass the guarded path;
semantic notions like “harmful,” “misleading,” or “properly aligned” may remain outside the formal model.
That is not a weakness of the approach; it is the discipline of the approach. The paper’s contribution is not an all-purpose guarantee about agent morality or semantic correctness. It is a framework for separating what is machine-checked, what is empirically observed, what is implemented, and what remains an assumption or conjecture. Once those boundaries are explicit, later results become easier to trust because they are harder to overclaim.
The visual below condenses this evidence discipline into a claim map. The table separates proof-backed results from tested, implemented, simulated, and conjectured claims, while keeping examples attached to each category. The important thing to notice is that the labels are not decorative; they control how strongly we are allowed to interpret each result.
The guarded transition equation beneath the table is the bridge into the next technical sections. It previews the pattern we will reuse for both budget safety and trust termination: formal guarantees attach to sharply defined state transitions. Runtime governance becomes machine-checkable only when the relevant decision point can be expressed as a bounded invariant—success when the guard permits it, refusal when the guard would be violated.

4. Beta Trust State and Circuit Breaker

Having separated proved claims from tested and assumed ones, we can now look at the smallest trust mechanism that is still strong enough to support a machine-checkable governance argument. The important move is to stop treating “trust” as an open-ended behavioral prediction and instead make it a bounded state variable that controls whether an action is allowed to proceed. In other words, trust is not merely an annotation on an agent; it becomes part of the transition system.
The state we track is intentionally simple: two natural-number counters and a threshold. Let $\alpha$ count observed successes and $\beta$ count observed failures. From these counters we compute a Laplace-smoothed Beta trust score
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2}.$$
This is the posterior mean of a Bernoulli success probability under a uniform $\mathrm{Beta}(1,1)$ prior, after observing $\alpha$ successes and $\beta$ failures. The smoothing matters operationally: before any observations, the trust score is
$$T(0,0)=\frac{1}{2},$$
rather than being undefined or forced to an extreme value. That makes the mechanism well behaved at initialization, which is exactly the kind of edge case that formal verification tends to expose.
For finite natural-number counters, the score is always strictly between $0$ and $1$:
$$0<T(\alpha,\beta)<1.$$
The lower bound follows because $\alpha+1>0$. The upper bound follows because
$$\alpha+1<\alpha+\beta+2$$
for every $\beta\in\mathbb{N}$. This may look mathematically trivial, but it is structurally important. It means the trust value lives in a known bounded interval, so comparisons against a threshold $\theta$ can be reasoned about without hidden exceptional cases such as division by zero, negative probabilities, or uninitialized scores.
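The score and its bounds can be checked concretely with exact rationals. The sketch below uses Python's `fractions.Fraction` to avoid rounding; the function names `trust_score` and `bounds_hold` are illustrative assumptions, and the finite sweep is a sanity check, not a substitute for the proof.

```python
from fractions import Fraction


def trust_score(alpha: int, beta: int) -> Fraction:
    """Laplace-smoothed Beta trust score T(alpha, beta) = (alpha+1)/(alpha+beta+2)."""
    if alpha < 0 or beta < 0:
        raise ValueError("counters must be natural numbers")
    return Fraction(alpha + 1, alpha + beta + 2)


def bounds_hold(max_count: int = 30) -> bool:
    """Check 0 < T(alpha, beta) < 1 on a finite grid of counters."""
    return all(
        0 < trust_score(a, b) < 1
        for a in range(max_count)
        for b in range(max_count)
    )


if __name__ == "__main__":
    print(trust_score(0, 0))  # score before any observations
    print(bounds_hold())
```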
The governance mechanism then wraps this score in a circuit breaker. The system has states such as
$$\mathrm{CLOSED},\quad \mathrm{OPEN},\quad \mathrm{HALF\text{-}OPEN},\quad \mathrm{TERMINATED}.$$
The naming follows the usual circuit-breaker intuition: when the breaker is CLOSED, execution is permitted; when it is OPEN, execution is blocked. The key transition is the pre-execution guard
$$\mathrm{CLOSED}\xrightarrow{\;T(\alpha,\beta)<\theta\;}\mathrm{OPEN}.$$
So the trust mechanism is not saying, “this agent is unsafe in all possible futures.” It is saying something narrower and more checkable: given the current finite counters, if the smoothed trust score falls below the configured threshold, the state machine must move from an execution-permitting state to a blocking state.
This distinction is central to the paper’s claim-classification discipline. A broad behavioral safety claim would require assumptions about future environments, adversarial behavior, model generalization, and deployment context. By contrast, the circuit-breaker property is a local invariant about a transition rule. It can be encoded as arithmetic over counters and state constructors, which is much closer to what a proof assistant can verify.
There is also an operational asymmetry built into the Beta score. A single failure does not necessarily cause an immediate shutdown, especially if the threshold is modest. For example, with $\theta=0.3$, the system can tolerate occasional noise or one-off mistakes. But repeated failures increase $\beta$, enlarging the denominator while leaving the numerator $\alpha+1$ unchanged, so
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2}$$
moves downward as failures accumulate. This gives the policy a useful shape: it is not brittle under isolated errors, but it is responsive to sustained degradation.
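A small sketch makes this shape concrete for $\theta=0.3$. It models only the $\mathrm{CLOSED}\to\mathrm{OPEN}$ transition described in this section; the other states are listed but their transitions are not specified here, so the sketch leaves them unchanged. The names `Breaker`, `trust_score`, and `guard` are illustrative assumptions.

```python
from enum import Enum
from fractions import Fraction


class Breaker(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF-OPEN"
    TERMINATED = "TERMINATED"


def trust_score(alpha: int, beta: int) -> Fraction:
    return Fraction(alpha + 1, alpha + beta + 2)


def guard(state: Breaker, alpha: int, beta: int, theta: Fraction) -> Breaker:
    """Apply only the CLOSED -> OPEN rule; every other state is returned as-is."""
    if state is Breaker.CLOSED and trust_score(alpha, beta) < theta:
        return Breaker.OPEN
    return state


if __name__ == "__main__":
    theta = Fraction(3, 10)
    # A fresh agent (alpha = 0): one failure gives T = 1/3, two give T = 1/4.
    print(guard(Breaker.CLOSED, 0, 1, theta))  # stays CLOSED
    print(guard(Breaker.CLOSED, 0, 2, theta))  # moves to OPEN
```

With no recorded successes, one failure leaves the score at $1/3$, which is not below $0.3$, while a second failure drops it to $1/4$ and opens the breaker.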
That responsiveness depends on a few assumptions that should be made explicit. The counters must faithfully reflect the events the governance layer cares about; otherwise the formal guarantee applies to the logged abstraction, not to reality. The threshold $\theta$ must be chosen within the same semantic scale as the trust score, typically $0<\theta<1$. And the action gate must actually consult the circuit-breaker state before execution. If an implementation bypasses the state machine, the arithmetic proof remains true but no longer protects the deployed system.
The visual below compactly organizes these ideas into two halves. On the left, the Beta trust score is shown as a bounded computation from success and failure counters. On the right, that numeric score is connected to the circuit-breaker transition that matters for governance: once $T(\alpha,\beta)<\theta$, the system moves from CLOSED to OPEN.
The point of the diagram is not merely to define notation. It emphasizes the architectural pattern: trust is computed before execution, compared against a threshold, and then converted into a discrete control state. That conversion—from a smooth Bayesian estimate to a finite machine state—is what makes the later termination and safety arguments amenable to Lean.

5. Lean Encoding: Remove Rational Arithmetic

The circuit breaker from the previous section gives us the operational story: when the Bayesian trust estimate drops below a threshold, the system should move from $\mathrm{CLOSED}$ to $\mathrm{OPEN}$. But if we want this to be more than an informal design rule, we need the guard to be stated in a form that a proof assistant can check reliably. The key move is to avoid asking Lean to reason about floating-point or rational-valued trust scores directly. Instead, we translate the threshold comparison into a pure natural-number inequality.
Recall that the Beta trust score is the posterior mean of a Beta distribution with one unit of prior mass on each outcome:
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2}.$$
Here $\alpha$ counts successful or trustworthy observations, and $\beta$ counts failures or untrustworthy observations. The $+1$ and $+2$ terms come from the Beta prior; they also guarantee that the denominator is always positive. That positivity is not a cosmetic detail. It is exactly what makes the algebraic transformation below sound without having to reason about division by zero or sign changes.
The threshold is encoded as a rational number $p/q$, with the assumption
$$0 < p < q.$$
This means the threshold lies strictly between $0$ and $1$, which is the natural range for a trust probability. Rather than compare
$$\frac{\alpha+1}{\alpha+\beta+2} < \frac{p}{q},$$
we define the numerator and denominator components separately:
$$\mathrm{trustNum}(\alpha)=\alpha+1, \qquad \mathrm{trustDen}(\alpha,\beta)=\alpha+\beta+2.$$
Since $q>0$ and $\mathrm{trustDen}(\alpha,\beta)>0$, we can safely cross-multiply:
$$T(\alpha,\beta)<\frac{p}{q} \iff \mathrm{trustNum}(\alpha)\,q < p\,\mathrm{trustDen}(\alpha,\beta).$$
Substituting the definitions gives the machine-checkable predicate
$$\mathrm{trustBelowThreshold}(\alpha,\beta,p,q) \equiv (\alpha+1)\,q < p\,(\alpha+\beta+2).$$
This is the central encoding trick: the trust guard is no longer a statement about rational arithmetic. It is a statement about multiplication and strict inequality over natural numbers.
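The equivalence between the rational comparison and the integer predicate can be spot-checked by brute force. The sketch below mirrors the names $\mathrm{trustNum}$, $\mathrm{trustDen}$, and $\mathrm{trustBelowThreshold}$ in Python; the helper `agrees_with_rational_form` and its grid bounds are assumptions added for illustration, and a finite sweep is evidence, not a proof.

```python
from fractions import Fraction


def trust_num(alpha: int) -> int:
    return alpha + 1


def trust_den(alpha: int, beta: int) -> int:
    return alpha + beta + 2


def trust_below_threshold(alpha: int, beta: int, p: int, q: int) -> bool:
    """Integer-only guard: (alpha+1)*q < p*(alpha+beta+2)."""
    return trust_num(alpha) * q < p * trust_den(alpha, beta)


def agrees_with_rational_form(max_count: int = 20, q: int = 10) -> bool:
    """Compare the integer guard with the exact rational comparison on a grid."""
    for alpha in range(max_count):
        for beta in range(max_count):
            for p in range(1, q):  # enforces 0 < p < q
                rational = Fraction(alpha + 1, alpha + beta + 2) < Fraction(p, q)
                if rational != trust_below_threshold(alpha, beta, p, q):
                    return False
    return True


if __name__ == "__main__":
    print(agrees_with_rational_form())
```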
That matters because theorem provers are much better behaved when the specification avoids unnecessary numeric domains. Rational arithmetic brings extra proof obligations: denominators must be nonzero, inequalities may require normalization, coercions between $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ can obscure the intended argument, and simplification may depend on lemmas that are not automatically applied. By contrast, the natural-number inequality exposes exactly the invariant we care about: the trust numerator scaled by $q$ is smaller than the trust denominator scaled by $p$.
There are also important failure modes this encoding prevents. A production implementation might be tempted to compute $T(\alpha,\beta)$ as a floating-point number and compare it to a decimal threshold. That introduces rounding behavior into the safety boundary. A trust score extremely close to the threshold can be classified differently depending on precision, language runtime, or serialization format. The Lean predicate avoids this entirely: the guard is an exact arithmetic statement.
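One concrete instance of this hazard, as a hedged illustration: take $\alpha=2$, $\beta=6$, so the score is exactly $3/10$, and a threshold of $3/10$. The exact guard says the score is not below the threshold. A floating-point guard agrees if the threshold is written as the literal `0.3`, but disagrees if the same threshold is produced by arithmetic such as `0.1 * 3`, which rounds to a slightly larger IEEE 754 double. The function names are assumptions for this sketch.

```python
def below_threshold_float(alpha: int, beta: int, theta: float) -> bool:
    """Floating-point guard: subject to rounding in both score and threshold."""
    return (alpha + 1) / (alpha + beta + 2) < theta


def below_threshold_exact(alpha: int, beta: int, p: int, q: int) -> bool:
    """Exact integer guard: (alpha+1)*q < p*(alpha+beta+2)."""
    return (alpha + 1) * q < p * (alpha + beta + 2)


if __name__ == "__main__":
    alpha, beta = 2, 6  # score is exactly 3/10
    print(below_threshold_exact(alpha, beta, 3, 10))     # False: equal, not below
    print(below_threshold_float(alpha, beta, 0.3))       # False: literal threshold
    print(below_threshold_float(alpha, beta, 0.1 * 3))   # True: rounded threshold
```

The disagreement comes only from how the threshold was computed, which is exactly the kind of incidental detail that should not decide whether a breaker opens.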
The assumptions should be kept explicit:
$q>0$ is required so multiplying by $q$ preserves inequality.
$\alpha+\beta+2>0$ is guaranteed by the Beta prior and natural-number counts.
$0<p<q$ ensures the threshold is a meaningful probability threshold, not an arbitrary rational outside the trust range.
Strict inequality matters: the breaker opens only when trust is below the threshold, not merely equal to it.
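In an implementation, these assumptions are natural candidates for construction-time validation, so that a malformed threshold can never reach the guard. The following is one possible sketch; the `Threshold` class and `breaker_should_open` function are hypothetical names, not part of the paper's development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    p: int
    q: int

    def __post_init__(self) -> None:
        # Enforce 0 < p < q, which also implies q > 0.
        if not (0 < self.p < self.q):
            raise ValueError("threshold p/q must satisfy 0 < p < q")


def breaker_should_open(alpha: int, beta: int, threshold: Threshold) -> bool:
    """Strict guard: True only when trust is strictly below p/q."""
    if alpha < 0 or beta < 0:
        raise ValueError("counters must be natural numbers")
    return (alpha + 1) * threshold.q < threshold.p * (alpha + beta + 2)


if __name__ == "__main__":
    t = Threshold(3, 10)
    print(breaker_should_open(2, 6, t))  # score equals the threshold: stays closed
    print(breaker_should_open(2, 7, t))  # one more failure: strictly below
```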
The payoff is that the circuit-breaker transition
$$\mathrm{CLOSED}\to\mathrm{OPEN}$$
can be guarded by a predicate Lean can check using ordinary arithmetic over natural numbers. This is a small encoding choice with a large verification consequence: it turns a probabilistic-looking condition into a discrete invariant suitable for mechanized proof.
The visual below condenses this transformation into the core identity. The left side is the conceptual trust comparison, $T(\alpha,\beta)<p/q$. The right side is the proof-engineering form, $\mathrm{trustNum}(\alpha)\,q < p\,\mathrm{trustDen}(\alpha,\beta)$. The definitions of $\mathrm{trustNum}$ and $\mathrm{trustDen}$ act as the bridge between the Bayesian model and the natural-number predicate.
The final predicate box represents the form that actually belongs in the Lean development:
$$\mathrm{trustBelowThreshold}(\alpha,\beta,p,q) \equiv (\alpha+1)q < p(\alpha+\beta+2).$$
Once the guard has this shape, the $\mathrm{CLOSED}\to\mathrm{OPEN}$ transition is no longer justified by an informal rational comparison; it is justified by an exact, machine-checkable inequality.
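The predicate can be mirrored outside Lean as a plain integer comparison. The following is a sketch, not the Lean source: the Python names `trust_num`, `trust_den`, and `trust_below_threshold` are transliterations chosen for illustration, and the reference function `trust` exists only to cross-check against exact rationals.

```python
from fractions import Fraction


def trust_num(alpha: int) -> int:
    """Numerator of the trust score: alpha + 1."""
    return alpha + 1


def trust_den(alpha: int, beta: int) -> int:
    """Denominator of the trust score: alpha + beta + 2 (always positive)."""
    return alpha + beta + 2


def trust_below_threshold(alpha: int, beta: int, p: int, q: int) -> bool:
    """True exactly when T(alpha, beta) < p/q, using integers only."""
    if alpha < 0 or beta < 0:
        raise ValueError("counts must be natural numbers")
    if not (0 < p < q):
        raise ValueError("threshold must satisfy 0 < p < q")
    return trust_num(alpha) * q < p * trust_den(alpha, beta)


def trust(alpha: int, beta: int) -> Fraction:
    """Reference value as an exact rational, used only for cross-checking."""
    return Fraction(trust_num(alpha), trust_den(alpha, beta))
```

With no evidence and a threshold of $1/2$, the score equals the threshold, so `trust_below_threshold(0, 0, 1, 2)` is false: equality does not open the breaker. One failure is enough to make it true.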

6. Two Ingredients: Degradation and Reachability

After eliminating rational arithmetic from the Lean encoding, the termination argument has a much sharper shape. We are no longer trying to reason directly about real-valued trust scores or floating-point comparisons. Instead, every claim must be reducible to inequalities over natural numbers. That shift is not just a proof-engineering convenience: it is what makes the guarantee machine-checkable in a small, auditable fragment of arithmetic.
Recall the Bayesian trust score has the form
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2},$$
where $\alpha$ counts successful observations and $\beta$ counts failures. In the termination setting, we ask what happens when an agent keeps failing while successes stop accumulating. Intuitively, trust should eventually fall below any fixed rational threshold $p/q$, assuming $0<p<q$. But intuition is not enough for a verified termination theorem. We need two precise facts:
each additional failure makes trust strictly smaller;
sufficiently many failures can push trust below the threshold.
The first fact is monotonic degradation. Holding $\alpha$ fixed, increasing $\beta$ by one increases only the denominator of the trust fraction:
$$T(\alpha,\beta+1)<T(\alpha,\beta).$$
Substituting the definition of $T$, this is
$$\frac{\alpha+1}{\alpha+\beta+3} < \frac{\alpha+1}{\alpha+\beta+2}.$$
Over the rationals, this is visually obvious: same positive numerator, larger denominator. But in the Lean proof we want to avoid rational division, so we cross-multiply into a natural-number inequality:
$$(\alpha+1)(\alpha+\beta+2) < (\alpha+1)(\alpha+\beta+3).$$
This inequality holds for exactly the reason we expect: $\alpha+1>0$, and
$$\alpha+\beta+2 < \alpha+\beta+3.$$
The subtle point is that strictness matters. If the numerator could be zero, multiplying both sides by it would collapse the strict inequality into an equality. But $\alpha+1$ is always positive for natural $\alpha$, so every additional failure strictly decreases trust. There is no plateau and no rounding artifact hiding inside the model.
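As a quick sanity check of the cross-multiplied form (a bounded enumeration, not a substitute for the Lean lemma), the sketch below evaluates the natural-number inequality on a small grid and compares it with exact rationals. The helper name `degrades` is illustrative.

```python
from fractions import Fraction


def degrades(alpha: int, beta: int) -> bool:
    """Cross-multiplied form of T(alpha, beta + 1) < T(alpha, beta)."""
    num = alpha + 1
    return num * (alpha + beta + 2) < num * (alpha + beta + 3)


def trust(alpha: int, beta: int) -> Fraction:
    """Exact rational trust score, for comparison only."""
    return Fraction(alpha + 1, alpha + beta + 2)


# Worked instance: alpha = 3, beta = 1 moves trust from 4/6 to 4/7,
# and the integer form reads 4 * 6 < 4 * 7, that is 24 < 28.
example_holds = degrades(3, 1)

# Every point of a 50 x 50 grid satisfies the integer inequality.
grid_holds = all(degrades(a, b) for a in range(50) for b in range(50))
```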
The second fact is threshold reachability. Monotone degradation alone says trust keeps going down, but it does not by itself construct the number of failures needed to cross a particular threshold. For termination, we need an explicit witness: given a rational threshold $p/q$ with $0<p<q$, produce some $\beta^*$ such that
$$\mathrm{trustBelowThreshold}(\alpha,\beta^*,p,q)$$
holds.
A simple constructive choice is
$$\beta^* = q(\alpha+1).$$
This may not be the smallest possible number of failures, but minimality is irrelevant for the termination theorem. What matters is that it is easy to verify using only natural-number inequalities. The trust-below-threshold comparison is
$$\frac{\alpha+1}{\alpha+\beta^*+2} < \frac{p}{q},$$
which becomes, after cross-multiplication,
$$(\alpha+1)q < p(\alpha+\beta^*+2).$$
With $\beta^*=q(\alpha+1)$, the left side is strictly smaller than the denominator term, because the denominator is $q(\alpha+1)$ plus the positive surplus $\alpha+2$:
$$(\alpha+1)q < \alpha+q(\alpha+1)+2.$$
And since $p>0$ means $p\ge 1$ over the natural numbers, multiplying the positive denominator term by $p$ cannot make it smaller. Thus we get the chain
$$(\alpha+1)q < \alpha+q(\alpha+1)+2 \le p\bigl(\alpha+q(\alpha+1)+2\bigr) = p(\alpha+\beta^*+2).$$
This proves the desired threshold condition:
$$\mathrm{trustBelowThreshold}(\alpha,\beta^*,p,q).$$
The important modeling assumption here is that failures can continue to accumulate while $\alpha$ remains fixed. If successes can interleave arbitrarily, the proof must reason about the balance of successes and failures instead. This section isolates the pure failure case: once an agent is in a sustained-failure regime, the trust score is guaranteed to move downward and eventually cross any positive rational cutoff below $1$.
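The three-term chain can be replayed numerically. This sketch computes the witness and checks each link of the chain with integers only; `beta_star` and `chain_holds` are illustrative names, and the grid check is evidence for the lemma, not a proof of it.

```python
def beta_star(alpha: int, q: int) -> int:
    """Constructive witness from the text: q * (alpha + 1) failures."""
    return q * (alpha + 1)


def chain_holds(alpha: int, p: int, q: int) -> bool:
    """Check (alpha+1)q < alpha + beta* + 2 <= p(alpha + beta* + 2)."""
    b = beta_star(alpha, q)
    lhs = (alpha + 1) * q          # left side of the guard
    mid = alpha + b + 2            # denominator term at the witness
    rhs = p * mid                  # right side of the guard
    return lhs < mid <= rhs


# Every valid threshold 0 < p < q on a small grid is reached by the witness.
witness_ok = all(
    chain_holds(a, p, q)
    for a in range(40)
    for q in range(2, 12)
    for p in range(1, q)
)
```

For example, with $\alpha=99$ and threshold $1/2$ the witness is $\beta^*=200$, and the chain reads $200 < 301 \le 301$.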
These two ingredients play different roles in the final termination theorem. Degradation gives the local step: one more failure strictly lowers trust. Reachability gives the global witness: there exists a finite failure count that puts trust below the cutoff. Together, they convert a behavioral statement — “continued bad performance should eventually terminate trust” — into a bounded arithmetic fact over natural numbers.
The visual below condenses this structure into two columns. The degradation side tracks the single-step decrease $T(\alpha,\beta+1)<T(\alpha,\beta)$ and its cross-multiplied natural-number form. The reachability side highlights the constructive witness $\beta^*=q(\alpha+1)$, then follows the inequality chain that proves the threshold can be crossed.
The point of the picture is not to add a new argument, but to separate the proof obligations cleanly. One lemma says the trust curve moves in the right direction after every failure; the other says the curve cannot stay above a rational threshold forever. Those are precisely the arithmetic lemmas the next theorem will assemble into guaranteed trust termination.

7. Theorem: Guaranteed Trust Termination

With degradation and reachability in place, the informal safety story can now be sharpened into a termination statement. The key question is not merely whether trust decreases after failures. It is whether repeated failures are guaranteed to eventually force the trust estimate below any valid intervention threshold. That distinction matters in production governance: a system that “usually” lowers trust is not enough if some highly trusted agent can remain above threshold forever because of a large prior success history.
The trust model here is the Laplace-smoothed Beta estimate
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2},$$
where $\alpha$ counts observed successes and $\beta$ counts observed failures. The $+1$ and $+2$ terms are the usual smoothing terms: before observing anything, the estimate is $1/2$, rather than being undefined or extreme. Operationally, $\alpha$ is the accumulated evidence in favor of the agent, while $\beta$ is accumulated evidence against it.
Now suppose the agent has some finite current state $(\alpha,\beta_0)$. We then observe a run of consecutive failures, so the failure counter becomes $\beta_0+n$. The trust value after those failures is
$$T(\alpha,\beta_0+n) = \frac{\alpha+1}{\alpha+\beta_0+n+2}.$$
The numerator stays fixed because no new successes are occurring. The denominator grows without bound as $n$ increases. Intuitively, every additional failure dilutes the fixed evidence of past success. Even if $\alpha$ is very large, it is still finite, and a sufficiently long run of failures eventually overwhelms it.
The theorem states this as a reachability guarantee:
$$\textbf{Theorem.}\quad \forall \alpha,\beta_0,p,q.\; 0<p<q \Rightarrow \exists n.\; T(\alpha,\beta_0+n)<\frac{p}{q}.$$
Here $p/q$ is a rational representation of the shutdown threshold $\theta$, with $0<p<q$, so the threshold lies strictly between $0$ and $1$. The theorem says: for any finite starting trust counters, and for any valid positive threshold below one, there exists some finite number of future consecutive failures that pushes trust below that threshold.
This is stronger than saying trust merely “tends toward zero.” A limit statement such as
$$\lim_{n\to\infty} T(\alpha,\beta_0+n)=0$$
captures the asymptotic intuition, but a termination argument needs a finite witness. The theorem gives exactly that: an $n$ after which the guard condition is crossed. In a machine-checked setting, this matters because the circuit breaker is not triggered by a vague trend; it is triggered by a concrete comparison becoming true.
The production consequence is simple but important: no finite prior success count gives permanent immunity. An agent may have an excellent history, encoded by a large $\alpha$, but if it begins failing persistently, the model guarantees that it will eventually fall below any fixed rational threshold. A high prior reputation can delay intervention, but it cannot prevent intervention forever.
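To make "delay but not immunity" concrete, the sketch below steps the failure counter until the integer guard fires. It assumes the cross-multiplied guard from the encoding section; `failures_until_open` is a hypothetical helper, and its loop terminates precisely because of the theorem above.

```python
def trust_below_threshold(alpha: int, beta: int, p: int, q: int) -> bool:
    """Integer form of T(alpha, beta) < p/q."""
    return (alpha + 1) * q < p * (alpha + beta + 2)


def failures_until_open(alpha: int, beta0: int, p: int, q: int) -> int:
    """Smallest n >= 0 such that T(alpha, beta0 + n) < p/q."""
    if not (0 < p < q):
        raise ValueError("threshold must satisfy 0 < p < q")
    n = 0
    while not trust_below_threshold(alpha, beta0 + n, p, q):
        n += 1
    return n


# Threshold 1/2, no prior failures: a longer success history needs a longer
# run of failures, but the run is always finite.
needed = {a: failures_until_open(a, 0, 1, 2) for a in (9, 99, 999)}
```

In this setting the required run grows linearly with the success history (10, 100, and 1000 failures respectively), and existing failures shorten it: starting from $(\alpha,\beta_0)=(99,50)$ only 50 more are needed.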
There are also assumptions worth keeping explicit. The theorem relies on the specific Beta trust update rule above, including Laplace smoothing. It also assumes sustained observed failures: $\alpha$ is held fixed while $\beta$ increases. If successes and failures are interleaved, the theorem does not directly say that trust must cross the threshold, because successes can raise or stabilize the estimate. And if the threshold is invalid, for example zero or at least one, the statement degenerates: a threshold of zero can never be crossed because $T$ is always positive, while a threshold at or above one is crossed by every state because $T$ is always strictly below $1$, which is not the intended shutdown policy.
In the governance architecture, this mathematical crossing connects directly to the circuit-breaker transition. Once
$$T(\alpha,\beta) < \theta,$$
the agent’s state can move from $\mathrm{CLOSED}$ to $\mathrm{OPEN}$, disabling or blocking further risky action. The theorem therefore turns repeated failures into a bounded, machine-checkable termination guarantee: not “the agent is probably unsafe,” but “after finitely many failures, the formal guard must become true.”
The visual below condenses this theorem into its operational shape. The central formula captures the quantified guarantee: for every finite starting state and every rational threshold $p/q$, there exists a finite failure count $n$ that makes trust fall below the threshold. The side annotations separate the mathematical roles of $\alpha$, $\beta_0$, $p/q$, and $n$ from their system meaning.
The circuit-breaker cue is the practical payoff. The inequality is not just an abstract comparison; it is the condition that moves the system from normal operation to intervention. Read this theorem as the bridge between Bayesian evidence accumulation and enforceable governance: sustained failures cannot be hidden behind a finite stockpile of past successes.

8. Proof: Guaranteed Trust Termination

Having stated the termination theorem, the next question is whether it is merely intuitively true or whether we can exhibit a concrete witness that a proof assistant can check. The important shift is from saying “eventually enough failures will reduce trust” to saying: here is an explicit number of future failures $n$, and here is the inequality chain proving that this $n$ is sufficient.
Recall the trust score has the Beta-posterior mean form
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2},$$
where $\alpha$ counts successes and $\beta$ counts failures, with the $+1,+2$ terms coming from the prior. In the termination setting, $\alpha$ is held fixed: we are asking what happens if an agent accumulates failures without compensating successes. This is exactly the circuit-breaker regime: once the system is observing repeated adverse outcomes, the failure counter grows, and trust should eventually fall below a rational threshold $p/q$.
The threshold assumption is
$$0<p<q,$$
so $p/q$ is a proper trust cutoff below $1$. That matters because the Beta trust score is always positive and can be close to $1$ when failures are few relative to successes. A termination guarantee must therefore prove not just that trust decreases, but that it decreases far enough to cross an arbitrary proper threshold.
The constructive trick is to choose a deliberately large failure count:
$$\beta^*=q(\alpha+1), \qquad n=\beta^*.$$
This $\beta^*$ is not necessarily the smallest number of failures needed. It is a sufficient bound, chosen because it makes the arithmetic easy and machine-checkable. In formal verification, this is often the right tradeoff: we prefer a simple witness with a short proof over a tight witness that requires delicate algebra.
From the earlier reachability lemma, this choice satisfies
$$T(\alpha,\beta^*)<\frac{p}{q}.$$
Intuitively, $\beta^*$ makes the denominator large enough compared with the fixed numerator $\alpha+1$. Since $q$ is the denominator of the threshold, choosing failures proportional to $q(\alpha+1)$ guarantees that the posterior mean can be pushed below $p/q$. This is the core reachability fact: for any fixed success history and any proper rational threshold, there exists a failure count that drives trust below the cutoff.
But the theorem we need is slightly stronger than the reachability lemma. We are not starting from zero failures; the agent may already have some existing failure count $\beta_0$. After choosing $n=\beta^*$ future failures, the actual failure counter becomes
$$\beta_0+n=\beta_0+\beta^*.$$
Since $\beta_0\ge 0$, this is at least $\beta^*$. Therefore the denominator of the actual trust score is at least as large as the denominator in the reachability lemma:
$$\alpha+\beta_0+\beta^*+2 \ge \alpha+\beta^*+2.$$
Now the numerator is unchanged: it is still $\alpha+1$. With a fixed positive numerator, increasing the denominator can only decrease the fraction. Thus
$$T(\alpha,\beta_0+n) = \frac{\alpha+1}{\alpha+\beta_0+\beta^*+2} \le \frac{\alpha+1}{\alpha+\beta^*+2} = T(\alpha,\beta^*).$$
Combining this monotonicity step with reachability gives the final chained inequality:
$$T(\alpha,\beta_0+n) \le T(\alpha,\beta^*) < \frac{p}{q}.$$
So we have proved the desired existential statement:
$$\exists n.\;T(\alpha,\beta_0+n)<\frac{p}{q}.$$
The subtle but important point is that the proof does not rely on simulation, asymptotics, or an informal appeal to “eventually.” It constructs a specific $n$, namely $n=\beta^*$, and then proves that even if the agent already had prior failures $\beta_0$, those failures only help termination by further increasing the denominator. This is why the theorem is robust to the current failure state.
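The same argument can be written without fractions, which is presumably closer to what a natural-number development would check. As a sketch under that assumption, take the cross-multiplied reachability fact at $\beta^*=q(\alpha+1)$ and enlarge the right-hand side by the nonnegative term $p\,\beta_0$:
$$(\alpha+1)q < p(\alpha+\beta^*+2) \le p(\alpha+\beta^*+2) + p\,\beta_0 = p(\alpha+\beta_0+\beta^*+2).$$
The first step is the reachability lemma, the second uses only $p\,\beta_0\ge 0$, and the final expression is exactly the right-hand side of $\mathrm{trustBelowThreshold}(\alpha,\beta_0+n,p,q)$ with $n=\beta^*$.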
This is also why the result fits the paper’s broader claim-classification discipline. The proved claim is not “the agent will behave safely forever” or “trust estimates are semantically perfect.” The proved claim is narrower and stronger: under the specified update model, if failures continue while successes do not increase, then the trust score must cross the threshold after a bounded number of failures, and that bound is explicitly checkable.
The visual below compresses the proof into the same three moves: first choose the constructive witness $\beta^*=q(\alpha+1)$, then import the reachability inequality $T(\alpha,\beta^*)<p/q$, and finally use monotonicity in the failure counter to transfer that inequality to the actual state $\beta_0+n$.
Read the proof ladder from top to bottom: more failures mean a larger denominator; a larger denominator with the same numerator means no larger trust; and no larger trust than an already-below-threshold value is itself below threshold. That ladder is the machine-checkable heart of guaranteed trust termination.

9. Algorithm: Trust Circuit-Breaker Update

Having proved that sustained failures must eventually drive Bayesian trust below a rational threshold, the next question is deliberately mundane: what does the runtime system actually do with that theorem? A termination guarantee is only useful in production governance if it becomes a small, auditable transition rule—something that can run before the next risky action, be checked by integer arithmetic, and fail closed when the trust condition is violated.
The trust circuit-breaker is exactly that operational form. The agent or service maintains a Beta-style evidence state $(\alpha,\beta)$, where $\alpha$ counts observed successes and $\beta$ counts observed failures. Rather than trying to certify a broad behavioral claim like “this agent is safe,” the system maintains a narrow invariant: after each observation, recompute the trust score and decide whether the breaker is CLOSED or OPEN.
The score is the posterior mean under a $\mathrm{Beta}(\alpha+1,\beta+1)$ convention:
$$T(\alpha,\beta)=\frac{\alpha+1}{\alpha+\beta+2}.$$
The $+1,+1$ terms matter. They encode a uniform prior, avoiding degenerate behavior when no evidence has been observed yet. With $\alpha=\beta=0$, the score begins at $1/2$, not at an undefined value. Each success increases $\alpha$; each failure increases $\beta$. The circuit-breaker then compares the updated score against a policy threshold $\theta$.
In mathematical prose, the update is simple:
if the outcome $o$ is a success, move from $(\alpha,\beta)$ to $(\alpha+1,\beta)$;
otherwise, move to $(\alpha,\beta+1)$;
compute $T$ on the updated counters;
set the state to OPEN when the updated score satisfies $T<\theta$, and CLOSED otherwise.
The subtle but important implementation detail is that the runtime does not need floating-point division. If the threshold is represented as a rational number $\theta=p/q$, with $q>0$, then
$$\frac{\alpha+1}{\alpha+\beta+2} < \frac{p}{q}$$
is equivalent to the cross-multiplied integer comparison
$$(\alpha+1)q < p(\alpha+\beta+2).$$
In the machine-checked version, this appears as a predicate of the form
$$\mathrm{trustBelowThreshold}(\alpha,\beta,p,q): \quad \mathrm{trustNum}(\alpha)\,q < p\,\mathrm{trustDen}(\alpha,\beta).$$
This is the point where the proof becomes an engineering artifact. The theorem tells us that for any finite amount of prior success, repeated failures eventually make the inequality true. The algorithm turns that theorem into a state-machine guard: once the predicate holds, the breaker opens. Finite historical success can delay opening, but it cannot permanently mask an unbounded run of failures.
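The update rule can be sketched as a small pure function over $(\alpha,\beta,\mathrm{state})$. This is an illustration under stated assumptions, not the deployed implementation: the names `TrustState`, `Breaker`, and `update` are hypothetical, and the threshold is passed as an integer pair $(p,q)$.

```python
from dataclasses import dataclass
from enum import Enum


class Breaker(Enum):
    CLOSED = "CLOSED"   # normal execution
    OPEN = "OPEN"       # intervention: risky actions are blocked


@dataclass(frozen=True)
class TrustState:
    alpha: int = 0                      # observed successes
    beta: int = 0                       # observed failures
    breaker: Breaker = Breaker.CLOSED


def update(state: TrustState, success: bool, p: int, q: int) -> TrustState:
    """Apply one observation, then re-evaluate the integer guard."""
    if not (0 < p < q):
        raise ValueError("threshold must satisfy 0 < p < q")
    alpha = state.alpha + (1 if success else 0)
    beta = state.beta + (0 if success else 1)
    # Exact comparison: (alpha + 1) * q < p * (alpha + beta + 2).
    below = (alpha + 1) * q < p * (alpha + beta + 2)
    # As in the rule above, the state is recomputed on every observation,
    # so it is CLOSED again whenever trust is not below the threshold.
    return TrustState(alpha, beta, Breaker.OPEN if below else Breaker.CLOSED)


# Three prior successes, threshold 1/2, then four consecutive failures.
state = TrustState(alpha=3)
history = []
for _ in range(4):
    state = update(state, success=False, p=1, q=2)
    history.append(state.breaker)
```

In this run the breaker stays CLOSED for the first three failures, including the third, where trust equals the threshold exactly, and opens on the fourth.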
There are also important boundaries on what this algorithm claims. It does not prove that the observations are correctly labeled, that the environment is stationary, or that the threshold $\theta$ is the right policy choice. It proves a narrower and more valuable production property: given the observed success/failure stream and the specified threshold, the transition to OPEN is mechanically determined and guaranteed under sustained failures. That is the kind of claim that can be embedded in a governance layer without pretending to solve all of behavioral safety.
The visual below compresses this into the runtime pattern: update one Beta counter, evaluate the rational trust predicate, and transition the breaker state. The pseudocode box emphasizes that there is no hidden model inference step at enforcement time; the guard is just a small deterministic transition over $(\alpha,\beta,\mathrm{state})$.
The two-state diagram reinforces the operational meaning of the theorem. CLOSED is the normal execution state, while OPEN is the termination or intervention state. The downward transition labeled $T(\alpha,\beta)<\theta$ is the machine-checkable boundary: once the integer predicate is true, the system stops trusting the agent enough to continue unrestricted execution.

10. Algorithm: Guarded Budget Transition

The trust circuit-breaker gives us one kind of machine-checkable control: when evidence degrades far enough, the system stops trusting an agent before the trust score can be used to justify more risky behavior. Budget safety is the same design pattern applied to money. Instead of claiming that agents will “act responsibly” with shared resources, we define a total guarded transition: every attempted financial action either produces a next budget state that satisfies the invariant, or produces no next state at all.
Let the current budget state be $b$, and let the proposed action be $a$. We write
$$C=b.\mathrm{currentSpent},\qquad L=b.\mathrm{aggregateLimit},\qquad \delta=a.\mathrm{delta},\qquad \delta\ge 0.$$
Here $C$ is the amount already spent, $L$ is the maximum aggregate spend allowed, and $\delta$ is the additional spend requested by the action. The nonnegativity assumption $\delta\ge 0$ is important: this transition is modeling spending increments, not refunds or compensating credits. If negative deltas were allowed, the monotonic structure of the proof would change, because an action could lower the aggregate and then enable later spending in ways that require a richer accounting model.
The core algorithm is deliberately simple. Before executing the action, the system checks whether the proposed post-state would satisfy the budget limit:
$$C+\delta\le L.$$
If the guard is true, the transition returns a new state $b'$ with updated spending:
$$b'.\mathrm{currentSpent}=C+\delta, \qquad b'.\mathrm{aggregateLimit}=L.$$
If the guard is false, the transition returns $\mathrm{none}$. This is the key safety move: unsafe actions do not produce “bad states” that later components must clean up. They fail to produce a next state at all.
Formally, the transition has the shape
$$\mathrm{applyAction}(b,a) : \mathrm{Option}(\mathrm{BudgetState}).$$
The use of an option type is not cosmetic. It forces callers, implementations, and proofs to distinguish two cases:
$\mathrm{some}(b')$: the action was admitted, and there is a valid post-state;
$\mathrm{none}$: the action was rejected before execution.
That means the central fact we want from the algorithm is conditional on successful transition:
$$\mathrm{applyAction}(b,a)=\mathrm{some}(b') \Longrightarrow b'.\mathrm{currentSpent}=C+\delta \ \wedge\ b'.\mathrm{aggregateLimit}=L \ \wedge\ C+\delta\le L.$$
This implication is what makes the next theorem almost inevitable. The algorithm is not trying to infer safety after the fact; it constructs success only through the safety guard. In proof terms, the guard condition $C+\delta\le L$ is not an external assumption we hope remains true. It is part of the branch condition that witnesses the existence of $b'$.
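A minimal sketch of the guarded transition, with Python's `Optional` standing in for the option type. The field names follow the text in snake case; representing amounts as integers in minor currency units is an assumption made here to keep the comparison exact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BudgetState:
    current_spent: int      # C, assumed to be in integer minor units
    aggregate_limit: int    # L


@dataclass(frozen=True)
class Action:
    delta: int              # additional spend requested, must be >= 0


def apply_action(b: BudgetState, a: Action) -> Optional[BudgetState]:
    """Return the post-state if C + delta <= L, otherwise None."""
    if a.delta < 0:
        raise ValueError("delta must be nonnegative")
    if b.current_spent + a.delta <= b.aggregate_limit:
        # The only branch that constructs a post-state is the guarded one.
        return BudgetState(b.current_spent + a.delta, b.aggregate_limit)
    return None


budget = BudgetState(current_spent=90, aggregate_limit=100)
admitted = apply_action(budget, Action(delta=10))   # exactly reaches the limit
rejected = apply_action(budget, Action(delta=11))   # would exceed the limit
```

Callers have to handle the `None` case explicitly, which mirrors how the option type forces the two outcomes apart in the formal model.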
There is also an important systems lesson here. In a production multi-agent system, a budget check that is separated from the update can be vulnerable to a check/use race. Two agents might both observe that enough budget remains, both pass their local checks, and then both commit updates that jointly exceed the limit. The guarded transition should therefore be implemented as an atomic conditional update: the same operation that checks $C+\delta\le L$ also commits the new value $C+\delta$, or commits nothing.
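One way to sketch an atomic conditional update inside a single process is to hold a lock across both the check and the write. The `BudgetLedger` class below is hypothetical; a multi-process or distributed deployment would need an equivalent primitive in its datastore, which is outside what this example shows.

```python
import threading


class BudgetLedger:
    """In-process ledger whose check and commit happen under one lock."""

    def __init__(self, limit: int) -> None:
        self._lock = threading.Lock()
        self._spent = 0
        self._limit = limit

    def try_spend(self, delta: int) -> bool:
        """Commit spent + delta if it stays within the limit, else nothing."""
        if delta < 0:
            raise ValueError("delta must be nonnegative")
        with self._lock:
            if self._spent + delta <= self._limit:
                self._spent += delta
                return True
            return False

    @property
    def spent(self) -> int:
        with self._lock:
            return self._spent


# Fifty concurrent agents each request 3 units against a limit of 100.
ledger = BudgetLedger(limit=100)
results = []


def worker() -> None:
    results.append(ledger.try_spend(3))


threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

accepted = sum(results)
```

Regardless of scheduling, exactly 33 requests are admitted and the ledger ends at 99, because no request can observe a stale balance between its check and its write.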
This is why the budget invariant is stronger than a behavioral promise. We are not saying “agents should avoid overspending.” We are saying the state transition relation has no successful edge into an overspent state, assuming the implementation faithfully mirrors the atomic guarded update. The proof obligation then becomes crisp: inspect the definition of $\mathrm{applyAction}$, split on the guard, and observe that only the guarded branch can return $\mathrm{some}(b')$.
The visual below is a compact version of this reasoning. The pseudocode box emphasizes that the guard is evaluated before constructing the next state. The green path corresponds to the only successful transition, where $C+\delta\le L$ is available as evidence; the red path corresponds to rejection, where the system returns $\mathrm{none}$ rather than manufacturing an unsafe state.
The small implementation cue about an atomic conditional update is equally important. The mathematical transition and the runtime mechanism must line up: the proof is about a guarded state transition, so the deployed system must not split the guard and the write into separately interleavable operations. That alignment between formal model and implementation pattern is what lets the following Budget Safety Invariant be more than a paper theorem.

11. Theorem: Budget Safety Invariant

Having defined the guarded transition, we can now ask the question that matters for governance: what does the guard actually buy us? It is not enough to say that the system “checks the budget” before acting. In a production multi-agent setting, especially one involving financial actions, we want a statement that is precise enough to be machine-checked and narrow enough to be true without hidden behavioral assumptions.
The invariant here is simple but important: whenever the modeled transition successfully applies an action, the resulting budget state remains within its aggregate limit. Formally:
$$\textbf{Theorem.}\quad \forall b,a,b',\;\ \mathrm{applyAction}(b,a)=\mathrm{some}(b') \Rightarrow b'.\mathrm{currentSpent}\le b'.\mathrm{aggregateLimit}.$$
The phrase “successfully applies” is doing real work. The function $\mathrm{applyAction}$ does not always return a new budget. It returns an optional result: either $\mathrm{some}(b')$, meaning the action was accepted and produced a new state, or $\mathrm{none}$, meaning the guard rejected the action. The theorem is therefore not claiming that every attempted action is safe. It claims that every accepted action is safe in the resulting state.
This distinction is central to machine-checkable governance. A vague safety claim might say, “the agent will not overspend.” That is too broad: it could depend on external APIs, adversarial inputs, race conditions, accounting semantics, or human intervention. The theorem instead states a bounded invariant about one modeled transition. If the transition returns a successful output, then the output satisfies the budget bound. That is exactly the kind of claim a proof assistant can verify.
Using the notation from the transition rule, let
$$C=b.\mathrm{currentSpent},\qquad L=b.\mathrm{aggregateLimit},\qquad \delta=a.\mathrm{delta}.$$
The successful branch is only reachable when the guard has established
$$C+\delta\le L.$$
On that same branch, the transition constructs the new state so that
$$b'.\mathrm{currentSpent}=C+\delta, \qquad b'.\mathrm{aggregateLimit}=L.$$
Combining these facts gives the invariant almost immediately: since the new current spend is exactly $C+\delta$, and the limit is still $L$, the guard condition $C+\delta\le L$ becomes precisely
$$b'.\mathrm{currentSpent}\le b'.\mathrm{aggregateLimit}.$$
The theorem is universal:
$$\forall b,a,b'$$
means it ranges over all modeled budgets, all modeled actions, and all successful output states. This is stronger than testing a representative set of examples. A test might show that many accepted transitions preserve the bound; the theorem says that no accepted transition in the model can violate it.
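The gap between testing and the universal statement can be seen in a bounded enumeration. The sketch below checks every budget and action with components below a small bound, using a tuple model of the transition that is assumed here for brevity. It finds no violations, but it only covers the enumerated cases, whereas the theorem covers all of them.

```python
from itertools import product
from typing import Optional, Tuple

Budget = Tuple[int, int]  # (current_spent, aggregate_limit)


def apply_action(b: Budget, delta: int) -> Optional[Budget]:
    """Tuple model of the guarded transition."""
    spent, limit = b
    if spent + delta <= limit:
        return (spent + delta, limit)
    return None


def count_violations(bound: int) -> int:
    """Count accepted transitions whose post-state exceeds its limit."""
    violations = 0
    for spent, limit, delta in product(range(bound), repeat=3):
        result = apply_action((spent, limit), delta)
        if result is not None and result[0] > result[1]:
            violations += 1
    return violations


violations = count_violations(20)
```

The enumeration deliberately includes pre-states that are already over their limit. Those can never be accepted, since $C>L$ and $\delta\ge 0$ force the guard to fail, which matches the theorem's lack of any assumption on the pre-state.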
At the same time, the theorem is intentionally limited. It does not prove arbitrary correctness of a financial platform. It does not prove that the real-world bank balance matches the modeled budget, that concurrent agents cannot bypass this function, or that every possible external expense is represented by an action $a$. Those are separate claims, likely requiring additional models, implementation audits, or deployment assumptions. The value of this theorem is that it cleanly classifies one claim as proved: the guarded transition itself preserves the aggregate budget bound whenever it succeeds.
This is the pattern we want throughout the paper: replace broad behavioral promises with small, compositional, machine-checkable invariants. The budget theorem is a compact example. A guard establishes a precondition, the state update preserves the relevant fields in a controlled way, and the desired postcondition follows from substitution.
The visual below condenses this reasoning into the theorem shape: a successful call to applyAction is the antecedent, the budget safety inequality is the consequent, and the successful branch equations explain why the implication holds. The blue-to-green structure mirrors the logical movement from “accepted transition” to “safe resulting state.”
It also emphasizes the claim boundary. The bottom notes distinguish universal quantification from empirical testing, and distinguish safety of the modeled transition from correctness of an entire financial system. That boundary is not a weakness; it is what makes the result precise enough to prove.

12. Proof: Budget Safety Invariant

Having stated the Budget Safety Invariant, the next question is not whether the property sounds plausible, but why it is forced by the structure of the transition itself. This is the core pattern behind many machine-checkable safety results: we do not try to prove that an agent will “behave responsibly” in every possible economic context. Instead, we prove that the only executable branch of a transition function is guarded by exactly the inequality we want after execution.
The theorem is proved under the successful-execution hypothesis
$$h_{\mathrm{exec}}:\mathrm{applyAction}(b,a)=\mathrm{some}(b').$$
This hypothesis is important. The invariant is not claiming that every attempted financial action succeeds. It is claiming something narrower and stronger: if the action transition returns a new budget state $b'$, then that returned state respects the aggregate budget limit. Failed actions are allowed; unsafe successful actions are not.
The proof proceeds by unfolding the definition of $\mathrm{applyAction}(b,a)$. Internally, the function computes the would-be new spending total $C+\delta$, where $C$ is the current spend and $\delta$ is the cost increment induced by the action. It then checks the guard
$$C+\delta\le L,$$
where $L$ is the aggregate limit. This guard creates two exhaustive cases: either the proposed spend is within the limit, or it is not. The proof is essentially a case split over that Boolean or decidable proposition.
In the first case, the guard succeeds:
$$C+\delta\le L.$$
By the definition of the transition, $\mathrm{applyAction}$ returns
$$\mathrm{some}(b') \quad\text{with}\quad b'.\mathrm{currentSpent}=C+\delta, \quad b'.\mathrm{aggregateLimit}=L.$$
At this point the desired invariant is just substitution. Since $b'.\mathrm{currentSpent}$ is definitionally the updated spend $C+\delta$, and $b'.\mathrm{aggregateLimit}$ is definitionally the same limit $L$, the guard inequality becomes
$$b'.\mathrm{currentSpent}=C+\delta\le L=b'.\mathrm{aggregateLimit}.$$
So the successful branch carries the safety proof with it: the same comparison that allowed execution is exactly the comparison needed to prove the postcondition.
The second case is where the machine-checked nature of the proof becomes especially clean. Suppose the guard fails:
¬(C+δ≤L).\neg(C+\delta\le L).¬(C+δ≤L).
Then the definition of applyAction\mathrm{applyAction}applyAction says that the function returns none\mathrm{none}none. But the theorem is being proved under the hypothesis that execution succeeded:
hexec:applyAction(b,a)=some(b′).h_{\mathrm{exec}}:\mathrm{applyAction}(b,a)=\mathrm{some}(b').hexec​:applyAction(b,a)=some(b′).
After unfolding the failed branch, this would imply
none=some(b′),\mathrm{none}=\mathrm{some}(b'),none=some(b′),
which is impossible. In Lean, this is not a probabilistic argument, an appeal to intent, or a runtime test. It is a contradiction between two different constructors of an option type. A value cannot simultaneously be absent and present.
This is why the proof is so robust. The unsafe branch does not need a separate numerical argument showing that the invariant holds despite the failed guard. It simply cannot satisfy the successful-execution hypothesis. The transition either returns a safe updated budget state, or it returns no state at all. There is no third path where an over-budget state sneaks through as some(b′)\mathrm{some}(b')some(b′).
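The guarded transition described above can be modeled in a few lines of Python. This is a minimal sketch, not the paper's Lean development: the names `Budget`, `apply_action`, and `invariant_holds` are hypothetical, the action $a$ is reduced to its cost increment `delta`, and `None` stands in for $\mathrm{none}$. The loop at the end is an exhaustive test over a small range, which illustrates the invariant but does not replace the universally quantified proof.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Budget:
    current_spent: int    # C in the text
    aggregate_limit: int  # L in the text


def apply_action(b: Budget, delta: int) -> Optional[Budget]:
    """Guarded transition: return the updated state, or None if refused."""
    if b.current_spent + delta <= b.aggregate_limit:  # guard: C + delta <= L
        return replace(b, current_spent=b.current_spent + delta)
    return None  # refused; no updated state exists


def invariant_holds(b: Budget) -> bool:
    return b.current_spent <= b.aggregate_limit


# Small exhaustive check (assumed ranges): collect any successful
# transition whose result violates the invariant.
violations = []
for limit in range(6):
    for spent in range(limit + 1):
        for delta in range(10):
            out = apply_action(Budget(spent, limit), delta)
            if out is not None and not invariant_holds(out):
                violations.append((spent, delta, limit))

print(violations)  # []
```

The empty list mirrors the case split: whenever `apply_action` returns a state, the guard that allowed it is the same comparison `invariant_holds` evaluates on the result.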
The key takeaways are:
Safety is attached to the transition boundary, not inferred from downstream behavior.
The guard and the invariant use the same inequality, so the proof is mostly unfolding and substitution.
Failed actions are represented explicitly as $\mathrm{none}$, which prevents unsafe states from masquerading as successful executions.
The case split is exhaustive, so every successful transition is covered.
The visual below compactly organizes this proof as a guarded branch. At the top is the shared hypothesis $h_{\mathrm{exec}}$, which says we are only reasoning about successful transitions. The guard $C+\delta\le L$ then divides the proof into the only two possible cases: the green branch, where the returned state inherits the desired inequality, and the red branch, where failure would force the contradiction $\mathrm{none}=\mathrm{some}(b')$.
Read the picture as a proof skeleton rather than as an operational trace. Its main message is that budget safety is not an emergent behavioral claim; it is a structural consequence of the transition definition. The only branch compatible with “execution happened” is the branch already guarded by the budget inequality, so every successful transition satisfies the invariant.

13. Worked Example: Tight-Budget Coordination

After proving that the guarded transition cannot overspend the budget, there is a natural next question: what happens inside the region that the guard allows? Safety says that forbidden financial transitions are refused before execution. It does not say that every allowed transition is useful, coordinated, or welfare-maximizing. The tight-budget coordination example is meant to separate those two ideas cleanly.
Here the model fixes a population of $n$ agents and gives the system a budget exactly proportional to the number of agents:
$B = 25n, \qquad \mathcal{S} = \{\mathrm{MODERATE}, \mathrm{CONSERVATIVE}, \mathrm{AGGRESSIVE}, \mathrm{NONCOMPLIANT}\}.$
Each agent chooses one strategy $s \in \mathcal{S}$. A strategy has a cost $c_s$, a quality contribution $Q_s$, and possibly a penalty $P_s$. The important modeling choice is that $\mathrm{MODERATE}$ costs exactly $25$, so if every agent chooses $\mathrm{MODERATE}$, the group spends exactly
$n \cdot 25 = B.$
That makes all-$\mathrm{MODERATE}$ a very special profile: it fully uses the tight budget without exceeding it.
The alternatives each fail in a different way. $\mathrm{CONSERVATIVE}$ is cheap, but it gives up quality: it spends only $10$ for quality $3$, leaving budget unused in a setting where budget can still buy useful output. $\mathrm{AGGRESSIVE}$ produces more quality, $9$, but costs $50$, which is twice the per-agent budget allocation and therefore creates coordination pressure. $\mathrm{NONCOMPLIANT}$ has relatively high quality, $8$, but receives a punishment $P_s=-20$, making it unattractive under the modeled payoff rule.
So the example is not just saying “moderation sounds nice.” It encodes a precise tension:
Conservative behavior is safe but inefficient.
Aggressive behavior may be individually tempting in quality terms but is budget-straining.
Noncompliance is discouraged by an explicit penalty.
Moderate behavior exactly matches the per-agent budget share while producing strong quality.
Under the paper’s tight-budget game, for $n \le 12$, the all-$\mathrm{MODERATE}$ profile is proved to be the unique pure-strategy Nash equilibrium. In equilibrium language, this means that once every agent is choosing $\mathrm{MODERATE}$, no single agent can improve its modeled payoff by switching to $\mathrm{CONSERVATIVE}$, $\mathrm{AGGRESSIVE}$, or $\mathrm{NONCOMPLIANT}$, holding the other agents fixed. The uniqueness matters because it rules out other pure equilibria such as “everyone is conservative” or “some agents aggressively consume slack while others compensate.”
The welfare conclusion then follows directly. Since all agents choose $\mathrm{MODERATE}$, each contributes quality $7$, so total welfare is
$W = 7n.$
Moreover, under the same constrained optimization model, this equilibrium welfare matches the constrained optimum. Therefore the price of anarchy is
$\mathrm{PoA} = 1.$
That is a strong efficiency statement: strategic behavior does not degrade welfare relative to the best feasible coordinated outcome, within this model.
But the phrase “within this model” is doing real work. This result depends on the specified strategy set, costs, quality values, penalties, budget rule, and the restriction to $n \le 12$. It is not a universal claim that real agents will always coordinate moderately, nor that learned agents cannot discover weird edge cases. It is also separate from the earlier Budget Safety Invariant. The invariant says unsafe transitions are rejected by $\mathrm{applyAction}(b,a)$; the game-theoretic result says that, among the modeled feasible behaviors, the equilibrium is also efficient.
This separation is central to machine-checkable governance. A runtime guard can enforce a bounded invariant such as “spending cannot exceed the aggregate limit.” A coordination theorem can then analyze what rational agents do inside the permitted region. These are complementary claims, but they are not interchangeable: safety gates prevent invalid execution; equilibrium analysis evaluates modeled behavior after the gates define the feasible space.
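The welfare side of the example ($W=7n$ matching the constrained optimum) can be spot-checked by brute force for small $n$. The sketch below makes several assumptions that are not part of the paper's proof: welfare is taken to be the sum of quality contributions, $\mathrm{NONCOMPLIANT}$ is left out because its cost is not stated here, and the equilibrium property itself is not tested because the payoff rule is not reproduced in this section. The function names are hypothetical.

```python
from itertools import product

# (cost, quality) per strategy, as given in the text.
# NONCOMPLIANT is omitted: its cost is not stated in this section.
STRATEGIES = {
    "MODERATE": (25, 7),
    "CONSERVATIVE": (10, 3),
    "AGGRESSIVE": (50, 9),
}


def total_cost(profile):
    return sum(STRATEGIES[s][0] for s in profile)


def welfare(profile):
    # Assumption: welfare is the sum of quality contributions.
    return sum(STRATEGIES[s][1] for s in profile)


def best_feasible_welfare(n):
    """Maximum welfare over all profiles whose total cost fits B = 25n."""
    budget = 25 * n
    feasible = (p for p in product(STRATEGIES, repeat=n) if total_cost(p) <= budget)
    return max(welfare(p) for p in feasible)


for n in range(1, 7):
    all_moderate = ("MODERATE",) * n
    assert total_cost(all_moderate) == 25 * n   # spends exactly B
    assert welfare(all_moderate) == 7 * n       # W = 7n
    assert best_feasible_welfare(n) == 7 * n    # matches the constrained optimum

    # A single switch to AGGRESSIVE overshoots the budget by 25.
    one_aggressive = ("AGGRESSIVE",) + ("MODERATE",) * (n - 1)
    assert total_cost(one_aggressive) == 25 * n + 25
```

Within these three strategies the result is also easy to see by hand: each $\mathrm{AGGRESSIVE}$ agent needs $25$ extra budget and adds $2$ quality, while each $\mathrm{CONSERVATIVE}$ agent frees only $15$ and gives up $4$, so any feasible mix loses welfare relative to all-$\mathrm{MODERATE}$.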
The visual below compresses the worked example into its essential moving parts: the tight budget $B=25n$, the four strategies with their costs and quality values, the highlighted all-$\mathrm{MODERATE}$ equilibrium, and the resulting welfare statement $W=7n$, $\mathrm{PoA}=1$. The red penalty on $\mathrm{NONCOMPLIANT}$, the high cost of $\mathrm{AGGRESSIVE}$, and the low quality of $\mathrm{CONSERVATIVE}$ make the intuition visible at a glance.
Most importantly, the summary keeps the efficiency claim in its proper lane. The conclusion is not “we no longer need safety checks.” It is: given the tight-budget game, the efficient equilibrium is all-$\mathrm{MODERATE}$; meanwhile, pre-execution budget guards still remain the mechanism that refuses unsafe financial transitions.

14. Algorithm and Contracts: Recursive Language Model Step

The tight-budget coordination example gives us the right mental model: the system is not trying to predict whether an agent will behave well under every possible future interaction. Instead, it wraps each risky transition in a small, checkable kernel of refusal rules. The recursive language model step is the same pattern in a different costume. It is not a proof that the language model is correct, truthful, harmless, or semantically aligned. It is a proof that a particular bounded transition cannot spend more recursion or budget than the interface permits.

The state of the recursive language model process can be summarized as

$s=(d,d_{\max},m,M,\tau,U,\mathrm{answerReady}).$

Here $d$ is the current recursion depth, $d_{\max}$ is the maximum allowed depth, $m$ is the remaining step budget, $M$ is the initial or global step bound, $\tau$ is the accumulated resource cost, $U$ is the maximum allowed resource budget, and $\mathrm{answerReady}$ records whether the recursive process has terminated with an answer. The important design choice is that these quantities are not merely bookkeeping. They are the variables over which the machine-checked contract is stated.

The transition itself follows the guarded pre-execution pattern:

function rlmStep(s,u)
  require u > 0

  if answerReady then
    return none

  if m = 0 then
    return none

  if tau + u > U then
    return none

  return some(s with
    m := m - 1,
    tau := tau + u,
    answerReady := (m - 1 = 0),
    d, d_max, M, U unchanged)
end function

The key word is pre-execution. The system refuses before consuming the step if any guard fails. If the answer is already ready, there is no further transition. If the remaining step counter is zero, there is no further transition. If the proposed cost $u$ would push the accumulated usage above the global budget $U$, there is no further transition. In each of these cases, the output is $\mathrm{none}$, meaning “no state update occurs.”

This is a modest contract, but it is exactly the kind of modesty that makes the guarantee machine-checkable. A successful transition has the form $\mathrm{some}(s')$, and then the post-state $s'$ differs from $s$ only in controlled ways:

$m' = m-1, \qquad \tau' = \tau+u, \qquad \mathrm{answerReady}' = (m-1=0).$

The depth fields $d,d_{\max}$, the initial maximum $M$, and the usage cap $U$ are unchanged by this step. Therefore, if the invariant $d \le d_{\max}$ held before the transition, it still holds after the transition, simply because neither side of the inequality changed. This is not a deep semantic fact about recursion; it is a structural fact about the update.

The resource-budget invariant is only slightly more interesting. The transition guard checks

$\tau + u \le U$

by refusing whenever $\tau+u>U$. Since a successful transition sets $\tau'=\tau+u$, every successful transition satisfies

$\tau' \le U.$

That is the whole proof shape: the guard is phrased in exactly the algebraic form needed by the postcondition. This is why guarded transitions are so valuable in high-assurance agent systems. They reduce a potentially messy behavioral claim — “the recursive model will not overuse resources” — into a local preservation lemma about one transition.

There is also a termination-flavored argument hiding in the update to $m$. On every successful step, $m$ strictly decreases:

$m' = m-1 < m.$

Because the transition refuses when $m=0$, the process cannot continue taking successful steps forever: starting from a step budget $m$, at most $m$ successful steps can occur. The state variable $m$ acts as a ranking measure into the natural numbers. This is the same mathematical pattern that appears in many termination proofs: identify a well-founded quantity, show that every real step decreases it, and prevent steps once the quantity reaches its lower bound.
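
The step and its three preservation properties can be exercised with a small Python model. This is a sketch under stated assumptions rather than the verified kernel: `RlmState`, `rlm_step`, and `run` are hypothetical names, the `require u > 0` precondition is modeled as a raised `ValueError`, `None` stands in for $\mathrm{none}$, and the example values are arbitrary. The assertions inside `run` check, on each successful step, that $m$ strictly decreases, that $\tau \le U$ holds, and that the depth fields are untouched.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RlmState:
    d: int              # current recursion depth
    d_max: int          # maximum allowed depth
    m: int              # remaining step budget
    M: int              # initial step bound
    tau: int            # accumulated resource cost
    U: int              # resource cap
    answer_ready: bool


def rlm_step(s: RlmState, u: int) -> Optional[RlmState]:
    if u <= 0:
        raise ValueError("u must be positive")  # models 'require u > 0'
    if s.answer_ready or s.m == 0 or s.tau + u > s.U:
        return None  # refused before any state update
    return replace(s, m=s.m - 1, tau=s.tau + u, answer_ready=(s.m - 1 == 0))


def run(s: RlmState, u: int) -> int:
    """Apply rlm_step until it refuses; return the number of successful steps."""
    steps = 0
    while True:
        nxt = rlm_step(s, u)
        if nxt is None:
            return steps
        assert nxt.m < s.m                            # ranking measure decreases
        assert nxt.tau <= nxt.U                       # resource-budget invariant
        assert (nxt.d, nxt.d_max) == (s.d, s.d_max)   # depth fields unchanged
        s, steps = nxt, steps + 1


start = RlmState(d=1, d_max=4, m=5, M=5, tau=0, U=12, answer_ready=False)
print(run(start, u=3))  # 4: the resource cap refuses the fifth step
print(run(start, u=1))  # 5: the step budget runs out first
```

In both runs the number of successful steps is bounded by the initial value of `m`, which is the ranking-measure argument in executable form.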

Notice, however, what the contract does not prove. It does not prove that the model’s answer is correct. It does not prove that the recursive decomposition is useful. It does not prove that the model cannot leak information through some external channel. Those are different claim classes. The RLM step proves bounded transition properties: preserved depth bounds, decreasing step budget, resource-budget safety, and terminal behavior once $\mathrm{answerReady}$ is true. Semantic quality and deployment isolation must be handled elsewhere.

That distinction matters most for the access-control-style statement

$\mathrm{canAccess}(d,p,pos) \Rightarrow p.\mathrm{startPos}\le pos\le p.\mathrm{endPos}.$

This is an isolation axiom or deployment assumption, not a consequence of rlmStep. The step does not manipulate memory pages, sandbox policies, file handles, or process boundaries. So if the paper relies on the claim that a recursive call can only access positions inside an assigned partition $p$, that claim must be provided by the runtime, sandbox, type discipline, or operating environment. It can be part of the trusted computing base, but it is not derived from the resource transition proof.

The visual below compresses this separation into one interface view: the pseudocode is the checked transition kernel, with refusal branches for terminal state, exhausted step budget, and excessive resource use. The successful branch is deliberately narrow: decrement $m$, increment $\tau$, possibly mark the answer ready, and leave the structural bounds unchanged.

The contracts panel then separates what follows from the transition itself from what must be assumed externally. The green “proved” side corresponds to preservation and monotonicity lemmas that can be discharged by straightforward transition reasoning. The amber “axiom” side marks the sandbox-style access guarantee: important for deployment safety, but not manufactured by the RLM step. This is the broader lesson of the section: strong systems are built by combining small machine-checkable invariants with clearly labeled assumptions, rather than blurring them into one large behavioral promise.

15. Unifying View: Bounded Invariants in a Defense-in-Depth Kernel

Having treated the recursive language-model step as a contract-bound transition rather than a magical source of correctness, we can now pull the whole architecture into one view. The common pattern is not “prove the agent is safe” in some global behavioral sense. The more modest—and much more machine-checkable—claim is that certain bounded transitions cannot cross specified guardrails when they are executed through the verified kernel.
That distinction matters. Production multi-agent systems are open-ended: agents may reason badly, receive adversarial inputs, call tools in surprising orders, or operate under infrastructure assumptions that are not themselves formalized. A theorem that tries to quantify over all possible future behavior quickly becomes either false, unprovable, or dependent on enormous environmental assumptions. The paper’s strategy is narrower: identify places where the system must pass through a small transition function, prove an invariant about that transition, and make the runtime refuse or terminate when the invariant would be violated.
The budget gate is the cleanest example. The formal guarantee is not that “the organization will never lose money” or that “agents will always act economically.” It is the bounded pre-execution invariant
$\mathrm{applyAction}(b,a)=\mathrm{some}(b')\Rightarrow C+\delta\le L.$
Read operationally: if the guarded transition accepts a financial action and returns an updated budget state $b'$, then the cost already committed plus the proposed increment remains within the limit. If the action would exceed the limit, the transition returns $\mathrm{none}$ before execution. This is a strong property precisely because it is local: every modeled financial action must pass through the same deterministic gate. It is also limited for the same reason: actions outside the modeled financial transition are not covered by this theorem.
The Bayesian trust circuit breaker has a similar shape, but it handles a different failure mode. Budget safety is about known constraints before execution; trust degradation is about observed failures over time. The relevant invariant is a termination or opening condition:
$T(\alpha,\beta)<\theta\Rightarrow \mathrm{state}=\mathrm{OPEN}.$
Here $T(\alpha,\beta)$ is the trust score induced by the Beta belief state, and $\theta$ is the threshold below which the breaker must open. The proof depends on the specified update model: failures monotonically increase the failure evidence, the trust comparison can be cross-multiplied into arithmetic over natural or rational quantities, and repeated failures eventually bring the trust score below the threshold. The theorem is therefore not “Bayesian trust is always calibrated in the real world.” It is: under this Beta update rule and this threshold semantics, degradation cannot be ignored indefinitely.
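The breaker's shape can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration, since this section does not spell out the paper's definitions: the trust score is taken to be the Beta posterior mean $T(\alpha,\beta)=\alpha/(\alpha+\beta)$, a failure adds $1$ to $\beta$ and a success adds $1$ to $\alpha$, the threshold is a positive rational `theta_num / theta_den`, and reset or half-open behavior is not modeled. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrustBreaker:
    alpha: int = 1       # success pseudo-count (assumed prior)
    beta: int = 1        # failure pseudo-count (assumed prior)
    theta_num: int = 1   # threshold theta = theta_num / theta_den, assumed > 0
    theta_den: int = 2
    state: str = "CLOSED"

    def below_threshold(self) -> bool:
        # alpha / (alpha + beta) < theta_num / theta_den, cross-multiplied
        # so the comparison uses only integer arithmetic.
        return self.alpha * self.theta_den < self.theta_num * (self.alpha + self.beta)

    def observe(self, success: bool) -> None:
        if success:
            self.alpha += 1
        else:
            self.beta += 1
        if self.below_threshold():
            self.state = "OPEN"  # T(alpha, beta) < theta  =>  state = OPEN


def failures_until_open(breaker: TrustBreaker) -> int:
    """Feed consecutive failures and count how many it takes to open."""
    count = 0
    while breaker.state != "OPEN":
        breaker.observe(success=False)
        count += 1
    return count


breaker = TrustBreaker(alpha=8, beta=2, theta_num=1, theta_den=2)
print(failures_until_open(breaker))  # 7
```

With $\alpha=8$ and $\theta=1/2$, the cross-multiplied test $16 < 8+\beta$ first holds at $\beta=9$, so seven consecutive failures open the breaker. Because $\alpha$ stays fixed while $\beta$ grows, the loop terminates for any positive threshold under these assumptions.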
The coordination result occupies a third role in the defense-in-depth picture. Under the paper’s tight budget model, the mechanism obtains
$B=25n,\qquad \mathrm{PoA}=1.$
This says that, for the specified payoff and budget structure, decentralized behavior aligns with the welfare-optimal budget-feasible outcome. Again, the strength comes from the model being sharply specified. The result is valuable because it demonstrates that budget constraints need not merely block bad behavior; in a carefully designed game, they can also align incentives. But it should not be read as a universal claim about arbitrary multi-agent markets, arbitrary utilities, or arbitrary resource coupling.
The recursive language-model contracts then add a termination-oriented layer around inference itself. The system does not prove that the model’s answer is semantically correct. Instead, it proves boundedness properties of the recursive process: a measure $m$ decreases, token use satisfies $\tau\le U$, and depth remains bounded by $d\le d_{\max}$. These are interface-level claims. They are exactly the kind of thing a proof assistant can certify: recursion cannot continue forever if every accepted step decreases a well-founded measure, and resource consumption cannot exceed the stated bound if every transition preserves the counter invariant.
Finally, deployment isolation sits at the edge of the formal story. The intended sandbox property can be written as
$\mathrm{canAccess}(d,p,pos)\Rightarrow p.\mathrm{startPos}\le pos\le p.\mathrm{endPos}.$
This is the right shape for an invariant, but its evidence status is different. If the property depends on operating-system isolation, cloud configuration, container boundaries, or hardware enforcement that are not modeled in Lean, then it is not a proved theorem of the kernel. It may be axiomatized, assumed, tested, or justified by deployment practice—but it belongs to a different claim class. This distinction is not pedantry; it is what prevents a machine-checked theorem from being overmarketed as an end-to-end security guarantee.
So the unifying idea is a defense-in-depth kernel of bounded invariants:
deterministic gates handle known pre-execution constraints;
Bayesian trust handles post-observation degradation;
coordination proofs analyze incentives under a specified budget model;
recursive contracts bound inference steps;
deployment isolation marks the boundary where formal guarantees meet infrastructure assumptions.
The empirical results support this picture without expanding the theorem statements. Sub-millisecond policy evaluation suggests the gates are practical. Blocking 8/8 adversarial scenarios is useful evidence about the implemented system. Zero reported budget violations under concurrency tests increases confidence that the implementation respects the intended transition discipline. But these are still tests and benchmarks, not Lean proofs—especially when they are internal and unaudited by third parties.
The summary visual that follows condenses the architecture along the axis that matters most: mechanism, formal object, evidence level, runtime role, and limitation. Its purpose is not merely to list components, but to keep the claim boundaries visible. A green “Proved” tag means something very different from an orange “Tested” tag or a gray “Conjectured / axiomatized” tag, even when all three appear in the same deployed system.
The most important takeaway is therefore deliberately bounded: the framework provides machine-checkable, pre-execution or transition-level guarantees for selected invariants inside specified models. Empirical tests can support confidence in the implementation, and deployment assumptions can make the system usable in practice, but neither turns a local invariant into a proof about arbitrary agent semantics or the entire surrounding infrastructure.