What is a KillSwitch in an agent framework?

A KillSwitch is a runtime safety mechanism that terminates agent execution when token consumption exceeds configured limits. TPipe's KillSwitch is a 66-line Kotlin file at src/main/kotlin/P2P/KillSwitch.kt that defines a data class with input and output token limits, plus a callback typed (KillSwitchContext) -> Nothing. The Nothing return type is enforced by the Kotlin compiler — the callback cannot return normally. By default it throws KillSwitchException, a RuntimeException that propagates as an uncaught exception through the entire call chain. Termination is structural, not prompt-based.

Why is a token budget alone not enough to prevent runaway costs?

A token budget is an estimate. An LLM call can exceed the estimate. A retry policy on an LLM call can trigger the same call repeatedly. A wrapper that catches the over-budget exception and returns a default value lets the agent continue and accrue more cost. TPipe's KillSwitch bypasses every one of these. It throws a runtime exception. The exception is caught by a specific carve-out in the Splitter that re-throws it before the generic catch can swallow it. The architecture defends the propagation. The agent stops. The cost stops.

How does TPipe's KillSwitch differ from a retry policy?

Retry policies catch the failure, wait, and try again. They are designed to recover from transient errors. KillSwitchException is designed to be the one failure they never treat as recoverable: TPipe's containers re-throw it instead of handling it, so the call chain unwinds without retry. There is no 'try again' clause. The agent is done. The bill stops at whatever had been spent when the switch tripped.

What happens when the KillSwitch trips?

The checkKillSwitch function in Pipe.kt evaluates the accumulated input and output token counts against the configured limits. If either is exceeded, the function invokes the kill switch's onTripped callback with a KillSwitchContext carrying the p2pInterface, tokens spent, elapsed time, the reason (input_exceeded, output_exceeded, or input_and_output_exceeded), the accumulated totals from the root agent, and the nesting depth. The default callback throws KillSwitchException. The exception carries the context as a field and a formatted message. The runtime unwinds the call chain.

How does the KillSwitch propagate through multi-agent systems?

Every TPipe container — Pipeline, Manifold, Junction, Splitter, MultiConnector, DistributionGrid — implements the P2PInterface, which exposes the killSwitch property. Setting a killSwitch on any container propagates it to every child pipeline, and each container class runs its own accumulation and check as it executes, propagating the kill through the call chain as it goes. The Splitter is one concrete example of this pattern. It runs branches in parallel, and each branch spends tokens independently. To enforce the budget across the parallel branches, the Splitter accumulates the input and output tokens spent in a running accumulator (Splitter.kt:732 — killSwitchInputAccumulator and killSwitchOutputAccumulator increment after every branch completes) and fires the check on the accumulated totals. The same pattern runs in every container class: running accumulation, check, and propagation through the call chain at every depth of nesting and recursion.

Can you customize the KillSwitch callback?

Yes. The onTripped callback is typed (KillSwitchContext) -> Nothing. Nothing is Kotlin's bottom type — a function typed as returning Nothing cannot return normally. The default is `{ ctx -> throw KillSwitchException(ctx) }`. To add observability, replace the callback with a custom one that logs, fires a metric, or pages an operator, and then throws: `{ ctx -> log(ctx.reason); metrics.counter("killswitch.tripped").increment(1); throw KillSwitchException(ctx) }`. The type system enforces the throw. A callback that logs and returns is not a misconfigured kill switch — it is a compile error. The compiler rejects the code before it runs.

Why is the propagation root-down instead of bottom-up?

The accumulation tracks the total cost across the entire call chain, not the cost at any individual node. If the budget is 100,000 input tokens and three child agents each spend 40,000, no individual agent has exceeded its limit, but the operator has exceeded the budget. Bottom-up checking, where each agent only compares its own spend to the limit, would never trip, and the call chain would keep running past 120,000 tokens. Root-down accumulation trips as soon as the running total crosses 100,000, which here is the moment the third child's spend is added. The limit is the operator's limit, not the agent's limit.

The KillSwitch: Token Budgets That Actually Kill the Agent

The billion-token burn

The SDK timed out. AWS did not tell us.

The Bedrock call hung past the configured timeout. The SDK returned a generic exception that, by every convention in every framework we had ever used, was a transient error. The retry policy kicked in. The retry policy tried again. The retry policy tried again. Each retry spent somewhere between 50,000 and 200,000 input tokens. Autogenesis, running headless 24/7 with no human in the loop, kept retrying.

By the time the chaos alarm fired, we had burned close to a billion input tokens. The bill would have been in the thousands of dollars — the kind of number that ends a three-person company before it ends the month. We got lucky. The alarm fired in time. The agent stopped. The invoice never arrived.

Luck is not a safety system. The next outage is a different exception class. The next retry loop lands on a different transient error. The next silent SDK failure bills us before we wake up.

We built the KillSwitch because we were the ones who would pay the bill. The framework we were using did not have one. We could have waited for the framework to add one. We could have asked a vendor to fix it. We could have raised money to pay for the next outage. We had none of those options. We had source code, and we had the kind of anger that produces good engineering.

The KillSwitch is what we built. It is 66 lines of Kotlin. It has been in production for about six months. It has never failed to terminate. This post is about what those 66 lines do, why they work, and how the architecture defends them.

What frameworks get wrong

The standard answer to runaway cost in agent systems is a per-call token budget. You set a limit, you check tokens against the limit, you reject the call when the limit is exceeded. This is a budget cap. It assumes the agent cooperates with the cap.

It is wrong in two ways the industry has not caught up to.

A budget cap is a per-call ceiling. It does not see the debt across calls. If each call is allowed to spend up to X input tokens, and the agent retries N times, the agent spends N*X. The budget cap does not see the accumulated debt. The cap is a guardrail on a single trip. It is not a balance sheet across the trip series. An agent that fails fifty times in a row under a per-call cap has spent 50X — and the cap has not blinked.
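
The N*X arithmetic can be made concrete with a small sketch. The function names and numbers below are hypothetical and are not part of TPipe; the sketch only assumes, as this post describes, that every attempt that passes the cap is billed for its input tokens.

```kotlin
// Hypothetical illustration; none of these names are TPipe APIs.

/** Per-call cap: only ever looks at the size of one request. */
fun passesPerCallCap(callTokens: Int, perCallCap: Int): Boolean = callTokens <= perCallCap

/**
 * Replays `retries` identical calls and reports the total spend plus
 * how many of them the per-call cap rejected.
 */
fun replayRetries(callTokens: Int, perCallCap: Int, retries: Int): Pair<Long, Int> {
    var totalSpent = 0L
    var rejected = 0
    repeat(retries) {
        if (passesPerCallCap(callTokens, perCallCap)) {
            totalSpent += callTokens   // assumed: input is billed even if the call then fails
        } else {
            rejected++
        }
    }
    return totalSpent to rejected
}
```

With a 200,000-token cap, `replayRetries(150_000, 200_000, 50)` reports 7,500,000 input tokens spent and zero rejections: the cap is consulted fifty times and never objects.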

AWS bills input tokens on crashes. The model does not need to produce output for you to be charged. The prefill happens before the generation. If the SDK times out, if the model returns an error, if the request gets dropped at the load balancer — the input tokens were already counted, and AWS already charged them. You paid for input and got nothing back. This is the worst-case scenario, and it is the default scenario for any retry loop on a transient error. Each crash bills you for the input. Each retry bills you again. At ten crashes per second, the bill grows at ten times the input cost per second — forever, until the loop stops, or your company stops existing.

That is what hit us. A silent SDK timeout, a retry policy that did not know the error was unrecoverable, and AWS happily billing input tokens for every retry. A billion input tokens before the alarm fired. The per-call budget was never exceeded. The accumulated debt was catastrophic. The 66 lines below are the answer to both failure modes.

A budget cap that the agent can work around is a budget cap the agent will work around. The KillSwitch is termination architecture. The runtime kills the call chain before the cost accrues. The agent does not get a chance to spend what it should not.

The difference is structural, not procedural. A budget cap is a per-call value the framework’s call wrapper checks — and the wrapper is also what decides what happens when the cap is exceeded. The wrapper can throw, return a default, log and continue, or hand the failure back to the caller. Propagation is a property the wrapper has to honor. A KillSwitch is a property of the P2PInterface. Every container that implements it — Pipeline, Manifold, Junction, Splitter, MultiConnector, DistributionGrid — runs a check as it executes: the running container captures the token totals, fires the kill, and propagates it through the call chain. Whichever container is currently executing is the active checker. The runtime guarantees propagation at every depth of nesting and recursion by design, with a catch-and-rethrow carve-out for KillSwitchException that prevents any generic catch from swallowing the kill.

The 66 lines

The entire KillSwitch implementation lives at src/main/kotlin/P2P/KillSwitch.kt. Here it is, in full:

package com.TTT.P2P

import com.TTT.P2P.P2PInterface

/**
 * Emergency kill switch that halts agent execution when token consumption exceeds configured limits.
 *
 * When attached to a [P2PInterface], the kill switch monitors input and output token usage and
 * immediately terminates the agent if either limit is exceeded. The termination is absolute —
 * no retry policies, loop re-entry, or generic exception handlers will intercept it.
 */
data class KillSwitch(
    /** Maximum tokens allowed for input (prompt + context). null = no limit. */
    val inputTokenLimit: Int? = null,
    /** Maximum tokens allowed for output (response + reasoning). null = no limit. */
    val outputTokenLimit: Int? = null,
    /** Callback invoked when the kill switch trips. Default throws [KillSwitchException]. */
    val onTripped: (KillSwitchContext) -> Nothing = { ctx -> throw KillSwitchException(ctx) }
)

data class KillSwitchContext(
    val p2pInterface: P2PInterface,
    val inputTokensSpent: Int,
    val outputTokensSpent: Int,
    val elapsedMs: Long,
    val reason: String,
    val accumulatedInputTokens: Int = inputTokensSpent,
    val accumulatedOutputTokens: Int = outputTokensSpent,
    val depth: Int = 0
)

class KillSwitchException(val context: KillSwitchContext) : RuntimeException(
    buildString {
        append("KillSwitch tripped: ${context.reason}")
        append(" | inputTokens=${context.inputTokensSpent}")
        append(" | outputTokens=${context.outputTokensSpent}")
        append(" | elapsedMs=${context.elapsedMs}")
    }
)

That is the entire safety system. One data class. One context class. One exception class. The default callback is { ctx -> throw KillSwitchException(ctx) }. The only action the file ever takes is a throw.

This is not a coincidence. The throw is the entire architectural argument.

The Nothing type and what it enforces

Look at the signature of the default callback:

onTripped: (KillSwitchContext) -> Nothing = { ctx -> throw KillSwitchException(ctx) }

The return type is Nothing. In Kotlin, Nothing is the bottom type: a type that has no instances. A function typed (KillSwitchContext) -> Nothing cannot return normally. Every path through it has to end in a throw or in a call to another Nothing-returning function, such as the standard library's error() or exitProcess(). The compiler enforces this at the type level.

This means you cannot write an onTripped callback that quietly returns. The compiler will reject any callback whose body can fall through to a normal return. If your callback types as (KillSwitchContext) -> Nothing, the compiler has verified that it cannot return normally.

The default callback is the simplest possible case: a single throw. The runtime sees a thrown exception, unwinds the call stack, and propagates the exception to the next handler up the chain. The agent stops. The bill stops accruing.

The user can override the callback to add observability. The custom callback can log, send a metric, page an operator, or fire a webhook. The custom callback must end with a throw — and the type system enforces that. A callback that logs and returns is a compilation error. A callback that logs and throws is a working KillSwitch.
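
A minimal sketch of that compile-time guarantee follows. DemoContext and DemoKillException are stand-ins invented for this example so that it compiles on its own; they are not the real KillSwitchContext and KillSwitchException.

```kotlin
// Stand-in types for illustration only; the real ones live in KillSwitch.kt.
data class DemoContext(val reason: String)
class DemoKillException(val context: DemoContext) : RuntimeException("tripped: ${context.reason}")

val auditLog = mutableListOf<String>()

// Compiles: the lambda's last expression is a throw, whose type is Nothing.
val logThenThrow: (DemoContext) -> Nothing = { ctx ->
    auditLog.add("kill switch tripped: ${ctx.reason}")
    throw DemoKillException(ctx)
}

// Does not compile: the lambda's last expression is a Boolean, not Nothing.
// val logAndReturn: (DemoContext) -> Nothing = { ctx ->
//     auditLog.add("kill switch tripped: ${ctx.reason}")
// }

/** Invokes the callback and reports whether it exited by throwing. */
fun exitedByThrowing(callback: (DemoContext) -> Nothing): Boolean =
    try {
        callback(DemoContext("input_exceeded"))
    } catch (e: DemoKillException) {
        true
    }
```

Uncommenting logAndReturn should produce a type mismatch at compile time, which is the point: the observability code is optional, the throw is not.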

Where the check happens

The check lives in Pipe.kt, at line 7622:

protected fun checkKillSwitch(inputTokens: Int, outputTokens: Int, elapsedMs: Long)
{
    killSwitch?.let { ks ->
        val inputLimit = ks.inputTokenLimit
        val outputLimit = ks.outputTokenLimit

        val inputExceeded = inputLimit != null && inputTokens > inputLimit
        val outputExceeded = outputLimit != null && outputTokens > outputLimit

        if (inputExceeded || outputExceeded)
        {
            val reason = when {
                inputExceeded && outputExceeded -> "input_and_output_exceeded"
                inputExceeded -> "input_exceeded"
                else -> "output_exceeded"
            }
            ks.onTripped(com.TTT.P2P.KillSwitchContext(
                p2pInterface = this,
                inputTokensSpent = inputTokens,
                outputTokensSpent = outputTokens,
                elapsedMs = elapsedMs,
                reason = reason
            ))
        }
    }
}

The function is called from the main pipe execution loop at line 6015, after every token-count update. The caller pulls the current accumulated totals from pipeTokenUsage or from the parent pipeline and calculates the elapsed time; checkKillSwitch then compares those totals against the configured limits.

When a limit is exceeded, the onTripped callback is invoked with a KillSwitchContext that carries the agent that tripped, the tokens spent, the elapsed time, and the reason. The context also has fields for the accumulated totals from the root and the nesting depth; in the excerpt above they are left at their defaults (the spent values and 0). The callback’s throw propagates the KillSwitchException up the call stack.

The check fires on every pipe execution. There is no way for the agent to spend tokens without the check seeing them. The check is the cost observability layer. The throw is the enforcement layer.
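
The comparison logic inside checkKillSwitch can be isolated as a pure function, which makes the boundary behavior easy to see. The helper name tripReason is hypothetical; it mirrors only the limit checks and reason strings from the excerpt above and returns null instead of invoking a callback.

```kotlin
// Hypothetical helper that mirrors only the comparison logic of checkKillSwitch.

/** Returns the trip reason, or null when neither configured limit is exceeded. */
fun tripReason(inputTokens: Int, outputTokens: Int, inputLimit: Int?, outputLimit: Int?): String? {
    // A null limit means "no limit", so that side can never be exceeded.
    val inputExceeded = inputLimit != null && inputTokens > inputLimit
    val outputExceeded = outputLimit != null && outputTokens > outputLimit
    return when {
        inputExceeded && outputExceeded -> "input_and_output_exceeded"
        inputExceeded -> "input_exceeded"
        outputExceeded -> "output_exceeded"
        else -> null
    }
}
```

Two details follow from the strict `>` comparison and the nullable limits: spending exactly the limit does not trip the switch, and a KillSwitch with both limits left as null never trips at all.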

The catch-and-rethrow carve-out

This is the architectural punchline. Look at Pipeline/Splitter.kt, line 778:

catch(e: com.TTT.P2P.KillSwitchException)
{
    // KillSwitchException must never be caught — it must propagate to terminate the agent
    throw e
}
catch(e: Exception)
{
    //Handle pipeline execution failure by creating error content.
    val errorContent = MultimodalContent("Pipeline execution failed: ${e.message}")

    //Store error result in results collection.
    storeResult(key, pipeline, errorContent)

    if(tracingEnabled)
    {
        trace(TraceEventType.SPLITTER_FAILURE, TracePhase.EXECUTION,
              activatorValue.content,

The Splitter has a generic catch(e: Exception) block. That block turns any exception into an “error content” result and stores it, allowing the next pipeline in the sequence to run. This is normal exception handling. This is the behavior that lets a failed agent be replaced by a working one.

But the generic catch is preceded by a SPECIFIC catch for KillSwitchException. The specific catch re-throws the exception. The comment makes the intent explicit: “KillSwitchException must never be caught — it must propagate to terminate the agent.”

This is the structural defense. The architecture anticipates that a generic exception handler will try to swallow the KillSwitchException and continue execution. The architecture blocks that path with a specific catch that re-throws. The generic catch never sees the KillSwitchException. The propagation is enforced.

Without this carve-out, the KillSwitch would still throw, but the exception would travel only as far as the Splitter’s generic catch. That catch would turn it into error content, store the result, and let the next pipeline in the sequence run. The agent would not stop. The bill would keep growing. The KillSwitch would be defeated by normal error handling.

The carve-out is the architectural commitment. It says: when the cost limit is hit, the agent stops. Not “the agent logs the failure and tries again.” Not “the agent falls back to a cheaper model.” The agent stops. The carve-out enforces that.
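
The effect of catch ordering can be reproduced in isolation. The sketch below is a simplified, sequential stand-in for the Splitter's loop: BudgetKill and both functions are hypothetical names, and results are plain strings rather than MultimodalContent.

```kotlin
// Stand-in exception; TPipe's real one is com.TTT.P2P.KillSwitchException.
class BudgetKill(message: String) : RuntimeException(message)

/** Runs each branch in order; ordinary failures become error strings, kills escape. */
fun runBranchesWithCarveOut(branches: List<() -> String>): List<String> {
    val results = mutableListOf<String>()
    for (branch in branches) {
        try {
            results += branch()
        } catch (e: BudgetKill) {
            throw e                       // carve-out: never convert a kill into a result
        } catch (e: Exception) {
            results += "failed: ${e.message}"
        }
    }
    return results
}

/** Same loop without the carve-out: the generic catch swallows the kill. */
fun runBranchesWithoutCarveOut(branches: List<() -> String>): List<String> {
    val results = mutableListOf<String>()
    for (branch in branches) {
        try {
            results += branch()
        } catch (e: Exception) {
            results += "failed: ${e.message}"
        }
    }
    return results
}
```

Given three branches where the second throws BudgetKill, the version without the carve-out returns three results, including one from the branch that should never have run. The version with the carve-out throws out of the loop after the first result.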

The root-down accumulator

The Splitter runs branches in parallel. Each branch spends tokens. The Splitter needs to know the total spent across all branches to enforce the budget. The accumulator lives in Splitter.kt, at line 732:

if(killSwitch != null)
{
    killSwitchInputAccumulator += pipeline.inputTokensSpent
    killSwitchOutputAccumulator += pipeline.outputTokensSpent
    val elapsedMs = System.currentTimeMillis() - killSwitchExecutionStartTime
    checkKillSwitch(killSwitchInputAccumulator, killSwitchOutputAccumulator, elapsedMs)
}

After every branch completes, the Splitter adds the branch’s spend to the running total. The total includes the spend from every previous branch. The check is performed against the accumulated total.

The accumulation flows root-down, not bottom-up. The limit is the operator’s limit, not the agent’s limit. If the operator sets a budget of 100,000 input tokens, the KillSwitch trips when the entire call chain has spent more than 100,000 tokens, regardless of which branch spent them.

This catches the case where no individual agent is over budget but the combined spend is. Three child agents each spending 40,000 input tokens will not trip any individual KillSwitch. The Splitter’s accumulated total crosses the limit when the third child’s 40,000 is added, and the root KillSwitch trips at 120,000 before anything downstream can run.

The accumulation is the operator’s safety net. It is the difference between a per-agent budget (which an agent can game) and a per-call-tree budget (which it cannot).
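
The three-children example can be traced with a small model of the accumulator. OverBudget and accumulateBranches are hypothetical names for this sketch; it tracks input tokens only and assumes, as in the Splitter excerpt, that a branch's spend is added after the branch completes.

```kotlin
// Hypothetical stand-ins; the names are illustrative, not TPipe's.
class OverBudget(val accumulatedInput: Int, val branchesCompleted: Int) :
    RuntimeException("input budget exceeded at $accumulatedInput tokens")

/**
 * Adds each branch's input spend to a running total after the branch completes
 * and checks the total against one shared limit.
 */
fun accumulateBranches(branchInputSpend: List<Int>, inputLimit: Int): Int {
    var accumulated = 0
    branchInputSpend.forEachIndexed { index, spent ->
        accumulated += spent
        // Same strict comparison as checkKillSwitch: exactly at the limit is still allowed.
        if (accumulated > inputLimit) throw OverBudget(accumulated, index + 1)
    }
    return accumulated
}
```

Calling `accumulateBranches(listOf(40_000, 40_000, 40_000), 100_000)` throws after the third branch with an accumulated total of 120,000, even though no single branch came close to the limit. With only two branches the call returns 80,000 and nothing trips.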

Propagation through the container hierarchy

Every container in TPipe implements the P2PInterface, which exposes the killSwitch property. When you set a killSwitch on a container, the container’s setter propagates it to every child pipeline. The Splitter does it explicitly at line 124:

override var killSwitch: com.TTT.P2P.KillSwitch?
    get() = _killSwitch
    set(value) {
        _killSwitch = value
        // Propagate kill switch to all pipelines
        activatorKeys.values.flatMap { it.pipelines }.forEach { pipeline ->
            pipeline.killSwitch = value
        }
    }

Set the killSwitch on a Manifold and it propagates to the manager and every worker. Set it on a Junction and it propagates to the moderator and every participant. Set it on a DistributionGrid and it propagates to the router and every node. Set it on a Splitter and it propagates to every branch.

The propagation is the design pattern. The operator sets the budget at the highest level that maps to the budget they want to enforce. The runtime walks the container hierarchy and applies the budget to every leaf. The leaves spend tokens. The Splitter accumulates. The root checks the total. The budget is enforced.
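
Setter-based propagation through nested containers can be modeled in a few lines. ToyLimits, ToyNode, and ToyContainer are invented for this sketch and only imitate the shape of the Splitter setter shown above; they are not TPipe classes.

```kotlin
// Toy model of setter-based propagation; these classes are not TPipe's.
data class ToyLimits(val inputTokenLimit: Int?, val outputTokenLimit: Int?)

open class ToyNode {
    open var limits: ToyLimits? = null
}

class ToyContainer(private val children: List<ToyNode>) : ToyNode() {
    override var limits: ToyLimits? = null
        set(value) {
            field = value
            // Push the same configuration down; nested containers recurse via their own setter.
            children.forEach { it.limits = value }
        }
}
```

Assigning limits on the outermost ToyContainer reaches every leaf, however deeply nested, because each inner container's setter repeats the same step. In this toy model a child attached after the assignment would not receive the limits; the excerpts in this post do not show how TPipe treats that case.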

The DSL builders wrap this pattern in a clean API. From ManifoldDsl.kt, line 154:

fun killSwitch(inputTokenLimit: Int? = null, outputTokenLimit: Int? = null, onTripped: ((KillSwitchContext) -> Nothing)? = null): ManifoldBuilder<S>
{
    killSwitchConfiguration = if(onTripped != null) {
        KillSwitch(inputTokenLimit = inputTokenLimit, outputTokenLimit = outputTokenLimit, onTripped = onTripped)
    } else {
        KillSwitch(inputTokenLimit = inputTokenLimit, outputTokenLimit = outputTokenLimit)
    }
    return this
}

The DSL call is one line. The result is a KillSwitch wired to every child pipeline in the manifold, with a custom callback if you provided one. The propagation is invisible to the operator. The enforcement is structural.

What termination does that a budget cap cannot

A token budget is a number the agent checks. A KillSwitch is a mechanism that terminates. The difference shows up in failure modes.

A token budget fails when:

The agent retries on a transient error
The wrapper catches the over-budget exception and returns a default
The supervisor catches the failure and restarts the agent
A loop in the agent’s logic re-spends the same tokens

A KillSwitch fails when:

The onTripped callback does not throw
The catch block swallows the exception before it propagates
The check is never invoked because the container hierarchy is misconfigured

The first list amounts to “the budget does not work in practice.” The second list amounts to “the developer wrote a misconfigured KillSwitch.” One is a normal failure of normal systems. The other is a misuse of a defensive system.

The KillSwitch is harder to misuse than a token budget. The compiler rejects callbacks that return normally. The catch-and-rethrow carve-out prevents normal exception handlers from swallowing the propagation. The root-down accumulation prevents the agent from gaming the budget. The architectural commitment is to termination, not estimation.
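
If you maintain your own retry wrapper around code that can trip a KillSwitch, the same carve-out idea applies there. The helper below is a hypothetical sketch, not a TPipe API, and StopNow stands in for KillSwitchException so the example compiles on its own.

```kotlin
// Hypothetical retry helper, not part of TPipe; StopNow stands in for KillSwitchException.
class StopNow(message: String) : RuntimeException(message)

/**
 * Retries `block` up to `maxAttempts` times on ordinary exceptions,
 * but lets StopNow escape on the first occurrence.
 */
fun <T> retryUnlessKilled(maxAttempts: Int, block: (attempt: Int) -> T): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var lastFailure: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return block(attempt)
        } catch (e: StopNow) {
            throw e               // termination is not a transient error
        } catch (e: Exception) {
            lastFailure = e       // treated as transient: fall through to the next attempt
        }
    }
    throw lastFailure ?: IllegalStateException("no attempts were made")
}
```

A block that throws StopNow on its first attempt is invoked exactly once, no matter how large maxAttempts is. A block that fails twice with an ordinary exception and then succeeds is invoked three times and its value is returned.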

Setting up your own KillSwitch

The DSL is the easiest path. Inside any manifold, junction, or distributionGrid block:

val manifold = manifold {
    manager {
        pipeline { /* ... */ }
    }
    worker("analyzer") {
        pipeline { /* ... */ }
    }
    killSwitch(inputTokenLimit = 100_000, outputTokenLimit = 50_000)
}

For a custom callback that adds observability:

val pipeline = Pipeline()
pipeline.killSwitch = KillSwitch(
    inputTokenLimit = 50_000,
    outputTokenLimit = 25_000,
    onTripped = { ctx ->
        logger.warn("KillSwitch tripped: ${ctx.reason} at ${ctx.elapsedMs}ms")
        telemetry.reportKillSwitchEvent(ctx)
        // Must throw to terminate
        throw KillSwitchException(ctx)
    }
)

The callback must end with a throw. The type system enforces it. Everything before the throw is yours to customize; the throw is the only line that matters.

The bigger picture

The KillSwitch is one of seven intervention points in TPipe’s safety architecture. It is the one that protects against runaway cost. The ContextBank protects against context loss. The DistributionGrid protects against single-node failure. The reasoning pipes protect against structural hallucination. The KillSwitch is the cost side of the safety story.

The next post in this series covers the LangChain migration: how to take an existing agent system running on a Python-based framework and move it to TPipe. The contrast between “type your way into determinism” and “prompt your way into determinism” makes the architectural argument concrete. If you have ever watched a retry loop burn through a budget cap while the agent cheerfully continued, the LangChain post is for you.

For now: 66 lines. One Nothing. One throw. One catch-and-rethrow that defends the propagation. Six months of production. Every trip has held the line. The pattern works because the architecture commits to termination, not estimation.

Reasoning Pipes Explained: How TPipe Stops Prompting and Starts Programming — The prior post in this series. How JSON schema field order forces LLM determinism, and why the LLM is treated as a compiler instead of a conversation partner.
Why P2P Agent Communication Is Inevitable — How KillSwitch propagation works through the P2P call chain, and why operator budgets are enforced top-down across the call tree.
Headless AI Agents: What, Why, and How — Why any agent running 24/7 without a human in the loop needs a KillSwitch, and how the cost-safety pattern maps to the headless production case.
Building Your First TPipe Pipeline — The practical configuration of KillSwitch on a pipeline, with code examples for input/output token limits and custom callbacks.