The billion-token burn

The SDK timed out. AWS did not tell us.

The Bedrock call hung past the configured timeout. The SDK returned a generic exception that, by every convention in every framework we had ever used, was a transient error. The retry policy kicked in. The retry policy tried again. The retry policy tried again. Each retry spent somewhere between 50,000 and 200,000 input tokens. Autogenesis, running headless 24/7 with no human in the loop, kept retrying.

By the time the chaos alarm fired, we had burned close to a billion input tokens. The bill would have been in the thousands of dollars — the kind of number that ends a three-person company before it ends the month. We got lucky. The alarm fired in time. The agent stopped. The invoice never arrived.

Luck is not a safety system. The next outage is a different exception class. The next retry loop lands on a different transient error. The next silent SDK failure bills us before we wake up.

We built the KillSwitch because we were the ones who would pay the bill. The framework we were using did not have one. Waiting for the framework to add one, asking a vendor to fix it, or raising money to pay for the next outage were not options we had. We had source code, and we had the kind of anger that produces good engineering.

The KillSwitch is what we built. It is 66 lines of Kotlin. It has been in production for about 6 months. It has never failed to terminate. This post is about what those 66 lines do, why they work, and how the architecture defends them.

What frameworks get wrong

The standard answer to runaway cost in agent systems is a per-call token budget. You set a limit, you check tokens against the limit, you reject the call when the limit is exceeded. This is a budget cap. It assumes the agent cooperates with the cap.

It is wrong in two ways the industry has not caught up to.

A budget cap is a per-call ceiling. It does not see the debt across calls. If each call is allowed to spend up to X input tokens and the agent retries N times, the agent can spend up to N*X. The cap is a guardrail on a single trip, not a balance sheet across the whole series of trips. An agent that fails fifty times in a row under a per-call cap can spend up to 50X, and the cap has not blinked.
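A minimal sketch of that gap, under stated assumptions: simulateRetries and RetryReport are hypothetical names made up for this illustration, not TPipe APIs, and the limits are arbitrary. It replays a run of identical failed attempts and reports what a per-call cap sees next to what a running total sees.

```kotlin
// Hypothetical illustration only; none of these names come from TPipe.
data class RetryReport(
    val capRejections: Int,     // attempts the per-call cap would have rejected
    val totalSpent: Long,       // input tokens spent across all attempts
    val ledgerTrippedAt: Int?   // first attempt at which the running total exceeded its limit
)

fun simulateRetries(
    retries: Int,
    tokensPerCall: Int,
    perCallCap: Int,
    cumulativeLimit: Long
): RetryReport {
    var capRejections = 0
    var total = 0L
    var trippedAt: Int? = null
    for (attempt in 1..retries) {
        // Per-call view: only the current attempt is compared against the cap.
        if (tokensPerCall > perCallCap) capRejections++
        // Cross-call view: every attempt adds to the same running total.
        total += tokensPerCall
        if (trippedAt == null && total > cumulativeLimit) trippedAt = attempt
    }
    return RetryReport(capRejections, total, trippedAt)
}
```

With fifty retries of 100,000 input tokens each, a per-call cap of 200,000, and a cumulative limit of 1,000,000, the cap rejects nothing while the total reaches 5,000,000. The running total crosses its limit on attempt 11.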

AWS bills input tokens on crashes. The model does not need to produce output for you to be charged. The prefill happens before the generation. If the SDK times out, if the model returns an error, if the request gets dropped at the load balancer — the input tokens were already counted, and AWS already charged them. You paid for input and got nothing back. This is the worst-case scenario, and it is the default scenario for any retry loop on a transient error. Each crash bills you for the input. Each retry bills you again. At ten crashes per second, the bill grows at ten times the input cost per second — forever, until the loop stops, or your company stops existing.

That is what hit us. A silent SDK timeout, a retry policy that did not know the error was unrecoverable, and AWS happily billing input tokens for every retry. A billion input tokens before the alarm fired. The per-call budget was never exceeded. The accumulated debt was catastrophic. The 66 lines below are the answer to both failure modes.

A budget cap that the agent can work around is a budget cap the agent will work around. The KillSwitch is termination architecture. The runtime kills the call chain as soon as the accumulated spend crosses the limit, before any further cost accrues. The agent does not get a chance to keep spending what it should not.

The difference is structural, not procedural. A budget cap is a per-call value the framework’s call wrapper checks — and the wrapper is also what decides what happens when the cap is exceeded. The wrapper can throw, return a default, log and continue, or hand the failure back to the caller. Propagation is a property the wrapper has to honor. A KillSwitch is a property of the P2PInterface. Every container that implements it — Pipeline, Manifold, Junction, Splitter, MultiConnector, DistributionGrid — runs a check as it executes: the running container captures the token totals, fires the kill, and propagates it through the call chain. Whichever container is currently executing is the active checker. The runtime guarantees propagation at every depth of nesting and recursion by design, with a catch-and-rethrow carve-out for KillSwitchException that prevents any generic catch from swallowing the kill.

The 66 lines

The entire KillSwitch implementation lives at src/main/kotlin/P2P/KillSwitch.kt. Here it is, in full:

package com.TTT.P2P

import com.TTT.P2P.P2PInterface

/**
 * Emergency kill switch that halts agent execution when token consumption exceeds configured limits.
 *
 * When attached to a [P2PInterface], the kill switch monitors input and output token usage and
 * immediately terminates the agent if either limit is exceeded. The termination is absolute —
 * no retry policies, loop re-entry, or generic exception handlers will intercept it.
 */
data class KillSwitch(
    /** Maximum tokens allowed for input (prompt + context). null = no limit. */
    val inputTokenLimit: Int? = null,
    /** Maximum tokens allowed for output (response + reasoning). null = no limit. */
    val outputTokenLimit: Int? = null,
    /** Callback invoked when the kill switch trips. Default throws [KillSwitchException]. */
    val onTripped: (KillSwitchContext) -> Nothing = { ctx -> throw KillSwitchException(ctx) }
)

data class KillSwitchContext(
    val p2pInterface: P2PInterface,
    val inputTokensSpent: Int,
    val outputTokensSpent: Int,
    val elapsedMs: Long,
    val reason: String,
    val accumulatedInputTokens: Int = inputTokensSpent,
    val accumulatedOutputTokens: Int = outputTokensSpent,
    val depth: Int = 0
)

class KillSwitchException(val context: KillSwitchContext) : RuntimeException(
    buildString {
        append("KillSwitch tripped: ${context.reason}")
        append(" | inputTokens=${context.inputTokensSpent}")
        append(" | outputTokens=${context.outputTokensSpent}")
        append(" | elapsedMs=${context.elapsedMs}")
    }
)

That is the entire safety system. One data class. One context class. One exception class. The default callback is { ctx -> throw KillSwitchException(ctx) }. The KillSwitch declaration ends with a throw.

This is not a coincidence. The throw is the entire architectural argument.

The Nothing type and what it enforces

Look at the signature of the default callback:

onTripped: (KillSwitchContext) -> Nothing = { ctx -> throw KillSwitchException(ctx) }

The return type is Nothing. In Kotlin, Nothing is the bottom type: a type that has no instances. A function typed (KillSwitchContext) -> Nothing cannot return normally. The only ways for such a function to finish are to throw an exception, call error(), call exitProcess(), or call another Nothing-returning function. The compiler enforces this at the type level.

This means you cannot write an onTripped callback that quietly returns and lets execution continue. The compiler rejects any callback whose body can complete normally. The type does not constrain which exception is thrown, though. A callback that ends in error() throws IllegalStateException, which a generic catch(e: Exception) will handle like any other failure. To get the termination guarantee, the callback should end by throwing KillSwitchException.

The default callback is the simplest possible case: a single throw. The runtime sees a thrown exception, unwinds the call stack, and propagates the exception to the next handler up the chain. The agent stops. The bill stops accruing.

The user can override the callback to add observability. The custom callback can log, send a metric, page an operator, or fire a webhook. The custom callback must end with a throw — and the type system enforces that. A callback that logs and returns is a compilation error. A callback that logs and throws is a working KillSwitch.
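A small sketch of what the Nothing return type accepts and rejects. TripContext, TripException, and fail are simplified stand-ins invented for this example; the real KillSwitchContext carries more fields. Treat it as an illustration of the Kotlin typing rule, not as TPipe code.

```kotlin
// Simplified, hypothetical stand-ins for KillSwitchContext and KillSwitchException.
data class TripContext(val reason: String)
class TripException(val context: TripContext) : RuntimeException("tripped: ${context.reason}")

// A helper typed as Nothing can be the final call in a callback.
fun fail(ctx: TripContext): Nothing = throw TripException(ctx)

// Stand-in for a logger or metrics sink.
val tripLog = mutableListOf<String>()

// Compiles: the lambda's last expression has type Nothing.
val logThenThrow: (TripContext) -> Nothing = { ctx ->
    tripLog += "tripped: ${ctx.reason}"
    fail(ctx)
}

// Does not compile: the lambda's last expression is Unit, but Nothing is required.
// val logAndReturn: (TripContext) -> Nothing = { ctx -> println(ctx.reason) }

// Compiles, but error() throws IllegalStateException, not TripException,
// so a generic catch(e: Exception) would treat it as an ordinary failure.
val wrongException: (TripContext) -> Nothing = { ctx -> error("tripped: ${ctx.reason}") }
```

The last declaration is the case worth remembering: the compiler is satisfied, yet the exception that escapes is not the one the carve-out looks for.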

Where the check happens

The check lives in Pipe.kt, at line 7622:

protected fun checkKillSwitch(inputTokens: Int, outputTokens: Int, elapsedMs: Long)
{
    killSwitch?.let { ks ->
        val inputLimit = ks.inputTokenLimit
        val outputLimit = ks.outputTokenLimit

        val inputExceeded = inputLimit != null && inputTokens > inputLimit
        val outputExceeded = outputLimit != null && outputTokens > outputLimit

        if (inputExceeded || outputExceeded)
        {
            val reason = when {
                inputExceeded && outputExceeded -> "input_and_output_exceeded"
                inputExceeded -> "input_exceeded"
                else -> "output_exceeded"
            }
            ks.onTripped(com.TTT.P2P.KillSwitchContext(
                p2pInterface = this,
                inputTokensSpent = inputTokens,
                outputTokensSpent = outputTokens,
                elapsedMs = elapsedMs,
                reason = reason
            ))
        }
    }
}

The function is called from the main pipe execution loop at line 6015, after every token-count update. The caller pulls the current accumulated totals from pipeTokenUsage or from the parent pipeline, calculates the elapsed time, and passes them to checkKillSwitch, which compares them against the configured limits.

The onTripped callback is invoked with a KillSwitchContext that carries the agent that tripped, the tokens spent, the elapsed time, and the reason. The context also has fields for the accumulated totals from the root and the nesting depth. The call shown above does not set them, so they take their defaults: the spent values and a depth of 0. The callback’s throw propagates the KillSwitchException up the call stack.

The check fires on every pipe execution, after the token counts are updated. The call that crosses the limit has already been paid for. What the check prevents is every call after it. There is no way for the agent to spend tokens without the check seeing them. The check is the cost observability layer. The throw is the enforcement layer.
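The comparison itself can be exercised in isolation. The sketch below is a simplified restatement of the logic in checkKillSwitch, assuming a hypothetical Limits class and tripReason function that return the reason string instead of invoking a callback. It makes two details easy to test: the comparison is strict, and a null limit means no limit.

```kotlin
// Hypothetical stand-in for the two limit fields of KillSwitch.
data class Limits(val inputTokenLimit: Int? = null, val outputTokenLimit: Int? = null)

/**
 * Returns the reason string the check would report, or null if nothing tripped.
 * Mirrors the comparison in checkKillSwitch: strictly greater than, null means unlimited.
 */
fun tripReason(limits: Limits?, inputTokens: Int, outputTokens: Int): String? {
    if (limits == null) return null // no kill switch attached
    val inputExceeded = limits.inputTokenLimit?.let { inputTokens > it } ?: false
    val outputExceeded = limits.outputTokenLimit?.let { outputTokens > it } ?: false
    return when {
        inputExceeded && outputExceeded -> "input_and_output_exceeded"
        inputExceeded -> "input_exceeded"
        outputExceeded -> "output_exceeded"
        else -> null
    }
}
```

Spending exactly the limit does not trip. One token over does.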

The catch-and-rethrow carve-out

This is the architectural punchline. Look at Pipeline/Splitter.kt, line 778:

catch(e: com.TTT.P2P.KillSwitchException)
{
    // KillSwitchException must never be caught — it must propagate to terminate the agent
    throw e
}
catch(e: Exception)
{
    //Handle pipeline execution failure by creating error content.
    val errorContent = MultimodalContent("Pipeline execution failed: ${e.message}")

    //Store error result in results collection.
    storeResult(key, pipeline, errorContent)

    if(tracingEnabled)
    {
        trace(TraceEventType.SPLITTER_FAILURE, TracePhase.EXECUTION,
              activatorValue.content,

The Splitter has a generic catch(e: Exception) block. That block turns any exception into an “error content” result and stores it, allowing the next pipeline in the sequence to run. This is normal exception handling. This is the behavior that lets a failed agent be replaced by a working one.

But the generic catch is preceded by a SPECIFIC catch for KillSwitchException. The specific catch re-throws the exception. The comment makes the intent explicit: “KillSwitchException must never be caught — it must propagate to terminate the agent.”

This is the structural defense. The architecture anticipates that a generic exception handler will try to swallow the KillSwitchException and continue execution. The architecture blocks that path with a specific catch that re-throws. The generic catch never sees the KillSwitchException. The propagation is enforced.

Without this carve-out, the KillSwitch would still throw, and the exception would still propagate as far as the Splitter. There, the generic catch would catch it, turn it into error content, store the result, and let the next pipeline in the sequence run. The agent would not stop. The bill would keep growing. The KillSwitch would be defeated by normal error handling.

The carve-out is the architectural commitment. It says: when the cost limit is hit, the agent stops. Not “the agent logs the failure and tries again.” Not “the agent falls back to a cheaper model.” The agent stops. The carve-out enforces that.
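The effect of catch ordering is easy to reproduce outside TPipe. In this sketch, KillException and both runner functions are hypothetical stand-ins. Each runner executes a list of steps and converts ordinary failures into error strings, roughly as the Splitter excerpt does. Only one of them has the specific catch in front of the generic one.

```kotlin
// Hypothetical stand-in for KillSwitchException.
class KillException(message: String) : RuntimeException(message)

/** Specific catch first: ordinary failures are recorded, the kill escapes. */
fun runWithCarveOut(steps: List<() -> String>): List<String> {
    val results = mutableListOf<String>()
    for (step in steps) {
        try {
            results += step()
        } catch (e: KillException) {
            throw e // rethrow so no later handler in this block can swallow it
        } catch (e: Exception) {
            results += "error: ${e.message}" // ordinary failure: record and continue
        }
    }
    return results
}

/** Generic catch only: the kill is recorded as just another error and the loop continues. */
fun runWithoutCarveOut(steps: List<() -> String>): List<String> {
    val results = mutableListOf<String>()
    for (step in steps) {
        try {
            results += step()
        } catch (e: Exception) {
            results += "error: ${e.message}" // also swallows KillException
        }
    }
    return results
}
```

Given three steps where the second throws KillException, runWithoutCarveOut returns three results and the third step runs. runWithCarveOut throws before the third step is reached.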

The root-down accumulator

The Splitter runs branches in parallel. Each branch spends tokens. The Splitter needs to know the total spent across all branches to enforce the budget. The accumulator lives in Splitter.kt, at line 732:

if(killSwitch != null)
{
    killSwitchInputAccumulator += pipeline.inputTokensSpent
    killSwitchOutputAccumulator += pipeline.outputTokensSpent
    val elapsedMs = System.currentTimeMillis() - killSwitchExecutionStartTime
    checkKillSwitch(killSwitchInputAccumulator, killSwitchOutputAccumulator, elapsedMs)
}

After every branch completes, the Splitter adds the branch’s spend to the running total. The total includes the spend from every previous branch. The check is performed against the accumulated total.

The accumulation flows root-down, not bottom-up. The limit is the operator’s limit, not the agent’s limit. If the operator sets a budget of 100,000 input tokens, the KillSwitch trips when the entire call chain has spent more than 100,000 input tokens, regardless of which branch spent them.

This catches the case where no individual agent is over budget but the combined spend is. With that 100,000-token limit, three child agents each spending 40,000 input tokens will not trip on their own spend. The Splitter’s accumulated total reaches 120,000 when the third branch’s spend is added, and the root KillSwitch trips at that point.

The accumulation is the operator’s safety net. It is the difference between a per-agent budget (which the agent can game) and a per-call-tree budget (which the agent cannot).
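The three-branch example can be worked through in a few lines. This is a sketch under simplifying assumptions: accumulateAndCheck and BudgetExceeded are hypothetical names, and branch results are processed one after another rather than in parallel. It adds each branch's spend to a running total and checks after every addition, as the Splitter excerpt does.

```kotlin
// Hypothetical exception carrying the state at the moment the limit was crossed.
class BudgetExceeded(val total: Int, val branchIndex: Int) :
    RuntimeException("accumulated $total input tokens after branch $branchIndex")

/**
 * Adds each branch's spend to a running total and checks after every addition.
 * Returns the final total if the limit is never exceeded.
 */
fun accumulateAndCheck(branchSpend: List<Int>, inputTokenLimit: Int): Int {
    var accumulated = 0
    branchSpend.forEachIndexed { index, spent ->
        accumulated += spent
        // Strict comparison, matching the check shown earlier.
        if (accumulated > inputTokenLimit) throw BudgetExceeded(accumulated, index + 1)
    }
    return accumulated
}
```

Two branches at 40,000 each pass with a total of 80,000. Adding a third pushes the total to 120,000 and the check throws on branch 3, although no single branch came near the 100,000 limit.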

Propagation through the container hierarchy

Every container in TPipe implements the P2PInterface, which exposes the killSwitch property. When you set a killSwitch on a container, the container’s setter propagates it to every child pipeline. The Splitter does it explicitly at line 124:

override var killSwitch: com.TTT.P2P.KillSwitch?
    get() = _killSwitch
    set(value) {
        _killSwitch = value
        // Propagate kill switch to all pipelines
        activatorKeys.values.flatMap { it.pipelines }.forEach { pipeline ->
            pipeline.killSwitch = value
        }
    }

Set the killSwitch on a Manifold and it propagates to the manager and every worker. Set it on a Junction and it propagates to the moderator and every participant. Set it on a DistributionGrid and it propagates to the router and every node. Set it on a Splitter and it propagates to every branch.

The propagation is the design pattern. The operator sets the budget at the highest level that maps to the budget they want to enforce. The runtime walks the container hierarchy and applies the budget to every leaf. The leaves spend tokens. The Splitter accumulates. The root checks the total. The budget is enforced.
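The setter pattern generalizes to nested containers. The sketch below uses hypothetical stand-ins (Switch, HasKillSwitch, Leaf, Container) rather than the real P2PInterface, and assumes children are fixed at construction time. It shows one way a setter on the root can reach every leaf, with each nested container forwarding the value through its own setter.

```kotlin
// Hypothetical stand-ins; the real types are KillSwitch and P2PInterface.
data class Switch(val inputTokenLimit: Int? = null)

interface HasKillSwitch {
    var killSwitch: Switch?
}

/** A leaf just stores whatever it is given. */
class Leaf : HasKillSwitch {
    override var killSwitch: Switch? = null
}

/** A container stores the value and forwards it to every child. */
class Container(private val children: List<HasKillSwitch>) : HasKillSwitch {
    private var _killSwitch: Switch? = null

    override var killSwitch: Switch?
        get() = _killSwitch
        set(value) {
            _killSwitch = value
            // A child that is itself a Container forwards again through its own setter.
            children.forEach { it.killSwitch = value }
        }
}
```

In this sketch a child attached after the setter has run would not receive the switch. Whether TPipe handles that case is not visible in the excerpt above.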

The DSL builders wrap this pattern in a clean API. From ManifoldDsl.kt, line 154:

fun killSwitch(inputTokenLimit: Int? = null, outputTokenLimit: Int? = null, onTripped: ((KillSwitchContext) -> Nothing)? = null): ManifoldBuilder<S>
{
    killSwitchConfiguration = if(onTripped != null) {
        KillSwitch(inputTokenLimit = inputTokenLimit, outputTokenLimit = outputTokenLimit, onTripped = onTripped)
    } else {
        KillSwitch(inputTokenLimit = inputTokenLimit, outputTokenLimit = outputTokenLimit)
    }
    return this
}

The DSL call is one line. The result is a KillSwitch wired to every child pipeline in the manifold, with a custom callback if you provided one. The propagation is invisible to the operator. The enforcement is structural.

What termination does that a budget cap cannot

A token budget is a number the agent checks. A KillSwitch is a mechanism that terminates. The difference shows up in failure modes.

A token budget fails when:

  • The agent retries on a transient error
  • The wrapper catches the over-budget exception and returns a default
  • The supervisor catches the failure and restarts the agent
  • A loop in the agent’s logic re-spends the same tokens

A KillSwitch fails when:

  • The onTripped callback throws something other than KillSwitchException
  • The catch block swallows the exception before it propagates
  • The check is never invoked because the container hierarchy is misconfigured

The first list amounts to “the budget does not work in practice.” The second amounts to “the developer wrote a misconfigured KillSwitch.” One is a normal failure of normal systems. The other is a misuse of a defensive system.

The KillSwitch is harder to misuse than a token budget. The compiler rejects callbacks that return normally. The catch-and-rethrow carve-out prevents normal exception handlers from swallowing the propagation. The root-down accumulation prevents the agent from gaming the budget. The architectural commitment is to termination, not estimation.

Setting up your own KillSwitch

The DSL is the easiest path. Inside any manifold, junction, or distributionGrid block:

val manifold = manifold {
    manager {
        pipeline { /* ... */ }
    }
    worker("analyzer") {
        pipeline { /* ... */ }
    }
    killSwitch(inputTokenLimit = 100_000, outputTokenLimit = 50_000)
}

For a custom callback that adds observability:

// logger and telemetry stand for whatever logging and metrics clients your application provides.
val pipeline = Pipeline()
pipeline.killSwitch = KillSwitch(
    inputTokenLimit = 50_000,
    outputTokenLimit = 25_000,
    onTripped = { ctx ->
        logger.warn("KillSwitch tripped: ${ctx.reason} at ${ctx.elapsedMs}ms")
        telemetry.reportKillSwitchEvent(ctx)
        // Must throw KillSwitchException to terminate
        throw KillSwitchException(ctx)
    }
)

The callback must not return, and the type system enforces that. What the type system cannot enforce is which exception ends the callback. Throwing KillSwitchException is the developer’s job. That throw is the only line that matters.

The bigger picture

The KillSwitch is one of seven intervention points in TPipe’s safety architecture. It is the one that protects against runaway cost. The ContextBank protects against context loss. The DistributionGrid protects against single-node failure. The reasoning pipes protect against structural hallucination. The KillSwitch is the cost side of the safety story.

The next post in this series covers the LangChain migration: how to take an existing agent system running on a Python-based framework and move it to TPipe. The contrast between “type your way into determinism” and “prompt your way into determinism” makes the architectural argument concrete. If you have ever watched a retry loop burn through a budget cap while the agent cheerfully continued, the LangChain post is for you.

For now: 66 lines. One Nothing. One throw. One catch-and-rethrow that defends the propagation. Six months of production. Every trip has held the line. The pattern works because the architecture commits to termination, not estimation.