What's the order of pipe configuration calls?

The compiler doesn't enforce an order, but following this one makes the config read top to bottom as a story: useConverseApi(), setRegion(), setModel(), setTemperature()/setTopP(), setSystemPrompt(), setJsonInput()/setJsonOutput() with requireJsonPromptInjection(), setTokenBudget(), setPipeName(), setReasoningPipe() if needed, and enableLoreBookFillAndSplitMode() if you have a populated lorebook. The value you'll tune most is contextWindowSize on the TokenBudgetSettings you pass to setTokenBudget(), because it depends on your model. For Claude 3 Haiku it's 200k. For Qwen 32b on Bedrock it might be 32k.

When should I use the builder vs apply { }?

Use chained calls for short configs; five lines reads fine as a chain. Switch to apply { } when the config reaches 15+ lines or needs conditional logic. Both give you the same configured pipe: apply { } runs the block with the pipe as its receiver, so every call inside is a method on the pipe without repeating the receiver, and then returns the pipe. It's a readability choice, not a correctness choice.

What's the difference between autoTruncateContext(), enableLoreBookFillMode(), and enableLoreBookFillAndSplitMode()?

autoTruncateContext() turns on the runtime truncation algorithm — without this call, setTokenBudget activates the config-time checks but the algorithm doesn't run at execution time. enableLoreBookFillMode() is a select-and-fill strategy: top-weighted lorebook entries get selected first, the remaining budget goes to other context. enableLoreBookFillAndSplitMode() adds a split budget reservation for the rest of the top-level context window. The Autogenesis WriterAgent uses enableLoreBookFillAndSplitMode() on all three pipes because long-running TTRPG sessions need guaranteed space for both lorebook entries and recent story context.

How does the control flag system work?

Every pipe receives a MultimodalContent object. After each pipe executes, the pipeline reads the content's control flags: terminatePipeline halts cleanly, passPipeline exits early without being an error, repeatPipe re-calls this pipe with the same content, jumpToPipe redirects execution by pipe name or 'skip-to-next-pipe'. Any pipe can write any flag. The content object is the control plane. This is what makes non-linear pipelines possible — a classifier pipe can set content.jumpToPipe = 'escalation-pipe' to redirect based on its classification result.

What happens if I forget to call init()?

The first call to execute() (or any other method that needs initialized state) throws UninitializedComponentException with a message naming the uninitialized component. Forgetting init() is the most common mistake, and it's slow to debug because the stack trace points at the line where you called execute(), not the line where init() should have been. Call init() at the end of the config block. The Autogenesis code calls init() after all pipes are built, in a separate pass, and that's the right pattern.

How does setTokenBudget interact with KillSwitch?

setTokenBudget activates the runtime context algorithm: lorebook selection, multi-page budget allocation, text-matching preservation, and overflow handling via truncation or compression. KillSwitch is the hard ceiling: it halts the pipeline if tokens truly run out. Together they form a two-layer system. setTokenBudget manages what stays in context; KillSwitch manages what happens when context is exhausted. KillSwitch is automatic once setTokenBudget is configured. The algorithm runs before each LLM call, and KillSwitch fires only when the algorithm has no room left to operate.
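A minimal sketch of the two-layer idea, under stated assumptions: context arrives as chunks with known token counts, oldest first; layer 1 drops the oldest chunks until everything fits; layer 2 throws if the fixed parts (system prompt, user prompt, output reservation) still don't fit with all context gone. fitOrKill and KillSwitchTripped are hypothetical names, not TPipe's API, and TPipe's real algorithm chooses what to keep far more carefully than oldest-first.

```kotlin
// Hypothetical stand-ins; not TPipe's KillSwitch or TokenBudgetSettings API.
class KillSwitchTripped(message: String) : RuntimeException(message)

/**
 * Layer 1 (budget): drop the oldest context chunks until everything fits.
 * Layer 2 (ceiling): if the fixed parts alone still overflow, halt.
 * contextChunks are token counts, oldest first; returns the chunks that survive.
 */
fun fitOrKill(fixedTokens: Int, contextChunks: List<Int>, window: Int): List<Int> {
    val kept = ArrayDeque(contextChunks)
    // Layer 1: trim oldest-first while there is still something to trim.
    while (kept.isNotEmpty() && fixedTokens + kept.sum() > window) {
        kept.removeFirst()
    }
    // Layer 2: nothing left to trim and still over the window.
    if (fixedTokens + kept.sum() > window) {
        throw KillSwitchTripped("fixed content needs $fixedTokens tokens; window is $window")
    }
    return kept.toList()
}
```

With 1,000 fixed tokens, chunks of 400, 300, and 200, and a 1,600-token window, the sketch drops the 400-token chunk and keeps the other two. With 2,000 fixed tokens and a 1,500-token window, no amount of trimming helps, so it throws.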

How to Build a TPipe Pipeline: The Settings, The Patterns, and What Each One Does

Configuring a pipe

The simplest pipe config:

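import kotlinx.coroutines.runBlocking
// Plus the TPipe import for BedrockMultimodalPipe (package depends on your TPipe version).

fun main() = runBlocking {
    val pipe = BedrockMultimodalPipe()
        .setRegion("us-west-2")
        .setModel("anthropic.claude-3-haiku-20240307-v1:0")
        .setTemperature(0.7)
        .setSystemPrompt("You are a helpful assistant.")
        .init() // load the Bedrock backend before the first execute()

    // execute() is called from a coroutine, hence runBlocking
    val result = pipe.execute("Hello, world!")
    println(result.text)
}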

Region, model, temperature, prompt, init, run. Five lines of config. Most production pipes are 10-20 lines.

For longer configs, group with apply { }:

val guidePipe = BedrockMultimodalPipe().apply {
    useConverseApi()
    setRegion("us-west-2")
    setModel(BedrockConfig.qwen235B)
    setTemperature(1.0)
    setTopP(0.9)
    requireJsonPromptInjection()
    setJsonInput(PlayerStoryInput::class)
    setJsonOutput(GuideData::class)
    setTokenBudget(BedrockConfig.generativeBudgetSettings)
    setReasoningPipe(BedrockConfig.authorBuilder(
        effectiveAuthorPersonality,
        depth = ReasoningDepth.High,
        duration = ReasoningDuration.Short
    ))
    setPipeName("guide pipe")
    enableLoreBookFillAndSplitMode()

    val systemPrompt = """You are a guide generation agent...""".trimMargin()
    val middlePrompt = """Your user prompt will contain...""".trimMargin()
    val context = """You have been provided with as much...""".trimMargin()

    setSystemPrompt(systemPrompt)
    setMiddlePrompt(middlePrompt)
    autoInjectContext(context)
}

apply { } runs the block with the pipe as its receiver, so every call inside is a method on the pipe, and apply returns the pipe itself no matter what the block’s last line evaluates to. It’s a style choice: both styles produce the same configured pipe. Use it when the config is long enough that chained calls become a wall.
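A toy builder shows why both styles land in the same place. ToyPipe, ToySettings, and their setters are hypothetical, not TPipe's classes. Each setter returns this, which is what makes chaining possible, while apply { } returns its receiver regardless of what the block's last expression is.

```kotlin
// Hypothetical toy builder; TPipe's real pipe classes have many more settings.
data class ToySettings(
    val region: String = "",
    val temperature: Double = 0.0,
    val systemPrompt: String = "",
)

class ToyPipe {
    var settings = ToySettings()
        private set

    // Returning the pipe (via apply) is what lets calls be chained.
    fun setRegion(value: String): ToyPipe = apply { settings = settings.copy(region = value) }
    fun setTemperature(value: Double): ToyPipe = apply { settings = settings.copy(temperature = value) }
    fun setSystemPrompt(value: String): ToyPipe = apply { settings = settings.copy(systemPrompt = value) }
}

// Style 1: chained calls.
fun chained(): ToyPipe = ToyPipe()
    .setRegion("us-west-2")
    .setTemperature(0.7)
    .setSystemPrompt("You are a helpful assistant.")

// Style 2: apply { }. Inside the block `this` is the pipe; apply returns the pipe,
// and the setters' own return values are simply ignored.
fun applied(): ToyPipe = ToyPipe().apply {
    setRegion("us-west-2")
    setTemperature(0.7)
    setSystemPrompt("You are a helpful assistant.")
}
```

The chain depends on every setter returning the pipe; apply { } doesn't, which is also why it handles if/else and plain assignments inside the block without breaking the flow.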

The settings on this pipe are real. They come from the Autogenesis WriterAgent, which runs three pipes like this — guide, selection, writing — for 100+ turn TTRPG sessions. Here’s what each one does.

What each setting does

I’m only covering the ones you’ll touch in production. The full list is in the Bedrock docs. These are the 90%.

useConverseApi() — Use AWS Bedrock’s Converse API. Converse is the unified interface for multi-model access. New pipes should always use it. The Autogenesis team uses it everywhere.

setRegion / setModel — Where the inference runs and which model you hit. setModel always takes a string — either a model ID like anthropic.claude-3-haiku-20240307-v1:0 or a full ARN. Save the IDs you use to constants on a config object (like BedrockConfig.qwen235B) so you don’t have to copy-paste them. For cross-region models, call bedrockEnv.bindInferenceProfile(modelId, arn) first to map the ID to the ARN, or pass the ARN directly.
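A minimal sketch of the constants-plus-binding habit. ModelIds and resolveModel are hypothetical helpers, not TPipe's API, and the Haiku ID is the one used earlier in this post. The resolver just prefers a bound ARN over the raw ID, which is the mapping described above; in TPipe itself you'd use bedrockEnv.bindInferenceProfile(modelId, arn) or pass the ARN straight to setModel.

```kotlin
// Hypothetical config object: one place for the model IDs you actually use.
object ModelIds {
    const val CLAUDE_3_HAIKU = "anthropic.claude-3-haiku-20240307-v1:0"
}

/**
 * Hypothetical resolver mirroring the ID -> ARN binding: if an inference profile
 * ARN has been bound for this ID, use it; otherwise send the ID unchanged.
 */
fun resolveModel(modelId: String, boundProfiles: Map<String, String>): String =
    boundProfiles[modelId] ?: modelId
```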

setTemperature / setTopP — Sampling. At 0.0 the model sticks to the most likely tokens and output is as close to deterministic as the provider allows; at 1.0 output is the most varied. For extraction and classification, use 0.0-0.3. For creative generation, use 0.7-1.0. A top-p of 0.9 is a reasonable default for most cases: the model samples only from the smallest set of candidate next tokens whose cumulative probability reaches 0.9, so you keep diversity without wandering into garbage. The Autogenesis guide pipe uses 1.0/0.9 because it generates creative narrative suggestions. The selection pipe uses lower values because it’s picking from a fixed set of options.
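Top-p (nucleus) filtering is small enough to sketch. This is the generic technique, not TPipe or Bedrock code, and topPFilter is an illustrative name: rank candidate tokens by probability, keep the smallest prefix whose cumulative mass reaches p, then renormalize what's left before sampling.

```kotlin
/**
 * Nucleus (top-p) filter: returns the indices of the tokens kept for sampling,
 * paired with their renormalized probabilities.
 */
fun topPFilter(probs: List<Double>, p: Double): List<Pair<Int, Double>> {
    require(p > 0.0 && p <= 1.0) { "p must be in (0, 1]" }
    val ranked = probs.withIndex().sortedByDescending { it.value }
    val kept = mutableListOf<IndexedValue<Double>>()
    var cumulative = 0.0
    for (candidate in ranked) {
        kept += candidate
        cumulative += candidate.value
        if (cumulative >= p) break // smallest set whose mass reaches p
    }
    // Renormalize so the kept probabilities sum to 1.
    return kept.map { it.index to it.value / cumulative }
}
```

For probabilities 0.05, 0.5, 0.15, and 0.3 with p = 0.9, the kept set is the 0.5, 0.3, and 0.15 tokens (cumulative 0.95); the 0.05 tail is never sampled.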

setSystemPrompt — The role. “You are a…” Stable across calls. Goes into the system message slot of the prompt.

setJsonInput / setJsonOutput — The types. You use these to make the LLM return typed data. setJsonInput declares the Kotlin class that describes the expected input structure. setJsonOutput declares the Kotlin class that describes the expected output structure. TPipe serializes the input type into the user prompt, embeds the output schema in the system prompt, and validates the LLM’s response against the output schema at the boundary. The schema in the prompt tells the LLM what shape to produce, and the pipe enforces that shape at the boundary. The Kotlin types are the contract.

requireJsonPromptInjection() — Makes setJsonOutput work by injecting the JSON schema descriptions into the prompt. Pair them — requireJsonPromptInjection() + setJson* is the standard combo.

setTokenBudget(...) — This is the memory management system, not just a cap. Calling setTokenBudget(...) activates TPipe’s runtime context algorithm. At config time, the pipe tokenizes the system prompt, max output, reasoning budget, and user prompt size, subtracts them from the context window, and throws if the configuration itself would overflow. At runtime, before each LLM call, the pipe runs the truncation stage: lorebook selection by priority or weight, multi-page budget allocation across MiniBank pages, text-matching preservation that keeps content matching user-prompt keywords, and either truncates or compresses whatever overflows. Working with KillSwitch (the hard ceiling that halts the pipeline if tokens truly run out), this is the layer that keeps the agent from forgetting important context, drifting under pressure, or drowning in oversized context.

Here’s how the budget object actually gets built:

val budget = TokenBudgetSettings().apply {
    contextWindowSize = 32_000      // total context window in tokens
    maxTokens = 4_000                // reserve for LLM output
    reasoningBudget = 2_000          // reserve for reasoning sub-pipe output
    subtractReasoningFromInput = true // carve reasoning from input, not from maxTokens
    userPromptSize = 8_000           // cap on user prompt reservation
    allowUserPromptTruncation = true // let the algorithm shrink the prompt to fit
    preserveJsonInUserPrompt = true  // keep JSON structure intact during truncation
    compressUserPrompt = false       // truncate, don't compress (safer for typed I/O)
    preserveTextMatches = true       // keep content matching user-prompt keywords
    truncationMethod = ContextWindowSettings.TruncateMiddle  // chop from both ends
    multiPageBudgetStrategy = MultiPageBudgetStrategy.DYNAMIC_SIZE_FILL  // default
    pageWeights = mapOf("story" to 2.0, "lorebook" to 1.0)  // only read by WEIGHTED_SPLIT; ignored here
    reserveEmptyPageBudget = false   // don't reserve budget for empty MiniBank pages
}

pipe.setTokenBudget(budget)
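To make the config-time check concrete, here's the arithmetic as a standalone function. remainingContextBudget is hypothetical, not a TPipe method. It models only the reservations this post names (system prompt, output, reasoning, user prompt) and applies subtractReasoningFromInput the way the field list below describes it; TPipe's real accounting may include more.

```kotlin
/**
 * Hypothetical model of the config-time check: what's left of the window for
 * lorebook and other context after the fixed reservations are subtracted.
 * Throws IllegalStateException when the configuration itself would overflow.
 */
fun remainingContextBudget(
    contextWindowSize: Int,
    systemPromptTokens: Int,
    maxTokens: Int,
    reasoningBudget: Int,
    userPromptSize: Int,
    subtractReasoningFromInput: Boolean,
): Int {
    // Input-side reasoning is its own reservation; otherwise it lives inside maxTokens.
    require(subtractReasoningFromInput || reasoningBudget <= maxTokens) {
        "reasoningBudget ($reasoningBudget) can't be carved out of maxTokens ($maxTokens)"
    }
    val reasoningReservation = if (subtractReasoningFromInput) reasoningBudget else 0
    val remaining = contextWindowSize - systemPromptTokens - maxTokens -
        reasoningReservation - userPromptSize
    check(remaining >= 0) { "budget overflows the context window by ${-remaining} tokens" }
    return remaining
}
```

With the values above and an assumed 500-token system prompt, 32,000 - 500 - 4,000 - 2,000 - 8,000 leaves 17,500 tokens for lorebook and context. Setting subtractReasoningFromInput = false moves the 2,000 reasoning tokens inside maxTokens, leaving 19,500.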

The fields, briefly:

contextWindowSize / maxTokens / reasoningBudget / userPromptSize — contextWindowSize is the total window; the other three are reservations. They get subtracted from contextWindowSize, along with the system prompt’s token count, and the remainder is available for lorebook and context elements. The pipe throws at config time if the reservations add up to more than contextWindowSize.
subtractReasoningFromInput — when true, the reasoning budget comes out of the input side. When false (default), it’s carved out of the output maxTokens.
allowUserPromptTruncation — when true, the algorithm shrinks the user prompt to fit if other content is too large. When false (default), it throws on overflow instead.
compressUserPrompt — when true, the algorithm compresses the user prompt rather than truncating it. Faster, but compression doesn’t preserve JSON structure and the pipe can’t validate compressed output the way it validates truncated output. For pipes using setJsonInput / setJsonOutput, leave this off.
preserveTextMatches — when true, items containing words from the user prompt are kept before other content gets truncated. This is the “important context survives” guarantee.
truncationMethod — TruncateTop chops from the start, TruncateBottom chops from the end, TruncateMiddle chops from both ends evenly. TruncateTop is the default.
multiPageBudgetStrategy — see below.
pageWeights — only consulted by WEIGHTED_SPLIT. Map of MiniBank page key to weight.
reserveEmptyPageBudget — when true, empty MiniBank pages still reserve a portion of the budget. When false, the budget is only divided across pages that have content. DYNAMIC_FILL and DYNAMIC_SIZE_FILL override this and redistribute to active pages regardless.
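The three truncation methods, sketched over a plain list exactly as this post describes them: drop from the start, drop from the end, or drop evenly from both ends. Truncation and truncate are hypothetical names; TPipe's ContextWindowSettings operates on tokens, and its edge-case handling may differ.

```kotlin
// Hypothetical mirror of the three methods described above.
enum class Truncation { TOP, BOTTOM, MIDDLE }

fun <T> truncate(items: List<T>, keep: Int, method: Truncation): List<T> {
    require(keep >= 0) { "keep must be non-negative" }
    if (items.size <= keep) return items
    val drop = items.size - keep
    return when (method) {
        Truncation.TOP -> items.drop(drop)        // chop from the start
        Truncation.BOTTOM -> items.dropLast(drop) // chop from the end
        Truncation.MIDDLE -> {
            // Chop from both ends evenly; an odd extra item comes off the end.
            val fromStart = drop / 2
            items.subList(fromStart, fromStart + keep).toList()
        }
    }
}
```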

Multi-page budget strategies apply to MiniBank-tracked content. Set one with setMultiPageBudgetStrategy(...) or through the multiPageBudgetStrategy field above:

EQUAL_SPLIT — every page gets the same share of the budget. Predictable, dumb.
WEIGHTED_SPLIT — pages get shares proportional to pageWeights. Use this when you know which pages matter more (e.g. "story" > "notes").
PRIORITY_FILL — walks pages in declared order, fills each up to its current need, exhausts the budget as it goes. First pages win.
DYNAMIC_FILL — starts with priority fill, simulates actual usage after truncation, then redistributes any unused budget across up to 3 passes. Smart fill.
DYNAMIC_SIZE_FILL — DYNAMIC_FILL but prioritizes smaller contexts to protect them from being squeezed out. This is the default.
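The three simpler strategies fit in a few lines. This sketch is hypothetical (the function names are not TPipe's), covers only EQUAL_SPLIT, WEIGHTED_SPLIT, and PRIORITY_FILL, and rounds down with integer math, so a few tokens of rounding remainder can go unallocated. The DYNAMIC variants layer usage simulation and redistribution passes on top of priority fill and are left out.

```kotlin
// Hypothetical sketch of three allocation strategies over MiniBank-style pages.
fun equalSplit(budget: Int, pages: List<String>): Map<String, Int> =
    if (pages.isEmpty()) emptyMap() else pages.associateWith { budget / pages.size }

fun weightedSplit(budget: Int, weights: Map<String, Double>): Map<String, Int> {
    val total = weights.values.sum()
    if (total <= 0.0) return weights.mapValues { 0 }
    // Each page's share is proportional to its weight, rounded down.
    return weights.mapValues { (_, w) -> (budget * w / total).toInt() }
}

/** Walk pages in declared order, granting each what it needs until the budget runs out. */
fun priorityFill(budget: Int, needs: List<Pair<String, Int>>): Map<String, Int> {
    var remaining = budget
    val allocation = linkedMapOf<String, Int>()
    for ((page, need) in needs) {
        val granted = minOf(need, remaining)
        allocation[page] = granted
        remaining -= granted
    }
    return allocation
}
```

With pageWeights of story 2.0 and lorebook 1.0 and 900 tokens to split, weightedSplit gives story 600 and lorebook 300. priorityFill with needs of 500, 600, and 200 gives 500, 400, and 0, which is the "first pages win" behavior.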

Lorebook truncation modes (the other half of the algorithm, switched on at the pipe level):

pipe.autoTruncateContext() — turn on automatic truncation at runtime. Without this, the algorithm doesn’t run at execution time even if you’ve set a budget.
pipe.enableLoreBookFillMode() — select-and-fill strategy. Top-weighted lorebook entries get selected first, the remaining budget goes to other context. Use this when lorebook entries are the priority.
pipe.enableLoreBookFillAndSplitMode() — fill mode + reserves a split budget for the rest of the top-level context window. Best for long-running agents where recent context matters more than old context. The Autogenesis WriterAgent uses this on all three pipes.
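A rough sketch of the two lorebook modes as described above. Fill mode takes entries by descending weight, skipping any that don't fit; fill-and-split first sets aside a share of the budget for the rest of the context. LoreEntry, splitRatio, and its 50/50 default are assumptions for illustration, since the post doesn't say how TPipe sizes the split.

```kotlin
// Hypothetical lorebook entry; TPipe's real type and selection rules may differ.
data class LoreEntry(val key: String, val weight: Int, val tokens: Int)

/** Fill mode: highest-weight entries first, skipping any that don't fit. */
fun loreBookFill(entries: List<LoreEntry>, budget: Int): List<LoreEntry> {
    var remaining = budget
    val selected = mutableListOf<LoreEntry>()
    for (entry in entries.sortedByDescending { it.weight }) {
        if (entry.tokens <= remaining) {
            selected += entry
            remaining -= entry.tokens
        }
    }
    return selected
}

/**
 * Fill-and-split: reserve part of the budget for other context, then fill the
 * lorebook share. Returns the selected entries and the reserved budget.
 */
fun loreBookFillAndSplit(
    entries: List<LoreEntry>,
    budget: Int,
    splitRatio: Double = 0.5, // assumed lorebook share; not a TPipe setting
): Pair<List<LoreEntry>, Int> {
    require(splitRatio in 0.0..1.0) { "splitRatio must be between 0 and 1" }
    val loreBudget = (budget * splitRatio).toInt()
    return loreBookFill(entries, loreBudget) to (budget - loreBudget)
}
```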

setReasoningPipe(...) — Attaches a reasoning sub-pipe. The reasoning pipe runs first and produces a chain-of-thought scratchpad. The main pipe uses that scratchpad in its prompt. Use this for complex generation tasks. For simple classification, the reasoning pipe adds latency and tokens you don’t need — skip it. The authorBuilder(...) factory takes a personality, a depth, and a duration. High depth, short duration is a good default for most use cases.
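The reasoning flow, reduced to plain functions. This is a hypothetical sketch, not how setReasoningPipe is implemented: the reasoning step sees the input first, its scratchpad is placed ahead of the input in the main step's prompt, and a skip flag (modeled on the skipReasoningPipe control flag covered later in this post) bypasses the extra call. The prompt layout is an assumption.

```kotlin
// Hypothetical: each "pipe" is just a function from prompt text to model output.
fun runWithReasoning(
    input: String,
    reasoning: ((String) -> String)?,
    main: (String) -> String,
    skipReasoning: Boolean = false,
): String {
    // No reasoning pipe attached, or skipped for this turn: one call only.
    if (reasoning == null || skipReasoning) return main(input)
    val scratchpad = reasoning(input)
    // Assumed layout: scratchpad first, then the original input.
    return main("Reasoning notes:\n$scratchpad\n\nInput:\n$input")
}
```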

setPipeName("...") — Names the pipe for tracing, logging, and KillSwitch reports. The name appears in the TraceServer UI, in logs, in error messages. A good pipe name is a sentence fragment — "guide pipe", "sentiment classifier", "response generator". Don’t use pipe1.

enableLoreBookFillAndSplitMode() — Strategy switch for the lorebook portion of the truncation stage. The truncation stage is already running because you called setTokenBudget(...) above; this method tells that stage to use the select-and-fill strategy for lorebook entries and reserve a split budget for the rest of the top-level context. Use this when you have a populated lorebook and want guaranteed space for both lorebook entries and other context after truncation. The Autogenesis WriterAgent enables this on all three pipes.

setMiddlePrompt / autoInjectContext — Two more prompt slots beyond the system prompt. The system prompt is stable across calls. The middle prompt is per-pipe-type but reusable — usually the input/output schema explanation. The auto-injected context is dynamic per call — the story history, the lorebook, the world setting. Splitting these into three layers makes prompt maintenance sane.
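One way to picture the three layers, as a hypothetical sketch: the system and middle prompts are fixed when the pipe is built, and only the context changes per call. PromptLayers, render, and the order the pieces are joined in are illustrative assumptions, not how TPipe assembles its request.

```kotlin
// Hypothetical: TPipe assembles its own request; this only shows the layering.
data class PromptLayers(val system: String, val middle: String) {
    /** Builds (role, text) messages; the per-call context is the only changing layer. */
    fun render(context: String, userPrompt: String): List<Pair<String, String>> = listOf(
        "system" to system,
        "user" to listOf(middle, context, userPrompt)
            .filter { it.isNotBlank() }
            .joinToString("\n\n"),
    )
}
```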

init() — Loads the provider backend and gets the LLM ready to run. For Bedrock pipes, this means loading inference profile mappings from ~/.aws/inference.txt, resolving the model ID to an inference profile ARN, and initializing the AWS Bedrock Runtime client with credentials and HTTP timeouts. Without this call, bedrockClient is never created and the first execute() throws a runtime exception because the provider backend is missing. The Autogenesis code calls init() after all pipes are built, in a separate pass — this is the right pattern.
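The failure mode is easy to reproduce with a toy class. ToyBedrockPipe, FakeClient, and this UninitializedComponentException are hypothetical stand-ins, not TPipe's classes: the client only exists after init(), and execute() refuses to run without it, naming the missing component.

```kotlin
// Hypothetical stand-ins for illustration only.
class UninitializedComponentException(component: String) :
    IllegalStateException("$component is not initialized; call init() before execute()")

class FakeClient {
    fun send(prompt: String): String = "echo: $prompt"
}

class ToyBedrockPipe {
    private var client: FakeClient? = null

    /** Creates the backend client; a real pipe would load profiles and credentials here. */
    fun init(): ToyBedrockPipe = apply { client = FakeClient() }

    fun execute(prompt: String): String {
        val c = client ?: throw UninitializedComponentException("bedrockClient")
        return c.send(prompt)
    }
}
```

Because the check lives inside execute(), the stack trace points at the execute() call, not at the place init() was skipped.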

That’s the 90%. There are more — setHttpReferer, setOpenRouterTitle, setCacheControl, setServiceTier, enableStreaming, setStreamingCallback — but they cover provider-specific or advanced use cases. The settings above are what every production pipe configures.

Chaining pipes into a Pipeline

Once you have a few pipes, chain them into a Pipeline:

val pipeline = Pipeline()
    .add(extractor)
    .add(classifier)
    .add(generator)
    .init()

By default, the order of .add() calls is the order pipes run. The first pipe gets the pipeline input, its output feeds the second pipe, and the final pipe’s output is the pipeline’s output.

Pipelines aren’t simple chains — every pipe receives a MultimodalContent object and can redirect flow by writing to it. After each pipe executes, the pipeline reads the content’s control flags to decide what happens next. The flags live on the content, not on the pipeline:

terminatePipeline — halt the pipeline cleanly. Not an error, just an early exit.
passPipeline — exit early without being an error. The task is done, skip remaining steps.
repeatPipe — re-call this pipe with the same content. The pipe keeps getting called until you set it to false.
jumpToPipe — redirect execution. Empty string means sequential (next pipe). "skip-to-next-pipe" means skip ahead. A pipe name string means jump to that named pipe (can go forward or backward in the pipeline).
skipReasoningPipe — skip the reasoning sub-pipe for this turn.
interuptPipeline — fires an interrupt signal for the PumpStation harness system.
metadata — a scratch pad map. Pipes read and write it to pass signals between stages.

The pipeline evaluates these flags after every pipe execution. A pipe can set multiple flags — the pipeline processes them in priority order.

The content object is the control plane. Any pipe can directly set content.jumpToPipe to redirect execution:

// Inside a pipe's execution logic — redirect to a named pipe
if (result.sentiment == "negative") {
    content.jumpToPipe = "escalation-pipe"  // jump forward
}
if (result.needsRetry) {
    content.repeatPipe = true  // re-run this pipe
}
if (result.isComplete) {
    content.passPipeline = true  // exit early, clean
}

jumpToPipe accepts a pipe name string (forward or backward in the pipeline), or "skip-to-next-pipe" to advance one step. terminatePipeline, repeatPipe, and passPipeline work the same way — write to the content, the pipeline reads it after each pipe finishes. The Connector is a convenience pattern for key-based dispatch; jumpToPipe is the primitive that makes arbitrary redirection possible.
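To see how a loop can honor these flags, here's a heavily simplified, hypothetical dispatcher. Content, Step, and runPipeline are not TPipe's types; only four flags are modeled, and the order they're checked in (terminate or pass, then repeat, then jump) is an assumption, since this post doesn't spell out the real priority order. Jumps are consumed once; repeatPipe stays on until the step turns it off, as described above.

```kotlin
// Hypothetical, simplified model of content-driven control flow.
class Content(var text: String) {
    var terminatePipeline = false
    var passPipeline = false
    var repeatPipe = false
    var jumpToPipe = ""
}

class Step(val name: String, val action: (Content) -> Unit)

fun runPipeline(steps: List<Step>, content: Content, maxSteps: Int = 100): Content {
    var index = 0
    var executed = 0
    while (index in steps.indices) {
        executed++
        check(executed <= maxSteps) { "too many steps; possible repeat/jump loop" }
        steps[index].action(content)
        when {
            content.terminatePipeline || content.passPipeline -> return content
            content.repeatPipe -> Unit // stay on the same step until the step clears the flag
            content.jumpToPipe.isNotEmpty() -> {
                val target = content.jumpToPipe
                content.jumpToPipe = "" // consume the jump so it fires once
                index = if (target == "skip-to-next-pipe") {
                    index + 1 // advances one step, per the description above
                } else {
                    val found = steps.indexOfFirst { it.name == target }
                    require(found >= 0) { "no pipe named '$target'" }
                    found // can be forward or backward
                }
            }
            else -> index++ // sequential
        }
    }
    return content
}
```

The maxSteps guard exists because repeatPipe and backward jumps can loop forever; whether TPipe ships an equivalent guard isn't covered here.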

Always call init() at the end. Forgetting it is the most common mistake and the most embarrassing.

That’s the builder pattern. Five settings in a chain, or twenty. apply { } for the long ones. Chain into a Pipeline, set the control flags, call init(). The structure doesn’t change — only the settings do. Every pipe in TPipe follows this pattern.

Related posts

The KillSwitch: Token Budgets That Actually Kill the Agent — How to configure the KillSwitch on a pipeline, with code examples for input/output token limits and custom callbacks. The 66-line Kotlin file that ends with throw.
Reasoning Pipes Explained: How TPipe Stops Prompting and Starts Programming — How reasoning pipes are attached to a main pipe, with code examples for the builder pattern this post covers.
Headless AI Agents: What, Why, and How — A production pipeline running headless 24/7. The Autogenesis deployment is the proof that the pipeline builder works at scale.
Why P2P Agent Communication Is Inevitable — How pipelines run across the P2P call chain. The pipeline builder integrates with the P2P request/response mechanism for distributed execution.
