Configuring a pipe
The simplest pipe config:
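import kotlinx.coroutines.runBlocking

val pipe = BedrockMultimodalPipe()
    .setRegion("us-west-2")
    .setModel("anthropic.claude-3-haiku-20240307-v1:0")
    .setTemperature(0.7)
    .setSystemPrompt("You are a helpful assistant.")
    .init()

// execute() is called from a coroutine; runBlocking is the simplest scope for a script or main()
runBlocking {
    val result = pipe.execute("Hello, world!")
    println(result.text)
}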
Region, model, temperature, prompt, init, run. Five lines of config. Most production pipes are 10-20 lines.
For longer configs, group with apply { }:
val guidePipe = BedrockMultimodalPipe().apply {
    useConverseApi()
    setRegion("us-west-2")
    setModel(BedrockConfig.qwen235B)
    setTemperature(1.0)
    setTopP(0.9)
    requireJsonPromptInjection()
    setJsonInput(PlayerStoryInput::class)
    setJsonOutput(GuideData::class)
    setTokenBudget(BedrockConfig.generativeBudgetSettings)
    setReasoningPipe(BedrockConfig.authorBuilder(
        effectiveAuthorPersonality,
        depth = ReasoningDepth.High,
        duration = ReasoningDuration.Short
    ))
    setPipeName("guide pipe")
    enableLoreBookFillAndSplitMode()
    val systemPrompt = """You are a guide generation agent...""".trimMargin()
    val middlePrompt = """Your user prompt will contain...""".trimMargin()
    val context = """You have been provided with as much...""".trimMargin()
    setSystemPrompt(systemPrompt)
    setMiddlePrompt(middlePrompt)
    autoInjectContext(context)
}
apply { } runs the block with the pipe as its receiver: every call inside is a method on the pipe, and apply returns the pipe itself, so guidePipe is the configured pipe. It’s a style choice; the chained style and the apply style configure the pipe identically. Use apply when the config is long enough that chained calls become a wall.
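To see why the two styles are interchangeable, here is a minimal sketch using a hypothetical DemoPipe class (not TPipe’s API) whose setters return the pipe, which is the shape the chained style relies on. Inside apply, those return values are simply discarded and apply hands back the receiver, so both variables end up configured the same way.

```kotlin
// DemoPipe is a hypothetical stand-in, not TPipe's class. Its setters return
// `this`, which is what lets the chained style string calls together.
class DemoPipe {
    var region: String = ""
        private set
    var temperature: Double = 0.0
        private set

    fun setRegion(value: String): DemoPipe = apply { region = value }
    fun setTemperature(value: Double): DemoPipe = apply { temperature = value }
}

// Chained style: each call receives the pipe returned by the previous one.
val chained = DemoPipe()
    .setRegion("us-west-2")
    .setTemperature(0.7)

// apply style: inside the block `this` is the pipe, the setters' return values
// are ignored, and apply itself returns the receiver.
val applied = DemoPipe().apply {
    setRegion("us-west-2")
    setTemperature(0.7)
}
```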
The settings on the guide pipe above are real. They come from the Autogenesis WriterAgent, which runs three pipes like this (guide, selection, writing) for 100+ turn TTRPG sessions. Here’s what each setting does.
What each setting does
I’m only covering the ones you’ll touch in production. The full list is in the Bedrock docs. These are the 90%.
useConverseApi(): Use AWS Bedrock’s Converse API, which gives the text models Bedrock hosts one shared, message-based request and response format, so switching models doesn’t mean rewriting request bodies. New pipes should always use it. The Autogenesis team uses it everywhere.
setRegion / setModel — Where the inference runs and which model you hit. setModel always takes a string — either a model ID like anthropic.claude-3-haiku-20240307-v1:0 or a full ARN. Save the IDs you use to constants on a config object (like BedrockConfig.qwen235B) so you don’t have to copy-paste them. For cross-region models, call bedrockEnv.bindInferenceProfile(modelId, arn) first to map the ID to the ARN, or pass the ARN directly.
setTemperature / setTopP: Sampling. 0.0 is (near-)deterministic, 1.0 is creative. For extraction and classification, use 0.0-0.3. For creative generation, use 0.7-1.0. Top-p of 0.9 is a reasonable default for most cases: the model samples only from the smallest set of tokens whose cumulative probability reaches 0.9, so you keep diversity without wandering into low-probability garbage. The Autogenesis guide pipe uses 1.0/0.9 because it generates creative narrative suggestions. The selection pipe uses lower values because it’s picking from a fixed set of options.
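The two knobs compose: temperature reshapes the probability distribution, then top-p trims its tail. Below is a minimal stdlib-only sketch of that sequence over a toy logit map. It illustrates the general technique (temperature-scaled softmax plus nucleus filtering), not how Bedrock or any particular model implements sampling server-side, and the function names are hypothetical.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Temperature-scaled softmax, then nucleus (top-p) filtering.
// Returns the surviving tokens with renormalised probabilities.
fun topPCandidates(logits: Map<String, Double>, temperature: Double, topP: Double): Map<String, Double> {
    require(temperature > 0.0) { "temperature 0 is greedy decoding; pick the argmax instead" }
    require(topP > 0.0 && topP <= 1.0) { "topP must be in (0, 1]" }
    // Subtracting the max logit keeps exp() from overflowing; it cancels out in the ratio.
    val maxLogit = logits.values.maxOf { it }
    val weights = logits.mapValues { (_, logit) -> exp((logit - maxLogit) / temperature) }
    val total = weights.values.sum()
    val ranked = weights.entries.map { it.key to it.value / total }.sortedByDescending { it.second }

    // Keep the smallest high-probability prefix whose cumulative mass reaches topP.
    val kept = mutableListOf<Pair<String, Double>>()
    var mass = 0.0
    for (candidate in ranked) {
        kept += candidate
        mass += candidate.second
        if (mass >= topP) break
    }
    return kept.associate { (token, p) -> token to p / mass }
}

// Draw one token from a normalised distribution.
fun sample(candidates: Map<String, Double>, random: Random = Random.Default): String {
    var r = random.nextDouble()
    for ((token, p) in candidates) {
        r -= p
        if (r <= 0.0) return token
    }
    return candidates.keys.last() // floating-point rounding can leave r a hair above 0
}
```

With logits a=2.0, b=1.0, c=-3.0 and top-p 0.9, temperature 1.0 keeps a and b, temperature 0.2 collapses to a alone, and temperature 5.0 flattens things enough that all three survive.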
setSystemPrompt — The role. “You are a…” Stable across calls. Goes into the system message slot of the prompt.
setJsonInput / setJsonOutput: The types. You use these to make the LLM return typed data. setJsonInput declares the Kotlin class that describes the expected input structure; setJsonOutput declares the Kotlin class that describes the expected output structure. TPipe serializes the input type into the user prompt, embeds the output schema in the system prompt, and validates the LLM’s response against the output schema at the boundary. The prompt tells the LLM what shape to produce; the pipe enforces that shape at the boundary. The Kotlin types are the contract.
requireJsonPromptInjection() — Makes setJsonOutput work by injecting the JSON schema descriptions into the prompt. Pair them — requireJsonPromptInjection() + setJson* is the standard combo.
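The two halves of that contract (describe the schema in the prompt, check the response at the boundary) can be sketched without TPipe. This assumes a hypothetical field-level FieldSpec and a response that has already been decoded from JSON into a map; TPipe derives its schema from the Kotlin class and handles serialization itself, so treat this as an illustration of the idea only.

```kotlin
// Hypothetical field description; TPipe derives its schema from the Kotlin class.
data class FieldSpec(val name: String, val typeName: String, val accepts: (Any) -> Boolean)

// Prompt half: describe the expected output so the model knows the shape.
fun schemaPrompt(fields: List<FieldSpec>): String =
    fields.joinToString(prefix = "Respond with a JSON object with fields: ", separator = ", ") {
        "${it.name} (${it.typeName})"
    }

// Boundary half: check a response that has already been decoded from JSON.
// An empty result means the response honours the contract.
fun validate(response: Map<String, Any?>, fields: List<FieldSpec>): List<String> =
    fields.mapNotNull { spec ->
        val value = response[spec.name]
        when {
            value == null -> "missing field '${spec.name}'"
            !spec.accepts(value) -> "field '${spec.name}' is not ${spec.typeName}"
            else -> null
        }
    }
```

The prompt half only asks; the boundary half is what actually rejects a response that drifted from the schema.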
setTokenBudget(...) — This is the memory management system, not just a cap. Calling setTokenBudget(...) activates TPipe’s runtime context algorithm. At config time, the pipe tokenizes the system prompt, max output, reasoning budget, and user prompt size, subtracts them from the context window, and throws if the configuration itself would overflow. At runtime, before each LLM call, the pipe runs the truncation stage: lorebook selection by priority or weight, multi-page budget allocation across MiniBank pages, text-matching preservation that keeps content matching user-prompt keywords, and either truncates or compresses whatever overflows. Working with KillSwitch (the hard ceiling that halts the pipeline if tokens truly run out), this is the layer that keeps the agent from forgetting important context, drifting under pressure, or drowning in oversized context.
Here’s how the budget object actually gets built:
val budget = TokenBudgetSettings().apply {
    contextWindowSize = 32_000          // total context window in tokens
    maxTokens = 4_000                   // reserve for LLM output
    reasoningBudget = 2_000             // reserve for reasoning sub-pipe output
    subtractReasoningFromInput = true   // carve reasoning from input, not from maxTokens
    userPromptSize = 8_000              // cap on user prompt reservation
    allowUserPromptTruncation = true    // let the algorithm shrink the prompt to fit
    preserveJsonInUserPrompt = true     // keep JSON structure intact during truncation
    compressUserPrompt = false          // truncate, don't compress (safer for typed I/O)
    preserveTextMatches = true          // keep content matching user-prompt keywords
    truncationMethod = ContextWindowSettings.TruncateMiddle          // chop from both ends
    multiPageBudgetStrategy = MultiPageBudgetStrategy.DYNAMIC_SIZE_FILL  // default
    pageWeights = mapOf("story" to 2.0, "lorebook" to 1.0)  // only read by WEIGHTED_SPLIT; ignored with DYNAMIC_SIZE_FILL
    reserveEmptyPageBudget = false      // don't reserve budget for empty MiniBank pages
}
pipe.setTokenBudget(budget)
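Plugging those numbers into the config-time check gives a feel for what’s left over. The sketch below is a hypothetical mirror of the reservations, not TPipe’s TokenBudgetSettings. It assumes a system prompt of known token count and reads subtractReasoningFromInput the way the field list below describes it: when true, reasoning is its own input-side reservation; when false, it must fit inside maxTokens.

```kotlin
// Hypothetical mirror of the reservations; not TPipe's TokenBudgetSettings.
data class BudgetSketch(
    val contextWindowSize: Int,
    val maxTokens: Int,
    val reasoningBudget: Int,
    val userPromptSize: Int,
    val subtractReasoningFromInput: Boolean = false,
)

// Tokens left for lorebook and other context once every reservation is taken.
// Throws IllegalArgumentException if the configuration alone overflows the window.
fun remainingContextBudget(b: BudgetSketch, systemPromptTokens: Int): Int {
    // Assumption: with subtractReasoningFromInput = false, reasoning is carved out of
    // maxTokens, so it adds no separate reservation but must fit inside maxTokens.
    val reasoningReservation = if (b.subtractReasoningFromInput) b.reasoningBudget else 0
    require(b.subtractReasoningFromInput || b.reasoningBudget <= b.maxTokens) {
        "reasoningBudget (${b.reasoningBudget}) does not fit inside maxTokens (${b.maxTokens})"
    }
    val reserved = systemPromptTokens + b.maxTokens + reasoningReservation + b.userPromptSize
    require(reserved <= b.contextWindowSize) {
        "reservations ($reserved) exceed contextWindowSize (${b.contextWindowSize})"
    }
    return b.contextWindowSize - reserved
}
```

With the values above and an assumed 1,500-token system prompt, 32,000 - (1,500 + 4,000 + 2,000 + 8,000) leaves 16,500 tokens for lorebook and context. The 1,500 figure is made up for the example.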
The fields, briefly:
- contextWindowSize / maxTokens / reasoningBudget / userPromptSize: the four reservations. They get subtracted from the total window; the remainder is available for lorebook and context elements. The pipe throws at config time if these add up to more than contextWindowSize.
- subtractReasoningFromInput: when true, the reasoning budget comes out of the input side. When false (the default), it’s carved out of the output maxTokens.
- allowUserPromptTruncation: when true, the algorithm shrinks the user prompt to fit if other content is too large. When false (the default), it throws on overflow instead.
- compressUserPrompt: when true, the algorithm compresses the user prompt rather than truncating it. Faster, but compression doesn’t preserve JSON structure and the pipe can’t validate compressed output the way it validates truncated output. For pipes using setJsonInput/setJsonOutput, leave this off.
- preserveTextMatches: when true, items containing words from the user prompt are kept before other content gets truncated. This is the “important context survives” guarantee.
- truncationMethod: TruncateTop chops from the start, TruncateBottom chops from the end, TruncateMiddle chops from both ends evenly. TruncateTop is the default.
- multiPageBudgetStrategy: see below.
- pageWeights: only consulted by WEIGHTED_SPLIT. A map of MiniBank page key to weight.
- reserveEmptyPageBudget: when true, empty MiniBank pages still reserve a portion of the budget. When false, the budget is only divided across pages that have content. DYNAMIC_FILL and DYNAMIC_SIZE_FILL override this and redistribute to active pages regardless.
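The truncation methods and preserveTextMatches interact, so a small sketch helps. It is a hypothetical illustration that follows the descriptions above, with simplifications: it drops whole items (TPipe may cut inside items), counts one token per word, treats preserveTextMatches as “drop keyword-matching items last”, and implements TruncateMiddle as alternating drops from the start and the end.

```kotlin
enum class TruncateMode { TOP, BOTTOM, MIDDLE }

// Crude token estimate for the sketch: one token per whitespace-separated word.
fun tokens(text: String): Int = text.split(Regex("\\s+")).count { it.isNotEmpty() }

// Words from the user prompt worth protecting (longer than three letters, lowercased).
fun keywordsOf(userPrompt: String): Set<String> =
    userPrompt.split(Regex("\\W+")).filter { it.length > 3 }.map { it.lowercase() }.toSet()

// Drops whole items until the total fits the budget. Items containing a keyword
// (substring match) are dropped only when nothing unprotected is left.
fun truncateItems(
    items: List<String>,
    budget: Int,
    mode: TruncateMode,
    keywords: Set<String> = emptySet(),
): List<String> {
    val kept = items.toMutableList()
    var fromStart = mode != TruncateMode.BOTTOM
    while (kept.isNotEmpty() && kept.sumOf { tokens(it) } > budget) {
        val unprotected = kept.indices.filter { i -> keywords.none { kept[i].contains(it, ignoreCase = true) } }
        val candidates = unprotected.ifEmpty { kept.indices.toList() }
        kept.removeAt(if (fromStart) candidates.first() else candidates.last())
        if (mode == TruncateMode.MIDDLE) fromStart = !fromStart // alternate ends
    }
    return kept
}
```

Asking about “the dragon” with TOP truncation keeps the dragon item even though it sits near the top, which is exactly the guarantee preserveTextMatches is meant to give.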
Multi-page budget strategies (for MiniBank-tracked content, setMultiPageBudgetStrategy(...) or via the field above):
- EQUAL_SPLIT: every page gets the same share of the budget. Predictable, dumb.
- WEIGHTED_SPLIT: pages get shares proportional to pageWeights. Use this when you know which pages matter more (e.g. "story" > "notes").
- PRIORITY_FILL: walks pages in declared order, fills each up to its current need, and exhausts the budget as it goes. First pages win.
- DYNAMIC_FILL: starts with priority fill, simulates actual usage after truncation, then redistributes any unused budget across up to 3 passes. Smart fill.
- DYNAMIC_SIZE_FILL: DYNAMIC_FILL, but prioritizes smaller contexts to protect them from being squeezed out. This is the default.
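The three static strategies are simple enough to sketch directly. These are hypothetical illustrations of the descriptions above, not TPipe’s implementation. Assumptions: pages missing from pageWeights weigh 1.0, shares round down, and the DYNAMIC_* strategies (which need a truncation simulation) are left out.

```kotlin
// Hypothetical sketches of three MiniBank budget strategies; not TPipe's implementation.
// Shares are rounded down, so a few tokens may go unallocated.

fun equalSplit(budget: Int, pages: List<String>): Map<String, Int> {
    require(pages.isNotEmpty()) { "no pages to split across" }
    return pages.associateWith { budget / pages.size }
}

// Pages missing from pageWeights are assumed to weigh 1.0.
fun weightedSplit(budget: Int, pages: List<String>, pageWeights: Map<String, Double>): Map<String, Int> {
    val totalWeight = pages.sumOf { pageWeights[it] ?: 1.0 }
    return pages.associateWith { (budget * (pageWeights[it] ?: 1.0) / totalWeight).toInt() }
}

// needs: page -> tokens that page currently wants, in declared (priority) order.
// Earlier pages are filled first; later pages get whatever is left, possibly nothing.
fun priorityFill(budget: Int, needs: Map<String, Int>): Map<String, Int> {
    var remaining = budget
    return needs.mapValues { (_, need) -> minOf(need, remaining).also { remaining -= it } }
}
```

PRIORITY_FILL’s “first pages win” shows up immediately: with 5,000 tokens and needs of 3,000, 4,000, and 1,000, the third page gets zero.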
Lorebook truncation modes (the other half of the algorithm, switched on at the pipe level):
- pipe.autoTruncateContext(): turn on automatic truncation at runtime. Without this, the algorithm doesn’t run at execution time even if you’ve set a budget.
- pipe.enableLoreBookFillMode(): select-and-fill strategy. Top-weighted lorebook entries get selected first, and the remaining budget goes to other context. Use this when lorebook entries are the priority.
- pipe.enableLoreBookFillAndSplitMode(): fill mode plus a reserved split budget for the rest of the top-level context window. Best for long-running agents where recent context matters more than old context. The Autogenesis WriterAgent uses this on all three pipes.
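Here is a rough sketch of the two lorebook modes. LoreEntry, the skip-if-it-doesn’t-fit rule, and the percentage-based split are all assumptions made for illustration; the post does not say how TPipe handles an entry that overflows or how the split is sized.

```kotlin
// Hypothetical lorebook entry: key, selection weight, and size in tokens.
data class LoreEntry(val key: String, val weight: Int, val tokens: Int)

// Fill-mode sketch: highest weight first. An entry that no longer fits is skipped
// (an assumption) so smaller, lower-weight entries can still use the leftover space.
fun selectLore(entries: List<LoreEntry>, budget: Int): List<LoreEntry> {
    var remaining = budget
    return entries.sortedByDescending { it.weight }.filter { entry ->
        (entry.tokens <= remaining).also { fits -> if (fits) remaining -= entry.tokens }
    }
}

// Fill-and-split sketch: cap the lorebook at a share of the budget so other context
// keeps a guaranteed remainder. The percentage parameter is hypothetical.
fun splitBudget(total: Int, lorebookPercent: Int): Pair<Int, Int> {
    require(lorebookPercent in 0..100) { "lorebookPercent must be 0..100" }
    val lorebook = total * lorebookPercent / 100
    return lorebook to (total - lorebook)
}
```

Splitting first and then filling within the lorebook share is what keeps a large lorebook from eating the room reserved for everything else.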
setReasoningPipe(...) — Attaches a reasoning sub-pipe. The reasoning pipe runs first and produces a chain-of-thought scratchpad. The main pipe uses that scratchpad in its prompt. Use this for complex generation tasks. For simple classification, the reasoning pipe adds latency and tokens you don’t need — skip it. The authorBuilder(...) factory takes a personality, a depth, and a duration. High depth, short duration is a good default for most use cases.
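The reasoning-then-main flow can be sketched as two plain functions. This is a hypothetical shape, not TPipe’s API: in TPipe each stage is a pipe calling an LLM, and the "Reasoning notes" layout used to hand the scratchpad over is invented for the example.

```kotlin
// Two-stage sketch: a reasoning step writes a scratchpad, the main step reads it.
// Both stages are plain functions here; in TPipe each would be a pipe calling an LLM.
// The "Reasoning notes" prompt layout is hypothetical.
fun runWithReasoning(
    userPrompt: String,
    reason: (String) -> String,
    main: (String) -> String,
    skipReasoning: Boolean = false, // mirrors the per-turn skipReasoningPipe flag
): String {
    if (skipReasoning) return main(userPrompt)
    val scratchpad = reason(userPrompt)
    return main("Reasoning notes:\n$scratchpad\n\nTask:\n$userPrompt")
}
```

The cost is visible in the structure: two calls instead of one, and a longer prompt for the main stage. That is the latency and token overhead worth skipping for simple classification.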
setPipeName("...") — Names the pipe for tracing, logging, and KillSwitch reports. The name appears in the TraceServer UI, in logs, in error messages. A good pipe name is a sentence fragment — "guide pipe", "sentiment classifier", "response generator". Don’t use pipe1.
enableLoreBookFillAndSplitMode() — Strategy switch for the lorebook portion of the truncation stage. The truncation stage is already running because you called setTokenBudget(...) above; this method tells that stage to use the select-and-fill strategy for lorebook entries and reserve a split budget for the rest of the top-level context. Use this when you have a populated lorebook and want guaranteed space for both lorebook entries and other context after truncation. The Autogenesis WriterAgent enables this on all three pipes.
setMiddlePrompt / autoInjectContext — Two more prompt slots beyond the system prompt. The system prompt is stable across calls. The middle prompt is per-pipe-type but reusable — usually the input/output schema explanation. The auto-injected context is dynamic per call — the story history, the lorebook, the world setting. Splitting these into three layers makes prompt maintenance sane.
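A sketch of how three layers might collapse into the two message slots a chat model actually sees. Where TPipe places the middle prompt and the injected context is an assumption here (middle prompt appended to the system message, context placed ahead of the user’s request); PromptLayers and buildMessages are hypothetical names.

```kotlin
// Hypothetical assembly of the three layers into the two message slots.
data class PromptLayers(val system: String, val middle: String)

fun buildMessages(layers: PromptLayers, context: String, userPrompt: String): Pair<String, String> {
    // Stable per pipe: role plus the reusable middle prompt.
    val systemMessage = listOf(layers.system, layers.middle)
        .filter { it.isNotBlank() }
        .joinToString("\n\n")
    // Changes per call: injected context ahead of the actual request.
    val userMessage = listOf(context, userPrompt)
        .filter { it.isNotBlank() }
        .joinToString("\n\n")
    return systemMessage to userMessage
}
```

Because only the second half of the pair changes per call, editing the role or the schema explanation never touches the per-call plumbing, which is what keeps prompt maintenance sane.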
init() — Loads the provider backend and gets the LLM ready to run. For Bedrock pipes, this means loading inference profile mappings from ~/.aws/inference.txt, resolving the model ID to an inference profile ARN, and initializing the AWS Bedrock Runtime client with credentials and HTTP timeouts. Without this call, bedrockClient is never created and the first execute() throws a runtime exception because the provider backend is missing. The Autogenesis code calls init() after all pipes are built, in a separate pass — this is the right pattern.
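The failure mode is easy to reproduce with a stand-in. LazyPipe and FakeClient below are hypothetical classes, not TPipe’s: the client is the expensive resource, only init() builds it, and execute() refuses to run without it.

```kotlin
// Hypothetical stand-ins, not TPipe classes: the client is the expensive
// resource that only init() creates.
class FakeClient(private val region: String) {
    fun invoke(prompt: String): String = "[$region] $prompt"
}

class LazyPipe {
    private var region = "us-east-1"
    private var client: FakeClient? = null

    fun setRegion(value: String): LazyPipe = apply { region = value }

    // Settings are read once here, when the backend is built.
    fun init(): LazyPipe = apply { client = FakeClient(region) }

    fun execute(prompt: String): String {
        val backend = checkNotNull(client) { "init() was not called; the provider backend is missing" }
        return backend.invoke(prompt)
    }
}
```

In this sketch, a setter called after init() never reaches the already-built client: LazyPipe().init().setRegion("us-west-2") still talks to us-east-1. That is one reason to configure everything first and initialize last.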
That’s the 90%. There are more — setHttpReferer, setOpenRouterTitle, setCacheControl, setServiceTier, enableStreaming, setStreamingCallback — but they cover provider-specific or advanced use cases. The settings above are what every production pipe configures.
Chaining pipes into a Pipeline
Once you have a few pipes, chain them into a Pipeline:
val pipeline = Pipeline()
    .add(extractor)
    .add(classifier)
    .add(generator)
    .init()
Order of .add() calls is the order pipes run. First pipe gets the pipeline input. Its output feeds the second pipe. Final pipe’s output is the pipeline’s output.
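Stripped of the control flags covered next, that ordering is just a left fold. The sketch uses plain String-to-String functions as hypothetical stand-ins for pipes and MultimodalContent.

```kotlin
// Sequential core of a pipeline, with no control flags: each stage receives the
// previous stage's output, and the last output is the result.
fun runSequential(input: String, stages: List<(String) -> String>): String =
    stages.fold(input) { acc, stage -> stage(acc) }
```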
Pipelines aren’t simple chains — every pipe receives a MultimodalContent object and can redirect flow by writing to it. After each pipe executes, the pipeline reads the content’s control flags to decide what happens next. The flags live on the content, not on the pipeline:
- terminatePipeline: halt the pipeline cleanly. Not an error, just an early exit.
- passPipeline: exit early without being an error. The task is done; skip the remaining steps.
- repeatPipe: re-call this pipe with the same content. The pipe keeps getting called until you set it to false.
- jumpToPipe: redirect execution. Empty string means sequential (next pipe). "skip-to-next-pipe" means skip ahead. A pipe name string means jump to that named pipe (forward or backward in the pipeline).
- skipReasoningPipe: skip the reasoning sub-pipe for this turn.
- interuptPipeline: fires an interrupt signal for the PumpStation harness system.
- metadata: a scratch pad map. Pipes read and write it to pass signals between stages.
The pipeline evaluates these flags after every pipe execution. A pipe can set multiple flags — the pipeline processes them in priority order.
The content object is the control plane. Any pipe can directly set content.jumpToPipe to redirect execution:
// Inside a pipe's execution logic: redirect to a named pipe
if (result.sentiment == "negative") {
    content.jumpToPipe = "escalation-pipe"  // jump forward
}
if (result.needsRetry) {
    content.repeatPipe = true               // re-run this pipe
}
if (result.isComplete) {
    content.passPipeline = true             // exit early, clean
}
jumpToPipe accepts a pipe name string (forward or backward in the pipeline), or "skip-to-next-pipe" to advance one step. terminatePipeline, repeatPipe, and passPipeline work the same way — write to the content, the pipeline reads it after each pipe finishes. The Connector is a convenience pattern for key-based dispatch; jumpToPipe is the primitive that makes arbitrary redirection possible.
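The control plane can be sketched as a loop that runs a stage, then reads the flags it left behind. Content and Stage are hypothetical stand-ins for MultimodalContent and a pipe. Only terminatePipeline, passPipeline, repeatPipe, and named jumpToPipe targets are modelled ("skip-to-next-pipe" is not). The order in which the flags are checked is an assumption, since the post only says they are processed in priority order. The step limit is a guard added for the sketch.

```kotlin
// Hypothetical stand-ins for MultimodalContent and a pipe; four flags modelled.
class Content(var text: String = "") {
    var terminatePipeline = false
    var passPipeline = false
    var repeatPipe = false
    var jumpToPipe = ""
}

class Stage(val name: String, val body: (Content) -> Unit)

// Runs stages, reading the flags after each one. The priority order below
// (terminate/pass, then repeat, then jump, then next) is an assumption.
fun runPipeline(stages: List<Stage>, content: Content, maxSteps: Int = 100): Content {
    var i = 0
    var steps = 0
    while (i in stages.indices) {
        check(++steps <= maxSteps) { "step limit reached; a repeat or jump never settled" }
        stages[i].body(content)
        val jump = content.jumpToPipe
        content.jumpToPipe = "" // a jump request is consumed once it is read
        i = when {
            content.terminatePipeline || content.passPipeline -> break
            content.repeatPipe -> i // stays set until the pipe clears it
            jump.isNotEmpty() -> stages.indexOfFirst { it.name == jump }
                .also { require(it >= 0) { "no pipe named '$jump'" } }
            else -> i + 1
        }
    }
    return content
}
```

A "check" stage that jumps back to "draft" until two drafts exist, followed by a "finish" stage that sets passPipeline, produces exactly two drafts and never reaches any stage after "finish".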
Always call init() at the end. Forgetting it is the most common mistake and the most embarrassing.
That’s the builder pattern. Five settings in a chain, or twenty. apply { } for the long ones. Chain into a Pipeline, set the control flags, call init(). The structure doesn’t change — only the settings do. Every pipe in TPipe follows this pattern.
Related posts
- The KillSwitch: Token Budgets That Actually Kill the Agent — How to configure the KillSwitch on a pipeline, with code examples for input/output token limits and custom callbacks. The 66-line Kotlin file that ends with throw.
- Reasoning Pipes Explained: How TPipe Stops Prompting and Starts Programming — How reasoning pipes are attached to a main pipe, with code examples for the builder pattern this post covers.
- Headless AI Agents: What, Why, and How — A production pipeline running headless 24/7. The Autogenesis deployment is the proof that the pipeline builder works at scale.
- Why P2P Agent Communication Is Inevitable — How pipelines run across the P2P call chain. The pipeline builder integrates with the P2P request/response mechanism for distributed execution.