The term “context engineering” entered the agent-framework vocabulary in 2025 and consolidated in 2026. Three definitions now anchor the conversation: Anthropic’s September 2025 framing, LangChain’s July 2025 four-strategy framing, and LlamaIndex’s July 2025 window-filling framing. They disagree on what context is, and the disagreement is where production failure modes live.
This post reads the canonical sources, identifies the failure mode each leaves on the floor, and explains what typed substrate state adds that token-set curation does not. Every claim is grounded in either a published source URL or a TPipe source-code file path.
Three definitions, one disagreement about the unit
Anthropic’s “Effective context engineering for AI agents” (September 29, 2025) framed the term by contrast with the prompt engineering the field had been practising for years. The post states it plainly: “After a few years of prompt engineering being the focus of attention in applied AI, a new term has come to prominence: context engineering.” Then it draws the line: “In contrast to the discrete task of writing a prompt, context engineering is iterative and the curation phase happens each time we decide what to pass to the model.” The unit in Anthropic’s framing is what we pass to the model. The discipline is iterative curation at every call.
LangChain’s “Context Engineering for Agents” (Lance Martin, July 2, 2025) made a complementary but distinct claim. The post formalised four strategies the discipline operates on:
- Write: save context outside the window (scratchpad notes, memories, prior turn outputs) so later calls can use it.
- Select: pull the relevant context from a larger pool into the window.
- Compress: summarise long context to fit the model window.
- Isolate: split context across agents so each agent sees a subset of state.
The unit in LangChain’s framing is tokens the model reads from. The four strategies are operations on text. The framing became the most-cited definition in the ecosystem because it is concrete enough to build against: a team can pick a strategy and ship.
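To make "operations on text" concrete, here is a minimal sketch of the four strategies over a plain list of strings. Everything in it is an assumption made for illustration: `TextContext` and its methods are hypothetical names, keyword overlap stands in for retrieval, and truncation stands in for an LLM summariser. It is not LangChain's API.

```python
from dataclasses import dataclass, field


@dataclass
class TextContext:
    """Hypothetical token-level context: nothing but text snippets."""
    notes: list[str] = field(default_factory=list)

    def write(self, snippet: str) -> None:
        # Write: keep a snippet around so a later call can use it.
        self.notes.append(snippet)

    def select(self, query: str, k: int = 2) -> list[str]:
        # Select: naive keyword-overlap ranking stands in for retrieval.
        words = set(query.lower().split())
        ranked = sorted(
            self.notes,
            key=lambda n: len(words & set(n.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    @staticmethod
    def compress(snippets: list[str], max_chars: int) -> str:
        # Compress: truncation stands in for an LLM summariser.
        return " ".join(snippets)[:max_chars]

    def isolate(self, predicate) -> "TextContext":
        # Isolate: give a sub-agent only the snippets that pass a filter.
        return TextContext([n for n in self.notes if predicate(n)])


ctx = TextContext()
ctx.write("invoice 17 was paid on Monday")
ctx.write("the user prefers metric units")
ctx.write("invoice 18 is still open")

selected = ctx.select("which invoice is open", k=1)
window = TextContext.compress(selected, max_chars=20)
billing_view = ctx.isolate(lambda n: "invoice" in n)
```

Every input and output here is a string or a list of strings, which is the property the rest of the argument turns on.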
LlamaIndex’s “Context Engineering Guide” (Jerry Liu, July 3, 2025) is tighter still. LlamaIndex defines context engineering as “the delicate art and science of filling the context window with just the right information for the next step.” The unit is the next context window. The discipline is window-filling. Three vendors, three months, three different units.
| Definition | Unit | Discipline |
|---|---|---|
| Anthropic (Sept 2025) | What we pass to the model | Iterative curation at every call |
| LangChain (Jul 2025) | Tokens the model reads | Four operations: write, select, compress, isolate |
| LlamaIndex (Jul 2025) | The next context window | Filling the window with the right slice |
The convergence matters. The divergence matters more. The four strategies sit at a specific layer (the text the model reads), and that layer is precisely where production long-horizon agents outgrow the framing.
Where the four-strategy framing breaks in production
The LangChain four-strategy framing is concrete enough to implement. It also has a production failure mode the framework layer cannot see.
Long-horizon state loses its structure under compression. Anthropic’s post acknowledges this implicitly. The recommended approach for long-running agents is “compaction”: summarising older context so that recent context stays in the window. The post is honest that compaction loses information. What the post does not address is what compaction preserves. A summary of a typed-state transaction ledger is not the ledger. A summary of a typed relationship graph is not the graph. Token-level compression produces text-shaped outputs; substrate-shaped state cannot be compressed into tokens without losing the type structure that makes the state addressable.
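A small sketch of the ledger case, under stated assumptions: `LedgerEntry` and `compact` are hypothetical, and `compact` is a deterministic stand-in for an LLM summary. The point is only that the summary is a string, so lookups by key and computations over fields are no longer possible against it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical typed record; not a TPipe type."""
    tx_id: str
    account: str
    amount: Decimal


ledger = {
    e.tx_id: e
    for e in [
        LedgerEntry("tx-1", "ops", Decimal("120.00")),
        LedgerEntry("tx-2", "ops", Decimal("-45.50")),
        LedgerEntry("tx-3", "r&d", Decimal("300.00")),
    ]
}


def compact(entries: dict[str, LedgerEntry]) -> str:
    # Stand-in for an LLM summary: readable, but only a string.
    total = sum(e.amount for e in entries.values())
    return f"{len(entries)} transactions, net {total}"


summary = compact(ledger)

# Typed state stays addressable and computable.
ops_balance = sum(e.amount for e in ledger.values() if e.account == "ops")

# The summary keeps a headline number but not the keys or the fields.
can_address_tx2 = "tx-2" in summary
```

A better summariser would keep more detail, but it would still return text; the per-account balance above can only be computed from the typed entries.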
Select pulls from indices the substrate does not own. The LangChain select strategy assumes a retrieval pool exists. In practice the pool is a vector index or a memory store, maintained at the framework layer. The selection decision runs at the LLM call boundary, not at the substrate boundary. The decision makes no commitment about what happens when the slice is updated; that is downstream of the substrate, in a write strategy the framework may or may not expose. The four strategies describe four framework-layer operations. They do not describe what happens to the state across emissions.
Isolate at the framework layer leaks through shared state. CrewAI’s Memory class (per the CrewAI Memory docs) unifies four prior memory types (short-term, long-term, entity, external) into a single wrapper bolted onto role-based orchestration. Agents that share a Memory instance share state across roles whether or not they intended to. Isolation lives in the role definition, not in the state layer. The strategy works at the framework boundary; it cannot reach into the substrate to enforce isolation.
Write without a typed store means write into a memory object. OpenAI Agents SDK’s Session object maintains conversation history across agent runs. Session is the closest analogue to a context store in the OpenAI stack. It is a conversation-history object, text with reasoning layered on top. There is no typed key, no page-key addressing, no mutex between writers. Concurrent writes can be lost. Session is conversation-as-context, which is precisely the prompt engineering shape Anthropic’s framing identifies as the previous paradigm.
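The lost-write hazard is easiest to see with a deterministic interleaving. The sketch below uses a hypothetical `HistoryStore` with whole-value load and save; it does not model the OpenAI Agents SDK's Session implementation, only the general read-modify-write pattern on an unguarded history object.

```python
class HistoryStore:
    """Hypothetical conversation-history object with whole-value load/save."""

    def __init__(self) -> None:
        self._items: list[str] = []

    def load(self) -> list[str]:
        return list(self._items)

    def save(self, items: list[str]) -> None:
        self._items = list(items)


store = HistoryStore()
store.save(["user: start"])

# Two writers read the same snapshot before either one writes back.
snapshot_a = store.load()
snapshot_b = store.load()
store.save(snapshot_a + ["agent A: booked flight"])
store.save(snapshot_b + ["agent B: booked hotel"])  # overwrites A's append

final = store.load()
```

After the second save, agent A's entry is gone even though both writes reported success. Nothing in the object marks the conflict, because there is no key to lock and no version to compare.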
The four failure modes share a root cause. The strategies operate on text. The discipline operates at the LLM call boundary. Long-horizon agents accumulate state the token set cannot carry. Compression loses type information. Selection defers to a retrieval pool the framework maintains. Isolation lives in role definitions, not in state. Write means append to a memory object.
What typed substrate state adds that token-set curation does not
The Ten Trillion Triangles TPipe substrate treats context as a typed, addressable, mutable surface that the framework layer cannot own. Three primitives compose the discipline.
PumpStation is the runtime harness. It routes every path as a function the LLM picks by name; the path’s execution function runs before the next LLM call sees its output. The path returns through a pathTransformationFunction hook that bounds output to the substrate’s preferred density. The harness enforces curation structurally: the LLM never sees five thousand tokens of raw shell output because the harness transforms the result before the next emission. The source is Pipeline/PumpStation.kt:4294 for the transformation hook, with the path execution entry at line 508.
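The shape of that hook can be sketched in a few lines. This is an illustrative Python toy, not PumpStation: `Harness`, `register`, `run`, and `bound_output` are hypothetical names, and the head-and-tail truncation is an assumed transform chosen only because it is easy to verify.

```python
from typing import Callable

PathFn = Callable[[str], str]
TransformFn = Callable[[str], str]


def bound_output(max_chars: int) -> TransformFn:
    """Hypothetical transform: keep the head and tail, drop the middle."""
    def transform(raw: str) -> str:
        if len(raw) <= max_chars:
            return raw
        half = max_chars // 2
        dropped = len(raw) - 2 * half
        head, tail = raw[:half], raw[len(raw) - half:]
        return f"{head}\n[... {dropped} chars omitted ...]\n{tail}"
    return transform


class Harness:
    """Toy stand-in for a runtime harness; all names are illustrative."""

    def __init__(self, transform: TransformFn) -> None:
        self._paths: dict[str, PathFn] = {}
        self._transform = transform

    def register(self, name: str, fn: PathFn) -> None:
        self._paths[name] = fn

    def run(self, name: str, arg: str) -> str:
        # Execute the path the model picked by name, then transform the
        # result before it can reach the next model call.
        return self._transform(self._paths[name](arg))


harness = Harness(bound_output(max_chars=40))
harness.register("shell", lambda cmd: "line\n" * 1000)  # 5000 chars of raw output
result = harness.run("shell", "ls -R")
```

Because the transform sits inside `run`, a caller cannot obtain the raw output by forgetting to truncate; that is the sense in which curation is structural rather than a convention.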
ContextBank is the typed, page-key addressed store. Writes are mutex-guarded: emplaceWithMutex(key: String, window: ContextWindow, ...) at Context/ContextBank.kt:559. The lock primitive is a kotlinx.coroutines.sync.Mutex declared at Context/ContextLock.kt:43, wrapped around five operations: addLockWithMutex at line 170, removeLockWithMutex at line 249, lockKeyBundleWithMutex at line 288, unlockKeyBundleWithMutex at line 327, and the ContextBank emplaceWithMutex family at line 574. Concurrent emissions against the same key serialise. The store is a typed substrate surface. Vector databases retrieve by semantic similarity over embeddings — the unit is the embedding, not the key. Key-value caches retrieve by key but the value is opaque. ContextBank retrieves by typed key against a typed store — the unit is the key, the value is typed, and the next emission’s context composes from typed reads against known keys.
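As a rough illustration of key-addressed, mutex-guarded writes, the sketch below keeps one lock per key so that writers to the same key serialise. It is an assumption-laden toy: `KeyedStore` and `Window` are hypothetical, and Python's `threading.Lock` stands in for the coroutine mutex the source cites. It does not reproduce ContextBank's signatures.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Window:
    """Hypothetical typed value stored under a page key."""
    entries: list[str] = field(default_factory=list)


class KeyedStore:
    """Toy page-key store with one lock per key. Not TPipe's ContextBank."""

    def __init__(self) -> None:
        self._pages: dict[str, Window] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard lock creation so two writers never get different locks.
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def update(self, key: str, entry: str) -> None:
        # The read-modify-write runs under the key's lock.
        with self._lock_for(key):
            page = self._pages.setdefault(key, Window())
            page.entries = page.entries + [entry]

    def read(self, key: str) -> Window:
        with self._lock_for(key):
            return Window(list(self._pages.get(key, Window()).entries))


store = KeyedStore()
workers = [
    threading.Thread(target=store.update, args=("trip/plan", f"step {i}"))
    for i in range(50)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

plan = store.read("trip/plan")
```

All fifty writes survive, in whatever order the threads happened to acquire the lock. Writers to different keys do not block each other, since each key has its own lock.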
Reasoning pipes are typed LLM-call methods. Eight methods are wired in: StructuredCot, ExplicitCot, processFocusedCot, BestIdea, ComprehensivePlan, RolePlay, ChainOfDraft, SemanticDecompression. Each method emits a typed data class response (StructuredCot at Structs/ModelReasoning.kt:109, BestIdeaResponse at line 213, ChainOfDraftResponse at line 419, SemanticDecompressionResponse at line 545, and four more). The data class has an unravel() method (lines 51, 115, 158, 220, 291, 365) that flattens structured fields back into a thought stream the parent pipe consumes. The LLM produces typed reasoning the substrate can log, diff, and test. Provenance is verifiable at every emission.
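The typed-response-plus-unravel pattern might look like the following. The class and field names are hypothetical and are not taken from Structs/ModelReasoning.kt; only the idea of an `unravel()` that flattens typed fields into text comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningStep:
    claim: str
    evidence: str


@dataclass(frozen=True)
class StructuredReasoning:
    """Hypothetical response type; field names are illustrative."""
    goal: str
    steps: tuple[ReasoningStep, ...]
    conclusion: str

    def unravel(self) -> str:
        # Flatten typed fields into one text stream for the parent call.
        lines = [f"Goal: {self.goal}"]
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"{i}. {step.claim} (because {step.evidence})")
        lines.append(f"Conclusion: {self.conclusion}")
        return "\n".join(lines)


response = StructuredReasoning(
    goal="pick a cache",
    steps=(
        ReasoningStep("reads dominate writes", "95% of calls are GETs"),
        ReasoningStep("entries are small", "median payload is 2 KB"),
    ),
    conclusion="use an in-memory LRU",
)

thought_stream = response.unravel()
```

The typed object is what gets logged, diffed, and asserted on in tests; the flattened string is only the view handed to the next call.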
The three primitives compose at the substrate level. They are not bolted onto a framework; they ship because the substrate owns the discipline. The framework layer is downstream.
What the canonical sources get right, and what they leave on the floor
Anthropic’s framing correctly identifies the time dimension. Context engineering is iterative. Curation runs at every call. The framing was a meaningful upgrade from one-shot prompt design: the post moves the practitioner from “write the perfect prompt” to “decide what the model sees at every call.” That is a real engineering problem. It deserves a discipline.
LangChain’s four strategies correctly identify the operations any agent framework performs. A team building with LangChain can pick a strategy and ship. The strategies are concrete. They have become the most-cited definition in the ecosystem because they map cleanly to implementation: a write strategy is a function call, a select strategy is a retrieval call, a compress strategy is a summariser, an isolate strategy is a routing rule. Frameworks can build against these.
LlamaIndex correctly narrows the scope to window-filling, which is the right framing for retrieval-first stacks. LlamaIndex’s customers care most about which chunk lands in the next context window. The framing does that job.
The failure mode is at the boundary. The four strategies operate on text the model reads. They do not address what happens to state across emissions. Long-horizon agents accumulate state the token set cannot carry. Compression loses type information. Selection defers to a retrieval pool the framework maintains. Isolation lives in role definitions, not in the substrate. The four strategies are correct at the layer they describe, and they stop at that layer.
The Ten Trillion Triangles TPipe substrate extends the framing: curation runs at every emission, but what the LLM receives is composed from typed, addressable, mutable state that survives the iteration loop. The framework’s four strategies correspond to operations the substrate runs at a different layer. Write becomes a typed mutation. Select becomes a typed read. Compress becomes an unravel() on a typed data class. Isolate becomes a mutex on the key. The substrate operations have type signatures where the framework operations have function calls.
The answer to “context engineering vs prompt engineering” depends on the unit
Two answers, depending on the unit being addressed.
If the unit is text in a single chat, prompt engineering and context engineering are different but adjacent. Prompt engineering writes one input. Context engineering picks what to pass to the model at every call. Both operate on text. The discipline is the difference between one curated input and many curated inputs.
If the unit is typed state across a long-running substrate, context engineering is a different category of work. Prompt engineering does not operate on the unit at all. Context engineering composes state from a typed store, transforms it on every emission, and routes reasoning through typed LLM-call methods. Anthropic’s framing identifies the time dimension (iterative vs discrete). The substrate captures the unit dimension (state vs text).
Anthropic, LangChain, LlamaIndex, and the Ten Trillion Triangles TPipe substrate are saying the same thing at different layers. The discipline moved from prompt-as-text to context-as-typed-state, and the engineering moved from one-shot design to iterative composition. The four-strategy framing named the discipline. Anthropic clarified its relationship to prompt engineering. The substrate operates on the unit the discipline was pointing at.