Compounding Primitives
The leverage in agentic development isn't any single library. It's the stack of primitives you build underneath, where each layer makes the next one obvious and every project subsidizes the one after it.
A few weeks ago a new service needed a model-fallback ladder: try the cheap, fast model first, and only climb to the expensive one when the cheap one fails or refuses. It is the kind of feature that, two years ago, would have eaten a week. Provider quirks, retry semantics, streaming reassembly, cost accounting, the careful bookkeeping of “did this actually fail, or did it just answer in a way I didn’t like.”
It took an afternoon.
Not because the problem got easier, but because every hard part of it was already a primitive I had built and rebuilt enough times to finally get right. The ladder was forty lines of composition over a foundation that had been paid for, with interest, by the projects that came before it. That afternoon is the argument I want to make here. The leverage in agentic development is not any single clever library. It is the stack of primitives underneath it, where each layer makes the next one obvious, and every project quietly subsidizes the one after it.
The altitude problem
Most code that talks to a language model is written at the wrong altitude. You open a stream, parse server-sent events, accumulate tool-call fragments across deltas, dispatch the calls, feed the results back, loop, and special-case the three providers that each disagree about what a “finish reason” means. It works. It is also the same two hundred lines in every repository, subtly different each time, wrong in a new place each time.
The tell is that the code reads like plumbing instead of intent. This is the altitude I mean, the loop everyone writes:
open stream
for each event:
if event is content delta: append to buffer
if event is tool-call delta: merge into the right partial call
if event is finish: break
parse accumulated tool calls
for each tool call: look up handler, parse args, run, format result
append results to messages
go again, hoping you got the provider's message shape right
And here is the altitude I want:
final, transcript := runTools(ctx, model, messages, tools)
Same behavior. One of them is something you maintain; the other is something you use. A good primitive collapses a paragraph of mechanism into a single line of meaning. What matters is that it does so in a way that reads the same whether the model underneath is the cheap one or the expensive one, this vendor or that one. The abstraction is the point. The intent survives and the plumbing disappears.
I did not arrive at that one-liner by being clever. I arrived at it by writing the paragraph version four or five times across different projects, getting annoyed, and finally building the layer that meant I would never write it again.
Layer one: the provider primitive
The bottom of the stack is the boring, essential layer that makes every model look the same from the outside. One interface. Many providers behind it. The contract is small enough to hold in your head and uniform enough that swapping the model underneath is a one-line change rather than a rewrite.
Three decisions on this layer have paid for themselves over and over:
Streaming is a channel that closes exactly once. No callbacks, no half-states. You range over it, and when it ends it ends. The final value carries either “done” or “here is the error,” and then the channel closes. What took me longest to get right is what happens when a consumer stops reading. Go’s default there is unforgiving: the producer blocks forever on a send nobody is receiving, and that goroutine leaks (the memory it pins is just the downstream symptom). The fix is unglamorous. The producer selects on the caller’s context, so when the consumer walks away and cancels, the blocked send unwinds instead of hanging. The channel closing isn’t what saves you; the context cancellation is. But you build that in once, on this layer, so nothing above ever has to think about it again.
Structured output comes from a type, not a prompt. You describe the shape you want as an ordinary struct and ask for it back:
recipe := generateTyped[Recipe](ctx, model, messages)
The schema is derived from the type, and the response is parsed back into it. When the model returns something that doesn’t quite fit, a bounded repair pass feeds the parse error back and asks it to try again, automatically, because that retry is mechanism, and mechanism belongs in the library. It is not magic; the repair can still run out of attempts and fail. But the failing is the library’s problem to surface cleanly, not yours to hand-roll for the hundredth time.
Configuration composes. Every per-call option is a small, pure function you can mix freely: temperature, max tokens, tool choice, caching. No giant config struct, no builder you have to construct in the right order. The options read like a sentence and combine like Lego.
None of this is novel on its own. The compounding comes from the fact that once it exists, it exists, and everything above it gets to assume it.
Layer two: the tool primitive
On top of the provider layer sits the agent’s hands: the tools it can call, and the loop that lets it call them. This is where most agentic systems accumulate their worst code, because the tool loop is genuinely fiddly and everyone writes it in a hurry.
The primitive that fixed this for me has two properties.
First, the loop is one call. runTools takes the model, the conversation, and a registry of tools, and it runs the whole model→tool→model cycle until the model stops asking for tools or hits an iteration ceiling. It returns the final answer and the full transcript, so the caller can inspect everything that happened without having driven it by hand.
Second, each tool handler depends only on the narrow slice of the world it actually needs. A tool that searches the codebase takes a “thing that can search,” not the entire application. A tool that writes a file takes a “thing that can write.” Adding a new capability is additive: you define a small interface, register a handler, and nothing else has to change. The blast radius of a new tool is the new tool. None of this is new. It is the interface-segregation principle with a fresh coat of paint. The rare part is not knowing the principle; it is holding the line on it when you have fifty tools and a deadline and a god object would be one import away.
That property is what let a downstream system grow to fifty-some tools without the tool layer collapsing under its own weight. New tools didn’t inherit a monolith. They borrowed exactly what they needed and left the rest alone. The seam scaled because it was a seam, not a god object.
Layer three: the primitives a product actually wants
Now the compounding shows up. With those two layers underneath, the things a real product needs stop being projects and start being afternoons.
The model-fallback ladder I opened with: a thin policy over “call a model, recognize a real failure, climb.” Cheap to write because call a model and recognize a failure were already solved.
A per-tenant model pool, where each customer brings their own keys and every call is metered against their account: mostly bookkeeping, because the call itself was already a uniform primitive that didn’t care whose key it used.
An evaluation harness for agent behavior, with declarative assertions (this tool was called, in this order, with these arguments; the final answer satisfies this rubric) built on top of the same transcript that runTools already returned. The eval harness didn’t need a new way to run agents. It needed to watch the way that already existed.
Three different products, every one of them built by the same team of one: me. That is the honest shape of this. I am not describing a platform that a hundred engineers lean on; I am describing primitives that earned their keep against a single demanding user first, the stage that comes before you hand anything to anyone else, not a substitute for it. But the curve is the same shape at either scale. Each product was shorter than it had any right to be because the two layers below had already been paid for. The work is front-loaded into primitives, and the primitives keep paying out.
The ones that didn’t pay off
I want to be careful not to draw this curve as if it only ever bends up. Not every primitive earned its keep, and the ones that didn’t taught me more than the ones that did.
The clearest miss: early on I built an elaborate abstraction for multi-step plans, a way for an agent to declare its intended sequence of actions before taking any of them, so I could validate the whole plan up front. It was elegant. I threw it away. The models were already choosing each next step from the result of the last one, and freezing that into a plan fought the grain of how they actually work. I had built a primitive around a thing that wanted to stay concrete. The lesson is the same one that produced the good primitives, just from the other side: an abstraction is only worth building once you have written the painful version enough times to know its real shape. Build it too early and you are not compounding. You are guessing in advance, with more ceremony.
Knowing what to take out
There is one more failure mode I want to name, because compounding is not only about adding.
At one point the foundation layer had grown a heavy observability dependency baked into its core, the kind of transitive weight that every single consumer paid for, whether or not they used it. Pulling in the base primitive meant pulling in twenty packages of tracing machinery you might never touch. So I took it out. Observability moved to an optional layer you opt into; the core went back to being cheap to import.
That was a breaking change, made on purpose, with a migration path written for it. It is also, to me, the more senior move than any feature. Compounding works only if the base layer stays light enough that everything above it stays cheap. The moment your foundation gets expensive, every floor you build on it inherits the tax. Knowing what to remove, and being willing to break your own callers to do it, is how you keep the curve bending the right way.
The shape of it
Step back far enough and the three layers are the same idea told three times. A provider primitive so the model is interchangeable. A tool primitive so capability is additive. Product primitives so features are compositions. Each one exists so the layer above it can assume it and stop thinking about it.
I think this is what operating a platform actually looks like, as opposed to shipping a product. A product is a thing you finish. A platform is a set of primitives where the return on the last one you built shows up in how fast you build the next. The afternoon that should have been a week was never really about a fallback ladder. It was the interest payment on everything underneath it.
Build the library you keep rewriting. Then build the one that assumes it.