Why the orchestrator is the product
Picking a model is the easy part. The product, the thing buyers pay for, lives in the routing layer above it.
For about eighteen months I have been running an editorial AI in production on free-tier providers. Twelve of them. Every week one of them goes through some kind of weather: a rate-limit change, a model deprecation, a billing dispute, an outage, a chain-of-thought leak. The system keeps shipping its newspaper.
The reason it keeps shipping is not the model. The reason is the orchestrator.
What "orchestrator" actually means
Most people use the word for a workflow engine: Airflow, Prefect, Temporal. I mean something narrower: the call site between an application that wants an answer and a fleet of providers that may or may not give it. The orchestrator is responsible for:
- Picking the right strategy for the task. A classification job wants majority-vote across cheap models. A weekly narrative wants synthesize-best across complex models. A growth report wants ranked-fallback at high council size because it has to survive cascading rate limits.
- Knowing which providers are healthy right now. A 5-minute cron pings each provider with the prompt "Reply with exactly: OK". The orchestrator only routes to nodes the monitor marks healthy.
- Enforcing the budget. Per-provider RPM limits run through Bottleneck, and daily quotas are tracked in Redis. The orchestrator refuses to call a provider whose limit has already been reached.
- Logging every vote. Every provider attempt, with latency and tokens, is logged to Postgres fire-and-forget. The vote is replayable, queryable, auditable.
That is what the buyer is actually buying. The model is a parameter.
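The first three responsibilities can be sketched as a single planning step. This is a minimal illustration, not the production code: every name here (`ProviderState`, `STRATEGY_FOR_TASK`, `planCall`) is hypothetical, and the health and quota fields are plain in-memory values standing in for whatever the health monitor and Redis actually report.

```typescript
// Hypothetical types and names; none of them come from the real system.
type Strategy = 'single-query' | 'majority-vote' | 'synthesize-best' | 'ranked-fallback';
type Task = 'classification' | 'weekly-narrative' | 'growth-report';

interface ProviderState {
  name: string;
  healthy: boolean;   // assumed to be set by the health monitor's last ping
  usedToday: number;  // requests already counted against today's quota
  dailyQuota: number; // the provider's daily limit
}

// Task-to-strategy mapping taken from the three examples in the list above.
const STRATEGY_FOR_TASK: Record<Task, Strategy> = {
  'classification': 'majority-vote',
  'weekly-narrative': 'synthesize-best',
  'growth-report': 'ranked-fallback',
};

// Keep only providers that are healthy and still have quota left.
function eligibleProviders(all: ProviderState[]): ProviderState[] {
  return all.filter((p) => p.healthy && p.usedToday < p.dailyQuota);
}

// Decide which strategy to run and which providers it may use.
function planCall(task: Task, all: ProviderState[]): { strategy: Strategy; providers: string[] } {
  const providers = eligibleProviders(all).map((p) => p.name);
  if (providers.length === 0) {
    throw new Error(`no eligible providers for task "${task}"`);
  }
  return { strategy: STRATEGY_FOR_TASK[task], providers };
}
```

The point of the shape is that the application only names a task. Which providers answer is decided at call time, from state the application never sees.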
The case for thinking this way
Two things change when the orchestrator is the product:
- Provider risk becomes a configuration problem. When a new free-tier provider opens, you add an adapter and register its models, and the orchestrator picks them up at the next health check. When a provider closes, you remove the adapter and the orchestrator routes around it automatically.
- Cost becomes a routing decision. Cheap-tier classification jobs go to providers with high RPM and small models. Complex-tier narrative jobs go to providers with capable models and lower volume. The same code can target a different cost surface by changing one tier flag.
This is the part of the system that does not show up in a demo. It is also the part that gets you on a Zoom call with a CTO.
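Both changes come down to a registry lookup. The sketch below assumes a registry shaped as a flat list with a `tier` field and an `rpm` figure per entry. The entry names, the `RegistryEntry` type, and the `candidatesFor` helper are invented for illustration, and sorting by RPM is one plausible ordering, not necessarily the one used in production.

```typescript
// Hypothetical registry shape; provider and model names are placeholders.
type Tier = 'cheap' | 'complex';

interface RegistryEntry {
  provider: string;
  model: string;
  tier: Tier;
  rpm: number; // requests per minute the provider allows
}

// Adding or removing a provider is an edit to this list, not to the call sites.
const registry: RegistryEntry[] = [
  { provider: 'provider-a', model: 'small-1', tier: 'cheap', rpm: 60 },
  { provider: 'provider-b', model: 'small-2', tier: 'cheap', rpm: 30 },
  { provider: 'provider-c', model: 'large-1', tier: 'complex', rpm: 5 },
];

// Select the candidates for a tier, highest RPM first.
// filter() returns a new array, so the sort does not reorder the registry itself.
function candidatesFor(tier: Tier, entries: RegistryEntry[]): RegistryEntry[] {
  return entries.filter((e) => e.tier === tier).sort((a, b) => b.rpm - a.rpm);
}
```

Switching a job from `candidatesFor('cheap', registry)` to `candidatesFor('complex', registry)` is the one-flag change described above.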
A small invariant
The strategy switch in my orchestrator is twelve lines:
switch (strategy) {
  case 'single-query':
    result = await this.singleQuery(request, providers); break;
  case 'majority-vote':
    result = await this.majorityVote(request, providers); break;
  case 'synthesize-best':
    result = await this.synthesizeBest(request, providers); break;
  case 'ranked-fallback':
    result = await this.rankedFallback(request, providers); break;
  default:
    result = await this.singleQuery(request, providers);
}
That switch is the entire surface area of the strategy decision. Everything else, the rate limiter, the registry, the health monitor, the analytics writer, is composed around it. Small, legible, testable.
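The methods behind the switch are not shown here. As a rough sketch of what two of them could look like, assume a provider is just an async function from prompt to answer. The `Provider` type and both function bodies below are illustrative guesses, not the production implementations, and normalizing answers with trim and lowercase before counting is an added assumption.

```typescript
// Hypothetical provider shape: a name plus an async prompt-to-answer function.
type Provider = { name: string; ask: (prompt: string) => Promise<string> };

// Majority vote: ask every provider in parallel, ignore failures,
// and return the most common normalized answer.
// Ties go to the answer that was seen first.
async function majorityVote(prompt: string, providers: Provider[]): Promise<string> {
  const settled = await Promise.allSettled(providers.map((p) => p.ask(prompt)));
  const counts = new Map<string, number>();
  for (const s of settled) {
    if (s.status !== 'fulfilled') continue; // a failed provider casts no vote
    const answer = s.value.trim().toLowerCase();
    counts.set(answer, (counts.get(answer) ?? 0) + 1);
  }
  let best: string | undefined;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  if (best === undefined) throw new Error('all providers failed');
  return best;
}

// Ranked fallback: try providers in order and return the first success.
async function rankedFallback(prompt: string, providers: Provider[]): Promise<string> {
  const errors: string[] = [];
  for (const p of providers) {
    try {
      return await p.ask(prompt);
    } catch (err) {
      errors.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join('; ')}`);
}
```

Majority vote spends one call per provider to buy agreement. Ranked fallback spends calls only when something fails, which is why it suits jobs that have to survive cascading rate limits.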
If you are building on top of LLM providers and you have not written your version of this switch, that is the work. Not the prompt. Not the model. The routing.