§ DISPATCH NO. 004 · 2026 · live

A2UI free.

A free-tier deployment of Google's agent UI protocol, running on 92 live LLMs nobody pays for.

Stack

Python · ADK · Lit · Vite · Docker · LiteLLM

Live: a2ui-free.alkenacode.dev

The problem on file

Google's A2UI lets an agent declare its UI as JSON instead of describing it in prose. The catch is that every A2UI agent needs an LLM to generate that JSON tree, and the reference samples all assume a paid Gemini key. For a developer who wants to try the protocol end-to-end without a billing surface, that is friction at exactly the wrong moment.

The interesting work is not "swap Gemini for an OpenAI key." Free-tier providers each have different rate limits, models that get sunset without warning, OpenAI-compat shapes that disagree on auth headers, and a few models that emit visible chain-of-thought tokens that break the strict A2UI JSON parser.

The right system is one env var (LLM_MODEL) plus a council that papers over all of that.

What it is, in numbers

Live free models: 92

Providers: 7

Routing strategies

Council children per call: 3

Monthly LLM bill: $0

Cold-cache page TTFB: <2s

Cold open. Two panes, three suggested prompts, an empty council activity stream on the right. The EventSource is already open (sage indicator top-right).

Council fanning out across three free providers

First click. NVIDIA Llama 3.3 70B resolves tool decision in 2.4s. Three council children fan out: Groq rate-limits at 175ms, Mistral lands at 9.1s, Cerebras keeps working.

Two minutes later. The judge synthesizes a schema-valid response; the Lit renderer paints five restaurant cards. Footer: 'judge picked Mistral · $0 · free tier.'

The activity panel up close. Each row: provider, model, terminal state, latency, inline error. Every entry arrived over SSE from the agent's telemetry bus.

Fig. The surface pane (the rendered A2UI surface, restaurant cards). JSON envelopes from the synthesized response are rendered by the official @a2ui/lit renderer with no extra design work: five Card components, each containing a Row, two Columns, an Image and a Button. The agent emits structure; the renderer makes it pixels.

The marquee piece, the hybrid council

The reference restaurant_finder agent has two distinct LLM steps per request. The first is tool-decision: pick get_restaurants(cuisine, location, count). The second is UI generation: emit a validated A2UI JSON tree wrapped in <a2ui-json> tags. The two steps want different model shapes.

Tool decision wants a single deterministic model that follows the OpenAI function-call schema cleanly. UI generation tolerates more variance, so it can fan out to a council. The deployed config makes that explicit:

yaml · site/docker-compose.yml

environment:
  LLM_MODEL: hybrid/synthesize-best
  TOOL_MODEL: nvidia/meta/llama-3.3-70b-instruct
  COUNCIL_JUDGE: nvidia/meta/llama-3.3-70b-instruct
  COUNCIL_SIZE: 3
  COUNCIL_PROVIDERS: nvidia,groq,cerebras,mistral

Listing. Hybrid routing. The tool turn pins to one NVIDIA model. The UI turn fans out to three council children, and a fourth model call judges their candidates into the cleanest response. Total cost per request: still zero.
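The fan-out half of that routing can be sketched in a few lines of asyncio. This is an illustration under stated assumptions, not the deployed code: `call_child`, `choose_providers`, `fan_out` and `ProviderError` are hypothetical names, the child call is a stub, and only the env keys and values come from the listing.

```python
import asyncio
import random

# Values copied from the compose listing; everything else here is hypothetical.
CONFIG = {
    "LLM_MODEL": "hybrid/synthesize-best",
    "COUNCIL_SIZE": "3",
    "COUNCIL_PROVIDERS": "nvidia,groq,cerebras,mistral",
}


class ProviderError(Exception):
    """Stand-in for a rate limit, timeout, or other provider failure."""


async def call_child(provider: str, prompt: str) -> str:
    # Hypothetical stub: a real child would call the provider's chat API.
    if provider == "groq":
        raise ProviderError("429 rate limited")
    await asyncio.sleep(0)
    return f"<a2ui-json>[]</a2ui-json> from {provider}"


def choose_providers(config: dict[str, str], rng: random.Random) -> list[str]:
    """Pick COUNCIL_SIZE distinct providers from the configured pool."""
    pool = config["COUNCIL_PROVIDERS"].split(",")
    size = min(int(config["COUNCIL_SIZE"]), len(pool))
    return rng.sample(pool, size)


async def fan_out(prompt: str, providers: list[str]) -> dict[str, str]:
    """Run all children concurrently and keep only the ones that succeeded."""
    results = await asyncio.gather(
        *(call_child(p, prompt) for p in providers),
        return_exceptions=True,  # one failing child must not sink the others
    )
    return {
        p: r for p, r in zip(providers, results) if not isinstance(r, BaseException)
    }
```

The surviving candidates would then go to the judge model. Collecting exceptions instead of raising is what lets a rate-limited child show up as one failed row while the rest of the council keeps working.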

The synthesize-best strategy is the load-bearing piece. Small fast free models (Cerebras gpt-oss-120b, Groq qwen3-32b, Mistral ministral) often emit reasoning tokens or skip the <a2ui-json> wrapper. The judge pass cleans those candidates into one schema-valid response, and the A2UI parser accepts it on the first attempt.
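The two failure modes named above (visible reasoning tokens, a missing wrapper) are easy to show with a deterministic pre-check. The deployed system relies on the judge pass, not on this; the sketch below is a different, cheaper technique for illustration only. The `<think>...</think>` delimiter is an assumption about how reasoning tokens appear, and `extract_a2ui` is a hypothetical helper.

```python
import json
import re

# Assumed delimiter for visible reasoning; real models may differ.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
A2UI_BLOCK = re.compile(r"<a2ui-json>(.*?)</a2ui-json>", re.DOTALL)


def extract_a2ui(raw: str):
    """Return the parsed A2UI payload, or None if the candidate is unusable."""
    # Drop visible reasoning first so it cannot confuse the JSON parser.
    text = THINK_BLOCK.sub("", raw)
    match = A2UI_BLOCK.search(text)
    # Fall back to the whole text when the model skipped the wrapper.
    payload = match.group(1) if match else text
    try:
        return json.loads(payload.strip())
    except json.JSONDecodeError:
        return None
```

A candidate that returns None here is one the judge would have to repair or discard.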

Fig. Anatomy of one council pass (the council activity panel mid-run, annotated). Each numbered point is published by a single telemetry event on the agent side, delivered to the panel over SSE, and reduced into a row state by the Lit reducer.
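The event-to-row path can be sketched as two small functions. The field names (`provider`, `model`, `state`, `latency_ms`, `error`) are assumptions modelled on the columns in the activity panel. The real reducer lives in the Lit client; it is mirrored here in Python only to keep the examples in one language.

```python
import json


def sse_frame(event: dict) -> str:
    """Serialize one telemetry event as a Server-Sent Events frame."""
    # json.dumps emits no raw newlines by default, so one data line suffices.
    return f"data: {json.dumps(event)}\n\n"


def reduce_rows(rows: dict, event: dict) -> dict:
    """Fold one event into the per-child row state, keyed by provider."""
    row = dict(rows.get(event["provider"], {}))
    row.update(event)  # later events overwrite state, latency, and error
    return {**rows, event["provider"]: row}


rows = {}
for ev in (
    {"provider": "groq", "model": "qwen3-32b", "state": "started"},
    {"provider": "groq", "state": "error", "latency_ms": 175, "error": "rate limited"},
):
    rows = reduce_rows(rows, ev)
```

Because each event only patches its own row, events from different children can arrive in any order without corrupting the panel.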

The hidden ninety percent

What most portfolios skip

01 A weekly cron probe (discover_free_models.py) walks each provider's /v1/models catalog and writes a manifest that marks each model live, slow, or dead. The auto council reads it at boot, so when NVIDIA quietly rotates a model, the next deploy picks up the change without code edits.
02 The agent advertises its public URL through PUBLIC_URL, not its bind address. nginx fronts both client and agent on one domain; the AgentCard's `url` field points at https://a2ui-free.alkenacode.dev/agent so the browser-side A2A SDK routes subsequent calls back through nginx.
03 Free-tier tool models emit int arguments as JSON strings. NVIDIA Llama 3.3 70B passes count='3' where the upstream restaurant_finder expected count=3, so all_items[:count] raises a TypeError. A two-line int() coercion in the tool fixed it without changing the model.
04 The upstream __main__.py has a hard pre-check that refuses to boot without GEMINI_API_KEY. The deployed launcher (start.py) skips that check because the model factory routes through LLM_MODEL, and GOOGLE_AI_KEY is aliased to GEMINI_API_KEY for the parts of the SDK that still read it natively.
05 CORS in the upstream sample is locked to localhost. CORS_ORIGIN is a regex env var the deployed launcher reads, so the production allow-list is ^https://a2ui-free\.alkenacode\.dev$ and nothing wider.
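The string-typed count from item 03 is the easiest of these to reproduce. A minimal sketch, assuming a tool shaped like get_restaurants(cuisine, location, count): the placeholder data, the default of 5, and the fallback branch are all hypothetical, and the guarded form is slightly longer than the two-line fix described above.

```python
# Placeholder data standing in for the sample's restaurant list.
ALL_ITEMS = [f"restaurant-{i}" for i in range(1, 11)]


def get_restaurants(cuisine: str, location: str, count=5) -> list[str]:
    """Return up to `count` items, tolerating count passed as a string."""
    # Free-tier tool models may send count as "3"; slicing requires an int.
    try:
        count = int(count)
    except (TypeError, ValueError):
        count = 5  # hypothetical fallback when the argument is unusable
    return ALL_ITEMS[:count]
```

Without the coercion, `ALL_ITEMS[:"3"]` raises `TypeError: slice indices must be integers`, which is the failure the item describes.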

How the stack is laid out

The repository is Google's A2UI sources unmodified, plus a site/ subdirectory that owns the deploy. Two Docker images, both built from the same context, fronted by host nginx.

py · site/agent/start.py

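import os

from a2a.server.apps import A2AStarletteApplication
from starlette.middleware.cors import CORSMiddleware

# RestaurantAgent and request_handler come from the upstream sample
# (their imports and setup are elided in this excerpt).

# Advertise the public URL in the AgentCard, not the bind address.
public_url = os.getenv("PUBLIC_URL", "http://localhost:10002").rstrip("/")
# Regex of allowed browser origins; defaults to any localhost port.
cors_origin = os.getenv("CORS_ORIGIN", r"http://localhost:\d+")

agent = RestaurantAgent(base_url=public_url)
app = A2AStarletteApplication(
  agent_card=agent.agent_card,
  http_handler=request_handler,
).build()
app.add_middleware(
  CORSMiddleware,
  allow_origin_regex=cors_origin,
  allow_credentials=True,
  allow_methods=["*"],
  allow_headers=["*"],
)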

Listing. The public launcher. No A2UI source file is modified; PUBLIC_URL and CORS_ORIGIN are read by this wrapper, then RestaurantAgent imports normally.
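What the two CORS patterns admit can be checked offline with the standard library. This sketch assumes the middleware applies allow_origin_regex as a full match against the Origin header, which is my understanding of Starlette's behaviour; `origin_allowed` is a hypothetical helper, and the hostile origins are made up.

```python
import re

PROD_ORIGIN = r"^https://a2ui-free\.alkenacode\.dev$"  # production allow-list
DEV_ORIGIN = r"http://localhost:\d+"  # launcher default


def origin_allowed(pattern: str, origin: str) -> bool:
    """True only when the whole Origin header matches the pattern."""
    # Full-match semantics are assumed, so a matching prefix is not enough.
    return re.fullmatch(pattern, origin) is not None


checks = {
    "exact": origin_allowed(PROD_ORIGIN, "https://a2ui-free.alkenacode.dev"),
    "suffix attack": origin_allowed(PROD_ORIGIN, "https://a2ui-free.alkenacode.dev.evil.example"),
    "plain http": origin_allowed(PROD_ORIGIN, "http://a2ui-free.alkenacode.dev"),
    "dev port": origin_allowed(DEV_ORIGIN, "http://localhost:5173"),
}
```

Only "exact" and "dev port" come back true. The escaped dots matter: an unescaped `.` would also accept lookalike hosts such as a2ui-free-alkenacode.dev.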

What lives at the URL

Open https://a2ui-free.alkenacode.dev, type a restaurant query, and watch a Lit-rendered A2UI surface fill in three restaurant cards. The agent card at /agent/.well-known/agent-card.json is the bootstrap document.

Everything after that is A2A protocol JSON-RPC. On a cold cache, the end-to-end pass lands inside 30 seconds for the council step, plus another 5 seconds for tool decision and client render. On a warm pass with a healthy provider mix, the same flow completes in under 8 seconds.

In production

01 Live at https://a2ui-free.alkenacode.dev, served over HTTPS with a Let's Encrypt certificate, with the council picking 3 children from cerebras/groq/mistral per request.
02 92 live free models verified across 7 providers, refreshed by a weekly cron probe. When a provider rate-limits, the next request silently picks a different child.
03 Monthly LLM spend: zero. The recommended hybrid/synthesize-best config has not produced a Pollinations-style failure in 30+ probe runs.
04 Source code is MIT, mirrored from github.com/Kiragu-Maina/a2ui-free. site/README.md walks anyone through the deploy in under ten minutes.
05 Demonstrates the A2UI protocol on a budget Google never planned for, which is the proof a developer evaluating A2UI actually needs.
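The "silently picks a different child" behaviour in item 02 can be sketched as a filter over the probe manifest. The manifest shape, the model ids, and the `rate_limited` set are assumptions for illustration; the actual manifest format written by the probe is not shown here.

```python
import random

# Hypothetical manifest shape: model id -> status from the weekly probe.
MANIFEST = {
    "groq/qwen3-32b": "live",
    "cerebras/gpt-oss-120b": "live",
    "mistral/ministral": "slow",
    "nvidia/meta/llama-3.3-70b-instruct": "live",
    "example/retired-model": "dead",
}


def provider_of(model_id: str) -> str:
    """Treat the first path segment of a model id as its provider."""
    return model_id.split("/", 1)[0]


def pick_children(manifest, size, rate_limited, rng):
    """Choose council children, skipping dead models and cooling-off providers."""
    usable = [
        model
        for model, status in manifest.items()
        if status != "dead" and provider_of(model) not in rate_limited
    ]
    return rng.sample(usable, min(size, len(usable)))


# Groq just returned a 429, so the next request draws from the other providers.
children = pick_children(MANIFEST, 3, {"groq"}, random.Random())
```

Keeping slow models eligible is a design choice in this sketch; a stricter policy could drop them whenever enough live models remain.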

Take away

The model is a parameter. Everything important happens in the orchestration around it.

Next dispatch: DISPATCH NO. 005

Aether

A local-first knowledge graph PWA — Obsidian alternative with daily Pulse reports refined by the same council pattern as Shellwire.