One gateway in front of an on-prem mesh.
The same agentic demos, routed through Otari — Mozilla.ai’s self-hostable, OpenAI-compatible gateway — which meters and budgets every call before it reaches a Cascadia mesh of Intel AI PCs. Otari is wired to Cascadia as a first-class cascadia provider in any-llm; nothing about the gateway is Cascadia-specific.
The four agentic-workflow demos. Orchestrates the pipeline step-by-step and renders per-step node IDs + signed receipts. Holds no secrets.
Serverless functions run every LLM call server-side, so the Otari virtual key never reaches the browser. This is the only thing deployed to the cloud.
Authenticates the virtual key, enforces its dollar budget, prices and meters every call into a usage ledger, then resolves the provider through any-llm. Runs self-hosted (SQLite, standalone) on the miner, exposed via Tailscale Funnel.
Splits cascadia:<id>, forwards the bare model id to the Cascadia coordinator over its OpenAI-compatible API, and passes the response through unchanged — including the cascadia receipt block.
Open-weight 8B-class models split into INT4 OpenVINO shards across a fleet of Intel AI PCs (one model pipeline-parallel across two machines). Every response carries the serving node ID and a signed receipt. Zero bytes leave the premises.
The data plane stops at the mesh — no cloud GPU is touched at any layer. Swap the mesh for any OpenAI-compatible backend and the gateway, keys, budgets, and this UI are unchanged.