Cascadia×Otari
integration · Otari + Cascadia

One gateway in front of an on-prem mesh.

The same agentic demos, routed through Otari — Mozilla.ai’s self-hostable, OpenAI-compatible gateway — which meters and budgets every call before it reaches a Cascadia mesh of Intel AI PCs. Otari is wired to Cascadia as a first-class cascadia provider in any-llm; nothing about the gateway is Cascadia-specific.

clientBrowser

The four agentic-workflow demos. Orchestrates the pipeline step-by-step and renders per-step node IDs + signed receipts. Holds no secrets.

step UISSE token streamaudit view
fetch /api/* (same-origin)
edgeVercel — Next.js + /api/* routes

Serverless functions run every LLM call server-side, so the Otari virtual key never reaches the browser. This is the only thing deployed to the cloud.

server-held virtual keySSE relay/api/usage · /api/gateway
OpenAI /v1 · Bearer <virtual key> · model = cascadia:<id>
gatewayOtariOtari· Mozilla.ai · OpenAI-compatible LLM gateway

Authenticates the virtual key, enforces its dollar budget, prices and meters every call into a usage ledger, then resolves the provider through any-llm. Runs self-hosted (SQLite, standalone) on the miner, exposed via Tailscale Funnel.

virtual keysbudget enforcementusage + cost meteringPrometheus /metrics
any-llm split_model_provider → cascadia
providerany-llm cascadia provider· thin BaseOpenAIProvider subclass

Splits cascadia:<id>, forwards the bare model id to the Cascadia coordinator over its OpenAI-compatible API, and passes the response through unchanged — including the cascadia receipt block.

provider registrationkeyless-or-keyed authtransparent passthrough
OpenAI /v1 (coordinator)
meshCascadia meshCascadia mesh· Community Labs · on-prem inference

Open-weight 8B-class models split into INT4 OpenVINO shards across a fleet of Intel AI PCs (one model pipeline-parallel across two machines). Every response carries the serving node ID and a signed receipt. Zero bytes leave the premises.

INT4 OpenVINO shardsdistributed fleetsigned receipts

The data plane stops at the mesh — no cloud GPU is touched at any layer. Swap the mesh for any OpenAI-compatible backend and the gateway, keys, budgets, and this UI are unchanged.