Otari × Cascadia · gateway + on-prem mesh

Real agentic workflows.
Metered by Otari. Every token on-prem.

Four regulated-industry agents running live through Otari — Mozilla.ai’s self-hostable, OpenAI-compatible gateway — onto a Cascadia mesh: open-weight 8B-class models spread across a room of Intel AI PCs, one pipeline-parallel across two machines. Otari authenticates a virtual key, enforces a budget, and meters every call; each step still shows the serving node, latency, and signed receipt.

See the architecture →What is Otari? ↗

Healthcare

Clinical referral triage

extract → triage criteria → urgency → ICD-10 coding assist → schedule → SBAR + letters → safety gate

Why on-prem: PHI never leaves the premises

Run the demo →

Finance

KYC onboarding + AML screening

extract → watchlist screen → adjudicate hits → adverse media → risk rules → MLRO memo → policy gate

Why on-prem: BSA/AML · SAR-adjacent confidentiality

Run the demo →

Finance

Financial model builder

extract assumptions → DCF + scenarios + Monte Carlo (deterministic) → IC valuation memo → QA gate

Why on-prem: Deal data stays on the premises

Run the demo →

Government

FOIA redaction

intake → responsiveness → PII detection → exemption review → redact + verify (zero-leak) → response letter → gate

Why on-prem: IRS Pub 1075 — FTI on-prem is the compliant default

Run the demo →

usage & budget/v1/usage · /v1/budgets

Pulled live from the Otari gateway’s own ledger — every call below was authenticated against a virtual key, priced, and metered by the gateway before the tokens were served on-prem by Cascadia.

loading usage…

The fleet

qwen3-8b	single node	extraction · classification · adjudication · QA gates
llama-8b-2stage	pipeline-parallel × 2 AI PCs	long-form synthesis, streamed live off the chain
phi-3.5-mini	single node	JSON repair rung · gate fallback

Real agentic workflows.Metered by Otari. Every token on-prem.

Real agentic workflows.
Metered by Otari. Every token on-prem.