Otari × Cascadia · gateway + on-prem mesh
Real agentic workflows.
Metered by Otari. Every token on-prem.
Four regulated-industry agents running live through Otari — Mozilla.ai’s self-hostable, OpenAI-compatible gateway — onto a Cascadia mesh: open-weight 8B-class models spread across a room of Intel AI PCs, one pipeline-parallel across two machines. Otari authenticates a virtual key, enforces a budget, and meters every call; each step still shows the serving node, latency, and signed receipt.
Healthcare
Clinical referral triage
extract → triage criteria → urgency → ICD-10 coding assist → schedule → SBAR + letters → safety gate
Why on-prem: PHI never leaves the premises
Run the demo →Finance
KYC onboarding + AML screening
extract → watchlist screen → adjudicate hits → adverse media → risk rules → MLRO memo → policy gate
Why on-prem: BSA/AML · SAR-adjacent confidentiality
Run the demo →Finance
Financial model builder
extract assumptions → DCF + scenarios + Monte Carlo (deterministic) → IC valuation memo → QA gate
Why on-prem: Deal data stays on the premises
Run the demo →Government
FOIA redaction
intake → responsiveness → PII detection → exemption review → redact + verify (zero-leak) → response letter → gate
Why on-prem: IRS Pub 1075 — FTI on-prem is the compliant default
Run the demo →Pulled live from the Otari gateway’s own ledger — every call below was authenticated against a virtual key, priced, and metered by the gateway before the tokens were served on-prem by Cascadia.
loading usage…
The fleet
| qwen3-8b | single node | extraction · classification · adjudication · QA gates |
| llama-8b-2stage | pipeline-parallel × 2 AI PCs | long-form synthesis, streamed live off the chain |
| phi-3.5-mini | single node | JSON repair rung · gate fallback |