Provider routing / fallback
What you will achieve
Pass a models array to route(). It returns a RouteResult with servedBy (the model that answered), attempts (the full attempt log), and every field that complete() returns. When all models share the openrouter/ prefix, route() sends one request using OpenRouter’s native models array; when the models span different providers, each is tried in order until one succeeds.
When and why
Provider routing solves two problems:
- High availability: If your primary model is overloaded or down, fail over to a secondary automatically. Rate limits, 5xx errors, model deprecations, and transient network failures are all retryable, so the next model in the list gets the request.
- Cost optimisation: List an expensive, high-quality model first and a cheaper fallback second. The fast path uses the best model; the cheaper fallback serves requests only while the best is unavailable.

When NOT to route: Routing helps only when the failure is retryable. If the request itself is invalid (bad schema, content policy violation, auth error), another model will not fix it; route() detects this and fails fast rather than wasting quota.
Step by step
Step 1 — Server-side routing via OpenRouter
When every model in the list is openrouter/*, route() sends a single request with OpenRouter’s native models array. OpenRouter selects the first available model server-side, so fallback needs only one network round-trip and adds no client-side retry latency:

```ts
import { route } from '@combycode/llm-sdk';

const res = await route({
  models: [
    'openrouter/anthropic/claude-opus-4.8',
    'openrouter/anthropic/claude-sonnet-4-5',
    'openrouter/google/gemini-2.0-flash',
  ],
  apiKey: process.env.OPENROUTER_API_KEY,
  prompt: 'Reply with exactly: OK',
  maxTokens: 8,
});

console.log(res.servedBy); // which OpenRouter model actually served it
console.log(res.text); // 'OK'
```

The openrouter/ prefix is stripped from each model id before the request is sent (e.g. anthropic/claude-opus-4.8), and OpenRouter returns the serving model in response.model. servedBy is set from that field.
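
To make the prefix handling concrete, here is a minimal sketch of how a request body for server-side routing could be assembled. The helper names (stripOpenRouterPrefix, buildOpenRouterBody) and the body shape are assumptions for illustration only; they are not the SDK's internals.

```ts
// Hypothetical helpers: a sketch of the idea, not the SDK's actual implementation.
const OPENROUTER_PREFIX = 'openrouter/';

/** Drop the leading "openrouter/" namespace, leaving e.g. "anthropic/claude-opus-4.8". */
function stripOpenRouterPrefix(id: string): string {
  return id.startsWith(OPENROUTER_PREFIX) ? id.slice(OPENROUTER_PREFIX.length) : id;
}

/** Assumed request shape: a models array plus a single user message. */
interface OpenRouterBody {
  models: string[];
  messages: { role: 'user'; content: string }[];
  max_tokens?: number;
}

function buildOpenRouterBody(models: string[], prompt: string, maxTokens?: number): OpenRouterBody {
  const out: OpenRouterBody = {
    models: models.map(stripOpenRouterPrefix),
    messages: [{ role: 'user', content: prompt }],
  };
  if (maxTokens !== undefined) out.max_tokens = maxTokens;
  return out;
}

const body = buildOpenRouterBody(
  ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'],
  'Reply with exactly: OK',
  8,
);
console.log(body.models); // [ 'anthropic/claude-opus-4.8', 'google/gemini-2.0-flash' ]
```

The candidate order is preserved, which matters because the first available model in the array is the one that serves the request.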
Step 2 — Client-side fallback across providers
When models span multiple providers, route() tries each in sequence:

```ts
import { createEngine, route } from '@combycode/llm-sdk';

// Engine pre-configured with multiple keys:
const engine = createEngine({
  apiKeys: {
    anthropic: process.env.ANTHROPIC_API_KEY!,
    openai: process.env.OPENAI_API_KEY!,
    google: process.env.GOOGLE_API_KEY!,
  },
});

const res = await route({
  engine,
  models: [
    'anthropic/claude-opus-4.8',
    'openai/gpt-4o',
    'google/gemini-2.0-flash',
  ],
  prompt: 'What is 2 + 2?',
  maxTokens: 16,
});

console.log(res.servedBy); // 'anthropic/claude-opus-4.8' (first to succeed)
console.log(res.attempts); // [{ model: 'anthropic/claude-opus-4.8' }]
```

Each attempt calls complete() internally with the candidate model. On a retryable error, the model is added to attempts with its error, and the next model is tried. On success, servedBy is the successful model id.
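
The sequential behaviour described above can be sketched as a plain loop. Everything here (routeSequential, SketchError, the trimmed ErrorKind union, the stubbed complete function, and throwing when every model fails) is an assumption made for illustration; the SDK's real implementation may differ.

```ts
// Illustrative sketch only: names and error handling are assumptions, not the SDK's code.
type ErrorKind = 'rate_limit' | 'server_error' | 'auth'; // trimmed for the sketch

class SketchError extends Error {
  readonly kind: ErrorKind;
  constructor(kind: ErrorKind, message: string) {
    super(message);
    this.kind = kind;
  }
}

interface Attempt { model: string; kind?: ErrorKind; error?: string }
interface Routed { text: string; servedBy: string; attempts: Attempt[] }
type CompleteFn = (model: string) => Promise<string>;

async function routeSequential(
  models: string[],
  complete: CompleteFn,
  fallbackOn: ErrorKind[] = ['rate_limit', 'server_error'],
): Promise<Routed> {
  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      const text = await complete(model);
      attempts.push({ model }); // the successful attempt is logged last
      return { text, servedBy: model, attempts };
    } catch (err) {
      // Non-retryable or unclassified errors propagate immediately (fail fast).
      if (!(err instanceof SketchError) || !fallbackOn.includes(err.kind)) throw err;
      attempts.push({ model, kind: err.kind, error: err.message });
    }
  }
  throw new Error(`All ${models.length} models failed`);
}

// Stub provider call: the first model is rate limited, the second answers.
const fakeComplete: CompleteFn = async (model) => {
  if (model.startsWith('anthropic/')) throw new SketchError('rate_limit', '429 Too Many Requests');
  return `hello from ${model}`;
};

const demo = routeSequential(['anthropic/claude-opus-4.8', 'openai/gpt-4o'], fakeComplete);
demo.then((r) => console.log(r.servedBy, r.attempts.length)); // openai/gpt-4o 2
```

Keeping the retry decision in one predicate means a non-retryable error never consumes a second provider's quota.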
Step 3 — Inspect the attempt log
When a fallback fires, attempts carries the full error history:

```ts
// `engine` is the instance created in Step 2.
const res = await route({
  models: ['anthropic/claude-opus-4.8', 'openai/gpt-4o'],
  engine,
  prompt: 'Ping',
  maxTokens: 8,
});

for (const a of res.attempts) {
  if (a.error) {
    console.log(`${a.model} [${a.kind}]: ${a.error}`);
    // e.g. 'anthropic/claude-opus-4.8 [rate_limit]: 429 Too Many Requests'
  } else {
    console.log(`${a.model} -- served`);
  }
}
```

a.kind is an ErrorKind string from the SDK’s error taxonomy (see the ErrorKind table under Your options).
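
For monitoring, it can help to reduce the attempt log to counts per error kind. The sketch below assumes each attempt has the shape { model, kind?, error? } as used in the loop above; RouteAttemptLike and summariseAttempts are hypothetical names, not SDK exports.

```ts
// Assumed attempt shape, inferred from the fields used in this guide.
interface RouteAttemptLike { model: string; kind?: string; error?: string }

interface AttemptSummary {
  failuresByKind: Record<string, number>;
  served?: string;
}

/** Count failed attempts per error kind and record which model finally served. */
function summariseAttempts(attempts: RouteAttemptLike[]): AttemptSummary {
  const failuresByKind: Record<string, number> = {};
  let served: string | undefined;
  for (const a of attempts) {
    if (a.error !== undefined) {
      const kind = a.kind ?? 'unknown';
      failuresByKind[kind] = (failuresByKind[kind] ?? 0) + 1;
    } else {
      served = a.model;
    }
  }
  return served === undefined ? { failuresByKind } : { failuresByKind, served };
}

const summary = summariseAttempts([
  { model: 'anthropic/claude-opus-4.8', kind: 'rate_limit', error: '429 Too Many Requests' },
  { model: 'openai/gpt-4o' },
]);
console.log(summary.served, summary.failuresByKind);
```

A summary like this could be forwarded to a metrics system to spot a primary model that is rate limited more often than expected.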
Step 4 — Override which errors trigger fallback
By default, route() falls back on: rate_limit, server_error, model_not_found, timeout, network, quota_exceeded, unsupported. It does NOT fall back on auth, invalid_request, content_filter, or context_overflow, because trying another model will not fix those:

```ts
import { route } from '@combycode/llm-sdk';

// Only fall back on rate limits and server errors.
// `engine` is the instance created in Step 2.
const res = await route({
  models: ['anthropic/claude-opus-4.8', 'anthropic/claude-haiku-4-5'],
  engine,
  prompt: 'Hello',
  maxTokens: 16,
  fallbackOn: ['rate_limit', 'server_error'],
});
```

Step 5 — Combine with all complete() options
RouteOptions extends CompleteOptions (minus model and provider, which are replaced by models). Every other complete() option works:

```ts
const res = await route({
  models: ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'],
  apiKey: process.env.OPENROUTER_API_KEY,
  system: 'You are a concise assistant.',
  prompt: 'Describe TypeScript in one sentence.',
  maxTokens: 64,
  temperature: 0.3,
  structured: {
    schema: {
      type: 'object',
      properties: { description: { type: 'string' } },
      required: ['description'],
      additionalProperties: false,
    },
  },
});

console.log(res.parsed?.description); // auto-parsed JSON
```

Your options
RouteOptions:
| Option | Type | Required | Notes |
|---|---|---|---|
| models | string[] | Yes | Ordered candidate list. Namespaced ids (provider/model) are preferred. A single-element array works and reduces to a plain complete(). |
| fallbackOn | ErrorKind[] | No | Overrides the default retryable error set. |
| All CompleteOptions except model, provider | various | n/a | Passed through to each complete() attempt unchanged. |
ErrorKind values (from LLMError.kind):
| Kind | Fallback by default | Typical cause |
|---|---|---|
| rate_limit | Yes | 429 from the provider |
| server_error | Yes | 5xx from the provider |
| model_not_found | Yes | Model id not available on this key |
| timeout | Yes | Network or provider timeout |
| network | Yes | TCP/DNS failure |
| quota_exceeded | Yes | Monthly cap hit |
| unsupported | Yes | Feature not supported by this model |
| auth | No | 401/403 (invalid key) |
| invalid_request | No | 400 (bad request body) |
| content_filter | No | Content policy rejection |
| context_overflow | No | Input exceeds context window |
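
The table above can be read as a set-membership check. The following sketch encodes the defaults and shows how a fallbackOn override would replace them; DEFAULT_FALLBACK_ON and shouldFallback are hypothetical names, and the union type simply transcribes the kinds listed in the table.

```ts
// Transcribed from the table; the SDK's own ErrorKind type may be defined differently.
type ErrorKind =
  | 'rate_limit' | 'server_error' | 'model_not_found' | 'timeout'
  | 'network' | 'quota_exceeded' | 'unsupported'
  | 'auth' | 'invalid_request' | 'content_filter' | 'context_overflow';

// Kinds marked "Yes" in the "Fallback by default" column.
const DEFAULT_FALLBACK_ON: ReadonlySet<ErrorKind> = new Set<ErrorKind>([
  'rate_limit', 'server_error', 'model_not_found', 'timeout',
  'network', 'quota_exceeded', 'unsupported',
]);

/** True when an error of this kind should move on to the next model. */
function shouldFallback(kind: ErrorKind, fallbackOn?: ErrorKind[]): boolean {
  const allowed: ReadonlySet<ErrorKind> = fallbackOn ? new Set<ErrorKind>(fallbackOn) : DEFAULT_FALLBACK_ON;
  return allowed.has(kind);
}

console.log(shouldFallback('rate_limit')); // true
console.log(shouldFallback('auth')); // false
console.log(shouldFallback('timeout', ['rate_limit', 'server_error'])); // false
```

Note that in this sketch an override replaces the default set rather than extending it, which matches the description of fallbackOn as overriding the defaults.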
RouteResult<T>:
| Field | Type | Notes |
|---|---|---|
| text | string | Reply text from the serving model. |
| parsed | T \| undefined | Auto-parsed JSON when structured was set. |
| response | CompletionResponse | Full normalised response from the serving model. |
| servedBy | string | The model id that produced the response. |
| attempts | RouteAttempt[] | Every attempt (including failures). Last entry is always the success. |
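
As a reading aid, the fields in the table above could be written as TypeScript interfaces like the ones below. These are assumed shapes with hypothetical names (RouteAttemptSketch, RouteResultSketch, describeResult); the SDK's exported types are the authority.

```ts
// Assumed shapes transcribed from the table; not the SDK's exported types.
interface RouteAttemptSketch {
  model: string;
  kind?: string; // an ErrorKind string on failed attempts
  error?: string; // present only when the attempt failed
}

interface RouteResultSketch<T> {
  text: string;
  parsed?: T; // set only when structured output was requested
  response: unknown; // CompletionResponse in the SDK; left opaque here
  servedBy: string;
  attempts: RouteAttemptSketch[];
}

/** Prefer the parsed value when present; otherwise fall back to the raw text. */
function describeResult<T>(res: RouteResultSketch<T>, render: (value: T) => string): string {
  return res.parsed !== undefined ? render(res.parsed) : res.text;
}

const sample: RouteResultSketch<{ description: string }> = {
  text: '{"description":"Typed JavaScript."}',
  parsed: { description: 'Typed JavaScript.' },
  response: {},
  servedBy: 'anthropic/claude-opus-4.8',
  attempts: [{ model: 'anthropic/claude-opus-4.8' }],
};

console.log(describeResult(sample, (p) => p.description)); // Typed JavaScript.
```

Because parsed may be undefined, callers should guard it (or use optional chaining) before reading its properties.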
Server-side vs client-side routing — when each applies:
| Scenario | Strategy used | Round-trips |
|---|---|---|
| All models are openrouter/* | Server-side via models array | 1 (OpenRouter routes internally) |
| Any model is not openrouter/* | Client-side sequential | 1 per attempt (up to models.length) |
| Single model | Plain complete() | 1 |
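
The selection rule in the table above is small enough to sketch as a function. pickStrategy and the Strategy labels are hypothetical names used only for this illustration, and rejecting an empty list is an assumption.

```ts
// Hypothetical sketch of the strategy selection described in the table.
type Strategy = 'single' | 'server-side' | 'client-side';

function pickStrategy(models: string[]): Strategy {
  if (models.length === 0) throw new Error('models must not be empty');
  if (models.length === 1) return 'single'; // reduces to a plain complete()
  // Server-side routing applies only when every candidate is namespaced under openrouter/.
  return models.every((m) => m.startsWith('openrouter/')) ? 'server-side' : 'client-side';
}

console.log(pickStrategy(['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'])); // server-side
console.log(pickStrategy(['openrouter/anthropic/claude-opus-4.8', 'openai/gpt-4o'])); // client-side
console.log(pickStrategy(['openai/gpt-4o'])); // single
```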
Compare the SDKs
This scenario is ORXA-only by design: no official SDK exposes cross-provider routing with a unified servedBy field and attempt log. OpenRouter’s own API supports the models array field natively, but it is a non-standard extension not exposed by any official SDK. Client-side cross-provider fallback requires multiple try/catch blocks, per-provider error parsing, and manual retry logic in every application that needs it. route() encapsulates all of this in one call, with a single ErrorKind taxonomy across providers.
Gotchas and next steps
OpenRouter routing is all-or-nothing per namespace. If even one model in the list is not openrouter/*, the entire list falls back to client-side sequential mode. To use server-side routing, all candidates must be openrouter/*.
servedBy for server-side routing comes from response.model. OpenRouter fills this with the actual serving model id (e.g. anthropic/claude-opus-4.8). If response.model is empty or missing, servedBy falls back to models[0].
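
The servedBy rule for server-side routing amounts to a one-line fallback. A minimal sketch, with resolveServedBy as a hypothetical name:

```ts
// Hypothetical helper illustrating the servedBy fallback described above.
function resolveServedBy(responseModel: string | null | undefined, models: string[]): string {
  if (responseModel) return responseModel; // non-empty id reported in response.model
  const first = models[0];
  if (first === undefined) throw new Error('models must not be empty');
  return first; // response.model was empty or missing
}

const requested = ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'];
console.log(resolveServedBy('google/gemini-2.0-flash', requested)); // google/gemini-2.0-flash
console.log(resolveServedBy('', requested)); // openrouter/anthropic/claude-opus-4.8
```

In the fallback case the value is the first requested id, so it is a best guess rather than a confirmed serving model.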
Bare model ids (without provider/ prefix) default to client-side. route() uses parseModelId() to extract the provider from a namespaced id. A bare id like gpt-4o has no detectable provider, and RouteOptions omits the provider option (it is replaced by models), so prefer the namespaced form, e.g. openai/gpt-4o.
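
To illustrate why a bare id carries no provider, here is a minimal sketch of namespaced-id splitting. splitModelId is a hypothetical stand-in written for this example; the SDK's parseModelId() may have a different signature and behaviour.

```ts
// Hypothetical stand-in for illustration; not the SDK's parseModelId().
interface SplitModelId { provider?: string; model: string }

/** Split "provider/model" on the first slash; bare ids come back without a provider. */
function splitModelId(id: string): SplitModelId {
  const slash = id.indexOf('/');
  if (slash <= 0) return { model: id };
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(splitModelId('openai/gpt-4o')); // { provider: 'openai', model: 'gpt-4o' }
console.log(splitModelId('openrouter/anthropic/claude-opus-4.8').model); // anthropic/claude-opus-4.8
console.log(splitModelId('gpt-4o').provider); // undefined
```

Splitting on the first slash only keeps nested ids such as openrouter/anthropic/claude-opus-4.8 intact after the provider segment.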
Auth and content filter errors fail fast. If your API key is wrong, or the content hits a policy block, route() throws immediately without trying the next model. This is intentional — those errors are not fixed by a different model.
Cost tracking works per attempt. Each complete() call inside route() emits onCompletion and onCostEntry hooks. Failed attempts that charged tokens still emit their cost hooks.
Next steps:
- Models list: discover which models are available to route across
- Cost tracking guide: compare per-model costs before building a routing list
- Server-side built-in tools: built-ins work with route() too