Provider routing / fallback

Pass a models array to route(). It returns a RouteResult with servedBy (which model answered), attempts (the full attempt log), and everything from complete(). When all models share the openrouter/ prefix, one request is sent with OpenRouter’s native models array; when models span different providers, each is tried in order until one succeeds.

Provider routing solves two problems:

High availability: If your primary model is overloaded or down, fail over to a secondary automatically. Rate limits, 5xx errors, model deprecations, and transient network failures are all retryable, so the next model in the list gets the request.

Cost optimisation: List an expensive high-quality model first, a cheaper fallback second. The fast path uses the best model; fallback saves money when the best is unavailable.

When NOT to route: Routing only helps when the failure is retryable. If the request itself is invalid (bad schema, content policy violation, auth error), sending it to another model will not help. route() detects this and fails fast rather than wasting quota.

Step 1 — Server-side routing via OpenRouter

When every model in the list is openrouter/*, route() sends a single request with OpenRouter’s native models array. OpenRouter selects the first available model server-side — one network round-trip, zero extra latency:

import { route } from '@combycode/llm-sdk';

const res = await route({
  models: [
    'openrouter/anthropic/claude-opus-4.8',
    'openrouter/anthropic/claude-sonnet-4-5',
    'openrouter/google/gemini-2.0-flash',
  ],
  apiKey: process.env.OPENROUTER_API_KEY,
  prompt: 'Reply with exactly: OK',
  maxTokens: 8,
});
console.log(res.servedBy); // which OpenRouter model actually served it
console.log(res.text); // 'OK'

route() strips the openrouter/ prefix from model ids before sending them to OpenRouter (e.g. openrouter/anthropic/claude-opus-4.8 becomes anthropic/claude-opus-4.8). OpenRouter returns the serving model in response.model, and servedBy is set from that field.
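To make the prefix handling concrete, here is a minimal sketch of the id mapping. toOpenRouterModels is a hypothetical helper written for illustration, not a function exported by the SDK, and the real implementation may differ:

```typescript
const OPENROUTER_PREFIX = 'openrouter/';

// Hypothetical helper (not part of the SDK): drop the SDK namespace so the
// ids match what OpenRouter expects in its `models` array.
function toOpenRouterModels(models: string[]): string[] {
  return models.map((id) =>
    id.startsWith(OPENROUTER_PREFIX) ? id.slice(OPENROUTER_PREFIX.length) : id,
  );
}

const wireModels = toOpenRouterModels([
  'openrouter/anthropic/claude-opus-4.8',
  'openrouter/google/gemini-2.0-flash',
]);
console.log(wireModels);
```

The order of the list is preserved, since it expresses the fallback priority.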

Step 2 — Client-side fallback across providers

When models span multiple providers, route() tries each in sequence:

import { createEngine, route } from '@combycode/llm-sdk';

// Engine pre-configured with one key per provider:
const engine = createEngine({
  apiKeys: {
    anthropic: process.env.ANTHROPIC_API_KEY!,
    openai: process.env.OPENAI_API_KEY!,
    google: process.env.GOOGLE_API_KEY!,
  },
});

const res = await route({
  engine,
  models: [
    'anthropic/claude-opus-4.8',
    'openai/gpt-4o',
    'google/gemini-2.0-flash',
  ],
  prompt: 'What is 2 + 2?',
  maxTokens: 16,
});
console.log(res.servedBy); // 'anthropic/claude-opus-4.8' when the first model succeeds
console.log(res.attempts); // [{ model: 'anthropic/claude-opus-4.8' }]

Each attempt calls complete() internally with the candidate model. On a retryable error, the model is added to attempts with its error, and the next is tried. On success, servedBy is the successful model id.
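The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the SDK's source: routeSketch, kindOf, and AttemptSketch are hypothetical names, the provider call is injected as a plain function so the example runs without network access, and the error thrown when every model fails is a guess at reasonable behaviour.

```typescript
interface AttemptSketch {
  model: string;
  kind?: string;
  error?: string;
}

// Assumption: errors carry a string `kind` field; anything else is 'unknown'.
function kindOf(err: unknown): string {
  if (typeof err === 'object' && err !== null && 'kind' in err) {
    return String((err as { kind: unknown }).kind);
  }
  return 'unknown';
}

async function routeSketch<T>(
  models: string[],
  retryable: Set<string>,
  call: (model: string) => Promise<T>,
): Promise<{ servedBy: string; attempts: AttemptSketch[]; value: T }> {
  const attempts: AttemptSketch[] = [];
  for (const model of models) {
    try {
      const value = await call(model);
      attempts.push({ model }); // the success is the last entry
      return { servedBy: model, attempts, value };
    } catch (err) {
      const kind = kindOf(err);
      const message = err instanceof Error ? err.message : String(err);
      attempts.push({ model, kind, error: message });
      if (!retryable.has(kind)) throw err; // non-retryable: fail fast
    }
  }
  // Assumption: exhausting the list surfaces a generic error.
  throw new Error(`All ${models.length} models failed`);
}

// Fake provider call: the first model is rate limited, the second answers.
const fakeCall = async (model: string): Promise<string> => {
  if (model === 'anthropic/claude-opus-4.8') {
    throw Object.assign(new Error('429 Too Many Requests'), { kind: 'rate_limit' });
  }
  return 'OK';
};

const demo = routeSketch(
  ['anthropic/claude-opus-4.8', 'openai/gpt-4o'],
  new Set(['rate_limit', 'server_error']),
  fakeCall,
);
demo.then((res) => console.log(res.servedBy, res.attempts));
```

Injecting the call keeps the control flow testable in isolation. In the SDK, the equivalent step is the internal complete() call.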

When a fallback fires, attempts carries the full error history:

const res = await route({
  models: ['anthropic/claude-opus-4.8', 'openai/gpt-4o'],
  engine,
  prompt: 'Ping',
  maxTokens: 8,
});
for (const a of res.attempts) {
  if (a.error) {
    console.log(`${a.model} [${a.kind}]: ${a.error}`);
    // e.g. 'anthropic/claude-opus-4.8 [rate_limit]: 429 Too Many Requests'
  } else {
    console.log(`${a.model} -- served`);
  }
}

a.kind is an ErrorKind string from the SDK’s error taxonomy (see below).
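If you want to feed the attempt log into metrics, a small aggregation is often enough. The sketch below assumes only the three fields used in the loop above (model, kind, error). AttemptLike and countFailuresByKind are hypothetical names, and the sample data is made up for the example:

```typescript
// Assumed shape: only the fields the loop above reads.
interface AttemptLike {
  model: string;
  kind?: string;
  error?: string;
}

// Count failed attempts per error kind; successful attempts are skipped.
function countFailuresByKind(attempts: AttemptLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const a of attempts) {
    if (!a.error) continue;
    const kind = a.kind ?? 'unknown';
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}

// Illustrative data, not real output from route().
const sampleAttempts: AttemptLike[] = [
  { model: 'anthropic/claude-opus-4.8', kind: 'rate_limit', error: '429 Too Many Requests' },
  { model: 'openai/gpt-4o', kind: 'server_error', error: '503 Service Unavailable' },
  { model: 'google/gemini-2.0-flash' },
];
const failureCounts = countFailuresByKind(sampleAttempts);
console.log(failureCounts);
```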

Step 4 — Override which errors trigger fallback

By default, route() falls back on: rate_limit, server_error, model_not_found, timeout, network, quota_exceeded, unsupported. It does NOT fall back on auth, invalid_request, content_filter, or context_overflow, because those won’t be fixed by trying another model. Pass fallbackOn to replace the default set:

import { route } from '@combycode/llm-sdk';

// Only fall back on rate limits and server errors
const res = await route({
  models: ['anthropic/claude-opus-4.8', 'anthropic/claude-haiku-4-5'],
  engine,
  prompt: 'Hello',
  maxTokens: 16,
  fallbackOn: ['rate_limit', 'server_error'],
});

Step 5 — Combine with all complete() options

RouteOptions extends CompleteOptions (minus model and provider, which are replaced by models). Every complete() option works:

const res = await route({
  models: ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'],
  apiKey: process.env.OPENROUTER_API_KEY,
  system: 'You are a concise assistant.',
  prompt: 'Describe TypeScript in one sentence.',
  maxTokens: 64,
  temperature: 0.3,
  structured: {
    schema: {
      type: 'object',
      properties: { description: { type: 'string' } },
      required: ['description'],
      additionalProperties: false,
    },
  },
});
console.log(res.parsed?.description); // auto-parsed JSON

RouteOptions:

| Option | Type | Required | Notes |
| --- | --- | --- | --- |
| models | string[] | Yes | Ordered candidate list. Namespaced ids (provider/model) are preferred. Single-element arrays work and reduce to a plain complete(). |
| fallbackOn | ErrorKind[] | No | Overrides the default retryable error set. |
| All CompleteOptions except model, provider | various | | Passed through to each complete() attempt unchanged. |

ErrorKind values (from LLMError.kind):

| Kind | Fallback by default | Typical cause |
| --- | --- | --- |
| rate_limit | Yes | 429 from the provider |
| server_error | Yes | 5xx from the provider |
| model_not_found | Yes | Model id not available on this key |
| timeout | Yes | Network or provider timeout |
| network | Yes | TCP/DNS failure |
| quota_exceeded | Yes | Monthly cap hit |
| unsupported | Yes | Feature not supported by this model |
| auth | No | 401/403, invalid key |
| invalid_request | No | 400, bad request body |
| content_filter | No | Content policy rejection |
| context_overflow | No | Input exceeds context window |
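Read as code, the default policy in the table above is a set-membership check that fallbackOn replaces wholesale. The following is a sketch under that assumption. ErrorKindSketch, DEFAULT_FALLBACK_ON, and shouldFallback are hypothetical names, and the SDK's own types may differ:

```typescript
// The kinds listed in the table above, written as a local union for the sketch.
type ErrorKindSketch =
  | 'rate_limit' | 'server_error' | 'model_not_found' | 'timeout'
  | 'network' | 'quota_exceeded' | 'unsupported'
  | 'auth' | 'invalid_request' | 'content_filter' | 'context_overflow';

// Kinds marked "Yes" in the "Fallback by default" column.
const DEFAULT_FALLBACK_ON: readonly ErrorKindSketch[] = [
  'rate_limit', 'server_error', 'model_not_found', 'timeout',
  'network', 'quota_exceeded', 'unsupported',
];

// Assumption: a caller-supplied list replaces the defaults rather than extending them.
function shouldFallback(
  kind: ErrorKindSketch,
  fallbackOn: readonly ErrorKindSketch[] = DEFAULT_FALLBACK_ON,
): boolean {
  return fallbackOn.indexOf(kind) !== -1;
}

console.log(shouldFallback('rate_limit'));
console.log(shouldFallback('auth'));
console.log(shouldFallback('timeout', ['rate_limit', 'server_error']));
```

With the narrowed list from Step 4, a timeout would no longer move on to the next model, which is the trade-off to weigh when overriding the defaults.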

RouteResult<T>:

| Field | Type | Notes |
| --- | --- | --- |
| text | string | Reply text from the serving model. |
| parsed | T \| undefined | Auto-parsed JSON when structured was set. |
| response | CompletionResponse | Full normalised response from the serving model. |
| servedBy | string | The model id that produced the response. |
| attempts | RouteAttempt[] | Every attempt (including failures). Last entry is always the success. |

Server-side vs client-side routing — when each applies:

| Scenario | Strategy used | Round-trips |
| --- | --- | --- |
| All models are openrouter/* | Server-side via models array | 1 (OpenRouter routes internally) |
| Any model is not openrouter/* | Client-side sequential | 1 per attempt (up to models.length) |
| Single model | Plain complete() | 1 |

import { route } from '@combycode/llm-sdk';

// Unified routing/fallback. For openrouter models, route() uses OpenRouter's
// native server-side `models` array (one request); for any other mix it fails
// over client-side on retryable errors. `servedBy` reports who answered.
const primary = process.env.LLM_MODEL!; // openrouter/<model>
const t0 = performance.now();
const res = await route({
  models: [primary, 'openrouter/google/gemini-3.1-flash-lite'],
  apiKey: process.env.LLM_API_KEY,
  prompt: 'Reply with exactly: OK',
  maxTokens: 16,
});

console.log(JSON.stringify({ result: res.servedBy || 'no-model', ms: Math.round(performance.now() - t0) }));

This scenario is ORXA-only by design: no official SDK exposes cross-provider routing with a unified servedBy field and attempt log. OpenRouter’s own API supports the models array field natively, but it is a non-standard extension not exposed by any official SDK. Client-side cross-provider fallback requires multiple try/catch blocks, per-provider error parsing, and manual retry logic in every application that needs it. route() encapsulates all of this in one call, with a single ErrorKind taxonomy across providers.

OpenRouter routing is all-or-nothing. Server-side routing applies only when every candidate is openrouter/*. If even one model in the list is not, the entire list is handled in client-side sequential mode.
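The selection rule can be written as a small pure function. This is a sketch of the rule as documented here, with hypothetical names (Strategy, chooseStrategy). Rejecting an empty list is an assumption, since the page does not say how route() treats it:

```typescript
type Strategy = 'plain-complete' | 'server-side' | 'client-side';

// Mirrors the scenario table: single model, all openrouter/*, or anything else.
function chooseStrategy(models: string[]): Strategy {
  if (models.length === 0) {
    throw new Error('models must not be empty'); // assumption, see lead-in
  }
  if (models.length === 1) return 'plain-complete';
  const allOpenRouter = models.every((id) => id.startsWith('openrouter/'));
  return allOpenRouter ? 'server-side' : 'client-side';
}

console.log(chooseStrategy(['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash']));
console.log(chooseStrategy(['openrouter/anthropic/claude-opus-4.8', 'openai/gpt-4o']));
```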

servedBy for server-side routing comes from response.model. OpenRouter fills this with the actual serving model id (e.g. anthropic/claude-opus-4.8). If response.model is empty or missing, servedBy falls back to models[0].
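In code, that fallback amounts to a one-line guard. resolveServedBy is a hypothetical name used only for this sketch:

```typescript
// Prefer the model id reported by OpenRouter; otherwise use the first candidate.
function resolveServedBy(
  responseModel: string | null | undefined,
  models: string[],
): string {
  return responseModel ? responseModel : models[0];
}

const candidates = ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'];
console.log(resolveServedBy('google/gemini-2.0-flash', candidates));
console.log(resolveServedBy('', candidates));
```

Note that the fallback value is the id as you passed it (with the openrouter/ prefix), whereas a value taken from response.model uses OpenRouter's own form.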

Bare model ids (without provider/ prefix) default to client-side. route() uses parseModelId() to extract the provider from a namespaced id. A bare id like gpt-4o has no detectable provider — pair it with the provider option in CompleteOptions (inherited from RouteOptions) or use the namespaced form.
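To illustrate why a bare id has no detectable provider, here is one plausible way to split a namespaced id. splitModelId is a hypothetical stand-in written for this page. It is not the SDK's parseModelId(), whose actual return shape and rules are not documented here:

```typescript
interface ParsedIdSketch {
  provider?: string; // undefined for bare ids
  model: string;
}

// Assumption: the provider is everything before the first '/'.
function splitModelId(id: string): ParsedIdSketch {
  const slash = id.indexOf('/');
  if (slash <= 0) return { model: id }; // no namespace to extract
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(splitModelId('openai/gpt-4o'));
console.log(splitModelId('openrouter/anthropic/claude-opus-4.8'));
console.log(splitModelId('gpt-4o'));
```

Splitting on the first slash only keeps multi-segment OpenRouter ids intact on the model side.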

Auth and content filter errors fail fast. If your API key is wrong, or the content hits a policy block, route() throws immediately without trying the next model. This is intentional — those errors are not fixed by a different model.

Cost tracking works per attempt. Each complete() call inside route() emits onCompletion and onCostEntry hooks. Failed attempts that charged tokens still emit their cost hooks.
