Provider routing / fallback

What you will achieve

Pass a models array to route(). It returns a RouteResult with servedBy (which model answered), attempts (the full attempt log), and everything from complete(). When all models share the openrouter/ prefix, one request is sent with OpenRouter’s native models array; when models span different providers, each is tried in order until one succeeds.

When and why

Provider routing solves two problems:

High availability: If your primary model is overloaded or down, fail over to a secondary automatically. Rate limits, 5xx errors, model deprecations, and transient network failures are all retryable, so the next model in the list gets the request.

Cost optimisation: List an expensive high-quality model first, a cheaper fallback second. The fast path uses the best model; fallback saves money when the best is unavailable.

When NOT to route: Routing only helps when the failure is retryable. If the request itself is invalid (bad schema, content policy violation, auth error), sending it to another model will not help; route() detects this and fails fast rather than wasting quota.

Step by step

Step 1 — Server-side routing via OpenRouter

When every model in the list is openrouter/*, route() sends a single request with OpenRouter’s native models array. OpenRouter selects the first available model server-side — one network round-trip, zero extra latency:

import { route } from '@combycode/llm-sdk';

const res = await route({
  models: [
    'openrouter/anthropic/claude-opus-4.8',
    'openrouter/anthropic/claude-sonnet-4-5',
    'openrouter/google/gemini-2.0-flash',
  ],
  apiKey: process.env.OPENROUTER_API_KEY,
  prompt: 'Reply with exactly: OK',
  maxTokens: 8,
});

console.log(res.servedBy); // which OpenRouter model actually served it
console.log(res.text);     // 'OK'

route() strips the openrouter/ prefix from model ids before sending (e.g. anthropic/claude-opus-4.8). OpenRouter returns the serving model in response.model, and servedBy is set from that field.
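
The prefix handling described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's internals: `stripOpenRouterPrefix` and `buildOpenRouterBody` are hypothetical helper names, and the body shape is an assumption based on OpenRouter's chat-completions request with its `models` array.

```typescript
// Hypothetical helpers for illustration only; not part of @combycode/llm-sdk.
const OPENROUTER_PREFIX = 'openrouter/';

// Remove the SDK-level namespace so OpenRouter sees its own model ids.
function stripOpenRouterPrefix(id: string): string {
  return id.startsWith(OPENROUTER_PREFIX) ? id.slice(OPENROUTER_PREFIX.length) : id;
}

// Assumed request shape: an ordered `models` array plus a single user message.
function buildOpenRouterBody(models: string[], prompt: string, maxTokens: number) {
  return {
    models: models.map(stripOpenRouterPrefix),
    messages: [{ role: 'user', content: prompt }],
    max_tokens: maxTokens,
  };
}

const body = buildOpenRouterBody(
  ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'],
  'Reply with exactly: OK',
  8,
);
console.log(body.models); // [ 'anthropic/claude-opus-4.8', 'google/gemini-2.0-flash' ]
```

The candidate order is preserved, since the order is what tells OpenRouter which model to prefer.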

Step 2 — Client-side fallback across providers

When models span multiple providers, route() tries each in sequence:

import { createEngine, route } from '@combycode/llm-sdk';

// Engine pre-configured with one API key per provider:
const engine = createEngine({
  apiKeys: {
    anthropic: process.env.ANTHROPIC_API_KEY!,
    openai: process.env.OPENAI_API_KEY!,
    google: process.env.GOOGLE_API_KEY!,
  },
});

const res = await route({
  engine,
  models: [
    'anthropic/claude-opus-4.8',
    'openai/gpt-4o',
    'google/gemini-2.0-flash',
  ],
  prompt: 'What is 2 + 2?',
  maxTokens: 16,
});

console.log(res.servedBy); // 'anthropic/claude-opus-4.8' (first to succeed)
console.log(res.attempts); // [{ model: 'anthropic/claude-opus-4.8' }]

Each attempt calls complete() internally with the candidate model. On a retryable error, the model is added to attempts with its error, and the next is tried. On success, servedBy is the successful model id.
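
The loop described above might look roughly like the sketch below. It is a simplified model, not the SDK source: `routeSketch`, `kindOf`, and `fakeComplete` are hypothetical names, the retryable set is cut down to two kinds, and throwing a generic error when every candidate fails is an assumption.

```typescript
// Simplified model of client-side fallback; all names here are hypothetical.
type ErrorKind = 'rate_limit' | 'server_error' | 'auth' | 'invalid_request';
interface Attempt { model: string; error?: string; kind?: ErrorKind }
interface Routed { text: string; servedBy: string; attempts: Attempt[] }
type CompleteFn = (model: string) => Promise<string>;

const RETRYABLE: ErrorKind[] = ['rate_limit', 'server_error'];

// Read a `kind` tag off a thrown value, if it has one.
function kindOf(err: unknown): ErrorKind | undefined {
  return typeof err === 'object' && err !== null && 'kind' in err
    ? (err as { kind: ErrorKind }).kind
    : undefined;
}

async function routeSketch(models: string[], complete: CompleteFn): Promise<Routed> {
  const attempts: Attempt[] = [];
  for (const model of models) {
    try {
      const text = await complete(model);
      attempts.push({ model }); // the success entry carries no error
      return { text, servedBy: model, attempts };
    } catch (err) {
      const kind = kindOf(err);
      // Non-retryable or unclassified errors fail fast.
      if (kind === undefined || !RETRYABLE.includes(kind)) throw err;
      attempts.push({ model, kind, error: err instanceof Error ? err.message : String(err) });
    }
  }
  throw new Error('All candidate models failed'); // assumption: the SDK may behave differently
}

// Stub standing in for complete(): Anthropic is rate limited, everything else answers.
const fakeComplete: CompleteFn = async (model) => {
  if (model.startsWith('anthropic/')) {
    throw Object.assign(new Error('429 Too Many Requests'), { kind: 'rate_limit' as ErrorKind });
  }
  return '4';
};

const routed = routeSketch(['anthropic/claude-opus-4.8', 'openai/gpt-4o'], fakeComplete);
routed.then((r) => console.log(r.servedBy, r.attempts.length)); // openai/gpt-4o 2
```

Returning from inside the loop keeps the invariant that the last entry in the attempt log is the successful one.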

Step 3 — Inspect the attempt log

When a fallback fires, attempts carries the full error history:

const res = await route({
  models: ['anthropic/claude-opus-4.8', 'openai/gpt-4o'],
  engine,
  prompt: 'Ping',
  maxTokens: 8,
});

for (const a of res.attempts) {
  if (a.error) {
    console.log(`${a.model} [${a.kind}]: ${a.error}`);
    // e.g. 'anthropic/claude-opus-4.8 [rate_limit]: 429 Too Many Requests'
  } else {
    console.log(`${a.model} -- served`);
  }
}

a.kind is an ErrorKind string from the SDK’s error taxonomy (see below).
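
For monitoring, it can be useful to roll the attempt log up into counts per error kind. The helper below is a sketch under assumptions: `AttemptLike` is a local stand-in for the SDK's `RouteAttempt` that uses only the fields shown above (`model`, `error`, `kind`), and `countFailuresByKind` is a hypothetical name.

```typescript
// Local stand-in for RouteAttempt; only the fields used in this guide.
interface AttemptLike { model: string; error?: string; kind?: string }

// Count failed attempts per error kind, ignoring the successful entry.
function countFailuresByKind(attempts: AttemptLike[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const a of attempts) {
    if (!a.error) continue; // no error means this attempt served the request
    const kind = a.kind ?? 'unknown';
    counts[kind] = (counts[kind] ?? 0) + 1;
  }
  return counts;
}

const sample: AttemptLike[] = [
  { model: 'anthropic/claude-opus-4.8', kind: 'rate_limit', error: '429 Too Many Requests' },
  { model: 'openai/gpt-4o', kind: 'rate_limit', error: '429 Too Many Requests' },
  { model: 'google/gemini-2.0-flash' },
];
console.log(countFailuresByKind(sample)); // { rate_limit: 2 }
```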

Step 4 — Override which errors trigger fallback

By default, route() falls back on: rate_limit, server_error, model_not_found, timeout, network, quota_exceeded, unsupported. It does NOT fall back on auth, invalid_request, content_filter, or context_overflow — those won’t be fixed by trying another model:

import { route } from '@combycode/llm-sdk';

// Only fall back on rate limits and server errors
const res = await route({
  models: ['anthropic/claude-opus-4.8', 'anthropic/claude-haiku-4-5'],
  engine,
  prompt: 'Hello',
  maxTokens: 16,
  fallbackOn: ['rate_limit', 'server_error'],
});
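
The effect of fallbackOn can be pictured as a membership test against a set of error kinds. The sketch below is illustrative only: `DEFAULT_FALLBACK_ON` and `shouldFallback` are hypothetical names, with the default list copied from the text above.

```typescript
// Error kinds as listed in this guide.
type ErrorKind =
  | 'rate_limit' | 'server_error' | 'model_not_found' | 'timeout'
  | 'network' | 'quota_exceeded' | 'unsupported'
  | 'auth' | 'invalid_request' | 'content_filter' | 'context_overflow';

// Hypothetical constant mirroring the documented default set.
const DEFAULT_FALLBACK_ON: readonly ErrorKind[] = [
  'rate_limit', 'server_error', 'model_not_found', 'timeout',
  'network', 'quota_exceeded', 'unsupported',
];

// A custom list replaces the default list; it does not extend it.
function shouldFallback(
  kind: ErrorKind,
  fallbackOn: readonly ErrorKind[] = DEFAULT_FALLBACK_ON,
): boolean {
  return fallbackOn.includes(kind);
}

console.log(shouldFallback('timeout'));                                // true
console.log(shouldFallback('timeout', ['rate_limit', 'server_error'])); // false
console.log(shouldFallback('auth'));                                   // false
```

Because the custom list replaces the default, narrowing it to two kinds (as in the example above) means a timeout no longer triggers fallback.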

Step 5 — Combine with all `complete()` options

RouteOptions extends CompleteOptions (minus model and provider, which are replaced by models). Every complete() option works:

const res = await route({
  models: ['openrouter/anthropic/claude-opus-4.8', 'openrouter/google/gemini-2.0-flash'],
  apiKey: process.env.OPENROUTER_API_KEY,
  system: 'You are a concise assistant.',
  prompt: 'Describe TypeScript in one sentence.',
  maxTokens: 64,
  temperature: 0.3,
  structured: {
    schema: {
      type: 'object',
      properties: { description: { type: 'string' } },
      required: ['description'],
      additionalProperties: false,
    },
  },
});
console.log(res.parsed?.description); // auto-parsed JSON

Your options

RouteOptions:

Option	Type	Required	Notes
`models`	`string[]`	Yes	Ordered candidate list. Namespaced ids (`provider/model`) are preferred. A single-element array works and reduces to a plain `complete()` call.
`fallbackOn`	`ErrorKind[]`	No	Overrides the default retryable error set.
All `CompleteOptions` except `model`, `provider`	various	—	Passed through to each `complete()` attempt unchanged.

ErrorKind values (from LLMError.kind):

Kind	Fallback by default	Typical cause
`rate_limit`	Yes	429 from the provider
`server_error`	Yes	5xx from the provider
`model_not_found`	Yes	Model id not available on this key
`timeout`	Yes	Network or provider timeout
`network`	Yes	TCP/DNS failure
`quota_exceeded`	Yes	Monthly cap hit
`unsupported`	Yes	Feature not supported by this model
`auth`	No	401/403 — invalid key
`invalid_request`	No	400 — bad request body
`content_filter`	No	Content policy rejection
`context_overflow`	No	Input exceeds context window
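
For the kinds whose typical cause in the table is an HTTP status, the classification can be sketched as below. This is not the SDK's classifier: `kindFromStatus` is a hypothetical helper that covers only the four status-based rows, and every other kind would need more than a status code to detect.

```typescript
// Only the kinds that the table ties to an HTTP status.
type StatusKind = 'rate_limit' | 'server_error' | 'auth' | 'invalid_request';

// Hypothetical mapping from HTTP status to error kind, following the table rows.
function kindFromStatus(status: number): StatusKind | undefined {
  if (status === 429) return 'rate_limit';
  if (status >= 500 && status <= 599) return 'server_error';
  if (status === 401 || status === 403) return 'auth';
  if (status === 400) return 'invalid_request';
  return undefined; // not decidable from the status alone
}

console.log(kindFromStatus(429)); // rate_limit
console.log(kindFromStatus(503)); // server_error
console.log(kindFromStatus(403)); // auth
```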

RouteResult<T>:

Field	Type	Notes
`text`	`string`	Reply text from the serving model.
`parsed`	`T \| undefined`	Auto-parsed JSON when `structured` was set.
`response`	`CompletionResponse`	Full normalised response from the serving model.
`servedBy`	`string`	The model id that produced the response.
`attempts`	`RouteAttempt[]`	Every attempt (including failures). Last entry is always the success.

Server-side vs client-side routing — when each applies:

Scenario	Strategy used	Round-trips
All models are `openrouter/*`	Server-side via `models` array	1 (OpenRouter routes internally)
Any model is not `openrouter/*`	Client-side sequential	1 per attempt (up to `models.length`)
Single model	Plain `complete()`	1
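
The table amounts to a small decision function. The sketch below is a hypothetical re-statement of it, not SDK code: `pickStrategy` is an invented name, and rejecting an empty list is an assumption.

```typescript
type Strategy = 'single' | 'server-side' | 'client-side';

// Hypothetical helper that mirrors the strategy table.
function pickStrategy(models: string[]): Strategy {
  if (models.length === 0) throw new Error('models must not be empty'); // assumption
  if (models.length === 1) return 'single';
  // Server-side routing needs every candidate in the openrouter/ namespace.
  return models.every((m) => m.startsWith('openrouter/')) ? 'server-side' : 'client-side';
}

console.log(pickStrategy(['openrouter/a/x', 'openrouter/b/y'])); // server-side
console.log(pickStrategy(['openrouter/a/x', 'openai/gpt-4o']));  // client-side
console.log(pickStrategy(['openai/gpt-4o']));                    // single
```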

Compare the SDKs

import { route } from '@combycode/llm-sdk';

// Unified routing/fallback. For openrouter models, route() uses OpenRouter's
// native server-side `models` array (one request); for any other mix it fails
// over client-side on retryable errors. `servedBy` reports who answered.
const primary = process.env.LLM_MODEL!; // openrouter/<model>
const t0 = performance.now();
const res = await route({
  models: [primary, 'openrouter/google/gemini-3.1-flash-lite'],
  apiKey: process.env.LLM_API_KEY,
  prompt: 'Reply with exactly: OK',
  maxTokens: 16,
});

console.log(JSON.stringify({ result: res.servedBy || 'no-model', ms: Math.round(performance.now() - t0) }));

This scenario is ORXA-only by design: no official SDK exposes cross-provider routing with a unified servedBy field and attempt log. OpenRouter’s own API supports the models array field natively, but it is a non-standard extension not exposed by any official SDK. Client-side cross-provider fallback requires multiple try/catch blocks, per-provider error parsing, and manual retry logic in every application that needs it. route() encapsulates all of this in one call, with a single ErrorKind taxonomy across providers.

Gotchas and next steps

OpenRouter routing is all-or-nothing per namespace. If even one model in the list is not openrouter/*, the entire list falls back to client-side sequential mode. To use server-side routing, all candidates must be openrouter/*.

servedBy for server-side routing comes from response.model. OpenRouter fills this with the actual serving model id (e.g. anthropic/claude-opus-4.8). If response.model is empty or missing, servedBy falls back to models[0].
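
That rule fits in a few lines. The sketch below assumes that a whitespace-only value counts as empty and returns models[0] exactly as passed. `resolveServedBy` is a hypothetical name, not an SDK export.

```typescript
// Non-empty list type, so models[0] is always defined.
type NonEmptyModels = [string, ...string[]];

// Hypothetical helper: prefer the provider-reported model, else the first candidate.
function resolveServedBy(
  responseModel: string | null | undefined,
  models: NonEmptyModels,
): string {
  const reported = responseModel?.trim();
  return reported ? reported : models[0];
}

console.log(resolveServedBy('anthropic/claude-opus-4.8', ['openrouter/anthropic/claude-opus-4.8']));
console.log(resolveServedBy(undefined, ['openrouter/anthropic/claude-opus-4.8']));
```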

Bare model ids (without provider/ prefix) default to client-side. route() uses parseModelId() to extract the provider from a namespaced id. A bare id like gpt-4o has no detectable provider, so prefer the namespaced form (openai/gpt-4o). RouteOptions omits the provider option from CompleteOptions, so the provider cannot be supplied separately.
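
To see why a bare id carries no provider, consider a parser that splits on the first slash. The sketch below is an assumption about the general approach, not the SDK's parseModelId(): `parseModelIdSketch` and `ParsedModelId` are hypothetical names.

```typescript
interface ParsedModelId { provider?: string; model: string }

// Hypothetical parser: everything before the first '/' is the provider.
function parseModelIdSketch(id: string): ParsedModelId {
  const slash = id.indexOf('/');
  if (slash <= 0) return { model: id }; // bare id: no provider detectable
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

console.log(parseModelIdSketch('openai/gpt-4o'));
// { provider: 'openai', model: 'gpt-4o' }
console.log(parseModelIdSketch('openrouter/anthropic/claude-opus-4.8'));
// { provider: 'openrouter', model: 'anthropic/claude-opus-4.8' }
console.log(parseModelIdSketch('gpt-4o'));
// { model: 'gpt-4o' }
```

Splitting on the first slash only keeps nested ids such as openrouter/anthropic/... intact in the model part.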

Auth and content filter errors fail fast. If your API key is wrong, or the content hits a policy block, route() throws immediately without trying the next model. This is intentional — those errors are not fixed by a different model.

Cost tracking works per attempt. Each complete() call inside route() emits onCompletion and onCostEntry hooks. Failed attempts that charged tokens still emit their cost hooks.

Next steps:

Models list — discover which models are available to route across
Cost tracking guide — compare per-model costs before building a routing list
Server-side built-in tools — built-ins work with route() too