
Models & Providers

The SDK ships with built-in support for five providers. Every call routes through a central catalog that knows each model’s pricing, capabilities, and the exact wire name the provider API expects.


| Provider | Key | Notes |
| --- | --- | --- |
| Anthropic | anthropic | Claude family; Messages API (messages). |
| OpenAI | openai | GPT / o-series; Responses API preferred, Chat Completions as fallback. |
| Google | google | Gemini family; Generate API (generate), Interactions for stateful sessions. |
| xAI | xai | Grok family; OpenAI-compatible (Responses + Completions). |
| OpenRouter | openrouter | Aggregator gateway that routes to 200+ upstream models under a single key. Not bundled in the local catalog; model info is fetched live. |

You can also point the openai adapter at OpenAI-compatible local servers (Ollama, vLLM, LM Studio) by overriding the base URL in clientOptions.


The SDK bundles a versioned JSON catalog for every provider (except OpenRouter, which is inherently dynamic). The catalog is loaded once at engine startup — no network required.

| Field | Type | Meaning |
| --- | --- | --- |
| provider | string | Provider key (e.g. "anthropic"). |
| model | string | Canonical normalized slug (e.g. "claude-haiku-4-5"). |
| providerModelName | string? | Exact id sent on the wire (may include a date suffix). |
| aliases | string[]? | Alternate callable ids (snapshots, dated forms). |
| pricing.inputPerMTok | number? | USD per 1 M input tokens. |
| pricing.outputPerMTok | number? | USD per 1 M output tokens. |
| pricing.cacheReadPerMTok | number? | USD per 1 M cache-read tokens. |
| pricing.cacheWritePerMTok | number? | USD per 1 M cache-write tokens. |
| pricing.perImage | number? | USD per image (image-gen models). |
| pricing.perMinute | number? | USD per minute of audio (STT models). |
| pricing.tiers | Record<string, TierRates>? | Per-service-tier rate overrides (e.g. batch, priority, flex). The flat fields are the implicit standard tier. |
| capabilities.toolUse | boolean | Supports tool/function calling. |
| capabilities.builtinTools | string[]? | Names of provider-native built-in tools (e.g. "web_search", "code_interpreter"). |
| capabilities.streaming | boolean | Supports token streaming. |
| capabilities.structuredOutput | boolean | Supports JSON-schema-constrained output. |
| capabilities.vision | boolean | Accepts image inputs. |
| capabilities.audio | boolean | Accepts audio inputs. |
| capabilities.video | boolean | Accepts video inputs. |
| capabilities.imageGeneration | boolean | Produces images. |
| capabilities.audioGeneration | boolean | Produces audio (TTS). |
| capabilities.videoGeneration | boolean | Produces video. |
| reasoning.supported | boolean | Model has an extended thinking / reasoning mode. |
| reasoning.effortControl | boolean | Reasoning effort level is configurable. |
| reasoning.automatic | boolean | Reasoning activates automatically (no explicit toggle). |
| contextWindow | number? | Max input context in tokens. |
| maxOutput | number? | Max output tokens per request. |
| preferredApi | ApiType | API variant the SDK uses by default (messages, responses, completions, generate, interactions). |
| supportedApis | ApiType[] | All API variants the model can use. |
| type | string? | Model role: chat, code, image, video, tts, stt, embedding. |
| inputModalities | string[]? | Content kinds accepted: text, image, audio, video, pdf. |
| outputModalities | string[]? | Content kinds produced: text, image, audio, video. |
| family | string? | Model family (e.g. "claude-opus", "gpt"). |
| status | string? | Lifecycle: stable, preview, legacy. |
| active | boolean? | Callable from this account right now. |

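To make the pricing fields concrete, here is a minimal sketch of how a per-request cost could be derived from them. The `PricingSketch`, `TierRatesSketch`, and `UsageSketch` shapes and the `estimateCostUsd` helper are illustrative assumptions, not the SDK's own types or cost layer. The sketch also assumes that a tier entry overrides only the rates it lists and otherwise falls back to the flat (standard) fields.

```typescript
// Hypothetical shapes mirroring the pricing fields in the table above.
interface TierRatesSketch {
  inputPerMTok?: number;
  outputPerMTok?: number;
}

interface PricingSketch extends TierRatesSketch {
  tiers?: Record<string, TierRatesSketch>;
}

interface UsageSketch {
  inputTokens: number;
  outputTokens: number;
}

// Rates are USD per 1 M tokens, so token counts are divided by 1e6.
// Assumption: a tier overrides only the rates it lists; missing rates
// fall back to the flat fields (the implicit standard tier).
function estimateCostUsd(pricing: PricingSketch, usage: UsageSketch, tier?: string): number {
  const override: TierRatesSketch = (tier ? pricing.tiers?.[tier] : undefined) ?? {};
  const inRate = override.inputPerMTok ?? pricing.inputPerMTok ?? 0;
  const outRate = override.outputPerMTok ?? pricing.outputPerMTok ?? 0;
  return (usage.inputTokens * inRate + usage.outputTokens * outRate) / 1_000_000;
}

// Made-up rates, chosen only to keep the arithmetic easy to follow.
const examplePricing: PricingSketch = {
  inputPerMTok: 2,
  outputPerMTok: 8,
  tiers: { flex: { inputPerMTok: 1, outputPerMTok: 4 } },
};
const exampleUsage: UsageSketch = { inputTokens: 500_000, outputTokens: 250_000 };

const standardCost = estimateCostUsd(examplePricing, exampleUsage);     // (500000*2 + 250000*8) / 1e6 = 3
const flexCost = estimateCostUsd(examplePricing, exampleUsage, 'flex'); // (500000*1 + 250000*4) / 1e6 = 1.5
console.log(standardCost, flexCost);
```

An unknown tier name simply yields the standard rates in this sketch; whether the SDK rejects unknown tiers instead is not stated here.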
import { listModels, createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  catalog: 'defaults',
  apiKeys: { anthropic: process.env.ANTHROPIC_API_KEY! },
});

// All models in the catalog
const all = listModels();

// One provider only
const anthropicModels = listModels({ provider: 'anthropic' });

// Inspect a model
const haiku = engine.catalog.get('anthropic', 'claude-haiku-4-5');
console.log(haiku?.pricing.inputPerMTok); // e.g. 0.8
console.log(haiku?.capabilities.vision); // true / false
console.log(haiku?.contextWindow); // 200000

For live availability (not just catalog entries), listModelsLive() hits the provider’s /models endpoint and merges results with the local catalog:

import { listModelsLive } from '@combycode/llm-sdk';
// Enriched ModelInfo[], cached 24 h in memory
const live = await listModelsLive({ provider: 'openai' });
// Bare id strings only
const ids = await listModelsLive({ provider: 'openai', raw: true });
// Force a fresh fetch (bypass cache)
const fresh = await listModelsLive({ provider: 'anthropic', refresh: true });
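
The 24 h in-memory cache and the refresh bypass can be pictured as a keyed TTL cache. The following is a sketch under assumptions, not the SDK's internal implementation: `TtlCache`, `fakeFetch`, and the injectable clock are hypothetical names introduced for illustration, and the fetcher returns made-up ids instead of calling a real /models endpoint.

```typescript
// Hypothetical keyed TTL cache; a sketch, not the SDK's internals.
type Fetcher<T> = (key: string) => Promise<T>;

class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private fetcher: Fetcher<T>;
  private ttlMs: number;
  private now: () => number;

  constructor(fetcher: Fetcher<T>, ttlMs: number, now: () => number = () => Date.now()) {
    this.fetcher = fetcher;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
  }

  async get(key: string, opts: { refresh?: boolean } = {}): Promise<T> {
    const hit = this.entries.get(key);
    // Serve from memory unless the caller forces a refresh or the entry expired.
    if (hit && !opts.refresh && hit.expiresAt > this.now()) return hit.value;
    const value = await this.fetcher(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Stand-in for a provider /models call; counts how often it is invoked.
let fetchCount = 0;
const fakeFetch: Fetcher<string[]> = async (provider) => {
  fetchCount += 1;
  return [`${provider}/example-model`];
};

const liveCache = new TtlCache(fakeFetch, DAY_MS);

async function demo(): Promise<number> {
  await liveCache.get('openai');                    // miss: fetches
  await liveCache.get('openai');                    // hit: served from memory
  await liveCache.get('openai', { refresh: true }); // forced: fetches again
  return fetchCount;                                // number of fetches so far
}
```

Keying the cache by provider means a forced refresh for one provider leaves the cached lists of the others untouched.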

Browse all models interactively at /models.


Use engine.catalog.set() to register a model the bundled catalog does not know, or to override pricing and capabilities for an existing entry.

import { complete, createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  catalog: 'defaults',
  apiKeys: { openai: process.env.OPENAI_API_KEY! },
});

// Register a custom / fine-tuned model
engine.catalog.set('openai', 'my-ft-gpt-4o', {
  pricing: { inputPerMTok: 5, outputPerMTok: 15 },
  preferredApi: 'responses',
  supportedApis: ['responses', 'completions'],
  contextWindow: 128000,
  capabilities: {
    toolUse: true,
    streaming: true,
    structuredOutput: true,
    vision: false,
    audio: false,
    video: false,
    imageGeneration: false,
    audioGeneration: false,
    videoGeneration: false,
  },
  // The exact id your fine-tune endpoint expects on the wire:
  providerModelName: 'ft:gpt-4o-2024-08-06:acme::AbcXyz',
});

// Now usable like any catalog model:
const { text } = await complete({
  model: 'openai/my-ft-gpt-4o',
  prompt: 'Hello',
});

set() signature:

catalog.set(
  provider: string,
  model: string, // normalized slug: what you pass to complete()
  info: Partial<Omit<ModelInfo, 'provider' | 'model'>> & { pricing: ModelPricing }
): void

Only pricing is required; everything else falls back to safe defaults (toolUse: true, streaming: true, structuredOutput: true, all media flags false, preferredApi: 'completions').
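
The fallback behaviour described above amounts to merging the supplied fields over a fixed set of defaults. Here is a minimal sketch of that merge, using trimmed-down hypothetical types (`ModelInfoSketch`, `SetInputSketch`) and a hypothetical `withDefaults` helper. It additionally assumes capability flags are merged one by one, which the source does not state.

```typescript
// Hypothetical, trimmed-down shapes; the real ModelInfo has more fields.
interface CapabilitiesSketch {
  toolUse: boolean;
  streaming: boolean;
  structuredOutput: boolean;
  vision: boolean;
  audio: boolean;
  video: boolean;
  imageGeneration: boolean;
  audioGeneration: boolean;
  videoGeneration: boolean;
}

interface ModelPricingSketch {
  inputPerMTok?: number;
  outputPerMTok?: number;
}

interface ModelInfoSketch {
  pricing: ModelPricingSketch;
  capabilities: CapabilitiesSketch;
  preferredApi: string;
  providerModelName?: string;
  contextWindow?: number;
}

type SetInputSketch = Partial<Omit<ModelInfoSketch, 'pricing' | 'capabilities'>> & {
  pricing: ModelPricingSketch;
  capabilities?: Partial<CapabilitiesSketch>;
};

// Defaults as listed in the text: tool use, streaming and structured
// output on; every media flag off.
const DEFAULT_CAPABILITIES: CapabilitiesSketch = {
  toolUse: true,
  streaming: true,
  structuredOutput: true,
  vision: false,
  audio: false,
  video: false,
  imageGeneration: false,
  audioGeneration: false,
  videoGeneration: false,
};

function withDefaults(info: SetInputSketch): ModelInfoSketch {
  return {
    ...info,
    preferredApi: info.preferredApi ?? 'completions',
    // Assumption: flags are merged individually rather than replaced wholesale.
    capabilities: { ...DEFAULT_CAPABILITIES, ...info.capabilities },
  };
}

const minimalEntry = withDefaults({ pricing: { inputPerMTok: 5, outputPerMTok: 15 } });
console.log(minimalEntry.preferredApi, minimalEntry.capabilities.vision);
```

With only pricing supplied, the resulting entry uses the completions API and reports no media capabilities, matching the defaults listed above.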

To load a batch of entries at once (same format as the bundled catalog.json files), use catalog.load(data):

engine.catalog.load({
  'openai/my-model': {
    pricing: { inputPerMTok: 2, outputPerMTok: 6 },
    contextWindow: 32000,
    preferredApi: 'responses',
    supportedApis: ['responses'],
    capabilities: {
      toolUse: true, streaming: true, structuredOutput: true,
      vision: false, audio: false, video: false,
      imageGeneration: false, audioGeneration: false, videoGeneration: false,
    },
  },
});

The SDK accepts two forms everywhere (complete, stream, agent, estimate, …):

Namespaced "provider/model" (recommended):

const { text } = await complete({ model: 'anthropic/claude-haiku-4-5', prompt: '...' });

Bare model + explicit provider field:

const { text } = await complete({ model: 'claude-haiku-4-5', provider: 'anthropic', prompt: '...' });

Both forms are equivalent. The namespaced form is preferred because it is unambiguous and self-contained.

Service tier suffix — append :tier to any namespaced id to pick a synchronous service tier (recognized values: auto, standard, priority, flex, scale):

// Routes through the flex tier (cheaper, higher latency)
const { text } = await complete({ model: 'openai/gpt-5.4:flex', prompt: '...' });

Note: batch is NOT a service tier. Batch is a separate, asynchronous request flow handled by the Batch API (submitBatch; see the Batch guide) and priced at roughly 50% of standard rates. The batch key under pricing.tiers exists only so the cost layer can price batch jobs; you never select it as a :tier.

Note: :free and :online are NOT parsed as tiers — they are OpenRouter variant suffixes and are passed through verbatim.
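
The id rules above (namespaced form, optional :tier suffix, OpenRouter variants passed through) can be summarized in a small parser. This is a hypothetical sketch, not the SDK's parser: `parseModelId` and `ParsedModelId` are made-up names, the OpenRouter slug in the example is invented, and the sketch also accepts a tier suffix on bare ids, which the text does not confirm.

```typescript
// Tier names come from the list above; the parser itself is a hypothetical sketch.
const SERVICE_TIERS = ['auto', 'standard', 'priority', 'flex', 'scale'] as const;
type ServiceTier = (typeof SERVICE_TIERS)[number];

interface ParsedModelId {
  provider?: string; // undefined for bare ids (provider passed separately)
  model: string;
  tier?: ServiceTier;
}

function parseModelId(id: string): ParsedModelId {
  // Split on the first "/" only: aggregator slugs may contain further slashes.
  const slash = id.indexOf('/');
  const provider = slash === -1 ? undefined : id.slice(0, slash);
  let model = slash === -1 ? id : id.slice(slash + 1);

  // Only a recognized tier name after the last ":" counts as a tier.
  // Anything else (":free", ":online", fine-tune ids) stays in the model name.
  let tier: ServiceTier | undefined;
  const colon = model.lastIndexOf(':');
  if (colon !== -1) {
    const suffix = model.slice(colon + 1);
    if ((SERVICE_TIERS as readonly string[]).includes(suffix)) {
      tier = suffix as ServiceTier;
      model = model.slice(0, colon);
    }
  }
  return { provider, model, tier };
}

console.log(parseModelId('openai/gpt-5.4:flex'));
// The slug below is invented for illustration.
console.log(parseModelId('openrouter/example-org/example-model:free'));
```

Checking the suffix against a fixed allow-list is what keeps OpenRouter variants and colon-bearing fine-tune ids intact.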

Smart selection: select() and selectModels()


select() returns the single best "provider/model" string for a capability query; selectModels() returns the full ranked list. Both are availability-aware: only providers with a configured API key are considered.

import { select, selectModels } from '@combycode/llm-sdk';
// Cheapest vision-capable model across all configured providers
const model = select('vision; price:low');
// All reasoning-capable models, cheapest first
const candidates = selectModels('reasoning');
// Multiple constraints
const coder = select('type:code; tools; context > 100k');
// Restrict to one provider
const gemini = select('vision; streaming', { provider: 'google' });

Query syntax: a semicolon-separated string or string array. Each clause is one of:

| Clause | Meaning |
| --- | --- |
| vision, tools, audio, structured, search | Capability flag must be true. |
| reasoning | Model has a reasoning mode. |
| type:chat | model.type === 'chat'. |
| status:stable | model.status === 'stable'. |
| price:low | inputPerMTok <= 1 (default threshold). |
| price:mid | inputPerMTok <= 5. |
| price < 2 | inputPerMTok <= 2 (numeric, per 1 M tokens). |
| context > 200k | contextWindow >= 200000. |
| tier:flex | Model has a flex pricing tier (also priority, etc.). |
| provider:anthropic | Restrict to one provider (same as opts.provider). |

Ranking: cheapest input price first; tiebreak: newest version.
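
To illustrate how a query string turns into a filter plus a ranking, here is a reduced sketch covering only capability flags, price:low, price < N and context > N. `CandidateSketch`, `parseClause`, and `selectSketch` are hypothetical names, the sample models are invented, and the newest-version tiebreak is omitted. The comparisons follow the Meaning column as written (inclusive bounds).

```typescript
// Hypothetical, trimmed-down model record; only the fields this sketch needs.
interface CandidateSketch {
  id: string;
  inputPerMTok: number;
  contextWindow: number;
  flags: string[]; // e.g. ['vision', 'tools']
}

type Predicate = (m: CandidateSketch) => boolean;

// Parses "100k" -> 100000 and "2" -> 2.
function parseAmount(raw: string): number {
  const t = raw.trim().toLowerCase();
  return t.endsWith('k') ? Number(t.slice(0, -1)) * 1000 : Number(t);
}

function parseClause(clause: string, priceLow = 1): Predicate {
  const c = clause.trim();
  if (c === 'price:low') return (m) => m.inputPerMTok <= priceLow;
  if (c.startsWith('price <')) {
    const max = parseAmount(c.slice('price <'.length));
    return (m) => m.inputPerMTok <= max;
  }
  if (c.startsWith('context >')) {
    const min = parseAmount(c.slice('context >'.length));
    return (m) => m.contextWindow >= min;
  }
  return (m) => m.flags.includes(c); // plain capability flag
}

function selectSketch(query: string, models: CandidateSketch[]): CandidateSketch[] {
  const predicates = query
    .split(';')
    .filter((s) => s.trim() !== '')
    .map((s) => parseClause(s));
  return models
    .filter((m) => predicates.every((p) => p(m)))
    .sort((a, b) => a.inputPerMTok - b.inputPerMTok); // cheapest input price first
}

// Invented sample data.
const sampleModels: CandidateSketch[] = [
  { id: 'a/big', inputPerMTok: 3, contextWindow: 1_000_000, flags: ['vision', 'tools'] },
  { id: 'b/small', inputPerMTok: 0.5, contextWindow: 128_000, flags: ['vision'] },
  { id: 'c/mid', inputPerMTok: 1, contextWindow: 200_000, flags: ['vision', 'tools'] },
];

console.log(selectSketch('vision; price:low', sampleModels).map((m) => m.id));
console.log(selectSketch('tools; context > 200k', sampleModels).map((m) => m.id));
```

Because every clause becomes an independent predicate, adding a new clause type only requires one more branch in `parseClause`.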

const opts = {
  prefs: {
    thresholds: { 'price.low': 0.5 }, // redefine what "low" means
    tags: { 'my-tag': 'vision; context > 128k' }, // custom shorthand
  },
  tier: 'flex', // evaluate price against the flex pricing tier
};
const model = select('my-tag', opts);

route() tries each candidate model in order, failing over to the next one on retryable errors (rate limits, server errors, timeouts). Non-retryable failures (auth, bad request, content filter) propagate immediately.

import { route } from '@combycode/llm-sdk';

const result = await route({
  models: ['anthropic/claude-opus-4', 'openai/gpt-4o', 'google/gemini-2.5-pro'],
  prompt: 'Summarize this document.',
  maxTokens: 1024,
});
console.log(`Served by: ${result.servedBy}`);
console.log(`Attempts:`, result.attempts);

When every model in the list belongs to openrouter, a single request is sent with a models array and OpenRouter routes server-side (one round-trip, no client-side retry needed).
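
The client-side failover described above boils down to a loop that moves on after retryable errors and rethrows everything else. The sketch below is an assumption-laden illustration, not the SDK's route(): the error kinds, `makeError`, `isRetryable`, and `routeSketch` are hypothetical, and throwing a generic error once all candidates are exhausted is a choice made for the sketch.

```typescript
// Hypothetical error classification; the SDK's real error types may differ.
const RETRYABLE_KINDS = new Set(['rate_limit', 'server_error', 'timeout']);

function makeError(kind: string, message: string): Error & { kind: string } {
  return Object.assign(new Error(message), { kind });
}

function isRetryable(err: unknown): boolean {
  const kind = (err as { kind?: unknown } | null)?.kind;
  return typeof kind === 'string' && RETRYABLE_KINDS.has(kind);
}

interface AttemptSketch {
  model: string;
  ok: boolean;
  error?: string;
}

async function routeSketch<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<{ result: T; servedBy: string; attempts: AttemptSketch[] }> {
  const attempts: AttemptSketch[] = [];
  for (const model of models) {
    try {
      const result = await call(model);
      attempts.push({ model, ok: true });
      return { result, servedBy: model, attempts };
    } catch (err) {
      attempts.push({ model, ok: false, error: String(err) });
      // Auth, bad request, content filter: propagate immediately.
      if (!isRetryable(err)) throw err;
      // Retryable: fall through to the next candidate.
    }
  }
  // Assumption: surface a generic error once every candidate has failed.
  throw new Error(`All ${models.length} candidates failed`);
}

// Stand-in provider call: the first model is rate-limited, the second answers.
async function fakeCall(model: string): Promise<string> {
  if (model === 'a/one') throw makeError('rate_limit', '429 Too Many Requests');
  return `answer from ${model}`;
}

routeSketch(['a/one', 'b/two'], fakeCall).then((r) => {
  console.log(r.servedBy, r.attempts.length);
});
```

Recording every attempt, including failed ones, is what makes a servedBy / attempts style result possible.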


Three distinct identifiers exist for every model. Keeping them straight prevents subtle bugs.

| Name type | Example | Where you use it |
| --- | --- | --- |
| Normalized id (slug) | anthropic/claude-haiku-4-5 | Pass to complete(), select(), catalog.get(), everywhere in the SDK. |
| API name (providerModelName) | claude-haiku-4-5-20250714 | What the adapter sends in the HTTP request body. You never write this; the SDK translates it. |
| Alias | claude-haiku-4-5-20250714 | An alternate id (often the dated snapshot) that resolves to the same catalog entry. |

Resolution flow:

You pass: "anthropic/claude-haiku-4-5"
|
v
catalog.get("anthropic", "claude-haiku-4-5") <- direct slug lookup
|
v
catalog.resolveModelId("anthropic", "claude-haiku-4-5")
|
v
adapter sends on the wire: "claude-haiku-4-5-20250714" (<-- providerModelName)

If you pass an alias (e.g. the dated form "anthropic/claude-haiku-4-5-20250714"), the alias index resolves it to the canonical slug first, then the same translation applies. If you pass a completely unknown id, the SDK sends it verbatim — no error, no translation.

// All three of these resolve to the same wire request:
await complete({ model: 'anthropic/claude-haiku-4-5', prompt: '...' });
await complete({ model: 'anthropic/claude-haiku-4-5-20250714', prompt: '...' }); // alias
await complete({ model: 'claude-haiku-4-5', provider: 'anthropic', prompt: '...' });

To inspect the wire name directly:

const wireName = engine.catalog.resolveModelId('anthropic', 'claude-haiku-4-5');
// -> "claude-haiku-4-5-20250714"
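
The three-step resolution (alias to slug, slug to entry, entry to wire name) can be modelled with two maps. This is a sketch under assumptions rather than the SDK's catalog: `CatalogSketch` and `EntrySketch` are hypothetical, and falling back to the slug when an entry has no providerModelName is an assumption based on that field being optional.

```typescript
// Hypothetical in-memory catalog; the entry mirrors the example above.
interface EntrySketch {
  providerModelName?: string;
  aliases?: string[];
}

class CatalogSketch {
  private entries = new Map<string, EntrySketch>(); // "provider/slug" -> entry
  private aliasIndex = new Map<string, string>();   // "provider/alias" -> slug

  set(provider: string, slug: string, entry: EntrySketch): void {
    this.entries.set(`${provider}/${slug}`, entry);
    for (const alias of entry.aliases ?? []) {
      this.aliasIndex.set(`${provider}/${alias}`, slug);
    }
  }

  resolveModelId(provider: string, id: string): string {
    // 1. Alias -> canonical slug (a no-op when id is already a slug).
    const slug = this.aliasIndex.get(`${provider}/${id}`) ?? id;
    const entry = this.entries.get(`${provider}/${slug}`);
    // 2. Unknown ids are sent verbatim: no error, no translation.
    if (!entry) return id;
    // 3. Known slug -> wire name. Assumption: default to the slug itself.
    return entry.providerModelName ?? slug;
  }
}

const sketchCatalog = new CatalogSketch();
sketchCatalog.set('anthropic', 'claude-haiku-4-5', {
  providerModelName: 'claude-haiku-4-5-20250714',
  aliases: ['claude-haiku-4-5-20250714'],
});

console.log(sketchCatalog.resolveModelId('anthropic', 'claude-haiku-4-5'));          // slug
console.log(sketchCatalog.resolveModelId('anthropic', 'claude-haiku-4-5-20250714')); // alias
console.log(sketchCatalog.resolveModelId('anthropic', 'some-unknown-id'));           // verbatim
```

Keeping the alias index separate from the entry map means an alias lookup never has to scan every entry's aliases list.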