Text embeddings

What you will achieve

Call embed() with a string or an array of strings and receive one vector per input: embeddings[0] is the first vector, dimensions is its length, and token usage is reported when the provider supplies it. One call shape works across OpenAI, Google, and OpenRouter. Anthropic and xAI have no first-party embeddings endpoint and are not supported.

When and why

Embeddings convert text into a fixed-length numeric vector that captures semantic meaning. Two sentences with similar meaning land close together in vector space; unrelated sentences land far apart. You need them for:

Semantic search — retrieve the most relevant chunks from a document corpus without exact keyword matches.
RAG (retrieval-augmented generation) — feed retrieved chunks as context into a subsequent complete() call.
Clustering / classification — group texts by topic without labelled training data.
Deduplication — detect near-duplicate records faster than string diff.

The raw problem: OpenAI’s call is client.embeddings.create({ model, input }) and returns data[0].embedding. Google’s call is ai.models.embedContent({ model, contents }) and returns embeddings[0].values. Different endpoints, different auth headers, different extraction paths. embed() normalises all of this.
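To make the difference concrete, here is a minimal sketch of flattening the two response shapes quoted above into one number[][] layout. The interface and function names are illustrative, not part of the SDK, and only the fields mentioned in the text are typed; the SDK's own adapters may do more.

```typescript
// Response shapes as quoted in the text; only the fields used here are typed.
interface OpenAIEmbeddingsBody {
  data: { embedding: number[] }[];
}
interface GoogleEmbeddingsBody {
  embeddings: { values: number[] }[];
}

// Both helpers return the same layout: one row per input string.
function rowsFromOpenAI(body: OpenAIEmbeddingsBody): number[][] {
  return body.data.map((item) => item.embedding);
}

function rowsFromGoogle(body: GoogleEmbeddingsBody): number[][] {
  return body.embeddings.map((item) => item.values);
}

const openaiRows = rowsFromOpenAI({ data: [{ embedding: [0.1, 0.2] }] });
const googleRows = rowsFromGoogle({ embeddings: [{ values: [0.3, 0.4] }] });
console.log(openaiRows[0], googleRows[0]);
```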

Step by step

Step 1 — Embed a single string

import { embed } from '@combycode/llm-sdk';

const result = await embed({
  model: 'openai/text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
  input: 'The quick brown fox',
});

console.log(result.dimensions);           // e.g. 1536
console.log(result.embeddings[0].length); // same as dimensions
console.log(result.embeddings[0][0]);     // first component, e.g. 0.012...

result.embeddings is always a number[][] — one row per input string. For a single string, result.embeddings[0] is your vector.

Step 2 — Embed multiple strings in one call

OpenAI and OpenRouter accept a batch in one HTTP round-trip. The Google adapter calls embedContent, which takes one input per request, so it loops over the array automatically. The call looks the same either way:

const result = await embed({
  model: 'openai/text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
  input: [
    'The quick brown fox',
    'A lazy dog rests here',
    'TypeScript is a typed superset of JavaScript',
  ],
});

// result.embeddings[0] -- vector for first string
// result.embeddings[1] -- vector for second string
// result.embeddings[2] -- vector for third string
console.log(result.embeddings.length); // 3

All vectors in one result come from the same model and therefore share the same dimensions, so they can be compared directly with cosine similarity.
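For a large corpus you may want to split the input into several calls. The sketch below shows one way to group strings before passing each group as input; the helper name and the limit of 2 are placeholders, since this guide does not state a per-request limit.

```typescript
// Split a list into consecutive groups of at most `size` items.
function chunkInputs<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// Hypothetical limit for illustration; check your provider's documented maximum.
const MAX_INPUTS_PER_CALL = 2;
const batches = chunkInputs(["a", "b", "c", "d", "e"], MAX_INPUTS_PER_CALL);
console.log(batches.length); // 3
```

Each group keeps the original order, so concatenating the embeddings rows from successive calls lines them up with the original array.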

Step 3 — Check token usage (cost hook)

embed() emits an onCompletion hook after each call so the cost collector can account for embedding tokens. You can also read usage directly:

const result = await embed({
  model: 'openai/text-embedding-3-small',
  // apiKey omitted: falls back to engine.apiKeys.openai from the global engine config
  input: 'Hello world',
});

console.log(result.usage?.inputTokens); // e.g. 2

usage is { inputTokens: number } | undefined. Google’s adapter currently returns no usage object (the Gemini embedContent response does not carry token counts); OpenAI and OpenRouter return prompt_tokens from the response body.
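If you read usage directly, a small helper can turn it into a cost figure. This is a sketch only: the function name is made up and the price is a placeholder, not a real rate. It returns undefined when no usage was reported, rather than pretending the call was free.

```typescript
interface EmbedUsageShape {
  inputTokens: number;
}

// Returns undefined when the provider reported no usage (for example Google).
function estimateCostUsd(
  usage: EmbedUsageShape | undefined,
  usdPerMillionTokens: number,
): number | undefined {
  if (usage === undefined) return undefined;
  return (usage.inputTokens / 1_000_000) * usdPerMillionTokens;
}

// Placeholder price for illustration; look up the current rate for your model.
const HYPOTHETICAL_USD_PER_MILLION = 0.02;
console.log(estimateCostUsd({ inputTokens: 500_000 }, HYPOTHETICAL_USD_PER_MILLION)); // 0.01
console.log(estimateCostUsd(undefined, HYPOTHETICAL_USD_PER_MILLION)); // undefined
```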

Step 4 — Switch to Google

The call shape is identical — only the model string changes:

const result = await embed({
  model: 'google/gemini-embedding-exp-03-07',
  apiKey: process.env.GOOGLE_API_KEY,
  input: 'The quick brown fox',
});

Google routes to POST /v1beta/models/{model}:embedContent. The adapter adds the x-goog-api-key header and unwraps embedding.values from the response body.
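As a rough picture of what that request looks like, the sketch below builds the URL, headers, and body, and unwraps embedding.values from a parsed response. The host name and the JSON body layout are assumptions taken from the public Gemini REST API rather than from this guide, and the function names are illustrative; the adapter's real implementation is not shown here.

```typescript
interface GoogleEmbedRequestSketch {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Assumption: public Gemini API host; not stated in this guide.
const ASSUMED_GOOGLE_HOST = "https://generativelanguage.googleapis.com";

function buildGoogleEmbedRequest(
  model: string,
  apiKey: string,
  text: string,
): GoogleEmbedRequestSketch {
  return {
    url: `${ASSUMED_GOOGLE_HOST}/v1beta/models/${model}:embedContent`,
    headers: { "content-type": "application/json", "x-goog-api-key": apiKey },
    // Assumption: a single content object with one text part per request.
    body: JSON.stringify({ content: { parts: [{ text }] } }),
  };
}

// Unwrap embedding.values; an unexpected body yields an empty vector.
function unwrapGoogleEmbedding(parsed: { embedding?: { values?: number[] } }): number[] {
  return parsed.embedding?.values ?? [];
}

const sketchRequest = buildGoogleEmbedRequest("gemini-embedding-exp-03-07", "YOUR_KEY", "hi");
console.log(sketchRequest.url);
```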

Step 5 — Plug vectors into a similarity search

import { embed } from '@combycode/llm-sdk';

// Cosine similarity of two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * (b[i] ?? 0), 0);
  const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  const denom = magA * magB;
  return denom === 0 ? 0 : dot / denom; // a zero vector would otherwise give NaN
}

const corpus = ['OpenAI makes GPT models', 'Google makes Gemini', 'Rust is memory safe'];

const { embeddings: corpusVecs } = await embed({
  model: 'openai/text-embedding-3-small',
  input: corpus,
});

const { embeddings: [queryVec] } = await embed({
  model: 'openai/text-embedding-3-small',
  input: 'Who built GPT?',
});

const ranked = corpus
  .map((text, i) => ({ text, score: cosineSimilarity(queryVec, corpusVecs[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].text); // 'OpenAI makes GPT models'

For a full retrieval pipeline with chunking, indexing, and RAG, see the Retrieval (RAG) guide.
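When the same corpus is queried many times, a common variation is to scale every corpus vector to unit length once, so that each comparison is a plain dot product. The sketch below is one way to do that; the helper names are illustrative and the tiny 2-dimensional vectors stand in for real embeddings.

```typescript
// Scale a vector to unit length; a zero vector is returned unchanged.
function toUnitVector(v: readonly number[]): number[] {
  const mag = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return mag === 0 ? [...v] : v.map((x) => x / mag);
}

function dotProduct(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new RangeError("dimension mismatch");
  return a.reduce((sum, x, i) => sum + x * (b[i] as number), 0);
}

// Indices of the k rows most similar to the query.
// `unitRows` must already be unit length; the query is normalised here.
function topK(query: readonly number[], unitRows: readonly number[][], k: number): number[] {
  const q = toUnitVector(query);
  return unitRows
    .map((row, index) => ({ index, score: dotProduct(q, row) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.index);
}

const unitCorpus = [[3, 4], [1, 0], [0, 2]].map(toUnitVector);
console.log(topK([0, 1], unitCorpus, 2)); // [2, 0]
```

For unit vectors the dot product equals the cosine similarity, so the ranking matches the one produced by cosineSimilarity above.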

Your options

embed() accepts an EmbedOptions object:

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| `model` | `string` | Yes | Namespaced (`openai/text-embedding-3-small`) or bare with `provider`. Determines which adapter is used. |
| `input` | `string \| string[]` | Yes | One string or a batch. Batch is sent in a single request for OpenAI/OpenRouter; looped for Google. |
| `provider` | `ProviderName` | No | Required when `model` is bare (e.g. `model: 'text-embedding-3-small'`, `provider: 'openai'`). |
| `apiKey` | `string` | No | Falls back to `engine.apiKeys[provider]` from the global engine config. |
| `adapter` | `EmbeddingProviderAdapter` | No | Override the auto-selected adapter with a custom one. Useful for testing or self-hosted endpoints. |
| `engine` | `EngineHandle` | No | Override the global engine (rate-limit queue, hooks, keys). Defaults to `coreRegistry.get()`. |
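The model and provider rule can be pictured as a small resolver. This is a sketch of the rule as described in the table, not the SDK's actual code; the names are illustrative, and splitting on the first slash only is an assumption made so that a model id that itself contains a slash stays intact.

```typescript
// Providers listed as supported in this guide.
const EMBEDDING_PROVIDERS = ["openai", "google", "openrouter"] as const;
type EmbeddingProviderName = (typeof EMBEDDING_PROVIDERS)[number];

function isEmbeddingProvider(value: string): value is EmbeddingProviderName {
  return (EMBEDDING_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

function resolveEmbeddingModel(
  model: string,
  provider?: string,
): { provider: EmbeddingProviderName; model: string } {
  // An explicit provider wins and the model id is used as given.
  if (provider !== undefined) {
    if (!isEmbeddingProvider(provider)) {
      throw new Error(`Unsupported embeddings provider: ${provider}`);
    }
    return { provider, model };
  }
  // Otherwise the prefix before the first slash names the provider.
  const slash = model.indexOf("/");
  const prefix = slash === -1 ? "" : model.slice(0, slash);
  if (!isEmbeddingProvider(prefix)) {
    throw new Error(`Cannot infer provider from "${model}"; pass provider explicitly`);
  }
  return { provider: prefix, model: model.slice(slash + 1) };
}

console.log(resolveEmbeddingModel("openai/text-embedding-3-small"));
console.log(resolveEmbeddingModel("text-embedding-3-small", "openai"));
```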

Return value (EmbedResult):

| Field | Type | Notes |
| --- | --- | --- |
| `embeddings` | `number[][]` | One vector per input string, in order. Length equals `input.length` (or 1 for a single string). |
| `dimensions` | `number` | Length of each vector (`embeddings[0].length`). 0 if the response was empty. |
| `model` | `string` | The model id echoed from the request. |
| `usage` | `{ inputTokens: number } \| undefined` | Present for OpenAI and OpenRouter; absent for Google (Gemini embedContent carries no token count). |
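The invariants in this table are easy to check before storing vectors. The sketch below restates the result shape as a local interface (mirroring the table, not imported from the SDK) and reports any row-count or dimension mismatch; the names are illustrative.

```typescript
// Local mirror of the fields described in the table above.
interface EmbedResultShape {
  embeddings: number[][];
  dimensions: number;
  model: string;
  usage?: { inputTokens: number };
}

// Returns a list of violated invariants; an empty list means the result is consistent.
function checkEmbedResult(result: EmbedResultShape, inputCount: number): string[] {
  const problems: string[] = [];
  if (result.embeddings.length !== inputCount) {
    problems.push(`expected ${inputCount} rows, got ${result.embeddings.length}`);
  }
  result.embeddings.forEach((row, i) => {
    if (row.length !== result.dimensions) {
      problems.push(`row ${i} has length ${row.length}, expected ${result.dimensions}`);
    }
  });
  return problems;
}

const sampleResult: EmbedResultShape = {
  embeddings: [[1, 2], [3, 4]],
  dimensions: 2,
  model: "example-model", // placeholder id
};
console.log(checkEmbedResult(sampleResult, 2)); // []
```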

Supported providers:

| Provider | Endpoint | `dimensions` note |
| --- | --- | --- |
| `openai` | `POST /v1/embeddings` | From `data[0].embedding.length`. OpenAI `text-embedding-3-*` models support `dimensions` truncation natively; pass it via `params` on a custom adapter if needed. |
| `openrouter` | `POST /api/v1/embeddings` (OpenAI-compat) | Routed to any OpenAI-compatible embedding model available on OpenRouter. |
| `google` | `POST /v1beta/models/{model}:embedContent` | Looped per input (one embedContent call each). No usage returned. |
| `anthropic` | none | No first-party endpoint. Not supported. |
| `xai` | none | No first-party endpoint. Not supported. |

When to use a custom adapter:

Pass adapter when you need a self-hosted or custom-base-URL endpoint. For example, an Azure OpenAI deployment:

import { OpenAIEmbeddingAdapter } from '@combycode/llm-sdk/providers/openai';

const azureAdapter = new OpenAIEmbeddingAdapter({
  apiKey: process.env.AZURE_KEY!,
  baseURL: 'https://my-resource.openai.azure.com/openai/deployments/my-embed-deployment',
});

const result = await embed({
  model: 'text-embedding-3-small',
  input: 'hello',
  adapter: azureAdapter,
});

Compare the SDKs

import { embed } from '@combycode/llm-sdk';

// One `embed()` across providers (openai / google / openrouter); returns vectors
// + dimensions. (anthropic/xai have no first-party embeddings endpoint.)
const t0 = performance.now();
const { dimensions } = await embed({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  input: 'hello',
});

console.log(JSON.stringify({ result: String(dimensions), ms: Math.round(performance.now() - t0) }));

OpenAI’s official SDK calls client.embeddings.create() and returns data[0].embedding; Google’s returns embeddings[0].values; OpenRouter mirrors the OpenAI shape. ORXA unifies these under one embed() signature with a consistent EmbedResult. All HTTP flows through engine.fetch, which means rate-limit queuing, retry, and onCompletion hooks (including cost tracking) apply to embedding calls exactly as they do to complete() calls.

Gotchas and next steps

Google loops requests. For an array of three strings, Google makes three HTTP calls. The calls share the NetworkEngine queue (rate-limit aware), but latency is additive. For large batches, prefer OpenAI or OpenRouter.
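A back-of-the-envelope estimate shows how quickly this adds up. The sketch assumes the calls run one after another and uses a made-up per-call latency and batch size; substitute your own measurements.

```typescript
// Number of sequential round-trips: one per input when looped, one per batch otherwise.
function estimateRoundTrips(inputCount: number, inputsPerRequest: number): number {
  return Math.ceil(inputCount / inputsPerRequest);
}

// Hypothetical figures for illustration only.
const ASSUMED_MS_PER_CALL = 120;
const inputTotal = 300;

const loopedMs = estimateRoundTrips(inputTotal, 1) * ASSUMED_MS_PER_CALL;
const batchedMs = estimateRoundTrips(inputTotal, 100) * ASSUMED_MS_PER_CALL;
console.log(loopedMs, batchedMs); // 36000 360
```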

Dimensions vary by model. text-embedding-3-small returns 1536 dimensions by default; text-embedding-3-large returns 3072; gemini-embedding-exp-03-07 returns 3072. Embed every document in a corpus, and every query against it, with the same model. Vectors from different models live in different vector spaces, so cosine similarity between them is meaningless even when the dimensions happen to match.
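One way to enforce this is to store the model id next to each vector and refuse to compare across models. The type and function below are illustrative names, not SDK exports.

```typescript
// A vector stored together with the id of the model that produced it.
interface TaggedVector {
  model: string;
  values: number[];
}

// Throws if the two vectors should not be compared.
function assertComparable(a: TaggedVector, b: TaggedVector): void {
  if (a.model !== b.model) {
    throw new Error(`Model mismatch: ${a.model} vs ${b.model}`);
  }
  if (a.values.length !== b.values.length) {
    throw new Error(`Dimension mismatch: ${a.values.length} vs ${b.values.length}`);
  }
}

const storedVector: TaggedVector = { model: "openai/text-embedding-3-small", values: [0.1, 0.2] };
const queryVector: TaggedVector = { model: "openai/text-embedding-3-small", values: [0.3, 0.4] };
assertComparable(storedVector, queryVector); // passes silently
```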

Usage is absent for Google. result.usage is undefined, and if you track cost via the onCompletion hook, Google embedding calls still fire the hook but report inputTokens: 0, because the response carries no count. OpenAI and OpenRouter report the real count.

embed() is a one-shot helper. It creates its own adapter on each call. For repeated embedding inside a tight loop, pass a pre-built adapter to skip the constructor overhead.

Next steps:

Retrieval (RAG) guide — full pipeline: chunk, embed, index, retrieve, complete
Web search (grounded search) — provider-side real-time search without a corpus
Cost tracking guide — how onCompletion accounts for embedding tokens