Text embeddings

Call embed() with a string or an array of strings and receive one vector per input. embeddings[0] is the first vector, dimensions is its length, and token usage is reported when the provider returns it. One call shape works across OpenAI, Google, and OpenRouter. Anthropic and xAI have no first-party embeddings endpoint and are not supported.

Embeddings convert text into a fixed-length numeric vector that captures semantic meaning. Two sentences with similar meaning land close together in vector space; unrelated sentences land far apart. You need them for:

  • Semantic search — retrieve the most relevant chunks from a document corpus without exact keyword matches.
  • RAG (retrieval-augmented generation) — feed retrieved chunks as context into a subsequent complete() call.
  • Clustering / classification — group texts by topic without labelled training data.
  • Deduplication — detect near-duplicate records faster than string diff.

The raw problem: OpenAI’s call is client.embeddings.create({ model, input }) and returns data[0].embedding. Google’s call is ai.models.embedContent({ model, contents }) and returns embeddings[0].values. Different endpoints, different auth headers, different extraction paths. embed() normalises all of this.
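The normalisation described above can be sketched as two small extraction functions. The interface and function names here are hypothetical, modelled only on the response paths quoted in the paragraph; this is not the SDK's adapter code.

```typescript
// Hypothetical response shapes, modelled on the extraction paths quoted above.
interface OpenAIStyleResponse {
  data: { embedding: number[] }[];
}

interface GoogleStyleResponse {
  embeddings: { values: number[] }[];
}

// Both extractors return the same number[][] shape: one row per input.
function fromOpenAIStyle(res: OpenAIStyleResponse): number[][] {
  return res.data.map((item) => item.embedding);
}

function fromGoogleStyle(res: GoogleStyleResponse): number[][] {
  return res.embeddings.map((item) => item.values);
}

const openAIRows = fromOpenAIStyle({ data: [{ embedding: [0.1, 0.2] }] });
const googleRows = fromGoogleStyle({ embeddings: [{ values: [0.3, 0.4] }] });
console.log(openAIRows.length, googleRows.length); // 1 1
```

Once both providers are reduced to number[][], everything downstream (similarity, indexing) can ignore where the vectors came from.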

import { embed } from '@combycode/llm-sdk';

const result = await embed({
  model: 'openai/text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
  input: 'The quick brown fox',
});

console.log(result.dimensions); // e.g. 1536
console.log(result.embeddings[0].length); // same as dimensions
console.log(result.embeddings[0][0]); // first component, e.g. 0.012...

result.embeddings is always a number[][] — one row per input string. For a single string, result.embeddings[0] is your vector.

Step 2 — Embed multiple strings in one call

OpenAI and OpenRouter accept a batch in one HTTP round-trip. The Google adapter does not use a batch endpoint; it issues one embedContent call per input and loops over them automatically:

const result = await embed({
  model: 'openai/text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
  input: [
    'The quick brown fox',
    'A lazy dog rests here',
    'TypeScript is a typed superset of JavaScript',
  ],
});

// result.embeddings[0] -- vector for first string
// result.embeddings[1] -- vector for second string
// result.embeddings[2] -- vector for third string
console.log(result.embeddings.length); // 3

All vectors returned by the same model have the same dimensions, so they can be compared directly with cosine similarity.

embed() emits an onCompletion hook after each call so the cost collector can account for embedding tokens. You can also read usage directly:

const result = await embed({
  model: 'openai/text-embedding-3-small',
  input: 'Hello world',
});

console.log(result.usage?.inputTokens); // e.g. 2

usage is { inputTokens: number } | undefined. Google’s adapter currently returns no usage object (the Gemini embedContent response does not carry token counts); OpenAI and OpenRouter return prompt_tokens from the response body.
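Because usage may be undefined, code that aggregates token counts needs a fallback. A minimal sketch, assuming a stand-in type (UsageLike is hypothetical and only mirrors the usage field described above):

```typescript
// Minimal stand-in for the usage field described above.
interface UsageLike {
  usage?: { inputTokens: number };
}

// Sum input tokens across results, counting a missing usage object as 0.
function totalInputTokens(results: UsageLike[]): number {
  return results.reduce((sum, r) => sum + (r.usage?.inputTokens ?? 0), 0);
}

const sampleResults: UsageLike[] = [
  { usage: { inputTokens: 2 } }, // e.g. an OpenAI result
  {}, // e.g. a Google result with no usage object
  { usage: { inputTokens: 5 } },
];
console.log(totalInputTokens(sampleResults)); // 7
```

Treating a missing count as 0 keeps totals computable, but it under-reports Google traffic, so totals that include Google calls are a lower bound.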

To switch to Google, keep the same call shape and change only the model string and the API key:

const result = await embed({
  model: 'google/gemini-embedding-exp-03-07',
  apiKey: process.env.GOOGLE_API_KEY,
  input: 'The quick brown fox',
});

Google routes to POST /v1beta/models/{model}:embedContent. The adapter adds the x-goog-api-key header and unwraps embedding.values from the response body.
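For orientation, the request described above might be assembled roughly as follows. The path and header come from the text; the base URL and the JSON body shape are assumptions based on the public Gemini REST API, and the helper names are hypothetical rather than the adapter's actual internals.

```typescript
// Assumed base URL for the Gemini REST API (not stated in this guide).
const GEMINI_BASE_URL = 'https://generativelanguage.googleapis.com';

interface EmbedContentRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the pieces of a POST /v1beta/models/{model}:embedContent request.
function buildEmbedContentRequest(model: string, apiKey: string, text: string): EmbedContentRequest {
  return {
    url: `${GEMINI_BASE_URL}/v1beta/models/${model}:embedContent`,
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
    // Assumed body shape: a single content object with one text part.
    body: JSON.stringify({ content: { parts: [{ text }] } }),
  };
}

// Unwrap embedding.values, returning an empty vector if the field is missing.
function unwrapEmbedding(responseBody: { embedding?: { values?: number[] } }): number[] {
  return responseBody.embedding?.values ?? [];
}

const exampleRequest = buildEmbedContentRequest('gemini-embedding-exp-03-07', 'MY_KEY', 'The quick brown fox');
console.log(exampleRequest.url);
```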

Step 5 — Plug vectors into a similarity search

import { embed } from '@combycode/llm-sdk';

// Cosine similarity of two vectors; returns 0 if either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * (b[i] ?? 0), 0);
  const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  const denom = magA * magB;
  return denom === 0 ? 0 : dot / denom;
}

const corpus = ['OpenAI makes GPT models', 'Google makes Gemini', 'Rust is memory safe'];

// Embed the whole corpus in one call, then the query on its own.
const { embeddings: corpusVecs } = await embed({
  model: 'openai/text-embedding-3-small',
  input: corpus,
});
const { embeddings: [queryVec] } = await embed({
  model: 'openai/text-embedding-3-small',
  input: 'Who built GPT?',
});

// Score every corpus entry against the query and sort best-first.
const ranked = corpus
  .map((text, i) => ({ text, score: cosineSimilarity(queryVec, corpusVecs[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked[0].text); // 'OpenAI makes GPT models'

For a full retrieval pipeline with chunking, indexing, and RAG, see the Retrieval (RAG) guide.
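When the same corpus is queried many times, one common variation is to normalise every vector once and rank by plain dot product, which equals cosine similarity for unit-length vectors. The sketch below is illustrative only; the helper names are hypothetical and it works on plain arrays rather than on embed() results.

```typescript
// Scale a vector to unit length; a zero vector is returned as an unchanged copy.
function normalise(v: number[]): number[] {
  const mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return mag === 0 ? v.slice() : v.map((x) => x / mag);
}

// Dot product of two equal-length vectors.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  return a.reduce((s, x, i) => s + x * (b[i] ?? 0), 0);
}

// Return the indices of the k best-matching corpus vectors, best first.
// Both the query and the corpus are expected to be unit-length already.
function topK(unitQuery: number[], unitCorpus: number[][], k: number): number[] {
  return unitCorpus
    .map((vec, index) => ({ index, score: dotProduct(unitQuery, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.index);
}

// Normalise the corpus once, then reuse it for every query.
const unitCorpus = [[1, 0], [0, 2], [3, 3]].map(normalise);
console.log(topK(normalise([1, 0.2]), unitCorpus, 2)); // [0, 2]
```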

embed() accepts an EmbedOptions object:

| Option | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Namespaced (openai/text-embedding-3-small) or bare with provider. Determines which adapter is used. |
| input | string \| string[] | Yes | One string or a batch. Batch is sent in a single request for OpenAI/OpenRouter; looped for Google. |
| provider | ProviderName | No | Required when model is bare (e.g. model: 'text-embedding-3-small', provider: 'openai'). |
| apiKey | string | No | Falls back to engine.apiKeys[provider] from the global engine config. |
| adapter | EmbeddingProviderAdapter | No | Override the auto-selected adapter with a custom one. Useful for testing or self-hosted endpoints. |
| engine | EngineHandle | No | Override the global engine (rate-limit queue, hooks, keys). Defaults to coreRegistry.get(). |
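The relationship between model and provider can be illustrated with a small resolver. This is a guess at the general idea, not the SDK's actual resolution logic: resolveModel and ProviderId are hypothetical names, and the rule that a recognised prefix wins over an explicit provider is an assumption.

```typescript
// Hypothetical provider ids, taken from the providers this guide lists as supported.
type ProviderId = 'openai' | 'google' | 'openrouter';
const KNOWN_PROVIDERS: readonly string[] = ['openai', 'google', 'openrouter'];

function resolveModel(model: string, provider?: ProviderId): { provider: ProviderId; modelId: string } {
  const slash = model.indexOf('/');
  const prefix = slash === -1 ? '' : model.slice(0, slash);
  if (KNOWN_PROVIDERS.indexOf(prefix) !== -1) {
    // Namespaced form: split on the first slash only, so model ids that
    // themselves contain slashes are preserved.
    return { provider: prefix as ProviderId, modelId: model.slice(slash + 1) };
  }
  // Bare form: the provider must be given explicitly.
  if (provider) return { provider, modelId: model };
  throw new Error(`Model "${model}" is bare; pass provider explicitly`);
}

console.log(resolveModel('openai/text-embedding-3-small'));
console.log(resolveModel('text-embedding-3-small', 'openai'));
```

Both calls resolve to provider 'openai' and model id 'text-embedding-3-small'.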

Return value (EmbedResult):

| Field | Type | Notes |
| --- | --- | --- |
| embeddings | number[][] | One vector per input string, in order. Length equals input.length (or 1 for a single string). |
| dimensions | number | Length of each vector (embeddings[0].length). 0 if the response was empty. |
| model | string | The model id echoed from the request. |
| usage | { inputTokens: number } \| undefined | Present for OpenAI and OpenRouter; absent for Google (Gemini embedContent carries no token count). |

Supported providers:

| Provider | Endpoint | dimensions note |
| --- | --- | --- |
| openai | POST /v1/embeddings | From data[0].embedding.length. OpenAI text-embedding-3-* support dimensions truncation natively — pass via params on a custom adapter if needed. |
| openrouter | POST /api/v1/embeddings (OpenAI-compat) | Routed to any OpenAI-compatible embedding model available on OpenRouter. |
| google | POST /v1beta/models/{model}:embedContent | Looped per input (no batch endpoint). No usage returned. |
| anthropic | (none) | No first-party endpoint. Not supported. |
| xai | (none) | No first-party endpoint. Not supported. |
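If passing a dimensions parameter is not convenient, a client-side approximation is to keep the leading components and re-normalise. OpenAI documents this approach for the text-embedding-3-* models; for any other model, treat it as an assumption to verify. The function name below is hypothetical.

```typescript
// Keep the first `dims` components, then rescale the result to unit length.
function truncateEmbedding(vector: number[], dims: number): number[] {
  if (Math.floor(dims) !== dims || dims <= 0 || dims > vector.length) {
    throw new RangeError(`dims must be an integer between 1 and ${vector.length}`);
  }
  const head = vector.slice(0, dims);
  const mag = Math.sqrt(head.reduce((s, x) => s + x * x, 0));
  // A zero-magnitude slice cannot be rescaled, so it is returned as-is.
  return mag === 0 ? head : head.map((x) => x / mag);
}

console.log(truncateEmbedding([3, 4, 12], 2)); // [0.6, 0.8]
```

Truncated vectors are only comparable with other vectors truncated to the same length, so apply the same dims to the corpus and to every query.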

When to use a custom adapter:

Pass adapter when you need a self-hosted or custom-base-URL endpoint. For example, an Azure OpenAI deployment:

import { embed } from '@combycode/llm-sdk';
import { OpenAIEmbeddingAdapter } from '@combycode/llm-sdk/providers/openai';

const azureAdapter = new OpenAIEmbeddingAdapter({
  apiKey: process.env.AZURE_KEY!,
  baseURL: 'https://my-resource.openai.azure.com/openai/deployments/my-embed-deployment',
});

const result = await embed({
  model: 'text-embedding-3-small',
  input: 'hello',
  adapter: azureAdapter,
});

import { embed } from '@combycode/llm-sdk';

// One `embed()` across providers (openai / google / openrouter); returns vectors
// + dimensions. (anthropic/xai have no first-party embeddings endpoint.)
const t0 = performance.now();
const { dimensions } = await embed({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  input: 'hello',
});

console.log(JSON.stringify({ result: String(dimensions), ms: Math.round(performance.now() - t0) }));

OpenAI’s official SDK calls client.embeddings.create() and returns data[0].embedding; Google’s returns embeddings[0].values; OpenRouter mirrors the OpenAI shape. ORXA unifies these under one embed() signature with a consistent EmbedResult. All HTTP flows through engine.fetch, which means rate-limit queuing, retry, and onCompletion hooks (including cost tracking) apply to embedding calls exactly as they do to complete() calls.

Google loops requests. For an array of three strings, the Google adapter makes three HTTP calls. The calls share the NetworkEngine queue (rate-limit aware), but latency is additive. For large batches, prefer OpenAI or OpenRouter.
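Even on batch-capable providers, very large inputs are usually split into several requests, since providers cap how many inputs one call may carry. A minimal chunking sketch follows; chunkInputs is a hypothetical helper and the batch size of 2 is purely illustrative, not a documented limit.

```typescript
// Split a list of inputs into consecutive batches of at most `size` items.
function chunkInputs<T>(items: T[], size: number): T[][] {
  if (Math.floor(size) !== size || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const inputBatches = chunkInputs(['a', 'b', 'c', 'd', 'e'], 2);
console.log(inputBatches); // [['a', 'b'], ['c', 'd'], ['e']]
```

Each batch can then be passed as the input array of one embedding call, and the resulting rows concatenated in order.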

Dimensions vary by model. text-embedding-3-small returns 1536 dimensions by default; text-embedding-3-large returns 3072; gemini-embedding-exp-03-07 returns 3072. Do not mix models or providers within one corpus: every vector, including the query vector, must come from the same model. Matching dimensions are required for cosine similarity to be computable, and vectors from different models are not comparable even when their dimensions happen to match.
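One way to guard against accidental mixing is to store the model id next to each vector and check it before comparing. The StoredVector shape and the assertComparable function are hypothetical, shown only to illustrate the check.

```typescript
// Hypothetical record shape: each stored vector remembers which model produced it.
interface StoredVector {
  model: string;
  vector: number[];
}

// Throw if two vectors come from different models or have different lengths.
function assertComparable(a: StoredVector, b: StoredVector): void {
  if (a.model !== b.model) {
    throw new Error(`model mismatch: ${a.model} vs ${b.model}`);
  }
  if (a.vector.length !== b.vector.length) {
    throw new Error(`dimension mismatch: ${a.vector.length} vs ${b.vector.length}`);
  }
}

const storedDoc: StoredVector = { model: 'openai/text-embedding-3-small', vector: [0.1, 0.2] };
const storedQuery: StoredVector = { model: 'openai/text-embedding-3-small', vector: [0.3, 0.4] };
assertComparable(storedDoc, storedQuery); // passes silently
```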

Usage is absent for Google. If you are using cost tracking via the onCompletion hook, Google embedding calls will emit a hook but with inputTokens: 0 (because the response carries no count). OpenAI and OpenRouter fill this correctly.

embed() is a one-shot helper. It creates its own adapter on each call. For repeated embedding inside a tight loop, pass a pre-built adapter to skip the constructor overhead.

Next steps: