Async batch job
What you will achieve
Submit multiple prompts as an async batch job, poll until the provider completes all of them (up to 24 hours), and retrieve per-item results with success flags, text, and full usage data. Cost-collector hooks fire once per item as results are parsed, so the budget meter stays accurate. Supported on OpenAI, Anthropic, and Google.
When and why
Batch APIs are designed for workloads where latency does not matter but cost does:
- Bulk classification or extraction — process thousands of documents overnight.
- Offline evaluation — run a prompt benchmark suite without paying synchronous API prices.
- Large-scale enrichment — annotate a dataset with model outputs without blocking.
Provider batch discounts:
- OpenAI: ~50% off synchronous pricing on supported models.
- Anthropic: 50% off on Claude models.
- Google: pricing varies (Gemini batch is experimental; check current pricing).
The raw work without the SDK, taking OpenAI as the example: build a JSONL file, upload it with files.create(), call batches.create(), poll batches.retrieve() until status === 'completed', read the output file id from the batch object, download the file content, split on newlines, JSON-parse each line, and handle per-line errors. For Anthropic and Google the flow differs. Total: 50-100 lines per provider.
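The two ends of that raw flow (building the JSONL input and parsing the JSONL output) can be sketched without any network calls. This is a minimal illustration, not the SDK's implementation: the field names (custom_id, method, url, body, response.status_code) follow OpenAI's published batch file format, while the RawItem and ParsedLine types and both function names are hypothetical.

```typescript
// Hypothetical shapes for illustration; not part of the SDK described here.
interface RawItem {
  customId: string;
  prompt: string;
  maxTokens: number;
}

interface ParsedLine {
  customId: string;
  ok: boolean;
  body: unknown;
  error: string | null;
}

// Serialise one request per line in the JSONL shape used by the OpenAI batch endpoint.
function buildJsonl(model: string, items: RawItem[]): string {
  return items
    .map((item) =>
      JSON.stringify({
        custom_id: item.customId,
        method: "POST",
        url: "/v1/responses",
        body: { model, input: item.prompt, max_output_tokens: item.maxTokens },
      }),
    )
    .join("\n");
}

// Split the downloaded output on newlines and parse each line independently,
// so one malformed line does not discard the rest.
function parseOutputJsonl(text: string): ParsedLine[] {
  const out: ParsedLine[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines, e.g. a trailing newline
    try {
      const row = JSON.parse(line);
      const ok = row.error == null && row.response?.status_code === 200;
      out.push({
        customId: String(row.custom_id ?? ""),
        ok,
        body: ok ? row.response.body : null,
        error: ok ? null : JSON.stringify(row.error ?? row.response?.body ?? "unknown error"),
      });
    } catch (e) {
      out.push({ customId: "", ok: false, body: null, error: `unparseable line: ${(e as Error).message}` });
    }
  }
  return out;
}
```

The upload, create, poll, and download steps sit between these two functions and are where most of the per-provider differences live.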
Step by step
Step 1 - Auto mode: submit and wait in one call

    import { batch } from '@combycode/llm-sdk';

    const results = await batch({
      model: 'openai/gpt-4o-mini',
      apiKey: process.env.OPENAI_API_KEY,
      requests: [
        { customId: 'classify-1', prompt: 'Classify as positive or negative: "I love this product"', maxTokens: 8 },
        { customId: 'classify-2', prompt: 'Classify as positive or negative: "Terrible experience"', maxTokens: 8 },
        { customId: 'classify-3', prompt: 'Classify as positive or negative: "It was okay"', maxTokens: 8 },
      ],
    });

    for (const r of results) {
      console.log(r.customId, r.success ? r.text.trim() : `ERROR: ${r.error}`);
    }
    // Example output (model-dependent):
    // classify-1 positive
    // classify-2 negative
    // classify-3 neutral

batch() blocks until all items complete (or until timeoutMs expires). The default poll interval is 5 seconds; the default timeout is 24 hours (the provider batch window).
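The blocking behaviour described above amounts to a poll loop with an interval and a deadline. The following is a minimal sketch of such a loop under stated assumptions, not the SDK's own code: pollUntilTerminal, PollStatus, PollOptions, and the injectable sleep parameter are all hypothetical names, and elapsed time is counted from sleep durations only to keep the sketch deterministic.

```typescript
// Hypothetical status type; the SDK's BatchStatus may carry more fields.
type PollStatus = "in_progress" | "completed" | "failed" | "expired" | "cancelled";

interface PollOptions {
  pollIntervalMs?: number;
  timeoutMs?: number;
  onProgress?: (status: PollStatus) => void;
  // Injectable so the loop can be exercised without real timers.
  sleep?: (ms: number) => Promise<void>;
}

async function pollUntilTerminal(
  getStatus: () => Promise<PollStatus>,
  opts: PollOptions = {},
): Promise<PollStatus> {
  const pollIntervalMs = opts.pollIntervalMs ?? 5_000; // 5 s, matching the documented default
  const timeoutMs = opts.timeoutMs ?? 24 * 60 * 60 * 1000; // 24 h, matching the documented default
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let waitedMs = 0; // sum of sleep durations; a real loop would likely use a wall-clock deadline
  for (;;) {
    const status = await getStatus();
    opts.onProgress?.(status);
    if (status !== "in_progress") return status; // any other status is terminal
    if (waitedMs + pollIntervalMs > timeoutMs) {
      throw new Error(`batch not finished after ${waitedMs} ms`);
    }
    await sleep(pollIntervalMs);
    waitedMs += pollIntervalMs;
  }
}
```

Note that the loop returns on any terminal status, not only on completed, so the caller still has to decide what a failed, expired, or cancelled batch means for its workload.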
Step 2 - Manual mode: submit, persist, resume

For long-running batches where your process may restart:

    import { submitBatch, batchJob } from '@combycode/llm-sdk';

    // Submit and immediately save the job id
    const job = await submitBatch({
      model: 'anthropic/claude-haiku-4-5',
      apiKey: process.env.ANTHROPIC_API_KEY,
      requests: [
        { customId: 'doc-1', prompt: 'Summarise: ...', maxTokens: 128 },
        { customId: 'doc-2', prompt: 'Summarise: ...', maxTokens: 128 },
      ],
    });

    console.log(job.id);       // 'batch_abc123...'
    console.log(job.provider); // 'anthropic'

    // Save job.id + job.provider to your database, queue, or env var.
    // Your process can restart here -- the batch continues running on the provider.

    // Later (different process, different server):
    const restored = batchJob({
      id: 'batch_abc123...',
      provider: 'anthropic',
      apiKey: process.env.ANTHROPIC_API_KEY,
    });

    const status = await restored.status();
    console.log(status.status); // 'in_progress' | 'completed' | 'failed' | 'expired' | 'cancelled'

    if (status.status === 'completed') {
      const results = await restored.results();
      console.log(results.length); // 2
    }

Step 3 - Poll with progress reporting
    import { submitBatch } from '@combycode/llm-sdk';

    const job = await submitBatch({
      model: 'google/gemini-2.0-flash',
      apiKey: process.env.GOOGLE_API_KEY,
      requests: Array.from({ length: 50 }, (_, i) => ({
        customId: `item-${i}`,
        prompt: `Translate to French: item number ${i}`,
        maxTokens: 32,
      })),
    });

    const results = await job.wait({
      pollIntervalMs: 10_000,        // check every 10 seconds
      timeoutMs: 2 * 60 * 60 * 1000, // 2 hour limit
      onProgress: (status) => {
        console.log(`[${new Date().toISOString()}] status: ${status.status}`);
      },
    });

    console.log(results.filter(r => r.success).length, '/ 50 succeeded');

Step 4 - Cancel a batch
    const job = await submitBatch({ model: 'openai/gpt-4o-mini', requests: [...] });

    // Change of plans:
    await job.cancel();

Cancellation is best-effort: items already processed by the provider may still appear in results.
Step 5 - Read per-item cost

The SDK emits one onCostEntry hook per successfully-parsed batch item (cost accrues when results are downloaded, not at submit time). Listen on the engine:

    import { batch, createEngine } from '@combycode/llm-sdk';

    const engine = createEngine({
      apiKeys: { openai: process.env.OPENAI_API_KEY! },
    });

    // Log the cost of each item as its result is parsed
    engine.hooks.on('onCostEntry', ({ entry }) => {
      console.log(entry.tags.customId, entry.cost?.totalUsd?.toFixed(6));
    });

    const results = await batch({
      model: 'openai/gpt-4o-mini',
      engine,
      requests: [{ customId: 'q1', prompt: 'Hello', maxTokens: 16 }],
    });

Your options
BatchRequestInput, one item per request:
| Field | Type | Notes |
|---|---|---|
| customId | string | Correlation id returned in results. Defaults to req-0, req-1, … when omitted. |
| prompt | string \| ContentPart[] \| Message[] | Same shapes as complete(). |
| system | string | Per-item system prompt override. |
| maxTokens | number | Per-item max output tokens. |
| temperature | number | Per-item temperature. |
| structured | { schema, name? } | Per-item JSON schema output. |
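The customId default can be pictured as a small normalisation pass over the requests array. This is a sketch only, assuming the N in req-N is the item's position in the array; the function names are hypothetical, and the duplicate check is an added precaution rather than documented SDK behaviour.

```typescript
// Fill in a default id for every request that omits customId.
// Assumption: the numeric suffix is the item's index in the requests array.
function withDefaultIds<T extends { customId?: string }>(
  requests: T[],
): (T & { customId: string })[] {
  return requests.map((r, i) => ({ ...r, customId: r.customId ?? `req-${i}` }));
}

// Hypothetical helper: report ids that appear more than once, since duplicate
// ids would make it impossible to correlate results with requests.
function findDuplicateIds(requests: { customId: string }[]): string[] {
  const seen: Record<string, number> = {};
  for (const r of requests) seen[r.customId] = (seen[r.customId] ?? 0) + 1;
  return Object.keys(seen).filter((id) => (seen[id] ?? 0) > 1);
}
```

Mixing explicit and defaulted ids is where collisions can creep in (for example an explicit 'req-1' next to an omitted second item), which is why the sketch pairs the two helpers.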
SubmitBatchOptions, shared settings:
| Option | Type | Notes |
|---|---|---|
| model | string | Namespaced (openai/gpt-4o-mini) or bare with provider. |
| provider | ProviderName | Required when model is bare. |
| apiKey | string | Falls back to engine.apiKeys[provider]. |
| requests | BatchRequestInput[] | The items to submit. |
| engine | EngineHandle | Override global engine. |
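The relationship between model and provider can be sketched as a resolution step. This is an illustration under assumptions, not the SDK's resolver: ProviderId stands in for ProviderName and lists only the batch-capable providers named on this page, the split is on the first slash, and resolveModel is a hypothetical name.

```typescript
// Hypothetical stand-in for ProviderName, limited to providers with batch support.
const BATCH_PROVIDERS = ["openai", "anthropic", "google"] as const;
type ProviderId = (typeof BATCH_PROVIDERS)[number];

function isProviderId(value: string): value is ProviderId {
  return (BATCH_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Resolve a namespaced model ("openai/gpt-4o-mini") or a bare model plus explicit provider.
function resolveModel(
  model: string,
  provider?: ProviderId,
): { provider: ProviderId; model: string } {
  const slash = model.indexOf("/");
  if (slash > 0) {
    const prefix = model.slice(0, slash);
    if (!isProviderId(prefix)) {
      throw new Error(`batch is not supported for provider "${prefix}"`);
    }
    return { provider: prefix, model: model.slice(slash + 1) };
  }
  if (!provider) {
    throw new Error(`bare model "${model}" requires a provider`);
  }
  return { provider, model };
}
```

In this sketch a namespaced model wins over the provider argument; whether the SDK resolves a conflict the same way is not stated in the source.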
WaitOptions, for batch() and job.wait():
| Option | Type | Default | Notes |
|---|---|---|---|
| pollIntervalMs | number | 5000 (5s) | How often to call job.status(). |
| timeoutMs | number | 86400000 (24h) | Throws an Error if the batch is not done by this deadline. |
| onProgress | (status: BatchStatus) => void | undefined | Called after each status poll. |
BatchJob handle, manual mode:
| Member | Returns | Notes |
|---|---|---|
| job.id | string | Provider batch id. Persist this for resume. |
| job.provider | ProviderName | Provider name. Needed for batchJob() resume. |
| job.status() | Promise<BatchStatus> | Current status without blocking. |
| job.results() | Promise<BatchItemResult[]> | Throws if not yet terminal. |
| job.wait(opts?) | Promise<BatchItemResult[]> | Poll + block until complete. |
| job.cancel() | Promise<void> | Request cancellation. |
BatchItemResult, per-item output:
| Field | Type | Notes |
|---|---|---|
| customId | string | Matches the input customId (or auto-generated req-N). |
| success | boolean | true when the provider confirms success and the response parsed correctly. |
| text | string | Parsed reply text. Empty string on failure. |
| response | CompletionResponse \| null | Full normalised response including usage and raw. null on failure. |
| error | string \| null | Error message when success is false. |
Provider support:
| Provider | Batch mechanism | Notes |
|---|---|---|
| OpenAI | /v1/batches + JSONL | Built on Responses API (/v1/responses). ~50% discount on supported models. |
| Anthropic | /v1/messages/batches | Native batch API. 50% discount. |
| Google | Batch prediction API | Uses GoogleBatchAdapter. Experimental; pricing varies. |
| xAI | Not supported | batch() throws for xai provider. |
| OpenRouter | Not supported | OpenRouter does not expose a batch endpoint. |
Google custom id mapping: Google keys results by index, not by the submitted customId. The SDK maps results back to the original ids using submission order. Results are returned in the same order as the requests array.
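Index-based remapping of this kind can be sketched as follows. The IndexedResult and MappedResult shapes and the remapByOrder name are hypothetical, and the sketch assumes each provider result carries a zero-based index into the submitted array; it is not the GoogleBatchAdapter's actual code.

```typescript
// Hypothetical shape of a provider result keyed by position rather than id.
interface IndexedResult {
  index: number;
  text: string;
}

interface MappedResult {
  customId: string;
  text: string | null; // null when the provider returned nothing for that position
}

// Re-attach the original customIds using submission order.
function remapByOrder(customIds: string[], indexed: IndexedResult[]): MappedResult[] {
  const texts: (string | null)[] = customIds.map(() => null);
  for (const r of indexed) {
    if (r.index >= 0 && r.index < texts.length) texts[r.index] = r.text;
  }
  // Output follows the order of the submitted ids, regardless of arrival order.
  return customIds.map((customId, i) => ({ customId, text: texts[i] ?? null }));
}
```

The practical consequence is that the submitted order must be known at download time, so anything that reorders or filters the requests array between submit and resume would break the correlation.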
Compare the SDKs
OpenAI’s official SDK exposes the raw multi-step flow: upload a JSONL file, create the batch, poll, download the output file, parse the JSONL. Anthropic’s is similar but with different endpoint shapes. Google’s batch is a separate prediction API with different semantics. ORXA wraps all three behind the same submitBatch() / batch() / batchJob() API. The BatchJobImpl class handles per-provider result parsing, customId remapping (Google), and onCostEntry hook emission, so cost tracking works the same for batch and synchronous calls.
Gotchas and next steps
Batch turnaround is typically minutes to hours, not seconds. Providers process batches on spare capacity. A batch of 1000 items may complete in 10 minutes or 12 hours. Design accordingly: batch() auto mode blocks your process for the full duration. Prefer manual mode for anything longer than a few minutes.
results() throws if the batch is not terminal. Always check status() first or use wait() to block until done. Terminal statuses are completed, failed, expired, and cancelled.
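A guard around results() might look like the sketch below. JobLike mirrors only the status() and results() methods listed for the BatchJob handle; JobState, isTerminal, and resultsIfReady are hypothetical names introduced for illustration.

```typescript
const TERMINAL_STATES = ["completed", "failed", "expired", "cancelled"] as const;
type JobState = "in_progress" | (typeof TERMINAL_STATES)[number];

// Minimal structural view of a batch job handle; the real handle has more members.
interface JobLike<T> {
  status(): Promise<{ status: JobState }>;
  results(): Promise<T[]>;
}

function isTerminal(state: JobState): boolean {
  return (TERMINAL_STATES as readonly string[]).indexOf(state) !== -1;
}

// Return results only when the batch is terminal; otherwise return null
// instead of letting results() throw.
async function resultsIfReady<T>(job: JobLike<T>): Promise<T[] | null> {
  const { status } = await job.status();
  return isTerminal(status) ? job.results() : null;
}
```

A scheduled task in manual mode could call resultsIfReady on each run and simply exit when it gets null.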
Failed items don’t throw. A partially-failed batch returns a BatchItemResult[] where some items have success: false. Inspect each item’s error field. The batch itself is considered completed by the provider even if some items failed.
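Because failures arrive as data rather than exceptions, a common first step is to split the results and collect the ids worth retrying. A minimal sketch follows; ItemResult copies a subset of the BatchItemResult fields listed earlier, and partitionResults is a hypothetical helper.

```typescript
// Subset of the BatchItemResult fields documented above.
interface ItemResult {
  customId: string;
  success: boolean;
  text: string;
  error: string | null;
}

// Split results into successes and failures and list the ids that failed.
function partitionResults(results: ItemResult[]): {
  succeeded: ItemResult[];
  failed: ItemResult[];
  retryIds: string[];
} {
  const succeeded: ItemResult[] = [];
  const failed: ItemResult[] = [];
  for (const r of results) {
    (r.success ? succeeded : failed).push(r);
  }
  return { succeeded, failed, retryIds: failed.map((r) => r.customId) };
}
```

The retryIds list can then be used to build a smaller follow-up batch from the original requests.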
Cost fires at download time, not submit time. The onCostEntry hook emits when results() or wait() parses each item. If your process restarts between submit and download, the cost for items already processed by the provider fires only when you call results() on the restored batchJob() handle.
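Since entries arrive one by one at download time, a running total has to be accumulated by the listener. The sketch below assumes only the two entry fields used in Step 5 (tags.customId and cost.totalUsd); CostEntryLike and createCostMeter are hypothetical names.

```typescript
// Structural subset of a cost entry, based on the fields read in Step 5.
interface CostEntryLike {
  tags: { customId?: string };
  cost?: { totalUsd?: number };
}

// Accumulate per-item cost entries into a running total.
function createCostMeter() {
  let totalUsd = 0;
  let count = 0;
  return {
    record(entry: CostEntryLike): void {
      totalUsd += entry.cost?.totalUsd ?? 0; // entries without a cost count as zero
      count += 1;
    },
    snapshot(): { totalUsd: number; count: number } {
      return { totalUsd, count };
    },
  };
}
```

Wired to the hook from Step 5, this would be engine.hooks.on('onCostEntry', ({ entry }) => meter.record(entry)), with the caveat that the total stays at zero until results() or wait() has parsed the items.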
OpenAI batch uses the Responses API internally. OpenAIBatchAdapter builds requests with OpenAIResponsesAdapter and parses responses with the same adapter. The batch JSONL body contains Responses API format, not Chat Completions format.
Next steps:
- File upload: upload a file once and reference it in batch requests
- Provider routing: route individual synchronous requests across providers
- Cost tracking guide: how onCostEntry accumulates batch costs