Skip to content

Media / Files / Batch -- createMediaOutput / batch / realtime

This group covers capabilities that go beyond text: generating images, audio, and video; uploading files for grounding; running requests as asynchronous provider batches; and opening a real-time audio/text session.

  • You need to generate an image, produce audio (TTS), or generate a video.
  • You need to attach a local file (PDF, image, audio) to a completion request.
  • You want to submit a large set of requests at provider-batch rates (cheaper, asynchronous, results available within hours).
  • You need a low-latency bidirectional audio session (OpenAI Realtime or Google Gemini Live).
ExportWhat it does
createMediaOutput(opts)Build a media handle for image/audio/video generation. .generateImage(), .generateAudio(), .generateVideo(). Saves results to a local directory.
transcribe(opts)Speech-to-text (covered in Tokens + embeddings as well).
batch(opts)One-shot auto batch: submit + poll + return results. Each request mirrors complete() options.
submitBatch(opts)Submit only — returns a BatchJob handle for manual polling.
batchJob(ref)Reconstruct a BatchJob from a previously persisted { id, provider }.
createRealtime(opts)Open a real-time session (WebSocket). Returns a RealtimeSession with event emitter API (open, text, audio, turnComplete, error, close).
loadContent(path)Load a file or URL into a ContentPart (image, PDF, audio, video — MIME-sniffed).
createMediaOutput(also covers TTS and video generation)

Type-only exports: BatchJob, BatchItemResult, BatchRequestInput, MediaResult, MediaMeta, RealtimeSession, RealtimeEvent, LoadImageOptions.

import { createMediaOutput } from '@combycode/llm-sdk';
const media = createMediaOutput({
model: 'openai/dall-e-3',
apiKey: process.env.OPENAI_API_KEY,
dir: './.media-out',
});
const [img] = await media.generateImage({
prompt: 'a red circle on a white background',
params: { size: '1024x1024' },
});
console.log(`Saved ${img?.meta.size} bytes to ${img?.meta.path}`);
import { createMediaOutput } from '@combycode/llm-sdk';
const media = createMediaOutput({
model: 'openai/gpt-4o-audio-preview',
apiKey: process.env.OPENAI_API_KEY,
dir: './.media-out',
});
const audio = await media.generateAudio({
input: 'Hello, world.',
params: { voice: 'alloy', format: 'wav' },
});
console.log(`Audio bytes: ${audio?.meta.size}`);

Files are attached via the attachments option on complete(). The SDK handles uploading to the provider’s File API when required (OpenAI/Anthropic), or inlining as base64 (Google).

import { complete } from '@combycode/llm-sdk';
const { text } = await complete({
model: 'openai/gpt-5.4-nano',
apiKey: process.env.OPENAI_API_KEY,
prompt: 'What word is in this file? Reply with just the word.',
attachments: ['./banana.txt'],
maxTokens: 32,
});
console.log(text);
import { batch } from '@combycode/llm-sdk';
const results = await batch({
model: 'anthropic/claude-haiku-4.5',
apiKey: process.env.ANTHROPIC_API_KEY,
requests: [
{ customId: 'a', prompt: 'Say apple.', maxTokens: 16 },
{ customId: 'b', prompt: 'Say banana.', maxTokens: 16 },
],
});
for (const r of results) {
console.log(`${r.customId}: ${r.success ? r.text : r.error}`);
}

Batch — manual mode (persist the job id, resume later)

Section titled “Batch — manual mode (persist the job id, resume later)”
import { submitBatch, batchJob } from '@combycode/llm-sdk';
// Submit and save the id.
const job = await submitBatch({
model: 'anthropic/claude-haiku-4.5',
apiKey: process.env.ANTHROPIC_API_KEY,
requests: [{ customId: 'a', prompt: 'Say apple.', maxTokens: 16 }],
});
console.log(`Batch id: ${job.id}`);
// Later -- reconstruct from persisted id.
const resumed = batchJob({ id: job.id, provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY });
const status = await resumed.status();
if (status.status === 'completed') {
const results = await resumed.results();
console.log(results[0].text);
}
import { createRealtime } from '@combycode/llm-sdk';
const session = createRealtime({
model: 'openai/gpt-4o-realtime-preview',
apiKey: process.env.OPENAI_API_KEY,
modalities: ['text'],
});
session.on('open', () => session.send({ text: 'Say PING' }));
session.on('text', (e) => process.stdout.write(e.delta));
session.on('turnComplete', () => {
session.close();
});