Image generation
What you will achieve
Prompt `'a red circle on white background'` and confirm that non-empty image bytes are written to disk, using the same `generateImage()` call for OpenAI and Google (Anthropic has no image generation API).
When and why you need this
Image generation turns a text prompt into pixel data you can display, embed in a document, or feed back into another model call as a vision input. Use cases: product mock-ups, icon sets, illustration pipelines, data augmentation.
The challenge with raw provider SDKs:
- OpenAI calls `client.images.generate({ model, prompt, size, quality, n })` and returns a list of `b64_json` or `url` items. `gpt-image-1` always returns `b64_json` and rejects the `response_format` parameter; `dall-e-3` returns a `url` by default and needs `response_format: 'b64_json'` to return inline bytes. Two models, two different call shapes inside the same provider.
- Google Imagen uses a `:predict` endpoint with `instances` and `parameters`; predictions contain `bytesBase64Encoded` fields. Google Gemini image models instead use `generateContent` with `responseModalities: ['IMAGE']` and return `inlineData` parts. Two endpoints within the same provider.
createMediaOutput() routes each provider+model combination to the correct endpoint, saves the raw bytes to disk, and returns a uniform MediaResult.
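To make the difference between those response shapes concrete, here is a minimal sketch of the extraction step that an application would otherwise write by hand. The interfaces are simplified assumptions built only from the field names mentioned above (`b64_json`, `bytesBase64Encoded`, `inlineData`), and `extractBase64Images` is a hypothetical helper, not the SDK's internal code.

```typescript
// Simplified, hypothetical response shapes; real payloads carry more fields.
interface OpenAIImagesResponse {
  data: Array<{ b64_json?: string; url?: string }>;
}
interface ImagenPredictResponse {
  predictions: Array<{ bytesBase64Encoded: string }>;
}
interface GeminiContentResponse {
  candidates: Array<{
    content: { parts: Array<{ text?: string; inlineData?: { mimeType: string; data: string } }> };
  }>;
}

type RawImageResponse =
  | { kind: 'openai'; body: OpenAIImagesResponse }
  | { kind: 'imagen'; body: ImagenPredictResponse }
  | { kind: 'gemini'; body: GeminiContentResponse };

// Collect every base64 image payload, whatever the provider shape.
function extractBase64Images(res: RawImageResponse): string[] {
  const out: string[] = [];
  if (res.kind === 'openai') {
    for (const item of res.body.data) {
      if (item.b64_json) out.push(item.b64_json); // url-only items are skipped
    }
  } else if (res.kind === 'imagen') {
    for (const prediction of res.body.predictions) {
      out.push(prediction.bytesBase64Encoded);
    }
  } else {
    for (const candidate of res.body.candidates) {
      for (const part of candidate.content.parts) {
        if (part.inlineData) out.push(part.inlineData.data); // text parts are skipped
      }
    }
  }
  return out;
}
```

Three branches for two providers is the kind of code a uniform result type is meant to keep out of application logic.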
Step by step
Step 1: Create a media handle

    import { createMediaOutput } from '@combycode/llm-sdk';

    const media = createMediaOutput({
      model: 'openai/gpt-image-1',
      apiKey: process.env.OPENAI_API_KEY,
      dir: './.media-out',
    });

`createMediaOutput()` requires either `dir` (Node/Bun, for `FileMediaStore`) or `store` (any environment, e.g. `new MemoryMediaStore()` in the browser). The model string is `'provider/model-id'` or a bare model with a separate `provider` field. The API key falls back to `engine.apiKeys[provider]` when not passed explicitly.
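The two ways of naming a model can be illustrated with a small parser. This is a sketch of the convention only: `parseModelRef` and `ModelRef` are hypothetical names, and the SDK's real resolution logic (including adapter defaults) may behave differently.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Hypothetical helper: resolve 'provider/model-id' or a bare model plus provider.
function parseModelRef(model: string, provider?: string): ModelRef {
  // An explicit provider wins; the model id is then used as-is, slashes included.
  if (provider) return { provider, model };

  const slash = model.indexOf('/');
  if (slash <= 0 || slash === model.length - 1) {
    throw new Error(`Model "${model}" is bare; pass a provider or use 'provider/model-id'.`);
  }
  return { provider: model.slice(0, slash), model: model.slice(slash + 1) };
}
```

Splitting only at the first slash is a deliberate assumption here, so that model ids which themselves contain slashes stay intact.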
Step 2: Generate an image

    const [img] = await media.generateImage({
      prompt: 'a red circle on white background',
      params: { size: '1024x1024' },
    });

    console.log(`saved ${img.meta.size} bytes, id: ${img.id}`);
    // img.mimeType      -> 'image/png'
    // img.meta.provider -> 'openai'
    // img.meta.model    -> 'gpt-image-1'

`generateImage()` returns a `MediaResult[]`. Each item has `{ id, type: 'image', mimeType, meta }`. The bytes are saved to `dir` under the `id`. Load them back with `output.raw.mediaStore.load(id)` when needed, where `output` is the handle returned by `createMediaOutput()` (`media` in this example).
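Once the bytes are loaded back, a quick structural check can confirm that they really are a PNG and not an empty or truncated file. The sketch below relies only on the PNG file layout (8-byte signature followed by the IHDR chunk); `readPngSize` is a hypothetical helper and is not part of the SDK.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns the pixel dimensions from the IHDR chunk, or null if the bytes are not a PNG.
function readPngSize(bytes: Uint8Array): { width: number; height: number } | null {
  if (bytes.byteLength < 24) return null; // signature + IHDR header + width/height
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(12) !== 0x49484452) return null; // chunk type must be 'IHDR'
  // Width and height are big-endian 32-bit integers at offsets 16 and 20.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

For the request in this step one would expect 1024 by 1024, assuming the provider honors the `size` parameter and returns PNG.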
Step 3: Generate multiple images (n)

    const images = await media.generateImage({
      prompt: 'a watercolor painting of a mountain',
      params: { n: 4, size: '1024x1024' },
    });

    for (const img of images) {
      console.log(`${img.id}: ${img.meta.size} bytes`);
    }

`params.n` requests multiple images in one API call. OpenAI DALL-E 2 and gpt-image-1 accept up to 10 per call; DALL-E 3 accepts only 1. Google Imagen supports up to 4 (`sampleCount`). Images are stored individually; the array length matches `n`.
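When a job needs more images than one call allows, the request has to be split into batches. The following sketch assumes per-call limits of 10 for DALL-E 2 and gpt-image-1, 1 for DALL-E 3, and 4 for Imagen; `maxImagesPerCall` and `planBatches` are hypothetical helpers, and the limits should be checked against current provider documentation.

```typescript
// Assumed per-call limits; verify against the provider docs before relying on them.
function maxImagesPerCall(model: string): number {
  if (model === 'dall-e-2' || model === 'gpt-image-1') return 10;
  if (model.startsWith('imagen')) return 4;
  return 1; // dall-e-3, plus a conservative default for unknown models
}

// Split a total image count into per-call values of n.
function planBatches(total: number, model: string): number[] {
  if (!Number.isInteger(total) || total < 1) {
    throw new RangeError('total must be a positive integer');
  }
  const limit = maxImagesPerCall(model);
  const batches: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= limit) {
    batches.push(Math.min(limit, remaining));
  }
  return batches;
}
```

Each entry of the returned array would then be passed as `params.n` in its own `generateImage()` call.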
Step 4: Edit an existing image

    import { readFileSync } from 'node:fs';

    // Read the source file into a Uint8Array for the buffer DataSource.
    const sourceBytes = new Uint8Array(readFileSync('./original.png'));

    const [edited] = await media.editImage({
      prompt: 'replace the background with a sunset',
      sourceImage: { type: 'buffer', mimeType: 'image/png', data: sourceBytes },
      params: { size: '1024x1024' },
    });

`editImage()` is available on OpenAI (`gpt-image-1` via `/v1/images/edits`) and Google (Gemini via `generateContent` with the image attached as an extra part). Pass `mask` as a second `DataSource` for inpainting (OpenAI only).
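Because `sourceImage` needs an explicit `mimeType`, a small helper can derive it from the file extension and reject empty input before any API call is made. This is a sketch: `toBufferSource` and `BufferDataSource` are hypothetical names, the object shape mirrors the `{ type: 'buffer', mimeType, data }` literal used above, and the extension list is an assumption.

```typescript
interface BufferDataSource {
  type: 'buffer';
  mimeType: string;
  data: Uint8Array;
}

// Assumed set of extensions; extend as needed.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['webp', 'image/webp'],
]);

// Build a buffer DataSource from a path (used only for its extension) and the file bytes.
function toBufferSource(path: string, data: Uint8Array): BufferDataSource {
  const dot = path.lastIndexOf('.');
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : '';
  const mimeType = MIME_BY_EXTENSION.get(ext);
  if (!mimeType) throw new Error(`Unsupported image extension: "${ext}"`);
  if (data.byteLength === 0) throw new Error(`No bytes read from ${path}`);
  return { type: 'buffer', mimeType, data };
}
```

The result can be passed directly as `sourceImage`, or as `mask` where the provider supports it.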
Step 5: Switch to Google Imagen

    const googleMedia = createMediaOutput({
      model: 'google/imagen-4.0-generate-001',
      apiKey: process.env.GOOGLE_API_KEY,
      dir: './.media-out',
    });

    const [img] = await googleMedia.generateImage({
      prompt: 'a photorealistic red apple on a white table',
      params: {
        n: 1,
        aspectRatio: '1:1',
      },
    });

The Google Imagen path calls `:predict` on the Imagen model. Google Gemini image models (e.g. `gemini-2.0-flash-exp`) call `generateContent` with `responseModalities: ['IMAGE']`. The SDK routes automatically based on whether the model name starts with `'imagen'`.
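The routing rule can be sketched as a single prefix check. Everything beyond that check is an assumption for illustration: `buildGoogleImageCall` is a hypothetical function, the `models/...` path prefix and the request body layouts follow the public REST conventions as far as this guide describes them, and the adapter's real requests may carry more fields.

```typescript
interface GoogleImageCall {
  path: string;
  body: Record<string, unknown>;
}

// Route by model-name prefix: Imagen goes to :predict, everything else to :generateContent.
function buildGoogleImageCall(model: string, prompt: string, n = 1): GoogleImageCall {
  if (model.startsWith('imagen')) {
    return {
      path: `models/${model}:predict`,
      // Assumed body layout: instances + parameters, with n mapped to sampleCount.
      body: { instances: [{ prompt }], parameters: { sampleCount: n } },
    };
  }
  return {
    path: `models/${model}:generateContent`,
    // Assumed body layout for Gemini image models.
    body: {
      contents: [{ parts: [{ text: prompt }] }],
      generationConfig: { responseModalities: ['IMAGE'] },
    },
  };
}
```

A prefix check keeps routing free of a hard-coded model list, at the cost of depending on Google's naming convention staying stable.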
Your options
`createMediaOutput()` options:
| Option | Type | Description |
|---|---|---|
| `model` | `string` | Namespaced (`'openai/gpt-image-1'`) or bare. Required unless `provider` is set and the adapter uses a default. |
| `provider` | `ProviderName` | Required when `model` is bare. |
| `apiKey` | `string` | Optional; falls back to `engine.apiKeys[provider]`. |
| `dir` | `string` | Directory for `FileMediaStore` (Node/Bun). |
| `store` | `MediaStore` | Custom store. Use `new MemoryMediaStore()` in the browser. |
| `providers` | `Record<string, MediaProviderAdapter>` | Override or extend auto-registered adapters (custom baseURL, shared instance). |
| `engine` | `EngineHandle` | Share an existing engine (hooks, catalog, fetch queue). |
| `config` | `MediaOutputConfig` | `pollIntervalMs` (default 5000) and `maxPollWaitMs` (default 600000) for async video. |
`ImageGenRequest.params`, full option set:
| Param | Type | Providers | Description |
|---|---|---|---|
| `n` | `number` | OpenAI, Google Imagen | Number of images to generate. OpenAI default 1; Google Imagen max 4. |
| `size` | `string` | OpenAI | Pixel dimensions string. DALL-E 3: `'1024x1024'`, `'1792x1024'`, `'1024x1792'`. gpt-image-1: `'1024x1024'`, `'1536x1024'`, `'1024x1536'`, `'auto'`. DALL-E 2: `'256x256'`, `'512x512'`, `'1024x1024'`. |
| `aspectRatio` | `string` | Google | Aspect ratio string: `'1:1'`, `'3:4'`, `'4:3'`, `'9:16'`, `'16:9'`. |
| `imageSize` | `string` | Google Imagen | Sample image size (`'1K'`, `'2K'`). Maps to `sampleImageSize` for Imagen, `imageSize` for Gemini. |
| `quality` | `string` | OpenAI | `'standard'` or `'hd'` (DALL-E 3); `'low'`, `'medium'`, `'high'`, `'auto'` (gpt-image-1). |
| `style` | `string` | OpenAI DALL-E 3 | `'vivid'` or `'natural'`. Ignored by gpt-image-1 and Google. |
| `background` | `string` | OpenAI gpt-image-1 | `'transparent'`, `'opaque'`, or `'auto'`. `'transparent'` requires an output format with alpha support (PNG or WebP). |
| `outputFormat` | `string` | OpenAI gpt-image-1 | `'png'`, `'jpeg'`, or `'webp'`. Default `'png'`. |
| `responseFormat` | `'b64_json' \| 'url'` | OpenAI DALL-E 2/3 only | gpt-image-1 always returns `b64_json` and does not accept this parameter; the adapter omits it automatically. |
| `strength` | `number` | OpenRouter | Image-to-image strength (0-1); lower = closer to source. |
`MediaResult` fields:
| Field | Type | Description |
|---|---|---|
| `id` | `string` | Generated media id (`img_<uuid>`). Use to load bytes from the store. |
| `type` | `'image'` | Media type discriminator. |
| `mimeType` | `string` | e.g. `'image/png'`. From provider response or `outputFormat`. |
| `meta.size` | `number` | Byte count of the stored file. |
| `meta.provider` | `string` | Provider that generated it. |
| `meta.model` | `string` | Model id used. |
| `meta.prompt` | `string` | The prompt sent. |
| `meta.revisedPrompt` | `string \| undefined` | OpenAI may return a revised prompt when it rewrites your input. |
| `meta.width` / `meta.height` | `number \| undefined` | Pixel dimensions when reported by the provider. |
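Read as a type, the field table corresponds roughly to the interface below. This is a reconstruction from the table for illustration only: `MediaResultSketch` and `isNonEmptyImage` are hypothetical names, and the SDK's exported `MediaResult` type is the authority on the real shape.

```typescript
// Reconstructed from the field table above; not the SDK's exported type.
interface MediaResultSketch {
  id: string;
  type: 'image';
  mimeType: string;
  meta: {
    size: number;
    provider: string;
    model: string;
    prompt: string;
    revisedPrompt?: string;
    width?: number;
    height?: number;
  };
}

// The check this guide set out to make: an image result with bytes behind it.
function isNonEmptyImage(result: MediaResultSketch): boolean {
  return result.type === 'image' && result.mimeType.startsWith('image/') && result.meta.size > 0;
}
```

A check like this works as a smoke test after `generateImage()`, before the result is handed to downstream code.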
Cost note: Image generation is priced per image (DALL-E 2/3) or per output token (gpt-image-1, Gemini). gpt-image-1 image output tokens are billed at $0.04 per 1K tokens, and higher quality settings consume more tokens per image. DALL-E 3 standard 1024x1024 is $0.04/image. Google Imagen pricing varies by model and region. Check provider pricing pages before running large batches.
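A rough batch estimate is simple arithmetic over those two pricing models. The sketch below is a hypothetical helper: the per-image figure in the usage note comes from the cost note above, while `tokensPerImage` is a placeholder the caller must supply from the provider's pricing page.

```typescript
type ImagePricing =
  | { kind: 'perImage'; usdPerImage: number }
  | { kind: 'perToken'; usdPer1kTokens: number; tokensPerImage: number };

// Estimate the cost of a batch in USD under either pricing model.
function estimateBatchCostUsd(imageCount: number, pricing: ImagePricing): number {
  if (pricing.kind === 'perImage') {
    return imageCount * pricing.usdPerImage;
  }
  return (imageCount * pricing.tokensPerImage * pricing.usdPer1kTokens) / 1000;
}
```

For example, 250 DALL-E 3 standard images at $0.04 each come to about $10, before any retries.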
Provider and model reference:
| Provider | Models | Endpoint |
|---|---|---|
| OpenAI | `gpt-image-1`, `dall-e-3`, `dall-e-2` | `/v1/images/generations` (generate); `/v1/images/edits` (edit) |
| Google | `imagen-4.0-generate-001`, `gemini-2.0-flash-exp` (image) | `:predict` for Imagen; `generateContent` for Gemini image |
| xAI | Aurora models via OpenRouter | `/v1/images/generations` |
Compare the SDKs
OpenAI’s SDK calls `client.images.generate()` and returns `response.data[]` with `b64_json` or `url` fields; you decode the base64 and save to disk yourself. For Gemini image models, Google's Node SDK has no dedicated image-generation method: you call `client.models.generateContent()` with `responseModalities: ['IMAGE']` and extract `inlineData.data` from the response parts manually. ORXA calls `generateImage()` once and returns typed `MediaResult[]` with bytes already saved to `dir`, so no provider-specific extraction code lives in your app.
Gotchas and next steps
gpt-image-1 always returns base64 (`b64_json`), never a URL. The bytes are PNG unless `outputFormat` selects another format. The adapter silently omits `responseFormat` for gpt-image-1 because the API rejects it. DALL-E 3 and DALL-E 2 still need it set to `'b64_json'`; the adapter handles this internally.
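The model-dependent handling of `response_format` can be sketched as a request-body builder. This is an illustration of the rule described above and not the adapter's actual code; `buildOpenAIImageBody` and `OpenAIImageBody` are hypothetical names.

```typescript
interface OpenAIImageBody {
  model: string;
  prompt: string;
  n?: number;
  size?: string;
  response_format?: 'b64_json' | 'url';
}

// Set response_format only for DALL-E models; gpt-image-1 rejects the field.
function buildOpenAIImageBody(
  model: string,
  prompt: string,
  params: { n?: number; size?: string } = {},
): OpenAIImageBody {
  const body: OpenAIImageBody = { model, prompt, ...params };
  if (model.startsWith('dall-e')) {
    body.response_format = 'b64_json'; // ask for inline bytes instead of a URL
  }
  return body;
}
```

Leaving the key out entirely, as opposed to setting it to `undefined`, keeps it from showing up in key enumeration or custom serializers.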
Revised prompts. OpenAI’s API may rewrite your prompt for safety or quality. The original prompt is stored in meta.prompt; the rewritten version (if any) is in meta.revisedPrompt. Log revisedPrompt when debugging unexpected output.
Google Imagen and Gemini image models use different endpoints. Model names starting with `'imagen'` go to `:predict`; all other Google models go to `generateContent`. Set `model` in `createMediaOutput` to the correct string; the adapter routes automatically.
Bytes are saved on every successful `generateImage()` call. If the call fails partway (for example, a network error after the image is returned but before `mediaStore.save()`), no file is written and the promise rejects, so a retry does not leave partial files behind.
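Since a failed call leaves nothing behind, wrapping it in a retry loop is straightforward. The sketch below is a generic helper with exponential backoff: `withRetry` is a hypothetical name, the attempt count and delays are arbitrary defaults, and it retries on any error without distinguishing transient failures from permanent ones.

```typescript
// Retry an async operation, doubling the delay after each failed attempt.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        const delayMs = baseDelayMs * Math.pow(2, attempt);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

In this guide's terms the wrapped function would be something like `() => media.generateImage({ prompt })`; a production version would likely inspect the error and skip retries for invalid requests.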
Edit requires sourceImage as a DataSource, not a path string. editImage() does not accept raw file paths. Read the file into a buffer DataSource first (see Step 4 above).
Next steps:
- Vision input — feed generated images back into a vision model
- TTS — audio generation counterpart
- File upload — upload a source image to the Files API for use in edits