
Realtime (Live) Sessions

createRealtime() opens a persistent WebSocket session to a provider’s live API, normalizing the two very different provider protocols (OpenAI’s typed event stream, Google’s turn-based bidirectional stream) onto one event model.

Beta. Both underlying provider APIs are in beta. Expect breaking changes from providers independent of this SDK.
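To make the normalization idea concrete, here is a minimal sketch of how a few OpenAI Realtime (beta) server events could be mapped onto a normalized event model. It is not this SDK's implementation. The NormalizedEvent and Usage shapes and the normalizeOpenAIEvent function are hypothetical. Only the event names session.created, response.text.delta, response.done and error are handled, and audio deltas (which OpenAI sends base64-encoded) are deliberately left out. Given the beta status noted above, check these event names against the provider's current documentation.

```typescript
// Hypothetical usage shape and a subset of the normalized event model.
type Usage = { inputTokens: number; outputTokens: number };
type NormalizedEvent =
  | { type: 'open' }
  | { type: 'text'; delta: string }
  | { type: 'usage'; usage: Usage }
  | { type: 'turnComplete' }
  | { type: 'error'; error: Error };

// The OpenAI beta server-event fields this sketch reads. Everything is
// optional because each event type carries a different payload.
interface OpenAIServerEvent {
  type: string;
  delta?: string;
  response?: { usage?: { input_tokens: number; output_tokens: number } };
  error?: { message: string };
}

// Map one raw provider event onto zero or more normalized events.
function normalizeOpenAIEvent(raw: OpenAIServerEvent): NormalizedEvent[] {
  switch (raw.type) {
    case 'session.created':
      return [{ type: 'open' }];
    case 'response.text.delta':
      return [{ type: 'text', delta: raw.delta ?? '' }];
    case 'response.done': {
      // A finished response yields its usage (if present), then ends the turn.
      const out: NormalizedEvent[] = [];
      const u = raw.response?.usage;
      if (u) {
        out.push({ type: 'usage', usage: { inputTokens: u.input_tokens, outputTokens: u.output_tokens } });
      }
      out.push({ type: 'turnComplete' });
      return out;
    }
    case 'error':
      return [{ type: 'error', error: new Error(raw.error?.message ?? 'unknown realtime error') }];
    default:
      // Events this sketch does not model (audio deltas, transcripts, ...) are dropped.
      return [];
  }
}
```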

| Provider | Model example | Notes |
| --- | --- | --- |
| openai | gpt-4o-realtime-preview | Full duplex, text + audio |
| google | gemini-2.0-flash-live | Turn-based bidirectional, audio-native |

Passing any other provider throws immediately with a clear message.

createRealtime(opts: CreateRealtimeOptions): RealtimeSession

CreateRealtimeOptions:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| model | string | yes | Bare (gpt-4o-realtime-preview) or namespaced (openai/...) |
| provider | ProviderName | when model is bare | Ignored when model is namespaced |
| apiKey | string | no | Falls back to engine.apiKeys[provider] |
| modalities | RealtimeModality[] | no | 'text' and/or 'audio'. Default: ['text'] |
| audio | AudioOptions | no | { voice?, format? } for audio output |
| voice | string | no | Deprecated; use audio.voice |
| instructions | string | no | System-level instructions for the session |
| engine | EngineHandle | no | Defaults to the registered engine |
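The model, provider and apiKey rules in this table amount to a small resolution step. The sketch below is hypothetical: resolveRealtimeTarget, ResolveInput and ResolvedTarget are illustrative names, not SDK exports, and engineApiKeys stands in for engine.apiKeys. It also assumes that a namespaced model is split on its first '/' and that audio.voice wins over the deprecated voice when both are set.

```typescript
// Providers from the support table; anything else is rejected.
const SUPPORTED = ['openai', 'google'] as const;
type ProviderName = (typeof SUPPORTED)[number];

// Hypothetical subset of CreateRealtimeOptions used by this sketch.
interface ResolveInput {
  model: string;
  provider?: string;
  apiKey?: string;
  modalities?: Array<'text' | 'audio'>;
  audio?: { voice?: string; format?: string };
  voice?: string; // deprecated alias for audio.voice
}

interface ResolvedTarget {
  provider: ProviderName;
  model: string; // bare model id as sent to the provider
  apiKey: string | undefined;
  modalities: Array<'text' | 'audio'>;
  voice: string | undefined;
}

function isSupported(p: string): p is ProviderName {
  return (SUPPORTED as readonly string[]).indexOf(p) !== -1;
}

function resolveRealtimeTarget(
  opts: ResolveInput,
  engineApiKeys: Record<string, string | undefined>,
): ResolvedTarget {
  const slash = opts.model.indexOf('/');
  // A namespaced model ("openai/...") wins; opts.provider is ignored then.
  const provider = slash >= 0 ? opts.model.slice(0, slash) : opts.provider;
  const model = slash >= 0 ? opts.model.slice(slash + 1) : opts.model;
  if (provider === undefined) {
    throw new Error(`Model "${opts.model}" is bare; a provider is required.`);
  }
  if (!isSupported(provider)) {
    // Mirrors the documented behavior: unsupported providers throw immediately.
    throw new Error(`Realtime is not supported for provider "${provider}" (supported: ${SUPPORTED.join(', ')}).`);
  }
  return {
    provider,
    model,
    apiKey: opts.apiKey ?? engineApiKeys[provider], // explicit key, else engine key
    modalities: opts.modalities ?? ['text'],        // documented default
    voice: opts.audio?.voice ?? opts.voice,         // assumed precedence
  };
}
```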

Returns a RealtimeSession immediately (synchronous). The underlying WebSocket connection opens asynchronously; listen for the 'open' event before sending.
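Because the session object is returned before the socket is open, it can be convenient to wrap the 'open' event in a Promise. This sketch uses only on() and the unsubscribe function it returns. waitForOpen, OpenableSession and the 10-second default timeout are hypothetical, and the minimal interface is a structural stand-in rather than the SDK's own RealtimeSession type.

```typescript
type LifecycleEvent =
  | { type: 'open' }
  | { type: 'error'; error: Error }
  | { type: 'close' };

// Hypothetical minimal view of a session: just the on() method this helper needs.
interface OpenableSession {
  on(type: LifecycleEvent['type'], cb: (e: LifecycleEvent) => void): () => void;
}

// Resolve on 'open'; reject on 'error', 'close' or timeout, whichever comes first.
function waitForOpen(session: OpenableSession, timeoutMs = 10_000): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    const unsubscribers: Array<() => void> = [];
    // Detach every listener and the timer once the outcome is known.
    const settle = (finish: () => void) => {
      clearTimeout(timer);
      unsubscribers.forEach((off) => off());
      finish();
    };
    const timer = setTimeout(
      () => settle(() => reject(new Error(`Realtime session did not open within ${timeoutMs} ms`))),
      timeoutMs,
    );
    unsubscribers.push(
      session.on('open', () => settle(() => resolve())),
      session.on('error', (e) =>
        settle(() => reject(e.type === 'error' ? e.error : new Error('Realtime error')))),
      session.on('close', () =>
        settle(() => reject(new Error('Realtime session closed before opening')))),
    );
  });
}
```

If the real session is structurally compatible, a script could then write await waitForOpen(session) followed by session.send(...), instead of nesting the send inside an 'open' handler.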

interface RealtimeSession {
send(input: RealtimeInput, opts?: { turnComplete?: boolean }): void;
on<E extends RealtimeEventType>(type: E, cb: (e: ...) => void): () => void;
close(): void;
}

RealtimeInput:

interface RealtimeInput {
text?: string;
audio?: Uint8Array; // raw audio bytes (provider-specific encoding, e.g. PCM16)
}
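RealtimeInput.audio expects raw bytes in the provider's encoding. If a capture pipeline produces Float32 samples in [-1, 1] (as the Web Audio API does), a conversion to 16-bit PCM might look like the sketch below. floatToPcm16 is a hypothetical helper; it assumes the provider expects little-endian samples and does not resample or downmix, so the sample rate and channel count must already match what the provider requires.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range input before scaling.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Asymmetric scaling: -1 maps to -32768, +1 maps to 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}
```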

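send() defaults to turnComplete: true, which commits the turn and requests a response. Pass turnComplete: false to stream a single turn across multiple send() calls (useful for chunked audio); the turn is then committed by a later send() that uses the default or passes turnComplete: true.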
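For chunked audio, the pattern is to send every chunk except the last with turnComplete: false and let the final send() commit the turn. The helper below sketches that pattern. sendAudioInChunks and SendableSession are hypothetical; the audio is assumed to be PCM16 already (so chunk sizes are kept even to avoid splitting a sample), and the 4800-byte default is arbitrary.

```typescript
// Hypothetical minimal view of a session: just the send() method this helper needs.
interface SendableSession {
  send(input: { text?: string; audio?: Uint8Array }, opts?: { turnComplete?: boolean }): void;
}

// Send one audio turn in chunks; only the final send() commits the turn.
// An empty buffer sends nothing.
function sendAudioInChunks(session: SendableSession, audio: Uint8Array, chunkBytes = 4800): void {
  // PCM16 samples are 2 bytes, so an even chunk size never splits a sample.
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0 || chunkBytes % 2 !== 0) {
    throw new Error('chunkBytes must be a positive even integer');
  }
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, audio.length);
    // slice() copies, so later mutation of `audio` cannot affect queued sends.
    session.send({ audio: audio.slice(offset, end) }, { turnComplete: end === audio.length });
  }
}
```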

on() returns an unsubscribe function. Call it to stop receiving events of that type.

type RealtimeEvent =
| { type: 'open' }
| { type: 'text'; delta: string }
| { type: 'audio'; chunk: Uint8Array; mimeType: string; sampleRate?: number }
| { type: 'turnComplete' }
| { type: 'usage'; usage: Usage }
| { type: 'error'; error: Error }
| { type: 'close' };

| Event | When |
| --- | --- |
| open | WebSocket connected and session ready |
| text | Text delta from the model (streamed chunks) |
| audio | Audio chunk from the model |
| turnComplete | Model finished a response turn |
| usage | Token usage reported (fires once per turn; wired into the cost pipeline) |
| error | Transport or protocol error |
| close | Socket closed (normal or abnormal) |
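The on() signature elides its callback type. One way to type it so that each listener receives only its own event variant is TypeScript's Extract over the RealtimeEvent union. The RealtimeEmitter class below is a hypothetical stand-alone sketch of that idea, not the SDK's implementation, and the Usage shape is assumed.

```typescript
type Usage = { inputTokens: number; outputTokens: number }; // hypothetical shape
type RealtimeEvent =
  | { type: 'open' }
  | { type: 'text'; delta: string }
  | { type: 'audio'; chunk: Uint8Array; mimeType: string; sampleRate?: number }
  | { type: 'turnComplete' }
  | { type: 'usage'; usage: Usage }
  | { type: 'error'; error: Error }
  | { type: 'close' };
type RealtimeEventType = RealtimeEvent['type'];
// The union member whose `type` is E, e.g. EventOf<'text'> = { type: 'text'; delta: string }.
type EventOf<E extends RealtimeEventType> = Extract<RealtimeEvent, { type: E }>;
type Listener = (e: RealtimeEvent) => void;

class RealtimeEmitter {
  private readonly listeners = new Map<RealtimeEventType, Set<Listener>>();

  on<E extends RealtimeEventType>(type: E, cb: (e: EventOf<E>) => void): () => void {
    let set = this.listeners.get(type);
    if (!set) {
      set = new Set<Listener>();
      this.listeners.set(type, set);
    }
    // Safe cast: emit() only calls a listener with events whose type is E.
    const listener = cb as unknown as Listener;
    set.add(listener);
    const registered = set;
    // The returned function is the unsubscribe handle.
    return () => { registered.delete(listener); };
  }

  emit(event: RealtimeEvent): void {
    const set = this.listeners.get(event.type);
    if (!set) return;
    // Snapshot so a listener can unsubscribe itself mid-dispatch.
    for (const cb of Array.from(set)) cb(event);
  }
}
```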

Usage events are automatically forwarded to the onCompletion hook so the CostCollector tracks and prices realtime calls alongside regular completions.
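Once usage events reach a collector, pricing them is arithmetic over token counts. The sketch below shows that step in isolation. sumUsage, costUsd, the Usage shape and the Pricing fields are hypothetical, prices are supplied by the caller rather than hard-coded, and a real price table would typically also distinguish audio tokens from text tokens.

```typescript
type Usage = { inputTokens: number; outputTokens: number }; // hypothetical shape
// Caller-supplied prices in USD per one million tokens.
interface Pricing { inputPerMillion: number; outputPerMillion: number }

// Add up the per-turn usage events of one session.
function sumUsage(events: Usage[]): Usage {
  return events.reduce<Usage>(
    (acc, u) => ({
      inputTokens: acc.inputTokens + u.inputTokens,
      outputTokens: acc.outputTokens + u.outputTokens,
    }),
    { inputTokens: 0, outputTokens: 0 },
  );
}

// Price a usage total with per-million-token rates.
function costUsd(u: Usage, p: Pricing): number {
  return (u.inputTokens * p.inputPerMillion + u.outputTokens * p.outputPerMillion) / 1_000_000;
}
```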

import { createEngine, createRealtime } from '@combycode/llm-sdk';
createEngine({ apiKeys: { openai: process.env.OPENAI_API_KEY! } });
const session = createRealtime({
model: 'openai/gpt-4o-realtime-preview',
modalities: ['text'],
instructions: 'You are a helpful assistant.',
});
const unsubText = session.on('text', (e) => {
process.stdout.write(e.delta);
});
session.on('turnComplete', () => {
console.log('\n[turn complete]');
session.close();
});
session.on('open', () => {
session.send({ text: 'Hello! Tell me a short joke.' });
});
session.on('error', (e) => console.error('Realtime error:', e.error));
session.on('close', () => unsubText());

import { createEngine, createRealtime } from '@combycode/llm-sdk';
createEngine({ apiKeys: { openai: process.env.OPENAI_API_KEY! } });
const session = createRealtime({
model: 'openai/gpt-4o-realtime-preview',
modalities: ['audio', 'text'],
audio: { voice: 'alloy' },
instructions: 'Respond concisely.',
});
// Collect audio chunks.
const audioChunks: Uint8Array[] = [];
session.on('audio', (e) => {
audioChunks.push(e.chunk);
});
session.on('turnComplete', () => {
console.log(`Got ${audioChunks.length} audio chunks.`);
// Combine and play / write to file as needed.
session.close();
});
session.on('open', () => {
// Send pre-encoded PCM16 audio or a text prompt.
session.send({ text: 'Say "Hello, world!" in a friendly tone.' });
});
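The audio example leaves "combine and play / write to file" to the caller. Assuming the chunks are raw mono PCM16 (check each event's mimeType), one option is to concatenate them and prepend a standard 44-byte WAV header so common players can open the result. concatChunks and pcm16ToWav are hypothetical helpers; the sample rate must come from the events' sampleRate field or from the provider's documentation.

```typescript
// Join audio chunks into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Wrap raw 16-bit PCM in a minimal RIFF/WAVE container (44-byte header).
function pcm16ToWav(pcm: Uint8Array, sampleRate: number, channels = 1): Uint8Array {
  const bytesPerSample = 2;
  const wav = new Uint8Array(44 + pcm.length);
  const v = new DataView(wav.buffer);
  const ascii = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) v.setUint8(off + i, s.charCodeAt(i));
  };
  ascii(0, 'RIFF');
  v.setUint32(4, 36 + pcm.length, true);                         // RIFF chunk size
  ascii(8, 'WAVE');
  ascii(12, 'fmt ');
  v.setUint32(16, 16, true);                                     // fmt chunk size
  v.setUint16(20, 1, true);                                      // format 1 = PCM
  v.setUint16(22, channels, true);
  v.setUint32(24, sampleRate, true);
  v.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  v.setUint16(32, channels * bytesPerSample, true);              // block align
  v.setUint16(34, 16, true);                                     // bits per sample
  ascii(36, 'data');
  v.setUint32(40, pcm.length, true);                             // data size
  wav.set(pcm, 44);
  return wav;
}
```

Inside the turnComplete handler this could be used as pcm16ToWav(concatChunks(audioChunks), sampleRate), after which the bytes can be written with the platform's file API.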

In addition to the onCompletion hook (used for cost tracking), the network layer emits four realtime-specific hooks on engine.hooks:

| Hook | When |
| --- | --- |
| onRealtimeOpen | Socket connected (provider, model, url) |
| onRealtimeFrame | Each frame's direction and size (metadata only, no payload) |
| onRealtimeClose | Socket closed (code, reason) |
| onRealtimeError | Transport error |

// engine is the EngineHandle in use (the registered engine unless you passed another one).
engine.hooks.on('onRealtimeOpen', (ctx) => {
  console.log(`Realtime connected: ${ctx.provider}/${ctx.model}`);
});
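Because onRealtimeFrame carries only metadata, it is a natural place for lightweight traffic statistics. The sketch below assumes the hook context exposes a direction ('in' or 'out') and a size in bytes under those field names. The names, FrameMeta and FrameStats are all hypothetical, so map them to the actual context fields before use.

```typescript
// Hypothetical frame metadata; the hook table only says direction and size are reported.
interface FrameMeta { direction: 'in' | 'out'; size: number }

// Running per-direction counters for frames and bytes.
class FrameStats {
  readonly frames = { in: 0, out: 0 };
  readonly bytes = { in: 0, out: 0 };

  record(f: FrameMeta): void {
    this.frames[f.direction] += 1;
    this.bytes[f.direction] += f.size;
  }
}
```

An instance would be registered with engine.hooks.on('onRealtimeFrame', ...) and fed each context after mapping its fields onto FrameMeta.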