Realtime / live session
What you will achieve
Open a realtime session, send 'Say PING', and assert that a text or audio response arrives. The same createRealtime() API works on OpenAI and Google; Anthropic has no realtime API.
Why it matters with a raw provider SDK
OpenAI realtime uses OpenAIRealtimeWebSocket from openai/beta/realtime. Google Live uses ai.live.connect(), which returns an AsyncSession with a completely different event model (a receive() async generator, versus OpenAI's typed event emitters). The two are incompatible, so each provider needs its own integration.
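To make the mismatch concrete, here is a minimal sketch of the two consumption styles and the glue needed to bridge them. EmitterSession, receive, pump, and collectViaAdapter are hypothetical stand-ins written for this illustration; they are not types or functions from the OpenAI, Google, or ORXA SDKs.

```typescript
// Hypothetical stand-ins for illustration only; not real SDK types.
type TextHandler = (delta: string) => void;

// Style 1: push-based. The caller registers callbacks on an emitter.
class EmitterSession {
  private handlers: TextHandler[] = [];

  on(_event: 'text', handler: TextHandler): void {
    this.handlers.push(handler);
  }

  emit(delta: string): void {
    for (const handler of this.handlers) handler(delta);
  }
}

// Style 2: pull-based. The caller iterates an async generator.
async function* receive(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) yield chunk;
}

// Glue code: drain the generator and re-emit each item as a callback.
async function pump(
  source: AsyncIterable<string>,
  target: EmitterSession,
): Promise<void> {
  for await (const delta of source) target.emit(delta);
}

// Collects everything that reaches the emitter via the adapter.
async function collectViaAdapter(chunks: string[]): Promise<string> {
  const session = new EmitterSession();
  let out = '';
  session.on('text', (delta) => {
    out += delta;
  });
  await pump(receive(chunks), session);
  return out;
}
```

With raw provider SDKs, an adapter of this kind (plus the equivalent for audio, errors, and end-of-turn) would have to be written and maintained by the application.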
Do it with ORXA
createRealtime() normalises both providers onto one event model (open, text, audio, turnComplete, error):
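import { createRealtime } from '@combycode/llm-sdk';

// Model and API key come from the environment; request text output only.
const session = createRealtime({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  modalities: ['text'],
});

// Send the prompt once the connection is open.
session.on('open', () => session.send({ text: 'Say PING' }));
// Print each text delta as it arrives.
session.on('text', (e) => console.log(e.delta));
// Close the session when the provider signals end-of-turn.
session.on('turnComplete', () => session.close());

Compare the SDKs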
How it works
createRealtime() opens a WebSocket to the provider's live endpoint and normalises incoming events: OpenAI response.audio_transcript.delta and Google serverContent.modelTurn.parts both surface as { type: 'text', delta }. Google Gemini Live is audio-native; when modalities: ['audio'] is requested, audio chunks surface as { type: 'audio', chunk }. The turnComplete event fires when the provider signals end-of-turn.
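The normalisation step can be sketched as two small mapping functions. This is an illustration under stated assumptions, not ORXA's actual implementation: the payload interfaces are simplified, fromOpenAI and fromGoogle are hypothetical names, and treating OpenAI's response.done and Google's serverContent.turnComplete as the end-of-turn signals is an assumption.

```typescript
// Normalised event shapes named in the text above (audio omitted for brevity).
type NormalisedEvent =
  | { type: 'text'; delta: string }
  | { type: 'turnComplete' };

// Assumed, simplified provider payloads; real messages carry more fields.
interface OpenAIServerEvent {
  type: string;
  delta?: string;
}

interface GoogleServerMessage {
  serverContent?: {
    modelTurn?: { parts?: Array<{ text?: string }> };
    turnComplete?: boolean;
  };
}

// Hypothetical mapper for OpenAI realtime server events.
function fromOpenAI(ev: OpenAIServerEvent): NormalisedEvent[] {
  if (ev.type === 'response.audio_transcript.delta' && ev.delta !== undefined) {
    return [{ type: 'text', delta: ev.delta }];
  }
  // Assumption: 'response.done' is treated as the end-of-turn signal.
  if (ev.type === 'response.done') return [{ type: 'turnComplete' }];
  return []; // Other event types are ignored in this sketch.
}

// Hypothetical mapper for Google Live server messages.
function fromGoogle(msg: GoogleServerMessage): NormalisedEvent[] {
  const out: NormalisedEvent[] = [];
  for (const part of msg.serverContent?.modelTurn?.parts ?? []) {
    if (part.text !== undefined) out.push({ type: 'text', delta: part.text });
  }
  // Assumption: 'turnComplete' on serverContent marks end-of-turn.
  if (msg.serverContent?.turnComplete) out.push({ type: 'turnComplete' });
  return out;
}
```

Returning an array keeps the mapping uniform: one provider message may yield zero, one, or several normalised events, as when a single Google message carries multiple parts.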
Next steps
- Realtime guide: session options, interruption, audio format configuration
- Audio input: non-realtime audio understanding