Multi-turn conversation
What you will achieve
Tell the model your name in turn 1, then ask it to recall the name in turn 2. The model answers correctly because the full history is included in the second request. The same code runs on every provider.
When and why you need this
A single-shot `complete()` call has no memory. Each request is stateless. To build a
chatbot, a coding assistant, or any back-and-forth experience, you must pass the prior
conversation on each new request.
The challenge with raw provider SDKs is that message shapes differ:
- OpenAI: `{ role: 'user' | 'assistant' | 'system', content: string }`.
- Anthropic: same shape for user/assistant, but `system` is a top-level field, not a messages-array entry.
- Google: `{ role: 'user' | 'model', parts: [{ text: string }] }` (note: `'model'`, not `'assistant'`).
If you build a portable history array you have to branch on provider before every send.
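To make that branching concrete, here is a minimal sketch of the per-provider conversion you would otherwise write by hand. The `CanonicalMessage` and `ProviderPayload` types and the `toProviderPayload` function are hypothetical names invented for this illustration; the payload is deliberately simplified and is not the exact request body of any provider, nor the SDK's internal code.

```typescript
type Role = 'user' | 'assistant' | 'system';
type Provider = 'openai' | 'anthropic' | 'google';

// Local stand-in for a portable history entry (string content only).
interface CanonicalMessage {
  role: Role;
  content: string;
}

// Hypothetical, simplified envelope: just enough to show where each piece ends up.
interface ProviderPayload {
  system?: string;
  messages: Array<Record<string, unknown>>;
}

function toProviderPayload(provider: Provider, history: CanonicalMessage[]): ProviderPayload {
  if (provider === 'openai') {
    // System messages stay inside the messages array.
    return { messages: history.map((m) => ({ role: m.role, content: m.content })) };
  }

  // Anthropic and Google keep the system prompt outside the messages array.
  const systemText = history
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n');
  const turns = history.filter((m) => m.role !== 'system');

  const messages: Array<Record<string, unknown>> =
    provider === 'anthropic'
      ? turns.map((m) => ({ role: m.role, content: m.content }))
      : // Google: 'assistant' becomes 'model' and the text is wrapped in parts.
        turns.map((m) => ({
          role: m.role === 'assistant' ? 'model' : 'user',
          parts: [{ text: m.content }],
        }));

  const payload: ProviderPayload = { messages };
  if (systemText.length > 0) payload.system = systemText;
  return payload;
}
```

Every send would have to pass through a function like this, which is the conditional logic a single canonical history format lets you avoid.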
Step by step
Step 1 — Send a message and record the reply

```typescript
import { complete, type Message } from '@combycode/llm-sdk';

const history: Message[] = [];

// Turn 1: user introduces themselves
history.push({ role: 'user', content: 'My name is Alex.' });

const r1 = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: history,
  maxTokens: 64,
});

// Append the assistant reply so the model sees it in turn 2
history.push({ role: 'assistant', content: r1.text });
```

The key step is appending the assistant’s reply to `history` after each turn. Without
this, the model has no context for the next question.
Step 2 — Send the follow-up turn
```typescript
// Turn 2: ask it to recall the name
history.push({ role: 'user', content: 'What is my name?' });

const r2 = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: history,
  maxTokens: 32,
});

console.log(r2.text); // e.g. 'Your name is Alex.'
```

`prompt` accepts either a string (a single user message) or a `Message[]` (the full history).
When you pass an array, it becomes the entire messages list for the request.
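One way to picture the two accepted forms is as a normalisation step: a string is shorthand for a one-element history. The sketch below is only an illustration of that equivalence; `SketchMessage` and `normalizePrompt` are hypothetical names, and the SDK's actual handling may differ.

```typescript
// Local stand-in for the SDK's Message type (string content only).
interface SketchMessage {
  role: 'user' | 'assistant' | 'system' | 'tool';
  content: string;
}

// A string becomes a single user message; an array is used as the full history.
function normalizePrompt(prompt: string | SketchMessage[]): SketchMessage[] {
  return typeof prompt === 'string' ? [{ role: 'user', content: prompt }] : prompt;
}

console.log(normalizePrompt('What is my name?'));
// [ { role: 'user', content: 'What is my name?' } ]
```

The practical consequence is that a string prompt carries no earlier turns, so any follow-up question needs the array form.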
Step 3 — Add a system prompt
The system prompt applies to all turns. Pass it once via `system`; do not include it in
the history array:

```typescript
import { complete, type Message } from '@combycode/llm-sdk';

const SYSTEM = 'You are a concise assistant. Reply in at most one sentence.';

const history: Message[] = [
  { role: 'user', content: 'My name is Alex.' },
  { role: 'assistant', content: 'Nice to meet you, Alex.' },
  { role: 'user', content: 'What is my name?' },
];

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  system: SYSTEM,
  prompt: history,
  maxTokens: 32,
});
```

Step 4 — Use ConversationHistory for automatic tracking
For long-running conversations, `ConversationHistory` manages the array for you and
adds token estimation and layered system-prompt context:

```typescript
import { ConversationHistory, complete } from '@combycode/llm-sdk';

const conv = new ConversationHistory();

// Write the agent role to the registry (the preferred way for agents)
conv.registry.set('agent.role', 'You are a helpful assistant.', {
  priority: 10,
  tags: ['system'],
});

async function chat(userMessage: string): Promise<string> {
  conv.append({ role: 'user', content: userMessage });

  const { text, usage } = await complete({
    model: process.env.LLM_MODEL!,
    apiKey: process.env.LLM_API_KEY,
    system: conv.registry.flat({ tag: 'system' }),
    prompt: conv.messages(),
    maxTokens: 256,
  });

  conv.append({ role: 'assistant', content: text }, { usage });
  return text;
}

console.log(await chat('My name is Alex.'));
console.log(await chat('What is my name?'));
console.log(`Total tokens so far: ${conv.estimatedTokens()}`);
```

`ConversationHistory` also tracks usage, supports snapshots (for persistence), and
exposes the layered registry for multi-contributor system prompts. See
Layered context for the full registry API.
Step 5 — Handle a growing conversation
Conversations grow without bound. After many turns you will approach the model’s context window. The simplest strategy is to keep only the most recent messages:

```typescript
// Keep only the last 20 messages (10 user/assistant pairs).
// The system prompt is passed separately via `system`, so it is not affected.
const MAX_TURNS = 20;
if (history.length > MAX_TURNS) {
  history.splice(0, history.length - MAX_TURNS);
}
```

For more sophisticated strategies (sliding window, summarisation) see the Context Guard guide.
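One detail worth watching when trimming: if the cut lands on an odd boundary, the kept window can open with an assistant reply, which conflicts with the user-first alternation some providers expect. The sketch below is one possible way to handle that, not part of the SDK; `Turn` and `trimHistory` are hypothetical names, and it returns a new array instead of mutating in place.

```typescript
// Local stand-in for a history entry (string content, no system messages).
interface Turn {
  role: 'user' | 'assistant';
  content: string;
}

// Keep at most `maxMessages` recent messages, then drop any leading
// assistant replies so the window always opens on a user message.
function trimHistory(history: Turn[], maxMessages: number): Turn[] {
  if (history.length <= maxMessages) return [...history];
  const recent = history.slice(history.length - maxMessages);
  const firstUser = recent.findIndex((m) => m.role === 'user');
  return firstUser === -1 ? [] : recent.slice(firstUser);
}

const example: Turn[] = [
  { role: 'user', content: 'u1' },
  { role: 'assistant', content: 'a1' },
  { role: 'user', content: 'u2' },
  { role: 'assistant', content: 'a2' },
  { role: 'user', content: 'u3' },
];

// A window of 4 would start at 'a1', so that reply is dropped as well.
console.log(trimHistory(example, 4).map((m) => m.content)); // [ 'u2', 'a2', 'u3' ]
```

The trade-off is that the result may be one message shorter than the limit, which is usually preferable to sending a history that starts mid-exchange.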
Your options
Message shapes (`Message` type):
| Field | Type | Notes |
|---|---|---|
| `role` | `'user' \| 'assistant' \| 'system' \| 'tool'` | Always use `'assistant'`; the SDK rewrites it to `'model'` for Google internally. |
| `content` | `string \| ContentPart[]` | A plain string for most cases. Use `ContentPart[]` for multi-modal content (images, audio, tool results). |
| `id` | `string` | Optional. Universal message id for dedup and referencing. |
| `createdAt` | `number` | Optional. Epoch ms. Used for server-state TTL checks. |
| `origin` | `MessageOrigin` | Set automatically by `llm.assistantMessage(r)`. Carries server-state id for stateful continuation. |
| `cache` | `boolean` | Mark this message’s content for prompt-cache pinning (Anthropic `cache_control`). |
When to use a plain `Message[]` vs `ConversationHistory`:
| Approach | When to use |
|---|---|
| `Message[]` | Simple scripts, short conversations, single-function call chains. You manage the array. |
| `ConversationHistory` | Multi-turn chatbots, agents, anything that needs token tracking, export/import, or registry-based system prompts. |
Multi-modal turns:
Images and documents are `ContentPart[]` entries in a message’s `content`. The structure
is the same across all providers; the SDK maps them to the provider’s native format:

```typescript
history.push({
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image', source: { type: 'url', url: 'https://example.com/photo.jpg' } },
  ],
});
```

Compare the SDKs
The structural difference: official SDKs require you to know and handle differences in
role names (`'assistant'` vs `'model'`), content format (string vs `parts`), and
system-prompt placement (top-level field vs messages array). In a multi-provider
application this means conditional logic on every turn. ORXA normalises `role: 'assistant'`
to `role: 'model'` for Google, moves `role: 'system'` messages to the correct provider
field, and wraps string content in `parts` for Google, so your history array stays in one
canonical format throughout.
Gotchas and next steps
Always append both sides. A common mistake is appending only the user message and omitting the assistant reply. The model then has no memory of what it said and may contradict itself.
Do not modify past messages by hand. Some providers (Anthropic, for example) reject requests
whose message roles do not alternate (user/assistant/user/…), and removing or inserting a single message breaks that pattern. If you need to edit history,
use `ConversationHistory.spliceRange()`, which also handles token-count re-anchoring.
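If you do manage a plain array yourself, a cheap pre-send check can catch a broken alternation before the provider does. This is a minimal sketch under the assumption that the history holds only user and assistant messages and starts with a user turn; `ChatTurn` and `firstAlternationError` are hypothetical names, not SDK exports.

```typescript
// Local stand-in for a history entry (system prompt is passed separately).
interface ChatTurn {
  role: 'user' | 'assistant';
  content: string;
}

// Returns the index of the first message that breaks the
// user/assistant/user/... pattern, or -1 when the history is valid.
function firstAlternationError(history: ChatTurn[]): number {
  return history.findIndex((m, i) => m.role !== (i % 2 === 0 ? 'user' : 'assistant'));
}

const broken: ChatTurn[] = [
  { role: 'user', content: 'My name is Alex.' },
  { role: 'user', content: 'What is my name?' }, // assistant reply was never appended
];

console.log(firstAlternationError(broken)); // 1
```

A result other than -1 typically points at the "append both sides" mistake described above or at a manual edit that removed one half of a pair.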
Context window vs token budget. The sum of all tokens in the history plus your
`maxTokens` must not exceed the model’s context window. Use `estimatedTokens()` on a
`ConversationHistory` instance, or `countTokens()`, to check before sending.
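The budget rule is simple arithmetic, sketched below with hypothetical helper names (`fitsContextWindow`, `maxReplyBudget`). The token counts and the 8,192-token window are illustrative numbers only, not figures for any particular model; in practice the history count would come from your token estimate.

```typescript
// True when the history plus the reserved reply budget fits the window.
function fitsContextWindow(historyTokens: number, maxTokens: number, contextWindow: number): boolean {
  return historyTokens + maxTokens <= contextWindow;
}

// Largest maxTokens that still fits, or 0 when the history alone is too large.
function maxReplyBudget(historyTokens: number, contextWindow: number): number {
  return Math.max(0, contextWindow - historyTokens);
}

// Assumed window size, for illustration only.
const CONTEXT_WINDOW = 8192;

console.log(fitsContextWindow(7900, 256, CONTEXT_WINDOW)); // true  (7900 + 256 = 8156)
console.log(fitsContextWindow(8000, 256, CONTEXT_WINDOW)); // false (8000 + 256 = 8256)
console.log(maxReplyBudget(8000, CONTEXT_WINDOW)); // 192
```

When the check fails you can either trim the history or lower `maxTokens` to the remaining budget.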
Next steps:
- Conversation state: let the server hold history (OpenAI Responses)
- Layered context: dynamic, multi-contributor system prompts via `history.registry`
- Streaming: stream replies inside a conversation loop
- Token counting: measure history size before sending