
Multi-step tool loop (hand-rolled)

Chain two dependent tools: get_user_city returns 'Paris', then get_weather(city='Paris') returns 'sunny'. Prompt 'What is the weather where I am?'. Assert both tools are called in order and the final answer mentions 'sunny'.

This scenario deliberately uses complete() with tools: the loop is built into complete(). It shows the ORXA version of what each official SDK forces you to write by hand. Compare it to scenario 09, which shows the same result via each provider’s native runner.

Multi-step tool loops appear whenever the answer to a question requires gathering information incrementally — the second tool call depends on the result of the first. Classic examples: look up the user’s location, then look up local weather; look up a record ID, then fetch the record; determine which API to call, then call it.

With raw provider SDKs you write the loop yourself: extract tool calls from the response, execute each, format results into the provider’s specific tool-result message shape, send again, and repeat. Each provider’s message format is different. A portable implementation is 50-80 lines per provider, growing with every provider you add.

import { defineTool } from '@combycode/llm-sdk';
const getUserCity = defineTool({
  name: 'get_user_city',
  description: "Return the user's current city.",
  params: {},
  execute: () => 'Paris',
});
const getWeather = defineTool({
  name: 'get_weather',
  description: 'Get the current weather for a city.',
  params: { city: 'string' },
  execute: ({ city }) => `sunny in ${city}`,
});

The two tools are independent — neither knows about the other. The model decides to call get_user_city first, then use the result 'Paris' as the argument to get_weather.

Step 2 — Run the multi-step loop with complete()

import { complete, defineTool } from '@combycode/llm-sdk';
const getUserCity = defineTool({ name: 'get_user_city', description: "Return the user's current city.", params: {}, execute: () => 'Paris' });
const getWeather = defineTool({ name: 'get_weather', description: 'Get weather for a city.', params: { city: 'string' }, execute: ({ city }) => `sunny in ${city}` });
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What is the weather where I am?',
  tools: [getUserCity, getWeather],
  maxTokens: 512,
});
console.log(text); // "The weather in Paris is sunny."

The loop runs automatically: model responds with get_user_city call -> SDK executes it -> result 'Paris' fed back -> model responds with get_weather(city='Paris') call -> SDK executes it -> result 'sunny in Paris' fed back -> model produces final text.
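To make the sequence above concrete, here is a minimal, self-contained simulation of the same loop. It does not use the SDK or any network call: the `fakeModel` function is a scripted stand-in for a real model, and every type and name in it (`ToolCall`, `ModelReply`, `runLoop`, `calls`) is hypothetical, chosen only for illustration. It is synchronous to stay short, whereas real model calls are asynchronous.

```typescript
// Hypothetical types for illustration only; these are not the SDK's types.
type ToolCall = { name: string; args: Record<string, string> };
type ModelReply = { toolCall?: ToolCall; text?: string };
type Tool = { name: string; execute: (args: Record<string, string>) => string };

// Records tool names in the order they are executed.
const calls: string[] = [];

const tools: Tool[] = [
  { name: 'get_user_city', execute: () => 'Paris' },
  { name: 'get_weather', execute: ({ city }) => `sunny in ${city}` },
];

// Scripted stand-in for the model: it decides based on the tool results so far.
function fakeModel(toolResults: string[]): ModelReply {
  const [city, weather] = toolResults;
  if (city === undefined) return { toolCall: { name: 'get_user_city', args: {} } };
  if (weather === undefined) return { toolCall: { name: 'get_weather', args: { city } } };
  return { text: `The weather is ${weather}.` };
}

// The loop itself: ask the model, run the requested tool, feed the result back.
function runLoop(): string {
  const toolResults: string[] = [];
  for (;;) {
    const reply = fakeModel(toolResults);
    const call = reply.toolCall;
    if (!call) return reply.text ?? '';
    const tool = tools.find((t) => t.name === call.name);
    if (!tool) throw new Error(`unknown tool: ${call.name}`);
    calls.push(tool.name);
    toolResults.push(tool.execute(call.args));
  }
}

const answer = runLoop();
```

After the run, `calls` holds the two tool names in execution order and `answer` contains the word "sunny", which are the two properties this scenario asserts.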

Step 3 — Understand what “hand-rolled” would look like


Without ORXA, the loop for a single provider looks like this (abbreviated); a portable version repeats it in a different shape for every other provider:

// What you would write for Anthropic without ORXA:
let messages = [{ role: 'user', content: prompt }];
while (true) {
  const res = await anthropic.messages.create({ model, system, messages, tools, max_tokens });
  messages.push({ role: 'assistant', content: res.content });
  if (res.stop_reason !== 'tool_use') break;
  const toolResults = [];
  for (const block of res.content) {
    if (block.type !== 'tool_use') continue;
    const result = await executeTool(block.name, block.input);
    toolResults.push({ type: 'tool_result', tool_use_id: block.id, content: result });
  }
  messages.push({ role: 'user', content: toolResults });
}
// Then repeat the entire thing differently for OpenAI (tool_calls array, tool role, tool_call_id)
// And again for Google (functionCall parts, functionResponse parts)

ORXA replaces all three variants with the eight-line version above.
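The per-provider difference is easiest to see in how a single tool result is sent back. The sketch below formats one result in the three shapes named in the comments above. The `ToolOutcome` type and the three function names are hypothetical. The Anthropic and OpenAI Chat Completions shapes follow the fields already named in this section; for Google, the `role` value and the `result` key inside `response` are assumptions, since that payload is a free-form object.

```typescript
// Hypothetical normalized shape for one executed tool call; not part of any SDK.
type ToolOutcome = { callId: string; toolName: string; output: string };

// Anthropic: tool_result blocks travel inside a user message.
function toAnthropic(r: ToolOutcome) {
  return {
    role: 'user' as const,
    content: [{ type: 'tool_result' as const, tool_use_id: r.callId, content: r.output }],
  };
}

// OpenAI Chat Completions: one message with role 'tool' per result.
function toOpenAI(r: ToolOutcome) {
  return { role: 'tool' as const, tool_call_id: r.callId, content: r.output };
}

// Google: a functionResponse part keyed by function name.
// Assumption: role 'user' and a `result` key inside the response object.
function toGoogle(r: ToolOutcome) {
  return {
    role: 'user' as const,
    parts: [{ functionResponse: { name: r.toolName, response: { result: r.output } } }],
  };
}

const outcome: ToolOutcome = { callId: 'call_1', toolName: 'get_user_city', output: 'Paris' };
const shapes = {
  anthropic: toAnthropic(outcome),
  openai: toOpenAI(outcome),
  google: toGoogle(outcome),
};
```

Three formatters for one piece of data is the cost the hand-rolled approach pays on every step, in addition to the matching extraction code on the response side.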

complete() runs until the model stops requesting tools. For adversarial prompts or loops with many tools, bound the number of steps:

import { createLLM, AgentLoop, defineTool } from '@combycode/llm-sdk';
// getUserCity and getWeather are the tools defined earlier on this page.
const llm = createLLM({ model: process.env.LLM_MODEL!, apiKey: process.env.LLM_API_KEY });
const loop = new AgentLoop({ client: llm, tools: [getUserCity, getWeather], maxTokens: 512 });
// Use a manual step counter as a guard:
let steps = 0;
const MAX_STEPS = 5;
// The loop.complete() itself does not expose a maxSteps param in the public API;
// use stop() from a hook or wrap in a timeout for hard caps.
loop.hooks.on('onStepComplete', (ctx) => {
  steps++;
  if (steps >= MAX_STEPS) loop.stop();
});
const res = await loop.complete('What is the weather where I am?');
console.log(res.text);

For most practical scenarios (2-5 dependent tools) the loop terminates naturally when the model reaches a final text answer.
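The step-counter guard can also be read in isolation from the SDK. The following sketch is a generic bounded loop, assuming a hypothetical `step` callback that reports whether the run is finished. `runBounded`, `StepResult`, and the two stand-in callbacks are illustrative names, not part of ORXA.

```typescript
// Hypothetical result of one loop step; `done` means a final text answer was produced.
type StepResult = { done: boolean; text: string };

// Runs `step` until it reports done or until maxSteps calls have been made.
function runBounded(step: (n: number) => StepResult, maxSteps: number) {
  for (let n = 1; n <= maxSteps; n++) {
    const r = step(n);
    if (r.done) return { text: r.text, steps: n, capped: false };
  }
  // The cap was reached without a final answer.
  return { text: '', steps: maxSteps, capped: true };
}

// A stand-in "model" that never stops requesting tools.
const runaway = runBounded(() => ({ done: false, text: '' }), 5);

// A stand-in "model" that produces its final answer on the third step.
const normal = runBounded((n) => ({ done: n === 3, text: 'sunny' }), 5);
```

Here `runaway` ends with `capped: true` after five steps, while `normal` returns after three steps with `capped: false`. Returning a `capped` flag rather than throwing lets the caller decide whether a truncated run is an error.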

How complete() terminates:

Condition | What happens
Model returns text with no tool calls | Loop ends, complete() returns
loop.stop() is called | Loop ends at the current step boundary
finishReason === 'length' (max_tokens hit mid-tool) | Loop ends, last response returned
Guardrail trips | Loop ends with reason: 'guardrail'
execute throws and continueOnError = false | Loop ends with reason: 'error'

Accumulating token usage across steps:

AgentRunReport.totalUsage aggregates input + output tokens across ALL steps. Each step’s usage is also in report.steps[n].usage. This matters for cost estimation on long loops:

const loop = new AgentLoop({ client: llm, tools: [getUserCity, getWeather] });
const res = await loop.complete('What is the weather where I am?', { maxTokens: 512 });
const report = loop.lastReport!;
console.log(`${report.stepCount} steps, ${report.totalUsage.totalTokens} total tokens`);
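As an illustration of what aggregating across steps means, the sketch below sums per-step usage into a total. The `Usage` type is an assumption: `inputTokens` and `totalTokens` are field names mentioned on this page, while `outputTokens` is a guess at the matching field. The per-step numbers are made up.

```typescript
// Assumed usage shape; only inputTokens and totalTokens are named in the text.
type Usage = { inputTokens: number; outputTokens: number; totalTokens: number };

// Adds up usage field by field across all steps.
function sumUsage(steps: Usage[]): Usage {
  return steps.reduce(
    (acc, u) => ({
      inputTokens: acc.inputTokens + u.inputTokens,
      outputTokens: acc.outputTokens + u.outputTokens,
      totalTokens: acc.totalTokens + u.totalTokens,
    }),
    { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
  );
}

// Made-up numbers for a three-step run; input grows because history is resent each step.
const perStep: Usage[] = [
  { inputTokens: 120, outputTokens: 20, totalTokens: 140 },
  { inputTokens: 160, outputTokens: 25, totalTokens: 185 },
  { inputTokens: 210, outputTokens: 30, totalTokens: 240 },
];

const total = sumUsage(perStep);
// total is { inputTokens: 490, outputTokens: 75, totalTokens: 565 }
```

Note that the summed input tokens (490) are much larger than the input of any single step, which is why the total, not the last step, is the figure to use for cost estimates.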

History accumulation:

AgentLoop appends every turn to its history. After complete() returns, you can call it again with a follow-up question and the model sees the prior conversation. Use this for stateful agents. For one-shot calls with the complete() helper, history is discarded after the call.
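The difference between the two modes can be sketched without the SDK. The `StatefulChat` class below is a hypothetical stand-in that keeps its own message list, and its `reply` callback plays the role of the model by reporting how many messages it was shown.

```typescript
type Message = { role: 'user' | 'assistant'; content: string };

// Hypothetical stand-in for a stateful loop: every call appends to `history`.
class StatefulChat {
  readonly history: Message[] = [];
  private readonly reply: (history: readonly Message[]) => string;

  constructor(reply: (history: readonly Message[]) => string) {
    this.reply = reply;
  }

  complete(prompt: string): string {
    this.history.push({ role: 'user', content: prompt });
    const text = this.reply(this.history); // the "model" sees everything so far
    this.history.push({ role: 'assistant', content: text });
    return text;
  }
}

// The fake model just reports how many messages it received.
const chat = new StatefulChat((h) => `seen ${h.length} message(s)`);
const first = chat.complete('What is the weather where I am?');
const second = chat.complete('And tomorrow?');
// first is "seen 1 message(s)", second is "seen 3 message(s)"
```

A one-shot helper behaves like creating a fresh `StatefulChat` for every call: the second question would arrive with no trace of the first.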

When to use complete() vs AgentLoop directly:

Use complete() | Use AgentLoop directly
One-shot question-answer, history not needed | Stateful multi-turn agent
No need for step events or reports | Need onStepStart/onStepComplete hooks
Simple scripts | Need stop(), dump()/restore()
import { complete, defineTool } from '@combycode/llm-sdk';

// Two dependent tools; complete() runs the whole multi-step loop internally —
// the SAME single file across every provider (no hand-rolled while-loop, no
// provider-specific Agents SDK).
const getUserCity = defineTool({
  name: 'get_user_city',
  description: "Get the user's current city.",
  params: {},
  execute: () => 'Paris',
});
const getWeather = defineTool({
  name: 'get_weather',
  description: 'Get the weather for a city.',
  params: { city: 'string' },
  execute: () => 'sunny',
});

const t0 = performance.now();
const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What is the weather where I am?',
  tools: [getUserCity, getWeather],
  maxTokens: 512,
});

console.log(JSON.stringify({ result: text.trim(), ms: Math.round(performance.now() - t0) }));

ORXA’s complete() is the entire multi-step loop in one call. Each official SDK requires a while loop, provider-specific message construction, and provider-specific tool-call extraction — typically 30-50 lines per provider. The scenario 09 comparison shows the three official native runners side by side.

The model controls the sequence. You cannot force it to call get_user_city before get_weather. Write tool descriptions that make the dependency obvious, or use toolChoice: { name: 'get_user_city' } on the first call and manually manage the second (requires AgentLoop directly, not complete()).

History grows with each step. A 5-step loop with large tool results can consume significant input tokens on step 5 (all prior messages are included). Monitor report.totalUsage.inputTokens and truncate tool results if needed.
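One simple way to truncate tool results is to cap their length inside the tool's execute function before the result is returned. The helper below is a minimal sketch: `truncateToolResult` is a hypothetical name, and it counts characters as a rough proxy for tokens, which is an approximation.

```typescript
// Caps a tool result at maxChars characters and notes how much was dropped.
// Characters are only a rough proxy for tokens.
function truncateToolResult(result: string, maxChars: number): string {
  if (result.length <= maxChars) return result;
  const omitted = result.length - maxChars;
  return `${result.slice(0, maxChars)}... [${omitted} chars omitted]`;
}

const short = truncateToolResult('sunny', 100);
const long = truncateToolResult('sunny in Paris', 5);
// short is "sunny"; long is "sunny... [9 chars omitted]"
```

Keeping the omission note in the result tells the model that data was cut, so it is less likely to treat a partial result as complete.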

Do not confuse steps and turns. A “step” here is one LLM call inside the loop. A “turn” in a conversation is a user message + assistant reply pair. One user turn can trigger many steps internally.

Next steps: