
Parallel tool calls

Prompt 'What is the weather in Paris and in Tokyo?' with a single get_weather tool. Confirm the model calls it twice (for both cities) in one turn — two results returned and fed back in a single follow-up message.

When a user asks about multiple entities in one question, a capable model will batch the tool calls into a single turn: it emits multiple tool call requests before needing any results. Running those calls in parallel, rather than one at a time, is critical for latency. Two sequential 300 ms calls take about 600 ms; two parallel calls still take about 300 ms.
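The latency arithmetic can be checked with a small timing sketch. It does not use the SDK: fakeTool is a hypothetical stand-in that resolves after a fixed delay, and timeSequential and timeParallel are illustrative helpers.

```typescript
// Hypothetical stand-in for a tool that spends `ms` milliseconds on I/O.
const fakeTool = (city: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(`sunny in ${city}`), ms));

// Run the calls one after another and return the elapsed milliseconds.
async function timeSequential(cities: string[], ms: number): Promise<number> {
  const t0 = Date.now();
  for (const city of cities) {
    await fakeTool(city, ms);
  }
  return Date.now() - t0;
}

// Start every call at once, wait for all of them, return the elapsed milliseconds.
async function timeParallel(cities: string[], ms: number): Promise<number> {
  const t0 = Date.now();
  await Promise.all(cities.map((city) => fakeTool(city, ms)));
  return Date.now() - t0;
}
```

With two cities and a 300 ms delay, timeSequential should report roughly twice the delay and timeParallel roughly one delay, plus scheduling overhead.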

The difficulty with raw provider SDKs:

  • OpenAI — returns an array of tool_calls, expects a separate { role: 'tool', tool_call_id } message for each result.
  • Anthropic — returns multiple tool_use content blocks with distinct id fields; expects a user message with a tool_result block per id.
  • Google — returns multiple functionCall parts; expects a user message with functionResponse parts.

Building a generic parallel dispatcher means branching per provider for both the fan-out and the result assembly.
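To make the per-provider branching concrete, here is a rough sketch of the result-assembly half of such a dispatcher. The message shapes are simplified from the three descriptions above, and ToolResult and assembleResults are hypothetical names, not part of any provider SDK.

```typescript
type Provider = 'openai' | 'anthropic' | 'google';

interface ToolResult {
  id: string;     // provider-issued call id (not used in the Google shape below)
  name: string;   // tool name, e.g. 'get_weather'
  output: string; // what the tool returned
}

// Build the follow-up message(s) that carry tool results back to the model.
function assembleResults(provider: Provider, results: ToolResult[]): object[] {
  switch (provider) {
    case 'openai':
      // One separate 'tool' message per result.
      return results.map((r) => ({ role: 'tool', tool_call_id: r.id, content: r.output }));
    case 'anthropic':
      // One user message holding a tool_result block per id.
      return [{
        role: 'user',
        content: results.map((r) => ({ type: 'tool_result', tool_use_id: r.id, content: r.output })),
      }];
    case 'google':
      // One user message holding a functionResponse part per call.
      return [{
        role: 'user',
        parts: results.map((r) => ({ functionResponse: { name: r.name, response: { result: r.output } } })),
      }];
  }
}
```

Two results produce two messages for OpenAI but a single message for Anthropic and Google, which is the asymmetry a generic dispatcher has to absorb.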

import { defineTool } from '@combycode/llm-sdk';

const getWeather = defineTool({
  name: 'get_weather',
  description: 'Get the current weather for a city.',
  params: { city: 'string' },
  execute: async ({ city }) => {
    // Real impl would call a weather API here.
    return `sunny in ${city}`;
  },
});

Nothing parallel-specific needed in the tool definition. The loop handles the fan-out.

import { complete, defineTool } from '@combycode/llm-sdk';

let calls = 0;
const getWeather = defineTool({
  name: 'get_weather',
  description: 'Get the current weather for a city.',
  params: { city: 'string' },
  execute: ({ city }) => { calls++; return `sunny in ${city}`; },
});

const { text } = await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What is the weather in Paris and in Tokyo?',
  tools: [getWeather],
  maxTokens: 512,
});

console.log(calls); // 2
console.log(text); // "The weather in Paris is sunny, and in Tokyo it is also sunny."

The model issues two get_weather calls in one turn. The loop executes them in parallel (Promise.all) and sends both results back in a single follow-up message. The model then produces the final text.

After the run, lastReport (on the AgentLoop) carries per-tool-call detail. When using complete() directly you do not have a handle to the loop, but you can access the loop via a lower-level API:

import { createLLM, AgentLoop, defineTool } from '@combycode/llm-sdk';

const llm = createLLM({ model: process.env.LLM_MODEL!, apiKey: process.env.LLM_API_KEY });

let calls = 0;
const getWeather = defineTool({
  name: 'get_weather',
  params: { city: 'string' },
  description: 'Get weather for a city.',
  execute: ({ city }) => { calls++; return `sunny in ${city}`; },
});

const loop = new AgentLoop({ client: llm, tools: [getWeather] });
const res = await loop.complete('What is the weather in Paris and in Tokyo?', { maxTokens: 512 });

// lastReport holds per-step, per-tool-call detail for the run that just finished.
const report = loop.lastReport!;
console.log(report.toolCallCount); // 2
for (const step of report.steps) {
  for (const tc of step.toolCalls) {
    console.log(tc.toolName, tc.arguments, tc.latencyMs);
  }
}

Step 4 — Control parallel vs sequential execution

By default AgentLoop runs tool calls in parallel (parallelToolCalls: true). Switch to sequential if your tools share mutable state or side-effects must be ordered:

const loop = new AgentLoop({
  client: llm,
  tools: [getWeather],
  parallelToolCalls: false, // execute one at a time, in model-returned order
});

Sequential mode uses the same message format; only the execution order changes.

parallelToolCalls (on AgentLoopConfig):

Value | Behaviour | When to use
true (default) | All tool calls in a step run via Promise.all | Stateless I/O (API calls, reads); best latency
false | Tool calls run sequentially in model-returned order | Stateful operations where call N depends on call N-1
Partial failure handling:

Each tool call runs independently inside Promise.all. If one execute throws, the loop calls the onToolCallError hook. The hook’s continueOnError field (defaults to true) controls whether the loop sends an error result to the model and continues, or re-throws. To halt on any tool failure:

const loop = new AgentLoop({
  client: llm,
  tools: [getWeather],
});

loop.hooks.on('onToolCallError', (ctx) => {
  ctx.continueOnError = false; // re-throw the error, stop the run
});

When continueOnError is true (default), the error message is sent back as the tool result and the model sees it — it can report the failure gracefully to the user.
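The two behaviours can be pictured with a standalone sketch. This is not the SDK's internal code: runAll, ToolCall, ToolOutcome, and the failing getWeatherOrThrow tool are hypothetical, and the continueOnError parameter only mirrors the hook field described above.

```typescript
interface ToolCall { id: string; city: string; }
interface ToolOutcome { id: string; content: string; isError: boolean; }

// Hypothetical tool that fails for one specific city.
async function getWeatherOrThrow(city: string): Promise<string> {
  if (city === 'Atlantis') throw new Error('unknown city');
  return `sunny in ${city}`;
}

// Run every call concurrently; each call catches its own failure.
async function runAll(calls: ToolCall[], continueOnError: boolean): Promise<ToolOutcome[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolOutcome> => {
      try {
        return { id: call.id, content: await getWeatherOrThrow(call.city), isError: false };
      } catch (err) {
        if (!continueOnError) throw err; // reject the whole batch
        const message = err instanceof Error ? err.message : String(err);
        // Turn the failure into a result the model can read.
        return { id: call.id, content: `Error: ${message}`, isError: true };
      }
    }),
  );
}
```

With continueOnError set to true, a batch for Paris and Atlantis yields one normal result and one error result; with false, the returned promise rejects with the original error.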

toolTimeout — per-call deadline:

Each tool call runs with an AbortSignal that fires after toolTimeout milliseconds (default: 30_000). If execute does not respect ctx.signal, the timeout still fires but the tool may continue running in the background:

const loop = new AgentLoop({
  client: llm,
  tools: [getWeather],
  toolTimeout: 5_000, // 5 s per tool call
});
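What respecting the signal looks like inside a tool can be sketched without the SDK. Here slowTool and withTimeout are hypothetical helpers built on the standard AbortController; the way the SDK actually wires ctx.signal into execute may differ.

```typescript
// Hypothetical tool that honours the abort signal: it cancels its own work and rejects.
function slowTool(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('done'), ms);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer); // stop the background work
        reject(new Error('tool call timed out'));
      },
      { once: true },
    );
  });
}

// Run one call with a deadline: abort the signal after timeoutMs.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const deadline = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(deadline); // do not leave the deadline timer pending
  }
}
```

A call such as withTimeout((signal) => slowTool(500, signal), 20) rejects after about 20 ms and leaves no timer behind. A tool that never looks at the signal would keep running until its own work finished, which is the background-work caveat noted above.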

Number of tool calls per turn:

There is no SDK-level cap on how many tool calls the model can emit in a single turn. If you need a cap, inspect lastResponse.toolCalls.length inside a manual loop (see Multi-step loop) or use a guardrail to enforce it.
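A cap check for a manual loop can be as small as the sketch below. ToolCallRequest, TooManyToolCallsError, and enforceToolCallCap are hypothetical names, and the real shape of lastResponse.toolCalls may carry more fields than shown here.

```typescript
// Assumed minimal shape of one requested tool call.
interface ToolCallRequest {
  name: string;
  arguments: Record<string, unknown>;
}

class TooManyToolCallsError extends Error {}

// Throw if the model asked for more calls in one turn than the caller allows.
function enforceToolCallCap(toolCalls: ToolCallRequest[], maxPerTurn: number): ToolCallRequest[] {
  if (toolCalls.length > maxPerTurn) {
    throw new TooManyToolCallsError(
      `model requested ${toolCalls.length} tool calls; cap is ${maxPerTurn}`,
    );
  }
  return toolCalls;
}
```

Throwing is one policy among several; truncating the list or returning an error result to the model are alternatives, depending on how strict the cap needs to be.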

import { complete, defineTool } from '@combycode/llm-sdk';

let calls = 0;
const getWeather = defineTool({
  name: 'get_weather',
  description: 'Get the current weather for a city.',
  params: { city: 'string' },
  execute: () => {
    calls++;
    return 'sunny';
  },
});

const t0 = performance.now();
await complete({
  model: process.env.LLM_MODEL!,
  apiKey: process.env.LLM_API_KEY,
  prompt: 'What is the weather in Paris and in Tokyo?',
  tools: [getWeather],
  maxTokens: 512,
});

console.log(JSON.stringify({ result: String(calls), ms: Math.round(performance.now() - t0) }));

The structural difference vs official SDKs: each one requires writing a per-provider fan-out loop — extract the array of tool calls, execute each, assemble a provider-specific results message (tool role array for OpenAI, user with tool_result blocks for Anthropic, user with functionResponse parts for Google). ORXA’s loop does this fan-out and assembly once, internally, for all providers. Your execute function is called once per tool request; the loop handles bundling results into the correct wire format.

Not all models batch in one turn. Some models (especially smaller ones) call tools one at a time even when multiple are needed. The loop handles this correctly either way — it just takes more steps. Test with your target model to see whether it batches.

Parallel execution order is non-deterministic. Promise.all resolves once every call has fulfilled, and individual calls may finish in any order. The result message is always assembled in the model’s requested call order, so the model is never confused about which result matches which call.
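The difference between completion order and result order is a property of Promise.all itself and can be seen in isolation. In this sketch, delayed and demoOrdering are illustrative helpers, not SDK functions.

```typescript
// Resolve with `value` after `ms` milliseconds.
const delayed = (value: string, ms: number): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

async function demoOrdering(): Promise<{ finished: string[]; results: string[] }> {
  const finished: string[] = []; // records the order in which calls complete
  const track = (p: Promise<string>): Promise<string> =>
    p.then((v) => {
      finished.push(v);
      return v;
    });

  const results = await Promise.all([
    track(delayed('Paris', 40)), // requested first, finishes last
    track(delayed('Tokyo', 5)),  // requested second, finishes first
  ]);

  // finished is ['Tokyo', 'Paris']; results is ['Paris', 'Tokyo'].
  return { finished, results };
}
```

Promise.all returns values in the order of its input array, so a dispatcher that maps over the requested calls gets results back in request order without any extra sorting.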

Tool result size matters. Large tool results increase input tokens on the follow-up step. If parallel calls each return large payloads, the combined follow-up message may approach the context window. Truncate results when possible.
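One simple way to bound payload size is to cap each result at a character budget before returning it from execute. The helper below is a sketch: truncateResult and its marker text are hypothetical, and characters are only a rough proxy for tokens.

```typescript
// Keep at most maxChars characters and note how much was dropped.
function truncateResult(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return `${text.slice(0, maxChars)}\n[truncated ${omitted} characters]`;
}
```

Telling the model that text was cut, rather than silently dropping it, lets it mention the gap or ask for a narrower query.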

Next steps: