
4-layer architecture

ORXA: LLM-SDK is built around four composable layers. You use only what you need — no abstraction tax.

  • Layer 4: Server - OpenAI-compatible HTTP server (drop-in replacement)
  • Layer 3: AgentLoop - Tool execution, multi-step reasoning, HITL approval, guardrails
  • Layer 2: LLMClient - Provider routing, cost accounting, observability hooks
  • Layer 1: NetworkEngine - HTTP queue, rate-limit governance, retry, auth injection

NetworkEngine (Layer 1) is the foundation. All HTTP calls from every layer above route through a single NetworkEngine instance. This is where:

  • Per-provider queues enforce rate limits without a gateway
  • Retry logic with exponential backoff handles transient errors
  • Auth injection attaches API keys, headers, and request signing
  • Observability spans are opened and closed around every request

    import { createEngine } from '@combycode/llm-sdk';

    const engine = createEngine({
      maxConcurrency: 5,
      // Retry on 429 Too Many Requests and 503 Service Unavailable.
      retryOn: [429, 503],
    });
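
To make the retry bullet concrete, here is a minimal sketch of exponential backoff over a configurable set of status codes. It is not the SDK's implementation: `withRetry`, `backoffDelay`, and every option other than `retryOn` are hypothetical names chosen for illustration, and the doubling schedule is an assumption about how such a retry could work.

```typescript
// Hypothetical helper for illustration; not part of @combycode/llm-sdk.
type Attempt<T> = { status: number; body?: T };

interface RetryOptions {
  retryOn: number[];     // status codes treated as transient
  maxAttempts: number;   // total attempts, including the first one
  baseDelayMs: number;   // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// retryNumber is 1-based: 1 -> base, 2 -> 2 * base, 3 -> 4 * base, ...
function backoffDelay(retryNumber: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (retryNumber - 1);
}

async function withRetry<T>(
  send: () => Promise<Attempt<T>>,
  opts: RetryOptions,
): Promise<Attempt<T>> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let last = await send();
  for (
    let retryNumber = 1;
    retryNumber < opts.maxAttempts && opts.retryOn.includes(last.status);
    retryNumber++
  ) {
    // Wait longer before each successive retry, then try again.
    await sleep(backoffDelay(retryNumber, opts.baseDelayMs));
    last = await send();
  }
  // Either a non-retryable status or the last attempt's result.
  return last;
}
```

A production version would usually add random jitter and an upper bound on the delay, so that many clients hitting the same rate limit do not retry in lockstep.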

LLMClient (Layer 2) is the provider adapter. It takes your abstract request (model string, prompt, tools) and translates it to the provider’s native API shape. It handles:

  • Multi-provider routing from one model string
  • Response normalization - always returns { text, usage, cost, ... }
  • Cost accounting from the bundled model catalog
  • Streaming via async iterables

    import { createLLM } from '@combycode/llm-sdk';

    // Reuses the engine created in the NetworkEngine example.
    const llm = createLLM({ engine, apiKey: process.env.ANTHROPIC_KEY });
    const { text } = await llm.complete({ model: 'anthropic/claude-haiku-4-5', prompt: '...' });
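
Routing from a single model string and catalog-based cost accounting can be sketched as two small functions. Everything here is an assumption made for illustration: the helper names, the catalog shape, and especially the prices, which are placeholders rather than values from the bundled model catalog.

```typescript
// Hypothetical sketch; names and prices are illustrative, not taken from the SDK.
interface ModelRef { provider: string; model: string }
interface TokenUsage { inputTokens: number; outputTokens: number }
interface Price { inputPerMTok: number; outputPerMTok: number } // USD per million tokens

// Split "provider/model" at the first slash, so model names may contain slashes.
function parseModelString(id: string): ModelRef {
  const slash = id.indexOf('/');
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

// Placeholder catalog: the figures below are made up for the example.
const exampleCatalog: Record<string, Price> = {
  'example/small-model': { inputPerMTok: 1, outputPerMTok: 5 },
};

function estimateCost(id: string, usage: TokenUsage, catalog: Record<string, Price>): number {
  const price = catalog[id];
  if (!price) throw new Error(`No catalog entry for ${id}`);
  return (
    (usage.inputTokens / 1_000_000) * price.inputPerMTok +
    (usage.outputTokens / 1_000_000) * price.outputPerMTok
  );
}

const parsedRef = parseModelString('anthropic/claude-haiku-4-5');
// parsedRef.provider is 'anthropic'; parsedRef.model is 'claude-haiku-4-5'

const exampleCost = estimateCost(
  'example/small-model',
  { inputTokens: 1000, outputTokens: 500 },
  exampleCatalog,
);
// Roughly 0.0035 USD under the placeholder prices above.
```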

AgentLoop (Layer 3) provides multi-step reasoning with tool execution. It runs the tool loop until the model signals completion, and adds:

  • Tool execution - defineTool() with typed params and handlers
  • Guardrails - declarative input/output tripwires that halt unsafe runs
  • HITL approval - PermissionPolicy with 'ask' effect; durable pause/resume
  • Agent handoffs - structured sub-agent delegation via handoff()
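
    import { runAgent, defineTool } from '@combycode/llm-sdk';

    // getUserCity and getWeather are assumed to be tools built with defineTool();
    // their definitions are omitted here.
    const result = await runAgent({
      model: 'openai/gpt-4o',
      apiKey: process.env.OPENAI_KEY,
      prompt: 'What is the weather where I am?',
      tools: [getUserCity, getWeather],
    });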
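
The phrase "runs the tool loop until the model signals completion" can be illustrated with a small, self-contained loop. This is a sketch under stated assumptions, not the SDK's AgentLoop: the types, `runToolLoop`, and the scripted model are hypothetical, the tool results are made-up stub values, and everything is synchronous for brevity where a real loop would await the model and the tool handlers.

```typescript
// Hypothetical types for illustration; not the SDK's actual API.
type ChatMessage = { role: 'user' | 'assistant' | 'tool'; content: string };
type ModelTurn =
  | { kind: 'tool_call'; tool: string; args: Record<string, string> }
  | { kind: 'final'; text: string };
type Model = (history: ChatMessage[]) => ModelTurn;
type ToolHandler = (args: Record<string, string>) => string;

function runToolLoop(
  model: Model,
  tools: Record<string, ToolHandler>,
  prompt: string,
  maxSteps = 8,
): { text: string; steps: number } {
  const history: ChatMessage[] = [{ role: 'user', content: prompt }];
  for (let step = 1; step <= maxSteps; step++) {
    const turn = model(history);
    if (turn.kind === 'final') {
      return { text: turn.text, steps: step }; // the model signalled completion
    }
    const handler = tools[turn.tool];
    if (!handler) throw new Error(`Unknown tool: ${turn.tool}`);
    // Feed the tool result back so the next model turn can use it.
    history.push({ role: 'assistant', content: `call ${turn.tool}` });
    history.push({ role: 'tool', content: handler(turn.args) });
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}

// Scripted stand-in for a real model: look up the city, then the weather, then answer.
const scriptedModel: Model = (history) => {
  const results = history.filter((m) => m.role === 'tool').map((m) => m.content);
  if (results.length === 0) return { kind: 'tool_call', tool: 'getUserCity', args: {} };
  if (results.length === 1) {
    return { kind: 'tool_call', tool: 'getWeather', args: { city: results[0] } };
  }
  return { kind: 'final', text: `Weather in ${results[0]}: ${results[1]}` };
};

// Stub tools returning fixed, made-up values.
const stubTools: Record<string, ToolHandler> = {
  getUserCity: () => 'Lisbon',
  getWeather: () => 'sunny',
};

const loopResult = runToolLoop(scriptedModel, stubTools, 'What is the weather where I am?');
// loopResult.text is 'Weather in Lisbon: sunny' and loopResult.steps is 3
```

The `maxSteps` bound is the simplest possible safeguard against a model that never stops calling tools; guardrails and approval policies would hook into the same loop before a handler runs.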

The Server (Layer 4) is an OpenAI-compatible HTTP server built on top of Layers 2 and 3. Drop it in as a replacement for the OpenAI API endpoint: your existing clients (LangChain, LlamaIndex, any OpenAI-compatible tool) talk to it unchanged while you gain cost tracking, observability, and multi-provider routing.

    import { createServer } from '@combycode/llm-sdk';

    const server = createServer({ engine, port: 8080 });
    await server.start();
    // Now serving POST /v1/chat/completions, POST /v1/responses, etc.
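
From the client side, "OpenAI-compatible" means a request in the public Chat Completions wire format should work against the local server. The sketch below builds such a request and reads the reply. The helper names are hypothetical, and it is an assumption that the server accepts the same `provider/model` strings as the SDK and that the API key header is optional.

```typescript
// Request and response shapes follow the public OpenAI Chat Completions format.
// Helper names are hypothetical; the port matches the server snippet above.
interface ChatRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(
  baseUrl: string,
  model: string,
  prompt: string,
  apiKey?: string,
): ChatRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (apiKey) headers['Authorization'] = `Bearer ${apiKey}`;
  return {
    // Strip trailing slashes so the path is joined cleanly.
    url: `${baseUrl.replace(/\/+$/, '')}/v1/chat/completions`,
    method: 'POST',
    headers,
    body: JSON.stringify({ model, messages: [{ role: 'user', content: prompt }] }),
  };
}

interface ChatCompletion {
  choices: { message: { role: string; content: string | null } }[];
}

// Return the first choice's text, or an empty string if there is none.
function firstMessageText(completion: ChatCompletion): string {
  return completion.choices[0]?.message.content ?? '';
}

const chatRequest = buildChatRequest('http://localhost:8080/', 'openai/gpt-4o', 'Hello');
// chatRequest.url is 'http://localhost:8080/v1/chat/completions'
```

Sending it is then a matter of passing `chatRequest.url` and the remaining fields to `fetch` and handing the parsed JSON to `firstMessageText`. Existing OpenAI clients get the same effect by pointing their base URL at the server.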

For most use cases, complete() and stream() are all you need. They create a NetworkEngine and LLMClient internally so you don’t have to wire the layers manually.

When you need cost tracking, observability hooks, or agent governance, pass an explicit engine to share state across calls.
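
The difference between the two modes comes down to who owns the engine. The sketch below uses hypothetical stand-ins (`SketchEngine`, `sketchComplete`), not the real NetworkEngine or complete(), to show why state only accumulates when the same engine is passed to every call.

```typescript
// Hypothetical stand-ins; the real NetworkEngine and complete() will differ.
class SketchEngine {
  requestCount = 0;
}

// With no engine argument, each call builds a private engine, so nothing accumulates.
function sketchComplete(prompt: string, engine: SketchEngine = new SketchEngine()): SketchEngine {
  if (prompt.length === 0) throw new Error('prompt must not be empty');
  engine.requestCount += 1;
  return engine;
}

// Convenience mode: two calls, two unrelated engines.
const isolatedA = sketchComplete('first');
const isolatedB = sketchComplete('second');
// isolatedA.requestCount and isolatedB.requestCount are both 1

// Explicit mode: one engine sees every call.
const sharedEngine = new SketchEngine();
sketchComplete('first', sharedEngine);
sketchComplete('second', sharedEngine);
// sharedEngine.requestCount is now 2
```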