4-layer architecture

ORXA: LLM-SDK is built around four composable layers. You use only the layers you need, so there is no abstraction tax.

Layer 4: Server
  OpenAI-compatible HTTP server (drop-in replacement)

Layer 3: AgentLoop
  Tool execution, multi-step reasoning, HITL approval, guardrails

Layer 2: LLMClient
  Provider routing, cost accounting, observability hooks

Layer 1: NetworkEngine
  HTTP queue, rate-limit governance, retry, auth injection

Layer 1 - NetworkEngine

The foundation. All HTTP calls from every layer above route through a single NetworkEngine instance. This is where:

Per-provider queues enforce rate limits without a gateway
Retry logic with exponential backoff handles transient errors
Auth injection attaches API keys, headers, and request signing
Observability spans are opened and closed around every request

import { createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  maxConcurrency: 5,
  retryOn: [429, 503],
});
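To make the retry behavior concrete, here is a minimal standalone sketch of retry with exponential backoff. It is not the SDK's implementation: `withRetry`, `backoffDelayMs`, `RetryOptions`, and the `baseDelayMs`/`maxDelayMs`/`maxAttempts` options are hypothetical names chosen for illustration. Only the idea of retrying on statuses such as 429 and 503 comes from the example above.

```typescript
// Hypothetical sketch: none of these names are part of the SDK.
type RetryOptions = {
  retryOn: number[];   // HTTP status codes treated as transient
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
};

type HttpResult = { status: number; body: string };

// The delay doubles with each retry (base, 2x base, 4x base, ...) up to maxDelayMs.
function backoffDelayMs(retryIndex: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(baseDelayMs * Math.pow(2, retryIndex), maxDelayMs);
}

// Calls send() and repeats it while the status is retryable and attempts remain.
// The sleep function is injectable so tests can avoid real waiting.
async function withRetry(
  send: () => Promise<HttpResult>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<HttpResult> {
  let result = await send();
  for (let retry = 0; retry < opts.maxAttempts - 1 && opts.retryOn.includes(result.status); retry++) {
    await sleep(backoffDelayMs(retry, opts.baseDelayMs, opts.maxDelayMs));
    result = await send();
  }
  return result; // either a non-retryable status or the last attempt's result
}
```

A production version would typically add random jitter to each delay and honor a `Retry-After` response header when the provider sends one.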

Layer 2 - LLMClient

The provider adapter. Takes your abstract request (model string, prompt, tools) and translates it to the provider’s native API shape. Handles:

Multi-provider routing from one model string
Response normalization - always returns { text, usage, cost, ... }
Cost accounting from the bundled model catalog
Streaming via async iterables

import { createLLM } from '@combycode/llm-sdk';

// `engine` is the NetworkEngine instance created in the Layer 1 example.
const llm = createLLM({ engine, apiKey: process.env.ANTHROPIC_KEY });
const { text } = await llm.complete({ model: 'anthropic/claude-haiku-4-5', prompt: '...' });
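The routing and cost-accounting ideas can be sketched independently of the SDK. The example below assumes a model string of the form `provider/model` (as in `anthropic/claude-haiku-4-5`) and a per-million-token price table. `parseModel`, `estimateCost`, the catalog shape, and the prices are all hypothetical placeholders, not the SDK's catalog or real provider pricing.

```typescript
// Hypothetical sketch: catalog shape, prices, and function names are illustrative only.
type Usage = { inputTokens: number; outputTokens: number };
type CatalogEntry = { inputPerMTok: number; outputPerMTok: number }; // USD per million tokens

// Placeholder entry with made-up prices.
const catalog: Record<string, CatalogEntry> = {
  "example/small-model": { inputPerMTok: 1, outputPerMTok: 5 },
};

// Split "provider/model" at the first slash; the model part may itself contain slashes.
function parseModel(modelString: string): { provider: string; model: string } {
  const slash = modelString.indexOf("/");
  if (slash <= 0 || slash === modelString.length - 1) {
    throw new Error(`Expected "provider/model", got "${modelString}"`);
  }
  return { provider: modelString.slice(0, slash), model: modelString.slice(slash + 1) };
}

// Cost = tokens x price per million tokens, summed over input and output.
function estimateCost(modelString: string, usage: Usage): number {
  const entry = catalog[modelString];
  if (!entry) throw new Error(`No catalog entry for "${modelString}"`);
  return (usage.inputTokens * entry.inputPerMTok + usage.outputTokens * entry.outputPerMTok) / 1e6;
}
```

The provider half of the parsed string selects an adapter, and the full string keys the price lookup, which is how a single model string can drive both routing and cost reporting.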

Layer 3 - AgentLoop

Multi-step reasoning with tool execution. Runs the tool loop until the model signals completion. Adds:

Tool execution - defineTool() with typed params and handlers
Guardrails - declarative input/output tripwires that halt unsafe runs
HITL approval - PermissionPolicy with 'ask' effect; durable pause/resume
Agent handoffs - structured sub-agent delegation via handoff()

import { runAgent, defineTool } from '@combycode/llm-sdk';

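// getUserCity and getWeather are tools built with defineTool(); their definitions are omitted here.
const result = await runAgent({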
  model: 'openai/gpt-4o',
  apiKey: process.env.OPENAI_KEY,
  prompt: 'What is the weather where I am?',
  tools: [getUserCity, getWeather],
});
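The phrase "runs the tool loop until the model signals completion" can be illustrated with a small standalone sketch. This is not how `runAgent` is implemented: `runToolLoop`, the `Model` and `Tool` types, and the `maxSteps` limit are hypothetical, and the model is any function you supply, which makes the loop easy to exercise with a scripted stub.

```typescript
// Hypothetical sketch of a tool loop; these types and names are not the SDK's.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };
type Tool = { name: string; handler: (args: Record<string, unknown>) => Promise<string> };
type Model = (history: Message[]) => Promise<ModelTurn>;

async function runToolLoop(model: Model, tools: Tool[], prompt: string, maxSteps = 8): Promise<string> {
  const history: Message[] = [{ role: "user", content: prompt }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    history.push({ role: "assistant", content: turn.text });
    // A turn with no tool calls is treated as the model's final answer.
    if (turn.toolCalls.length === 0) return turn.text;
    // Otherwise run each requested tool and feed its result back to the model.
    for (const call of turn.toolCalls) {
      const tool = tools.find((t) => t.name === call.name);
      if (!tool) throw new Error(`Unknown tool: ${call.name}`);
      history.push({ role: "tool", content: await tool.handler(call.args) });
    }
  }
  // The step limit stops a model that never finishes from looping forever.
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

For the weather prompt above, such a loop would typically take three model turns: one that requests the user's city, one that requests the weather for that city, and a final turn that answers in text.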

Layer 4 - Server

An OpenAI-compatible HTTP server built on top of Layers 2 and 3. Drop it in as a replacement for the OpenAI API endpoint: your existing clients (LangChain, LlamaIndex, any OpenAI-compatible tool) talk to it unchanged while you gain cost tracking, observability, and multi-provider routing.

import { createServer } from '@combycode/llm-sdk';

// `engine` is the NetworkEngine instance created in the Layer 1 example.
const server = createServer({ engine, port: 8080 });
await server.start();
// Now serving POST /v1/chat/completions, POST /v1/responses, etc.
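From the client side, talking to the server should look like any other Chat Completions call. The sketch below is a minimal hand-written client, assuming the server listens on `http://localhost:8080` and accepts the same provider-prefixed model strings used elsewhere on this page. Both are assumptions, and `chat` and `FetchLike` are hypothetical names. The request and response shapes follow the standard OpenAI Chat Completions format.

```typescript
// Hypothetical client sketch: base URL and model name are assumptions for illustration.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Minimal subset of the fetch API used below, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function chat(
  fetchImpl: FetchLike,
  baseUrl: string,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const response = await fetchImpl(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  });
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  // Chat Completions responses carry the reply in choices[0].message.content.
  const data = (await response.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

On Node 18 or later, the global `fetch` can be passed as `fetchImpl`. Existing OpenAI-compatible clients would normally only need their base URL pointed at the server instead.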

The `complete()` shortcut

For most use cases, complete() and stream() are all you need. They create a NetworkEngine and LLMClient internally so you don’t have to wire the layers manually.

When you need cost tracking, observability hooks, or agent governance, pass an explicit engine to share state across calls.