4-layer architecture
ORXA: LLM-SDK is built around four composable layers. You use only what you need — no abstraction tax.
Layer 4: Server         OpenAI-compatible HTTP server (drop-in replacement)
Layer 3: AgentLoop      Tool execution, multi-step reasoning, HITL approval, guardrails
Layer 2: LLMClient      Provider routing, cost accounting, observability hooks
Layer 1: NetworkEngine  HTTP queue, rate-limit governance, retry, auth injection

Layer 1 - NetworkEngine
The foundation. All HTTP calls from every layer above route through a single NetworkEngine instance. This is where:
- Per-provider queues enforce rate limits without a gateway
- Retry logic with exponential backoff handles transient errors
- Auth injection attaches API keys, headers, and request signing
- Observability spans are opened and closed around every request
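
The retry behaviour listed above can be pictured with a small standalone sketch. It assumes a plain doubling backoff with a cap. The helper names (`shouldRetry`, `backoffDelayMs`, `withRetry`) and the default timings are hypothetical and are not taken from the SDK; only the `[429, 503]` status list comes from this page.

```typescript
// Sketch only: these helpers are hypothetical and are not part of @combycode/llm-sdk.

/** Status codes worth retrying; mirrors the `retryOn` option used on this page. */
const RETRY_ON: readonly number[] = [429, 503];

/** True when a response status is in the retry list. */
function shouldRetry(status: number, retryOn: readonly number[] = RETRY_ON): boolean {
  return retryOn.includes(status);
}

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs. */
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

/** Calls `send` until it returns a non-retryable status or the attempts run out. */
async function withRetry(
  send: () => Promise<{ status: number }>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number }> {
  let last = await send();
  for (let attempt = 0; attempt < maxAttempts - 1 && shouldRetry(last.status); attempt++) {
    await sleep(backoffDelayMs(attempt));
    last = await send();
  }
  return last;
}
```

Real implementations often add random jitter to the delay so that many clients do not retry in lockstep; that is left out here to keep the arithmetic easy to follow.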
import { createEngine } from '@combycode/llm-sdk';

const engine = createEngine({
  maxConcurrency: 5,
  retryOn: [429, 503],
});

Layer 2 - LLMClient
The provider adapter. Takes your abstract request (model string, prompt, tools) and translates it to the provider’s native API shape. Handles:
- Multi-provider routing from one model string
- Response normalization - always returns { text, usage, cost, ... }
- Cost accounting from the bundled model catalog
- Streaming via async iterables
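
As a rough illustration of routing from one model string and of catalog-based cost accounting, the sketch below splits a `provider/model` identifier and prices a usage record. Everything in it is an assumption made for the example: `parseModel`, `estimateCost`, the `Usage` and `Price` shapes, and the catalog entry are hypothetical, and the prices are placeholders rather than real provider pricing.

```typescript
// Sketch only: hypothetical helpers, not the SDK's actual implementation.

interface ModelRef { provider: string; model: string }

/** Splits "provider/model" at the first slash. */
function parseModel(id: string): ModelRef {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

interface Usage { inputTokens: number; outputTokens: number }
interface Price { inputPerMTok: number; outputPerMTok: number }

// Placeholder prices for illustration only; NOT real provider pricing.
const CATALOG: Record<string, Price> = {
  "example/small-model": { inputPerMTok: 1, outputPerMTok: 4 },
};

/** Cost = tokens * price per million tokens, summed over input and output. */
function estimateCost(id: string, usage: Usage, catalog: Record<string, Price> = CATALOG): number {
  const price = catalog[id];
  if (!price) throw new Error(`No catalog entry for "${id}"`);
  return (
    (usage.inputTokens * price.inputPerMTok + usage.outputTokens * price.outputPerMTok) /
    1_000_000
  );
}

const ref = parseModel("anthropic/claude-haiku-4-5");
const cost = estimateCost("example/small-model", { inputTokens: 1_000_000, outputTokens: 500_000 });
```

Splitting at the first slash keeps any later slashes inside the model name, which is one plausible way to let a single string carry both the provider and the model.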
import { createLLM } from '@combycode/llm-sdk';

const llm = createLLM({ engine, apiKey: process.env.ANTHROPIC_KEY });
const { text } = await llm.complete({
  model: 'anthropic/claude-haiku-4-5',
  prompt: '...',
});

Layer 3 - AgentLoop
Multi-step reasoning with tool execution. Runs the tool loop until the model signals completion. Adds:
- Tool execution - defineTool() with typed params and handlers
- Guardrails - declarative input/output tripwires that halt unsafe runs
- HITL approval - PermissionPolicy with 'ask' effect; durable pause/resume
- Agent handoffs - structured sub-agent delegation via handoff()
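
The "run the tool loop until the model signals completion" idea, with an input tripwire in front of it, can be sketched without the SDK. The version below is a simplified, synchronous stand-in: the `Step`, `Tool`, and `Model` types, `runLoop`, `checkInput`, the blocked phrase, and the scripted model are all hypothetical, and the two tools return canned strings rather than real data.

```typescript
// Sketch only: all types and names here are hypothetical, not the SDK's API.

type Step =
  | { kind: "tool_call"; name: string; args: Record<string, string> }
  | { kind: "final"; text: string };

type Tool = (args: Record<string, string>) => string;
type Model = (history: string[]) => Step;

/** Input guardrail: throws before any model call if the prompt contains a blocked phrase. */
function checkInput(prompt: string, blocked: string[]): void {
  const hit = blocked.find((phrase) => prompt.toLowerCase().includes(phrase));
  if (hit !== undefined) throw new Error(`Guardrail tripped on "${hit}"`);
}

/** Asks the model for steps, running requested tools, until it returns a final answer. */
function runLoop(model: Model, tools: Record<string, Tool>, prompt: string, maxSteps = 8): string {
  checkInput(prompt, ["drop table"]);
  const history = [`user: ${prompt}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(history);
    if (step.kind === "final") return step.text;
    const tool = tools[step.name];
    if (!tool) throw new Error(`Unknown tool "${step.name}"`);
    history.push(`tool ${step.name}: ${tool(step.args)}`);
  }
  throw new Error("Step limit reached without a final answer");
}

// Stub tools with canned results, plus a scripted stand-in for the model.
const tools: Record<string, Tool> = {
  getUserCity: () => "Lisbon",
  getWeather: ({ city }) => `Sunny in ${city}`,
};

const scriptedModel: Model = (history) => {
  if (history.length === 1) return { kind: "tool_call", name: "getUserCity", args: {} };
  if (history.length === 2) {
    const city = history[1].replace("tool getUserCity: ", "");
    return { kind: "tool_call", name: "getWeather", args: { city } };
  }
  return { kind: "final", text: history[2].replace("tool getWeather: ", "") };
};

const answer = runLoop(scriptedModel, tools, "What is the weather where I am?");
```

The step limit is a common safety net for loops like this: a model that never produces a final answer ends the run with an error instead of looping forever.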
import { runAgent, defineTool } from '@combycode/llm-sdk';

// getUserCity and getWeather are tools created with defineTool() (definitions omitted here).
const result = await runAgent({
  model: 'openai/gpt-4o',
  apiKey: process.env.OPENAI_KEY,
  prompt: 'What is the weather where I am?',
  tools: [getUserCity, getWeather],
});

Layer 4 - Server
An OpenAI-compatible HTTP server built on top of Layers 2 and 3. Drop it in as a replacement for the OpenAI API endpoint: your existing clients (LangChain, LlamaIndex, any OpenAI-compatible tool) talk to it unchanged while you gain cost tracking, observability, and multi-provider routing.
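
From the client side, "unchanged" means sending the usual OpenAI-style chat completions request to a different base URL. The sketch below only builds such a request. `buildChatRequest` is a hypothetical helper, and the localhost URL, the bearer-token header, and the `provider/model` string in the body are assumptions about how a deployment might be set up, not documented server behaviour.

```typescript
// Sketch only: builds the request an OpenAI-style client would send.
// Assumes a server listening on localhost:8080; whether it requires an
// Authorization header or accepts "provider/model" strings is an assumption.

interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

function buildChatRequest(baseUrl: string, apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    // Trim trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

const req = buildChatRequest("http://localhost:8080/", "placeholder-key", "openai/gpt-4o", [
  { role: "user", content: "Hello" },
]);
// To send it: const res = await fetch(req.url, req.init);
```

Because only the base URL changes, a client that already lets you override the OpenAI base URL should need no other code changes under these assumptions.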
import { createServer } from '@combycode/llm-sdk';

const server = createServer({ engine, port: 8080 });
await server.start();
// Now serving POST /v1/chat/completions, POST /v1/responses, etc.

The complete() shortcut
For most use cases, complete() and stream() are all you need. They create a NetworkEngine and LLMClient internally so you don’t have to wire the layers manually.
When you need cost tracking, observability hooks, or agent governance, pass an explicit engine to share state across calls.
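
Why an explicit engine matters can be shown with a toy model of the pattern. This sketch assumes the shortcut builds a private engine for each call, which this page does not spell out. `SketchEngine` and `completeSketch` are hypothetical stand-ins, and the "cost" is just the prompt length.

```typescript
// Sketch only: SketchEngine and completeSketch are hypothetical stand-ins that
// illustrate the state-sharing idea; they are not the SDK's types or functions.

class SketchEngine {
  requests = 0;
  totalCost = 0;

  /** Accumulates per-call accounting on this engine instance. */
  record(cost: number): void {
    this.requests += 1;
    this.totalCost += cost;
  }
}

/** Without an explicit engine, each call gets a private one and its totals are discarded. */
function completeSketch(prompt: string, engine: SketchEngine = new SketchEngine()): string {
  engine.record(prompt.length); // stand-in "cost": one unit per character
  return `echo: ${prompt}`;
}

const shared = new SketchEngine();
completeSketch("first", shared);
completeSketch("second", shared);
completeSketch("third"); // uses a throwaway engine; `shared` is unaffected
```

After these calls, `shared` has seen two requests, while the third call's accounting lived on an engine nobody kept a reference to.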