Run agent
ai.agent({...}) is the lowest rung of the ladder. One LLM call, optional tool loop, optional structured output. Stateless across calls.
The factory
    import { ai } from "@warlock.js/ai";
    import { OpenAISDK } from "@warlock.js/ai-openai";

    const openai = new OpenAISDK({ apiKey: process.env.OPENAI_API_KEY! });

    const myAgent = ai.agent({
      name: "summarizer", // optional; anonymous agents get a fingerprint
      model: openai.model({ name: "gpt-4o-mini" }),
      systemPrompt: "You are concise.", // string or SystemPromptContract
      tools: [searchTool, calcTool], // optional
      placeholders: { language: "English" }, // mustache substitution defaults
      maxTrips: 10, // tool-loop bound (default 10)
      modelOptions: { temperature: 0.2 }, // forwarded to the model
      output: summarySchema, // default structured-output schema
      middleware: [budgetMw, cacheMw], // optional
      streamingToolGuard: {}, // opt-in tool-call recovery
      on: { "agent.starting": handler }, // factory-level event handlers
      version: "v3-2026-05", // mirrored onto reports
    });

The factory returns an AgentContract<TOutput>. Every execution spawns a fresh internal Execution, so the factory holds no per-call state.
Anonymous agents
name is optional. Anonymous agents receive a deterministic fingerprint based on provider, model, and tool names:

    anon_openai_gpt-4o-mini
    anon_openai_gpt-4o-mini_search+calc

The same config yields the same synthetic name across process restarts. This matters when workflows fingerprint their structure for snapshot drift detection: composing the same anonymous agent twice produces the same identifier.
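The naming rule can be illustrated with a small sketch. It is not the library's implementation: anonymousAgentName and AnonymousAgentShape are hypothetical names, and the separators and tool ordering are inferred only from the two sample names shown.

```typescript
// Hypothetical shape: the three inputs the fingerprint is said to depend on.
interface AnonymousAgentShape {
  provider: string;
  model: string;
  toolNames: string[];
}

// Assumption: parts are joined with "_" and tool names with "+",
// in registration order, matching the two documented samples.
function anonymousAgentName(shape: AnonymousAgentShape): string {
  const base = `anon_${shape.provider}_${shape.model}`;
  return shape.toolNames.length > 0
    ? `${base}_${shape.toolNames.join("+")}`
    : base;
}

const bare = anonymousAgentName({ provider: "openai", model: "gpt-4o-mini", toolNames: [] });
const withTools = anonymousAgentName({
  provider: "openai",
  model: "gpt-4o-mini",
  toolNames: ["search", "calc"],
});

console.log(bare); // anon_openai_gpt-4o-mini
console.log(withTools); // anon_openai_gpt-4o-mini_search+calc
```

Because the function is pure, the same inputs always produce the same name, which is the property snapshot drift detection relies on.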
agent.execute(input, options)
The blocking call. Returns a promise of AgentResult<T>. It never throws; failures come back on the result’s error field.

    const { data, text, report, usage, error } = await myAgent.execute(input);

Every option is optional:

    agent.execute(input, {
      history, // Message[]: prior turns
      attachments, // image attachments (strings or tagged objects)
      placeholders, // override factory placeholders for this call
      output, // override the structured-output schema for this call
      responseSchema, // hand-crafted JSON Schema escape hatch
      systemPrompt, // per-call system-prompt override
      repair, // { maxAttempts }: re-ask on validation failure
      signal, // AbortSignal
      sessionId, // stamped onto every report node
      streamingToolGuard, // per-call override
      on, // per-call event handlers
    });

A call-site output fully replaces the factory’s output for that run; the two are never merged.
agent.stream(input, options)
Same surface, but you get an async iterable of typed events plus a .result promise:

    const stream = myAgent.stream(input);

    for await (const event of stream) {
      if (event.type === "agent.trip.streaming") {
        process.stdout.write(event.delta);
      }
    }

    const result = await stream.result;

The same lifecycle events fire whether you call execute or stream; streaming adds per-token deltas on top. The final envelope on stream.result is identical to what execute would resolve with.
The result envelope
    type AgentResult<T> = {
      type: "agent";
      data?: T; // typed structured output when `output` schema was supplied
      text?: string; // raw final LLM text
      report: AgentReport;
      usage: Usage; // aggregated tokens + per-channel cost breakdown
      error?: AIError;
    };

    type AgentReport = {
      status: "completed" | "failed" | "cancelled";
      startedAt: string;
      endedAt: string;
      duration: number;
      model: { name: string; provider: string };
      trips: LLMTrip[];
      toolCalls: ToolCall[];
    };

trips and toolCalls are flat lists with timing and outcome on every entry, which makes it easy to build reporting on top.
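Since the call resolves with an envelope rather than throwing, callers branch on report.status instead of using try/catch. The sketch below shows one way to turn an envelope into a log line. ResultLike and describeResult are hypothetical local stand-ins covering only the fields used, and the message property on the error is an assumption (it is what an Error subclass would expose).

```typescript
// Local stand-ins for the documented envelope; only the fields used below.
type ReportStatus = "completed" | "failed" | "cancelled";

interface ResultLike<T> {
  data?: T;
  text?: string;
  report: {
    status: ReportStatus;
    duration: number;
    trips: unknown[];
    toolCalls: unknown[];
  };
  // Assumption: the error exposes a `message`, as an Error subclass would.
  error?: { message: string };
}

// Hypothetical helper: summarize one envelope as a single log line.
function describeResult<T>(result: ResultLike<T>): string {
  const { report, error } = result;
  const stats =
    `trips=${report.trips.length} ` +
    `toolCalls=${report.toolCalls.length} ` +
    `duration=${report.duration}`;
  if (report.status === "completed") {
    return `completed ${stats}`;
  }
  return `${report.status} ${stats} error=${error?.message ?? "unknown"}`;
}

const sample: ResultLike<string> = {
  report: { status: "failed", duration: 120, trips: [{}], toolCalls: [] },
  error: { message: "provider timeout" },
};

console.log(describeResult(sample));
```

The helper reads only the flat lists and the status, so it would work unchanged on results from either execute or stream.result, given that both resolve with the same envelope.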
Cost truth — usage
usage is more than a token total. It carries the full per-channel breakdown so cost dashboards can tell how a number was reached:

    type Usage = {
      input: number; // prompt tokens (includes cachedTokens)
      output: number; // completion tokens (includes reasoningTokens)
      total: number;
      cachedTokens?: number; // input served from the provider's prompt cache (read hits)
      cacheWriteTokens?: number; // input WRITTEN to the cache this call (Anthropic cache_creation)
      reasoningTokens?: number; // the reasoning/thinking subset of output (priced separately when the model does)
      cost?: { // computed at emit time from tokens × the model's declared pricing
        input?: number;
        output?: number;
        cachedInput?: number;
        cachedOutput?: number;
      };
    };

cost is captured as a historical fact at emit time, so stored reports stay accurate even after the upstream pricing table changes. It’s undefined when no pricing is available (legacy adapters, unknown model names). For one scalar total, sum the populated fields:

    const total =
      (usage.cost?.input ?? 0) +
      (usage.cost?.output ?? 0) +
      (usage.cost?.cachedInput ?? 0) +
      (usage.cost?.cachedOutput ?? 0);

All five provider adapters report this surface. See the cost tracking recipe for rolling it up across a report tree.
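When rolling costs up over several runs, it can help to keep "no pricing available" distinct from "cost of zero", which the one-line sum collapses together. The following is a minimal sketch under that assumption. UsageLike, totalCost, and rollUp are hypothetical names, not part of the library, and the types mirror only the documented fields that are used.

```typescript
// Local mirrors of the documented Usage fields used in this sketch.
interface CostLike {
  input?: number;
  output?: number;
  cachedInput?: number;
  cachedOutput?: number;
}

interface UsageLike {
  input: number;
  output: number;
  total: number;
  cost?: CostLike;
}

// Returns undefined (not 0) when the run carried no pricing at all.
function totalCost(usage: UsageLike): number | undefined {
  if (usage.cost === undefined) return undefined;
  const { input = 0, output = 0, cachedInput = 0, cachedOutput = 0 } = usage.cost;
  return input + output + cachedInput + cachedOutput;
}

// Aggregates tokens and cost, counting unpriced runs separately.
function rollUp(usages: UsageLike[]): { tokens: number; cost: number; unpricedRuns: number } {
  let tokens = 0;
  let cost = 0;
  let unpricedRuns = 0;
  for (const usage of usages) {
    tokens += usage.total;
    const runCost = totalCost(usage);
    if (runCost === undefined) {
      unpricedRuns += 1;
    } else {
      cost += runCost;
    }
  }
  return { tokens, cost, unpricedRuns };
}

const summary = rollUp([
  { input: 100, output: 20, total: 120, cost: { input: 0.25, output: 0.5 } },
  { input: 50, output: 10, total: 60 }, // e.g. an unknown model name
]);

console.log(summary); // { tokens: 180, cost: 0.75, unpricedRuns: 1 }
```

Reporting unpricedRuns alongside the total lets a dashboard flag that the cost figure is a lower bound.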
Reasoning and prompt caching
Reasoning-capable and prompt-caching models accept extra modelOptions, gated by the model’s own capabilities: an adapter that lacks the feature ignores the option rather than forwarding an unsupported parameter.

    await myAgent.execute(input, {
      modelOptions: {
        reasoning: { effort: "high" }, // honored when capabilities.reasoning
        cacheControl: { breakpoints: 1 }, // honored when capabilities.promptCaching
      },
    });

reasoning.effort maps to the provider-native control (OpenAI reasoning_effort); reasoning.maxTokens caps the thinking budget. cacheControl.breakpoints translates into provider prompt-cache breakpoints. The resulting reasoning / cache token counts flow back through usage.
Structured output
    import { v, type Infer } from "@warlock.js/seal";

    const summarySchema = v.object({
      summary: v.string(),
      keyPoints: v.array(v.string()).min(1),
    });

    const result = await myAgent.execute(input, { output: summarySchema });

    if (result.data) {
      // typed as Infer<typeof summarySchema>
    }

Bake it into the agent for end-to-end typing:

    const summarizer = ai.agent({
      model: openai.model({ name: "gpt-4o-mini" }),
      output: summarySchema,
    });

    const result = await summarizer.execute(input);
    //    ^? AgentResult<{ summary?: string; keyPoints?: string[] }>

Adapters with native capabilities.structuredOutput forward the schema as JSON Schema. Adapters without it get a soft “respond in JSON only” instruction. Client-side validation always runs.
Repair on validation failure
    await myAgent.execute(input, {
      output: schema,
      repair: { maxAttempts: 1 },
    });

Off by default. Each repair attempt counts against maxTrips. On a validation failure, the framework tells the model what failed so it can correct its answer.
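The re-ask behavior can be pictured as a bounded loop. This is an illustrative sketch only, not the framework's code: runWithRepair, scriptedModel, and the Validation type are hypothetical, the model is a scripted stand-in, and the way repairs share the maxTrips budget is an assumption based on the sentence above.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; issue: string };

interface RepairOutcome<T> {
  data?: T;
  tripsUsed: number;
}

// One initial ask plus up to `maxAttempts` repairs, never exceeding `maxTrips`.
function runWithRepair<T>(
  ask: (feedback?: string) => string,
  validate: (raw: string) => Validation<T>,
  options: { maxAttempts: number; maxTrips: number },
): RepairOutcome<T> {
  const allowedTrips = Math.min(1 + options.maxAttempts, options.maxTrips);
  let feedback: string | undefined;
  let tripsUsed = 0;
  while (tripsUsed < allowedTrips) {
    const raw = ask(feedback);
    tripsUsed += 1;
    const checked = validate(raw);
    if (checked.ok) return { data: checked.value, tripsUsed };
    feedback = checked.issue; // tell the "model" what failed
  }
  return { tripsUsed };
}

// Scripted stand-in for a model: returns canned answers in order.
function scriptedModel(answers: string[]): (feedback?: string) => string {
  let call = 0;
  return () => answers[Math.min(call++, answers.length - 1)] ?? "";
}

const validateSummary = (raw: string): Validation<{ summary: string }> => {
  try {
    const parsed: unknown = JSON.parse(raw);
    const summary = (parsed as { summary?: unknown } | null)?.summary;
    return typeof summary === "string"
      ? { ok: true, value: { summary } }
      : { ok: false, issue: "summary must be a string" };
  } catch {
    return { ok: false, issue: "response was not valid JSON" };
  }
};

const repaired = runWithRepair(
  scriptedModel(["not json", '{"summary":"ok"}']),
  validateSummary,
  { maxAttempts: 1, maxTrips: 10 },
);

console.log(repaired); // { data: { summary: "ok" }, tripsUsed: 2 }
```

With maxAttempts set to 0 the same scripted model would stop after the first invalid answer, which corresponds to the off-by-default behavior.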
Image attachments
    await myAgent.execute("What's in this?", {
      attachments: ["./photo.png", "https://cdn.example.com/cat.jpg"],
    });

Shorthand strings infer the image kind from the file extension. Use the tagged form for explicit control:

    attachments: [
      { type: "image", source: "./photo" },
      { type: "image", source: { base64: "...", mediaType: "image/png" } },
    ];

The model must declare capabilities.vision, or you get a typed error at the boundary. The OpenAI adapter auto-infers vision from the model name; override with openai.model({ name, vision: true }).
Cancellation
    const ctrl = new AbortController();
    const promise = myAgent.execute(input, { signal: ctrl.signal });

    setTimeout(() => ctrl.abort("too slow"), 30_000);

    const { error, report } = await promise;

    if (report.status === "cancelled") {
      // error is an AgentCancelledError (code "AGENT_CANCELLED")
      // carrying `cancelledAt` + the abort `reason`
    }

Between-trip abort is guaranteed. Mid-trip abort is best-effort: the provider SDK decides whether the in-flight request can be cut. report.status === "cancelled" and error.category === "cancelled" are stable signals.
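The between-trip guarantee amounts to checking the signal before each trip starts. The sketch below illustrates that idea with synchronous stand-in trips. runTrips and SignalLike are hypothetical names and this is not the framework's loop; a real AbortSignal satisfies SignalLike structurally.

```typescript
// Minimal view of an abort signal; AbortSignal matches this shape.
interface SignalLike {
  readonly aborted: boolean;
}

type Trip = () => void;

interface LoopOutcome {
  status: "completed" | "cancelled";
  completedTrips: number;
}

// Checks the signal before every trip, so an abort raised during
// trip N prevents trip N+1 from starting.
function runTrips(trips: Trip[], signal: SignalLike): LoopOutcome {
  let completedTrips = 0;
  for (const trip of trips) {
    if (signal.aborted) {
      return { status: "cancelled", completedTrips };
    }
    trip();
    completedTrips += 1;
  }
  return { status: "completed", completedTrips };
}

const controller = new AbortController();
const outcome = runTrips(
  [
    () => {}, // trip 1 runs normally
    () => controller.abort(), // abort arrives while trip 2 is in flight
    () => {}, // trip 3 must never start
  ],
  controller.signal,
);

console.log(outcome); // { status: "cancelled", completedTrips: 2 }
```

In this sketch the in-flight trip always finishes; cutting it short is the best-effort part that depends on the provider SDK.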
Sessions
sessionId is metadata you control. The framework stamps it onto every report node produced during the run so downstream pipelines (cost dashboards, log aggregators) can group by session without joining the report tree.

    const sessionId = "user_42_2026-05-12";

    await myAgent.execute("what's my order?", { sessionId });
    await myAgent.execute("cancel it", { sessionId });

No implicit persistence. No automatic history. Just a string that rides along.
Events
Names:

    agent.starting, agent.trip.started, agent.trip.streaming, agent.trip.completed
    agent.tool.calling, agent.tool.called, agent.tool.failed
    agent.completed, agent.error

Three subscription tiers, fired in order:

    // Factory level: fires for every execute
    ai.agent({ model, on: { "agent.starting": handler } });

    // Instance level: returns an unsubscribe
    const unsubscribe = myAgent.on("agent.error", handler);

    // Per-call level: fires only this run
    await myAgent.execute(input, { on: { "agent.trip.completed": handler } });

Every payload carries runId and rootRunId so nested runs (workflow → agent → tool) stitch into one trace.
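To make the ordering concrete, here is a toy dispatcher that fires handlers tier by tier. It is a sketch under the assumption that "in order" means factory, then instance, then per-call, as the tiers are listed. dispatch, Tiers, and HandlerMap are hypothetical names, not the library's internals.

```typescript
type Payload = { runId: string; rootRunId: string };
type Handler = (payload: Payload) => void;
type HandlerMap = Partial<Record<string, Handler[]>>;

interface Tiers {
  factory: HandlerMap;
  instance: HandlerMap;
  call: HandlerMap;
}

// Assumed order: factory handlers, then instance handlers, then per-call.
function dispatch(eventName: string, payload: Payload, tiers: Tiers): void {
  for (const tier of [tiers.factory, tiers.instance, tiers.call]) {
    for (const handler of tier[eventName] ?? []) {
      handler(payload);
    }
  }
}

const seen: string[] = [];

dispatch(
  "agent.starting",
  { runId: "run_1", rootRunId: "run_1" },
  {
    factory: { "agent.starting": [() => seen.push("factory")] },
    instance: { "agent.starting": [() => seen.push("instance")] },
    call: { "agent.starting": [(p) => seen.push(`call:${p.runId}`)] },
  },
);

console.log(seen); // [ "factory", "instance", "call:run_1" ]
```

Every handler receives the same payload object, which is what lets a per-call handler correlate its output with factory-level logging through runId.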
Streaming tool guard
A streamingToolGuard config recovers a tool call when the model emits the structured input as text in the content stream instead of as a real tool_call. See the streaming tool guard recipe for the full story.

    ai.agent({
      model: someFastModel,
      tools: [suggestFollowupsTool],
      streamingToolGuard: {}, // empty object = on with defaults
    });

Off by default. Set this explicitly on agents whose registered tools have been observed to leak.
DX helpers around agents
A handful of small utilities take the friction out of common agent patterns. They’re not new primitives: each returns a plain agent, model, or result you already know how to use.
ai.systemPrompt.fromFile(path)
Seed a system prompt from a file, read once at construction:

    const writer = ai.agent({
      model,
      systemPrompt: ai.systemPrompt.fromFile("./prompts/writer.md"),
    });

Also available as SystemPrompt.fromFile(path).
Executables auto-adapt in tools: [...]
You no longer need to call .asTool() by hand to compose primitives. Pass a workflow, supervisor, or orchestrator straight into an agent’s tools array and it is auto-adapted into a tool:

    const concierge = ai.agent({
      model,
      tools: [searchTool, refundWorkflow, supportSupervisor], // workflow + supervisor auto-adapted
    });

ai.fallbackModel(models, opts?)
Wrap an ordered model list that fails over to the next model on transient provider errors (rate limits, timeouts). Drop it in anywhere a model is expected:

    const model = ai.fallbackModel([
      openai.model({ name: "gpt-4o" }),
      anthropic.model({ name: "claude-sonnet-4" }),
    ]);

    const agent = ai.agent({ model });

ai.batch(executable, items, opts?)
Run any executable over a dataset with bounded concurrency and per-item retry. A batch never fails as a whole: each item’s outcome lives on its own BatchItemResult.

    const { items, data, usage, report } = await ai.batch(summarizer, articles, {
      concurrency: 4,
      retry: { attempts: 3, backoff: "exponential" },
      onItem: (item) => log(item.index, item.status),
    });

    console.log(`${report.succeeded}/${report.total} ok, ${usage.total} tokens`);

data is the positional array of successful items’ result.data (with undefined in failed slots); items is the per-item breakdown. usage rolls up across every item.
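The relationship between items and data can be sketched as a simple projection. ItemLike and positionalData are hypothetical, and the "succeeded" / "failed" status values are assumptions (only index, status, and result.data are named in the text), so treat this as an illustration of the positional layout rather than the library's types.

```typescript
// Hypothetical per-item shape; status values are assumed for illustration.
interface ItemLike<T> {
  index: number;
  status: "succeeded" | "failed";
  result?: { data?: T };
}

// Builds the positional array: slot i holds item i's data, or undefined
// when that item failed. Assumes one entry per input item.
function positionalData<T>(items: ItemLike<T>[]): (T | undefined)[] {
  const data = new Array<T | undefined>(items.length).fill(undefined);
  for (const item of items) {
    if (item.status === "succeeded") {
      data[item.index] = item.result?.data;
    }
  }
  return data;
}

const sampleItems: ItemLike<string>[] = [
  { index: 0, status: "succeeded", result: { data: "summary A" } },
  { index: 1, status: "failed" },
  { index: 2, status: "succeeded", result: { data: "summary C" } },
];

const projected = positionalData(sampleItems);
console.log(projected); // [ "summary A", undefined, "summary C" ]
```

Keeping failed slots as undefined preserves alignment with the input array, so projected[i] always corresponds to the i-th input.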
Evaluation — agent.eval(...)
Run a scored evaluation suite against an agent and get an aggregate EvalReport. Each case runs through execute(input) and is scored by the resolved scorers; a case passes only when every scorer passes and the agent didn’t error:

    const report = await myAgent.eval({
      cases: [
        { name: "capital", input: "Capital of Egypt?", expected: "Cairo" },
      ],
      scorers: [ai.eval.contains()], // exact / contains / predicate(fn) / judge(config)
    });

    expect(report.passed).toBe(true);

ai.eval.judge({ agent, rubric }) is LLM-as-judge. In Vitest, registerAiMatchers() adds toRouteTo / toConverge / toPassStep / toOutputShape. eval() never throws on a case failure; failures surface on the report.
ai.router(), ai.fanOut(), ai.mockRouter()
Supervisor-oriented helpers. See Run supervisor for ai.router() (generates the routing agent) and ai.fanOut() (voting / self-consistency). ai.mockRouter(decisions) replays canned routing decisions for supervisor tests.
When to graduate to the next rung
- Fixed pipeline of several agents → Run workflow.
- One input routed across specialists → Run supervisor.
- Multi-turn conversation with persistent session → Run orchestrator (durable session state across runs).
- Plan generated up front, then executed → Planner.
Related
- Define tools: typed tools the agent can call.
- Write system prompts: composable prompts.
- Handle errors: the typed error hierarchy.
- Streaming tool guard recipe: recovering leaked tool calls.