Middleware
Middleware in @warlock.js/ai is the pipeline that wraps an agent run. It exists for cross-cutting concerns — budgeting, content guardrails, caching, observability — that don’t belong in any one tool or system prompt.
This page is the mental model. For the API surface and authoring guide see Attach middleware.
Three granularities, one object
One middleware is one object. It can hook into any subset of three levels:
```typescript
const myMiddleware: AgentMiddleware = {
  name: "my-mw",
  // Wraps the whole run.
  execute: {
    before(ctx) { /* ... */ },
    after(ctx, result) { /* ... */ },
    onError(ctx, error) { /* ... */ },
  },
  // Wraps each LLM round-trip.
  trip: {
    before(ctx) { /* ... */ },
    after(ctx, response) { /* ... */ },
    onError(ctx, error) { /* ... */ },
  },
  // Wraps each tool dispatch.
  tool: {
    before(ctx) { /* ... */ },
    after(ctx, result) { /* ... */ },
    onError(ctx, error) { /* ... */ },
  },
};
```

- `execute`: wraps the whole run. Fires once per `agent.execute()` call.
- `trip`: wraps each LLM round-trip. Fires once per trip (1..N per execute).
- `tool`: wraps each tool dispatch. Fires once per tool call.
Hooks nest like layers of an onion: `before` hooks run outermost-first, `after` hooks run innermost-first on the way back out, and `onError` runs only when something throws.
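To make the nesting concrete, here is a minimal, self-contained sketch of an onion-style runner. It is not the library's implementation: `Hooks`, `runNested`, and `layer` are hypothetical names, and the hooks are synchronous for brevity (real hooks may well be async).

```typescript
// Hypothetical, simplified hook shape; not the real AgentMiddleware type.
type Hooks = {
  name: string;
  before?: () => void;
  after?: () => void;
  onError?: (error: unknown) => void;
};

// Runs `work` wrapped by `stack`; stack[0] is the outermost layer.
function runNested(stack: Hooks[], work: () => void): void {
  const [outer, ...inner] = stack;
  if (outer === undefined) {
    work(); // no layers left: do the real work
    return;
  }
  outer.before?.();
  try {
    runNested(inner, work);
  } catch (error) {
    outer.onError?.(error);
    throw error; // this sketch always rethrows after notifying the layer
  }
  outer.after?.();
}

// Records the order in which hooks fire.
const order: string[] = [];
const layer = (name: string): Hooks => ({
  name,
  before: () => order.push(`${name}.before`),
  after: () => order.push(`${name}.after`),
  onError: () => order.push(`${name}.onError`),
});

runNested([layer("a"), layer("b")], () => order.push("work"));
// order is now: a.before, b.before, work, b.after, a.after
```

If the work throws instead, neither `after` hook fires and the `onError` hooks run from the inside out.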
What you can do
Each hook can:
- Observe: read context, write metrics. Most middleware lives here.
- Transform: mutate `ctx.state` for downstream middleware to read.
- Short-circuit: return a synthetic result from `before` to skip the real work. Semantic cache uses this to serve cached responses without hitting the model.
- Reject: throw a typed `AIError` subclass. Budget and guardrail use this.
- Recover: return a value from `onError` to suppress the error.
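Short-circuit and recover are the two least obvious capabilities, so here is a rough sketch of how a runner might honor them. `TripHooks` and `runWithHooks` are hypothetical names, and the "return `undefined` to continue normally" convention is an assumption of this sketch, not something the library is known to guarantee.

```typescript
// Hypothetical single-layer hook shape, for illustration only.
type TripHooks<R> = {
  before?: () => R | undefined; // return a value to short-circuit
  onError?: (error: unknown) => R | undefined; // return a value to recover
};

function runWithHooks<R>(hooks: TripHooks<R>, realWork: () => R): R {
  const synthetic = hooks.before?.();
  if (synthetic !== undefined) return synthetic; // real work is skipped

  try {
    return realWork();
  } catch (error) {
    const fallback = hooks.onError?.(error);
    if (fallback !== undefined) return fallback; // error is suppressed
    throw error; // nothing recovered it: propagate
  }
}

let modelCalls = 0;
const callModel = (): string => {
  modelCalls += 1;
  return "fresh answer";
};

// Short-circuit: `before` supplies the result, so the model is never called.
const cached = runWithHooks({ before: () => "cached answer" }, callModel);

// Recover: the work throws, `onError` supplies a replacement value.
const recovered = runWithHooks<string>(
  { onError: () => "fallback answer" },
  () => {
    throw new Error("provider down");
  },
);
```

After these two calls `modelCalls` is still 0, which is the property a cache hit relies on.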
The three built-ins
ai.middleware.budget
Cumulative token / USD cap across all trips of one execution.
```typescript
ai.middleware.budget({
  maxTokens: 50_000,
  maxCostUSD: 0.5,
  onExceeded: "abort", // or "warn"
});
```

A breach surfaces `BudgetExceededError` with `unit`, `limit`, and `actual`. Warn mode logs and continues, which is useful for measuring before enforcing.
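The cumulative behavior can be sketched without the library. In the sketch below, `BudgetBreach`, `createBudget`, and `TripUsage` are hypothetical stand-ins. The error fields follow the text above; the strict "greater than" comparison and the per-trip `record` call are assumptions.

```typescript
// Hypothetical stand-in for the library's BudgetExceededError. Field names
// follow the text (unit, limit, actual); everything else is assumed.
class BudgetBreach extends Error {
  unit: "tokens" | "usd";
  limit: number;
  actual: number;
  constructor(unit: "tokens" | "usd", limit: number, actual: number) {
    super(`budget exceeded: ${actual} ${unit} (limit ${limit})`);
    this.name = "BudgetBreach";
    this.unit = unit;
    this.limit = limit;
    this.actual = actual;
  }
}

type TripUsage = { tokens: number; costUSD: number };
type BudgetOptions = {
  maxTokens: number;
  maxCostUSD: number;
  onExceeded: "abort" | "warn";
};

function createBudget(options: BudgetOptions) {
  const spent: TripUsage = { tokens: 0, costUSD: 0 };
  const warnings: string[] = [];

  // Call once per LLM round-trip with that trip's usage.
  function record(trip: TripUsage): void {
    spent.tokens += trip.tokens;
    spent.costUSD += trip.costUSD;

    let breach: BudgetBreach | undefined;
    if (spent.tokens > options.maxTokens) {
      breach = new BudgetBreach("tokens", options.maxTokens, spent.tokens);
    } else if (spent.costUSD > options.maxCostUSD) {
      breach = new BudgetBreach("usd", options.maxCostUSD, spent.costUSD);
    }
    if (breach === undefined) return;
    if (options.onExceeded === "abort") throw breach;
    warnings.push(breach.message); // warn mode: note it and keep going
  }

  return { record, spent, warnings };
}

const budget = createBudget({ maxTokens: 50_000, maxCostUSD: 0.5, onExceeded: "abort" });
budget.record({ tokens: 30_000, costUSD: 0.2 }); // 30_000 total: within budget
// A second trip of 25_000 tokens would bring the total to 55_000 and throw.
```

The point of the running total is that no single trip has to be large for the run to breach the cap.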
ai.middleware.guardrail
Pre / post content checks.
```typescript
ai.middleware.guardrail({
  inputCheck: async (text) =>
    text.match(/\bSSN\b/) ? { ok: false, reason: "pii" } : { ok: true },
  outputCheck: async (text) =>
    text.length > 10_000 ? { ok: false, reason: "too-long" } : { ok: true },
});
```

A rejection surfaces `GuardrailViolationError` with `phase: "input" | "output"`. Output checks fire BEFORE tool dispatch: a rejected response means the tools it requested never run.
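The "check before dispatch" ordering is what keeps a rejected response from causing side effects. A minimal sketch of that ordering follows. `Verdict`, `SketchResponse`, and `handleTrip` are hypothetical names, a plain `Error` stands in for `GuardrailViolationError`, and the check is synchronous for brevity.

```typescript
// Hypothetical shapes for illustration; not the library's types.
type Verdict = { ok: boolean; reason?: string };
type SketchResponse = { text: string; toolCalls: string[] };

function handleTrip(
  response: SketchResponse,
  outputCheck: (text: string) => Verdict,
  dispatchTool: (name: string) => void,
): void {
  // The output check runs first...
  const verdict = outputCheck(response.text);
  if (!verdict.ok) {
    throw new Error(`guardrail violation (output): ${verdict.reason ?? "unspecified"}`);
  }
  // ...so tools are only dispatched for responses that passed.
  for (const name of response.toolCalls) dispatchTool(name);
}

const dispatched: string[] = [];
const tooLong = (text: string): Verdict =>
  text.length > 10_000 ? { ok: false, reason: "too-long" } : { ok: true };

try {
  handleTrip(
    { text: "x".repeat(10_001), toolCalls: ["sendEmail"] },
    tooLong,
    (name) => dispatched.push(name),
  );
} catch {
  // Rejected: `dispatched` is still empty, so sendEmail never ran.
}
```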
ai.middleware.semanticCache
Two-tier cache: exact-match first, then vector similarity.
```typescript
ai.middleware.semanticCache({
  embedder: openai.embedder({ name: "text-embedding-3-small" }),
  threshold: 0.95,
  ttlMs: 60 * 60 * 1000,
});
```

Hits return a synthetic `ModelResponse` with `usage: { total: 0 }`, so cost dashboards reflect the saving honestly. Trip-zero only: tool-using loops never cache tool-call responses (caching them would cause an infinite loop).
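The two-tier lookup can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `Entry`, `cosine`, and `lookup` are hypothetical, similarity is assumed to be cosine, and the vectors are toy 2-D values rather than real embeddings.

```typescript
// Hypothetical cache entry; real entries would hold a full model response.
type Entry = { prompt: string; vector: number[]; response: string; expiresAt: number };

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function lookup(
  entries: Entry[],
  prompt: string,
  vector: number[],
  threshold: number,
  now: number,
): string | undefined {
  const live = entries.filter((entry) => entry.expiresAt > now); // honor the TTL

  // Tier 1: exact match on the prompt text.
  const exact = live.find((entry) => entry.prompt === prompt);
  if (exact !== undefined) return exact.response;

  // Tier 2: best entry at or above the similarity threshold.
  let best: Entry | undefined;
  let bestScore = threshold;
  for (const entry of live) {
    const score = cosine(entry.vector, vector);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.response;
}

const entries: Entry[] = [
  { prompt: "What is the refund policy?", vector: [1, 0], response: "30 days", expiresAt: 1_000 },
];

const exactHit = lookup(entries, "What is the refund policy?", [0, 1], 0.95, 0);
const nearHit = lookup(entries, "refund policy?", [0.99, 0.05], 0.95, 0);
const miss = lookup(entries, "weather today?", [0, 1], 0.95, 0);
```

Here `exactHit` and `nearHit` both resolve to the stored response, while `miss` is `undefined` because the vectors are orthogonal. Checking exact matches first means the cheap tier can answer without comparing vectors at all.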
Ordering matters
The canonical order is [cache, budget, guardrail, observability]. Three reasons:
- Cache outermost when guardrails are present. Guardrails reject bad output by throwing in `trip.after`. `after` hooks run innermost-up, so if the cache sits inside the guardrail, the cache writes the bad response BEFORE the guardrail can reject it. Outermost cache means rejected outputs never poison it.
- Budget before guardrails. Guardrails may call classifiers with their own token costs.
- Observability last. It should see the final decision every other middleware made.
The order you pass to middleware: [...] IS the order they nest. First in the array = outermost.
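The cache-poisoning argument can be checked with a toy model. The sketch below assumes only what this page states: the first layer is outermost and `after` hooks run innermost-first. `Layer`, `runAfterHooks`, and `simulate` are hypothetical names, not library APIs.

```typescript
// Hypothetical layer with just an `after` hook.
type Layer = { after: (response: string) => void };

// layers[0] is outermost; `after` hooks run innermost-first.
function runAfterHooks(layers: Layer[], response: string): void {
  for (let i = layers.length - 1; i >= 0; i--) {
    layers[i]?.after(response);
  }
}

// Returns what ended up in the cache after one rejected response.
function simulate(order: "cache-outer" | "guardrail-outer"): string[] {
  const store: string[] = [];
  const cache: Layer = {
    after: (response) => {
      store.push(response); // cache write
    },
  };
  const guardrail: Layer = {
    after: (response) => {
      if (response.includes("SSN")) throw new Error("guardrail: pii");
    },
  };
  const layers = order === "cache-outer" ? [cache, guardrail] : [guardrail, cache];
  try {
    runAfterHooks(layers, "SSN 123-45-6789");
  } catch {
    // The response was rejected; what matters is what the cache holds now.
  }
  return store;
}

const safe = simulate("cache-outer"); // []
const poisoned = simulate("guardrail-outer"); // ["SSN 123-45-6789"]
```

With the cache outermost, the guardrail throws before the cache's `after` hook is reached, so nothing is written. With the order flipped, the write happens first and the rejection arrives too late.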
State scoping
Middleware sees `ctx.state`, a fresh `Map<string, unknown>` per `execute()` call. Use it for per-run scratch space (timing markers, accumulated metrics). Never close over module-level mutable state: concurrent calls would step on each other.
ctx.state does NOT cross execute() boundaries. One call → one fresh map.
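As a rough illustration of per-run scratch space, the sketch below counts tool calls in `ctx.state`. The `Ctx` type is a hypothetical minimal stand-in (the real context carries more), `fakeExecute` only simulates the fresh-map-per-call behavior described above, and the namespaced key is a habit of this sketch rather than a library requirement.

```typescript
// Hypothetical minimal context; only `state` is modeled here.
type Ctx = { state: Map<string, unknown> };

// Namespacing the key avoids collisions with other middleware.
const TOOL_CALLS_KEY = "count-tool-calls:seen";

const countToolCalls = {
  name: "count-tool-calls",
  tool: {
    before(ctx: Ctx): void {
      const seen = (ctx.state.get(TOOL_CALLS_KEY) as number | undefined) ?? 0;
      ctx.state.set(TOOL_CALLS_KEY, seen + 1);
    },
  },
};

// Simulates one execute() call: a fresh Map, then N tool dispatches.
function fakeExecute(toolCalls: number): number {
  const ctx: Ctx = { state: new Map() };
  for (let i = 0; i < toolCalls; i++) countToolCalls.tool.before(ctx);
  return (ctx.state.get(TOOL_CALLS_KEY) as number | undefined) ?? 0;
}

const firstRun = fakeExecute(3); // 3
const secondRun = fakeExecute(2); // 2, not 5: nothing leaks from the first run
```

Had the counter lived in a module-level variable instead, the second run would have started from 3, and two concurrent runs would have interleaved their counts.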
Where middleware does and doesn’t apply
| Place | Middleware effect |
|---|---|
| `ai.agent({ middleware: [...] })` | Wraps every `agent.execute()`. |
| Workflow step with `agent: myAgent` | The agent’s own middleware fires normally inside the step. |
| `workflow.asTool()` called from an agent | The calling agent’s tool-level middleware wraps the workflow. |
| Step-level / workflow-level / supervisor-level middleware | Does NOT exist in v1. Use agent-level middleware on agents inside the workflow / supervisor. |
Custom middleware — when to write your own
Write one when:
- You need to log differently per agent (logging into a per-tenant channel, for example).
- You’re measuring something the built-ins don’t expose (queue depth, downstream service health).
- You’re implementing a custom cache backend the `semanticCache` shape doesn’t fit.
Don’t write one when:
- The concern is one tool’s business — put it in the tool.
- The concern is the prompt — put it in the system prompt.
- The concern is the agent loop: `execute()` already has it.
Related
- Attach middleware: API surface, authoring rules, ordering invariants.
- Handle errors: `BudgetExceededError`, `GuardrailViolationError`.
- Persist AI data: `defaultStore` for semantic cache.