Workflows
A workflow is a named, ordered set of steps with a stable signature. Each step is exactly one of: an agent call, a free-form run function, or a parallel group of children. To run another workflow as a step, wrap it with `workflow.asTool()` and call it from a run step. Workflows are durable (they snapshot to any CacheDriver from @warlock.js/cache), observable, cancellable, and resumable.
This page is the mental model. For the API surface see Run workflow.
When to reach for a workflow
- Fixed pipeline shape known at author time: fetch → extract → classify → save.
- Survive a crash mid-pipeline and pick up where you left off.
- Mix LLM steps with non-LLM steps (embeddings, DB writes, external API calls) under one envelope.
- Branch on intermediate results: `nextStep` reads completed step outputs.
- Run children in parallel when they're independent.
Reach for a supervisor when the shape changes per call. Reach for an orchestrator (v2) when the session matters across runs.
The step lifecycle
Every step runs through the same phases:

    skip? → before? → (run | agent | parallel) → output.extract (+ schema) → after? → nextStep?

| Phase | What it owns |
|---|---|
| `skip` | Return true to skip the step entirely. Output becomes undefined. |
| `before` | Pre-work: fetch resources, set state, validate. |
| `run` | The core non-agent work, written as pure code. |
| `agent` | An agent to dispatch. `input(ctx)` builds the prompt from current state. |
| `parallel` | A list of child steps that fan out and settle together. |
| `output.extract` | Pull the step's output value out of ctx. Schema-validated if schema is supplied. |
| `after` | Post-work: save, notify, emit metrics. |
| `nextStep` | Step-level routing on success. Return `{ goto: "stepName" }`, `{ end: true }`, or void to fall through. |
| `onFailure` | Step-level recovery after retries exhaust. |
| `onCancel` | Cleanup if the signal aborts in flight. |
Errors in before / run / agent / after / output are retryable per the step’s retry config. Errors in nextStep / onFailure terminate the workflow immediately — those are programmer errors, not transient failures.
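The phase order and the retry boundary described above can be sketched in plain TypeScript. This is a minimal model under stated assumptions, not the library's implementation: `Ctx`, `StepDef`, `StepOutcome`, and `runStep` are hypothetical names, the sketch is synchronous, and it leaves out the agent and parallel variants as well as `nextStep`, `onFailure`, and `onCancel`.

```typescript
// Hypothetical shapes for illustration only; the real step definition is richer.
type Ctx = { state: Record<string, unknown>; output?: unknown };

type StepDef = {
  name: string;
  skip?: (ctx: Ctx) => boolean;
  before?: (ctx: Ctx) => void;
  run: (ctx: Ctx) => unknown;
  after?: (ctx: Ctx) => void;
  attempts?: number; // stand-in for the step's retry config
};

type StepOutcome = {
  status: "skipped" | "completed" | "failed";
  output?: unknown;
  attempts: number;
};

function runStep(step: StepDef, ctx: Ctx): StepOutcome {
  // skip? comes first: a skipped step never runs and its output stays undefined.
  if (step.skip?.(ctx)) return { status: "skipped", output: undefined, attempts: 0 };

  const maxAttempts = step.attempts ?? 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Each attempt works on a clone, so a failed attempt leaves shared state untouched.
    const attemptCtx: Ctx = { state: JSON.parse(JSON.stringify(ctx.state)) };
    try {
      step.before?.(attemptCtx);
      attemptCtx.output = step.run(attemptCtx);
      step.after?.(attemptCtx);
      ctx.state = attemptCtx.state; // commit only when every phase succeeded
      return { status: "completed", output: attemptCtx.output, attempts: attempt };
    } catch {
      // A retryable phase failed; loop again if attempts remain.
    }
  }
  return { status: "failed", attempts: maxAttempts };
}
```

In this sketch a step that throws twice and succeeds on the third attempt reports `attempts: 3`, and only the state written by the successful attempt is committed.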
Context: input vs state vs context
    type WorkflowContext<TInput, TState, TContext> = {
      readonly input: TInput; // frozen: the durable cause
      readonly context: TContext; // frozen: per-execution, never persisted
      readonly steps: Record<string, StepSnapshot>; // frozen snapshots of completed steps
      state: TState; // mutable: current shared state
      readonly agentResult?: AgentResult<unknown>; // set when current step has an agent
      readonly runId: string;
      readonly signal?: AbortSignal;
      readonly startedAt: Date;
    };

The three buckets answer three different questions:

- `input`: what are we processing? Frozen. Replayed verbatim on resume.
- `context`: who is running it? Tenancy, user, locale, traceId. Frozen. Never persisted; callers pass it fresh on every `execute()` and `resume()`.
- `state`: what's in flight right now? Mutable. Cloned on each retry attempt.
The split matters because of resume: when you re-run a workflow with resume(runId), the framework rehydrates input from the snapshot but needs context fresh from the caller (DB connections, request IDs, current user). Mixing them would persist things you don’t want persisted.
Resume invariant. Persistence-scoping fields in context (e.g. organizationId) MUST match across resume calls. The framework can’t enforce this — it doesn’t fingerprint context — but mismatching it silently corrupts data.
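The split can be made concrete with a small model of what gets written to storage and what the caller supplies again. This is only a sketch: `RunContext`, `Snapshot`, `toSnapshot`, and `rehydrate` are hypothetical names, and JSON stands in for whatever serialization the store actually uses.

```typescript
// Hypothetical shapes; only the input/state/context split comes from the text above.
type RunContext = { organizationId: string; traceId: string };
type Snapshot<TInput, TState> = { runId: string; input: TInput; state: TState };

// Persist the durable parts only: context is deliberately left out.
function toSnapshot<TInput, TState>(runId: string, input: TInput, state: TState): string {
  const snapshot: Snapshot<TInput, TState> = { runId, input, state };
  return JSON.stringify(snapshot);
}

// On resume, input and state come from storage; context comes fresh from the caller.
function rehydrate<TInput, TState>(raw: string, context: RunContext) {
  const snapshot = JSON.parse(raw) as Snapshot<TInput, TState>;
  return {
    runId: snapshot.runId,
    input: Object.freeze(snapshot.input),
    state: snapshot.state,
    context: Object.freeze({ ...context }),
  };
}
```

Because `organizationId` never reaches the serialized snapshot in this model, nothing in storage can detect a caller that resumes with a different one, which is why the invariant is the caller's responsibility.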
State vs steps[x].output — performance
    ctx.state.smallFlag = true;             // cheap: cloned on every retry
    ctx.state.giantHtml = await fetch(...); // expensive: the same clone on every retry

Heuristic: keep small control data in `state`, and return large artifacts from the producer's `output.extract`, then read them via `ctx.steps[prev].output`. `state` is cloned on every attempt; `steps` is cloned once when the step commits.
Parallel groups
    ai.step({
      name: "generate",
      parallel: [
        ai.step({ name: "draft", agent: writerAgent, input, output }),
        ai.step({ name: "suggest", agent: kbAgent, input, output }),
      ],
    });

Children share state, with last-write-wins on conflict. They settle as a unit: a failing child does not cancel its siblings, and the parent step records the first child's error. The checkpoint writes once after all children settle.
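The settle-together behaviour can be modelled with plain promises. The sketch below is an illustration, not the framework's code: `Child`, `ChildResult`, and `runParallel` are hypothetical names, and it assumes "first child's error" means the first failed child in declaration order.

```typescript
// Hypothetical model of a parallel group; names and shapes are illustrative.
type Child<S> = { name: string; run: (state: S) => Promise<unknown> };
type ChildResult = {
  name: string;
  status: "completed" | "failed";
  output?: unknown;
  error?: unknown;
};

async function runParallel<S>(children: Child<S>[], state: S) {
  // Wrap each child so a rejection becomes a recorded result instead of
  // short-circuiting the group; siblings keep running.
  const results: ChildResult[] = await Promise.all(
    children.map((child) =>
      child.run(state).then(
        (output): ChildResult => ({ name: child.name, status: "completed", output }),
        (error): ChildResult => ({ name: child.name, status: "failed", error }),
      ),
    ),
  );
  // Assumption: the group reports the first failed child in declaration order.
  const firstFailure = results.find((r) => r.status === "failed");
  return { results, error: firstFailure?.error };
}
```

Every child receives the same `state` object, so two children writing the same key race and the later write wins, which mirrors the last-write-wins rule above.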
Routing
Two routing hooks:

- `nextStep(ctx)`: fires on success. Reads completed step outputs and returns `{ goto: "stepName" }`, `{ end: true }`, or `void`.
- `onFailure(ctx, error)`: fires after retries exhaust. Same return shape. Use it for typed recovery: fall back to a cheaper model, escalate to a human, or halt with a clean error.
Guards: maxSteps (default 100) catches infinite loops. loopWarnAfter (default 5 revisits of one step) emits workflow.loop.warning.
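A routing loop with both guards might look roughly like the following. It is a sketch under assumptions: `Route`, `RoutedStep`, and `runWorkflow` are hypothetical names, `void` is modelled as `undefined`, warnings are collected in an array rather than emitted as events, and a "revisit" is taken to mean any visit after the first.

```typescript
// Hypothetical router loop; step and route shapes are illustrative.
type Route = { goto: string } | { end: true } | undefined;
type RoutedStep = {
  name: string;
  run: () => void;
  nextStep?: (visits: number) => Route;
};

function runWorkflow(steps: RoutedStep[], maxSteps = 100, loopWarnAfter = 5) {
  const visits = new Map<string, number>();
  const trace: string[] = [];
  const warnings: string[] = [];
  let index = 0;

  while (index < steps.length) {
    // maxSteps bounds the total number of executed steps, catching infinite loops.
    if (trace.length >= maxSteps) throw new Error(`maxSteps (${maxSteps}) exceeded`);

    const step = steps[index];
    const count = (visits.get(step.name) ?? 0) + 1;
    visits.set(step.name, count);
    // Warn once, when a single step reaches loopWarnAfter revisits.
    if (count - 1 === loopWarnAfter) warnings.push(`workflow.loop.warning: ${step.name}`);

    step.run();
    trace.push(step.name);

    const route = step.nextStep?.(count);
    if (route !== undefined && "end" in route) break;
    if (route !== undefined) {
      const targetName = route.goto;
      const target = steps.findIndex((s) => s.name === targetName);
      if (target === -1) throw new Error(`unknown step: ${targetName}`);
      index = target;
    } else {
      index += 1; // no route returned: fall through to the next step in order
    }
  }
  return { trace, warnings };
}
```

The warning is advisory and the loop keeps running; only `maxSteps` actually stops a runaway workflow in this model.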
Retries
    retry: {
      attempts: 3,
      backoff: "exponential", // "none" | "linear" | "exponential" | custom fn
      retryOn: (error, attempt) => true,
      onRetry: (attempt, error) => {},
    }

Exponential defaults: 500ms → 1s → 2s → 4s → 8s, capped at 30s. AbortError short-circuits retry; cancellation is final.
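The exponential schedule quoted above is consistent with doubling a 500 ms base and clamping at 30 s. A minimal sketch of that arithmetic follows; `backoffDelay`, `BASE_MS`, and `CAP_MS` are hypothetical names, and `"linear"` is left out because this page does not state its increments.

```typescript
// Assumed base and cap, taken from the defaults quoted above.
const BASE_MS = 500;
const CAP_MS = 30000;

type Backoff = "none" | "exponential" | ((attempt: number) => number);

// Delay in milliseconds before retry number `attempt` (1-based).
function backoffDelay(backoff: Backoff, attempt: number): number {
  if (typeof backoff === "function") return backoff(attempt); // custom fn decides
  if (backoff === "none") return 0;
  // Double the base for each further attempt, then clamp to the cap.
  return Math.min(BASE_MS * 2 ** (attempt - 1), CAP_MS);
}
```

Under these assumptions the cap first applies on the seventh retry, where the uncapped delay would be 32 s.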
Snapshot resume
After every step settles, the workflow checkpoints to its snapshotStore (or the global `ai.config({ defaultStore })`). On resume:

- Read the snapshot for `runId`.
- Compute the current signature from the workflow definition.
- If signatures match → rehydrate state, continue from the snapshot's `next`.
- If signatures differ → throw `WorkflowDriftError` without executing.
Drift is the framework refusing to silently corrupt your data. You then choose: discard the snapshot, force-resume (escape hatch for trivial edits), or migrate manually. See Persist AI data.
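The resume check can be sketched as a comparison made before any step runs. Everything here is an assumption for illustration: the signature is modelled as the ordered step names joined into a string (this page does not describe the real scheme), `StoredSnapshot`, `signatureOf`, and `resumeFrom` are hypothetical names, and the `WorkflowDriftError` class is a local stand-in for the framework's error.

```typescript
// Local stand-in for the framework's error type.
class WorkflowDriftError extends Error {
  constructor(readonly expected: string, readonly actual: string) {
    super(`workflow drift: snapshot has "${expected}", definition has "${actual}"`);
    this.name = "WorkflowDriftError";
  }
}

// Hypothetical snapshot shape: just enough to show the check.
type StoredSnapshot = {
  runId: string;
  signature: string;
  next: string;
  state: Record<string, unknown>;
};

// Hypothetical signature: the ordered step names joined into one string.
function signatureOf(stepNames: string[]): string {
  return stepNames.join(">");
}

function resumeFrom(snapshot: StoredSnapshot, stepNames: string[]) {
  const current = signatureOf(stepNames);
  if (snapshot.signature !== current) {
    // Refuse before executing anything, so no step runs against stale state.
    throw new WorkflowDriftError(snapshot.signature, current);
  }
  return { state: snapshot.state, startAt: snapshot.next };
}
```

With this toy signature, renaming, adding, or reordering a step changes the string and triggers drift, while edits inside a step body do not.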
Result envelope
    type WorkflowResult<TOutput> = {
      type: "workflow";
      data?: TOutput; // from workflow.output.extract
      report: WorkflowReport; // runId, signature, status, timings, per-step snapshots
      usage: Usage; // aggregated across all agent calls
      error?: AIError;
    };

`report.steps[name]` carries a frozen StepSnapshot per step with output, status, attempts, attemptHistory, timings, and (for parallel groups) nested children.
When a workflow isn’t the right shape
- Unknown shape at author time: wait for `ai.planner()` (v3), or model it as a supervisor where the router decides.
- Quality loop until goal met: use `ai.supervisor()` with `evaluate`.
- Multi-turn conversation with persistent session: orchestrator (v2). For now, model as supervisor + history.
- Iterate a runtime list of items: wrap a workflow with the `ai.batch()` utility.
Related
- Run workflow: the API surface.
- Supervisors: the next rung up.
- Persist AI data: snapshot resume + drift handling.