Warlock.js v4.4.0

Recipe — Provider fallback across OpenAI and Anthropic

Scenario. Your agent runs on OpenAI. During a regional incident OpenAI starts returning 429s and timeouts, and every customer request fails. You want the agent to transparently fail over to Anthropic for the duration of the blip — without retrying on errors that would fail identically downstream (a bad API key, an oversized prompt, a content-filter block), since those only burn the backup’s budget too.

ai.fallbackModel is a drop-in ModelContract that wraps an ordered list of models and advances to the next one only on a transient provider error. Hand it to any agent in place of a single model and the fall-over is invisible to the rest of your code.

Construct one SDK instance per provider, then order the models primary-first. The wrapper fronts the primary model’s identity, capabilities, and pricing for its whole lifetime.

```typescript
import { ai } from "@warlock.js/ai";
import { OpenAISDK } from "@warlock.js/ai-openai";
import { AnthropicSDK } from "@warlock.js/ai-anthropic";

const openai = new OpenAISDK({
  apiKey: process.env.OPENAI_API_KEY!,
  pricing: { "gpt-4o": { input: 2.5, output: 10 } },
});

const anthropic = new AnthropicSDK({
  apiKey: process.env.ANTHROPIC_API_KEY!,
  pricing: { "claude-sonnet-4-6": { input: 3, output: 15 } },
});

// Tries OpenAI first; on a transient provider error, falls over to Anthropic.
const resilientModel = ai.fallbackModel([
  openai.model({ name: "gpt-4o" }),
  anthropic.model({ name: "claude-sonnet-4-6" }),
]);

const agent = ai.agent({
  model: resilientModel,
  systemPrompt: ai.systemPrompt().instruction("Answer the support question concisely."),
});
```

What the default fails over on — and what it doesn’t


With no retryOn, the chain advances only on the built-in transient set:

| Code | Falls over? | Why |
| --- | --- | --- |
| PROVIDER_RATE_LIMIT | yes | 429: the backup likely has headroom |
| PROVIDER_TIMEOUT | yes | slow upstream: try a different one |
| PROVIDER_ERROR | yes | generic 5xx / network failure |
| PROVIDER_AUTH | no | a bad key fails identically downstream |
| CONTEXT_LENGTH_EXCEEDED | no | an oversized prompt fails on every model |
| CONTENT_FILTER | no | a blocked prompt is blocked everywhere |
| PROVIDER_INVALID_REQUEST | no | a malformed request fails everywhere |

Non-transient failures re-throw immediately with their original typed AIError — falling over on them only burns the backup’s budget on input that fails the same way.
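The default policy amounts to a membership test on the error code. The sketch below is a self-contained illustration, not the library's source: the `AIErrorCode` union is a local stand-in limited to the seven codes listed here, and `DEFAULT_TRANSIENT_CODES` and `shouldFailOver` are hypothetical names.

```typescript
// Local stand-in for the library's AIErrorCode union; only the seven codes
// from the table are included. The real union may contain more.
type AIErrorCode =
  | "PROVIDER_RATE_LIMIT"
  | "PROVIDER_TIMEOUT"
  | "PROVIDER_ERROR"
  | "PROVIDER_AUTH"
  | "CONTEXT_LENGTH_EXCEEDED"
  | "CONTENT_FILTER"
  | "PROVIDER_INVALID_REQUEST";

// Hypothetical name: the codes the default policy treats as transient.
const DEFAULT_TRANSIENT_CODES: ReadonlySet<AIErrorCode> = new Set<AIErrorCode>([
  "PROVIDER_RATE_LIMIT",
  "PROVIDER_TIMEOUT",
  "PROVIDER_ERROR",
]);

// Returns true when a different model might plausibly succeed where this one failed.
function shouldFailOver(code: AIErrorCode): boolean {
  return DEFAULT_TRANSIENT_CODES.has(code);
}
```

Anything outside the set, such as `PROVIDER_AUTH`, is treated as fatal and stops the chain at the model that raised it.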

execute() never throws — a success on any model in the chain returns normally, and the usage is aggregated across every model that was attempted. When the whole chain is exhausted, the final model’s error surfaces on result.error.

```typescript
const result = await agent.execute("How do I reset my password?", {
  sessionId: tenantId,
});

if (result.error) {
  // Every model in the chain failed. The error is the LAST model's, with
  // its original code preserved.
  logger.error("fallback chain exhausted", {
    code: result.error.code,
    category: result.error.category,
  });
  throw result.error;
}

// Which providers were burned before we got an answer? Empty when the
// primary succeeded outright; one entry per failed-over model otherwise.
for (const attempt of resilientModel.lastAttempts) {
  metrics.increment("ai.provider.failover", {
    from: attempt.provider,
    model: attempt.modelName,
  });
}

return result.text;
```

lastAttempts is a FallbackAttempt[] ({ modelName, provider, error }), overwritten on each call — read it right after the execute() that produced it.
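To make the chain's control flow concrete, here is a minimal sketch of an ordered fallback loop that records one attempt per failed-over model. It illustrates the general technique and is not the library's implementation: `SketchModel`, `SketchAttempt`, and `completeWithFallback` are hypothetical names, and the sketch returns the attempts with the result instead of storing them on the wrapper.

```typescript
// Hypothetical stand-ins; none of these are the library's own types.
interface SketchModel {
  name: string;
  provider: string;
  complete(prompt: string): Promise<string>;
}

interface SketchAttempt {
  modelName: string;
  provider: string;
  error: unknown;
}

async function completeWithFallback(
  models: SketchModel[],
  prompt: string,
  shouldFailOver: (error: unknown) => boolean,
): Promise<{ text: string; attempts: SketchAttempt[] }> {
  if (models.length === 0) throw new Error("at least one model is required");

  const attempts: SketchAttempt[] = [];
  let position = 0;

  for (const model of models) {
    position += 1;
    try {
      const text = await model.complete(prompt);
      return { text, attempts }; // first success wins
    } catch (error) {
      const isLast = position === models.length;
      // A fatal error, or a failure of the last model, surfaces unchanged.
      if (isLast || !shouldFailOver(error)) throw error;
      attempts.push({ modelName: model.name, provider: model.provider, error });
    }
  }

  // Not reachable: the last iteration always returns or throws.
  throw new Error("fallback chain ended without a result");
}
```

Returning the attempts alongside the text sidesteps the overwrite-on-each-call caveat, at the cost of a different call shape from the one the recipe uses.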

retryOn replaces the default set rather than adding to it. Pass an explicit AIErrorCode[] to pin exactly which codes fail over. For example, to fail over on rate limits and timeouts only and treat generic 5xx errors as fatal:

```typescript
const rateLimitAndTimeoutOnly = ai.fallbackModel(
  [openai.model({ name: "gpt-4o" }), anthropic.model({ name: "claude-sonnet-4-6" })],
  { retryOn: ["PROVIDER_RATE_LIMIT", "PROVIDER_TIMEOUT"] },
);
```

Or pass a predicate for arbitrary branching, e.g. matching on the typed error classes instead of on codes. The predicate in this example fails over on any rate-limit or timeout error and on nothing else:

```typescript
import { ProviderRateLimitError, ProviderTimeoutError } from "@warlock.js/ai";

const smartFallback = ai.fallbackModel(
  [openai.model({ name: "gpt-4o" }), anthropic.model({ name: "claude-sonnet-4-6" })],
  {
    retryOn: (error) =>
      error instanceof ProviderRateLimitError ||
      error instanceof ProviderTimeoutError,
  },
);
```
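A predicate can also carry state, for instance failing over on timeouts only once the provider has been slow for a while. The sketch below assumes the predicate receives an error exposing a `code` field, as `result.error.code` suggests. `CodedError`, `createSlowProviderPredicate`, the threshold, and the window are all hypothetical and not part of the library.

```typescript
// Hypothetical minimal error shape; the library's AIError carries more fields.
interface CodedError {
  code: string;
}

// Builds a predicate that always fails over on rate limits, but fails over on
// timeouts only after `timeoutThreshold` timeouts within the last `windowMs`.
function createSlowProviderPredicate(
  timeoutThreshold: number,
  windowMs: number,
  now: () => number = Date.now, // injectable clock, useful in tests
): (error: CodedError) => boolean {
  let recentTimeouts: number[] = [];

  return (error) => {
    if (error.code === "PROVIDER_RATE_LIMIT") return true;
    if (error.code !== "PROVIDER_TIMEOUT") return false;

    const current = now();
    // Drop timeouts that have aged out of the window, then record this one.
    recentTimeouts = recentTimeouts.filter((t) => current - t < windowMs);
    recentTimeouts.push(current);
    return recentTimeouts.length >= timeoutThreshold;
  };
}
```

If the library accepts a predicate of this shape, the result could be passed as `retryOn`. Because the closure holds state, one predicate instance should be shared per provider chain, not rebuilt per request.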

The list takes any number of models, tried in order. A common shape is cheap-primary, premium-backup, different-provider-last — the chain only reaches the expensive model when the cheap ones are genuinely down:

```typescript
const tieredModel = ai.fallbackModel([
  openai.model({ name: "gpt-4o-mini" }), // cheap, tried first
  openai.model({ name: "gpt-4o" }), // same provider, more capable
  anthropic.model({ name: "claude-sonnet-4-6" }), // different provider, last resort
]);
```

Each successful call’s usage is aggregated across every model attempted, and cost merges per channel — an unpriced model in the chain never erases a priced one’s cost in the rolled-up usage.cost.
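One way to picture the per-channel merge: sum token counts unconditionally, and sum each cost channel only where at least one side reports it, so a missing price never turns a known cost into nothing. This is a sketch under assumptions. `SketchUsage` is a hypothetical shape, and reading "channel" as the input and output sides of the price is an interpretation, not something the library's types confirm.

```typescript
// Hypothetical usage shape; the library's real usage type may differ.
interface SketchUsage {
  inputTokens: number;
  outputTokens: number;
  cost?: { input?: number; output?: number };
}

// Adds two optional numbers, treating "undefined" as "not reported", not as zero.
function addDefined(x?: number, y?: number): number | undefined {
  if (x === undefined) return y;
  if (y === undefined) return x;
  return x + y;
}

function mergeUsage(a: SketchUsage, b: SketchUsage): SketchUsage {
  const merged: SketchUsage = {
    inputTokens: a.inputTokens + b.inputTokens,
    outputTokens: a.outputTokens + b.outputTokens,
  };

  const input = addDefined(a.cost?.input, b.cost?.input);
  const output = addDefined(a.cost?.output, b.cost?.output);

  // Only attach a cost object when at least one channel has a known value.
  if (input !== undefined || output !== undefined) {
    merged.cost = {};
    if (input !== undefined) merged.cost.input = input;
    if (output !== undefined) merged.cost.output = output;
  }
  return merged;
}
```

Merging a priced model's usage with an unpriced one keeps the priced cost intact, and merging two unpriced usages leaves `cost` absent, so a missing price is never reported as a zero.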

  • Pick a provider — one SDK per provider, capabilities, and ModelPricing.
  • Handle errors — the AIErrorCode union and which codes are transient.
  • Cost per tenant — aggregating the usage a failed-over run still reports.