Warlock.js v4

Benchmark

Performance problems hide in the operations you’re not looking at. The endpoint that’s usually fast but spikes to a second twice a day. The cache lookup that’s instant in dev but takes 300ms when the cache is cold. The third-party API that started getting slower last week and nobody noticed.

measure() is the wrapper that surfaces all of it. Time any function, classify the result against latency thresholds you pick, fire hooks for downstream observability, and optionally accumulate percentile stats over time. Two optional helpers build on the same primitive: BenchmarkProfiler for p50/p95/p99 and BenchmarkSnapshots for raw error captures.

measure(name, fn, options)
       │  runs fn(), times it
       ▼
┌──────────────────────┐
│ BenchmarkResult      │ ← discriminated by `success`
│  - success: true     │
│  - value: T          │
│  - latency: 247      │
│  - state: "good"     │
│  - tags: { … }       │
└──────┬───────────────┘
       ├─ profiler.record(result)?      → adds latency to rolling p50/p95/p99 buckets
       ├─ snapshotContainer.record(…)?  → stores the full result for post-mortem
       ├─ onComplete(result)?           → success-only hook
       ├─ onError(result)?              → error-only hook
       └─ onFinish(result)              → always fires last

Three things to internalize before we go further:

  1. measure() never throws. It always returns. If fn() throws, you get a BenchmarkErrorResult: result.success === false, and result.error holds the thrown value. The one exception is shouldBenchmarkError returning false, which re-throws (covered below).
  2. The return type is discriminated. result.success narrows the type — result.value only exists on success, result.error only on failure. TypeScript will refuse to let you access the wrong one.
  3. Hooks are fire-and-forget. onComplete/onError run, then onFinish runs, then the result is returned. Throwing inside a hook crashes the call — keep them side-effect-only.

import { measure } from "@warlock.js/core";

const result = await measure("db.findUser", () => db.users.findOne({ id }));

if (result.success) {
  console.log(result.value); // the user object
  console.log(result.latency); // 42
  console.log(result.state); // "good"
} else {
  console.error(result.error); // the thrown error
}

That’s the entire happy path. Wrap a function, get a typed result, branch on success.

Both success and error share these fields:

{
  name: string; // your measurement name
  latency: number; // ms (rounded)
  state: "excellent" | "good" | "poor"; // see latencyRange
  tags?: Record<string, string>; // whatever you passed in options.tags
  startedAt: Date;
  endedAt: Date;
}

Plus, discriminated by success:

// success
{ success: true, value: T }
// error
{ success: false, error: unknown }

That’s the lot. value only on success, error only on failure — the discriminant on success narrows the type for you.
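
The narrowing described above can be sketched with locally declared types. The type names here mirror the shapes listed in this section but are written out for illustration rather than imported from the library, and `describeResult` is a hypothetical helper:

```typescript
// Local stand-ins for the shapes described above (not imported from the library).
type BenchmarkState = "excellent" | "good" | "poor";

type CommonFields = {
  name: string;
  latency: number; // ms
  state: BenchmarkState;
  tags?: Record<string, string>;
  startedAt: Date;
  endedAt: Date;
};

type SuccessResult<T> = CommonFields & { success: true; value: T };
type ErrorResult = CommonFields & { success: false; error: unknown };
type Result<T> = SuccessResult<T> | ErrorResult;

// Hypothetical helper: branch on `success` and let TypeScript narrow the rest.
function describeResult<T>(result: Result<T>): string {
  if (result.success) {
    // `result.value` is available here; `result.error` would be a compile error.
    return `${result.name} ok in ${result.latency}ms: ${JSON.stringify(result.value)}`;
  }
  // `result.error` is available here; `result.value` would be a compile error.
  const message =
    result.error instanceof Error ? result.error.message : String(result.error);
  return `${result.name} failed in ${result.latency}ms: ${message}`;
}
```

Because `error` is typed `unknown`, the failure branch has to check what was thrown before reading `.message`; anything can be thrown in JavaScript, not only `Error` instances.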

Without thresholds, every successful result is "good" and every failure is "poor". Set latencyRange and you get a meaningful state:

await measure("db.findUser", () => db.users.findOne({ id }), {
  latencyRange: { excellent: 100, poor: 500 },
});
// latency <= 100ms → state: "excellent"
// 100ms < latency < 500ms → state: "good"
// latency >= 500ms → state: "poor"

The two boundary fields are exactly excellent (the upper bound for “excellent”) and poor (the lower bound for “poor”). Anything between is "good".
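
To make the boundary behavior concrete, here is a minimal sketch of the classification rule as this section describes it. `classifyLatency` and `LatencyRange` are hypothetical names used only for this illustration; the library's internal implementation may differ:

```typescript
type BenchmarkState = "excellent" | "good" | "poor";
type LatencyRange = { excellent: number; poor: number };

// Hypothetical helper mirroring the documented boundaries.
function classifyLatency(latency: number, range: LatencyRange): BenchmarkState {
  if (latency <= range.excellent) return "excellent";
  if (latency >= range.poor) return "poor";
  return "good";
}

const thresholds: LatencyRange = { excellent: 100, poor: 500 };

classifyLatency(100, thresholds); // "excellent" (upper bound is inclusive)
classifyLatency(101, thresholds); // "good"
classifyLatency(499, thresholds); // "good"
classifyLatency(500, thresholds); // "poor" (lower bound is inclusive)
```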

Most apps set this globally rather than per-call. Drop it in src/config/benchmark.ts and measure() reads it as the fallback — only override per-call when one operation has wildly different expectations.

src/config/benchmark.ts
import {
  BenchmarkProfiler,
  ConsoleChannel,
  type BenchmarkConfigurations,
} from "@warlock.js/core";

const benchmarkConfig: BenchmarkConfigurations = {
  enabled: true,
  latencyRange: { excellent: 100, poor: 500 },
  profiler: new BenchmarkProfiler({
    maxSamples: 1000,
    channels: [new ConsoleChannel()],
    flushEvery: 60_000,
  }),
};

export default benchmarkConfig;

Every measure() call now defaults to those thresholds and records into that profiler. Per-call options always win — override latencyRange on a specific measurement and the global is ignored for that call.

Three optional callbacks. onComplete or onError fires (never both), then onFinish always fires:

await measure("send-email", () => mailer.send(payload), {
  latencyRange: { excellent: 200, poor: 2000 },
  onComplete: (result) => metrics.record(result.latency),
  onError: (result) => logger.error("email failed", result.error),
  onFinish: (result) => logger.info(`${result.name} → ${result.state}`),
});

onFinish is the most useful default — it sees both branches and runs unconditionally. Reach for onComplete/onError when the success and failure paths need different observability (e.g., bump a different metric).
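
The firing order can be illustrated with a tiny stand-in wrapper. `measureSketch` and `traceHooks` below are hypothetical and exist only to show the ordering described in this section (the hooks take no arguments here to keep the sketch short); they are not the library's code:

```typescript
// Simplified hook set: the real hooks receive the result, omitted here for brevity.
type Hooks = {
  onComplete?: () => void;
  onError?: () => void;
  onFinish?: () => void;
};

// Stand-in wrapper written only to illustrate hook order.
async function measureSketch(fn: () => Promise<unknown>, hooks: Hooks): Promise<void> {
  let succeeded = true;
  try {
    await fn();
  } catch {
    succeeded = false;
  }
  // Exactly one of onComplete / onError runs, then onFinish.
  if (succeeded) hooks.onComplete?.();
  else hooks.onError?.();
  hooks.onFinish?.();
}

// Records the order in which hooks fire for a given function.
async function traceHooks(fn: () => Promise<unknown>): Promise<string[]> {
  const calls: string[] = [];
  await measureSketch(fn, {
    onComplete: () => calls.push("onComplete"),
    onError: () => calls.push("onError"),
    onFinish: () => calls.push("onFinish"),
  });
  return calls;
}
```

Under these assumptions, a resolving function yields `["onComplete", "onFinish"]` and a rejecting one yields `["onError", "onFinish"]`.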

Tags — metadata you’ll thank yourself for

Tags ride along on the result and on every profiler/snapshot record. They don’t change behavior; they’re metadata you grep on later:

await measure("http.outbound", () => fetch(url), {
  tags: { service: "stripe", endpoint: "/charges" },
});

When you’re staring at three operations with the same name across five services, tags are what tell you which one is which. Use them.

Selective error capture — shouldBenchmarkError

Business errors (400 Bad Request, ValidationError) are not infrastructure problems. They’re not slow because the database is hot — they’re “fast” because the validator rejected the input in two milliseconds. Including them in latency stats poisons the percentiles.

Return false from shouldBenchmarkError to re-throw the error without producing a benchmark result:

import { ValidationError } from "@warlock.js/seal";

await measure("create-user", () => createUser(input), {
  shouldBenchmarkError: (error) => !(error instanceof ValidationError),
});

The default is true — every thrown error becomes a BenchmarkErrorResult. Override only when you have a specific class of errors that should bypass benchmarking entirely.

This is the one case where measure() does throw. If shouldBenchmarkError returns false, the error propagates up to the caller — the wrapper is no longer in the picture.
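
When more than one error class should bypass benchmarking, the predicate can be built from a list. This is a sketch under assumptions: both error classes below are hypothetical placeholders for your own business errors, and the resulting function is meant to be passed as the `shouldBenchmarkError` option:

```typescript
// Hypothetical business-error classes; substitute your own.
class ValidationError extends Error {}
class NotFoundError extends Error {}

// Error classes that should bypass benchmarking and be re-thrown to the caller.
const businessErrors = [ValidationError, NotFoundError];

// Returns true when the error should be recorded as a benchmark result.
const shouldBenchmarkError = (error: unknown): boolean =>
  !businessErrors.some((ErrorClass) => error instanceof ErrorClass);
```

Keeping the list in one place means every call site skips the same classes, so percentiles stay comparable across operations.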

Wrapping costs almost nothing — one performance.now() and a closure — but if you want a literal no-op for a hot path:

const result = await measure("hot-path", () => work(), { enabled: false });
// result.latency === 0
// result.state === "excellent"
// no hooks fire
// no profiler record, no snapshot

fn() still runs and its return value still lands in result.value. The wrapper just skips timing, hooks, and recording. Useful for keeping uniform call sites — the same code reads cleaner whether timing is on or off.

You can flip the global enabled field in src/config/benchmark.ts to turn timing off framework-wide; per-call enabled: false overrides for one site.

For high-volume operations where per-call hooks are too noisy, you want p50/p95/p99 across a window:

import { BenchmarkProfiler, ConsoleChannel, measure } from "@warlock.js/core";

const profiler = new BenchmarkProfiler({
  maxSamples: 1000, // ring buffer per operation name
  channels: [new ConsoleChannel()], // where stats go on flush()
  flushEvery: 60_000, // auto-flush every minute
});

for (let i = 0; i < 100; i++) {
  await measure(
    "db.findUser",
    () => db.users.findOne({ id: i }),
    { profiler, latencyRange: { excellent: 50, poor: 300 } },
  );
}

const stats = profiler.stats("db.findUser");
// { p50, p90, p95, p99, avg, min, max, count, errors, errorRate, firstSeenAt, lastSeenAt }

profiler.stats(name) returns the snapshot synchronously — useful for ad-hoc inspection. profiler.flush() hands allStats() to every registered channel, then continues accumulating into the same ring buffer.
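
For intuition about what a p95 over a sample window means, here is a sketch of the nearest-rank method. This page does not say which percentile method the profiler uses, so treat this as one common definition rather than the library's algorithm; `percentile` is a hypothetical helper:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Latencies 1..100 ms make the ranks easy to verify by hand.
const latencies = Array.from({ length: 100 }, (_, i) => i + 1);

percentile(latencies, 50); // 50
percentile(latencies, 95); // 95
percentile(latencies, 99); // 99
```

With a small window the high percentiles collapse toward the maximum: over the four samples `[10, 20, 30, 40]`, this method reports 40 for both p95 and p99, which is one reason a generous `maxSamples` matters.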

Wire it in src/config/benchmark.ts and every measure() call records by default — no per-call wiring needed.

A channel is anything that implements onFlush(stats):

interface BenchmarkChannel {
  onFlush(stats: Record<string, BenchmarkStats>): void | Promise<void>;
}

Two are built in:

  • ConsoleChannel — pretty-prints a console.table per operation. Useful in dev.
  • NoopChannel — the default. Drops the call. Use when you want stats accessible via profiler.stats(name) without any external emission.

Custom channels are a one-class job:

import type { BenchmarkChannel, BenchmarkStats } from "@warlock.js/core";

// `datadog` stands for your metrics client; it is not defined in this snippet.
export class DatadogChannel implements BenchmarkChannel {
  public async onFlush(stats: Record<string, BenchmarkStats>): Promise<void> {
    for (const [name, operationStats] of Object.entries(stats)) {
      await datadog.gauge(`latency.${name}.p95`, operationStats.p95);
      await datadog.gauge(`latency.${name}.p99`, operationStats.p99);
      await datadog.gauge(`latency.${name}.error_rate`, operationStats.errorRate);
    }
  }
}

Pass it via channels: [new DatadogChannel()] and every flush() (manual or auto) pushes percentiles to Datadog.
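
A channel that needs no external service can be handy in tests. The sketch below declares local copies of the channel interface and a trimmed-down stats shape so it stands alone; in an application those types would come from the library, and `MemoryChannel` with its `slowOperations` method is a hypothetical example:

```typescript
// Local copies so the sketch stands alone; the stats shape is trimmed to a few fields.
type BenchmarkStats = { p50: number; p95: number; p99: number; count: number; errorRate: number };

interface BenchmarkChannel {
  onFlush(stats: Record<string, BenchmarkStats>): void | Promise<void>;
}

// Hypothetical channel that keeps every flushed batch in memory.
class MemoryChannel implements BenchmarkChannel {
  public readonly flushes: Record<string, BenchmarkStats>[] = [];

  public onFlush(stats: Record<string, BenchmarkStats>): void {
    // Shallow-copy the batch so later changes to the caller's object don't alter what was captured.
    this.flushes.push({ ...stats });
  }

  // Names of operations whose p95 exceeded the given budget in the latest flush.
  public slowOperations(p95BudgetMs: number): string[] {
    const latest = this.flushes[this.flushes.length - 1] ?? {};
    return Object.entries(latest)
      .filter(([, operationStats]) => operationStats.p95 > p95BudgetMs)
      .map(([operationName]) => operationName);
  }
}
```

Registered through `channels`, a channel like this would let a test flush the profiler and assert on the captured percentiles without a network call.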

Percentiles tell you “we got slow.” Snapshots tell you “here are the exact requests that got slow.” For post-mortem work:

import { BenchmarkSnapshots, measure } from "@warlock.js/core";

const snapshots = new BenchmarkSnapshots({
  maxSnapshots: 100,
  capture: "error", // "error" (default, safe) | "value" | "all"
});

await measure("payment.charge", () => stripe.charge(payload), {
  snapshotContainer: snapshots,
});

const failed = snapshots.getSnapshots("payment.charge");
// array of full BenchmarkErrorResult objects: error, latency, startedAt, tags

The capture setting matters:

| capture | What’s stored | Memory profile |
| --- | --- | --- |
| "error" | Only BenchmarkErrorResult. The default. | Bounded by failure rate. Safe in production. |
| "value" | Only BenchmarkSuccessResult<T> with the full return value. | Stores T in memory. |
| "all" | Both. | Stores T in memory. |

"value" and "all" keep references to whatever fn() returned. If that’s a database row, that’s fine. If it’s a streamed response or a large buffer, you’ve now kept it in memory until the ring buffer evicts it. Default to "error" unless you have a specific reason.

Wire snapshots globally in src/config/benchmark.ts the same way as the profiler.
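
The memory concern above comes down to how a bounded buffer holds references until eviction. The sketch below shows the general idea only; how BenchmarkSnapshots evicts internally is not documented on this page, and `BoundedBuffer` is a hypothetical class:

```typescript
// Hypothetical bounded buffer: keeps the newest `capacity` items, dropping the oldest.
class BoundedBuffer<T> {
  private items: T[] = [];

  public constructor(private readonly capacity: number) {}

  public push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) {
      this.items.shift(); // evict the oldest entry; the buffer's reference to it is dropped here
    }
  }

  public toArray(): T[] {
    return [...this.items];
  }
}

const recentValues = new BoundedBuffer<number>(3);
for (const value of [1, 2, 3, 4, 5]) recentValues.push(value);

recentValues.toArray(); // [3, 4, 5]
```

Whatever was pushed stays reachable until enough newer items arrive, so worst-case memory is roughly the capacity multiplied by the size of the largest stored value.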

Every useCase() wraps its pipeline in measure() by default. You get a per-use-case latency and a benchmarkResult field on every execution snapshot without writing any timing code. The defaults are reasonable:

{
  enabled: true,
  latencyRange: {
    excellent: 100,
    poor: 200,
  },
}

Override per use-case via benchmarkOptions:

import { useCase } from "@warlock.js/core";

export const placeOrder = useCase<Order, PlaceOrderInput>({
  name: "place_order",
  handler: async (input) => placeOrderService(input),
  benchmarkOptions: {
    latencyRange: { excellent: 200, poor: 1000 },
    tags: { domain: "orders" },
    onComplete: (result) => metrics.histogram("place_order.latency", result.latency),
  },
});

Set benchmarkOptions: false to disable benchmarking for one use-case — useful for use-cases that wrap genuinely long-running work where latency stats don’t carry meaning.

When the use-case has both retryOptions and benchmarkOptions, the latency you get is the total wall-clock time including retries. That’s almost always what you want for SLO tracking — your customers don’t care that you retried three times, they care it took 1.2 seconds.

const result = await measure("create-order", () => createOrderService(input));
if (!result.success) {
return response.badRequest({ error: t("order.failed") });
}
return response.successCreate({ order: result.value });

The result is your error-handling fork and your latency tracker. Less ceremony than a try/catch.

const result = await measure(
  "stripe.charge",
  () => stripe.charges.create({ amount, currency, source }),
  {
    latencyRange: { excellent: 200, poor: 3000 },
    tags: { gateway: "stripe", currency },
    shouldBenchmarkError: (error) => error instanceof NetworkError,
  },
);

Higher thresholds (Stripe round-trips are slow), tagged for slicing, and only network errors benchmarked: anything else, validation errors included, is re-thrown to the caller.

import { measure, retry } from "@warlock.js/core";

const result = await measure("publish-event", () =>
  retry(() => bus.publish(event), { count: 3, delay: 200 }),
);

console.log(result.latency); // total wall-clock, including all retry attempts

measure() on the outside captures the SLO you actually care about. See Retry for the composition story in full.

import { config, BenchmarkProfiler } from "@warlock.js/core";
const profiler = config.get("benchmark").profiler as BenchmarkProfiler;
console.table(profiler.allStats());

If you’ve wired a profiler in src/config/benchmark.ts, you can drop into a debug endpoint and dump the current percentiles at any time. No flush needed — allStats() reads the live ring buffers.

  • Name collisions aggregate. Two calls to measure("foo", …) from different code paths share one profiler bucket. Make the name specific ("db.findUser", not "db.query") so percentiles actually mean something.
  • measure() doesn’t propagate AbortSignal. If fn is cancellable, plumb the signal through yourself. The wrapper only times.
  • Don’t measure() synchronous trivia. A Math.round call isn’t worth the microsecond of overhead and the noise in your stats. Reserve measure() for things that can be slow — I/O, computation that scales with input size, anything crossing a network or disk.
  • Snapshots with "value" retain references. If value holds a streamed body or a large buffer, you’ve kept it in memory until eviction. Default capture: "error" keeps you safe.
  • shouldBenchmarkError re-throws. Make sure the caller is ready for an unwrapped throw on that error class. The discriminated-result contract holds for every other path; this one carves out an exception.
  • Hook errors crash the call. Throwing inside onComplete/onError/onFinish propagates up and discards the measurement. Keep hooks side-effect-only; wrap risky work in their own try/catch.
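
For the last point, one way to keep a risky hook from breaking the measured call is to wrap it before passing it in. `safeHook` is a hypothetical helper, not something the library is documented to export:

```typescript
// Hypothetical helper: wraps a hook so that anything it throws is reported instead of propagating.
function safeHook<Args extends unknown[]>(
  hook: (...args: Args) => void,
  onHookError: (error: unknown) => void = () => {},
): (...args: Args) => void {
  return (...args: Args): void => {
    try {
      hook(...args);
    } catch (error) {
      onHookError(error); // report, but never let the hook break the measured call
    }
  };
}

// Usage: the wrapped hook swallows its own failure and reports it to the second callback.
const hookErrors: unknown[] = [];
const recordLatency = safeHook(
  (latency: number) => {
    if (latency > 100) throw new Error("metrics client down");
  },
  (error) => hookErrors.push(error),
);

recordLatency(50); // no error
recordLatency(250); // throws inside, caught and pushed to hookErrors
```

A wrapped hook can then be passed as `onComplete`, `onError`, or `onFinish`. This sketch only guards synchronous throws; a hook that returns a rejecting promise would need its own `.catch`.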