Benchmark
Performance problems hide in the operations you’re not looking at. The endpoint that’s usually fast but spikes to a second twice a day. The cache lookup that’s instant in dev but takes 300ms when the cache is cold. The third-party API that started getting slower last week and nobody noticed.
measure() is the wrapper that surfaces all of it. Time any function, classify the result against latency thresholds you pick, fire hooks for downstream observability, and optionally accumulate percentile stats over time. Around the same primitive, two optional helpers — BenchmarkProfiler for p50/p95/p99, BenchmarkSnapshots for raw error captures.
The mental model
measure(name, fn, options)
       │
       │  runs fn(), times it
       ▼
┌──────────────────────┐
│  BenchmarkResult     │  ← discriminated by `success`
│  - success: true     │
│  - value: T          │
│  - latency: 247      │
│  - state: "good"     │
│  - tags: { … }       │
└──────┬───────────────┘
       │
       ├─ profiler.record(result)?      → adds latency to rolling p50/p95/p99 buckets
       ├─ snapshotContainer.record(…)?  → stores the full result for post-mortem
       ├─ onComplete(result)?           → success-only hook
       ├─ onError(result)?              → error-only hook
       └─ onFinish(result)              → always fires last

Three things to internalize before we go further:
- measure() never throws. It always returns. If fn() throws, you get a BenchmarkErrorResult: result.success === false, and result.error holds the thrown value. The one exception is when shouldBenchmarkError returns false, which re-throws (covered below).
- The return type is discriminated. result.success narrows the type: result.value only exists on success, result.error only on failure. TypeScript will refuse to let you access the wrong one.
- Hooks are fire-and-forget. onComplete/onError run, then onFinish runs, then the result is returned. Throwing inside a hook crashes the call, so keep them side-effect-only.
The 30-second look
import { measure } from "@warlock.js/core";
const result = await measure("db.findUser", () => db.users.findOne({ id }));
if (result.success) {
  console.log(result.value); // the user object
  console.log(result.latency); // 42
  console.log(result.state); // "good"
} else {
  console.error(result.error); // the thrown error
}

That’s the entire happy path. Wrap a function, get a typed result, branch on success.
BenchmarkResult — what you get back
Both success and error share these fields:

{
  name: string; // your measurement name
  latency: number; // ms (rounded)
  state: "excellent" | "good" | "poor"; // see latencyRange
  tags?: Record<string, string>; // whatever you passed in options.tags
  startedAt: Date;
  endedAt: Date;
}

Plus, discriminated by success:

// success
{ success: true, value: T }

// error
{ success: false, error: unknown }

That’s the lot. value only on success, error only on failure; the discriminant on success narrows the type for you.
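The shape above can be modelled as a TypeScript discriminated union. The sketch below uses local stand-in types written from the fields listed here; the names SketchResult and describeResult are hypothetical, and nothing in it is imported from @warlock.js/core.

```typescript
// Local stand-in types modelled on the fields listed above.
// Assumptions for illustration only, not the library's exported types.
type SketchState = "excellent" | "good" | "poor";

type SketchBase = {
  name: string;
  latency: number; // ms
  state: SketchState;
  tags?: Record<string, string>;
  startedAt: Date;
  endedAt: Date;
};

type SketchSuccess<T> = SketchBase & { success: true; value: T };
type SketchError = SketchBase & { success: false; error: unknown };
type SketchResult<T> = SketchSuccess<T> | SketchError;

// Hypothetical helper: branching on `success` narrows the union, so `value`
// is only reachable in the first branch and `error` only in the second.
function describeResult<T>(result: SketchResult<T>): string {
  if (result.success) {
    return `${result.name} ok in ${result.latency}ms: ${JSON.stringify(result.value)}`;
  }
  const message =
    result.error instanceof Error ? result.error.message : String(result.error);
  return `${result.name} failed in ${result.latency}ms: ${message}`;
}
```

Because error is typed as unknown, the failure branch has to check what was thrown before reading a message from it.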
latencyRange — classifying speed
Without thresholds, every successful result is "good" and every failure is "poor". Set latencyRange and you get a meaningful state:

await measure("db.findUser", () => db.users.findOne({ id }), {
  latencyRange: { excellent: 100, poor: 500 },
});

// latency <= 100ms         → state: "excellent"
// 100ms < latency < 500ms  → state: "good"
// latency >= 500ms         → state: "poor"

The two boundary fields are exactly excellent (the upper bound for “excellent”) and poor (the lower bound for “poor”). Anything between is "good".
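The boundary rules are easy to get off by one, so here they are restated as a small function. This is a sketch of the rules as described in this section, not the library's implementation; classifyLatency and its types are hypothetical names, and it covers only the latency thresholds, not how failures are classified.

```typescript
// Hypothetical restatement of the threshold rules described above.
type LatencyState = "excellent" | "good" | "poor";
type LatencyRange = { excellent: number; poor: number };

function classifyLatency(latencyMs: number, range: LatencyRange): LatencyState {
  if (latencyMs <= range.excellent) return "excellent"; // at or below the "excellent" upper bound
  if (latencyMs >= range.poor) return "poor"; // at or above the "poor" lower bound
  return "good"; // strictly between the two bounds
}
```

With { excellent: 100, poor: 500 }, a latency of exactly 100 classifies as "excellent" and exactly 500 as "poor"; both bounds are inclusive.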
Most apps set this globally rather than per-call. Drop it in src/config/benchmark.ts and measure() reads it as the fallback — only override per-call when one operation has wildly different expectations.
import {
  BenchmarkProfiler,
  ConsoleChannel,
  type BenchmarkConfigurations,
} from "@warlock.js/core";

const benchmarkConfig: BenchmarkConfigurations = {
  enabled: true,
  latencyRange: { excellent: 100, poor: 500 },
  profiler: new BenchmarkProfiler({
    maxSamples: 1000,
    channels: [new ConsoleChannel()],
    flushEvery: 60_000,
  }),
};

export default benchmarkConfig;

Every measure() call now defaults to those thresholds and records into that profiler. Per-call options always win: override latencyRange on a specific measurement and the global is ignored for that call.
Three optional callbacks. onComplete or onError fires (never both), then onFinish always fires:
await measure("send-email", () => mailer.send(payload), {
  latencyRange: { excellent: 200, poor: 2000 },
  onComplete: (result) => metrics.record(result.latency),
  onError: (result) => logger.error("email failed", result.error),
  onFinish: (result) => logger.info(`${result.name} → ${result.state}`),
});

onFinish is the most useful default: it sees both branches and runs unconditionally. Reach for onComplete/onError when the success and failure paths need different observability (e.g., bump a different metric).
Tags — metadata you’ll thank yourself for
Tags ride along on the result and on every profiler/snapshot record. They don’t change behavior; they’re metadata you grep on later:

await measure("http.outbound", () => fetch(url), {
  tags: { service: "stripe", endpoint: "/charges" },
});

When you’re staring at three operations with the same name across five services, tags are what tell you which one is which. Use them.
Selective error capture — shouldBenchmarkError
Business errors (400 Bad Request, ValidationError) are not infrastructure problems. They’re not slow because the database is hot; they’re “fast” because the validator rejected the input in two milliseconds. Including them in latency stats poisons the percentiles.
Return false to re-throw the error without producing a benchmark result:
import { ValidationError } from "@warlock.js/seal";
await measure("create-user", () => createUser(input), {
  shouldBenchmarkError: (error) => !(error instanceof ValidationError),
});

The default is true: every thrown error becomes a BenchmarkErrorResult. Override only when you have a specific class of errors that should bypass benchmarking entirely.
This is the one case where measure() does throw. If shouldBenchmarkError returns false, the error propagates up to the caller — the wrapper is no longer in the picture.
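When several error classes count as business errors, one shared base class keeps the predicate short. The sketch below is one way to arrange that; BusinessError, InvalidInputError, and NotFoundError are hypothetical application classes, not part of @warlock.js/core or @warlock.js/seal.

```typescript
// Hypothetical base class for errors that should bypass benchmarking.
class BusinessError extends Error {
  constructor(message: string) {
    super(message);
    // Keeps instanceof reliable even when the code is compiled to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class InvalidInputError extends BusinessError {}
class NotFoundError extends BusinessError {}

// Returns true when the error should be recorded as a benchmark failure,
// false when it is a business error that should re-throw to the caller.
function shouldBenchmarkError(error: unknown): boolean {
  return !(error instanceof BusinessError);
}
```

A function like this could be passed as the shouldBenchmarkError option. Anything that is not a BusinessError, including non-Error values, is still benchmarked.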
enabled: false — pass-through mode
Wrapping costs almost nothing (one performance.now() and a closure), but if you want a literal no-op for a hot path:

const result = await measure("hot-path", () => work(), { enabled: false });
// result.latency === 0
// result.state === "excellent"
// no hooks fire
// no profiler record, no snapshot

fn() still runs and its return value still lands in result.value. The wrapper just skips timing, hooks, and recording. Useful for keeping uniform call sites: the same code reads cleaner whether timing is on or off.
You can flip the global enabled field in src/config/benchmark.ts to turn timing off framework-wide; per-call enabled: false overrides for one site.
BenchmarkProfiler — rolling percentiles
For high-volume operations where per-call hooks are too noisy, you want p50/p95/p99 across a window:
import { BenchmarkProfiler, ConsoleChannel, measure } from "@warlock.js/core";
const profiler = new BenchmarkProfiler({
  maxSamples: 1000, // ring buffer per operation name
  channels: [new ConsoleChannel()], // where stats go on flush()
  flushEvery: 60_000, // auto-flush every minute
});

for (let i = 0; i < 100; i++) {
  await measure(
    "db.findUser",
    () => db.users.findOne({ id: i }),
    { profiler, latencyRange: { excellent: 50, poor: 300 } },
  );
}

const stats = profiler.stats("db.findUser");
// { p50, p90, p95, p99, avg, min, max, count, errors, errorRate, firstSeenAt, lastSeenAt }

profiler.stats(name) returns the snapshot synchronously, which is useful for ad-hoc inspection. profiler.flush() hands allStats() to every registered channel, then continues accumulating into the same ring buffer.
Wire it in src/config/benchmark.ts and every measure() call records by default — no per-call wiring needed.
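To make "rolling percentiles over a bounded window" concrete, here is a minimal sketch of the idea. It assumes oldest-first eviction and the nearest-rank percentile method; the library may compute its percentiles differently, and LatencyWindow is a hypothetical class, not BenchmarkProfiler's internals.

```typescript
// Hypothetical bounded sample window; not the library's internal structure.
class LatencyWindow {
  private readonly samples: number[] = [];
  private readonly maxSamples: number;

  constructor(maxSamples: number) {
    this.maxSamples = maxSamples;
  }

  record(latencyMs: number): void {
    this.samples.push(latencyMs);
    if (this.samples.length > this.maxSamples) {
      this.samples.shift(); // evict the oldest sample
    }
  }

  // Nearest-rank percentile: the smallest sample that has at least
  // p percent of the window at or below it.
  percentile(p: number): number | undefined {
    if (this.samples.length === 0) return undefined;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }
}
```

Sorting on every read is fine for ad-hoc inspection of a window of about a thousand samples. A production profiler would more likely cache the sorted view or use a streaming estimator.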
BenchmarkChannel — where stats go
A channel is anything that implements onFlush(stats):

interface BenchmarkChannel {
  onFlush(stats: Record<string, BenchmarkStats>): void | Promise<void>;
}

Two are built in:
- ConsoleChannel: pretty-prints a console.table per operation. Useful in dev.
- NoopChannel: the default. Drops the call. Use when you want stats accessible via profiler.stats(name) without any external emission.
Custom channels are a one-class job:
import type { BenchmarkChannel, BenchmarkStats } from "@warlock.js/core";
export class DatadogChannel implements BenchmarkChannel {
  public async onFlush(stats: Record<string, BenchmarkStats>): Promise<void> {
    for (const [name, operationStats] of Object.entries(stats)) {
      await datadog.gauge(`latency.${name}.p95`, operationStats.p95);
      await datadog.gauge(`latency.${name}.p99`, operationStats.p99);
      await datadog.gauge(`latency.${name}.error_rate`, operationStats.errorRate);
    }
  }
}

Pass it via channels: [new DatadogChannel()] and every flush() (manual or auto) pushes percentiles to Datadog.
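A channel does not have to talk to an external service. The sketch below collects the names of operations whose p95 exceeds a budget, which is handy in tests or a debug endpoint. SketchStats and SketchChannel are local stand-ins trimmed to the fields used here, and SlowOperationChannel is a hypothetical class; none of them are imports from @warlock.js/core.

```typescript
// Local stand-ins for the shapes shown above, trimmed to the fields this sketch reads.
// Assumptions for illustration only.
type SketchStats = { p95: number; p99: number; errorRate: number };

interface SketchChannel {
  onFlush(stats: Record<string, SketchStats>): void | Promise<void>;
}

// Hypothetical channel: remembers which operations exceeded a p95 budget.
class SlowOperationChannel implements SketchChannel {
  public readonly slowOperations: string[] = [];
  private readonly p95BudgetMs: number;

  constructor(p95BudgetMs: number) {
    this.p95BudgetMs = p95BudgetMs;
  }

  public onFlush(stats: Record<string, SketchStats>): void {
    for (const [name, operationStats] of Object.entries(stats)) {
      if (operationStats.p95 > this.p95BudgetMs) {
        this.slowOperations.push(name);
      }
    }
  }
}
```

Since onFlush may return void or a Promise, a synchronous channel like this one satisfies the same contract as the async Datadog example.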
BenchmarkSnapshots — raw captures
Percentiles tell you “we got slow.” Snapshots tell you “here are the exact requests that got slow.” For post-mortem work:
import { BenchmarkSnapshots, measure } from "@warlock.js/core";
const snapshots = new BenchmarkSnapshots({
  maxSnapshots: 100,
  capture: "error", // "error" (default, safe) | "value" | "all"
});

await measure("payment.charge", () => stripe.charge(payload), {
  snapshotContainer: snapshots,
});

const failed = snapshots.getSnapshots("payment.charge");
// array of full BenchmarkErrorResult: error, latency, startedAt, tags

The capture setting matters:
| capture | What’s stored | Memory profile |
|---|---|---|
| "error" | Only BenchmarkErrorResult. The default. | Bounded by failure rate. Safe in production. |
| "value" | Only BenchmarkSuccessResult<T> with the full return value. | Stores T in memory. |
| "all" | Both. | Stores T in memory. |
"value" and "all" keep references to whatever fn() returned. If that’s a database row, that’s fine. If it’s a streamed response or a large buffer, you’ve now kept it in memory until the ring buffer evicts it. Default to "error" unless you have a specific reason.
Wire snapshots globally in src/config/benchmark.ts the same way as the profiler.
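The interaction between capture and the bounded buffer can be sketched in a few lines. This is an illustration under stated assumptions (oldest-first eviction, a single buffer rather than one per operation name); SnapshotBuffer and SketchSnapshot are hypothetical and do not describe how BenchmarkSnapshots is implemented.

```typescript
// Hypothetical capture filter plus bounded buffer; not the library's implementation.
type CaptureMode = "error" | "value" | "all";
type SketchSnapshot = { success: boolean; latency: number };

class SnapshotBuffer {
  private readonly items: SketchSnapshot[] = [];
  private readonly maxSnapshots: number;
  private readonly capture: CaptureMode;

  constructor(maxSnapshots: number, capture: CaptureMode) {
    this.maxSnapshots = maxSnapshots;
    this.capture = capture;
  }

  record(snapshot: SketchSnapshot): void {
    const wanted =
      this.capture === "all" ||
      (this.capture === "value" && snapshot.success) ||
      (this.capture === "error" && !snapshot.success);
    if (!wanted) return; // filtered out, so nothing is retained

    this.items.push(snapshot);
    if (this.items.length > this.maxSnapshots) {
      this.items.shift(); // evict the oldest snapshot (assumed policy)
    }
  }

  list(): SketchSnapshot[] {
    return [...this.items];
  }
}
```

With capture set to "error", successful results never enter the buffer, so their return values are not retained at all.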
Auto-integration with use-cases
Every useCase() wraps its pipeline in measure() by default. You get a per-use-case latency and a benchmarkResult field on every execution snapshot without writing any timing code. The defaults are reasonable:

{
  enabled: true,
  latencyRange: {
    excellent: 100,
    poor: 200,
  },
}

Override per use-case via benchmarkOptions:
import { useCase } from "@warlock.js/core";
export const placeOrder = useCase<Order, PlaceOrderInput>({
  name: "place_order",
  handler: async (input) => placeOrderService(input),
  benchmarkOptions: {
    latencyRange: { excellent: 200, poor: 1000 },
    tags: { domain: "orders" },
    onComplete: (result) => metrics.histogram("place_order.latency", result.latency),
  },
});

Set benchmarkOptions: false to disable benchmarking for one use-case, which is useful for use-cases that wrap genuinely long-running work where latency stats don’t carry meaning.
When the use-case has both retryOptions and benchmarkOptions, the latency you get is the total wall-clock time including retries. That’s almost always what you want for SLO tracking — your customers don’t care that you retried three times, they care it took 1.2 seconds.
Common patterns
Time a service call

const result = await measure("create-order", () => createOrderService(input));

if (!result.success) {
  return response.badRequest({ error: t("order.failed") });
}

return response.successCreate({ order: result.value });

The result is your error-handling fork and your latency tracker. Less ceremony than a try/catch.
Time an external HTTP call
const result = await measure(
  "stripe.charge",
  () => stripe.charges.create({ amount, currency, source }),
  {
    latencyRange: { excellent: 200, poor: 3000 },
    tags: { gateway: "stripe", currency },
    shouldBenchmarkError: (error) => error instanceof NetworkError,
  },
);

Higher thresholds (Stripe round-trips are slow), tagged for slicing, and only NetworkError failures are benchmarked. Every other error, validation errors included, re-throws to the caller.
Compose with retry()
import { measure, retry } from "@warlock.js/core";

const result = await measure("publish-event", () =>
  retry(() => bus.publish(event), { count: 3, delay: 200 }),
);

console.log(result.latency); // total wall-clock, including all retry attempts

measure() on the outside captures the SLO you actually care about. See Retry for the composition story in full.
Inspect what’s slow, ad-hoc
import { config, BenchmarkProfiler } from "@warlock.js/core";

const profiler = config.get("benchmark").profiler as BenchmarkProfiler;

console.table(profiler.allStats());

If you’ve wired a profiler in src/config/benchmark.ts, you can drop into a debug endpoint and dump the current percentiles at any time. No flush needed: allStats() reads the live ring buffers.
Gotchas
- Name collisions aggregate. Two calls to measure("foo", …) from different code paths share one profiler bucket. Make name specific ("db.findUser", not "db.query") so percentiles actually mean something.
- measure() doesn’t propagate AbortSignal. If fn is cancellable, plumb the signal through yourself. The wrapper only times.
- Don’t measure() synchronous trivia. A Math.round call isn’t worth the microsecond of overhead and the noise in your stats. Reserve measure() for things that can be slow: I/O, computation that scales with input size, anything crossing a network or disk.
- Snapshots with "value" retain references. If value holds a streamed body or a large buffer, you’ve kept it in memory until eviction. The default capture: "error" keeps you safe.
- shouldBenchmarkError re-throws. Make sure the caller is ready for an unwrapped throw on that error class. The discriminated-result contract holds for every other path; this one carves out an exception.
- Hook errors crash the call. Throwing inside onComplete/onError/onFinish propagates up and discards the measurement. Keep hooks side-effect-only; wrap risky work in their own try/catch.
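The last gotcha suggests wrapping risky hook work in a try/catch. One way to avoid repeating that in every hook is a small guard like the sketch below. safeHook is a hypothetical helper, not part of @warlock.js/core, and it only guards synchronous hooks; a hook that returns a rejected promise would need its own handling.

```typescript
// Hypothetical helper: wraps a synchronous hook so that a failure inside it
// is reported to onHookError instead of propagating out of the hook.
function safeHook<R>(
  hook: (result: R) => void,
  onHookError: (error: unknown) => void = (error) => {
    console.error("benchmark hook failed", error);
  },
): (result: R) => void {
  return (result) => {
    try {
      hook(result);
    } catch (error) {
      onHookError(error);
    }
  };
}
```

A guarded hook would then be passed in place of the raw one, for example onFinish: safeHook((result) => metrics.record(result.latency)), so that a broken metrics client cannot discard the measurement.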
Going further
- guides/retry.md: composes inside measure(). The latency story for retried operations.
- guides/use-cases-deep.md: benchmarkOptions field on useCase() and how the pipeline-level wrap stacks.
- guides/configuration-deep.md: src/config/benchmark.ts and the global config surface.