Recipe — Build a trace cost & latency dashboard
Finance asks the question every AI team eventually hears: “What did this feature cost us last week, and which sessions were the expensive ones?” You have the runs — they flowed through your agents and orchestrator all week. What you need is to turn the retained traces into a few numbers: per-run cost, per-session spend, p95 latency, failure rate. No new instrumentation; just query what Panoptic already collected.
This recipe builds that dashboard view on top of the in-memory trace store — using aggregate for the rollups, query for the per-run drill-down, and the orchestrator’s collect path so multi-turn sessions land in the same store as single-shot agents.
yarn add @warlock.js/ai @warlock.js/ai-openai @warlock.js/ai-panoptic
Pricing must be configured on the SDK (or per model) for cost to populate: an unpriced run reports usage.cost === undefined, and aggregate().cost stays undefined until at least one priced run lands. Honest absence over a false zero.
import { OpenAISDK } from "@warlock.js/ai-openai";

const openai = new OpenAISDK({
  apiKey: process.env.OPENAI_API_KEY!,
  pricing: {
    "gpt-4o-mini": { input: 0.15, output: 0.6, cachedInput: 0.075 },
    "gpt-4o": { input: 5.0, output: 15.0 },
  },
});

Collect from both an agent and an orchestrator
The agent emits a terminal agent.completed event, so attach captures it automatically. The orchestrator emits orchestrator.turn.* events that carry only session identity and no result, so you feed each turn’s report in directly with collect(result.report). Both converge on the same store.
import { ai } from "@warlock.js/ai";
import { panoptic, createInMemoryTraceStore } from "@warlock.js/ai-panoptic";

const store = createInMemoryTraceStore({ capacity: 50_000 });
const observe = panoptic({ exporters: [store] });

// 1) An agent: captured via the event stream.
const triageAgent = ai.agent({
  name: "ticket-triage",
  model: openai.model({ name: "gpt-4o-mini" }),
});

observe.attach(triageAgent);

// 2) An orchestrator: multi-turn session, collected directly per turn.
const supportBot = ai.orchestrator<{ resolved: boolean }>({
  name: "support-session",
  intents: {
    triage: triageAgent,
  },
  route: () => "triage",
  state: { resolved: false },
});

// Single-shot agent run: the store fills via the attached event.
await triageAgent.execute("Card declined at checkout", {
  sessionId: "session-7",
});

// Orchestrator turn: no result-bearing event, so collect the report.
const turn = await supportBot.execute("I was double-charged", {
  sessionId: "session-7",
  history: [],
});

await observe.collect(turn.report);

The dashboard query layer
A dashboard is a handful of aggregate / query calls. aggregate(filter?) rolls up usage, cost, and status counts over whatever the filter selects; query(filter?) returns the matching traces newest-started-first for the per-run table.
import type { TraceAggregate, TraceQuery } from "@warlock.js/ai-panoptic";
import { totalCostUsd } from "@warlock.js/ai-panoptic";

/** Collapse an aggregate's per-channel cost into one USD scalar. */
function aggregateCostUsd(stats: TraceAggregate): number {
  const cost = stats.cost;

  if (!cost) {
    return 0;
  }

  return (
    cost.input +
    cost.output +
    (cost.cachedInput ?? 0) +
    (cost.cachedOutput ?? 0)
  );
}

/** Top-line numbers for any slice (a week, a session, a status). */
function summarize(filter: TraceQuery) {
  const stats = store.aggregate(filter);

  return {
    runs: stats.traces,
    completed: stats.completed,
    failed: stats.failed,
    cancelled: stats.cancelled,
    failureRate: stats.traces === 0 ? 0 : stats.failed / stats.traces,
    totalTokens: stats.usage.total,
    cachedTokens: stats.usage.cachedTokens ?? 0,
    totalCostUsd: aggregateCostUsd(stats),
    totalDurationMs: stats.totalDuration,
    avgDurationMs: stats.traces === 0 ? 0 : stats.totalDuration / stats.traces,
  };
}

Week-to-date totals
// Week starts on Sunday at 00:00 in the process's local time zone.
const startOfWeek = new Date();
startOfWeek.setDate(startOfWeek.getDate() - startOfWeek.getDay());
startOfWeek.setHours(0, 0, 0, 0);

const week = summarize({ startedAfter: startOfWeek });

console.log(
  `This week: ${week.runs} runs, ` +
    `$${week.totalCostUsd.toFixed(4)}, ` +
    `${(week.failureRate * 100).toFixed(1)}% failed, ` +
    `avg ${Math.round(week.avgDurationMs)}ms`,
);

The per-run table (with p95 latency)
aggregate gives you the rollup but not percentiles. Those come from walking the queried traces, where each trace’s root carries the whole-run cost and duration.
function perRunRows(filter: TraceQuery) {
  return store.query(filter).map((trace) => ({
    traceId: trace.traceId,
    sessionId: trace.sessionId,
    type: trace.root.type, // "agent" | "workflow" | "orchestrator" | ...
    status: trace.root.status,
    startedAt: trace.startedAt,
    durationMs: trace.duration,
    tokens: trace.usage.total,
    costUsd: totalCostUsd(trace.usage) ?? 0, // root usage = whole-run rollup
  }));
}

/** Nearest-rank 95th percentile; returns 0 for an empty input. */
function p95(values: number[]) {
  if (values.length === 0) {
    return 0;
  }

  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.ceil(sorted.length * 0.95) - 1;

  return sorted[Math.min(index, sorted.length - 1)];
}

const rows = perRunRows({ startedAfter: startOfWeek });
const p95LatencyMs = p95(rows.map((row) => row.durationMs));

console.log(`p95 latency: ${Math.round(p95LatencyMs)}ms`);

Drill into one expensive session
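Before drilling into one session, you need to know which session to pick. A minimal sketch of a per-session ranking, assuming rows shaped like the output of perRunRows (traceId, sessionId, costUsd). The RunRow and SessionSpend types and the rankSessionsBySpend helper are illustrative names, not Panoptic exports, and the sample rows are made up:

```typescript
// Hypothetical row shape: the subset of perRunRows fields this sketch needs.
interface RunRow {
  traceId: string;
  sessionId?: string;
  costUsd: number;
}

interface SessionSpend {
  sessionId: string;
  runs: number;
  costUsd: number;
}

/** Group rows by sessionId and sort sessions by total cost, highest first. */
function rankSessionsBySpend(rows: RunRow[]): SessionSpend[] {
  const bySession = new Map<string, SessionSpend>();

  for (const row of rows) {
    // Runs without a session are bucketed together under a placeholder key.
    const sessionId = row.sessionId ?? "(no session)";
    const entry = bySession.get(sessionId) ?? { sessionId, runs: 0, costUsd: 0 };

    entry.runs += 1;
    entry.costUsd += row.costUsd;
    bySession.set(sessionId, entry);
  }

  return Array.from(bySession.values()).sort((a, b) => b.costUsd - a.costUsd);
}

// Made-up sample rows, for illustration only.
const sampleRows: RunRow[] = [
  { traceId: "t1", sessionId: "session-7", costUsd: 0.25 },
  { traceId: "t2", sessionId: "session-9", costUsd: 0.125 },
  { traceId: "t3", sessionId: "session-7", costUsd: 0.5 },
  { traceId: "t4", costUsd: 0.0625 },
];

const rankedSessions = rankSessionsBySpend(sampleRows);
console.log(rankedSessions[0]);
```

Against the real store, passing perRunRows({ startedAfter: startOfWeek }) to the helper would yield the candidates, and the top entry's sessionId is the value to feed into the sessionId filter used next.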
const sessionStats = summarize({ sessionId: "session-7" });

console.log(
  `session-7: $${sessionStats.totalCostUsd.toFixed(4)} ` +
    `over ${sessionStats.runs} runs`,
);

// Find the single most expensive run in that session:
const ranked = perRunRows({ sessionId: "session-7" }).sort(
  (a, b) => b.costUsd - a.costUsd,
);

console.log("priciest run:", ranked[0]);

Where did the cost go inside a run?
For a flame-graph-style cost attribution, walk one trace’s span tree. Each span carries its own rolled-up usage, so you can see which trip or tool dominated.
import { walkSpans } from "@warlock.js/ai-panoptic";

// ranked is empty when the session has no retained runs, so guard the lookup.
const trace = ranked[0] ? store.get(ranked[0].traceId) : undefined;

if (trace) {
  for (const span of walkSpans(trace.root)) {
    console.log(
      `${span.type}:${span.name} - ` +
        `${span.duration}ms, ` +
        `$${(totalCostUsd(span.usage) ?? 0).toFixed(5)}`,
    );
  }
}

Failure cost: the question finance actually asked next
“How much are we spending on runs that fail?” is one filter away, because status accepts an array:
const failedSpend = summarize({
  status: ["failed", "cancelled"],
  startedAfter: startOfWeek,
});

console.log(
  `Burned $${failedSpend.totalCostUsd.toFixed(4)} on ` +
    `${failedSpend.runs} failed/cancelled runs this week`,
);

Production notes
- aggregate sums each trace’s root usage, which is already a rollup of own cost + children, so the totals reflect the whole run tree without re-walking spans.
- input / output / total are always present; the optional cachedTokens / reasoningTokens / cacheWriteTokens channels are summed only when a matched run reported them.
- completed + failed + cancelled need not equal traces. Non-terminal statuses (awaiting-input from a paused orchestrator session, max-iterations) count toward traces but none of the three headline counters. Guard your failure-rate math against that, as summarize does by dividing failed / traces.
- cost stays undefined until something is priced. An unpriced run never erases the cost of priced siblings (the merge uses the framework’s own cost-rollup logic), so a single unpriced model in the mix won’t zero out your dashboard; it just won’t contribute.
- The in-memory store is O(n) on query / aggregate with no secondary indexes. That is perfect for a dev dashboard or a modest-volume admin panel, but for a high-traffic finance report back it with a real datastore: write a custom TraceStoreContract, or export traces to a warehouse via the OTel / Langfuse exporters and aggregate there.
- Bound retention with createInMemoryTraceStore({ capacity }) in any long-lived process; the default keeps everything until clear().
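Two of these notes change how a dashboard should print its numbers: the status counters may not add up to traces, and cost can be absent rather than zero. A minimal sketch of both, assuming only the counter and cost fields this recipe already reads. The StatusCounts and ChannelCost types and the two helpers are illustrative names, not Panoptic exports, and the sample numbers are made up:

```typescript
// Hypothetical shapes: only the fields this recipe reads from an aggregate.
interface StatusCounts {
  traces: number;
  completed: number;
  failed: number;
  cancelled: number;
}

interface ChannelCost {
  input: number;
  output: number;
  cachedInput?: number;
  cachedOutput?: number;
}

/** Failure rate over all runs and over finished runs, plus the runs in neither bucket. */
function failureRates(counts: StatusCounts) {
  const finished = counts.completed + counts.failed + counts.cancelled;

  return {
    overAll: counts.traces === 0 ? 0 : counts.failed / counts.traces,
    overFinished: finished === 0 ? 0 : counts.failed / finished,
    other: counts.traces - finished, // e.g. awaiting-input, max-iterations
  };
}

/** Print "n/a" for an absent cost instead of a misleading $0.0000. */
function formatCostUsd(cost: ChannelCost | undefined): string {
  if (!cost) {
    return "n/a";
  }

  const total =
    cost.input + cost.output + (cost.cachedInput ?? 0) + (cost.cachedOutput ?? 0);

  return `$${total.toFixed(4)}`;
}

// Made-up numbers, for illustration only.
const rates = failureRates({ traces: 10, completed: 6, failed: 2, cancelled: 0 });
const unpricedLabel = formatCostUsd(undefined);
const pricedLabel = formatCostUsd({ input: 0.5, output: 0.25 });

console.log(rates, unpricedLabel, pricedLabel);
```

Showing both rates side by side, along with the count of runs in neither bucket, makes it visible when paused or iteration-capped runs are diluting the headline figure. Whether to display "n/a" or fall back to zero, as summarize does, is a presentation choice for the dashboard.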
Related
- Observability - wire Panoptic: attach the subscriber and the store query basics.
- Observability - export OTel & Langfuse: ship traces to a backend and aggregate at warehouse scale.
- Cost tracking: configuring pricing and the usage.cost per-channel shape.