Skip to content
Warlock.js v4.4.0

Recipe — Extract structured data with self-repair

You run an applicant-tracking import. Recruiters paste raw resume text — copied out of a PDF, an email body, or a LinkedIn profile — and you need a clean, typed record: name, email, years of experience, and a list of skills. The text is never consistent. Sometimes the email is missing, sometimes “experience” is buried in a sentence, sometimes the model returns prose around the JSON.

Two things make this robust instead of brittle:

  1. A v schema describing the exact shape you want. The agent extracts the JSON Schema from it, asks the model for structured output, then validates the model’s response against the same schema — so a malformed field is caught before it reaches your database.
  2. repair: { maxAttempts } — when validation fails, the agent feeds the bad response plus the validation error back to the model and re-asks, up to N times. The recruiter never sees the retry; they get a clean record or a typed error.
Terminal window
yarn add @warlock.js/ai @warlock.js/ai-openai @warlock.js/seal

@warlock.js/seal’s v builder is Standard Schema-compatible, so the agent can both extract a JSON Schema from it (to drive native structured-output mode) and validate the model’s response against it.

import { v } from "@warlock.js/seal";
const resumeSchema = v.object({
fullName: v.string(),
email: v.string().email(),
yearsOfExperience: v.int(),
skills: v.array(v.string()),
currentTitle: v.string().optional(),
});

The model here is gpt-4o-mini, which advertises structuredOutput capability — so the agent sends the extracted JSON Schema as a native response_format rather than padding the system prompt with a soft instruction. You still pass the v schema as output; that’s what drives client-side validation into result.data.

import { ai } from "@warlock.js/ai";
import { OpenAISDK } from "@warlock.js/ai-openai";
const openai = new OpenAISDK({ apiKey: process.env.OPENAI_API_KEY! });
const resumeExtractor = ai.agent({
name: "resume-extractor",
model: openai.model({ name: "gpt-4o-mini" }),
systemPrompt: ai.systemPrompt()
.persona("You extract structured candidate data from raw resume text.")
.instruction("Pull the candidate's full name, email, total years of professional experience, and skills.")
.instruction("If a field is genuinely absent from the text, omit optional fields; never invent an email or a number."),
});

The raw text below is the kind of thing a recruiter actually pastes — line breaks in the wrong places, the email run together with a phone number, experience expressed as a sentence.

const rawResume = `
Sara El-Masry | Senior Frontend Engineer
sara.elmasry@example.com +20 100 555 0199
About: 8 years building React and TypeScript apps at fintech startups.
Stack: React, TypeScript, Next.js, GraphQL, Tailwind, Vitest
`;
const { data, error, report } = await resumeExtractor.execute(rawResume, {
output: resumeSchema,
repair: { maxAttempts: 2 },
});
if (error) {
// Validation still failed after every repair attempt — surface a typed error,
// don't push a half-parsed record downstream.
console.error(`extraction failed: ${error.code}${error.message}`);
} else {
// `data` is fully typed and validated against resumeSchema.
console.log(data.fullName); // "Sara El-Masry"
console.log(data.email); // "sara.elmasry@example.com"
console.log(data.yearsOfExperience); // 8
console.log(data.skills); // ["React", "TypeScript", ...]
}
console.log(`took ${report.duration}ms across ${report.trips.length} trip(s)`);

When the model’s first response fails to parse as JSON or fails schema validation, and repair is set, the agent:

  1. Appends the bad assistant response to the conversation so the model can see exactly what it produced.
  2. Appends a corrective user message naming the validation error (e.g. email: must be a valid email).
  3. Runs another trip and re-validates.

Each repair attempt counts as a normal trip and is bounded by the agent’s maxTrips cap, so a model stuck producing garbage can’t loop forever. The final outcome — a clean data or the last validation error — is what surfaces.

If you want to observe the retry, subscribe to trip events:

const { data } = await resumeExtractor.execute(rawResume, {
output: resumeSchema,
repair: { maxAttempts: 2 },
on: {
"agent.trip.completed": ({ trip }) => {
console.log(`trip ${trip.index}: ${trip.finishReason}`);
},
},
});
  • execute() never throws. A validation failure that survives every repair attempt lands on result.error as a SchemaValidationError, with the original issues preserved under error.issues. Branch on error — don’t wrap the call in try/catch expecting a throw.
  • Keep maxAttempts low. One or two attempts catches the common “model wrapped the JSON in prose” and “model fat-fingered one field” cases. Beyond that you’re usually fighting a prompt problem, and every attempt is a paid LLM trip.
  • Cost is visible per run. report.trips.length tells you whether a repair fired; usage.total and usage.cost (when the model has a pricing table) let you alarm on extractions that needed retries.
  • Optional fields are the safety valve. Marking currentTitle and similar fields .optional() lets the model legitimately omit data that isn’t in the text, instead of being forced to hallucinate a value to satisfy the schema.