Recipe — Extract structured data with self-repair
You run an applicant-tracking import. Recruiters paste raw resume text — copied out of a PDF, an email body, or a LinkedIn profile — and you need a clean, typed record: name, email, years of experience, and a list of skills. The text is never consistent. Sometimes the email is missing, sometimes “experience” is buried in a sentence, sometimes the model returns prose around the JSON.
Two things make this robust instead of brittle:
- A v schema describing the exact shape you want. The agent extracts the JSON Schema from it, asks the model for structured output, then validates the model’s response against the same schema, so a malformed field is caught before it reaches your database.
- repair: { maxAttempts }. When validation fails, the agent feeds the bad response plus the validation error back to the model and re-asks, up to maxAttempts times. The recruiter never sees the retry; they get a clean record or a typed error.
yarn add @warlock.js/ai @warlock.js/ai-openai @warlock.js/seal

The schema
@warlock.js/seal’s v builder is Standard Schema-compatible, so the agent can both extract a JSON Schema from it (to drive native structured-output mode) and validate the model’s response against it.
import { v } from "@warlock.js/seal";
const resumeSchema = v.object({
  fullName: v.string(),
  email: v.string().email(),
  yearsOfExperience: v.int(),
  skills: v.array(v.string()),
  currentTitle: v.string().optional(),
});

The agent
The model here is gpt-4o-mini, which advertises the structuredOutput capability, so the agent sends the extracted JSON Schema as a native response_format rather than padding the system prompt with a soft instruction. You still pass the v schema as output; that’s what drives client-side validation into result.data.
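For orientation, here is a hand-written guess at the JSON Schema the agent might derive from resumeSchema, and how such a schema is typically wrapped for OpenAI's response_format. The exact output of @warlock.js/seal is an assumption: the integer type, the email format keyword, and the schema name resume are illustrative. Treat this as a sketch, not the library's actual payload.

```typescript
// Assumed shape: a plausible JSON Schema for resumeSchema, written by hand.
// The real schema extracted by the library may differ in detail.
const resumeJsonSchema = {
  type: "object",
  properties: {
    fullName: { type: "string" },
    email: { type: "string", format: "email" },
    yearsOfExperience: { type: "integer" },
    skills: { type: "array", items: { type: "string" } },
    currentTitle: { type: "string" },
  },
  // currentTitle is optional in the v schema, so it is not listed here.
  required: ["fullName", "email", "yearsOfExperience", "skills"],
};

// OpenAI's Chat Completions API accepts a JSON Schema wrapped like this.
// The name "resume" is a hypothetical label chosen for this sketch.
const responseFormat = {
  type: "json_schema",
  json_schema: { name: "resume", schema: resumeJsonSchema },
};

console.log(JSON.stringify(responseFormat, null, 2));
```

The absence of currentTitle from required is how an optional field shows up at the JSON Schema level.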
import { ai } from "@warlock.js/ai";
import { OpenAISDK } from "@warlock.js/ai-openai";
const openai = new OpenAISDK({ apiKey: process.env.OPENAI_API_KEY! });
const resumeExtractor = ai.agent({
  name: "resume-extractor",
  model: openai.model({ name: "gpt-4o-mini" }),
  systemPrompt: ai.systemPrompt()
    .persona("You extract structured candidate data from raw resume text.")
    .instruction("Pull the candidate's full name, email, total years of professional experience, and skills.")
    .instruction("If a field is genuinely absent from the text, omit optional fields; never invent an email or a number."),
});

Run it
The raw text below is the kind of thing a recruiter actually pastes: line breaks in the wrong places, the email run together with a phone number, experience expressed as a sentence.
const rawResume = `Sara El-Masry | Senior Frontend Engineer
sara.elmasry@example.com +20 100 555 0199
About: 8 years building React and TypeScript apps at fintech startups.
Stack: React, TypeScript, Next.js, GraphQL, Tailwind, Vitest`;
const { data, error, report } = await resumeExtractor.execute(rawResume, {
  output: resumeSchema,
  repair: { maxAttempts: 2 },
});
if (error) {
  // Validation still failed after every repair attempt: surface a typed error,
  // don't push a half-parsed record downstream.
  console.error(`extraction failed: ${error.code} — ${error.message}`);
} else {
  // `data` is fully typed and validated against resumeSchema.
  console.log(data.fullName); // "Sara El-Masry"
  console.log(data.email); // "sara.elmasry@example.com"
  console.log(data.yearsOfExperience); // 8
  console.log(data.skills); // ["React", "TypeScript", ...]
}
console.log(`took ${report.duration}ms across ${report.trips.length} trip(s)`);

What self-repair actually does
When the model’s first response fails to parse as JSON or fails schema validation, and repair is set, the agent:
- Appends the bad assistant response to the conversation so the model can see exactly what it produced.
- Appends a corrective user message naming the validation error (e.g. email: must be a valid email).
- Runs another trip and re-validates.
Each repair attempt counts as a normal trip and is bounded by the agent’s maxTrips cap, so a model stuck producing garbage can’t loop forever. The final outcome, either validated data or the last validation error, is what surfaces.
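The loop described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the @warlock.js/ai implementation: Message, Model, check, extractWithRepair, and stubModel are hypothetical names, the validator only checks an email field, and the model is a synchronous stub so the sketch runs without a network call. A real agent would call the model asynchronously and would also respect maxTrips.

```typescript
// Hypothetical types for this sketch; they are not the @warlock.js/ai API.
type Message = { role: "user" | "assistant"; content: string };
type Model = (messages: Message[]) => string;
type Outcome =
  | { data: { email: string }; error?: undefined; trips: number }
  | { data?: undefined; error: string; trips: number };

// Parse and validate one response: returns the value or a readable problem.
function check(raw: string): { value?: { email: string }; problem?: string } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { problem: "response is not valid JSON" };
  }
  const email = (parsed as { email?: unknown } | null)?.email;
  if (typeof email !== "string" || !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email)) {
    return { problem: "email: must be a valid email" };
  }
  return { value: { email } };
}

function extractWithRepair(model: Model, input: string, maxAttempts: number): Outcome {
  const messages: Message[] = [{ role: "user", content: input }];
  let lastProblem = "";
  // One initial trip plus up to maxAttempts repair trips.
  for (let trip = 1; trip <= maxAttempts + 1; trip++) {
    const raw = model(messages);
    const { value, problem } = check(raw);
    if (value) return { data: value, trips: trip };
    lastProblem = problem ?? "unknown validation error";
    // Show the model exactly what it produced, then name the failure.
    messages.push({ role: "assistant", content: raw });
    messages.push({
      role: "user",
      content: `Your last response was invalid (${lastProblem}). Reply with corrected JSON only.`,
    });
  }
  return { error: lastProblem, trips: maxAttempts + 1 };
}

// Stub model: the first reply has a malformed email, the second is corrected.
const replies = ['{"email":"sara.elmasry(at)example.com"}', '{"email":"sara.elmasry@example.com"}'];
let call = 0;
const stubModel: Model = () => replies[Math.min(call++, replies.length - 1)];

const outcome = extractWithRepair(stubModel, "raw resume text", 2);
console.log(outcome); // succeeds on the second trip
```

With maxAttempts set to 2 the loop makes at most three model calls, which is why a stuck model ends in an error value instead of an endless retry.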
If you want to observe the retry, subscribe to trip events:
const { data } = await resumeExtractor.execute(rawResume, {
  output: resumeSchema,
  repair: { maxAttempts: 2 },
  on: {
    "agent.trip.completed": ({ trip }) => {
      console.log(`trip ${trip.index}: ${trip.finishReason}`);
    },
  },
});

Production notes
- execute() never throws. A validation failure that survives every repair attempt lands on result.error as a SchemaValidationError, with the original issues preserved under error.issues. Branch on error; don’t wrap the call in try/catch expecting a throw.
- Keep maxAttempts low. One or two attempts catch the common “model wrapped the JSON in prose” and “model fat-fingered one field” cases. Beyond that you’re usually fighting a prompt problem, and every attempt is a paid LLM trip.
- Cost is visible per run. report.trips.length tells you whether a repair fired; usage.total and usage.cost (when the model has a pricing table) let you alarm on extractions that needed retries.
- Optional fields are the safety valve. Marking currentTitle and similar fields .optional() lets the model legitimately omit data that isn’t in the text, instead of being forced to hallucinate a value to satisfy the schema.
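One way to act on these notes is a small helper that branches on the result instead of catching exceptions, and flags runs that needed a repair. This is a sketch under assumptions: RunResult mirrors only the fields named in this recipe, the Issue shape (message plus an optional path) is a guess modeled on Standard Schema issues, and decide is a hypothetical name. It also assumes an agent with no tools, so more than one trip implies that a repair fired.

```typescript
// Hypothetical shapes; only the field names come from this recipe.
type Issue = { message: string; path?: ReadonlyArray<string | number> };
type RunResult<T> = {
  data?: T;
  error?: { code: string; message: string; issues?: Issue[] };
  report: { duration: number; trips: unknown[] };
};
type Decision<T> =
  | { kind: "save"; record: T; repaired: boolean }
  | { kind: "reject"; reason: string };

// Branch on the result value; no try/catch, because nothing is expected to throw.
function decide<T>(result: RunResult<T>): Decision<T> {
  const { data, error, report } = result;
  if (error || data === undefined) {
    // Flatten the validation issues into one readable line for logs.
    const detail = (error?.issues ?? [])
      .map((issue) => `${(issue.path ?? []).join(".")}: ${issue.message}`)
      .join("; ");
    const summary = error ? `${error.code}: ${error.message}` : "no data returned";
    return { kind: "reject", reason: detail ? `${summary} (${detail})` : summary };
  }
  // Assumption: without tools, any trip beyond the first was a repair attempt.
  return { kind: "save", record: data, repaired: report.trips.length > 1 };
}

// Usage with a hand-built result that needed one repair.
const decision = decide({
  data: { fullName: "Sara El-Masry" },
  report: { duration: 1200, trips: [{}, {}] },
});
console.log(decision.kind);
```

The repaired flag is a natural place to hang a metric or alert, since each repair is an extra paid trip.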