# pluck — documentation > Source: https://acalejos.github.io/pluck/docs.html · Repo: https://github.com/acalejos/pluck Turn any website into a type-safe, source-verified API. You bring a Zod schema; pluck fetches the page, fills the schema, traces every field back to the source, and caches the result. pluck owns the crawl, verify, and cache — the model call is yours to plug in. ## Install pluck is an ESM, TypeScript-first package. It needs `zod` as a peer for schemas. ``` npm install pluck zod ``` Two optional add-ons, only if you want them: ``` # policy-driven model routing (the swooshRouter adapter) npm install swoosh-router ``` > **Pre-release** pluck is at v0.1.0 — the library core is complete and tested, but the package is not yet published and the network-facing edges (Firecrawl, swoosh) are not yet covered by automated tests. The API below is real and current. ## Quickstart Define what you want as a Zod schema, then call `pluck`. The result is a discriminated union — check `res.ok` before reading data. ``` import { createPluck, callbackRouter } from 'pluck' import { z } from 'zod' const Article = z.object({ headline: z.string(), author: z.string().nullable(), words: z.number(), }) const client = createPluck({ router: callbackRouter(async (req) => callYourLLM(req)), }) const res = await client.pluck('https://example.com/post', Article) if (res.ok) { console.log(res.data.headline) // string console.log(res.source) // 'jsonld' | 'llm' console.log(res.provenance.verifiedRatio) // 0..1 } else { console.warn(res.reason) // why it failed } ``` ## The pipeline A single `pluck` call runs up to six stages. The model is only touched when it has to be. - **Fetch** — the configured `Fetcher` retrieves the page (plain HTTP by default). - **Cache lookup** — keyed on content hash + schema hash. A hit returns immediately. - **JSON-LD fast path** — if the page publishes structured data that satisfies your schema (and clears verification), pluck uses it and skips the model. `source: 'jsonld'`. - **Reduce + extract** — otherwise the HTML is reduced to clean markdown and handed to the `Router`, which fills the schema. `source: 'llm'`. - **Verify** — every leaf field is traced back to the source content and scored. - **Cache write** — successful results are persisted for next time. ## ExtractResult Every public method resolves to an `ExtractResult`. It never throws — even network errors and verification failures come back as `{ ok: false }`. ``` type ExtractResult = | { ok: true; data: T; provenance: Provenance; cached: boolean; source: ExtractSource } | { ok: false; reason: string; partial?: Partial } ``` Because it's a discriminated union, TypeScript narrows `res.data` to `T` the moment you check `res.ok`. There is no untyped escape hatch. ## Verification Type-safe is not the same as correct — a model can return a perfectly-typed but invented value. pluck treats verification as a separate job: it walks every leaf field in the result, looks for the value in the page's own text, and records a `FieldProvenance` (found, source span, confidence) per field. The aggregate is `provenance.verifiedRatio`. If verification is on (the default) and the ratio falls below `minRatio` (default `0.5`), the call returns `{ ok: false }` with the unverified data attached as `partial`. This applies to the JSON-LD path too — structured data can drift or lie, so it must clear the same bar. ``` // tighten the bar, or turn it off entirely createPluck({ verify: { minRatio: 0.8 } }) createPluck({ verify: false }) // trust the model / json-ld outright ``` ## JSON-LD fast path Most recipe, product, and article pages embed `schema.org` data in a `