Hand pluck a URL and a schema. Get back a typed object where every field has been traced to the page it came from. No selectors. No scraping glue. No silent any.
```typescript
import { createPluck } from 'pluck'
import { z } from 'zod'

const Recipe = z.object({
  title: z.string(),
  ingredients: z.array(z.string()),
  minutes: z.number(),
})

const client = createPluck({ router })

const res = await client.pluck(url, Recipe)
//    ^? ExtractResult<Recipe>

if (res.ok) {
  res.data.minutes // number, verified ✓
  res.source       // 'jsonld' | 'llm'
}
```
When the page already publishes structured data, as most recipe, product, and article pages do, pluck takes the fast path and skips the model entirely.
The schema is yours, and it's decoupled from the page's markup. A site can re-skin its entire layout — your Recipe type doesn't move, because pluck reads meaning, not CSS selectors.
Every call resolves to a discriminated ExtractResult<T>. Success is typed data; failure is a reason and an optional partial — never an untyped blob to guess at.
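A discriminated result is easiest to see in code. The sketch below is an assumption about the shape, not pluck's published type: `ok`, `data`, `source`, and `verifiedRatio` appear on this page, while the `reason` and `partial` field names and the `summarize` helper are hypothetical.

```typescript
// Assumed shape of the result union; field names beyond `ok`, `data`,
// `source`, and `verifiedRatio` are illustrative guesses.
type ExtractResult<T> =
  | { ok: true; data: T; source: "jsonld" | "llm"; verifiedRatio: number }
  | { ok: false; reason: string; partial?: Partial<T> };

interface Recipe {
  title: string;
  ingredients: string[];
  minutes: number;
}

// Checking `ok` narrows the union, so `data` is only reachable on success
// and `reason` only on failure. No casts, no `any`.
function summarize(res: ExtractResult<Recipe>): string {
  if (res.ok) {
    return `${res.data.title} takes ${res.data.minutes} min (via ${res.source})`;
  }
  const seen = res.partial?.title ?? "unknown title";
  return `failed: ${res.reason} (partial: ${seen})`;
}
```

Because the compiler enforces the narrowing, a caller cannot read `res.data` without first handling the failure branch.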
An LLM will happily invent a price. pluck won't ship one. Every extracted field is traced back to a span in the page's own text and scored — the result carries a verifiedRatio.
If verifiedRatio falls below the threshold, the call returns { ok: false } with the partial attached, rather than handing you a confident fabrication. Type-safe is not the same as correct; pluck treats them as two separate jobs.
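One way to picture the verification gate is a naive substring check. This is a sketch under stated assumptions, not pluck's actual verifier: the helper names, the `0.8` default threshold, and the `below_threshold` reason are all hypothetical, and real span matching is likely stricter than `includes`.

```typescript
type FieldValue = string | number | string[];

// Lowercase and collapse whitespace so trivial formatting differences
// do not count as mismatches.
function normalize(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

// A field is grounded when every value it holds appears in the page text.
// Naive by design: a number such as 4 would also match inside "45".
function isGrounded(value: FieldValue, pageText: string): boolean {
  const haystack = normalize(pageText);
  const parts = Array.isArray(value) ? value : [String(value)];
  return parts.every((p) => haystack.includes(normalize(p)));
}

// Fraction of fields that could be traced back to the page.
function verifiedRatio(data: Record<string, FieldValue>, pageText: string): number {
  const fields = Object.values(data);
  if (fields.length === 0) return 0;
  return fields.filter((v) => isGrounded(v, pageText)).length / fields.length;
}

// Hypothetical gate: below the threshold, the data comes back only as a partial.
function gate<T extends Record<string, FieldValue>>(data: T, pageText: string, threshold = 0.8) {
  const ratio = verifiedRatio(data, pageText);
  return ratio >= threshold
    ? { ok: true as const, data, verifiedRatio: ratio }
    : { ok: false as const, reason: "below_threshold", partial: data, verifiedRatio: ratio };
}
```

With a page that mentions tomatoes and basil but not saffron, a model output listing saffron would leave one of three fields ungrounded and fail the assumed threshold.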
The model call is the expensive part, so pluck avoids it whenever it can. Pages that already publish schema.org JSON-LD take the fast path — no tokens spent, no hallucination surface.
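A rough sketch of what a JSON-LD fast path involves: find the `application/ld+json` script tags, parse them, and look for the schema.org type you want. The function names here are hypothetical, and the regex is a shortcut; a production version would more likely use an HTML parser and handle `@graph` and array-valued `@type`.

```typescript
// Collect every JSON-LD object embedded in the page's script tags.
function extractJsonLd(html: string): unknown[] {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      // A script tag may hold one object or an array of them.
      if (Array.isArray(parsed)) blocks.push(...parsed);
      else blocks.push(parsed);
    } catch {
      // Malformed JSON-LD is skipped so the caller can fall back to the model.
    }
  }
  return blocks;
}

// Return the first block whose "@type" equals the requested schema.org type.
function findByType(blocks: unknown[], type: string): Record<string, unknown> | undefined {
  return blocks.find(
    (b): b is Record<string, unknown> =>
      typeof b === "object" && b !== null && (b as Record<string, unknown>)["@type"] === type,
  );
}
```

If `findByType` returns a block, the fields can be mapped onto the schema without a model call; if it returns `undefined`, extraction falls through to the slower path.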
What does hit the model is cached on a content + schema hash. Re-run against an unchanged page and you pay nothing the second time. That economy is the whole reason a shared service beats a hand-rolled script.
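A content + schema cache key could be derived along these lines. This is a minimal sketch using Node's built-in `crypto` module; the `cacheKey` function, the choice of SHA-256, and the idea of passing the schema as a string description are assumptions, since the page does not say how the hash is built.

```typescript
import { createHash } from "node:crypto";

// Hash the page content and a string form of the schema together, so a change
// to either one produces a different key and therefore a cache miss.
function cacheKey(pageContent: string, schemaDescription: string): string {
  return createHash("sha256")
    .update(pageContent)
    .update("\u0000") // separator, so ("ab", "c") and ("a", "bc") hash differently
    .update(schemaDescription)
    .digest("hex");
}
```

An unchanged page queried with an unchanged schema maps to the same key, which is what lets the second run skip the model.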
json-ld first · hashed cache
Fetcher, Router, and Cache are plain interfaces. Start on plain fetch; graduate to firecrawlFetcher against your own crawl stack. Mock the model with callbackRouter; wire real policy with swooshRouter.
The in-memory cache implements the same Cache interface a Redis or Postgres store would — so the library you run locally is the service you run hosted, untouched.
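To illustrate why a shared interface makes the local and hosted paths interchangeable, here is a sketch of a `Cache` interface with an in-memory implementation. The method signatures, the `MemoryCache` class, and the `cached` helper are assumptions for illustration; pluck's real `Cache` interface is not shown on this page and may differ.

```typescript
// Assumed minimal shape. Methods are async so a network-backed store
// (Redis, Postgres) could implement the same contract.
interface Cache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

// In-memory implementation backed by a Map.
class MemoryCache implements Cache {
  private readonly store = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.store.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

// Code written against the interface does not care which store is behind it.
async function cached(cache: Cache, key: string, compute: () => Promise<string>): Promise<string> {
  const hit = await cache.get(key);
  if (hit !== undefined) return hit;
  const value = await compute();
  await cache.set(key, value);
  return value;
}
```

Swapping `MemoryCache` for a persistent store would then be a change at the construction site only, with `cached` and its callers left as they are.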
pluck owns the crawl, the verify, and the cache. swoosh owns which model, under what policy.
Install it, define a schema, call pluck. The JSON-LD path runs with no model at all.
```sh
# install
npm install pluck zod

# optional: policy-driven model routing
npm install swoosh-router
```