The web, as a typed API

Pluck the data.
Leave the page.

Hand pluck a URL and a schema. Get back a typed object where every field has been traced to the page it came from. No selectors. No scraping glue. No silent any.

Read the docs → How it works

llms.txt

recipe.ts

import { createPluck } from 'pluck'
import { z } from 'zod'

const Recipe = z.object({
  title:       z.string(),
  ingredients: z.array(z.string()),
  minutes:     z.number(),
})

const client = createPluck({ router })

const res = await client.pluck(url, Recipe)
//        ^? ExtractResult<Recipe>

if (res.ok) {
  res.data.minutes   // number — verified ✓
  res.source          // 'jsonld' | 'llm'
}

One call. Six stages.

FIG. 01 — THE PIPELINE

Fetch

Plain HTTP, or your self-hosted Firecrawl for JS-heavy pages.

02 · FAST PATH

JSON-LD

Reads schema.org & embedded data. Zero LLM, zero cost.

Reduce

Strips the chrome. Clean markdown, not raw HTML.

Extract

A router fills your schema. pluck never names the model.

Verify

Traces every field back to the source. Sets a ratio.

Cache

Keyed on content + schema. Unchanged page, no re-work.

When the page already publishes structured data, pluck takes the fast path and skips the model entirely — most recipe, product, and article pages do.

§ 01

Type-safe by contract, not by hope.

The schema is yours, and it's decoupled from the page's markup. A site can re-skin its entire layout — your Recipe type doesn't move, because pluck reads meaning, not CSS selectors.

Every call resolves to a discriminated ExtractResult<T>. Success is typed data; failure is a reason and an optional partial — never an untyped blob to guess at.

ExtractResult<T> · never any

§ 02

Verified, not guessed.

An LLM will happily invent a price. pluck won't ship one. Every extracted field is traced back to a span in the page's own text and scored — the result carries a verifiedRatio.

Fall below the threshold and the call returns { ok: false } with the partial attached, rather than handing you a confident fabrication. Type-safe is not the same as correct — pluck treats them as two separate jobs.

provenance per field

§ 03

Cheap on purpose.

The model call is the expensive part, so pluck avoids it whenever it can. Pages that already publish schema.org JSON-LD take the fast path — no tokens spent, no hallucination surface.

What does hit the model is cached on a content + schema hash. Re-run against an unchanged page and you pay nothing the second time. That economy is the whole reason a shared service beats a hand-rolled script.

json-ld first · hashed cache

§ 04

Swap any part. Keep the pipeline.

Fetcher, Router, and Cache are plain interfaces. Start on plain fetch; graduate to firecrawlFetcher against your own crawl stack. Mock the model with callbackRouter; wire real policy with swooshRouter.

The in-memory cache implements the same Cache interface a Redis or Postgres store would — so the library you run locally is the service you run hosted, untouched.

Fetcher · Router · Cache

Pluck the data.Leave the page.