#extraction
7 posts tagged #extraction.
How to get LLM-ready data (markdown or JSON) from any URL
How to get LLM-ready data from any URL: pull specific fields as JSON, or strip a page to its clean content, with PDFs converted to markdown automatically.
What is structured data extraction?
Structured data extraction turns a web page into schema-validated JSON: you name the fields you want instead of writing CSS selectors that break when a site is redesigned.
How to turn a product page into JSON in one call
How to scrape a product page into JSON in one API call: send the URL plus the fields you want, get schema-validated data back, and pay only on success.
How to pull structured data out of an HTML table
How to pull structured data out of an HTML table in one API call: describe the columns you want and get each row back as schema-validated JSON, no selectors.
Why your scraper returns null after a redesign
Why your scraper returns null after a redesign: a catalog of the silent failure modes behind an empty result, and how to turn page drift into a warning instead of a silent null.
JSON Schema or a plain-language prompt: which to hand the extractor
Plain-language prompt or JSON Schema for AI extraction? You always send a prompt; an optional output_schema pins field names and types. Here is when to add one.
Why structured extraction beats CSS selectors
Structured extraction vs CSS selectors: hand-written selectors break on a redesign; a description of the data survives it. How we keep the AI version repeatable.