Structured extraction with JSON-schema validation
Give a URL and a user_prompt (and optionally an output_schema),
get JSON back. Handles long content, noisy pages, and complex
schemas.
Cost: 5 credits (+5 with stealth: true).
Authorizations
Body
"https://news.ycombinator.com"
Plain-English description of what to extract.
"Extract the front-page stories with title, url, and score."
JSON Schema enforced on the extracted output. When supplied,
the result is validated against the schema and one repair
attempt is made before returning validation_failed.
Providing both user_prompt and output_schema gives the
best results.
{
"type": "object",
"properties": {
"stories": {
"type": "array",
"items": {
"type": "object",
"properties": {
"title": { "type": "string" },
"url": { "type": "string" },
"score": { "type": "integer" }
}
}
}
}
}low (default) is faster and cheaper. Bump to high for
visually busy pages or schemas with many nested fields.
low, high How exhaustively to populate the result.
low, medium, high Cleaner mode. accurate is more forgiving on malformed pages;
speed is faster.
accurate, speed Return the raw extracted text under result instead of a
parsed JSON object. Bypasses output_schema validation.
Whitelist of HTML tags to keep before extraction.
Blacklist of HTML tags to drop before extraction.
Trim long content before extraction. Helps on pages with lots of repetitive boilerplate. Uses a sensible default when omitted.
Opt in to an alternate extraction path that can do better on hard-to-parse pages. Behavior may change without notice.
Custom request headers forwarded to the fetcher. Providing any headers disables URL caching for this request.
URL-cache opt-in — same three-state semantics as /scrape
(omitted = no cache, 0 = bypass read but write on miss,
>0 = return entries fresher than N seconds).
Stealth requests, custom-headered requests, and URLs with query strings or fragments are never cached.
x >= 0Use stealth mode. +5 credits.
Response
Successfully extracted.
Discriminator for the envelope variant.
completed Per-request id. Also returned as the X-Request-ID response
header. Include it when reporting issues.
"req_aB3xY9Kp"
5
495