POST /v1/smartscraper gives you structured JSON from any URL. You always send a user_prompt. Add an output_schema if you want the result validated into a fixed shape. The minimum call is in the API reference; this page is the deep dive — prompt and schema design, page complexity, and what to do when validation fails.
Prompt first, schema second
The user_prompt is required: that’s where the intent goes. output_schema is optional and pins down the shape. Use both together for the cleanest output. If you just want free-shape JSON, send the prompt alone.
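As a minimal sketch of the two request shapes, the helper below assembles a body with or without output_schema. The build_request helper and the url field name are assumptions made for illustration; only user_prompt and output_schema are named on this page.

```python
def build_request(url, user_prompt, output_schema=None):
    """Assemble a SmartScraper request body (sketch, not an official client)."""
    if not user_prompt:
        raise ValueError("user_prompt is required")
    # "url" is an assumed field name for the page to scrape.
    body = {"url": url, "user_prompt": user_prompt}
    if output_schema is not None:
        body["output_schema"] = output_schema
    return body


# Prompt alone: free-shape JSON.
free_shape = build_request("https://example.com/post", "Extract the title and author.")

# Prompt plus schema: the result is validated into a fixed shape.
fixed_shape = build_request(
    "https://example.com/post",
    "Extract the title and author.",
    output_schema={
        "type": "object",
        "properties": {"title": {"type": "string"}, "author": {"type": "string"}},
    },
)
```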
Schema design
When you do supply an output_schema, the clearer it is, the cleaner the extraction.
Use the most specific types you can
Don’t say string if you mean integer. SmartScraper validates against your schema, so score: {"type": "integer"} produces cleaner results than score: {"type": "string"} and survives downstream typing.
Mark required fields
Adding "required": ["title", "price"] forces validation to fail when the field is missing. Much better than silently returning null and finding out three steps later.
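The two points above, specific types and required fields, might look like this in one schema, written here as a Python dict. The missing_required helper is hypothetical: a local pre-check that mirrors only the "required" part of validation, not the service's own validator.

```python
review_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "score": {"type": "integer"},  # integer, not string
    },
    "required": ["title", "price"],
}


def missing_required(schema, data):
    """Return required field names that are absent from data (local check only)."""
    return [name for name in schema.get("required", []) if name not in data]


# A result without a price would be caught here instead of three steps later.
gaps = missing_required(review_schema, {"title": "Widget", "score": 4})
```

Here gaps holds ["price"], the one required field the sample result lacks.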
Nest objects for related data
Group fields that belong together, as you would in a product page schema.
Arrays for repeating elements
For listings (articles, products, search results), wrap the repeating element in an array of objects.
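Nesting and arrays can be combined. The sketch below groups price data under one nested object, then wraps the product object in an array for a listing page. The field names (name, price, amount, currency, products) are illustrative assumptions, not a required layout.

```python
# One product: related price fields live together under "price".
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "currency": {"type": "string"},
            },
            "required": ["amount"],
        },
    },
    "required": ["name", "price"],
}

# A listing page: the repeating element becomes an array of that object.
listing_schema = {
    "type": "object",
    "properties": {
        "products": {"type": "array", "items": product_schema},
    },
    "required": ["products"],
}
```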
Sharpen the prompt
A specific user_prompt keeps the model focused. Tell it what you want and what to ignore.
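For illustration, here is a vague prompt next to a sharper one that names both the target and the noise. The wording and the url field name are assumptions; adapt them to the page you are scraping.

```python
vague_prompt = "Get the products."

specific_prompt = (
    "Extract every product in the main results grid: name, price, and product URL. "
    "Ignore sponsored listings, the recommendations carousel, and footer links."
)

body = {
    "url": "https://example.com/lamps",  # assumed field name
    "user_prompt": specific_prompt,
}
```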
Page complexity
page_complexity controls how much effort goes into the page.
| Value | When to use |
|---|---|
| low (default) | Most pages. Fast and cheap. |
| high | Visually busy pages, long articles, schemas with many nested fields. |

Switch to high if extraction misses content you can clearly see in the browser. Stick with low everywhere else.
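One way to apply the low-first rule in code is to escalate only when fields you expect come back empty. This is a sketch: send is a hypothetical callable that posts a body and returns the parsed JSON result, and the emptiness check is an assumed heuristic. Each attempt is a separate call.

```python
def extract_with_escalation(send, body, expected_keys):
    """Run at low complexity; retry at high only if expected keys come back empty."""
    result = send({**body, "page_complexity": "low"})
    empty = [k for k in expected_keys if result.get(k) in (None, "", [], {})]
    if not empty:
        return result, "low"
    return send({**body, "page_complexity": "high"}), "high"
```

The function returns both the result and the level that produced it, so you can log how often a given site needs high.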
Caching with max_age
max_age tells us whether your fetch can come from cache:
| Value | Behavior |
|---|---|
| Omitted | No cache. Always a fresh extraction. |
| 0 | Run the call, but cache the result so the next one’s a hit. |
| > 0 (seconds) | If we have a cached result newer than this, return it; otherwise extract fresh. |

These requests skip the cache regardless of max_age:
- Stealth requests
- Requests with custom headers
- URLs with query strings or fragments

For everything else, max_age is worth setting.
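A quick local check can tell you whether setting max_age is likely to matter for a given request. The sketch below reads the three listed cases as requests that never use the cache, which is an assumption about the list above. The url and headers field names are also assumed; stealth and max_age appear on this page.

```python
from urllib.parse import urlsplit


def cache_eligible(body):
    """Return True if the request looks like one that can take part in caching."""
    if "max_age" not in body:
        return False  # omitted: always a fresh extraction, nothing cached
    if body.get("stealth"):
        return False  # stealth requests
    if body.get("headers"):
        return False  # custom headers (assumed field name)
    parts = urlsplit(body["url"])
    # URLs with a query string or fragment are treated as not cacheable.
    return not (parts.query or parts.fragment)
```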
Validation failures
Schema validation runs after extraction, and we automatically take one repair pass at a failure. If the output still doesn’t match, you get a 422 Unprocessable Entity with error.code: validation_failed.
When that happens, start with error.details — it tells you which field failed and why (“expected integer, got string”). From there, a few common fixes:
- Loosen the type if the source is genuinely ambiguous. A price like "$19.99" may need "type": "string" when you’re keeping the currency symbol; add a pattern regex to keep validation tight.
- Trim overly aggressive required lists. If a field is sometimes missing, don’t require it.
- Flatten deeply nested objects if the model seems to be getting lost. Two flat fields usually beat one three-level object.
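The first steps of handling a failure can be sketched as follows. The validation_details helper is hypothetical, and the shape of error.details in the sample body is assumed; this page only says it names the failing field and the reason. The price_field fragment shows the loosened string type with a pattern.

```python
import re


def validation_details(status_code, payload):
    """Return error.details for a validation failure, or None for anything else."""
    error = payload.get("error") or {}
    if status_code == 422 and error.get("code") == "validation_failed":
        return error.get("details")
    return None


# Loosened price field: a string that keeps the currency symbol,
# with a pattern so validation stays tight.
price_field = {"type": "string", "pattern": r"^\$\d+(\.\d{2})?$"}

# Hypothetical response body; the layout of "details" is an assumption.
sample = {
    "error": {
        "code": "validation_failed",
        "details": [{"field": "price", "message": "expected integer, got string"}],
    }
}

details = validation_details(422, sample)
price_ok = re.search(price_field["pattern"], "$19.99") is not None
```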
Common patterns
E-commerce listings
Outer products array of {name, price, url, image}. Use user_prompt to skip ads and recommendations.
Article extraction
Schema of {title, author, published_at, body, tags}. The default page_complexity: low handles most news and blog sites.
Structured tables
Mirror the table: {rows: [{col1, col2, ...}]}. Bump to page_complexity: high for tables past a few hundred rows.
Interactive pages
Data behind a click, login, or paginated UI? SmartBrowse is the tool — recipes run in real Chrome.
Cost
5 credits per call (+5 with stealth: true). Failed requests cost 0. See Credits for the full table.
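For budgeting, the arithmetic above can be sketched as a small function. credits_used is a hypothetical helper; it counts only successful calls, since failed requests cost 0, and it does not cover anything else in the full Credits table.

```python
def credits_used(successful_calls, stealth=False):
    """Credits for a batch: 5 per successful call, plus 5 more each with stealth."""
    per_call = 5 + (5 if stealth else 0)
    return successful_calls * per_call


plain_batch = credits_used(10)                  # 10 calls at 5 credits each
stealth_batch = credits_used(10, stealth=True)  # 10 calls at 10 credits each
```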