How is this different from a traditional scraper?

Traditional scrapers break the moment a site changes its HTML. SmartScraper understands the page semantically — you describe the data you want in plain language and get back validated JSON. No selectors, no XPath, no maintenance.

How do you handle failed scrapes?

Most scraping tools fail silently. We retry with exponential backoff on 403, 429, and 5xx responses; if the retries are exhausted, we return the best partial result we can. We never charge for failed scrapes.

Can it handle really long pages?

Yes. Long pages are split with one of seven intelligent chunking strategies — headings, sliding window, cosine similarity, TextTiling and more — processed in parallel and merged with deduplication.

Can I enforce a JSON schema?

Pass any JSON Schema and we guarantee the response conforms to it. Validation failures are automatically repaired before the response is returned, so you never have to write defensive parsing.

What inputs are supported?

URLs, raw HTML, Markdown, and PDFs. We handle fetching, cleaning, format conversion, and noise removal end-to-end — the same API works for any of them.

Do you store my data?

Requests are processed in memory and dropped after the response. Nothing is persisted unless you explicitly opt in to dataset capture.


Now live · free credits on signup

The web,
as an API.

The web scraping API that turns any URL into clean, schema-validated JSON. Reliable fetching, data scraping, and one endpoint your team never has to maintain.

Get an API key · 500 free credits · no card

Try in playground

Billed only on success
Browser-grade fetching


// smartscraper · smartbrowse

Prompt it, or record it

SmartScraper reads the page from a prompt; SmartBrowse records the clicks & logins that reach it.

Live demo: reddit.com/r/formula1 · POST /v1/smartscraper

Rendered page: r/formula1 · 4.1m members

▲ 9.2k · Verstappen wins chaotic Japanese GP · 1.4k comments
Promoted · F1 TV Pro: stream every session · dropped as noise
▲ 3.6k · Hamilton on the Suzuka comeback · 260 comments
+ 47 more posts on this page


response.json · 200 OK

{
  "url": "reddit.com/r/formula1",
  "posts": [
    {
      "title": "Verstappen wins chaotic Japanese GP",
      "upvotes": 9233,
      "comments": 1438
    },
    {
      "title": "Hamilton on the Suzuka comeback",
      "upvotes": 3612,
      "comments": 260
    }
  ],
  "count": 2
}

✓ schema valid · 520ms · 1 promoted post dropped

SmartScraper — a URL and a plain-English prompt in, schema-validated JSON out.

One SDK, every stack — drop it in and go

pnpm add webscrape-ai

// by the numbers · snapshot

Requests processed: 128k (since launch)
First-attempt success: 96% (before retries)
Median latency: 520ms (full pipeline)
Schema-valid first pass: 98% (benchmark corpus)

Features

Built for developers who need reliable scraped data

A complete extraction toolkit — every feature audited, versioned, and shipped from one endpoint.

AI Agent Extraction

Describe what you want in plain English. Our agents understand the page semantically and return exactly the structured data you asked for.

user_prompt: "Extract top 5 stories
               with title, url, points"

✓ 5 items · 312ms
  [{ title, url, points }, …]

SmartBrowse

Record clicks and navigation once in a visual studio, then replay the recipe as an API on real Chrome. No code, no selectors.

rec ▸ click "Next page"
rec ▸ extract 24 rows
✓ saved recipe
✓ replay ×1000

Intelligent Chunking

Long pages are split, processed in parallel, and deduped on merge — so even huge documents extract fast and completely.

split ▸ parallel ▸ merge ▸ dedup
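The split, parallel, merge, dedup flow can be sketched in a few lines. This is a minimal illustration, not SmartScraper's implementation: it uses only the sliding-window strategy, works on a list of lines, and `extract_items` is a hypothetical stand-in for the real per-chunk extraction call. Items are deduplicated by title, which is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def sliding_window(seq, size, overlap):
    """Split a sequence into windows of `size` items that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    return [seq[i:i + size] for i in range(0, max(len(seq) - overlap, 1), step)]


def extract_items(chunk):
    """Hypothetical per-chunk extractor: parses 'title|points' lines."""
    items = []
    for line in chunk:
        title, points = line.rsplit("|", 1)
        items.append({"title": title.strip(), "points": int(points)})
    return items


def merge_dedup(results):
    """Concatenate per-chunk results, keeping the first copy of each title."""
    seen, merged = set(), []
    for items in results:
        for item in items:
            if item["title"] not in seen:
                seen.add(item["title"])
                merged.append(item)
    return merged


def extract_long(lines, size=3, overlap=1):
    chunks = sliding_window(lines, size, overlap)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in chunk order, so page order is preserved
        results = list(pool.map(extract_items, chunks))
    return merge_dedup(results)
```

The overlap means items near a chunk boundary are seen twice, which is why the merge step needs deduplication at all.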

Content Reduction

A local NLP layer strips navbars, ads, and noise before extraction — cutting cost and latency by 50–80% with no accuracy loss.

noise removed: 80%

1.48M chars → 340k chars · 1.2 MB → 240 KB
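To make the idea concrete, here is a rough sketch of noise stripping before extraction. It is tag-based rather than NLP-based, so it is a simpler technique than the one described above, and the set of tags treated as noise is an assumption chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed noise set for this sketch; a real filter would be far more selective.
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class TextReducer(HTMLParser):
    """Collect visible text, skipping anything nested inside a noise element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of noise elements currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def reduce_html(html: str) -> str:
    parser = TextReducer()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Less text reaches the extraction step, which is where the cost and latency savings would come from.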

Schema Enforcement

Validate, repair, guarantee. JSON Schema in, conforming data out, with one automatic repair pass when a field fails validation.

validate schema
✓ 5/5 fields matched
✓ no repair needed
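A validate-then-repair-once loop might look like the sketch below. It is an assumption-laden illustration: it supports only a small subset of JSON Schema (`required`, plus `string`, `integer`, and `number` property types), and the repair step is simple type coercion. The service's actual repair logic is not documented here.

```python
TYPES = {"string": str, "integer": int, "number": (int, float)}


def validate(item: dict, schema: dict) -> list:
    """Return names of fields that are missing or have the wrong type."""
    errors = [name for name in schema.get("required", []) if name not in item]
    for name, spec in schema.get("properties", {}).items():
        if name in item:
            value = item[name]
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, TYPES[spec["type"]]):
                errors.append(name)
    return errors


def repair(item: dict, schema: dict, errors: list) -> dict:
    """One repair pass: coerce wrong-typed values. Missing fields stay missing."""
    fixed = dict(item)
    for name in errors:
        if name not in fixed:
            continue
        kind = schema["properties"][name]["type"]
        text = str(fixed[name]).replace(",", "")
        try:
            if kind == "integer":
                fixed[name] = int(text)
            elif kind == "number":
                fixed[name] = float(text)
            else:
                fixed[name] = str(fixed[name])
        except ValueError:
            pass  # leave the value as-is; re-validation will still flag it
    return fixed


def enforce(item: dict, schema: dict):
    """Validate, repair once if needed, and return (item, remaining_errors)."""
    errors = validate(item, schema)
    if errors:
        item = repair(item, schema, errors)
        errors = validate(item, schema)
    return item, errors
```

Returning the remaining errors lets the caller decide what to do when a single repair pass is not enough.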

Auto-recovery

Failed fetches retry automatically with backoff — and you are never charged for a failed request.

free retries · automatic backoff
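For readers who want to see what retry with backoff means in practice, here is a minimal sketch. The retryable statuses (403, 429, 5xx) come from the FAQ; the attempt count, base delay, jitter, and the `fetch` callable are assumptions made for illustration.

```python
import random
import time

RETRYABLE = {403, 429}


def is_retryable(status: int) -> bool:
    """403, 429, and any 5xx status are treated as transient."""
    return status in RETRYABLE or 500 <= status <= 599


def fetch_with_backoff(fetch, url, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying transient failures.

    `fetch` is any callable supplied by the caller (hypothetical here).
    Returns the last (status, body) seen, successful or not.
    """
    for attempt in range(max_attempts):
        status, body = fetch(url)
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: up to base, 2*base, 4*base, ...
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.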

Any Input Source

URLs, raw HTML, Markdown, or PDFs — one API for all of them.

URL·HTML·MD·PDF

How It Works

One request. Six stages. Clean JSON.

Every request flows through the same deterministic pipeline: request, fetch, clean, reduce, extract, validate. No black box.

pipeline.trace · 6 stages

stage	component	time	output
URL	POST /v1/smartscraper	-	request
fetch	worker fetch	12ms	405 KB html
clean	html dom cleaner	3ms	80 KB dom
reduce	nlp filter	23ms	5 KB text
extract	vlm layer	480ms	5 fields
JSON	schema · validated	2ms	200 OK

total	6 stages	~520ms
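As a quick arithmetic check on the trace, the per-stage timings do add up to the quoted total, and almost all of the time is spent in extraction. The numbers below are copied from the trace; the variable names are only for this sketch.

```python
# Stage timings in milliseconds, as shown in the trace.
STAGE_MS = {"fetch": 12, "clean": 3, "reduce": 23, "extract": 480, "validate": 2}

total_ms = sum(STAGE_MS.values())
slowest = max(STAGE_MS, key=STAGE_MS.get)
slowest_share = STAGE_MS[slowest] / total_ms

# Payload size in KB after fetch, clean, and reduce.
SIZE_KB = [405, 80, 5]
size_reduction = 1 - SIZE_KB[-1] / SIZE_KB[0]

print(f"total: {total_ms}ms, slowest: {slowest} ({slowest_share:.1%})")
print(f"payload reduced by {size_reduction:.1%} before extraction")
```

Shrinking the payload from 405 KB to 5 KB before the extract stage matters because that stage dominates the total.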

Use Cases

What you can build

Teams use SmartScraper to power product feeds, agents, dashboards, and entire web scraping pipelines.

E-commerce monitoring

Track prices, stock, and reviews across thousands of product pages with consistent JSON schemas.

Lead generation

Extract names, emails, titles, and company info from directories and profiles at scale.

News & content intel

Pull articles, authors, dates, and entities from any publisher into clean, queryable data.

AI agent tooling

Plug structured web data into LangChain, n8n, or your own agents — no scrapers to maintain.

Why SmartScraper

Stop fighting selectors and broken scrapers

A side-by-side look at what you actually get out of the box.

compare.grid · 8 capabilities

capability	SmartScraper	DIY scraper	Headless browser
any site, no selectors	yes: no selectors needed	no: hand-write selectors	partial: you write the parser
structured output	yes: schema-valid JSON	no: validate it yourself	partial: raw HTML, no schema
auto-recovery	yes: retries & repairs	no: build your own retries	partial: reloads, never re-parses
long pages	yes: auto-chunked to fit	no: split it yourself	partial: whole DOM, you chunk
noise removal	yes: nav & ads stripped	no: pay tokens for noise	partial: full DOM, nothing trimmed
input formats	yes: PDF, HTML & Markdown	no: wire each format	partial: HTML only, no PDF
maintenance	yes: survives redesigns	no: breaks on redesign	partial: markup shift breaks it
infrastructure	yes: one API call	no: run your own fleet	partial: you host the browser

Developer Experience

One endpoint. Any language.

Add web scraping to any stack — official SDKs for Python, Node, Go, Rust & Java, or any plain HTTP client. Schema-validated output means no post-processing.

~/wsai — request

# fetch the top 5 HN stories
curl -X POST https://api.webscrape.ai/v1/smartscraper \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{
    "website_url": "https://news.ycombinator.com",
    "user_prompt": "Extract the top 5 stories with title, url, and points"
  }'

# fetch the top 5 HN stories
from webscrape_ai import Client

with Client() as client:  # auth via WEBSCRAPE_API_KEY
    resp = client.smartscraper(
        website_url="https://news.ycombinator.com",
        user_prompt="Extract the top 5 stories with title, url, and points",
    )
    print(resp.data.result)

// fetch the top 5 HN stories
import { Webscrape } from "webscrape-ai";

const client = new Webscrape();  // auth via WEBSCRAPE_API_KEY
const res = await client.smartscraper({
  website_url: "https://news.ycombinator.com",
  user_prompt: "Extract the top 5 stories with title, url, and points",
});
console.log(res.data.result);

// fetch the top 5 HN stories
// go get github.com/webscrape-ai/webscrape-ai/sdk/go
// needs imports: context, fmt, log, and the SDK package above
client, err := webscrape.New() // auth via WEBSCRAPE_API_KEY
if err != nil {
	log.Fatal(err)
}

resp, err := client.SmartScraper(context.Background(), &webscrape.SmartScraperRequest{
	WebsiteURL: "https://news.ycombinator.com",
	UserPrompt: "Extract the top 5 stories with title, url, and points",
})
if err != nil {
	log.Fatal(err)
}
fmt.Println(string(resp.Data.Result))

// fetch the top 5 HN stories
use webscrape_ai::{blocking::Client, SmartScraperRequest};

let client = Client::from_env()?;  // auth via WEBSCRAPE_API_KEY
let resp = client.smartscraper(
    SmartScraperRequest::new(
        "https://news.ycombinator.com",
        "Extract the top 5 stories with title, url, and points",
    ),
)?;
println!("{}", resp.data.result);

// extraction pipeline

fetch: 12ms
clean: 3ms
reduce: 23ms
extract: 480ms
validate: 2ms

total: ~520ms

response.json
// 200 OK · extracted in ~520ms
{
  "status": "completed",
  "data": {
    "result": {
      "stories": [
        {
          "title": "Show HN: I built a real-time code editor",
          "url": "https://example.com/editor",
          "points": 342
        },
        {
          "title": "Why Rust is the future of systems programming",
          "url": "https://example.com/rust",
          "points": 281
        },
        // … 3 more stories
      ]
    }
  },
  "credits_used": 5,
  "credits_remaining": 495,
  "request_id": "req_aB3xY9Kp"
}
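Consuming a response of this shape could look like the sketch below. The field names come from the sample; the sample string is trimmed to two stories so that it is valid JSON, and treating any status other than "completed" as a failure is an assumption, since other status values are not shown.

```python
import json

# Trimmed copy of the sample response (two stories, no comments).
SAMPLE = """
{
  "status": "completed",
  "data": {"result": {"stories": [
    {"title": "Show HN: I built a real-time code editor",
     "url": "https://example.com/editor", "points": 342},
    {"title": "Why Rust is the future of systems programming",
     "url": "https://example.com/rust", "points": 281}
  ]}},
  "credits_used": 5,
  "credits_remaining": 495,
  "request_id": "req_aB3xY9Kp"
}
"""


def parse_stories(raw: str) -> list:
    """Return the extracted stories, or raise if the request did not complete."""
    body = json.loads(raw)
    if body.get("status") != "completed":
        # Assumption: anything other than "completed" is handled as a failure.
        raise RuntimeError(
            f"request {body.get('request_id')} not completed: {body.get('status')}"
        )
    return body["data"]["result"]["stories"]
```

Including `request_id` in the error message makes a failed call easier to trace later.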

Pricing

Simple, transparent pricing

Start free, scale as you grow. Pay only for what you use, no hidden fees.

Founding launch — 30% off for life on any subscription with code FOUNDING

Free

Free forever — credits refill every month, not a one-time trial.

≈ 60 AI extractions / month

Get Started

500 starting credits
300 free credits every month
1 concurrent request
10 requests / minute
7-day data retention
Limited SmartBrowse

Hobby

For side projects and prototypes.

$19

≈ 1,000 AI extractions / month

5,000 credits / month
10 concurrent requests
100 requests / minute
Standard fetch tier
30-day data retention
2 GB cloud browser / month
20% off extra credits

Startup

Cost Efficient

For growing teams in production.

$79

≈ 6,000 AI extractions / month

30,000 credits / month
50 concurrent requests
500 requests / minute
Priority routing
30-day data retention
10 GB cloud browser / month
Priority support
40% off extra credits

Enterprise

Tailored solutions for large organizations.

Custom

Unlimited extractions

Contact Sales

Unlimited credits
Custom rate limits
Dedicated infrastructure
Dedicated routing
99.9% SLA guarantee
Dedicated account manager
On-premise deployment

Base rates: 1 credit per Scrape · 5 credits per SmartScraper AI extraction · 2 credits per SmartBrowse replay run
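The per-plan extraction estimates follow directly from these base rates. The small calculator below reproduces them; the dictionary keys and function names are made up for this sketch, and plan discounts on extra credits are not modeled.

```python
# Credits per operation, from the base rates above.
CREDITS = {"scrape": 1, "smartscraper": 5, "smartbrowse_replay": 2}


def credits_needed(usage: dict) -> int:
    """Total credits for a usage mix, e.g. {"scrape": 100, "smartscraper": 50}."""
    return sum(CREDITS[op] * count for op, count in usage.items())


def max_extractions(monthly_credits: int) -> int:
    """How many SmartScraper AI extractions a monthly credit budget covers."""
    return monthly_credits // CREDITS["smartscraper"]


for plan, credits in {"Free": 300, "Hobby": 5_000, "Startup": 30_000}.items():
    print(f"{plan}: {max_extractions(credits)} AI extractions / month")
```

A mixed workload of 100 scrapes, 50 AI extractions, and 10 replay runs would cost 100 + 250 + 20 = 370 credits.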

AI agent? Read the plain-text version at /pricing.md.

// changelog

What's new

The latest platform updates, newest first.

shipping.log · 5 releases

2026-07-07 · Public launch: API + dashboard live · v1.0
2026-07-06 · Official SDKs: Go, Python, Node, Rust & Java · 5 languages
2026-07-01 · Extraction engine rebuilt in Go · p50 −40%
2026-06-19 · Private beta: early access opens · invite-only
2026-06-18 · Isolated microVM browsers + managed networking · included

Ready to extract structured data?

Get an API key in seconds — 500 free credits, no card required. Or try it first in the playground.
