Launching next week

The web, as an API.

Reliable fetching, intelligent extraction, and schema-validated JSON — in one endpoint your team will never have to maintain.
  • No credit card
  • Anti-bot fetching
  • Schema-validated JSON

One SDK, every stack — drop it in and go

pnpm add @webscrape/sdk
Platform telemetry
  • Bot detection rate (12 mo · n=1.2M req)
  • HTML noise reduction: 50–80% with reduce_content
  • Preset recipes for popular sites
Features

Built for developers who need reliable data

A complete extraction toolkit — every feature audited, versioned, and shipped from one endpoint.

AI Agent Extraction

Describe what you want in plain English. Our agents understand the page semantically and return exactly the structured data you asked for.

user_prompt: "Extract top 5 stories
               with title, url, points"

✓ 5 items · 312ms
  [{ title, url, points }, …]

Anti-Bot Browser

Our own Chromium fork, fingerprint-patched at the source rather than through a bolt-on script that anti-bot systems can detect.

JA4 · HTTP/2 · Canvas · WebGL

Schema Enforcement

Validate, repair, guarantee. JSON Schema in, conforming data out.

validate schema · ✓ 5/5 fields matched · ✓ no repair needed
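The "validate, repair, guarantee" idea can be illustrated with a small sketch. This is not the service's implementation. It assumes a hypothetical, heavily simplified JSON Schema-like dict (only `required`, `properties`, and primitive `type` names) and a hypothetical repair rule that only coerces digit-only strings such as `"342"` into integers; anything else that does not match is rejected.

```python
from typing import Any

# Hypothetical, simplified JSON Schema-like definition for one story item.
STORY_SCHEMA = {
    "type": "object",
    "required": ["title", "url", "points"],
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "points": {"type": "integer"},
    },
}


def _matches(value: Any, expected: str) -> bool:
    """Check a value against a primitive JSON Schema type name."""
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "string":
        return isinstance(value, str)
    if expected == "boolean":
        return isinstance(value, bool)
    return False


def validate_and_repair(item: dict, schema: dict) -> tuple[dict, list[str]]:
    """Return a (possibly repaired) copy of item plus a list of repairs made.

    Raises ValueError if a required field is missing or a value cannot be repaired.
    """
    fixed = dict(item)
    repairs: list[str] = []
    for name in schema["required"]:
        if name not in fixed:
            raise ValueError(f"missing required field: {name}")
    for name, spec in schema["properties"].items():
        if name not in fixed or _matches(fixed[name], spec["type"]):
            continue
        value = fixed[name]
        # Hypothetical repair rule: digit-only strings become integers.
        if spec["type"] == "integer" and isinstance(value, str) and value.strip().isdigit():
            fixed[name] = int(value.strip())
            repairs.append(f"{name}: coerced {value!r} to int")
        else:
            raise ValueError(f"field {name!r} does not match type {spec['type']}")
    return fixed, repairs


if __name__ == "__main__":
    story = {"title": "Show HN: I built a real-time code editor",
             "url": "https://example.com/editor", "points": "342"}
    fixed, repairs = validate_and_repair(story, STORY_SCHEMA)
    print(fixed["points"], repairs)
```

An item that already conforms comes back unchanged with an empty repair list, which corresponds to the "no repair needed" case in the badge above.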

Intelligent Chunking

Long pages split, processed in parallel, deduped on merge.

split → parallel → merge → dedup
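The split → parallel → merge → dedup flow can be sketched in a few lines. This is an illustration under stated assumptions, not the service's chunker: it splits by lines with a fixed overlap (so content near a boundary is seen whole by at least one chunk, which is exactly why a dedup step is needed), runs the per-chunk step on a thread pool, and uses a hypothetical toy parser, `extract_items`, in place of the real extraction call. Deduplicating on `title` is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split_lines(lines: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split lines into windows of chunk_size lines; consecutive windows share `overlap` lines."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append(lines[start:start + chunk_size])
        if start + chunk_size >= len(lines):
            break  # this window already reaches the end of the page
    return chunks


def extract_items(chunk: list[str]) -> list[dict]:
    """Hypothetical stand-in for the per-chunk extraction call: parses 'title | points' lines."""
    items = []
    for line in chunk:
        if "|" in line:
            title, points = (part.strip() for part in line.split("|", 1))
            items.append({"title": title, "points": int(points)})
    return items


def merge_dedup(per_chunk: list[list[dict]], key: str) -> list[dict]:
    """Concatenate results in chunk order, keeping only the first item seen for each key."""
    seen: set = set()
    merged = []
    for items in per_chunk:
        for item in items:
            if item[key] not in seen:
                seen.add(item[key])
                merged.append(item)
    return merged


def chunked_extract(text: str, chunk_size: int = 4, overlap: int = 1, workers: int = 4) -> list[dict]:
    chunks = split_lines(text.splitlines(), chunk_size, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the merged output is deterministic
        per_chunk = list(pool.map(extract_items, chunks))
    return merge_dedup(per_chunk, key="title")


if __name__ == "__main__":
    page = "\n".join(f"Story {i} | {i * 10}" for i in range(1, 11))
    items = chunked_extract(page)
    # "Story 4" and "Story 7" fall in two windows each but are kept only once
    print(len(items))
```

With 10 lines, windows of 4 and an overlap of 1, the page is covered by three windows (12 lines in total), and the merge step drops the two duplicated items.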

Any Input Source

URLs, raw HTML, Markdown, or PDFs — one API for all of them.

URL·HTML·MD·PDF

Content Reduction

A local NLP layer strips navbars, ads, and noise before extraction — cutting cost and latency by 50–80% with no accuracy loss.

noise removed: 1.48M chars → 340k chars · 1.2 MB → 240 KB
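As a worked check of the figures above, going from 1.48M to 340k characters removes about 77% of the characters, and going from 1.2 MB to 240 KB removes about 80% of the bytes. Both fall within the stated 50–80% range. The sketch below shows the general idea of stripping boilerplate before extraction. Note that it swaps in a much simpler technique than the NLP layer described above: a tag-based heuristic built on Python's standard `html.parser`, with a hypothetical `NOISE_TAGS` list that is an assumption rather than the service's rules.

```python
from html.parser import HTMLParser

# Tags treated as boilerplate in this sketch (an assumption, not the service's list).
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}


class NoiseStripper(HTMLParser):
    """Collects text that is not nested inside any NOISE_TAGS element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # how many noise elements we are currently inside
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def reduce_html(html: str) -> str:
    """Return the visible text of html with noise elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def reduction_pct(before: int, after: int) -> float:
    """Percentage of the original size that was removed."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    page = ("<html><body><nav>Home | About</nav><article><h1>Title</h1>"
            "<p>Body text.</p></article><footer>(c) 2025</footer></body></html>")
    print(reduce_html(page))
    print(round(reduction_pct(1_480_000, 340_000), 1))  # characters: 77.0
    print(reduction_pct(1200, 240))                     # kilobytes: 80.0
```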
How It Works

One request. Six stages. Clean JSON.

Every request flows through the same deterministic pipeline: request, fetch, clean, reduce, extract, validate. No black box.
~/wsai · pipeline.trace · 6 stages
  URL       POST /v1/smartscraper            →  request
  fetch     worker fetch           12ms      →  405 KB html
  clean     selectolax              3ms      →  80 KB dom
  reduce    nlp filter             23ms      →  5 KB text
  extract   vlm layer             480ms      →  5 fields
  JSON      schema · validated      2ms      →  200 OK
total: 6 stages · ~520ms
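The trace above describes a linear pipeline. Each stage consumes the previous stage's output and its wall-clock time is recorded, so the total is the sum of the stage times: 12 + 3 + 23 + 480 + 2 = 520 ms, with extraction taking about 92% of it. The runnable sketch below has the same shape. Everything in it is hypothetical: the in-memory `FAKE_PAGES` dict stands in for the fetch worker, regular expressions stand in for selectolax and the NLP filter, and a toy pattern matcher stands in for the VLM extraction layer.

```python
import re
import time
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical in-memory "site" so the sketch runs without network access.
FAKE_PAGES = {
    "https://news.example/": "<nav>menu</nav><ul><li>Story A (342)</li><li>Story B (281)</li></ul>",
}


@dataclass
class StageRecord:
    name: str
    ms: float


def fetch(url: str) -> str:
    return FAKE_PAGES[url]


def clean(html: str) -> str:
    # Drop <nav> blocks, then turn every remaining tag into a line break.
    html = re.sub(r"<nav>.*?</nav>", "", html, flags=re.S)
    return re.sub(r"<[^>]+>", "\n", html)


def reduce_text(text: str) -> list[str]:
    # Keep only non-empty lines.
    return [line.strip() for line in text.splitlines() if line.strip()]


def extract(lines: list[str]) -> list[dict]:
    # Toy extractor: "Title (points)" -> {"title": ..., "points": ...}
    items = []
    for line in lines:
        m = re.fullmatch(r"(.+?) \((\d+)\)", line)
        if m:
            items.append({"title": m.group(1), "points": int(m.group(2))})
    return items


def validate(items: list[dict]) -> list[dict]:
    for item in items:
        if not (isinstance(item.get("title"), str) and isinstance(item.get("points"), int)):
            raise ValueError(f"invalid item: {item}")
    return items


STAGES: list[tuple[str, Callable[[Any], Any]]] = [
    ("fetch", fetch), ("clean", clean), ("reduce", reduce_text),
    ("extract", extract), ("validate", validate),
]


def run_pipeline(url: str) -> tuple[list[dict], list[StageRecord]]:
    """Run every stage in order, feeding each one the previous output, and time each stage."""
    value: Any = url
    trace = []
    for name, stage in STAGES:
        start = time.perf_counter()
        value = stage(value)
        trace.append(StageRecord(name, (time.perf_counter() - start) * 1000))
    return value, trace


if __name__ == "__main__":
    items, trace = run_pipeline("https://news.example/")
    for record in trace:
        print(f"{record.name:<8} {record.ms:.3f} ms")
    print(items)
```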
Use Cases

What you can build

Teams use SmartScraper to power product feeds, agents, dashboards, and entire data pipelines.

E-commerce monitoring

Track prices, stock, and reviews across thousands of product pages with consistent JSON schemas.

Lead generation

Extract names, emails, titles, and company info from directories and profiles at scale.

News & content intel

Pull articles, authors, dates, and entities from any publisher into clean, queryable data.

AI agent tooling

Plug structured web data into LangChain, n8n, or your own agents — no scrapers to maintain.

Why SmartScraper

Stop fighting with selectors and broken scripts

A side-by-side look at what you actually get out of the box.
Compared: SmartScraper · DIY scraper · Headless browser
  • Works on any site without writing selectors
  • Schema-validated structured output
  • Anti-bot browser
  • Automatic chunking for long pages
  • 50–80% noise removed before processing
  • PDF + HTML + Markdown input
  • Zero maintenance when sites change
  • No infrastructure to run
Developer Experience

One endpoint. Any language.

Integrate in minutes with any HTTP client. Schema-validated output means no post-processing.
~/wsai — request
# fetch the top 5 HN stories
curl -X POST https://api.webscrape.ai/v1/smartscraper \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "website_url": "https://news.ycombinator.com",
    "user_prompt": "Extract the top 5 stories with title, url, and points"
  }'
# fetch the top 5 HN stories
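import os

import requests

API_KEY = os.environ["API_KEY"]  # read the key from the environment

resp = requests.post(
    "https://api.webscrape.ai/v1/smartscraper",
    headers={"Authorization": f"Bearer {API_KEY}"},
    # json= serializes the body and sets Content-Type: application/json
    json={
        "website_url": "https://news.ycombinator.com",
        "user_prompt": "Extract the top 5 stories with title, url, and points",
    },
    timeout=60,
)
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
data = resp.json()
print(data["result"])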
// fetch the top 5 HN stories
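// run as an ES module (top-level await); API_KEY is read from the environment
const API_KEY = process.env.API_KEY;

const resp = await fetch("https://api.webscrape.ai/v1/smartscraper", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`,
  },
  body: JSON.stringify({
    website_url: "https://news.ycombinator.com",
    user_prompt: "Extract the top 5 stories with title, url, and points",
  }),
});
// fetch() does not reject on HTTP errors, so check the status explicitly
if (!resp.ok) throw new Error(`Request failed with status ${resp.status}`);

const { result } = await resp.json();
console.log(result);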
// extraction pipeline
  • fetch: 12ms
  • clean: 3ms
  • reduce: 23ms
  • extract: 480ms
  • validate: 2ms
total: ~520ms
~/wsai — response
response.json · 200 OK
// 200 OK · extracted in ~520ms
{
  "stories": [
    {
      "title": "Show HN: I built a real-time code editor",
      "url": "https://example.com/editor",
      "points": 342
    },
    {
      "title": "Why Rust is the future of systems programming",
      "url": "https://example.com/rust",
      "points": 281
    },
    {
      "title": "PostgreSQL 18 released with major improvements",
      "url": "https://postgresql.org/18",
      "points": 256
    },
    {
      "title": "A deep dive into WebAssembly garbage collection",
      "url": "https://example.com/wasm-gc",
      "points": 198
    },
    {
      "title": "Open source alternative to Figma",
      "url": "https://example.com/penpot",
      "points": 175
    }
  ]
}
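Because the output is schema-validated, downstream code can index fields directly instead of parsing defensively. The sketch below assumes that the object shown above is the value returned as `data["result"]` in the Python example, and it uses a trimmed copy of that sample. `top_stories` and its `min_points` parameter are hypothetical names chosen for illustration.

```python
import json

# Trimmed copy of the sample response above (assumed to be the value of "result").
SAMPLE = """{"stories": [
  {"title": "Show HN: I built a real-time code editor", "url": "https://example.com/editor", "points": 342},
  {"title": "Why Rust is the future of systems programming", "url": "https://example.com/rust", "points": 281},
  {"title": "PostgreSQL 18 released with major improvements", "url": "https://postgresql.org/18", "points": 256}
]}"""


def top_stories(result: dict, min_points: int = 0) -> list[tuple[str, int]]:
    """Return (title, points) pairs at or above min_points, highest first."""
    stories = result.get("stories", [])
    picked = [(s["title"], s["points"]) for s in stories if s["points"] >= min_points]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    result = json.loads(SAMPLE)
    for title, points in top_stories(result, min_points=260):
        print(f"{points:>4}  {title}")
```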
Pricing

Simple, transparent pricing

Start free, scale as you grow. Pay only for what you use, with no hidden fees.

Free

Try the API with no commitment.

  • 500 starting credits
  • 300 credits / month thereafter
  • 1 concurrent request
  • 10 requests / minute
  • 7-day data retention
  • Limited SmartBrowse

Hobby

For side projects and prototypes.

  • 5,000 credits / month
  • 10 concurrent requests
  • 100 requests / minute
  • Standard proxy rotation
  • 30-day data retention
  • 20% off extra credits

Enterprise

Tailored solutions for large organizations.

  • Unlimited credits
  • Custom rate limits
  • Dedicated infrastructure
  • Premium proxy pool
  • 99.9% SLA guarantee
  • Dedicated account manager
  • On-premise deployment

AI agent? Read the plain-text version at /pricing.md.
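The Free tier allows 10 requests per minute and the Hobby tier allows 100. Throttling on the client side keeps a batch job inside those limits. The page does not say how the API responds when a limit is exceeded, so the sketch below only illustrates one common technique, a sliding-window limiter. `SlidingWindowLimiter` is a hypothetical helper, and the injectable `clock` exists only to make it testable.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls in any `window`-second span."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: deque = deque()  # timestamps of calls still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if allowed now; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self.sent.append(self.clock())
        return True

    def acquire(self) -> None:
        """Block (with real sleeps) until a call is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            time.sleep(delay)
            delay = self.wait_time()
        self.sent.append(self.clock())


if __name__ == "__main__":
    free_tier = SlidingWindowLimiter(limit=10, window=60.0)  # 10 requests / minute
    for _ in range(3):
        free_tier.acquire()  # call the API here
    print(len(free_tier.sent))
```

The concurrency cap (1 request on Free, 10 on Hobby) is a separate limit. One option is to wrap each call in a `threading.BoundedSemaphore` sized to the tier.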

FAQ

Frequently asked questions

Everything teams ask before going to production.
// frequently asked · 6 entries

Ready to extract structured data?

Try the live playground or integrate the API in minutes. No credit card required.
