
How to scrape a site that keeps blocking you

A 403, a 429, or a blank page usually means a site blocked your scraper. How to read the failure and get the page back, from a retry to a stealth browser.

What the block means and what to do
| What you see | What to do |
| --- | --- |
| 403 Forbidden | Let the default fetch retry; enable stealth if it persists |
| 429 Too Many Requests | Slow down and back off before retrying |
| A 200 with a blank page | The page renders client side; run it with stealth |
| A challenge or interstitial | Enable stealth so the request reads as a real visitor |
| A login or paywall | Pass session headers, or accept the page needs an account |
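If you handle these cases in a script, you can encode the table as a small triage function. This is a minimal sketch. The function name `triage`, its action keywords, the 401 row, and the challenge-page phrases it greps for are all assumptions made for illustration. It judges a "blank" page only by whether any text is left once tags are stripped with a regex.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper mirroring the table above.
# Usage: triage STATUS BODY  -> prints one action keyword.
triage() {
  local status="$1" body="$2" text
  # Strip tags and whitespace to see whether any visible text is left.
  text=$(printf '%s' "$body" | sed -e 's/<[^>]*>//g' | tr -d '[:space:]')

  case "$status" in
    403) echo "retry-then-stealth" ;;   # access problem: let the default retry run first
    429) echo "back-off" ;;             # rate limit: slow down, don't escalate
    401) echo "needs-session" ;;        # assumption: treat 401 as the login case
    200)
      # Illustrative challenge phrases only; real interstitials vary by vendor.
      if printf '%s' "$body" | grep -qiE 'just a moment|captcha|verify you are human'; then
        echo "stealth"
      elif [ -z "$text" ]; then
        echo "render-with-stealth"      # shell page: content is rendered client side
      else
        echo "done"
      fi
      ;;
    *) echo "investigate" ;;
  esac
}
```

For example, `triage 200 '<div id="root"></div>'` prints `render-with-stealth`, because nothing visible survives once the tags are removed.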

Your scraper ran fine for weeks. Then the same URL started answering with a 403, or a page that loads in your browser but comes back blank through your code. The site is blocking you, and the right move depends on which block it is. This walks through reading the failure and getting the page back, from a plain retry up to a stealth browser, without reaching for the heaviest tool on the first attempt.

Why a site is blocking your scraper

A site blocks a scraper when the request doesn't look like a real visitor. Plain HTTP clients send a different TLS and header fingerprint than a browser, request pages faster than a person would, and skip the JavaScript a real session runs. Detection systems read those signals and return a 403, a 429, or a challenge page instead of the content. The deeper mechanics are covered in the guide on beating bot detection without overpaying.

A few things to hold onto before you start:

  • A 403 or a challenge page is an access problem. A blank 200 is usually a rendering problem. The fix is different for each.
  • The default fetch already retries and escalates to a browser on a 403, 429, or timeout, so try it before you change anything.
  • Persistent blocks call for the stealth tier, which reads as a real visitor. It costs more per request, but you're charged only when it succeeds.
  • A 429 means slow down, not switch to stealth.

What you'll need

An API key (new accounts start with free credits) and the URL that's failing. By the end you'll have a request that gets the page back from a site that was refusing your scraper, plus a way to tell when to stop trying.

1. Send a plain request and read the response

Start at the cheapest tier. Send the URL with no stealth flag and look at what comes back.

```bash
curl https://api.webscrape.ai/v1/scrape \
  -H "X-API-Key: wsg_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "website_url": "https://tough-target.example/product/42" }'
```

What just happened: that one call ran the whole retry ladder. It started with a fast HTTP request carrying a real browser's fingerprint. If the site answered with a 403, a 429, or a timeout, it retried and escalated, and the final attempt ran in a browser. If your data is in the response, you're done at the base rate of 1 credit.
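To run the same request from a script and keep both the status and the body, you might wrap it in a pair of functions. This is a sketch under a few assumptions:

  • The environment variable `WEBSCRAPE_API_KEY` and the function names `scrape_payload` and `scrape` are made up for this example.
  • The status it prints is the HTTP status of the API's own response.
  • This guide doesn't describe the response body's format, so inspect the saved file before parsing it.
  • The payload builder does no JSON escaping, so it assumes the URL contains no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Build the JSON body for /v1/scrape. Pass "true" as the second argument for stealth.
# No JSON escaping: assumes the URL has no double quotes or backslashes.
scrape_payload() {
  local url="$1" stealth="${2:-false}"
  if [ "$stealth" = "true" ]; then
    printf '{ "website_url": "%s", "stealth": true }' "$url"
  else
    printf '{ "website_url": "%s" }' "$url"
  fi
}

# Hypothetical wrapper: saves the response body to OUTFILE and prints the HTTP status.
# Usage: scrape URL OUTFILE [true]
scrape() {
  local url="$1" out="$2" stealth="${3:-false}"
  # Fail fast if the key isn't set (the variable name is an assumption).
  : "${WEBSCRAPE_API_KEY:?set WEBSCRAPE_API_KEY first}"
  curl -sS https://api.webscrape.ai/v1/scrape \
    -H "X-API-Key: ${WEBSCRAPE_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(scrape_payload "$url" "$stealth")" \
    -o "$out" \
    -w '%{http_code}'
}

# Example:
#   export WEBSCRAPE_API_KEY=wsg_live_your_key_here
#   status=$(scrape https://tough-target.example/product/42 response.txt)
#   echo "HTTP $status"; head -c 500 response.txt
```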

2. If it still blocks, turn on stealth

When the default path comes back blocked anyway, the site is rejecting requests that don't read as a real browser session. Enable the stealth tier.

```bash
curl https://api.webscrape.ai/v1/scrape \
  -H "X-API-Key: wsg_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "website_url": "https://tough-target.example/product/42", "stealth": true }'
```

What just happened: the request now runs through a hardened, anti-detection browser from the first attempt, so it presents as an ordinary visitor instead of climbing a ladder the site is built to reject. This is the tier that gets the hardest targets back. It bills at 3 credits on /v1/scrape instead of 1, and the surcharge applies only when stealth runs.
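In a script, the order from steps 1 and 2 comes down to this: try the default tier, and repeat with stealth only if that comes back empty. Here is a sketch. `fetch_page` is a hypothetical placeholder that you'd replace with a real API call, for instance a wrapper around the curl requests above. It should print the page body and take `true` as a second argument for stealth. "Empty" here just means whitespace-only, which is a simplification.

```bash
#!/usr/bin/env bash
# Placeholder: replace with a real call that prints the page body for URL.
# Second argument: "true" requests the stealth tier, "false" the default tier.
fetch_page() {
  echo "fetch_page is a placeholder; wire it to the API" >&2
  return 1
}

# Try the cheap tier first and escalate to stealth only if it comes back empty.
# Prints the first non-empty body and returns 0, or returns 1 if both tiers fail.
fetch_escalating() {
  local url="$1" stealth body
  for stealth in false true; do
    if body=$(fetch_page "$url" "$stealth") && [ -n "${body//[[:space:]]/}" ]; then
      printf '%s\n' "$body"
      return 0
    fi
  done
  return 1
}
```

Stealth is billed only when it succeeds, so the second pass adds the 3-credit rate only for the pages it actually returns.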

3. If the page is blank rather than blocked, it's rendering

A different failure looks almost the same from your code: no error and a status of completed, but no data in the body. The page is built client side. The server sent a shell that a browser was supposed to fill in, so a plain fetch came back with the layout and none of the values.

```bash
# Blank body on the default path → the same stealth flag runs the page's JavaScript.
curl https://api.webscrape.ai/v1/scrape \
  -H "X-API-Key: wsg_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "website_url": "https://app.example/dashboard", "stealth": true }'
```

What just happened: the stealth tier runs a real browser, so it executes the page's scripts and returns the rendered content the plain fetch never saw. If you want to know which pages need this before you hit one, the guide on when you need a headless browser covers the signals.
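The status alone won't flag this case. One rough way to spot a client-rendered shell is to strip scripts, styles, and tags, then see how much visible text is left. This is a heuristic sketch, not an HTML parser. The function names, the 20-character threshold, and the regex-based stripping are assumptions. An inline script whose code contains a `<` character will slip through as "text".

```bash
#!/usr/bin/env bash
# Print the visible text of HTML read from stdin (regex heuristic, not a parser).
visible_text() {
  # Join lines so tags and script blocks that span lines can be matched.
  tr '\n' ' ' |
    sed -e 's/<script[^>]*>[^<]*<\/script>//g' \
        -e 's/<style[^>]*>[^<]*<\/style>//g' \
        -e 's/<[^>]*>//g' |
    tr -s '[:space:]' ' '
}

# Succeeds if the page has fewer than MIN visible non-space characters (default 20).
# Usage: looks_blank "$html" [MIN]
looks_blank() {
  local min="${2:-20}" text
  text=$(printf '%s' "$1" | visible_text)
  text="${text//[[:space:]]/}"
  [ "${#text}" -lt "$min" ]
}

# Example:
#   if looks_blank "$html"; then echo "shell only: retry with stealth"; fi
```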

4. If you see a 429, slow down

A 429 is the one block you don't answer with a heavier tier. It means you're requesting pages faster than the site permits, and a stealth browser hitting it just as fast gets the same response at a higher cost.

```bash
# Pause between requests (a fixed 5 s here) and wait longer if the response sends Retry-After.
# Don't reach for stealth on a 429.
sleep 5 && curl https://api.webscrape.ai/v1/scrape \
  -H "X-API-Key: wsg_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "website_url": "https://tough-target.example/page" }'
```

What just happened: backing off and lowering concurrency lets the limit reset. If you're running a batch, add a delay between requests and respect any Retry-After value the response returns.
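For a batch, the waiting logic fits in one function. Use Retry-After when it's a number of seconds, and otherwise back off exponentially. This is a sketch with assumed names and values: the function `retry_delay`, a 5-second base, and a 300-second cap. It only understands the seconds form of Retry-After; a value in HTTP-date form falls back to the exponential delay.

```bash
#!/usr/bin/env bash
BASE_DELAY=5    # assumption: first backoff step, in seconds
MAX_DELAY=300   # assumption: never wait longer than five minutes

# Seconds to wait before retry number ATTEMPT (0-based).
# HEADERS is the raw response header text, e.g. saved with `curl -D headers.txt`.
# Usage: retry_delay ATTEMPT [HEADERS]
retry_delay() {
  local attempt="$1" headers="${2:-}" value delay
  # Header names are case-insensitive; keep the value, drop spaces and CRs.
  value=$(printf '%s\n' "$headers" |
    grep -i '^retry-after:' | head -n 1 | cut -d: -f2- | tr -d ' \r')
  if [[ "$value" =~ ^[0-9]+$ ]]; then
    echo "$value"               # the server said how long to wait
    return 0
  fi
  # No usable Retry-After: double the wait each attempt, up to the cap.
  if (( attempt > 10 )); then attempt=10; fi   # avoid shift overflow
  delay=$(( BASE_DELAY * (1 << attempt) ))
  if (( delay > MAX_DELAY )); then delay=$MAX_DELAY; fi
  echo "$delay"
}

# Example: after a 429 whose headers were saved with `curl -D headers.txt ...`:
#   sleep "$(retry_delay "$attempt" "$(cat headers.txt)")"
```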

When this breaks

Some failures don't have a tier that fixes them, and it's worth knowing them so you don't burn credits chasing a page that won't come.

  • A blank 200 that reads as success. Because the status can be completed with an empty body, a naive pipeline logs a win and stores nothing. Check the body, not just the status, and treat an empty result as a signal to escalate or investigate.
  • A login or paywall. The stealth tier gets you to the page, not past authentication. You can pass session headers for content you're entitled to, but a page that needs an account stays behind it.
  • A challenge you shouldn't be solving. Some blocks are a site saying no. Respect the target's terms of service and robots.txt, pick targets deliberately, and don't treat getting through as permission.
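Checking robots.txt can be part of the same script. The function below is a deliberately simplified sketch of robots.txt matching, not a full implementation of the standard (RFC 9309):

  • It reads only the `User-agent: *` group.
  • It treats rules as plain path prefixes, with no `*` or `$` wildcards.
  • The longest matching rule wins, and a tie goes to Allow.

The function name `robots_allows` is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified robots.txt check. Reads robots.txt on stdin.
# Exit status 0 = PATH allowed for "User-agent: *", 1 = disallowed.
# Usage: robots_allows PATH < robots.txt
robots_allows() {
  awk -v path="$1" '
    BEGIN { in_star = 0; saw_rule = 0; best = -1; verdict = "allow" }
    {
      line = $0
      sub(/\r$/, "", line)                       # tolerate CRLF files
      c = index(line, "#")                       # drop comments
      if (c) line = substr(line, 1, c - 1)
      i = index(line, ":")
      if (i == 0) next
      key = tolower(substr(line, 1, i - 1))
      gsub(/[ \t]/, "", key)
      val = substr(line, i + 1)
      sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)

      if (key == "user-agent") {
        # A user-agent line that follows rules starts a new group.
        if (saw_rule) { in_star = 0; saw_rule = 0 }
        if (val == "*") in_star = 1
      } else if (key == "allow" || key == "disallow") {
        saw_rule = 1
        # Empty values match nothing; otherwise use a plain prefix match.
        if (in_star && val != "" && index(path, val) == 1) {
          n = length(val)
          if (n > best || (n == best && key == "allow")) { best = n; verdict = key }
        }
      }
    }
    END { exit (verdict == "disallow") }
  '
}

# Example:
#   curl -s https://tough-target.example/robots.txt | robots_allows /product/42 && echo allowed
```

A path the file doesn't mention is treated as allowed, which matches how robots.txt is normally read. Even so, this check says nothing about the site's terms of service.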

What it costs to get through

The escalation has a price, and it's bounded. On /v1/scrape, a request is 1 credit on the default path and 3 with stealth enabled. On /v1/smartscraper, it's 5 credits, or 10 with stealth, and the same stealth flag works on /v1/search and /v1/crawl. Credits are charged on success only, so a fetch that never returns the page costs nothing. That's what makes escalating safe to try: you pay the stealth surcharge only when it actually gets you the content.
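To estimate a batch before you run it, you can write the rates above down as a lookup. This sketch uses only the figures quoted in this section. The base rates for /v1/search and /v1/crawl aren't given, so they're deliberately left out. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Credits per successful request, using only the rates quoted in this guide.
# Usage: credits_per_success ENDPOINT STEALTH(true|false)
credits_per_success() {
  case "$1:$2" in
    scrape:false)       echo 1 ;;
    scrape:true)        echo 3 ;;
    smartscraper:false) echo 5 ;;
    smartscraper:true)  echo 10 ;;
    *) echo "unknown endpoint/tier: $1 $2" >&2; return 1 ;;
  esac
}

# Cost of a /v1/scrape batch run cheap-first. Failed attempts are free,
# so only the successes on each tier are billed.
# Usage: batch_cost DEFAULT_SUCCESSES STEALTH_SUCCESSES
batch_cost() {
  echo $(( $1 * $(credits_per_success scrape false) + $2 * $(credits_per_success scrape true) ))
}
```

Here is an example with made-up numbers. Say 80 of 100 URLs come back on the default path and 15 of the remaining 20 come back with stealth. Then `batch_cost 80 15` prints 125, and the 5 pages that never return cost nothing. If you had sent every URL with stealth from the start and the same 95 had come back, the run would cost 95 × 3 = 285 credits.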

Grab a key and point it at the site that keeps returning 403s. The free credits cover real attempts, and a fetch that fails costs nothing, so you can find the tier that works without paying for the attempts that don't.

Frequently asked questions

Why is my web scraper getting a 403 error?

A 403 means the site refused the request, usually because it didn't look like a real visitor. Plain HTTP clients carry a different TLS and header fingerprint than a browser and request pages faster than a human. Detection systems read those signals and block the request before it reaches the page.

What does a 429 error mean when scraping?

A 429 means too many requests: you're hitting the site faster than it allows. The fix is to slow down, not to escalate to a heavier tier. Add a delay between requests, lower your concurrency, and respect any Retry-After header the response includes before trying the page again.

Why does my scraper return an empty page with no error?

The page renders client side. The server sent a 200 with an HTML shell, then JavaScript filled in the content in the browser, so a plain fetch sees the skeleton but not the data. Run the page in a real browser by enabling the stealth tier to get the rendered values.

How do I get past bot detection when scraping?

Start with the default fetch, which retries and escalates to a browser on a 403, 429, or timeout. If a site blocks every non-browser request, enable stealth so the request runs through a hardened browser that reads as an ordinary visitor. Going through the cheap tiers first keeps the easy pages cheap.

Does a failed or blocked scrape cost credits?

No. Credits are charged on success only, so a fetch that never returns the page costs nothing. That makes escalating to the stealth tier a bounded bet: a /v1/scrape call is 1 credit on the default path and 3 with stealth, and you pay the surcharge only when stealth actually gets the page.

Is it legal to scrape a site that blocks you?

It depends on the site and your jurisdiction. A block is also a signal about the site's wishes. Read the target's terms of service and robots.txt, avoid data behind a login you're not entitled to, and pick targets deliberately. Getting through a technical block doesn't settle whether you should.

Will a headless browser always get past a block?

No. A browser handles client-side rendering and many automation checks, but a site can still refuse a request, gate content behind a login, or present a challenge a scraper shouldn't be defeating. The stealth tier widens what's reachable; it isn't a guarantee, and some pages stay out of reach by design.