When you need a full browser, and when a plain fetch is enough

A headless browser is the right tool for a fraction of pages: those that render with JavaScript or actively block a plain fetch. Here's how to tell which.

Two requests hit the same site. One asks for a static article and gets the whole thing back from a single fetch. The other asks for a price that only appears after the page's JavaScript runs, and a plain fetch returns the HTML skeleton with an empty slot where the number should be. One site, two requests, and only one of them needed a browser.

The deciding question is never "browser or not" in the abstract. It's whether the bytes you want are already in the response, or only show up after the page's code runs.

What a plain fetch already gives you

The cheapest tier is a fast HTTP request that carries a real browser's TLS and header fingerprint, so it isn't flagged the moment it connects. That one request clears more of the web than people expect: static pages, server-rendered HTML, most articles, most blog content, JSON APIs, and a large share of product pages whose details are baked into the markup the server sends.

When the data is in the response body, a browser adds latency and cost and buys you nothing. It runs scripts that don't change the answer, waits for a render that already happened on the server, and bills you for the privilege.

What actually forces a browser

Three things, and only these three, mean the HTTP tier can't finish the job on its own.

  1. Client-side rendering. The server sends a shell, and JavaScript fills in the content after the page loads. A single-page app is the common case. The raw HTML has the layout but not the data.
  2. Interaction-gated content. The values you want appear only after a click, a scroll, or a step through a flow. Nothing in the first response contains them, because the site never rendered them until someone acted.
  3. Active blocking. The site serves a challenge or a redirect to anything that doesn't look like a real browser session, so the request never reaches the content at all.

The first two are rendering problems: the data isn't there yet. The third is an access problem: the data is there, and you're being kept out. They feel similar from the outside, since you get a page without your data, but they need different tools.

You rarely pick the tier by hand

The engine escalates on its own. A request starts on fast HTTP, and when a site answers with a 403, a 429, or a timeout, the retry climbs, and its final attempt runs a headless browser that executes the page's JavaScript. Most client-side rendering and most transient blocks resolve here without you choosing anything. This is the same escalate-only-as-far-as-forced logic we wrote about in beating bot detection without overpaying.

The one tier you do choose is the hardened, anti-detection browser. You enable it with stealth: true when you already know a target blocks the cheaper path, or when a plain fetch keeps coming back empty on a page you know renders client side. It runs a real browser from the first attempt, so it reads as an ordinary visitor instead of climbing a ladder the site is built to reject. Working through a site that keeps blocking you is its own troubleshooting flow.
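As a rough sketch of what opting in could look like from code, the helper below builds the same request as the curl example, with an optional stealth flag. The endpoint, the X-API-Key header, and the website_url field come from that example. The function name is hypothetical, and sending stealth as a top-level boolean in the JSON body is an assumption, so check it against the API reference before relying on it.

```python
import json
import urllib.request

API_URL = "https://api.webscrape.ai/v1/scrape"  # endpoint from the curl example


def build_scrape_request(website_url: str, api_key: str,
                         stealth: bool = False) -> urllib.request.Request:
    """Build (but do not send) a scrape request. Hypothetical helper."""
    payload = {"website_url": website_url}
    if stealth:
        # Assumption: stealth is a top-level boolean in the JSON body.
        payload["stealth"] = True
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# To send it:
#   with urllib.request.urlopen(request, timeout=60) as response:
#       body = response.read()
request = build_scrape_request(
    "https://example.com/app", "wsg_live_your_key_here", stealth=True
)
```

Leaving stealth off by default keeps the cheap path as the normal case, and the flag only appears in the body for targets you already know need it.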

| What you're seeing | What it means | What gets it back |
| --- | --- | --- |
| A 200 with your data in the HTML | Server-rendered | Plain fetch, default path |
| A 200 with an empty shell | Client-rendered app | A browser; stealth: true runs one end to end |
| A 403, 429, or challenge page | Active blocking | Default retry escalates; stealth: true for persistent blocks |
| Content that appears only after a click or scroll | Interaction-gated | A real browser |
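The table above can be read as a small decision function. This is an illustrative sketch, not the engine's logic: the names are hypothetical, the challenge-page markers are made-up examples, and a single response can't tell a client-rendered page from an interaction-gated one, so both land in the same bucket.

```python
from enum import Enum


class Diagnosis(Enum):
    SERVER_RENDERED = "plain fetch, default path"
    EMPTY_SHELL = "needs a browser (client-rendered or interaction-gated)"
    BLOCKED = "default retry escalates; stealth for persistent blocks"
    UNKNOWN = "not covered by the table"


# Hypothetical challenge-page phrases; real ones vary by site and vendor.
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def diagnose(status: int, body: str, expected_text: str) -> Diagnosis:
    """Map one raw response to a row of the table."""
    if status in (403, 429):
        return Diagnosis.BLOCKED
    if status != 200:
        return Diagnosis.UNKNOWN
    if expected_text in body:
        # The data is already in the response: no browser needed.
        return Diagnosis.SERVER_RENDERED
    lowered = body.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        # A 200 can still carry a challenge page instead of content.
        return Diagnosis.BLOCKED
    return Diagnosis.EMPTY_SHELL
```

Here expected_text is a value you know should be on the page, such as a price or a headline, which makes the check concrete rather than a guess from page size.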

What the browser tier costs

Cost tracks how much machinery a fetch needs, and the stealth tier is the only one that adds a surcharge. On /v1/scrape, a request is 1 credit on the default path and 3 credits with stealth enabled. On /v1/smartscraper, it's 5 credits, or 10 with stealth. The surcharge applies only when the stealth tier actually runs, and credits are charged on success only, so a fetch that fails costs you nothing.

That pricing is the whole argument for not defaulting to a browser. Send every page through the stealth tier and you pay the top rate on the static pages that would have answered a plain fetch at base cost.
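To put numbers on that, here is the arithmetic for a hypothetical batch of 1,000 pages on /v1/scrape where 50 need the stealth tier. The credit rates are the ones quoted above. The batch size and the split are invented for illustration, and the sketch assumes every fetch succeeds.

```python
# Credits per successful request, as quoted in this post.
CREDITS = {
    "/v1/scrape": {"default": 1, "stealth": 3},
    "/v1/smartscraper": {"default": 5, "stealth": 10},
}


def batch_cost(endpoint: str, default_pages: int, stealth_pages: int) -> int:
    """Total credits for a batch, assuming every fetch succeeds."""
    rates = CREDITS[endpoint]
    return default_pages * rates["default"] + stealth_pages * rates["stealth"]


# Stealth only where it is needed: 950 * 1 + 50 * 3
selective = batch_cost("/v1/scrape", default_pages=950, stealth_pages=50)
# Stealth on everything: 1000 * 3
blanket = batch_cost("/v1/scrape", default_pages=0, stealth_pages=1000)

print(selective, blanket)  # 1100 3000
```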

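```bash
# No stealth flag: the engine starts cheap and escalates only if a site forces it.
# curl sends -d bodies as form data unless told otherwise, so declare JSON explicitly.
curl https://api.webscrape.ai/v1/scrape \
  -H "X-API-Key: wsg_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "website_url": "https://example.com/article" }'
```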

Where this gets fuzzy

The clean signals above have one blind spot, and it's worth naming. A page can return a 200 with an empty shell that looks exactly like success: a client-rendered app that loaded fine and simply needs a browser to fill it in. No status code flags it, so the automatic ladder can't always catch it the way it catches a 403. When a fetch comes back empty on a page you know renders client side, that's your cue to set stealth: true and let a real browser run it, rather than waiting for an escalation that a 200 never triggers.
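One way to handle that blind spot in your own code is to check the first response for a value you expect and retry with stealth only when it's missing. The sketch below assumes a hypothetical fetch(url, stealth) wrapper around your API call that returns the page content as a string. The response shape isn't described here, so the wrapper is left for you to supply.

```python
from typing import Callable, Tuple

# Hypothetical wrapper signature: fetch(url, stealth) -> page content.
Fetch = Callable[[str, bool], str]


def fetch_with_stealth_fallback(url: str, expected_text: str,
                                fetch: Fetch) -> Tuple[str, bool]:
    """Try the default path first; retry with stealth if the data is absent.

    Returns the content and whether the stealth retry was used.
    """
    content = fetch(url, False)
    if expected_text in content:
        return content, False
    # A 200 with an empty shell triggers no automatic escalation,
    # so opt into the browser tier explicitly.
    return fetch(url, True), True


def fake_fetch(url: str, stealth: bool) -> str:
    """Stand-in that mimics a client-rendered page, for trying the logic."""
    return "<p>Price: $19</p>" if stealth else '<div id="app"></div>'


content, used_stealth = fetch_with_stealth_fallback(
    "https://example.com/product", "$19", fake_fetch
)
```

The second call happens only when the first comes back without the data, so pages that answer a plain fetch never touch the stealth tier.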

The trade we're making is the occasional wasted cheap attempt against never having to fingerprint a site's defenses before you call it. We think that's the right default, because the site won't tell you which kind it is until you try.

The rule worth keeping

Let the response decide which tier you needed. If the data is already in it, a plain fetch is the right tool and the cheap one. If the page renders client side or actively blocks the cheap request, that's when a browser earns its cost. If you're unsure about a page, point a key at it, using the free credits a new account starts with, and read what comes back.

Frequently asked questions

Do I need a headless browser to scrape a website?

Usually not. A plain HTTP fetch that carries a real browser's fingerprint returns the data on most static and server-rendered pages. You need a browser when content is rendered by JavaScript after load or gated behind interaction, or when a site actively blocks the cheaper request.

How do I know if a page is rendered with JavaScript?

Fetch the raw HTML and look for the data you want. If the response is a near-empty shell, with the content missing or sitting inside a script tag as JSON, the page assembles itself in the browser, and a plain fetch will not see the rendered values.
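That check can be scripted with Python's standard library. The sketch below is a rough heuristic under stated assumptions: the function and class names are hypothetical, it treats everything outside script tags as markup, and it won't find a value that is split across several tags.

```python
from html.parser import HTMLParser


class _TextSplitter(HTMLParser):
    """Collect text inside <script> tags separately from the rest."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.markup_text: list[str] = []
        self.script_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        target = self.script_text if self._in_script else self.markup_text
        target.append(data)


def locate(html: str, needle: str) -> str:
    """Return 'markup', 'script', or 'missing' for where needle appears."""
    parser = _TextSplitter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    if needle in "".join(parser.markup_text):
        return "markup"
    if needle in "".join(parser.script_text):
        return "script"
    return "missing"
```

A result of "markup" means a plain fetch already has the rendered value. "script" means the data shipped as embedded JSON that the page assembles in the browser, and "missing" means it arrives later or behind an interaction.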

What is the difference between a headless browser and a stealth browser?

A headless browser runs a page's JavaScript without a visible window, which renders most client-side content. A stealth browser adds anti-detection hardening so it reads as an ordinary visitor on sites that block automation. On webscrape.ai the stealth tier is the one you opt into.

Does using a browser cost more than a plain fetch?

On the stealth tier, yes. A /v1/scrape call is 1 credit on the default path and 3 credits with stealth enabled. /v1/smartscraper is 5 credits, or 10 with stealth. Credits are charged on success only, so a failed fetch costs nothing.

Can a plain fetch handle pages behind bot detection?

Some of them. The default fetch retries automatically when a site returns a 403, a 429, or times out, and its last attempt uses a browser. For sites that block every non-browser request, you enable the stealth tier so the request reads as a real visitor.