Two scenarios on this page: feeding an AI assistant enough docs context to call the API, and wiring webscrape.ai in as a tool inside your own agent.

Read these docs with AI

There are two agent-friendly entry points.

llms.txt

A flat, machine-readable index of every page is at /llms.txt. Drop the URL into any assistant that supports doc ingestion and it can answer against the current docs.
https://webscrape.ai/docs/llms.txt

One-click handoff

Every page has a context menu in the top right with handoff buttons for ChatGPT, Claude, Perplexity, Cursor, VSCode, and any MCP-aware client. Click Open in Cursor or Ask Claude and you start with the current page already in context.

Use webscrape.ai inside your AI agent

webscrape.ai works as a tool in any agent runtime that supports HTTP function calls. The examples below show the minimum wiring you need.

Anthropic tool use

import os, requests
from anthropic import Anthropic

client = Anthropic()

def smartscraper(website_url: str, user_prompt: str, output_schema: dict | None = None) -> dict:
    body = {"website_url": website_url, "user_prompt": user_prompt}
    if output_schema is not None:
        body["output_schema"] = output_schema
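    # Return the parsed response envelope. The timeout is an illustrative value;
    # without one, a stalled request can hang the agent loop indefinitely.
    return requests.post(
        "https://api.webscrape.ai/v1/smartscraper",
        headers={"X-API-Key": os.environ["WEBSCRAPE_API_KEY"]},
        json=body,
        timeout=60,
    ).json()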

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    tools=[{
        "name": "smartscraper",
        "description": "Extract structured JSON from a URL. user_prompt describes what to extract; output_schema (optional) constrains the result shape. Use when the user wants data from a webpage.",
        "input_schema": {
            "type": "object",
            "properties": {
                "website_url": {"type": "string"},
                "user_prompt": {"type": "string"},
                "output_schema": {"type": "object"},
            },
            "required": ["website_url", "user_prompt"],
        },
    }],
    messages=[{"role": "user", "content": "Get the top 5 stories from Hacker News with title and score."}],
)
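
The request above only advertises the tool. When the model decides to use it, the response carries tool_use blocks that your code has to execute and answer with tool_result blocks. The following is a minimal sketch of that dispatch step, not an official integration: run_tool_calls is a hypothetical helper name, and it assumes you pass in the smartscraper function defined above as a handler.

```python
import json
from typing import Any, Callable


def run_tool_calls(content_blocks: list[Any], handlers: dict[str, Callable[..., dict]]) -> list[dict]:
    """Execute each tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # text blocks need no reply
        handler = handlers.get(block.name)
        if handler is None:
            # Report an unknown tool back to the model instead of raising.
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"Unknown tool: {block.name}",
                "is_error": True,
            })
            continue
        output = handler(**block.input)  # block.input is already a dict
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(output),
        })
    return results
```

While response.stop_reason is "tool_use", append {"role": "assistant", "content": response.content} and {"role": "user", "content": run_tool_calls(response.content, {"smartscraper": smartscraper})} to messages, then call client.messages.create again with the same tools.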

OpenAI function calling

import os, json, requests
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "smartscraper",
        "description": "Extract structured JSON from a URL. user_prompt describes what to extract; output_schema (optional) constrains the result shape.",
        "parameters": {
            "type": "object",
            "properties": {
                "website_url": {"type": "string"},
                "user_prompt": {"type": "string"},
                "output_schema": {"type": "object"},
            },
            "required": ["website_url", "user_prompt"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-5",
    tools=tools,
    messages=[{"role": "user", "content": "Get the top 5 stories from Hacker News."}],
)
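
As with the Anthropic example, the call above only declares the function. In Chat Completions the model's requests arrive as tool_calls on the assistant message, with arguments encoded as a JSON string. A minimal sketch of the reply step follows, assuming the smartscraper function from the Anthropic example is reused as the handler; run_openai_tool_calls is a hypothetical helper name.

```python
import json
from typing import Any, Callable


def run_openai_tool_calls(message: Any, handlers: dict[str, Callable[..., dict]]) -> list[dict]:
    """Turn an assistant message's tool_calls into role="tool" reply messages."""
    replies = []
    for call in message.tool_calls or []:
        handler = handlers.get(call.function.name)
        if handler is None:
            output = {"error": f"unknown tool: {call.function.name}"}
        else:
            args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
            output = handler(**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output),
        })
    return replies
```

To continue the conversation, append response.choices[0].message to messages, extend messages with the replies from run_openai_tool_calls, and call client.chat.completions.create again.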

Tool descriptions that work

When you wire webscrape.ai into an agent, tell the model when to pick each endpoint. Vague descriptions like “scrape a webpage” cause the model to route requests to the wrong tool. Specific descriptions that say what the endpoint returns and when to use it help the model choose.

SmartScraper

“Extract structured JSON from a URL. user_prompt describes what to extract; output_schema (optional) constrains the result shape. Use when the user wants specific fields back, not the whole page.”

Scrape

“Fetch a webpage’s raw HTML or markdown. Use when the user wants to read or analyze the page content directly, not specific fields.”

SmartBrowse

“Run a saved recipe (clicks, typing, pagination) on a real Chrome session. Use when the data is behind a login or interaction.”
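
One way to keep these descriptions consistent across runtimes is to store them once and generate both tool formats from the same table. The sketch below does that; the helper names are hypothetical, and only the SmartScraper schema is taken from the examples above. The input schemas for Scrape and SmartBrowse are not shown on this page, so the caller has to supply them.

```python
# Descriptions copied from the text above, keyed by tool name.
DESCRIPTIONS = {
    "smartscraper": (
        "Extract structured JSON from a URL. user_prompt describes what to extract; "
        "output_schema (optional) constrains the result shape. Use when the user "
        "wants specific fields back, not the whole page."
    ),
    "scrape": (
        "Fetch a webpage's raw HTML or markdown. Use when the user wants to read "
        "or analyze the page content directly, not specific fields."
    ),
    "smartbrowse": (
        "Run a saved recipe (clicks, typing, pagination) on a real Chrome session. "
        "Use when the data is behind a login or interaction."
    ),
}

# Schema taken from the SmartScraper examples earlier on this page.
SMARTSCRAPER_SCHEMA = {
    "type": "object",
    "properties": {
        "website_url": {"type": "string"},
        "user_prompt": {"type": "string"},
        "output_schema": {"type": "object"},
    },
    "required": ["website_url", "user_prompt"],
}


def anthropic_tool(name: str, input_schema: dict) -> dict:
    """Build a tool entry in the Anthropic tools=[...] format."""
    return {"name": name, "description": DESCRIPTIONS[name], "input_schema": input_schema}


def openai_tool(name: str, parameters: dict) -> dict:
    """Build a tool entry in the OpenAI Chat Completions tools=[...] format."""
    return {
        "type": "function",
        "function": {"name": name, "description": DESCRIPTIONS[name], "parameters": parameters},
    }
```

For example, anthropic_tool("smartscraper", SMARTSCRAPER_SCHEMA) reproduces the tool entry from the Anthropic example, with the more specific description.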

Cost-aware patterns

A few habits that keep agent loops from burning credits:
  • Try Scrape before SmartScraper. Scrape is 1 credit, SmartScraper is 5. If the agent only needs to read the page and reason over it, Scrape with clean: true is the cheaper call.
  • Cache reads with max_age. On /scrape and /smartscraper, max_age tells us to serve a cached result when one was fetched less than N seconds ago. Omit it to skip the cache. With or without max_age, a failed call costs nothing.
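
The two habits above can be sketched as a small planning helper that picks the cheaper endpoint unless specific fields were requested. This is an illustration under assumptions: plan_call is a hypothetical name, and the sketch assumes /scrape accepts website_url, clean, and max_age as JSON body fields in the same way /smartscraper accepts website_url and user_prompt. Check the endpoint reference before relying on those field names.

```python
from typing import Optional

# Per-call credit costs quoted in the list above.
CREDITS = {"scrape": 1, "smartscraper": 5}


def plan_call(
    website_url: str,
    user_prompt: Optional[str] = None,
    max_age: Optional[int] = None,
) -> tuple[str, dict, int]:
    """Return (endpoint, request body, credit cost) for the cheapest suitable call."""
    if user_prompt is None:
        # The agent only needs to read the page: use the 1-credit endpoint.
        endpoint = "scrape"
        body = {"website_url": website_url, "clean": True}  # field names are assumptions
    else:
        # Specific fields were requested: pay for structured extraction.
        endpoint = "smartscraper"
        body = {"website_url": website_url, "user_prompt": user_prompt}
    if max_age is not None:
        body["max_age"] = max_age  # accept cached fetches newer than max_age seconds
    return endpoint, body, CREDITS[endpoint]
```

An agent loop can sum the returned costs before sending anything and stop once it reaches a budget you set.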

Handling the envelope

Every webscrape.ai response is wrapped in the same envelope. Tell the agent to check status first, then read data (completed or queued) or error (error):
{ "status": "completed", "data": { "...": "..." }, "credits_used": 5, "credits_remaining": 495, "request_id": "req_..." }
{ "status": "queued",    "data": { "run_id": "k7Xb9dRmQ2p", "recipe_id": "m3Yc2tFvN8q", "run_status": "running", "poll_url": "...", "created_at": "..." }, "request_id": "req_..." }
{ "status": "error",     "error": { "code": "...", "message": "...", "details": { "...": "..." } }, "request_id": "req_..." }
For queued envelopes (SmartBrowse), set up a webhook so the agent isn’t stuck polling inside its tool loop.
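
The status check can also live in the tool wrapper so the model never sees a raw envelope. Below is a minimal sketch based only on the three envelope shapes shown above; unwrap and WebscrapeError are hypothetical names, not part of any SDK.

```python
class WebscrapeError(Exception):
    """Raised when the envelope reports status "error"."""

    def __init__(self, code, message, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.request_id = request_id


def unwrap(envelope: dict) -> dict:
    """Check status first, then read data or error accordingly."""
    status = envelope.get("status")
    if status == "completed":
        return {"done": True, "data": envelope["data"]}
    if status == "queued":
        # The run is still in progress; hand back what is needed to follow up.
        data = envelope["data"]
        return {"done": False, "run_id": data["run_id"], "poll_url": data["poll_url"]}
    if status == "error":
        err = envelope.get("error", {})
        raise WebscrapeError(err.get("code"), err.get("message"), envelope.get("request_id"))
    raise ValueError(f"unexpected envelope status: {status!r}")
```

Catching WebscrapeError in the tool handler and returning its message to the model lets the agent decide whether to retry or give up.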