Scrape do

Learn how to use Scrape do with Composio

Overview

SLUG: SCRAPE_DO

Description

Scrape.do is a web scraping API offering rotating residential, data-center, and mobile proxies with headless browser support and session management to bypass anti-bot protections (e.g., Cloudflare, Akamai) and extract data at scale in formats like JSON and HTML.

Authentication Details

generic_api_key (string, required)
generic_token (string, required)

Connecting to Scrape do

Create an auth config

Use the dashboard to create an auth config for the Scrape do toolkit. This allows you to connect multiple Scrape do accounts to Composio for agents to use.

1. Select App

Navigate to [Scrape do](https://platform.composio.dev/marketplace/Scrape do).

2. Configure Auth Config Settings

Select one of the supported auth schemes for Scrape do and configure it here.

3. Create and Get auth config ID

Click “Create Scrape do Auth Config”. After creation, copy the displayed ID starting with ac_. This is your auth config ID. This is not a sensitive ID — you can save it in environment variables or a database. This ID will be used to create connections to the toolkit for a given user.
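Since the auth config ID can live in an environment variable, a minimal sketch of loading and sanity-checking it might look like the following. The variable name `SCRAPE_DO_AUTH_CONFIG_ID` and the helper `load_auth_config_id` are hypothetical; only the `ac_` prefix comes from the step above.

```python
import os

# Hypothetical variable name; pick whatever fits your deployment.
ENV_VAR = "SCRAPE_DO_AUTH_CONFIG_ID"


def load_auth_config_id(env=None) -> str:
    """Read the auth config ID from the environment and check its prefix."""
    env = os.environ if env is None else env
    value = env.get(ENV_VAR, "").strip()
    if not value.startswith("ac_"):
        raise ValueError(f"{ENV_VAR} must be set to an ID starting with 'ac_'")
    return value


if __name__ == "__main__":
    try:
        print(load_auth_config_id())
    except ValueError as exc:
        print(f"Configuration problem: {exc}")
```

Failing early on a missing or malformed ID tends to give a clearer error than a rejected connection request later on.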

Connect Your Account

Using API Key

from composio import Composio

# Replace these with your actual values
scrape_do_auth_config_id = "ac_YOUR_SCRAPE_DO_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # UUID from database/app

composio = Composio()


def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve an API key from the user.
    # Or supply your own.
    user_api_key = input("[!] Enter API key")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": user_api_key},
    )

    # API Key authentication is immediate - no redirect needed
    print(f"Successfully connected Scrape do for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id


connection_id = authenticate_toolkit(user_id, scrape_do_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the [Scrape do toolkit’s playground](https://app.composio.dev/app/Scrape do).

Python
from composio import Composio
from openai import OpenAI
import json

openai = OpenAI()
composio = Composio()

# User ID must be a valid UUID format
user_id = "0000-0000-0000"  # Replace with actual user UUID from your database

tools = composio.tools.get(user_id=user_id, toolkits=["SCRAPE_DO"])

print("[!] Tools:")
print(json.dumps(tools))


def invoke_llm(task="What can you do?"):
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": task,  # Your task here!
            },
        ],
        tools=tools,
    )

    # Handle Result from tool call
    result = composio.provider.handle_tool_calls(user_id=user_id, response=completion)
    print(f"[!] Completion: {completion}")
    print(f"[!] Tool call result: {result}")


invoke_llm()

Tool List

Tool Name: Get Account Information

Description

Retrieves account information and usage statistics from Scrape.do. This action makes a GET request to the Scrape.do info endpoint to fetch:
- subscription status
- concurrent request limits and usage
- monthly request limits and remaining requests
- real-time usage statistics

Rate limit: maximum 10 requests per minute.

Action Parameters

token (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
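Because Get Account Information is limited to 10 requests per minute, callers that poll it may want a client-side guard. The sketch below is one possible sliding-window limiter; the class name and its interface are illustrative and not part of Composio or Scrape.do.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls=10, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._stamps = deque()

    def try_acquire(self) -> bool:
        """Return True and record the call if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller would check `limiter.try_acquire()` before invoking the tool and back off when it returns `False`.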

Tool Name: Get rendered page content

Description

This tool scrapes web pages with JavaScript rendering enabled. It is particularly useful for dynamic websites where content is loaded through JavaScript. The tool waits for the JavaScript to execute and returns the fully rendered HTML content.

Action Parameters

blockResources (boolean, default: True)
customHeaders (boolean)
customWait (integer)
device (string, default: desktop)
height (integer, default: 1080)
render (boolean, default: True)
timeout (integer, default: 60000)
url (string, required)
waitSelector (string)
waitUntil (string, default: domcontentloaded)
width (integer, default: 1920)

Action Response

data (object, required)
error (string)
successful (boolean, required)
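To make the defaults for the rendered-page tool concrete, the sketch below assembles an arguments dictionary from the parameter names and default values listed for this tool. The helper `render_arguments` is hypothetical, and how the dictionary is then passed to the tool is left out on purpose.

```python
# Defaults copied from the parameter list for this tool.
RENDER_DEFAULTS = {
    "blockResources": True,
    "device": "desktop",
    "height": 1080,
    "render": True,
    "timeout": 60000,
    "waitUntil": "domcontentloaded",
    "width": 1920,
}
# Parameters without a listed default.
OPTIONAL_FIELDS = {"customHeaders", "customWait", "waitSelector"}


def render_arguments(url: str, **overrides) -> dict:
    """Merge caller overrides over the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(RENDER_DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
    return {"url": url, **RENDER_DEFAULTS, **overrides}


if __name__ == "__main__":
    print(render_arguments("https://example.com", waitSelector="#app", timeout=30000))
```

Rejecting unknown names catches typos such as `waitselector` before a request is made.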

Tool Name: Scrape webpage using scrape.do

Description

A tool to scrape web pages using Scrape.do's API service. It makes a basic GET request to fetch the content of a target webpage while handling anti-bot protections and proxy rotation automatically.

Action Parameters

block_resources (boolean, default: True)
custom_headers (boolean)
device (string, default: desktop)
disable_redirection (boolean)
extra_headers (boolean)
geo_code (string)
height (integer, default: 1080)
output (string, default: raw)
render (boolean)
retry_timeout (integer)
return_json (boolean)
set_cookies (string)
super (boolean)
timeout (integer)
url (string, required)
width (integer, default: 1920)

Action Response

data (object, required)
error (string)
successful (boolean, required)
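For readers curious what a basic GET request of this kind could look like outside Composio, here is a rough sketch that only builds the request URL. The endpoint `https://api.scrape.do/`, the `token` and `url` query parameter names, and the lowercase boolean format are assumptions to verify against Scrape.do's own documentation; the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://api.scrape.do/"  # assumed endpoint; confirm in Scrape.do's docs


def build_scrape_url(token: str, target_url: str, **options) -> str:
    """Build a request URL, percent-encoding the target URL and any options."""
    params = {"token": token, "url": target_url}
    for name, value in options.items():
        if value is None:
            continue  # skip options the caller left unset
        # Booleans are sent as lowercase "true"/"false" (assumed wire format).
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return f"{API_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    example = build_scrape_url("YOUR_TOKEN", "https://example.com/a?b=1&c=2", render=True)
    print(example)
    # The target URL survives a round trip through the query string.
    print(parse_qs(urlsplit(example).query)["url"][0])
```

The point of the sketch is the encoding step: a target URL with its own query string has to be percent-encoded, otherwise its parameters would be read as parameters of the outer request.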

Tool Name: Use Scrape.do Proxy Mode

Description

This tool implements Scrape.do's proxy mode, which routes requests through the Scrape.do proxy server. It provides an alternative way to access web scraping capabilities by handling complex JavaScript-rendered pages, geolocation-based routing, device simulation, and built-in anti-bot and retry mechanisms.

Action Parameters

custom_headers (boolean, default: True)
device (string, default: desktop)
geo_code (string)
render (boolean)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Cookies for Scraping

Description

This tool allows users to set specific cookies for their scraping requests to a target website. It is useful for maintaining session state or authentication through cookies.

Action Parameters

cookies (string, required)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
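The `cookies` parameter is a single string, but its exact format is not stated here. The sketch below assumes the conventional `name=value; name2=value2` cookie-header layout and shows one way to serialize a dictionary into it; the helper `format_cookies` is hypothetical.

```python
def format_cookies(cookies: dict) -> str:
    """Serialize a mapping into an assumed 'name=value; name2=value2' string."""
    parts = []
    for name, value in cookies.items():
        text = str(value)
        # Reject characters that would break the name=value; ... layout.
        if not name or any(ch in name for ch in "=; ") or ";" in text:
            raise ValueError(f"Invalid cookie: {name!r}={text!r}")
        parts.append(f"{name}={text}")
    return "; ".join(parts)


if __name__ == "__main__":
    print(format_cookies({"session": "abc123", "theme": "dark"}))
```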

Tool Name: Set Scrape.do Super Mode

Description

The Scrape do Set Super Mode tool enables enhanced scraping by using residential and mobile proxies, bypassing blocks and restrictions associated with datacenter IPs. When the 'super' parameter is set to true, it activates a mode that leverages a network of residential IP addresses, which is particularly useful for bypassing strict anti-bot measures and for accessing websites that block datacenter IPs.

Action Parameters

super_mode (boolean, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Block specific URLs during scraping

Description

This tool allows users to block specific URLs during the scraping process. It is particularly useful for blocking unwanted resources such as analytics scripts, advertisements, or any other URLs that might interfere with scraping or slow it down. It provides granular control by letting users specify URL patterns to block, which improves scraping performance and helps maintain privacy.

Action Parameters

urls (array, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set custom headers for scrape.do request

Description

A tool to send custom headers with Scrape.do requests. This allows simulating specific browser behaviors or adding authentication headers by controlling all headers sent to the target website.

Action Parameters

custom_headers (boolean, default: True)
headers (object, required)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Custom Wait Time

Description

This tool sets the custom wait time, in milliseconds, applied after page load when using the render option in Scrape.do. It is particularly useful for dynamic content, ensuring that it is fully loaded before scraping, especially on JavaScript-heavy websites or single-page applications. The action allows fine-tuned control over the rendering wait time and must be used with render=true.

Action Parameters

custom_wait (integer, default: 5000)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Device Type for Scraping

Description

This tool allows users to set the device type (desktop, mobile, or tablet) for scraping requests. It is used to emulate different devices, which helps when testing responsive designs or fetching device-specific content.

Action Parameters

device_type (string, required)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Disable Redirection

Description

Controls the automatic redirection behavior of Scrape.do requests. When enabled (disable_redirection=true), it prevents redirects from being followed automatically during web scraping operations. This allows:
- inspection of the redirect chain
- capturing intermediate redirect responses
- manual control of redirection flow
- analysis of HTTP status codes of redirect responses

The redirect URL will be available in the scrape.do-target-redirected-location response header.

Action Parameters

disable_redirection (boolean, default: True)

Action Response

data (object, required)
error (string)
successful (boolean, required)
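When redirection is disabled, the redirect target has to be read from the scrape.do-target-redirected-location response header. HTTP header names are case-insensitive, so a lookup along these lines may be useful. The sketch assumes the response headers are available as a plain mapping, which this page does not confirm, and the helper name is hypothetical.

```python
# Header name as given in the tool description, lowercased for comparison.
REDIRECT_HEADER = "scrape.do-target-redirected-location"


def redirected_location(headers):
    """Return the redirect target from a header mapping, or None if absent."""
    for name, value in headers.items():
        if name.lower() == REDIRECT_HEADER:
            return value
    return None


if __name__ == "__main__":
    sample = {"Scrape.do-Target-Redirected-Location": "https://example.com/new"}
    print(redirected_location(sample))
```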

Tool Name: Set Pure Cookies Mode

Description

This tool returns the original Set-Cookie headers from the target website instead of the processed scrape.do-cookies header format that Scrape.do uses by default.

Action Parameters

pure_cookies (boolean, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
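With pure cookies enabled, the original Set-Cookie header values can be parsed with Python's standard library. The sketch below assumes you already have one raw Set-Cookie header value as a string; where that string sits inside the tool's `data` object is not documented here, and the helper name is hypothetical.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(header_value: str) -> dict:
    """Extract cookie names and values from one raw Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    # Attributes such as Path or HttpOnly are stored on the morsel, not as cookies.
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    print(parse_set_cookie("sessionid=abc123; Path=/; HttpOnly"))
```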

Tool Name: Set Regional Geolocation for Scraping

Description

This tool sets broader geographical targeting by specifying a region code instead of a specific country code. This is useful when you want to scrape content from an entire region rather than a single country. Note that this feature requires super mode to be enabled and is only available for business plan or higher subscriptions.

Action Parameters

regional_geo_code (string, required)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Retry Timeout

Description

This tool allows users to set the maximum wait time (in milliseconds) before retrying a failed request in Scrape.do. It takes a 'retry_timeout' parameter (integer) that specifies the maximum time to wait before retrying, with a default of 15000 ms. It is designed to improve the reliability of web scraping operations, especially when dealing with unstable or slow-responding websites.

Action Parameters

retry_timeout (integer, default: 15000)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Screenshot Capture for Scraping

Description

This tool enables the screenshot functionality of the Scrape.do API, allowing users to capture a visual representation of the scraped webpage. When enabled, the API returns a screenshot of the rendered page along with the regular response. Features:
- basic screenshot capture
- full page screenshot capture
- capture of a specific area using a CSS selector

Action Parameters

enabled (boolean, default: True)
full_page (boolean)
selector (string)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Session ID for Sticky Sessions

Description

This tool implements Scrape.do's session ID functionality, which maintains a sticky session with the same proxy IP across multiple requests. It does this by adding a sessionId parameter to the query parameters of any scraping request, which is crucial for session consistency when scraping websites with stringent session requirements.

Action Parameters

session_id (integer, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
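A sticky session only helps if the same integer is reused for every request that belongs together. One way to get a stable integer is to derive it from a key you already have, such as a user or job identifier. In the sketch below, the upper bound `MAX_SESSION_ID` is an assumption to check against Scrape.do's documented range, and the helper name is hypothetical.

```python
import hashlib

MAX_SESSION_ID = 1_000_000  # assumed upper bound; check Scrape.do's documented range


def session_id_for(key: str) -> int:
    """Map an arbitrary key to a stable integer in [0, MAX_SESSION_ID)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % MAX_SESSION_ID


if __name__ == "__main__":
    # The same key always yields the same session ID.
    print(session_id_for("user-42"), session_id_for("user-42"))
```

Hashing avoids storing the session ID separately, at the cost of occasional collisions between unrelated keys.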

Tool Name: Set Wait For Selector

Description

This action sets a CSS selector to wait for before considering the page load complete. It is particularly useful when scraping JavaScript-heavy pages, to ensure that certain dynamically loaded elements are present.

Action Parameters

selector (string, required)
timeout (integer)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Set Wait Until Condition

Description

This tool sets the waitUntil parameter for the Scrape.do API, which defines when rendering should consider the page loaded during JavaScript execution. It is particularly useful for handling dynamic websites by specifying conditions such as 'domcontentloaded', 'networkidle0', or 'networkidle2'.

Action Parameters

wait_until (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
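Since `wait_until` is a free-form string, validating it before the call can catch typos early. The sketch below checks against only the three conditions named in the description; the API may accept others, so treat the set as a starting point rather than a complete list. The helper name is hypothetical.

```python
# Only the values named in the description; extend if the API accepts more.
KNOWN_WAIT_UNTIL = {"domcontentloaded", "networkidle0", "networkidle2"}


def normalize_wait_until(value: str) -> str:
    """Lowercase and trim the value, raising if it is not a known condition."""
    normalized = value.strip().lower()
    if normalized not in KNOWN_WAIT_UNTIL:
        raise ValueError(
            f"Unknown wait_until value {value!r}; expected one of {sorted(KNOWN_WAIT_UNTIL)}"
        )
    return normalized


if __name__ == "__main__":
    print(normalize_wait_until(" NetworkIdle0 "))
```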

Tool Name: Monitor WebSocket requests using scrape.do

Description

This tool provides the ability to view WebSocket requests made by a webpage. It requires the render=true and returnJSON=true parameters, along with showWebsocketRequests=true, to enable logging of WebSocket requests.

Action Parameters

session_id (string)
timeout (integer)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
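The three flags that WebSocket monitoring depends on can be kept together in one place so none is forgotten. The sketch below builds such a parameter set; the camelCase spellings `returnJSON`, `showWebsocketRequests`, and `sessionId` are assumed upstream names, and the helper is hypothetical. The tool itself may set these flags for you.

```python
def websocket_monitor_params(url: str, session_id=None, timeout=None) -> dict:
    """Combine the required flags with the tool's optional inputs."""
    params = {
        "url": url,
        # All three flags are required together, per the description above.
        "render": "true",
        "returnJSON": "true",
        "showWebsocketRequests": "true",
    }
    if session_id is not None:
        params["sessionId"] = str(session_id)
    if timeout is not None:
        params["timeout"] = str(timeout)
    return params


if __name__ == "__main__":
    print(websocket_monitor_params("https://example.com", timeout=30000))
```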