Brightdata

Learn how to use Brightdata with Composio

Overview

SLUG: BRIGHTDATA

Description

Bright Data offers a web data platform, which it bills as the world’s #1, that includes Web Unlocker for bypassing anti-bot systems, a SERP API for search engine data, and pre-made scrapers for popular websites, so you can collect web data at scale.

Authentication Details

generic_api_key (string, required): your Bright Data API key

Connecting to Brightdata

Create an auth config

Use the dashboard to create an auth config for the Brightdata toolkit. This allows you to connect multiple Brightdata accounts to Composio for agents to use.

1. Select App

Navigate to Brightdata.

2. Configure Auth Config Settings

Select one of the supported auth schemes (Brightdata uses an API key) and configure it here.

3. Create and Get the Auth Config ID

Click “Create Brightdata Auth Config”. After creation, copy the displayed ID, which starts with ac_. This is your auth config ID. It is not sensitive, so you can store it in environment variables or a database. You will use this ID to create connections to the toolkit for a given user.
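Because the auth config ID is not secret, a common pattern is to read it from the environment instead of hard-coding it. The sketch below is a minimal illustration: the variable name `BRIGHTDATA_AUTH_CONFIG_ID` and the helper `load_auth_config_id` are hypothetical, and the only check it performs is the `ac_` prefix described above.

```python
import os


def load_auth_config_id(env_var: str = "BRIGHTDATA_AUTH_CONFIG_ID") -> str:
    """Read the Brightdata auth config ID from an environment variable (hypothetical name)."""
    value = os.environ.get(env_var, "").strip()
    # Auth config IDs created in the dashboard start with "ac_"
    if not value.startswith("ac_"):
        raise ValueError(f"{env_var} must be set to an auth config ID starting with 'ac_'")
    return value
```

You can then pass the returned value as `auth_config_id` when connecting an account.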

Connect Your Account

Using API Key

from composio import Composio

# Replace these with your actual values
brightdata_auth_config_id = "ac_YOUR_BRIGHTDATA_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # User ID from your database/app

composio = Composio()

def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve an API key from the user,
    # or supply your own.
    user_api_key = input("[!] Enter API key: ")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": {"generic_api_key": user_api_key}},
    )

    # API key authentication is immediate; no redirect is needed
    print(f"Successfully connected Brightdata for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id


connection_id = authenticate_toolkit(user_id, brightdata_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the Brightdata toolkit’s playground, or run them from code as shown here.

Python
from composio import Composio
from openai import OpenAI
import json

openai = OpenAI()
composio = Composio()

# Use the same user ID that was used when connecting the Brightdata account
user_id = "0000-0000-0000"  # Replace with the actual user ID from your database

tools = composio.tools.get(user_id=user_id, toolkits=["BRIGHTDATA"])

print("[!] Tools:")
print(json.dumps(tools))

def invoke_llm(task="What can you do?"):
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": task,  # Your task here!
            },
        ],
        tools=tools,
    )

    # Execute any tool calls requested in the completion
    result = composio.provider.handle_tool_calls(user_id=user_id, response=completion)
    print(f"[!] Completion: {completion}")
    print(f"[!] Tool call result: {result}")

invoke_llm()

Tool List

Tool Name: Trigger Site Crawl

Description

Tool to trigger a site crawl job that extracts content across multiple pages or entire domains. Use when you need to start a crawl for a given dataset and list of URLs.

Action Parameters

custom_output_fields (string)
dataset_id (string, required)
include_errors (boolean)
items (array, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
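Trigger Site Crawl takes a `dataset_id` and an `items` array, but this page does not spell out the shape of each item. The sketch below assumes one object per input URL of the form `{"url": ...}`; check that against the input schema of the dataset you choose. `build_crawl_arguments` is a hypothetical helper, and the optional fields are only included when you set them, since no defaults are documented.

```python
from typing import Iterable, Optional


def build_crawl_arguments(
    dataset_id: str,
    urls: Iterable[str],
    include_errors: Optional[bool] = None,
    custom_output_fields: Optional[str] = None,
) -> dict:
    """Assemble Trigger Site Crawl arguments, assuming one {"url": ...} item per URL."""
    if not dataset_id:
        raise ValueError("dataset_id is required")
    items = [{"url": url} for url in urls]
    if not items:
        raise ValueError("at least one URL is required")
    arguments = {"dataset_id": dataset_id, "items": items}
    # Optional fields without documented defaults are only sent when provided
    if include_errors is not None:
        arguments["include_errors"] = include_errors
    if custom_output_fields is not None:
        arguments["custom_output_fields"] = custom_output_fields
    return arguments
```

For example, `build_crawl_arguments("YOUR_DATASET_ID", ["https://example.com"])` yields a dictionary with just the two required fields.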

Tool Name: Browse Available Scrapers

Description

Tool to list all available pre-made scrapers (datasets) from Bright Data's marketplace. Use when you need to browse available data sources for structured scraping.

Action Parameters

None.

Action Response

data (object, required)
error (string)
successful (boolean, required)
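Every tool in this list returns the same envelope: a `data` field, an optional `error` string, and a required `successful` flag. The sketch below, assuming the tool result is available as a plain Python dict with those keys, shows one way to fail loudly instead of silently reading empty `data`. `unwrap_response` and `BrightdataToolError` are hypothetical names, not part of the Composio SDK.

```python
from typing import Any, Mapping


class BrightdataToolError(RuntimeError):
    """Raised when a tool response reports successful=False (hypothetical class)."""


def unwrap_response(response: Mapping[str, Any]) -> Any:
    """Return the `data` field of a tool response, or raise with its `error` message."""
    if not response.get("successful"):
        raise BrightdataToolError(response.get("error") or "tool call failed without an error message")
    return response.get("data")
```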

Tool Name: Filter Dataset

Description

Tool to apply custom filter criteria to a marketplace dataset (beta). Use after selecting a dataset to generate a filtered snapshot.

Action Parameters

dataset_id (string, required)
files (array)
filter (object, required)
records_limit (integer)

Action Response

data (object, required)
error (string)
successful (boolean, required)
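The `filter` parameter is a required object whose schema this page does not describe. As an assumption, the sketch below models it on Bright Data's dataset filter style: single conditions with `name`, `operator`, and `value`, grouped under an `"and"` or `"or"` operator with a `filters` list. Verify the field names and supported operators against Bright Data's dataset documentation before relying on it; all function names here are hypothetical.

```python
from typing import Any, Optional


def condition(name: str, operator: str, value: Any) -> dict:
    """One field condition, e.g. condition("rating", ">=", 4) (assumed shape)."""
    return {"name": name, "operator": operator, "value": value}


def all_of(*filters: dict) -> dict:
    """Group conditions so that every one of them must match."""
    return {"operator": "and", "filters": list(filters)}


def any_of(*filters: dict) -> dict:
    """Group conditions so that at least one of them must match."""
    return {"operator": "or", "filters": list(filters)}


def build_filter_arguments(dataset_id: str, filter_obj: dict, records_limit: Optional[int] = None) -> dict:
    """Assemble Filter Dataset arguments; records_limit is only sent when provided."""
    if records_limit is not None and records_limit <= 0:
        raise ValueError("records_limit must be a positive integer")
    arguments = {"dataset_id": dataset_id, "filter": filter_obj}
    if records_limit is not None:
        arguments["records_limit"] = records_limit
    return arguments
```

Nesting `any_of` inside `all_of` expresses conditions such as "rating at least 4 and country is US or CA" without hand-writing the nested dictionaries.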

Tool Name: Get Available Cities

Description

Tool to get the available static network cities for a given country. Use when you need to configure static proxy endpoints after selecting a country.

Action Parameters

country (string, required)
pool_ip_type (string, defaults to dc)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Get Available Countries

Description

Tool to list available countries and their ISO 3166-1 alpha-2 codes. Use when you need valid country codes to configure zones before provisioning proxies.

Action Parameters

None.

Action Response

data (object, required)
error (string)
successful (boolean, required)
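Before passing a country to a tool such as Get Available Cities, it can help to validate the code locally. The sketch below assumes the tools accept the two-letter ISO 3166-1 alpha-2 codes that this tool lists. Normalizing to uppercase follows the ISO convention, not anything this page states about Brightdata, so adjust the case if the API expects lowercase. `validate_country_code` is a hypothetical helper, and `available` is a list of codes you would extract from this tool's `data`.

```python
import re
from typing import Iterable, Optional

# Exactly two ASCII letters, checked after uppercasing
_ALPHA2 = re.compile(r"[A-Z]{2}")


def validate_country_code(code: str, available: Optional[Iterable[str]] = None) -> str:
    """Normalize a code to uppercase alpha-2 and optionally check it against a known list."""
    normalized = code.strip().upper()
    if not _ALPHA2.fullmatch(normalized):
        raise ValueError(f"not an ISO 3166-1 alpha-2 code: {code!r}")
    if available is not None and normalized not in {c.strip().upper() for c in available}:
        raise ValueError(f"country {normalized} is not in the available list")
    return normalized
```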

Tool Name: Download Scraped Data

Description

Tool to retrieve the scraped data from a completed crawl job by snapshot ID. Use after triggering a crawl or filtering a dataset to download the collected data.

Action Parameters

format (string, defaults to json)
limit (integer)
offset (integer)
snapshot_id (string, required)

Action Response

content (string)
data
error (string)
successful (boolean, required)
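Because Download Scraped Data accepts `offset` and `limit`, large snapshots can be read in pages. The sketch below assumes that `offset` counts records from the start of the snapshot and that a page shorter than `limit` means the end has been reached. `fetch_page` is a hypothetical callable that you would implement by calling the tool with `snapshot_id`, `format="json"`, `offset`, and `limit`, and then returning the parsed list of records.

```python
from typing import Any, Callable, Iterator, List


def iter_snapshot_records(
    fetch_page: Callable[[int, int], List[Any]],
    page_size: int = 100,
) -> Iterator[Any]:
    """Yield records page by page until a short (or empty) page signals the end."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A page smaller than requested means there is nothing left to read
        if len(page) < page_size:
            return
        offset += page_size
```

Using a generator keeps memory bounded. When the record count is an exact multiple of `page_size`, this costs one extra request that returns an empty page.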

Tool Name: Check Crawl Status

Description

Tool to check the processing status of a crawl job by snapshot ID. Call it before downloading results to make sure data collection is complete.

Action Parameters

snapshot_id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)
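The check-then-download flow described above is usually a polling loop. The sketch below assumes the status string is `"ready"` when the snapshot can be downloaded and `"failed"` on error, and treats any other value as still in progress. The exact values and where they appear inside `data` are not documented on this page. `get_status` is a hypothetical callable that wraps Check Crawl Status and returns that string. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_snapshot(
    snapshot_id: str,
    get_status: Callable[[str], str],
    poll_interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the snapshot reports "ready"; raise on "failed" or when the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status(snapshot_id)
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"snapshot {snapshot_id} not ready after {timeout} seconds")
        sleep(poll_interval)
```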

Tool Name: List Unlocker Zones

Description

Tool to list your configured Web Unlocker zones and proxy endpoints. Use to see which zones are available for web scraping and bot-protection bypass.

Action Parameters

None.

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Web Unlocker

Description

Tool to bypass bot detection, CAPTCHAs, and other anti-scraping measures to extract content from websites. Use when you need to scrape websites that block automated access or require JavaScript rendering.

Action Parameters

country (string)
device (string, defaults to desktop)
format (string, defaults to html)
render_js (boolean, defaults to True)
timeout (integer, defaults to 30)
url (string, required)
wait_for (string)

Action Response

data (object, required)
error (string)
successful (boolean, required)
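The Web Unlocker defaults listed above (`device="desktop"`, `format="html"`, `render_js=True`, `timeout=30`) can be captured in a small argument builder so that overrides are explicit. In the sketch below, the names are hypothetical. It rejects parameter names that are not in the list above and, as an added assumption, requires `url` to use an http or https scheme. This page does not document the meaning of `wait_for` (for instance, whether it is a CSS selector), so the builder passes it through unchanged.

```python
from typing import Any, Optional

# Defaults as listed in the Web Unlocker parameter list
WEB_UNLOCKER_DEFAULTS = {"device": "desktop", "format": "html", "render_js": True, "timeout": 30}


def build_unlocker_arguments(
    url: str,
    country: Optional[str] = None,
    wait_for: Optional[str] = None,
    **overrides: Any,
) -> dict:
    """Assemble Web Unlocker arguments: documented defaults plus any explicit overrides."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    unknown = set(overrides) - set(WEB_UNLOCKER_DEFAULTS)
    if unknown:
        raise TypeError(f"unsupported Web Unlocker parameters: {sorted(unknown)}")
    arguments = {"url": url, **WEB_UNLOCKER_DEFAULTS, **overrides}
    # Optional parameters without documented defaults are only sent when provided
    if country is not None:
        arguments["country"] = country
    if wait_for is not None:
        arguments["wait_for"] = wait_for
    return arguments
```

For example, `build_unlocker_arguments("https://example.com", device="mobile", render_js=False)` keeps the other defaults and changes only the two overridden fields.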