Firecrawl

Learn how to use Firecrawl with Composio

Overview

SLUG: FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

- `full` (string, required): defaults to `https://api.firecrawl.dev/v1`
- `generic_api_key` (string, required)

Connecting to Firecrawl

Create an auth config

Use the dashboard to create an auth config for the Firecrawl toolkit. This allows you to connect multiple Firecrawl accounts to Composio for agents to use.

1

Select App

Navigate to the Firecrawl toolkit page and click “Setup Integration”.

2

Configure Auth Config Settings

Select among the supported auth schemes and configure them here.

3

Create and Get auth config ID

Click “Create Integration”. After creation, copy the displayed ID starting with `ac_`; this is your auth config ID. It is not sensitive, so you can store it in environment variables or a database. You will use this ID to create connections to the toolkit for a given user.
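Since the auth config ID is not sensitive, a common pattern is to keep it in an environment variable. A minimal sketch (the variable name `FIRECRAWL_AUTH_CONFIG_ID` is an assumption, not a Composio convention):

```python
import os

# Hypothetical variable name; use whatever convention your app follows.
auth_config_id = os.environ.get(
    "FIRECRAWL_AUTH_CONFIG_ID", "ac_YOUR_FIRECRAWL_CONFIG_ID"
)

# Auth config IDs created in the dashboard start with "ac_".
print(auth_config_id)
```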

Connect Your Account

Using API Key

```python
from composio import Composio
from composio.types import auth_scheme

# Replace these with your actual values
firecrawl_auth_config_id = "ac_YOUR_FIRECRAWL_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # UUID from your database/app

composio = Composio()

def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve the API key from the user,
    # or supply your own.
    user_api_key = input("[!] Enter API key: ")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": user_api_key},
    )

    # API key authentication is immediate - no redirect needed
    print(f"Successfully connected Firecrawl for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id

connection_id = authenticate_toolkit(user_id, firecrawl_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")
```

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the Firecrawl toolkit’s playground.

Python
```python
from composio import Composio
from openai import OpenAI
import json

openai = OpenAI()
composio = Composio()

# User ID must be a valid UUID format
user_id = "0000-0000-0000"  # Replace with an actual user UUID from your database

tools = composio.tools.get(user_id=user_id, toolkits=["FIRECRAWL"])

print("[!] Tools:")
print(json.dumps(tools))

def invoke_llm(task="What can you do?"):
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": task,  # Your task here!
            },
        ],
        tools=tools,
    )

    # Handle the result from the tool call
    result = composio.provider.handle_tool_calls(user_id=user_id, response=completion)
    print(f"[!] Completion: {completion}")
    print(f"[!] Tool call result: {result}")

invoke_llm()
```

Tool List

Tool Name: Cancel a crawl job

Description

Cancels an active or queued web crawl job using its id; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

Action Parameters

- `id` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
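As a sketch, this tool can be invoked directly rather than through an LLM. The tool slug and job ID below are assumptions for illustration, not confirmed names:

```python
# Arguments for the cancel tool: only the crawl job's id is required.
cancel_args = {"id": "hypothetical-crawl-job-id"}

# With a Composio client this would be executed roughly as
# (the slug "FIRECRAWL_CANCEL_CRAWL_JOB" is an assumed name):
#
#   result = composio.tools.execute(
#       "FIRECRAWL_CANCEL_CRAWL_JOB",
#       user_id=user_id,
#       arguments=cancel_args,
#   )
#   print(result["successful"], result.get("error"))
print(cancel_args)
```

Note that, per the description, cancelling a completed, failed, or already-cancelled job will not change its state.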

Tool Name: Get the status of a crawl job

Description

Retrieves the current status, progress, and details of a web crawl job, using the job id obtained when the crawl was initiated.

Action Parameters

- `id` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
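Because a crawl runs asynchronously, the status tool is typically called in a loop. A minimal polling helper, assuming the response’s `data` object carries a `status` field with terminal values like `completed`, `failed`, or `cancelled` (these field names and values are assumptions):

```python
import time

def wait_for_crawl(fetch_status, poll_interval=2.0, max_polls=30):
    """Poll fetch_status() until the crawl reaches a terminal state.

    fetch_status should be a zero-argument callable returning the tool's
    response dict (data / error / successful, per the Action Response above).
    """
    for _ in range(max_polls):
        response = fetch_status()
        status = response.get("data", {}).get("status")
        if status in ("completed", "failed", "cancelled"):
            return response
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not finish within the polling budget")
```

In practice, `fetch_status` would wrap a call to this status tool with the crawl job’s `id`.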

Tool Name: Map multiple URLs

Description

Maps a website by discovering urls from a starting base url, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.

Action Parameters

- `ignoreSitemap` (boolean, defaults to `True`)
- `includeSubdomains` (boolean)
- `limit` (integer, defaults to `5000`)
- `search` (string)
- `url` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
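A sketch of an arguments payload for this tool, narrowing discovery with a search query and a lower limit (the URL and values are illustrative, not defaults):

```python
# Only "url" is required; the rest override the defaults listed above.
map_args = {
    "url": "https://example.com",
    "search": "docs",           # effectiveness is site-dependent, per the description
    "limit": 100,               # down from the default of 5000
    "includeSubdomains": True,
}
print(map_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.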

Tool Name: Start a web crawl

Description

Initiates a firecrawl web crawl from a given url, applying various filtering and content extraction rules, and polls until the job is complete; ensure the url is accessible and any regex patterns for paths are valid.

Action Parameters

- `allowBackwardLinks` (boolean)
- `allowExternalLinks` (boolean)
- `delay` (integer)
- `excludePaths` (array)
- `ignoreQueryParameters` (boolean)
- `ignoreSitemap` (boolean, defaults to `True`)
- `includePaths` (array)
- `limit` (integer, defaults to `10`)
- `maxDepth` (integer, defaults to `2`)
- `maxDiscoveryDepth` (integer)
- `scrapeOptions_actions` (array)
- `scrapeOptions_blockAds` (boolean)
- `scrapeOptions_changeTrackingOptions` (object)
- `scrapeOptions_excludeTags` (array)
- `scrapeOptions_formats` (array, defaults to `['markdown']`)
- `scrapeOptions_headers` (object)
- `scrapeOptions_includeTags` (array)
- `scrapeOptions_jsonOptions` (object)
- `scrapeOptions_location` (object)
- `scrapeOptions_maxAge` (integer)
- `scrapeOptions_mobile` (boolean)
- `scrapeOptions_onlyMainContent` (boolean, defaults to `True`)
- `scrapeOptions_parsePDF` (boolean)
- `scrapeOptions_proxy` (string)
- `scrapeOptions_removeBase64Images` (boolean)
- `scrapeOptions_skipTlsVerification` (boolean)
- `scrapeOptions_storeInCache` (boolean)
- `scrapeOptions_timeout` (integer)
- `scrapeOptions_waitFor` (integer, defaults to `123`)
- `url` (string, required)
- `webhook` (string)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
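A sketch of an arguments payload for this tool, scoping the crawl to a documentation subtree (the URL and path patterns are illustrative assumptions):

```python
# Only "url" is required; these options narrow the crawl to a docs subtree.
crawl_args = {
    "url": "https://example.com",
    "includePaths": [r"^/docs/.*"],          # regex patterns must be valid
    "excludePaths": [r"^/docs/archive/.*"],
    "limit": 10,                             # default shown above
    "maxDepth": 2,                           # default shown above
    "scrapeOptions_formats": ["markdown"],
    "scrapeOptions_onlyMainContent": True,
}
print(crawl_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug. The tool polls until the job completes, so the returned `data` reflects the finished crawl.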

Tool Name: Extract structured data

Description

Extracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a json `schema` (one must be provided).

Action Parameters

- `enable_web_search` (boolean)
- `prompt` (string)
- `schema` (object)
- `urls` (array, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
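Since at least one of `prompt` or `schema` must be provided, a sketch of an arguments payload supplying both (the URL, prompt text, and schema fields are illustrative assumptions):

```python
# Either "prompt" or "schema" must be provided; here both are given.
extract_args = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract each plan's name and monthly price.",
    "schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "monthly_price": {"type": "string"},
                    },
                },
            },
        },
    },
}
print(extract_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.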

Tool Name: Scrape URL

Description

Scrapes a publicly accessible url, optionally performing pre-scrape browser actions or extracting structured json using an llm, to retrieve content in specified formats.

Action Parameters

- `actions` (array)
- `excludeTags` (array)
- `formats` (array, defaults to `['markdown']`)
- `includeTags` (array)
- `jsonOptions` (object)
- `location` (object)
- `onlyMainContent` (boolean, defaults to `True`)
- `timeout` (integer, defaults to `30000`)
- `url` (string, required)
- `waitFor` (integer)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
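A sketch of an arguments payload for this tool, requesting multiple output formats and a short wait for dynamic content (the URL and values are illustrative assumptions):

```python
# Only "url" is required; formats and timing override the defaults above.
scrape_args = {
    "url": "https://example.com",
    "formats": ["markdown", "html"],  # default is ['markdown']
    "onlyMainContent": True,          # default shown above
    "timeout": 30000,                 # milliseconds; the default shown above
    "waitFor": 1000,                  # wait 1s for dynamic content before scraping
}
print(scrape_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.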