Firecrawl

Learn how to use Firecrawl with Composio

Overview

SLUG: FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

full (string, defaults to https://api.firecrawl.dev/v1, required)
generic_api_key (string, required)

Connecting to Firecrawl

Create an auth config

Use the dashboard to create an auth config for the Firecrawl toolkit. This allows you to connect multiple Firecrawl accounts to Composio for agents to use.

1

Select App

Navigate to Firecrawl.

2

Configure Auth Config Settings

Select among the supported auth schemes and configure them here.

3

Create and Get auth config ID

Click “Create Firecrawl Auth Config”. After creation, copy the displayed ID starting with ac_; this is your auth config ID. It is not sensitive, so you can store it in environment variables or a database. You will use it to create connections to the toolkit for a given user.

Connect Your Account

Using API Key

from composio import Composio

# Replace these with your actual values
firecrawl_auth_config_id = "ac_YOUR_FIRECRAWL_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # UUID from your database/app

composio = Composio()

def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve the API key from the user,
    # or supply your own.
    user_api_key = input("[!] Enter API key: ")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": {"generic_api_key": user_api_key}},
    )

    # API key authentication is immediate - no redirect needed
    print(f"Successfully connected Firecrawl for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id

connection_id = authenticate_toolkit(user_id, firecrawl_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the Firecrawl toolkit’s playground.

For code examples, see the Tool calling guide and Provider examples.
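As an illustration of the call shape, here is a minimal sketch of executing one of the tools below for a connected user. The slug "FIRECRAWL_SCRAPE_URL" and the execute() signature are assumptions based on Composio's usual TOOLKIT_ACTION naming; verify both against your SDK version. A stub stands in for the real client so the shape is clear without network access:

```python
class StubComposio:
    """Stand-in for the real Composio client; illustrates the call shape only."""
    class tools:
        @staticmethod
        def execute(slug, user_id, arguments):
            # The real client would dispatch to the Firecrawl API here.
            return {"successful": True, "data": {"requested": arguments["url"]}}

def scrape_markdown(client, user_id: str, url: str) -> dict:
    # "FIRECRAWL_SCRAPE_URL" is a hypothetical slug; check the dashboard.
    return client.tools.execute(
        "FIRECRAWL_SCRAPE_URL",
        user_id=user_id,
        arguments={"url": url, "formats": ["markdown"], "onlyMainContent": True},
    )

result = scrape_markdown(StubComposio(), "0000-0000-0000", "https://example.com")
```

With a real client, replace StubComposio with the Composio() instance from the connection example above.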

Tool List

Tool Name: Batch scrape multiple URLs

Description

Tool to scrape multiple URLs in batch with concurrent processing. Use when you need to scrape multiple web pages efficiently with customizable formats and content filtering.

Action Parameters

actions
blockAds (boolean, defaults to True)
excludeTags
formats (array, defaults to ['markdown'])
headers
ignoreInvalidURLs (boolean, defaults to True)
includeTags
location
maxAge
maxConcurrency
mobile (boolean)
onlyMainContent (boolean, defaults to True)
proxy
removeBase64Images (boolean, defaults to True)
skipTlsVerification (boolean, defaults to True)
storeInCache (boolean, defaults to True)
timeout
urls (array, required)
waitFor (integer)
webhook
zeroDataRetention (boolean)

Action Response

data (object, required)
error
successful (boolean, required)
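As a sketch of how these parameters compose, the hypothetical helper below (naming is mine) assembles an argument dict for batch scraping, mirroring the defaults listed above:

```python
def build_batch_scrape_arguments(urls, formats=None, only_main_content=True,
                                 block_ads=True, timeout=None):
    """Assemble batch scrape arguments; urls is the only required field."""
    if not urls:
        raise ValueError("urls is required and must be non-empty")
    args = {
        "urls": list(urls),
        "formats": formats or ["markdown"],  # default per the parameter list above
        "onlyMainContent": only_main_content,
        "blockAds": block_ads,
    }
    if timeout is not None:
        args["timeout"] = timeout
    return args

args = build_batch_scrape_arguments(["https://example.com", "https://example.org"])
```

Optional fields stay out of the payload entirely when unset, so the API's own defaults apply.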

Tool Name: Cancel a batch scrape job

Description

Tool to cancel a running batch scrape job using its unique identifier. Use when you need to terminate an in-progress batch scrape operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Get batch scrape status

Description

Retrieves the current status and results of a batch scrape job using the job ID. Use this to check batch scrape progress and retrieve scraped data.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error
expiresAt (string, required)
next
status (string, required)
successful (boolean, required)
total (integer, required)

Tool Name: Get errors from batch scrape job

Description

Tool to retrieve error details from a batch scrape job, including failed URLs and URLs blocked by robots.txt. Use when you need to debug or understand why certain pages failed to scrape in a batch operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Cancel a crawl job

Description

Cancels an active or queued web crawl job using its ID; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Start a web crawl

Description

Initiates a Firecrawl web crawl from a given URL, applying various filtering and content extraction rules, and polls until the job is complete; ensure the URL is accessible and any regex patterns for paths are valid.

Action Parameters

allowBackwardLinks
allowExternalLinks
crawlEntireDomain
delay
excludePaths
ignoreQueryParameters
ignoreSitemap
includePaths
limit (integer, defaults to 10)
maxDepth
maxDiscoveryDepth
scrapeOptions_actions
scrapeOptions_blockAds
scrapeOptions_changeTrackingOptions
scrapeOptions_excludeTags
scrapeOptions_formats (array, defaults to ['markdown'])
scrapeOptions_headers
scrapeOptions_includeTags
scrapeOptions_jsonOptions
scrapeOptions_location
scrapeOptions_maxAge
scrapeOptions_mobile
scrapeOptions_onlyMainContent (boolean, defaults to True)
scrapeOptions_parsePDF
scrapeOptions_proxy
scrapeOptions_removeBase64Images
scrapeOptions_skipTlsVerification
scrapeOptions_storeInCache
scrapeOptions_timeout
scrapeOptions_waitFor (integer, defaults to 123)
url (string, required)
webhook

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error
expiresAt (string, required)
next
status (string, required)
success (boolean, required)
successful (boolean, required)
total (integer, required)
warning
Tool Name: Get all active crawl jobs

Description

Tool to retrieve all active crawl jobs for the authenticated team. Use when you need to see which crawl operations are currently running.

Action Parameters

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Cancel a crawl job

Description

Tool to cancel a running crawl job by its ID. Use when you need to stop an active crawl operation. The API returns a status of 'cancelled' upon successful cancellation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Get crawl job status

Description

Tool to retrieve the status and results of a Firecrawl crawl job. Use when you need to check the progress or get data from an ongoing or completed crawl operation. Returns crawl status, progress metrics, credits used, and the crawled page data.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error
expiresAt (string, required)
next
status (string, required)
successful (boolean, required)
total (integer, required)
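Crawl jobs are asynchronous, so a common pattern is to poll this status tool until the job leaves its in-progress state. A minimal sketch, assuming the status field reads 'completed' on success (fetch_status stands in for a call to the status tool):

```python
import time

def poll_until_done(fetch_status, interval_s=2.0, max_polls=60):
    """Poll a crawl job until its status is no longer in progress.

    fetch_status: a zero-argument callable returning the status tool's
    response dict (with 'status', 'completed', and 'total' keys).
    """
    for _ in range(max_polls):
        result = fetch_status()
        # Terminal states assumed here; confirm the exact strings your
        # Firecrawl version returns.
        if result.get("status") in ("completed", "failed", "cancelled"):
            return result
        time.sleep(interval_s)
    raise TimeoutError("crawl did not finish within the polling budget")
```

Injecting the fetcher as a callable keeps the loop testable and reusable across the batch scrape, crawl, and extract status tools.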

Tool Name: Get errors from a crawl job

Description

Tool to retrieve errors from a Firecrawl crawl job. Use when you need to understand why certain pages failed to scrape or which URLs were blocked by robots.txt during a crawl operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Preview crawl parameters

Description

Preview crawl parameters before starting a crawl by generating optimal configuration from natural language instructions. Use this tool to understand what crawl settings will be applied based on your requirements before executing a full crawl operation. The endpoint intelligently interprets natural language prompts to configure crawl parameters like include/exclude paths, depth limits, and domain scope.

Action Parameters

prompt (string, required)
url (string, required)

Action Response

data (object, required)
error
success (boolean, required)
successful (boolean, required)

Tool Name: Start a web crawl (v2) [NEW]

Description

[NEW v2 API] Initiates a Firecrawl v2 web crawl with enhanced features over v1: natural language prompts for automatic crawler configuration, crawlEntireDomain for sibling/parent page discovery, better depth control with maxDiscoveryDepth, subdomain support, and full webhook configuration. Polls until crawl is complete.

Action Parameters

allowExternalLinks (boolean)
allowSubdomains (boolean)
crawlEntireDomain (boolean)
delay
excludePaths
ignoreQueryParameters (boolean)
includePaths
limit (integer, defaults to 10)
maxConcurrency
maxDiscoveryDepth
prompt
scrapeOptions_actions
scrapeOptions_blockAds
scrapeOptions_excludeTags
scrapeOptions_formats (array, defaults to ['markdown'])
scrapeOptions_headers
scrapeOptions_includeTags
scrapeOptions_jsonOptions
scrapeOptions_location
scrapeOptions_maxAge
scrapeOptions_mobile
scrapeOptions_onlyMainContent (boolean, defaults to True)
scrapeOptions_parsers
scrapeOptions_proxy
scrapeOptions_removeBase64Images
scrapeOptions_skipTlsVerification
scrapeOptions_storeInCache
scrapeOptions_timeout
scrapeOptions_waitFor
sitemap (string, defaults to include)
url (string, required)
webhook
zeroDataRetention (boolean)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error
expiresAt (string, required)
next
status (string, required)
successful (boolean, required)
total (integer, required)

Tool Name: Get team credit usage

Description

Tool to get current team credit usage information. Use when you need to check remaining credits or billing period details.

Action Parameters

Action Response

data (object, required)
error
success (boolean, required)
successful (boolean, required)

Tool Name: Get historical team credit usage

Description

Tool to retrieve historical team credit usage on a monthly basis. Use when you need to analyze credit consumption patterns over time, optionally segmented by API key.

Action Parameters

byApiKey

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Perform deep research

Description

Initiates an AI-powered deep research operation on any topic with autonomous web exploration and analysis. Use when you need comprehensive research with synthesized findings from multiple sources. Note: This API is in Alpha and being deprecated in favor of Search API after June 30, 2025.

Action Parameters

analysisPrompt
jsonOptions
maxUrls (defaults to 20)
max_depth
query (string, required)
systemPrompt
time_limit

Action Response

currentDepth
data
error
id
maxDepth
status
success (boolean, required)
successful (boolean, required)

Tool Name: Extract structured data

Description

Extracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a JSON `schema` (one must be provided).

Action Parameters

enable_web_search (boolean)
prompt
schema
urls (array, required)

Action Response

creditsUsed
data (object, required)
error
expiresAt (string, required)
id
invalidURLs
sources
status (string, required)
success (boolean, required)
successful (boolean, required)
tokensUsed
urlTrace
warning
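Since this tool requires either a natural language prompt or a JSON schema, a small client-side check keeps malformed requests from ever being sent. A hypothetical helper (naming is mine) sketching that validation:

```python
def build_extract_request(urls, prompt=None, schema=None, enable_web_search=False):
    """Assemble an extract request; a prompt or a schema must be provided."""
    if prompt is None and schema is None:
        raise ValueError("provide a natural language prompt or a JSON schema")
    request = {"urls": list(urls), "enable_web_search": enable_web_search}
    if prompt is not None:
        request["prompt"] = prompt
    if schema is not None:
        request["schema"] = schema
    return request
```

Passing both a prompt and a schema is allowed; the check only rejects the case where neither is given.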

Tool Name: Get extract job status

Description

Tool to retrieve the status and results of a previously submitted extract job. Use when you need to check the progress or get the final results of an extraction operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error
expiresAt (string, required)
invalidURLs
sources
status (string, required)
success (boolean, required)
successful (boolean, required)
tokensUsed
urlTrace
warning

Tool Name: Generate LLMs.txt for a website

Description

Tool to generate an LLMs.txt file for a website, making content accessible to language models in a standardized format. Use when you need to create an LLM-friendly representation of website content.

Action Parameters

maxUrls (defaults to 2)
showFullText
url (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Get deep research status

Description

Retrieves the status and results of a deep research job by its ID. Use when you need to check the progress or retrieve the final analysis of a deep research operation.

Action Parameters

id (string, required)

Action Response

activities
currentDepth
data (object, required)
error
expiresAt
maxDepth
sources
status (string, required)
success (boolean, required)
successful (boolean, required)
totalUrls

Tool Name: Get LLMs.txt generation job status

Description

Tool to get the status and results of an LLMs.txt generation job. Use when you need to check if a job has completed and retrieve the generated content.

Action Parameters

id (string, required)

Action Response

data
error
expiresAt
status (string, required)
success (boolean, required)
successful (boolean, required)

Tool Name: Get the status of a crawl job

Description

Retrieves the current status, progress, and details of a web crawl job, using the job ID obtained when the crawl was initiated.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error
expiresAt (string, required)
next
status (string, required)
successful (boolean, required)
total (integer, required)

Tool Name: Map multiple URLs

Description

Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.

Action Parameters

ignoreQueryParameters
includeSubdomains
limit
search
timeout
url (string, required)

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Get team queue status

Description

Tool to retrieve metrics about the team's scrape queue. Use when you need to check queue status, job counts, or concurrency limits.

Action Parameters

Action Response

data (object, required)
error
successful (boolean, required)

Tool Name: Scrape URL

Description

Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in specified formats.

Action Parameters

actions
excludeTags
formats (array, defaults to ['markdown'])
includeTags
jsonOptions
location
onlyMainContent (boolean, defaults to True)
timeout (integer, defaults to 30000)
url (string, required)
waitFor (integer)

Action Response

data (object, required)
error
success (boolean, required)
successful (boolean, required)

Tool Name: Get team token usage

Description

Tool to retrieve the current team's token usage and balance information for Firecrawl's Extract feature. Use when you need to check remaining token credits, plan allocation, or billing period details.

Action Parameters

Action Response

data (object, required)
error
success (boolean, required)
successful (boolean, required)

Tool Name: Get historical team token usage

Description

Tool to retrieve historical team token usage on a monthly basis. Use when you need to analyze token consumption patterns over time, optionally segmented by API key.

Action Parameters

byApiKey

Action Response

data (object, required)
error
successful (boolean, required)