Firecrawl

Learn how to use Firecrawl with Composio

Overview

SLUG: FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

full (string, required, defaults to https://api.firecrawl.dev/v1)
generic_api_key (string, required)

Connecting to Firecrawl

Create an auth config

Use the dashboard to create an auth config for the Firecrawl toolkit. This allows you to connect multiple Firecrawl accounts to Composio for agents to use.

1

Select App

Navigate to Firecrawl.

2

Configure Auth Config Settings

Select among the supported auth schemes and configure them here.

3

Create and Get auth config ID

Click “Create Firecrawl Auth Config”. After creation, copy the displayed ID starting with ac_. This is your auth config ID. This is not a sensitive ID — you can save it in environment variables or a database. This ID will be used to create connections to the toolkit for a given user.

Connect Your Account

Using API Key

```python
from composio import Composio

# Replace these with your actual values
firecrawl_auth_config_id = "ac_YOUR_FIRECRAWL_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # UUID from your database/app

composio = Composio()

def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve the API key from the user,
    # or supply your own.
    user_api_key = input("[!] Enter API key: ")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": {"generic_api_key": user_api_key}},
    )

    # API key authentication is immediate - no redirect needed
    print(f"Successfully connected Firecrawl for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id

connection_id = authenticate_toolkit(user_id, firecrawl_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")
```

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the Firecrawl toolkit's playground.

For code examples, see the Tool calling guide and Provider examples.
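As a rough sketch of what a direct tool call might look like, the helper below validates a tool's arguments before handing them to the SDK. The tool slug `FIRECRAWL_SCRAPE_URL`, the required-argument map, and the `composio.tools.execute(...)` signature shown in the comment are assumptions; check your SDK version and the playground for the exact slugs and signatures.

```python
# Hypothetical helper: check that required arguments are present before
# dispatching a Firecrawl tool call through the Composio SDK.
REQUIRED_ARGS = {"FIRECRAWL_SCRAPE_URL": {"url"}}  # assumed slug and requirements

def build_tool_call(slug: str, arguments: dict) -> dict:
    missing = REQUIRED_ARGS.get(slug, set()) - arguments.keys()
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    return {"slug": slug, "arguments": arguments}

call = build_tool_call("FIRECRAWL_SCRAPE_URL", {"url": "https://example.com"})
# The call would then run along these lines (signature may differ by SDK version):
# result = composio.tools.execute(call["slug"], user_id=user_id,
#                                 arguments=call["arguments"])
```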

Tool List

Tool Name: Batch scrape multiple URLs

Description

Tool to scrape multiple URLs in batch with concurrent processing. Use when you need to scrape multiple web pages efficiently with customizable formats and content filtering.

Action Parameters

actions (array)
blockAds (boolean, defaults to True)
excludeTags (array)
formats (array, defaults to ['markdown'])
headers (object)
ignoreInvalidURLs (boolean, defaults to True)
includeTags (array)
location (object)
maxAge (integer)
maxConcurrency (integer)
mobile (boolean)
onlyMainContent (boolean, defaults to True)
proxy (string)
removeBase64Images (boolean, defaults to True)
skipTlsVerification (boolean, defaults to True)
storeInCache (boolean, defaults to True)
timeout (integer)
urls (array, required)
waitFor (integer)
webhook (object)
zeroDataRetention (boolean)

Action Response

data (object, required)
error (string)
successful (boolean, required)
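A minimal sketch of assembling arguments for this tool, merging caller overrides with the documented defaults; the helper name is hypothetical, and the keys mirror the parameter list above (only `urls` is required):

```python
def build_batch_scrape_args(urls, **overrides):
    """Merge overrides with the documented batch-scrape defaults."""
    if not urls:
        raise ValueError("urls is required and must be non-empty")
    args = {
        "urls": list(urls),
        "formats": ["markdown"],     # default
        "blockAds": True,            # default
        "onlyMainContent": True,     # default
        "removeBase64Images": True,  # default
        "ignoreInvalidURLs": True,   # default
    }
    args.update(overrides)
    return args

args = build_batch_scrape_args(
    ["https://example.com", "https://example.org"],
    maxConcurrency=2,  # cap concurrent page fetches
)
```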

Tool Name: Cancel a batch scrape job

Description

Tool to cancel a running batch scrape job using its unique identifier. Use when you need to terminate an in-progress batch scrape operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Get batch scrape status

Description

Retrieves the current status and results of a batch scrape job using the job ID. Use this to check batch scrape progress and retrieve scraped data.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error (string)
expiresAt (string, required)
next (string)
status (string, required)
successful (boolean, required)
total (integer, required)
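Clients typically poll this status endpoint and compute progress from `completed` and `total`, fetching `next` when it is present for paginated results. A small sketch (the helper name is hypothetical; the field names follow the response table above):

```python
def summarize_status(status: dict) -> str:
    """Turn a batch-scrape status response into a one-line progress summary."""
    done, total = status["completed"], status["total"]
    pct = 100 * done / total if total else 0
    line = (f"{status['status']}: {done}/{total} pages "
            f"({pct:.0f}%), {status['creditsUsed']} credits used")
    if status.get("next"):
        line += "; more results available at the `next` URL"
    return line

summary = summarize_status({"status": "scraping", "completed": 5,
                            "total": 20, "creditsUsed": 5, "next": None})
```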

Tool Name: Get errors from batch scrape job

Description

Tool to retrieve error details from a batch scrape job, including failed URLs and URLs blocked by robots.txt. Use when you need to debug or understand why certain pages failed to scrape in a batch operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Cancel a crawl job

Description

Cancels an active or queued web crawl job using its ID; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Start a web crawl

Description

Initiates a Firecrawl web crawl from a given URL, applying various filtering and content extraction rules, and polls until the job is complete; ensure the URL is accessible and any regex patterns for paths are valid.

Action Parameters

allowBackwardLinks (boolean)
allowExternalLinks (boolean)
delay (integer)
excludePaths (array)
ignoreQueryParameters (boolean)
ignoreSitemap (boolean, defaults to True)
includePaths (array)
limit (integer, defaults to 10)
maxDepth (integer, defaults to 2)
maxDiscoveryDepth (integer)
scrapeOptions_actions (array)
scrapeOptions_blockAds (boolean)
scrapeOptions_changeTrackingOptions (object)
scrapeOptions_excludeTags (array)
scrapeOptions_formats (array, defaults to ['markdown'])
scrapeOptions_headers (object)
scrapeOptions_includeTags (array)
scrapeOptions_jsonOptions (object)
scrapeOptions_location (object)
scrapeOptions_maxAge (integer)
scrapeOptions_mobile (boolean)
scrapeOptions_onlyMainContent (boolean, defaults to True)
scrapeOptions_parsePDF (boolean)
scrapeOptions_proxy (string)
scrapeOptions_removeBase64Images (boolean)
scrapeOptions_skipTlsVerification (boolean)
scrapeOptions_storeInCache (boolean)
scrapeOptions_timeout (integer)
scrapeOptions_waitFor (integer, defaults to 123)
url (string, required)
webhook (string)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error (string)
expiresAt (string, required)
next (string)
status (string, required)
success (boolean, required)
successful (boolean, required)
total (integer, required)
warning (string)
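Since the description warns that regex patterns for `includePaths` and `excludePaths` must be valid, it can be worth pre-validating them client-side before starting a crawl. A sketch using Python's standard `re` module (the helper name is hypothetical):

```python
import re

def validate_crawl_paths(include_paths=(), exclude_paths=()):
    """Raise ValueError if any includePaths/excludePaths pattern
    fails to compile as a regular expression."""
    for pattern in list(include_paths) + list(exclude_paths):
        try:
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid path pattern {pattern!r}: {exc}") from exc

# These compile cleanly, so no exception is raised.
validate_crawl_paths(include_paths=[r"^/blog/.*"], exclude_paths=[r"^/admin/"])
```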

Tool Name: Get all active crawl jobs

Description

Tool to retrieve all active crawl jobs for the authenticated team. Use when you need to see which crawl operations are currently running.

Action Parameters

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Cancel a crawl job

Description

Tool to cancel a running crawl job by its ID. Use when you need to stop an active crawl operation. The API returns a status of 'cancelled' upon successful cancellation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Get crawl job status

Description

Tool to retrieve the status and results of a Firecrawl crawl job. Use when you need to check the progress or get data from an ongoing or completed crawl operation. Returns crawl status, progress metrics, credits used, and the crawled page data.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error (string)
expiresAt (string, required)
next (string)
status (string, required)
successful (boolean, required)
total (integer, required)

Tool Name: Get errors from a crawl job

Description

Tool to retrieve errors from a Firecrawl crawl job. Use when you need to understand why certain pages failed to scrape or which URLs were blocked by robots.txt during a crawl operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Preview crawl parameters

Description

Preview crawl parameters before starting a crawl by generating optimal configuration from natural language instructions. Use this tool to understand what crawl settings will be applied based on your requirements before executing a full crawl operation. The endpoint intelligently interprets natural language prompts to configure crawl parameters like include/exclude paths, depth limits, and domain scope.

Action Parameters

prompt (string, required)
url (string, required)

Action Response

data (object, required)
error (string)
success (boolean, required)
successful (boolean, required)

Tool Name: Start a web crawl (v2) [NEW]

Description

[NEW v2 API] Initiates a Firecrawl v2 web crawl with enhanced features over v1: natural language prompts for automatic crawler configuration, crawlEntireDomain for sibling/parent page discovery, better depth control with maxDiscoveryDepth, subdomain support, and full webhook configuration. Polls until crawl is complete.

Action Parameters

allowExternalLinks (boolean)
allowSubdomains (boolean)
crawlEntireDomain (boolean)
delay (integer)
excludePaths (array)
ignoreQueryParameters (boolean)
includePaths (array)
limit (integer, defaults to 10)
maxConcurrency (integer)
maxDiscoveryDepth (integer)
prompt (string)
scrapeOptions_actions (array)
scrapeOptions_blockAds (boolean)
scrapeOptions_excludeTags (array)
scrapeOptions_formats (array, defaults to ['markdown'])
scrapeOptions_headers (object)
scrapeOptions_includeTags (array)
scrapeOptions_jsonOptions (object)
scrapeOptions_location (object)
scrapeOptions_maxAge (integer)
scrapeOptions_mobile (boolean)
scrapeOptions_onlyMainContent (boolean, defaults to True)
scrapeOptions_parsers (array)
scrapeOptions_proxy (string)
scrapeOptions_removeBase64Images (boolean)
scrapeOptions_skipTlsVerification (boolean)
scrapeOptions_storeInCache (boolean)
scrapeOptions_timeout (integer)
scrapeOptions_waitFor (integer)
sitemap (string, defaults to include)
url (string, required)
webhook (object)
zeroDataRetention (boolean)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error (string)
expiresAt (string, required)
next (string)
status (string, required)
successful (boolean, required)
total (integer, required)
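A sketch of assembling v2-specific arguments, highlighting the fields that differ from v1 (`prompt`, `crawlEntireDomain`, `allowSubdomains`, `sitemap`); the helper name is hypothetical and the keys follow the parameter table above:

```python
def build_v2_crawl_args(url, prompt=None, crawl_entire_domain=False,
                        allow_subdomains=False, limit=10):
    """Assemble arguments for the v2 crawl tool, with documented defaults."""
    args = {
        "url": url,
        "limit": limit,                           # default 10
        "sitemap": "include",                     # default
        "crawlEntireDomain": crawl_entire_domain, # v2: sibling/parent discovery
        "allowSubdomains": allow_subdomains,      # v2: include subdomains
    }
    if prompt:
        # v2 can derive crawler configuration from a natural language prompt.
        args["prompt"] = prompt
    return args

args = build_v2_crawl_args("https://example.com",
                           prompt="crawl only the docs section")
```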

Tool Name: Get team credit usage

Description

Tool to get current team credit usage information. Use when you need to check remaining credits or billing period details.

Action Parameters

Action Response

data (object, required)
error (string)
success (boolean, required)
successful (boolean, required)

Tool Name: Get historical team credit usage

Description

Tool to retrieve historical team credit usage on a monthly basis. Use when you need to analyze credit consumption patterns over time, optionally segmented by API key.

Action Parameters

byApiKey (boolean)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Perform deep research

Description

Initiates an AI-powered deep research operation on any topic with autonomous web exploration and analysis. Use when you need comprehensive research with synthesized findings from multiple sources. Note: This API is in Alpha and being deprecated in favor of Search API after June 30, 2025.

Action Parameters

analysisPrompt (string)
jsonOptions (object)
maxUrls (integer, defaults to 20)
max_depth (integer)
query (string, required)
systemPrompt (string)
time_limit (integer)

Action Response

currentDepth (integer)
data (object)
error (string)
id (string)
maxDepth (integer)
status (string)
success (boolean, required)
successful (boolean, required)

Tool Name: Extract structured data

Description

Extracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a JSON `schema` (one must be provided).

Action Parameters

enable_web_search (boolean)
prompt (string)
schema (object)
urls (array, required)

Action Response

data (object, required)
error (string)
expiresAt (string, required)
id (string, required)
invalidURLs (array)
sources (array)
status (string, required)
success (boolean, required)
successful (boolean, required)
tokensUsed (integer)
urlTrace (array)
warning (string)
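Because this tool requires either a `prompt` or a `schema` alongside `urls`, a client can enforce that constraint before submitting the job. A sketch (the helper name is hypothetical; keys follow the parameter table above):

```python
def build_extract_args(urls, prompt=None, schema=None, enable_web_search=False):
    """Assemble extract-job arguments, enforcing the prompt-or-schema rule."""
    if not urls:
        raise ValueError("urls is required and must be non-empty")
    if prompt is None and schema is None:
        raise ValueError("provide a natural language `prompt` or a JSON `schema`")
    args = {"urls": list(urls), "enable_web_search": enable_web_search}
    if prompt is not None:
        args["prompt"] = prompt
    if schema is not None:
        args["schema"] = schema
    return args

args = build_extract_args(["https://example.com/pricing"],
                          prompt="Extract the plan names and monthly prices")
```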

Tool Name: Get extract job status

Description

Tool to retrieve the status and results of a previously submitted extract job. Use when you need to check the progress or get the final results of an extraction operation.

Action Parameters

id (string, required)

Action Response

data (object, required)
error (string)
expiresAt (string, required)
invalidURLs (array)
sources (array)
status (string, required)
success (boolean, required)
successful (boolean, required)
tokensUsed (integer)
urlTrace (array)
warning (string)

Tool Name: Generate LLMs.txt for a website

Description

Tool to generate an LLMs.txt file for a website, making content accessible to language models in a standardized format. Use when you need to create an LLM-friendly representation of website content.

Action Parameters

maxUrls (integer, defaults to 2)
showFullText (boolean)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Get deep research status

Description

Retrieves the status and results of a deep research job by its ID. Use when you need to check the progress or retrieve the final analysis of a deep research operation.

Action Parameters

id (string, required)

Action Response

activities (array)
currentDepth (integer)
data (object, required)
error (string)
expiresAt (string)
maxDepth (integer)
sources (array)
status (string, required)
success (boolean, required)
successful (boolean, required)
totalUrls (integer)

Tool Name: Get LLMs.txt generation job status

Description

Tool to get the status and results of an LLMs.txt generation job. Use when you need to check if a job has completed and retrieve the generated content.

Action Parameters

id (string, required)

Action Response

data (object)
error (string)
expiresAt (string)
status (string, required)
success (boolean, required)
successful (boolean, required)

Tool Name: Get the status of a crawl job

Description

Retrieves the current status, progress, and details of a web crawl job, using the job ID obtained when the crawl was initiated.

Action Parameters

id (string, required)

Action Response

completed (integer, required)
creditsUsed (integer, required)
data (array, required)
error (string)
expiresAt (string, required)
next (string)
status (string, required)
successful (boolean, required)
total (integer, required)

Tool Name: Map multiple URLs

Description

Maps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.

Action Parameters

ignoreSitemap (boolean, defaults to True)
includeSubdomains (boolean)
limit (integer, defaults to 5000)
search (string)
url (string, required)

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Get team queue status

Description

Tool to retrieve metrics about the team's scrape queue. Use when you need to check queue status, job counts, or concurrency limits.

Action Parameters

Action Response

data (object, required)
error (string)
successful (boolean, required)

Tool Name: Scrape URL

Description

Scrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in specified formats.

Action Parameters

actions (array)
excludeTags (array)
formats (array, defaults to ['markdown'])
includeTags (array)
jsonOptions (object)
location (object)
onlyMainContent (boolean, defaults to True)
timeout (integer, defaults to 30000)
url (string, required)
waitFor (integer)

Action Response

data (object, required)
error (string)
success (boolean, required)
successful (boolean, required)
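A sketch of building scrape arguments that include pre-scrape browser actions. The action object shapes ({"type": "wait", "milliseconds": ...} and {"type": "click", "selector": ...}) follow Firecrawl's documented action format but should be checked against your API version; the helper name and CSS selector are hypothetical:

```python
def build_scrape_with_actions(url: str, selector: str):
    """Assemble Scrape URL arguments with pre-scrape browser actions:
    wait for the page, click an element, then wait again before scraping."""
    return {
        "url": url,
        "formats": ["markdown"],  # default
        "onlyMainContent": True,  # default
        "timeout": 30000,         # default, in milliseconds
        "actions": [
            {"type": "wait", "milliseconds": 1000},
            {"type": "click", "selector": selector},
            {"type": "wait", "milliseconds": 500},
        ],
    }

args = build_scrape_with_actions("https://example.com", "#load-more")
```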

Tool Name: Get team token usage

Description

Tool to retrieve the current team's token usage and balance information for Firecrawl's Extract feature. Use when you need to check remaining token credits, plan allocation, or billing period details.

Action Parameters

Action Response

data (object, required)
error (string)
success (boolean, required)
successful (boolean, required)

Tool Name: Get historical team token usage

Description

Tool to retrieve historical team token usage on a monthly basis. Use when you need to analyze token consumption patterns over time, optionally segmented by API key.

Action Parameters

byApiKey (boolean)

Action Response

data (object, required)
error (string)
successful (boolean, required)