Firecrawl

Learn how to use Firecrawl with Composio

Overview

SLUG: FIRECRAWL

Description

Firecrawl automates web crawling and data extraction, enabling organizations to gather content, index sites, and gain insights from online sources at scale.

Authentication Details

- `full` (string, required): defaults to `https://api.firecrawl.dev/v1`
- `generic_api_key` (string, required)

Connecting to Firecrawl

Create an auth config

Use the dashboard to create an auth config for the Firecrawl toolkit. This allows you to connect multiple Firecrawl accounts to Composio for agents to use.

1

Select App

Navigate to the Firecrawl toolkit page and click “Setup Integration”.

2

Configure Auth Config Settings

Select among the supported auth schemes and configure them here.

3

Create and Get auth config ID

Click “Create Integration”. After creation, copy the displayed ID starting with `ac_`; this is your auth config ID. It is not sensitive, so you can store it in environment variables or a database. You will use this ID to create connections to the toolkit for a given user.
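Since the auth config ID is not sensitive, a common pattern is to keep it in an environment variable. A minimal sketch (the variable name `FIRECRAWL_AUTH_CONFIG_ID` is an assumption, not a Composio convention):

```python
import os

# Hypothetical variable name; use whatever convention your app follows.
auth_config_id = os.environ.get(
    "FIRECRAWL_AUTH_CONFIG_ID", "ac_YOUR_FIRECRAWL_CONFIG_ID"
)

# Auth config IDs created in the dashboard start with "ac_".
print(auth_config_id)
```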

Connect Your Account

Using API Key

```python
from composio import Composio
from composio.types import auth_scheme

# Replace these with your actual values
firecrawl_auth_config_id = "ac_YOUR_FIRECRAWL_CONFIG_ID"  # Auth config ID created above
user_id = "0000-0000-0000"  # UUID from your database/app

composio = Composio()

def authenticate_toolkit(user_id: str, auth_config_id: str):
    # Replace this with a method to retrieve the API key from the user,
    # or supply your own.
    user_api_key = input("[!] Enter API key: ")

    connection_request = composio.connected_accounts.initiate(
        user_id=user_id,
        auth_config_id=auth_config_id,
        config={"auth_scheme": "API_KEY", "val": user_api_key},
    )

    # API key authentication is immediate - no redirect needed
    print(f"Successfully connected Firecrawl for user {user_id}")
    print(f"Connection status: {connection_request.status}")

    return connection_request.id

connection_id = authenticate_toolkit(user_id, firecrawl_auth_config_id)

# You can verify the connection using:
connected_account = composio.connected_accounts.get(connection_id)
print(f"Connected account: {connected_account}")
```

Tools

Executing tools

To prototype, you can execute tools and inspect their responses in the Firecrawl toolkit’s playground.

Python
```python
from composio import Composio
from openai import OpenAI
import json

openai = OpenAI()
composio = Composio()

# User ID must be a valid UUID format
user_id = "0000-0000-0000"  # Replace with an actual user UUID from your database

tools = composio.tools.get(user_id=user_id, toolkits=["FIRECRAWL"])

print("[!] Tools:")
print(json.dumps(tools))

def invoke_llm(task="What can you do?"):
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": task,  # Your task here!
            },
        ],
        tools=tools,
    )

    # Handle the result from the tool call
    result = composio.provider.handle_tool_calls(user_id=user_id, response=completion)
    print(f"[!] Completion: {completion}")
    print(f"[!] Tool call result: {result}")

invoke_llm()
```

Tool List

Tool Name: Cancel a crawl job

Description

Cancels an active or queued web crawl job using its id; attempting to cancel completed, failed, or previously canceled jobs will not change their state.

Action Parameters

- `id` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
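As a sketch, this tool can be invoked directly rather than through an LLM. The tool slug and job ID below are assumptions for illustration, not confirmed names:

```python
# Arguments for the cancel tool: only the crawl job's id is required.
cancel_args = {"id": "hypothetical-crawl-job-id"}

# With a Composio client this would be executed roughly as
# (the slug "FIRECRAWL_CANCEL_CRAWL_JOB" is an assumed name):
#
#   result = composio.tools.execute(
#       "FIRECRAWL_CANCEL_CRAWL_JOB",
#       user_id=user_id,
#       arguments=cancel_args,
#   )
#   print(result["successful"], result.get("error"))
print(cancel_args)
```

Note that, per the description, cancelling a completed, failed, or already-cancelled job will not change its state.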

Tool Name: Get the status of a crawl job

Description

Retrieves the current status, progress, and details of a web crawl job, using the job id obtained when the crawl was initiated.

Action Parameters

- `id` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
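Because a crawl runs asynchronously, the status tool is typically called in a loop. A minimal polling helper, assuming the response’s `data` object carries a `status` field with terminal values like `completed`, `failed`, or `cancelled` (these field names and values are assumptions):

```python
import time

def wait_for_crawl(fetch_status, poll_interval=2.0, max_polls=30):
    """Poll fetch_status() until the crawl reaches a terminal state.

    fetch_status should be a zero-argument callable returning the tool's
    response dict (data / error / successful, per the Action Response above).
    """
    for _ in range(max_polls):
        response = fetch_status()
        status = response.get("data", {}).get("status")
        if status in ("completed", "failed", "cancelled"):
            return response
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not finish within the polling budget")
```

In practice, `fetch_status` would wrap a call to this status tool with the crawl job’s `id`.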

Tool Name: Map multiple URLs

Description

Maps a website by discovering urls from a starting base url, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.

Action Parameters

- `ignoreSitemap` (boolean, defaults to `True`)
- `includeSubdomains` (boolean)
- `limit` (integer, defaults to `5000`)
- `search` (string)
- `url` (string, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
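A sketch of an arguments payload for this tool, narrowing discovery with a search query and a lower limit (the URL and values are illustrative, not defaults):

```python
# Only "url" is required; the rest override the defaults listed above.
map_args = {
    "url": "https://example.com",
    "search": "docs",           # effectiveness is site-dependent, per the description
    "limit": 100,               # down from the default of 5000
    "includeSubdomains": True,
}
print(map_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.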

Tool Name: Start a web crawl

Description

Initiates a firecrawl web crawl from a given url, applying various filtering and content extraction rules, and polls until the job is complete; ensure the url is accessible and any regex patterns for paths are valid.

Action Parameters

- `allowBackwardLinks` (boolean)
- `allowExternalLinks` (boolean)
- `delay` (integer)
- `excludePaths` (array)
- `ignoreQueryParameters` (boolean)
- `ignoreSitemap` (boolean, defaults to `True`)
- `includePaths` (array)
- `limit` (integer, defaults to `10`)
- `maxDepth` (integer, defaults to `2`)
- `maxDiscoveryDepth` (integer)
- `scrapeOptions_actions` (array)
- `scrapeOptions_blockAds` (boolean)
- `scrapeOptions_changeTrackingOptions` (object)
- `scrapeOptions_excludeTags` (array)
- `scrapeOptions_formats` (array, defaults to `['markdown']`)
- `scrapeOptions_headers` (object)
- `scrapeOptions_includeTags` (array)
- `scrapeOptions_jsonOptions` (object)
- `scrapeOptions_location` (object)
- `scrapeOptions_maxAge` (integer)
- `scrapeOptions_mobile` (boolean)
- `scrapeOptions_onlyMainContent` (boolean, defaults to `True`)
- `scrapeOptions_parsePDF` (boolean)
- `scrapeOptions_proxy` (string)
- `scrapeOptions_removeBase64Images` (boolean)
- `scrapeOptions_skipTlsVerification` (boolean)
- `scrapeOptions_storeInCache` (boolean)
- `scrapeOptions_timeout` (integer)
- `scrapeOptions_waitFor` (integer, defaults to `123`)
- `url` (string, required)
- `webhook` (string)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
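A sketch of an arguments payload for this tool, scoping the crawl to a documentation subtree (the URL and path patterns are illustrative assumptions):

```python
# Only "url" is required; these options narrow the crawl to a docs subtree.
crawl_args = {
    "url": "https://example.com",
    "includePaths": [r"^/docs/.*"],          # regex patterns must be valid
    "excludePaths": [r"^/docs/archive/.*"],
    "limit": 10,                             # default shown above
    "maxDepth": 2,                           # default shown above
    "scrapeOptions_formats": ["markdown"],
    "scrapeOptions_onlyMainContent": True,
}
print(crawl_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug. The tool polls until the job completes, so the returned `data` reflects the finished crawl.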

Tool Name: Extract structured data

Description

Extracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a json `schema` (one must be provided).

Action Parameters

- `enable_web_search` (boolean)
- `prompt` (string)
- `schema` (object)
- `urls` (array, required)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
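Since at least one of `prompt` or `schema` must be provided, a sketch of an arguments payload supplying both (the URL, prompt text, and schema fields are illustrative assumptions):

```python
# Either "prompt" or "schema" must be provided; here both are given.
extract_args = {
    "urls": ["https://example.com/pricing"],
    "prompt": "Extract each plan's name and monthly price.",
    "schema": {
        "type": "object",
        "properties": {
            "plans": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "monthly_price": {"type": "string"},
                    },
                },
            },
        },
    },
}
print(extract_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.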

Tool Name: Scrape URL

Description

Scrapes a publicly accessible url, optionally performing pre-scrape browser actions or extracting structured json using an llm, to retrieve content in specified formats.

Action Parameters

- `actions` (array)
- `excludeTags` (array)
- `formats` (array, defaults to `['markdown']`)
- `includeTags` (array)
- `jsonOptions` (object)
- `location` (object)
- `onlyMainContent` (boolean, defaults to `True`)
- `timeout` (integer, defaults to `30000`)
- `url` (string, required)
- `waitFor` (integer)

Action Response

- `data` (object, required)
- `error` (string)
- `successful` (boolean, required)
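A sketch of an arguments payload for this tool, requesting multiple output formats and a short wait for dynamic content (the URL and values are illustrative assumptions):

```python
# Only "url" is required; formats and timing override the defaults above.
scrape_args = {
    "url": "https://example.com",
    "formats": ["markdown", "html"],  # default is ['markdown']
    "onlyMainContent": True,          # default shown above
    "timeout": 30000,                 # milliseconds; the default shown above
    "waitFor": 1000,                  # wait 1s for dynamic content before scraping
}
print(scrape_args)
```

This dict would be passed as `arguments` to a tool-execution call along with the tool’s slug.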