Firecrawl Website Scraper
The Firecrawl Website Scraper Flow sends a website URL to the Firecrawl API, retrieves the site's content in a specified format (e.g., Markdown), and stores the results for further analysis. It automates website scraping tasks, letting you capture and structure website content efficiently, and is particularly useful for applications such as research automation, competitive analysis, and content aggregation.
You can find this template in the Services Catalog under these categories:
Contextual Basics, Enrichment
What's Included:
1 Flow
1 Object Type
1 Connection
What You'll Need:
Access to the Firecrawl API
API Key for the Firecrawl service
Ideas for Using the Firecrawl Website Scraper Flow:
Research Automation: Use this flow to automate website scraping on specific topics such as market trends, competitor analysis, or product reviews.
Content Aggregation: Quickly extract and organize website content by sending various URLs to the Firecrawl API and capturing structured markdown data.
Data Enrichment: Implement this flow to enhance internal datasets with additional information scraped from relevant websites.
Flow Overview
Flow Start
The flow begins by injecting a test URL, which can be modified to suit specific scraping needs.
Send URL and Receive Response
The flow sends the URL to the Firecrawl API. The API processes the URL and returns a response, which is then logged and passed on for further formatting.
Format Response & Create Record
The response from the Firecrawl API is structured into a record format that includes the website title, description, and content. The scraped data is then stored in the system.
Error Handling
Any errors encountered during the flow are captured and logged for troubleshooting, ensuring that issues can be quickly identified and resolved.
Flow End
The flow concludes once the records have been successfully created or any errors have been logged.
Firecrawl Website Scraper Flow Details
Inbound Send to Agent Events
Nodes: contextual-start
Purpose: The flow begins by receiving a start signal, typically initiated by an external event or agent.
In-Editor Testing
Nodes: Test URL, Prepare Scrape
Purpose: Allows for testing the flow directly within the editor. The URL is prepared and passed to the Firecrawl API for processing.
Code Example: Prepare Scrape Function
Explanation: This function constructs the payload for the Firecrawl API, specifying the URL to scrape and the desired output format. The payload is then passed to the next node in the flow, where it will be sent to the API.
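The template's actual node code is not reproduced here, but a minimal sketch of what such a function node might look like is shown below. It assumes a Node-RED-style `msg` object and Firecrawl's scrape payload shape (a target `url` plus a `formats` array); the function name and field handling are illustrative.

```javascript
// Hypothetical sketch of the "Prepare Scrape" function node.
// It wraps the incoming URL in the JSON payload the Firecrawl
// scrape endpoint expects: the target URL and the output formats.
function prepareScrape(msg) {
  const targetUrl = msg.payload; // e.g. "https://example.com"
  msg.payload = {
    url: targetUrl,
    formats: ["markdown"], // request Markdown output from Firecrawl
  };
  return msg; // passed to the "Send to Firecrawl" node
}
```

In the editor, the Test URL node would set `msg.payload` to the page you want to scrape before this function runs.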
Send URL and Receive Response
Nodes: Send to Firecrawl, Firecrawl Response
Purpose: The prepared URL is sent to the Firecrawl API. The response is logged and passed on for further formatting and processing.
Code Example: Firecrawl Response Log
Explanation: This function logs the response received from the Firecrawl API, capturing the scraped website data along with metadata for further processing and storage.
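As a rough illustration (the template's own logging code is not shown here), a log node of this kind could look like the sketch below. It assumes Firecrawl's scrape response shape of `{ success, data: { markdown, metadata } }`; the function name and the injectable `log` parameter are illustrative.

```javascript
// Hypothetical sketch of the "Firecrawl Response" log node.
// Logs a short summary of the scrape result, then forwards the
// untouched response downstream for formatting.
function logFirecrawlResponse(msg, log = console.log) {
  const data = (msg.payload && msg.payload.data) || {};
  const meta = data.metadata || {};
  log("Firecrawl scrape complete:", {
    title: meta.title,
    sourceURL: meta.sourceURL,
    contentLength: data.markdown ? data.markdown.length : 0,
  });
  return msg; // passed on to "Prepare Record Data"
}
```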
Format Response & Create Record
Nodes: Prepare Record Data, Create Scraped Data Record, Create Scraped Data Record Log
Purpose: The scraped data is formatted into a structured record and stored in the system, including key metadata and the scraped website content.
Code Example: Prepare Record Data Function
Explanation: This function formats the API response into a structured object, which includes the website title, description, content, and original URL. This data is then ready to be stored as a record.
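A minimal sketch of such a formatting function is shown below. It assumes Firecrawl's response nests the scraped Markdown and page metadata under `data`; the output field names (`title`, `description`, `content`, `url`) are illustrative and would need to match the Object Type included with the template.

```javascript
// Hypothetical sketch of the "Prepare Record Data" function node.
// Flattens the Firecrawl response into the fields the scraped-data
// record needs, with fallbacks for missing metadata.
function prepareRecordData(msg) {
  const data = (msg.payload && msg.payload.data) || {};
  const meta = data.metadata || {};
  msg.payload = {
    title: meta.title || "Untitled page",
    description: meta.description || "",
    content: data.markdown || "",
    url: meta.sourceURL || "",
  };
  return msg; // passed to "Create Scraped Data Record"
}
```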
Error Handling
Nodes: catch, Error Catch Log, contextual-error
Purpose: Catches any errors that occur during the flow and logs them for review, ensuring that issues can be identified and resolved.
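As a rough sketch of the logging step (the template's own error node is not reproduced here), a catch handler might look like the code below. It assumes a Node-RED-style `catch` node that attaches details to `msg.error`; the function name and injectable `log` parameter are illustrative.

```javascript
// Hypothetical sketch of the "Error Catch Log" function node.
// Logs the error message and the node it came from, then forwards
// the message to the flow's error end node.
function logCaughtError(msg, log = console.error) {
  const err = msg.error || {};
  log("Firecrawl scraper flow failed:", {
    message: err.message || "Unknown error",
    source: (err.source && err.source.id) || "unknown node",
  });
  return msg; // passed on to "contextual-error"
}
```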
Flow End
Nodes: contextual-end
Purpose: The flow completes its process, either after successfully creating records or after logging any errors that occurred.
Summary of Flow:
Flow Start: Initiate the flow with a test URL.
Data Preparation: Prepare the URL for interaction with the Firecrawl API.
API Interaction: Send the URL to the Firecrawl API and log the response.
Record Creation: Format and store the scraped website content as a record for analysis.
Error Handling: Capture and log any errors that occur during the process.
Flow End: Conclude the flow after records are created or errors are logged.