ScrapegraphAI

Consume ScrapegraphAI API

Overview

The node integrates with the ScrapegraphAI API to extract structured data from websites based on user instructions. Specifically, for the Smart Scraper - Scrape operation, it sends a target website URL along with a user-defined prompt describing what data to extract. The API then returns the extracted information in JSON format.

This node is useful when you want to automate data extraction from web pages without manually parsing HTML or writing custom scrapers. For example, you can extract product details, contact information, or any specific content by providing clear instructions in the user prompt.

Properties

Name          Meaning
Website URL   The URL of the website from which data should be scraped.
User Prompt   Instructions describing what specific data to extract from the given website URL.

Output

The output is a JSON object containing the response from the ScrapegraphAI Smart Scraper API endpoint. It typically holds the extracted data, structured according to the instructions in the user prompt.

No binary data output is produced by this operation.

Example output structure (simplified):

{
  "extractedData": {
    // structured data fields as per user prompt
  }
}

Dependencies

  • Requires an active API key credential for the ScrapegraphAI service.
  • The node makes HTTP POST requests to https://api.scrapegraphai.com/v1/smartscraper.
  • Proper network connectivity and valid API credentials are necessary.
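The request described above can be sketched in Python, as a rough model of what the node sends rather than its actual implementation. The endpoint URL comes from this page; the `SGAI-APIKEY` header name and the `website_url` / `user_prompt` body field names are assumptions that should be checked against the ScrapegraphAI API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint as listed in the Dependencies section.
SMARTSCRAPER_URL = "https://api.scrapegraphai.com/v1/smartscraper"


def build_smartscraper_request(api_key, website_url, user_prompt):
    """Build (but do not send) a POST request for the Smart Scraper endpoint."""
    if not api_key:
        raise ValueError("An API key is required")
    # Assumed body field names; verify against the official API docs.
    payload = {"website_url": website_url, "user_prompt": user_prompt}
    return urllib.request.Request(
        SMARTSCRAPER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "SGAI-APIKEY": api_key,  # assumed header name for the API key
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made. With a valid key, `send(build_smartscraper_request(key, url, prompt))` would perform the actual call.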

Troubleshooting

  • Common issues:

    • An invalid or missing API key causes authentication errors.
    • Incorrect or unreachable website URLs may result in request failures or empty responses.
    • Ambiguous or unclear user prompts might lead to incomplete or irrelevant data extraction.
  • Error messages:

    • Errors returned from the API are caught and can be output as error messages if "Continue On Fail" is enabled.
    • Typical errors include HTTP 401 Unauthorized (invalid API key), HTTP 400 Bad Request (malformed input), or network timeouts.
  • Resolutions:

    • Verify that the API key credential is correctly configured and active.
    • Ensure the website URL is correct and accessible.
    • Refine the user prompt to clearly specify the desired data.
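The "Continue On Fail" behavior and the status codes listed above can be modeled with a short Python sketch. This illustrates the general pattern only; it is not the node's source code, and the function names and the `{"json": {"error": ...}}` item shape are assumptions made for the example.

```python
def explain_status(status):
    """Map an HTTP status code to the likely cause named in this section."""
    hints = {
        401: "Unauthorized: verify that the API key credential is configured and active.",
        400: "Bad Request: check the website URL and user prompt for malformed input.",
    }
    return hints.get(status, f"Unexpected HTTP status {status}.")


def run_items(items, scrape, continue_on_fail=False):
    """Apply `scrape` to each item, optionally turning failures into error items."""
    results = []
    for item in items:
        try:
            results.append({"json": scrape(item)})
        except Exception as exc:
            if not continue_on_fail:
                # Without "Continue On Fail", the first error stops the run.
                raise
            # With it enabled, the error is emitted as output and the run goes on.
            results.append({"json": {"error": str(exc)}})
    return results
```

With `continue_on_fail=True`, one failing URL does not discard the results already collected for the other items, which is usually what you want when scraping a batch of pages.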
