WaterCrawl

Consume WaterCrawl API

Overview

This node integrates with the WaterCrawl API to manage web crawling and scraping tasks. The "Get" operation under the "Crawl" resource retrieves the details of a single crawl request by its ID, which is useful for checking the status or metadata of a crawl job started earlier.

Common scenarios include:

  • Monitoring the progress or result status of a crawl request.
  • Fetching crawl metadata for logging or conditional workflow branching.
  • Integrating crawl data retrieval into automated pipelines that depend on crawl completion.

Example: After creating a crawl request to scrape multiple pages from a website, use this node operation to fetch the current state or details of that crawl by providing its unique Crawl ID.
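
The monitoring and branching scenarios above can be sketched as a small helper, for example in a Code node placed after this operation. The field name `status` and the values `finished`, `failed`, and `canceled` are assumptions made for illustration, not confirmed parts of the WaterCrawl response; adjust them to what your crawl details actually contain.

```typescript
// Hypothetical shape of the crawl details; only the fields used here are listed.
interface CrawlDetails {
  uuid: string;
  status: string; // assumed field name
}

type NextStep = "wait" | "process" | "abort";

// Map an (assumed) crawl status to the branch the workflow should take next.
function decideNextStep(crawl: CrawlDetails): NextStep {
  switch (crawl.status.toLowerCase()) {
    case "finished":
      return "process"; // crawl is done, results can be fetched
    case "failed":
    case "canceled":
      return "abort"; // terminal state without usable results
    default:
      return "wait"; // any other status is treated as still in progress
  }
}
```

Treating every unknown status as "wait" keeps the workflow polling rather than failing if the API reports a state this sketch does not list.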

Properties

Name      Meaning
Crawl ID  Required. The unique identifier of the crawl request whose details should be retrieved.

Output

The output JSON contains the detailed information of the specified crawl request as returned by the WaterCrawl API. This typically includes fields such as crawl status, start time, end time, number of pages crawled, errors if any, and other metadata related to the crawl job.

The output is structured as an array of JSON objects, each representing one item (in this case, one crawl request detail). The node wraps the API response in n8n’s execution metadata format for downstream processing.
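
A rough sketch of the item shape described above, using a locally declared type instead of n8n's own so the snippet stands alone. The `ExecutionItem` type and `toExecutionItems` helper are hypothetical, and the sample fields inside `json` are placeholders rather than the real WaterCrawl schema.

```typescript
// Minimal local stand-in for an n8n execution item; the real types live in n8n-workflow.
interface ExecutionItem {
  json: Record<string, unknown>;
  pairedItem: { item: number };
}

// Wrap one API response as a single output item linked to the input item at `itemIndex`.
function toExecutionItems(
  response: Record<string, unknown>,
  itemIndex: number,
): ExecutionItem[] {
  return [{ json: response, pairedItem: { item: itemIndex } }];
}

// Placeholder response; the field names are illustrative only.
const items = toExecutionItems({ uuid: "example-id", status: "running" }, 0);
console.log(JSON.stringify(items, null, 2));
```

Downstream nodes read the crawl details from the `json` property of each item.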

No binary data is produced by this operation.

Dependencies

  • Requires an active WaterCrawl API key credential configured in n8n.
  • Needs network access to the WaterCrawl API endpoint (default: https://app.watercrawl.dev).
  • The node depends on the bundled WaterCrawlAPIClient class to communicate with the API.

Troubleshooting

  • Invalid Crawl ID: If the provided Crawl ID does not exist or is malformed, the API may return an error. Verify the Crawl ID is correct and corresponds to an existing crawl request.
  • Authentication Errors: Ensure the API key credential is valid and has sufficient permissions.
  • Network Issues: Connectivity problems to the WaterCrawl API endpoint will cause failures; check network settings and proxy configurations if applicable.
  • API Rate Limits: Excessive requests might be throttled by the API; consider adding delays or handling rate limit responses gracefully.
  • Error Handling: If the node is set to continue on failure, errors will be returned as JSON with an error field containing the message.
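
Two of the points above lend themselves to small helpers, sketched here under stated assumptions. `partitionByError` separates items that carry an `error` field (the continue-on-failure case) from successful ones, and `backoffDelayMs` computes an exponential delay to wait between retries after a throttled request. Both function names, the `Item` type, and the delay values are hypothetical and not part of the node or the WaterCrawl API.

```typescript
// Local stand-in for an n8n item; only the `json` payload matters here.
interface Item {
  json: Record<string, unknown>;
}

// Split items into successes and failures based on a string `error` field.
function partitionByError(items: Item[]): { ok: Item[]; failed: Item[] } {
  const ok: Item[] = [];
  const failed: Item[] = [];
  for (const item of items) {
    if (typeof item.json["error"] === "string") {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
// `attempt` is 1-based (1 = delay before the first retry).
function backoffDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}
```

In a workflow, the failed items could be routed to a notification branch while the delay value feeds a Wait node before the request is retried.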
