Overview
The node integrates with the WaterCrawl API to manage web crawling and scraping tasks. It allows users to create crawl requests, retrieve crawl request details, list multiple crawl requests, get crawl results, scrape individual URLs directly, and stop ongoing crawl requests.
This node is beneficial for scenarios such as:
- Automating data extraction from websites by crawling multiple pages.
- Scraping specific web pages on demand.
- Monitoring changes or updates on websites by scheduling repeated crawls.
- Collecting structured data for analysis or integration into other workflows.
For example, a user can create a crawl request to extract product information across an e-commerce site, then retrieve and process the crawl results within n8n.
Properties
| Name | Meaning |
|---|---|
| Limit | Maximum number of crawl requests to return when listing many crawl requests. |
| Page | Page number to retrieve when paginating through lists of crawl requests or results. |
These properties apply specifically to the "Get Crawl Requests" operation under the "Crawl" resource.
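To make the two properties concrete, here is a minimal sketch of how a limit and a page number usually interact. It assumes 1-based page numbers and that Limit acts as the page size; neither is confirmed by this page. The helper `page_slice` and the sample IDs are hypothetical, and the code works on a local list rather than calling the WaterCrawl API.

```python
from typing import List, TypeVar

T = TypeVar("T")


def page_slice(items: List[T], limit: int, page: int) -> List[T]:
    """Return the items on a 1-based `page` when each page holds `limit` items.

    Hypothetical helper for illustration; it mimics common pagination
    semantics and is not taken from the node's source code.
    """
    if limit < 1 or page < 1:
        raise ValueError("limit and page must be positive integers")
    start = (page - 1) * limit
    return items[start:start + limit]


# Seven made-up crawl request IDs, paged three at a time.
crawl_ids = [f"crawl-{n}" for n in range(1, 8)]
first_page = page_slice(crawl_ids, limit=3, page=1)  # full page
last_page = page_slice(crawl_ids, limit=3, page=3)   # partial final page
beyond = page_slice(crawl_ids, limit=3, page=4)      # past the end: empty list
```

Under these assumptions, asking for a page past the last one yields an empty list rather than an error, which matches the behaviour described under Troubleshooting.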
Output
The node outputs JSON data representing the results of the selected operation:
- For Get Crawl Requests, it returns an array of crawl request objects, each containing metadata about individual crawl jobs.
- Pagination metadata is also returned in a separate output, including current page, next page, previous page, total pages, and total results count.
- For other operations like creating a crawl, getting a single crawl, or scraping a URL, the output contains detailed information about the crawl or scrape result.
- When retrieving crawl results, if downloading is enabled, the node fetches and includes the actual scraped data; otherwise, it provides references/URLs to the results.
- The node does not output binary data.
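The pagination metadata described above can be sketched as follows. This is an illustrative reconstruction, not the node's actual output schema: the field names (`current_page`, `next_page`, and so on) and the function `pagination_metadata` are assumptions chosen to mirror the fields listed in the text.

```python
import math
from typing import Dict, Optional


def pagination_metadata(
    total_results: int, limit: int, page: int
) -> Dict[str, Optional[int]]:
    """Build pagination metadata for a 1-based page (hypothetical field names)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    total_pages = math.ceil(total_results / limit)
    # A page outside 1..total_pages has no valid neighbours, so both
    # neighbour fields are left as None (null in JSON).
    in_range = 1 <= page <= total_pages
    return {
        "current_page": page,
        "next_page": page + 1 if in_range and page < total_pages else None,
        "previous_page": page - 1 if in_range and page > 1 else None,
        "total_pages": total_pages,
        "total_results": total_results,
    }


middle = pagination_metadata(total_results=7, limit=3, page=2)
final = pagination_metadata(total_results=7, limit=3, page=3)
out_of_range = pagination_metadata(total_results=7, limit=3, page=5)
```

A downstream workflow step could keep requesting pages until `next_page` is null, which avoids hard-coding the number of pages.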
Dependencies
- Requires an API key credential for authenticating with the WaterCrawl API.
- Uses the WaterCrawl API base URL, defaulting to https://app.watercrawl.dev if not specified in credentials.
- No additional external dependencies beyond the WaterCrawl API.
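The base URL fallback can be sketched as below. The default value comes from this page; the credential key `baseUrl` and the function `resolve_base_url` are hypothetical names used only for illustration, and trimming a trailing slash is a design choice of this sketch, not documented node behaviour.

```python
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://app.watercrawl.dev"  # default named in this document


def resolve_base_url(credentials: Mapping[str, Optional[str]]) -> str:
    """Return the configured base URL, or the default when none is set.

    `baseUrl` is an assumed credential key, not a confirmed field name.
    """
    configured = (credentials.get("baseUrl") or "").strip()
    # Drop a trailing slash so request paths can be appended consistently.
    return (configured or DEFAULT_BASE_URL).rstrip("/")


default_url = resolve_base_url({})
blank_url = resolve_base_url({"baseUrl": "   "})
custom_url = resolve_base_url({"baseUrl": "https://crawl.example.com/"})
```

Treating a blank or whitespace-only value the same as a missing one avoids sending requests to an empty host when the credential field is left unfilled.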
Troubleshooting
- Invalid JSON Errors: The node expects some inputs (like plugin options and extra headers) to be valid JSON strings. Invalid JSON will cause errors indicating which field is malformed. To fix, ensure these fields contain properly formatted JSON.
- API Authentication Failures: If the API key is missing or invalid, requests will fail. Verify that the API key credential is correctly configured.
- Pagination Issues: When requesting pages beyond available data, the node may return empty results or null pagination fields. Adjust the page number accordingly.
- Network or API Errors: General network issues or API downtime will cause errors. Check connectivity and WaterCrawl service status.
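For the invalid JSON case, a quick way to check a value before pasting it into the node is to parse it locally. The sketch below is one possible approach, not the node's own validation code; `parse_json_field` and the field labels passed to it are hypothetical, and requiring a JSON object (rather than any JSON value) is an assumption about what these fields expect.

```python
import json
from typing import Any, Dict


def parse_json_field(field_name: str, raw: str) -> Dict[str, Any]:
    """Parse a JSON string field, naming the field in any error raised."""
    if not raw.strip():
        return {}  # treat an empty field as "no options"
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Invalid JSON in '{field_name}': {exc.msg} "
            f"(line {exc.lineno}, column {exc.colno})"
        ) from exc
    if not isinstance(value, dict):
        raise ValueError(f"'{field_name}' must be a JSON object")
    return value


headers = parse_json_field("Extra Headers", '{"X-Test": "1"}')
empty = parse_json_field("Plugin Options", "")

try:
    parse_json_field("Plugin Options", "{bad json}")
    error_message = ""
except ValueError as err:
    error_message = str(err)  # names the malformed field
```

Common causes of malformed input include single quotes instead of double quotes, trailing commas, and unquoted keys.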
Links and References
- WaterCrawl API Documentation (assumed official docs URL)
- n8n documentation on Creating Custom Nodes