Overview
The node interacts with the Browserless API to perform various browser automation tasks such as scraping JSON data from web pages, capturing screenshots, generating PDFs, executing custom JavaScript functions in a browser context, and retrieving page content. It is particularly useful for workflows that require automated web data extraction, visual captures of web pages, or running scripts within a browser environment without managing a local browser instance.
For the JSON Scrape operation specifically, the node visits a target URL and extracts structured data by selecting specified elements on the page using CSS selectors. This is beneficial when you want to programmatically gather data from websites that do not provide APIs or when you need to scrape multiple elements efficiently.
Practical examples:
- Extracting product details (price, name, availability) from an e-commerce site.
- Gathering news headlines and summaries from a news portal.
- Collecting user reviews or comments from a social media page.
Properties
Name | Meaning |
---|---|
URL | The target webpage URL to visit and scrape data from. |
Elements | A collection of elements to scrape, each defined by: - Selector: CSS selector string to identify the element(s) on the page. - Timeout: Time in milliseconds to wait for the element before scraping it. |
Flattened Output | Boolean flag indicating whether to flatten the scraped JSON output into a simpler structure (true) or keep the nested structure (false). |
Browser Options | Options to configure the browser session: - BlockAds: Block advertisement network traffic. - Headless: Run the browser without a UI. - Ignore HTTPS Errors: Continue loading pages despite HTTPS certificate errors. - Stealth: Run in stealth mode to reduce bot detection. - User Data Dir: Path for session persistence. - TrackingId: Arbitrary ID for tracking. - Keep Alive: Time to keep the browser running after the session. - Flags: Command-line flags passed to Chrome. |
Additional Options | Extra configurations including: - Http Headers: Custom headers sent with requests. - Inject Script: Add custom JavaScript via script tags. - Inject Style: Add custom CSS styles. - Authentication: HTTP basic auth credentials. - Cookies: HTTP cookies to set. - Goto Options: Navigation timeout and event to wait for. - Reject Request Pattern/Resource Types: Block certain requests. - Request Interceptors: Modify outgoing requests. - User Agent: Custom user agent string. - Wait For: Selector or event to wait for before scraping. - Set JavaScript Enabled: Enable or disable JS execution. - Viewport: Screen size and device emulation settings. |
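
To make the URL and Elements properties concrete, here is a minimal sketch of how they might be assembled into a request body. The field names (`url`, `elements`, `selector`, `timeout`) mirror the table above, but the exact body the node sends is an assumption, and `build_scrape_payload` and its default timeout are hypothetical names and values chosen for illustration.

```python
import json


def build_scrape_payload(url, elements, default_timeout_ms=10000):
    """Build a scrape request body from a URL and a list of element definitions.

    Hypothetical helper: the field names follow the properties table,
    not a confirmed wire format.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"URL must start with http:// or https://: {url!r}")

    payload_elements = []
    for element in elements:
        selector = element["selector"].strip()
        if not selector:
            raise ValueError("selector must not be empty")
        payload_elements.append({
            "selector": selector,
            # Fall back to the default when no per-element timeout is given.
            "timeout": int(element.get("timeout", default_timeout_ms)),
        })
    return {"url": url, "elements": payload_elements}


payload = build_scrape_payload(
    "https://example.com/products",
    [{"selector": "h1.product-title"}, {"selector": ".price", "timeout": 5000}],
)
print(json.dumps(payload, indent=2))
```

Validating the URL and selectors before the request is sent surfaces the most common configuration mistakes early, rather than as an empty result.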
Output
The output is a JSON array where each item corresponds to one input item processed. For the JSON Scrape operation:
- The `json` field contains the scraped data extracted from the specified elements on the page.
- If "Flattened Output" is enabled, the nested scraped results are transformed into a flat structure for easier consumption.
- No binary data is produced for this operation.
Example output snippet (simplified):
[
  {
    "element1": "Extracted text or attribute",
    "element2": "Another extracted value"
  }
]
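
The effect of "Flattened Output" can be illustrated with a short sketch. It assumes a nested response with a `data` list whose entries hold a `selector` and a list of `results` with a `text` field; that shape, and the function `flatten_scrape_result`, are assumptions for illustration, and the node's own flattening logic may differ.

```python
def flatten_scrape_result(nested):
    """Collapse an assumed nested scrape response into a selector -> text mapping."""
    flat = {}
    for entry in nested.get("data", []):
        texts = [result.get("text", "") for result in entry.get("results", [])]
        # A single match becomes a plain string; otherwise keep the list of texts.
        flat[entry["selector"]] = texts[0] if len(texts) == 1 else texts
    return flat


# Hypothetical nested response used only to demonstrate the transformation.
nested = {
    "data": [
        {"selector": "h1", "results": [{"text": "Example Domain"}]},
        {"selector": "li.item", "results": [{"text": "First"}, {"text": "Second"}]},
    ]
}
flat = flatten_scrape_result(nested)
print(flat)
```

A flat mapping is easier to reference in downstream nodes, at the cost of dropping any per-match metadata carried by the nested form.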
Dependencies
- Requires access to the Browserless API service, which provides headless Chrome instances for browser automation.
- Needs an API key credential configured in n8n to authenticate requests to the Browserless API.
- Network connectivity to the target URLs and Browserless API endpoint.
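
As a rough illustration of how the API key and endpoint fit together, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The `/scrape` path, the `token` query parameter, the host name, and the helper `build_request` are all assumptions; check your Browserless deployment for the actual endpoint and authentication scheme.

```python
import json
import urllib.parse
import urllib.request


def build_request(base_url, api_key, payload):
    """Prepare a POST request to an assumed scrape endpoint without sending it."""
    # Assumption: the API key is passed as a `token` query parameter.
    query = urllib.parse.urlencode({"token": api_key})
    endpoint = f"{base_url.rstrip('/')}/scrape?{query}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "https://browserless.example.com",  # hypothetical host
    "YOUR_API_KEY",                     # placeholder credential
    {"url": "https://example.com", "elements": [{"selector": "h1"}]},
)
print(request.get_method(), request.full_url)
```

Within n8n the credential is stored once and attached by the node, so a hand-built request like this is mainly useful for testing connectivity and the API key outside the workflow.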
Troubleshooting
Common issues:
- Invalid or unreachable URL: Ensure the URL is correct and accessible.
- Incorrect CSS selectors: Verify selectors match elements on the page; otherwise, no data will be scraped.
- Timeout errors: Increase element timeout or navigation timeout if pages load slowly.
- Authentication failures: Provide valid HTTP basic authentication if required by the target site.
- API quota limits or invalid API key: Check Browserless API usage and credentials.
Error messages:
- "Request failed" or "Timeout exceeded": Usually means the page did not load or the element was not found in time. Adjust timeouts or check network connectivity.
- "Authentication required": Missing or incorrect credentials for the target website.
- "Invalid API key": Problem with Browserless API authentication; verify the API key setup in n8n.
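
When handling failures in a downstream step, the error messages above can be mapped to the suggested remedies. The following is a minimal sketch; `ERROR_HINTS` and `hint_for_error` are hypothetical names, and matching on message fragments assumes the error text contains the phrases listed above.

```python
# Message fragments (lowercase) mapped to the remedies described above.
ERROR_HINTS = {
    "request failed": "Check the URL and network; increase timeouts if the page loads slowly.",
    "timeout exceeded": "Check the URL and network; increase timeouts if the page loads slowly.",
    "authentication required": "Provide valid credentials for the target website.",
    "invalid api key": "Verify the Browserless API key configured in n8n.",
}


def hint_for_error(message):
    """Return a troubleshooting hint for an error message, matched case-insensitively."""
    lowered = message.lower()
    for fragment, hint in ERROR_HINTS.items():
        if fragment in lowered:
            return hint
    return "No known hint; inspect the full error message."


print(hint_for_error("Error: Timeout exceeded while waiting for selector"))
```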