Browserless

Interact with Browserless API

Overview

This node uses the Browserless API to control a headless browser programmatically for web automation tasks. The Content - Get operation visits a target URL and returns the page content as rendered by the browser.

It is useful when you need to scrape or extract data from websites that rely heavily on JavaScript to render content, or when you need a real browser environment (for example, to handle cookies, authentication, or dynamically loaded content).

Practical examples:

  • Extracting product details from e-commerce sites that load content dynamically.
  • Capturing the fully rendered HTML of a webpage after scripts have executed.
  • Automating login flows and scraping protected pages using HTTP basic authentication or cookies.
  • Testing how a page renders under different viewport sizes or user agents.

Properties

  • URL: The target webpage URL to visit and retrieve content from.
  • Browser Options: Collection of options controlling browser behavior:

    • BlockAds: Block advertisement network traffic.
    • Headless: Run browser without UI.
    • Ignore HTTPS Errors: Ignore SSL errors.
    • Stealth: Avoid bot detection.
    • User Data Dir: Path to reuse session data.
    • TrackingId: Arbitrary ID for tracking sessions.
    • Keep Alive: Time in ms to keep browser running after session.
    • Flags: Command-line flags passed to Chrome on startup.
  • Additional Options: Various advanced settings:

    • Http Headers: Extra headers sent with requests.
    • Inject Script: Add custom JavaScript to the page.
    • Inject Style: Add custom CSS styles.
    • Authentication: HTTP Basic Auth credentials.
    • Cookies: Set HTTP cookies.
    • Goto Options: Control navigation timeout and event to wait for.
    • Reject Request Pattern: Patterns of requests to block.
    • Reject Resource Types: Types of resources to block (e.g., images, scripts).
    • Request Interceptors: Modify intercepted requests.
    • User Agent: Custom user agent string.
    • Wait For: Selector or condition to wait for before proceeding.
    • Set JavaScript Enabled: Enable or disable JS execution.
    • Viewport: Screen size and device characteristics.
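
To make the split between the two option groups more concrete, the following is a minimal sketch of how such properties might be assembled into a single request. It is not the node's actual implementation: the `/content` path, the idea that Browser Options travel as query parameters while Additional Options travel in the JSON body, the key names (`stealth`, `blockAds`, `userAgent`, `viewport`, `gotoOptions`), the host, and the helper `build_content_request` are all assumptions for illustration and should be checked against the Browserless API documentation.

```python
import json
from urllib.parse import urlencode


def build_content_request(base_url, token, target_url,
                          browser_options=None, additional_options=None):
    """Assemble an endpoint URL and JSON body for a content request (illustrative only)."""
    # Assumption: browser-level options are sent as query parameters.
    query = {"token": token}
    for key, value in (browser_options or {}).items():
        # Serialise booleans as lowercase strings, e.g. True -> "true".
        query[key] = str(value).lower() if isinstance(value, bool) else value
    endpoint = f"{base_url.rstrip('/')}/content?{urlencode(query)}"

    # Assumption: page-level options are sent in the JSON body next to the URL.
    body = {"url": target_url}
    body.update(additional_options or {})
    return endpoint, json.dumps(body)


endpoint, body = build_content_request(
    "https://browserless.example",      # placeholder host, not a real service
    "YOUR_API_KEY",                     # placeholder credential
    "https://example.com/products",
    browser_options={"stealth": True, "blockAds": True},
    additional_options={
        "userAgent": "Mozilla/5.0 (compatible; ExampleBot/1.0)",
        "viewport": {"width": 1280, "height": 800},
        "gotoOptions": {"timeout": 30000, "waitUntil": "networkidle2"},
    },
)
print(endpoint)
print(body)
```

The helper only builds the request and sends nothing, so the result can be inspected before it is passed to an HTTP client.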

Output

  • The output JSON contains the full content retrieved from the specified URL after rendering by the browser.
  • The exact structure depends on the Browserless API response but generally includes the HTML content or extracted data.
  • No binary data is output for this operation (unlike screenshot or PDF operations).
  • The output is wrapped in an array of items, each corresponding to one input item processed.
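
The item-per-input shape described above can be sketched as follows. n8n passes data between nodes as a list of items, each with a `json` key; the field name `data` used here for the rendered HTML is hypothetical, as are the helpers `wrap_as_items` and `title_of`, so inspect the node's real output before relying on a specific key.

```python
from html.parser import HTMLParser


def wrap_as_items(rendered_pages):
    """Wrap each rendered page in an n8n-style item ("data" is a hypothetical field name)."""
    return [{"json": {"data": html}} for html in rendered_pages]


class TitleParser(HTMLParser):
    """Collect the text inside the first-level <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_of(item):
    """Extract the page title from one item's rendered HTML."""
    parser = TitleParser()
    parser.feed(item["json"]["data"])
    return parser.title.strip()


items = wrap_as_items([
    "<html><head><title>Shop</title></head><body><h1>Products</h1></body></html>",
])
print(len(items), title_of(items[0]))
```

A downstream step would typically loop over the items in the same way, pulling whatever it needs out of each rendered page.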

Dependencies

  • Requires access to the Browserless API service.
  • An API key credential for authenticating with the Browserless API must be configured in n8n.
  • Network access to the target URLs.
  • Optional: If using features like HTTP Basic Authentication, cookies, or custom headers, these must be properly set in the node properties.

Troubleshooting

  • Common issues:

    • Invalid or unreachable URL: Ensure the URL is correct and accessible.
    • Authentication failures: Verify username/password if HTTP Basic Auth is used.
    • SSL errors: Use the "Ignore HTTPS Errors" option if the site has invalid certificates.
    • Bot detection blocking: Enable "Stealth" mode to reduce detection.
    • Timeout errors: Increase the "Goto Options" timeout or adjust the "Wait For" selector to allow enough time for the page to load.
    • Resource blocking misconfiguration: Overly aggressive reject patterns or resource types may prevent the page from loading correctly.
  • Error messages:

    • Network errors or timeouts usually indicate connectivity problems or slow-loading pages.
    • Authentication errors will mention unauthorized access; check credentials.
    • Parsing errors might occur if injected scripts/styles are malformed.
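
For the timeout and resource-blocking issues above, a small pre-flight check can catch obviously bad settings before a request is made. This is a sketch under assumptions: it presumes that "Goto Options" follows Puppeteer's navigation options (`timeout` in milliseconds and `waitUntil` set to one of the Puppeteer lifecycle events) and that "Reject Resource Types" uses Puppeteer's resource type names. The helpers `goto_options` and `risky_rejections`, and the choice of which types count as risky, are illustrative and not part of the node.

```python
# Lifecycle events accepted by Puppeteer's page navigation.
VALID_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle0", "networkidle2"}

# Assumption: blocking these types is most likely to break a JavaScript-heavy page.
RISKY_RESOURCE_TYPES = {"document", "script", "xhr", "fetch"}


def goto_options(timeout_ms, wait_until="load"):
    """Build navigation options, rejecting values that cannot work."""
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be >= 0 (0 disables the timeout in Puppeteer)")
    if wait_until not in VALID_WAIT_UNTIL:
        raise ValueError(f"unknown waitUntil event: {wait_until!r}")
    return {"timeout": timeout_ms, "waitUntil": wait_until}


def risky_rejections(reject_resource_types):
    """Return the rejected resource types that may stop the page from rendering."""
    return sorted(set(reject_resource_types) & RISKY_RESOURCE_TYPES)


# A slow page: allow 60 seconds and wait until the network is mostly idle.
print(goto_options(60000, "networkidle2"))

# Blocking images and fonts is usually harmless; blocking scripts often is not.
print(risky_rejections(["image", "font", "script"]))
```

If the second call returns a non-empty list, loosening the reject settings is a reasonable first step before raising timeouts further.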
