Overview
This node, named "CloudBrowser," allows users to interact with websites through a cloud-based browser instance. Specifically for the Content resource and the Get HTML From Website operation, it navigates to a specified URL using a remote browser service and retrieves the full HTML content of the webpage.
Common scenarios where this node is beneficial include:
- Extracting raw HTML content from dynamic websites that require JavaScript execution.
- Scraping data from pages that load content asynchronously.
- Automating web navigation tasks where direct HTTP requests are insufficient due to client-side rendering.
Practical example:
- You want to scrape product details from an e-commerce site that loads content dynamically via JavaScript. Using this node, you can navigate to the product page URL and retrieve the fully rendered HTML for further parsing.
Properties
Name | Meaning |
---|---|
URL to Navigate | The URL of the website to open and retrieve HTML content from. This is required. |
Navigation Options | Options controlling how navigation behaves. Wait Until: when to consider navigation finished (Load, Domcontentloaded, Networkidle0, or Networkidle2). Timeout (Ms): maximum wait time in milliseconds. |
Browser Configuration | Settings for the browser instance. Browser Type: Chrome, Chromium, or ChromeHeadlessShell. Headless Mode: whether to run the browser without a UI. Stealth Mode: enable stealth to avoid detection. Keep Open (Seconds): how long to keep the browser open before auto-closing (0 means never). Label: name for the browser instance. Save Session: save the session for reuse. Recover Session: recover a previously saved session. |
Custom Arguments | Additional command-line arguments to pass to the browser on startup. |
Ignored Default Arguments | List of default browser arguments to ignore when launching the browser. |
Proxy Configuration | Proxy server settings: Host, Port, Username, and Password. |
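
The properties above can be pictured as one parameter object. The following is a minimal Python sketch, not the node's actual schema: the key names (`url`, `wait_until`, `timeout_ms`, `proxy`), the 30000 ms default, and the `validate_params` helper are all hypothetical and chosen only to mirror the table. The four wait-until strings are the lowercase names Puppeteer uses for its `waitUntil` option.

```python
from urllib.parse import urlparse

# Lowercase Puppeteer names for the four "Wait Until" choices in the table.
WAIT_UNTIL_CHOICES = ("load", "domcontentloaded", "networkidle0", "networkidle2")


def validate_params(params: dict) -> list:
    """Return a list of problems found in a hypothetical parameter dict."""
    problems = []

    # "URL to Navigate" is the only required property.
    parsed = urlparse(params.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http(s) URL")

    # Assumed default; the node's real default may differ.
    wait_until = params.get("wait_until", "load")
    if wait_until not in WAIT_UNTIL_CHOICES:
        problems.append(f"wait_until must be one of {WAIT_UNTIL_CHOICES}")

    # Assumed default of 30000 ms; the node's real default may differ.
    timeout_ms = params.get("timeout_ms", 30000)
    if not isinstance(timeout_ms, int) or timeout_ms < 0:
        problems.append("timeout_ms must be a non-negative integer")

    # Proxy settings are optional, but host and port only make sense together.
    proxy = params.get("proxy")
    if proxy is not None and not (proxy.get("host") and proxy.get("port")):
        problems.append("proxy needs both host and port")

    return problems
```

Calling `validate_params({"url": "https://example.com"})` returns an empty list, while a missing scheme or an unknown wait-until value produces one message per problem.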
Output
The output JSON object contains the following fields:
- title: The title of the loaded webpage.
- url: The final URL after navigation (may differ from the requested URL if the page redirected).
- content: The full HTML content of the webpage as a string.
The content field holds the HTML of the page as rendered once navigation is considered finished (per the "Wait Until" option), after client-side scripts have run up to that point, which makes it suitable for downstream processing or scraping.
No binary data is output for this operation.
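
As an illustration of downstream processing, here is a minimal Python sketch that pulls link URLs out of an output item using only the standard library. The sample item is made up to match the three documented fields, and `LinkCollector` and `extract_links` are illustrative names, not part of the node.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(item: dict) -> list:
    """Return absolute link URLs found in item["content"].

    Relative links are resolved against item["url"], the final URL after
    any redirects, rather than the URL that was originally requested.
    """
    collector = LinkCollector()
    collector.feed(item["content"])
    collector.close()
    return [urljoin(item["url"], href) for href in collector.hrefs]


# Made-up item shaped like the documented output.
sample_item = {
    "title": "Example Shop",
    "url": "https://shop.example.com/products/",
    "content": '<html><body><a href="widget-1">Widget</a>'
               '<a href="/cart">Cart</a></body></html>',
}

links = extract_links(sample_item)
```

Resolving against `url` rather than the requested address matters when the site redirects, since relative links in the returned HTML are relative to the final location.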
Dependencies
- Requires access to a cloud-based browser service API, authenticated with an API token credential that must be configured for the node.
- Uses the Puppeteer library internally to connect to the remote browser instance.
- Optional proxy configuration can be used to route browser traffic.
Troubleshooting
- No WebSocket address received from the browser service: Indicates failure to open a browser instance remotely. Check API token validity and service availability.
- Navigation timeout: If the page takes too long to load, increase the "Timeout (Ms)" value in Navigation Options.
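- Empty or incomplete HTML content: May occur if the page requires additional interaction or finishes rendering after navigation is considered complete. Try a stricter "Wait Until" option (for example Networkidle2 or Networkidle0) and, if needed, a larger "Timeout (Ms)".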
- Authentication or permission errors: Ensure the API token credential has proper permissions and is correctly configured.
- Proxy connection issues: Verify proxy host, port, and credentials if using proxy settings.
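
The navigation-timeout and incomplete-HTML remedies above amount to retrying with a stricter wait condition and a larger timeout. Below is a minimal Python sketch of that escalation. It assumes a hypothetical `fetch_html(url, wait_until, timeout_ms)` callable that wraps however you invoke the node and returns the `content` string, raising `TimeoutError` on a navigation timeout. The attempt ladder and the `looks_incomplete` heuristic with its 200-character threshold are arbitrary illustrations.

```python
# Escalation ladder: each step waits for a stricter condition and allows more time.
ATTEMPTS = [
    ("domcontentloaded", 15_000),
    ("load", 30_000),
    ("networkidle2", 60_000),
]


def looks_incomplete(html: str, min_length: int = 200) -> bool:
    """Crude heuristic: treat very short or body-less HTML as incomplete."""
    return len(html) < min_length or "<body" not in html.lower()


def fetch_with_escalation(fetch_html, url: str):
    """Try each (wait_until, timeout_ms) pair until the HTML looks complete.

    Returns (html, wait_until) for the first acceptable attempt, or raises
    RuntimeError if every attempt times out or returns incomplete HTML.
    """
    last_error = None
    for wait_until, timeout_ms in ATTEMPTS:
        try:
            html = fetch_html(url, wait_until, timeout_ms)
        except TimeoutError as exc:
            # Remember the timeout and move on to the next, more patient attempt.
            last_error = exc
            continue
        if not looks_incomplete(html):
            return html, wait_until
    raise RuntimeError(f"no complete HTML for {url}") from last_error
```

Starting with the cheapest wait condition keeps fast pages fast, and only slow or script-heavy pages pay for the longer waits.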
Links and References
- Puppeteer Documentation
- Cloud Browser Service API (referenced endpoint)
- n8n documentation on Using Credentials