
CloudBrowser

Interact with websites using a cloud-based browser instance

Overview

The CloudBrowser node lets you interact with websites through a cloud-hosted browser instance. Its Content - Get PDF From Website operation navigates to a specified URL and renders the loaded page as a PDF. This is useful for automating the capture of web pages as PDFs for archiving, reporting, or sharing.

Common scenarios include:

  • Automatically generating PDFs of invoices, reports, or articles from web pages.
  • Archiving web content snapshots for compliance or record-keeping.
  • Creating printable versions of dynamic web pages without manual intervention.

Example: You want to generate a PDF version of a product page on an e-commerce site every day to track changes in pricing or layout. This node can navigate to the URL and produce a PDF file automatically.

Properties

  • URL to Navigate: The URL of the website to open and convert into a PDF.
  • Navigation Options: Options controlling how navigation behaves:
      - Wait Until: When navigation is considered finished (load, domcontentloaded, networkidle0, networkidle2).
      - Timeout (Ms): Maximum time to wait for navigation, in milliseconds.
  • Browser Configuration: Settings for the browser instance:
      - Browser Type: Chrome, Chromium, or ChromeHeadlessShell.
      - Headless Mode: Run the browser without a visible UI.
      - Stealth Mode: Enable stealth measures to avoid bot detection.
      - Keep Open (Seconds): How long to keep the browser instance open, in seconds.
      - Label: A name for the browser instance.
      - Save Session: Save the session for later reuse.
      - Recover Session: Recover a previously saved session.
  • Custom Arguments: Additional command-line arguments to pass to the browser.
  • Ignored Default Arguments: Default browser arguments to leave out when launching.
  • Proxy Configuration: Proxy server settings (Host, Port, Username, Password).
  • PDF Options: PDF generation options:
      - Format: Paper size (A0, A1, A2, A3, A4, A5, A6, Legal, Letter, Tabloid).
      - Landscape: Generate the PDF in landscape orientation.
      - Print Background: Include background graphics.
      - Scale: Rendering scale factor, from 0.1 to 2.
      - Margin: Page margins in millimeters (top, right, bottom, left).
      - Page Ranges: Specific pages to print (e.g., "1-5,8,11-13").
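The PDF options above appear to mirror the settings Puppeteer accepts for page PDF generation. As a minimal sketch, assuming you want to check values in your own script before handing them to the node, the TypeScript below models the options and validates the scale range, the margins, and the page-range syntax. The names PdfOptions, expandPageRanges, and validatePdfOptions are hypothetical, and the parser deliberately accepts only the "N" and "N-M" forms shown in the example.

```typescript
// Hypothetical model of the node's PDF Options; field names are illustrative,
// not the node's internal parameter names.
type PaperFormat =
  | "A0" | "A1" | "A2" | "A3" | "A4" | "A5" | "A6"
  | "Legal" | "Letter" | "Tabloid";

interface PdfOptions {
  format: PaperFormat;
  landscape: boolean;
  printBackground: boolean;
  scale: number; // allowed range: 0.1 to 2
  margin: { top: number; right: number; bottom: number; left: number }; // millimeters
  pageRanges: string; // e.g. "1-5,8,11-13"; empty string means all pages
}

// Expands a page-range string such as "1-5,8,11-13" into explicit page numbers.
// Only "N" and "N-M" parts are accepted; anything else throws.
function expandPageRanges(ranges: string): number[] {
  const pages: number[] = [];
  for (const raw of ranges.split(",")) {
    const part = raw.trim();
    if (part === "") continue; // tolerate empty input and stray commas
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (!match) throw new Error(`Invalid page range: "${part}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) throw new Error(`Invalid page range: "${part}"`);
    for (let p = start; p <= end; p++) pages.push(p);
  }
  return pages;
}

// Collects every problem found instead of stopping at the first one.
function validatePdfOptions(opts: PdfOptions): string[] {
  const errors: string[] = [];
  if (opts.scale < 0.1 || opts.scale > 2) {
    errors.push(`scale must be between 0.1 and 2, got ${opts.scale}`);
  }
  for (const [side, value] of Object.entries(opts.margin)) {
    if (value < 0) errors.push(`margin.${side} must not be negative`);
  }
  try {
    expandPageRanges(opts.pageRanges);
  } catch (e) {
    errors.push((e as Error).message);
  }
  return errors;
}
```

For example, expandPageRanges("1-5,8,11-13") returns [1, 2, 3, 4, 5, 8, 11, 12, 13], and a scale of 3 makes validatePdfOptions report an out-of-range error.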

Output

The output JSON object includes:

  • url: The final URL of the loaded page.
  • title: The page title.
  • pdf: The generated PDF as a data URI, that is, a base64-encoded string prefixed with "data:application/pdf;base64,".
  • pdfBinary: The raw binary data buffer of the PDF.
  • filename: Suggested filename for the PDF, e.g., webpage_<timestamp>.pdf.
  • fileExtension: Always "pdf".
  • mimeType: Always "application/pdf".

This output allows downstream nodes to save the PDF file, send it via email, or upload it to storage.
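A downstream step that receives the pdf field as text has to strip the data URI prefix and decode the base64 payload before writing the file. A minimal sketch, assuming a runtime that provides the standard atob function (Node.js 16 or later, Deno, Bun, or a browser); the function names are hypothetical and were written from the field list above, not taken from the node's source.

```typescript
const PDF_DATA_URI_PREFIX = "data:application/pdf;base64,";

// Decodes the node's `pdf` output field (a data URI) into raw PDF bytes.
function decodePdfDataUri(dataUri: string): Uint8Array {
  if (!dataUri.startsWith(PDF_DATA_URI_PREFIX)) {
    throw new Error(`Expected a data URI starting with ${PDF_DATA_URI_PREFIX}`);
  }
  const binary = atob(dataUri.slice(PDF_DATA_URI_PREFIX.length));
  // atob yields one character per byte; copy the char codes into a byte array.
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Sanity check: a PDF file begins with the ASCII bytes "%PDF-".
function looksLikePdf(bytes: Uint8Array): boolean {
  const header = "%PDF-";
  if (bytes.length < header.length) return false;
  for (let i = 0; i < header.length; i++) {
    if (bytes[i] !== header.charCodeAt(i)) return false;
  }
  return true;
}
```

If the consumer can read pdfBinary directly, no decoding step is needed; the data URI form is mainly convenient when the PDF has to travel as a JSON string.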

Dependencies

  • Requires access to the external CloudBrowser API service at https://production.cloudbrowser.ai/api/v1/Browser/Open to open and control browser instances.
  • Needs an API token credential for authentication with the CloudBrowser service.
  • Uses the Puppeteer library internally to connect to the browser's WebSocket endpoint, navigate to the page, and generate the PDF.
  • No local browser installation is required; all browser operations are performed remotely via the cloud service.
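The dependency chain above (open a browser through the API, receive a WebSocket address, connect with Puppeteer, navigate, print) can be sketched as follows. To keep the example self-contained and avoid guessing the CloudBrowser request format, the API call and puppeteer.connect are injected as hypothetical callbacks, and PageLike and BrowserLike declare only the few Puppeteer-style methods the flow uses. This illustrates the sequence; it is not the node's implementation.

```typescript
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

// Minimal structural types modeled on a small subset of Puppeteer's Page/Browser.
interface PageLike {
  goto(url: string, options: { waitUntil: WaitUntil; timeout: number }): Promise<unknown>;
  pdf(options: Record<string, unknown>): Promise<Uint8Array>;
  title(): Promise<string>;
  url(): string;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  disconnect(): Promise<void>;
}

// Hypothetical dependencies: openRemoteBrowser stands in for the CloudBrowser API
// call and resolves to the WebSocket address (or undefined on failure); connect
// stands in for puppeteer.connect({ browserWSEndpoint }).
interface Deps {
  openRemoteBrowser(): Promise<string | undefined>;
  connect(wsEndpoint: string): Promise<BrowserLike>;
}

async function capturePdf(
  deps: Deps,
  url: string,
  nav: { waitUntil: WaitUntil; timeout: number },
  pdfOptions: Record<string, unknown>,
): Promise<{ url: string; title: string; pdf: Uint8Array }> {
  const wsEndpoint = await deps.openRemoteBrowser();
  if (!wsEndpoint) {
    // Same failure the Troubleshooting section describes.
    throw new Error("No WebSocket address received from the browser service");
  }
  const browser = await deps.connect(wsEndpoint);
  try {
    const page = await browser.newPage();
    await page.goto(url, nav);
    const pdf = await page.pdf(pdfOptions);
    return { url: page.url(), title: await page.title(), pdf };
  } finally {
    // Detach this client from the remote browser (disconnect, not close).
    await browser.disconnect();
  }
}
```

Injecting the two callbacks also makes the flow testable with fakes, without a network connection or a real browser.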

Troubleshooting

  • No WebSocket address received from the browser service: The service could not open a browser instance. Check the API token, network connectivity, and CloudBrowser service status.
  • Timeout errors during navigation: If the page takes too long to load, increase the timeout value in Navigation Options.
  • PDF generation issues: Ensure the URL is accessible and returns valid HTML content. Some sites may block automated browsers or require authentication.
  • Proxy configuration problems: Verify proxy host, port, and credentials if used.
  • Session recovery failures: If recovering a saved session fails, try disabling session recovery or saving a new session.
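If you drive a remote browser from your own script rather than through the node, the advice to increase the navigation timeout can be automated by retrying with a doubled timeout. A minimal sketch: navigate is a hypothetical callback that performs one attempt with the given timeout in milliseconds, and the error-name check assumes Puppeteer-style TimeoutError objects; any other error is rethrown immediately.

```typescript
// Retries a navigation, doubling the timeout after each timeout failure.
async function navigateWithBackoff<T>(
  navigate: (timeoutMs: number) => Promise<T>,
  initialTimeoutMs: number,
  maxAttempts: number,
): Promise<T> {
  let timeoutMs = initialTimeoutMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await navigate(timeoutMs);
    } catch (err) {
      // Assumption: timeouts surface as errors named "TimeoutError".
      const isTimeout = err instanceof Error && err.name === "TimeoutError";
      if (!isTimeout || attempt >= maxAttempts) throw err;
      timeoutMs *= 2;
    }
  }
}
```

Retrying only on timeouts keeps genuine failures, such as a blocked site or a bad proxy, from being masked by repeated attempts.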
