Puppeteer

Request a webpage using Puppeteer

Overview

This node provides an interface to control and automate browser actions using Puppeteer, a popular headless browser automation library. It allows users to create or connect to browser instances, open new pages, navigate URLs, interact with page elements (click, type, hover), handle cookies, take screenshots, evaluate JavaScript on pages, solve captchas, and more.

Common scenarios where this node is beneficial include:

  • Web scraping and data extraction from dynamic websites.
  • Automated testing of web applications.
  • Filling out forms and submitting data automatically.
  • Captcha solving integration for automated workflows.
  • Taking screenshots of web pages for monitoring or reporting.
  • Navigating complex multi-page workflows programmatically.

Practical example: A user can open a new browser page, navigate to a login page, fill in credentials, click the login button, wait for the navigation to finish, and then extract content that is only visible after login, all within a single workflow.
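The login flow described above can be sketched directly against Puppeteer's page API. This is a minimal illustration, not the node's actual implementation: `PageLike`, `LoginSelectors`, and `loginAndExtract` are hypothetical names, and `PageLike` only declares the subset of Puppeteer `Page` methods used here (`goto`, `type`, `click`, `waitForNavigation`, `evaluate`), on the assumption that a real `Page` object can be passed in its place.

```typescript
// Hypothetical minimal interface: the subset of Puppeteer's Page API used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForNavigation(): Promise<unknown>;
  evaluate(expression: string): Promise<unknown>;
}

// Hypothetical bundle of CSS selectors for the login form and the target content.
interface LoginSelectors {
  username: string;
  password: string;
  submit: string;
  content: string;
}

async function loginAndExtract(
  page: PageLike,
  loginUrl: string,
  credentials: { username: string; password: string },
  selectors: LoginSelectors,
): Promise<string> {
  // Navigate to the login page.
  await page.goto(loginUrl);

  // Fill in the credentials.
  await page.type(selectors.username, credentials.username);
  await page.type(selectors.password, credentials.password);

  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([page.waitForNavigation(), page.click(selectors.submit)]);

  // Read the text of the protected element inside the page context.
  const expression =
    `document.querySelector(${JSON.stringify(selectors.content)})?.textContent ?? ""`;
  const text = await page.evaluate(expression);
  return String(text).trim();
}
```

In the node itself, each of these calls would typically correspond to a separate operation (page goto, page type, page click, page evaluate) that shares the same Instance Identifier.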

Properties

Name: Meaning
Instance Identifier: A unique identifier for the browser and page instance, used to reference the same instance across multiple nodes.
Use 2Captcha solver: The API key for the 2Captcha service, used to solve captchas automatically (only available when creating a new page).
Browser Options: Options that configure the browser instance when creating a new page:
  - Browser WSEndpoint: Connect to an existing browser via a websocket endpoint.
  - Emulate Device: Choose a device profile.
  - Launch Arguments: Additional command line arguments.
  - Handle Browser Close/Disconnect/Target: Flags to handle browser lifecycle events.
  - Stealth mode: Enable techniques to avoid detection as a headless browser.
  - Proxy Server: Configure proxy settings.
  - Slow Mo: Delay each operation by the specified number of milliseconds.
  - Headless mode: Run the browser in headless or full UI mode.
Close target: Whether to close the current target (page) when switching to a new target (used in the page target handling operation).
Timeout: Timeout for certain page operations, such as waiting for targets.
URL: The URL to navigate to in a page goto operation.
Filename: Filename used when saving screenshots or when choosing files to upload on the page.
IFrame Selector: Optional CSS selector that identifies an iframe within the page to interact with.
Selector: CSS selector that targets elements on the page for actions such as click, type, hover, wait for selector, or file choosing.
Evaluate Selector: Optional selector that narrows the scope of JavaScript evaluation on the page.
Text: Text to type into an element during the page type operation.
Evaluation Loop Until Timeout: Timeout for repeated evaluation loops during page evaluation.
Cookies: JSON object specifying the cookies to set or delete on the page.
Options: JSON object with additional options for page operations such as navigation, clicking, typing, and waiting.
Javascript Function: JavaScript code to run on the page during the page evaluate operation.
Evaluate Args: JSON object containing variables to pass as arguments to the JavaScript evaluation function.
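To make the Browser Options more concrete, the sketch below shows one plausible way such settings could be translated into arguments for Puppeteer's `puppeteer.connect()` or `puppeteer.launch()`. The `BrowserOptions` field names and the `resolveBrowserStart` helper are hypothetical and are not taken from the node's source. The option keys `browserWSEndpoint`, `headless`, `args`, and `slowMo` are Puppeteer options, and `--proxy-server` is a Chromium command line flag. Stealth mode and device emulation are left out of this sketch.

```typescript
// Hypothetical shape for a few of the Browser Options listed above.
interface BrowserOptions {
  browserWSEndpoint?: string;
  launchArguments?: string[];
  proxyServer?: string;
  slowMo?: number;
  headless?: boolean;
}

// Either connect to an existing browser or launch a new one.
type BrowserStart =
  | { mode: "connect"; options: { browserWSEndpoint: string; slowMo?: number } }
  | { mode: "launch"; options: { headless: boolean; args: string[]; slowMo?: number } };

function resolveBrowserStart(opts: BrowserOptions): BrowserStart {
  // A websocket endpoint means: reuse a browser that is already running.
  if (opts.browserWSEndpoint) {
    return {
      mode: "connect",
      options: { browserWSEndpoint: opts.browserWSEndpoint, slowMo: opts.slowMo },
    };
  }

  // Otherwise build the launch arguments, adding the proxy flag if configured.
  const args = [...(opts.launchArguments ?? [])];
  if (opts.proxyServer) {
    args.push(`--proxy-server=${opts.proxyServer}`);
  }

  return {
    mode: "launch",
    // Assumption: headless is the default when the option is not set.
    options: { headless: opts.headless ?? true, args, slowMo: opts.slowMo },
  };
}
```

The returned `options` object could then be passed to `puppeteer.connect(...)` or `puppeteer.launch(...)` depending on `mode`.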

Output

The node outputs JSON data representing the result of the performed browser or page operation. The structure varies depending on the operation:

  • For navigation (pageGoto), it returns HTTP headers and status code.
  • For screenshot operations, it returns confirmation or error messages.
  • For element interactions (click, type, hover), it returns success or error details.
  • For page evaluation, it returns the result of the executed JavaScript.
  • For cookie operations, it returns the current cookies or confirmation of changes.
  • For captcha solving, it returns the solved captcha response.
  • For browser and page lifecycle operations (open, close, reload, go back/forward), it returns status information.

If an error occurs, the output JSON contains an error field describing the issue.
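A downstream step can branch on the presence of the error field. The sketch below assumes, for illustration only, that a pageGoto result exposes `status` and `headers` keys and that failures carry a string `error` key. The exact field names are not documented here, so `GotoOutput`, `ErrorOutput`, and `describeGoto` are hypothetical.

```typescript
// Assumed shape of a successful pageGoto result (field names are a guess).
interface GotoOutput {
  status: number;
  headers: Record<string, string>;
}

// Assumed shape of a failed operation.
interface ErrorOutput {
  error: string;
}

type NodeOutput = GotoOutput | ErrorOutput;

// Type guard: an output is a failure if it carries an error field.
function isErrorOutput(output: NodeOutput): output is ErrorOutput {
  return "error" in output;
}

// Turn either kind of output into a short human-readable summary.
function describeGoto(output: NodeOutput): string {
  if (isErrorOutput(output)) {
    return `failed: ${output.error}`;
  }
  const contentType = output.headers["content-type"] ?? "unknown";
  return `HTTP ${output.status} (${contentType})`;
}
```

Checking for the error field first keeps the success path free of defensive checks, whichever operation produced the output.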

Binary output is not documented. Screenshots are saved to the filename the user specifies, so the node presumably handles the binary data internally in these cases.

Dependencies

  • Requires the Puppeteer library for browser automation.
  • Optionally integrates with the 2Captcha service for captcha solving, which requires a valid 2Captcha API key.
  • May require access to a browser websocket endpoint when connecting to an existing browser instance.
  • No environment variables are documented, but the node needs network access and permission to launch a browser process.

Troubleshooting

  • Common issues:

    • Failure to connect to a browser instance if the websocket endpoint is incorrect or unavailable.
    • Navigation failures due to invalid URLs or network issues.
    • Selector not found errors when the specified CSS selector does not match any element on the page.
    • Timeout errors when waiting for selectors or navigation takes too long.
    • Captcha solving failures if the 2Captcha API key is invalid or quota exceeded.
    • Errors when saving screenshots if the filename path is invalid or inaccessible.
  • Error messages:

    • Errors from Puppeteer core functions either appear as an error field in the output JSON or are thrown as exceptions.
    • HTTP status codes other than 200 during navigation cause errors unless "continue on fail" is enabled.
    • Users should verify input parameters, especially selectors, URLs, and filenames.
    • Ensure that the browser instance identified by the instance identifier is active before performing page operations.
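The checks above can be partly automated before an operation runs. The following is a rough sketch under stated assumptions: `PageOperationInput`, `validateInput`, and `classifyError` are hypothetical helpers, not part of the node. The classification relies on Puppeteer timeout errors having the name `TimeoutError` and on Chromium navigation failures usually containing a `net::ERR_` code in the message, and it treats only `http` and `https` URLs as valid, which may be stricter than the node itself.

```typescript
// Hypothetical subset of the node parameters worth checking up front.
interface PageOperationInput {
  url?: string;
  selector?: string;
  filename?: string;
}

// Return a list of problems; an empty list means the input looks usable.
function validateInput(input: PageOperationInput): string[] {
  const problems: string[] = [];

  if (input.url !== undefined) {
    try {
      const { protocol } = new URL(input.url);
      if (protocol !== "http:" && protocol !== "https:") {
        problems.push(`URL must use http or https: ${input.url}`);
      }
    } catch {
      problems.push(`URL is not valid: ${input.url}`);
    }
  }
  if (input.selector !== undefined && input.selector.trim() === "") {
    problems.push("Selector is empty");
  }
  if (input.filename !== undefined && input.filename.trim() === "") {
    problems.push("Filename is empty");
  }
  return problems;
}

type ErrorKind = "timeout" | "network" | "other";

// Rough classification of an error caught around a page operation.
function classifyError(err: unknown): ErrorKind {
  const name = err instanceof Error ? err.name : "";
  const message = err instanceof Error ? err.message : String(err);

  if (name === "TimeoutError") return "timeout";
  if (/ECONNREFUSED|net::ERR_/.test(message)) return "network";
  return "other";
}
```

A timeout usually points at a selector that never appears or a slow page, while a network error points at the URL, the proxy, or the websocket endpoint.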
