Overview
The Hyperbrowser node enables interaction with websites through various automated browsing and data extraction operations using the Hyperbrowser SDK. It supports tasks such as scraping webpage content, crawling multiple pages, extracting structured data via AI, and controlling browser actions through different AI agents.
Common scenarios where this node is beneficial include:
- Automatically gathering content or links from websites for research or monitoring.
- Extracting specific data points (e.g., product prices) from web pages using AI-driven queries.
- Automating complex browser interactions like form filling or navigation using AI agents.
- Crawling websites to collect data across multiple linked pages efficiently.
Practical examples:
- Scraping the main article content from a news website in Markdown format.
- Crawling an e-commerce site to collect product listings up to a maximum number of pages.
- Using an AI agent to automate login and data entry on a web application.
- Extracting structured data such as tables or lists from a page by providing an extraction schema.
Properties
Name | Meaning |
---|---|
Operation | The type of action to perform: `Browser Use`, `Claude Computer Use`, `Crawl`, `Extract`, `OpenAI CUA`, or `Scrape`. |
URL | The webpage URL to process (required for the `scrape`, `crawl`, and `extract` operations). |
Extraction Query | A natural-language query describing what data to extract from the webpage (for `extract`). |
Extraction Schema | JSON schema defining the structure of the data to extract (for `extract`). |
Task | Instructions for browser automation tasks (for `browserUse`, `claudeComputerUse`, and `openaiCua`). |
Options | Collection of additional options depending on the operation: |
- Max Steps | Maximum number of steps the AI agent may take to complete the task (for the browser-use operations). |
- Maximum Pages | Maximum number of pages to crawl (for `crawl`). |
- Only Main Content | Whether to return only the main content of the page (boolean, for `crawl` and `scrape`). |
- Output Format | Output format for scraped or crawled content: `HTML`, `Links`, or `Markdown` (for `scrape` and `crawl`). |
- Proxy Country | Country code for proxy server usage (if the proxy is enabled). |
- Solve CAPTCHAs | Whether to solve CAPTCHAs during scraping (boolean). |
- Timeout (Ms) | Maximum timeout in milliseconds for navigating to a page. |
- Use Proxy | Whether to use a proxy server for scraping (boolean). |
- Use Vision | Whether to enable vision capabilities for the Browser Use LLM (boolean, for `browserUse`). |
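
For instance, the Extraction Schema for the product-price scenario above could be expressed as a standard JSON Schema object. The field names below (`products`, `name`, `price`) are illustrative only, not prescribed by Hyperbrowser:

```json
{
  "type": "object",
  "properties": {
    "products": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number" }
        },
        "required": ["name", "price"]
      }
    }
  }
}
```

Paired with an extraction query such as "list each product name and its price", a schema like this tells the AI extractor exactly what shape the returned data should take.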
Output
The node outputs an array of items, each containing a `json` object whose contents depend on the selected operation:
`scrape`:
```json
{
  "url": "string",
  "content": "string", // scraped page content in the chosen format (HTML, Markdown, or links)
  "status": "string"   // status of the scrape operation
}
```
`crawl`:
```json
{
  "url": "string",
  "data": "object", // crawled data, including multiple pages' content or links
  "status": "string"
}
```
`extract`:
```json
{
  "url": "string",
  "extractedData": "object", // structured data extracted according to the query and schema
  "status": "string"
}
```
`browserUse`, `claudeComputerUse`, `openaiCua`:
```json
{
  "actions": "object", // final result of the AI agent's performed actions
  "status": "string"
}
```
If an error occurs and the node is configured to continue on failure, the output item will contain an `error` message and the related `task`.
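
When "Continue On Fail" is enabled, downstream logic can separate failed items from successful ones. A minimal sketch in JavaScript (e.g., inside an n8n Code node; the exact item shape is an assumption based on the output format described above):

```javascript
// Split Hyperbrowser output items into successes and failures.
// Assumes failed items carry an `error` field in their `json` object,
// as described above; this is illustrative, not part of the node itself.
function partitionResults(items) {
  const ok = [];
  const failed = [];
  for (const item of items) {
    if (item.json.error !== undefined) {
      failed.push(item);
    } else {
      ok.push(item);
    }
  }
  return { ok, failed };
}

// Example usage with mock items:
const items = [
  { json: { url: "https://example.com", content: "# Title", status: "completed" } },
  { json: { error: "Timeout while navigating", task: "scrape" } },
];
const { ok, failed } = partitionResults(items);
console.log(ok.length, failed.length); // 1 1
```

Routing the `failed` items to a separate branch (for retries or alerting) keeps one bad URL from stopping the whole workflow.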
The node does not explicitly output binary data.
Dependencies
- Requires an API key credential for the Hyperbrowser service.
- Uses the `@hyperbrowser/sdk` package to interact with the Hyperbrowser API.
- Network access to target URLs and, optionally, to proxy servers if enabled.
- Optional proxy configuration and CAPTCHA solving support.
- No other external dependencies are required.
Troubleshooting
Common issues:
- Invalid or missing API key credential will cause authentication failures.
- Incorrect URL formats or unreachable URLs may lead to timeouts or errors.
- Providing malformed JSON in the extraction schema will cause parsing errors.
- Exceeding maximum allowed steps or pages may result in incomplete operations.
- Proxy misconfiguration can cause connection failures or blocked requests.
- CAPTCHA challenges may block scraping unless the option to solve CAPTCHAs is enabled.
Error messages:
- `"Operation \"<operation>\" is not supported"`: occurs if an unsupported operation value is provided; ensure the operation is one of the supported options.
- Timeout or network errors: check URL validity, network connectivity, and proxy settings.
- JSON parse errors for the extraction schema: validate the JSON syntax before input.
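
The extraction schema can be checked before it ever reaches the node. A small JavaScript helper (illustrative, not part of the node or SDK):

```javascript
// Validate that an extraction-schema string is syntactically valid JSON
// and describes an object, before passing it to the node.
function validateSchema(text) {
  try {
    const schema = JSON.parse(text); // throws SyntaxError on malformed JSON
    if (typeof schema !== "object" || schema === null) {
      return { valid: false, reason: "schema must be a JSON object" };
    }
    return { valid: true, schema };
  } catch (err) {
    return { valid: false, reason: err.message };
  }
}

console.log(validateSchema('{"type": "object"}').valid); // true
console.log(validateSchema('{"type": object}').valid);   // false (unquoted value)
```

Running a check like this in an upstream Code node surfaces schema typos immediately instead of as a mid-workflow parsing error.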
To resolve errors, verify credentials, input parameters, and network configurations. Enable "Continue On Fail" to handle errors gracefully within workflows.
Links and References
- Hyperbrowser SDK Documentation (for API details)
- n8n Documentation (general node usage and credential setup)
- Web Scraping Best Practices