JigsawStack

Use JigsawStack API

Actions (20)

AI Scrape Actions
- AI Scrape
Analyze Sentiment Actions
- Analyze Sentiment
Convert to SQL Actions
- Convert to SQL
Generate Embedding Actions
- Generate Embedding
HTML to Any Actions
- HTML to Any
Image Generation Actions
- Image Generation
Make Prediction Actions
- Make Prediction
NSFW Detection Actions
- NSFW Detection
Object Detection Actions
- Object Detection
Process Image Actions
- Process Image
Profanity Detection Actions
- Profanity Detection
Search Web Actions
- Search Web
Spam Detection Actions
- Spam Detection
Speech to Text Actions
- Speech to Text
Spell Check Actions
- Spell Check
Summary Actions
- Summary
Text to Speech Actions
- Text to Speech
Translate Actions
- Translate Text
Translate Image Actions
- Translate Image
Web Suggestion Actions
- Web Suggestion

Overview

The AI Scrape operation of the JigsawStack node extracts structured information from web pages or raw HTML content using AI-powered scraping. Instead of manually parsing HTML or writing complex selectors, you specify "element prompts" (such as "titles", "prices", or "points") that describe the data to extract. The operation can scrape either directly from a URL or from HTML content you provide.

This node is useful in scenarios such as:

- Extracting product details, prices, or reviews from e-commerce sites.
- Gathering headlines, summaries, or key points from news articles.
- Collecting structured data from paginated listings by specifying page numbers.
- Scraping content behind authentication or with custom headers and cookies.
- Using advanced configurations for controlling network requests, blocking unwanted resources, or customizing page load behavior.

Practical example: You want to scrape all product titles and prices from an online store's category page. You provide the URL, specify element prompts ["titles", "prices"], optionally set a root CSS selector to narrow scope, and get back structured JSON with those elements extracted.
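
The practical example above can be sketched as a plain data structure. This is a minimal illustration, assuming hypothetical snake_case key names (`url`, `element_prompts`, `root_element_selector`) that mirror the node's properties; the names the node actually sends to the JigsawStack API may differ, and the store URL is a placeholder.

```python
import json


def build_scrape_input(url, element_prompts, root_element_selector=None):
    """Assemble the inputs for a URL-based scrape as a plain dict.

    All key names are illustrative assumptions, not confirmed API fields.
    """
    payload = {"url": url, "element_prompts": list(element_prompts)}
    if root_element_selector:
        # Optional: limit scraping to this element and its children.
        payload["root_element_selector"] = root_element_selector
    return payload


example = build_scrape_input(
    "https://example.com/category/shoes",  # placeholder URL
    ["titles", "prices"],
    root_element_selector="main",
)
print(json.dumps(example, indent=2))
```

Keeping the inputs in one dict makes it easy to compare what you intended to send with what is configured in the node's fields.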

Properties

Name	Meaning
Scrape Source	Choose the source of the scrape: either `"URL"` to scrape from a live webpage or `"HTML"` to scrape from provided raw HTML content.
URL	The URL of the page to scrape. Required when Scrape Source is `"URL"`.
HTML	Raw HTML content to scrape. Required when Scrape Source is `"HTML"`.
Element Prompts	Array of strings describing the elements to extract from the page, e.g., `"titles"`, `"points"`, `"prices"`. These guide the AI on what data to look for.
Root Element Selector	CSS selector string to limit scraping scope to a specific element and its children, e.g., `"main"`.
Page Position	Number indicating the current page number for pagination purposes (minimum value 1).
Advance Config	JSON object for advanced scraper options including console logging, network settings, cookies, HTTP headers, request blocking patterns, and page navigation options.
Http Headers	JSON object specifying custom HTTP headers to send with the request as key-value pairs.
Reject Request Pattern	Array of string patterns to intercept and block certain resource requests during scraping, e.g., `["jpg", "png"]` to block images.
Goto Options	JSON object defining custom page-load behavior settings, such as waitUntil conditions.
Wait For	JSON object defining wait conditions before scraping, supporting modes like timeout (milliseconds), waiting for a selector, or executing a function.
Cookies	JSON object specifying cookies to set for the page request.
Size Preset	Predefined screen size presets like `"HD"`, `"FHD"`, `"4K UHD"` to simulate different viewport sizes.
Is Mobile	Boolean flag to emulate a mobile device viewport.
Scale	Device scale factor for the viewport (minimum 1).
Width	Viewport width in pixels.
Height	Viewport height in pixels.
Force Rotate Proxy	Boolean flag to force proxy rotation for each request, which may incur additional costs.
BYO Proxy	JSON object for bring-your-own-proxy configuration including server URL and optional authentication credentials.
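
Several of the properties above take JSON objects or arrays. The following sketch shows how such values might be collected and checked before pasting them into the node. The key names and the shape of the `wait_for` object (`mode` and `value`) are assumptions made for illustration; only the "minimum 1" rule for Scale and the `["jpg", "png"]` pattern come from the table.

```python
def build_advanced_options(http_headers=None, cookies=None,
                           reject_request_pattern=None, wait_for=None,
                           is_mobile=False, scale=1):
    """Collect optional scraper settings into one dict.

    Key names and the wait_for shape are illustrative assumptions.
    """
    if scale < 1:
        # The Scale property has a documented minimum of 1.
        raise ValueError("scale must be at least 1")
    options = {"is_mobile": bool(is_mobile), "scale": scale}
    if http_headers:
        options["http_headers"] = dict(http_headers)
    if cookies:
        options["cookies"] = dict(cookies)
    if reject_request_pattern:
        options["reject_request_pattern"] = list(reject_request_pattern)
    if wait_for:
        options["wait_for"] = dict(wait_for)
    return options


options = build_advanced_options(
    http_headers={"Accept-Language": "en-US"},
    reject_request_pattern=["jpg", "png"],  # block image requests
    wait_for={"mode": "selector", "value": ".product-card"},  # assumed shape
)
print(options)
```

Settings that are not supplied are left out of the dict, so the result contains only values you chose deliberately.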

Output

The node outputs JSON data containing the scraped results based on the specified element prompts. The structure typically includes keys corresponding to each prompt with arrays or values representing the extracted content.
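
If the output does contain one key per prompt with parallel arrays, a downstream step could combine those arrays into one record per item. The sketch below assumes that shape; the sample data is invented for illustration, and real responses may be structured differently.

```python
def rows_from_scrape_output(result, prompts):
    """Zip per-prompt value lists into one dict per scraped item."""
    columns = []
    for prompt in prompts:
        value = result.get(prompt, [])
        # Treat a single value as a one-item list so zip() still works.
        columns.append(value if isinstance(value, list) else [value])
    # zip() stops at the shortest column, so unmatched trailing values are dropped.
    return [dict(zip(prompts, values)) for values in zip(*columns)]


# Hypothetical output for the prompts "titles" and "prices".
sample = {"titles": ["Trail Shoe", "Road Shoe"], "prices": ["$89", "$120"]}
rows = rows_from_scrape_output(sample, ["titles", "prices"])
print(rows)
```

Because `zip` truncates to the shortest list, compare the list lengths first if a mismatch between prompts should be treated as an error.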

This operation primarily outputs structured JSON. No binary output is documented for it.

Dependencies

- An active API key credential for the JigsawStack API service.
- Network access to the JigsawStack API service, and a reachable target URL when scraping by URL.
- Optional proxy configuration if using BYO Proxy or forcing proxy rotation.
- Properly configured HTTP headers, cookies, and advanced options, which may be necessary for scraping some protected or dynamic websites.

Troubleshooting

- Missing or invalid API key: The node requires a valid API key credential; ensure it is correctly set up.
- Invalid URL or unreachable site: Verify the URL is correct and accessible from your environment.
- Conflicting inputs: Either URL or HTML must be provided, not both. Ensure only one source is set.
- Incorrect element prompts: If no data is returned, check that the prompts accurately describe the desired elements.
- Pagination issues: Make sure the page position is set correctly, starting at 1.
- Blocked resources: Overly aggressive reject request patterns might block essential scripts or styles, causing incomplete scraping.
- Proxy errors: If using proxies, verify that the proxy server details and credentials are correct.
- Timeouts or slow loading: Adjust wait conditions or goto options to allow sufficient time for page load.
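
Some of the checks above can be run on the inputs before the node executes. The following is a minimal sketch of such a pre-flight check; the function and parameter names are hypothetical, and the choice to flag `js` and `css` patterns is an illustrative heuristic rather than documented node behavior.

```python
def preflight_issues(url=None, html=None, page_position=1,
                     reject_request_pattern=()):
    """Return a list of human-readable problems found in the inputs."""
    issues = []
    if bool(url) == bool(html):
        # Covers both "neither set" and "both set".
        issues.append("provide exactly one of URL or HTML")
    if url and not url.startswith(("http://", "https://")):
        issues.append("URL should start with http:// or https://")
    if page_position < 1:
        issues.append("page position must be 1 or greater")
    risky = [p for p in reject_request_pattern
             if p.lower().lstrip(".") in {"js", "css"}]
    if risky:
        issues.append(f"reject patterns {risky} may block scripts or styles")
    return issues


print(preflight_issues(url="https://example.com", html="<p>hi</p>"))
print(preflight_issues(url="https://example.com", page_position=0))
```

An empty list means none of these particular checks failed; it does not guarantee that the scrape itself will succeed.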

Links and References

- JigsawStack API Documentation (for detailed API capabilities)
- CSS Selectors Reference (to craft root element selectors)
- n8n Documentation (general usage and credential setup)