Overview
This node provides functionality for document and image parsing under the "Kim" resource with the "文档/图片解析" (Document/Image Parsing) operation. It is designed to analyze images either via URLs or binary data input, optionally using a specified model or intelligent agent. The node can simplify the output response and supports voice options for text-to-speech features.
Common scenarios where this node is beneficial include:
- Extracting information or understanding content from images by providing image URLs or binary image data.
- Using AI models or agents to interpret images in workflows that require automated image recognition or analysis.
- Integrating voice synthesis capabilities to read out results or responses based on parsed content.
Practical examples:
- Uploading product images to automatically extract descriptions or details.
- Feeding screenshots or photos to identify objects or text within them.
- Using the voice feature to generate spoken feedback from the parsed image content.
Properties
Name | Meaning |
---|---|
模型 / Model (assistantId) | Model or intelligent agent used for processing. Accepts any string; if you do not know the identifier, any value can be entered. |
文本输入 / Text Input (text) | Text prompt related to the image, e.g., "What is in this picture?" Required when the input type is a binary file. |
输入类型 / Input Type (inputType) | Type of input for the image: "图片链接" (image URL) or "二进制文件" (binary file, handled as Base64). |
URL链接 / URL Links (imageUrls) | Comma-separated list of image URLs to analyze. Required if inputType is URL. |
输入数据字段名称 / Input Data Field Name (binaryPropertyName) | Name of the binary property field containing the image data. Required if inputType is binary file (Base64). |
简化输出 / Simplify Output (simplify) | Boolean flag to simplify the response output. Defaults to true. |
联网搜索 / Web Search (use_search) | Boolean flag to enable or disable online search. Recommended to keep off in document/image mode. |
语音列表 / Voice List (voice) | Voice selection for text-to-speech output. Supports official voices and cloned voices. |
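The "Required if ..." rules in the table can be checked before the node runs. The following is a minimal sketch, not the node's actual implementation: the `ParseParams` interface, the `checkParams` helper, and the option values `"url"` and `"base64"` are assumptions made for illustration, since the source does not show the internal values behind the 输入类型 options.

```typescript
// Hypothetical parameter shape mirroring the Properties table.
// The literal values "url" and "base64" are assumed, not confirmed by the node.
type InputType = "url" | "base64";

interface ParseParams {
  assistantId: string;
  text?: string;
  inputType: InputType;
  imageUrls?: string;          // comma-separated list
  binaryPropertyName?: string; // name of the binary field on the input item
  simplify: boolean;
  use_search: boolean;
  voice?: string;
}

// Returns a list of human-readable problems. An empty list means the
// conditional requirements from the table are satisfied.
// `binaryKeys` is the list of binary property names present on the input item.
function checkParams(p: ParseParams, binaryKeys: string[] = []): string[] {
  const problems: string[] = [];
  if (p.inputType === "url") {
    if (!p.imageUrls || p.imageUrls.trim() === "") {
      problems.push("imageUrls is required when inputType is url");
    }
  } else {
    if (!p.binaryPropertyName) {
      problems.push("binaryPropertyName is required when inputType is base64");
    } else if (!binaryKeys.includes(p.binaryPropertyName)) {
      problems.push(`binary property "${p.binaryPropertyName}" not found on input item`);
    }
    if (!p.text || p.text.trim() === "") {
      problems.push("text is required when inputType is base64");
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a workflow author see every missing field in a single run.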
Output
The node outputs JSON data representing the parsed results of the provided images or documents. The structure depends on the model's response and is reduced to a simpler form when the "简化输出" (Simplify Output, `simplify`) property is enabled.
Binary data appears only on the input side, as image files. The node processes it internally and does not emit binary output.
Dependencies
- Requires access to an external service or API capable of image/document parsing and possibly text-to-speech synthesis.
- Needs configuration of an API key or authentication token to interact with the parsing backend.
- The voice list is dynamically fetched via a method (`ttsListSearch`), indicating dependency on a voice synthesis service.
Troubleshooting
- Invalid URL format: When using image URLs, ensure they are valid and accessible. Invalid URLs will cause failures.
- Missing binary data: If input type is base64, the specified binary property name must exist in the input data; otherwise, processing will fail.
- Model or agent misconfiguration: Providing an incorrect or unsupported model identifier may lead to unexpected results or errors.
- Network issues: Enabling online search (`use_search`) might cause delays or failures if network connectivity is poor.
- Voice selection errors: Selecting a voice not available in the list may cause TTS failures.
To resolve these:
- Validate all URLs before use.
- Confirm binary data fields are correctly named and populated.
- Use known working model identifiers or leave blank if unsure.
- Disable online search if unnecessary.
- Choose voices from the provided official or cloned lists.
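URL validation, the first item in the list above, can be done in a step before this node. The sketch below is one possible approach under stated assumptions: `splitImageUrls` is a hypothetical helper, not part of the node, and it only checks syntax and scheme using the standard `URL` constructor. It does not confirm that the image is actually reachable.

```typescript
// Hypothetical helper: split a comma-separated imageUrls string and sort the
// entries into syntactically valid http(s) URLs and everything else.
function splitImageUrls(raw: string): { valid: string[]; invalid: string[] } {
  const valid: string[] = [];
  const invalid: string[] = [];
  for (const part of raw.split(",")) {
    const candidate = part.trim();
    if (candidate === "") continue; // ignore empty segments such as "a,,b"
    try {
      const parsed = new URL(candidate); // throws on malformed input
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        valid.push(candidate);
      } else {
        invalid.push(candidate); // e.g. ftp: or file: schemes
      }
    } catch {
      invalid.push(candidate);
    }
  }
  return { valid, invalid };
}

// Usage: pass only the valid entries on to the node.
const checked = splitImageUrls("https://example.com/a.png, not a url");
const imageUrlsParam = checked.valid.join(",");
```

Because the property is comma-separated, a URL that itself contains an unencoded comma would be split incorrectly by this sketch; such URLs would need to be percent-encoded first.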
Links and References
- No direct links are present in the source code. For more information, consult the documentation of the external image parsing and TTS services integrated with this node.