Lark AI

Lark AI Management

Actions (5)

Image Recognition Actions
- Basic Image Recognition Ocr
Speech Recognition Actions
- Streaming Speech Recognition Asr
- Audio File Speech Recognition Asr
Text Actions
- Translate With Machine Translation
- Text Language Recognition

Overview

This node performs streaming speech recognition using Lark AI's speech-to-text capabilities. Audio is sent in chunks, and transcription results are returned in real time as the stream progresses. This is particularly useful for applications that need live transcription, such as voice assistants, live captioning, or interactive voice response systems.

Typical use cases include:

- Transcribing live meetings or calls.
- Real-time subtitles for videos or broadcasts.
- Voice command processing in applications.

Properties

Name	Meaning
Authentication	Method of authenticating API requests. Options: Tenant Token, OAuth2.
Config	Configuration parameters for the streaming request: `action` (numeric action code), `engine_type` (type of speech recognition engine), `format` (audio format), `sequence_id` (sequence number of the current audio chunk), `stream_id` (identifier for the audio stream).
Speech	The audio data chunk to be sent for recognition, represented as a string (likely base64 encoded).
Custom Body	A fully custom JSON body for the request, used instead of the structured Config and Speech fields.
Options	Additional options: `Use Custom Body` (boolean flag that enables sending the custom request body).
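
To make the Config and Speech properties concrete, here is a minimal Python sketch of how one request body might be assembled. The field names come from the table above; the nesting under `speech` and `config`, the default values `16k_auto` and `pcm`, the `action` value, and the helper name `build_stream_body` are assumptions for illustration and should be checked against the Lark AI documentation.

```python
import base64
import json


def build_stream_body(audio_chunk: bytes, stream_id: str, sequence_id: int,
                      action: int, engine_type: str = "16k_auto",
                      audio_format: str = "pcm") -> dict:
    """Assemble one request body from the Config and Speech properties.

    Assumption: the body nests the audio under "speech" and the
    parameters under "config"; verify the layout in the Lark docs.
    """
    return {
        "speech": {
            # Speech: raw audio bytes, base64-encoded into a string.
            "speech": base64.b64encode(audio_chunk).decode("ascii"),
        },
        "config": {
            "stream_id": stream_id,
            "sequence_id": sequence_id,
            "action": action,            # numeric action code (placeholder value below)
            "engine_type": engine_type,  # assumed default, not confirmed by this page
            "format": audio_format,      # assumed default, not confirmed by this page
        },
    }


body = build_stream_body(b"\x00\x01\x02\x03", stream_id="demo-stream-1",
                         sequence_id=0, action=1)
print(json.dumps(body, indent=2))
```

A dictionary of this shape could also be pasted into Custom Body (with `Use Custom Body` enabled) if the structured fields do not cover a required parameter.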

Output

The node outputs JSON data containing the transcription results from the streaming speech recognition service. The exact structure depends on the API response but typically includes recognized text segments, confidence scores, and possibly timing information.

Binary data, if present, is the audio content supplied as input for processing; the node's output is primarily the JSON transcription result.
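
Because the exact response structure is not fixed by this page, downstream handling is best written defensively. The sketch below assumes a response envelope with `code`, `msg`, and `data` fields and a `recognition_text` field inside `data`; these names, the helper `extract_transcript`, and the sample response are illustrative assumptions, not confirmed output of the node.

```python
import json


def extract_transcript(raw_response: str) -> str:
    """Return the recognized text from a response, or raise on an API error."""
    payload = json.loads(raw_response)
    # Assumption: a non-zero "code" signals an error, with details in "msg".
    if payload.get("code", 0) != 0:
        raise RuntimeError(
            f"API error {payload.get('code')}: {payload.get('msg', '')}"
        )
    data = payload.get("data") or {}
    # Assumption: the recognized text lives in data["recognition_text"].
    return data.get("recognition_text", "")


# Hypothetical sample response, made up for illustration only.
sample = '{"code": 0, "msg": "success", "data": {"recognition_text": "hello world"}}'
print(extract_transcript(sample))
```

Returning an empty string when the text field is absent keeps intermediate chunks, which may carry no text yet, from breaking the workflow.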

Dependencies

- Requires an active connection to Lark Suite's Open API endpoint (https://open.larksuite.com/open-apis).
- Needs authentication via either a Tenant Token or OAuth2 credentials configured in n8n.
- The node sends HTTP requests with JSON payloads formatted according to Lark AI's speech recognition API specifications.
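
For debugging outside n8n, the request the node sends can be approximated with the Python standard library. This sketch only builds the request and does not send it. The base URL is the one listed above; the path in `STREAM_PATH`, the placeholder token, and the helper `build_request` are assumptions to verify against the Lark documentation.

```python
import json
import urllib.request

BASE_URL = "https://open.larksuite.com/open-apis"
# Assumption: hypothetical endpoint path; confirm it in the Lark docs.
STREAM_PATH = "/speech_to_text/v1/speech/stream_recognize"


def build_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=BASE_URL + STREAM_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The tenant token or OAuth2 access token is sent as a Bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


req = build_request("t-placeholder-token", {"config": {"stream_id": "demo-stream-1"}})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```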

Troubleshooting

- Authentication errors: Ensure that the selected authentication method is correctly configured with valid credentials. Expired or invalid tokens will cause failures.
- Invalid config parameters: Incorrect values for `engine_type` or `format`, or missing required fields such as `stream_id`, may result in API errors. Verify these against the Lark AI documentation.
- Sequence ID issues: The `sequence_id` should increment properly for each audio chunk; otherwise, the server might reject or misinterpret the stream.
- Custom Body usage: When enabling "Use Custom Body," ensure the JSON structure matches the API requirements exactly to avoid malformed request errors.
- Network issues: Connectivity problems to the Lark API endpoint will prevent successful transcription.
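
The sequence ID item above is the easiest one to get wrong when chunking audio by hand. The following sketch splits a buffer into fixed-size chunks with a steadily increasing `sequence_id` and checks a list of IDs for gaps before sending. Starting the count at 0, the chunk size, and the helper names `iter_chunks` and `check_sequence` are assumptions for illustration.

```python
from typing import Iterator, List, Tuple


def iter_chunks(audio: bytes, chunk_size: int) -> Iterator[Tuple[int, bytes, bool]]:
    """Yield (sequence_id, chunk, is_last), with sequence_id counting up from 0."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(audio)
    for sequence_id, start in enumerate(range(0, total, chunk_size)):
        end = start + chunk_size
        yield sequence_id, audio[start:end], end >= total


def check_sequence(ids: List[int]) -> None:
    """Raise if the ids are not 0, 1, 2, ... with no gaps or repeats."""
    for expected, actual in enumerate(ids):
        if actual != expected:
            raise ValueError(f"expected sequence_id {expected}, got {actual}")


audio = bytes(range(10))  # stand-in for real audio data
chunks = list(iter_chunks(audio, chunk_size=4))
check_sequence([seq for seq, _, _ in chunks])
for seq, chunk, is_last in chunks:
    print(seq, len(chunk), is_last)
```

The `is_last` flag is a convenient place to switch the `action` code for the final chunk, if the API distinguishes the end of a stream.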

Links and References

Lark Suite Open API Documentation
Streaming Speech Recognition API Reference