Overview
This node integrates with the Google Cloud Text-to-Speech API to convert text into spoken audio. It is useful in scenarios where you want to generate speech from text dynamically, such as creating voice responses for chatbots, generating audio content for accessibility, or producing audio notifications.
For example, you can input a message like "Hello, this is a test message" and receive an audio file of that text spoken aloud in a chosen language and voice style.
Properties
Name | Meaning |
---|---|
Text | The text string to be converted into speech. |
Language Code | The language and locale of the speech output. Options include German (Germany), English (US/UK), French, Spanish, Italian, Portuguese (Brazil), Japanese, Chinese (Mandarin), Korean. |
Voice Name | Specific voice identifier to use for synthesis. Optional; if empty, the system selects automatically. |
Voice Gender | Gender of the voice: Neutral, Male, or Female. |
Audio Encoding | Format of the output audio: MP3, WAV (Linear16), or OGG Opus. |
Speaking Rate | Speed at which the text is spoken, ranging from 0.25 (slow) to 4.0 (fast). |
Pitch | Pitch adjustment of the voice, from -20.0 (lower) to 20.0 (higher). |
Output Format | How the audio data is returned: as a Base64 encoded string or as binary data. |
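As a rough illustration of how these properties might map onto a synthesis request, the sketch below builds a request body in the shape used by the Google Cloud Text-to-Speech `text:synthesize` REST method (`input`, `voice`, `audioConfig`). The `SpeechParams` interface and the `buildSynthesizeRequest` helper are hypothetical names introduced here; the node's actual parameter names and validation may differ.

```typescript
// Hypothetical parameter shape; the node's real internal field names are not documented here.
interface SpeechParams {
  text: string;
  languageCode: string; // e.g. "en-US"
  voiceName?: string; // optional; left out of the request when empty
  voiceGender: "NEUTRAL" | "MALE" | "FEMALE";
  audioEncoding: "MP3" | "LINEAR16" | "OGG_OPUS";
  speakingRate: number; // 0.25 to 4.0
  pitch: number; // -20.0 to 20.0
}

// Request body shape of the Text-to-Speech synthesize call.
interface SynthesizeRequest {
  input: { text: string };
  voice: { languageCode: string; name?: string; ssmlGender: string };
  audioConfig: { audioEncoding: string; speakingRate: number; pitch: number };
}

function buildSynthesizeRequest(p: SpeechParams): SynthesizeRequest {
  // Validate the documented ranges before calling the API.
  if (p.text.trim() === "") {
    throw new Error("Text must not be empty");
  }
  if (p.speakingRate < 0.25 || p.speakingRate > 4.0) {
    throw new RangeError("Speaking Rate must be between 0.25 and 4.0");
  }
  if (p.pitch < -20 || p.pitch > 20) {
    throw new RangeError("Pitch must be between -20.0 and 20.0");
  }

  const voice: SynthesizeRequest["voice"] = {
    languageCode: p.languageCode,
    ssmlGender: p.voiceGender,
  };
  // An empty Voice Name lets the service pick a voice automatically.
  if (p.voiceName !== undefined && p.voiceName.trim() !== "") {
    voice.name = p.voiceName.trim();
  }

  return {
    input: { text: p.text },
    voice,
    audioConfig: {
      audioEncoding: p.audioEncoding,
      speakingRate: p.speakingRate,
      pitch: p.pitch,
    },
  };
}
```

Checking the ranges locally is a design choice in this sketch: it surfaces a clear message before any API quota is spent.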
Output
The node outputs JSON data containing metadata about the synthesized speech and the audio content itself:
When Base64 String output format is selected:
- `audioData`: The audio content encoded as a Base64 string.
- `mimeType`: MIME type corresponding to the audio encoding (e.g., `audio/mpeg` for MP3).
- `size`: Size of the audio content in bytes.
- `text`: The original input text.
- `languageCode`, `voiceGender`, `audioEncoding`, `speakingRate`, `pitch`: Echoed parameters used for synthesis.
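The fields listed above could be assembled along the following lines. This is a minimal sketch, not the node's actual code: `buildBase64Output` and `decodedByteLength` are hypothetical helpers, and the MIME types for WAV and OGG Opus are assumptions (only `audio/mpeg` for MP3 is stated in this document).

```typescript
type Encoding = "MP3" | "LINEAR16" | "OGG_OPUS";

interface EchoedParams {
  text: string;
  languageCode: string;
  voiceGender: string;
  audioEncoding: Encoding;
  speakingRate: number;
  pitch: number;
}

interface Base64Output extends EchoedParams {
  audioData: string;
  mimeType: string;
  size: number;
}

const MIME_BY_ENCODING: Record<Encoding, string> = {
  MP3: "audio/mpeg",
  LINEAR16: "audio/wav", // assumption
  OGG_OPUS: "audio/ogg", // assumption
};

// Number of bytes represented by a padded Base64 string
// (assumes standard padding, so the length is a multiple of 4).
function decodedByteLength(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return (base64.length / 4) * 3 - padding;
}

function buildBase64Output(audioData: string, params: EchoedParams): Base64Output {
  return {
    audioData,
    mimeType: MIME_BY_ENCODING[params.audioEncoding],
    size: decodedByteLength(audioData),
    ...params, // echo the synthesis parameters back to the caller
  };
}
```

Note that `size` here refers to the decoded audio, not the length of the Base64 string, which is roughly a third larger.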
When Binary Data output format is selected:
- The audio content is provided as binary data attached to the output item under the `binary` property, with an appropriate filename and MIME type.
- The JSON part includes metadata similar to the above, but without the Base64 audio string.
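For the binary mode, an output item might look like the result of the sketch below. The binary property key (`data`), the `speech.*` filename, and the WAV/OGG MIME types are assumptions made for illustration; the layout follows the usual n8n convention of a `json` part plus a `binary` part whose entries carry `data`, `mimeType`, and `fileName`.

```typescript
type BinaryEncoding = "MP3" | "LINEAR16" | "OGG_OPUS";

// Extension and MIME type per encoding; the WAV and OGG values are assumptions.
const FILE_INFO: Record<BinaryEncoding, { extension: string; mimeType: string }> = {
  MP3: { extension: "mp3", mimeType: "audio/mpeg" },
  LINEAR16: { extension: "wav", mimeType: "audio/wav" },
  OGG_OPUS: { extension: "ogg", mimeType: "audio/ogg" },
};

interface BinaryItem {
  json: Record<string, unknown>;
  binary: Record<string, { data: string; mimeType: string; fileName: string }>;
}

function toBinaryItem(
  metadata: Record<string, unknown>,
  audioBase64: string,
  encoding: BinaryEncoding,
  propertyName = "data", // hypothetical default key under `binary`
): BinaryItem {
  // Keep the metadata, but drop any Base64 audio string from the JSON part.
  const json: Record<string, unknown> = { ...metadata };
  delete json["audioData"];

  const info = FILE_INFO[encoding];
  return {
    json,
    binary: {
      [propertyName]: {
        data: audioBase64,
        mimeType: info.mimeType,
        fileName: `speech.${info.extension}`, // hypothetical filename
      },
    },
  };
}
```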
Dependencies
- Requires valid credentials for Google Cloud Text-to-Speech API access. This can be configured either via:
  - An OAuth2 API credential providing an access token.
  - A service account key JSON for authentication.
- The node uses the official `@google-cloud/text-to-speech` library internally.
- Proper n8n credential setup is necessary to authenticate requests.
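Choosing between the two credential types could be sketched as follows. The `CredentialSources` shape, the `resolveAuth` helper, and the preference for OAuth2 over the service account are all assumptions for illustration. The `client_email` and `private_key` fields are the standard fields of a Google service account key file.

```typescript
// Hypothetical view of what the configured n8n credentials provide.
interface CredentialSources {
  oauth2AccessToken?: string;
  serviceAccountKeyJson?: string;
}

type ResolvedAuth =
  | { kind: "oauth2"; accessToken: string }
  | { kind: "serviceAccount"; clientEmail: string; privateKey: string };

// Extracts the two fields needed from a service account key file, or
// returns undefined when the JSON is malformed or incomplete.
function parseServiceAccountKey(
  json: string,
): { clientEmail: string; privateKey: string } | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const key = parsed as Record<string, unknown>;
  const clientEmail = key["client_email"];
  const privateKey = key["private_key"];
  if (typeof clientEmail !== "string" || typeof privateKey !== "string") {
    return undefined;
  }
  return { clientEmail, privateKey };
}

function resolveAuth(sources: CredentialSources): ResolvedAuth {
  // Assumption: OAuth2 is preferred when both credential types are present.
  if (sources.oauth2AccessToken) {
    return { kind: "oauth2", accessToken: sources.oauth2AccessToken };
  }
  if (sources.serviceAccountKeyJson) {
    const key = parseServiceAccountKey(sources.serviceAccountKeyJson);
    if (key) return { kind: "serviceAccount", ...key };
  }
  throw new Error("No valid credentials found");
}
```

The resolved values would then be handed to whatever authentication options the client library accepts.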
Troubleshooting
- "No valid credentials found" error: Occurs if neither OAuth2 nor service account credentials are configured correctly. Ensure one of these credentials is set up in n8n before using the node.
- "No audio content received from API" error: Indicates that the API did not return any audio data. Check that the input text is valid and that API quota limits have not been exceeded.
- If the node fails but "Continue on Fail" is enabled, errors are returned in the output JSON under an `error` field.
- Common issues include invalid language codes, unsupported voice names, and exceeding API usage quotas.
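The "Continue on Fail" behaviour can be pictured with the sketch below. It is a simplified, synchronous illustration: `processItems` and the `synthesize` callback are hypothetical, and a real node would make asynchronous API calls. The point is only the pattern, where a failing item either yields an output with an `error` field or aborts the run.

```typescript
interface OutputItem {
  json: Record<string, unknown>;
}

function processItems<T>(
  items: T[],
  synthesize: (item: T) => Record<string, unknown>, // hypothetical per-item work
  continueOnFail: boolean,
): OutputItem[] {
  const results: OutputItem[] = [];
  for (const item of items) {
    try {
      results.push({ json: synthesize(item) });
    } catch (error) {
      if (!continueOnFail) {
        throw error; // abort the whole run
      }
      // Report the failure for this item and keep going.
      const message = error instanceof Error ? error.message : String(error);
      results.push({ json: { error: message } });
    }
  }
  return results;
}
```

A downstream step can then branch on whether `json.error` is present to separate failed items from successful ones.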