Google Cloud Text-to-Speech

Consume Google Cloud Text-to-Speech API

Actions

Speech Actions
- Synthesize
Voice Actions
- List

Overview

This node integrates with the Google Cloud Text-to-Speech API to convert text into spoken audio. It is useful when you need to generate speech from text dynamically, for example to create voice responses for chatbots, produce audio versions of content for accessibility, or send audio notifications.

For example, you can input a message like "Hello, this is a test message" and receive an audio file of that text spoken aloud in a chosen language and voice style.

Properties

Name	Meaning
Text	The text string to be converted into speech.
Language Code	The language and locale of the speech output. Options include German (Germany), English (US/UK), French, Spanish, Italian, Portuguese (Brazil), Japanese, Chinese (Mandarin), Korean.
Voice Name	Specific voice identifier to use for synthesis. Optional; if empty, the system selects automatically.
Voice Gender	Gender of the voice: Neutral, Male, or Female.
Audio Encoding	Format of the output audio: MP3, WAV (Linear16), or OGG Opus.
Speaking Rate	Speed at which the text is spoken, ranging from 0.25 (slow) to 4.0 (fast); 1.0 is the normal speed.
Pitch	Pitch adjustment of the voice in semitones, from -20.0 (lower) to 20.0 (higher); 0.0 leaves the pitch unchanged.
Output Format	How the audio data is returned: as a Base64 encoded string or as binary data.
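
To make the property ranges concrete, here is a minimal Python sketch that validates node-style properties and builds a request body in the shape the Text-to-Speech REST endpoint (text:synthesize) accepts. The function name build_synthesize_request and the label-to-enum mappings are assumptions for illustration; this page does not show how the node itself maps its properties.

```python
import json

# Ranges taken from the property table above.
SPEAKING_RATE_RANGE = (0.25, 4.0)
PITCH_RANGE = (-20.0, 20.0)

# Assumed mapping from the node's display labels to API enum values.
AUDIO_ENCODINGS = {"MP3": "MP3", "WAV": "LINEAR16", "OGG Opus": "OGG_OPUS"}
VOICE_GENDERS = {"Neutral": "NEUTRAL", "Male": "MALE", "Female": "FEMALE"}


def build_synthesize_request(text, language_code, voice_gender="Neutral",
                             audio_encoding="MP3", speaking_rate=1.0,
                             pitch=0.0, voice_name=""):
    """Validate node-style properties and return a text:synthesize body."""
    if not text:
        raise ValueError("Text must not be empty")
    low, high = SPEAKING_RATE_RANGE
    if not low <= speaking_rate <= high:
        raise ValueError(f"Speaking Rate must be between {low} and {high}")
    low, high = PITCH_RANGE
    if not low <= pitch <= high:
        raise ValueError(f"Pitch must be between {low} and {high}")

    voice = {
        "languageCode": language_code,
        "ssmlGender": VOICE_GENDERS[voice_gender],
    }
    # Voice Name is optional; when it is empty the service picks a voice.
    if voice_name:
        voice["name"] = voice_name

    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": AUDIO_ENCODINGS[audio_encoding],
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    }


request = build_synthesize_request(
    "Hello, this is a test message", "en-US", speaking_rate=1.25
)
print(json.dumps(request, indent=2))
```

Validating the ranges before sending the request surfaces out-of-range values as a local error instead of an API rejection.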

Output

The node outputs JSON data containing metadata about the synthesized speech and the audio content itself:

When Base64 String output format is selected:
- audioData: The audio content encoded as a base64 string.
- mimeType: MIME type corresponding to the audio encoding (e.g., audio/mpeg for MP3).
- size: Size in bytes of the audio content.
- text: The original input text.
- languageCode, voiceGender, audioEncoding, speakingRate, pitch: Echoed parameters used for synthesis.
When Binary Data output format is selected:
- The audio content is provided as binary data attached to the output item under the binary property with appropriate filename and MIME type.
- The JSON part includes metadata similar to above but without the base64 audio string.
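
As a sketch of how a downstream step might consume the Base64 String output, the Python snippet below decodes audioData back into bytes and suggests a filename from mimeType. The helper name decode_audio_item, the extension mapping (beyond audio/mpeg for MP3), and the assumption that size counts the decoded bytes are all illustrative rather than confirmed behavior of the node. The sample item uses placeholder bytes, not real audio.

```python
import base64

# Assumed MIME type to file extension mapping; only audio/mpeg for MP3
# is stated in the output description above.
EXTENSIONS = {"audio/mpeg": "mp3", "audio/wav": "wav", "audio/ogg": "ogg"}


def decode_audio_item(item):
    """Return (audio_bytes, suggested_filename) for a Base64 String item."""
    audio = base64.b64decode(item["audioData"], validate=True)
    # Assumption: "size" is the byte length of the decoded audio.
    if "size" in item and item["size"] != len(audio):
        raise ValueError("Decoded audio length does not match the size field")
    extension = EXTENSIONS.get(item.get("mimeType"), "bin")
    return audio, f"speech.{extension}"


# Placeholder item shaped like the Base64 String output described above.
fake_audio = b"\xff\xf3not-real-mp3-data"
item = {
    "audioData": base64.b64encode(fake_audio).decode("ascii"),
    "mimeType": "audio/mpeg",
    "size": len(fake_audio),
    "text": "Hello, this is a test message",
}

audio_bytes, filename = decode_audio_item(item)
print(filename, len(audio_bytes))
```

Writing audio_bytes to a file opened in binary mode ("wb") would then produce a playable file when the item holds real audio.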

Dependencies

Requires valid credentials for Google Cloud Text-to-Speech API access, set up in n8n in one of two ways:
- An OAuth2 API credential providing an access token.
- A service account key JSON for authentication.

The node uses the official @google-cloud/text-to-speech library internally.

Troubleshooting

- "No valid credentials found" error: Occurs if neither OAuth2 nor service account credentials are configured correctly. Set up one of these credentials in n8n before using the node.
- "No audio content received from API" error: The API returned no audio data. Check that the input text is valid and that the API quota has not been exceeded.
- If the node fails while "Continue on Fail" is enabled, the error is returned in the output JSON under an error field.
- Other common causes of failure include invalid language codes, unsupported voice names, and exceeded API usage quotas.
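
When "Continue on Fail" is enabled, a later step has to tell failed items apart from successful ones. The Python sketch below shows one way to do that by checking for the error field. The helper name split_results and the sample items are hypothetical, and the exact shape of the error value is an assumption.

```python
def split_results(items):
    """Partition output items into (successful, failed) by the error field."""
    successful, failed = [], []
    for item in items:
        # Assumption: failed items carry a non-empty "error" value.
        if item.get("error"):
            failed.append(item)
        else:
            successful.append(item)
    return successful, failed


# Hypothetical output items for illustration only.
items = [
    {"text": "Hello", "mimeType": "audio/mpeg", "size": 1234},
    {"error": "No audio content received from API"},
]

ok, failed = split_results(items)
for item in failed:
    print("Synthesis failed:", item["error"])
```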