ElevenLabs

WIP

Overview

The ElevenLabs node exposes text-to-speech (TTS) through the Text to Speech operation of the Speech resource. It converts input text into spoken audio using a selectable voice and model, with configurable parameters that control voice style, quality, and output format.

This node is useful for automating voice generation in applications such as:

  • Creating voiceovers for videos or presentations.
  • Generating audio content for accessibility purposes.
  • Producing dynamic spoken responses in chatbots or virtual assistants.
  • Experimenting with different voice styles and effects for creative projects.

For example, you can input a motivational quote and generate an MP3 audio file spoken by a chosen voice model, adjusting parameters like stability and similarity boost to fine-tune the voice characteristics.

Properties

Name: Meaning
Text: The text string that will be converted into speech.
Voice ID: The identifier of the voice to use for speech synthesis. Can be selected from a searchable list of available voices or entered manually by ID.
Additional Fields: A collection of optional settings to customize the speech output:
- Binary Name: Custom name for the binary output data. Defaults to "data".
- File Name: Custom filename for the generated audio file. Defaults to "voice".
- Streaming Latency: Controls the latency optimization level for streaming audio. Values range from 0 (no optimization) to 4 (maximum optimization, with text normalization turned off). Higher values reduce latency at some cost to quality.
- Output Format: Audio output format. Options include several MP3 and PCM configurations and μ-law encoding. Default is MP3 44.1kHz 128kbps.
- Language Code: ISO 639-1 language code used to enforce a specific language during synthesis. Only effective with models that support language enforcement.
- Model Name or ID: Identifier of the model to use. Can be selected from a list or specified by ID.
- Stability: Numeric value (0 to 1) defining how stable the voice sounds; affects voice consistency.
- Similarity Boost: Numeric value (0 to 1) controlling how closely the voice matches the original speaker's characteristics.
- Style: Numeric value (0 to 1) to exaggerate the voice style for more expressive output.
- Speaker Boost: Boolean to activate additional speaker enhancement features.
- Seed: Numeric seed for deterministic TTS output. Using the same seed with identical text produces the same audio result. Range: 0 to 4294967295.
- Enable Logging: Boolean to enable or disable logging. Disabling logging switches to zero retention mode, which turns off history features.
- Text Normalization: Controls text normalization before synthesis: Auto (system decides), On (always applied), Off (skipped).
- Use PVC as IVC: Boolean. When enabled, generation uses the Instant Voice Clone (IVC) version of the voice instead of its Professional Voice Clone (PVC) version.
- Stitching: Boolean to enable stitching, which provides context by passing previous and next request IDs to improve continuity in generated speech.
- Previous Request IDs: Comma-separated list of up to 3 request IDs representing prior generated samples, used as context when stitching is enabled.
- Next Request IDs: Comma-separated list of up to 3 request IDs representing subsequent generated samples, used as context when stitching is enabled.

Output

The node outputs the generated speech audio as binary data attached to the output item. The binary data contains the audio file encoded in the selected output format (e.g., MP3 or PCM). The binary property name defaults to "data" but can be customized via the "Binary Name" property. The filename defaults to "voice" and can be customized via the "File Name" property.

The JSON output field typically includes metadata about the request and response, but the primary payload is the binary audio data representing the synthesized speech.
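
To make the output shape concrete, here is a minimal sketch of an item carrying base64-encoded audio under a configurable binary property, plus a helper that decodes it again. The `json`/`binary` layout with `data`, `fileName`, and `mimeType` keys follows the general n8n item convention; the appended file extension, the MIME type default, and both helper names are assumptions for illustration only.

```python
import base64


def make_output_item(
    audio: bytes,
    binary_name: str = "data",      # "Binary Name" default
    file_name: str = "voice",       # "File Name" default
    extension: str = "mp3",         # assumed: derived from the output format
    mime_type: str = "audio/mpeg",  # assumed MIME type for MP3 output
) -> dict:
    """Build an item whose binary property holds the audio as base64 text."""
    return {
        "json": {},
        "binary": {
            binary_name: {
                "data": base64.b64encode(audio).decode("ascii"),
                "fileName": f"{file_name}.{extension}",
                "mimeType": mime_type,
            }
        },
    }


def read_audio(item: dict, binary_name: str = "data") -> bytes:
    """Decode the audio bytes back out of an item."""
    return base64.b64decode(item["binary"][binary_name]["data"])
```

A downstream step would read the property named by "Binary Name"; if that name was customized, the same custom name has to be used when reading.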

Dependencies

  • Requires an API key credential for authentication with the ElevenLabs API service.
  • Network access to the ElevenLabs API endpoint (https://api.elevenlabs.io/v1).
  • Proper configuration of the node with valid voice IDs and model identifiers supported by the ElevenLabs service.
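
For readers who want to check their API key and network access outside the node, the following sketch prepares an authenticated request with Python's standard library. It assumes the API key is sent in the `xi-api-key` header and that the text-to-speech path is `/text-to-speech/{voice_id}`; `make_tts_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.elevenlabs.io/v1"  # endpoint listed above


def make_tts_request(voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated POST request."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/text-to-speech/{voice_id}",
        data=payload,
        method="POST",
        headers={
            "xi-api-key": api_key,  # assumed authentication header
            "Content-Type": "application/json",
        },
    )


# To actually send it (requires a valid key and network access):
#   with urllib.request.urlopen(make_tts_request(voice_id, text, key)) as resp:
#       audio = resp.read()
```

An HTTP 401 response from a request like this usually points to the credential rather than to the voice or model configuration.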

Troubleshooting

  • Invalid Voice ID or Model ID: If the voice or model ID is incorrect or not available, the API may return errors. Verify the IDs by listing available voices/models.
  • API Authentication Errors: Ensure the API key credential is correctly configured and has necessary permissions.
  • Unsupported Output Format: Selecting an unsupported output format may cause failures. Use one of the provided options.
  • Latency Optimization Quality Tradeoff: Enabling high levels of streaming latency optimization may degrade audio quality. Adjust the setting based on your needs.
  • Stitching Context Issues: When using stitching, ensure that previous and next request IDs are valid and correspond to actual generated samples to avoid context errors.
  • Logging Disabled: Disabling logging disables history features; if you rely on history, keep logging enabled.
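
Several of these problems can be caught before a request is made. The sketch below checks a dictionary of field values against the ranges given in the Properties section; the dictionary keys and the `validate_fields` helper are hypothetical names chosen for this example, not part of the node.

```python
MAX_SEED = 4294967295  # upper bound given for the Seed property


def validate_fields(fields: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Stability, Similarity Boost, and Style are documented as 0 to 1.
    for name in ("stability", "similarity_boost", "style"):
        value = fields.get(name)
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")

    # Streaming Latency is documented as 0 to 4.
    latency = fields.get("streaming_latency")
    if latency is not None and latency not in range(5):
        problems.append("streaming_latency must be an integer from 0 to 4")

    seed = fields.get("seed")
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= MAX_SEED):
        problems.append(f"seed must be an integer from 0 to {MAX_SEED}")

    # Request IDs only provide context when stitching is enabled.
    for name in ("previous_request_ids", "next_request_ids"):
        if fields.get(name, "").strip() and not fields.get("stitching"):
            problems.append(f"{name} is set but stitching is disabled")

    return problems
```

This only covers local range checks. Whether a voice ID, model ID, or request ID actually exists can only be confirmed by the service itself.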
