TTS
- pipecat.services.elevenlabs.tts.language_to_elevenlabs_language(language)[source]
Convert a pipecat Language to an ElevenLabs language code.
- Parameters:
language (Language)
- Return type:
str | None
- pipecat.services.elevenlabs.tts.output_format_from_sample_rate(sample_rate)[source]
Map an audio sample rate to the corresponding ElevenLabs output format string.
- Parameters:
sample_rate (int)
- Return type:
str
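The mapping itself is not shown in this reference. A minimal sketch of what such a helper could look like, assuming ElevenLabs' `pcm_*` output format names; the exact set of supported rates and the fallback value are assumptions, not the library's actual table:

```python
def output_format_from_sample_rate(sample_rate: int) -> str:
    """Map a PCM sample rate (Hz) to an ElevenLabs output_format string.

    Sketch only: the supported rates and the default fallback are
    assumptions for illustration.
    """
    formats = {
        8000: "pcm_8000",
        16000: "pcm_16000",
        22050: "pcm_22050",
        24000: "pcm_24000",
        44100: "pcm_44100",
    }
    return formats.get(sample_rate, "pcm_24000")
```

For example, `output_format_from_sample_rate(16000)` would return `"pcm_16000"`.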
- pipecat.services.elevenlabs.tts.build_elevenlabs_voice_settings(settings)[source]
Build voice settings dictionary for ElevenLabs based on provided settings.
- Parameters:
settings (Dict[str, Any]) – Dictionary containing voice settings parameters
- Returns:
Dictionary of voice settings or None if no valid settings are provided
- Return type:
Dict[str, float | bool] | None
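A sketch of the filtering this helper performs, based on the signature above. The recognized key names are taken from the `InputParams` fields documented below; the exact filtering logic (e.g. dropping `None` values, returning `None` when nothing is set) is an assumption:

```python
from typing import Any, Dict, Optional, Union

# Voice-setting keys recognized by ElevenLabs (names taken from InputParams).
VOICE_SETTING_KEYS = ("stability", "similarity_boost", "style", "use_speaker_boost", "speed")


def build_elevenlabs_voice_settings(
    settings: Dict[str, Any],
) -> Optional[Dict[str, Union[float, bool]]]:
    """Collect recognized, non-None voice settings; return None if none are set."""
    voice_settings = {
        key: settings[key] for key in VOICE_SETTING_KEYS if settings.get(key) is not None
    }
    return voice_settings or None
```

Called with `{"stability": 0.5, "language": None}`, this sketch would return `{"stability": 0.5}`; called with an empty dict, it would return `None`.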
- pipecat.services.elevenlabs.tts.calculate_word_times(alignment_info, cumulative_time)[source]
Calculate word start times from character alignment data, offset by cumulative_time.
- Parameters:
alignment_info (Mapping[str, Any])
cumulative_time (float)
- Return type:
List[Tuple[str, float]]
- class pipecat.services.elevenlabs.tts.ElevenLabsTTSService(*, api_key, voice_id, model='eleven_flash_v2_5', url='wss://api.elevenlabs.io', sample_rate=None, params=None, **kwargs)[source]
Bases:
AudioContextWordTTSService
ElevenLabs Text-to-Speech service using WebSocket streaming with word timestamps.
- Parameters:
api_key (str)
voice_id (str)
model (str)
url (str)
sample_rate (int | None)
params (InputParams | None)
- class InputParams(*, language=None, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, speed=None, auto_mode=True, enable_ssml_parsing=None, enable_logging=None)[source]
Bases:
BaseModel
- Parameters:
language (Language | None)
stability (float | None)
similarity_boost (float | None)
style (float | None)
use_speaker_boost (bool | None)
speed (float | None)
auto_mode (bool | None)
enable_ssml_parsing (bool | None)
enable_logging (bool | None)
- language: Language | None
- stability: float | None
- similarity_boost: float | None
- style: float | None
- use_speaker_boost: bool | None
- speed: float | None
- auto_mode: bool | None
- enable_ssml_parsing: bool | None
- enable_logging: bool | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- can_generate_metrics()[source]
- Return type:
bool
- language_to_service_language(language)[source]
Convert a language to the service-specific language format.
- Parameters:
language (Language) – The language to convert.
- Returns:
The service-specific language identifier, or None if not supported.
- Return type:
str | None
- async set_model(model)[source]
Set the TTS model to use.
- Parameters:
model (str) – The name of the TTS model.
- async start(frame)[source]
Start the audio context TTS service.
- Parameters:
frame (StartFrame) – The start frame containing initialization parameters.
- async stop(frame)[source]
Stop the audio context TTS service.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)[source]
Cancel the audio context TTS service.
- Parameters:
frame (CancelFrame) – The cancel frame.
- async flush_audio()[source]
Flush any buffered audio data.
- async push_frame(frame, direction=FrameDirection.DOWNSTREAM)[source]
Push a frame downstream with TTS-specific handling.
- Parameters:
frame (Frame) – The frame to push.
direction (FrameDirection) – The direction to push the frame.
- async run_tts(text)[source]
Run text-to-speech synthesis on the provided text.
This method must be implemented by subclasses to provide actual TTS functionality.
- Parameters:
text (str) – The text to synthesize into speech.
- Yields:
Frame – Audio frames containing the synthesized speech.
- Return type:
AsyncGenerator[Frame, None]
- class pipecat.services.elevenlabs.tts.ElevenLabsHttpTTSService(*, api_key, voice_id, aiohttp_session, model='eleven_flash_v2_5', base_url='https://api.elevenlabs.io', sample_rate=None, params=None, **kwargs)[source]
Bases:
WordTTSService
ElevenLabs Text-to-Speech service using HTTP streaming with word timestamps.
- Parameters:
api_key (str) – ElevenLabs API key
voice_id (str) – ID of the voice to use
aiohttp_session (ClientSession) – aiohttp ClientSession
model (str) – Model ID (default: "eleven_flash_v2_5" for low latency)
base_url (str) – API base URL
sample_rate (int | None) – Output sample rate
params (InputParams | None) – Additional parameters for voice configuration
- class InputParams(*, language=None, optimize_streaming_latency=None, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, speed=None)[source]
Bases:
BaseModel
- Parameters:
language (Language | None)
optimize_streaming_latency (int | None)
stability (float | None)
similarity_boost (float | None)
style (float | None)
use_speaker_boost (bool | None)
speed (float | None)
- language: Language | None
- optimize_streaming_latency: int | None
- stability: float | None
- similarity_boost: float | None
- style: float | None
- use_speaker_boost: bool | None
- speed: float | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- language_to_service_language(language)[source]
Convert pipecat Language to ElevenLabs language code.
- Parameters:
language (Language)
- Return type:
str | None
- can_generate_metrics()[source]
Indicate that this service can generate usage metrics.
- Return type:
bool
- async start(frame)[source]
Initialize the service upon receiving a StartFrame.
- Parameters:
frame (StartFrame)
- async push_frame(frame, direction=FrameDirection.DOWNSTREAM)[source]
Push a frame downstream with TTS-specific handling.
- Parameters:
frame (Frame) – The frame to push.
direction (FrameDirection) – The direction to push the frame.
- calculate_word_times(alignment_info)[source]
Calculate word timing from character alignment data.
Example input data:
{
"characters": [" ", "H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"],
"character_start_times_seconds": [0.0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
"character_end_times_seconds": [0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}
Would produce word times (with cumulative_time=0): [("Hello", 0.1), ("world", 0.5)]
- Parameters:
alignment_info (Mapping[str, Any]) – Character timing data from ElevenLabs
- Returns:
List of (word, timestamp) pairs
- Return type:
List[Tuple[str, float]]
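The behavior described by the example above can be sketched as follows. This is an illustrative reimplementation, not the library's actual code: split the character stream on whitespace and take each word's start time from its first character, offset by a `cumulative_time` (shown here as an explicit parameter, matching the module-level helper's signature):

```python
from typing import Any, List, Mapping, Tuple


def calculate_word_times(
    alignment_info: Mapping[str, Any], cumulative_time: float = 0.0
) -> List[Tuple[str, float]]:
    """Derive (word, start_time) pairs from per-character alignment data."""
    chars = alignment_info["characters"]
    starts = alignment_info["character_start_times_seconds"]

    word_times: List[Tuple[str, float]] = []
    word, word_start = "", 0.0
    for ch, start in zip(chars, starts):
        if ch.isspace():
            # Whitespace ends the current word, if any.
            if word:
                word_times.append((word, cumulative_time + word_start))
                word = ""
            continue
        if not word:
            # First character of a new word: record its start time.
            word_start = start
        word += ch
    if word:
        word_times.append((word, cumulative_time + word_start))
    return word_times


alignment = {
    "characters": [" ", "H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"],
    "character_start_times_seconds": [0.0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
print(calculate_word_times(alignment))  # [('Hello', 0.1), ('world', 0.5)]
```

With a nonzero `cumulative_time`, every timestamp shifts by that offset, which is how successive utterances can be kept on a single timeline.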
- async run_tts(text)[source]
Generate speech from text using ElevenLabs streaming API with timestamps.
Makes a request to the ElevenLabs API to generate audio and timing data. Tracks the duration of each utterance to ensure correct sequencing. Includes previous text as context for better prosody continuity.
- Parameters:
text (str) – Text to convert to speech
- Yields:
Audio and control frames
- Return type:
AsyncGenerator[Frame, None]