TTS

pipecat.services.elevenlabs.tts.language_to_elevenlabs_language(language)[source]

Convert a pipecat Language to an ElevenLabs language code.

Parameters:

language (Language)

Return type:

str | None
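
A minimal usage sketch (that Language.EN maps to "en" is an assumption; languages ElevenLabs does not support yield None):

    from pipecat.services.elevenlabs.tts import language_to_elevenlabs_language
    from pipecat.transcriptions.language import Language

    code = language_to_elevenlabs_language(Language.EN)
    print(code)  # expected: "en"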

pipecat.services.elevenlabs.tts.output_format_from_sample_rate(sample_rate)[source]

Return the ElevenLabs output format string for the given output sample rate.

Parameters:

sample_rate (int)

Return type:

str
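
A hedged sketch (ElevenLabs PCM output formats follow a pcm_<rate> naming convention, so the exact string returned is an assumption):

    from pipecat.services.elevenlabs.tts import output_format_from_sample_rate

    fmt = output_format_from_sample_rate(24000)
    print(fmt)  # e.g. "pcm_24000"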

pipecat.services.elevenlabs.tts.build_elevenlabs_voice_settings(settings)[source]

Build a voice settings dictionary for ElevenLabs from the provided settings.

Parameters:

settings (Dict[str, Any]) – Dictionary containing voice settings parameters

Returns:

Dictionary of voice settings or None if no valid settings are provided

Return type:

Dict[str, float | bool] | None
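
A usage sketch; the keys shown mirror the InputParams voice fields documented below, and which fields count as valid settings is an assumption:

    from pipecat.services.elevenlabs.tts import build_elevenlabs_voice_settings

    voice_settings = build_elevenlabs_voice_settings(
        {"stability": 0.7, "similarity_boost": 0.8, "use_speaker_boost": True}
    )
    # Either a dict of the valid settings or None if none were provided.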

pipecat.services.elevenlabs.tts.calculate_word_times(alignment_info, cumulative_time)[source]

Calculate word timing pairs from character alignment data, offset by cumulative_time.

Parameters:
  • alignment_info (Mapping[str, Any])

  • cumulative_time (float)

Return type:

List[Tuple[str, float]]

class pipecat.services.elevenlabs.tts.ElevenLabsTTSService(*, api_key, voice_id, model='eleven_flash_v2_5', url='wss://api.elevenlabs.io', sample_rate=None, params=None, **kwargs)[source]

Bases: AudioContextWordTTSService

ElevenLabs Text-to-Speech service using WebSocket streaming with word timestamps.

Parameters:
  • api_key (str) – ElevenLabs API key

  • voice_id (str) – ID of the voice to use

  • model (str) – Model ID (default: "eleven_flash_v2_5" for low latency)

  • url (str) – WebSocket API URL

  • sample_rate (int | None) – Output sample rate

  • params (InputParams | None) – Additional parameters for voice configuration

class InputParams(*, language=None, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, speed=None, auto_mode=True, enable_ssml_parsing=None, enable_logging=None)[source]

Bases: BaseModel

Parameters:
  • language (Language | None)

  • stability (float | None)

  • similarity_boost (float | None)

  • style (float | None)

  • use_speaker_boost (bool | None)

  • speed (float | None)

  • auto_mode (bool | None)

  • enable_ssml_parsing (bool | None)

  • enable_logging (bool | None)

language: Language | None
stability: float | None
similarity_boost: float | None
style: float | None
use_speaker_boost: bool | None
speed: float | None
auto_mode: bool | None
enable_ssml_parsing: bool | None
enable_logging: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
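
A construction sketch, assuming an ELEVENLABS_API_KEY environment variable and a placeholder voice ID (substitute your own):

    import os

    from pipecat.services.elevenlabs.tts import ElevenLabsTTSService
    from pipecat.transcriptions.language import Language

    tts = ElevenLabsTTSService(
        api_key=os.environ["ELEVENLABS_API_KEY"],
        voice_id="your-voice-id",  # placeholder: any voice ID from your ElevenLabs account
        params=ElevenLabsTTSService.InputParams(
            language=Language.EN,
            stability=0.7,
            similarity_boost=0.8,
        ),
    )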

can_generate_metrics()[source]

Indicate that this service can generate usage metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert a language to the service-specific language format.

Parameters:

language (Language) – The language to convert.

Returns:

The service-specific language identifier, or None if not supported.

Return type:

str | None

async set_model(model)[source]

Set the TTS model to use.

Parameters:

model (str) – The name of the TTS model.
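
For example, on a constructed service instance (the model ID below is illustrative; any valid ElevenLabs model ID works):

    await tts.set_model("eleven_turbo_v2_5")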

async start(frame)[source]

Start the audio context TTS service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the audio context TTS service.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the audio context TTS service.

Parameters:

frame (CancelFrame) – The cancel frame.

async flush_audio()[source]

Flush any buffered audio data.

async push_frame(frame, direction=FrameDirection.DOWNSTREAM)[source]

Push a frame downstream with TTS-specific handling.

Parameters:
  • frame (Frame) – The frame to push.

  • direction (FrameDirection) – The direction to push the frame.

async run_tts(text)[source]

Run text-to-speech synthesis on the provided text.

This method must be implemented by subclasses to provide actual TTS functionality.

Parameters:

text (str) – The text to synthesize into speech.

Yields:

Frame – Audio frames containing the synthesized speech.

Return type:

AsyncGenerator[Frame, None]
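
In normal use the pipeline drives this generator; as a sketch, it can also be iterated directly (TTSAudioRawFrame is pipecat's raw audio frame type from pipecat.frames.frames):

    from pipecat.frames.frames import TTSAudioRawFrame

    async for frame in tts.run_tts("Hello there!"):
        if isinstance(frame, TTSAudioRawFrame):
            ...  # handle synthesized audio; control frames are interleaved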

class pipecat.services.elevenlabs.tts.ElevenLabsHttpTTSService(*, api_key, voice_id, aiohttp_session, model='eleven_flash_v2_5', base_url='https://api.elevenlabs.io', sample_rate=None, params=None, **kwargs)[source]

Bases: WordTTSService

ElevenLabs Text-to-Speech service using HTTP streaming with word timestamps.

Parameters:
  • api_key (str) – ElevenLabs API key

  • voice_id (str) – ID of the voice to use

  • aiohttp_session (ClientSession) – aiohttp ClientSession

  • model (str) – Model ID (default: "eleven_flash_v2_5" for low latency)

  • base_url (str) – API base URL

  • sample_rate (int | None) – Output sample rate

  • params (InputParams | None) – Additional parameters for voice configuration

class InputParams(*, language=None, optimize_streaming_latency=None, stability=None, similarity_boost=None, style=None, use_speaker_boost=None, speed=None)[source]

Bases: BaseModel

Parameters:
  • language (Language | None)

  • optimize_streaming_latency (int | None)

  • stability (float | None)

  • similarity_boost (float | None)

  • style (float | None)

  • use_speaker_boost (bool | None)

  • speed (float | None)

language: Language | None
optimize_streaming_latency: int | None
stability: float | None
similarity_boost: float | None
style: float | None
use_speaker_boost: bool | None
speed: float | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.
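
A construction sketch for the HTTP variant, assuming an ELEVENLABS_API_KEY environment variable and a caller-managed aiohttp session:

    import os

    import aiohttp

    from pipecat.services.elevenlabs.tts import ElevenLabsHttpTTSService

    async def build_tts(session: aiohttp.ClientSession) -> ElevenLabsHttpTTSService:
        return ElevenLabsHttpTTSService(
            api_key=os.environ["ELEVENLABS_API_KEY"],
            voice_id="your-voice-id",  # placeholder voice ID
            aiohttp_session=session,
            params=ElevenLabsHttpTTSService.InputParams(
                optimize_streaming_latency=3,  # illustrative; ElevenLabs accepts 0-4
            ),
        )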

language_to_service_language(language)[source]

Convert pipecat Language to ElevenLabs language code.

Parameters:

language (Language)

Return type:

str | None

can_generate_metrics()[source]

Indicate that this service can generate usage metrics.

Return type:

bool

async start(frame)[source]

Initialize the service upon receiving a StartFrame.

Parameters:

frame (StartFrame)

async push_frame(frame, direction=FrameDirection.DOWNSTREAM)[source]

Push a frame downstream with TTS-specific handling.

Parameters:
  • frame (Frame) – The frame to push.

  • direction (FrameDirection) – The direction to push the frame.

calculate_word_times(alignment_info)[source]

Calculate word timing from character alignment data.

Example input data:

    {
        "characters": [" ", "H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"],
        "character_start_times_seconds": [0.0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "character_end_times_seconds": [0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    }

Would produce word times (with cumulative_time=0): [("Hello", 0.1), ("world", 0.5)]

Parameters:

alignment_info (Mapping[str, Any]) – Character timing data from ElevenLabs

Returns:

List of (word, timestamp) pairs

Return type:

List[Tuple[str, float]]
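
A call sketch (tts stands for a constructed ElevenLabsHttpTTSService; the expected output assumes the service's cumulative time is 0):

    alignment = {
        "characters": ["H", "i", " ", "a", "l", "l"],
        "character_start_times_seconds": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
        "character_end_times_seconds": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
    word_times = tts.calculate_word_times(alignment)
    # expected: [("Hi", 0.0), ("all", 0.3)]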

async run_tts(text)[source]

Generate speech from text using ElevenLabs streaming API with timestamps.

Makes a request to the ElevenLabs API to generate audio and timing data. Tracks the duration of each utterance to ensure correct sequencing. Includes previous text as context for better prosody continuity.

Parameters:

text (str) – Text to convert to speech

Yields:

Audio and control frames

Return type:

AsyncGenerator[Frame, None]