TtsService

Base classes for text-to-speech services.

class pipecat.services.tts_service.TTSService(*, aggregate_sentences=True, push_text_frames=True, push_stop_frames=False, stop_frame_timeout_s=2.0, push_silence_after_stop=False, silence_time_s=2.0, pause_frame_processing=False, sample_rate=None, text_aggregator=None, text_filters=None, text_filter=None, transport_destination=None, **kwargs)

Bases: AIService

Base class for text-to-speech services.

Provides common functionality for TTS services including text aggregation, filtering, audio generation, and frame management. Supports configurable sentence aggregation, silence insertion, and frame processing control.

Parameters:

aggregate_sentences (bool) – Whether to aggregate text into sentences before synthesis.
push_text_frames (bool) – Whether to push TextFrames and LLMFullResponseEndFrames.
push_stop_frames (bool) – Whether to automatically push TTSStoppedFrames.
stop_frame_timeout_s (float) – Idle time before pushing TTSStoppedFrame when push_stop_frames is True.
push_silence_after_stop (bool) – Whether to push silence audio after TTSStoppedFrame.
silence_time_s (float) – Duration of silence to push when push_silence_after_stop is True.
pause_frame_processing (bool) – Whether to pause frame processing during audio generation.
sample_rate (int | None) – Output sample rate for generated audio.
text_aggregator (BaseTextAggregator | None) – Custom text aggregator for processing incoming text.
text_filters (Sequence[BaseTextFilter] | None) – Sequence of text filters to apply after aggregation.
text_filter (BaseTextFilter | None) – Single text filter (deprecated, use text_filters).
transport_destination (str | None) – Destination for generated audio frames.
**kwargs – Additional arguments passed to the parent AIService.
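
To make the interplay of aggregate_sentences and text_filters more concrete, the following is a minimal sketch of buffering streamed text until a sentence boundary and then applying filters in order. `SentenceAggregator`, `strip_markdown`, and `synthesize_inputs` are hypothetical names used only for illustration; they are not pipecat's `BaseTextAggregator` or `BaseTextFilter` implementations, and the boundary rule is an assumption.

```python
import re
from typing import Callable, List, Optional, Sequence


class SentenceAggregator:
    """Hypothetical aggregator: buffers text and releases one sentence at a time."""

    # Assumed boundary rule: ., ! or ? followed by whitespace or end of buffer.
    # A real aggregator may want to wait for more text before trusting a
    # terminator that sits at the very end of the buffer.
    _END = re.compile(r"[.!?](?=\s|$)")

    def __init__(self) -> None:
        self._buffer = ""

    def aggregate(self, text: str) -> Optional[str]:
        """Append text; return a complete sentence if one is ready, else None."""
        self._buffer += text
        match = self._END.search(self._buffer)
        if match is None:
            return None
        sentence = self._buffer[: match.end()].strip()
        self._buffer = self._buffer[match.end():]
        return sentence


def strip_markdown(text: str) -> str:
    """Hypothetical filter: remove asterisks and backticks before synthesis."""
    return re.sub(r"[*`]", "", text)


def synthesize_inputs(
    tokens: Sequence[str], filters: Sequence[Callable[[str], str]]
) -> List[str]:
    """Return the filtered sentences that would be handed to the TTS backend."""
    aggregator = SentenceAggregator()
    ready: List[str] = []
    for token in tokens:
        sentence = aggregator.aggregate(token)
        if sentence is not None:
            for text_filter in filters:  # filters run after aggregation
                sentence = text_filter(sentence)
            ready.append(sentence)
    return ready
```

Calling `synthesize_inputs(["Hello ", "*world*. How", " are you", "?"], [strip_markdown])` releases two sentences, each filtered after it has been aggregated.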

property sample_rate: int

Get the current sample rate for audio output.

Returns: The sample rate in Hz.

property chunk_size: int

Get the recommended chunk size for audio streaming.

This property indicates how much audio is downloaded (from TTS services that require chunking) before the first audio frame is pushed. Buffering this amount up front lets the rest of the audio download while playback is already under way, which avoids audio glitches, especially at the beginning. The result also depends on how quickly the TTS service generates bytes.

Returns: The recommended chunk size in bytes.
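
As a rough illustration of how a chunk size in bytes can follow from the sample rate, the sketch below multiplies a buffering window by the byte rate of the audio. The 0.5 second window, 16-bit samples, and mono channel count are assumptions chosen for the example, and `chunk_size_bytes` is a hypothetical helper rather than the property's actual implementation.

```python
def chunk_size_bytes(
    sample_rate: int,
    buffer_seconds: float = 0.5,  # assumed buffering window
    bytes_per_sample: int = 2,    # assumed 16-bit PCM
    channels: int = 1,            # assumed mono
) -> int:
    """Bytes of audio to download before the first frame is pushed."""
    return int(sample_rate * buffer_seconds * bytes_per_sample * channels)
```

Under these assumptions, a 24000 Hz stream buffers 24000 bytes, which is half a second of audio.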

async set_model(model)

Set the TTS model to use.

Parameters: model (str) – The name of the TTS model.

set_voice(voice)

Set the voice for speech synthesis.

Parameters: voice (str) – The voice identifier or name.

abstractmethod async run_tts(text)

Run text-to-speech synthesis on the provided text.

This method must be implemented by subclasses to provide actual TTS functionality.

Parameters: text (str) – The text to synthesize into speech.
Yields: Frame – Audio frames containing the synthesized speech.
Return type: AsyncGenerator[Frame, None]
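
The shape of a run_tts implementation (an abstract method that subclasses override with an async generator) can be sketched without pipecat itself. In the example below, `AudioFrame`, `SketchTTS`, `SilenceTTS`, and `collect` are hypothetical stand-ins invented for illustration; a real subclass would derive from TTSService and yield pipecat frames.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class AudioFrame:
    """Hypothetical stand-in for an audio frame."""
    audio: bytes
    sample_rate: int


class SketchTTS(ABC):
    """Hypothetical base class mirroring the abstract run_tts contract."""

    @abstractmethod
    def run_tts(self, text: str) -> AsyncGenerator[AudioFrame, None]:
        ...


class SilenceTTS(SketchTTS):
    """Toy backend: yields 10 ms of 16-bit silence per input character."""

    def __init__(self, sample_rate: int = 16000) -> None:
        self.sample_rate = sample_rate

    async def run_tts(self, text: str) -> AsyncGenerator[AudioFrame, None]:
        samples_per_char = self.sample_rate // 100  # 10 ms worth of samples
        for _ in text:
            await asyncio.sleep(0)  # yield control, as a network read would
            yield AudioFrame(b"\x00\x00" * samples_per_char, self.sample_rate)


async def collect(tts: SketchTTS, text: str) -> List[AudioFrame]:
    """Consume the generator the way a caller would, frame by frame."""
    return [frame async for frame in tts.run_tts(text)]
```

Because frames are yielded as they become available, a consumer can start forwarding audio before synthesis of the full text has finished.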

language_to_service_language(language)

Convert a language to the service-specific language format.

Parameters: language (Language) – The language to convert.
Returns: The service-specific language identifier, or None if not supported.
Return type: str | None
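
One plausible way for a subclass to implement this conversion is a lookup table with a fallback to the base language. The sketch below assumes a hypothetical `Lang` enum and made-up service codes; it does not reproduce pipecat's `Language` enum or any particular provider's identifiers.

```python
from enum import Enum
from typing import Optional


class Lang(Enum):
    """Hypothetical stand-in for a language enum."""
    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    JA = "ja"


# Made-up service identifiers, for illustration only.
_SERVICE_CODES = {Lang.EN: "english", Lang.FR: "french"}


def language_to_service_language(language: Lang) -> Optional[str]:
    """Return the service code, or None if the language is not supported."""
    code = _SERVICE_CODES.get(language)
    if code is not None:
        return code
    # Assumed fallback: try the base language, e.g. "en-US" -> "en".
    base = language.value.split("-")[0]
    for candidate, service_code in _SERVICE_CODES.items():
        if candidate.value == base:
            return service_code
    return None
```

Returning None rather than raising lets the caller decide whether to keep a default language or report the unsupported one.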

async update_setting(key, value)

Update a service-specific setting.

Parameters:

key (str) – The setting key to update.
value (Any) – The new value for the setting.

async flush_audio(): Flush any buffered audio data.

async start(frame)

Start the TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the TTS service.

Parameters: frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the TTS service.

Parameters: frame (CancelFrame) – The cancel frame.

async say(text)

Immediately speak the provided text.

Parameters: text (str) – The text to speak.

async process_frame(frame, direction)

Process frames for text-to-speech conversion.

Handles TextFrames for synthesis, interruption frames, settings updates, and various control frames.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.

async push_frame(frame, direction=FrameDirection.DOWNSTREAM)

Push a frame downstream with TTS-specific handling.

Parameters:

frame (Frame) – The frame to push.
direction (FrameDirection) – The direction to push the frame.

class pipecat.services.tts_service.WordTTSService(**kwargs)

Bases: TTSService

Base class for TTS services that support word timestamps.

Word timestamps synchronize the audio with the text of the spoken words, so that only the words actually spoken are added to the conversation context.

Parameters: **kwargs – Additional arguments passed to the parent TTSService.

start_word_timestamps(): Start tracking word timestamps from the current time.

reset_word_timestamps(): Reset word timestamp tracking.

async add_word_timestamps(word_times)

Add word timestamps to the processing queue.

Parameters: word_times (List[Tuple[str, float]]) – List of (word, timestamp) tuples where timestamp is in seconds.
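
The idea behind start, reset, and add can be sketched as a small clock that turns per-word offsets into absolute times, so that an interruption at a given moment reveals which words were actually spoken. `WordClock` and its methods are hypothetical names for illustration, and treating each timestamp as an offset from the start time is an assumption of this sketch, not a statement about pipecat's internals.

```python
from typing import List, Optional, Tuple


class WordClock:
    """Hypothetical tracker: maps relative word offsets to absolute times."""

    def __init__(self) -> None:
        self._start: Optional[float] = None
        self.words: List[Tuple[str, float]] = []

    def start(self, now: float) -> None:
        # Only the first call sets the reference time, until reset() is called.
        if self._start is None:
            self._start = now

    def reset(self) -> None:
        self._start = None
        self.words.clear()

    def add(self, word_times: List[Tuple[str, float]]) -> None:
        """Record (word, offset_seconds) pairs relative to the start time."""
        if self._start is None:
            raise RuntimeError("call start() before adding word timestamps")
        for word, offset_s in word_times:
            self.words.append((word, self._start + offset_s))

    def spoken_by(self, now: float) -> List[str]:
        """Words whose scheduled time has passed, e.g. at an interruption."""
        return [word for word, at in self.words if at <= now]
```

For example, with a start time of 100.0 and words at offsets 0.0, 0.4, and 0.9 seconds, an interruption at 100.5 leaves only the first two words as spoken.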

async start(frame)

Start the word TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the word TTS service.

Parameters: frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the word TTS service.

Parameters: frame (CancelFrame) – The cancel frame.

async process_frame(frame, direction)

Process frames with word timestamp awareness.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.

class pipecat.services.tts_service.WebsocketTTSService(*, reconnect_on_error=True, **kwargs)

Bases: TTSService, WebsocketService

Base class for websocket-based TTS services.

Combines TTS functionality with websocket connectivity, providing automatic error handling and reconnection capabilities.

Parameters:

reconnect_on_error (bool) – Whether to automatically reconnect on websocket errors.
**kwargs – Additional arguments passed to parent classes.

Event handlers: on_connection_error – Called when a websocket connection error occurs.

Example

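```python
from loguru import logger  # any logger works here
from pipecat.services.tts_service import TTSService

# `tts` is an existing websocket-based TTS service instance.
@tts.event_handler("on_connection_error")
async def on_connection_error(tts: TTSService, error: str):
    logger.error(f"TTS connection error: {error}")
```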

class pipecat.services.tts_service.InterruptibleTTSService(**kwargs)

Bases: WebsocketTTSService

Websocket-based TTS service that handles interruptions without word timestamps.

Designed for TTS services that don’t support word timestamps. Handles interruptions by reconnecting the websocket when the bot is speaking and gets interrupted.

Parameters: **kwargs – Additional arguments passed to the parent WebsocketTTSService.
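
The interruption strategy described above (reconnect only if the bot is speaking when the interruption arrives) amounts to a small piece of state tracking. The following sketch uses a hypothetical `ReconnectOnInterrupt` class and plain string event names; it counts reconnects instead of touching a real websocket and is not pipecat's implementation.

```python
import asyncio
from typing import Iterable


class ReconnectOnInterrupt:
    """Hypothetical tracker for the bot speaking state."""

    def __init__(self) -> None:
        self.bot_speaking = False
        self.reconnects = 0

    async def _reconnect(self) -> None:
        # A real service would close and reopen its websocket here.
        self.reconnects += 1

    async def handle(self, event: str) -> None:
        if event == "bot_started_speaking":
            self.bot_speaking = True
        elif event == "bot_stopped_speaking":
            self.bot_speaking = False
        elif event == "interruption" and self.bot_speaking:
            # Without word timestamps or contexts, pending audio cannot be
            # cancelled selectively, so the connection is dropped instead.
            await self._reconnect()
            self.bot_speaking = False


async def run_events(events: Iterable[str]) -> ReconnectOnInterrupt:
    tracker = ReconnectOnInterrupt()
    for event in events:
        await tracker.handle(event)
    return tracker
```

An interruption that arrives while the bot is silent is ignored, which avoids needless reconnects.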

async process_frame(frame, direction)

Process frames with bot speaking state tracking.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.

class pipecat.services.tts_service.WebsocketWordTTSService(*, reconnect_on_error=True, **kwargs)

Bases: WordTTSService, WebsocketService

Base class for websocket-based TTS services that support word timestamps.

Combines word timestamp functionality with websocket connectivity.

Parameters:

reconnect_on_error (bool) – Whether to automatically reconnect on websocket errors.
**kwargs – Additional arguments passed to parent classes.

Event handlers: on_connection_error – Called when a websocket connection error occurs.

Example

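```python
from loguru import logger  # any logger works here
from pipecat.services.tts_service import TTSService

# `tts` is an existing websocket-based TTS service instance.
@tts.event_handler("on_connection_error")
async def on_connection_error(tts: TTSService, error: str):
    logger.error(f"TTS connection error: {error}")
```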

class pipecat.services.tts_service.InterruptibleWordTTSService(**kwargs)

Bases: WebsocketWordTTSService

Websocket-based TTS service with word timestamps that handles interruptions.

For TTS services that support word timestamps but can’t correlate generated audio with requested text. Handles interruptions by reconnecting when needed.

Parameters: **kwargs – Additional arguments passed to the parent WebsocketWordTTSService.

async process_frame(frame, direction)

Process frames with bot speaking state tracking.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.

class pipecat.services.tts_service.AudioContextWordTTSService(**kwargs)

Bases: WebsocketWordTTSService

Websocket-based TTS service with word timestamps and audio context management.

This is a base class for websocket-based TTS services that support word timestamps and also allow correlating the generated audio with the requested text.

Each request may span multiple sentences, which are grouped by context. For this to work, the TTS service must support handling multiple requests at once (i.e. multiple simultaneous contexts).

The audio received from the TTS will be played in context order. That is, if we requested audio for a context “A” and then audio for context “B”, the audio from context ID “A” will be played first.

Parameters: **kwargs – Additional arguments passed to the parent WebsocketWordTTSService.
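
The ordering guarantee described above can be sketched with one queue per context plus a queue that records creation order. `AudioContexts` and `demo` are hypothetical names, raw bytes stand in for audio frames, and using None as an end-of-context marker is an assumption of this sketch rather than a description of pipecat's internals.

```python
import asyncio
from typing import Dict, List


class AudioContexts:
    """Hypothetical registry that plays audio back in context creation order."""

    def __init__(self) -> None:
        self._order: asyncio.Queue = asyncio.Queue()
        self._contexts: Dict[str, asyncio.Queue] = {}

    async def create(self, context_id: str) -> None:
        self._contexts[context_id] = asyncio.Queue()
        await self._order.put(context_id)

    async def append(self, context_id: str, chunk: bytes) -> None:
        await self._contexts[context_id].put(chunk)

    async def remove(self, context_id: str) -> None:
        # None marks the end of this context's audio (assumed convention).
        await self._contexts[context_id].put(None)

    def available(self, context_id: str) -> bool:
        return context_id in self._contexts

    async def play_all(self, count: int) -> List[bytes]:
        """Drain `count` contexts, oldest first, and return the played chunks."""
        played: List[bytes] = []
        for _ in range(count):
            context_id = await self._order.get()
            queue = self._contexts[context_id]
            while (chunk := await queue.get()) is not None:
                played.append(chunk)
            del self._contexts[context_id]
        return played


async def demo() -> List[bytes]:
    contexts = AudioContexts()
    await contexts.create("A")
    await contexts.create("B")
    # Audio for "B" arrives before audio for "A", interleaved.
    await contexts.append("B", b"b1")
    await contexts.append("A", b"a1")
    await contexts.append("B", b"b2")
    await contexts.append("A", b"a2")
    await contexts.remove("A")
    await contexts.remove("B")
    return await contexts.play_all(2)
```

Even though chunks for context "B" arrive first in `demo`, all of "A" is played before any of "B", because playback follows the order in which the contexts were created.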

async create_audio_context(context_id)

Create a new audio context for grouping related audio.

Parameters: context_id (str) – Unique identifier for the audio context.

async append_to_audio_context(context_id, frame)

Append audio to an existing context.

Parameters:

context_id (str) – The context to append audio to.
frame (TTSAudioRawFrame) – The audio frame to append.

async remove_audio_context(context_id)

Remove an existing audio context.

Parameters: context_id (str) – The context to remove.

audio_context_available(context_id)

Check whether the given audio context is registered.

Parameters: context_id (str) – The context ID to check.
Returns: True if the context exists and is available.
Return type: bool

async start(frame)

Start the audio context TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the audio context TTS service.

Parameters: frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the audio context TTS service.

Parameters: frame (CancelFrame) – The cancel frame.