TTS

class pipecat.services.openai.tts.OpenAITTSService(*, api_key=None, base_url=None, voice='alloy', model='gpt-4o-mini-tts', sample_rate=None, instructions=None, **kwargs)

Bases: TTSService

OpenAI Text-to-Speech service that generates audio from text.

This service uses the OpenAI TTS API to generate PCM-encoded audio at 24 kHz.

Parameters:

api_key (str | None) – OpenAI API key. Defaults to None.
voice (str) – Voice ID to use. Defaults to “alloy”.
model (str) – TTS model to use. Defaults to “gpt-4o-mini-tts”.
sample_rate (int | None) – Output audio sample rate in Hz. Defaults to None.
**kwargs – Additional keyword arguments passed to TTSService.
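base_url (str | None) – Custom base URL for the OpenAI API. Defaults to None.
instructions (str | None) – Optional instructions that guide how the speech is generated. Defaults to None.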

The service returns PCM-encoded audio at the specified sample rate.
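As a rough illustration of the constructor parameters above, the sketch below collects keyword arguments that match the documented signature and passes them to the class. The helper names `openai_tts_kwargs` and `create_tts`, and reading the key from an `OPENAI_API_KEY` environment variable, are assumptions made for this example and are not part of the documented API.

```python
import os


def openai_tts_kwargs(voice="alloy", model="gpt-4o-mini-tts", instructions=None):
    """Collect keyword arguments matching the documented constructor signature.

    Hypothetical helper: not part of pipecat.
    """
    kwargs = {
        # Assumption: the key is stored in the OPENAI_API_KEY environment variable.
        "api_key": os.getenv("OPENAI_API_KEY"),
        "voice": voice,
        "model": model,
    }
    # Only pass instructions when the caller supplied some.
    if instructions is not None:
        kwargs["instructions"] = instructions
    return kwargs


def create_tts(**overrides):
    """Build the service; requires pipecat with its OpenAI extra installed."""
    # Imported lazily so the helper above works without pipecat installed.
    from pipecat.services.openai.tts import OpenAITTSService

    return OpenAITTSService(**openai_tts_kwargs(**overrides))
```

Calling `create_tts(instructions="Speak calmly.")` would then construct the service with the default voice and model.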

OPENAI_SAMPLE_RATE = 24000
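To make the 24 kHz figure concrete, the following sketch converts between raw PCM byte counts and audio duration. It assumes 16-bit mono samples, which matches OpenAI's description of its `pcm` output format but is not stated on this page; the function names are hypothetical.

```python
OPENAI_SAMPLE_RATE = 24000  # Hz, mirrors the class attribute documented above
BYTES_PER_SAMPLE = 2        # assumption: 16-bit signed samples
NUM_CHANNELS = 1            # assumption: mono audio


def pcm_duration_seconds(num_bytes, sample_rate=OPENAI_SAMPLE_RATE):
    """Return the playback duration of a raw PCM buffer in seconds."""
    bytes_per_second = sample_rate * BYTES_PER_SAMPLE * NUM_CHANNELS
    return num_bytes / bytes_per_second


def pcm_byte_count(seconds, sample_rate=OPENAI_SAMPLE_RATE):
    """Return how many bytes of raw PCM are needed for the given duration."""
    num_samples = int(seconds * sample_rate)
    return num_samples * BYTES_PER_SAMPLE * NUM_CHANNELS
```

Under these assumptions, one second of audio occupies 48,000 bytes.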

can_generate_metrics()

Return type: bool

async set_model(model)

Set the TTS model to use.

Parameters: model (str) – The name of the TTS model.

async start(frame)

Start the TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async run_tts(text)

Run text-to-speech synthesis on the provided text.

OpenAITTSService implements this method, which the TTSService base class requires of its subclasses, by calling the OpenAI TTS API.

Parameters: text (str) – The text to synthesize into speech.
Yields: Frame – Audio frames containing the synthesized speech.
Return type: AsyncGenerator[Frame, None]
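Because run_tts is an async generator, its frames are consumed with `async for`. The sketch below shows that pattern using stand-ins only: `FakeAudioFrame`, `FakeTTSService`, and `collect_audio` are hypothetical names invented for this example, and the assumption that audio frames carry their bytes in an `audio` attribute is not stated on this page. In a real application the pipeline normally drives run_tts rather than user code.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for an audio frame; not a pipecat class."""
    audio: bytes
    sample_rate: int = 24000


class FakeTTSService:
    """Hypothetical stand-in exposing the same run_tts shape as documented."""

    async def run_tts(self, text: str) -> AsyncGenerator[FakeAudioFrame, None]:
        # Emit one silent 16-bit sample per character, one frame per word.
        for word in text.split():
            yield FakeAudioFrame(audio=b"\x00\x00" * len(word))


async def collect_audio(service, text: str) -> bytes:
    """Iterate the async generator and join the audio payloads."""
    chunks = []
    async for frame in service.run_tts(text):
        # Skip frames that carry no audio (for example, control frames).
        audio = getattr(frame, "audio", None)
        if audio:
            chunks.append(audio)
    return b"".join(chunks)


if __name__ == "__main__":
    pcm = asyncio.run(collect_audio(FakeTTSService(), "hello world"))
    print(len(pcm), "bytes of PCM collected")
```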