TTS

class pipecat.services.openai.tts.OpenAITTSService(*, api_key=None, base_url=None, voice='alloy', model='gpt-4o-mini-tts', sample_rate=None, instructions=None, **kwargs)

Bases: TTSService

OpenAI Text-to-Speech service that generates audio from text.

This service uses the OpenAI TTS API to generate PCM-encoded audio at 24 kHz.

Parameters:
  • api_key (str | None) – OpenAI API key. Defaults to None.

  • base_url (str | None) – Base URL for the OpenAI API. Defaults to None.

  • voice (str) – Voice ID to use. Defaults to “alloy”.

  • model (str) – TTS model to use. Defaults to “gpt-4o-mini-tts”.

  • sample_rate (int | None) – Output audio sample rate in Hz. Defaults to None.

  • instructions (str | None) – Optional instructions that guide how the speech is synthesized. Defaults to None.

  • **kwargs – Additional keyword arguments passed to TTSService.

The service returns PCM-encoded audio at the specified sample rate.
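As a rough illustration of the constructor parameters listed above, the sketch below gathers them into keyword arguments and passes them to the service. The helper names `openai_tts_kwargs` and `build_tts`, and the use of an `OPENAI_API_KEY` environment variable, are assumptions made for this example and are not part of the documented API.

```python
import os


def openai_tts_kwargs(voice="alloy", model="gpt-4o-mini-tts", instructions=None):
    """Collect keyword arguments for OpenAITTSService (hypothetical helper)."""
    kwargs = {
        # Assumption: the key is stored in the OPENAI_API_KEY environment variable.
        "api_key": os.environ.get("OPENAI_API_KEY"),
        "voice": voice,
        "model": model,
    }
    # Only forward instructions when the caller supplied some.
    if instructions is not None:
        kwargs["instructions"] = instructions
    return kwargs


def build_tts(**overrides):
    """Construct the service from the collected arguments (hypothetical helper)."""
    # Imported lazily so the helper above can be used without pipecat installed.
    from pipecat.services.openai.tts import OpenAITTSService

    return OpenAITTSService(**openai_tts_kwargs(**overrides))
```

With pipecat installed, `build_tts(instructions="Speak in a calm, even tone.")` would construct a service that uses the default voice and model.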

OPENAI_SAMPLE_RATE = 24000
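To make the 24000 Hz figure concrete, the following sketch converts between raw PCM byte counts and playback duration. The 16-bit sample width and single channel are assumptions for illustration (the reference above states only the sample rate), and both function names are hypothetical.

```python
OPENAI_SAMPLE_RATE = 24000  # Hz, as documented above
BYTES_PER_SAMPLE = 2        # assumption: 16-bit samples
NUM_CHANNELS = 1            # assumption: mono audio


def pcm_duration_seconds(num_bytes, sample_rate=OPENAI_SAMPLE_RATE):
    """Return the playback duration of a raw PCM buffer in seconds."""
    bytes_per_second = sample_rate * BYTES_PER_SAMPLE * NUM_CHANNELS
    return num_bytes / bytes_per_second


def pcm_bytes_for(seconds, sample_rate=OPENAI_SAMPLE_RATE):
    """Return the number of bytes needed to hold `seconds` of audio."""
    return int(seconds * sample_rate * BYTES_PER_SAMPLE * NUM_CHANNELS)
```

Under these assumptions one second of audio occupies 48000 bytes, which is a handy figure when sizing buffers.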

can_generate_metrics()

Indicate whether this service can generate metrics.

Return type:

bool

async set_model(model)

Set the TTS model to use.

Parameters:

model (str) – The name of the TTS model.

async start(frame)

Start the TTS service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async run_tts(text)

Run text-to-speech synthesis on the provided text.

The text is sent to the OpenAI TTS API, and the synthesized audio is yielded as frames.

Parameters:

text (str) – The text to synthesize into speech.

Yields:

Frame – Audio frames containing the synthesized speech.

Return type:

AsyncGenerator[Frame, None]
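Because run_tts is an async generator, its output is consumed with `async for`. The sketch below shows that pattern using stand-ins so it runs without network access. `FakeAudioFrame`, `FakeTTS`, and `collect_audio` are hypothetical names, and the `audio` attribute on frames is an assumption made for this example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for an audio frame."""
    audio: bytes
    sample_rate: int


class FakeTTS:
    """Hypothetical stand-in exposing the same run_tts(text) shape."""

    async def run_tts(self, text):
        # Yield one 2400-byte chunk of silence per word
        # (50 ms at 24 kHz, assuming 16-bit mono samples).
        for _ in text.split():
            yield FakeAudioFrame(audio=b"\x00" * 2400, sample_rate=24000)


async def collect_audio(service, text):
    """Drain run_tts and concatenate the audio payload of each frame."""
    chunks = []
    async for frame in service.run_tts(text):
        # Frames without an audio payload are skipped.
        audio = getattr(frame, "audio", None)
        if audio:
            chunks.append(audio)
    return b"".join(chunks)
```

Calling `asyncio.run(collect_audio(FakeTTS(), "hello there world"))` gathers three chunks into a single bytes object.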