TTS

class pipecat.services.fish.tts.FishAudioTTSService(*, api_key, model, output_format='pcm', sample_rate=None, params=None, **kwargs)[source]

Bases: InterruptibleTTSService

Parameters:
  • api_key (str)

  • model (str)

  • output_format (Literal['opus', 'mp3', 'pcm', 'wav'])

  • sample_rate (int | None)

  • params (InputParams | None)
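A minimal construction sketch (not taken from the reference itself): only the import path, parameter names, and the output_format literals come from the signature above; the environment variable name and the model ID are placeholders.

    import os

    from pipecat.services.fish.tts import FishAudioTTSService

    tts = FishAudioTTSService(
        api_key=os.environ["FISH_API_KEY"],  # hypothetical env var holding your Fish Audio API key
        model="your-model-or-reference-id",  # placeholder Fish Audio model identifier
        output_format="pcm",                 # default; one of "opus", "mp3", "pcm", "wav"
    )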

class InputParams(*, language=Language.EN, latency='normal', prosody_speed=1.0, prosody_volume=0)[source]

Bases: BaseModel

Parameters:
  • language (Language | None)

  • latency (str | None)

  • prosody_speed (float | None)

  • prosody_volume (int | None)

language: Language | None
latency: str | None
prosody_speed: float | None
prosody_volume: int | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.
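A hedged sketch of overriding the InputParams defaults. The Language import path is assumed from Pipecat's usual layout, the latency value is left at its documented default (other accepted values depend on Fish Audio), and the api_key/model strings are placeholders.

    from pipecat.services.fish.tts import FishAudioTTSService
    from pipecat.transcriptions.language import Language  # assumed import path

    params = FishAudioTTSService.InputParams(
        language=Language.EN,  # documented default
        latency="normal",      # documented default; accepted values depend on Fish Audio
        prosody_speed=1.2,     # slightly faster than the 1.0 default
        prosody_volume=0,      # documented default volume offset
    )

    tts = FishAudioTTSService(api_key="...", model="...", params=params)  # placeholders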

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

async set_model(model)[source]

Set the TTS model to use.

Parameters:

model (str) – The name of the TTS model.

async start(frame)[source]

Start the TTS service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the TTS service.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the TTS service.

Parameters:

frame (CancelFrame) – The cancel frame.

async flush_audio()[source]

Flush any buffered audio by sending a flush event to Fish Audio.
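In normal use, start(), stop(), cancel(), and flush_audio() are not called by hand; the Pipecat pipeline delivers StartFrame, EndFrame, and CancelFrame to every processor. The sketch below drives them directly for illustration only, and assumes the pipecat.frames.frames import path and default-constructible frames.

    from pipecat.frames.frames import CancelFrame, EndFrame, StartFrame  # assumed import path

    async def lifecycle_demo(tts):
        # `tts` is a FishAudioTTSService as constructed in the earlier sketch.
        await tts.start(StartFrame())   # normally sent by the pipeline at startup
        # ... text frames flow through the service here ...
        await tts.flush_audio()         # push out any audio still buffered by Fish Audio
        await tts.stop(EndFrame())      # graceful shutdown
        # On user interruption the pipeline would instead send:
        # await tts.cancel(CancelFrame())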

async run_tts(text)[source]

Run text-to-speech synthesis on the provided text.

Sends the text to Fish Audio for synthesis and yields the resulting audio as frames.

Parameters:

text (str) – The text to synthesize into speech.

Yields:

Frame – Audio frames containing the synthesized speech.

Return type:

AsyncGenerator[Frame, None]
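A hedged sketch of consuming the generator directly. In a pipeline the TTS base class calls run_tts() for you as text arrives; TTSAudioRawFrame is an assumption about the concrete frame type that carries the audio bytes, and the service is assumed to have been started already.

    from pipecat.frames.frames import TTSAudioRawFrame  # assumed audio frame type

    async def synthesize_to_bytes(tts, text):
        # Assumes tts.start(...) has already run (e.g., driven by the pipeline).
        audio = bytearray()
        async for frame in tts.run_tts(text):
            if isinstance(frame, TTSAudioRawFrame):  # skip start/stop marker frames
                audio.extend(frame.audio)
        return bytes(audio)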