TTS

pipecat.services.minimax.tts.language_to_minimax_language(language)[source]

Parameters: language (Language)
Return type: str | None

class pipecat.services.minimax.tts.MiniMaxHttpTTSService(*, api_key, group_id, model='speech-02-turbo', voice_id='Calm_Woman', aiohttp_session, sample_rate=None, params=None, **kwargs)[source]

Bases: TTSService

Text-to-speech service using MiniMax’s T2A (Text-to-Audio) API.

Platform documentation: https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643

Parameters:

api_key (str) – MiniMax API key for authentication.
group_id (str) – MiniMax Group ID that identifies the project.
model (str) – TTS model name (default: “speech-02-turbo”). Options include “speech-02-hd”, “speech-02-turbo”, “speech-01-hd”, “speech-01-turbo”.
voice_id (str) – Voice identifier (default: “Calm_Woman”).
aiohttp_session (ClientSession) – aiohttp.ClientSession for API communication.
sample_rate (int | None) – Output audio sample rate in Hz (default: None, in which case the rate is taken from the pipeline).
params (InputParams | None) – Additional configuration parameters.
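
As a rough illustration of how the constructor arguments above fit together, the sketch below gathers them into a keyword dictionary. The helper `build_minimax_tts_kwargs` and the environment variable names `MINIMAX_API_KEY` and `MINIMAX_GROUP_ID` are assumptions made for this example and are not part of pipecat. The parameter list says the model options "include" four names, so the strict check here may reject models the service actually accepts.

```python
# Model names listed in the parameter description above (possibly not exhaustive).
DOCUMENTED_MODELS = (
    "speech-02-hd",
    "speech-02-turbo",
    "speech-01-hd",
    "speech-01-turbo",
)


def build_minimax_tts_kwargs(env, model="speech-02-turbo", voice_id="Calm_Woman"):
    """Collect constructor keyword arguments (hypothetical helper, not a pipecat API)."""
    if model not in DOCUMENTED_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {DOCUMENTED_MODELS}")
    return {
        "api_key": env["MINIMAX_API_KEY"],    # assumed variable name
        "group_id": env["MINIMAX_GROUP_ID"],  # assumed variable name
        "model": model,
        "voice_id": voice_id,
    }


# Any mapping works as `env`; in an application this would typically be os.environ.
kwargs = build_minimax_tts_kwargs(
    {"MINIMAX_API_KEY": "test-key", "MINIMAX_GROUP_ID": "test-group"}
)
```

Given an open aiohttp.ClientSession named `session`, the result could then be passed on as `MiniMaxHttpTTSService(aiohttp_session=session, **kwargs)`, following the signature shown above.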

class InputParams(*, language=Language.EN, speed=1.0, volume=1.0, pitch=0, emotion=None, english_normalization=None)[source]

Bases: BaseModel

Configuration parameters for MiniMax TTS.

Parameters:

language (Language | None)
speed (float | None)
volume (float | None)
pitch (float | None)
emotion (str | None)
english_normalization (bool | None)

language

Language for TTS generation.

Type: pipecat.transcriptions.language.Language | None

speed

Speech speed (range: 0.5 to 2.0).

Type: float | None

volume

Speech volume (range: 0 to 10).

Type: float | None

pitch

Pitch adjustment (range: -12 to 12).

Type: float | None

emotion

Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “neutral”).

Type: str | None

english_normalization

Whether to apply English text normalization.

Type: bool | None
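
The documented ranges can be checked before a request is made. The following is a minimal sketch that uses only the standard library; `check_params` is a hypothetical helper, not a pipecat function, and treating the range endpoints as inclusive is an assumption, since the reference does not say whether they are.

```python
# Emotion names listed in the attribute description above.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def check_params(speed=1.0, volume=1.0, pitch=0, emotion=None):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # None means "not set", so it is skipped rather than flagged.
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append(f"speed {speed} is outside 0.5 to 2.0")
    if volume is not None and not 0 <= volume <= 10:
        problems.append(f"volume {volume} is outside 0 to 10")
    if pitch is not None and not -12 <= pitch <= 12:
        problems.append(f"pitch {pitch} is outside -12 to 12")
    if emotion is not None and emotion not in EMOTIONS:
        problems.append(f"emotion {emotion!r} is not a documented option")
    return problems


defaults_ok = check_params()
too_fast = check_params(speed=3.0)
```

Returning a list instead of raising on the first problem lets a caller report all out-of-range values at once.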

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. Should be a dictionary conforming to pydantic.config.ConfigDict.

can_generate_metrics()[source]

Return type: bool

language_to_service_language(language)[source]

Convert a language to the service-specific language format.

Parameters: language (Language) – The language to convert.
Returns: The service-specific language identifier, or None if not supported.
Return type: str | None
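
The "identifier or None" contract described above can be sketched with a plain dictionary lookup. Everything in this example is illustrative: the table entries, the function name `to_service_language`, and the fallback from a regional code to its base language are assumptions, and the real mapping used by language_to_minimax_language is not shown in this reference.

```python
from typing import Optional

# Illustrative table only; the actual identifiers used by the service may differ.
EXAMPLE_LANGUAGE_MAP = {"en": "English", "fr": "French", "ja": "Japanese"}


def to_service_language(code: str) -> Optional[str]:
    """Map a code such as "en" or "en-US" to a service identifier, or None."""
    if code in EXAMPLE_LANGUAGE_MAP:
        return EXAMPLE_LANGUAGE_MAP[code]
    base = code.split("-")[0]  # assumed fallback: "en-US" is looked up as "en"
    return EXAMPLE_LANGUAGE_MAP.get(base)


regional = to_service_language("en-US")
unsupported = to_service_language("xx")
```

Because unsupported languages yield None instead of raising, callers need to check the result before using it.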

set_model_name(model)[source]

Set the TTS model to use.

Parameters: model (str)

set_voice(voice)[source]

Set the voice to use.

Parameters: voice (str)

async start(frame)[source]

Start the TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async run_tts(text)[source]

Run text-to-speech synthesis on the provided text.

This method must be implemented by subclasses to provide actual TTS functionality.

Parameters: text (str) – The text to synthesize into speech.
Yields: Frame – Audio frames containing the synthesized speech.
Return type: AsyncGenerator[Frame, None]
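
Because run_tts is an async generator, its frames are consumed with `async for` as they become available. The sketch below shows that consumption pattern using only the standard library. `FakeAudioFrame` and `fake_run_tts` are hypothetical stand-ins: they do not call MiniMax, and yielding one frame per word is only a way to imitate incremental delivery.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for a pipecat audio frame."""

    audio: bytes


async def fake_run_tts(text: str) -> AsyncGenerator[FakeAudioFrame, None]:
    """Yield one frame per word, imitating audio that arrives in chunks."""
    for word in text.split():
        await asyncio.sleep(0)  # give control back to the event loop, as a network read would
        yield FakeAudioFrame(audio=word.encode("utf-8"))


async def collect(text: str) -> List[FakeAudioFrame]:
    """Drain the generator into a list."""
    return [frame async for frame in fake_run_tts(text)]


frames = asyncio.run(collect("hello from minimax"))
```

In a pipeline the frames would normally be forwarded one at a time instead of being collected, so that playback can begin before synthesis finishes.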