TTS

pipecat.services.minimax.tts.language_to_minimax_language(language)
Parameters:

language (Language)

Return type:

str | None

class pipecat.services.minimax.tts.MiniMaxHttpTTSService(*, api_key, group_id, model='speech-02-turbo', voice_id='Calm_Woman', aiohttp_session, sample_rate=None, params=None, **kwargs)

Bases: TTSService

Text-to-speech service using MiniMax’s T2A (Text-to-Audio) API.

Platform documentation: https://www.minimax.io/platform/document/T2A%20V2?key=66719005a427f0c8a5701643

Parameters:
  • api_key (str) – MiniMax API key for authentication.

  • group_id (str) – MiniMax Group ID that identifies the project.

  • model (str) – TTS model name (default: “speech-02-turbo”). Options include “speech-02-hd”, “speech-02-turbo”, “speech-01-hd”, “speech-01-turbo”.

  • voice_id (str) – Voice identifier (default: “Calm_Woman”).

  • aiohttp_session (ClientSession) – aiohttp.ClientSession for API communication.

  • sample_rate (int | None) – Output audio sample rate in Hz (default: None, in which case the rate is taken from the pipeline).

  • params (InputParams | None) – Additional configuration parameters.
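
A minimal construction sketch based on the signature above. The helper minimax_tts_kwargs and the environment variable names MINIMAX_API_KEY and MINIMAX_GROUP_ID are assumptions made for illustration and are not part of pipecat. The pipecat and aiohttp imports are deferred into main() so the helper can be exercised on its own.

```python
import os


def minimax_tts_kwargs(session, env=None):
    """Collect keyword arguments for MiniMaxHttpTTSService (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "api_key": env["MINIMAX_API_KEY"],    # assumed variable name
        "group_id": env["MINIMAX_GROUP_ID"],  # assumed variable name
        "model": "speech-02-turbo",           # documented default
        "voice_id": "Calm_Woman",             # documented default
        "aiohttp_session": session,           # required, owned by the caller
    }


async def main():
    # Imported here so the helper above works without pipecat installed.
    import aiohttp
    from pipecat.services.minimax.tts import MiniMaxHttpTTSService

    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(**minimax_tts_kwargs(session))
        # Build and run the pipeline here, while the session is still open.
        print(type(tts).__name__)
```

All constructor arguments are keyword-only, so passing them as a dictionary keeps the call explicit. To try it with real credentials, run asyncio.run(main()).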

class InputParams(*, language=Language.EN, speed=1.0, volume=1.0, pitch=0, emotion=None, english_normalization=None)

Bases: BaseModel

Configuration parameters for MiniMax TTS.

Parameters:
  • language (Language | None)

  • speed (float | None)

  • volume (float | None)

  • pitch (float | None)

  • emotion (str | None)

  • english_normalization (bool | None)

language

Language for TTS generation.

Type:

pipecat.transcriptions.language.Language | None

speed

Speech speed (range: 0.5 to 2.0).

Type:

float | None

volume

Speech volume (range: 0 to 10).

Type:

float | None

pitch

Pitch adjustment (range: -12 to 12).

Type:

float | None

emotion

Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “neutral”).

Type:

str | None

english_normalization

Whether to apply English text normalization.

Type:

bool | None
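
The documented ranges can be checked before the values are passed to InputParams. The sketch below is a hypothetical stand-alone helper, not part of pipecat. It assumes the range bounds are inclusive, which the reference does not state, and it makes no claim about whether the service validates these values itself.

```python
# Emotion options as listed in the attribute documentation above.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}


def check_params(speed=1.0, volume=1.0, pitch=0, emotion=None):
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    # None means "not set", so it is never reported as out of range.
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append(f"speed {speed} outside 0.5 to 2.0")
    if volume is not None and not 0 <= volume <= 10:
        problems.append(f"volume {volume} outside 0 to 10")
    if pitch is not None and not -12 <= pitch <= 12:
        problems.append(f"pitch {pitch} outside -12 to 12")
    if emotion is not None and emotion not in EMOTIONS:
        problems.append(f"unknown emotion {emotion!r}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller report every out-of-range value at once.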

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

can_generate_metrics()

Return type:

bool

language_to_service_language(language)

Convert a language to the service-specific language format.

Parameters:

language (Language) – The language to convert.

Returns:

The service-specific language identifier, or None if not supported.

Return type:

str | None
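
Because the return type is str | None, callers need to handle unsupported languages. The sketch below shows one way to apply a fallback. The name resolve_language, the stub converter, and the placeholder identifiers are all hypothetical; the real identifiers returned by the service are not listed in this reference.

```python
def resolve_language(convert, language, fallback):
    """Return convert(language), or fallback when the language is unsupported."""
    code = convert(language)
    return fallback if code is None else code


# Stub standing in for language_to_service_language; values are placeholders,
# not real MiniMax language identifiers.
_PLACEHOLDER_TABLE = {"en": "service-en"}
stub_convert = _PLACEHOLDER_TABLE.get

print(resolve_language(stub_convert, "en", "service-default"))
print(resolve_language(stub_convert, "xx", "service-default"))
```

Passing the converter as an argument keeps the helper independent of any particular service instance.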

set_model_name(model)

Set the TTS model to use.

Parameters:

model (str)

set_voice(voice)

Set the voice to use.

Parameters:

voice (str)

async start(frame)

Start the TTS service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async run_tts(text)

Run text-to-speech synthesis on the provided text.

MiniMaxHttpTTSService implements this TTSService method to provide the actual synthesis through the MiniMax API.

Parameters:

text (str) – The text to synthesize into speech.

Yields:

Frame – Audio frames containing the synthesized speech.

Return type:

AsyncGenerator[Frame, None]
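
Since run_tts is an async generator, its frames are consumed with async for. The sketch below illustrates that pattern with stand-ins only: StubAudioFrame, stub_run_tts, and collect_audio are hypothetical, and the assumption that audio frames expose an audio bytes attribute is not confirmed by this reference. In a running pipeline the service is normally driven by frames rather than called directly.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class StubAudioFrame:
    """Stand-in for an audio frame; only an `audio` bytes field is assumed."""
    audio: bytes


async def stub_run_tts(text: str) -> AsyncGenerator[StubAudioFrame, None]:
    # Pretend each word is synthesized into one chunk of audio.
    for word in text.split():
        yield StubAudioFrame(audio=word.encode("utf-8"))


async def collect_audio(frames) -> bytes:
    """Concatenate the audio payload of every frame that carries one."""
    chunks = []
    async for frame in frames:
        # Frames without an audio payload (for example, control frames) are skipped.
        audio = getattr(frame, "audio", None)
        if audio:
            chunks.append(audio)
    return b"".join(chunks)


if __name__ == "__main__":
    print(asyncio.run(collect_audio(stub_run_tts("hello world"))))
```

Collecting the chunks into a list and joining once avoids repeated bytes concatenation while the generator is still producing frames.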