TTS

pipecat.services.rime.tts.language_to_rime_language(language)[source]

Convert pipecat Language to Rime language code.

Parameters:

language (Language) – The pipecat Language enum value.

Returns:

Three-letter language code used by Rime (e.g., ‘eng’ for English).

Return type:

str
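
For example (a minimal usage sketch; the Language import path is assumed to be pipecat's transcriptions module):

    from pipecat.transcriptions.language import Language
    from pipecat.services.rime.tts import language_to_rime_language

    # Map pipecat's Language enum to Rime's three-letter code.
    rime_code = language_to_rime_language(Language.EN)
    print(rime_code)  # 'eng'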

class pipecat.services.rime.tts.RimeTTSService(*, api_key, voice_id, url='wss://users.rime.ai/ws2', model='mistv2', sample_rate=None, params=None, text_aggregator=None, **kwargs)[source]

Bases: AudioContextWordTTSService

Text-to-Speech service using Rime’s websocket API.

Uses Rime’s websocket JSON API to convert text to speech with word-level timing information. Supports interruptions and maintains context across multiple messages within a turn.

Parameters:
  • api_key (str)

  • voice_id (str)

  • url (str)

  • model (str)

  • sample_rate (int | None)

  • params (InputParams | None)

  • text_aggregator (BaseTextAggregator | None)
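
A minimal construction sketch (the API key lookup and voice ID below are illustrative placeholders, not real values):

    import os
    from pipecat.services.rime.tts import RimeTTSService

    tts = RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],   # assumes the key is exported in the environment
        voice_id="your-rime-voice-id",        # placeholder; use a voice from your Rime account
    )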

class InputParams(*, language=Language.EN, speed_alpha=1.0, reduce_latency=False, pause_between_brackets=False, phonemize_between_brackets=False)[source]

Bases: BaseModel

Configuration parameters for Rime TTS service.

Parameters:
  • language (Language | None)

  • speed_alpha (float | None)

  • reduce_latency (bool | None)

  • pause_between_brackets (bool | None)

  • phonemize_between_brackets (bool | None)

language: Language | None
speed_alpha: float | None
reduce_latency: bool | None
pause_between_brackets: bool | None
phonemize_between_brackets: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
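
For example, non-default parameters can be passed at construction time (a sketch; parameter semantics follow Rime's API):

    from pipecat.transcriptions.language import Language
    from pipecat.services.rime.tts import RimeTTSService

    params = RimeTTSService.InputParams(
        language=Language.EN,
        speed_alpha=1.0,       # speech-rate multiplier, interpreted by Rime
        reduce_latency=True,   # ask Rime for lower latency (see Rime's docs for trade-offs)
    )
    tts = RimeTTSService(api_key="...", voice_id="your-rime-voice-id", params=params)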

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert pipecat language to Rime language code.

Parameters:

language (Language)

Return type:

str | None

async set_model(model)[source]

Update the TTS model.

Parameters:

model (str)

async start(frame)[source]

Start the service and establish websocket connection.

Parameters:

frame (StartFrame)

async stop(frame)[source]

Stop the service and close connection.

Parameters:

frame (EndFrame)

async cancel(frame)[source]

Cancel current operation and clean up.

Parameters:

frame (CancelFrame)

async flush_audio()[source]

Flush any buffered audio data.

async push_frame(frame, direction=FrameDirection.DOWNSTREAM)[source]

Push frame and handle end-of-turn conditions.

Parameters:
  • frame (Frame)

  • direction (FrameDirection)

async run_tts(text)[source]

Generate speech from text.

Parameters:

text (str) – The text to convert to speech.

Yields:

Frames containing audio data and timing information.

Return type:

AsyncGenerator[Frame, None]
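
In a pipeline the framework normally drives synthesis, but the generator can also be consumed directly; a sketch (assumes the service has already been started, and "handle" is a hypothetical consumer):

    async def speak(tts: RimeTTSService, text: str):
        async for frame in tts.run_tts(text):
            # Each yielded Frame carries audio data and/or word-timing information.
            handle(frame)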

class pipecat.services.rime.tts.RimeHttpTTSService(*, api_key, voice_id, aiohttp_session, model='mistv2', sample_rate=None, params=None, **kwargs)[source]

Bases: TTSService

Text-to-Speech service using Rime's HTTP API.

Parameters:
  • api_key (str)

  • voice_id (str)

  • aiohttp_session (ClientSession)

  • model (str)

  • sample_rate (int | None)

  • params (InputParams | None)
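
A construction sketch, assuming the caller creates and later closes the aiohttp session:

    import aiohttp
    from pipecat.services.rime.tts import RimeHttpTTSService

    async def create_tts() -> RimeHttpTTSService:
        session = aiohttp.ClientSession()   # close this yourself when the pipeline shuts down
        return RimeHttpTTSService(
            api_key="...",                  # your Rime API key
            voice_id="your-rime-voice-id",  # placeholder voice ID
            aiohttp_session=session,
        )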

class InputParams(*, language=Language.EN, pause_between_brackets=False, phonemize_between_brackets=False, inline_speed_alpha=None, speed_alpha=1.0, reduce_latency=False)[source]

Bases: BaseModel

Configuration parameters for Rime HTTP TTS service.

Parameters:
  • language (Language | None)

  • pause_between_brackets (bool | None)

  • phonemize_between_brackets (bool | None)

  • inline_speed_alpha (str | None)

  • speed_alpha (float | None)

  • reduce_latency (bool | None)

language: Language | None
pause_between_brackets: bool | None
phonemize_between_brackets: bool | None
inline_speed_alpha: str | None
speed_alpha: float | None
reduce_latency: bool | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
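
Constructing the HTTP variant's parameters works the same way (a sketch; the bracket-related flags and the inline_speed_alpha string format follow Rime's own conventions):

    from pipecat.services.rime.tts import RimeHttpTTSService

    params = RimeHttpTTSService.InputParams(
        pause_between_brackets=True,   # let bracketed values in the text control pauses (per Rime)
        speed_alpha=1.0,               # speech-rate multiplier
    )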

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert pipecat language to Rime language code.

Parameters:

language (Language)

Return type:

str | None

async run_tts(text)[source]

Generate speech from text using Rime's HTTP API.

Parameters:

text (str) – The text to synthesize into speech.

Yields:

Frame – Audio frames containing the synthesized speech.

Return type:

AsyncGenerator[Frame, None]