TTS

pipecat.services.sarvam.tts.language_to_sarvam_language(language)[source]

Convert Pipecat Language enum to Sarvam AI language codes.

Parameters: language (Language)
Return type: str | None
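
The conversion can be pictured as a dictionary lookup that returns None for unsupported languages. The sketch below is self-contained and does not import Pipecat: the `Language` enum here is a stand-in, and `to_sarvam_code` and the codes in the table are illustrative assumptions, not the library's actual mapping.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for Pipecat's Language enum (hypothetical subset)."""
    EN = "en"
    HI = "hi"
    FR = "fr"


# Assumed mapping for illustration only; the real table lives in
# pipecat.services.sarvam.tts and may differ.
_SARVAM_CODES = {
    Language.EN: "en-IN",
    Language.HI: "hi-IN",
}


def to_sarvam_code(language: Language) -> Optional[str]:
    """Return the service code for a language, or None if unsupported."""
    return _SARVAM_CODES.get(language)


supported = to_sarvam_code(Language.HI)
unsupported = to_sarvam_code(Language.FR)
```

Callers should check for None before passing the result to the service, since the return type is `str | None`.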

class pipecat.services.sarvam.tts.SarvamTTSService(*, api_key, voice_id='anushka', model='bulbul:v2', aiohttp_session, base_url='https://api.sarvam.ai', sample_rate=None, params=None, **kwargs)[source]

Bases: TTSService

Text-to-Speech service using Sarvam AI’s API.

Converts text to speech using Sarvam AI’s TTS models with support for multiple Indian languages. Provides control over voice characteristics like pitch, pace, and loudness.

Parameters:

api_key (str) – Sarvam AI API subscription key.
voice_id (str) – Speaker voice ID (e.g., “anushka”, “meera”).
model (str) – TTS model to use (“bulbul:v1” or “bulbul:v2”).
aiohttp_session (ClientSession) – Shared aiohttp session for making requests.
base_url (str) – Sarvam AI API base URL.
sample_rate (int | None) – Audio sample rate in Hz (8000, 16000, 22050, 24000).
params (InputParams | None) – Additional voice and preprocessing parameters.

Example

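```python
from pipecat.services.sarvam.tts import SarvamTTSService
from pipecat.transcriptions.language import Language

# `session` is an existing aiohttp.ClientSession shared across requests.
tts = SarvamTTSService(
    api_key="your-api-key",
    voice_id="anushka",
    model="bulbul:v2",
    aiohttp_session=session,
    params=SarvamTTSService.InputParams(
        language=Language.HI,
        pitch=0.1,
        pace=1.2,
    ),
)
```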

class InputParams(*, language=Language.EN, pitch=0.0, pace=1.0, loudness=1.0, enable_preprocessing=False)[source]

Bases: BaseModel

Parameters:

language (Language | None)
pitch (float | None)
pace (float | None)
loudness (float | None)
enable_preprocessing (bool | None)

language: Language | None

pitch: float | None

pace: float | None

loudness: float | None

enable_preprocessing: bool | None

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
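
Every field of InputParams is optional, so a consumer has to decide what to do with values set to None. The sketch below shows one possible approach: build a request dictionary and skip unset fields. It uses a dataclass as a stand-in for the Pydantic model, and the `build_payload` helper and its key names are hypothetical; they are not the service's actual request format.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class InputParamsSketch:
    """Dataclass stand-in mirroring the documented fields and defaults."""
    language: Optional[str] = "en"  # the real field is a Language enum
    pitch: Optional[float] = 0.0
    pace: Optional[float] = 1.0
    loudness: Optional[float] = 1.0
    enable_preprocessing: Optional[bool] = False


def build_payload(text: str, params: InputParamsSketch) -> Dict[str, Any]:
    """Hypothetical helper: merge the text with every parameter that is not None."""
    payload: Dict[str, Any] = {"text": text}
    payload.update({k: v for k, v in asdict(params).items() if v is not None})
    return payload


payload = build_payload(
    "Namaste",
    InputParamsSketch(language="hi", pitch=None, pace=1.2),
)
```

Here `pitch` is dropped because it was explicitly set to None, while `loudness` and `enable_preprocessing` keep their documented defaults.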

can_generate_metrics()[source]

Return type: bool

language_to_service_language(language)[source]

Convert a language to the service-specific language format.

Parameters: language (Language) – The language to convert.
Returns: The service-specific language identifier, or None if not supported.
Return type: str | None

async start(frame)[source]

Start the TTS service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async run_tts(text)[source]

Run text-to-speech synthesis on the provided text.

This method is declared abstract on the TTSService base class; SarvamTTSService provides the concrete implementation for Sarvam AI's API.

Parameters: text (str) – The text to synthesize into speech.
Yields: Frame – Audio frames containing the synthesized speech.
Return type: AsyncGenerator[Frame, None]
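
Because run_tts is an async generator, its output is consumed with `async for` rather than awaited as a single value. The sketch below illustrates that pattern with a stand-in generator that yields placeholder byte chunks instead of real Frame objects; `fake_run_tts` and `collect` are hypothetical names. In a running pipeline the framework typically drives run_tts itself, so direct iteration like this is mostly useful for testing.

```python
import asyncio
from typing import AsyncGenerator, List


async def fake_run_tts(text: str) -> AsyncGenerator[bytes, None]:
    """Stand-in for run_tts: yields one placeholder chunk per word."""
    for word in text.split():
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word.encode("utf-8")


async def collect(text: str) -> List[bytes]:
    """Drain the generator with `async for`, as a caller of run_tts would."""
    chunks: List[bytes] = []
    async for chunk in fake_run_tts(text):
        chunks.append(chunk)
    return chunks


chunks = asyncio.run(collect("hello from sarvam"))
```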