STT

pipecat.services.riva.stt.language_to_riva_language(language)[source]

Maps Language enum to Riva ASR language codes.

Source: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-riva-build-table.html?highlight=fr%20fr

Parameters:

language (Language) – Language enum value.

Returns:

Riva language code or None if not supported.

Return type:

Optional[str]
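The function behaves like a lookup from the Language enum into Riva's BCP-47-style codes, returning None for unsupported languages. A minimal self-contained sketch of that shape (the Language stub and the code values below are illustrative, not pipecat's actual table; see the linked Riva docs for the real mapping):

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    # Illustrative stand-in for pipecat's Language enum.
    EN_US = "en-US"
    FR = "fr"
    DE = "de"


# Hypothetical subset of the Riva ASR language table.
_RIVA_LANGUAGES = {
    Language.EN_US: "en-US",
    Language.FR: "fr-FR",
    Language.DE: "de-DE",
}


def language_to_riva_language(language: Language) -> Optional[str]:
    """Return the Riva ASR code for a Language, or None if unsupported."""
    return _RIVA_LANGUAGES.get(language)
```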

class pipecat.services.riva.stt.RivaSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: STTService

Speech-to-text service using NVIDIA Riva's streaming ASR models. By default, this service uses NVIDIA's Parakeet model for real-time transcription.

Parameters:
  • api_key (str)

  • server (str)

  • model_function_map (Mapping[str, str])

  • sample_rate (int | None)

  • params (InputParams | None)

class InputParams(*, language=Language.EN_US)[source]

Bases: BaseModel

Parameters:

language (Language | None)

language: Language | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

can_generate_metrics()[source]

Indicates whether this service can generate processing metrics.

Return type:

bool

async set_model(model)[source]

Set the speech recognition model.

Parameters:

model (str) – The name of the model to use for speech recognition.

async start(frame)[source]

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.

async run_stt(audio)[source]

Run speech-to-text on the provided audio data.

This method must be implemented by subclasses to provide actual speech recognition functionality.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
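Because run_stt is an async generator, callers consume transcription frames with async for. A runnable sketch with stubbed types (TextFrame and the echoing recognizer below are placeholders, not the Riva client):

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class TextFrame:
    # Stand-in for pipecat's TextFrame.
    text: str


class StubSTTService:
    async def run_stt(self, audio: bytes) -> AsyncGenerator[TextFrame, None]:
        # A real implementation would stream `audio` to Riva and yield
        # partial/final results; here we yield one fake transcript.
        yield TextFrame(text=f"transcribed {len(audio)} bytes")


async def main() -> List[str]:
    service = StubSTTService()
    texts = []
    async for frame in service.run_stt(b"\x00" * 320):
        texts.append(frame.text)
    return texts


print(asyncio.run(main()))
```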

class pipecat.services.riva.stt.RivaSegmentedSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': 'ee8dc628-76de-4acc-8595-1836e7e857bd', 'model_name': 'canary-1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: SegmentedSTTService

Speech-to-text service using NVIDIA Riva’s offline/batch models.

By default, this service uses NVIDIA's Riva Canary ASR API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.

Parameters:
  • api_key (str) – NVIDIA API key for authentication

  • server (str) – Riva server address (defaults to NVIDIA Cloud Function endpoint)

  • model_function_map (Mapping[str, str]) – Mapping of model name and its corresponding NVIDIA Cloud Function ID

  • sample_rate (int | None) – Audio sample rate in Hz. If not provided, uses the pipeline’s rate

  • params (InputParams | None) – Additional configuration parameters for Riva

  • **kwargs – Additional arguments passed to SegmentedSTTService
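The segmented flow differs from streaming: audio is buffered until a segment boundary (for example, end of speech) and only then sent for batch transcription. A schematic sketch of that buffering logic, independent of the Riva client (the class and callback here are illustrative, not pipecat internals):

```python
from typing import Callable, List, Optional


class SegmentBuffer:
    """Accumulates audio chunks and flushes a whole segment on speech end."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self._chunks: List[bytes] = []
        self._transcribe = transcribe  # batch STT call, e.g. Riva offline ASR

    def add_chunk(self, chunk: bytes, speech_ended: bool) -> Optional[str]:
        self._chunks.append(chunk)
        if not speech_ended:
            return None  # keep buffering until the segment is complete
        segment = b"".join(self._chunks)
        self._chunks.clear()
        return self._transcribe(segment)


buf = SegmentBuffer(transcribe=lambda audio: f"{len(audio)} bytes transcribed")
assert buf.add_chunk(b"\x00" * 160, speech_ended=False) is None
print(buf.add_chunk(b"\x00" * 160, speech_ended=True))  # flushes both chunks
```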

class InputParams(*, language=Language.EN_US, profanity_filter=False, automatic_punctuation=True, verbatim_transcripts=False, boosted_lm_words=None, boosted_lm_score=4.0)[source]

Bases: BaseModel

Parameters:
  • language (Language | None)

  • profanity_filter (bool)

  • automatic_punctuation (bool)

  • verbatim_transcripts (bool)

  • boosted_lm_words (List[str] | None)

  • boosted_lm_score (float)

language: Language | None
profanity_filter: bool
automatic_punctuation: bool
verbatim_transcripts: bool
boosted_lm_words: List[str] | None
boosted_lm_score: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
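The defaults above can be read directly off the constructor signature. A dataclass analogue (pydantic's BaseModel is the real base; this stand-in only demonstrates the default values, and `language` is simplified to a plain string):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputParams:
    # Mirrors the defaults of RivaSegmentedSTTService.InputParams.
    language: Optional[str] = "en-US"
    profanity_filter: bool = False
    automatic_punctuation: bool = True
    verbatim_transcripts: bool = False
    boosted_lm_words: Optional[List[str]] = None
    boosted_lm_score: float = 4.0


# Override only the word-boosting fields, keep the rest at their defaults.
params = InputParams(boosted_lm_words=["Riva", "Parakeet"], boosted_lm_score=6.0)
print(params.automatic_punctuation, params.boosted_lm_score)
```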

language_to_service_language(language)[source]

Convert pipecat Language enum to Riva’s language code.

Parameters:

language (Language)

Return type:

str | None

can_generate_metrics()[source]

Indicates whether this service can generate processing metrics.

Return type:

bool

async set_model(model)[source]

Set the speech recognition model.

Parameters:

model (str) – The name of the model to use for speech recognition.

async start(frame)[source]

Initialize the service when the pipeline starts.

Parameters:

frame (StartFrame)

async set_language(language)[source]

Set the language for the STT service.

Parameters:

language (Language)

async run_stt(audio)[source]

Transcribe an audio segment.

Parameters:

audio (bytes) – Raw audio bytes in WAV format (already converted by base class).

Yields:

Frame – TranscriptionFrame containing the transcribed text.

Return type:

AsyncGenerator[Frame, None]

class pipecat.services.riva.stt.ParakeetSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: RivaSTTService

Deprecated: Use RivaSTTService instead.

Parameters:
  • api_key (str)

  • server (str)

  • model_function_map (Mapping[str, str])

  • sample_rate (int | None)

  • params (InputParams | None)
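A deprecated alias like ParakeetSTTService is typically kept as a thin subclass that warns on construction and otherwise defers to the new class. An illustrative pattern (the stub base class and the warning text are assumptions, not pipecat's actual implementation):

```python
import warnings


class RivaSTTService:
    # Minimal stand-in for the real service.
    def __init__(self, *, api_key: str, **kwargs):
        self.api_key = api_key


class ParakeetSTTService(RivaSTTService):
    """Deprecated alias; use RivaSTTService instead."""

    def __init__(self, *, api_key: str, **kwargs):
        warnings.warn(
            "ParakeetSTTService is deprecated, use RivaSTTService instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(api_key=api_key, **kwargs)
```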