STT

pipecat.services.riva.stt.language_to_riva_language(language)[source]

Maps Language enum to Riva ASR language codes.

Source: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-riva-build-table.html?highlight=fr%20fr

Parameters:

language (Language) – Language enum value.

Returns:

Riva language code or None if not supported.

Return type:

Optional[str]
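The function behaves like a lookup from the Language enum into Riva's BCP-47-style codes, returning None for unsupported languages. A minimal self-contained sketch of that shape (the Language stub and the code values below are illustrative, not pipecat's actual table; see the linked Riva docs for the real mapping):

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    # Illustrative stand-in for pipecat's Language enum.
    EN_US = "en-US"
    FR = "fr"
    DE = "de"


# Hypothetical subset of the Riva ASR language table.
_RIVA_LANGUAGES = {
    Language.EN_US: "en-US",
    Language.FR: "fr-FR",
    Language.DE: "de-DE",
}


def language_to_riva_language(language: Language) -> Optional[str]:
    """Return the Riva ASR code for a Language, or None if unsupported."""
    return _RIVA_LANGUAGES.get(language)
```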

class pipecat.services.riva.stt.RivaSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: STTService

Speech-to-text service using NVIDIA Riva's streaming ASR models. By default, this service uses NVIDIA's Parakeet model for real-time transcription.

Parameters:
  • api_key (str)

  • server (str)

  • model_function_map (Mapping[str, str])

  • sample_rate (int | None)

  • params (InputParams | None)

class InputParams(*, language=Language.EN_US)[source]

Bases: BaseModel

Parameters:

language (Language | None)

language: Language | None
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

can_generate_metrics()[source]

Indicates whether this service can generate processing metrics.

Return type:

bool

async set_model(model)[source]

Set the speech recognition model.

Parameters:

model (str) – The name of the model to use for speech recognition.

async start(frame)[source]

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.

async run_stt(audio)[source]

Run speech-to-text on the provided audio data.

This method must be implemented by subclasses to provide actual speech recognition functionality.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
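Because run_stt is an async generator, callers consume transcription frames with async for. A runnable sketch with stubbed types (TextFrame and the echoing recognizer below are placeholders, not the Riva client):

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class TextFrame:
    # Stand-in for pipecat's TextFrame.
    text: str


class StubSTTService:
    async def run_stt(self, audio: bytes) -> AsyncGenerator[TextFrame, None]:
        # A real implementation would stream `audio` to Riva and yield
        # partial/final results; here we yield one fake transcript.
        yield TextFrame(text=f"transcribed {len(audio)} bytes")


async def main() -> List[str]:
    service = StubSTTService()
    texts = []
    async for frame in service.run_stt(b"\x00" * 320):
        texts.append(frame.text)
    return texts


print(asyncio.run(main()))
```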

class pipecat.services.riva.stt.RivaSegmentedSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': 'ee8dc628-76de-4acc-8595-1836e7e857bd', 'model_name': 'canary-1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: SegmentedSTTService

Speech-to-text service using NVIDIA Riva’s offline/batch models.

By default, this service uses NVIDIA's Riva Canary ASR API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.

Parameters:
  • api_key (str) – NVIDIA API key for authentication

  • server (str) – Riva server address (defaults to NVIDIA Cloud Function endpoint)

  • model_function_map (Mapping[str, str]) – Mapping of model name and its corresponding NVIDIA Cloud Function ID

  • sample_rate (int | None) – Audio sample rate in Hz. If not provided, uses the pipeline’s rate

  • params (InputParams | None) – Additional configuration parameters for Riva

  • **kwargs – Additional arguments passed to SegmentedSTTService
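The segmented flow differs from streaming: audio is buffered until a segment boundary (for example, end of speech) and only then sent for batch transcription. A schematic sketch of that buffering logic, independent of the Riva client (the class and callback here are illustrative, not pipecat internals):

```python
from typing import Callable, List, Optional


class SegmentBuffer:
    """Accumulates audio chunks and flushes a whole segment on speech end."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        self._chunks: List[bytes] = []
        self._transcribe = transcribe  # batch STT call, e.g. Riva offline ASR

    def add_chunk(self, chunk: bytes, speech_ended: bool) -> Optional[str]:
        self._chunks.append(chunk)
        if not speech_ended:
            return None  # keep buffering until the segment is complete
        segment = b"".join(self._chunks)
        self._chunks.clear()
        return self._transcribe(segment)


buf = SegmentBuffer(transcribe=lambda audio: f"{len(audio)} bytes transcribed")
assert buf.add_chunk(b"\x00" * 160, speech_ended=False) is None
print(buf.add_chunk(b"\x00" * 160, speech_ended=True))  # flushes both chunks
```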

class InputParams(*, language=Language.EN_US, profanity_filter=False, automatic_punctuation=True, verbatim_transcripts=False, boosted_lm_words=None, boosted_lm_score=4.0)[source]

Bases: BaseModel

Parameters:
  • language (Language | None)

  • profanity_filter (bool)

  • automatic_punctuation (bool)

  • verbatim_transcripts (bool)

  • boosted_lm_words (List[str] | None)

  • boosted_lm_score (float)

language: Language | None
profanity_filter: bool
automatic_punctuation: bool
verbatim_transcripts: bool
boosted_lm_words: List[str] | None
boosted_lm_score: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.
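The defaults above can be read directly off the constructor signature. A dataclass analogue (pydantic's BaseModel is the real base; this stand-in only demonstrates the default values, and `language` is simplified to a plain string):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputParams:
    # Mirrors the defaults of RivaSegmentedSTTService.InputParams.
    language: Optional[str] = "en-US"
    profanity_filter: bool = False
    automatic_punctuation: bool = True
    verbatim_transcripts: bool = False
    boosted_lm_words: Optional[List[str]] = None
    boosted_lm_score: float = 4.0


# Override only the word-boosting fields, keep the rest at their defaults.
params = InputParams(boosted_lm_words=["Riva", "Parakeet"], boosted_lm_score=6.0)
print(params.automatic_punctuation, params.boosted_lm_score)
```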

language_to_service_language(language)[source]

Convert pipecat Language enum to Riva’s language code.

Parameters:

language (Language)

Return type:

str | None

can_generate_metrics()[source]

Indicates whether this service can generate processing metrics.

Return type:

bool

async set_model(model)[source]

Set the speech recognition model.

Parameters:

model (str) – The name of the model to use for speech recognition.

async start(frame)[source]

Initialize the service when the pipeline starts.

Parameters:

frame (StartFrame)

async set_language(language)[source]

Set the language for the STT service.

Parameters:

language (Language)

async run_stt(audio)[source]

Transcribe an audio segment.

Parameters:

audio (bytes) – Raw audio bytes in WAV format (already converted by base class).

Yields:

Frame – TranscriptionFrame containing the transcribed text.

Return type:

AsyncGenerator[Frame, None]

class pipecat.services.riva.stt.ParakeetSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]

Bases: RivaSTTService

Deprecated: Use RivaSTTService instead.

Parameters:
  • api_key (str)

  • server (str)

  • model_function_map (Mapping[str, str])

  • sample_rate (int | None)

  • params (InputParams | None)
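A deprecated alias like ParakeetSTTService is typically kept as a thin subclass that warns on construction and otherwise defers to the new class. An illustrative pattern (the stub base class and the warning text are assumptions, not pipecat's actual implementation):

```python
import warnings


class RivaSTTService:
    # Minimal stand-in for the real service.
    def __init__(self, *, api_key: str, **kwargs):
        self.api_key = api_key


class ParakeetSTTService(RivaSTTService):
    """Deprecated alias; use RivaSTTService instead."""

    def __init__(self, *, api_key: str, **kwargs):
        warnings.warn(
            "ParakeetSTTService is deprecated, use RivaSTTService instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(api_key=api_key, **kwargs)
```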