STT
- pipecat.services.riva.stt.language_to_riva_language(language)[source]
Maps Language enum to Riva ASR language codes.
- Parameters:
language (Language) – Language enum value.
- Returns:
Riva language code or None if not supported.
- Return type:
Optional[str]
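The mapping behaves like a plain dictionary lookup: supported languages resolve to a Riva code, anything else to `None`. A minimal self-contained sketch (the enum members and code table here are illustrative stand-ins, not pipecat's full `Language` enum):

```python
from enum import Enum
from typing import Optional

class Language(Enum):
    # Illustrative stand-in for pipecat's Language enum.
    EN_US = "en-US"
    ES_US = "es-US"
    XX = "xx"  # a language Riva does not support, for the None path

def language_to_riva_language(language: Language) -> Optional[str]:
    # Riva expects BCP-47-style codes; unsupported languages map to None.
    mapping = {
        Language.EN_US: "en-US",
        Language.ES_US: "es-US",
    }
    return mapping.get(language)

print(language_to_riva_language(Language.EN_US))  # en-US
print(language_to_riva_language(Language.XX))     # None
```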
- class pipecat.services.riva.stt.RivaSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]
Bases:
STTService
Speech-to-text service using NVIDIA Riva. By default, this service uses NVIDIA's Riva Parakeet ASR model for streaming transcription.
- Parameters:
api_key (str)
server (str)
model_function_map (Mapping[str, str])
sample_rate (int | None)
params (InputParams | None)
- class InputParams(*, language=Language.EN_US)[source]
Bases:
BaseModel
- Parameters:
language (Language | None)
- language: Language | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- can_generate_metrics()[source]
Indicates whether this service can generate processing metrics.
- Return type:
bool
- async set_model(model)[source]
Set the speech recognition model.
- Parameters:
model (str) – The name of the model to use for speech recognition.
- async start(frame)[source]
Start the STT service.
- Parameters:
frame (StartFrame) – The start frame containing initialization parameters.
- async stop(frame)[source]
Stop the AI service.
Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)[source]
Cancel the AI service.
Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.
- Parameters:
frame (CancelFrame) – The cancel frame.
- async run_stt(audio)[source]
Run speech-to-text on the provided audio data.
This method must be implemented by subclasses to provide actual speech recognition functionality.
- Parameters:
audio (bytes) – Raw audio bytes to transcribe.
- Yields:
Frame – Frames containing transcription results (typically TextFrame).
- Return type:
AsyncGenerator[Frame, None]
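The `model_function_map` default in the signature above pairs a hosted model name with its NVIDIA Cloud Function ID. A construction sketch (the `NVIDIA_API_KEY` environment variable is an assumption; the pipecat call is shown commented out since it needs the library and a live key):

```python
import os

# Default mapping from the RivaSTTService signature above: the Cloud
# Function ID selects the hosted Parakeet model on grpc.nvcf.nvidia.com.
default_map = {
    "function_id": "1598d209-5e27-4d3c-8079-4751568b1081",
    "model_name": "parakeet-ctc-1.1b-asr",
}

# With pipecat installed and a valid key, construction looks like:
#
#   from pipecat.services.riva.stt import RivaSTTService
#   stt = RivaSTTService(
#       api_key=os.environ["NVIDIA_API_KEY"],  # assumed env var name
#       model_function_map=default_map,
#   )

print(default_map["model_name"])  # parakeet-ctc-1.1b-asr
```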
- class pipecat.services.riva.stt.RivaSegmentedSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': 'ee8dc628-76de-4acc-8595-1836e7e857bd', 'model_name': 'canary-1b-asr'}, sample_rate=None, params=None, **kwargs)[source]
Bases:
SegmentedSTTService
Speech-to-text service using NVIDIA Riva’s offline/batch models.
By default, this service uses NVIDIA's Riva Canary ASR API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.
- Parameters:
api_key (str) – NVIDIA API key for authentication
server (str) – Riva server address (defaults to NVIDIA Cloud Function endpoint)
model_function_map (Mapping[str, str]) – Mapping of model name and its corresponding NVIDIA Cloud Function ID
sample_rate (int | None) – Audio sample rate in Hz. If not provided, uses the pipeline’s rate
params (InputParams | None) – Additional configuration parameters for Riva
**kwargs – Additional arguments passed to SegmentedSTTService
- class InputParams(*, language=Language.EN_US, profanity_filter=False, automatic_punctuation=True, verbatim_transcripts=False, boosted_lm_words=None, boosted_lm_score=4.0)[source]
Bases:
BaseModel
- Parameters:
language (Language | None)
profanity_filter (bool)
automatic_punctuation (bool)
verbatim_transcripts (bool)
boosted_lm_words (List[str] | None)
boosted_lm_score (float)
- language: Language | None
- profanity_filter: bool
- automatic_punctuation: bool
- verbatim_transcripts: bool
- boosted_lm_words: List[str] | None
- boosted_lm_score: float
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
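The boosting fields above let you bias the recognizer toward domain vocabulary: `boosted_lm_words` lists the terms, `boosted_lm_score` (default 4.0) controls the strength of the bias. A configuration sketch using plain keyword arguments that mirror the `InputParams` fields (the word list is illustrative):

```python
# Keyword arguments mirroring RivaSegmentedSTTService.InputParams defaults,
# with illustrative domain terms boosted for recognition.
params = {
    "profanity_filter": False,
    "automatic_punctuation": True,
    "verbatim_transcripts": False,
    "boosted_lm_words": ["pipecat", "Riva"],  # domain terms to favor
    "boosted_lm_score": 4.0,                  # boost strength (default)
}

# With pipecat installed, these would be passed as:
#   RivaSegmentedSTTService.InputParams(**params)
print(params["boosted_lm_words"])  # ['pipecat', 'Riva']
```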
- language_to_service_language(language)[source]
Convert pipecat Language enum to Riva’s language code.
- Parameters:
language (Language)
- Return type:
str | None
- can_generate_metrics()[source]
Indicates whether this service can generate processing metrics.
- Return type:
bool
- async set_model(model)[source]
Set the speech recognition model.
- Parameters:
model (str) – The name of the model to use for speech recognition.
- async start(frame)[source]
Initialize the service when the pipeline starts.
- Parameters:
frame (StartFrame)
- async set_language(language)[source]
Set the language for the STT service.
- Parameters:
language (Language)
- async run_stt(audio)[source]
Transcribe an audio segment.
- Parameters:
audio (bytes) – Raw audio bytes in WAV format (already converted by base class).
- Yields:
Frame – TranscriptionFrame containing the transcribed text.
- Return type:
AsyncGenerator[Frame, None]
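Since `run_stt` is an async generator, callers consume it with `async for`. A sketch using a stub service (the real one needs Riva credentials; the stub yields plain strings where the service yields `TranscriptionFrame` objects):

```python
import asyncio
from typing import AsyncGenerator, List

class StubSTT:
    # Stand-in for RivaSegmentedSTTService: run_stt yields frames
    # (here plain strings) for a completed audio segment.
    async def run_stt(self, audio: bytes) -> AsyncGenerator[str, None]:
        yield f"transcription of {len(audio)} bytes"

async def transcribe_segment(audio: bytes) -> List[str]:
    stt = StubSTT()
    frames = []
    async for frame in stt.run_stt(audio):
        frames.append(frame)
    return frames

frames = asyncio.run(transcribe_segment(b"\x00" * 1600))
print(frames)  # ['transcription of 1600 bytes']
```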
- class pipecat.services.riva.stt.ParakeetSTTService(*, api_key, server='grpc.nvcf.nvidia.com:443', model_function_map={'function_id': '1598d209-5e27-4d3c-8079-4751568b1081', 'model_name': 'parakeet-ctc-1.1b-asr'}, sample_rate=None, params=None, **kwargs)[source]
Bases:
RivaSTTService
Deprecated: Use RivaSTTService instead.
- Parameters:
api_key (str)
server (str)
model_function_map (Mapping[str, str])
sample_rate (int | None)
params (InputParams | None)