BaseSTT
- pipecat.services.whisper.base_stt.language_to_whisper_language(language)[source]
Convert a Language enum value to the corresponding Whisper API language code.
Docs: https://platform.openai.com/docs/guides/speech-to-text#supported-languages
- Parameters:
language (Language)
- Return type:
str | None
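A minimal usage sketch (assuming Language is imported from pipecat.transcriptions.language, its usual location):

```python
from pipecat.services.whisper.base_stt import language_to_whisper_language
from pipecat.transcriptions.language import Language

# Returns the Whisper language code for the enum value, or None if unsupported.
code = language_to_whisper_language(Language.EN)
print(code)  # expected output: "en"
```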
- class pipecat.services.whisper.base_stt.BaseWhisperSTTService(*, model, api_key=None, base_url=None, language=Language.EN, prompt=None, temperature=None, **kwargs)[source]
Bases:
SegmentedSTTService
Base class for Whisper-based speech-to-text services.
Provides common functionality for services implementing the Whisper API interface, including metrics generation and error handling.
- Parameters:
model (str) – Name of the Whisper model to use.
api_key (str | None) – Service API key. Defaults to None.
base_url (str | None) – Service API base URL. Defaults to None.
language (Language | None) – Language of the audio input. Defaults to English.
prompt (str | None) – Optional text to guide the model’s style or continue a previous audio segment.
temperature (float | None) – Sampling temperature between 0 and 1. Defaults to None, in which case the service’s default (0.0) is used.
**kwargs – Additional arguments passed to SegmentedSTTService.
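For example, a concrete Whisper-backed service such as pipecat's OpenAISTTService accepts these same keyword arguments; the import path and values below are illustrative assumptions:

```python
from pipecat.services.openai.stt import OpenAISTTService  # import path assumed
from pipecat.transcriptions.language import Language

# Keyword arguments are forwarded to BaseWhisperSTTService.
stt = OpenAISTTService(
    model="whisper-1",
    api_key="sk-...",  # or rely on the OPENAI_API_KEY environment variable
    language=Language.EN,
    prompt="Transcript of a technical support call.",
    temperature=0.0,
)
```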
- async set_model(model)[source]
Set the speech recognition model.
- Parameters:
model (str) – The name of the model to use for speech recognition.
- can_generate_metrics()[source]
Check whether this service can generate processing metrics.
- Return type:
bool
- language_to_service_language(language)[source]
Convert a Language enum value to the language code used by this service.
- Parameters:
language (Language)
- Return type:
str | None
- async set_language(language)[source]
Set the language for transcription.
- Parameters:
language (Language) – The Language enum value to use for transcription.
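Both set_model() and set_language() are coroutines and can be awaited to reconfigure a live service instance, as in this sketch (the stt instance and model name are illustrative):

```python
from pipecat.transcriptions.language import Language


async def reconfigure(stt):
    # Switch the recognition model and the transcription language at runtime.
    await stt.set_model("whisper-1")
    await stt.set_language(Language.FR)
```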
- async run_stt(audio)[source]
Run speech-to-text on the provided audio data.
This method must be implemented by subclasses to provide actual speech recognition functionality.
- Parameters:
audio (bytes) – Raw audio bytes to transcribe.
- Yields:
Frame – Frames containing transcription results (typically TranscriptionFrame, a TextFrame subclass).
- Return type:
AsyncGenerator[Frame, None]
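A minimal subclass sketch, assuming an OpenAI-compatible async client; the class name, client wiring, WAV packaging, and the time_now_iso8601 helper path are illustrative assumptions, not part of this module:

```python
from typing import AsyncGenerator, Optional

from openai import AsyncOpenAI  # assumed Whisper-compatible client

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.services.whisper.base_stt import BaseWhisperSTTService
from pipecat.utils.time import time_now_iso8601  # helper path assumed


class MyWhisperSTTService(BaseWhisperSTTService):
    """Hypothetical service targeting a Whisper-compatible endpoint."""

    def __init__(self, *, api_key: Optional[str] = None, base_url: Optional[str] = None, **kwargs):
        super().__init__(api_key=api_key, base_url=base_url, **kwargs)
        # Illustrative: keep our own client handle for the transcription calls.
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
        await self.start_processing_metrics()
        # Send the segmented audio to the endpoint; SegmentedSTTService is
        # assumed here to hand run_stt a complete WAV-encoded segment.
        response = await self._client.audio.transcriptions.create(
            model=self.model_name,
            file=("audio.wav", audio, "audio/wav"),
        )
        await self.stop_processing_metrics()
        if response.text:
            # Argument order assumed: text, user_id, timestamp.
            yield TranscriptionFrame(response.text, "", time_now_iso8601())
```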