BaseSTT

pipecat.services.whisper.base_stt.language_to_whisper_language(language)[source]

Convert a Language enum value to its Whisper-compatible language code.

Docs: https://platform.openai.com/docs/guides/speech-to-text#supported-languages

Parameters:

language (Language) – The language to convert.

Returns:

The Whisper language code, or None if the language is not supported by Whisper.

Return type:

str | None
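
For example, a minimal usage sketch (it assumes Language is importable from pipecat.transcriptions.language, its usual location in pipecat):

from pipecat.services.whisper.base_stt import language_to_whisper_language
from pipecat.transcriptions.language import Language

print(language_to_whisper_language(Language.EN))  # expected: "en"
# Languages Whisper does not support are expected to map to None rather than raise.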

class pipecat.services.whisper.base_stt.BaseWhisperSTTService(*, model, api_key=None, base_url=None, language=Language.EN, prompt=None, temperature=None, **kwargs)[source]

Bases: SegmentedSTTService

Base class for Whisper-based speech-to-text services.

Provides common functionality for services implementing the Whisper API interface, including metrics generation and error handling.

Parameters:
  • model (str) – Name of the Whisper model to use.

  • api_key (str | None) – Service API key. Defaults to None.

  • base_url (str | None) – Service API base URL. Defaults to None.

  • language (Language | None) – Language of the audio input. Defaults to English.

  • prompt (str | None) – Optional text to guide the model’s style or continue a previous segment.

  • temperature (float | None) – Sampling temperature between 0 and 1. Defaults to 0.0.

  • **kwargs – Additional arguments passed to SegmentedSTTService.
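
The base class is not instantiated directly, since run_stt must be implemented by subclasses; you construct a concrete subclass with these keyword arguments. A minimal sketch, where service_cls stands in for any concrete BaseWhisperSTTService subclass and the model name and prompt are illustrative:

from pipecat.transcriptions.language import Language

def build_stt(service_cls):
    # service_cls: any concrete BaseWhisperSTTService subclass.
    # The keyword arguments mirror the parameters documented above.
    return service_cls(
        model="whisper-1",        # illustrative model name
        api_key="YOUR_API_KEY",   # service credentials, if required
        base_url=None,            # None selects the service's default endpoint
        language=Language.EN,     # language of the audio input
        prompt="Conversational audio between two speakers.",
        temperature=0.0,          # deterministic sampling
    )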

async set_model(model)[source]

Set the speech recognition model. A short runtime reconfiguration sketch covering both set_model and set_language appears after set_language below.

Parameters:

model (str) – The name of the model to use for speech recognition.

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert a Language enum value to the service's language code.

Parameters:

language (Language) – The language to convert.

Return type:

str | None

async set_language(language)[source]

Set the language for transcription.

Parameters:

language (Language) – The Language enum value to use for transcription.
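
Both setters are coroutines and can be awaited to reconfigure a running service. A minimal sketch (the model name is illustrative; stt may be any concrete BaseWhisperSTTService subclass instance):

from pipecat.transcriptions.language import Language

async def switch_to_french(stt):
    # stt: any concrete BaseWhisperSTTService subclass instance.
    await stt.set_model("whisper-large-v3")  # illustrative model name
    await stt.set_language(Language.FR)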

async run_stt(audio)[source]

Run speech-to-text on the provided audio data.

This method must be implemented by subclasses to provide actual speech recognition functionality.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
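
To illustrate the contract, here is a hedged sketch of a concrete subclass. It assumes TranscriptionFrame, ErrorFrame, and Frame are importable from pipecat.frames.frames and time_now_iso8601 from pipecat.utils.time (their usual locations in pipecat); _transcribe_bytes is a hypothetical helper standing in for a real Whisper API request, and the TranscriptionFrame argument order is assumed, so check the installed pipecat version:

from typing import AsyncGenerator

from pipecat.frames.frames import ErrorFrame, Frame, TranscriptionFrame
from pipecat.services.whisper.base_stt import BaseWhisperSTTService
from pipecat.utils.time import time_now_iso8601

class ExampleWhisperSTTService(BaseWhisperSTTService):
    # Hypothetical concrete service: only run_stt must be provided.

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Frame, None]:
        try:
            # _transcribe_bytes is a hypothetical helper that posts the raw
            # audio segment to a Whisper-compatible endpoint and returns the
            # transcript text.
            text = await self._transcribe_bytes(audio)
            # Assumed signature: TranscriptionFrame(text, user_id, timestamp).
            yield TranscriptionFrame(text, "", time_now_iso8601())
        except Exception as e:
            # Surface failures downstream instead of raising, matching the
            # error-handling responsibility noted in the class description.
            yield ErrorFrame(f"Whisper transcription failed: {e}")

    async def _transcribe_bytes(self, audio: bytes) -> str:
        raise NotImplementedError  # stand-in for a real HTTP client call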