STT
- pipecat.services.google.stt.language_to_google_stt_language(language)[source]
Maps a Language enum value to a Google Speech-to-Text V2 language code.
- Parameters:
language (Language) – Language enum value.
- Returns:
Google STT language code or None if not supported.
- Return type:
Optional[str]
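A minimal usage sketch; the `"en-US"` mapping shown is an assumption about the code returned for `Language.EN_US`:

```python
from pipecat.services.google.stt import language_to_google_stt_language
from pipecat.transcriptions.language import Language

code = language_to_google_stt_language(Language.EN_US)
print(code)  # expected "en-US"; languages the V2 API does not support return None
```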
- class pipecat.services.google.stt.GoogleSTTService(*, credentials=None, credentials_path=None, location='global', sample_rate=None, params=None, **kwargs)[source]
Bases:
STTService
Google Cloud Speech-to-Text V2 service implementation.
Provides real-time speech recognition using Google Cloud’s Speech-to-Text V2 API with streaming support. Handles audio transcription and optional voice activity detection.
- Parameters:
credentials (str | None)
credentials_path (str | None)
location (str)
sample_rate (int | None)
params (InputParams | None)
- InputParams[source]
Configuration parameters for the STT service.
- Parameters:
languages (Language | List[Language])
model (str | None)
use_separate_recognition_per_channel (bool | None)
enable_automatic_punctuation (bool | None)
enable_spoken_punctuation (bool | None)
enable_spoken_emojis (bool | None)
profanity_filter (bool | None)
enable_word_time_offsets (bool | None)
enable_word_confidence (bool | None)
enable_interim_results (bool | None)
enable_voice_activity_events (bool | None)
- Return type:
None
- STREAMING_LIMIT = 240000
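A construction sketch using the parameters listed above, assuming service-account credentials stored as a JSON file; the surrounding pipeline (transport, context aggregators) is omitted:

```python
from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language

stt = GoogleSTTService(
    credentials_path="/path/to/service-account.json",  # or pass the JSON string via `credentials`
    location="global",
    params=GoogleSTTService.InputParams(
        languages=Language.EN_US,
        model="latest_long",
        enable_interim_results=True,
    ),
)
# The service is then placed in a Pipeline between the transport input and the
# rest of the processors; audio frames flow through it automatically.
```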
- class InputParams(*, languages=<factory>, model='latest_long', use_separate_recognition_per_channel=False, enable_automatic_punctuation=True, enable_spoken_punctuation=False, enable_spoken_emojis=False, profanity_filter=False, enable_word_time_offsets=False, enable_word_confidence=False, enable_interim_results=True, enable_voice_activity_events=False)[source]
Bases:
BaseModel
Configuration parameters for Google Speech-to-Text.
- Parameters:
languages (Language | List[Language])
model (str | None)
use_separate_recognition_per_channel (bool | None)
enable_automatic_punctuation (bool | None)
enable_spoken_punctuation (bool | None)
enable_spoken_emojis (bool | None)
profanity_filter (bool | None)
enable_word_time_offsets (bool | None)
enable_word_confidence (bool | None)
enable_interim_results (bool | None)
enable_voice_activity_events (bool | None)
- languages
Single language or list of recognition languages. First language is primary.
- Type:
pipecat.transcriptions.language.Language | List[pipecat.transcriptions.language.Language]
- model
Speech recognition model to use.
- Type:
str | None
- use_separate_recognition_per_channel
Process each audio channel separately.
- Type:
bool | None
- enable_automatic_punctuation
Add punctuation to transcripts.
- Type:
bool | None
- enable_spoken_punctuation
Include spoken punctuation in transcript.
- Type:
bool | None
- enable_spoken_emojis
Include spoken emojis in transcript.
- Type:
bool | None
- profanity_filter
Filter profanity from transcript.
- Type:
bool | None
- enable_word_time_offsets
Include timing information for each word.
- Type:
bool | None
- enable_word_confidence
Include confidence scores for each word.
- Type:
bool | None
- enable_interim_results
Stream partial recognition results.
- Type:
bool | None
- enable_voice_activity_events
Detect voice activity in audio.
- Type:
bool | None
- languages: Language | List[Language]
- model: str | None
- use_separate_recognition_per_channel: bool | None
- enable_automatic_punctuation: bool | None
- enable_spoken_punctuation: bool | None
- enable_spoken_emojis: bool | None
- profanity_filter: bool | None
- enable_word_time_offsets: bool | None
- enable_word_confidence: bool | None
- enable_interim_results: bool | None
- enable_voice_activity_events: bool | None
- classmethod validate_languages(v)[source]
- Return type:
List[Language]
- property language_list: List[Language]
Get languages as a guaranteed list.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to Pydantic's ConfigDict.
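A configuration sketch using the fields above; choosing `Language.ES` as the secondary language is an illustrative assumption:

```python
from pipecat.services.google.stt import GoogleSTTService
from pipecat.transcriptions.language import Language

params = GoogleSTTService.InputParams(
    languages=[Language.EN_US, Language.ES],  # first entry is the primary language
    enable_word_time_offsets=True,
    enable_word_confidence=True,
)

# language_list normalizes a single Language or a list into a list
assert params.language_list == [Language.EN_US, Language.ES]
```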
- can_generate_metrics()[source]
- Return type:
bool
- language_to_service_language(language)[source]
Convert Language enum(s) to Google STT language code(s).
- Parameters:
language (Language | List[Language]) – Single Language enum or list of Language enums.
- Returns:
Google STT language code(s).
- Return type:
str | List[str]
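A brief sketch of both input forms, assuming an existing service instance `stt`; the returned codes are illustrative:

```python
from pipecat.transcriptions.language import Language

single = stt.language_to_service_language(Language.EN_US)                    # e.g. "en-US"
multiple = stt.language_to_service_language([Language.EN_US, Language.FR])  # e.g. ["en-US", "fr-FR"]
```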
- async set_language(language)[source]
Update the service’s recognition language.
A convenience method for setting a single language.
- Parameters:
language (Language) – New language for recognition.
- async set_languages(languages)[source]
Update the service’s recognition languages.
- Parameters:
languages (List[Language]) – List of languages for recognition. First language is primary.
- async set_model(model)[source]
Update the service’s recognition model.
- Parameters:
model (str)
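A sketch of switching recognition settings at runtime, assuming an existing GoogleSTTService instance and an async context; the `switch_recognition` wrapper exists only for illustration:

```python
from pipecat.transcriptions.language import Language

async def switch_recognition(stt):  # `stt` is an existing GoogleSTTService (assumption)
    # Single new recognition language
    await stt.set_language(Language.FR)

    # Multiple languages; the first entry is primary
    await stt.set_languages([Language.DE, Language.EN_US])

    # Switch the recognition model (model names follow Google's V2 API)
    await stt.set_model("latest_short")
```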
- async start(frame)[source]
Start the STT service.
- Parameters:
frame (StartFrame) – The start frame containing initialization parameters.
- async stop(frame)[source]
Stop the AI service.
Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)[source]
Cancel the AI service.
Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.
- Parameters:
frame (CancelFrame) – The cancel frame.
- async update_options(*, languages=None, model=None, enable_automatic_punctuation=None, enable_spoken_punctuation=None, enable_spoken_emojis=None, profanity_filter=None, enable_word_time_offsets=None, enable_word_confidence=None, enable_interim_results=None, enable_voice_activity_events=None, location=None)[source]
Update service options dynamically.
- Parameters:
languages (List[Language] | None) – New list of recognition languages.
model (str | None) – New recognition model.
enable_automatic_punctuation (bool | None) – Enable/disable automatic punctuation.
enable_spoken_punctuation (bool | None) – Enable/disable spoken punctuation.
enable_spoken_emojis (bool | None) – Enable/disable spoken emojis.
profanity_filter (bool | None) – Enable/disable profanity filter.
enable_word_time_offsets (bool | None) – Enable/disable word timing info.
enable_word_confidence (bool | None) – Enable/disable word confidence scores.
enable_interim_results (bool | None) – Enable/disable interim results.
enable_voice_activity_events (bool | None) – Enable/disable voice activity detection.
location (str | None) – New Google Cloud location.
- Return type:
None
Note
Changes that affect the streaming configuration will cause the stream to be reconnected.
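A sketch of a dynamic reconfiguration call, again assuming an existing instance; per the note above, options that affect the streaming configuration cause the stream to reconnect:

```python
from pipecat.transcriptions.language import Language

async def reconfigure(stt):  # `stt` is an existing GoogleSTTService (assumption)
    await stt.update_options(
        languages=[Language.ES],
        enable_interim_results=False,
        profanity_filter=True,
        location="us-central1",  # assumed regional endpoint; changing it reconnects the stream
    )
```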
- async run_stt(audio)[source]
Process an audio chunk for STT transcription.
- Parameters:
audio (bytes)
- Return type:
AsyncGenerator[Frame, None]
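run_stt is normally driven by the pipeline's audio frames; the sketch below shows direct invocation, assuming `chunk` holds raw PCM audio at the service's sample rate:

```python
async def feed_audio(stt, chunk: bytes):  # `stt` is a started GoogleSTTService (assumption)
    # run_stt is an async generator over Frame objects. For this streaming
    # service, transcription results are typically pushed downstream as
    # TranscriptionFrame / InterimTranscriptionFrame rather than yielded here,
    # so the loop mostly just drains the generator.
    async for frame in stt.run_stt(chunk):
        if frame is not None:
            print(frame)
```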