STT

pipecat.services.google.stt.language_to_google_stt_language(language)[source]

Maps Language enum to Google Speech-to-Text V2 language codes.

Parameters:

language (Language) – Language enum value.

Returns:

Google STT language code or None if not supported.

Return type:

Optional[str]
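
The mapping can be pictured as a lookup table from enum values to BCP-47 codes, returning None for unsupported languages. The sketch below is illustrative only: the `Language` enum members and the code table are stand-ins, not pipecat's actual mapping.

```python
from enum import Enum
from typing import Optional

class Language(Enum):
    # Illustrative stand-in for pipecat.transcriptions.language.Language
    EN = "en"
    EN_US = "en-US"
    FR = "fr"

# Hypothetical subset of Google STT V2 BCP-47 language codes
_LANGUAGE_MAP = {
    Language.EN: "en-US",
    Language.EN_US: "en-US",
    Language.FR: "fr-FR",
}

def language_to_google_stt_language(language: Language) -> Optional[str]:
    # Return the Google STT code, or None when the language is unsupported
    return _LANGUAGE_MAP.get(language)
```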

class pipecat.services.google.stt.GoogleSTTService(*, credentials=None, credentials_path=None, location='global', sample_rate=None, params=None, **kwargs)[source]

Bases: STTService

Google Cloud Speech-to-Text V2 service implementation.

Provides real-time speech recognition using Google Cloud’s Speech-to-Text V2 API with streaming support. Handles audio transcription and optional voice activity detection.

Parameters:
  • credentials (str | None)

  • credentials_path (str | None)

  • location (str)

  • sample_rate (int | None)

  • params (InputParams | None)


STREAMING_LIMIT = 240000

Maximum streaming session duration in milliseconds (4 minutes), after which the stream is reconnected.

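A common pattern for honoring a streaming limit, sketched here with a hypothetical helper name rather than pipecat's internals, is to track when the stream opened and reconnect before exceeding the cap:

```python
import time
from typing import Optional

STREAMING_LIMIT = 240000  # milliseconds

def should_restart_stream(stream_start: float, now: Optional[float] = None) -> bool:
    """Return True once the stream has been open longer than STREAMING_LIMIT."""
    now = time.monotonic() if now is None else now
    return (now - stream_start) * 1000 > STREAMING_LIMIT
```
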
class InputParams(*, languages=<factory>, model='latest_long', use_separate_recognition_per_channel=False, enable_automatic_punctuation=True, enable_spoken_punctuation=False, enable_spoken_emojis=False, profanity_filter=False, enable_word_time_offsets=False, enable_word_confidence=False, enable_interim_results=True, enable_voice_activity_events=False)[source]

Bases: BaseModel

Configuration parameters for Google Speech-to-Text.

Parameters:
  • languages (Language | List[Language])

  • model (str | None)

  • use_separate_recognition_per_channel (bool | None)

  • enable_automatic_punctuation (bool | None)

  • enable_spoken_punctuation (bool | None)

  • enable_spoken_emojis (bool | None)

  • profanity_filter (bool | None)

  • enable_word_time_offsets (bool | None)

  • enable_word_confidence (bool | None)

  • enable_interim_results (bool | None)

  • enable_voice_activity_events (bool | None)

languages

Single language or list of recognition languages. First language is primary.

Type:

pipecat.transcriptions.language.Language | List[pipecat.transcriptions.language.Language]

model

Speech recognition model to use.

Type:

str | None

use_separate_recognition_per_channel

Process each audio channel separately.

Type:

bool | None

enable_automatic_punctuation

Add punctuation to transcripts.

Type:

bool | None

enable_spoken_punctuation

Include spoken punctuation in transcript.

Type:

bool | None

enable_spoken_emojis

Include spoken emojis in transcript.

Type:

bool | None

profanity_filter

Filter profanity from transcript.

Type:

bool | None

enable_word_time_offsets

Include timing information for each word.

Type:

bool | None

enable_word_confidence

Include confidence scores for each word.

Type:

bool | None

enable_interim_results

Stream partial recognition results.

Type:

bool | None

enable_voice_activity_events

Detect voice activity in audio.

Type:

bool | None

languages: Language | List[Language]
model: str | None
use_separate_recognition_per_channel: bool | None
enable_automatic_punctuation: bool | None
enable_spoken_punctuation: bool | None
enable_spoken_emojis: bool | None
profanity_filter: bool | None
enable_word_time_offsets: bool | None
enable_word_confidence: bool | None
enable_interim_results: bool | None
enable_voice_activity_events: bool | None
classmethod validate_languages(v)[source]

Normalize the languages value so it is always a list.

Return type:

List[Language]
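
The normalization that validate_languages performs can be sketched in plain Python (the real implementation is a pydantic validator; plain strings stand in for Language enum members here):

```python
from typing import List, Union

def validate_languages(v: Union[str, List[str]]) -> List[str]:
    # Wrap a single value in a list; pass lists through unchanged,
    # so downstream code can always iterate over a list.
    if isinstance(v, list):
        return v
    return [v]
```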

property language_list: List[Language]

Get languages as a guaranteed list.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic's ConfigDict.

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert Language enum(s) to Google STT language code(s).

Parameters:

language (Language | List[Language]) – Single Language enum or list of Language enums.

Returns:

Google STT language code(s).

Return type:

str | List[str]

async set_language(language)[source]

Update the service’s recognition language.

A convenience method for setting a single language.

Parameters:

language (Language) – New language for recognition.

async set_languages(languages)[source]

Update the service’s recognition languages.

Parameters:

languages (List[Language]) – List of languages for recognition. First language is primary.

async set_model(model)[source]

Update the service’s recognition model.

Parameters:

model (str) – New recognition model name.

async start(frame)[source]

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.

async update_options(*, languages=None, model=None, enable_automatic_punctuation=None, enable_spoken_punctuation=None, enable_spoken_emojis=None, profanity_filter=None, enable_word_time_offsets=None, enable_word_confidence=None, enable_interim_results=None, enable_voice_activity_events=None, location=None)[source]

Update service options dynamically.

Parameters:
  • languages (List[Language] | None) – New list of recognition languages.

  • model (str | None) – New recognition model.

  • enable_automatic_punctuation (bool | None) – Enable/disable automatic punctuation.

  • enable_spoken_punctuation (bool | None) – Enable/disable spoken punctuation.

  • enable_spoken_emojis (bool | None) – Enable/disable spoken emojis.

  • profanity_filter (bool | None) – Enable/disable profanity filter.

  • enable_word_time_offsets (bool | None) – Enable/disable word timing info.

  • enable_word_confidence (bool | None) – Enable/disable word confidence scores.

  • enable_interim_results (bool | None) – Enable/disable interim results.

  • enable_voice_activity_events (bool | None) – Enable/disable voice activity detection.

  • location (str | None) – New Google Cloud location.

Return type:

None

Note

Changes that affect the streaming configuration will cause the stream to be reconnected.
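
The reconnect-on-change behavior can be sketched as comparing the streaming-relevant subset of settings before and after the update. The key set and helper name below are illustrative assumptions, not pipecat's internals:

```python
from typing import Any, Dict

# Hypothetical set of options baked into the streaming recognizer config;
# changing any of them requires tearing down and re-establishing the stream.
STREAMING_KEYS = {"languages", "model", "enable_interim_results",
                  "enable_voice_activity_events", "location"}

def needs_reconnect(old: Dict[str, Any], updates: Dict[str, Any]) -> bool:
    # Only consider options that were actually supplied (non-None) and changed.
    return any(
        k in STREAMING_KEYS and v is not None and old.get(k) != v
        for k, v in updates.items()
    )
```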

async run_stt(audio)[source]

Process an audio chunk for STT transcription.

Parameters:

audio (bytes) – Raw audio data to transcribe.

Return type:

AsyncGenerator[Frame, None]
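
Because run_stt is an async generator, callers consume frames with `async for` as results arrive. The sketch below uses a stubbed generator yielding plain strings in place of the service's real Frame objects:

```python
import asyncio
from typing import AsyncGenerator, List

async def run_stt(audio: bytes) -> AsyncGenerator[str, None]:
    # Stub: the real service yields frame objects (e.g. interim and final
    # transcription frames); plain strings stand in here.
    yield "interim: hel"
    yield "final: hello"

async def transcribe(audio: bytes) -> List[str]:
    frames = []
    async for frame in run_stt(audio):
        frames.append(frame)
    return frames
```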