STT

Deepgram speech-to-text service implementation.

class pipecat.services.deepgram.stt.DeepgramSTTService(*, api_key, url='', base_url='', sample_rate=None, live_options=None, addons=None, **kwargs)

Bases: STTService

Deepgram speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API. Supports configurable models, languages, VAD events, and various audio processing options.

Parameters:
  • api_key (str) – Deepgram API key for authentication.

  • url (str) – Deprecated. Use base_url instead.

  • base_url (str) – Custom Deepgram API base URL.

  • sample_rate (int | None) – Audio sample rate in Hz. If None, the value from live_options or the default is used.

  • live_options (deepgram.LiveOptions | None) – Deepgram LiveOptions for detailed configuration.

  • addons (Dict | None) – Additional Deepgram features to enable.

  • **kwargs – Additional arguments passed to the parent STTService.
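A minimal construction sketch follows. It assumes the `deepgram` SDK exposes `LiveOptions` with fields such as `model` and `vad_events`, and that the key is stored in a `DEEPGRAM_API_KEY` environment variable (an assumed name). The helpers `deepgram_stt_kwargs` and `build_deepgram_stt` are hypothetical and not part of pipecat; third-party imports are deferred so that the argument helper works without either package installed.

```python
import os


def deepgram_stt_kwargs(api_key=None, *, sample_rate=None, **live_option_fields):
    """Collect constructor arguments for DeepgramSTTService (hypothetical helper).

    live_option_fields are kept as a plain dict here so this function has no
    third-party dependencies; they are turned into LiveOptions later.
    """
    key = api_key or os.environ.get("DEEPGRAM_API_KEY")  # assumed variable name
    if not key:
        raise ValueError("a Deepgram API key is required")
    return {
        "api_key": key,
        "sample_rate": sample_rate,  # None defers to live_options or the default
        "live_options": dict(live_option_fields),
    }


def build_deepgram_stt(**kwargs):
    """Instantiate the service; requires pipecat and the deepgram SDK."""
    from deepgram import LiveOptions
    from pipecat.services.deepgram.stt import DeepgramSTTService

    args = deepgram_stt_kwargs(**kwargs)
    fields = args.pop("live_options")
    live_options = LiveOptions(**fields) if fields else None
    return DeepgramSTTService(live_options=live_options, **args)
```

With both packages installed, a call such as `build_deepgram_stt(model="nova-2", vad_events=True)` would return a configured service. Keeping the option fields in a dict until the last step makes the configuration easy to inspect or log before a connection is opened.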

property vad_enabled

Check if Deepgram VAD events are enabled.

Returns:

True if VAD events are enabled in the current settings.
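One plausible use of this property is deciding whether a pipeline needs its own voice-activity detector. The sketch below uses a stand-in class, `FakeSTT`, whose `vad_enabled` property reads a `vad_events` flag from a settings dict. That internal layout is an assumption for illustration, not a confirmed detail of the service, and `needs_local_vad` is a hypothetical helper.

```python
class FakeSTT:
    """Stand-in for an STT service exposing a vad_enabled property (hypothetical)."""

    def __init__(self, settings):
        self._settings = dict(settings)

    @property
    def vad_enabled(self):
        # Assumed mapping: the property mirrors the `vad_events` setting.
        return bool(self._settings.get("vad_events", False))


def needs_local_vad(stt):
    """Return True when the service will not emit its own VAD events."""
    return not stt.vad_enabled
```

The same check should work against a real service instance, since it relies only on the documented `vad_enabled` property.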

can_generate_metrics()

Check if this service can generate processing metrics.

Returns:

True, because the Deepgram service supports metrics generation.

Return type:

bool

async set_model(model)

Set the Deepgram model and reconnect.

Parameters:

model (str) – The Deepgram model name to use.

async set_language(language)

Set the recognition language and reconnect.

Parameters:

language (Language) – The language to use for speech recognition.
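Because both set_model and set_language reconnect, a caller may want to apply only the settings that actually change. The sketch below shows one way to do that. `switch_recognition` is a hypothetical helper, and `RecordingSTT` is a stub that records calls instead of reconnecting. The stub accepts a plain string for the language, whereas the real method expects a `Language` value.

```python
import asyncio


async def switch_recognition(stt, *, model=None, language=None):
    """Apply a new model and/or language; each call triggers a reconnect."""
    if model is not None:
        await stt.set_model(model)
    if language is not None:
        await stt.set_language(language)


class RecordingSTT:
    """Hypothetical stub that records calls instead of reconnecting."""

    def __init__(self):
        self.calls = []

    async def set_model(self, model):
        self.calls.append(("set_model", model))

    async def set_language(self, language):
        self.calls.append(("set_language", language))


stub = RecordingSTT()
asyncio.run(switch_recognition(stub, model="nova-2", language="es"))
```

Changing both settings this way costs two reconnects, so it may be preferable to set them up front at construction time when they are known in advance.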

async start(frame)

Start the Deepgram STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the Deepgram STT service.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the Deepgram STT service.

Parameters:

frame (CancelFrame) – The cancel frame.

async run_stt(audio)

Send audio data to Deepgram for transcription.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – None; transcription results arrive through WebSocket callbacks rather than from this generator.

Return type:

AsyncGenerator[Frame, None]
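The send-then-callback pattern described here can be sketched with a stand-in class. `CallbackSTT` is not the real service: it stores the audio it is given, yields None once, and delivers transcripts through a separate callback that the demo triggers by hand. In the real service that callback would be driven by a WebSocket event.

```python
import asyncio


class CallbackSTT:
    """Stand-in showing the send-then-callback pattern (not the real service)."""

    def __init__(self, on_transcript):
        self._on_transcript = on_transcript
        self.sent = []

    async def run_stt(self, audio):
        # Forward audio to the (simulated) connection; no transcript is produced here.
        self.sent.append(audio)
        yield None

    async def simulate_server_message(self, text):
        # Stands in for a transcript arriving on the WebSocket.
        await self._on_transcript(text)


async def demo():
    received = []

    async def on_transcript(text):
        received.append(text)

    stt = CallbackSTT(on_transcript)
    yielded = [frame async for frame in stt.run_stt(b"\x00\x01")]
    await stt.simulate_server_message("hello")
    return yielded, received, stt.sent


yielded, received, sent = asyncio.run(demo())
```

The generator and the callback are decoupled, so a consumer iterating over run_stt should not expect transcripts from it and needs to listen for them elsewhere.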

async start_metrics()

Start TTFB and processing metrics collection.

async process_frame(frame, direction)

Process frames with Deepgram-specific handling.

Parameters:
  • frame (Frame) – The frame to process.

  • direction (FrameDirection) – The direction of frame processing.