STT

Deepgram speech-to-text service implementation.

class pipecat.services.deepgram.stt.DeepgramSTTService(*, api_key, url='', base_url='', sample_rate=None, live_options=None, addons=None, **kwargs)

Bases: STTService

Deepgram speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API. Supports configurable models, languages, voice activity detection (VAD) events, and other audio processing options.

Parameters:

api_key (str) – Deepgram API key for authentication.
url (str) – Deprecated. Use base_url instead.
base_url (str) – Custom Deepgram API base URL.
sample_rate (int | None) – Audio sample rate in Hz. If None, falls back to the service default or the value set in live_options.
live_options (deepgram.LiveOptions | None) – Deepgram LiveOptions for detailed configuration.
addons (Dict | None) – Additional Deepgram features to enable.
**kwargs – Additional arguments passed to the parent STTService.
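
A minimal construction sketch, assuming the pipecat package (with its Deepgram support) and the Deepgram SDK are installed. The helper name `build_deepgram_stt`, the model name, and the option values are illustrative assumptions, not defaults stated by this reference. Imports are deferred into the function so the definition itself loads without those packages.

```python
def build_deepgram_stt(api_key: str):
    """Hypothetical helper: build a DeepgramSTTService with explicit options."""
    # Imported lazily so this module loads even where pipecat is not installed.
    from deepgram import LiveOptions
    from pipecat.services.deepgram.stt import DeepgramSTTService

    # Illustrative values only; check Deepgram's documentation for valid options.
    options = LiveOptions(
        model="nova-2",
        language="en-US",
        interim_results=True,
    )
    # sample_rate is left at None so the service falls back to its default
    # or to a value carried by live_options, as described above.
    return DeepgramSTTService(api_key=api_key, live_options=options)
```

Calling `build_deepgram_stt("YOUR_API_KEY")` would return a service instance ready to be placed in a pipeline; the key shown is a placeholder.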

property vad_enabled

Check if Deepgram VAD events are enabled.

Returns: True if VAD events are enabled in the current settings.
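
One way such a read-only flag could be derived from stored settings is sketched below. This is a stand-in class, not pipecat's implementation, and the `vad_events` key name is an assumption chosen to mirror the Deepgram live option of the same name.

```python
class SettingsBackedService:
    """Stand-in class (not pipecat code): a read-only flag derived from settings."""

    def __init__(self, settings: dict) -> None:
        self._settings = settings

    @property
    def vad_enabled(self) -> bool:
        # Assumed key name; a missing key is treated as "disabled"
        # instead of raising KeyError.
        return bool(self._settings.get("vad_events", False))


enabled = SettingsBackedService({"vad_events": True}).vad_enabled
disabled = SettingsBackedService({}).vad_enabled
```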

can_generate_metrics()

Check if this service can generate processing metrics.

Returns: True, as the Deepgram service supports metrics generation.
Return type: bool

async set_model(model)

Set the Deepgram model and reconnect.

Parameters: model (str) – The Deepgram model name to use.

async set_language(language)

Set the recognition language and reconnect.

Parameters: language (Language) – The language to use for speech recognition.
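
Both setters are described as changing a setting and then reconnecting. The sketch below illustrates that update-then-reconnect pattern with a stand-in class. It is not pipecat code, and the private method names, setting keys, and values are hypothetical.

```python
import asyncio


class ReconnectingSTT:
    """Stand-in class (not pipecat code) showing update-then-reconnect."""

    def __init__(self) -> None:
        self.settings = {"model": "model-a", "language": "en"}
        self.connect_count = 0
        self.connected = False

    async def _connect(self) -> None:
        # A real service would open a WebSocket using self.settings here.
        self.connected = True
        self.connect_count += 1

    async def _disconnect(self) -> None:
        self.connected = False

    async def _update(self, key: str, value: str) -> None:
        # In this sketch, connection-level settings cannot change mid-stream,
        # so the old connection is dropped and a new one is opened.
        self.settings[key] = value
        await self._disconnect()
        await self._connect()

    async def set_model(self, model: str) -> None:
        await self._update("model", model)

    async def set_language(self, language: str) -> None:
        await self._update("language", language)


async def demo() -> ReconnectingSTT:
    stt = ReconnectingSTT()
    await stt._connect()              # initial connection
    await stt.set_model("model-b")    # triggers one reconnect
    await stt.set_language("es")      # triggers another
    return stt


stt = asyncio.run(demo())
print(stt.settings, stt.connect_count)
```

The practical consequence of this design is that each setter call costs a reconnect, so changing model and language together means two reconnects in this sketch.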

async start(frame)

Start the Deepgram STT service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the Deepgram STT service.

Parameters: frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the Deepgram STT service.

Parameters: frame (CancelFrame) – The cancel frame.

async run_stt(audio)

Send audio data to Deepgram for transcription.

Parameters: audio (bytes) – Raw audio bytes to transcribe.
Yields: Frame – None. Transcription results are not yielded here; they arrive through WebSocket callbacks.
Return type: AsyncGenerator[Frame, None]
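
The shape described above, an async generator that forwards audio and yields only None while results arrive out of band, can be sketched as follows. `FakeConnection` and `SketchSTT` are hypothetical stand-ins, not pipecat or Deepgram classes, and the fake transcript text is invented for the demonstration.

```python
import asyncio
from typing import AsyncGenerator, Callable, List, Optional


class FakeConnection:
    """Hypothetical stand-in for a streaming WebSocket connection."""

    def __init__(self, on_transcript: Callable[[str], None]) -> None:
        self.on_transcript = on_transcript
        self.sent: List[bytes] = []

    async def send(self, audio: bytes) -> None:
        self.sent.append(audio)
        # Results are delivered through the callback, not a return value.
        self.on_transcript(f"<{len(audio)} bytes transcribed>")


class SketchSTT:
    """Stand-in service whose run_stt mirrors the documented shape."""

    def __init__(self) -> None:
        self.transcripts: List[str] = []
        self.connection = FakeConnection(self.transcripts.append)

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[str], None]:
        await self.connection.send(audio)
        # Nothing to report inline; yielding None keeps this an async generator.
        yield None


async def demo():
    stt = SketchSTT()
    yielded = [item async for item in stt.run_stt(b"\x00" * 320)]
    return stt, yielded


stt_sketch, yielded = asyncio.run(demo())
```

A caller iterating over `run_stt` therefore sees only None, and should listen for transcription output wherever the callbacks publish it.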

async start_metrics()

Start time-to-first-byte (TTFB) and processing metrics collection.

async process_frame(frame, direction)

Process frames with Deepgram-specific handling.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.