STT

class pipecat.services.assemblyai.stt.AssemblyAISTTService(*, api_key, language=Language.EN, api_endpoint_base_url='wss://streaming.assemblyai.com/v3/ws', connection_params=AssemblyAIConnectionParams(sample_rate=16000, encoding='pcm_s16le', formatted_finals=True, word_finalization_max_wait_time=None, end_of_turn_confidence_threshold=None, min_end_of_turn_silence_when_confident=None, max_turn_silence=None), vad_force_turn_endpoint=True, **kwargs)

Bases: STTService

Parameters:
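  • api_key (str) – AssemblyAI API key used to authenticate the streaming connection.

  • language (Language) – Language of the input audio. Defaults to Language.EN.

  • api_endpoint_base_url (str) – Base WebSocket URL of the AssemblyAI streaming API. Defaults to wss://streaming.assemblyai.com/v3/ws.

  • connection_params (AssemblyAIConnectionParams) – Streaming session settings: audio sample rate and encoding, whether final transcripts are formatted, and the end-of-turn detection thresholds.

  • vad_force_turn_endpoint (bool) – If True (the default), force AssemblyAI to end the current turn when local VAD detects that the user has stopped speaking.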
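The connection_params fields end up as query parameters of the streaming WebSocket URL. The sketch below assumes the service drops unset (None) fields, renders booleans as lowercase strings, and appends the rest to api_endpoint_base_url with urlencode; ConnectionParams and build_ws_url are hypothetical stand-ins, not Pipecat's API.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class ConnectionParams:
    """Hypothetical stand-in mirroring AssemblyAIConnectionParams' fields and defaults."""

    sample_rate: int = 16000
    encoding: str = "pcm_s16le"
    formatted_finals: bool = True
    word_finalization_max_wait_time: Optional[int] = None
    end_of_turn_confidence_threshold: Optional[float] = None
    min_end_of_turn_silence_when_confident: Optional[int] = None
    max_turn_silence: Optional[int] = None


def build_ws_url(base_url: str, params: ConnectionParams) -> str:
    """Assumed mapping: skip None fields, render booleans as 'true'/'false'."""
    query = {}
    for key, value in asdict(params).items():
        if value is None:
            continue  # unset options are left to the server's defaults
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url}?{urlencode(query)}"


url = build_ws_url(
    "wss://streaming.assemblyai.com/v3/ws",
    ConnectionParams(end_of_turn_confidence_threshold=0.7),
)
print(url)
# wss://streaming.assemblyai.com/v3/ws?sample_rate=16000&encoding=pcm_s16le&formatted_finals=true&end_of_turn_confidence_threshold=0.7
```

Leaving a threshold at None keeps it out of the URL entirely, so the server applies its own default rather than an explicit value.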

can_generate_metrics()

Check whether this service can generate processing metrics.
Return type:

bool

async start(frame)

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the STT service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the STT service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.
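The difference between stopping and cancelling can be sketched as graceful versus abrupt shutdown of the streaming connection. This is a hypothetical illustration, not Pipecat's implementation: FakeWebSocket and LifecycleSketch are invented names, and it assumes stop ends the session with an AssemblyAI v3 {"type": "Terminate"} message before closing, while cancel simply drops the connection.

```python
import asyncio
import json
from typing import List, Optional, Tuple


class FakeWebSocket:
    """Hypothetical in-memory connection that records what would be sent."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self.closed = False

    async def send(self, message: str) -> None:
        if self.closed:
            raise RuntimeError("connection is closed")
        self.sent.append(message)

    async def close(self) -> None:
        self.closed = True


class LifecycleSketch:
    """Hypothetical service showing start/stop/cancel; not Pipecat's code."""

    def __init__(self) -> None:
        self.ws: Optional[FakeWebSocket] = None

    async def start(self) -> None:
        self.ws = FakeWebSocket()  # real code would open the wss:// connection here

    async def stop(self) -> None:
        # Graceful shutdown: ask the server to end the session, then close.
        if self.ws and not self.ws.closed:
            await self.ws.send(json.dumps({"type": "Terminate"}))
            await self.ws.close()

    async def cancel(self) -> None:
        # Abrupt shutdown: close without sending a session-end message.
        if self.ws and not self.ws.closed:
            await self.ws.close()


async def main() -> Tuple[List[str], List[str]]:
    graceful, aborted = LifecycleSketch(), LifecycleSketch()
    await graceful.start()
    await aborted.start()
    await graceful.stop()
    await aborted.cancel()
    return graceful.ws.sent, aborted.ws.sent


graceful_sent, aborted_sent = asyncio.run(main())
print(graceful_sent)  # ['{"type": "Terminate"}']
print(aborted_sent)   # []
```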

async run_stt(audio)

Run speech-to-text on the provided audio data.

This overrides the abstract STTService method that subclasses implement to provide the actual speech recognition.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
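Because run_stt is an async generator, the caller iterates it with async for and forwards each yielded frame. The sketch below shows only that contract: EchoSTT and the local TextFrame dataclass are hypothetical stand-ins (it decodes bytes as text instead of doing real recognition), and a streaming service such as this one may deliver its transcripts asynchronously from the WebSocket rather than from this generator.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class TextFrame:
    """Hypothetical stand-in for Pipecat's TextFrame."""

    text: str


class EchoSTT:
    """Hypothetical STT that 'recognizes' UTF-8 bytes; only the generator shape matters."""

    async def run_stt(self, audio: bytes) -> AsyncGenerator[TextFrame, None]:
        text = audio.decode("utf-8", errors="ignore").strip()
        if text:  # yield nothing for empty or whitespace-only input
            yield TextFrame(text=text)


async def collect(stt: EchoSTT, chunks: List[bytes]) -> List[str]:
    results = []
    for chunk in chunks:
        async for frame in stt.run_stt(chunk):
            results.append(frame.text)
    return results


texts = asyncio.run(collect(EchoSTT(), [b"hello ", b"   ", b"world"]))
print(texts)  # ['hello', 'world']
```

A generator may yield zero, one, or several frames per call, which is why the caller loops instead of expecting a single return value.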

async process_frame(frame, direction)

Process frames, handling VAD events and audio segmentation.

Parameters:
  • frame (Frame) – The frame to process.

  • direction (FrameDirection) – The direction of frame processing.
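A minimal sketch of the VAD handling that vad_force_turn_endpoint is assumed to control: when local VAD reports that the user stopped speaking, the service asks AssemblyAI to close the current turn immediately with a v3 {"type": "ForceEndpoint"} message instead of waiting for the server's own silence thresholds. The frame classes and VadForceEndpointSketch are hypothetical stand-ins; the actual frame type Pipecat checks may differ.

```python
import asyncio
import json
from dataclasses import dataclass, field


class Frame:
    """Hypothetical base frame."""


class UserStoppedSpeakingFrame(Frame):
    """Hypothetical stand-in for the VAD 'user stopped speaking' event."""


@dataclass
class VadForceEndpointSketch:
    vad_force_turn_endpoint: bool = True
    sent: list = field(default_factory=list)  # messages that would go to the WebSocket

    async def send_json(self, message: dict) -> None:
        self.sent.append(json.dumps(message))

    async def process_frame(self, frame: Frame) -> None:
        # On the VAD stop event, end the turn now rather than waiting for
        # the server-side end-of-turn detection.
        if isinstance(frame, UserStoppedSpeakingFrame) and self.vad_force_turn_endpoint:
            await self.send_json({"type": "ForceEndpoint"})


async def run(flag: bool) -> list:
    svc = VadForceEndpointSketch(vad_force_turn_endpoint=flag)
    for frame in (Frame(), UserStoppedSpeakingFrame()):
        await svc.process_frame(frame)
    return svc.sent


print(asyncio.run(run(True)))   # ['{"type": "ForceEndpoint"}']
print(asyncio.run(run(False)))  # []
```

With the flag disabled, turn boundaries are left entirely to the thresholds in connection_params.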