STT

class pipecat.services.assemblyai.stt.AssemblyAISTTService(*, api_key, language=Language.EN, api_endpoint_base_url='wss://streaming.assemblyai.com/v3/ws', connection_params=AssemblyAIConnectionParams(sample_rate=16000, encoding='pcm_s16le', formatted_finals=True, word_finalization_max_wait_time=None, end_of_turn_confidence_threshold=None, min_end_of_turn_silence_when_confident=None, max_turn_silence=None), vad_force_turn_endpoint=True, **kwargs)

Bases: STTService

Parameters:

api_key (str)
language (Language)
api_endpoint_base_url (str)
connection_params (AssemblyAIConnectionParams)
vad_force_turn_endpoint (bool)
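
The constructor is keyword-only, so every argument must be passed by name. A minimal sketch of building the service with the documented defaults follows. The helper names `assemblyai_stt_kwargs` and `build_assemblyai_stt` are hypothetical, and the import path used for `AssemblyAIConnectionParams` is an assumption that this reference does not state. The imports are deferred so the keyword helper can be used without pipecat installed.

```python
from typing import Any, Dict


def assemblyai_stt_kwargs(api_key: str) -> Dict[str, Any]:
    """Collect keyword arguments, mirroring the defaults in the signature above."""
    return {
        "api_key": api_key,
        "api_endpoint_base_url": "wss://streaming.assemblyai.com/v3/ws",
        "vad_force_turn_endpoint": True,
    }


def build_assemblyai_stt(api_key: str):
    """Construct the service. Requires pipecat with AssemblyAI support installed."""
    # Deferred imports: only needed when the service is actually built.
    from pipecat.services.assemblyai.stt import AssemblyAISTTService

    # Assumption: the connection params class lives in this module.
    from pipecat.services.assemblyai.models import AssemblyAIConnectionParams

    params = AssemblyAIConnectionParams(
        sample_rate=16000,
        encoding="pcm_s16le",
        formatted_finals=True,
    )
    # All arguments are passed by name because the signature is keyword-only.
    return AssemblyAISTTService(
        connection_params=params, **assemblyai_stt_kwargs(api_key)
    )
```

Calling `build_assemblyai_stt("YOUR_API_KEY")` would return a configured service instance, with `language` left at its default of `Language.EN`.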

can_generate_metrics()

Return type: bool

async start(frame)

Start the STT service.

Parameters: frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters: frame (EndFrame) – The end frame.

async cancel(frame)

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters: frame (CancelFrame) – The cancel frame.

async run_stt(audio)

Run speech-to-text on the provided audio data.

This method must be implemented by subclasses to provide actual speech recognition functionality.

Parameters: audio (bytes) – Raw audio bytes to transcribe.
Yields: Frame – Frames containing transcription results (typically TextFrame).
Return type: AsyncGenerator[Frame, None]
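
Because `run_stt` returns an `AsyncGenerator`, callers iterate it with `async for` rather than awaiting a single result. The sketch below shows that consumption pattern only. `FakeSTT` and `FakeTextFrame` are hypothetical stand-ins, not pipecat classes, and the stub does no real transcription.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class FakeTextFrame:
    """Hypothetical stand-in for a frame carrying transcription text."""

    text: str


class FakeSTT:
    """Hypothetical stub exposing the same run_stt shape as documented above."""

    async def run_stt(self, audio: bytes) -> AsyncGenerator[FakeTextFrame, None]:
        # A real service would transcribe the audio; this stub reports its size.
        yield FakeTextFrame(text=f"{len(audio)} bytes received")


async def collect_frames(stt: FakeSTT, audio: bytes) -> List[FakeTextFrame]:
    # run_stt is an async generator, so it is consumed with `async for`.
    return [frame async for frame in stt.run_stt(audio)]


# 160 samples of 16-bit silence, i.e. 320 bytes of raw PCM.
frames = asyncio.run(collect_frames(FakeSTT(), b"\x00\x00" * 160))
```

In a running pipeline this loop is normally driven by the base service rather than by application code, so the pattern matters mainly when writing or testing a subclass.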

async process_frame(frame, direction)

Process frames, handling VAD events and audio segmentation.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.
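
One way to picture how VAD events and the `vad_force_turn_endpoint` flag could interact is the stub below. This is a sketch under stated assumptions, not the library's implementation: the frame classes, the `VADAwareStub` class, and the recorded action names are all hypothetical, and this reference does not describe what the real service does when speech stops.

```python
import asyncio
from typing import List


class Frame:
    """Hypothetical base frame."""


class UserStartedSpeaking(Frame):
    """Hypothetical VAD event: speech detected."""


class UserStoppedSpeaking(Frame):
    """Hypothetical VAD event: speech ended."""


class VADAwareStub:
    """Hypothetical processor that records how it reacts to VAD events."""

    def __init__(self, vad_force_turn_endpoint: bool = True) -> None:
        self.vad_force_turn_endpoint = vad_force_turn_endpoint
        self.actions: List[str] = []

    async def process_frame(self, frame: Frame, direction: str) -> None:
        if isinstance(frame, UserStartedSpeaking):
            self.actions.append("speech_started")
        elif isinstance(frame, UserStoppedSpeaking):
            self.actions.append("speech_stopped")
            # Assumption: the flag asks the backend to close the turn right away.
            if self.vad_force_turn_endpoint:
                self.actions.append("force_endpoint")


async def demo(force: bool) -> List[str]:
    stub = VADAwareStub(vad_force_turn_endpoint=force)
    await stub.process_frame(UserStartedSpeaking(), "downstream")
    await stub.process_frame(UserStoppedSpeaking(), "downstream")
    return stub.actions


forced = asyncio.run(demo(True))
unforced = asyncio.run(demo(False))
```

With the flag disabled, the stub only records the start and stop events and leaves turn detection to the backend.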