STT
- class pipecat.services.assemblyai.stt.AssemblyAISTTService(*, api_key, language=Language.EN, api_endpoint_base_url='wss://streaming.assemblyai.com/v3/ws', connection_params=AssemblyAIConnectionParams(sample_rate=16000, encoding='pcm_s16le', formatted_finals=True, word_finalization_max_wait_time=None, end_of_turn_confidence_threshold=None, min_end_of_turn_silence_when_confident=None, max_turn_silence=None), vad_force_turn_endpoint=True, **kwargs)
Bases: STTService
- Parameters:
api_key (str)
language (Language)
api_endpoint_base_url (str)
connection_params (AssemblyAIConnectionParams)
vad_force_turn_endpoint (bool)
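The connection parameters above could plausibly be sent as a query string on the WebSocket endpoint. The sketch below illustrates that idea using only the standard library. `ToyConnectionParams` and `build_ws_url` are hypothetical stand-ins, not part of pipecat; the field types for the `None` defaults and the lowercase encoding of booleans are assumptions, and the real service may build its URL differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class ToyConnectionParams:
    """Hypothetical stand-in mirroring the defaults shown in the signature."""

    sample_rate: int = 16000
    encoding: str = "pcm_s16le"
    formatted_finals: bool = True
    # The types of the optional fields below are assumptions.
    word_finalization_max_wait_time: Optional[int] = None
    end_of_turn_confidence_threshold: Optional[float] = None
    min_end_of_turn_silence_when_confident: Optional[int] = None
    max_turn_silence: Optional[int] = None


def build_ws_url(base_url: str, params: ToyConnectionParams) -> str:
    """Append every non-None parameter to base_url as a query string."""
    query = {}
    for name, value in asdict(params).items():
        if value is None:
            continue  # unset options are left out of the URL
        # Assumption: booleans are sent as lowercase "true"/"false".
        query[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url}?{urlencode(query)}"


url = build_ws_url("wss://streaming.assemblyai.com/v3/ws", ToyConnectionParams())
print(url)
```

With the defaults, only `sample_rate`, `encoding`, and `formatted_finals` appear in the query string; options left at `None` are omitted.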
- can_generate_metrics()
- Return type:
bool
- async start(frame)
Start the STT service.
- Parameters:
frame (StartFrame) – The start frame containing initialization parameters.
- async stop(frame)
Stop the STT service.
Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)
Cancel the STT service.
Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.
- Parameters:
frame (CancelFrame) – The cancel frame.
- async run_stt(audio)
Run speech-to-text on the provided audio data.
This method must be implemented by subclasses to provide actual speech recognition functionality.
- Parameters:
audio (bytes) – Raw audio bytes to transcribe.
- Yields:
Frame – Frames containing transcription results (typically TextFrame).
- Return type:
AsyncGenerator[Frame, None]
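The `AsyncGenerator[Frame, None]` return type means callers consume results with `async for`. A minimal sketch of that calling convention follows, assuming a toy frame type; `ToyFrame`, `toy_run_stt`, and `collect` are hypothetical names, and the body reports the audio duration instead of performing real speech recognition.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class ToyFrame:
    """Hypothetical stand-in for a pipeline frame carrying text."""

    text: str


async def toy_run_stt(audio: bytes, sample_rate: int = 16000) -> AsyncGenerator[ToyFrame, None]:
    """Yield one frame describing the chunk; a real service would transcribe it."""
    # pcm_s16le uses 2 bytes per sample (mono is assumed here).
    samples = len(audio) // 2
    duration_ms = samples * 1000 // sample_rate
    yield ToyFrame(text=f"{duration_ms} ms of audio")


async def collect(audio: bytes) -> List[ToyFrame]:
    """Drain the async generator the way a caller would."""
    return [frame async for frame in toy_run_stt(audio)]


frames = asyncio.run(collect(b"\x00\x00" * 160))
print(frames)
```

A streaming service may yield nothing from this method and deliver transcripts through a separate receive path; the sketch only shows the generator contract.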
- async process_frame(frame, direction)
Process frames, handling VAD events and audio segmentation.
- Parameters:
frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.
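One way to picture frame handling with VAD events is a dispatcher that branches on the frame type. The sketch below is an illustration under stated assumptions, not pipecat's implementation: every `Toy*` name is hypothetical, and the link between a "user stopped speaking" event and `vad_force_turn_endpoint` is assumed from the parameter name.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum, auto


class ToyDirection(Enum):
    """Hypothetical stand-in for a frame direction."""

    DOWNSTREAM = auto()
    UPSTREAM = auto()


@dataclass
class ToyAudioFrame:
    audio: bytes


@dataclass
class ToyUserStoppedSpeaking:
    """Hypothetical VAD event marking the end of user speech."""


@dataclass
class ToySTT:
    vad_force_turn_endpoint: bool = True
    actions: list = field(default_factory=list)

    async def process_frame(self, frame, direction: ToyDirection) -> None:
        if isinstance(frame, ToyAudioFrame):
            # Audio would be handed to the transcription path.
            self.actions.append(("transcribe", len(frame.audio)))
        elif isinstance(frame, ToyUserStoppedSpeaking) and self.vad_force_turn_endpoint:
            # Assumption: the VAD stop event asks the backend to end the turn.
            self.actions.append(("force_endpoint", None))
        else:
            # Anything else is passed along in its original direction.
            self.actions.append(("forward", direction.name))


async def demo() -> list:
    stt = ToySTT()
    await stt.process_frame(ToyAudioFrame(b"\x00" * 320), ToyDirection.DOWNSTREAM)
    await stt.process_frame(ToyUserStoppedSpeaking(), ToyDirection.DOWNSTREAM)
    return stt.actions


actions = asyncio.run(demo())
print(actions)
```

Setting `vad_force_turn_endpoint=False` in this toy model makes the stop event fall through to the forwarding branch.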