STT
- class pipecat.services.cartesia.stt.CartesiaLiveOptions(*, model='ink-whisper', language='en', encoding='pcm_s16le', sample_rate=16000, **kwargs)
Bases: object
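- Parameters:
model (str) – Speech-to-text model name. Defaults to 'ink-whisper'.
language (str) – Language code of the audio. Defaults to 'en'.
encoding (str) – Encoding of the input audio. Defaults to 'pcm_s16le' (signed 16-bit little-endian PCM).
sample_rate (int) – Sample rate of the input audio, in Hz. Defaults to 16000.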
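The default encoding `'pcm_s16le'` is signed 16-bit little-endian PCM, so each sample takes 2 bytes. Assuming a single (mono) channel, one second of audio at the default `sample_rate=16000` is 16000 × 2 = 32,000 bytes. The standard-library sketch below computes chunk sizes and packs float samples into that byte format. The helper names are hypothetical and not part of pipecat, and mono input is an assumption.

```python
import struct

SAMPLE_RATE = 16000   # documented default sample_rate
BYTES_PER_SAMPLE = 2  # pcm_s16le: one signed 16-bit integer per sample
CHANNELS = 1          # assumption: mono audio


def chunk_size_bytes(duration_ms: int, sample_rate: int = SAMPLE_RATE) -> int:
    """Bytes of pcm_s16le audio in a chunk of the given duration (hypothetical helper)."""
    samples = sample_rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def floats_to_pcm_s16le(samples: list) -> bytes:
    """Clamp floats to [-1.0, 1.0] and pack them as little-endian int16 (hypothetical helper)."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    print(chunk_size_bytes(1000))  # 32000 bytes per second at 16 kHz mono
    print(chunk_size_bytes(20))    # 640 bytes for a 20 ms chunk
    print(floats_to_pcm_s16le([0.0, 1.0, -1.0]).hex())  # 0000ff7f0180
```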
- to_dict()
- items()
- get(key, default=None)
- classmethod from_json(json_str)
- Parameters:
json_str (str)
- Return type:
CartesiaLiveOptions
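The following self-contained stand-in, `LiveOptionsSketch`, illustrates how an options object with this interface is typically used. It is hypothetical and is not pipecat's source. It mirrors the documented constructor defaults and the `to_dict`, `items`, `get`, and `from_json` methods. How the real class stores extra `**kwargs` and exactly what `to_dict` returns are assumptions.

```python
import json
from typing import Any, Dict


class LiveOptionsSketch:
    """Hypothetical stand-in mirroring the documented CartesiaLiveOptions interface.

    Not pipecat's implementation: it only shows keyword-only defaults,
    extra keyword arguments, and dict-like accessors.
    """

    def __init__(self, *, model: str = "ink-whisper", language: str = "en",
                 encoding: str = "pcm_s16le", sample_rate: int = 16000,
                 **kwargs: Any) -> None:
        self.model = model
        self.language = language
        self.encoding = encoding
        self.sample_rate = sample_rate
        self.extra = dict(kwargs)  # assumption: unrecognized options are kept aside

    def to_dict(self) -> Dict[str, Any]:
        # Assumption: extra options are merged after the named ones.
        return {"model": self.model, "language": self.language,
                "encoding": self.encoding, "sample_rate": self.sample_rate,
                **self.extra}

    def items(self):
        # Dict-style iteration over (key, value) pairs.
        return self.to_dict().items()

    def get(self, key: str, default: Any = None) -> Any:
        # Dict-style lookup with a fallback value.
        return self.to_dict().get(key, default)

    @classmethod
    def from_json(cls, json_str: str) -> "LiveOptionsSketch":
        # Build options from a JSON object whose keys match constructor arguments.
        return cls(**json.loads(json_str))


if __name__ == "__main__":
    opts = LiveOptionsSketch.from_json('{"language": "fr", "sample_rate": 24000}')
    print(opts.get("language"))  # fr
    print(opts.get("model"))     # ink-whisper (default kept)
    print(dict(opts.items()))
```

Because every argument is keyword-only, `from_json` can pass the decoded object straight to the constructor, and any keys it omits fall back to the documented defaults.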
- class pipecat.services.cartesia.stt.CartesiaSTTService(*, api_key, base_url='', sample_rate=16000, live_options=None, **kwargs)
Bases: STTService
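- Parameters:
api_key (str) – Cartesia API key.
base_url (str) – Base URL of the Cartesia API. Leave empty to use the default.
sample_rate (int) – Sample rate of the input audio, in Hz. Defaults to 16000.
live_options (CartesiaLiveOptions | None) – Transcription options. If None, the default options are used.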
- can_generate_metrics()
- Return type:
bool
- async start(frame)
Start the STT service.
- Parameters:
frame (StartFrame) – The start frame containing initialization parameters.
- async stop(frame)
Stop the STT service.
Called when the service should stop processing. Performs any cleanup the service needs before shutting down.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)
Cancel the STT service.
Called when the service should cancel all operations. Handles the cancellation instead of a graceful stop.
- Parameters:
frame (CancelFrame) – The cancel frame.
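The stop and cancel entries distinguish a graceful shutdown, which cleans up, from cancellation, which abandons all operations. The minimal asyncio sketch below shows that difference. `StreamingServiceSketch` is a hypothetical class whose background task stands in for a live connection's receive loop. It is not pipecat's implementation and does not use pipecat's frame types.

```python
import asyncio
from typing import List, Optional, Tuple


class StreamingServiceSketch:
    """Hypothetical service showing a start/stop/cancel lifecycle (not pipecat's code)."""

    def __init__(self) -> None:
        self._inbox: asyncio.Queue = asyncio.Queue()
        self._task: Optional[asyncio.Task] = None
        self.received: List[str] = []

    async def start(self) -> None:
        # Launch a background loop, standing in for a connection's receive loop.
        self._task = asyncio.create_task(self._receive_loop())

    async def feed(self, message: str) -> None:
        await self._inbox.put(message)

    async def _receive_loop(self) -> None:
        while True:
            message = await self._inbox.get()
            if message is None:  # sentinel: graceful shutdown requested
                return
            self.received.append(message)

    async def stop(self) -> None:
        # Graceful: queue a sentinel so pending messages drain, then wait for the loop.
        if self._task is not None:
            await self._inbox.put(None)
            await self._task
            self._task = None

    async def cancel(self) -> None:
        # Immediate: cancel the background task and discard pending messages.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None


async def demo() -> Tuple[List[str], List[str]]:
    graceful = StreamingServiceSketch()
    await graceful.start()
    await graceful.feed("hello")
    await graceful.feed("world")
    await graceful.stop()  # both messages are handled before the loop exits

    abrupt = StreamingServiceSketch()
    await abrupt.start()
    await abrupt.feed("dropped")
    await abrupt.cancel()  # the loop is cancelled before it handles the message
    return graceful.received, abrupt.received


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (['hello', 'world'], [])
```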
- async run_stt(audio)
Run speech-to-text on the provided audio data.
This is the STTService hook that provides the actual speech recognition. CartesiaSTTService implements it using the Cartesia API.
- Parameters:
audio (bytes) – Raw audio bytes to transcribe.
- Yields:
Frame – Frames containing transcription results (typically TextFrame).
- Return type:
AsyncGenerator[Frame, None]
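Because `run_stt` is typed as `AsyncGenerator[Frame, None]`, an implementation yields zero or more frames per audio chunk, and the caller iterates over them with `async for`. The stub below illustrates only that contract. `Frame`, `TranscriptFrame`, and `ErrorFrame` are hypothetical stand-ins for pipecat's frame classes. Instead of recognizing speech, the stub reports the audio's duration, assuming 16 kHz mono `pcm_s16le` input.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class Frame:
    """Hypothetical stand-in for a pipeline frame base class."""


@dataclass
class TranscriptFrame(Frame):
    """Hypothetical frame carrying recognized text."""
    text: str


@dataclass
class ErrorFrame(Frame):
    """Hypothetical frame reporting a problem with the input."""
    error: str


async def run_stt_sketch(audio: bytes) -> AsyncGenerator[Frame, None]:
    # A real service would send `audio` to a recognizer. This stub only checks
    # that the bytes form whole 16-bit samples and reports the duration as text.
    if len(audio) % 2:
        yield ErrorFrame("odd byte count: not whole pcm_s16le samples")
        return
    seconds = len(audio) / 2 / 16000  # assumption: mono audio at 16 kHz
    yield TranscriptFrame(f"<{seconds:.2f}s of audio>")


async def collect(audio: bytes) -> List[Frame]:
    # Consumers drain the generator with `async for`, forwarding each frame.
    return [frame async for frame in run_stt_sketch(audio)]


if __name__ == "__main__":
    print(asyncio.run(collect(bytes(32000))))
```

Yielding frames, rather than returning a single string, lets one call emit several results, for example interim and final transcripts or an error, without changing the hook's signature.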
- async start_metrics()
- async process_frame(frame, direction)
Process frames, handling VAD events and audio segmentation.
- Parameters:
frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.
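`process_frame` is described as handling VAD events and audio segmentation. The generic sketch below illustrates that idea. It buffers audio while voice activity detection reports speech and emits the buffered bytes as one segment when speech stops. The event classes and `VadSegmenterSketch` are hypothetical, and pipecat's frame types and the exact behavior of this method may differ.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SpeechStarted:
    """Hypothetical VAD event: the user started speaking."""


@dataclass
class SpeechStopped:
    """Hypothetical VAD event: the user stopped speaking."""


@dataclass
class AudioChunk:
    """Hypothetical frame carrying raw pcm_s16le audio."""
    data: bytes


Event = Union[SpeechStarted, SpeechStopped, AudioChunk]


class VadSegmenterSketch:
    """Buffer audio between VAD start/stop events and emit one segment per utterance.

    A generic illustration of VAD-driven segmentation, not pipecat's process_frame logic.
    """

    def __init__(self) -> None:
        self._speaking = False
        self._buffer = bytearray()

    def process(self, event: Event) -> List[bytes]:
        """Handle one event and return any completed audio segments."""
        if isinstance(event, SpeechStarted):
            self._speaking = True
            self._buffer.clear()
        elif isinstance(event, AudioChunk) and self._speaking:
            self._buffer.extend(event.data)
        elif isinstance(event, SpeechStopped) and self._speaking:
            self._speaking = False
            segment = bytes(self._buffer)
            self._buffer.clear()
            return [segment] if segment else []
        return []  # audio outside speech, or a stray stop event, yields nothing


if __name__ == "__main__":
    segmenter = VadSegmenterSketch()
    events = [AudioChunk(b"\x00\x00"), SpeechStarted(), AudioChunk(b"\x01\x00"),
              AudioChunk(b"\x02\x00"), SpeechStopped()]
    print([seg for ev in events for seg in segmenter.process(ev)])  # [b'\x01\x00\x02\x00']
```

This sketch drops audio that arrives before the start event. Production segmenters often keep a short pre-roll buffer so the first syllable of an utterance is not clipped.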