STT

class pipecat.services.cartesia.stt.CartesiaLiveOptions(*, model='ink-whisper', language='en', encoding='pcm_s16le', sample_rate=16000, **kwargs)[source]

Bases: object

Parameters:
  • model (str)

  • language (str)

  • encoding (str)

  • sample_rate (int)

to_dict()[source]

items()[source]

get(key, default=None)[source]

classmethod from_json(json_str)[source]
Parameters:

json_str (str)

Return type:

CartesiaLiveOptions
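
A minimal sketch of building live options and round-tripping them through the helpers documented above, assuming to_dict() returns a plain dict that json.dumps() can serialize; the field values shown are the defaults from the constructor signature:

    import json

    from pipecat.services.cartesia.stt import CartesiaLiveOptions

    # Options for 16 kHz, 16-bit PCM audio transcribed as English; these
    # are the defaults from the constructor signature above.
    options = CartesiaLiveOptions(
        model="ink-whisper",
        language="en",
        encoding="pcm_s16le",
        sample_rate=16000,
    )

    payload = options.to_dict()                   # plain-dict view of the options
    model = options.get("model", "ink-whisper")   # dict-style access

    # Rebuild an equivalent instance from a JSON string.
    restored = CartesiaLiveOptions.from_json(json.dumps(payload))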

class pipecat.services.cartesia.stt.CartesiaSTTService(*, api_key, base_url='', sample_rate=16000, live_options=None, **kwargs)[source]

Bases: STTService

Parameters:
  • api_key (str)

  • base_url (str)

  • sample_rate (int)

  • live_options (CartesiaLiveOptions | None)
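
A hedged sketch of constructing the service from the parameters above; the environment-variable name is a placeholder for however you manage secrets, and live_options may be omitted to use the defaults:

    import os

    from pipecat.services.cartesia.stt import CartesiaLiveOptions, CartesiaSTTService

    # CARTESIA_API_KEY is a hypothetical environment variable name.
    stt = CartesiaSTTService(
        api_key=os.environ["CARTESIA_API_KEY"],
        sample_rate=16000,
        live_options=CartesiaLiveOptions(model="ink-whisper", language="en"),
    )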

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

async start(frame)[source]

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.
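
The stop and cancel hooks above are where subclass cleanup belongs. A minimal sketch of overriding them; the class name and logging are purely illustrative:

    from pipecat.frames.frames import CancelFrame, EndFrame
    from pipecat.services.cartesia.stt import CartesiaSTTService


    class LoggingCartesiaSTTService(CartesiaSTTService):
        """Hypothetical subclass that adds cleanup logging to the lifecycle hooks."""

        async def stop(self, frame: EndFrame):
            # Let the base service finish its own shutdown first, then run
            # any subclass-specific cleanup.
            await super().stop(frame)
            print("Cartesia STT stopped")

        async def cancel(self, frame: CancelFrame):
            await super().cancel(frame)
            print("Cartesia STT cancelled")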

async run_stt(audio)[source]

Run speech-to-text on the provided audio data.

Concrete STT services implement this method to provide the actual speech recognition functionality.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
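
In a running pipeline the framework drives run_stt() for you; the sketch below only illustrates the async-generator contract documented above, assuming audio holds raw bytes in the configured encoding and sample rate:

    from pipecat.frames.frames import Frame
    from pipecat.services.cartesia.stt import CartesiaSTTService


    async def transcribe(stt: CartesiaSTTService, audio: bytes) -> list[Frame]:
        # run_stt() is an async generator: collect every frame it yields
        # (typically frames carrying transcription text).
        return [frame async for frame in stt.run_stt(audio)]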

async start_metrics()[source]

async process_frame(frame, direction)[source]

Process frames, handling VAD events and audio segmentation.

Parameters:
  • frame (Frame) – The frame to process.

  • direction (FrameDirection) – The direction of frame processing.
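
Because process_frame() consumes audio and VAD frames as they flow through a pipeline, the service is normally placed between a transport's input and the downstream processors. A rough sketch, assuming a transport configured elsewhere and the stt instance from the constructor example above:

    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask


    async def run_bot(transport, stt):
        # `transport` is whichever pipecat transport the app uses; `stt` is a
        # CartesiaSTTService instance such as the one constructed above.
        pipeline = Pipeline([
            transport.input(),   # user audio frames flow in here
            stt,                 # emits transcription frames downstream
            transport.output(),  # carries frames back toward the user
        ])
        await PipelineRunner().run(PipelineTask(pipeline))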