STT

class pipecat.services.cartesia.stt.CartesiaLiveOptions(*, model='ink-whisper', language='en', encoding='pcm_s16le', sample_rate=16000, **kwargs)[source]

Bases: object

Parameters:
  • model (str)

  • language (str)

  • encoding (str)

  • sample_rate (int)

to_dict()[source]

items()[source]

get(key, default=None)[source]

classmethod from_json(json_str)[source]
Parameters:

json_str (str)

Return type:

CartesiaLiveOptions
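
A minimal sketch of building live options and round-tripping them through the helpers documented above, assuming to_dict() returns a plain dict that json.dumps() can serialize; the field values shown are the defaults from the constructor signature:

    import json

    from pipecat.services.cartesia.stt import CartesiaLiveOptions

    # Options for 16 kHz, 16-bit PCM audio transcribed as English; these
    # are the defaults from the constructor signature above.
    options = CartesiaLiveOptions(
        model="ink-whisper",
        language="en",
        encoding="pcm_s16le",
        sample_rate=16000,
    )

    payload = options.to_dict()                   # plain-dict view of the options
    model = options.get("model", "ink-whisper")   # dict-style access

    # Rebuild an equivalent instance from a JSON string.
    restored = CartesiaLiveOptions.from_json(json.dumps(payload))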

class pipecat.services.cartesia.stt.CartesiaSTTService(*, api_key, base_url='', sample_rate=16000, live_options=None, **kwargs)[source]

Bases: STTService

Parameters:
  • api_key (str)

  • base_url (str)

  • sample_rate (int)

  • live_options (CartesiaLiveOptions | None)
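
A hedged sketch of constructing the service from the parameters above; the environment-variable name is a placeholder for however you manage secrets, and live_options may be omitted to use the defaults:

    import os

    from pipecat.services.cartesia.stt import CartesiaLiveOptions, CartesiaSTTService

    # CARTESIA_API_KEY is a hypothetical environment variable name.
    stt = CartesiaSTTService(
        api_key=os.environ["CARTESIA_API_KEY"],
        sample_rate=16000,
        live_options=CartesiaLiveOptions(model="ink-whisper", language="en"),
    )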

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

async start(frame)[source]

Start the STT service.

Parameters:

frame (StartFrame) – The start frame containing initialization parameters.

async stop(frame)[source]

Stop the AI service.

Called when the service should stop processing. Subclasses should override this method to perform cleanup operations.

Parameters:

frame (EndFrame) – The end frame.

async cancel(frame)[source]

Cancel the AI service.

Called when the service should cancel all operations. Subclasses should override this method to handle cancellation logic.

Parameters:

frame (CancelFrame) – The cancel frame.
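
The stop and cancel hooks above are where subclass cleanup belongs. A minimal sketch of overriding them; the class name and logging are purely illustrative:

    from pipecat.frames.frames import CancelFrame, EndFrame
    from pipecat.services.cartesia.stt import CartesiaSTTService


    class LoggingCartesiaSTTService(CartesiaSTTService):
        """Hypothetical subclass that adds cleanup logging to the lifecycle hooks."""

        async def stop(self, frame: EndFrame):
            # Let the base service finish its own shutdown first, then run
            # any subclass-specific cleanup.
            await super().stop(frame)
            print("Cartesia STT stopped")

        async def cancel(self, frame: CancelFrame):
            await super().cancel(frame)
            print("Cartesia STT cancelled")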

async run_stt(audio)[source]

Run speech-to-text on the provided audio data.

Concrete STT services implement this method to provide the actual speech recognition functionality.

Parameters:

audio (bytes) – Raw audio bytes to transcribe.

Yields:

Frame – Frames containing transcription results (typically TextFrame).

Return type:

AsyncGenerator[Frame, None]
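
In a running pipeline the framework drives run_stt() for you; the sketch below only illustrates the async-generator contract documented above, assuming audio holds raw bytes in the configured encoding and sample rate:

    from pipecat.frames.frames import Frame
    from pipecat.services.cartesia.stt import CartesiaSTTService


    async def transcribe(stt: CartesiaSTTService, audio: bytes) -> list[Frame]:
        # run_stt() is an async generator: collect every frame it yields
        # (typically frames carrying transcription text).
        return [frame async for frame in stt.run_stt(audio)]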

async start_metrics()[source]

async process_frame(frame, direction)[source]

Process frames, handling VAD events and audio segmentation.

Parameters:
  • frame (Frame) – The frame to process.

  • direction (FrameDirection) – The direction of frame processing.
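
Because process_frame() consumes audio and VAD frames as they flow through a pipeline, the service is normally placed between a transport's input and the downstream processors. A rough sketch, assuming a transport configured elsewhere and the stt instance from the constructor example above:

    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask


    async def run_bot(transport, stt):
        # `transport` is whichever pipecat transport the app uses; `stt` is a
        # CartesiaSTTService instance such as the one constructed above.
        pipeline = Pipeline([
            transport.input(),   # user audio frames flow in here
            stt,                 # emits transcription frames downstream
            transport.output(),  # carries frames back toward the user
        ])
        await PipelineRunner().run(PipelineTask(pipeline))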