STT

pipecat.services.fal.stt.language_to_fal_language(language)[source]

Map a Language enum value to the language code used by Fal’s Wizper API, returning None if the language is unsupported.

Parameters:

language (Language)

Return type:

str | None

class pipecat.services.fal.stt.FalSTTService(*, api_key=None, sample_rate=None, params=None, **kwargs)[source]

Bases: SegmentedSTTService

Speech-to-text service using Fal’s Wizper API.

This service uses Fal’s Wizper API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.

Parameters:
  • api_key (str | None) – Fal API key. If not provided, will check FAL_KEY environment variable.

  • sample_rate (int | None) – Audio sample rate in Hz. If not provided, uses the pipeline’s rate.

  • params (InputParams | None) – Configuration parameters for the Wizper API.

  • **kwargs – Additional arguments passed to SegmentedSTTService.

class InputParams(*, language=Language.EN, task='transcribe', chunk_level='segment', version='3')[source]

Bases: BaseModel

Configuration parameters for Fal’s Wizper API.

Parameters:
  • language (Language | None)

  • task (str)

  • chunk_level (str)

  • version (str)

language

Language of the audio input. Defaults to English.

Type:

pipecat.transcriptions.language.Language | None

task

Task to perform (‘transcribe’ or ‘translate’). Defaults to ‘transcribe’.

Type:

str

chunk_level

Level of chunking (‘segment’). Defaults to ‘segment’.

Type:

str

version

Version of Wizper model to use. Defaults to ‘3’.

Type:

str

language: Language | None
task: str
chunk_level: str
version: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic’s ConfigDict.

can_generate_metrics()[source]

Check whether this service can generate processing metrics.

Return type:

bool

language_to_service_language(language)[source]

Convert a Language enum value to the language code used by Fal’s Wizper API.

Parameters:

language (Language)

Return type:

str | None

async set_language(language)[source]

Set the language for speech recognition.

Parameters:

language (Language) – The language to use for speech recognition.

async set_model(model)[source]

Set the speech recognition model.

Parameters:

model (str) – The name of the model to use for speech recognition.

async run_stt(audio)[source]

Transcribe an audio segment using Fal’s Wizper API.

Parameters:

audio (bytes) – Raw audio bytes in WAV format (already converted by base class).

Yields:

Frame – TranscriptionFrame containing the transcribed text.

Return type:

AsyncGenerator[Frame, None]

Note

The audio is already in WAV format from the SegmentedSTTService. Only non-empty transcriptions are yielded.
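The run_stt contract above — an async generator that yields a frame only for non-empty transcriptions — can be sketched with stdlib-only stand-ins. The fake transcriber and string frames below are assumptions for illustration; the real method calls Fal’s API and yields TranscriptionFrame objects:

```python
import asyncio
from typing import AsyncGenerator

async def fake_wizper_transcribe(audio: bytes) -> str:
    # Stand-in for the real network call to Fal's Wizper API.
    await asyncio.sleep(0)
    return "hello world" if audio else ""

async def run_stt(audio: bytes) -> AsyncGenerator[str, None]:
    text = await fake_wizper_transcribe(audio)
    if text:  # only non-empty transcriptions are yielded
        yield text

async def main() -> list[str]:
    frames = [f async for f in run_stt(b"\x00\x01")]
    frames += [f async for f in run_stt(b"")]  # empty audio yields nothing
    return frames

print(asyncio.run(main()))
```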