Gemini
- pipecat.services.gemini_multimodal_live.gemini.language_to_gemini_language(language)[source]
Maps a Language enum value to a Gemini Live supported language code.
Source: https://ai.google.dev/api/generate-content#MediaResolution
Returns None if the language is not supported by Gemini Live.
- Parameters:
language (Language)
- Return type:
str | None
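A minimal usage sketch (the Language import path is an assumption based on Pipecat's transcriptions module; unsupported languages yield None):

```python
from pipecat.services.gemini_multimodal_live.gemini import language_to_gemini_language
from pipecat.transcriptions.language import Language  # assumed import path

code = language_to_gemini_language(Language.EN_US)
if code is None:
    # Language not supported by Gemini Live; fall back to a default.
    code = "en-US"
```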
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveContext(messages=None, tools=NOT_GIVEN, tool_choice=NOT_GIVEN)[source]
Bases:
OpenAILLMContext
- Parameters:
messages (List[ChatCompletionDeveloperMessageParam | ChatCompletionSystemMessageParam | ChatCompletionUserMessageParam | ChatCompletionAssistantMessageParam | ChatCompletionToolMessageParam | ChatCompletionFunctionMessageParam] | None)
tools (List[ChatCompletionToolParam] | NotGiven | ToolsSchema)
tool_choice (Literal['none', 'auto', 'required'] | ChatCompletionNamedToolChoiceParam | NotGiven)
- static upgrade(obj)[source]
Upgrade an OpenAILLMContext to a GeminiMultimodalLiveContext.
- Parameters:
obj (OpenAILLMContext)
- Return type:
GeminiMultimodalLiveContext
- extract_system_instructions()[source]
Extract system instructions from the context's messages.
- get_messages_for_initializing_history()[source]
Return the context's messages in the format used to initialize the Gemini Live conversation history.
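A hedged sketch of upgrading an existing context (the OpenAILLMContext import path is an assumption):

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext  # assumed path
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalLiveContext

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
)

gemini_context = GeminiMultimodalLiveContext.upgrade(context)
instructions = gemini_context.extract_system_instructions()
history = gemini_context.get_messages_for_initializing_history()
```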
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveUserContextAggregator(context, *, params=None, **kwargs)[source]
Bases:
OpenAIUserContextAggregator
- Parameters:
context (OpenAILLMContext)
params (LLMUserAggregatorParams | None)
- async process_frame(frame, direction)[source]
- Parameters:
frame (Frame)
direction (FrameDirection)
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveAssistantContextAggregator(context, *, params=None, **kwargs)[source]
Bases:
OpenAIAssistantContextAggregator
- Parameters:
context (OpenAILLMContext)
params (LLMAssistantAggregatorParams | None)
- async process_frame(frame, direction)[source]
- Parameters:
frame (Frame)
direction (FrameDirection)
- async handle_user_image_frame(frame)[source]
Handle a user image frame from a function call request.
Marks the associated function call as completed and adds the image to the context for processing.
- Parameters:
frame (UserImageRawFrame) – Frame containing the user image and request context.
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveContextAggregatorPair(_user: pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveUserContextAggregator, _assistant: pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveAssistantContextAggregator)[source]
Bases:
object
- Parameters:
_user (GeminiMultimodalLiveUserContextAggregator)
_assistant (GeminiMultimodalLiveAssistantContextAggregator)
- user()[source]
- Return type:
GeminiMultimodalLiveUserContextAggregator
- assistant()[source]
- Return type:
GeminiMultimodalLiveAssistantContextAggregator
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalModalities(*values)[source]
Bases:
Enum
- TEXT = 'TEXT'
- AUDIO = 'AUDIO'
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMediaResolution(*values)[source]
Bases:
str, Enum
Media resolution options for Gemini Multimodal Live.
- UNSPECIFIED = 'MEDIA_RESOLUTION_UNSPECIFIED'
- LOW = 'MEDIA_RESOLUTION_LOW'
- MEDIUM = 'MEDIA_RESOLUTION_MEDIUM'
- HIGH = 'MEDIA_RESOLUTION_HIGH'
- class pipecat.services.gemini_multimodal_live.gemini.GeminiVADParams(*, disabled=None, start_sensitivity=None, end_sensitivity=None, prefix_padding_ms=None, silence_duration_ms=None)[source]
Bases:
BaseModel
Voice Activity Detection parameters.
- Parameters:
disabled (bool | None)
start_sensitivity (StartSensitivity | None)
end_sensitivity (EndSensitivity | None)
prefix_padding_ms (int | None)
silence_duration_ms (int | None)
- disabled: bool | None
- start_sensitivity: StartSensitivity | None
- end_sensitivity: EndSensitivity | None
- prefix_padding_ms: int | None
- silence_duration_ms: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; it should be a dictionary conforming to pydantic's ConfigDict.
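A small construction sketch. All fields are optional, the millisecond values below are illustrative, and the StartSensitivity/EndSensitivity enums referenced above are omitted because their import path is not shown in this section:

```python
from pipecat.services.gemini_multimodal_live.gemini import GeminiVADParams

vad = GeminiVADParams(
    prefix_padding_ms=300,     # audio retained before detected speech
    silence_duration_ms=800,   # trailing silence that ends a user turn
)
```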
- class pipecat.services.gemini_multimodal_live.gemini.ContextWindowCompressionParams(*, enabled=False, trigger_tokens=None)[source]
Bases:
BaseModel
Parameters for context window compression.
- Parameters:
enabled (bool)
trigger_tokens (int | None)
- enabled: bool
- trigger_tokens: int | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; it should be a dictionary conforming to pydantic's ConfigDict.
- class pipecat.services.gemini_multimodal_live.gemini.InputParams(*, frequency_penalty=None, max_tokens=4096, presence_penalty=None, temperature=None, top_k=None, top_p=None, modalities=GeminiMultimodalModalities.AUDIO, language=Language.EN_US, media_resolution=GeminiMediaResolution.UNSPECIFIED, vad=None, context_window_compression=None, extra=<factory>)[source]
Bases:
BaseModel
- Parameters:
frequency_penalty (float | None)
max_tokens (int | None)
presence_penalty (float | None)
temperature (float | None)
top_k (int | None)
top_p (float | None)
modalities (GeminiMultimodalModalities | None)
language (Language | None)
media_resolution (GeminiMediaResolution | None)
vad (GeminiVADParams | None)
context_window_compression (ContextWindowCompressionParams | None)
extra (Dict[str, Any] | None)
- frequency_penalty: float | None
- max_tokens: int | None
- presence_penalty: float | None
- temperature: float | None
- top_k: int | None
- top_p: float | None
- modalities: GeminiMultimodalModalities | None
- language: Language | None
- media_resolution: GeminiMediaResolution | None
- vad: GeminiVADParams | None
- context_window_compression: ContextWindowCompressionParams | None
- extra: Dict[str, Any] | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; it should be a dictionary conforming to pydantic's ConfigDict.
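Putting the pieces together, a sketch of a typical InputParams (values are illustrative; the Language import path is an assumption):

```python
from pipecat.services.gemini_multimodal_live.gemini import (
    ContextWindowCompressionParams,
    GeminiMediaResolution,
    GeminiMultimodalModalities,
    InputParams,
)
from pipecat.transcriptions.language import Language  # assumed import path

params = InputParams(
    temperature=0.7,
    max_tokens=2048,
    modalities=GeminiMultimodalModalities.AUDIO,
    language=Language.EN_US,
    media_resolution=GeminiMediaResolution.MEDIUM,
    context_window_compression=ContextWindowCompressionParams(
        enabled=True,
        trigger_tokens=8000,  # illustrative threshold
    ),
)
```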
- class pipecat.services.gemini_multimodal_live.gemini.GeminiMultimodalLiveLLMService(*, api_key, base_url='generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent', model='models/gemini-2.0-flash-live-001', voice_id='Charon', start_audio_paused=False, start_video_paused=False, system_instruction=None, tools=None, params=None, inference_on_context_initialization=True, **kwargs)[source]
Bases:
LLMService
Provides access to Google’s Gemini Multimodal Live API.
This service enables real-time conversations with Gemini, supporting both text and audio modalities. It handles voice transcription, streaming audio responses, and tool usage.
- Parameters:
api_key (str) – Google AI API key
base_url (str, optional) – API endpoint base URL. Defaults to “generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent”.
model (str, optional) – Model identifier to use. Defaults to “models/gemini-2.0-flash-live-001”.
voice_id (str, optional) – TTS voice identifier. Defaults to “Charon”.
start_audio_paused (bool, optional) – Whether to start with audio input paused. Defaults to False.
start_video_paused (bool, optional) – Whether to start with video input paused. Defaults to False.
system_instruction (str, optional) – System prompt for the model. Defaults to None.
tools (Union[List[dict], ToolsSchema], optional) – Tools/functions available to the model. Defaults to None.
params (InputParams, optional) – Configuration parameters for the model. Defaults to InputParams().
inference_on_context_initialization (bool, optional) – Whether to generate a response when context is first set. Defaults to True.
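A minimal construction sketch (the GOOGLE_API_KEY environment variable name is an assumption):

```python
import os

from pipecat.services.gemini_multimodal_live.gemini import (
    GeminiMultimodalLiveLLMService,
    InputParams,
)

llm = GeminiMultimodalLiveLLMService(
    api_key=os.environ["GOOGLE_API_KEY"],  # assumed env var name
    voice_id="Charon",
    system_instruction="You are a concise voice assistant.",
    params=InputParams(),
)
```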
- adapter_class
alias of
GeminiLLMAdapter
- can_generate_metrics()[source]
Check whether this service can generate usage metrics.
- Return type:
bool
- set_audio_input_paused(paused)[source]
Pause or resume audio input to the model.
- Parameters:
paused (bool)
- set_video_input_paused(paused)[source]
Pause or resume video input to the model.
- Parameters:
paused (bool)
- set_model_modalities(modalities)[source]
Set the response modalities (text or audio) for the model.
- Parameters:
modalities (GeminiMultimodalModalities)
- set_language(language)[source]
Set the language for generation.
- Parameters:
language (Language)
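These setters can be called at runtime. Continuing the construction sketch above (Language.ES is an illustrative value; the Language import path is an assumption):

```python
from pipecat.services.gemini_multimodal_live.gemini import GeminiMultimodalModalities
from pipecat.transcriptions.language import Language  # assumed import path

llm.set_audio_input_paused(True)   # stop forwarding microphone audio
llm.set_video_input_paused(False)
llm.set_model_modalities(GeminiMultimodalModalities.TEXT)
llm.set_language(Language.ES)      # see language_to_gemini_language above for support
```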
- async set_context(context)[source]
Set the context explicitly from outside the pipeline.
This is useful when initializing a conversation, because in server-side VAD mode there may be no other way to trigger the pipeline. Calling it sends the conversation history to the server. The inference_on_context_initialization flag controls whether the turnComplete flag is set when the history is sent; without turnComplete, the model will not respond, which is often the desired behavior when seeding context at the start of a conversation.
- Parameters:
context (OpenAILLMContext)
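A sketch of explicit initialization, continuing the llm sketch above (the OpenAILLMContext import path is an assumption):

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext  # assumed path

async def seed_context(llm):
    context = OpenAILLMContext(
        messages=[{"role": "user", "content": "Hello!"}]
    )
    # Sends the history to the server. Whether the model responds immediately
    # follows the service's inference_on_context_initialization setting.
    await llm.set_context(context)
```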
- async start(frame)[source]
Start the LLM service.
- Parameters:
frame (StartFrame) – The start frame.
- async stop(frame)[source]
Stop the LLM service.
- Parameters:
frame (EndFrame) – The end frame.
- async cancel(frame)[source]
Cancel the LLM service.
- Parameters:
frame (CancelFrame) – The cancel frame.
- async process_frame(frame, direction)[source]
Process a frame.
- Parameters:
frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.
- async send_client_event(event)[source]
Send a raw client event over the Gemini Live connection.
- create_context_aggregator(context, *, user_params=LLMUserAggregatorParams(aggregation_timeout=0.5), assistant_params=LLMAssistantAggregatorParams(expect_stripped_words=True))[source]
Create an instance of GeminiMultimodalLiveContextAggregatorPair from an OpenAILLMContext. Constructor keyword arguments for both the user and assistant aggregators can be provided.
- Parameters:
context (OpenAILLMContext) – The LLM context.
user_params (LLMUserAggregatorParams, optional) – User aggregator parameters.
assistant_params (LLMAssistantAggregatorParams, optional) – Assistant aggregator parameters.
- Returns:
A pair of context aggregators, one for the user and one for the assistant, encapsulated in a GeminiMultimodalLiveContextAggregatorPair.
- Return type:
GeminiMultimodalLiveContextAggregatorPair
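A sketch of wiring the aggregator pair, continuing the llm sketch above (the OpenAILLMContext import path is an assumption, and the pipeline layout in the comment is illustrative):

```python
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext  # assumed path

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "Be brief."}]
)
aggregators = llm.create_context_aggregator(context)

user_agg = aggregators.user()            # GeminiMultimodalLiveUserContextAggregator
assistant_agg = aggregators.assistant()  # GeminiMultimodalLiveAssistantContextAggregator

# Illustrative placement in a Pipecat pipeline:
# Pipeline([transport.input(), user_agg, llm, transport.output(), assistant_agg])
```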