VisionService

Vision service implementation.

Provides base classes and implementations for computer vision services that can analyze images and generate textual descriptions or answers to questions about visual content.

class pipecat.services.vision_service.VisionService(**kwargs)

Bases: AIService

Base class for vision services.

Provides common functionality for vision services that process images and generate textual responses. Handles image frame processing and integrates with the AI service infrastructure for metrics and lifecycle management.

Parameters: **kwargs – Additional arguments passed to the parent AIService.

abstractmethod async run_vision(frame)

Process a vision image frame and generate results.

This method must be implemented by subclasses to provide actual computer vision functionality such as image description, object detection, or visual question answering.

Parameters: frame (VisionImageRawFrame) – The vision image frame to process, containing image data.
Yields: Frame – Frames containing the vision analysis results, typically TextFrame objects with descriptions or answers.
Return type: AsyncGenerator[Frame, None]
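
The run_vision contract above (an async generator that takes a vision image frame and yields result frames) can be illustrated with a minimal sketch. To keep it runnable without pipecat installed, the classes below are simplified stand-ins, not the real pipecat types: Frame, TextFrame, VisionImageRawFrame and VisionServiceSketch only mimic the names used in this reference, and their fields (image, size, text) and the ImageSizeDescriber subclass are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import AsyncGenerator, List, Tuple


# Stand-ins for the pipecat frame types (assumed fields, illustration only).
@dataclass
class Frame:
    pass


@dataclass
class TextFrame(Frame):
    text: str = ""


@dataclass
class VisionImageRawFrame(Frame):
    image: bytes = b""
    size: Tuple[int, int] = (0, 0)


class VisionServiceSketch(ABC):
    """Stand-in for VisionService that models only the abstract run_vision contract."""

    @abstractmethod
    async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
        """Subclasses override this with an async generator that yields result frames."""


class ImageSizeDescriber(VisionServiceSketch):
    """Hypothetical subclass: 'describes' an image by reporting its dimensions."""

    async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
        width, height = frame.size
        # A real service would call a vision model here; this only formats metadata.
        yield TextFrame(text=f"Image is {width}x{height} pixels ({len(frame.image)} bytes).")


async def collect(service: VisionServiceSketch, frame: VisionImageRawFrame) -> List[Frame]:
    """Drain the async generator into a list so the results can be inspected."""
    return [result async for result in service.run_vision(frame)]


results = asyncio.run(
    collect(ImageSizeDescriber(), VisionImageRawFrame(image=b"\x00" * 12, size=(2, 2)))
)
print(results[0].text)
```

In a real subclass, the body of run_vision would presumably await a model call and yield one or more TextFrame objects; the generator form lets a service emit partial results as they become available.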

async process_frame(frame, direction)

Process frames, handling vision image frames for analysis.

Automatically processes VisionImageRawFrame objects by calling run_vision and handles metrics tracking. Other frames are passed through unchanged.

Parameters:

frame (Frame) – The frame to process.
direction (FrameDirection) – The direction of frame processing.
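
The dispatch behavior described for process_frame can be sketched roughly as follows. This is not the pipecat implementation: the frame classes, FrameDirection enum, push_frame method and the processing_seconds list are stand-ins invented for this sketch so that it runs on its own. It also assumes that the vision image frame is consumed rather than forwarded, and that results are pushed downstream; neither detail is stated in this reference.

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import AsyncGenerator, List, Tuple


class FrameDirection(Enum):
    """Stand-in for pipecat's FrameDirection."""
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class Frame:
    pass


@dataclass
class TextFrame(Frame):
    text: str = ""


@dataclass
class VisionImageRawFrame(Frame):
    image: bytes = b""


class DispatchSketch:
    """Hypothetical processor that mirrors the documented process_frame behavior."""

    def __init__(self) -> None:
        self.pushed: List[Tuple[Frame, FrameDirection]] = []
        self.processing_seconds: List[float] = []  # stand-in for metrics tracking

    async def push_frame(
        self, frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM
    ) -> None:
        # Records the frame instead of sending it to a neighboring processor.
        self.pushed.append((frame, direction))

    async def run_vision(self, frame: VisionImageRawFrame) -> AsyncGenerator[Frame, None]:
        yield TextFrame(text="a placeholder description")

    async def process_frame(self, frame: Frame, direction: FrameDirection) -> None:
        if isinstance(frame, VisionImageRawFrame):
            # Vision frames: time the analysis and push every yielded result.
            started = time.perf_counter()
            async for result in self.run_vision(frame):
                await self.push_frame(result)
            self.processing_seconds.append(time.perf_counter() - started)
        else:
            # Any other frame passes through unchanged, in its original direction.
            await self.push_frame(frame, direction)


async def demo() -> Tuple[DispatchSketch, Frame]:
    service = DispatchSketch()
    other = TextFrame(text="hello")
    await service.process_frame(other, FrameDirection.UPSTREAM)
    await service.process_frame(VisionImageRawFrame(image=b"\x00"), FrameDirection.DOWNSTREAM)
    return service, other


service, other = asyncio.run(demo())
print([type(frame).__name__ for frame, _ in service.pushed])
```

The point of the sketch is the branch in process_frame: only VisionImageRawFrame objects trigger run_vision and the timing bookkeeping, while every other frame is forwarded untouched.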