LLM
- class pipecat.services.nim.llm.NimLLMService(*, api_key, base_url='https://integrate.api.nvidia.com/v1', model='nvidia/llama-3.1-nemotron-70b-instruct', **kwargs)[source]
Bases: OpenAILLMService
A service for interacting with NVIDIA’s NIM (NVIDIA Inference Microservice) API.
This service extends OpenAILLMService to work with NVIDIA’s NIM API while maintaining compatibility with the OpenAI-style interface. It specifically handles the difference in token usage reporting between NIM (incremental) and OpenAI (final summary).
- Parameters:
api_key (str) – The API key for accessing NVIDIA’s NIM API
base_url (str, optional) – The base URL for NIM API. Defaults to “https://integrate.api.nvidia.com/v1”
model (str, optional) – The model identifier to use. Defaults to “nvidia/llama-3.1-nemotron-70b-instruct”
**kwargs – Additional keyword arguments passed to OpenAILLMService
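Since the constructor mirrors OpenAILLMService, setup is a one-liner. A minimal construction sketch, assuming the key is kept in an environment variable named NVIDIA_API_KEY (that variable name is an illustration, not part of this API):

```python
import os

from pipecat.services.nim.llm import NimLLMService

# base_url is omitted here, so the default NIM endpoint
# ("https://integrate.api.nvidia.com/v1") is used.
llm = NimLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    model="nvidia/llama-3.1-nemotron-70b-instruct",
)
```

Any extra keyword arguments are forwarded unchanged to OpenAILLMService, so options accepted there can be passed here as well.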
- async start_llm_usage_metrics(tokens)[source]
Accumulate token usage metrics during processing.
This method intercepts the incremental token updates from NVIDIA’s API and accumulates them instead of passing each update to the metrics system. The final accumulated totals are reported at the end of processing.
- Parameters:
tokens (LLMTokenUsage) – The token usage metrics for the current chunk of processing, containing prompt_tokens and completion_tokens counts.
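To make the accumulate-then-report pattern concrete, the sketch below sums hypothetical per-chunk usage updates into one final summary, which is what an OpenAI-style metrics consumer expects. The chunk values and the summing loop are illustrative assumptions, not the service's actual implementation; LLMTokenUsage is the dataclass named in the signature above.

```python
from pipecat.metrics.metrics import LLMTokenUsage

# Made-up per-chunk updates: NIM reports usage incrementally during
# streaming, so reporting each update as-is would overcount.
updates = [
    LLMTokenUsage(prompt_tokens=42, completion_tokens=5, total_tokens=47),
    LLMTokenUsage(prompt_tokens=0, completion_tokens=7, total_tokens=7),
    LLMTokenUsage(prompt_tokens=0, completion_tokens=6, total_tokens=6),
]

# Accumulate instead of forwarding each update to the metrics system.
prompt_tokens = completion_tokens = 0
for usage in updates:
    prompt_tokens += usage.prompt_tokens
    completion_tokens += usage.completion_tokens

# One OpenAI-style final summary: 42 prompt + 18 completion tokens.
final = LLMTokenUsage(
    prompt_tokens=prompt_tokens,
    completion_tokens=completion_tokens,
    total_tokens=prompt_tokens + completion_tokens,
)
```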