Guide
LLM Providers
Connect Omni to one or more LLM providers. Use cloud APIs, run models locally, or bring your own endpoint.
Overview
Omni's LLM bridge is a provider-agnostic abstraction layer. You configure one or more providers, and Omni handles streaming, token counting, and automatic failover. All providers support tool calling (function calling), which is how the agent invokes native tools and extension tools.
OpenAI: GPT-4o, GPT-4, o1, etc.
Anthropic: Claude Opus, Sonnet, Haiku
Google Gemini: Gemini Pro, Ultra, Flash
Ollama: Llama, Mistral, Phi, etc.
AWS Bedrock: Claude, Titan, Llama via AWS
Custom HTTP: any OpenAI-compatible API
OpenAI
Access GPT-4o, GPT-4, and other OpenAI models. Requires an API key from platform.openai.com.
[providers.openai]
provider_type = "openai"
default_model = "gpt-4o"
max_tokens = 4096
temperature = 0.7
The API key is stored in the OS keychain (not the config file). Enter it through Settings → Providers → OpenAI. Token counting uses the cl100k_base tokenizer via tiktoken-rs for accurate billing estimation.
Anthropic
Access Claude models. Requires an API key from console.anthropic.com.
[providers.anthropic]
provider_type = "anthropic"
default_model = "claude-opus-4-6"
temperature = 0.8
Uses SSE streaming for real-time responses. Token counting uses the cl100k_base tokenizer (same as OpenAI) for estimation; Anthropic's models use a different tokenizer, so the count is approximate.
Google Gemini
Access Gemini models. Get an API key from aistudio.google.com.
[providers.gemini]
provider_type = "gemini"
default_model = "gemini-pro"
max_tokens = 4096
Uses the Gemini API v1beta. Token counting uses character-based estimation (characters / 4).
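The character-based estimate is simple enough to sketch in a few lines of Python (the function name is illustrative, not Omni's actual API):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: roughly 4 characters per token, minimum 1."""
    return max(1, len(text) // 4)

print(estimate_tokens("Hello, Gemini!"))  # 14 characters -> 3
```

This heuristic is coarse (it overcounts for dense code, undercounts for some non-Latin scripts) but requires no tokenizer dependency.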
Ollama (Local)
Run models locally on your machine using Ollama. No API key needed — completely private and offline.
# After installing Ollama, pull one or more models
ollama pull llama3.1
ollama pull mistral
ollama pull phi3
[providers.ollama]
provider_type = "ollama"
default_model = "llama3.1"
endpoint = "http://localhost:11434"
Ollama runs on port 11434 by default. Set the endpoint if you're running it on a different host or port. Token counting uses character-based estimation.
AWS Bedrock
Access models through your AWS account using Bedrock. Supports Claude, Titan, Llama, and other models available in your AWS region.
[providers.bedrock]
provider_type = "bedrock"
default_model = "anthropic.claude-v2"
endpoint = "https://bedrock-runtime.us-east-1.amazonaws.com"
Authentication uses AWS SigV4 signing. Configure your AWS credentials via environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or the standard AWS credentials file. Uses the InvokeModelWithResponseStream API for streaming.
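For example, the standard AWS environment variables can be exported before launching Omni (the values below are placeholders, not real credentials):

```shell
# Standard AWS credential environment variables picked up by SigV4 signing.
export AWS_ACCESS_KEY_ID="AKIA_EXAMPLE_KEY_ID"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
# Optional: region should match the Bedrock endpoint in the config.
export AWS_DEFAULT_REGION="us-east-1"
```

The credentials file alternative (~/.aws/credentials) works the same way and needs no environment setup.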
Custom HTTP
Connect to any API endpoint that follows the OpenAI chat completions format. Use this for self-hosted models, LLM proxies, or alternative providers.
[providers.my-proxy]
provider_type = "custom"
default_model = "my-model"
endpoint = "https://my-llm-proxy.example.com/v1"
max_tokens = 2048
The endpoint must support the /chat/completions path with SSE streaming. API key authentication is optional.
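For reference, consuming an OpenAI-style /chat/completions SSE stream looks roughly like this on the client side (a dependency-free Python sketch with a hard-coded sample stream; Omni's real implementation is async Rust):

```python
import json

def collect_sse_text(lines):
    """Accumulate text deltas from OpenAI-style chat completion SSE lines."""
    out = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break  # sentinel marking end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            out.append(delta)
    return "".join(out)

sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print(collect_sse_text(sample))  # -> Hello
```

An endpoint that speaks this format (deltas in choices[0].delta, terminated by [DONE]) should work with the custom provider.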
Provider Rotation & Fallback
When you configure multiple providers, Omni automatically handles failover. If the primary provider returns an error or times out, the request is retried with the next available provider using exponential backoff.
Backoff Schedule
1st retry: 5 seconds
2nd retry: 15 seconds
3rd retry: 60 seconds
4th retry: 300 seconds
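The schedule is a fixed lookup rather than a computed curve; a sketch of that lookup (delay values from the table above; the clamping behaviour past the 4th retry is an assumption, and the function name is hypothetical):

```python
BACKOFF_SECONDS = [5, 15, 60, 300]

def retry_delay(attempt: int) -> int:
    """Delay in seconds before the Nth retry (1-based).

    Attempts beyond the table reuse the last entry (assumed behaviour).
    """
    index = min(attempt, len(BACKOFF_SECONDS)) - 1
    return BACKOFF_SECONDS[index]

print([retry_delay(n) for n in range(1, 6)])  # -> [5, 15, 60, 300, 300]
```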
Providers are tried in order of their configured priority. A provider that repeatedly fails is temporarily marked unavailable and skipped until its backoff period expires. You can disable a provider without removing it by setting enabled = false.
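For example, a provider can be parked without deleting its section (only the enabled key is documented above; the rest of the snippet mirrors the earlier OpenAI example):

```toml
[providers.openai]
provider_type = "openai"
default_model = "gpt-4o"
enabled = false   # keep the config, skip this provider during rotation
```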
Streaming
All providers use Server-Sent Events (SSE) streaming for real-time token delivery. As the LLM generates its response, tokens are streamed to the UI as they arrive rather than after the full completion.
The streaming architecture uses Tokio async streams with byte buffer accumulation. Each chunk can contain:
TextDelta: partial text content
ToolCallDelta: tool call being assembled
Usage: token count update
Done: stream complete signal
Token Counting
Omni tracks token usage per request to help you monitor costs. The counting method varies by provider.