LLM Router

Every LLM call in AlexClaw declares a tier requirement. The router selects the cheapest available provider that satisfies the tier, tracks usage, and falls back to the local tier when no provider for the requested tier is available.

Tier System

Tier	Default Providers	Typical Use
`light`	Gemini Flash, Claude Haiku	RSS scoring, classification, simple tasks
`medium`	Gemini Pro, Claude Sonnet	Summarization, research, security review
`heavy`	Claude Opus	Deep reasoning (explicit only)
`local`	LM Studio, Ollama	Privacy-sensitive, offline, zero cost

Provider Selection

1. Query the llm_providers table for enabled providers matching the requested tier.
2. Order them by priority (a lower number is preferred).
3. Check daily usage limits, if configured.
4. Select the first available provider.
5. If no provider is available for the tier, fall back to the local tier.

# A skill requests a tier, not a specific model
AlexClaw.LLM.call(prompt, tier: :medium)
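
The selection steps above can be sketched as a pure function over an in-memory list. This is an illustration only, not AlexClaw's actual code: the module name `RouterSketch`, the map keys (`:id`, `:tier`, `:priority`, `:enabled`, `:daily_limit`), and the `usage` map are assumptions, and the real router reads from the llm_providers table instead.

```elixir
defmodule RouterSketch do
  # providers: list of maps with hypothetical keys
  #   :id, :tier, :priority, :enabled, :daily_limit
  # usage: map of provider id => calls made today
  def select(providers, usage, tier) do
    case pick(providers, usage, tier) do
      # Nothing usable for the requested tier: retry once with the local tier.
      nil when tier != :local -> select(providers, usage, :local)
      nil -> {:error, :no_provider_available}
      provider -> {:ok, provider}
    end
  end

  defp pick(providers, usage, tier) do
    providers
    |> Enum.filter(&(&1.enabled and &1.tier == tier))
    # Lower priority number is preferred.
    |> Enum.sort_by(& &1.priority)
    |> Enum.find(&under_limit?(&1, usage))
  end

  # A nil limit is treated as "no daily limit configured".
  defp under_limit?(%{daily_limit: nil}, _usage), do: true

  defp under_limit?(%{id: id, daily_limit: limit}, usage) do
    Map.get(usage, id, 0) < limit
  end
end
```

With two enabled medium providers where the preferred one has reached its daily limit, `RouterSketch.select/3` would return the second; a request for a tier with no enabled providers would return a local provider if one is enabled.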

Provider Types

Type	Examples	API Format
`gemini`	Gemini Flash, Gemini Pro	Google AI Studio API
`anthropic`	Claude Haiku, Sonnet, Opus	Anthropic Messages API
`openai_compatible`	LM Studio, any OpenAI-compatible	OpenAI Chat Completions
`ollama`	Local Ollama models	Ollama `/api/chat` (messages format)

Provider Options

Each provider row has an options JSONB column for provider-specific inference parameters (e.g., num_ctx, temperature, top_p). These options are sent with every request to that provider and can be edited from Admin > LLM Providers, where a dynamic form adapts to the provider type. For OpenAI-compatible providers, the client falls back to the reasoning_content field when content is empty, as can happen with Qwen3 in thinking mode. Qwen3 models also expose a thinking toggle in the Admin UI.
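
The reasoning_content fallback can be illustrated with a small function over an already-decoded Chat Completions response. This is a sketch under assumptions: the module name `ResponseTextSketch`, the string-keyed map shape, and the error atoms are hypothetical, and AlexClaw's client may structure this differently.

```elixir
defmodule ResponseTextSketch do
  # Takes a decoded Chat Completions response (string keys) and
  # returns the reply text from the first choice.
  def extract(%{"choices" => [%{"message" => message} | _]}) do
    case message do
      # Normal case: the model put its answer in "content".
      %{"content" => content} when is_binary(content) and content != "" ->
        {:ok, content}

      # Fallback: "content" is empty or missing, but reasoning text is present.
      %{"reasoning_content" => reasoning} when is_binary(reasoning) and reasoning != "" ->
        {:ok, reasoning}

      _ ->
        {:error, :empty_response}
    end
  end

  def extract(_other), do: {:error, :unexpected_shape}
end
```

Checking content first keeps the fallback from overriding a normal answer when a server returns both fields.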

Usage Tracking

- Counters are keyed by {provider_id, date} and held in ETS for fast reads.
- They are persisted to the llm_usage table so counts survive restarts.
- They are visible in Admin > LLM Providers and at the /metrics endpoint.
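
A per-day counter keyed by {provider_id, date} can be kept in ETS with `:ets.update_counter/4`, which increments atomically and inserts a default row when the key is missing. The sketch below shows only the in-memory part; the module name `UsageSketch` and the table name are assumptions, and persistence to the llm_usage table is left out.

```elixir
defmodule UsageSketch do
  @table :llm_usage_sketch

  # Creates the named table once; safe to call again.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set, write_concurrency: true])
    end

    :ok
  end

  # Atomically adds 1 to the counter and returns the new value.
  # The {key, 0} default is inserted first if the key does not exist yet.
  def increment(provider_id, date \\ Date.utc_today()) do
    key = {provider_id, date}
    :ets.update_counter(@table, key, {2, 1}, {key, 0})
  end

  # Reads the current count without modifying it.
  def count(provider_id, date \\ Date.utc_today()) do
    case :ets.lookup(@table, {provider_id, date}) do
      [{_key, n}] -> n
      [] -> 0
    end
  end
end
```

Because the date is part of the key, each day starts from zero without any explicit reset step.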

Embedding Support

LLM.embed/2 generates 768-dimension vectors for semantic search:

- Provider resolution is separate from the completion tier system.
- The provider is set via the embedding.provider config, or auto-detected in the order Gemini → Ollama → OpenAI-compatible.
- Supported backends are Gemini text-embedding-004 (free tier), Ollama /api/embed, and OpenAI /v1/embeddings.
- Concurrent embedding requests are throttled by EmbedThrottle, a GenServer-based limiter.
- Embedding calls are tracked in the same usage counters.
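
The resolution order for the embedding provider can be sketched as follows. This is a minimal illustration, assuming the configured value is passed in as an atom (or nil when unset) and that the enabled provider types are given as a list; the module name `EmbedResolveSketch` and the error atom are hypothetical. Whether an explicit setting is also checked against the enabled providers is not stated in the text, so the sketch simply trusts it.

```elixir
defmodule EmbedResolveSketch do
  # Auto-detection order described in the text.
  @auto_order [:gemini, :ollama, :openai_compatible]

  # No explicit configuration: take the first enabled type in @auto_order.
  def resolve(nil, available) do
    case Enum.find(@auto_order, &(&1 in available)) do
      nil -> {:error, :no_embedding_provider}
      type -> {:ok, type}
    end
  end

  # An explicit embedding.provider setting wins over auto-detection.
  def resolve(configured, _available), do: {:ok, configured}
end
```

For example, `EmbedResolveSketch.resolve(nil, [:openai_compatible, :ollama])` picks `:ollama`, because Ollama comes before OpenAI-compatible in the detection order.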

Workflow Integration

LLM provider selection can be configured at three levels (most specific wins):

1. Step level: the llm_tier and llm_model fields on the workflow step
2. Workflow level: the default_provider field on the workflow
3. Global: the tier-based fallback chain
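
The "most specific wins" rule can be expressed as a short precedence check. In this sketch the step and workflow are plain maps using the field names from the list above; the module name `WorkflowLLMSketch` and the tagged return tuples are assumptions made for illustration.

```elixir
defmodule WorkflowLLMSketch do
  # Returns which level supplied the LLM setting, most specific first.
  def resolve(step, workflow) do
    cond do
      # Step level: either field on the step is enough to override.
      step[:llm_tier] || step[:llm_model] ->
        {:step, %{tier: step[:llm_tier], model: step[:llm_model]}}

      # Workflow level: used only when the step sets nothing.
      workflow[:default_provider] ->
        {:workflow, workflow[:default_provider]}

      # Global: defer to the tier-based fallback chain.
      true ->
        {:global, :tier_fallback_chain}
    end
  end
end
```

A step with `llm_tier: :heavy` therefore overrides a workflow whose default_provider is set, while a step with neither field inherits the workflow default.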

Fully Local Deployment

A deployment with no cloud API keys is supported. Enable a local provider (Ollama or LM Studio) and all tiers fall back to it, so no external API calls are made.