ZeroTrace Companion
Model Recommendations
Which model to pick across local and cloud — by RAM budget, by use case, by language, by cost.
Companion's AI assistant works with any model your provider supports. This page recommends models by use case so you don't have to evaluate the entire ecosystem yourself.
Local — by RAM budget
For local providers (Ollama, LM Studio):
| Available RAM | Recommended model | Why |
|---|---|---|
| 8 GB | qwen2.5:3b or llama3.2:3b | Smallest practical models with tool calling |
| 16 GB | qwen2.5:7b or llama3.1:8b | Sweet spot — good quality, fast on Apple Silicon / mid GPU |
| 32 GB | qwen2.5:14b or mistral-small | Noticeably more capable, especially for complex tool chains |
| 64 GB+ | qwen2.5:32b or llama3.1:70b (quantised) | Top-tier local quality; requires patience or strong GPU |
As a starting point, qwen2.5:7b is the broad recommendation: good tool-calling support, multilingual, and it runs well on a typical modern laptop.
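The RAM tiers above follow from a simple rule of thumb: a quantised model's resident footprint is roughly parameter count times bytes per parameter, plus KV-cache and runtime overhead. The figures below are illustrative assumptions (about 4.5 effective bits per parameter for a Q4-class quantisation, a flat overhead allowance), not measurements:

```python
def model_ram_gb(params_billion: float, bits_per_param: float = 4.5,
                 overhead_gb: float = 1.5) -> float:
    """Rough resident-RAM estimate for a quantised local model.

    bits_per_param ~4.5 approximates a Q4_K_M-class quantisation;
    overhead_gb covers KV cache and runtime (illustrative assumption).
    """
    weights_gb = params_billion * bits_per_param / 8  # params are in billions
    return weights_gb + overhead_gb

for size in (3, 7, 14, 32):
    print(f"{size}B -> ~{model_ram_gb(size):.1f} GB resident")
```

A 7B model lands around 5-6 GB by this estimate, which is why 16 GB machines run it comfortably while 8 GB machines are better served by 3B models.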
Cloud — by provider
OpenAI
| Model | Best for | Approximate cost |
|---|---|---|
| gpt-4o-mini | Default. Cheap, capable, fast. | Cents per typical chat |
| gpt-4o | When you need top quality | Several cents per chat |
| o1-mini / o3-mini | Reasoning-heavy queries | Higher per-response cost |
| o1 / o3 | Hardest reasoning tasks | Significantly more |
For most Companion workflows, gpt-4o-mini is the right default. Bump to gpt-4o when complex tool chains start failing.
Anthropic
| Model | Best for | Approximate cost |
|---|---|---|
| claude-haiku-4-5 | Default. Fast, cheap, good at tool calling. | Cents per typical chat |
| claude-sonnet-4-5 | Default for serious work. Excellent reasoning. | Several cents per chat |
| claude-opus-4-5 | Top-tier reasoning when you need it | Significantly more |
For Companion, Claude Sonnet 4.5 is the sweet-spot recommendation if cost isn't tight. Claude Haiku 4.5 is a great budget pick for routine queries.
OpenRouter
OpenRouter exposes hundreds of models. Worth trying:
| Model | Why |
|---|---|
| qwen/qwen-2.5-72b-instruct | Cheap, capable, strong at tool calling |
| anthropic/claude-sonnet-4-5 | Same Sonnet via OpenRouter (slightly different pricing) |
| openai/gpt-4o-mini | Same gpt-4o-mini via OpenRouter |
| meta-llama/llama-3.3-70b-instruct | Open-weights via cloud, good quality |
| deepseek/deepseek-chat | Surprisingly strong cheap option |
| google/gemini-2.5-pro | Long context, alternative reasoning style |
The big win with OpenRouter: try many models without separate accounts. Pricing per model is shown on openrouter.ai/models.
By use case
Quick conversational Q&A
Pick the smallest / cheapest model that runs comfortably:
- Local: 3B-7B model.
- Cloud: gpt-4o-mini / claude-haiku-4-5.
A small model answers "what is RSSI" in two seconds; a large model takes ten and gives the same answer.
Complex tool chains
Larger models plan multi-step tool calls more reliably:
- Local: 14B+ models.
- Cloud: gpt-4o, claude-sonnet-4-5+, o1.
For investigations where the assistant chains four or five tools, the better planners are worth the cost.
Code generation (HID scripts, automation)
| Provider | Pick |
|---|---|
| Local | qwen2.5-coder:7b or qwen2.5-coder:14b |
| Cloud (OpenAI) | gpt-4o |
| Cloud (Anthropic) | claude-sonnet-4-5 |
| OpenRouter | deepseek/deepseek-coder is a budget pick |
Code-tuned local models trade some general conversational ability for better code output. Cloud models are generally good at code without specialisation.
Multilingual (German, French, Spanish, etc.)
| Provider | Pick |
|---|---|
| Local | Qwen family (7B+); EuroLLM for German |
| Cloud | Any major cloud model — they all handle European languages well |
For best results, set the system prompt to instruct the model to respond in your language.
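The system-prompt suggestion above can be sketched as a chat payload in the common OpenAI-style message format. This is an illustration of the idea, not Companion's actual request code; the field names follow the widely used chat schema:

```python
# Minimal sketch of an OpenAI-style chat payload that pins the reply language.
def build_messages(user_text: str, language: str = "German") -> list[dict]:
    # The system message is where the language instruction belongs;
    # it applies to every response in the conversation.
    system = f"You are a helpful assistant. Always respond in {language}."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]

msgs = build_messages("Was bedeutet RSSI?")
print(msgs[0]["content"])
```

Putting the instruction in the system prompt rather than each user message keeps it in force across the whole conversation.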
Reasoning-heavy queries
The "thinking" / "reasoning" model variants trade speed for accuracy on hard questions:
- Local: DeepSeek-R1 distills, Qwen QwQ.
- Cloud (OpenAI): o1, o3 family.
- Cloud (Anthropic): claude-opus-4-5, or any Sonnet 4.5+ with extended thinking.
They respond more slowly but handle multi-step reasoning better.
Local — model size vs. quality
A useful rule of thumb:
- 3B models — good for greeting, simple Q&A, basic tool calls.
- 7-8B models — usable for genuine work; the floor for serious investigation.
- 14B models — clearly better at complex tasks; slower.
- 30B+ models — much closer to cloud-LLM quality; need real hardware.
There's no substitute for trying. Pull two models, run the same investigation question through both, see which fits your patience and your need.
Don't just pull the biggest model your machine can run. The fastest acceptable response is more useful than the slowest possible best response. For most users, "good enough at 5-second response time" beats "perfect at 30-second response time."
Quantisation
Local models come in different quantisation levels (Q4, Q5, Q8, FP16). Fewer bits per parameter means a smaller, faster model at a slight quality cost. Ollama's defaults usually pick a sensible quantisation; you rarely need to override them.
For tightly RAM-constrained setups, Q4_K_M is the standard "small but still good" choice.
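As a sketch of how quantisation level translates into file size, the weights-only footprint is parameter count times effective bits per parameter. The bit counts below are rough assumptions (k-quants carry some per-block metadata, so effective bits run slightly above the nominal level):

```python
# Approximate weights-only file size for a 7B model at common quant levels.
# Effective bits/param are rough assumptions, not exact format specs.
BITS = {"Q4_K_M": 4.5, "Q5_K_M": 5.5, "Q8_0": 8.5, "FP16": 16.0}

def file_size_gb(params_billion: float, quant: str) -> float:
    # params are in billions, so billions * bits / 8 gives gigabytes directly
    return params_billion * BITS[quant] / 8

for q in BITS:
    print(f"7B {q}: ~{file_size_gb(7, q):.1f} GB")
```

By this estimate a 7B model is roughly 4 GB at Q4_K_M versus 14 GB at FP16, which is why Q4_K_M is the usual pick for RAM-constrained machines.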
Switching models on the fly
Companion lets you switch the active model in Settings → AI → Model. The next message uses the new model.
For workflows that benefit from switching:
- Quick conversational queries → small / cheap model.
- Tool-chain queries → larger / more capable model.
- Sensitive AirLeak queries → local model.
- General help → cloud model if cost is acceptable.
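The routing above can be sketched as a small lookup table. The model names are examples from this page, not Companion defaults, and real switching happens manually in Settings rather than via code like this:

```python
# Hypothetical routing table mirroring the workflow list above.
ROUTES = {
    "chat":      "gpt-4o-mini",        # quick conversational queries
    "tools":     "claude-sonnet-4-5",  # multi-step tool chains
    "sensitive": "qwen2.5:7b",         # local-only for sensitive AirLeak data
}

def pick_model(query_kind: str) -> str:
    # Fall back to the cheap chat model for anything unclassified.
    return ROUTES.get(query_kind, ROUTES["chat"])

print(pick_model("sensitive"))
```

The key design point is the fallback: default to the cheapest acceptable model and escalate only for the query kinds that need it.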
Updating local models
ollama pull <model> re-pulls the latest version of a model. Models update occasionally; the new version replaces the old one of the same name.
To keep multiple versions, use Ollama's tag syntax: ollama pull qwen2.5:7b-q4_K_M keeps the q4 variant separate from the default qwen2.5:7b.
Local — disk usage
Each local model takes from a couple of gigabytes to tens of gigabytes on disk:
- 3B model: ~2 GB
- 7B model: ~4-5 GB
- 14B model: ~8-10 GB
- 30B+ model: ~20+ GB
Models live under Ollama's data directory (or your runner's equivalent). Delete unused models with ollama rm <model>.
Cost — cloud providers
For cloud providers, Companion shows per-response cost estimates. Rough rules of thumb:
| Pattern | Approximate cost |
|---|---|
| Single conversational message, mini-tier model | Cents |
| Long conversation, top-tier model, lots of tool calls | Tens of cents to a few dollars |
| Multi-day investigation with tool chains | Single-digit dollars over the period |
Reasoning models (o1, o3, Claude with extended thinking) cost notably more per response — sometimes 5-10x — because they generate large internal reasoning traces.
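The multiplier comes from billing: reasoning models charge output rates for their hidden reasoning tokens as well as the visible answer. The rates and token counts below are made-up placeholders to show the arithmetic, not real pricing:

```python
# Why reasoning models cost more: hidden reasoning tokens bill as output.
# Rates (per 1M tokens) and token counts are illustrative, not real pricing.
def response_cost(prompt_toks: int, visible_toks: int, reasoning_toks: int,
                  in_rate: float, out_rate: float) -> float:
    """Cost in dollars for one response."""
    return (prompt_toks * in_rate + (visible_toks + reasoning_toks) * out_rate) / 1e6

plain = response_cost(1_000, 500, 0, in_rate=1.0, out_rate=4.0)
reasoning = response_cost(1_000, 500, 5_000, in_rate=1.0, out_rate=4.0)
print(f"plain: ${plain:.4f}  reasoning: ${reasoning:.4f}  "
      f"ratio: {reasoning / plain:.1f}x")
```

Even a modest hidden reasoning trace dominates the bill, which is where the 5-10x figure comes from.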
For privacy-comfortable budget-tight users: local Ollama with a 7B model is free and sufficient for most workflows. For privacy-flexible budget-comfortable users: claude-sonnet-4-5 or gpt-4o is the sweet spot.
Capability detection
Different models — even from the same provider — support different capabilities:
| Capability | Why it matters |
|---|---|
| Tool calling | Required for the assistant to drive Companion or call MCP tools |
| Vision | Image inputs (not currently used by Companion's flows) |
| Streaming | Token-by-token rendering |
Companion shows the active model's capabilities under Settings → AI. Some smaller or older models lack tool calling; they work for chat but can't operate Companion's tools.
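The tool-calling requirement can be sketched as a capability gate. The capability table below is illustrative (a real implementation would query the provider for the active model's capabilities rather than hard-code them):

```python
# Sketch of gating tool use on a model's advertised capabilities.
# The capability sets are illustrative assumptions, not queried data.
CAPS = {
    "qwen2.5:7b": {"tools", "streaming"},
    "llava:7b":   {"vision", "streaming"},  # chat + vision, no tool calling
}

def can_drive_tools(model: str) -> bool:
    # Unknown models are treated as tool-incapable, the safe default.
    return "tools" in CAPS.get(model, set())

print(can_drive_tools("qwen2.5:7b"))
print(can_drive_tools("llava:7b"))
```

Failing closed for unknown models matters: a model without tool calling silently degrades to plain chat, which is confusing if the UI still offers tool-driven workflows.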