
Model Recommendations

Which model to pick across local and cloud — by RAM budget, by use case, by language, by cost.

Companion's AI assistant works with any model your provider supports. This page recommends models by use case so you don't have to evaluate the entire ecosystem yourself.

Local — by RAM budget

For local providers (Ollama, LM Studio):

| Available RAM | Recommended model | Why |
| --- | --- | --- |
| 8 GB | qwen2.5:3b or llama3.2:3b | Smallest practical models with tool calling |
| 16 GB | qwen2.5:7b or llama3.1:8b | Sweet spot — good quality, fast on Apple Silicon / mid GPU |
| 32 GB | qwen2.5:14b or mistral-small | Noticeably more capable, especially for complex tool chains |
| 64 GB+ | qwen2.5:32b or llama3.1:70b (quantised) | Top-tier local quality; requires patience or a strong GPU |

If you just want a starting point, qwen2.5:7b is the broad recommendation: good tool-calling support, multilingual, and it runs well on a typical modern laptop.
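
With Ollama, that starting point is a single pull:

```
# downloads the model (roughly 4-5 GB) and makes it available locally
ollama pull qwen2.5:7b
```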

Cloud — by provider

OpenAI

| Model | Best for | Approximate cost |
| --- | --- | --- |
| gpt-4o-mini | Default. Cheap, capable, fast. | Cents per typical chat |
| gpt-4o | When you need top quality | Several cents per chat |
| o1-mini / o3-mini | Reasoning-heavy queries | Higher per-response cost |
| o1 / o3 | Hardest reasoning tasks | Significantly more |

For most Companion workflows, gpt-4o-mini is the right default. Bump to gpt-4o when complex tool chains start failing.

Anthropic

| Model | Best for | Approximate cost |
| --- | --- | --- |
| claude-haiku-4-5 | Default. Fast, cheap, good at tool calling. | Cents per typical chat |
| claude-sonnet-4-5 | Default for serious work. Excellent reasoning. | Several cents per chat |
| claude-opus-4-5 | Top-tier reasoning when you need it | Significantly more |

For Companion, Claude Sonnet 4.5 is the sweet-spot recommendation when budget allows. Claude Haiku 4.5 is a great budget pick for routine queries.

OpenRouter

OpenRouter exposes hundreds of models. Worth trying:

| Model | Why |
| --- | --- |
| qwen/qwen-2.5-72b-instruct | Cheap, capable, strong at tool calling |
| anthropic/claude-sonnet-4-5 | Same Sonnet via OpenRouter (slightly different pricing) |
| openai/gpt-4o-mini | Same gpt-4o-mini via OpenRouter |
| meta-llama/llama-3.3-70b-instruct | Open weights via cloud, good quality |
| deepseek/deepseek-chat | Surprisingly strong cheap option |
| google/gemini-2.5-pro | Long context, alternative reasoning style |

The big win with OpenRouter is trying many models without creating separate provider accounts. Per-model pricing is shown on openrouter.ai/models.
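
Outside Companion, trying a model is just a change to one request field. A minimal sketch against OpenRouter's OpenAI-compatible chat endpoint (the model and prompt here are only examples; assumes your key is in OPENROUTER_API_KEY):

```
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek/deepseek-chat",
        "messages": [{"role": "user", "content": "What is RSSI?"}]
      }'
```

Swap the "model" value for any other ID from the table above to compare answers without touching the rest of the request.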

By use case

Quick conversational Q&A

Pick the smallest / cheapest model that runs comfortably:

  • Local: 3B-7B model.
  • Cloud: gpt-4o-mini / claude-haiku-4-5.

A small model answers "what is RSSI" in two seconds; a large model takes ten and gives the same answer.

Complex tool chains

Larger models plan multi-step tool calls more reliably:

  • Local: 14B+ models.
  • Cloud: gpt-4o, claude-sonnet-4-5+, o1.

For investigations where the assistant chains four or five tools, the better planners are worth the cost.

Code generation (HID scripts, automation)

| Provider | Pick |
| --- | --- |
| Local | qwen2.5-coder:7b or qwen2.5-coder:14b |
| Cloud (OpenAI) | gpt-4o |
| Cloud (Anthropic) | claude-sonnet-4-5 |
| OpenRouter | deepseek/deepseek-coder (budget pick) |

Code-tuned local models trade some general conversational ability for better code output. Cloud models are generally good at code without specialisation.
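
For local use, the coder variants install like any other Ollama model:

```
# code-tuned variant of qwen2.5; trades some chat ability for better code
ollama pull qwen2.5-coder:7b
```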

Multilingual (German, French, Spanish, etc.)

| Provider | Pick |
| --- | --- |
| Local | Qwen family (7B+); EuroLLM for German |
| Cloud | Any major cloud model — they all handle European languages well |

For best results, set the system prompt to instruct the model to respond in your language.
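
If you want to sanity-check this outside Companion, here is a quick sketch against Ollama's chat API with a German system prompt (model and wording are just examples):

```
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b",
  "messages": [
    {"role": "system", "content": "Antworte immer auf Deutsch."},
    {"role": "user", "content": "Was ist RSSI?"}
  ],
  "stream": false
}'
```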

Reasoning-heavy queries

The "thinking" / "reasoning" model variants trade speed for accuracy on hard questions:

  • Local: DeepSeek-R1 distills, Qwen QwQ.
  • Cloud (OpenAI): o1, o3 family.
  • Cloud (Anthropic): claude-opus-4-5, or any Sonnet 4.5+ with extended thinking.

They're slower per response but handle multi-step reasoning noticeably better.
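
As one concrete example, Anthropic exposes extended thinking as a request option on its Messages API. A minimal sketch (the token budget is an arbitrary example; max_tokens must exceed it):

```
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{
        "model": "claude-sonnet-4-5",
        "max_tokens": 16000,
        "thinking": {"type": "enabled", "budget_tokens": 10000},
        "messages": [{"role": "user", "content": "..."}]
      }'
```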

Local — model size vs. quality

A useful rule of thumb:

  • 3B models — good for greetings, simple Q&A, basic tool calls.
  • 7-8B models — usable for genuine work; the floor for serious investigation.
  • 14B models — clearly better at complex tasks; slower.
  • 30B+ models — much closer to cloud-LLM quality; need real hardware.

There's no substitute for trying. Pull two models, run the same investigation question through both, see which fits your patience and your need.
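
With Ollama, that head-to-head takes a couple of minutes (the question is just an example):

```
# run the same prompt through two models; compare answers and wall time
for m in qwen2.5:7b qwen2.5:14b; do
  echo "=== $m ==="
  time ollama run "$m" "What does an RSSI of -75 dBm imply about a Wi-Fi link?"
done
```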

Don't just pull the biggest model your machine can run. The fastest acceptable response is more useful than the slowest possible best response. For most users, "good enough at 5-second response time" beats "perfect at 30-second response time."

Quantisation

Local models come in different quantisation levels (Q4, Q5, Q8, FP16). Fewer bits means smaller files and faster inference, at a slight quality cost. Ollama's defaults usually pick a sensible quantisation; you rarely need to override them.

For tightly RAM-constrained setups, Q4_K_M is the standard "small but still good" choice.
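
With Ollama, a specific quantisation is just a more specific tag (exact tag names vary per model; check the model's page on the registry):

```
ollama pull qwen2.5:7b          # registry's default quantisation
ollama pull qwen2.5:7b-q4_K_M   # explicitly request the Q4_K_M build
```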

Switching models on the fly

Companion lets you switch the active model in Settings → AI → Model. The next message uses the new model.

For workflows that benefit from switching:

  • Quick conversational queries → small / cheap model.
  • Tool-chain queries → larger / more capable model.
  • Sensitive AirLeak queries → local model.
  • General help → cloud model if cost is acceptable.

Updating local models

ollama pull <model> re-pulls the latest version of a model. Models update occasionally; the new version replaces the old one of the same name.

To keep multiple versions, use Ollama's tag syntax: ollama pull qwen2.5:7b-q4_K_M keeps the q4 variant separate from the default qwen2.5:7b.

Local — disk usage

Each local model takes from a couple of gigabytes to tens of gigabytes on disk:

  • 3B model: ~2 GB
  • 7B model: ~4-5 GB
  • 14B model: ~8-10 GB
  • 30B+ model: ~20+ GB

Models live under Ollama's data directory (or your runner's equivalent). Delete unused models with ollama rm <model>.
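
Checking and reclaiming space is one command each:

```
ollama list            # installed models with their on-disk sizes
ollama rm qwen2.5:14b  # remove a model you no longer use
```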

Cost — cloud providers

For cloud providers, Companion shows per-response cost estimates. Rough rules of thumb:

| Pattern | Approximate cost |
| --- | --- |
| Single conversational message, mini-tier model | Cents |
| Long conversation, top-tier model, lots of tool calls | Tens of cents to a few dollars |
| Multi-day investigation with tool chains | Single-digit dollars over the period |

Reasoning models (o1, o3, Claude with extended thinking) cost notably more per response — sometimes 5-10x — because they generate large internal reasoning traces.

For budget-tight users, local Ollama with a 7B model is free and sufficient for most workflows. For users with budget to spare who are comfortable sending queries to the cloud, claude-sonnet-4-5 or gpt-4o is the sweet spot.

Capability detection

Different models — even from the same provider — support different capabilities:

| Capability | Why it matters |
| --- | --- |
| Tool calling | Required for the assistant to drive Companion or call MCP tools |
| Vision | Image inputs (not currently used by Companion's flows) |
| Streaming | Token-by-token rendering |
Companion shows the active model's capabilities under Settings → AI. Some smaller or older models lack tool calling: they work for chat but can't operate Companion's tools.
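
With Ollama you can also check from the CLI; recent builds print a capabilities section for each model (assumption: your Ollama version is new enough to report it):

```
ollama show qwen2.5:7b
# look for "tools" under Capabilities; without it the model can chat
# but cannot operate Companion's tools
```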
