Local Models (GGUF)
Run inference entirely on your own hardware — no Ollama, no API key. Contenox can download GGUF model files directly from HuggingFace and serve them via a built-in llama.cpp backend.
Curated models
Run contenox model registry-list to see all available models with sizes. The table below lists the curated set; approximate VRAM figures assume Q4_K_M quantization.
| Name | Description | ~VRAM |
|---|---|---|
| tiny | FastThink 0.5B (testing only) | ~1 GB |
| llama3.2-1b | Llama 3.2 1B | ~1 GB |
| qwen2.5-1.5b | Qwen 2.5 1.5B | ~1 GB |
| granite-3.2-2b | IBM Granite 3.2 2B | ~1 GB |
| qwen3-4b | Qwen 3 4B | ~3 GB |
| gemma4-e2b | Gemma 4 E2B | ~3 GB |
| phi-4-mini | Microsoft Phi-4 Mini | ~3 GB |
| gemma4-e4b | Gemma 4 E4B | ~5 GB |
| granite-3.2-8b | IBM Granite 3.2 8B | ~5 GB |
| qwen2.5-7b | Qwen 2.5 7B | ~5 GB |
| qwen3-14b | Qwen 3 14B | ~9 GB |
| qwen3-30b | Qwen 3 30B (MoE, fast) | ~19 GB |
| kimi-linear | Kimi Linear 48B (MoE) | ~30 GB |
| llama4-scout | Llama 4 Scout 17Bx16E | ~68 GB |
Note
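Multi-GPU models (llama4-scout) require several GPUs or unified memory. MoE models (qwen3-30b, kimi-linear) activate only a fraction of their parameters per token, so they run faster than a dense model of the same size, but the full weights must still fit in memory, as the table figures reflect.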
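To sanity-check whether a model outside the table is likely to fit, a back-of-the-envelope estimate can help. The sketch below multiplies the parameter count by an assumed average of about 4.8 bits per weight for Q4_K_M; that figure is an approximation, not a value published by Contenox, and the function name estimate_gb is hypothetical. The result covers weights only, so expect real usage to be somewhat higher once the context cache and runtime overhead are included.

```shell
# estimate_gb PARAMS_IN_BILLIONS [BITS_PER_WEIGHT]
# Weights-only size estimate in GB: params (billions) * bits per weight / 8.
# The 4.8 default is an assumed average for Q4_K_M quantization.
estimate_gb() {
  awk -v p="$1" -v b="${2:-4.8}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 4     # qwen3-4b
estimate_gb 14    # qwen3-14b
estimate_gb 30    # qwen3-30b
```

The estimates land slightly below the table's ~3 GB, ~9 GB, and ~19 GB figures, which is consistent with the table leaving headroom beyond the raw weights.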
1. Download a model
Initialize the workspace first if you have not already:
contenox init
Then pick a model from the table and pull it. The file is stored at ~/.contenox/models/<name>/model.gguf.
contenox model pull qwen3-4b
Progress is printed in-line. The download is resumable — if interrupted, re-run the same command.
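On an unreliable connection, re-running the command by hand gets tedious. A small retry wrapper is one way to automate it. This is a generic sketch, not a Contenox feature: the retry function and the RETRY_DELAY variable are hypothetical names, and it assumes that an interrupted pull exits with a non-zero status.

```shell
# retry MAX CMD...: run CMD until it succeeds or MAX attempts have been made.
# Waits RETRY_DELAY seconds (default 5) between attempts.
retry() {
  max="$1"; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1                     # out of attempts
    fi
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage (assumes a failed pull exits non-zero):
# retry 5 contenox model pull qwen3-4b
```

Because the download is resumable, each new attempt should continue from where the previous one stopped rather than starting over.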
2. What gets configured
contenox init creates the built-in local backend automatically. contenox model pull adds the model to the local registry and, on a fresh install, sets the first pulled model as default-model.
Contenox scans ~/.contenox/models/ and exposes every */model.gguf it finds as a model name on the local provider.
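The naming rule above can be mirrored from the shell to preview which names the local provider should expose. This is a sketch of the directory convention only, not Contenox's actual scanning code; the function name list_local_models is hypothetical.

```shell
# list_local_models [DIR]: print <name> for every DIR/<name>/model.gguf.
# DIR defaults to ~/.contenox/models, the location described above.
list_local_models() {
  dir="${1:-$HOME/.contenox/models}"
  for f in "$dir"/*/model.gguf; do
    [ -e "$f" ] || continue        # glob matched nothing; skip the literal pattern
    basename "$(dirname "$f")"
  done
}

list_local_models
```

A directory without a model.gguf file inside it is skipped, so a partially created model folder does not appear in the output.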
3. Verify and run
contenox doctor
contenox "hello, what can you do?"
If you are switching back to local models after using a cloud provider, set the defaults explicitly:
contenox config set default-provider local
contenox config set default-model qwen3-4b
Bring your own model
Any GGUF file hosted on HuggingFace (or any public URL) can be pulled by name:
contenox model pull my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf
Use /resolve/main/ (not /blob/main/) in the URL so HuggingFace serves the raw file.
After the download completes, the model is automatically registered in the local registry and available from the local backend.
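URLs copied from the HuggingFace file browser usually contain /blob/, which is the form the note above warns against. A one-line rewrite can convert them before pulling. The helper name to_resolve_url is hypothetical, and the sketch only swaps the first /blob/ path segment without validating the URL.

```shell
# to_resolve_url URL: rewrite the first /blob/ path segment to /resolve/.
# URLs that do not contain /blob/ are printed unchanged.
to_resolve_url() {
  printf '%s\n' "$1" | sed 's#/blob/#/resolve/#'
}

to_resolve_url "https://huggingface.co/org/repo/blob/main/model.gguf"
# prints https://huggingface.co/org/repo/resolve/main/model.gguf
```

The output can then be passed to the --url option shown earlier.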
Registry management
The model registry is the authoritative name → URL index. Manage it from the CLI.
contenox model registry-list # list all curated + user-added entries
contenox model add my-model --url https://huggingface.co/org/repo/resolve/main/model.gguf
contenox model show my-model # print registry details as JSON
contenox model remove my-model # remove a user-added entry
Curated entries (tiny, qwen3-4b, etc.) cannot be removed — they are embedded in the binary.
Next steps
- CLI reference: full contenox model subcommand reference
- Quickstart: wire the backend into your first agent
- Core Concepts: chains, tasks, tools