Ollama
Ollama lets you run open-source language models on your own hardware. No API keys, no cloud services, no data leaving your machine. It’s the most private option for running Selu.
Why use Ollama?
- Complete privacy — Your conversations and data stay on your hardware. Nothing is sent to an external service.
- No API costs — Once you have the hardware, there are no per-token charges.
- Offline capable — Works without an internet connection after the initial model download.
- Experimentation — Try many different open-source models easily.
Setting up Ollama
- Install Ollama from ollama.com. It’s available for macOS, Linux, and Windows.
- Pull a model — Open your terminal and download a model:
ollama pull llama3.1
- Verify it’s running — Ollama starts a local API server automatically. Test it with:
curl http://localhost:11434/api/tags
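Beyond listing models, you can confirm that a model actually responds by sending a one-off completion to Ollama’s /api/generate endpoint. A minimal sketch, assuming the llama3.1 model pulled above (swap in any installed model):

```shell
# Build the request body for a single, non-streaming completion.
PAYLOAD='{"model": "llama3.1", "prompt": "Say hello in one sentence.", "stream": false}'

# POST it to the local Ollama server. The fallback message keeps the
# snippet safe to paste even when Ollama is not running.
curl -s http://localhost:11434/api/generate -d "$PAYLOAD" \
  || echo "Ollama is not reachable on port 11434"
```

With `"stream": false`, Ollama returns the whole response as one JSON object instead of a stream of tokens, which is easier to read in a terminal.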
Connecting to Selu
- Open the Selu dashboard and go to Settings → LLM Providers → Ollama.
- Enter the Ollama server URL. If Selu runs in a Docker container, use http://host.docker.internal:11434 so the container can reach Ollama on your host machine. If Selu runs directly on the same machine, use http://localhost:11434 — the default Ollama address.
- Click Test Connection to verify Selu can reach Ollama.
- Select your model from the dropdown (Selu auto-detects installed models).
- Save your settings.
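Selu’s model dropdown is likely populated from the same /api/tags response you tested earlier. This sketch extracts just the model names from that JSON; the heredoc stands in for a live call (in practice, replace it with `curl -s http://localhost:11434/api/tags`), and the two model names are illustrative:

```shell
# Extract the "name" field of each installed model from a sample
# /api/tags response (hypothetical model list for illustration).
cat <<'EOF' | grep -o '"name":"[^"]*"' | cut -d'"' -f4
{"models":[{"name":"llama3.1:latest"},{"name":"mistral:latest"}]}
EOF
```

If a model you pulled doesn’t appear in the dropdown, run this against the live endpoint first — if the name is missing there too, the pull didn’t complete.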
Recommended models
| Model | Size | Good for |
|---|---|---|
| Llama 3.1 8B | ~5 GB | Fast general-purpose chat on most hardware |
| Llama 3.1 70B | ~40 GB | High-quality responses, needs a powerful GPU |
| Mistral 7B | ~4 GB | Efficient, good at following instructions |
| Gemma 2 9B | ~5 GB | Strong reasoning in a compact model |
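The models in the table can be fetched with `ollama pull`. The tags below follow Ollama’s library naming as of this writing (verify current tags at ollama.com/library); note the 70B pull needs roughly 40 GB of disk and a long download:

```shell
# Pull the recommended models from the table above.
for model in llama3.1 llama3.1:70b mistral gemma2; do
  ollama pull "$model" || echo "could not pull $model (is ollama installed?)"
done
```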
Hardware considerations
Running models locally requires decent hardware. As a rough guide:
- 8 GB RAM — Can run 7B parameter models comfortably.
- 16 GB RAM — Good for most 7B–13B models with room to spare.
- GPU with 8+ GB VRAM — Significantly speeds up responses. NVIDIA GPUs with CUDA support work best.
Without a GPU, models still work but responses will be noticeably slower.
Troubleshooting
- Connection refused — Make sure Ollama is running (ollama serve) and the URL is correct. If Selu is in Docker, use http://host.docker.internal:11434 instead of http://localhost:11434.
- Slow responses — Try a smaller model, or check that your GPU is being used (run ollama ps and look at the processor column).
- Out of memory — The model is too large for your hardware. Switch to a smaller variant.
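The connection checks above can be scripted. A sketch of the two most common cases — the container name "selu" is hypothetical, so substitute your own:

```shell
# 1) Is Ollama up on the host?
curl -s http://localhost:11434/api/tags > /dev/null \
  && echo "Ollama is reachable" \
  || echo "Ollama is not reachable (start it with: ollama serve)"

# 2) Can the Selu container reach Ollama via the Docker host gateway?
#    ("selu" is a hypothetical container name; substitute your own.)
docker exec selu curl -s http://host.docker.internal:11434/api/tags > /dev/null \
  && echo "container can reach Ollama" \
  || echo "container cannot reach Ollama (check the configured URL)"
```

If the first check passes but the second fails, the problem is Docker networking rather than Ollama itself.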