Ollama

Ollama lets you run open-source language models on your own hardware. No API keys, no cloud services, no data leaving your machine. It’s the most private option for running Selu.

  • Complete privacy — Your conversations and data stay on your hardware. Nothing is sent to an external service.
  • No API costs — Once you have the hardware, there are no per-token charges.
  • Offline capable — Works without an internet connection after the initial model download.
  • Experimentation — Try many different open-source models easily.
  1. Install Ollama from ollama.com. It’s available for macOS, Linux, and Windows.
  2. Pull a model — Open your terminal and download a model:

         ollama pull llama3.1

  3. Verify it’s running — Ollama starts a local API server automatically. Test it with:

         curl http://localhost:11434/api/tags
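The `/api/tags` endpoint returns JSON describing the models you have pulled. As a rough sketch of what a client does with that response (the sample payload below mirrors the documented response shape, but a real response carries more fields, so treat the exact structure as an assumption):

```python
import json

# Sample body from GET /api/tags -- shape based on Ollama's API; a real
# response also includes fields like digest, modified_at, and details.
sample = """
{"models": [
  {"name": "llama3.1:latest", "size": 4920753328},
  {"name": "mistral:latest",  "size": 4113301824}
]}
"""

def installed_models(payload: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(payload)["models"]]

print(installed_models(sample))  # -> ['llama3.1:latest', 'mistral:latest']
```

If the list is empty, the server is running but no models have been pulled yet.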
  1. Open the Selu dashboard and go to Settings → LLM Providers → Ollama.
  2. Enter the Ollama server URL. If Selu runs in Docker on the same machine as Ollama, use http://host.docker.internal:11434 so the Selu container can reach Ollama on your host; otherwise use http://localhost:11434.
  3. Click Test Connection to verify Selu can reach Ollama.
  4. Select your model from the dropdown (Selu auto-detects installed models).
  5. Save your settings.
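The URL choice in step 2 can be captured in a small helper. This is an illustrative sketch, not Selu's actual configuration API; it only assumes Ollama's default port of 11434:

```python
def ollama_base_url(selu_in_docker: bool, port: int = 11434) -> str:
    """Pick the host Selu should use to reach Ollama on the same machine.

    Inside a Docker container, 'localhost' refers to the container itself,
    so Docker's special hostname is needed to reach the host machine.
    """
    host = "host.docker.internal" if selu_in_docker else "localhost"
    return f"http://{host}:{port}"

print(ollama_base_url(True))   # -> http://host.docker.internal:11434
print(ollama_base_url(False))  # -> http://localhost:11434
```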
| Model | Size | Good for |
| --- | --- | --- |
| Llama 3.1 8B | ~5 GB | Fast general-purpose chat on most hardware |
| Llama 3.1 70B | ~40 GB | High-quality responses, needs a powerful GPU |
| Mistral 7B | ~4 GB | Efficient, good at following instructions |
| Gemma 2 9B | ~5 GB | Strong reasoning in a compact model |

Running models locally requires decent hardware. As a rough guide:

  • 8 GB RAM — Can run 7B-parameter models comfortably.
  • 16 GB RAM — Good for most 7B–13B models with room to spare.
  • GPU with 8+ GB VRAM — Significantly speeds up responses. NVIDIA GPUs with CUDA support work best.

Without a GPU, models still work but responses will be noticeably slower.
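The sizes in the table above roughly track parameter count. As a back-of-the-envelope sketch only — it assumes 4-bit quantization (~0.5 bytes per parameter) plus a fixed overhead for the runtime and KV cache, and real usage varies with quantization level and context length:

```python
def approx_model_ram_gb(params_billion: float,
                        bytes_per_param: float = 0.5,
                        overhead_gb: float = 1.0) -> float:
    """Very rough RAM estimate for a quantized model.

    Assumes ~0.5 bytes/parameter (4-bit quantization) plus a fixed
    overhead; actual needs vary with context length and quantization.
    """
    return params_billion * bytes_per_param + overhead_gb

for b in (7, 8, 70):
    print(f"{b}B -> ~{approx_model_ram_gb(b):.1f} GB")
```

The estimates land near the table's figures (~5 GB for an 8B model, mid-30s of GB for a 70B model), which is why 8 GB of RAM is a comfortable floor for 7B models.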

  • Connection refused — Make sure Ollama is running (ollama serve) and the URL is correct. If Selu is in Docker, use host.docker.internal instead of localhost.
  • Slow responses — Try a smaller model or ensure your GPU is being utilized (check with ollama ps).
  • Out of memory — The model is too large for your hardware. Switch to a smaller variant.
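For the "connection refused" case, a plain TCP check quickly tells you whether anything is listening at all, before you debug URLs or Docker networking. A minimal sketch using only the standard library; the host and port are whatever you configured:

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False

if not ollama_reachable():
    print("Ollama not reachable -- is 'ollama serve' running?")
```

If this returns True but Selu still can't connect, the problem is usually the container-to-host networking (use host.docker.internal) rather than Ollama itself.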