Ollama

Ollama lets you run open-source language models on your own hardware. No API keys, no cloud services, no data leaving your machine. It’s the most private option for running Selu.

  • Complete privacy — Your conversations and data stay on your hardware. Nothing is sent to an external service.
  • No API costs — Once you have the hardware, there are no per-token charges.
  • Offline capable — Works without an internet connection after the initial model download.
  • Experimentation — Try many different open-source models easily.
  1. Install Ollama from ollama.com. It’s available for macOS, Linux, and Windows.
  2. Pull a model — Open your terminal and download a model:

         ollama pull llama3.1

  3. Verify it’s running — Ollama starts a local API server automatically. Test it with:

         curl http://localhost:11434/api/tags
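The `/api/tags` endpoint returns JSON describing the models you have pulled. As a rough sketch of what a client does with that response (the sample payload below mirrors the documented response shape, but a real response carries more fields, so treat the exact structure as an assumption):

```python
import json

# Sample body from GET /api/tags -- shape based on Ollama's API; a real
# response also includes fields like digest, modified_at, and details.
sample = """
{"models": [
  {"name": "llama3.1:latest", "size": 4920753328},
  {"name": "mistral:latest",  "size": 4113301824}
]}
"""

def installed_models(payload: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(payload)["models"]]

print(installed_models(sample))  # -> ['llama3.1:latest', 'mistral:latest']
```

If the list is empty, the server is running but no models have been pulled yet.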
  1. Open the Selu dashboard and go to Settings → LLM Providers → Ollama.
  2. Enter the Ollama server URL. If Selu runs in Docker on the same machine as Ollama, use http://host.docker.internal:11434 so the Selu container can reach Ollama on your host; otherwise use http://localhost:11434.
  3. Click Test Connection to verify Selu can reach Ollama.
  4. Select your model from the dropdown (Selu auto-detects installed models).
  5. Save your settings.
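The URL choice in step 2 can be captured in a small helper. This is an illustrative sketch, not Selu's actual configuration API; it only assumes Ollama's default port of 11434:

```python
def ollama_base_url(selu_in_docker: bool, port: int = 11434) -> str:
    """Pick the host Selu should use to reach Ollama on the same machine.

    Inside a Docker container, 'localhost' refers to the container itself,
    so Docker's special hostname is needed to reach the host machine.
    """
    host = "host.docker.internal" if selu_in_docker else "localhost"
    return f"http://{host}:{port}"

print(ollama_base_url(True))   # -> http://host.docker.internal:11434
print(ollama_base_url(False))  # -> http://localhost:11434
```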
| Model | Size | Good for |
| --- | --- | --- |
| Llama 3.1 8B | ~5 GB | Fast general-purpose chat on most hardware |
| Llama 3.1 70B | ~40 GB | High-quality responses, needs a powerful GPU |
| Mistral 7B | ~4 GB | Efficient, good at following instructions |
| Gemma 2 9B | ~5 GB | Strong reasoning in a compact model |

Running models locally requires decent hardware. As a rough guide:

  • 8 GB RAM — Can run 7B-parameter models comfortably.
  • 16 GB RAM — Good for most 7B–13B models with room to spare.
  • GPU with 8+ GB VRAM — Significantly speeds up responses. NVIDIA GPUs with CUDA support work best.

Without a GPU, models still work but responses will be noticeably slower.
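The sizes in the table above roughly track parameter count. As a back-of-the-envelope sketch only — it assumes 4-bit quantization (~0.5 bytes per parameter) plus a fixed overhead for the runtime and KV cache, and real usage varies with quantization level and context length:

```python
def approx_model_ram_gb(params_billion: float,
                        bytes_per_param: float = 0.5,
                        overhead_gb: float = 1.0) -> float:
    """Very rough RAM estimate for a quantized model.

    Assumes ~0.5 bytes/parameter (4-bit quantization) plus a fixed
    overhead; actual needs vary with context length and quantization.
    """
    return params_billion * bytes_per_param + overhead_gb

for b in (7, 8, 70):
    print(f"{b}B -> ~{approx_model_ram_gb(b):.1f} GB")
```

The estimates land near the table's figures (~5 GB for an 8B model, mid-30s of GB for a 70B model), which is why 8 GB of RAM is a comfortable floor for 7B models.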

  • Connection refused — Make sure Ollama is running (ollama serve) and the URL is correct. If Selu is in Docker, use host.docker.internal instead of localhost.
  • Slow responses — Try a smaller model or ensure your GPU is being utilized (check with ollama ps).
  • Out of memory — The model is too large for your hardware. Switch to a smaller variant.
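For the "connection refused" case, a plain TCP check quickly tells you whether anything is listening at all, before you debug URLs or Docker networking. A minimal sketch using only the standard library; the host and port are whatever you configured:

```python
import socket

def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 2.0) -> bool:
    """Return True if something is accepting TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False

if not ollama_reachable():
    print("Ollama not reachable -- is 'ollama serve' running?")
```

If this returns True but Selu still can't connect, the problem is usually the container-to-host networking (use host.docker.internal) rather than Ollama itself.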