# Local AI Stack on Unraid

**Status:** Deployed
**Last Updated:** 2026-01-26


## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | Running | http://192.168.31.2:11434 |
| Open WebUI | Running | http://192.168.31.2:3080 |
| Intel GPU | Enabled | `/dev/dri` passthrough |

## Model Installed

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via `/dev/dri`) |
| Storage | 1.7 TB free on array |

## Containers Stopped to Free RAM

To free ~4.8 GB of RAM for AI workloads, these non-critical containers were stopped:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart them if needed:

```shell
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
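As a sanity check, the per-container figures in the table above add up to just about the quoted ~4.8 GB:

```shell
# Sum of RAM freed by the stopped containers, in GB (686 MB = 0.686 GB, etc.)
awk 'BEGIN { printf "%.2f GB\n", 1.68 + 1.62 + 0.686 + 0.531 + 0.291 }'
# prints: 4.81 GB
```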

## Docker Configuration

### Ollama

```shell
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
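A quick sketch for verifying the container is serving, assuming the host IP above is reachable from your machine:

```shell
# Check that the Ollama API is up, then list the models it has installed
curl -s http://192.168.31.2:11434/api/version
curl -s http://192.168.31.2:11434/api/tags
```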

### Open WebUI

```shell
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

## Usage

### Web Interface

1. Open http://192.168.31.2:3080
2. Create an admin account on first visit
3. Select the qwen2.5-coder:7b model
4. Start chatting

### API Access

```shell
# List installed models
curl http://192.168.31.2:11434/api/tags

# Generate a response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```
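Note that `/api/generate` streams newline-delimited JSON by default; adding `"stream": false` to the request body returns a single JSON object whose `response` field holds the generated text. Extracting that field can be sketched as follows (the echoed payload is a made-up sample, not real model output):

```shell
# Parse the "response" field from a (sample) non-streaming generate reply
echo '{"model":"qwen2.5-coder:7b","response":"Hi there","done":true}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
# prints: Hi there
```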

## Future Considerations

- **More RAM:** with 32 GB+ the server could run larger models (14b, 32b)
- **Dedicated GPU:** would significantly improve inference speed
- **Additional models:** pull more models as needed with `docker exec ollama ollama pull <model>`