# Local AI Stack on Unraid

**Status:** Deployed
**Last Updated:** 2026-01-26


## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | Running | http://192.168.31.2:11434 |
| Open WebUI | Running | http://192.168.31.2:3080 |
| Intel GPU | Enabled | `/dev/dri` passthrough |

## Model Installed

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via `/dev/dri`) |
| Storage | 1.7 TB free on array |

## Containers Stopped to Free RAM

To free ~4.8 GB of RAM for AI workloads, these non-critical containers were stopped:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart them if needed:

```shell
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
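As a sanity check, the per-container figures in the table above add up to just about the quoted ~4.8 GB:

```shell
# Sum of RAM freed by the stopped containers, in GB (686 MB = 0.686 GB, etc.)
awk 'BEGIN { printf "%.2f GB\n", 1.68 + 1.62 + 0.686 + 0.531 + 0.291 }'
# prints: 4.81 GB
```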

## Docker Configuration

### Ollama

```shell
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
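A quick sketch for verifying the container is serving, assuming the host IP above is reachable from your machine:

```shell
# Check that the Ollama API is up, then list the models it has installed
curl -s http://192.168.31.2:11434/api/version
curl -s http://192.168.31.2:11434/api/tags
```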

### Open WebUI

```shell
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

## Usage

### Web Interface

1. Open http://192.168.31.2:3080
2. Create an admin account on first visit
3. Select the qwen2.5-coder:7b model
4. Start chatting

### API Access

```shell
# List installed models
curl http://192.168.31.2:11434/api/tags

# Generate a response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```
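Note that `/api/generate` streams newline-delimited JSON by default; adding `"stream": false` to the request body returns a single JSON object whose `response` field holds the generated text. Extracting that field can be sketched as follows (the echoed payload is a made-up sample, not real model output):

```shell
# Parse the "response" field from a (sample) non-streaming generate reply
echo '{"model":"qwen2.5-coder:7b","response":"Hi there","done":true}' \
  | python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
# prints: Hi there
```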

## Future Considerations

- **More RAM:** with 32 GB+ the server could run larger models (14b, 32b)
- **Dedicated GPU:** would significantly improve inference speed
- **Additional models:** pull more models as needed with `docker exec ollama ollama pull <model>`