# Local AI Stack on Unraid

**Status:** ✅ Deployed
**Last Updated:** 2026-01-26

## Current Deployment
| Component | Status | URL/Port |
|---|---|---|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
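
A quick sanity check that both services are answering (a minimal sketch, assuming the host IP above):

```bash
# Ollama reports its version if it is up
curl http://192.168.31.2:11434/api/version

# Open WebUI should answer with HTTP 200
curl -I http://192.168.31.2:3080
```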
## Model Installed
| Model | Size | Type |
|---|---|---|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
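
To confirm the model is available inside the container:

```bash
docker exec ollama ollama list
```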
## Hardware
| Component | Spec |
|---|---|
| CPU | Intel N100 |
| RAM | 16GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7TB free on array |
## Containers Stopped for RAM
To free ~4.8GB for AI workloads, these non-critical containers were stopped:
| Container | RAM Freed | Purpose |
|---|---|---|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |
To restart them if needed:

```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
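
To check how much memory the remaining containers are actually using (standard `docker stats` output):

```bash
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
```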
## Docker Configuration

### Ollama

```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
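
To verify the iGPU device nodes made it into the container (card0/renderD128 are the typical names):

```bash
docker exec ollama ls -l /dev/dri
```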
### Open WebUI

```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
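
If the UI doesn't come up, the container logs usually show why:

```bash
docker logs --tail 50 open-webui
```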
## Usage

### Web Interface

- Open http://192.168.31.2:3080
- Create an admin account on first visit
- Select the `qwen2.5-coder:7b` model
- Start chatting
### API Access

```bash
# List models
curl http://192.168.31.2:11434/api/tags

# Generate a response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```
## Future Considerations

- **More RAM**: with 32GB+ RAM, larger models (14b, 32b) become feasible
- **Dedicated GPU**: would significantly improve inference speed
- **Additional models**: pull more models as needed with `docker exec ollama ollama pull <model>` (see the example below)
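
For example, to pull the larger coder variant (tag assumed; check the Ollama library for the sizes actually published):

```bash
docker exec ollama ollama pull qwen2.5-coder:14b
```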