infrastructure/docs/wip/LOCAL-AI-STACK.md
docs: Complete local AI stack documentation
- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 20:38:06 +02:00


# Local AI Stack on Unraid

**Status:** Deployed
**Last Updated:** 2026-01-26


## Current Deployment

| Component | Status | URL/Port |
| --- | --- | --- |
| Ollama | Running | http://192.168.31.2:11434 |
| Open WebUI | Running | http://192.168.31.2:3080 |
| Intel GPU | Enabled | `/dev/dri` passthrough |

## Models Installed

| Model | Size | Type |
| --- | --- | --- |
| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |

## Custom Model: unraid-assistant

A custom model built on qwen2.5-coder:7b whose Modelfile system prompt encodes the xtrm-lab.org infrastructure (the base weights are unchanged, so this is prompt customization rather than fine-tuning):

### Knowledge Included

- **Network topology:** all VLANs (10, 20, 25, 30, 31, 35, 40, 50), IPs, gateways
- **45+ Docker containers:** names, images, ports, purposes
- **RouterOS 7:** commands, VLAN patterns, firewall rules
- **Traefik:** labels, routing, SSL configuration
- **Authentik:** SSO middleware, provider setup
- **External URLs:** all xtrm-lab.org services
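
The Modelfile behind such a model pairs the base weights with a `SYSTEM` prompt. A minimal illustrative sketch (the real prompt in `/mnt/user/appdata/ollama/Modelfile-unraid` is far more detailed):

```
FROM qwen2.5-coder:7b

SYSTEM """
You are the infrastructure assistant for xtrm-lab.org, running on Unraid.
You know the network topology (VLANs 10, 20, 25, 30, 31, 35, 40, 50),
the Docker containers with their ports, RouterOS 7 conventions,
Traefik labels, and the Authentik SSO middleware.
Answer concisely with concrete commands and configuration.
"""
```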

### Usage

```bash
# Terminal (SSH to Unraid)
ai "How do I add a device to the IoT VLAN?"
ai "What port is gitea running on?"
ai "Show me Traefik labels for a new app with Authentik"

# Interactive mode
ai
```
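
Ollama also exposes an HTTP API on port 11434, so the same model can be queried without the `ai` helper. A minimal sketch, using the host IP from the deployment table; `/api/generate` is Ollama's standard generation endpoint:

```bash
# Request body for Ollama's /api/generate endpoint ("stream": false returns one JSON object).
OLLAMA_URL="http://192.168.31.2:11434"
payload='{"model":"unraid-assistant","prompt":"What port is gitea running on?","stream":false}'
echo "$payload"

# On a host that can reach Unraid, send it with curl:
# curl -s "$OLLAMA_URL/api/generate" -d "$payload"
```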

### Rebuild Model

If infrastructure changes, update and rebuild:

```bash
# Edit the Modelfile
nano /mnt/user/appdata/ollama/Modelfile-unraid

# Rebuild (host path /mnt/user/appdata/ollama is mounted at /root/.ollama in the container)
docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
```

## Hardware

| Component | Spec |
| --- | --- |
| CPU | Intel N100 (4 cores) |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via `/dev/dri`) |
| Storage | 1.7 TB free on array |

## Performance

- ~1 token/sec with 7B models
- Responses take 30-90 seconds
- Suitable for occasional use, not real-time chat

## Containers Stopped for RAM

To free ~4.8 GB for AI workloads, the following containers were stopped:

| Container | RAM Freed | Purpose |
| --- | --- | --- |
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |
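
As a sanity check, the per-container figures in the table add up to the ~4.8 GB headline:

```bash
# Sum the RAM freed by the five stopped containers (values from the table, in GB).
awk 'BEGIN { printf "%.2f GB\n", 1.68 + 1.62 + 0.686 + 0.531 + 0.291 }'
# prints: 4.81 GB
```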

To restart if needed:

```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```

## Docker Configuration

### Ollama

```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```

### Open WebUI

```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

## AI Command Helper

```bash
#!/bin/bash
# /usr/local/bin/ai: query the unraid-assistant model from the terminal
MODEL="unraid-assistant"
if [ $# -eq 0 ]; then
    # No arguments: open an interactive session
    docker exec -it ollama ollama run "$MODEL"
else
    # Pass all arguments as a single prompt
    docker exec ollama ollama run "$MODEL" "$*"
fi
```

## Open WebUI RAG Setup

For detailed documentation beyond the system prompt:

1. Go to http://192.168.31.2:3080
2. Workspace → Knowledge → + Create
3. Name: Infrastructure
4. Upload docs from /mnt/user/appdata/open-webui/docs/

Infrastructure docs are pre-copied to that location.


## Future: Mac Mini M4 Upgrade

Planning to migrate AI stack to Mac Mini M4 (32GB):

| Metric | N100 (current) | M4 (planned) |
| --- | --- | --- |
| Speed | ~1 tok/s | ~15-20 tok/s |
| Max model | 7B | 70B+ |
| Response time | 30-90 s | 3-5 s |

The M4's unified memory architecture is well suited to LLM inference.
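
The response-time row follows from the speed row; a rough sketch, assuming a reply of about 60 tokens (an assumption, not a measured figure):

```bash
# Estimated response time (seconds) = reply length (tokens) / generation speed (tok/s).
awk 'BEGIN {
  tokens = 60
  printf "N100: %.0f s\n", tokens / 1      # ~1 tok/s today
  printf "M4:   %.1f s\n", tokens / 17.5   # midpoint of the 15-20 tok/s estimate
}'
# prints:
# N100: 60 s
# M4:   3.4 s
```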


## Files

| File | Purpose |
| --- | --- |
| `/mnt/user/appdata/ollama/Modelfile-unraid` | Custom model definition |
| `/usr/local/bin/ai` | Terminal helper command |
| `/mnt/user/appdata/open-webui/docs/` | RAG documents |