# Local AI Stack on Unraid

**Status:** ✅ Deployed
**Last Updated:** 2026-01-26

---

## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
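
The endpoints in the table can be spot-checked from any machine on the LAN. The following is a minimal sketch, assuming `curl` is installed; the `check_endpoint` helper name is hypothetical and not part of the deployed stack. It uses Ollama's `/api/tags` endpoint, which lists the installed models.

```bash
#!/bin/bash
# check_endpoint NAME URL (hypothetical helper): print "NAME: OK" if the URL
# answers with a successful HTTP status within 5 seconds, else "NAME: DOWN".
check_endpoint() {
    local name="$1" url="$2"
    if curl -fs --max-time 5 -o /dev/null "$url" 2>/dev/null; then
        echo "$name: OK"
    else
        echo "$name: DOWN"
    fi
}
```

For example, `check_endpoint "Ollama" "http://192.168.31.2:11434/api/tags"` and `check_endpoint "Open WebUI" "http://192.168.31.2:3080"` cover both services.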

### Models Installed

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |

---

## Custom Model: unraid-assistant

A model built from a Modelfile with a custom system prompt (the weights themselves are not fine-tuned) that describes the xtrm-lab.org infrastructure:

### Knowledge Included
- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
- **45+ Docker containers**: Names, images, ports, purposes
- **RouterOS 7**: Commands, VLAN patterns, firewall rules
- **Traefik**: Labels, routing, SSL configuration
- **Authentik**: SSO middleware, provider setup
- **External URLs**: All xtrm-lab.org services

### Usage

```bash
# Terminal (SSH to Unraid)
ai "How do I add a device to the IoT VLAN?"
ai "What port is gitea running on?"
ai "Show me Traefik labels for a new app with Authentik"

# Interactive mode
ai
```

### Rebuild Model

When the infrastructure changes, update the Modelfile and rebuild the model:

```bash
# Edit the Modelfile
nano /mnt/user/appdata/ollama/Modelfile-unraid

# Rebuild
docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
```
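
The edit path and the rebuild path differ because the Ollama container mounts `/mnt/user/appdata/ollama` at `/root/.ollama`. A minimal sketch of that mapping, with a rebuild step that prints the stored definition afterwards for verification; the function names `to_container_path` and `rebuild_model` are hypothetical:

```bash
#!/bin/bash
HOST_DIR="/mnt/user/appdata/ollama"   # host side of the -v mount
CONTAINER_DIR="/root/.ollama"         # container side of the -v mount

# Translate a host path under HOST_DIR to the path the container sees.
to_container_path() {
    local host_path="$1"
    case "$host_path" in
        "$HOST_DIR"/*) echo "$CONTAINER_DIR/${host_path#"$HOST_DIR"/}" ;;
        *) echo "error: $host_path is not under $HOST_DIR" >&2; return 1 ;;
    esac
}

# Rebuild the custom model from a Modelfile on the host, then print the
# definition Ollama stored so the new system prompt can be checked.
rebuild_model() {
    local modelfile
    modelfile="$(to_container_path "$1")" || return 1
    docker exec ollama ollama create unraid-assistant -f "$modelfile" &&
        docker exec ollama ollama show unraid-assistant --modelfile
}
```

Calling `rebuild_model /mnt/user/appdata/ollama/Modelfile-unraid` runs the same `ollama create` command as above and then shows the result.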

---

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 (4 cores) |
| RAM | 16GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7TB free on array |

### Performance
- ~1 token/sec with 7B models
- Responses take 30-90 seconds
- Suitable for occasional use, not real-time chat
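
The throughput figure can be re-measured from the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that Ollama returns from `/api/generate` when `"stream": false` is set. This is a rough sketch that parses the JSON with `grep` instead of a proper JSON tool, assuming the response arrives as a single JSON object; the helper names are hypothetical.

```bash
#!/bin/bash
# Extract a numeric field from a JSON response on stdin (grep-based, not a
# real JSON parser; the leading quote keeps "prompt_eval_count" from matching).
json_number() {
    local field="$1"
    grep -o "\"$field\":[[:space:]]*[0-9][0-9]*" | grep -o '[0-9][0-9]*$' | tail -n 1
}

# Read an /api/generate response on stdin and print tokens per second.
tokens_per_sec() {
    local body count duration_ns
    body="$(cat)"
    count="$(printf '%s' "$body" | json_number eval_count)"
    duration_ns="$(printf '%s' "$body" | json_number eval_duration)"
    [ -n "$count" ] && [ -n "$duration_ns" ] && [ "$duration_ns" -gt 0 ] || return 1
    # eval_duration is in nanoseconds, so convert to seconds first
    awk -v c="$count" -v d="$duration_ns" 'BEGIN { printf "%.2f\n", c / (d / 1e9) }'
}
```

To measure a real request, pipe a response into it: `curl -s http://192.168.31.2:11434/api/generate -d '{"model":"unraid-assistant","prompt":"What port is gitea running on?","stream":false}' | tokens_per_sec`.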

---

## Containers Stopped for RAM

These containers were stopped to free ~4.8 GB of RAM for AI workloads:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart them if needed:
```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
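
The ~4.8 GB total is the sum of the "RAM Freed" column. As a quick check, the sketch below adds up the table values, assuming decimal units (1 GB = 1000 MB); the `sum_freed_gb` helper is hypothetical.

```bash
#!/bin/bash
# Sum "name amount unit" rows on stdin and print the total in GB.
# Assumes decimal units: 1 GB = 1000 MB.
sum_freed_gb() {
    awk '
        $3 == "GB" { total += $2 }
        $3 == "MB" { total += $2 / 1000 }
        END { printf "%.2f\n", total }
    '
}

# Values copied from the table above.
freed_table="karakeep 1.68 GB
unimus 1.62 GB
homarr 686 MB
netdisco-web 531 MB
netdisco-backend 291 MB"

printf '%s\n' "$freed_table" | sum_freed_gb   # prints 4.81
```

Current per-container usage can be compared against these figures with `docker stats --no-stream`.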

---

## Docker Configuration

### Ollama
```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```

### Open WebUI
```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

### AI Command Helper
```bash
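#!/bin/bash
# /usr/local/bin/ai: ask unraid-assistant a question, or open an interactive session
MODEL="unraid-assistant"
if [ $# -eq 0 ]; then
    # No arguments: interactive chat (needs a TTY, hence -it)
    docker exec -it ollama ollama run "$MODEL"
else
    # All arguments are joined into a single prompt
    docker exec ollama ollama run "$MODEL" "$*"
fi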
```

---

## Open WebUI RAG Setup

To give the model access to detailed documentation beyond what the system prompt contains, add the docs as a knowledge base in Open WebUI:

1. Go to http://192.168.31.2:3080
2. **Workspace** → **Knowledge** → **+ Create**
3. Name: `Infrastructure`
4. Upload docs from `/mnt/user/appdata/open-webui/docs/`

The infrastructure docs have already been copied to that location.

---

## Future: N5 Air (Ryzen AI 5 255) Upgrade

The plan is to migrate the AI stack to the N5 Air with a Ryzen AI 5 255. Expected figures are estimates:

| Metric | N100 (current) | Ryzen AI 5 255 |
|--------|----------------|--------------|
| Speed | ~1 tok/s | ~5-8 tok/s |
| Max model | 7B | 14B-32B |
| Response time | 30-90s | 5-15s |

The chip includes an XDNA NPU (16 TOPS) for potential AI acceleration. DDR5 memory and the 6-core/12-thread CPU should significantly improve inference speed.

---

## Files

| File | Purpose |
|------|---------|
| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
| /usr/local/bin/ai | Terminal helper command |
| /mnt/user/appdata/open-webui/docs/ | RAG documents |