infrastructure/docs/archive/LOCAL-AI-STACK.md
Kaloyan Danchev ec9659d0cb
Restructure docs: archive VLAN migration, update IPs to VLAN 10
Major documentation cleanup after VLAN migration completion:
- Archive 12 VLAN project docs to archive/vlan-migration/
- Archive 5 done WIP docs (VLAN proposals, AI stack, Fossorial, DNS backup)
- Create standing reference docs 08-DNS-ARCHITECTURE and 09-TAILSCALE-VPN
- Renumber docs to clean 01-09 sequence with merged CHANGELOG
- Update all active docs from stale 192.168.31.x to current VLAN 10 IPs
- Fix CSS1 (.10.9→.10.3) and ZX1 (.10.7→.10.4) IPs in hardware inventory
- Clean 06-VLAN-DEVICE-ASSIGNMENT: remove migration columns/sections, fix VLAN 25 subnet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 12:45:16 +02:00


# Local AI Stack on Unraid
**Status:** ✅ Deployed
**Last Updated:** 2026-01-26
---
## Current Deployment
| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
### Models Installed
| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
---
## Custom Model: unraid-assistant
A model built from a custom system prompt (defined in a Modelfile rather than fine-tuned weights) that knows the xtrm-lab.org infrastructure:
### Knowledge Included
- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
- **45+ Docker containers**: Names, images, ports, purposes
- **RouterOS 7**: Commands, VLAN patterns, firewall rules
- **Traefik**: Labels, routing, SSL configuration
- **Authentik**: SSO middleware, provider setup
- **External URLs**: All xtrm-lab.org services
### Usage
```bash
# Terminal (SSH to Unraid)
ai "How do I add a device to the IoT VLAN?"
ai "What port is gitea running on?"
ai "Show me Traefik labels for a new app with Authentik"
# Interactive mode
ai
```
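
Besides the `ai` helper, the model should also be reachable over Ollama's HTTP API on port 11434. A minimal sketch, assuming the `/api/generate` endpoint with streaming disabled; `build_generate_payload` is a hypothetical helper name and not part of the deployed setup:

```bash
#!/bin/bash
# Hypothetical helper: build a JSON body for Ollama's /api/generate endpoint.
# Escapes backslashes, double quotes, and newlines so the prompt stays valid JSON.
build_generate_payload() {
    local model="$1" prompt="$2"
    prompt=${prompt//\\/\\\\}      # backslash -> two backslashes
    prompt=${prompt//\"/\\\"}      # double quote -> escaped double quote
    prompt=${prompt//$'\n'/\\n}    # newline -> literal \n
    printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Example call (assumes curl is available on the host):
#   curl -s http://192.168.31.2:11434/api/generate -d "$(build_generate_payload unraid-assistant 'What port is gitea running on?')"
```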
### Rebuild Model
If the infrastructure changes, update the Modelfile and rebuild the model:
```bash
# Edit the Modelfile
nano /mnt/user/appdata/ollama/Modelfile-unraid
# Rebuild (the host appdata folder above is mounted at /root/.ollama inside the container)
docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
```
---
## Hardware
| Component | Spec |
|-----------|------|
| CPU | Intel N100 (4 cores) |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7 TB free on array |
### Performance
- ~1 token/sec with 7B models
- Responses take 30-90 seconds
- Suitable for occasional use, not real-time chat
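
These figures follow from simple arithmetic: response time is roughly the number of generated tokens divided by the generation rate. A small sketch, where `estimate_seconds` is a hypothetical helper and prompt-processing time is ignored:

```bash
#!/bin/bash
# Hypothetical helper: estimate generation time in whole seconds, rounded up.
# Usage: estimate_seconds <tokens> <tokens_per_second>
estimate_seconds() {
    local tokens="$1" rate="$2"
    echo $(( (tokens + rate - 1) / rate ))
}

estimate_seconds 30 1   # a 30-token answer at ~1 token/sec
estimate_seconds 90 1   # a 90-token answer at ~1 token/sec
```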
---
## Containers Stopped for RAM
The following containers were stopped to free ~4.8 GB of RAM for AI workloads:
| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |
To restart them if needed:
```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
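
The stated total can be checked by summing the table. A sketch with the per-container values from the table converted to MB; `sum_freed_gb` is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: sum "name MB" lines from stdin and print the total in GB.
sum_freed_gb() {
    awk '{ total += $2 } END { printf "%.2f\n", total / 1000 }'
}

sum_freed_gb <<'EOF'
karakeep 1680
unimus 1620
homarr 686
netdisco-web 531
netdisco-backend 291
EOF

# To stop the same containers again before an AI session:
#   docker stop karakeep unimus homarr netdisco-web netdisco-backend
```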
---
## Docker Configuration
### Ollama
```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
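
`--device /dev/dri` only helps if the host actually exposes a render node. A quick pre-flight sketch, assuming the usual Linux DRM naming (`renderD128` and up); `has_render_node` is a hypothetical helper, and it says nothing about whether Ollama makes use of the iGPU:

```bash
#!/bin/bash
# Hypothetical helper: succeed if the given directory contains a DRM render node.
# Defaults to /dev/dri, the path passed to the container.
has_render_node() {
    local dir="${1:-/dev/dri}" node
    for node in "$dir"/renderD*; do
        # -e filters out the unexpanded glob when nothing matches
        [ -e "$node" ] && return 0
    done
    return 1
}

if has_render_node; then
    echo "render node found in /dev/dri"
else
    echo "no render node in /dev/dri"
fi
```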
### Open WebUI
```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
### AI Command Helper
```bash
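#!/bin/bash
# /usr/local/bin/ai: query the unraid-assistant model from the terminal
MODEL="unraid-assistant"
if [ $# -eq 0 ]; then
    # No arguments: open an interactive session
    docker exec -it ollama ollama run "$MODEL"
else
    # Arguments given: send them as a single prompt
    docker exec ollama ollama run "$MODEL" "$*"
fi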
```
---
## Open WebUI RAG Setup
For detailed documentation beyond what fits in the system prompt:
1. Go to http://192.168.31.2:3080
2. **Workspace** → **Knowledge** → **+ Create**
3. Name: `Infrastructure`
4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
Infrastructure docs are pre-copied to that location.
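
Keeping that folder current is a plain file copy; the files still need to be uploaded through the Knowledge UI as in step 4. A sketch, assuming the docs are Markdown files in a local checkout; the source path is a placeholder and `copy_docs` is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: copy top-level *.md files from a docs checkout into the RAG folder.
copy_docs() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    find "$src" -maxdepth 1 -type f -name '*.md' -exec cp {} "$dest"/ \;
}

# Example (the source path is a placeholder, not taken from this document):
#   copy_docs /path/to/infrastructure/docs /mnt/user/appdata/open-webui/docs
```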
---
## Future: N5 Air (Ryzen AI 5 255) Upgrade
The plan is to migrate the AI stack to an N5 Air with a Ryzen AI 5 255:
| Metric | N100 (current) | Ryzen AI 5 255 |
|--------|----------------|--------------|
| Speed | ~1 tok/s | ~5-8 tok/s |
| Max model | 7B | 14B-32B |
| Response time | 30-90s | 5-15s |
The Ryzen AI 5 255 features an XDNA NPU (16 TOPS) for potential AI acceleration. Its DDR5 memory and 6-core/12-thread CPU should significantly improve inference speed.
---
## Files
| File | Purpose |
|------|---------|
| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
| /usr/local/bin/ai | Terminal helper command |
| /mnt/user/appdata/open-webui/docs/ | RAG documents |