From 4066e7ff3ae8b5210942d5470e0f717f30bd8cb2 Mon Sep 17 00:00:00 2001
From: XTRM-Unraid <admin@xtrm-lab.org>
Date: Mon, 26 Jan 2026 20:38:06 +0200
Subject: [PATCH] docs: Complete local AI stack documentation

- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 docs/06-CHANGELOG.md       |  15 +++--
 docs/wip/LOCAL-AI-STACK.md | 115 +++++++++++++++++++++++++++++--------
 2 files changed, 101 insertions(+), 29 deletions(-)
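
Note on the `ai` helper documented in docs/wip/LOCAL-AI-STACK.md: the sketch
below shows one possible, slightly more defensive variant, written as a shell
function with every expansion quoted. It is not part of this patch. The
`AI_MODEL` and `AI_CONTAINER` environment overrides are hypothetical names
introduced here for illustration; the defaults match the container and model
names used in the docs.

```bash
#!/bin/bash
# Sketch only: same behaviour as the documented helper, with quoted variables.
# AI_MODEL and AI_CONTAINER are hypothetical overrides, not defined by the patch.
MODEL="${AI_MODEL:-unraid-assistant}"
CONTAINER="${AI_CONTAINER:-ollama}"

ai() {
    if [ "$#" -eq 0 ]; then
        # No prompt given: start an interactive chat, which needs a TTY (-it).
        docker exec -it "$CONTAINER" ollama run "$MODEL"
    else
        # One-shot mode: join all arguments into a single prompt string.
        docker exec "$CONTAINER" ollama run "$MODEL" "$*"
    fi
}
```

To use it as a standalone script, as the docs do, add a final `ai "$@"` line
and install the file as /usr/local/bin/ai.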

diff --git a/docs/06-CHANGELOG.md b/docs/06-CHANGELOG.md
index d927fc0..7884310 100644
--- a/docs/06-CHANGELOG.md
+++ b/docs/06-CHANGELOG.md
@@ -3,11 +3,18 @@
 ## 2026-01-26
 
 ### Local AI Stack Deployed
-- [AI] Deployed Ollama container with Intel GPU passthrough (/dev/dri)
+- [AI] Deployed Ollama container with Intel GPU passthrough
 - [AI] Deployed Open WebUI at http://192.168.31.2:3080
-- [AI] Installed qwen2.5-coder:7b model (4.7GB)
-- [AI] Stopped non-critical containers to free ~4.8GB RAM:
-  - karakeep, unimus, homarr, netdisco-web, netdisco-backend
+- [AI] Installed qwen2.5-coder:7b base model
+- [AI] Created custom `unraid-assistant` model with infrastructure knowledge:
+  - Network topology (all VLANs, IPs, gateways)
+  - 45+ Docker containers (names, ports, purposes)
+  - RouterOS 7 commands and patterns
+  - Traefik labels and Authentik middleware
+  - All external URLs (xtrm-lab.org)
+- [AI] Created `/usr/local/bin/ai` terminal helper command
+- [AI] Stopped non-critical containers to free ~4.8GB RAM: karakeep, unimus, homarr, netdisco-*
+
 
 ### VLAN Activation Attempt & Fixes
 - [VLAN] Configured CSS326 switch VLANs via SwOS web interface
diff --git a/docs/wip/LOCAL-AI-STACK.md b/docs/wip/LOCAL-AI-STACK.md
index 0395b9c..c7b5b62 100644
--- a/docs/wip/LOCAL-AI-STACK.md
+++ b/docs/wip/LOCAL-AI-STACK.md
@@ -13,11 +13,50 @@
 | Open WebUI | ✅ Running | http://192.168.31.2:3080 |
 | Intel GPU | ✅ Enabled | /dev/dri passthrough |
 
-### Model Installed
+### Models Installed
 
 | Model | Size | Type |
 |-------|------|------|
-| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
+| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
+| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
+
+---
+
+## Custom Model: unraid-assistant
+
+A custom model that wraps the base model in a system prompt (no weights are fine-tuned) describing the xtrm-lab.org infrastructure:
+
+### Knowledge Included
+- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
+- **45+ Docker containers**: Names, images, ports, purposes
+- **RouterOS 7**: Commands, VLAN patterns, firewall rules
+- **Traefik**: Labels, routing, SSL configuration
+- **Authentik**: SSO middleware, provider setup
+- **External URLs**: All xtrm-lab.org services
+
+### Usage
+
+```bash
+# Terminal (SSH to Unraid)
+ai "How do I add a device to the IoT VLAN?"
+ai "What port is gitea running on?"
+ai "Show me Traefik labels for a new app with Authentik"
+
+# Interactive mode
+ai
+```
+
+### Rebuild Model
+
+If infrastructure changes, update and rebuild:
+
+```bash
+# Edit the Modelfile
+nano /mnt/user/appdata/ollama/Modelfile-unraid
+
+# Rebuild (assumes /mnt/user/appdata/ollama is mounted at /root/.ollama in the container)
+docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
+```
 
 ---
 
@@ -25,16 +64,21 @@
 
 | Component | Spec |
 |-----------|------|
-| CPU | Intel N100 |
+| CPU | Intel N100 (4 cores) |
 | RAM | 16GB (shared with Docker) |
 | GPU | Intel UHD (iGPU via /dev/dri) |
 | Storage | 1.7TB free on array |
 
+### Performance
+- ~1 token/sec with 7B models
+- Responses take 30-90 seconds
+- Suitable for occasional use, not real-time chat
+
 ---
 
 ## Containers Stopped for RAM
 
-To free ~4.8GB for AI workloads, these non-critical containers were stopped:
+To free ~4.8GB for AI workloads:
 
 | Container | RAM Freed | Purpose |
 |-----------|-----------|---------|
@@ -75,30 +119,51 @@ docker run -d \
   ghcr.io/open-webui/open-webui:main
 ```
 
----
-
-## Usage
-
-### Web Interface
-1. Open http://192.168.31.2:3080
-2. Create admin account on first visit
-3. Select `qwen2.5-coder:7b` model
-4. Start chatting
-
-### API Access
+### AI Command Helper
 ```bash
-# List models
-curl http://192.168.31.2:11434/api/tags
-
-# Generate response (example)
-curl http://192.168.31.2:11434/api/generate \
-  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
+#!/bin/bash
+# /usr/local/bin/ai
+MODEL="unraid-assistant"
+if [ "$#" -eq 0 ]; then
+    docker exec -it ollama ollama run "$MODEL"
+else
+    docker exec ollama ollama run "$MODEL" "$*"
+fi
 ```
 
 ---
 
-## Future Considerations
+## Open WebUI RAG Setup
 
-- **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
-- **Dedicated GPU:** Would significantly improve inference speed
-- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`
+For detailed documentation beyond what fits in the system prompt, add a knowledge base in Open WebUI:
+
+1. Go to http://192.168.31.2:3080
+2. **Workspace** → **Knowledge** → **+ Create**
+3. Name: `Infrastructure`
+4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
+
+Infrastructure docs are pre-copied to that location.
+
+---
+
+## Future: Mac Mini M4 Upgrade
+
+The plan is to migrate the AI stack to a Mac Mini M4 (32GB):
+
+| Metric | N100 (current) | M4 (planned) |
+|--------|----------------|--------------|
+| Speed | ~1 tok/s | ~15-20 tok/s |
+| Max model | 7B | ~32B (4-bit quantized) |
+| Response time | 30-90s | 3-5s |
+
+The M4's unified memory is shared between CPU and GPU, which makes it well suited to LLM inference.
+
+---
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
+| /usr/local/bin/ai | Terminal helper command |
+| /mnt/user/appdata/open-webui/docs/ | RAG documents |