docs: Complete local AI stack documentation

- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 20:38:06 +02:00
parent aee91fcc4b
commit 4066e7ff3a
2 changed files with 101 additions and 29 deletions

View File

@@ -3,11 +3,18 @@
## 2026-01-26
### Local AI Stack Deployed
- [AI] Deployed Ollama container with Intel GPU passthrough (/dev/dri)
- [AI] Deployed Open WebUI at http://192.168.31.2:3080
- [AI] Installed qwen2.5-coder:7b base model (4.7GB)
- [AI] Created custom `unraid-assistant` model with infrastructure knowledge:
  - Network topology (all VLANs, IPs, gateways)
  - 45+ Docker containers (names, ports, purposes)
  - RouterOS 7 commands and patterns
  - Traefik labels and Authentik middleware
  - All external URLs (xtrm-lab.org)
- [AI] Created `/usr/local/bin/ai` terminal helper command
- [AI] Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
### VLAN Activation Attempt & Fixes
- [VLAN] Configured CSS326 switch VLANs via SwOS web interface

View File

@@ -13,11 +13,50 @@
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
### Models Installed
| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
---
## Custom Model: unraid-assistant
A custom model built on qwen2.5-coder:7b with a system prompt that encodes the xtrm-lab.org infrastructure (a sketch of its Modelfile follows the knowledge list below):
### Knowledge Included
- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
- **45+ Docker containers**: Names, images, ports, purposes
- **RouterOS 7**: Commands, VLAN patterns, firewall rules
- **Traefik**: Labels, routing, SSL configuration
- **Authentik**: SSO middleware, provider setup
- **External URLs**: All xtrm-lab.org services
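For reference, the Modelfile behind it follows this general shape. This is an abridged, illustrative sketch; the prompt wording and the temperature value here are assumptions, and the real file at `/mnt/user/appdata/ollama/Modelfile-unraid` carries the full topology and container inventory:
```
FROM qwen2.5-coder:7b

SYSTEM """
You are the infrastructure assistant for xtrm-lab.org, running on Unraid.
You know the network topology (VLANs 10/20/25/30/31/35/40/50 with IPs and
gateways), all 45+ Docker containers with their ports and purposes,
RouterOS 7 command patterns, Traefik labels, and Authentik SSO middleware.
Answer with concrete commands and config snippets for this environment.
"""

# Lower temperature keeps answers factual for infra questions (assumed value)
PARAMETER temperature 0.3
```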
### Usage
```bash
# Terminal (SSH to Unraid)
ai "How do I add a device to the IoT VLAN?"
ai "What port is gitea running on?"
ai "Show me Traefik labels for a new app with Authentik"
# Interactive mode
ai
```
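With no arguments, `ai` opens Ollama's interactive REPL; the usual REPL commands apply there (`/bye` to exit, `/clear` to reset the conversation).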
### Rebuild Model
If infrastructure changes, update and rebuild:
```bash
# Edit the Modelfile (host path; mounted as /root/.ollama inside the container)
nano /mnt/user/appdata/ollama/Modelfile-unraid
# Rebuild from the container-side path of the same file
docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
```
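To confirm the rebuild took, the standard Ollama CLI can be used through the same container:
```bash
# Verify the model is registered and answers from the refreshed prompt
docker exec ollama ollama list
docker exec ollama ollama run unraid-assistant "Which VLAN is IoT on?"
```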
---
@@ -25,16 +64,21 @@
| Component | Spec |
|-----------|------|
| CPU | Intel N100 (4 cores) |
| RAM | 16GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7TB free on array |
### Performance
- ~1 token/sec with 7B models
- Responses take 30-90 seconds
- Suitable for occasional use, not real-time chat
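These numbers can be reproduced with Ollama's `--verbose` flag, which prints an `eval rate` (tokens/sec) after each response:
```bash
# One-off benchmark; an eval rate near ~1 tok/s is expected on the N100
docker exec -it ollama ollama run qwen2.5-coder:7b --verbose "Summarize VLAN tagging in one sentence."
```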
---
## Containers Stopped for RAM
To free ~4.8GB for AI workloads, these non-critical containers were stopped:
| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
@@ -75,30 +119,51 @@ docker run -d \
ghcr.io/open-webui/open-webui:main
```
---
## Usage
### Web Interface
1. Open http://192.168.31.2:3080
2. Create admin account on first visit
3. Select `qwen2.5-coder:7b` model
4. Start chatting
### AI Command Helper
```bash
#!/bin/bash
# /usr/local/bin/ai -- ask the unraid-assistant model from the terminal
MODEL="unraid-assistant"
if [ $# -eq 0 ]; then
  # No arguments: open an interactive session with the model
  docker exec -it ollama ollama run "$MODEL"
else
  # Otherwise send all arguments as one prompt and print the reply
  docker exec ollama ollama run "$MODEL" "$*"
fi
```
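One caveat worth noting: Unraid's root filesystem runs from RAM, so `/usr/local/bin/ai` will not survive a reboot on its own. A common pattern is to keep a copy on the flash drive and restore it from the `go` script; the flash path below is an assumption:
```bash
# /boot/config/go -- runs at every boot
cp /boot/config/custom/ai /usr/local/bin/ai   # assumed location on flash
chmod +x /usr/local/bin/ai
```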
---
## Open WebUI RAG Setup
For detailed documentation beyond the system prompt:
1. Go to http://192.168.31.2:3080
2. **Workspace** → **Knowledge** → **+ Create**
3. Name: `Infrastructure`
4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
Infrastructure docs are pre-copied to that location.
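If the source docs change, re-copying them before re-uploading keeps the knowledge base current. The source path below is an assumption; adjust it to wherever the Markdown docs actually live:
```bash
# Sync updated infrastructure docs into the Open WebUI RAG folder
rsync -av --delete /mnt/user/docs/infrastructure/ /mnt/user/appdata/open-webui/docs/
```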
---
## Future: Mac Mini M4 Upgrade
Planning to migrate the AI stack to a Mac Mini M4 (32GB):
| Metric | N100 (current) | M4 (planned) |
|--------|----------------|--------------|
| Speed | ~1 tok/s | ~15-20 tok/s |
| Max model | 7B | 70B+ |
| Response time | 30-90s | 3-5s |
The M4's unified memory architecture is ideal for LLM inference.
---
## Files
| File | Purpose |
|------|---------|
| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
| /usr/local/bin/ai | Terminal helper command |
| /mnt/user/appdata/open-webui/docs/ | RAG documents |