docs: Complete local AI stack documentation

- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 20:38:06 +02:00
parent aee91fcc4b
commit 4066e7ff3a
2 changed files with 101 additions and 29 deletions
@@ -3,11 +3,18 @@
 ## 2026-01-26
 ### Local AI Stack Deployed
- [AI] Deployed Ollama container with Intel GPU passthrough (/dev/dri)
+- [AI] Deployed Ollama container with Intel GPU passthrough
 - [AI] Deployed Open WebUI at http://192.168.31.2:3080
- [AI] Installed qwen2.5-coder:7b model (4.7GB)
+- [AI] Installed qwen2.5-coder:7b base model
- [AI] Stopped non-critical containers to free ~4.8GB RAM:
+- [AI] Created custom `unraid-assistant` model with infrastructure knowledge:
-  - karakeep, unimus, homarr, netdisco-web, netdisco-backend
+  - Network topology (all VLANs, IPs, gateways)
  - 45+ Docker containers (names, ports, purposes)
  - RouterOS 7 commands and patterns
  - Traefik labels and Authentik middleware
  - All external URLs (xtrm-lab.org)
 - [AI] Created `/usr/local/bin/ai` terminal helper command
 - [AI] Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
 ### VLAN Activation Attempt & Fixes
 - [VLAN] Configured CSS326 switch VLANs via SwOS web interface
@@ -13,11 +13,50 @@
 | Open WebUI | ✅ Running | http://192.168.31.2:3080 |
 | Intel GPU | ✅ Enabled | /dev/dri passthrough |
-### Model Installed
+### Models Installed
 | Model | Size | Type |
 |-------|------|------|
-| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
+| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
 | unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
 ---
 ## Custom Model: unraid-assistant
 A system-prompt model (the base model plus a custom system prompt defined in a Modelfile, not fine-tuned weights) that knows the xtrm-lab.org infrastructure:
 ### Knowledge Included
 - **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
 - **45+ Docker containers**: Names, images, ports, purposes
 - **RouterOS 7**: Commands, VLAN patterns, firewall rules
 - **Traefik**: Labels, routing, SSL configuration
 - **Authentik**: SSO middleware, provider setup
 - **External URLs**: All xtrm-lab.org services
 ### Usage
 ```bash
 # Terminal (SSH to Unraid)
 ai "How do I add a device to the IoT VLAN?"
 ai "What port is gitea running on?"
 ai "Show me Traefik labels for a new app with Authentik"
 # Interactive mode
 ai
 ```
 ### Rebuild Model
 If infrastructure changes, update and rebuild:
 ```bash
 # Edit the Modelfile
 nano /mnt/user/appdata/ollama/Modelfile-unraid
 # Rebuild
 docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
 ```
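
For orientation, the sketch below writes a minimal Modelfile of the kind the rebuild step consumes. It is not the actual `Modelfile-unraid`: the `FROM` base is assumed from the models table, and the `SYSTEM` text and `temperature` value are placeholders. The `write_modelfile` function and `MODELFILE` variable are hypothetical names, and the script writes to a scratch file by default so the real file is not overwritten.

```shell
#!/bin/sh
# Sketch only: the SYSTEM text and temperature are placeholders,
# not the real infrastructure prompt.
write_modelfile() {
    target="$1"
    cat > "$target" <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
SYSTEM """
You are the assistant for the xtrm-lab.org homelab.
(Placeholder: VLANs, container ports, Traefik labels go here.)
"""
EOF
}

# Default to a scratch file so nothing real is overwritten.
MODELFILE="${MODELFILE:-$(mktemp)}"
write_modelfile "$MODELFILE"

# Sanity check: a Modelfile needs exactly one FROM line.
grep -c '^FROM ' "$MODELFILE"
```

To apply a file like this, point `MODELFILE` at `/mnt/user/appdata/ollama/Modelfile-unraid` and rerun the `ollama create` command above.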
 ---
@@ -25,16 +64,21 @@
 | Component | Spec |
 |-----------|------|
-| CPU | Intel N100 |
+| CPU | Intel N100 (4 cores) |
 | RAM | 16GB (shared with Docker) |
 | GPU | Intel UHD (iGPU via /dev/dri) |
 | Storage | 1.7TB free on array |
 ### Performance
 - ~1 token/sec with 7B models
 - Responses take 30-90 seconds
 - Suitable for occasional use, not real-time chat
 ---
 ## Containers Stopped for RAM
-To free ~4.8GB for AI workloads, these non-critical containers were stopped:
+To free ~4.8GB for AI workloads:
 | Container | RAM Freed | Purpose |
 |-----------|-----------|---------|
@@ -75,30 +119,51 @@ docker run -d \
  ghcr.io/open-webui/open-webui:main
 ```
---
+### AI Command Helper
 ## Usage
 ### Web Interface
 1. Open http://192.168.31.2:3080
 2. Create admin account on first visit
 3. Select `qwen2.5-coder:7b` model
 4. Start chatting
 ### API Access
 ```bash
-# List models
+# /usr/local/bin/ai
-curl http://192.168.31.2:11434/api/tags
+#!/bin/bash
-
+MODEL="unraid-assistant"
-# Generate response (example)
+if [ $# -eq 0 ]; then
-curl http://192.168.31.2:11434/api/generate \
+    docker exec -it ollama ollama run $MODEL
-  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
+else
    docker exec ollama ollama run $MODEL "$*"
 fi
 ```
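
A possible variant of the helper, written as a shell function, is sketched below. It quotes the model name and adds a dry-run switch so the command can be inspected without a running container. The names `ai_run`, `AI_MODEL`, and `AI_DRY_RUN` are hypothetical and not part of the committed script.

```shell
#!/bin/sh
# Sketch of a helper variant. ai_run, AI_MODEL and AI_DRY_RUN are
# illustrative names, not part of the committed /usr/local/bin/ai.
ai_run() {
    model="${AI_MODEL:-unraid-assistant}"
    if [ "$#" -eq 0 ]; then
        # No prompt given: interactive session, which needs a TTY (-it).
        set -- docker exec -it ollama ollama run "$model"
    else
        # Join all arguments into one prompt string.
        prompt="$*"
        set -- docker exec ollama ollama run "$model" "$prompt"
    fi
    if [ -n "${AI_DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Dry-run demo: prints the docker command without contacting the container.
AI_DRY_RUN=1 ai_run "What port is gitea running on?"
```

Setting `AI_MODEL=qwen2.5-coder:7b` would send the same prompt to the base model, which can help when comparing answers with and without the infrastructure prompt.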
 ---
-## Future Considerations
+## Open WebUI RAG Setup
- **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
+For detailed documentation beyond what fits in the system prompt:
- **Dedicated GPU:** Would significantly improve inference speed
+
- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`
+1. Go to http://192.168.31.2:3080
 2. **Workspace** → **Knowledge** → **+ Create**
 3. Name: `Infrastructure`
 4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
 Infrastructure docs are pre-copied to that location.
 ---
 ## Future: Mac Mini M4 Upgrade
 Planning to migrate AI stack to Mac Mini M4 (32GB):
 | Metric | N100 (current) | M4 (planned) |
 |--------|----------------|--------------|
 | Speed | ~1 tok/s | ~15-20 tok/s |
 | Max model | 7B | 70B+ |
 | Response time | 30-90s | 3-5s |
 The M4 unified memory architecture is ideal for LLM inference.
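
As a rough consistency check on the table, response time is approximately response length divided by throughput. The sketch below uses a 60-token response, which is an illustrative assumption rather than a measured figure; `estimate_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough estimate: response time (s) = response length (tokens) / speed (tok/s).
# TOKENS=60 is an assumed, illustrative response length.
estimate_seconds() {
    tokens="$1"
    rate="$2"
    echo $(( tokens / rate ))
}

TOKENS=60
for rate in 1 15 20; do
    printf '%s tok/s -> ~%ss\n' "$rate" "$(estimate_seconds "$TOKENS" "$rate")"
done
```

Under that assumption the estimate is about 60 s on the N100 and 3 to 4 s on the M4, which falls within the ranges in the table.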
 ---
 ## Files
 | File | Purpose |
 |------|---------|
 | /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
 | /usr/local/bin/ai | Terminal helper command |
 | /mnt/user/appdata/open-webui/docs/ | RAG documents |