From 4066e7ff3ae8b5210942d5470e0f717f30bd8cb2 Mon Sep 17 00:00:00 2001
From: XTRM-Unraid
Date: Mon, 26 Jan 2026 20:38:06 +0200
Subject: [PATCH] docs: Complete local AI stack documentation

- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5
---
 docs/06-CHANGELOG.md       |  15 +++--
 docs/wip/LOCAL-AI-STACK.md | 115 +++++++++++++++++++++++++++++--------
 2 files changed, 101 insertions(+), 29 deletions(-)

diff --git a/docs/06-CHANGELOG.md b/docs/06-CHANGELOG.md
index d927fc0..7884310 100644
--- a/docs/06-CHANGELOG.md
+++ b/docs/06-CHANGELOG.md
@@ -3,11 +3,18 @@
 ## 2026-01-26
 
 ### Local AI Stack Deployed
-- [AI] Deployed Ollama container with Intel GPU passthrough (/dev/dri)
+- [AI] Deployed Ollama container with Intel GPU passthrough
 - [AI] Deployed Open WebUI at http://192.168.31.2:3080
-- [AI] Installed qwen2.5-coder:7b model (4.7GB)
-- [AI] Stopped non-critical containers to free ~4.8GB RAM:
-  - karakeep, unimus, homarr, netdisco-web, netdisco-backend
+- [AI] Installed qwen2.5-coder:7b base model
+- [AI] Created custom `unraid-assistant` model with infrastructure knowledge:
+  - Network topology (all VLANs, IPs, gateways)
+  - 45+ Docker containers (names, ports, purposes)
+  - RouterOS 7 commands and patterns
+  - Traefik labels and Authentik middleware
+  - All external URLs (xtrm-lab.org)
+- [AI] Created `/usr/local/bin/ai` terminal helper command
+- [AI] Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
+
 
 ### VLAN Activation Attempt & Fixes
 - [VLAN] Configured CSS326 switch VLANs via SwOS web interface
diff --git a/docs/wip/LOCAL-AI-STACK.md b/docs/wip/LOCAL-AI-STACK.md
index 0395b9c..c7b5b62 100644
--- a/docs/wip/LOCAL-AI-STACK.md
+++ b/docs/wip/LOCAL-AI-STACK.md
@@ -13,11 +13,50 @@
 | Open WebUI | ✅ Running | http://192.168.31.2:3080 |
 | Intel GPU | ✅ Enabled | /dev/dri passthrough |
 
-### Model Installed
+### Models Installed
 
 | Model | Size | Type |
 |-------|------|------|
-| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
+| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
+| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
+
+---
+
+## Custom Model: unraid-assistant
+
+A custom model built from a Modelfile system prompt (no weights are fine-tuned) that knows the xtrm-lab.org infrastructure:
+
+### Knowledge Included
+- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
+- **45+ Docker containers**: Names, images, ports, purposes
+- **RouterOS 7**: Commands, VLAN patterns, firewall rules
+- **Traefik**: Labels, routing, SSL configuration
+- **Authentik**: SSO middleware, provider setup
+- **External URLs**: All xtrm-lab.org services
+
+### Usage
+
+```bash
+# Terminal (SSH to Unraid)
+ai "How do I add a device to the IoT VLAN?"
+ai "What port is gitea running on?"
+ai "Show me Traefik labels for a new app with Authentik"
+
+# Interactive mode
+ai
+```
+
+### Rebuild Model
+
+If infrastructure changes, update and rebuild:
+
+```bash
+# Edit the Modelfile
+nano /mnt/user/appdata/ollama/Modelfile-unraid
+
+# Rebuild (assumes the appdata directory is mounted at /root/.ollama in the container)
+docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
+```
 
 ---
 
@@ -25,16 +64,21 @@
 
 | Component | Spec |
 |-----------|------|
-| CPU | Intel N100 |
+| CPU | Intel N100 (4 cores) |
 | RAM | 16GB (shared with Docker) |
 | GPU | Intel UHD (iGPU via /dev/dri) |
 | Storage | 1.7TB free on array |
 
+### Performance
+- ~1 token/sec with 7B models
+- Responses take 30-90 seconds
+- Suitable for occasional use, not real-time chat
+
 ---
 
 ## Containers Stopped for RAM
 
-To free ~4.8GB for AI workloads, these non-critical containers were stopped:
+To free ~4.8GB for AI workloads:
 
 | Container | RAM Freed | Purpose |
 |-----------|-----------|---------|
@@ -75,30 +119,51 @@ docker run -d \
   ghcr.io/open-webui/open-webui:main
 ```
 
----
-
-## Usage
-
-### Web Interface
-1. Open http://192.168.31.2:3080
-2. Create admin account on first visit
-3. Select `qwen2.5-coder:7b` model
-4. Start chatting
-
-### API Access
+### AI Command Helper
 ```bash
-# List models
-curl http://192.168.31.2:11434/api/tags
-
-# Generate response (example)
-curl http://192.168.31.2:11434/api/generate \
-  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
+# /usr/local/bin/ai
+#!/bin/bash
+MODEL="unraid-assistant"
+if [ $# -eq 0 ]; then
+    docker exec -it ollama ollama run "$MODEL"
+else
+    docker exec ollama ollama run "$MODEL" "$*"
+fi
 ```
 
 ---
 
-## Future Considerations
+## Open WebUI RAG Setup
 
-- **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
-- **Dedicated GPU:** Would significantly improve inference speed
-- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull `
+For detailed documentation beyond the system prompt:
+
+1. Go to http://192.168.31.2:3080
+2. **Workspace** → **Knowledge** → **+ Create**
+3. Name: `Infrastructure`
+4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
+
+Infrastructure docs are pre-copied to that location.
+
+---
+
+## Future: Mac Mini M4 Upgrade
+
+Planning to migrate AI stack to Mac Mini M4 (32GB):
+
+| Metric | N100 (current) | M4 (planned) |
+|--------|----------------|--------------|
+| Speed | ~1 tok/s | ~15-20 tok/s |
+| Max model | 7B | 70B+ |
+| Response time | 30-90s | 3-5s |
+
+The M4 unified memory architecture is ideal for LLM inference.
+
+---
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
+| /usr/local/bin/ai | Terminal helper command |
+| /mnt/user/appdata/open-webui/docs/ | RAG documents |
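
Review note: the patch refers to `Modelfile-unraid` but does not include it, so a reader cannot see what "custom model" means in practice. The sketch below writes a hypothetical, minimal Modelfile of the kind the Rebuild Model section would consume. Only the base model name, the VLAN IDs, and the Open WebUI URL come from the patch; the output path, the `MODELFILE_OUT` variable, the `temperature` value, and the wording of the system prompt are assumptions made for illustration, and the real file presumably carries far more detail.

```bash
#!/bin/bash
# Sketch only: writes a HYPOTHETICAL example Modelfile, not the author's real one.
set -eu

# Assumed output path; override with MODELFILE_OUT to write elsewhere.
OUT="${MODELFILE_OUT:-./Modelfile-unraid.example}"

# Quoted heredoc delimiter keeps the triple quotes and text literal.
cat > "$OUT" <<'EOF'
# Base model named in the patch
FROM qwen2.5-coder:7b

# Assumed value: a low temperature for more repeatable factual answers
PARAMETER temperature 0.2

SYSTEM """
You are the assistant for the xtrm-lab.org Unraid server.
Answer from the facts below and say so when something is not listed.

VLAN IDs: 10, 20, 25, 30, 31, 35, 40, 50
Open WebUI: http://192.168.31.2:3080
(container list, RouterOS patterns and Traefik labels would go here)
"""
EOF

echo "Wrote $OUT"
```

After copying such a file over the real Modelfile, the `ollama create` command from the Rebuild Model section would pick it up. Because the knowledge lives in the system prompt rather than in the weights, the custom model stays the same 4.7 GB size as the base model, and every fact added to the prompt consumes context on each request.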