docs: Update LOCAL-AI-STACK.md with deployment status

- Ollama and Open WebUI deployed and running
- qwen2.5-coder:7b model installed (4.7GB)
- Intel GPU passthrough enabled
- Stopped non-critical containers for RAM
- Added docker commands and usage instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 15:56:48 +02:00
parent 5982e4c444
commit a80415f66b
1 changed file with 77 additions and 123 deletions
@@ -1,150 +1,104 @@
-# WIP: Local AI Stack on Unraid
+# Local AI Stack on Unraid
-**Status:** Planning  
+**Status:** ✅ Deployed  
-**Created:** 2026-01-25  
+**Last Updated:** 2026-01-26
 ---
-## Overview
+## Current Deployment
-Deploy a hardware-accelerated local AI stack on Unraid (192.168.31.2), initially on N100 (Intel iGPU) with future migration path to N5 Air (AMD 780M iGPU).
+| Component | Status | URL/Port |
 |-----------|--------|----------|
 | Ollama | ✅ Running | http://192.168.31.2:11434 |
 | Open WebUI | ✅ Running | http://192.168.31.2:3080 |
 | Intel GPU | ✅ Enabled | /dev/dri passthrough |
 ### Model Installed
 | Model | Size | Type |
 |-------|------|------|
 | qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
 ---
-## Phase 1: Local AI Configuration (Current N100)
+## Hardware
-### 1. Hardware-Accelerated AI Stack
+| Component | Spec |
 |-----------|------|
 | CPU | Intel N100 |
 | RAM | 16GB (shared with Docker) |
 | GPU | Intel UHD (iGPU via /dev/dri) |
 | Storage | 1.7TB free on array |
-Install via Community Applications (Apps) tab:
+---
-| App | Purpose | Configuration |
+## Containers Stopped for RAM
 |-----|---------|---------------|
 | Intel GPU Top | N100 QuickSync iGPU support | Required for GPU acceleration |
 | Ollama | LLM runtime | Extra Parameters: `--device /dev/dri` |
 | Open WebUI | Chat interface | `OLLAMA_BASE_URL=http://192.168.31.2:11434` |
-**Ollama Docker Template:**
+To free ~4.8GB for AI workloads, these non-critical containers were stopped:
 ```
 Extra Parameters: --device /dev/dri
 Port: 11434
 ```
-**Open WebUI Docker Template:**
+| Container | RAM Freed | Purpose |
-```
+|-----------|-----------|---------|
-OLLAMA_BASE_URL=http://192.168.31.2:11434
+| karakeep | 1.68 GB | Bookmark manager |
-Port: 3080 (or available port)
+| unimus | 1.62 GB | Network backup |
-```
+| homarr | 686 MB | Dashboard |
 | netdisco-web | 531 MB | Network discovery UI |
 | netdisco-backend | 291 MB | Network discovery |
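
The "~4.8GB" figure can be checked by summing the "RAM Freed" column. The sketch below is only an illustration: the function name `total_freed_gb` is hypothetical, and it assumes 1 GB = 1000 MB (with 1024 MB per GB the total is still roughly 4.8).

```shell
#!/bin/sh
# total_freed_gb: sum the per-container figures from the table (in MB)
# and print the total in GB. Assumption: 1 GB = 1000 MB.
total_freed_gb() {
  # karakeep, unimus, homarr, netdisco-web, netdisco-backend
  printf '%s\n' 1680 1620 686 531 291 |
    awk '{ sum += $1 } END { printf "%.2f\n", sum / 1000 }'
}

total_freed_gb
```

The result is consistent with the rounded total stated in the document.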
-**PWA Setup:** Open WebUI on phone/tablet → "Add to Home Screen" for native experience.
+To restart if needed:
 ### 2. SSH Bridge & Terminal Agent (Aider)
 **SSH Key Setup on Unraid:**
 ```bash
-# Create directory
+docker start karakeep unimus homarr netdisco-web netdisco-backend
 mkdir -p /mnt/user/appdata/ssh_keys
 # Generate AI agent key
 ssh-keygen -t ed25519 -f /mnt/user/appdata/ssh_keys/ai_agent -N ""
 # Deploy to MikroTik (via existing key)
 # The AI can then manage MikroTik remotely
 ```
-**Aider Configuration:**
+---
 ## Docker Configuration
 ### Ollama
 ```bash
-export OLLAMA_API_BASE=http://192.168.31.2:11434
+docker run -d \
-aider --model ollama_chat/qwen2.5-coder:14b
+  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
 ```
-### 3. Sanitized Knowledge Base
+### Open WebUI
-
+```bash
-Upload `topology.md` to Open WebUI → Workspace → Knowledge section.
+docker run -d \
-
+  --name open-webui \
---
+  --restart unless-stopped \
-
+  -p 3080:8080 \
-## Phase 2: Hardware Migration (N100 → N5 Air)
+  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
-
+  -v /mnt/user/appdata/open-webui:/app/backend/data \
-### 1. Clean Break (On N100)
+  ghcr.io/open-webui/open-webui:main
 1. Stop all Docker containers
 2. Stop Array: Main tab → Stop
 3. Disable Auto-Start: Settings → Disk Settings → Enable Auto-Start: No
 4. Uninstall "Intel GPU Top" plugin
 5. Shutdown N100
 ### 2. N5 Air BIOS Configuration
 Move Unraid USB and drives to N5 Air. Boot to BIOS (F2/Del):
 | Setting | Value | Purpose |
 |---------|-------|---------|
 | SVM Mode / AMD-V | Enabled | Virtualization |
 | UMA Frame Buffer Size | 8GB-16GB | RAM allocation for Radeon 780M |
 | IOMMU | Enabled | Device passthrough |
 ### 3. N5 Air Integration
 1. Boot Unraid on N5 Air
 2. Install "AMD GPU Top" from Apps
 3. Update Ollama Docker:
   ```
   Extra Parameters: --device /dev/dri
   Environment: HSA_OVERRIDE_GFX_VERSION=11.0.0
   ```
 4. CPU Pinning: Settings → CPU Pinning → Assign 8-12 threads to Ollama
 ---
 ## Phase 3: Network Topology for AI
 ```markdown
 ## Network Map
 - **Gateway/DNS:** 192.168.31.1 (MikroTik hAP ax³)
 - **Unraid Server:** 192.168.31.2 (Docker host, AI stack)
 - **AdGuard DNS:** 192.168.31.4 (macvlan on Unraid)
 - **MikroTik AdGuard:** 172.17.0.2 (container, primary DNS)
 - **MikroTik Tailscale:** 172.17.0.3 (container, VPN)
 ## AI-Manageable Hosts
 | Host | IP | SSH Port | Key |
 |------|-----|----------|-----|
 | Unraid | 192.168.31.2 | 422 | id_ed25519_unraid |
 | MikroTik | 192.168.31.1 | 2222 | mikrotik_key |
 ## Services
 | Service | URL |
 |---------|-----|
 | Gitea | https://git.xtrm-lab.org |
 | Woodpecker CI | https://ci.xtrm-lab.org |
 | AdGuard (MikroTik) | http://192.168.31.1:3000 |
 | AdGuard (Unraid) | http://192.168.31.4 |
 | Ollama API | http://192.168.31.2:11434 |
 | Open WebUI | http://192.168.31.2:3080 |
 ## Operational Protocol
 1. Use SSH keys for all remote commands
 2. Verify container status before changes: `docker ps` or `/container print`
 3. Never output raw passwords or credentials
 4. Document all infrastructure changes in git repo
 ```
 ---
-## Tasks
+## Usage
- [ ] Install Intel GPU Top plugin on Unraid
+### Web Interface
- [ ] Deploy Ollama with `--device /dev/dri`
+1. Open http://192.168.31.2:3080
- [ ] Configure Open WebUI with Ollama endpoint
+2. Create admin account on first visit
- [ ] Generate AI agent SSH key
+3. Select `qwen2.5-coder:7b` model
- [ ] Deploy key to MikroTik for remote management
+4. Start chatting
 - [ ] Install Aider on workstation
 - [ ] Create and upload topology.md to Open WebUI
 - [ ] Test AI queries against infrastructure
 - [ ] (Future) Migrate to N5 Air hardware
-## Notes
+### API Access
 ```bash
 # List models
 curl http://192.168.31.2:11434/api/tags
- Current infrastructure repo: https://git.xtrm-lab.org/jazzymc/infrastructure
+# Generate response (example)
- MikroTik containers use bridge network 172.17.0.0/24
+curl http://192.168.31.2:11434/api/generate \
- Unraid SSH on non-standard port 422
+  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
 ```
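
By default `/api/generate` streams its answer as a series of JSON objects; adding `"stream": false` to the request body asks Ollama for a single JSON object instead, which is easier to handle in scripts. A minimal sketch of a wrapper follows. The function names (`json_escape`, `ollama_payload`, `ollama_generate`) and the `OLLAMA_URL` variable are hypothetical helpers, not part of Ollama, and the escaping only covers backslashes and double quotes, so it assumes single-line prompts.

```shell
#!/bin/sh
# Assumed default taken from this document; override via the environment.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.31.2:11434}"

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Newlines and control characters are NOT handled (single-line input only).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# ollama_payload MODEL PROMPT: print a non-streaming /api/generate body.
ollama_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ollama_generate MODEL PROMPT: send the request.
# Requires curl and a reachable Ollama server.
ollama_generate() {
  curl -sS "$OLLAMA_URL/api/generate" -d "$(ollama_payload "$1" "$2")"
}
```

Usage would look like `ollama_generate qwen2.5-coder:7b "Hello"`; the generated text is in the `response` field of the returned JSON object.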
 ---
 ## Future Considerations
 - **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
 - **Dedicated GPU:** Would significantly improve inference speed
 - **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`
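
For pulling several models in one go, the `docker exec` command above can be wrapped in a small loop. This is a sketch under assumptions: `pull_models` and the `DRY_RUN` switch are hypothetical names introduced here, and the container is assumed to be called `ollama` as in the `docker run` command in this document.

```shell
#!/bin/sh
# pull_models MODEL...: pull each model inside the "ollama" container.
# With DRY_RUN=1 the commands are printed instead of executed,
# which is useful for checking the list before downloading anything.
pull_models() {
  for model in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "docker exec ollama ollama pull $model"
    else
      # Stop at the first failed pull.
      docker exec ollama ollama pull "$model" || return 1
    fi
  done
}
```

For example, `DRY_RUN=1 pull_models qwen2.5-coder:14b` prints the command without running it. Larger models such as the 14b variant may not fit comfortably in 16GB of RAM, as noted above.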