diff --git a/docs/wip/LOCAL-AI-STACK.md b/docs/wip/LOCAL-AI-STACK.md
index 30861ae..0395b9c 100644
--- a/docs/wip/LOCAL-AI-STACK.md
+++ b/docs/wip/LOCAL-AI-STACK.md
@@ -1,150 +1,104 @@
-# WIP: Local AI Stack on Unraid
+# Local AI Stack on Unraid
 
-**Status:** Planning
-**Created:** 2026-01-25
+**Status:** ✅ Deployed
+**Last Updated:** 2026-01-26
 
 ---
 
-## Overview
+## Current Deployment
 
-Deploy a hardware-accelerated local AI stack on Unraid (192.168.31.2), initially on N100 (Intel iGPU) with future migration path to N5 Air (AMD 780M iGPU).
+| Component | Status | URL/Port |
+|-----------|--------|----------|
+| Ollama | ✅ Running | http://192.168.31.2:11434 |
+| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
+| Intel GPU | ✅ Enabled | /dev/dri passthrough |
+
+### Installed Model
+
+| Model | Size | Type |
+|-------|------|------|
+| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
 
 ---
 
-## Phase 1: Local AI Configuration (Current N100)
+## Hardware
 
-### 1. Hardware-Accelerated AI Stack
+| Component | Spec |
+|-----------|------|
+| CPU | Intel N100 |
+| RAM | 16 GB (shared with Docker) |
+| GPU | Intel UHD (iGPU via /dev/dri) |
+| Storage | 1.7 TB free on array |
 
-Install via Community Applications (Apps) tab:
+---
 
-| App | Purpose | Configuration |
-|-----|---------|---------------|
-| Intel GPU Top | N100 QuickSync iGPU support | Required for GPU acceleration |
-| Ollama | LLM runtime | Extra Parameters: `--device /dev/dri` |
-| Open WebUI | Chat interface | `OLLAMA_BASE_URL=http://192.168.31.2:11434` |
+## Containers Stopped to Free RAM
 
-**Ollama Docker Template:**
-```
-Extra Parameters: --device /dev/dri
-Port: 11434
-```
+To free ~4.8 GB of RAM for AI workloads, these non-critical containers were stopped:
 
-**Open WebUI Docker Template:**
-```
-OLLAMA_BASE_URL=http://192.168.31.2:11434
-Port: 3080 (or available port)
-```
+| Container | RAM Freed | Purpose |
+|-----------|-----------|---------|
+| karakeep | 1.68 GB | Bookmark manager |
+| unimus | 1.62 GB | Network backup |
+| homarr | 686 MB | Dashboard |
+| netdisco-web | 531 MB | Network discovery UI |
+| netdisco-backend | 291 MB | Network discovery |
 
-**PWA Setup:** Open WebUI on phone/tablet → "Add to Home Screen" for native experience.
-
-### 2. SSH Bridge & Terminal Agent (Aider)
-
-**SSH Key Setup on Unraid:**
+To restart them if needed:
 ```bash
-# Create directory
-mkdir -p /mnt/user/appdata/ssh_keys
-
-# Generate AI agent key
-ssh-keygen -t ed25519 -f /mnt/user/appdata/ssh_keys/ai_agent -N ""
-
-# Deploy to MikroTik (via existing key)
-# The AI can then manage MikroTik remotely
+docker start karakeep unimus homarr netdisco-web netdisco-backend
 ```
 
-**Aider Configuration:**
+---
+
+## Docker Configuration
+
+### Ollama
 ```bash
-export OLLAMA_API_BASE=http://192.168.31.2:11434
-aider --model ollama_chat/qwen2.5-coder:14b
+docker run -d \
+  --name ollama \
+  --restart unless-stopped \
+  --device /dev/dri \
+  -v /mnt/user/appdata/ollama:/root/.ollama \
+  -p 11434:11434 \
+  ollama/ollama
 ```
 
-### 3. Sanitized Knowledge Base
-
-Upload `topology.md` to Open WebUI → Workspace → Knowledge section.
-
 ---
 
-## Phase 2: Hardware Migration (N100 → N5 Air)
-
-### 1. Clean Break (On N100)
-
-1. Stop all Docker containers
-2. Stop Array: Main tab → Stop
-3. Disable Auto-Start: Settings → Disk Settings → Enable Auto-Start: No
-4. Uninstall "Intel GPU Top" plugin
-5. Shutdown N100
-
-### 2. N5 Air BIOS Configuration
-
-Move Unraid USB and drives to N5 Air. Boot to BIOS (F2/Del):
-
-| Setting | Value | Purpose |
-|---------|-------|---------|
-| SVM Mode / AMD-V | Enabled | Virtualization |
-| UMA Frame Buffer Size | 8GB-16GB | RAM allocation for Radeon 780M |
-| IOMMU | Enabled | Device passthrough |
-
-### 3. N5 Air Integration
-
-1. Boot Unraid on N5 Air
-2. Install "AMD GPU Top" from Apps
-3. Update Ollama Docker:
-   ```
-   Extra Parameters: --device /dev/dri
-   Environment: HSA_OVERRIDE_GFX_VERSION=11.0.0
-   ```
-4. CPU Pinning: Settings → CPU Pinning → Assign 8-12 threads to Ollama
-
----
-
-## Phase 3: Network Topology for AI
-
-```markdown
-## Network Map
-- **Gateway/DNS:** 192.168.31.1 (MikroTik hAP ax³)
-- **Unraid Server:** 192.168.31.2 (Docker host, AI stack)
-- **AdGuard DNS:** 192.168.31.4 (macvlan on Unraid)
-- **MikroTik AdGuard:** 172.17.0.2 (container, primary DNS)
-- **MikroTik Tailscale:** 172.17.0.3 (container, VPN)
-
-## AI-Manageable Hosts
-| Host | IP | SSH Port | Key |
-|------|-----|----------|-----|
-| Unraid | 192.168.31.2 | 422 | id_ed25519_unraid |
-| MikroTik | 192.168.31.1 | 2222 | mikrotik_key |
-
-## Services
-| Service | URL |
-|---------|-----|
-| Gitea | https://git.xtrm-lab.org |
-| Woodpecker CI | https://ci.xtrm-lab.org |
-| AdGuard (MikroTik) | http://192.168.31.1:3000 |
-| AdGuard (Unraid) | http://192.168.31.4 |
-| Ollama API | http://192.168.31.2:11434 |
-| Open WebUI | http://192.168.31.2:3080 |
-
-## Operational Protocol
-1. Use SSH keys for all remote commands
-2. Verify container status before changes: `docker ps` or `/container print`
-3. Never output raw passwords or credentials
-4. Document all infrastructure changes in git repo
+### Open WebUI
+```bash
+docker run -d \
+  --name open-webui \
+  --restart unless-stopped \
+  -p 3080:8080 \
+  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
+  -v /mnt/user/appdata/open-webui:/app/backend/data \
+  ghcr.io/open-webui/open-webui:main
 ```
 
 ---
 
-## Tasks
+## Usage
 
-- [ ] Install Intel GPU Top plugin on Unraid
-- [ ] Deploy Ollama with `--device /dev/dri`
-- [ ] Configure Open WebUI with Ollama endpoint
-- [ ] Generate AI agent SSH key
-- [ ] Deploy key to MikroTik for remote management
-- [ ] Install Aider on workstation
-- [ ] Create and upload topology.md to Open WebUI
-- [ ] Test AI queries against infrastructure
-- [ ] (Future) Migrate to N5 Air hardware
+### Web Interface
+1. Open http://192.168.31.2:3080
+2. Create an admin account on first visit
+3. Select the `qwen2.5-coder:7b` model
+4. Start chatting
 
-## Notes
+### API Access
+```bash
+# List models
+curl http://192.168.31.2:11434/api/tags
 
-- Current infrastructure repo: https://git.xtrm-lab.org/jazzymc/infrastructure
-- MikroTik containers use bridge network 172.17.0.0/24
-- Unraid SSH on non-standard port 422
+
+# Generate a response (example)
+curl http://192.168.31.2:11434/api/generate \
+  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
+```
+
+---
+
+## Future Considerations
+
+- **More RAM:** With 32 GB+ of RAM, the server could run larger models (14b, 32b)
+- **Dedicated GPU:** Would significantly improve inference speed
+- **Additional models:** Pull more models as needed with `docker exec ollama ollama pull <model>`
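For scripting against the `/api/tags` endpoint shown in the new API Access section, a small helper can reduce the JSON reply to bare model names. A minimal sketch: the `list_models` function and the canned sample reply are illustrative assumptions, not part of the deployment, and the naive `grep`/`sed` parsing assumes model names contain no escaped quotes.

```shell
# Hypothetical helper: print one model name per line from an /api/tags reply.
list_models() {
  grep -o '"name":"[^"]*"' | sed 's/"name":"\(.*\)"/\1/'
}

# Against the live server this would be:
#   curl -s http://192.168.31.2:11434/api/tags | list_models

# Canned sample reply so the example runs offline:
printf '%s' '{"models":[{"name":"qwen2.5-coder:7b","size":4700000000}]}' | list_models
# → qwen2.5-coder:7b
```

For anything beyond a quick check, `jq -r '.models[].name'` would be the more robust parser if `jq` is available on the host.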