docs: Update LOCAL-AI-STACK.md with deployment status
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
- Ollama and Open WebUI deployed and running
- qwen2.5-coder:7b model installed (4.7GB)
- Intel GPU passthrough enabled
- Stopped non-critical containers for RAM
- Added docker commands and usage instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Local AI Stack on Unraid

**Status:** ✅ Deployed
**Created:** 2026-01-25
**Last Updated:** 2026-01-26

---
## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |

### Model Installed

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
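A quick way to confirm both services are reachable (a sketch; assumes the URLs in the table above and that `curl` is installed on the querying host):

```shell
# Endpoints from the deployment table above
OLLAMA_URL="http://192.168.31.2:11434"
WEBUI_URL="http://192.168.31.2:3080"

check() {
  # -s silences progress, -f treats HTTP errors as failure,
  # --max-time keeps the check from hanging if a host is down
  if curl -sf --max-time 5 "$1" >/dev/null 2>&1; then
    echo "OK   $1"
  else
    echo "DOWN $1"
  fi
}

check "$OLLAMA_URL"   # the Ollama root endpoint replies "Ollama is running"
check "$WEBUI_URL"
```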
---

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7TB free on array |
---

## Containers Stopped for RAM

To free ~4.8GB for AI workloads, these non-critical containers were stopped:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart if needed:
```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
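As a sanity check, the per-container figures in the table sum to the ~4.8GB quoted above:

```shell
# Sum the table's RAM figures, with the MB rows converted to GB
freed=$(awk 'BEGIN { printf "%.2f", 1.68 + 1.62 + 0.686 + 0.531 + 0.291 }')
echo "$freed GB freed"   # 4.81 GB freed
```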
---

## Docker Configuration

### Ollama

```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
### Open WebUI

```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
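After both `docker run` commands, a quick verification sketch (assumes Docker is on PATH and the container names above; listing `/dev/dri` inside the container confirms the iGPU passthrough took effect):

```shell
# Container names as created by the run commands above
CONTAINERS="ollama open-webui"

if command -v docker >/dev/null 2>&1; then
  for name in $CONTAINERS; do
    # Show each container's status line
    docker ps --filter "name=$name" --format '{{.Names}}: {{.Status}}'
  done
  # With --device /dev/dri applied, card0/renderD128 should appear here
  docker exec ollama ls /dev/dri
else
  echo "docker not found on this machine"
fi
```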
---

## Usage

### Web Interface

1. Open http://192.168.31.2:3080
2. Create admin account on first visit
3. Select `qwen2.5-coder:7b` model
4. Start chatting
### API Access

```bash
# List models
curl http://192.168.31.2:11434/api/tags

# Generate response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```
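`/api/generate` streams its answer as one JSON object per line, each carrying a `response` fragment; the full text is the concatenation of those fragments (pass `"stream": false` to get a single object instead). A local sketch of the reassembly, using a hypothetical two-line sample rather than live output:

```shell
# Two sample stream lines (illustrative, not captured from a real run)
STREAM='{"model":"qwen2.5-coder:7b","response":"Hel","done":false}
{"model":"qwen2.5-coder:7b","response":"lo","done":true}'

# Pull out each "response" field and join them.
# (With jq installed, the equivalent is: jq -rj .response)
TEXT=$(printf '%s\n' "$STREAM" | sed -n 's/.*"response":"\([^"]*\)".*/\1/p' | tr -d '\n')
echo "$TEXT"   # Hello
```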
---

## Future Considerations

- **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
- **Dedicated GPU:** Would significantly improve inference speed
- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`