docs: Complete local AI stack documentation
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
- Deployed Ollama + Open WebUI on Unraid
- Created custom unraid-assistant model with full infrastructure knowledge:
  - Network topology (8 VLANs, all IPs/gateways)
  - 45+ Docker containers with ports and purposes
  - RouterOS 7 commands and VLAN patterns
  - Traefik labels and Authentik SSO middleware
  - All xtrm-lab.org external URLs
- Added /usr/local/bin/ai terminal helper command
- Documented RAM optimization (stopped 5 containers)
- Added future upgrade notes for Mac Mini M4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -3,11 +3,18 @@
|
|||||||
## 2026-01-26
|
## 2026-01-26
|
||||||
|
|
||||||
### Local AI Stack Deployed
|
### Local AI Stack Deployed
|
||||||
- [AI] Deployed Ollama container with Intel GPU passthrough (/dev/dri)
|
- [AI] Deployed Ollama container with Intel GPU passthrough
|
||||||
- [AI] Deployed Open WebUI at http://192.168.31.2:3080
|
- [AI] Deployed Open WebUI at http://192.168.31.2:3080
|
||||||
- [AI] Installed qwen2.5-coder:7b model (4.7GB)
|
- [AI] Installed qwen2.5-coder:7b base model
|
||||||
- [AI] Stopped non-critical containers to free ~4.8GB RAM:
|
- [AI] Created custom `unraid-assistant` model with infrastructure knowledge:
|
||||||
- karakeep, unimus, homarr, netdisco-web, netdisco-backend
|
- Network topology (all VLANs, IPs, gateways)
|
||||||
|
- 45+ Docker containers (names, ports, purposes)
|
||||||
|
- RouterOS 7 commands and patterns
|
||||||
|
- Traefik labels and Authentik middleware
|
||||||
|
- All external URLs (xtrm-lab.org)
|
||||||
|
- [AI] Created `/usr/local/bin/ai` terminal helper command
|
||||||
|
- [AI] Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
|
||||||
|
|
||||||
|
|
||||||
### VLAN Activation Attempt & Fixes
|
### VLAN Activation Attempt & Fixes
|
||||||
- [VLAN] Configured CSS326 switch VLANs via SwOS web interface
|
- [VLAN] Configured CSS326 switch VLANs via SwOS web interface
|
||||||
|
|||||||
@@ -13,11 +13,50 @@
|
|||||||
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
|
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
|
||||||
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
|
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
|
||||||
|
|
||||||
### Model Installed
|
### Models Installed
|
||||||
|
|
||||||
| Model | Size | Type |
|
| Model | Size | Type |
|
||||||
|-------|------|------|
|
|-------|------|------|
|
||||||
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
|
| qwen2.5-coder:7b | 4.7 GB | Base coding LLM |
|
||||||
|
| unraid-assistant | 4.7 GB | Custom model with infrastructure knowledge |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Custom Model: unraid-assistant
|
||||||
|
|
||||||
|
The base `qwen2.5-coder:7b` model wrapped with a custom system prompt (no weights are fine-tuned) that describes the xtrm-lab.org infrastructure:
|
||||||
|
|
||||||
|
### Knowledge Included
|
||||||
|
- **Network topology**: All VLANs (10,20,25,30,31,35,40,50), IPs, gateways
|
||||||
|
- **45+ Docker containers**: Names, images, ports, purposes
|
||||||
|
- **RouterOS 7**: Commands, VLAN patterns, firewall rules
|
||||||
|
- **Traefik**: Labels, routing, SSL configuration
|
||||||
|
- **Authentik**: SSO middleware, provider setup
|
||||||
|
- **External URLs**: All xtrm-lab.org services
|
||||||
|
|
||||||
|
### Usage
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Terminal (SSH to Unraid)
|
||||||
|
ai "How do I add a device to the IoT VLAN?"
|
||||||
|
ai "What port is gitea running on?"
|
||||||
|
ai "Show me Traefik labels for a new app with Authentik"
|
||||||
|
|
||||||
|
# Interactive mode
|
||||||
|
ai
|
||||||
|
```
|
||||||
|
|
||||||
|
### Rebuild Model
|
||||||
|
|
||||||
|
If infrastructure changes, update and rebuild:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Edit the Modelfile
|
||||||
|
nano /mnt/user/appdata/ollama/Modelfile-unraid
|
||||||
|
|
||||||
|
# Rebuild
|
||||||
|
docker exec ollama ollama create unraid-assistant -f /root/.ollama/Modelfile-unraid
|
||||||
|
```
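
The contents of `Modelfile-unraid` are not shown in this document. As a rough sketch only, assuming it layers a system prompt on top of `qwen2.5-coder:7b`, a Modelfile of that shape could be generated like this. The `temperature` value, the scratch path, and all text inside the `SYSTEM` block are placeholders, not the real file:

```bash
#!/bin/bash
# Sketch only: the real Modelfile-unraid is not shown in this commit.
# The SYSTEM text and the temperature value below are placeholders.
write_modelfile() {
  local out="$1"
  cat > "$out" <<'EOF'
FROM qwen2.5-coder:7b
PARAMETER temperature 0.3
SYSTEM """
You are the assistant for the xtrm-lab.org homelab.
(Placeholder: VLANs, container list, RouterOS patterns, Traefik labels go here.)
"""
EOF
}

# Write to a scratch path (hypothetical) for review before copying it over
# /mnt/user/appdata/ollama/Modelfile-unraid and running the rebuild command.
write_modelfile "${1:-/tmp/Modelfile-unraid.example}"
```

Writing to a scratch path first keeps the working Modelfile untouched until the new prompt has been reviewed.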
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -25,16 +64,21 @@
|
|||||||
|
|
||||||
| Component | Spec |
|
| Component | Spec |
|
||||||
|-----------|------|
|
|-----------|------|
|
||||||
| CPU | Intel N100 |
|
| CPU | Intel N100 (4 cores) |
|
||||||
| RAM | 16GB (shared with Docker) |
|
| RAM | 16GB (shared with Docker) |
|
||||||
| GPU | Intel UHD (iGPU via /dev/dri) |
|
| GPU | Intel UHD (iGPU via /dev/dri) |
|
||||||
| Storage | 1.7TB free on array |
|
| Storage | 1.7TB free on array |
|
||||||
|
|
||||||
|
### Performance
|
||||||
|
- ~1 token/sec with 7B models
|
||||||
|
- Responses take 30-90 seconds
|
||||||
|
- Suitable for occasional use, not real-time chat
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Containers Stopped for RAM
|
## Containers Stopped for RAM
|
||||||
|
|
||||||
To free ~4.8GB for AI workloads, these non-critical containers were stopped:
|
To free ~4.8GB for AI workloads:
|
||||||
|
|
||||||
| Container | RAM Freed | Purpose |
|
| Container | RAM Freed | Purpose |
|
||||||
|-----------|-----------|---------|
|
|-----------|-----------|---------|
|
||||||
@@ -75,30 +119,51 @@ docker run -d \
|
|||||||
ghcr.io/open-webui/open-webui:main
|
ghcr.io/open-webui/open-webui:main
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
### AI Command Helper
|
||||||
|
|
||||||
## Usage
|
|
||||||
|
|
||||||
### Web Interface
|
|
||||||
1. Open http://192.168.31.2:3080
|
|
||||||
2. Create admin account on first visit
|
|
||||||
3. Select `qwen2.5-coder:7b` model
|
|
||||||
4. Start chatting
|
|
||||||
|
|
||||||
### API Access
|
|
||||||
```bash
|
```bash
|
||||||
# List models
|
# /usr/local/bin/ai
|
||||||
curl http://192.168.31.2:11434/api/tags
|
#!/bin/bash
|
||||||
|
MODEL="unraid-assistant"
|
||||||
# Generate response (example)
|
if [ $# -eq 0 ]; then
|
||||||
curl http://192.168.31.2:11434/api/generate \
|
    docker exec -it ollama ollama run "$MODEL"
|
||||||
-d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
|
else
|
||||||
|
    docker exec ollama ollama run "$MODEL" "$*"
|
||||||
|
fi
|
||||||
```
|
```
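
The helper above goes through `docker exec`, so it only works on the Unraid host. As an alternative sketch, the same model could be queried from another machine over Ollama's HTTP API, assuming port 11434 is reachable on 192.168.31.2. The function names `build_payload` and `ask` and the `OLLAMA_URL` variable are illustrative and not part of the installed script:

```bash
#!/bin/bash
# Sketch: query the custom model over Ollama's HTTP API instead of docker exec.
# Assumes the Ollama port (11434) is reachable from where this runs.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.31.2:11434}"
MODEL="unraid-assistant"

# Build the JSON body for /api/generate.
# Only backslashes and double quotes are escaped; prompts with newlines or
# control characters would need a real JSON encoder such as jq.
build_payload() {
  local prompt="$1"
  prompt=${prompt//\\/\\\\}
  prompt=${prompt//\"/\\\"}
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$MODEL" "$prompt"
}

# Send the prompt and print the raw JSON reply.
ask() {
  curl -s "$OLLAMA_URL/api/generate" -d "$(build_payload "$*")"
}

# Only call the API when a prompt was passed on the command line.
if [ $# -gt 0 ]; then
  ask "$@"
fi
```

With `"stream": false` the reply arrives as a single JSON object rather than a stream of partial tokens, which is easier to handle in scripts.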
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Future Considerations
|
## Open WebUI RAG Setup
|
||||||
|
|
||||||
- **More RAM:** With 32GB+ RAM, could run larger models (14b, 32b)
|
For detailed documentation beyond what the system prompt covers, add the docs as a Knowledge collection in Open WebUI:
|
||||||
- **Dedicated GPU:** Would significantly improve inference speed
|
|
||||||
- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`
|
1. Go to http://192.168.31.2:3080
|
||||||
|
2. **Workspace** → **Knowledge** → **+ Create**
|
||||||
|
3. Name: `Infrastructure`
|
||||||
|
4. Upload docs from `/mnt/user/appdata/open-webui/docs/`
|
||||||
|
|
||||||
|
Infrastructure docs are pre-copied to that location.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Future: Mac Mini M4 Upgrade
|
||||||
|
|
||||||
|
Planning to migrate AI stack to Mac Mini M4 (32GB):
|
||||||
|
|
||||||
|
| Metric | N100 (current) | M4 (planned) |
|
||||||
|
|--------|----------------|--------------|
|
||||||
|
| Speed | ~1 tok/s | ~15-20 tok/s |
|
||||||
|
| Max model | 7B | ~32B (quantized) |
|
||||||
|
| Response time | 30-90s | 3-5s |
|
||||||
|
|
||||||
|
The M4's unified memory is shared between CPU and GPU, so the GPU can address most of the system RAM, which suits LLM inference.
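
The response times in the table follow roughly from output length divided by generation speed. A back-of-the-envelope sketch, assuming a 60-token answer (an illustrative figure, not a measurement) and ignoring prompt processing and model load time:

```bash
#!/bin/bash
# Rough estimate: seconds = output tokens / tokens per second.
# Ignores prompt-processing time and model load time.
estimate_seconds() {
  local tokens="$1" rate="$2"
  echo $(( tokens / rate ))   # integer division is enough for a rough figure
}

# A 60-token answer (assumed length) at the speeds from the table above.
echo "N100 @ 1 tok/s:  $(estimate_seconds 60 1)s"
echo "M4   @ 15 tok/s: $(estimate_seconds 60 15)s"
```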
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| /mnt/user/appdata/ollama/Modelfile-unraid | Custom model definition |
|
||||||
|
| /usr/local/bin/ai | Terminal helper command |
|
||||||
|
| /mnt/user/appdata/open-webui/docs/ | RAG documents |
|
||||||
|
|||||||