# Local AI Stack on Unraid

**Status:** ✅ Deployed
**Last Updated:** 2026-01-26

---

## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |

### Installed Model

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |

---

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7 TB free on array |

---

## Containers Stopped for RAM

To free ~4.8 GB for AI workloads, these non-critical containers were stopped:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart them if needed:

```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```

---

## Docker Configuration

### Ollama

```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```

### Open WebUI

```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

---

## Usage

### Web Interface

1. Open http://192.168.31.2:3080
2. Create an admin account on first visit
3. Select the `qwen2.5-coder:7b` model
4. Start chatting

### API Access

```bash
# List models
curl http://192.168.31.2:11434/api/tags

# Generate a response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```

---

## Future Considerations

- **More RAM:** With 32 GB+, larger models (14b, 32b) could run
- **Dedicated GPU:** Would significantly improve inference speed
- **Additional models:** Pull more models as needed with `docker exec ollama ollama pull <model>`
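By default, the `/api/generate` endpoint above streams its answer as newline-delimited JSON, one chunk per line, rather than a single response body. A minimal Python sketch for reassembling that stream into full text (the chunk shape is assumed from Ollama's streaming format; the sample data is illustrative, not captured from this server):

```python
import json

def collect_stream(lines):
    """Join the `response` fields of Ollama's streaming NDJSON chunks.

    Each line from /api/generate is a JSON object such as
    {"model": "...", "response": "Hel", "done": false}; the final
    chunk carries "done": true.
    """
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Illustrative chunks in the assumed streaming shape:
sample = [
    '{"model": "qwen2.5-coder:7b", "response": "Hello", "done": false}',
    '{"model": "qwen2.5-coder:7b", "response": " there", "done": false}',
    '{"model": "qwen2.5-coder:7b", "response": "!", "done": true}',
]
print(collect_stream(sample))  # Hello there!
```

In practice the `lines` iterable would come from the HTTP response (e.g. `requests.post(..., stream=True).iter_lines()`); passing `"stream": false` in the request body instead returns one complete JSON object and makes this reassembly unnecessary.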