# Local AI Stack on Unraid

**Status:** ✅ Deployed

**Last Updated:** 2026-01-26

---

## Current Deployment

| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |

### Model Installed

| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |

---

## Hardware

| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7 TB free on array |

---

## Containers Stopped for RAM

To free roughly 4.8 GB of RAM for AI workloads, these non-critical containers were stopped:

| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |

To restart them if needed:

```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```

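To sanity-check that the memory actually came back, a quick sketch (the `table` format string is standard `docker stats` syntax; `free` ships with Unraid's base OS):

```bash
# Overall memory: "available" should rise by roughly 4.8 GB
# once the containers above are stopped.
free -h

# Per-container usage, to see where the remaining RAM goes:
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}'
```
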
---

## Docker Configuration

### Ollama

```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```

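Once the container is up, the model is pulled inside it (the same pattern as the pull command noted under Future Considerations):

```bash
# Download the model into the mounted volume and confirm it is listed
docker exec ollama ollama pull qwen2.5-coder:7b
docker exec ollama ollama list   # should show qwen2.5-coder:7b (4.7 GB)
```
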
### Open WebUI
|
|
```bash
|
|
docker run -d \
|
|
--name open-webui \
|
|
--restart unless-stopped \
|
|
-p 3080:8080 \
|
|
-e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
|
|
-v /mnt/user/appdata/open-webui:/app/backend/data \
|
|
ghcr.io/open-webui/open-webui:main
|
|
```
|
|
|
|
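A quick health check for both services, sketched with standard endpoints (`/api/version` is part of the Ollama API; the Open WebUI port just needs to answer HTTP):

```bash
# Ollama: prints a small JSON object like {"version":"..."}
curl -fsS http://192.168.31.2:11434/api/version

# Open WebUI: expect a 200 status code
curl -fsS -o /dev/null -w '%{http_code}\n' http://192.168.31.2:3080
```
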
---

## Usage

### Web Interface

1. Open http://192.168.31.2:3080
2. Create an admin account on first visit
3. Select the `qwen2.5-coder:7b` model
4. Start chatting

### API Access

```bash
# List installed models
curl http://192.168.31.2:11434/api/tags

# Generate a response (streams JSON lines by default)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```

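For scripting it is often easier to ask for a single response instead of the default token-by-token stream; a sketch, assuming `jq` is installed:

```bash
# "stream": false makes Ollama return one JSON object instead of NDJSON
curl -s http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello", "stream": false}' \
  | jq -r '.response'
```
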
---

## Future Considerations

- **More RAM:** With 32 GB+, this host could run larger models (14b, 32b)
- **Dedicated GPU:** A discrete GPU would significantly improve inference speed
- **Additional models:** Pull more models as needed with `docker exec ollama ollama pull <model>`