# Local AI Stack on Unraid
**Status:** ✅ Deployed
**Last Updated:** 2026-01-26
---
## Current Deployment
| Component | Status | URL/Port |
|-----------|--------|----------|
| Ollama | ✅ Running | http://192.168.31.2:11434 |
| Open WebUI | ✅ Running | http://192.168.31.2:3080 |
| Intel GPU | ✅ Enabled | /dev/dri passthrough |
### Model Installed
| Model | Size | Type |
|-------|------|------|
| qwen2.5-coder:7b | 4.7 GB | Code-focused LLM |
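To confirm what the instance is actually serving, installed models can be listed either inside the container or from the `/api/tags` endpoint. The `sed` extraction is a rough fallback for hosts without `jq`; it assumes the container name `ollama` and the IP from the table above.

```shell
# List installed models inside the container...
docker exec ollama ollama list

# ...or pull the model names out of the /api/tags JSON
# (rough extraction; use jq if it is installed)
curl -s http://192.168.31.2:11434/api/tags \
  | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
```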
---
## Hardware
| Component | Spec |
|-----------|------|
| CPU | Intel N100 |
| RAM | 16 GB (shared with Docker) |
| GPU | Intel UHD (iGPU via /dev/dri) |
| Storage | 1.7TB free on array |
---
## Containers Stopped for RAM
To free ~4.8 GB for AI workloads, these non-critical containers were stopped:
| Container | RAM Freed | Purpose |
|-----------|-----------|---------|
| karakeep | 1.68 GB | Bookmark manager |
| unimus | 1.62 GB | Network backup |
| homarr | 686 MB | Dashboard |
| netdisco-web | 531 MB | Network discovery UI |
| netdisco-backend | 291 MB | Network discovery |
To restart if needed:
```bash
docker start karakeep unimus homarr netdisco-web netdisco-backend
```
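As a quick sanity check, the per-container figures in the table do add up to the quoted ~4.8 GB (treating the MB entries as GB/1000):

```shell
# Sum the freed-RAM values from the table (MB entries converted to GB)
awk 'BEGIN { printf "Freed ~%.2f GB\n", 1.68 + 1.62 + 0.686 + 0.531 + 0.291 }'
# prints "Freed ~4.81 GB"
```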
---
## Docker Configuration
### Ollama
```bash
docker run -d \
  --name ollama \
  --restart unless-stopped \
  --device /dev/dri \
  -v /mnt/user/appdata/ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama
```
### Open WebUI
```bash
docker run -d \
  --name open-webui \
  --restart unless-stopped \
  -p 3080:8080 \
  -e OLLAMA_BASE_URL=http://192.168.31.2:11434 \
  -v /mnt/user/appdata/open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```
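Once both containers are up, a quick reachability probe (IP and ports taken from the deployment table; each URL should report an HTTP status, typically 200) can be sketched as:

```shell
# Probe both services; prints the HTTP status code for each URL
for url in http://192.168.31.2:11434/api/tags http://192.168.31.2:3080; do
  printf '%s -> HTTP %s\n' "$url" "$(curl -s -o /dev/null -w '%{http_code}' "$url")"
done
```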
---
## Usage
### Web Interface
1. Open http://192.168.31.2:3080
2. Create admin account on first visit
3. Select `qwen2.5-coder:7b` model
4. Start chatting
### API Access
```bash
# List models
curl http://192.168.31.2:11434/api/tags

# Generate response (example)
curl http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello"}'
```
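Note that `/api/generate` streams its output as one JSON object per chunk by default; passing `"stream": false` returns a single JSON object whose `response` field holds the full completion, which is easier to script against. The `sed` extraction below is a rough sketch (it breaks on escaped quotes in the response; prefer `jq` if installed):

```shell
# Non-streaming generate: one JSON object back instead of NDJSON chunks
curl -s http://192.168.31.2:11434/api/generate \
  -d '{"model": "qwen2.5-coder:7b", "prompt": "Hello", "stream": false}' \
  | sed -n 's/.*"response":"\([^"]*\)".*/\1/p'
```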
---
## Future Considerations
- **More RAM:** With 32 GB+ RAM, could run larger models (14b, 32b)
- **Dedicated GPU:** Would significantly improve inference speed
- **Additional models:** Can pull more models as needed with `docker exec ollama ollama pull <model>`