# Critical Services

**Last Updated:** 2026-01-31

Services that must remain operational for network functionality and security.

---

## Priority Levels

| Priority | Meaning | Recovery Target |
|----------|---------|-----------------|
| **P0** | Network offline without this | < 5 minutes |
| **P1** | Major functionality impacted | < 1 hour |

---

## P0 - Network Core

### DNS (AdGuard Home)

| Instance | Host | IP | Role |
|----------|------|-----|------|
| Primary | HAP1 (container) | 172.17.0.2 | Main DNS |
| Secondary | XTRM-U (macvlan) | 192.168.10.10 | Failover DNS |

**Failover:** Automatic via Netwatch (ping + DNS resolution checks)

**Config Sync:** adguardhome-sync (every 30 min, Unraid → MikroTik)

**Upstream:** Quad9 DoH (`https://dns.quad9.net/dns-query`)

**Web UI:**
- Primary: http://192.168.10.1:3000
- Secondary: http://192.168.10.10:3000
- Credentials: jazzymc / 7RqWElENNbZnPW

**Recovery:**
1. If primary fails → automatic failover to secondary (192.168.10.10)
2. Manual restart: `/container start [find name~"adguard"]`

---

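
To confirm by hand that both resolvers in the DNS table answer, independently of Netwatch, a check along these lines may help. It is only a sketch: `check_resolver` and `report_resolvers` are hypothetical helpers, it assumes `dig` is installed and that the primary answers DNS on 192.168.10.1 (the address its Web UI uses), and `example.com` is a placeholder query name.

```bash
# check_resolver SERVER [NAME]: exit 0 if SERVER returns at least one record for NAME.
# Assumption: example.com is only a placeholder test name.
check_resolver() (
    answer=$(dig +short +time=2 +tries=1 "@$1" "${2:-example.com}" A 2>/dev/null) || exit 1
    # Skip dig's ";;" diagnostic lines and require at least one real record.
    printf '%s\n' "$answer" | grep -q '^[^;]'
)

# report_resolvers SERVER...: print "<server> ok" or "<server> FAILED" per resolver.
report_resolvers() {
    for server in "$@"; do
        if check_resolver "$server"; then
            echo "$server ok"
        else
            echo "$server FAILED"
        fi
    done
}

# Example (the primary's DNS listener address is an assumption):
#   report_resolvers 192.168.10.1 192.168.10.10
```
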
### Routing (HAP1)

| Function | Details |
|----------|---------|
| WAN | 62.73.120.142 via Vivacom fiber |
| VLANs | 10 (Mgmt), 20 (Trusted), 25 (Kids), 30 (IoT), 40 (CatchAll) |
| NAT | Port forwarding to XTRM-U (192.168.10.20) |
| Firewall | RouterOS firewall rules |

**Recovery:**
1. Physical access to HAP1
2. Reset: hold reset button 5s
3. Reconfigure via WinBox or SSH (port 2222)

---

### DHCP (HAP1)

| VLAN | Pool | Range |
|------|------|-------|
| 10 (Mgmt) | pool-vlan10 | 192.168.10.100-200 |
| 20 (Trusted) | pool-vlan20 | 192.168.20.100-200 |
| 25 (Kids) | pool-vlan25 | 192.168.25.100-200 |
| 30 (IoT) | pool-vlan30 | 192.168.30.100-200 |
| 40 (CatchAll) | dhcp | 192.168.1.10-254 |

**Lease Time:** 30 minutes

---

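
When assigning a fixed address, it can be useful to check whether it falls inside one of the DHCP ranges listed in the pool table. The sketch below does that comparison with plain shell arithmetic; `ip_to_int` and `in_pool` are hypothetical helper names, and the example addresses are taken from elsewhere in this document.

```bash
# ip_to_int A.B.C.D: print the IPv4 address as a single integer.
ip_to_int() (
    IFS=.
    # Unquoted on purpose: split the address into its four octets.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_pool IP FIRST LAST: exit 0 if IP lies within FIRST..LAST (inclusive).
in_pool() {
    [ "$(ip_to_int "$1")" -ge "$(ip_to_int "$2")" ] &&
        [ "$(ip_to_int "$1")" -le "$(ip_to_int "$3")" ]
}

# Examples against pool-vlan10 (192.168.10.100-200):
#   in_pool 192.168.10.250 192.168.10.100 192.168.10.200   # VIP: outside the pool
#   in_pool 192.168.10.103 192.168.10.100 192.168.10.200   # XTRM-Nobara: inside the pool
```
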
## P1 - Authentication & Secrets

### Authentik (SSO)

| Component | IP | Purpose |
|-----------|-----|---------|
| authentik | 172.18.0.11 | Web UI + OIDC |
| authentik-worker | 172.18.0.12 | Background tasks |

**URL:** https://auth.xtrm-lab.org

**Protects:**
- Traefik forward auth (all *.xtrm-lab.org)
- Gitea OAuth
- Woodpecker OAuth
- NetBox OAuth
- NetDisco SSO

**Recovery:**
```bash
cd /mnt/user/appdata/authentik
docker compose up -d
```
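
After `docker compose up -d` returns, the containers may still be initialising. A hedged sketch of a wait loop follows; `wait_for_http` is a hypothetical helper, it assumes `curl` is available on the host, and the attempt count and delay are arbitrary defaults rather than values from this setup.

```bash
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]: exit 0 once URL answers without an HTTP error.
wait_for_http() (
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -f: treat HTTP >= 400 as failure, -s: quiet, -o: discard the body
        if curl -fs -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    exit 1
)

# Example: wait_for_http https://auth.xtrm-lab.org/ 30 2
```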

**Database:** postgresql17 (authentik_db)

---

### Vaultwarden (Passwords)

| Component | IP | Purpose |
|-----------|-----|---------|
| vaultwarden | 172.18.0.15 | Password manager |

**URL:** https://vault.xtrm-lab.org

**Data:** `/mnt/user/appdata/vaultwarden/`

**Recovery:**
```bash
docker start vaultwarden
```

**Backup:** Part of Unraid flash backup

---

## P1 - Reverse Proxy

### Traefik

| Component | IP | Ports |
|-----------|-----|-------|
| traefik | 172.18.0.3 | 8001→80, 44301→443 |

**Config:** `/mnt/user/appdata/traefik/`
- `traefik.yml` - Static config
- `dynamic.yml` - Routers & services

**TLS:** Let's Encrypt wildcard for *.xtrm-lab.org

**Recovery:**
```bash
docker start traefik
```
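
`docker start` returns as soon as the start request is accepted, so a follow-up check that the container actually stayed up can be worthwhile. The helper below is a minimal sketch (`container_running` is a hypothetical name) built on `docker inspect` and its `.State.Running` field.

```bash
# container_running NAME: exit 0 if Docker reports the container as running.
container_running() {
    state=$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null) || return 1
    [ "$state" = "true" ]
}

# Example: container_running traefik && echo "traefik is up"
```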

---

## Shared Infrastructure

### PostgreSQL 17

| IP | Databases |
|----|-----------|
| 172.18.0.13 | authentik_db, netbox, gitea, netdisco_db, diode, hydra |

**Data:** `/mnt/user/appdata/postgresql17/`

**Recovery:**
```bash
docker start postgresql17
# Wait for DB to be ready before starting dependents
```
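
One possible way to implement the "wait for DB" step is to poll `pg_isready` inside the container. This is a sketch under assumptions: `wait_for_postgres` is a hypothetical helper, the 60-second default timeout is arbitrary, and the container image is assumed to ship `pg_isready` (the official PostgreSQL image does).

```bash
# wait_for_postgres CONTAINER [TIMEOUT_SECONDS]: exit 0 once the server accepts connections.
wait_for_postgres() (
    remaining=${2:-60}
    # pg_isready exits 0 only when the server is accepting connections.
    until docker exec "$1" pg_isready -q >/dev/null 2>&1; do
        remaining=$((remaining - 1))
        if [ "$remaining" -le 0 ]; then
            exit 1
        fi
        sleep 1
    done
    exit 0
)

# Example: docker start postgresql17 && wait_for_postgres postgresql17 60
```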

### Redis

| IP | Consumers |
|----|-----------|
| 172.18.0.14 | Authentik, NetBox, Diode |

**Recovery:**
```bash
docker start Redis
```

---

## Startup Order

When recovering from a full outage:

1. **postgresql17** - Database (wait 30s)
2. **Redis** - Cache/queue (wait 10s)
3. **traefik** - Reverse proxy
4. **authentik** + **authentik-worker** - SSO
5. **vaultwarden** - Passwords
6. All other services

---

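
The startup order above can be sketched as a small script. This is illustrative only: `start_in_order` is a hypothetical helper, and it assumes every container already exists and can be brought up with `docker start` (the Authentik recovery steps in this document use `docker compose up -d` instead). Step 6, all other services, is left out.

```bash
# start_in_order: start the critical containers in the documented order.
start_in_order() {
    # "name:pause" pairs; the 30 s and 10 s pauses come from the startup list.
    for step in postgresql17:30 Redis:10 traefik:0 authentik:0 authentik-worker:0 vaultwarden:0; do
        name=${step%%:*}
        pause=${step##*:}
        echo "starting $name"
        docker start "$name" >/dev/null || { echo "failed to start $name" >&2; return 1; }
        if [ "$pause" -gt 0 ]; then
            sleep "$pause"
        fi
    done
}

# Example: start_in_order && echo "critical services started"
```
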
## Monitoring

### Uptime Kuma

| URL | Monitors |
|-----|----------|
| https://uptime.xtrm-lab.org | 27 services |

**Alerts:** Configured per service (email/webhook)

---

## Backup Strategy

| Data | Location | Frequency |
|------|----------|-----------|
| Unraid Flash | Google Drive | Daily |
| PostgreSQL | `/mnt/user/Backup/` | Daily |
| Vaultwarden | Unraid Flash | With flash backup |
| Authentik | PostgreSQL + `/mnt/user/appdata/authentik/` | Daily |

---

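
A daily backup schedule is only useful if the files actually appear. The sketch below checks whether a backup directory contains a file modified within a given window; `has_recent_backup` is a hypothetical helper, the 1440-minute default mirrors the "Daily" frequency in the table, and `find -mmin` is a GNU/BSD extension rather than POSIX.

```bash
# has_recent_backup DIR [MAX_AGE_MINUTES]: exit 0 if DIR holds a recently modified file.
has_recent_backup() (
    dir=$1
    max_age=${2:-1440}   # 1440 minutes = 24 hours
    [ -d "$dir" ] || exit 1
    # List files modified less than max_age minutes ago; one hit is enough.
    recent=$(find "$dir" -type f -mmin "-$max_age" | head -n 1)
    [ -n "$recent" ]
)

# Example: has_recent_backup /mnt/user/Backup/ || echo "PostgreSQL backup is stale"
```
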
## Active Failover: XTRM-Nobara

Critical services are replicated on the Nobara workstation with automatic VRRP failover:

| Service | Primary (XTRM-U) | Failover (XTRM-Nobara) |
|---------|-------------------|------------------------|
| Traefik | 192.168.10.20 | 192.168.10.103 |
| Vaultwarden | 192.168.10.20 | 192.168.10.103 |
| Authentik | 192.168.10.20 | 192.168.10.103 |
| AdGuard Home | 192.168.10.20 | 192.168.10.103 |

**VIP:** 192.168.10.250 (floats between XTRM-U and XTRM-Nobara via Keepalived VRRP)

**Failover time:** ~4 seconds

See: `10-FAILOVER-NOBARA.md` for full documentation.

---

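
Two quick checks can help when reasoning about the VIP. The first reports whether the current host holds the address; the second evaluates the VRRPv3 master-down formula (3 × advertisement interval + skew, with skew = (256 − priority) / 256 × interval). Both helpers are hypothetical, and the 1-second interval and priority 100 in the example are assumptions, not values taken from this setup; they merely show how a takeover delay in the region of 4 seconds can arise.

```bash
# holds_vip VIP: exit 0 if this host currently has VIP configured on any interface.
holds_vip() {
    # `ip -o -4 addr show` prints one line per IPv4 address ("... inet ADDR/PREFIX ...").
    ip -o -4 addr show | grep -qF "inet $1/"
}

# master_down_interval ADVERT_INT PRIORITY: seconds a backup waits before taking over.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 * a }'
}

# Examples (interval and priority are assumed values):
#   holds_vip 192.168.10.250 && echo "this host is MASTER"
#   master_down_interval 1 100
```
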
## Future: XTRM-N1 Survival Node

When the hardware upgrade completes, these services will have replicas on XTRM-N1:

| Service | Primary | Replica |
|---------|---------|---------|
| DNS | HAP1 | XTRM-N1 |
| Vaultwarden | XTRM-N5 | XTRM-N1 |
| Authentik | XTRM-N5 | XTRM-N1 |

See: `wip/UPGRADE-2026-HARDWARE.md`