infrastructure/docs/02-SERVICES-CRITICAL.md
jazzymc 38e83410bd docs: update Docker container inventory to current state
- Add new services: Ollama, Open WebUI, Overseerr, Radarr, Sonarr, Prowlarr,
  Zurg/Rclone, Plex Debrid, ntfy, Obsidian LiveSync, Supabase stack, pgAdmin4,
  Dockhand, SeekAndWatch, xtrm-agent, Tuya Gateway
- Remove NetBox and Diode (no longer running)
- Fix Speedtest URL (speed.xtrm-lab.org)
- Update monitor count to 24
- Update stopped/disabled services list
- Update shared databases and Redis consumers
2026-05-05 19:24:57 +00:00


Critical Services

Last Updated: 2026-05-05

Services that must remain operational for network functionality and security.


Priority Levels

| Priority | Meaning | Recovery Target |
|----------|---------|-----------------|
| P0 | Network offline without this | < 5 minutes |
| P1 | Major functionality impacted | < 1 hour |

P0 - Network Core

DNS (AdGuard Home)

| Instance | Host | IP | Role |
|----------|------|----|------|
| Primary | HAP1 (container) | 172.17.0.2 | Main DNS |
| Secondary | XTRM-U (macvlan) | 192.168.10.10 | Failover DNS |

Failover: Automatic via Netwatch (ping + DNS resolution checks)

Config Sync: adguardhome-sync (every 30 min, Unraid → MikroTik)

Upstream: Quad9 DoH (https://dns.quad9.net/dns-query)

Web UI:

Recovery:

  1. If primary fails → automatic failover to secondary (192.168.10.10)
  2. Manual restart: /container start [find name~"adguard"]
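
Failover itself is handled by Netwatch on the router, but after a restart it can help to confirm from a client that a resolver really answers. The following is a minimal sketch, assuming `dig` is installed on the machine running it. The function name `check_dns` and the test name `example.com` are hypothetical, and the primary's 172.17.0.2 is a container address that may not be reachable from every LAN client.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a resolver answers an A-record query.
# Assumes `dig` (dnsutils / bind-utils) is available; check_dns is a
# hypothetical helper, not part of AdGuard Home or RouterOS.

# check_dns SERVER [NAME] -> prints "SERVER ok" or "SERVER FAIL"; returns 0 or 1.
check_dns() {
    local server="$1" name="${2:-example.com}" answer
    # +short prints only answer records; +time/+tries keep the check fast.
    # Lines starting with ';' are dig diagnostics (e.g. timeouts), not answers.
    answer="$(dig +short +time=2 +tries=1 "@${server}" "${name}" A 2>/dev/null | grep -v '^;')"
    if [ -n "${answer}" ]; then
        echo "${server} ok"
    else
        echo "${server} FAIL"
        return 1
    fi
}

# Example (not run automatically):
#   check_dns 192.168.10.10
```

Running the check against the secondary (192.168.10.10) before restarting the primary is one way to make sure clients still have a working resolver during the restart.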

Routing (HAP1)

| Function | Details |
|----------|---------|
| WAN | 62.73.120.142 via Vivacom fiber |
| VLANs | 10 (Mgmt), 20 (Trusted), 25 (Kids), 30 (IoT), 40 (CatchAll) |
| NAT | Port forwarding to XTRM-U (192.168.10.20) |
| Firewall | RouterOS firewall rules |

Recovery:

  1. Physical access to HAP1
  2. Reset: hold reset button 5s
  3. Reconfigure via WinBox or SSH (port 2222)

DHCP (HAP1)

| VLAN | Pool | Range |
|------|------|-------|
| 10 (Mgmt) | pool-vlan10 | 192.168.10.100-200 |
| 20 (Trusted) | pool-vlan20 | 192.168.20.100-200 |
| 25 (Kids) | pool-vlan25 | 192.168.25.100-200 |
| 30 (IoT) | pool-vlan30 | 192.168.30.100-200 |
| 40 (CatchAll) | dhcp | 192.168.1.10-254 |

Lease Time: 30 minutes
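
When assigning fixed addresses, it is useful to know whether an address falls inside a dynamic pool. The sketch below parses the range notation used in the pool table (for example `192.168.10.100-200`). The helper names `in_pool` and `pool_size` are hypothetical, and the sketch assumes every pool stays within a single /24, as the listed ranges do.

```bash
#!/usr/bin/env bash
# Sketch: work with the pool ranges listed above.
# Assumes ranges are written "A.B.C.first-last" and stay inside one /24.

# in_pool IP RANGE -> returns 0 if IP lies inside RANGE, 1 otherwise.
in_pool() {
    local ip="$1" range="$2"
    local prefix="${range%.*}"     # e.g. 192.168.10
    local span="${range##*.}"      # e.g. 100-200
    local first="${span%-*}" last="${span#*-}"
    [ "${ip%.*}" = "${prefix}" ] || return 1    # different /24
    local host="${ip##*.}"
    [ "${host}" -ge "${first}" ] && [ "${host}" -le "${last}" ]
}

# pool_size RANGE -> number of addresses in the pool (both ends inclusive).
pool_size() {
    local span="${1##*.}"
    echo $(( ${span#*-} - ${span%-*} + 1 ))
}
```

For the Mgmt pool, `pool_size 192.168.10.100-200` prints `101`. `in_pool 192.168.10.20 192.168.10.100-200` fails, so that address sits outside the dynamic range, while an address such as 192.168.10.103 falls inside it and would presumably need a static lease to stay stable.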


P1 - Authentication & Secrets

Authentik (SSO)

| Component | IP | Purpose |
|-----------|----|---------|
| authentik | 172.18.0.11 | Web UI + OIDC |
| authentik-worker | 172.18.0.12 | Background tasks |

URL: https://auth.xtrm-lab.org

Protects:

  • Traefik forward auth (all *.xtrm-lab.org)
  • Gitea OAuth
  • Woodpecker OAuth
  • NetBox OAuth
  • NetDisco SSO

Recovery:

cd /mnt/user/appdata/authentik
docker compose up -d

Database: postgresql17 (authentik_db)
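
After `docker compose up -d`, Authentik can take a while before it serves requests. A small polling loop, sketched below, can confirm that the URL responds again. It assumes `curl` is available, and `wait_for_http` is a hypothetical helper name. The attempt count and delay are illustrative defaults, not values from this setup.

```bash
#!/usr/bin/env bash
# Sketch: poll a URL until it responds without an HTTP error.
# wait_for_http is a hypothetical helper; defaults are illustrative.

# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS] -> 0 once the URL answers, else 1.
wait_for_http() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
    while [ "${i}" -le "${attempts}" ]; do
        # -f: treat HTTP >= 400 as failure; -sS: quiet but keep errors;
        # -o /dev/null: discard the body; --max-time: bound each attempt.
        if curl -fsS -o /dev/null --max-time 5 "${url}" 2>/dev/null; then
            echo "ready after ${i} attempt(s)"
            return 0
        fi
        sleep "${delay}"
        i=$(( i + 1 ))
    done
    echo "not ready after ${attempts} attempts" >&2
    return 1
}

# Example (not run automatically):
#   wait_for_http https://auth.xtrm-lab.org/
```

If the installed Authentik version exposes a dedicated readiness path (recent releases document `/-/health/ready/`, but verify this against your version), polling that path is likely a better signal than the front page.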


Vaultwarden (Passwords)

| Component | IP | Purpose |
|-----------|----|---------|
| vaultwarden | 172.18.0.15 | Password manager |

URL: https://vault.xtrm-lab.org

Data: /mnt/user/appdata/vaultwarden/

Recovery:

docker start vaultwarden

Backup: Part of Unraid flash backup


P1 - Reverse Proxy

Traefik

| Component | IP | Ports |
|-----------|----|-------|
| traefik | 172.18.0.3 | 8001→80, 44301→443 |

Config: /mnt/user/appdata/traefik/

  • traefik.yml - Static config
  • dynamic.yml - Routers & services

TLS: Let's Encrypt wildcard for *.xtrm-lab.org

Recovery:

docker start traefik

Shared Infrastructure

PostgreSQL 17

| IP | Databases |
|----|-----------|
| 172.18.0.13 | authentik_db, netbox, gitea, netdisco_db, diode, hydra |

Data: /mnt/user/appdata/postgresql17/

Recovery:

docker start postgresql17
# Wait for DB to be ready before starting dependents
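
The "wait for DB to be ready" step can be made explicit rather than relying on a fixed pause. The sketch below polls `pg_isready` inside the container. It assumes the image ships `pg_isready` (the official PostgreSQL images do), and `wait_for_postgres` is a hypothetical helper name with an illustrative default timeout.

```bash
#!/usr/bin/env bash
# Sketch: block until PostgreSQL accepts connections, or give up.
# Assumes pg_isready exists inside the container; helper name is hypothetical.

# wait_for_postgres CONTAINER [TIMEOUT_SECONDS] -> 0 when ready, 1 on timeout.
wait_for_postgres() {
    local container="$1" timeout="${2:-60}" waited=0
    # pg_isready exits 0 only when the server is accepting connections.
    until docker exec "${container}" pg_isready -q; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "${container} not ready after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo "${container} ready after ${waited}s"
}

# Example (not run automatically):
#   docker start postgresql17 && wait_for_postgres postgresql17
```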

Redis

| IP | Consumers |
|----|-----------|
| 172.18.0.14 | Authentik, NetBox, Diode |

Recovery:

docker start Redis
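
A quick way to confirm Redis is answering after the restart is a `PING`, sketched below. It assumes `redis-cli` is present in the container and that no password is required. With authentication enabled, the reply would be an error instead of `PONG`, and the check would need credentials. `redis_ready` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check that Redis replies to PING.
# Assumes redis-cli is in the container and no password is set.

# redis_ready [CONTAINER] -> 0 if the server answers PONG, 1 otherwise.
redis_ready() {
    local container="${1:-Redis}"
    [ "$(docker exec "${container}" redis-cli ping 2>/dev/null)" = "PONG" ]
}

# Example (not run automatically):
#   redis_ready && echo "Redis is up"
```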

Startup Order

When recovering from a full outage:

  1. postgresql17 - Database (wait 30s)
  2. Redis - Cache/queue (wait 10s)
  3. traefik - Reverse proxy
  4. authentik + authentik-worker - SSO
  5. vaultwarden - Passwords
  6. All other services
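
The sequence above can be sketched as a small script. Container names and pauses are taken from the list. The function name `start_in_order` is hypothetical, and the sketch assumes every container already exists under these names, so that `docker start` is sufficient. Authentik may instead need `docker compose up -d` from its appdata directory.

```bash
#!/usr/bin/env bash
# Sketch: start the core containers in the documented order.
# Assumes the containers already exist; start_in_order is a hypothetical helper.

start_in_order() {
    local step name pause
    # Each entry is "container:seconds to wait afterwards".
    for step in "postgresql17:30" "Redis:10" "traefik:0" \
                "authentik:0" "authentik-worker:0" "vaultwarden:0"; do
        name="${step%%:*}"
        pause="${step##*:}"
        echo "starting ${name}"
        # Stop at the first failure so dependents are not started blindly.
        docker start "${name}" >/dev/null || { echo "failed: ${name}" >&2; return 1; }
        if [ "${pause}" -gt 0 ]; then
            sleep "${pause}"
        fi
    done
    return 0
}

# Example (not run automatically):
#   start_in_order
```

Aborting on the first failed start is a design choice for this sketch: it avoids bringing up Authentik against a database that never came up, at the cost of requiring a manual retry.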

Monitoring

Uptime Kuma

| URL | Monitors |
|-----|----------|
| https://uptime.xtrm-lab.org | 24 services |

Alerts: Configured per service (email/webhook)


Backup Strategy

| Data | Location | Frequency |
|------|----------|-----------|
| Unraid Flash | Google Drive | Daily |
| PostgreSQL | /mnt/user/Backup/ | Daily |
| Vaultwarden | Unraid Flash | With flash backup |
| Authentik | PostgreSQL + /mnt/user/appdata/authentik/ | Daily |
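
The table does not say how the daily PostgreSQL backup is produced. One possible approach is a per-database `pg_dump`, sketched below. The file naming scheme, the `postgres` role, and the helper names `backup_path` and `dump_db` are all assumptions for illustration. Only the container name and the `/mnt/user/Backup/` location come from this document.

```bash
#!/usr/bin/env bash
# Sketch: dump one database from the postgresql17 container.
# File naming and the "postgres" role are assumptions, not the actual setup.

BACKUP_DIR="${BACKUP_DIR:-/mnt/user/Backup}"

# backup_path DB [DATE] -> target file name for a compressed dump.
backup_path() {
    local db="$1" day="${2:-$(date +%F)}"
    echo "${BACKUP_DIR}/${db}_${day}.sql.gz"
}

# dump_db DB -> write a gzip-compressed logical dump; returns 1 if pg_dump fails.
dump_db() {
    local db="$1" out
    out="$(backup_path "${db}")"
    # Dump to a plain .sql file first so a pg_dump failure is not masked.
    if docker exec postgresql17 pg_dump -U postgres "${db}" > "${out%.gz}"; then
        gzip -f "${out%.gz}"
    else
        rm -f "${out%.gz}"
        return 1
    fi
}

# Example (not run automatically):
#   dump_db authentik_db
```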

Active Failover: XTRM-Nobara

Critical services are replicated on the Nobara workstation with automatic VRRP failover:

| Service | Primary (XTRM-U) | Failover (XTRM-Nobara) |
|---------|------------------|------------------------|
| Traefik | 192.168.10.20 | 192.168.10.103 |
| Vaultwarden | 192.168.10.20 | 192.168.10.103 |
| Authentik | 192.168.10.20 | 192.168.10.103 |
| AdGuard Home | 192.168.10.20 | 192.168.10.103 |

VIP: 192.168.10.250 (floats between XTRM-U and XTRM-Nobara via Keepalived VRRP)

Failover time: ~4 seconds

See: 10-FAILOVER-NOBARA.md for full documentation.
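
To see which host currently holds the VIP, it is usually enough to check whether the address is configured on a local interface. The sketch below assumes a Linux host with the iproute2 `ip` command. `holds_vip` is a hypothetical helper name and is not part of Keepalived.

```bash
#!/usr/bin/env bash
# Sketch: report whether this host currently owns the floating VIP.
# Assumes iproute2 is installed; holds_vip is a hypothetical helper.

# holds_vip [VIP] -> 0 if the VIP is configured on any local interface.
holds_vip() {
    local vip="${1:-192.168.10.250}"
    # `ip -o -4 addr show` prints one line per IPv4 address, for example:
    #   2: br0    inet 192.168.10.250/32 scope global br0
    # -F matches the string literally; the trailing "/" avoids prefix matches.
    ip -o -4 addr show | grep -qF "inet ${vip}/"
}

# Example (not run automatically):
#   holds_vip && echo "this host is the VRRP master"
```

Running this on both XTRM-U and XTRM-Nobara should show the VIP on exactly one of them at any time.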


Future: XTRM-N1 Survival Node

When the hardware upgrade completes, these services will have replicas on XTRM-N1:

| Service | Primary | Replica |
|---------|---------|---------|
| DNS | HAP1 | XTRM-N1 |
| Vaultwarden | XTRM-N5 | XTRM-N1 |
| Authentik | XTRM-N5 | XTRM-N1 |

See: wip/UPGRADE-2026-HARDWARE.md