# Critical Services

**Last Updated:** 2026-01-31

Services that must remain operational for network functionality and security.

---

## Priority Levels

| Priority | Meaning | Recovery Target |
|----------|---------|-----------------|
| **P0** | Network offline without this | < 5 minutes |
| **P1** | Major functionality impacted | < 1 hour |

---

## P0 - Network Core

### DNS (AdGuard Home)

| Instance | Host | IP | Role |
|----------|------|-----|------|
| Primary | HAP1 (container) | 172.17.0.2 | Main DNS |
| Secondary | XTRM-U (macvlan) | 192.168.10.10 | Failover DNS |

**Failover:** Automatic via Netwatch (ping + DNS resolution checks)

**Config Sync:** adguardhome-sync (every 30 min, Unraid → MikroTik)

**Upstream:** Quad9 DoH (`https://dns.quad9.net/dns-query`)

**Web UI:**
- Primary: http://192.168.10.1:3000
- Secondary: http://192.168.10.10:3000
- Credentials: jazzymc / 7RqWElENNbZnPW

**Recovery:**
1. If primary fails → automatic failover to secondary (192.168.10.10)
2. Manual restart: `/container start [find name~"adguard"]`

---

### Routing (HAP1)

| Function | Details |
|----------|---------|
| WAN | 62.73.120.142 via Vivacom fiber |
| VLANs | 10 (Mgmt), 20 (Trusted), 25 (Kids), 30 (IoT), 40 (CatchAll) |
| NAT | Port forwarding to XTRM-U (192.168.10.20) |
| Firewall | RouterOS firewall rules |

**Recovery:**
1. Physical access to HAP1
2. Reset: hold reset button 5s
3. Reconfigure via WinBox or SSH (port 2222)

---

### DHCP (HAP1)

| VLAN | Pool | Range |
|------|------|-------|
| 10 (Mgmt) | pool-vlan10 | 192.168.10.100-200 |
| 20 (Trusted) | pool-vlan20 | 192.168.20.100-200 |
| 25 (Kids) | pool-vlan25 | 192.168.25.100-200 |
| 30 (IoT) | pool-vlan30 | 192.168.30.100-200 |
| 40 (CatchAll) | dhcp | 192.168.1.10-254 |

**Lease Time:** 30 minutes

---

## P1 - Authentication & Secrets

### Authentik (SSO)

| Component | IP | Purpose |
|-----------|-----|---------|
| authentik | 172.18.0.11 | Web UI + OIDC |
| authentik-worker | 172.18.0.12 | Background tasks |

**URL:** https://auth.xtrm-lab.org

**Protects:**
- Traefik forward auth (all *.xtrm-lab.org)
- Gitea OAuth
- Woodpecker OAuth
- NetBox OAuth
- NetDisco SSO

**Recovery:**
```bash
cd /mnt/user/appdata/authentik
docker compose up -d
```

**Database:** postgresql17 (authentik_db)

---

### Vaultwarden (Passwords)

| Component | IP | Purpose |
|-----------|-----|---------|
| vaultwarden | 172.18.0.15 | Password manager |

**URL:** https://vault.xtrm-lab.org

**Data:** `/mnt/user/appdata/vaultwarden/`

**Recovery:**
```bash
docker start vaultwarden
```

**Backup:** Part of Unraid flash backup

---

## P1 - Reverse Proxy

### Traefik

| Component | IP | Ports |
|-----------|-----|-------|
| traefik | 172.18.0.3 | 8001→80, 44301→443 |

**Config:** `/mnt/user/appdata/traefik/`
- `traefik.yml` - Static config
- `dynamic.yml` - Routers & services

**TLS:** Let's Encrypt wildcard for *.xtrm-lab.org

**Recovery:**
```bash
docker start traefik
```

---

## Shared Infrastructure

### PostgreSQL 17

| IP | Databases |
|----|-----------|
| 172.18.0.13 | authentik_db, netbox, gitea, netdisco_db, diode, hydra |

**Data:** `/mnt/user/appdata/postgresql17/`

**Recovery:**
```bash
docker start postgresql17
# Wait for DB to be ready before starting dependents
```

### Redis

| IP | Consumers |
|----|-----------|
| 172.18.0.14 | Authentik, NetBox, Diode |

**Recovery:**
```bash
docker start Redis
```

---

## Startup Order

When recovering from full outage:

1. **postgresql17** - Database (wait 30s)
2. **Redis** - Cache/queue (wait 10s)
3. **traefik** - Reverse proxy
4. **authentik** + **authentik-worker** - SSO
5. **vaultwarden** - Passwords
6. All other services

---

## Monitoring

### Uptime Kuma

| URL | Monitors |
|-----|----------|
| https://uptime.xtrm-lab.org | 27 services |

**Alerts:** Configured per service (email/webhook)

---

## Backup Strategy

| Data | Location | Frequency |
|------|----------|-----------|
| Unraid Flash | Google Drive | Daily |
| PostgreSQL | `/mnt/user/Backup/` | Daily |
| Vaultwarden | Unraid Flash | With flash backup |
| Authentik | PostgreSQL + `/mnt/user/appdata/authentik/` | Daily |

---

## Active Failover: XTRM-Nobara

Critical services are replicated on the Nobara workstation with automatic VRRP failover:

| Service | Primary (XTRM-U) | Failover (XTRM-Nobara) |
|---------|-------------------|------------------------|
| Traefik | 192.168.10.20 | 192.168.10.103 |
| Vaultwarden | 192.168.10.20 | 192.168.10.103 |
| Authentik | 192.168.10.20 | 192.168.10.103 |
| AdGuard Home | 192.168.10.20 | 192.168.10.103 |

**VIP:** 192.168.10.250 (floats between XTRM-U and XTRM-Nobara via Keepalived VRRP)

**Failover time:** ~4 seconds

See: `10-FAILOVER-NOBARA.md` for full documentation.

---

## Future: XTRM-N1 Survival Node

When hardware upgrade completes, these services will have replicas on XTRM-N1:

| Service | Primary | Replica |
|---------|---------|---------|
| DNS | HAP1 | XTRM-N1 |
| Vaultwarden | XTRM-N5 | XTRM-N1 |
| Authentik | XTRM-N5 | XTRM-N1 |

See: `wip/UPGRADE-2026-HARDWARE.md`
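
---

As a rough illustration of the Startup Order section, the sequence could be scripted along the following lines. This is a minimal sketch, not part of the existing tooling. It assumes every listed service is a plain container that `docker start` can bring up under the names used in that list (Authentik is otherwise recovered with `docker compose up -d`), and `DRY_RUN`, `STARTUP_ORDER`, and `start_in_order` are hypothetical names chosen for this example. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch: bring up core containers in the documented startup order.
# Container names and wait times are taken from the Startup Order list;
# DRY_RUN, STARTUP_ORDER and start_in_order are hypothetical names.
set -euo pipefail

# 1 = print the commands only (default), 0 = actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Each entry is "container-name:seconds-to-wait-afterwards".
STARTUP_ORDER=(
  "postgresql17:30"
  "Redis:10"
  "traefik:0"
  "authentik:0"
  "authentik-worker:0"
  "vaultwarden:0"
)

start_in_order() {
  local entry name wait_s
  for entry in "${STARTUP_ORDER[@]}"; do
    name="${entry%%:*}"     # text before the first colon
    wait_s="${entry##*:}"   # text after the last colon
    if [ "$DRY_RUN" = "1" ]; then
      echo "docker start ${name}"
      if [ "$wait_s" -gt 0 ]; then
        echo "sleep ${wait_s}"
      fi
    else
      docker start "$name"
      if [ "$wait_s" -gt 0 ]; then
        sleep "$wait_s"
      fi
    fi
  done
}

start_in_order
```

Running it with `DRY_RUN=0` would execute the commands instead of printing them. The fixed sleeps simply mirror the documented waits; remaining services (step 6) are left to be started manually afterwards.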