# Critical Services

**Last Updated:** 2026-01-25

Services that must remain operational for network functionality and security.

---

## Priority Levels

| Priority | Meaning | Recovery Target |
|----------|---------|-----------------|
| **P0** | Network offline without this | < 5 minutes |
| **P1** | Major functionality impacted | < 1 hour |

---

## P0 - Network Core

### DNS (AdGuard Home)

| Instance | Host | IP | Role |
|----------|------|-----|------|
| Primary | HAP1 | 172.17.0.5 | Main DNS, DoH/DoT/DoQ |
| Secondary | XTRM-U | 192.168.31.4 | Failover DNS |

**Endpoints:**
- DoH: `https://dns.xtrm-lab.org/dns-query`
- DoT: `tls://dns.xtrm-lab.org:853`
- DoQ: `quic://dns.xtrm-lab.org:8853`

**Config Sync:** adguardhome-sync (every 30 min)

**Upstream:** Quad9 DoH (`https://dns10.quad9.net/dns-query`)

**Recovery:**
1. If primary fails → clients use secondary (192.168.31.4)
2. Restart container on HAP1: `/container/start adguardhome`

---

### Routing (HAP1)

| Function | Details |
|----------|---------|
| WAN | 62.73.120.142 via Vivacom fiber |
| LAN | 192.168.31.0/24 |
| NAT | Port forwarding to XTRM-U |
| Firewall | RouterOS firewall rules |

**Recovery:**
1. Physical access to HAP1
2. Reset: hold reset button 5s
3. Reconfigure via WinBox or SSH

---

### DHCP (HAP1)

| Setting | Value |
|---------|-------|
| Dynamic pool | 192.168.31.100-200 |
| Lease time | 24 hours |

**Static Leases:** Managed in RouterOS DHCP server

---

## P1 - Authentication & Secrets

### Authentik (SSO)

| Component | IP | Purpose |
|-----------|-----|---------|
| authentik | 172.18.0.11 | Web UI + OIDC |
| authentik-worker | 172.18.0.12 | Background tasks |

**URL:** https://auth.xtrm-lab.org

**Protects:**
- Traefik forward auth (all `*.xtrm-lab.org`)
- Gitea OAuth
- Woodpecker OAuth
- NetBox OAuth
- NetDisco SSO

**Recovery:**
```bash
cd /mnt/user/appdata/authentik
docker compose up -d
```

**Database:** postgresql17 (authentik_db)

---

### Vaultwarden (Passwords)

| Component | IP | Purpose |
|-----------|-----|---------|
| vaultwarden | 172.18.0.15 | Password manager |

**URL:** https://vault.xtrm-lab.org

**Data:** `/mnt/user/appdata/vaultwarden/`

**Recovery:**
```bash
docker start vaultwarden
```

**Backup:** Part of Unraid flash backup

---

## P1 - Reverse Proxy

### Traefik

| Component | IP | Ports |
|-----------|-----|-------|
| traefik | 172.18.0.3 | 8001→80, 44301→443 |

**Config:** `/mnt/user/appdata/traefik/`
- `traefik.yml` - Static config
- `dynamic.yml` - Routers & services

**TLS:** Let's Encrypt wildcard for `*.xtrm-lab.org`

**Recovery:**
```bash
docker start traefik
```

---

## Shared Infrastructure

### PostgreSQL 17

| IP | Databases |
|----|-----------|
| 172.18.0.13 | authentik_db, netbox, gitea, netdisco_db, diode, hydra |

**Data:** `/mnt/user/appdata/postgresql17/`

**Recovery:**
```bash
docker start postgresql17
# Wait for DB to be ready before starting dependents
```

### Redis

| IP | Consumers |
|----|-----------|
| 172.18.0.14 | Authentik, NetBox, Diode |

**Recovery:**
```bash
docker start Redis
```

---

## Startup Order

When recovering from full outage:

1. **postgresql17** - Database (wait 30s)
2. **Redis** - Cache/queue (wait 10s)
3. **traefik** - Reverse proxy
4. **authentik** + **authentik-worker** - SSO
5. **vaultwarden** - Passwords
6. All other services

---

## Monitoring

### Uptime Kuma

| URL | Monitors |
|-----|----------|
| https://uptime.xtrm-lab.org | 27 services |

**Alerts:** Configured per service (email/webhook)

---

## Backup Strategy

| Data | Location | Frequency |
|------|----------|-----------|
| Unraid Flash | Google Drive | Daily |
| PostgreSQL | `/mnt/user/Backup/` | Daily |
| Vaultwarden | Unraid Flash | With flash backup |
| Authentik | PostgreSQL + `/mnt/user/appdata/authentik/` | Daily |

---

## Future: XTRM-N1 Survival Node

When hardware upgrade completes, these services will have replicas on XTRM-N1:

| Service | Primary | Replica |
|---------|---------|---------|
| DNS | HAP1 | XTRM-N1 |
| Vaultwarden | XTRM-N5 | XTRM-N1 |
| Authentik | XTRM-N5 | XTRM-N1 |

See: `wip/UPGRADE-2026-HARDWARE.md`
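
---

## Example: Scripted Startup Order

The sequence listed under Startup Order could be scripted roughly as follows. This is a minimal sketch, not existing tooling: the names `DRY_RUN`, `STARTUP_PLAN`, `run`, and `start_all` are made up for illustration. It assumes every container can be brought up with `docker start` under the names used in this document (the Authentik section uses `docker compose up -d` instead, so that step may need adjusting), and it uses the fixed waits from the list rather than a real readiness check.

```bash
#!/bin/sh
# Sketch: start the core containers in the documented order.
# Assumptions (not confirmed by this document): all containers already exist
# and respond to `docker start`; helper names below are illustrative only.
set -eu

# 1 = only print the commands (safe default), 0 = actually run them
DRY_RUN="${DRY_RUN:-1}"

# name:wait_seconds pairs, in the order given under "Startup Order"
STARTUP_PLAN="postgresql17:30 Redis:10 traefik:0 authentik:0 authentik-worker:0 vaultwarden:0"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_all() {
  for entry in $STARTUP_PLAN; do
    name="${entry%%:*}"     # text before the colon
    wait_s="${entry##*:}"   # text after the colon
    run docker start "$name"
    # Fixed pause from the runbook; a real readiness probe would be more robust.
    if [ "$wait_s" -gt 0 ]; then
      run sleep "$wait_s"
    fi
  done
}

start_all
```

Run as-is, the script only prints the planned commands; setting `DRY_RUN=0` in the environment makes it execute them. Step 6 ("All other services") is left out on purpose, since that set is not enumerated here.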