# Failover Infrastructure - Nobara (XTRM-Nobara)

**Last Updated:** 2026-02-13

**Purpose:** Temporary failover for critical services during Unraid maintenance windows.

---

## Overview

A Docker-based replica of critical services runs on the Nobara Linux workstation (XTRM-Nobara) with automatic failover via Keepalived VRRP. When Unraid goes offline, the virtual IP floats to Nobara and services continue operating.

```
Clients → 192.168.10.250 (VIP) → XTRM-U (MASTER, priority 150)
                                   ↓ failover (~4 seconds)
                                 XTRM-Nobara (BACKUP, priority 100)
```

---

## Machines

| Role | Host | IP | Interface | Priority |
|------|------|-----|-----------|----------|
| **MASTER** | XTRM-U (Unraid) | 192.168.10.20 | br0 | 150 |
| **BACKUP** | XTRM-Nobara | 192.168.10.103 | enp5s0 | 100 |
| **VIP** | Shared | 192.168.10.250 | - | - |

---

## Replicated Services

| Service | Image | Ports (Nobara) | Domain |
|---------|-------|----------------|--------|
| **Traefik** | traefik:latest | 80, 443, 8080 | *.xtrm-lab.org |
| **Vaultwarden** | vaultwarden/server:latest | internal:80 | vault.xtrm-lab.org |
| **Authentik** | ghcr.io/goauthentik/server:2025.8.1 | internal:9000 | auth.xtrm-lab.org |
| **Authentik Worker** | ghcr.io/goauthentik/server:2025.8.1 | - | - |
| **PostgreSQL** | postgres:17 | internal:5432 | - |
| **Redis** | redis:7-alpine | internal:6379 | - |
| **AdGuard Home** | adguard/adguardhome:latest | 192.168.10.103:53, 3000 | - |

---

## File Locations

### Nobara (XTRM-Nobara)

| Path | Contents |
|------|----------|
| `/home/failover/docker-compose.yml` | Main compose stack |
| `/home/failover/traefik/` | Traefik config, certs, acme.json |
| `/home/failover/vaultwarden/` | Vaultwarden data (copy from Unraid) |
| `/home/failover/authentik/` | Authentik media & templates |
| `/home/failover/postgres/` | PostgreSQL data + initial dump |
| `/home/failover/redis/` | Redis data |
| `/home/failover/adguard/` | AdGuard conf & work dirs |
| `/etc/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/usr/local/bin/check_failover.sh` | Health check script |
| `/usr/local/bin/failover-notify.sh` | State change notification script |
| `/var/log/keepalived-failover.log` | Failover event log |

### Unraid (XTRM-U)

| Path | Contents |
|------|----------|
| `/mnt/user/appdata/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/mnt/user/appdata/keepalived/check_services.sh` | Health check script |

---

## Keepalived Configuration

### VRRP Parameters

| Parameter | Value |
|-----------|-------|
| Virtual Router ID | 51 |
| Auth Type | PASS |
| Auth Password | xtrm2026 |
| Advertisement Interval | 1 second |
| Health Check Interval | 5 seconds |
| Fail Threshold | 3 missed checks |
| Recovery Threshold | 2 successful checks |

### Unraid (MASTER)

- Runs as Docker container: `local/keepalived` (built from alpine + keepalived + curl)
- Priority: 150 (+ health check weight 2 = 152 when healthy)
- Health check: curls `http://localhost:8183/api/overview` (Traefik dashboard)
- Preemption: enabled (will reclaim VIP from Nobara when healthy)

```bash
# Start/stop on Unraid
docker start keepalived
docker stop keepalived
docker logs keepalived
```

### Nobara (BACKUP)

- Runs as systemd service: `keepalived.service`
- Priority: 100 (+ health check weight 2 = 102 when healthy)
- Health check: verifies Traefik and Vaultwarden containers are running
- `nopreempt` set (won't fight for VIP if Unraid is healthy)

```bash
# Start/stop on Nobara
sudo systemctl start keepalived
sudo systemctl stop keepalived
sudo journalctl -u keepalived -f
```

---

## DNS Strategy

**Approach:** Local DNS override via AdGuard Home.

To route traffic through the VIP for internal clients, configure AdGuard DNS rewrite rules to resolve `*.xtrm-lab.org` → `192.168.10.250`. External (Cloudflare) DNS remains pointed at Unraid's public IP.

---

## Operations

### Before Maintenance (Data Sync)

Run these commands from the Mac to sync the latest data to Nobara:

```bash
# 1. Sync Vaultwarden data
ssh unraid "tar czf - -C /mnt/user/appdata vaultwarden/" | \
  ssh nobara "tar xzf - -C /home/failover/"

# 2. Dump and sync Authentik database
ssh unraid "docker exec postgresql17 pg_dump -U authentik_user authentik_db" | \
  ssh nobara "cat > /home/failover/postgres/authentik_dump.sql"

# 3. Sync AdGuard config
ssh unraid "tar czf - -C /mnt/user/appdata/adguardhome conf/ work/" | \
  ssh nobara "tar xzf - -C /home/failover/adguard/"

# 4. Sync Traefik config and certs
ssh unraid "tar czf - -C /mnt/user/appdata/traefik traefik.yml dynamic.yml acme.json certs/" | \
  ssh nobara "tar xzf - -C /home/failover/traefik/"
```

**Note:** `ssh unraid` = `ssh -i ~/.ssh/id_ed25519_unraid -p 422 root@192.168.10.20`

### Start Failover Services

```bash
# On Nobara
cd /home/failover
sudo docker compose up -d
sudo systemctl start keepalived
```

### Stop Failover Services

```bash
# On Nobara
cd /home/failover
sudo docker compose down
sudo systemctl stop keepalived
```

### Test Failover

```bash
# 1. Check VIP location
ssh unraid "ip addr show br0 | grep inet"
ssh nobara "ip addr show enp5s0 | grep inet"

# 2. Simulate Unraid failure
ssh unraid "docker stop keepalived"

# 3. Verify VIP moved to Nobara (wait ~4 seconds)
ssh nobara "ip addr show enp5s0 | grep inet"

# 4. Restore Unraid
ssh unraid "docker start keepalived"

# 5. Verify VIP returned to Unraid
ssh unraid "ip addr show br0 | grep inet"
```

### Check Status

```bash
# Nobara service status
ssh nobara "sudo docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Nobara keepalived state
ssh nobara "sudo journalctl -u keepalived -n 10 --no-pager"

# Unraid keepalived state
ssh unraid "docker logs keepalived --tail 10"

# Is the VIP answering? (ping shows reachability, not which host holds it)
ping -c 1 192.168.10.250

# Which machine holds the VIP? The holder lists 192.168.10.250 on its interface
ssh unraid "ip addr show br0 | grep -F 192.168.10.250"
ssh nobara "ip addr show enp5s0 | grep -F 192.168.10.250"
```

---

## Traefik Configuration (Failover)

The Nobara Traefik instance has a **reduced** dynamic.yml that serves only the critical services:

| Router | Domain | Backend |
|--------|--------|---------|
| vaultwarden-secure | vault.xtrm-lab.org | http://vaultwarden:80 |
| authentik-secure | auth.xtrm-lab.org | http://authentik:9000 |
| traefik-secure | traefik.xtrm-lab.org | api@internal |

TLS certificates are shared (copied from Unraid's acme.json + static certs).

---

## Limitations

- **Data is a point-in-time snapshot.** Changes made on Unraid after the last sync are not reflected on Nobara. Re-sync before maintenance.
- **No real-time replication.** Vaultwarden passwords saved during failover will not sync back to Unraid automatically.
- **Only critical services replicated.** Other services (Plex, Gitea, NetBox, etc.) will be offline during maintenance.
- **External DNS not updated.** Failover only works for clients using the local DNS (AdGuard) that resolves to the VIP. External access via Cloudflare will not fail over.

---

## SSH Access

```bash
# From Mac to Nobara (passwordless, key-based)
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103

# Sudo on Nobara requires password: (check password manager)
```

---

## Recovery After Maintenance

1. Bring Unraid back online
2. Verify all Unraid services are running: `docker ps`
3. Keepalived on Unraid will auto-reclaim VIP (preemption)
4. Stop failover on Nobara: `cd /home/failover && sudo docker compose down`
5. If Vaultwarden was used during failover, manually export/import any new entries

---

## Architecture Diagram

```
                 ┌─────────────────────┐
                 │   192.168.10.250    │
                 │     (VRRP VIP)      │
                 └──────────┬──────────┘
                            │
             ┌──────────────┼──────────────┐
             │                             │
  ┌──────────▼───────────┐      ┌──────────▼───────────┐
  │ XTRM-U (Unraid)      │      │ XTRM-Nobara          │
  │ 192.168.10.20        │      │ 192.168.10.103       │
  │ MASTER (150)         │      │ BACKUP (100)         │
  │                      │      │                      │
  │ ┌──────────────┐     │      │ ┌──────────────┐     │
  │ │ Traefik      │     │      │ │ Traefik      │     │
  │ │ Vaultwarden  │     │      │ │ Vaultwarden  │     │
  │ │ Authentik    │     │      │ │ Authentik    │     │
  │ │ AdGuard      │     │      │ │ AdGuard      │     │
  │ │ + 25 more    │     │      │ │ PostgreSQL   │     │
  │ └──────────────┘     │      │ │ Redis        │     │
  │                      │      │ └──────────────┘     │
  │ Keepalived (Docker)  │      │ Keepalived (systemd) │
  └──────────────────────┘      └──────────────────────┘
```
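
The priorities in the diagram and the "~4 seconds" failover figure can be checked with a little VRRP arithmetic. The sketch below is illustrative only: the helper names `effective_priority` and `master_down_interval` are hypothetical, and it assumes keepalived's usual positive-weight behavior (the weight is added only while the health check passes) and the standard VRRP timing rule (a backup takes over after 3 × advertisement interval + (256 − priority) / 256 seconds of silence).

```bash
#!/usr/bin/env bash
# Sketch: VRRP arithmetic for the values documented above.
# Assumptions (not taken from the actual keepalived.conf files):
#   - a positive track-script weight is added only while the check succeeds
#   - standard VRRP timing: master-down = 3 * advert + (256 - priority) / 256

# effective_priority BASE WEIGHT HEALTHY   (HEALTHY is 1 or 0)
effective_priority() {
  local base=$1 weight=$2 healthy=$3
  if [ "$healthy" -eq 1 ]; then
    echo $((base + weight))
  else
    echo "$base"
  fi
}

# master_down_interval ADVERT_SECONDS BACKUP_PRIORITY
# Prints the seconds of missed advertisements before the backup takes over.
master_down_interval() {
  LC_ALL=C awk -v a="$1" -v p="$2" \
    'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 }'
}

echo "Unraid, check passing: $(effective_priority 150 2 1)"
echo "Unraid, check failing: $(effective_priority 150 2 0)"
echo "Nobara, check passing: $(effective_priority 100 2 1)"
echo "Nobara takeover delay: $(master_down_interval 1 102) s"
```

With a 1 second advertisement interval, the takeover delay comes out at roughly 3.6 seconds, which is consistent with the ~4 seconds quoted in the Overview. Under the same assumptions, a failing health check on Unraid alone leaves it at 150, still above Nobara's 102, so the VIP would be expected to move only when Unraid stops advertising (host offline or `docker stop keepalived`), as in the Test Failover steps.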