Deployed automatic failover for critical services (Traefik, Vaultwarden, Authentik, AdGuard) from Unraid to Nobara workstation via Keepalived VRRP with VIP 192.168.10.250. ~4 second failover time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Failover Infrastructure - Nobara (XTRM-Nobara)
Last Updated: 2026-02-13
Purpose: Temporary failover for critical services during Unraid maintenance windows.
Overview
A Docker-based replica of critical services runs on the Nobara Linux workstation (XTRM-Nobara) with automatic failover via Keepalived VRRP. When Unraid goes offline, the virtual IP floats to Nobara and services continue operating.
Clients → 192.168.10.250 (VIP) → XTRM-U (MASTER, priority 150)
                                     ↓ failover (~4 seconds)
                                 XTRM-Nobara (BACKUP, priority 100)
Machines
| Role | Host | IP | Interface | Priority |
|---|---|---|---|---|
| MASTER | XTRM-U (Unraid) | 192.168.10.20 | br0 | 150 |
| BACKUP | XTRM-Nobara | 192.168.10.103 | enp5s0 | 100 |
| VIP | Shared | 192.168.10.250 | — | — |
Replicated Services
| Service | Image | Ports (Nobara) | Domain |
|---|---|---|---|
| Traefik | traefik:latest | 80, 443, 8080 | *.xtrm-lab.org |
| Vaultwarden | vaultwarden/server:latest | internal:80 | vault.xtrm-lab.org |
| Authentik | ghcr.io/goauthentik/server:2025.8.1 | internal:9000 | auth.xtrm-lab.org |
| Authentik Worker | ghcr.io/goauthentik/server:2025.8.1 | — | — |
| PostgreSQL | postgres:17 | internal:5432 | — |
| Redis | redis:7-alpine | internal:6379 | — |
| AdGuard Home | adguard/adguardhome:latest | 192.168.10.103:53, 3000 | — |
File Locations
Nobara (XTRM-Nobara)
| Path | Contents |
|---|---|
| `/home/failover/docker-compose.yml` | Main compose stack |
| `/home/failover/traefik/` | Traefik config, certs, acme.json |
| `/home/failover/vaultwarden/` | Vaultwarden data (copy from Unraid) |
| `/home/failover/authentik/` | Authentik media & templates |
| `/home/failover/postgres/` | PostgreSQL data + initial dump |
| `/home/failover/redis/` | Redis data |
| `/home/failover/adguard/` | AdGuard conf & work dirs |
| `/etc/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/usr/local/bin/check_failover.sh` | Health check script |
| `/usr/local/bin/failover-notify.sh` | State change notification script |
| `/var/log/keepalived-failover.log` | Failover event log |
Unraid (XTRM-U)
| Path | Contents |
|---|---|
| `/mnt/user/appdata/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/mnt/user/appdata/keepalived/check_services.sh` | Health check script |
Keepalived Configuration
VRRP Parameters
| Parameter | Value |
|---|---|
| Virtual Router ID | 51 |
| Auth Type | PASS |
| Auth Password | xtrm2026 |
| Advertisement Interval | 1 second |
| Health Check Interval | 5 seconds |
| Fail Threshold | 3 missed checks |
| Recovery Threshold | 2 successful checks |
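The ~4 second failover figure is consistent with these timers. As a rough sketch, assuming keepalived runs VRRPv2 and uses the standard formula (Master_Down_Interval = 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 − priority) / 256 seconds), the arithmetic can be checked in integer milliseconds. The function names are illustrative and are not part of the deployed scripts.

```shell
#!/bin/sh
# Sketch: VRRP timing arithmetic in integer milliseconds.
# Function names are hypothetical; input values come from the parameter table.

# Master_Down_Interval = 3 * advert_interval + skew_time
# skew_time = (256 - priority) / 256 seconds (VRRPv2 formula, assumed here)
master_down_ms() {
    priority=$1
    advert_s=$2
    skew_ms=$(( (256 - priority) * 1000 / 256 ))
    echo $(( 3 * advert_s * 1000 + skew_ms ))
}

# Time until a health check is marked failed: interval * fail threshold
check_fail_s() {
    interval_s=$1
    fall=$2
    echo $(( interval_s * fall ))
}
```

With the values above, `master_down_ms 102 1` prints `3601` (about 3.6 s before the backup takes over once advertisements stop), and `check_fail_s 5 3` prints `15` (seconds before a failing health check is marked as failed).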
Unraid (MASTER)
- Runs as Docker container: `local/keepalived` (built from alpine + keepalived + curl)
- Priority: 150 (+ health check weight 2 = 152 when healthy)
- Health check: curls `http://localhost:8183/api/overview` (Traefik dashboard)
- Preemption: enabled (will reclaim VIP from Nobara when healthy)
# Start/stop on Unraid
docker start keepalived
docker stop keepalived
docker logs keepalived
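The Traefik health check described above could look roughly like the following. This is a minimal sketch, not the deployed `check_services.sh` (its contents are not shown in this document); the function name and the 3 second timeout are assumptions.

```shell
#!/bin/sh
# Sketch of a keepalived health check for the Unraid node.
# The real check_services.sh is not shown here; names and timeout are assumed.

# Exit 0 if the endpoint answers with a successful HTTP status within 3 s.
check_traefik() {
    url=${1:-http://localhost:8183/api/overview}
    curl --fail --silent --max-time 3 --output /dev/null "$url"
}
```

A script that ends by calling `check_traefik` passes its exit status back to keepalived, which treats a non-zero exit as a failed check.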
Nobara (BACKUP)
- Runs as systemd service: `keepalived.service`
- Priority: 100 (+ health check weight 2 = 102 when healthy)
- Health check: verifies Traefik and Vaultwarden containers are running
- `nopreempt` set (won't fight for VIP if Unraid is healthy)
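A container-based check of this kind might be sketched as follows. The container names `traefik` and `vaultwarden` are assumptions, and the deployed `check_failover.sh` may work differently.

```shell
#!/bin/sh
# Sketch of a container-based health check for the Nobara node.
# Container names are assumptions; the real check_failover.sh is not shown.

# Exit 0 only if every named container reports State.Running == true.
containers_running() {
    for name in "$@"; do
        state=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null) || state=missing
        [ "$state" = "true" ] || return 1
    done
    return 0
}
```

Calling `containers_running traefik vaultwarden` as the last command of the script would report failure as soon as either container is stopped or missing.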
# Start/stop on Nobara
sudo systemctl start keepalived
sudo systemctl stop keepalived
sudo journalctl -u keepalived -f
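The priority arithmetic for both nodes can be worked through with a small helper. This sketch assumes keepalived's usual behaviour for a positive health check weight: the weight is added to the base priority only while the check passes. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: effective VRRP priority with a positive health check weight.
# Assumes the weight is added only while the check passes.

effective_priority() {
    base=$1
    weight=$2
    healthy=$3   # "yes" or "no"
    if [ "$healthy" = "yes" ]; then
        echo $(( base + weight ))
    else
        echo "$base"
    fi
}
```

With the values documented here, `effective_priority 150 2 no` prints `150`, which is still higher than Nobara's healthy `102`. Under that assumption a failed Traefik check on Unraid would not by itself move the VIP; the VIP moves when Unraid stops sending advertisements (host down or keepalived stopped). If failover on a failed check is intended, this is worth verifying against the live configuration.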
DNS Strategy
Approach: Local DNS override via AdGuard Home.
To route traffic through the VIP for internal clients, configure AdGuard DNS rewrite rules to resolve *.xtrm-lab.org → 192.168.10.250. External (Cloudflare) DNS remains pointed at Unraid's public IP.
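To confirm that a rewrite is in effect, a hostname can be resolved against a specific AdGuard instance and compared with the VIP. A minimal sketch, assuming `dig` is installed on the client; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm a hostname resolves to the expected address
# when queried through a specific resolver.

resolves_to() {
    name=$1
    expected=$2
    resolver=$3
    # -x matches the whole line, -F treats the address as a fixed string
    dig +short "$name" "@$resolver" | grep -qxF "$expected"
}
```

For example, `resolves_to vault.xtrm-lab.org 192.168.10.250 192.168.10.103` queries the AdGuard instance on Nobara and succeeds only if the answer is the VIP.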
Operations
Before Maintenance (Data Sync)
Run these commands from the Mac to sync latest data to Nobara:
# 1. Sync Vaultwarden data
ssh unraid "tar czf - -C /mnt/user/appdata vaultwarden/" | \
ssh nobara "tar xzf - -C /home/failover/"
# 2. Dump and sync Authentik database
ssh unraid "docker exec postgresql17 pg_dump -U authentik_user authentik_db" | \
ssh nobara "cat > /home/failover/postgres/authentik_dump.sql"
# 3. Sync AdGuard config
ssh unraid "tar czf - -C /mnt/user/appdata/adguardhome conf/ work/" | \
ssh nobara "tar xzf - -C /home/failover/adguard/"
# 4. Sync Traefik config and certs
ssh unraid "tar czf - -C /mnt/user/appdata/traefik traefik.yml dynamic.yml acme.json certs/" | \
ssh nobara "tar xzf - -C /home/failover/traefik/"
Note: `ssh unraid` is shorthand for `ssh -i ~/.ssh/id_ed25519_unraid -p 422 root@192.168.10.20`.
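Because each sync step is a pipeline across two SSH sessions, a failure on the Unraid side can leave a truncated file on Nobara without an obvious error. As a hedged sketch, the database dump can be sanity-checked after transfer: plain-format `pg_dump` output normally ends with a `PostgreSQL database dump complete` comment. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: sanity-check a plain-format pg_dump file after transfer.
# A truncated dump will usually lack the trailing "dump complete" comment.

dump_looks_complete() {
    file=$1
    # Reject missing or empty files first
    [ -s "$file" ] || return 1
    tail -n 10 "$file" | grep -q 'PostgreSQL database dump complete'
}
```

Run on Nobara, `dump_looks_complete /home/failover/postgres/authentik_dump.sql` returns non-zero if the file is empty or appears cut off.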
Start Failover Services
# On Nobara
cd /home/failover
sudo docker compose up -d
sudo systemctl start keepalived
Stop Failover Services
# On Nobara
cd /home/failover
sudo docker compose down
sudo systemctl stop keepalived
Test Failover
# 1. Check VIP location
ssh unraid "ip addr show br0 | grep inet"
ssh nobara "ip addr show enp5s0 | grep inet"
# 2. Simulate Unraid failure
ssh unraid "docker stop keepalived"
# 3. Verify VIP moved to Nobara (wait ~4 seconds)
ssh nobara "ip addr show enp5s0 | grep inet"
# 4. Restore Unraid
ssh unraid "docker start keepalived"
# 5. Verify VIP returned to Unraid
ssh unraid "ip addr show br0 | grep inet"
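Steps 3 and 5 rely on waiting a few seconds and checking by hand. A small polling helper can make the wait explicit. This is a sketch only: the function name is hypothetical, and each poll that goes through SSH adds its own latency, so the reported time is a rough upper bound rather than a precise measurement.

```shell
#!/bin/sh
# Sketch: poll a command once per second until it succeeds or a timeout expires.
# Prints the number of whole seconds waited; returns 1 on timeout.

wait_until() {
    timeout_s=$1
    shift
    waited=0
    until "$@"; do
        [ "$waited" -ge "$timeout_s" ] && return 1
        sleep 1
        waited=$(( waited + 1 ))
    done
    echo "$waited"
}
```

For example, after stopping keepalived on Unraid: `wait_until 15 sh -c 'ssh nobara "ip -4 -o addr show enp5s0" | grep -qF "inet 192.168.10.250/"'`.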
Check Status
# Nobara service status
ssh nobara "sudo docker ps --format 'table {{.Names}}\t{{.Status}}'"
# Nobara keepalived state
ssh nobara "sudo journalctl -u keepalived -n 10 --no-pager"
# Unraid keepalived state
ssh unraid "docker logs keepalived --tail 10"
# Is the VIP reachable? (ping alone does not show which machine holds it; see Test Failover, step 1)
ping -c 1 192.168.10.250
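To answer which machine holds the VIP, the interface addresses on each host can be inspected directly. A minimal sketch, assuming the `ip` tool on both hosts supports the `-4 -o` options; the function name is illustrative.

```shell
#!/bin/sh
# Sketch: decide from `ip -4 -o addr show` output whether a host holds the VIP.
# Reads the command output on stdin.

has_vip() {
    vip=$1
    # Match "inet <vip>/" so that other addresses on the interface do not match
    grep -qF "inet $vip/"
}
```

Usage: `ssh nobara "ip -4 -o addr show enp5s0" | has_vip 192.168.10.250 && echo "VIP is on Nobara"`, and the same against `br0` on Unraid.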
Traefik Configuration (Failover)
The Nobara Traefik instance uses a reduced dynamic.yml that defines only the three routers below. AdGuard Home, the fourth critical service, has no Traefik domain (see Replicated Services):
| Router | Domain | Backend |
|---|---|---|
| vaultwarden-secure | vault.xtrm-lab.org | http://vaultwarden:80 |
| authentik-secure | auth.xtrm-lab.org | http://authentik:9000 |
| traefik-secure | traefik.xtrm-lab.org | api@internal |
TLS certificates are shared (copied from Unraid's acme.json + static certs).
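To check that a router and its certificate work through the VIP regardless of what DNS currently returns, a request can be pinned to the VIP with curl's `--resolve` option. A hedged sketch; the function name and the 5 second timeout are assumptions.

```shell
#!/bin/sh
# Sketch: request a hostname through the VIP regardless of DNS.
# Prints the HTTP status code; --resolve pins host:443 to the VIP.

probe_via_vip() {
    host=$1
    vip=${2:-192.168.10.250}
    curl --silent --max-time 5 --output /dev/null \
         --resolve "$host:443:$vip" \
         --write-out '%{http_code}\n' "https://$host/"
}
```

For example, `probe_via_vip vault.xtrm-lab.org` prints the status code returned by whichever Traefik instance currently holds the VIP. Because curl verifies the certificate by default, a TLS mismatch shows up as a failed request rather than a normal status code.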
Limitations
- Data is a point-in-time snapshot. Changes made on Unraid after the last sync are not reflected on Nobara. Re-sync before maintenance.
- No real-time replication. Vaultwarden passwords saved during failover will not sync back to Unraid automatically.
- Only critical services replicated. Other services (Plex, Gitea, NetBox, etc.) will be offline during maintenance.
- External DNS not updated. Failover only works for clients that use the local DNS (AdGuard), which resolves to the VIP. External access via Cloudflare will not fail over.
SSH Access
# From Mac to Nobara (passwordless, key-based)
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103
# Sudo on Nobara requires password: (check password manager)
Recovery After Maintenance
- Bring Unraid back online
- Verify all Unraid services are running: `docker ps`
- Keepalived on Unraid will auto-reclaim VIP (preemption)
- Stop failover on Nobara: `cd /home/failover && sudo docker compose down`
- If Vaultwarden was used during failover, manually export/import any new entries
Architecture Diagram
               ┌─────────────────────┐
               │   192.168.10.250    │
               │     (VRRP VIP)      │
               └──────────┬──────────┘
                          │
            ┌─────────────┴─────────────┐
            │                           │
┌───────────▼──────────┐    ┌───────────▼──────────┐
│ XTRM-U (Unraid)      │    │ XTRM-Nobara          │
│ 192.168.10.20        │    │ 192.168.10.103       │
│ MASTER (150)         │    │ BACKUP (100)         │
│                      │    │                      │
│   ┌──────────────┐   │    │   ┌──────────────┐   │
│   │ Traefik      │   │    │   │ Traefik      │   │
│   │ Vaultwarden  │   │    │   │ Vaultwarden  │   │
│   │ Authentik    │   │    │   │ Authentik    │   │
│   │ AdGuard      │   │    │   │ AdGuard      │   │
│   │ + 25 more    │   │    │   │ PostgreSQL   │   │
│   └──────────────┘   │    │   │ Redis        │   │
│                      │    │   └──────────────┘   │
│ Keepalived (Docker)  │    │ Keepalived (systemd) │
└──────────────────────┘    └──────────────────────┘