DNS Architecture with AdGuard Failover

Created: 2026-01-31
Updated: 2026-01-31
Status: Implemented
Backup: dns-dual-failover-2026-01-31.backup


Overview

Dual AdGuard DNS setup with automatic failover. DNS queries from every VLAN are filtered through AdGuard for ad blocking. If the primary instance (a container on the MikroTik router) fails, the router's NAT redirect rules automatically switch to the secondary instance (Unraid).


Architecture

                            ┌─────────────────────────────────────┐
                            │           INTERNET                   │
                            │                                      │
                            │   External clients (DoT/DoH)         │
                            │   dns.xtrm-lab.org:853 (DoT)        │
                            │   dns.xtrm-lab.org:8443 (DoH)       │
                            └──────────────┬──────────────────────┘
                                           │
                                           ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        MikroTik hAP ax³ (192.168.10.1)                       │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    AdGuard Home (PRIMARY)                              │  │
│  │                    Container: 172.17.0.2                               │  │
│  │                    Web UI: http://192.168.10.1:3000                    │  │
│  │                                                                        │  │
│  │    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐            │  │
│  │    │  Filters    │     │  Blocklists │     │   Clients   │            │  │
│  │    │  (synced)   │     │  143K rules │     │  (synced)   │            │  │
│  │    └─────────────┘     └─────────────┘     └─────────────┘            │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                    Netwatch monitors every 10s                               │
│                                    │                                         │
│                         ┌─────────┴─────────┐                               │
│                         │                   │                               │
│                    Container UP        Container DOWN                        │
│                         │                   │                               │
│                         ▼                   ▼                               │
│                  NAT → 172.17.0.2    NAT → 192.168.10.10                    │
│                  (MikroTik)          (Unraid Failover)                      │
└──────────────────────────────────────────────────────────────────────────────┘
        ▲                            ▲                            ▲
        │                            │                            │
   NAT Redirect                 NAT Redirect                 NAT Redirect
        │                            │                            │
┌───────┴───────┐          ┌────────┴────────┐          ┌────────┴────────┐
│   VLAN 10     │          │    VLAN 20/25   │          │   VLAN 30/40    │
│  Management   │          │  Trusted/Kids   │          │   IoT/CatchAll  │
│ 192.168.10.x  │          │  192.168.20.x   │          │  192.168.30.x   │
│               │          │  192.168.25.x   │          │  192.168.1.x    │
└───────────────┘          └─────────────────┘          └─────────────────┘

AdGuard Instances

| Instance | Role | IP | Port | Web UI |
|---|---|---|---|---|
| MikroTik | Primary | 172.17.0.2 | 53 | http://192.168.10.1:3000 |
| Unraid | Secondary/Failover | 192.168.10.10 | 3000 | http://192.168.10.10:3000 |

Credentials (Same for Both)

Username Password
jazzymc 7RqWElENNbZnPW

DNS Redirect Rules

All plain DNS queries (UDP port 53) from any VLAN are intercepted and redirected:

| VLAN | Subnet | Redirected To |
|---|---|---|
| 10 | 192.168.10.0/24 | 172.17.0.2:53 |
| 20 | 192.168.20.0/24 | 172.17.0.2:53 |
| 25 | 192.168.25.0/24 | 172.17.0.2:53 |
| 30 | 192.168.30.0/24 | 172.17.0.2:53 |
| 40 | 192.168.1.0/24 | 172.17.0.2:53 |

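Note: Clients need no DNS configuration. Even if a client is hard-coded to 8.8.8.8, its port 53 queries are intercepted by NAT. The redirect rules match protocol=udp only, so DNS over TCP, DoT, and DoH are not redirected by them.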

NAT Rules on MikroTik

# Exception rules (prevent loops) - MUST BE FIRST
/ip firewall nat
add chain=dstnat action=accept protocol=udp src-address=172.17.0.0/24 dst-port=53 comment="[DNS] Allow MikroTik AdGuard outbound"
add chain=dstnat action=accept protocol=udp src-address=192.168.10.10 dst-port=53 comment="[DNS] Allow Unraid AdGuard outbound"

# VLAN redirect rules
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.10.0/24 dst-port=53 comment="[DNS] VLAN10 Mgmt redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.20.0/24 dst-port=53 comment="[DNS] VLAN20 Trusted redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.25.0/24 dst-port=53 comment="[DNS] VLAN25 Kids redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.30.0/24 dst-port=53 comment="[DNS] VLAN30 IoT redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.1.0/24 dst-port=53 comment="[DNS] VLAN40 CatchAll redirect"

# Masquerade for return traffic
add chain=srcnat action=masquerade protocol=udp src-address=192.168.10.0/24 dst-address=172.17.0.2 dst-port=53 comment="[DNS] VLAN10 masquerade"
# ... (similar for other VLANs)
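
The per-VLAN rules above all follow one pattern, so they can be rendered from a single table. The following is a minimal Python sketch of that idea, not part of the deployed setup. The helper names (`redirect_rule`, `masquerade_rule`, `render_rules`) are hypothetical, and the masquerade rules for VLANs other than VLAN10 are an assumption that simply repeats the VLAN10 pattern shown above.

```python
# Hypothetical generator for the RouterOS NAT commands documented above.
# VLAN labels and subnets are copied from the redirect rules in this document.
VLANS = [
    ("VLAN10 Mgmt", "192.168.10.0/24"),
    ("VLAN20 Trusted", "192.168.20.0/24"),
    ("VLAN25 Kids", "192.168.25.0/24"),
    ("VLAN30 IoT", "192.168.30.0/24"),
    ("VLAN40 CatchAll", "192.168.1.0/24"),
]
ADGUARD = "172.17.0.2"


def redirect_rule(label, subnet, target=ADGUARD, port=53):
    """Render one dst-nat redirect rule in the same form as the documented rules."""
    return (
        f"add chain=dstnat action=dst-nat to-addresses={target} to-ports={port} "
        f"protocol=udp src-address={subnet} dst-port=53 "
        f'comment="[DNS] {label} redirect"'
    )


def masquerade_rule(label, subnet, target=ADGUARD):
    """Render one masquerade rule.

    Assumption: every VLAN uses the VLAN10 pattern, with only the
    short VLAN id (e.g. "VLAN20") in the comment.
    """
    short = label.split()[0]
    return (
        f"add chain=srcnat action=masquerade protocol=udp src-address={subnet} "
        f"dst-address={target} dst-port=53 "
        f'comment="[DNS] {short} masquerade"'
    )


def render_rules():
    """Return the command lines: menu path, then redirects, then masquerades."""
    lines = ["/ip firewall nat"]
    lines += [redirect_rule(label, subnet) for label, subnet in VLANS]
    lines += [masquerade_rule(label, subnet) for label, subnet in VLANS]
    return lines


if __name__ == "__main__":
    print("\n".join(render_rules()))
```

Generating the rules from one table keeps the comments consistent, which matters because the failover scripts select rules by comment text.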

Automatic Failover

How It Works (Dual Health Check)

Two independent Netwatch monitors trigger failover:

| Monitor | Type | What It Checks | Interval | Timeout |
|---|---|---|---|---|
| Ping | simple | Container reachable | 10s | 3s |
| DNS | dns | DNS queries work | 30s | 10s |

Either monitor failing triggers failover to Unraid.

Failure Scenarios Covered

| Scenario | Ping Check | DNS Check | Failover? |
|---|---|---|---|
| Container crashed | ✗ Fail | ✗ Fail | Yes |
| Container stopped | ✗ Fail | ✗ Fail | Yes |
| Network/routing issue | ✗ Fail | ✗ Fail | Yes |
| Upstream DNS unreachable | ✓ Pass | ✗ Fail | Yes |
| AdGuard overloaded | ✓ Pass | ✗ Fail | Yes |
| Everything working | ✓ Pass | ✓ Pass | No |
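
The scenario table reduces to one rule: fail over unless both checks pass. The Python sketch below encodes that intended policy and checks it against every row. It models the decision only, not how Netwatch fires scripts on state changes, and the names `Health` and `should_fail_over` are illustrative.

```python
from typing import NamedTuple


class Health(NamedTuple):
    """Result of the two health checks for the primary AdGuard instance."""
    ping_ok: bool
    dns_ok: bool


def should_fail_over(health: Health) -> bool:
    """Fail over to the secondary unless both checks pass."""
    return not (health.ping_ok and health.dns_ok)


# Rows copied from the "Failure Scenarios Covered" table:
# scenario -> (check results, expected failover decision).
SCENARIOS = {
    "Container crashed": (Health(False, False), True),
    "Container stopped": (Health(False, False), True),
    "Network/routing issue": (Health(False, False), True),
    "Upstream DNS unreachable": (Health(True, False), True),
    "AdGuard overloaded": (Health(True, False), True),
    "Everything working": (Health(True, True), False),
}

if __name__ == "__main__":
    for name, (health, expected) in SCENARIOS.items():
        decision = should_fail_over(health)
        status = "ok" if decision == expected else "MISMATCH"
        print(f"{name}: failover={decision} ({status})")
```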

Failover Timeline

| Event | Detection Time | Total Switchover |
|---|---|---|
| Container crash (ping) | ~10-13 seconds | ~13-16 seconds |
| DNS failure (resolution) | ~30-40 seconds | ~33-43 seconds |
| Recovery | ~10-30 seconds | Automatic |
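
The detection figures in the timeline follow from the Netwatch settings: a failure is noticed between one probe interval and one interval plus the probe timeout later. The sketch below reproduces that arithmetic in Python. The fixed 3-second switchover overhead is an assumption inferred from the difference between the two table columns, not a measured value.

```python
# Assumption: inferred from the table (13-16 s vs 10-13 s, 33-43 s vs 30-40 s).
SWITCH_OVERHEAD_S = 3


def detection_window(interval_s: int, timeout_s: int) -> tuple[int, int]:
    """Return (low, high) detection time in seconds for one monitor.

    Low end: the failure is caught after roughly one interval.
    High end: one interval plus the full probe timeout.
    """
    return (interval_s, interval_s + timeout_s)


def switchover_window(interval_s: int, timeout_s: int) -> tuple[int, int]:
    """Detection window plus the assumed fixed switchover overhead."""
    low, high = detection_window(interval_s, timeout_s)
    return (low + SWITCH_OVERHEAD_S, high + SWITCH_OVERHEAD_S)


if __name__ == "__main__":
    for name, interval, timeout in [("ping", 10, 3), ("dns", 30, 10)]:
        print(name, detection_window(interval, timeout), switchover_window(interval, timeout))
```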

Failover Scripts

# dns-failover-down (runs when either check fails)
/system script add name=dns-failover-down dont-require-permissions=yes source={
    :log warning "DNS Failover: Switching to Unraid"
    /ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=192.168.10.10 to-ports=3000
}

# dns-failover-up (runs when check recovers)
/system script add name=dns-failover-up dont-require-permissions=yes source={
    :log info "DNS Failover: Switching back to MikroTik"
    /ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=172.17.0.2 to-ports=53
}
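
Both scripts pick their targets with `comment~"VLAN" and comment~"redirect"`, where `~` is an unanchored regular-expression match. The Python sketch below applies the same two patterns to the rule comments documented in the NAT section, to show which rules the scripts would rewrite and which they leave alone. The function names are illustrative.

```python
import re

# Comments copied from the NAT rules in this document.
NAT_COMMENTS = [
    "[DNS] Allow MikroTik AdGuard outbound",
    "[DNS] Allow Unraid AdGuard outbound",
    "[DNS] VLAN10 Mgmt redirect",
    "[DNS] VLAN20 Trusted redirect",
    "[DNS] VLAN25 Kids redirect",
    "[DNS] VLAN30 IoT redirect",
    "[DNS] VLAN40 CatchAll redirect",
    "[DNS] VLAN10 masquerade",
]


def matches_failover_selector(comment: str) -> bool:
    """Mirror `comment~"VLAN" and comment~"redirect"` with unanchored searches."""
    return bool(re.search("VLAN", comment)) and bool(re.search("redirect", comment))


def selected_rules(comments):
    """Return the comments of the rules the failover scripts would modify."""
    return [c for c in comments if matches_failover_selector(c)]


if __name__ == "__main__":
    for comment in selected_rules(NAT_COMMENTS):
        print(comment)
```

Only the five redirect rules match. The exception rules contain neither pattern and the masquerade rule lacks "redirect", so a new rule whose comment contains both words would also be rewritten on failover.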

Netwatch Configuration

# Monitor 1: Ping check (fast crash detection)
/tool netwatch add type=simple host=172.17.0.2 interval=10s timeout=3s \
    up-script=dns-failover-up down-script=dns-failover-down \
    comment="AdGuard failover monitor"

# Monitor 2: DNS resolution check (functional verification)
/tool netwatch add type=dns host=google.com interval=30s timeout=10s \
    up-script=dns-failover-up down-script=dns-failover-down \
    comment="AdGuard DNS resolution check"

Sync Configuration

Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.

What Syncs

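| Feature | Synced |
|---|---|
| Filter lists (blocklists) | Yes (filters: true) |
| User rules (custom blocks/allows) | Yes (filters: true) |
| Client settings (per-device rules) | Yes (clientSettings: true) |
| Services (blocked services) | Yes (services: true) |
| Rewrites (custom DNS entries) | Yes (dns.rewrites: true) |
| DNS server config | No (dns.serverConfig: false) |
| DHCP settings | Not set in the sync config |
| Query logs/stats | Not set in the sync config |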

Sync Container

# /mnt/user/appdata/adguard-sync/adguardhome-sync.yaml
cron: "*/30 * * * *"
runOnStart: true

origin:
  url: http://192.168.10.10:3000
  username: jazzymc
  password: 7RqWElENNbZnPW

replicas:
  - url: http://192.168.10.1:3000
    username: jazzymc
    password: 7RqWElENNbZnPW

features:
  dns:
    serverConfig: false
    accessLists: true
    rewrites: true
  filters: true
  clientSettings: true
  services: true

Note: The sync container must be connected to both dockerproxy and br0 networks to reach both AdGuard instances.
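
To see at a glance which features the config above turns on or off, the nested `features` block can be flattened into dotted keys. This is a small Python sketch with the values copied by hand into a dict (to avoid a YAML dependency). The `flatten` helper is hypothetical, and features absent from the file are not covered.

```python
# Values copied from the `features:` block of adguardhome-sync.yaml above.
FEATURES = {
    "dns": {"serverConfig": False, "accessLists": True, "rewrites": True},
    "filters": True,
    "clientSettings": True,
    "services": True,
}


def flatten(features: dict, prefix: str = "") -> dict:
    """Flatten nested feature flags into {"dns.rewrites": True, ...}."""
    flat = {}
    for key, value in features.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    for name, enabled in sorted(flatten(FEATURES).items()):
        print(f"{name}: {'synced' if enabled else 'not synced'}")
```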


Container Configuration (MikroTik)

Container Details

| Setting | Value |
|---|---|
| Image | adguard/adguardhome:latest |
| Interface | veth-adguard |
| IP | 172.17.0.2/24 |
| Gateway | 172.17.0.1 |
| Root dir | usb1/adguard/root |
| Config mount | usb1/adguard/conf → /opt/adguardhome/conf |
| Work mount | usb1/adguard/work → /opt/adguardhome/work |
| Start on boot | Yes |

Container Commands

# Check status
/container print

# Start container
/container start 0

# Stop container
/container stop 0

# View logs
/log print where topics~"container"

Upstream DNS

Both AdGuard instances use the same upstream:

| Upstream | Type |
|---|---|
| https://dns.quad9.net/dns-query | Primary (DoH) |
| 9.9.9.9 | Bootstrap |
| 149.112.112.112 | Bootstrap secondary |

Management

| Task | Where to Do It |
|---|---|
| Change blocklists | Unraid AdGuard (syncs to MikroTik) |
| Add custom rules | Unraid AdGuard |
| Add client settings | Unraid AdGuard |
| View query logs | MikroTik AdGuard (real-time) |
| Check failover status | MikroTik: /tool netwatch print |

Troubleshooting

Check Failover Status

/tool netwatch print
# Both monitors should show STATUS=up normally
# Monitor 0: Ping check
# Monitor 1: DNS resolution check

Check Current DNS Target

/ip firewall nat print where comment~"VLAN10 Mgmt redirect"
# to-addresses should be 172.17.0.2 (normal) or 192.168.10.10 (failover)

View Failover Logs

/log print where message~"Failover"

Manual Failover Test

# Stop container (triggers failover)
/container stop 0

# Wait 15 seconds, check NAT rules switched to 192.168.10.10

# Start container (triggers recovery)
/container start 0

# Wait 15 seconds, check NAT rules switched back to 172.17.0.2

DNS Not Working

  1. Check container is running: /container print
  2. Check netwatch status: /tool netwatch print
  3. Test DNS directly: :resolve google.com server=172.17.0.2
  4. Check NAT rules: /ip firewall nat print where comment~"DNS"
  5. Check /32 routes exist: /ip route print where dst-address~"172.17.0.[23]"
  6. Ping container: /ping 172.17.0.2 count=3
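
Step 3 can also be run from another machine when the router CLI is not at hand. The following is a minimal Python sketch of a single UDP DNS probe using only the standard library. It assumes the host running it can reach the server address, and it only checks that a well-formed, error-free reply with the matching ID comes back. The names `build_query` and `probe` are illustrative.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet for an A record of `name`."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str = "google.com", port: int = 53,
          timeout: float = 3.0, query_id: int = 0x1234) -> bool:
    """Send one query over UDP; True if a matching reply with RCODE 0 arrives."""
    query = build_query(name, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-network errors
            return False
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode == 0


if __name__ == "__main__":
    # Addresses taken from this document.
    for server in ("172.17.0.2", "192.168.10.10"):
        print(server, "OK" if probe(server) else "FAILED")
```

Because clients on the VLANs have their UDP port 53 traffic redirected by NAT, a probe from a VLAN client reaches whichever instance the redirect rules currently point at, regardless of the address given.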

Container Reachable but DNS Fails

If ping works but DNS queries time out:

  1. Check container can reach upstream: Look for timeout errors in logs
  2. Verify /32 routes: Missing routes cause ECMP issues
  3. Check NAT masquerade: /ip firewall nat print where comment~"Container"
  4. Verify routes:
/ip route print where dst-address~"172.17"
# Should show /32 routes for each container IP

Sync Not Working

# On Unraid
docker logs adguardhome-sync --tail 20

# Check connectivity
docker exec adguardhome-sync ping -c 2 192.168.10.10
docker exec adguardhome-sync ping -c 2 192.168.10.1

Container Network Routing

Important: /32 Host Routes Required

When running multiple containers on the same subnet (172.17.0.0/24), specific host routes are required to prevent ECMP routing issues:

# Without these routes, return traffic may go to wrong container
/ip route add dst-address=172.17.0.2/32 gateway=veth-adguard comment="AdGuard container - specific route"
/ip route add dst-address=172.17.0.3/32 gateway=veth-tailscale comment="Tailscale container - specific route"

Why this matters: Each veth interface creates a /24 route. With multiple veth interfaces on the same subnet, RouterOS enables ECMP load balancing, sending return traffic to random interfaces.
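
The effect of the /32 routes can be illustrated with longest-prefix matching: when only the two /24 routes exist, both interfaces are equally good candidates for a container address, and adding a /32 leaves exactly one. The Python sketch below models that selection with the standard `ipaddress` module. It is a simplified model of route lookup, not RouterOS behavior in full, and `candidates` is a hypothetical helper.

```python
import ipaddress

# Connected /24 routes (one per veth) plus the /32 host routes from this document.
CONNECTED = [
    ("172.17.0.0/24", "veth-adguard"),
    ("172.17.0.0/24", "veth-tailscale"),
]
HOST_ROUTES = [
    ("172.17.0.2/32", "veth-adguard"),
    ("172.17.0.3/32", "veth-tailscale"),
]


def candidates(dst: str, routes) -> list[str]:
    """Return the gateways tied for the longest matching prefix for `dst`."""
    addr = ipaddress.ip_address(dst)
    matching = [
        (ipaddress.ip_network(prefix), gateway)
        for prefix, gateway in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matching:
        return []
    best = max(net.prefixlen for net, _ in matching)
    return sorted(gw for net, gw in matching if net.prefixlen == best)


if __name__ == "__main__":
    print("without /32:", candidates("172.17.0.2", CONNECTED))
    print("with /32:   ", candidates("172.17.0.2", CONNECTED + HOST_ROUTES))
```

More than one candidate for the same destination is the situation in which traffic may be balanced across interfaces. A single candidate means the traffic has one deterministic path.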


Backups

| Backup | Description |
|---|---|
| pre-adguard-2026-01-31 | Before AdGuard setup |
| adguard-container-running-2026-01-31 | Container working, before NAT |
| adguard-synced-2026-01-31 | After sync configured |
| adguard-failover-complete-2026-01-31 | Single ping failover |
| routing-fix-complete-2026-01-31 | After /32 routing fix |
| dns-dual-failover-2026-01-31 | Dual health check (current) |

Restore Command

/system backup load name=dns-dual-failover-2026-01-31

Quick Reference

Normal Operation

  • DNS queries → MikroTik AdGuard (172.17.0.2)
  • Ad blocking active
  • ~143,000 filter rules

During Failover

  • DNS queries → Unraid AdGuard (192.168.10.10)
  • Ad blocking still active (same rules synced)
  • Automatic, no manual intervention needed

Recovery

  • Automatic when container comes back up
  • NAT rules switch back to MikroTik
  • No DNS interruption for clients

Document Version: 1.1
Last Updated: 2026-01-31
Changes: Added dual health check (ping + DNS), documented /32 routing fix