infrastructure/docs/08-DNS-ARCHITECTURE.md
Kaloyan Danchev 7867b5c950
WiFi VLAN fixes, CAP bridge filtering, AdGuard IP conflicts, channel optimization
- Enable bridge VLAN filtering on CAP for proper per-client VLAN assignment
- Fix AdGuard container IP conflicts (.2→.10, .3→.11) with static IPs
- Fix 2.4GHz co-channel interference (both APs were on ch 1, CAP now ch 6)
- Fix 5GHz overlap (HAP ch 36/5180, CAP moved to ch 52/5260)
- Update WiFi access-list: VLAN assignment now active with per-device VLAN IDs
- Add Xiaomi Air Purifier MC1 to VLAN 30 access-list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 09:40:29 +02:00


DNS Architecture with AdGuard Failover

Last Updated: 2026-02-26


Overview

Dual AdGuard Home DNS setup with automatic failover. Plain DNS queries (UDP port 53) from every VLAN are filtered through AdGuard for ad blocking. If the primary instance (the container on the MikroTik) fails, the router automatically redirects that traffic to the secondary instance on Unraid.


Architecture

```
                            ┌─────────────────────────────────────┐
                            │              INTERNET               │
                            │                                     │
                            │   External clients (DoT/DoH)        │
                            │   dns.xtrm-lab.org:853 (DoT)        │
                            │   dns.xtrm-lab.org:8443 (DoH)       │
                            └──────────────┬──────────────────────┘
                                           │
                                           ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        MikroTik hAP ax³ (192.168.10.1)                       │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                    AdGuard Home (PRIMARY)                              │  │
│  │                    Container: 172.17.0.2                               │  │
│  │                    Web UI: http://192.168.10.1:3000                    │  │
│  │                                                                        │  │
│  │    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐             │  │
│  │    │  Filters    │     │  Blocklists │     │   Clients   │             │  │
│  │    │  (synced)   │     │  143K rules │     │  (synced)   │             │  │
│  │    └─────────────┘     └─────────────┘     └─────────────┘             │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                 Netwatch: ping every 10s, DNS every 30s                      │
│                                    │                                         │
│                          ┌─────────┴─────────┐                               │
│                          │                   │                               │
│                    Container UP       Container DOWN                         │
│                          │                   │                               │
│                          ▼                   ▼                               │
│                  NAT → 172.17.0.2    NAT → 192.168.10.10                     │
│                  (MikroTik)          (Unraid Failover)                       │
└──────────────────────────────────────────────────────────────────────────────┘
        ▲                            ▲                            ▲
        │                            │                            │
   NAT Redirect                 NAT Redirect                 NAT Redirect
        │                            │                            │
┌───────┴───────┐           ┌────────┴────────┐          ┌────────┴────────┐
│   VLAN 10     │           │    VLAN 20/25   │          │   VLAN 30/40    │
│  Management   │           │  Trusted/Kids   │          │   IoT/CatchAll  │
│ 192.168.10.x  │           │  192.168.20.x   │          │  192.168.30.x   │
│               │           │  192.168.25.x   │          │  192.168.1.x    │
└───────────────┘           └─────────────────┘          └─────────────────┘
```

AdGuard Instances

| Instance | Role | IP | DNS port | Web UI |
|---|---|---|---|---|
| MikroTik | Primary | 172.17.0.2 | 53 | http://192.168.10.1:3000 |
| Unraid | Secondary (failover) | 192.168.10.10 | 53 | http://192.168.10.10:3000 |

Credentials (Same for Both)

| Username | Password |
|---|---|
| jazzymc | 7RqWElENNbZnPW |

DNS Redirect Rules

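All plain DNS queries (UDP port 53) from every VLAN are intercepted and redirected:

| VLAN | Subnet | Redirected to |
|---|---|---|
| 10 | 192.168.10.0/24 | 172.17.0.2:53 |
| 20 | 192.168.20.0/24 | 172.17.0.2:53 |
| 25 | 192.168.25.0/24 | 172.17.0.2:53 |
| 30 | 192.168.30.0/24 | 172.17.0.2:53 |
| 40 | 192.168.1.0/24 | 172.17.0.2:53 |

Note: Clients need no DNS configuration. Even a client hard-coded to 8.8.8.8 is intercepted by NAT. This covers only plain DNS over UDP. The redirect rules match protocol=udp, so client DNS over TCP, DoT (port 853) and DoH (port 443) are not redirected.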

NAT Rules on MikroTik

```
# Exception rules (prevent loops): MUST come before the redirect rules.
# dstnat is evaluated top to bottom; the first matching accept/dst-nat rule ends processing.
/ip firewall nat
add chain=dstnat action=accept protocol=udp src-address=172.17.0.0/24 dst-port=53 comment="[DNS] Allow MikroTik AdGuard outbound"
add chain=dstnat action=accept protocol=udp src-address=192.168.10.10 dst-port=53 comment="[DNS] Allow Unraid AdGuard outbound"

# VLAN redirect rules (UDP only)
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.10.0/24 dst-port=53 comment="[DNS] VLAN10 Mgmt redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.20.0/24 dst-port=53 comment="[DNS] VLAN20 Trusted redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.25.0/24 dst-port=53 comment="[DNS] VLAN25 Kids redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.30.0/24 dst-port=53 comment="[DNS] VLAN30 IoT redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.1.0/24 dst-port=53 comment="[DNS] VLAN40 CatchAll redirect"

# Masquerade for return traffic
add chain=srcnat action=masquerade protocol=udp src-address=192.168.10.0/24 dst-address=172.17.0.2 dst-port=53 comment="[DNS] VLAN10 masquerade"
# ... (similar for other VLANs)
```
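Rule order matters here because the Unraid AdGuard (192.168.10.10) sits inside the VLAN 10 subnet. The Python sketch below is a simplified model of the dstnat chain, not RouterOS code. It assumes first-match-wins evaluation, considers UDP/53 only, and takes its subnets from the DNS Redirect Rules table. `dns_target` and the constant names are hypothetical.

```python
import ipaddress

# Sources exempt from redirection: the AdGuard instances' own upstream queries.
EXCEPTIONS = [
    ipaddress.ip_network("172.17.0.0/24"),     # MikroTik AdGuard container subnet
    ipaddress.ip_network("192.168.10.10/32"),  # Unraid AdGuard
]

# Client subnets whose UDP/53 traffic is redirected (VLAN ID -> subnet).
REDIRECTED = {
    10: ipaddress.ip_network("192.168.10.0/24"),
    20: ipaddress.ip_network("192.168.20.0/24"),
    25: ipaddress.ip_network("192.168.25.0/24"),
    30: ipaddress.ip_network("192.168.30.0/24"),
    40: ipaddress.ip_network("192.168.1.0/24"),
}

PRIMARY = ("172.17.0.2", 53)


def dns_target(src, dst, dst_port, target=PRIMARY):
    """Hypothetical model: (ip, port) a UDP packet is delivered to after dstnat.

    Rules are checked in order and the first match wins, mirroring how
    action=accept and action=dst-nat both end dstnat processing.
    """
    if dst_port != 53:
        return (dst, dst_port)  # not plain DNS: no rule matches
    addr = ipaddress.ip_address(src)
    if any(addr in net for net in EXCEPTIONS):
        return (dst, dst_port)  # exception rule: accept, destination untouched
    if any(addr in net for net in REDIRECTED.values()):
        return target  # redirect rule: dst-nat to the current AdGuard target
    return (dst, dst_port)


if __name__ == "__main__":
    print(dns_target("192.168.20.50", "8.8.8.8", 53))  # client hard-coded to 8.8.8.8
    print(dns_target("192.168.10.10", "9.9.9.9", 53))  # Unraid AdGuard bootstrap query
```

The first call is redirected to the primary. The second passes through only because the exception is checked before the VLAN 10 redirect. With the order reversed, Unraid's bootstrap lookups to 9.9.9.9 would be captured by the redirect as well.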

Automatic Failover

How It Works (Dual Health Check)

Two independent Netwatch monitors trigger failover:

| Monitor | Type | What it checks | Interval | Timeout |
|---|---|---|---|---|
| Ping | simple | Container reachable | 10s | 3s |
| DNS | dns | DNS queries work | 30s | 10s |

Either monitor failing triggers failover to Unraid.

Failure Scenarios Covered

| Scenario | Ping check | DNS check | Failover? |
|---|---|---|---|
| Container crashed | Fail | Fail | Yes |
| Container stopped | Fail | Fail | Yes |
| Network/routing issue | Fail | Fail | Yes |
| Upstream DNS unreachable | Pass | Fail | Yes |
| AdGuard overloaded | Pass | Fail | Yes |
| Everything working | Pass | Pass | No |
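The table reduces to one rule: stay on the primary only while both monitors are up. The minimal Python sketch below encodes that rule and checks it against every row above. It models the intended behaviour, not RouterOS itself, and `desired_target` and the constants are hypothetical names.

```python
PRIMARY = "172.17.0.2"      # MikroTik AdGuard container
FAILOVER = "192.168.10.10"  # Unraid AdGuard


def desired_target(ping_up: bool, dns_up: bool) -> str:
    """Hypothetical rule: redirect to the primary only while BOTH monitors are up."""
    return PRIMARY if (ping_up and dns_up) else FAILOVER


# Rows of the table above: (ping passes, DNS passes, failover expected)
SCENARIOS = {
    "Container crashed": (False, False, True),
    "Container stopped": (False, False, True),
    "Network/routing issue": (False, False, True),
    "Upstream DNS unreachable": (True, False, True),
    "AdGuard overloaded": (True, False, True),
    "Everything working": (True, True, False),
}

for name, (ping_ok, dns_ok, expect_failover) in SCENARIOS.items():
    failed_over = desired_target(ping_ok, dns_ok) == FAILOVER
    assert failed_over == expect_failover, name
```

Compare this with the scripts below. Netwatch runs a monitor's `up-script` on that monitor's own down-to-up transition, and both monitors share `dns-failover-up`. Suppose the ping monitor recovers while the DNS monitor is still down, for example when the container answers ping before AdGuard has finished starting. The redirect then returns to the primary even though this rule says it should not. Having `dns-failover-up` check the other monitor's status before switching would close that gap.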

Failover Timeline

| Event | Detection time | Total switchover |
|---|---|---|
| Container crash (ping) | ~10-13 seconds | ~13-16 seconds |
| DNS failure (resolution) | ~30-40 seconds | ~33-43 seconds |
| Recovery | ~10-30 seconds | Automatic |
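The detection column follows from each monitor's interval and timeout. The short Python sketch below does that arithmetic. It assumes a single failed probe marks a monitor down; Netwatch's threshold behaviour was not checked here. `detection_bounds` is a hypothetical helper.

```python
def detection_bounds(interval_s: float, timeout_s: float) -> tuple[float, float]:
    """Hypothetical helper: (best, worst) seconds from failure to down-script.

    Best case: the failure happens just before a probe starts, so only that
    probe's timeout elapses. Worst case: it happens just after a successful
    probe, so a full interval passes before the next probe, which then waits
    out its timeout.
    """
    return (timeout_s, interval_s + timeout_s)


MONITORS = {"ping": (10, 3), "dns": (30, 10)}  # name -> (interval, timeout) from the Netwatch table

for name, (interval, timeout) in MONITORS.items():
    best, worst = detection_bounds(interval, timeout)
    print(f"{name}: {best}-{worst} s")
```

The worst cases (13 s and 40 s) match the upper ends of the table. Under the same assumption, the best cases are simply the timeouts (3 s and 10 s), so the table's lower bounds are conservative.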

Failover Scripts

```
# dns-failover-down (runs when either monitor goes down)
/system script add name=dns-failover-down dont-require-permissions=yes source={
    :log warning "DNS Failover: Switching to Unraid"
    # Redirect to Unraid AdGuard's DNS port (53); port 3000 is its web UI, not DNS
    /ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=192.168.10.10 to-ports=53
}

# dns-failover-up (runs when either monitor comes back up)
/system script add name=dns-failover-up dont-require-permissions=yes source={
    :log info "DNS Failover: Switching back to MikroTik"
    /ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=172.17.0.2 to-ports=53
}
```
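Both scripts select rules with `comment~"VLAN" and comment~"redirect"`, where `~` is a RouterOS regular-expression match. The Python sketch below applies the same two patterns to the rule comments from the NAT section to show which rules a failover rewrites. The masquerade rules for the other VLANs are elided there, so only VLAN10's appears. `failover_selection` is a hypothetical helper. Python's `re` stands in for RouterOS's regex engine, and the two behave the same for these literal patterns.

```python
import re

# Comments copied from the NAT rules above.
NAT_COMMENTS = [
    "[DNS] Allow MikroTik AdGuard outbound",
    "[DNS] Allow Unraid AdGuard outbound",
    "[DNS] VLAN10 Mgmt redirect",
    "[DNS] VLAN20 Trusted redirect",
    "[DNS] VLAN25 Kids redirect",
    "[DNS] VLAN30 IoT redirect",
    "[DNS] VLAN40 CatchAll redirect",
    "[DNS] VLAN10 masquerade",
]


def failover_selection(comments):
    """Hypothetical helper mimicking [find where comment~"VLAN" and comment~"redirect"]."""
    return [c for c in comments if re.search("VLAN", c) and re.search("redirect", c)]


for comment in failover_selection(NAT_COMMENTS):
    print(comment)
```

Only the five redirect rules are selected. The exception and masquerade rules are left alone, so the exceptions keep protecting both AdGuard instances during failover. Any future rule whose comment contains both words would also be rewritten.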

Netwatch Configuration

```
# Monitor 1: Ping check (fast crash detection)
/tool netwatch add type=simple host=172.17.0.2 interval=10s timeout=3s \
    up-script=dns-failover-up down-script=dns-failover-down \
    comment="AdGuard failover monitor"

# Monitor 2: DNS resolution check (functional verification)
/tool netwatch add type=dns host=google.com interval=30s timeout=10s \
    up-script=dns-failover-up down-script=dns-failover-down \
    comment="AdGuard DNS resolution check"
```

Sync Configuration

Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.

What Syncs

| Feature | Synced |
|---|---|
| Filter lists (blocklists) | Yes |
| User rules (custom blocks/allows) | Yes |
| Client settings (per-device rules) | Yes |
| Services (blocked services) | Yes |
| Rewrites (custom DNS entries) | Yes |
| DNS server config | No |
| DHCP settings | No |
| Query logs/stats | No |

Sync Container

Container: adguardhome-sync at 192.168.10.11 (br0 macvlan, static IP)

```yaml
# /mnt/user/appdata/dockge/stacks/adguard-sync/adguardhome-sync.yaml
cron: "*/30 * * * *"
runOnStart: true

origin:
  url: http://192.168.10.10:3000
  username: jazzymc
  password: 7RqWElENNbZnPW

replica:
  url: http://192.168.10.1:3000
  username: jazzymc
  password: 7RqWElENNbZnPW
```

Note: The sync container is on the br0 macvlan network with a static IP to avoid conflicts with infrastructure devices.
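With `cron: "*/30 * * * *"`, a run starts whenever the minute is a multiple of 30 (:00 and :30 each hour), plus once at container start because of `runOnStart: true`. A change made on Unraid therefore reaches the MikroTik at the next half hour, at most about 30 minutes later plus the run time. The small Python sketch below computes the next scheduled run. `next_sync` is a hypothetical helper and only handles step values that divide 60 evenly.

```python
from datetime import datetime, timedelta


def next_sync(now: datetime, step_min: int = 30) -> datetime:
    """Hypothetical helper: next start of a '*/step_min * * * *' schedule after `now`.

    Only valid when step_min divides 60, so the schedule repeats every hour.
    """
    if 60 % step_min != 0:
        raise ValueError("step_min must divide 60")
    base = now.replace(second=0, microsecond=0)
    # Minutes until the next multiple of step_min (a full step if already on one).
    return base + timedelta(minutes=step_min - base.minute % step_min)


print(next_sync(datetime(2026, 2, 27, 9, 40, 29)))  # 2026-02-27 10:00:00
```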


Container Configuration (MikroTik)

Container Details

| Setting | Value |
|---|---|
| Image | adguard/adguardhome:latest |
| Interface | veth-adguard |
| IP | 172.17.0.2/24 |
| Gateway | 172.17.0.1 |
| Root dir | usb1/adguard/root |
| Config mount | usb1/adguard/conf → /opt/adguardhome/conf |
| Work mount | usb1/adguard/work → /opt/adguardhome/work |
| Start on boot | Yes |

Container Commands

```
# Check status
/container print

# Start container
/container start 0

# Stop container
/container stop 0

# View logs
/log print where topics~"container"
```

Upstream DNS

Both AdGuard instances use the same upstream:

| Upstream | Type |
|---|---|
| https://dns.quad9.net/dns-query | Primary (DoH) |
| 9.9.9.9 | Bootstrap |
| 149.112.112.112 | Bootstrap (secondary) |

Management

| Task | Where to do it |
|---|---|
| Change blocklists | Unraid AdGuard (syncs to MikroTik) |
| Add custom rules | Unraid AdGuard |
| Add client settings | Unraid AdGuard |
| View query logs | MikroTik AdGuard (real-time); Unraid AdGuard during failover |
| Check failover status | MikroTik: /tool netwatch print |

Troubleshooting

Check Failover Status

```
/tool netwatch print
# Both monitors should show STATUS=up normally
# Monitor 0: Ping check
# Monitor 1: DNS resolution check
```

Check Current DNS Target

```
/ip firewall nat print where comment~"VLAN10 Mgmt redirect"
# to-addresses should be 172.17.0.2 (normal) or 192.168.10.10 (failover)
```

View Failover Logs

```
/log print where message~"Failover"
```

Manual Failover Test

```
# Stop container (triggers failover)
/container stop 0

# Wait 15 seconds, then check that the NAT rules switched to 192.168.10.10

# Start container (triggers recovery)
/container start 0

# Wait 15 seconds, then check that the NAT rules switched back to 172.17.0.2
```

DNS Not Working

  1. Check container is running: /container print
  2. Check netwatch status: /tool netwatch print
  3. Test DNS directly: :resolve google.com server=172.17.0.2
  4. Check NAT rules: /ip firewall nat print where comment~"DNS"
  5. Check /32 routes exist: /ip route print where dst-address~"172.17.0.[23]"
  6. Ping container: /ping 172.17.0.2 count=3
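`:resolve` runs on the router itself. To run a comparable check from another machine, such as the Unraid host or a laptop on a VLAN, the minimal Python sketch below sends one standard RFC 1035 query over UDP and reports the response code. `build_query` and `probe` are hypothetical helpers. From a VLAN client the NAT redirect still applies, so the reply comes from whichever AdGuard the redirect currently targets, whatever server address you pass.

```python
import random
import socket
import struct


def build_query(name: str, txid: int, qtype: int = 1) -> bytes:
    """Hypothetical helper: encode a DNS query (12-byte header plus one question)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE (1 = A), QCLASS (1 = IN)


def probe(server: str, name: str = "google.com", timeout: float = 3.0) -> tuple[int, int]:
    """Hypothetical helper: send one UDP query to server:53, return (rcode, answer_count).

    Raises socket.timeout if nothing answers within `timeout` seconds.
    """
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, txid), (server, 53))
        reply, _ = sock.recvfrom(4096)
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    if rid != txid:
        raise ValueError("reply transaction ID does not match the query")
    return flags & 0x000F, ancount  # rcode 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
```

For example, `probe("192.168.10.10")` returning `(0, n)` with n > 0 means the resolver answered. A timeout means nothing answered on UDP 53. An rcode of 2 (SERVFAIL) typically means the resolver is up but could not get an answer from its upstream.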

Container Reachable but DNS Fails

If ping works but DNS queries time out:

  1. Check that the container can reach its upstream resolver: look for timeout errors in the container logs.
  2. Check the NAT masquerade for container traffic: /ip firewall nat print where comment~"Container"
  3. Verify the /32 host routes. Without them, ECMP can send return traffic to the wrong container:
```
/ip route print where dst-address~"172.17"
# Should show a /32 route for each container IP
```

Sync Not Working

```bash
# On Unraid
docker logs adguardhome-sync --tail 20

# Check connectivity
docker exec adguardhome-sync ping -c 2 192.168.10.10
docker exec adguardhome-sync ping -c 2 192.168.10.1
```
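`ping` only proves the hosts are reachable at the IP layer. adguardhome-sync talks HTTP to port 3000 on both instances (the `url` values in the sync config), so a TCP connection to those ports is a closer check. The minimal Python sketch below performs that check from any host that can route to both. `ENDPOINTS` and `tcp_reachable` are hypothetical names, and nothing here authenticates or calls the AdGuard API.

```python
import socket

# Taken from the origin/replica URLs in adguardhome-sync.yaml.
ENDPOINTS = {
    "origin (Unraid)": ("192.168.10.10", 3000),
    "replica (MikroTik)": ("192.168.10.1", 3000),
}


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def report():
    """Print the reachability of each sync endpoint."""
    for name, (host, port) in ENDPOINTS.items():
        state = "open" if tcp_reachable(host, port) else "unreachable"
        print(f"{name} {host}:{port} -> {state}")
```

Call `report()`. An endpoint that answers ping but shows `unreachable` here has nothing accepting connections on port 3000, which ping alone cannot reveal.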

Container Network Routing

Important: /32 Host Routes Required

When running multiple containers on the same subnet (172.17.0.0/24), specific host routes are required to prevent ECMP routing issues:

```
# Without these routes, return traffic may go to the wrong container
/ip route add dst-address=172.17.0.2/32 gateway=veth-adguard comment="AdGuard container - specific route"
/ip route add dst-address=172.17.0.3/32 gateway=veth-tailscale comment="Tailscale container - specific route"
```

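Why this matters: each veth interface creates a connected /24 route. With two veth interfaces on the same subnet, RouterOS has two equal-cost routes to 172.17.0.0/24 and load-balances across them (ECMP), so traffic meant for one container can leave through the other container's interface. A /32 route is more specific than the /24, so it always wins and pins each container IP to its own veth.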
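A simplified model of route lookup makes the fix visible: the most specific matching prefix wins, and routes that tie on prefix length become ECMP candidates. The Python sketch below models only that rule; RouterOS also weighs distance and other attributes, which are ignored here. `gateways_for` is a hypothetical helper.

```python
import ipaddress

# Connected routes created by each veth interface (same /24 on both).
CONNECTED = [
    (ipaddress.ip_network("172.17.0.0/24"), "veth-adguard"),
    (ipaddress.ip_network("172.17.0.0/24"), "veth-tailscale"),
]

# The /32 host routes added above.
HOST_ROUTES = [
    (ipaddress.ip_network("172.17.0.2/32"), "veth-adguard"),
    (ipaddress.ip_network("172.17.0.3/32"), "veth-tailscale"),
]


def gateways_for(dst: str, routes) -> list[str]:
    """Hypothetical helper: gateways of the longest-prefix matches; more than one means ECMP."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes if addr in net]
    if not matches:
        return []
    longest = max(net.prefixlen for net, _ in matches)
    return sorted(gw for net, gw in matches if net.prefixlen == longest)


print(gateways_for("172.17.0.2", CONNECTED))                # ['veth-adguard', 'veth-tailscale']
print(gateways_for("172.17.0.2", CONNECTED + HOST_ROUTES))  # ['veth-adguard']
```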


Quick Reference

Normal Operation

  • DNS queries → MikroTik AdGuard (172.17.0.2)
  • Ad blocking active
  • ~143,000 filter rules

During Failover

  • DNS queries → Unraid AdGuard (192.168.10.10)
  • Ad blocking still active (same rules synced)
  • Automatic, no manual intervention needed

Recovery

  • Automatic when container comes back up
  • NAT rules switch back to MikroTik
  • No DNS interruption for clients