DNS Architecture with AdGuard Failover
Created: 2026-01-31
Updated: 2026-01-31
Status: Implemented
Backup: dns-dual-failover-2026-01-31.backup
Overview
Dual AdGuard Home DNS setup with automatic failover. All DNS queries are filtered through AdGuard for ad blocking. If the primary instance (MikroTik) fails, traffic switches automatically to the secondary instance (Unraid).
Architecture
                        ┌─────────────────────────────────────┐
                        │              INTERNET               │
                        │                                     │
                        │     External clients (DoT/DoH)      │
                        │     dns.xtrm-lab.org:853 (DoT)      │
                        │     dns.xtrm-lab.org:8443 (DoH)     │
                        └──────────────┬──────────────────────┘
                                       │
                                       ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                       MikroTik hAP ax³ (192.168.10.1)                        │
│                                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                         AdGuard Home (PRIMARY)                         │  │
│  │                         Container: 172.17.0.2                          │  │
│  │                    Web UI: http://192.168.10.1:3000                    │  │
│  │                                                                        │  │
│  │           ┌─────────────┐  ┌─────────────┐  ┌─────────────┐            │  │
│  │           │   Filters   │  │ Blocklists  │  │   Clients   │            │  │
│  │           │  (synced)   │  │ 143K rules  │  │  (synced)   │            │  │
│  │           └─────────────┘  └─────────────┘  └─────────────┘            │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                      │                                       │
│                         Netwatch monitors every 10s                          │
│                                      │                                       │
│                            ┌─────────┴─────────┐                             │
│                            │                   │                             │
│                      Container UP       Container DOWN                       │
│                            │                   │                             │
│                            ▼                   ▼                             │
│                    NAT → 172.17.0.2   NAT → 192.168.10.10                    │
│                       (MikroTik)       (Unraid Failover)                     │
└──────────────────────────────────────────────────────────────────────────────┘
              ▲                       ▲                        ▲
              │                       │                        │
        NAT Redirect            NAT Redirect             NAT Redirect
              │                       │                        │
      ┌───────┴───────┐      ┌────────┴────────┐      ┌────────┴────────┐
      │    VLAN 10    │      │   VLAN 20/25    │      │   VLAN 30/40    │
      │  Management   │      │  Trusted/Kids   │      │  IoT/CatchAll   │
      │ 192.168.10.x  │      │  192.168.20.x   │      │  192.168.30.x   │
      │               │      │  192.168.25.x   │      │   192.168.1.x   │
      └───────────────┘      └─────────────────┘      └─────────────────┘
AdGuard Instances
| Instance | Role | IP | Port | Web UI |
|---|---|---|---|---|
| MikroTik | Primary | 172.17.0.2 | 53 | http://192.168.10.1:3000 |
| Unraid | Secondary/Failover | 192.168.10.10 | 3000 | http://192.168.10.10:3000 |
Credentials (Same for Both)
| Username | Password |
|---|---|
| jazzymc | 7RqWElENNbZnPW |
DNS Redirect Rules
All DNS queries (port 53) from any VLAN are intercepted and redirected:
| VLAN | Subnet | Redirected To |
|---|---|---|
| 10 | 192.168.10.0/24 | 172.17.0.2:53 |
| 20 | 192.168.20.0/24 | 172.17.0.2:53 |
| 25 | 192.168.25.0/24 | 172.17.0.2:53 |
| 30 | 192.168.30.0/24 | 172.17.0.2:53 |
| 40 | 192.168.1.0/24 | 172.17.0.2:53 |
Note: Clients need no DNS configuration. Even if a client is hard-coded to use 8.8.8.8, its UDP port 53 queries are intercepted by NAT and answered by AdGuard.
NAT Rules on MikroTik
# Exception rules (prevent loops) - MUST BE FIRST
/ip firewall nat
add chain=dstnat action=accept protocol=udp src-address=172.17.0.0/24 dst-port=53 comment="[DNS] Allow MikroTik AdGuard outbound"
add chain=dstnat action=accept protocol=udp src-address=192.168.10.10 dst-port=53 comment="[DNS] Allow Unraid AdGuard outbound"
# VLAN redirect rules
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.10.0/24 dst-port=53 comment="[DNS] VLAN10 Mgmt redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.20.0/24 dst-port=53 comment="[DNS] VLAN20 Trusted redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.25.0/24 dst-port=53 comment="[DNS] VLAN25 Kids redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.30.0/24 dst-port=53 comment="[DNS] VLAN30 IoT redirect"
add chain=dstnat action=dst-nat to-addresses=172.17.0.2 to-ports=53 protocol=udp src-address=192.168.1.0/24 dst-port=53 comment="[DNS] VLAN40 CatchAll redirect"
# Masquerade for return traffic
add chain=srcnat action=masquerade protocol=udp src-address=192.168.10.0/24 dst-address=172.17.0.2 dst-port=53 comment="[DNS] VLAN10 masquerade"
# ... (similar for other VLANs)
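The five redirect rules differ only in VLAN number, name, and subnet, so they can be generated from one table. The following is a minimal Python sketch, not part of the router config: the helper names (`VLANS`, `redirect_rule`, `render_rules`) are hypothetical, and the output is meant to be reviewed before being pasted into a RouterOS terminal.

```python
# Hypothetical helper: renders the per-VLAN dst-nat redirect rules from one table.
# VLAN numbers, names, and subnets are copied from the rules documented above.
VLANS = [
    (10, "Mgmt", "192.168.10.0/24"),
    (20, "Trusted", "192.168.20.0/24"),
    (25, "Kids", "192.168.25.0/24"),
    (30, "IoT", "192.168.30.0/24"),
    (40, "CatchAll", "192.168.1.0/24"),
]


def redirect_rule(vlan_id, name, subnet, target="172.17.0.2", port=53):
    """Return one RouterOS dst-nat line in the same shape as the rules above."""
    return (
        f"add chain=dstnat action=dst-nat to-addresses={target} to-ports={port} "
        f"protocol=udp src-address={subnet} dst-port=53 "
        f'comment="[DNS] VLAN{vlan_id} {name} redirect"'
    )


def render_rules(target="172.17.0.2", port=53):
    """Render the redirect rule for every VLAN in the table."""
    return [redirect_rule(v, n, s, target, port) for v, n, s in VLANS]


if __name__ == "__main__":
    print("/ip firewall nat")
    print("\n".join(render_rules()))
```

Keeping the subnets in one list makes it harder for a rule and the redirect table to drift apart when a VLAN is added or renumbered.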
Automatic Failover
How It Works (Dual Health Check)
Two independent Netwatch monitors trigger failover:
| Monitor | Type | What It Checks | Interval | Timeout |
|---|---|---|---|---|
| Ping | simple | Container reachable | 10s | 3s |
| DNS | dns | DNS queries work | 30s | 10s |
Either monitor failing triggers failover to Unraid.
Failure Scenarios Covered
| Scenario | Ping Check | DNS Check | Failover? |
|---|---|---|---|
| Container crashed | ✗ Fail | ✗ Fail | ✅ Yes |
| Container stopped | ✗ Fail | ✗ Fail | ✅ Yes |
| Network/routing issue | ✗ Fail | ✗ Fail | ✅ Yes |
| Upstream DNS unreachable | ✓ Pass | ✗ Fail | ✅ Yes |
| AdGuard overloaded | ✓ Pass | ✗ Fail | ✅ Yes |
| Everything working | ✓ Pass | ✓ Pass | ❌ No |
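The scenario table reduces to a single rule: fail over unless both checks pass. A small Python sketch of that rule, with the table rows as test data (the function name `should_fail_over` is illustrative only):

```python
def should_fail_over(ping_ok: bool, dns_ok: bool) -> bool:
    """Fail over unless both the ping check and the DNS check pass."""
    return not (ping_ok and dns_ok)


# (ping_ok, dns_ok) per scenario, taken from the table above.
SCENARIOS = {
    "Container crashed": (False, False),
    "Container stopped": (False, False),
    "Network/routing issue": (False, False),
    "Upstream DNS unreachable": (True, False),
    "AdGuard overloaded": (True, False),
    "Everything working": (True, True),
}

if __name__ == "__main__":
    for name, (ping_ok, dns_ok) in SCENARIOS.items():
        print(f"{name}: failover={should_fail_over(ping_ok, dns_ok)}")
```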
Failover Timeline
| Event | Detection Time | Total Switchover |
|---|---|---|
| Container crash (ping) | ~10-13 seconds | ~13-16 seconds |
| DNS failure (resolution) | ~30-40 seconds | ~33-43 seconds |
| Recovery | ~10-30 seconds | Automatic |
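The upper figures in the timeline are consistent with a simple model: in the worst case a failure happens just after a successful probe, so detection takes one full interval plus the probe timeout. The sketch below works through that arithmetic. The 3-second switchover overhead is an assumption read off the difference between the two table columns, not a measured value.

```python
# Assumption: overhead inferred from the table (Total Switchover minus Detection Time).
SWITCH_OVERHEAD_S = 3


def worst_case_detection(interval_s: int, timeout_s: int) -> int:
    """Worst case: failure right after a good probe, then one probe that times out."""
    return interval_s + timeout_s


def worst_case_switchover(interval_s: int, timeout_s: int) -> int:
    """Detection time plus the assumed overhead for running the failover script."""
    return worst_case_detection(interval_s, timeout_s) + SWITCH_OVERHEAD_S


if __name__ == "__main__":
    for label, interval_s, timeout_s in [("ping", 10, 3), ("dns", 30, 10)]:
        print(label, worst_case_detection(interval_s, timeout_s),
              worst_case_switchover(interval_s, timeout_s))
```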
Failover Scripts
# dns-failover-down (runs when either check fails)
/system script add name=dns-failover-down dont-require-permissions=yes source={
:log warning "DNS Failover: Switching to Unraid"
/ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=192.168.10.10 to-ports=3000
}
# dns-failover-up (runs when check recovers)
/system script add name=dns-failover-up dont-require-permissions=yes source={
:log info "DNS Failover: Switching back to MikroTik"
/ip firewall nat set [find where comment~"VLAN" and comment~"redirect"] to-addresses=172.17.0.2 to-ports=53
}
Netwatch Configuration
# Monitor 1: Ping check (fast crash detection)
/tool netwatch add type=simple host=172.17.0.2 interval=10s timeout=3s \
up-script=dns-failover-up down-script=dns-failover-down \
comment="AdGuard failover monitor"
# Monitor 2: DNS resolution check (functional verification)
/tool netwatch add type=dns host=google.com interval=30s timeout=10s \
up-script=dns-failover-up down-script=dns-failover-down \
comment="AdGuard DNS resolution check"
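Because both monitors call the same two scripts, the NAT target follows whichever monitor changed state most recently. The Python sketch below replays a sequence of state changes to illustrate this. It assumes Netwatch runs a monitor's up or down script only when that monitor changes state. `all_up_target` is a hypothetical alternative policy shown for comparison and is not part of the configuration above.

```python
MIKROTIK = "172.17.0.2:53"
UNRAID = "192.168.10.10:3000"


def last_event_target(events):
    """Shared scripts: every (monitor, state) transition overwrites the NAT target."""
    target = MIKROTIK
    for _monitor, state in events:
        target = MIKROTIK if state == "up" else UNRAID
    return target


def all_up_target(events, monitors=("ping", "dns")):
    """Alternative (assumption): use the primary only while every monitor is up."""
    status = {m: "up" for m in monitors}
    for monitor, state in events:
        status[monitor] = state
    return MIKROTIK if all(s == "up" for s in status.values()) else UNRAID


if __name__ == "__main__":
    # DNS check fails, then ping fails and recovers while DNS is still failing.
    events = [("dns", "down"), ("ping", "down"), ("ping", "up")]
    print(last_event_target(events))  # target after replaying shared scripts
    print(all_up_target(events))      # target under the stricter policy
```

Under the stated assumption, a ping recovery can switch traffic back to MikroTik while the DNS check is still failing, which is worth keeping in mind when reading the failover logs.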
Sync Configuration
Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.
What Syncs
| Feature | Synced |
|---|---|
| Filter lists (blocklists) | ✅ |
| User rules (custom blocks/allows) | ✅ |
| Client settings (per-device rules) | ✅ |
| Services (blocked services) | ✅ |
| Rewrites (custom DNS entries) | ✅ |
| DNS server config | ❌ |
| DHCP settings | ❌ |
| Query logs/stats | ❌ |
Sync Container
# /mnt/user/appdata/adguard-sync/adguardhome-sync.yaml
cron: "*/30 * * * *"
runOnStart: true
origin:
url: http://192.168.10.10:3000
username: jazzymc
password: 7RqWElENNbZnPW
replicas:
- url: http://192.168.10.1:3000
username: jazzymc
password: 7RqWElENNbZnPW
features:
dns:
serverConfig: false
accessLists: true
rewrites: true
filters: true
clientSettings: true
services: true
Note: The sync container must be connected to both dockerproxy and br0 networks to reach both AdGuard instances.
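The `cron` value in the sync config is a standard five-field expression whose minute field `*/30` means "every 30 minutes, starting at minute 0". A short Python sketch that expands the minute field (it handles only the `*` and `*/N` forms; the function names are illustrative):

```python
def expand_minute_field(field):
    """Expand a cron minute field of the form '*' or '*/N' into a list of minutes."""
    if field == "*":
        return list(range(60))
    if field.startswith("*/"):
        step = int(field[2:])
        if step <= 0:
            raise ValueError("step must be positive")
        return list(range(0, 60, step))
    raise ValueError(f"unsupported minute field: {field!r}")


def sync_minutes(cron):
    """Return the minutes of each hour at which the schedule fires."""
    return expand_minute_field(cron.split()[0])


if __name__ == "__main__":
    print(sync_minutes("*/30 * * * *"))
```

With this schedule, a change made on Unraid can take up to 30 minutes to reach the MikroTik replica unless the sync container is restarted (`runOnStart: true`).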
Container Configuration (MikroTik)
Container Details
| Setting | Value |
|---|---|
| Image | adguard/adguardhome:latest |
| Interface | veth-adguard |
| IP | 172.17.0.2/24 |
| Gateway | 172.17.0.1 |
| Root dir | usb1/adguard/root |
| Config mount | usb1/adguard/conf → /opt/adguardhome/conf |
| Work mount | usb1/adguard/work → /opt/adguardhome/work |
| Start on boot | Yes |
Container Commands
# Check status
/container print
# Start container
/container start 0
# Stop container
/container stop 0
# View logs
/log print where topics~"container"
Upstream DNS
Both AdGuard instances use the same upstream:
| Upstream | Type |
|---|---|
| https://dns.quad9.net/dns-query | Primary (DoH) |
| 9.9.9.9 | Bootstrap |
| 149.112.112.112 | Bootstrap secondary |
Management
| Task | Where to Do It |
|---|---|
| Change blocklists | Unraid AdGuard (syncs to MikroTik) |
| Add custom rules | Unraid AdGuard |
| Add client settings | Unraid AdGuard |
| View query logs | MikroTik AdGuard (real-time) |
| Check failover status | MikroTik /tool netwatch print |
Troubleshooting
Check Failover Status
/tool netwatch print
# Both monitors should show STATUS=up normally
# Monitor 0: Ping check
# Monitor 1: DNS resolution check
Check Current DNS Target
/ip firewall nat print where comment~"VLAN10 Mgmt redirect"
# to-addresses should be 172.17.0.2 (normal) or 192.168.10.10 (failover)
View Failover Logs
/log print where message~"Failover"
Manual Failover Test
# Stop container (triggers failover)
/container stop 0
# Wait 15 seconds, check NAT rules switched to 192.168.10.10
# Start container (triggers recovery)
/container start 0
# Wait 15 seconds, check NAT rules switched back to 172.17.0.2
DNS Not Working
- Check container is running: /container print
- Check netwatch status: /tool netwatch print
- Test DNS directly: :resolve google.com server=172.17.0.2
- Check NAT rules: /ip firewall nat print where comment~"DNS"
- Check /32 routes exist: /ip route print where dst-address~"172.17.0.[23]"
- Ping container: /ping 172.17.0.2 count=3
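The checks above all run on the router. To test from a client on any VLAN, a plain UDP DNS query is enough. The following is a minimal stdlib-only Python sketch: it builds an A-record query in DNS wire format and reports whether any reply with a matching ID comes back. The function names are illustrative, and the script does not parse the answer.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in wire format."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def resolver_answers(server: str, port: int = 53, name: str = "google.com",
                     timeout: float = 3.0) -> bool:
    """Return True if a reply with the same query ID arrives before the timeout."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]
```

Calling `resolver_answers("8.8.8.8")` from a VLAN client exercises the NAT redirect path, since port 53 traffic to any address is intercepted.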
Container Reachable but DNS Fails
If ping works but DNS queries time out:
- Check container can reach upstream: look for timeout errors in logs
- Verify /32 routes: missing routes cause ECMP issues
- Check NAT masquerade: /ip firewall nat print where comment~"Container"
- Verify routes (should show a /32 route for each container IP): /ip route print where dst-address~"172.17"
Sync Not Working
# On Unraid
docker logs adguardhome-sync --tail 20
# Check connectivity
docker exec adguardhome-sync ping -c 2 192.168.10.10
docker exec adguardhome-sync ping -c 2 192.168.10.1
Container Network Routing
Important: /32 Host Routes Required
When running multiple containers on the same subnet (172.17.0.0/24), specific host routes are required to prevent ECMP routing issues:
# Without these routes, return traffic may go to wrong container
/ip route add dst-address=172.17.0.2/32 gateway=veth-adguard comment="AdGuard container - specific route"
/ip route add dst-address=172.17.0.3/32 gateway=veth-tailscale comment="Tailscale container - specific route"
Why this matters: Each veth interface creates a /24 route. With multiple veth interfaces on the same subnet, RouterOS enables ECMP load balancing, sending return traffic to random interfaces.
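The effect of the /32 routes can be illustrated with longest-prefix matching. In the Python sketch below, two identical /24 routes tie for a container address, which is the situation where ECMP applies, while a /32 host route is more specific and wins outright. This is a simplified model of route selection, not RouterOS code, and `best_routes` is a hypothetical helper.

```python
import ipaddress


def best_routes(dst, routes):
    """Return the gateways of the most specific routes that match dst."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gateway)
        for net, gateway in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matches:
        return []
    longest = max(net.prefixlen for net, _ in matches)
    return sorted(gw for net, gw in matches if net.prefixlen == longest)


# Connected routes created by the two veth interfaces on the same subnet.
CONNECTED = [
    ("172.17.0.0/24", "veth-adguard"),
    ("172.17.0.0/24", "veth-tailscale"),
]
# Host routes from the commands above.
HOST_ROUTES = [
    ("172.17.0.2/32", "veth-adguard"),
    ("172.17.0.3/32", "veth-tailscale"),
]

if __name__ == "__main__":
    print(best_routes("172.17.0.2", CONNECTED))                # two candidates tie
    print(best_routes("172.17.0.2", CONNECTED + HOST_ROUTES))  # host route wins
```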
Backups
| Backup | Description |
|---|---|
| pre-adguard-2026-01-31 | Before AdGuard setup |
| adguard-container-running-2026-01-31 | Container working, before NAT |
| adguard-synced-2026-01-31 | After sync configured |
| adguard-failover-complete-2026-01-31 | Single ping failover |
| routing-fix-complete-2026-01-31 | After /32 routing fix |
| dns-dual-failover-2026-01-31 | Dual health check (current) |
Restore Command
/system backup load name=dns-dual-failover-2026-01-31
Quick Reference
Normal Operation
- DNS queries → MikroTik AdGuard (172.17.0.2)
- Ad blocking active
- ~143,000 filter rules
During Failover
- DNS queries → Unraid AdGuard (192.168.10.10)
- Ad blocking still active (same rules synced)
- Automatic, no manual intervention needed
Recovery
- Automatic when container comes back up
- NAT rules switch back to MikroTik
- No DNS interruption for clients
Document Version: 1.1
Last Updated: 2026-01-31
Changes: Added dual health check (ping + DNS), documented /32 routing fix