Incident: Traefik IP Collision on dockerproxy Network
Date: 2026-05-17 (root cause 04:02–04:39 UTC)
Severity: P1 — full reverse-proxy outage
Status: Resolved
Affected: all *.xtrm-lab.org services routed through Traefik (vaultwarden, authentik, gitea, uptime-kuma, transmission, urbackup, unimus, netalert, openbrain, ~70 others)
Symptoms
- `traefik` container stuck in `Created` state, never reached `Running`. `docker inspect traefik` reported:
  - `State.Status: created`
  - `State.Error: failed to set up container networking: Address already in use`
  - `State.ExitCode: 128`
- All `*.xtrm-lab.org` hostnames unreachable through the proxy.
- `traefik-manager` sidecar remained healthy but isolated.
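The symptom above can be checked mechanically. The following is a minimal sketch, assuming the raw JSON printed by `docker inspect` is available as a string. The function name `stuck_in_created` and the sample payload are illustrative and not part of the incident tooling; only the `State.Status`, `State.Error`, and `State.ExitCode` fields are taken from the report.

```python
import json


def stuck_in_created(inspect_json: str) -> list[str]:
    """Return names of containers left in 'created' state with an error set.

    `inspect_json` is the output of `docker inspect <name>...`, which is a
    JSON array holding one object per container.
    """
    stuck = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        if state.get("Status") == "created" and state.get("Error"):
            # Container names carry a leading "/" in inspect output.
            stuck.append(container.get("Name", "").lstrip("/"))
    return stuck


# Illustrative payload mirroring the fields reported during the incident.
sample = json.dumps([
    {
        "Name": "/traefik",
        "State": {
            "Status": "created",
            "Error": "failed to set up container networking: Address already in use",
            "ExitCode": 128,
        },
    }
])

print(stuck_in_created(sample))
```

A check like this reads container state directly, so it does not depend on the proxy being reachable.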
Root Cause
Static-IP collision on the dockerproxy Docker bridge (172.18.0.0/16).
Timeline (UTC):
- 04:02:40: `ewa-apps` (Dockge stack at `/mnt/user/appdata/dockge/stacks/ewa-apps`) restarted. Its compose joined `dockerproxy` without `ipv4_address`, so Docker handed it the lowest free IP: 172.18.0.3 (Traefik's reserved IP, but Traefik was momentarily down).
- 04:39:09: `traefik` was recreated and requested its IPAMConfig-reserved `172.18.0.3`. The address was already taken, so the container was left in `Created`.
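`dockerproxy` is a user-defined bridge where critical services have hard-coded static IPs (traefik=.3, postgresql17=.10, vaultwarden=.15, etc.), but several stacks (such as `ewa-apps`) had no reservation. A container without a reservation takes whatever address is free when it starts, so a reserved address is only protected while its owner is running: first-to-start wins the IP race.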
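The race can be reproduced with a toy model. This is a sketch under stated assumptions, not Docker's actual allocator: the class `BridgeIpam`, the gateway at 172.18.0.1, and the placeholder container `other-service` occupying .2 are all hypothetical. It only illustrates the "lowest free address" behaviour described in the timeline.

```python
import ipaddress
from typing import Optional


class BridgeIpam:
    """Toy model of address assignment on a bridge (not Docker's real IPAM)."""

    def __init__(self, subnet: str, gateway: str):
        self.network = ipaddress.ip_network(subnet)
        self.in_use = {ipaddress.ip_address(gateway): "gateway"}

    def attach(self, name: str, requested: Optional[str] = None) -> str:
        if requested is not None:
            # Static reservation: fails if someone else holds the address.
            ip = ipaddress.ip_address(requested)
            if ip in self.in_use:
                raise RuntimeError(
                    f"{name}: address {ip} already in use by {self.in_use[ip]}"
                )
        else:
            # No reservation: take the lowest host address that is free.
            ip = next(h for h in self.network.hosts() if h not in self.in_use)
        self.in_use[ip] = name
        return str(ip)

    def detach(self, name: str) -> None:
        self.in_use = {ip: owner for ip, owner in self.in_use.items() if owner != name}


ipam = BridgeIpam("172.18.0.0/16", gateway="172.18.0.1")  # gateway is assumed
ipam.attach("other-service")          # hypothetical container holding .2
ipam.attach("traefik", "172.18.0.3")
ipam.detach("traefik")                # Traefik momentarily down
print(ipam.attach("ewa-apps"))        # no reservation, so it takes .3
try:
    ipam.attach("traefik", "172.18.0.3")
except RuntimeError as err:
    print(err)
```

In the model, pinning `ewa-apps` to an address outside the range other containers request removes the collision regardless of start order.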
Resolution
1. `docker stop ewa-apps` (released .3)
2. `docker start traefik` (claimed .3 via IPAMConfig)
3. `docker start ewa-apps` (got .4)

Logs immediately showed Authentik/Traefik routing chains resuming.
Preventive Fix
Pinned static IP for ewa-apps in its Dockge compose:

    networks:
      dockerproxy:
        ipv4_address: 172.18.0.70

Backup of original at `/mnt/user/appdata/dockge/stacks/ewa-apps/compose.yaml.bak.2026-05-17`.

Final state after `docker compose up -d`:
| Container | IP |
|---|---|
| traefik | 172.18.0.3 |
| ewa-apps | 172.18.0.70 |
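The table above could be verified automatically rather than by eye. The sketch below assumes the JSON from `docker network inspect dockerproxy` is available as a string and compares each container's address against an expected map. The function name `ip_mismatches`, the container IDs, and the sample payload are illustrative; the two expected addresses come from the table.

```python
import ipaddress
import json


def ip_mismatches(network_inspect_json: str, expected: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map container name -> (expected IP, actual IP) for every mismatch.

    `network_inspect_json` is the output of `docker network inspect <network>`,
    a JSON array whose first object has a "Containers" map.
    """
    network = json.loads(network_inspect_json)[0]
    actual = {}
    for container in (network.get("Containers") or {}).values():
        cidr = container.get("IPv4Address")
        if cidr:
            # Addresses are reported in CIDR form, e.g. "172.18.0.3/16".
            actual[container["Name"]] = str(ipaddress.ip_interface(cidr).ip)
    return {
        name: (want, actual.get(name, "not attached"))
        for name, want in expected.items()
        if actual.get(name) != want
    }


expected = {"traefik": "172.18.0.3", "ewa-apps": "172.18.0.70"}

# Illustrative payload; container IDs are placeholders.
sample = json.dumps([
    {
        "Name": "dockerproxy",
        "Containers": {
            "id-1": {"Name": "traefik", "IPv4Address": "172.18.0.3/16"},
            "id-2": {"Name": "ewa-apps", "IPv4Address": "172.18.0.70/16"},
        },
    }
])

print(ip_mismatches(sample, expected))
```

An empty result means every listed container holds its expected address.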
Follow-up
- Audit remaining Dockge stacks on `dockerproxy` for missing `ipv4_address` reservations.
- Add an Uptime Kuma docker-state monitor for `traefik` (HTTP probes route through Traefik and fail uselessly when Traefik itself is the outage).
- Two Traefik-IP incidents in 12 days (also 2026-05-05 vaultwarden misroute): consider a deliberate redesign of the dockerproxy IP map.
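The audit in the first follow-up item could be scripted. This is a minimal sketch, assuming each stack's resolved configuration is exported with `docker compose config --format json` and passed in as a string. The function name `services_without_static_ip` and the service `example-app` are hypothetical, and the sketch assumes the compose-level network key is literally `dockerproxy`, which may not hold if a stack aliases the external network under another key.

```python
import json


def services_without_static_ip(compose_json: str, network: str = "dockerproxy") -> list[str]:
    """List services attached to `network` that do not set ipv4_address."""
    config = json.loads(compose_json)
    missing = []
    for name, service in (config.get("services") or {}).items():
        networks = service.get("networks") or {}
        if isinstance(networks, list):
            # Short syntax is a bare list of names and cannot carry an address.
            networks = {n: None for n in networks}
        if network in networks:
            settings = networks[network] or {}
            if not settings.get("ipv4_address"):
                missing.append(name)
    return sorted(missing)


# Illustrative stack: one pinned service and one hypothetical unpinned one.
sample = json.dumps({
    "services": {
        "ewa-apps": {"networks": {"dockerproxy": {"ipv4_address": "172.18.0.70"}}},
        "example-app": {"networks": {"dockerproxy": None}},
    }
})

print(services_without_static_ip(sample))
```

Running this over every stack directory would give the list of services that can still take a reserved address on restart.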