dockerproxy: redesign IPAM with static block + dynamic /25 pool

Recreated dockerproxy network with --ip-range 172.18.0.128/25 so Docker
auto-allocations are isolated from the .2-.127 static reservation block.
Eliminates IP-collision class that caused the 2026-05-17 Traefik outage.

Adds 13-DOCKERPROXY-NETWORK.md as the canonical reference for the
network spec, recreate command, and current IP assignments.
jazzymc
2026-05-17 08:36:46 +03:00
parent 6506b22aea
commit dd1c15cf6b
2 changed files with 123 additions and 3 deletions
@@ -61,8 +61,37 @@ Final state after `docker compose up -d`:
---
## Structural Fix — IPAM Redesign (~05:15 UTC same day)
Compose-level pinning fixes one container at a time. To remove the root cause class, the `dockerproxy` network was recreated with a dedicated dynamic IP range. Since Docker network IPAM is immutable, the network had to be torn down and rebuilt with all 32 containers offline (~3 min outage).
New IPAM:
```
Subnet: 172.18.0.0/16
Gateway: 172.18.0.1
IPRange: 172.18.0.128/25 ← Docker only auto-assigns from .128–.255
```
- `.2`–`.127` is now a **static-only reservation block**
- `.128`–`.255` is the dynamic pool
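The split can be checked mechanically. The following is a minimal sketch, not part of the recreate script: `classify_ip` is a hypothetical helper name, and it assumes only the boundaries stated above (`.1` gateway, `.2`–`.127` static, `.128`–`.255` dynamic, all within `172.18.0.x`).

```shell
# Hypothetical helper: report which block a dockerproxy address falls in.
# Assumed layout: 172.18.0.1 gateway, .2-.127 static, .128-.255 dynamic.
classify_ip() (
  ip="${1%%/*}"        # drop a trailing /16 if docker printed one
  prefix="${ip%.*}"    # first three octets
  last="${ip##*.}"     # last octet
  if [ "$prefix" != "172.18.0" ]; then
    echo "outside"     # inside the /16 perhaps, but in neither block
  elif [ "$last" -eq 1 ]; then
    echo "gateway"
  elif [ "$last" -ge 2 ] && [ "$last" -le 127 ]; then
    echo "static"
  elif [ "$last" -ge 128 ] && [ "$last" -le 255 ]; then
    echo "dynamic"
  else
    echo "outside"     # .0, the network address
  fi
)
```

For example, `classify_ip 172.18.0.128/16` prints `dynamic`, and `classify_ip 172.18.0.3` prints `static`.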
Procedure (scripted; full state snapshot at `/root/dockerproxy-recreate-2026-05-17/` on Unraid):
1. Capture container→static-IP map from `docker inspect`
2. `docker stop` all 32 in parallel
3. `docker network disconnect dockerproxy <each>`
4. `docker network rm dockerproxy`
5. `docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 --ip-range 172.18.0.128/25 dockerproxy`
6. `docker network connect --ip <preserved-static> dockerproxy <each>` (no `--ip` for the one dynamic container, `traefik-manager`)
7. Start in dependency order: dockersocket → postgresql17 + Redis → traefik + traefik-manager → authentik + authentik-worker → rest
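Steps 3 to 6 can be sketched as a shell function. This is an illustration only, not the script that was actually run. The function name `recreate_network`, the `DOCKER` override, and the map format (one `name ip` pair per line on stdin, with `-` in place of the IP for a container that should take a dynamic address) are all assumptions made for this sketch. It expects the containers to be stopped already (step 2) and does no error handling.

```shell
# Sketch of steps 3-6. Reads "name ip" pairs on stdin; an ip of "-" means
# "no --ip, let Docker pick from the dynamic range".
# Set DOCKER=echo to print the commands instead of running them.
recreate_network() (
  docker_cmd="${DOCKER:-docker}"
  net="dockerproxy"
  map="$(cat)"    # keep the map in memory so it can be walked twice

  # Step 3: detach every container so the network has no endpoints left.
  printf '%s\n' "$map" | while read -r name ip; do
    if [ -z "$name" ]; then continue; fi
    $docker_cmd network disconnect "$net" "$name"
  done

  # Step 4: remove the old network.
  $docker_cmd network rm "$net"

  # Step 5: recreate it with the dynamic pool confined to .128-.255.
  $docker_cmd network create --driver bridge \
    --subnet 172.18.0.0/16 --gateway 172.18.0.1 \
    --ip-range 172.18.0.128/25 "$net"

  # Step 6: reattach, pinning the preserved address where there is one.
  printf '%s\n' "$map" | while read -r name ip; do
    if [ -z "$name" ]; then continue; fi
    if [ "$ip" = "-" ]; then
      $docker_cmd network connect "$net" "$name"
    else
      $docker_cmd network connect --ip "$ip" "$net" "$name"
    fi
  done
)
```

Running it with `DOCKER=echo` first gives a dry run whose output can be reviewed before anything is torn down.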
Final state: all 32 containers up, all static IPs preserved (`.3`–`.70`), traefik-manager moved from `.2` to `.128` (first dynamic slot).
Reference: `docs/13-DOCKERPROXY-NETWORK.md` documents the network spec, recreate command, and current IP assignments.
## Follow-up
- Audit remaining Dockge stacks on `dockerproxy` for missing `ipv4_address` reservations.
- Two Traefik-IP incidents in 12 days (also 2026-05-05 vaultwarden misroute) — consider a deliberate redesign of the dockerproxy IP map.
- IPAM redesign eliminates the auto-allocation collision class. Future stacks that omit `ipv4_address` land safely in `.128+`.
- Network is imperative (not in any compose); recreate command is documented in `docs/13-DOCKERPROXY-NETWORK.md` for disaster recovery.
- Add an Uptime Kuma docker-state monitor for `traefik` (HTTP probes route through Traefik and fail uselessly when Traefik is the outage).
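Because stacks that omit `ipv4_address` now land in `.128+`, the audit for missing reservations can start from the addresses themselves. The sketch below is one possible approach, not an existing tool: `list_dynamic` is a hypothetical helper that reads `name address` lines on stdin and prints the names sitting in the dynamic pool.

```shell
# Hypothetical audit helper: print the names of containers whose address is in
# the dynamic pool (172.18.0.128-172.18.0.255), i.e. without a static reservation.
list_dynamic() (
  while read -r name addr; do
    ip="${addr%%/*}"             # strip the /16 suffix docker appends
    case "$ip" in
      172.18.0.*) last="${ip##*.}" ;;
      *) continue ;;             # not in the 172.18.0.x blocks at all
    esac
    if [ "$last" -ge 128 ] 2>/dev/null; then
      echo "$name"
    fi
  done
)

# Possible input source (only lists containers currently attached):
#   docker network inspect dockerproxy \
#     --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{println}}{{end}}' \
#     | list_dynamic
```

Given the final state described above, the only expected entry would be `traefik-manager`; any other name points at a stack that still needs an `ipv4_address`.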