diff --git a/docs/13-DOCKERPROXY-NETWORK.md b/docs/13-DOCKERPROXY-NETWORK.md
new file mode 100644
index 0000000..39ffc2e
--- /dev/null
+++ b/docs/13-DOCKERPROXY-NETWORK.md
@@ -0,0 +1,91 @@
+# dockerproxy Docker Network
+
+User-defined Docker bridge on Unraid hosting Traefik and all reverse-proxied services. Defined imperatively (not in any compose file; stacks reference it as `external: true`).
+
+## IPAM
+
+| Property | Value |
+|----------|-------|
+| Driver | `bridge` |
+| Subnet | `172.18.0.0/16` |
+| Gateway | `172.18.0.1` |
+| IP Range (dynamic pool) | `172.18.0.128/25` (.128–.255) |
+| Static reservation block | `172.18.0.2 – 172.18.0.127` |
+
+The `--ip-range` constrains Docker's auto-allocation to `.128–.255`. An address pinned via compose `ipv4_address` outside that range can never collide with an auto-allocated one. This layout was set up on 2026-05-17 after the collision incident documented in `incidents/2026-05-17-traefik-ip-collision.md`.
+
+## Recreate Command
+
+If the network is ever lost (Docker reset, accidental `docker network rm`):
+
+```bash
+docker network create \
+  --driver bridge \
+  --subnet 172.18.0.0/16 \
+  --gateway 172.18.0.1 \
+  --ip-range 172.18.0.128/25 \
+  dockerproxy
+```
+
+After recreating, compose-managed containers reconnect via `docker compose up -d`. Standalone containers need `docker network connect [--ip <ip>] dockerproxy <container>`.
+
+## Static Assignments (2026-05-17)
+
+| IP | Container |
+|----|-----------|
+| .1 | (gateway) |
+| .3 | traefik |
+| .6 | dockersocket |
+| .8 | authentik-worker |
+| .9 | authentik |
+| .10 | postgresql17 |
+| .14 | Redis |
+| .15 | vaultwarden |
+| .16 | actual-budget |
+| .18 | Uptime-Kuma-API |
+| .19 | AutoKuma |
+| .20 | UptimeKuma |
+| .21 | speedtest-tracker |
+| .22 | obsidian-livesync |
+| .23 | SeekAndWatch |
+| .25 | karakeep |
+| .26 | transmission |
+| .31 | gitea |
+| .32 | woodpecker-server |
+| .33 | woodpecker-agent |
+| .43 | radarr |
+| .44 | sonarr |
+| .45 | prowlarr |
+| .50 | dockhand |
+| .53 | n8n |
+| .60 | overseerr |
+| .61 | plex_debrid |
+| .62 | zurg |
+| .63 | zurg-rclone |
+| .65 | xtrm-agent |
+| .66 | kasm |
+| .70 | ewa-apps |
+| .128+ | dynamic pool (traefik-manager landed here) |
+
+## Adding a New Service
+
+1. Pick a free IP in `.2–.127` (or omit `ipv4_address` and accept a dynamic `.128+` address)
+2. In compose:
+   ```yaml
+   services:
+     myservice:
+       networks:
+         dockerproxy:
+           ipv4_address: 172.18.0.X
+   networks:
+     dockerproxy:
+       external: true
+   ```
+3. Append to the table above and commit.
+
+## Snapshot of Pre-Recreate State
+
+On Unraid: `/root/dockerproxy-recreate-2026-05-17/`
+- `network-before.json`: full `docker network inspect` output
+- `state.tsv`: per-container name/static-IP/runtime-IP/status/restart-policy
+- `containers.txt`: sorted container list (32 entries)
diff --git a/docs/incidents/2026-05-17-traefik-ip-collision.md b/docs/incidents/2026-05-17-traefik-ip-collision.md
index f479daf..da9c493 100644
--- a/docs/incidents/2026-05-17-traefik-ip-collision.md
+++ b/docs/incidents/2026-05-17-traefik-ip-collision.md
@@ -61,8 +61,37 @@ Final state after `docker compose up -d`:
 
 ---
 
+## Structural Fix: IPAM Redesign (~05:15 UTC same day)
+
+Compose-level pinning fixes one container at a time. To remove the underlying class of failure, the `dockerproxy` network was recreated with a dedicated dynamic IP range. Because the IPAM configuration of an existing Docker network cannot be changed, the network had to be torn down and rebuilt with all 32 containers offline (~3 min outage).
+
+New IPAM:
+
+```
+Subnet:  172.18.0.0/16
+Gateway: 172.18.0.1
+IPRange: 172.18.0.128/25   ← Docker only auto-assigns from .128–.255
+```
+
+- `.2`–`.127` is now a **static-only reservation block**
+- `.128`–`.255` is the dynamic pool
+
+Procedure (scripted; full state snapshot at `/root/dockerproxy-recreate-2026-05-17/` on Unraid):
+
+1. Capture container→static-IP map from `docker inspect`
+2. `docker stop` all 32 in parallel
+3. `docker network disconnect dockerproxy <container>`
+4. `docker network rm dockerproxy`
+5. `docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 --ip-range 172.18.0.128/25 dockerproxy`
+6. `docker network connect --ip <ip> dockerproxy <container>` (no `--ip` for the one dynamic container, `traefik-manager`)
+7. Start in dependency order: dockersocket → postgresql17 + Redis → traefik + traefik-manager → authentik + authentik-worker → rest
+
+Final state: all 32 containers up, all static IPs preserved (.3–.70), traefik-manager moved from `.2` to `.128` (first dynamic slot).
+
+Reference: `docs/13-DOCKERPROXY-NETWORK.md` documents the network spec, recreate command, and current IP assignments.
+
 ## Follow-up
 
-- Audit remaining Dockge stacks on `dockerproxy` for missing `ipv4_address` reservations.
-- Add a Uptime Kuma docker-state monitor for `traefik` (HTTP probes route *through* Traefik and fail uselessly when Traefik itself is the outage).
-- Two Traefik-IP incidents in 12 days (also 2026-05-05 vaultwarden misroute) — consider a deliberate redesign of the dockerproxy IP map.
+- IPAM redesign eliminates the auto-allocation collision class. Future stacks that omit `ipv4_address` land safely in `.128+`.
+- The network is created imperatively (not in any compose file); the recreate command is documented in `docs/13-DOCKERPROXY-NETWORK.md` for disaster recovery.
+- Add an Uptime Kuma docker-state monitor for `traefik` (HTTP probes route through Traefik and fail uselessly when Traefik is the outage).
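
Note on step 6 of the procedure: the reconnect step could be sketched as a dry-run generator that prints the `docker network connect` commands instead of running them. This is a minimal sketch, not the script that was actually used. It assumes the captured map has been reduced to tab-separated `name<TAB>ip` lines with an empty `ip` for a container that should take a dynamic address. That two-column layout and the function name `emit_connect_cmds` are hypothetical and do not describe the real `state.tsv`.

```bash
#!/usr/bin/env bash
# Sketch only: print (do not run) the reconnect commands for step 6.
# Input on stdin: one "name<TAB>ip" pair per line. An empty ip means the
# container gets a dynamic address. This layout is an assumption, not the
# actual format of state.tsv.

emit_connect_cmds() {
  tab="$(printf '\t')"
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS="$tab" read -r c_name c_ip || [ -n "$c_name" ]; do
    # Skip blank lines.
    [ -n "$c_name" ] || continue
    if [ -n "$c_ip" ]; then
      # Static container: pin the recorded address.
      printf 'docker network connect --ip %s dockerproxy %s\n' "$c_ip" "$c_name"
    else
      # Dynamic container: let Docker allocate from the --ip-range pool.
      printf 'docker network connect dockerproxy %s\n' "$c_name"
    fi
  done
}
```

For example, `printf 'traefik\t172.18.0.3\n' | emit_connect_cmds` prints one `docker network connect --ip 172.18.0.3 dockerproxy traefik` line. Printing first keeps the step reviewable before anything touches the network.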