jazzymc 02d07a8e52 docs: add Plex RD mount propagation fix to changelog
Fixed Transport endpoint not connected error — Plex volume mount
needed :rslave propagation to see rclone FUSE mount from zurg.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 21:15:38 +00:00

# Infrastructure Changelog
**Purpose:** Major infrastructure events only. Minor changes are in git commit messages.
---
## 2026-05-05
### Plex Real-Debrid Mount Fix
- **[PLEX]** Fixed "Transport endpoint is not connected" — Plex couldn't see rclone FUSE mount for RD content
- **[PLEX]** Root cause: Plex volume `/mnt/user:/media` had default `rprivate` propagation; FUSE mounts from zurg-rclone didn't propagate into the container
- **[PLEX]** Fix: Added `:rslave` propagation to Plex compose volume mount in Dockge stack
- **[PLEX]** Triggered library scan on all 4 sections (Movies, Movies RD, TV Shows, TV Shows RD) — content playable again
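
A minimal sketch of what the `:rslave` change might look like in the Plex stack's compose file. Only the `/mnt/user:/media` bind and the `rslave` flag come from the entries above; the service name and the omitted settings are assumptions.

```yaml
services:
  plex:                          # service name is an assumption
    # image, network and other settings omitted
    volumes:
      # rslave: mounts created on the host under /mnt/user after the
      # container starts (such as the rclone FUSE mount) propagate into
      # the container. With the default rprivate they do not.
      - /mnt/user:/media:rslave
```

The same option can be written in Compose long syntax as a `type: bind` volume with `bind.propagation: rslave`.
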
### Uptime Kuma Cleanup
- **[MONITORING]** Updated 3 monitors with stale 192.168.31.x IPs to current 192.168.10.x addresses
  - MikroTik Router: 192.168.31.1 → 192.168.10.1
  - Unraid Server: 192.168.31.2 → 192.168.10.20
  - Switch Ping: 192.168.31.9 → 192.168.10.3 (CSS326)
- **[MONITORING]** Deleted 7 duplicate monitors (MikroTik AdGuard Home ×3, MikroTik Tailscale ×2, MikroTik DNS ×2)
- **[MONITORING]** Final state: 27 clean monitors, no legacy 192.168.31.x references remaining
### Home Assistant Traefik Fix
- **[TRAEFIK]** Fixed HA service URL in services.yml: 192.168.10.200 (NanoKVM) → 192.168.10.20 (Unraid) — was causing 502
- **[HA]** Added `http:` section to configuration.yaml with `trusted_proxies: 172.18.0.0/16, 192.168.10.0/24` — HA was rejecting forwarded requests from Traefik (400)
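
For reference, the `http:` section described above might look roughly like this. The two CIDR ranges are taken from the entry; `use_x_forwarded_for` is not mentioned there and is included on the assumption that it is set, since Home Assistant expects it alongside `trusted_proxies`.

```yaml
# configuration.yaml (Home Assistant)
http:
  use_x_forwarded_for: true      # assumption, not stated in the changelog
  trusted_proxies:
    - 172.18.0.0/16              # Docker bridge range (assumed to be where Traefik runs)
    - 192.168.10.0/24            # 192.168.10.x LAN
```
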
### Nextcloud Upgrade
- **[NEXTCLOUD]** Ran `occ upgrade` — pending app/database migrations were causing 503 on all requests
### Uptime Kuma Monitor Fixes
- **[MONITORING]** Fixed Nextcloud monitor URL: nextcloud.xtrm-lab.org → cloud.xtrm-lab.org (matching actual Traefik route)
- **[MONITORING]** Fixed Speedtest monitor URL: speedtest.xtrm-lab.org → speed.xtrm-lab.org (matching actual Traefik route)
- **[MONITORING]** Deleted AdGuard DNS monitor (dns.xtrm-lab.org) — no Traefik route, duplicate of MikroTik AdGuard Home monitor
- **[MONITORING]** Deleted NetBox monitor — container not running/removed
- **[MONITORING]** Deleted MikroTik Tailscale monitor — 100.64.0.1 CGNAT not reachable from Unraid
- **[MONITORING]** Final state: 24 monitors, all green
### Kasm Workspaces Fix
- **[KASM]** Fixed session connection failure when accessing via Traefik reverse proxy
- **[KASM]** Changed zone `proxy_port` from 6333 to 443 in kasm_db — browser WebSocket now routes through Traefik instead of trying unreachable port 6333
- **[TRAEFIK]** Added `kasm-headers` middleware (same as `default-headers` but without `frameDeny`) — Kasm uses iframes for session rendering
- **[TRAEFIK]** Updated kasm-secure router to use `kasm-headers` middleware
- **[DOCS]** Added Kasm Workspaces to 03-SERVICES-OTHER.md
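
A sketch of how the `kasm-headers` middleware and the router change could be expressed in Traefik's file provider. The actual contents of `default-headers` are not shown in this changelog, so the option values below are placeholders; only the middleware name, the router name, and the absence of `frameDeny` come from the entries above.

```yaml
http:
  middlewares:
    kasm-headers:
      headers:
        # Assumed subset of default-headers. frameDeny is deliberately
        # left out so Kasm can render sessions inside iframes.
        browserXssFilter: true
        contentTypeNosniff: true
        stsSeconds: 31536000
  routers:
    kasm-secure:
      # rule, service and tls settings unchanged, omitted here
      middlewares:
        - kasm-headers
```
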
---
## 2026-03-12
### WiFi Optimization & Troubleshooting
- **[WIFI]** Moved HAP 5GHz from ch 36 (5180) to ch 149 (5745), skip-dfs-channels=all
- **[WIFI]** Moved CAP 5GHz from ch 52 (5260) to ch 36 (5180), band corrected from ax to ac
- **[WIFI]** Separated HAP/CAP 5GHz channels to avoid co-channel interference
- **[WIFI]** Fixed sec-xtrm2 security: WPA+WPA2 with TKIP+CCMP for IoT compatibility
- **[WIFI]** Fixed critical bug: empty inline security.authentication-types="" caused wifi2 to run as open network — IoT devices silently failed to connect
- **[WIFI]** Set explicit encryption on all interfaces and security profiles (never leave empty)
- **[WIFI]** Removed CAP-specific access-list catch-all rules — all clients now use unified MAC-based access list
- **[WIFI]** Fixed CAP interface IDs in access-list after re-provisioning (*20/*21 → *22/*23)
- **[WIFI]** Restored 2.4GHz band to 2ghz-g (was changed to 2ghz-n, breaking IoT devices)
- **[WIFI]** Disabled FT (802.11r) on wifi1 (5GHz) for stability
- **[DOCS]** Added 12-WIFI-TROUBLESHOOTING.md with diagnostic checklist and recovery commands
---
## 2026-02-28
### Docker Container Audit & Migration to Dockge
- **[DOCKER]** Removed 4 orphan images: nextcloud/all-in-one, olprog/unraid-docker-webui, ghcr.io/ich777/doh-server, ghcr.io/idmedia/hass-unraid
- **[DOCKER]** Removed ancient pgAdmin4 v2.1 (status=Created) and fenglc/pgadmin4 image
- **[DOCKER]** Removed spaceinvaderone/ha_inabox image (replaced by Home-Assistant-Container)
- **[TRAEFIK]** Removed Docker provider constraint (`traefik.constraint=valid`) — Docker labels now auto-discovered
- **[TRAEFIK]** Cleaned up dynamic.yml: removed 14 stale/migrated router+service pairs (pangolin, pihole, doh, netbox, and services now using Docker labels)
- **[TRAEFIK]** Added dockge-secure router to dynamic.yml
- **[DOCKER]** Created 6 new Dockge stacks: docker-socket-proxy, tuyagateway, firefly, seekandwatch, ha-time-machine, homeassistant (replaced inabox with Container)
- **[DOCKER]** Migrated ALL 53 containers from dockerman to Dockge compose stacks (100% coverage)
- **[DOCKER]** Fixed Nextcloud Traefik rule: empty Host() → Host(`cloud.xtrm-lab.org`)
- **[DOCKER]** Fixed UptimeKuma Traefik rule: empty Host() → Host(`uptime.xtrm-lab.org`)
- **[DOCKER]** Fixed Homarr domain: `homarr.xtrm-lab.org` → `xtrm-lab.org` (root domain)
- **[DOCKER]** Fixed Netdisco entrypoint: `websecure` → `https`
- **[DOCKER]** Removed stale `traefik.constraint=valid` from Dockhand
- **[DOCKER]** Fixed Transmission middleware: removed non-existent `transmission-headers@file`
- **[DOCKER]** Added Authentik forward auth middleware to: n8n, homarr, transmission, speedtest-tracker, uptime-kuma, firefly, seekandwatch, open-webui, traefik dashboard, dockge, netalertx, urbackup, unimus
- **[DOCKER]** Added Traefik labels to: vaultwarden, open-webui (ai.xtrm-lab.org), firefly, seekandwatch
- **[DOCKER]** Added missing Unraid labels (icon, managed, webui) to: ntfy, timemachine, ollama, docker-socket-proxy, tuyagateway, all new stacks
- **[DOCKER]** Moved ollama + open-webui from bridge to dockerproxy network
- **[DOCKER]** Moved fireflyiii + firefly-data-importer from none to dockerproxy network
- **[DOCKER]** Moved SeekAndWatch from bridge to dockerproxy network
- **[DOCKER]** Removed traefik labels from host-network containers (plex, netalertx) — routed via dynamic.yml only
- **[DOCKER]** Fixed NetAlertX: added read_only, proper capabilities (NET_RAW/NET_ADMIN), and UID 20211
- **[DOCKER]** Removed empty netbox stack directory
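
The empty `Host()` fixes in this list amount to giving each router an explicit hostname. A hedged sketch using Docker labels in a compose stack: the service and router name `nextcloud` are hypothetical, and only the hostname is taken from the entry above.

```yaml
services:
  nextcloud:                     # service name is an assumption
    labels:
      - "traefik.enable=true"
      # Router name "nextcloud" is hypothetical; the rule was previously an
      # empty Host().
      - "traefik.http.routers.nextcloud.rule=Host(`cloud.xtrm-lab.org`)"
```
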
---
## 2026-03-09
### Claude Code Tooling Completion
- **[SERVICE]** Installed Cooperator CLI v3.36.1 on Unraid (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Ran `cooperator install --non-interactive` — symlinked commands, agents, 12 skills to `~/.claude/`
- **[SERVICE]** Created `~/.cooperator/.env` with Shortcut API token, Confluence token, git config
- **[SERVICE]** Installed glab CLI v1.89.0 on Unraid (`/usr/local/bin/glab`) — authenticated as kaloyan.danchev
- **[SERVICE]** Installed uv package manager + Python 3.12.13 on Unraid
- **[SERVICE]** Created Python venvs for mikrotik-mcp and unraid-mcp projects
- **[SERVICE]** Copied MikroTik SSH key from Mac to Unraid — SSH to HAP ax3 verified working
- **[SERVICE]** Synced 6 custom Claude skills to `/mnt/user/appdata/claude-code/custom-skills/` (ev-compliance-story, ev-protocol-expert, frontend-designer, mikrotik-admin, prd-generator, unraid-admin)
- **[SERVICE]** Built shortcut MCP server at `/mnt/user/appdata/claude-code/mcp-server-shortcut/`
- **[SERVICE]** Enabled Claude plugins: ralph-loop, claude-md-management, playground
- **[DOCS]** Updated 12-DEVELOPMENT-ENVIRONMENT.md with Cooperator, glab, Python, skills, MCP sections
#### TODO — MCP Server Registration
The following MCP servers are built/ready but need `claude mcp add` registration (requires interactive Claude session on Unraid):
- shortcut, mikrotik, unraid, playwright, smartbear
---
## 2026-03-08
### Development Environment Setup
- **[SERVICE]** Installed OpenVSCode Server as host-native process (port 3100, not a container) — accessible at https://code.xtrm-lab.org
- **[SERVICE]** Traefik route added in dynamic.yml with Authentik forward auth
- **[SERVICE]** Boot auto-start via `/boot/config/go` → `/mnt/user/appdata/openvscode/start.sh`
- **[SERVICE]** Claude Code updated to v2.1.71, persistent at `/mnt/user/appdata/claude-code/.npm-global/`
- **[SERVICE]** Cooperator CLI v3.36.1 installed globally (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Created `/mnt/user/projects/` workspace with 12 personal repos (Gitea) + 18 AMPECO work projects (GitLab)
- **[DOCS]** Added `12-DEVELOPMENT-ENVIRONMENT.md` documenting full dev environment setup
### Docker Maintenance
- **[DOCKER]** Created Unraid Docker Manager XML templates for 11 containers missing them (adguardhome, gitea, minecraft, ntfy, ollama, open-webui, etc.)
- **[DOCKER]** Pulled new images for all 30 active Dockge stacks, 14 containers received updates
- **[DOCKER]** Cleaned up dangling images: 10.95 GB reclaimed
- **[DOCKER]** Organized all 42 containers into Docker Folders (12 folders: Infrastructure, Security, Monitoring, DevOps, Media, etc.)
- **[DOCKER]** Pushed 6 local-only projects to Gitea (claude-skills, mikrotik-mcp, unraid-mcp, nanobot-mcp, nanobot-hkuds, openclaw)
### Service Fixes
- **[FIX]** Gitea DB connection: fixed hardcoded PostgreSQL IP (172.18.0.13) → hostname `postgresql17` in compose and app.ini
- **[FIX]** Traefik: removed stale stopped container blocking restart
- **[FIX]** Redis: removed stale stopped container blocking recreate
---
## 2026-02-26
### WiFi & CAP VLAN Fixes
- **[WIFI]** Fixed 5GHz channel overlap: HAP wifi1 reduced from 80MHz to 40MHz at 5180MHz, CAP cap-wifi1 at 5220MHz (no overlap)
- **[WIFI]** Restored all 29 WiFi access-list MAC→VLAN entries (were missing/lost)
- **[WIFI]** Fixed cap-wifi2 band mismatch: was `band=2ghz-n` with frequency=5220 (5GHz), corrected to frequency=2412
- **[CAPSMAN]** Enabled bridge VLAN filtering on CAP (cAP XL ac) — all VLANs now properly tagged through CAP
- **[CAPSMAN]** CAP bridgeLocal config: vlan-filtering=yes, pvid=10, VLANs 10/20/25/30/35/40 with proper tagged/untagged members
- **[CAPSMAN]** Set `capdp` datapath vlan-id=40 for default PVID on dynamic wifi bridge ports
- **[CAPSMAN]** VLAN assignment through CAP now working — access-list vlan-id entries propagate correctly
- **[NETWORK]** Fixed AdGuard Home IP conflict: container was at 192.168.10.2 (CAP's IP), now static at 192.168.10.10
- **[NETWORK]** Fixed adguardhome-sync IP conflict: was at 192.168.10.3 (CSS326's IP), now static at 192.168.10.11
- **[WIFI]** Added Xiaomi Air Purifier 2 (C8:5C:CC:40:B4:AA) to access-list as VLAN 30 (IoT)
### WiFi Quality Optimization
- **[WIFI]** Fixed 2.4GHz co-channel interference: HAP on ch 1 (2412), CAP moved from ch 1 to ch 6 (2437)
- **[WIFI]** Fixed 5GHz overlap: HAP stays ch 36 (5180, 40MHz), CAP moved from ch 44 (5220) to ch 52 (5260, DFS)
- **[WIFI]** Fixed CAP 2.4GHz width from 40MHz to 20MHz for IoT compatibility
- **[WIFI]** TX power kept at defaults (17/16 dBm) — reduction caused kitchen coverage loss through concrete walls
---
## 2026-02-24
### Motherboard Replacement & NVMe Cache Pool
- **[HARDWARE]** Replaced XTRM-U motherboard — new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
- **[HARDWARE]** Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead — clicking heads, fails on multiple SATA ports and new motherboard
- **[STORAGE]** Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
- **[STORAGE]** Pool settings: autotrim=on, compression=on
- **[DOCKER]** Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
- **[DOCKER]** Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
- **[DOCKER]** Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
- **[DOCKER]** Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
- **[STORAGE]** Deleted old `docker.img` (200GB) from disk1
- **[INCIDENT]** disk1 still running in parity-emulated mode — replacement drive needed
### Post-Migration Container Cleanup
- **[NETWORK]** Fixed Traefik unreachable: removed stale Docker bridge (duplicate 172.18.0.0/16 subnet) + 7 orphaned bridges
- **[DOCKER]** Removed deprecated containers: DoH-Server, binhex-plexpass (duplicate of Plex)
- **[DOCKER]** Removed obsolete containers: HomeAssistant_inabox, Docker-WebUI, hass-unraid
- **[DOCKER]** Removed nextcloud-aio-mastercontainer (replaced by Nextcloud container)
- **[SERVICE]** Fixed adguardhome-sync: recreated config file (was directory from migration), switched to br0 network for macvlan reachability
- **[SERVICE]** Fixed diode stack: recreated .env, nginx.conf, OAuth2 client config; ran Hydra DB migration and client bootstrap
- **[SERVICE]** Fixed diode-agent: corrected YAML format, secrets, and Hydra authentication
- **[SERVICE]** Started unmarr (Homarr fork, 172.18.0.81) and rustfs (S3-compatible storage)
- **[DOCKER]** Final state: 53 containers running, pgAdmin4 stopped (utility)
- **[DOCS]** Updated 03-SERVICES-OTHER.md with removed containers
---
## 2026-02-14
### CAP XL ac Recovery
- **[WIRELESS]** Factory reset CAP XL ac (lost credentials)
- **[WIRELESS]** Reconfigured CAPsMAN: regenerated certificate, CAP re-enrolled with `certificate=request`
- **[WIRELESS]** Both CAP radios now active: wifi1 (2.4GHz XTRM2) + wifi2 (5GHz XTRM)
- **[WIRELESS]** CAP now running RouterOS 7.21.1
- **[WIRELESS]** Enabled SSH on CAP port 2222 for user xtrm with mikrotik key
- **[WIRELESS]** Confirmed WiFi access list has no VLAN assignment (rolled back Jan 27)
### Roms Network Share
- **[SERVICE]** Shared /mnt/user/roms (2.3TB, 49 systems) via SMB from Unraid
- **[SERVICE]** Mounted on Nobara at /mnt/roms (fstab, CIFS guest, systemd.automount)
- **[SERVICE]** Mounted on Recalbox via custom.sh boot script (CIFS bind mounts)
- **[SERVICE]** Deleted local roms from Recalbox SD card (~12.5GB freed)
### WiFi DHCP Fix
- **[NETWORK]** Fixed DHCP not working on HAP1 local WiFi (wifi1/wifi2)
- **[NETWORK]** Root cause: VLAN 40 had wifi1/wifi2 as **tagged** instead of **untagged** — DHCP responses had 802.1Q tags clients couldn't process
- **[NETWORK]** Fix: `/interface bridge vlan set` wifi1,wifi2 to untagged for VLAN 40
### Minecraft Server Deployed
- **[SERVICE]** Deployed Minecraft Java Edition (itzg/minecraft-server) on Unraid
- **[SERVICE]** Version 1.21.11, Survival mode, 2GB RAM, max 10 players
- **[SERVICE]** Docker IP 172.18.0.80, port 25565, Dockge stack `minecraft`
- **[NETWORK]** NAT port forward WAN:25565 → 192.168.10.20:25565
- **[NETWORK]** Hairpin NAT for internal access via minecraft.xtrm-lab.org
- **[SERVICE]** Added Unraid labels with Minecraft icon
### Documentation Updates
- **[DOCS]** Updated 07-WIFI-CAPSMAN-CONFIG.md: CAP both radios working, access list status
- **[DOCS]** Updated 01-NETWORK-MAP.md: Fixed CAP IP (.6→.2), added Nobara and SMB shares
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md: CAP details, added Recalbox device
- **[DOCS]** Updated 06-VLAN-DEVICE-ASSIGNMENT.md: Added Nobara (VLAN 10) and Recalbox (VLAN 25)
- **[DOCS]** Updated 03-SERVICES-OTHER.md: Added Roms SMB share, Minecraft server section
---
## 2026-02-13
### Failover Infrastructure Deployed
- **[SERVICE]** Deployed Docker failover stack on XTRM-Nobara (Traefik, Vaultwarden, Authentik, AdGuard Home)
- **[SERVICE]** Installed Docker CE 29.2.1 + Docker Compose 5.0.2 on Nobara
- **[SERVICE]** Deployed Keepalived VRRP for automatic failover (VIP: 192.168.10.250)
- **[SERVICE]** Unraid: Keepalived as Docker container (local/keepalived, MASTER priority 150)
- **[SERVICE]** Nobara: Keepalived as systemd service (BACKUP priority 100)
- **[SERVICE]** Replicated data: Vaultwarden DB, Authentik PostgreSQL dump (864MB), AdGuard config, Traefik certs
- **[NETWORK]** Added VRRP protocol to Nobara firewall (firewalld)
- **[NETWORK]** Configured SSH key auth to Nobara (id_ed25519_nobara, passwordless)
- **[NETWORK]** Added SSH config alias: `ssh nobara`
- **[DOCS]** Created 10-FAILOVER-NOBARA.md with full failover documentation
- **[DOCS]** Updated 02-SERVICES-CRITICAL.md with failover section
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md with XTRM-Nobara specs
- **[DOCS]** Updated README.md and CLAUDE.md with Nobara references
---
## 2026-02-06
### Unraid Flash Drive Failure
- **[INCIDENT]** Unraid flash drive crashing - migration procedure created
- **[DOCS]** Created incident report with full flash drive replacement procedure
### Documentation Restructure
- **[DOCS]** Restructured docs/ from 23 files to clean 9-doc structure
- **[DOCS]** Archived 12 completed VLAN migration project docs to archive/vlan-migration/
- **[DOCS]** Archived 5 done/superseded WIP docs (VLAN proposals, AI stack, Fossorial, DNS backup)
- **[DOCS]** Created standing reference docs: 08-DNS-ARCHITECTURE.md, 09-TAILSCALE-VPN.md
- **[DOCS]** Renamed docs to clean numbering (05-PORT-UTILIZATION, 06-VLAN-DEVICE-ASSIGNMENT, 07-WIFI-CAPSMAN-CONFIG)
- **[DOCS]** Merged 00-CHANGELOG.md + 06-CHANGELOG.md → CHANGELOG.md
- **[DOCS]** Updated all core docs with current VLAN IPs (192.168.31.x → 192.168.10.x)
- **[DOCS]** Fixed CSS1 IP: 192.168.10.9 → 192.168.10.3, ZX1 IP: 192.168.10.7 → 192.168.10.4
- **[DOCS]** Cleaned 06-VLAN-DEVICE-ASSIGNMENT.md: removed migration-era columns and sections, fixed VLAN 25 subnet
- **[DOCS]** Updated README.md, CLAUDE.md, archive/README.md, wip/README.md
---
## 2026-02-01
### WIP Documentation
- **[DOCS]** Added KVM-SWITCH-MAC-NOBARA.md - Software KVM for Mac/Nobara switching
  - DDC/CI monitor control (Dell U3821DW) + HID++ Logitech peripheral switching
  - Scripts created on Mac at ~/scripts/
---
## 2026-01-31
### Docker Cleanup
- **[DOCKER]** Removed 18 unused images (~4.9 GB reclaimed)
- **[DOCKER]** Removed 12 dangling images (old builds, untagged)
- **[DOCKER]** Removed Slurpit stack images (warehouse, portal, scanner, scraper)
- **[DOCKER]** Removed unused MongoDB 8 and MariaDB 11 images
- **[DOCKER]** Removed 35 orphaned volumes (~1.15 GB reclaimed)
- **[DOCKER]** Removed 28 anonymous dangling volumes
- **[DOCKER]** Removed 6 nextcloud_aio_* volumes (from old AIO install)
- **[DOCKER]** Removed orphaned redis-data volume
- **[DOCKER]** **Total reclaimed: ~6 GB**
### Kept (Stopped Containers)
- open-webui, ollama (AI stack - for future use)
- pgAdmin4 (database management)
- diode-hydra-migrate, diode-auth-bootstrap (one-time migration jobs)
---
## 2026-01-27
### VLAN Filtering Rolled Back
- **[VLAN]** Enabled VLAN filtering - caused connectivity issues
- **[VLAN]** ZX1 switch unreachable after activation (no management IP responding)
- **[VLAN]** CSS326 traffic routing through ZX1 (not direct eth3 link)
- **[VLAN]** **Rolled back** - VLAN filtering disabled
- **[CONFIG]** Added eth4 (ZX1) to all VLAN tagged lists for future use
- **[STATUS]** Network back to Legacy mode (192.168.31.0/24)
- **[TODO]** Need physical access to ZX1 to configure VLAN trunking
### Issues Identified
- ZX1 switch not responding on documented IP 192.168.31.22
- ZX1 may need VLAN trunk configuration before re-enabling filtering
- All CSS326 traffic goes via ZX1→HAP1, not direct CSS326→HAP1 link (STP?)
---
## 2026-01-26
### VLAN Filtering Activated
- **[VLAN]** VLAN filtering enabled on MikroTik bridge - SUCCESSFUL
- **[VLAN]** Internet connectivity verified (ping 1.1.1.1, google.com)
- **[VLAN]** DNS resolution working through AdGuard
- **[VLAN]** All previous fixes (DHCP DNS, firewall, NAT masquerade) working correctly
- **[STATUS]** Network segmentation now ACTIVE
### Local AI Stack Deployed
- **[AI]** Deployed Ollama container with Intel GPU passthrough
- **[AI]** Deployed Open WebUI at http://192.168.31.2:3080
- **[AI]** Installed qwen2.5-coder:7b base model
- **[AI]** Created custom `unraid-assistant` model with infrastructure knowledge
- **[AI]** Created `/usr/local/bin/ai` terminal helper command
- **[AI]** Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
### VLAN Activation Attempt & Fixes
- **[VLAN]** Configured CSS326 switch VLANs via SwOS web interface
- **[VLAN]** Enabled VLAN filtering on MikroTik - caused internet outage
- **[VLAN]** Rolled back VLAN filtering to restore connectivity
- **[VLAN]** **ROOT CAUSE IDENTIFIED:** Multiple configuration issues
### Issues Fixed
- **[FIX]** DHCP DNS now points to each VLAN gateway instead of legacy 192.168.31.1
- **[FIX]** Added DNS redirect rules for all VLANs (src-address-list=all-vlans)
- **[FIX]** Added all VLAN interfaces to LAN firewall interface list
- **[FIX]** Added NAT masquerade rules for VLAN traffic to AdGuard container
- **[BACKUP]** MikroTik config saved before activation attempt
---
## 2026-01-25
### VLAN Phase 1 Complete
- **[VLAN]** Added VLAN 25 (Kids) - interface, IP, DHCP server, pool, bridge entry
- **[VLAN]** Fixed VLAN 10 (Management) leases - correct IPs per device assignment doc
- **[VLAN]** Fixed VLAN 30 (IoT) leases - all 14 devices with correct IPs
- **[VLAN]** Added VLAN 25 (Kids) leases - 6 devices including XTRM-Ally
- **[VLAN]** Added VLAN 50 (Guest) leases - 7 unknown devices
- **[VLAN]** Added firewall rules for VLAN 25 (Kids → IoT, Legacy, DNS)
- **[VLAN]** Total devices configured: 44
### VLAN Implementation (Prepared)
- **[VLAN]** Created 6 VLANs on MikroTik bridge (10, 20, 30, 35, 40, 50)
- **[VLAN]** Configured IP addresses for all VLAN interfaces
- **[VLAN]** Created DHCP servers and pools for each VLAN
- **[VLAN]** Added static DHCP leases mapping MACs to VLAN IPs
- **[VLAN]** Configured bridge VLAN table with tagged/untagged ports
- **[VLAN]** Set WiFi ports PVID=20 (Trusted VLAN default)
- **[VLAN]** Added inter-VLAN firewall rules (active)
- **[VLAN]** VLAN filtering NOT YET ENABLED (pending CSS326 switch config)
- **[DOCS]** Added docs/11-VLAN-IMPLEMENTATION.md
- **[SCRIPTS]** Added scripts/mikrotik-vlan-setup.rsc and mikrotik-vlan-enable.rsc
### DNS Configuration
- **[DNS]** Updated both AdGuard instances to use Quad9 DoH
- **[DNS]** Bootstrap DNS: 9.9.9.9, 149.112.112.112
### MikroTik Containers
- **[CONTAINER]** AdGuard Home container running on MikroTik (172.17.0.2)
- **[CONTAINER]** Tailscale container configured (172.17.0.3)
- **[CONTAINER]** Fixed Tailscale container authentication
- **[CONTAINER]** Container bridge (containers-br) with NAT
### Network
- **[NETWORK]** Enabled CSS326 SFP1 port - 10G backbone link to ZX1 now active
### Documentation
- **[DOCS]** Created 02-PORT-UTILIZATION.md with ASCII port diagrams
- **[DOCS]** Fixed ZX1 switch IP: 192.168.31.22 (was incorrectly documented as .7)
### Incident
- **[INCIDENT]** DNS outage after MikroTik restart - multiple root causes fixed:
  - NAT rules blocking AdGuard outbound DNS (added exception rules)
  - DHCP pushing wrong DNS (8.8.8.8 → 192.168.31.1)
  - NAT redirect pointing to wrong IP/port (172.17.0.5:5355 → 192.168.31.4:53)
  - Asymmetric routing (added srcnat masquerade for DNS redirect)
- **[SERVICE]** Removed MikroTik AdGuard Home container (storage/overlay errors)
- **[SERVICE]** Removed MikroTik Tailscale container (root directory missing)
- **[SERVICE]** Removed Pi-hole/Unbound leftovers from MikroTik (veth, mounts, envs)
- **[NETWORK]** Consolidated DNS architecture: MikroTik → Unraid AdGuard (192.168.31.4) only
- **[DOCS]** Created incident reports in docs/incidents/
- **[DOCS]** Restructured documentation - consolidated into 5 core docs + archive
- **[NETBOX]** Added shelf devices for rack organization (U9, U7, U3)
---
## 2026-01-24
- **[NETBOX]** Standardized device names to NetBox convention (HAP1, CSS1, ZX1)
- **[DOCS]** Created NETWORK-PHYSICAL-MAP.md with complete port maps
---
## 2026-01-23
- **[SERVICE]** Deployed Diode network discovery stack
- **[SERVICE]** Removed Slurp'it (replaced by Diode + NetDisco)
- **[SERVICE]** Consolidated NetBox Redis to shared instance
- **[SERVICE]** Removed redundant DNS services (Unbound, DoH-Server, stunnel-dot)
---
## 2026-01-22
- **[SERVICE]** Migrated NetBox to shared PostgreSQL 17
- **[SERVICE]** Deployed AdGuard Home on MikroTik (primary DNS)
- **[SERVICE]** Deployed AdGuard Home on Unraid (secondary DNS)
- **[SERVICE]** Removed Pi-hole (replaced by AdGuard Home)
- **[DOCS]** Created INFRASTRUCTURE-DIAGRAM.md
---
## 2026-01-21
- **[BACKUP]** Configured Rclone sync to Google Drive
---
## 2026-01-19
- **[SERVICE]** Deployed NetBox IPAM/DCIM
- **[SERVICE]** Deployed NetDisco network discovery
- **[NETWORK]** Enabled SNMP on all MikroTik devices
---
## 2026-01-18
- **[SERVICE]** Deployed Gitea git server
- **[SERVICE]** Deployed Woodpecker CI
- **[NETWORK]** Configured CAPsMAN on HAP1
- **[WIRELESS]** CAP added to CAPsMAN management
---
## 2026-01-17
- **[SERVICE]** Deployed Portainer CE
---
## Previous History
For detailed history before 2026-01-17, see archived changelogs in `archive/`.
---
## Format Guide
```markdown
## YYYY-MM-DD
- **[CATEGORY]** Brief description
Categories:
- [DEVICE] - Hardware added/removed/changed
- [SERVICE] - Container/service deployed/removed
- [NETWORK] - Network topology/config changes
- [WIRELESS] - WiFi/CAPsMAN changes
- [BACKUP] - Backup configuration
- [DOCS] - Major documentation changes
- [INCIDENT] - Outages and fixes
- [VLAN] - VLAN configuration changes
- [DOCKER] - Docker maintenance
```