Infrastructure Changelog
Purpose: Major infrastructure events only. Minor changes are in git commit messages.
2026-02-24
Motherboard Replacement & NVMe Cache Pool
- [HARDWARE] Replaced XTRM-U motherboard; new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
- [HARDWARE] Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead: clicking heads, fails on multiple SATA ports and on the new motherboard
- [STORAGE] Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
- [STORAGE] Pool settings: autotrim=on, compression=on
- [DOCKER] Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
- [DOCKER] Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
- [DOCKER] Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
- [DOCKER] Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
- [STORAGE] Deleted old `docker.img` (200GB) from disk1
- [INCIDENT] disk1 still running in parity-emulated mode; replacement drive needed
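The "~1.8TB usable" figure for the cache pool can be sanity-checked with simple arithmetic. The sketch below assumes each "1TB" drive holds 10**12 bytes and that the usable figure was reported in binary units (TiB); it ignores ZFS metadata, padding, and reserved space, so the real number will be somewhat lower.

```python
# Rough RAIDZ capacity estimate. Assumptions (not from the changelog):
# decimal drive sizes, binary reporting units, no ZFS overhead modelled.
DRIVE_COUNT = 3
DRIVE_BYTES = 1_000_000_000_000  # "1TB" taken as 10**12 bytes
PARITY_DRIVES = 1                # RAIDZ1 spends one drive's worth on parity


def raidz_data_bytes(drive_count: int, drive_bytes: int, parity: int) -> int:
    """Return the raw data capacity in bytes, before filesystem overhead."""
    if drive_count <= parity:
        raise ValueError("need more drives than parity devices")
    return (drive_count - parity) * drive_bytes


data_bytes = raidz_data_bytes(DRIVE_COUNT, DRIVE_BYTES, PARITY_DRIVES)
data_tib = data_bytes / 2**40  # convert to tebibytes

print(f"{data_bytes / 1e12:.1f} TB decimal, {data_tib:.2f} TiB binary")
```

Two data drives give 2 TB in decimal units, which is about 1.82 TiB, in line with the figure recorded above.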
Post-Migration Container Cleanup
- [NETWORK] Fixed Traefik unreachable: removed stale Docker bridge (duplicate 172.18.0.0/16 subnet) + 7 orphaned bridges
- [DOCKER] Removed deprecated containers: DoH-Server, binhex-plexpass (duplicate of Plex)
- [DOCKER] Removed obsolete containers: HomeAssistant_inabox, Docker-WebUI, hass-unraid
- [DOCKER] Removed nextcloud-aio-mastercontainer (replaced by Nextcloud container)
- [SERVICE] Fixed adguardhome-sync: recreated config file (was directory from migration), switched to br0 network for macvlan reachability
- [SERVICE] Fixed diode stack: recreated .env, nginx.conf, OAuth2 client config; ran Hydra DB migration and client bootstrap
- [SERVICE] Fixed diode-agent: corrected YAML format, secrets, and Hydra authentication
- [SERVICE] Started unmarr (Homarr fork, 172.18.0.81) and rustfs (S3-compatible storage)
- [DOCKER] Final state: 53 containers running, pgAdmin4 stopped (utility)
- [DOCS] Updated 03-SERVICES-OTHER.md with removed containers
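The Traefik outage came from two Docker bridges claiming the same 172.18.0.0/16 subnet. A minimal sketch of how such a conflict could be detected follows; the network names `net_a`, `net_b`, and `net_c` and the third subnet are hypothetical placeholders, not values read from the host.

```python
import ipaddress

# Hypothetical inventory of bridge networks and their subnets.
# Only the 172.18.0.0/16 value comes from the changelog entry.
bridges = {
    "net_a": "172.18.0.0/16",
    "net_b": "172.18.0.0/16",  # stands in for the stale duplicate
    "net_c": "172.19.0.0/16",
}


def find_overlaps(networks: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of network names whose subnets overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    names = sorted(parsed)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if parsed[a].overlaps(parsed[b])
    ]


conflicts = find_overlaps(bridges)
print(conflicts)
```

In practice the inventory would be filled from the Docker network list rather than typed by hand.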
2026-02-14
CAP XL ac Recovery
- [WIRELESS] Factory reset CAP XL ac (lost credentials)
- [WIRELESS] Reconfigured CAPsMAN: regenerated certificate, CAP re-enrolled with `certificate=request`
- [WIRELESS] Both CAP radios now active: wifi1 (2.4GHz XTRM2) + wifi2 (5GHz XTRM)
- [WIRELESS] CAP now running RouterOS 7.21.1
- [WIRELESS] Enabled SSH on CAP port 2222 for user xtrm with mikrotik key
- [WIRELESS] Confirmed WiFi access list has no VLAN assignment (rolled back Jan 27)
Roms Network Share
- [SERVICE] Shared /mnt/user/roms (2.3TB, 49 systems) via SMB from Unraid
- [SERVICE] Mounted on Nobara at /mnt/roms (fstab, CIFS guest, systemd.automount)
- [SERVICE] Mounted on Recalbox via custom.sh boot script (CIFS bind mounts)
- [SERVICE] Deleted local roms from Recalbox SD card (~12.5GB freed)
WiFi DHCP Fix
- [NETWORK] Fixed DHCP not working on HAP1 local WiFi (wifi1/wifi2)
- [NETWORK] Root cause: VLAN 40 had wifi1/wifi2 as tagged instead of untagged — DHCP responses had 802.1Q tags clients couldn't process
- [NETWORK] Fix: `/interface bridge vlan set` wifi1,wifi2 to untagged for VLAN 40
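The root cause above comes down to how a bridge emits frames on tagged versus untagged member ports. The toy model below is a simplified illustration only; the `Frame` type and `egress` function are hypothetical and are not RouterOS behaviour in full detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    payload: str
    vlan_tag: Optional[int]  # None means no 802.1Q header on the wire


def egress(vlan_id: int, payload: str, port: str,
           tagged: set[str], untagged: set[str]) -> Optional[Frame]:
    """Model how a bridge emits a frame for vlan_id on one member port."""
    if port in untagged:
        return Frame(payload, None)     # tag stripped before transmission
    if port in tagged:
        return Frame(payload, vlan_id)  # tag kept; peer must be VLAN-aware
    return None                         # port is not a member of this VLAN


# Before the fix: wifi1 and wifi2 were tagged members of VLAN 40.
before = egress(40, "DHCPOFFER", "wifi1", tagged={"wifi1", "wifi2"}, untagged=set())
# After the fix: both ports are untagged members.
after = egress(40, "DHCPOFFER", "wifi1", tagged=set(), untagged={"wifi1", "wifi2"})

print(before.vlan_tag, after.vlan_tag)
```

With the ports tagged, the DHCP reply leaves carrying VLAN tag 40, which ordinary WiFi clients discard; with the ports untagged, the same reply leaves as a plain frame.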
Minecraft Server Deployed
- [SERVICE] Deployed Minecraft Java Edition (itzg/minecraft-server) on Unraid
- [SERVICE] Version 1.21.11, Survival mode, 2GB RAM, max 10 players
- [SERVICE] Docker IP 172.18.0.80, port 25565, Dockge stack `minecraft`
- [NETWORK] NAT port forward WAN:25565 → 192.168.10.20:25565
- [NETWORK] Hairpin NAT for internal access via minecraft.xtrm-lab.org
- [SERVICE] Added Unraid labels with Minecraft icon
Documentation Updates
- [DOCS] Updated 07-WIFI-CAPSMAN-CONFIG.md: CAP both radios working, access list status
- [DOCS] Updated 01-NETWORK-MAP.md: Fixed CAP IP (.6→.2), added Nobara and SMB shares
- [DOCS] Updated 04-HARDWARE-INVENTORY.md: CAP details, added Recalbox device
- [DOCS] Updated 06-VLAN-DEVICE-ASSIGNMENT.md: Added Nobara (VLAN 10) and Recalbox (VLAN 25)
- [DOCS] Updated 03-SERVICES-OTHER.md: Added Roms SMB share, Minecraft server section
2026-02-13
Failover Infrastructure Deployed
- [SERVICE] Deployed Docker failover stack on XTRM-Nobara (Traefik, Vaultwarden, Authentik, AdGuard Home)
- [SERVICE] Installed Docker CE 29.2.1 + Docker Compose 5.0.2 on Nobara
- [SERVICE] Deployed Keepalived VRRP for automatic failover (VIP: 192.168.10.250)
- [SERVICE] Unraid: Keepalived as Docker container (local/keepalived, MASTER priority 150)
- [SERVICE] Nobara: Keepalived as systemd service (BACKUP priority 100)
- [SERVICE] Replicated data: Vaultwarden DB, Authentik PostgreSQL dump (864MB), AdGuard config, Traefik certs
- [NETWORK] Added VRRP protocol to Nobara firewall (firewalld)
- [NETWORK] Configured SSH key auth to Nobara (id_ed25519_nobara, passwordless)
- [NETWORK] Added SSH config alias: `ssh nobara`
- [DOCS] Created 10-FAILOVER-NOBARA.md with full failover documentation
- [DOCS] Updated 02-SERVICES-CRITICAL.md with failover section
- [DOCS] Updated 04-HARDWARE-INVENTORY.md with XTRM-Nobara specs
- [DOCS] Updated README.md and CLAUDE.md with Nobara references
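The failover behaviour implied by the two Keepalived priorities can be sketched as a simple election: the reachable node with the highest priority holds the VIP. This is a deliberately reduced model; real VRRP also involves advertisement timers, preemption settings, and tie-breaking, none of which are modelled here. The `Node` type and `vip_holder` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

VIP = "192.168.10.250"  # virtual IP from the changelog entry


@dataclass(frozen=True)
class Node:
    name: str
    priority: int
    alive: bool = True


def vip_holder(nodes: list[Node]) -> Optional[str]:
    """Return the name of the alive node with the highest priority, if any."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority).name


normal = [Node("Unraid", 150), Node("Nobara", 100)]
unraid_down = [Node("Unraid", 150, alive=False), Node("Nobara", 100)]

print(VIP, "->", vip_holder(normal))
print(VIP, "->", vip_holder(unraid_down))
```

Under normal conditions Unraid (priority 150) holds the VIP; if it stops advertising, Nobara (priority 100) takes over.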
2026-02-06
Unraid Flash Drive Failure
- [INCIDENT] Unraid flash drive crashing - migration procedure created
- [DOCS] Created incident report with full flash drive replacement procedure
Documentation Restructure
- [DOCS] Restructured docs/ from 23 files to clean 9-doc structure
- [DOCS] Archived 12 completed VLAN migration project docs to archive/vlan-migration/
- [DOCS] Archived 5 done/superseded WIP docs (VLAN proposals, AI stack, Fossorial, DNS backup)
- [DOCS] Created standing reference docs: 08-DNS-ARCHITECTURE.md, 09-TAILSCALE-VPN.md
- [DOCS] Renamed docs to clean numbering (05-PORT-UTILIZATION, 06-VLAN-DEVICE-ASSIGNMENT, 07-WIFI-CAPSMAN-CONFIG)
- [DOCS] Merged 00-CHANGELOG.md + 06-CHANGELOG.md → CHANGELOG.md
- [DOCS] Updated all core docs with current VLAN IPs (192.168.31.x → 192.168.10.x)
- [DOCS] Fixed CSS1 IP: 192.168.10.9 → 192.168.10.3, ZX1 IP: 192.168.10.7 → 192.168.10.4
- [DOCS] Cleaned 06-VLAN-DEVICE-ASSIGNMENT.md: removed migration-era columns and sections, fixed VLAN 25 subnet
- [DOCS] Updated README.md, CLAUDE.md, archive/README.md, wip/README.md
2026-02-01
WIP Documentation
- [DOCS] Added KVM-SWITCH-MAC-NOBARA.md - Software KVM for Mac/Nobara switching
  - DDC/CI monitor control (Dell U3821DW) + HID++ Logitech peripheral switching
  - Scripts created on Mac at ~/scripts/
2026-01-31
Docker Cleanup
- [DOCKER] Removed 18 unused images (~4.9 GB reclaimed)
- [DOCKER] Removed 12 dangling images (old builds, untagged)
- [DOCKER] Removed Slurpit stack images (warehouse, portal, scanner, scraper)
- [DOCKER] Removed unused MongoDB 8 and MariaDB 11 images
- [DOCKER] Removed 35 orphaned volumes (~1.15 GB reclaimed)
- [DOCKER] Removed 28 anonymous dangling volumes
- [DOCKER] Removed 6 nextcloud_aio_* volumes (from old AIO install)
- [DOCKER] Removed orphaned redis-data volume
- [DOCKER] Total reclaimed: ~6 GB
Kept (Stopped Containers)
- open-webui, ollama (AI stack - for future use)
- pgAdmin4 (database management)
- diode-hydra-migrate, diode-auth-bootstrap (one-time migration jobs)
2026-01-27
VLAN Filtering Rolled Back
- [VLAN] Enabled VLAN filtering - caused connectivity issues
- [VLAN] ZX1 switch unreachable after activation (no management IP responding)
- [VLAN] CSS326 traffic routing through ZX1 (not direct eth3 link)
- [VLAN] Rolled back - VLAN filtering disabled
- [CONFIG] Added eth4 (ZX1) to all VLAN tagged lists for future use
- [STATUS] Network back to Legacy mode (192.168.31.0/24)
- [TODO] Need physical access to ZX1 to configure VLAN trunking
Issues Identified
- ZX1 switch not responding on documented IP 192.168.31.22
- ZX1 may need VLAN trunk configuration before re-enabling filtering
- All CSS326 traffic goes via ZX1→HAP1, not direct CSS326→HAP1 link (STP?)
2026-01-26
VLAN Filtering Activated
- [VLAN] VLAN filtering enabled on MikroTik bridge - SUCCESSFUL
- [VLAN] Internet connectivity verified (ping 1.1.1.1, google.com)
- [VLAN] DNS resolution working through AdGuard
- [VLAN] All previous fixes (DHCP DNS, firewall, NAT masquerade) working correctly
- [STATUS] Network segmentation now ACTIVE
Local AI Stack Deployed
- [AI] Deployed Ollama container with Intel GPU passthrough
- [AI] Deployed Open WebUI at http://192.168.31.2:3080
- [AI] Installed qwen2.5-coder:7b base model
- [AI] Created custom `unraid-assistant` model with infrastructure knowledge
- [AI] Created `/usr/local/bin/ai` terminal helper command
- [AI] Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
VLAN Activation Attempt & Fixes
- [VLAN] Configured CSS326 switch VLANs via SwOS web interface
- [VLAN] Enabled VLAN filtering on MikroTik - caused internet outage
- [VLAN] Rolled back VLAN filtering to restore connectivity
- [VLAN] ROOT CAUSE IDENTIFIED: Multiple configuration issues
Issues Fixed
- [FIX] DHCP DNS now points to each VLAN gateway instead of legacy 192.168.31.1
- [FIX] Added DNS redirect rules for all VLANs (src-address-list=all-vlans)
- [FIX] Added all VLAN interfaces to LAN firewall interface list
- [FIX] Added NAT masquerade rules for VLAN traffic to AdGuard container
- [BACKUP] MikroTik config saved before activation attempt
2026-01-25
VLAN Phase 1 Complete
- [VLAN] Added VLAN 25 (Kids) - interface, IP, DHCP server, pool, bridge entry
- [VLAN] Fixed VLAN 10 (Management) leases - correct IPs per device assignment doc
- [VLAN] Fixed VLAN 30 (IoT) leases - all 14 devices with correct IPs
- [VLAN] Added VLAN 25 (Kids) leases - 6 devices including XTRM-Ally
- [VLAN] Added VLAN 50 (Guest) leases - 7 unknown devices
- [VLAN] Added firewall rules for VLAN 25 (Kids → IoT, Legacy, DNS)
- [VLAN] Total devices configured: 44
VLAN Implementation (Prepared)
- [VLAN] Created 6 VLANs on MikroTik bridge (10, 20, 30, 35, 40, 50)
- [VLAN] Configured IP addresses for all VLAN interfaces
- [VLAN] Created DHCP servers and pools for each VLAN
- [VLAN] Added static DHCP leases mapping MACs to VLAN IPs
- [VLAN] Configured bridge VLAN table with tagged/untagged ports
- [VLAN] Set WiFi ports PVID=20 (Trusted VLAN default)
- [VLAN] Added inter-VLAN firewall rules (active)
- [VLAN] VLAN filtering NOT YET ENABLED (pending CSS326 switch config)
- [DOCS] Added docs/11-VLAN-IMPLEMENTATION.md
- [SCRIPTS] Added scripts/mikrotik-vlan-setup.rsc and mikrotik-vlan-enable.rsc
DNS Configuration
- [DNS] Updated both AdGuard instances to use Quad9 DoH
- [DNS] Bootstrap DNS: 9.9.9.9, 149.112.112.112
MikroTik Containers
- [CONTAINER] AdGuard Home container running on MikroTik (172.17.0.2)
- [CONTAINER] Tailscale container configured (172.17.0.3)
- [CONTAINER] Fixed Tailscale container authentication
- [CONTAINER] Container bridge (containers-br) with NAT
Network
- [NETWORK] Enabled CSS326 SFP1 port - 10G backbone link to ZX1 now active
Documentation
- [DOCS] Created 02-PORT-UTILIZATION.md with ASCII port diagrams
- [DOCS] Fixed ZX1 switch IP: 192.168.31.22 (was incorrectly documented as .7)
Incident
- [INCIDENT] DNS outage after MikroTik restart - multiple root causes fixed:
  - NAT rules blocking AdGuard outbound DNS (added exception rules)
  - DHCP pushing wrong DNS (8.8.8.8 → 192.168.31.1)
  - NAT redirect pointing to wrong IP/port (172.17.0.5:5355 → 192.168.31.4:53)
  - Asymmetric routing (added srcnat masquerade for DNS redirect)
- [SERVICE] Removed MikroTik AdGuard Home container (storage/overlay errors)
- [SERVICE] Removed MikroTik Tailscale container (root directory missing)
- [SERVICE] Removed Pi-hole/Unbound leftovers from MikroTik (veth, mounts, envs)
- [NETWORK] Consolidated DNS architecture: MikroTik → Unraid AdGuard (192.168.31.4) only
- [DOCS] Created incident reports in docs/incidents/
- [DOCS] Restructured documentation - consolidated into 5 core docs + archive
- [NETBOX] Added shelf devices for rack organization (U9, U7, U3)
2026-01-24
- [NETBOX] Standardized device names to NetBox convention (HAP1, CSS1, ZX1)
- [DOCS] Created NETWORK-PHYSICAL-MAP.md with complete port maps
2026-01-23
- [SERVICE] Deployed Diode network discovery stack
- [SERVICE] Removed Slurp'it (replaced by Diode + NetDisco)
- [SERVICE] Consolidated NetBox Redis to shared instance
- [SERVICE] Removed redundant DNS services (Unbound, DoH-Server, stunnel-dot)
2026-01-22
- [SERVICE] Migrated NetBox to shared PostgreSQL 17
- [SERVICE] Deployed AdGuard Home on MikroTik (primary DNS)
- [SERVICE] Deployed AdGuard Home on Unraid (secondary DNS)
- [SERVICE] Removed Pi-hole (replaced by AdGuard Home)
- [DOCS] Created INFRASTRUCTURE-DIAGRAM.md
2026-01-21
- [BACKUP] Configured Rclone sync to Google Drive
2026-01-19
- [SERVICE] Deployed NetBox IPAM/DCIM
- [SERVICE] Deployed NetDisco network discovery
- [NETWORK] Enabled SNMP on all MikroTik devices
2026-01-18
- [SERVICE] Deployed Gitea git server
- [SERVICE] Deployed Woodpecker CI
- [NETWORK] Configured CAPsMAN on HAP1
- [WIRELESS] CAP added to CAPsMAN management
2026-01-17
- [SERVICE] Deployed Portainer CE
Previous History
For detailed history before 2026-01-17, see archived changelogs in archive/.
Format Guide
### YYYY-MM-DD
- **[CATEGORY]** Brief description
Categories:
- [DEVICE] - Hardware added/removed/changed
- [SERVICE] - Container/service deployed/removed
- [NETWORK] - Network topology/config changes
- [WIRELESS] - WiFi/CAPsMAN changes
- [BACKUP] - Backup configuration
- [DOCS] - Major documentation changes
- [INCIDENT] - Outages and fixes
- [VLAN] - VLAN configuration changes
- [DOCKER] - Docker maintenance
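Entries that follow this format are easy to process mechanically. The sketch below shows one possible parser; the regular expression and the `parse_entry` function are illustrative assumptions, and the pattern accepts the category with or without the bold markers shown in the format line.

```python
import re
from typing import Optional

# Matches "- [CATEGORY] text" and "- **[CATEGORY]** text".
ENTRY = re.compile(
    r"^-\s+(?:\*\*)?\[(?P<category>[A-Z]+)\](?:\*\*)?\s+(?P<text>.+)$"
)


def parse_entry(line: str) -> Optional[tuple[str, str]]:
    """Return (category, description) for a changelog bullet, else None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return match.group("category"), match.group("text")


print(parse_entry("- **[SERVICE]** Deployed Portainer CE"))
print(parse_entry("- [DOCS] Created INFRASTRUCTURE-DIAGRAM.md"))
print(parse_entry("2026-01-17"))
```

Date headings and sub-bullets without a category return `None`, so a caller can skip them or attach them to the preceding entry.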