infrastructure/docs/CHANGELOG.md
Kaloyan Danchev ec9659d0cb
Restructure docs: archive VLAN migration, update IPs to VLAN 10
Major documentation cleanup after VLAN migration completion:
- Archive 12 VLAN project docs to archive/vlan-migration/
- Archive 5 done WIP docs (VLAN proposals, AI stack, Fossorial, DNS backup)
- Create standing reference docs 08-DNS-ARCHITECTURE and 09-TAILSCALE-VPN
- Renumber docs to clean 01-09 sequence with merged CHANGELOG
- Update all active docs from stale 192.168.31.x to current VLAN 10 IPs
- Fix CSS1 (.10.9→.10.3) and ZX1 (.10.7→.10.4) IPs in hardware inventory
- Clean 06-VLAN-DEVICE-ASSIGNMENT: remove migration columns/sections, fix VLAN 25 subnet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 12:45:16 +02:00

# Infrastructure Changelog
**Purpose:** Major infrastructure events only. Minor changes are in git commit messages.
---
## 2026-02-06
### Documentation Restructure
- **[DOCS]** Restructured docs/ from 23 files to clean 9-doc structure
- **[DOCS]** Archived 12 completed VLAN migration project docs to archive/vlan-migration/
- **[DOCS]** Archived 5 done/superseded WIP docs (VLAN proposals, AI stack, Fossorial, DNS backup)
- **[DOCS]** Created standing reference docs: 08-DNS-ARCHITECTURE.md, 09-TAILSCALE-VPN.md
- **[DOCS]** Renamed docs to clean numbering (05-PORT-UTILIZATION, 06-VLAN-DEVICE-ASSIGNMENT, 07-WIFI-CAPSMAN-CONFIG)
- **[DOCS]** Merged 00-CHANGELOG.md + 06-CHANGELOG.md → CHANGELOG.md
- **[DOCS]** Updated all core docs with current VLAN IPs (192.168.31.x → 192.168.10.x)
- **[DOCS]** Fixed CSS1 IP: 192.168.10.9 → 192.168.10.3, ZX1 IP: 192.168.10.7 → 192.168.10.4
- **[DOCS]** Cleaned 06-VLAN-DEVICE-ASSIGNMENT.md: removed migration-era columns and sections, fixed VLAN 25 subnet
- **[DOCS]** Updated README.md, CLAUDE.md, archive/README.md, wip/README.md
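A bulk IP update like the one above is easy to leave incomplete. The following is a minimal sketch of a check for leftover legacy addresses, assuming that "stale" means any address in 192.168.31.0/24 and that archived docs live under a directory named `archive`. The function names and the directory layout are hypothetical, not part of this repository.

```python
import re
from pathlib import Path

# Assumption: any address in the legacy 192.168.31.0/24 range counts as stale.
LEGACY_IP = re.compile(r"\b192\.168\.31\.\d{1,3}\b")


def find_stale_ips(text: str) -> list[tuple[int, str]]:
    """Return (line_number, ip) for every legacy address found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LEGACY_IP.finditer(line):
            hits.append((lineno, match.group()))
    return hits


def scan_docs(root: str, skip_dirs: tuple[str, ...] = ("archive",)) -> dict[str, list[tuple[int, str]]]:
    """Scan every *.md file under root, skipping directories named in skip_dirs."""
    results = {}
    for path in sorted(Path(root).rglob("*.md")):
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        hits = find_stale_ips(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_docs("docs")` would return a mapping from file path to the stale addresses it contains. This changelog would itself be reported, because its older entries record legacy addresses on purpose, so it would need to be excluded or reviewed by hand.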
---
## 2026-02-01
### WIP Documentation
- **[DOCS]** Added KVM-SWITCH-MAC-NOBARA.md - Software KVM for Mac/Nobara switching
  - DDC/CI monitor control (Dell U3821DW) + HID++ Logitech peripheral switching
  - Scripts created on Mac at ~/scripts/
---
## 2026-01-31
### Docker Cleanup
- **[DOCKER]** Removed 18 unused images (~4.9 GB reclaimed)
- **[DOCKER]** Removed 12 dangling images (old builds, untagged)
- **[DOCKER]** Removed Slurpit stack images (warehouse, portal, scanner, scraper)
- **[DOCKER]** Removed unused MongoDB 8 and MariaDB 11 images
- **[DOCKER]** Removed 35 orphaned volumes (~1.15 GB reclaimed)
- **[DOCKER]** Removed 28 anonymous dangling volumes
- **[DOCKER]** Removed 6 nextcloud_aio_* volumes (from old AIO install)
- **[DOCKER]** Removed orphaned redis-data volume
- **[DOCKER]** **Total reclaimed: ~6 GB**
### Kept (Stopped Containers)
- open-webui, ollama (AI stack - for future use)
- pgAdmin4 (database management)
- diode-hydra-migrate, diode-auth-bootstrap (one-time migration jobs)
---
## 2026-01-27
### VLAN Filtering Rolled Back
- **[VLAN]** Enabled VLAN filtering - caused connectivity issues
- **[VLAN]** ZX1 switch unreachable after activation (no management IP responding)
- **[VLAN]** CSS326 traffic routing through ZX1 (not direct eth3 link)
- **[VLAN]** **Rolled back** - VLAN filtering disabled
- **[CONFIG]** Added eth4 (ZX1) to all VLAN tagged lists for future use
- **[STATUS]** Network back to Legacy mode (192.168.31.0/24)
- **[TODO]** Need physical access to ZX1 to configure VLAN trunking
### Issues Identified
- ZX1 switch not responding on documented IP 192.168.31.22
- ZX1 may need VLAN trunk configuration before re-enabling filtering
- All CSS326 traffic goes via ZX1→HAP1, not direct CSS326→HAP1 link (STP?)
---
## 2026-01-26
### VLAN Filtering Activated
- **[VLAN]** VLAN filtering enabled on MikroTik bridge - SUCCESSFUL
- **[VLAN]** Internet connectivity verified (ping 1.1.1.1, google.com)
- **[VLAN]** DNS resolution working through AdGuard
- **[VLAN]** All previous fixes (DHCP DNS, firewall, NAT masquerade) working correctly
- **[STATUS]** Network segmentation now ACTIVE
### Local AI Stack Deployed
- **[AI]** Deployed Ollama container with Intel GPU passthrough
- **[AI]** Deployed Open WebUI at http://192.168.31.2:3080
- **[AI]** Installed qwen2.5-coder:7b base model
- **[AI]** Created custom `unraid-assistant` model with infrastructure knowledge
- **[AI]** Created `/usr/local/bin/ai` terminal helper command
- **[AI]** Stopped non-critical containers for RAM: karakeep, unimus, homarr, netdisco-*
### VLAN Activation Attempt & Fixes
- **[VLAN]** Configured CSS326 switch VLANs via SwOS web interface
- **[VLAN]** Enabled VLAN filtering on MikroTik - caused internet outage
- **[VLAN]** Rolled back VLAN filtering to restore connectivity
- **[VLAN]** **ROOT CAUSE IDENTIFIED:** Multiple configuration issues
### Issues Fixed
- **[FIX]** DHCP DNS now points to each VLAN gateway instead of legacy 192.168.31.1
- **[FIX]** Added DNS redirect rules for all VLANs (src-address-list=all-vlans)
- **[FIX]** Added all VLAN interfaces to LAN firewall interface list
- **[FIX]** Added NAT masquerade rules for VLAN traffic to AdGuard container
- **[BACKUP]** MikroTik config saved before activation attempt
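The first fix above (DHCP DNS pointing at each VLAN's own gateway) can be checked mechanically. This is a rough sketch only: it assumes the gateway is the first host address of each subnet, which matches the legacy 192.168.31.1 gateway but is not confirmed for every VLAN here, and the subnets in the example are illustrative rather than taken from the router config.

```python
import ipaddress


def gateway_of(subnet: str) -> ipaddress.IPv4Address:
    """Assumption: the gateway is the first host address of the subnet."""
    network = ipaddress.ip_network(subnet)
    return network.network_address + 1


def misconfigured_dns(dhcp_networks: dict[str, str]) -> list[str]:
    """Return the subnets whose advertised DNS server is not their own gateway.

    dhcp_networks maps a subnet in CIDR form to the DNS server handed out by DHCP.
    """
    bad = []
    for subnet, dns in dhcp_networks.items():
        if ipaddress.ip_address(dns) != gateway_of(subnet):
            bad.append(subnet)
    return bad


# Illustrative data only; the real subnets are in the VLAN assignment doc.
example = {
    "192.168.10.0/24": "192.168.10.1",  # DNS is the VLAN gateway
    "192.168.30.0/24": "192.168.31.1",  # still the legacy resolver
}
print(misconfigured_dns(example))  # ['192.168.30.0/24']
```

Feeding it the DHCP network settings exported from the router before enabling VLAN filtering would flag any VLAN still handing out the legacy resolver.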
---
## 2026-01-25
### VLAN Phase 1 Complete
- **[VLAN]** Added VLAN 25 (Kids) - interface, IP, DHCP server, pool, bridge entry
- **[VLAN]** Fixed VLAN 10 (Management) leases - correct IPs per device assignment doc
- **[VLAN]** Fixed VLAN 30 (IoT) leases - all 14 devices with correct IPs
- **[VLAN]** Added VLAN 25 (Kids) leases - 6 devices including XTRM-Ally
- **[VLAN]** Added VLAN 50 (Guest) leases - 7 unknown devices
- **[VLAN]** Added firewall rules for VLAN 25 (Kids → IoT, Legacy, DNS)
- **[VLAN]** Total devices configured: 44
### VLAN Implementation (Prepared)
- **[VLAN]** Created 6 VLANs on MikroTik bridge (10, 20, 30, 35, 40, 50)
- **[VLAN]** Configured IP addresses for all VLAN interfaces
- **[VLAN]** Created DHCP servers and pools for each VLAN
- **[VLAN]** Added static DHCP leases mapping MACs to VLAN IPs
- **[VLAN]** Configured bridge VLAN table with tagged/untagged ports
- **[VLAN]** Set WiFi ports PVID=20 (Trusted VLAN default)
- **[VLAN]** Added inter-VLAN firewall rules (active)
- **[VLAN]** VLAN filtering NOT YET ENABLED (pending CSS326 switch config)
- **[DOCS]** Added docs/11-VLAN-IMPLEMENTATION.md
- **[SCRIPTS]** Added scripts/mikrotik-vlan-setup.rsc and mikrotik-vlan-enable.rsc
### DNS Configuration
- **[DNS]** Updated both AdGuard instances to use Quad9 DoH
- **[DNS]** Bootstrap DNS: 9.9.9.9, 149.112.112.112
### MikroTik Containers
- **[CONTAINER]** AdGuard Home container running on MikroTik (172.17.0.2)
- **[CONTAINER]** Tailscale container configured (172.17.0.3)
- **[CONTAINER]** Fixed Tailscale container authentication
- **[CONTAINER]** Container bridge (containers-br) with NAT
### Network
- **[NETWORK]** Enabled CSS326 SFP1 port - 10G backbone link to ZX1 now active
### Documentation
- **[DOCS]** Created 02-PORT-UTILIZATION.md with ASCII port diagrams
- **[DOCS]** Fixed ZX1 switch IP: 192.168.31.22 (was incorrectly documented as .7)
### Incident
- **[INCIDENT]** DNS outage after MikroTik restart - multiple root causes fixed:
  - NAT rules blocking AdGuard outbound DNS (added exception rules)
  - DHCP pushing wrong DNS (8.8.8.8 → 192.168.31.1)
  - NAT redirect pointing to wrong IP/port (172.17.0.5:5355 → 192.168.31.4:53)
  - Asymmetric routing (added srcnat masquerade for DNS redirect)
- **[SERVICE]** Removed MikroTik AdGuard Home container (storage/overlay errors)
- **[SERVICE]** Removed MikroTik Tailscale container (root directory missing)
- **[SERVICE]** Removed Pi-hole/Unbound leftovers from MikroTik (veth, mounts, envs)
- **[NETWORK]** Consolidated DNS architecture: MikroTik → Unraid AdGuard (192.168.31.4) only
- **[DOCS]** Created incident reports in docs/incidents/
- **[DOCS]** Restructured documentation - consolidated into 5 core docs + archive
- **[NETBOX]** Added shelf devices for rack organization (U9, U7, U3)
---
## 2026-01-24
- **[NETBOX]** Standardized device names to NetBox convention (HAP1, CSS1, ZX1)
- **[DOCS]** Created NETWORK-PHYSICAL-MAP.md with complete port maps
---
## 2026-01-23
- **[SERVICE]** Deployed Diode network discovery stack
- **[SERVICE]** Removed Slurp'it (replaced by Diode + NetDisco)
- **[SERVICE]** Consolidated NetBox Redis to shared instance
- **[SERVICE]** Removed redundant DNS services (Unbound, DoH-Server, stunnel-dot)
---
## 2026-01-22
- **[SERVICE]** Migrated NetBox to shared PostgreSQL 17
- **[SERVICE]** Deployed AdGuard Home on MikroTik (primary DNS)
- **[SERVICE]** Deployed AdGuard Home on Unraid (secondary DNS)
- **[SERVICE]** Removed Pi-hole (replaced by AdGuard Home)
- **[DOCS]** Created INFRASTRUCTURE-DIAGRAM.md
---
## 2026-01-21
- **[BACKUP]** Configured Rclone sync to Google Drive
---
## 2026-01-19
- **[SERVICE]** Deployed NetBox IPAM/DCIM
- **[SERVICE]** Deployed NetDisco network discovery
- **[NETWORK]** Enabled SNMP on all MikroTik devices
---
## 2026-01-18
- **[SERVICE]** Deployed Gitea git server
- **[SERVICE]** Deployed Woodpecker CI
- **[NETWORK]** Configured CAPsMAN on HAP1
- **[WIRELESS]** CAP added to CAPsMAN management
---
## 2026-01-17
- **[SERVICE]** Deployed Portainer CE
---
## Previous History
For detailed history before 2026-01-17, see archived changelogs in `archive/`.
---
## Format Guide
```markdown
## YYYY-MM-DD
- **[CATEGORY]** Brief description
Categories:
- [DEVICE] - Hardware added/removed/changed
- [SERVICE] - Container/service deployed/removed
- [NETWORK] - Network topology/config changes
- [WIRELESS] - WiFi/CAPsMAN changes
- [BACKUP] - Backup configuration
- [DOCS] - Major documentation changes
- [INCIDENT] - Outages and fixes
- [VLAN] - VLAN configuration changes
- [DOCKER] - Docker maintenance
```
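The entry format above is regular enough to lint. As a minimal sketch, assuming entries always take the form `- **[CATEGORY]** text` with an uppercase tag, the following parses entries and reports tags that are missing from the category list. `KNOWN_CATEGORIES`, `parse_entry`, and `unknown_categories` are hypothetical names introduced for illustration.

```python
import re

# Mirrors the category list in the Format Guide, nothing more.
KNOWN_CATEGORIES = {
    "DEVICE", "SERVICE", "NETWORK", "WIRELESS", "BACKUP",
    "DOCS", "INCIDENT", "VLAN", "DOCKER",
}

# Matches "- **[CATEGORY]** description" and captures both parts.
ENTRY = re.compile(r"^- \*\*\[([A-Z]+)\]\*\* (.+)$")


def parse_entry(line: str):
    """Return (category, description) for a changelog entry, else None."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def unknown_categories(changelog: str) -> set[str]:
    """Collect tags used in entries that the Format Guide does not list."""
    found = set()
    for line in changelog.splitlines():
        parsed = parse_entry(line)
        if parsed is not None and parsed[0] not in KNOWN_CATEGORIES:
            found.add(parsed[0])
    return found
```

Run against this file, such a check would report tags that appear in entries but not in the guide, for example `[FIX]`, `[STATUS]`, and `[CONFIG]`, which is a cue either to extend the category list or to retag those entries.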