Compare commits

...

10 Commits

Author SHA1 Message Date
Kaloyan Danchev
6320c0f8d9 Docs: Claude Code tooling setup on Unraid — Cooperator, glab, skills, MCP prep
Installed Cooperator CLI, glab, uv+Python 3.12, 6 custom skills,
and built MCP servers (shortcut, mikrotik, unraid). MCP registration
via `claude mcp add` still pending as TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:13:44 +02:00
jazzymc
8aef54992a Docker audit: migrate all containers to Dockge, clean up Traefik config
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2026-02-28 20:39:16 +02:00
Kaloyan Danchev
7867b5c950 WiFi VLAN fixes, CAP bridge filtering, AdGuard IP conflicts, channel optimization
- Enable bridge VLAN filtering on CAP for proper per-client VLAN assignment
- Fix AdGuard container IP conflicts (.2→.10, .3→.11) with static IPs
- Fix 2.4GHz co-channel interference (both APs were on ch 1, CAP now ch 6)
- Fix 5GHz overlap (HAP ch 36/5180, CAP moved to ch 52/5260)
- Update WiFi access-list: VLAN assignment now active with per-device VLAN IDs
- Add Xiaomi Air Purifier MC1 to VLAN 30 access-list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 09:40:29 +02:00
Kaloyan Danchev
cdb961f943 Post-migration container cleanup: fix broken services, remove obsolete containers
Fixed Traefik networking (stale Docker bridge), adguardhome-sync config,
diode stack (Hydra DB + OAuth2 bootstrap), diode-agent auth. Removed 5
deprecated/duplicate containers. Started unmarr + rustfs stacks. 53
containers now running.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 17:30:15 +02:00
Kaloyan Danchev
877aa71d3e Update docs: motherboard swap, NVMe cache pool, Docker migration
- New motherboard installed, MAC/DHCP updated
- 3x Samsung 990 EVO Plus 1TB NVMe cache pool (ZFS RAIDZ1)
- Docker migrated from HDD loopback to NVMe ZFS storage driver
- disk1 confirmed dead (clicking heads), still on parity emulation
- Hardware inventory, changelog, and incident report updated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 14:47:07 +02:00
Kaloyan Danchev
bf6a62a275 Add incident report: disk1 hardware failure (clicking/head crash)
HGST Ultrastar 10TB drive (serial 2TKK3K1D) failed on Feb 18.
Array running degraded on parity emulation. Recovery plan documented.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 17:54:23 +02:00
Kaloyan Danchev
0119c4d4d8 docs: add Minecraft server, WiFi DHCP fix to changelog
- Added Minecraft Java server section to 03-SERVICES-OTHER.md
- Documented WiFi DHCP fix (VLAN 40 tagged→untagged on wifi1/wifi2)
- Documented Minecraft deployment and hairpin NAT setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 22:16:32 +02:00
Kaloyan Danchev
2a522d56d2 docs: update configs after CAP recovery and roms share setup
- 07-WIFI-CAPSMAN: CAP both radios working, access list no VLAN assignment
- 01-NETWORK-MAP: fix CAP IP .6→.2, add Nobara and SMB shares section
- 04-HARDWARE-INVENTORY: CAP SSH/version details, add Recalbox device
- 06-VLAN-DEVICE-ASSIGNMENT: add Nobara (VLAN 10), Recalbox (VLAN 25)
- 03-SERVICES-OTHER: add Roms SMB share section with mount details
- CHANGELOG: add 2026-02-14 entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 16:50:01 +02:00
Kaloyan Danchev
4e726a4963 Add cross-VLAN casting docs, update device assignments
- New doc: 11-CROSS-VLAN-CASTING.md with full MikroTik config
  (firewall rules, FastTrack exclusion, mDNS, IGMP proxy,
  AirPlay/Chromecast troubleshooting)
- Update device IPs: LG TV .40/.41, Chromecast .42
- Move HP printer from VLAN 40 to VLAN 30 at .30

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:28:55 +02:00
Kaloyan Danchev
ecbce1ca94 Add VRRP failover infrastructure documentation (Nobara)
Deployed automatic failover for critical services (Traefik, Vaultwarden,
Authentik, AdGuard) from Unraid to Nobara workstation via Keepalived VRRP
with VIP 192.168.10.250. ~4 second failover time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 18:03:26 +02:00
14 changed files with 1223 additions and 74 deletions


@@ -7,6 +7,17 @@ When user says "connect unraid", use this command:
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422
```
## Connect to Nobara (Failover Node)
```bash
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103
# sudo password: (same as SSH login)
```
Failover stack: `/home/failover/docker-compose.yml`
Keepalived: `systemctl status keepalived`
## Connect to MikroTik HAP ax³
SSH port is **2222** (not 22):
@@ -56,6 +67,7 @@ infrastructure/
├── 07-WIFI-CAPSMAN-CONFIG.md # WiFi and CAPsMAN settings
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara
├── CHANGELOG.md # Change history
├── archive/ # Completed/legacy docs
│ └── vlan-migration/ # VLAN migration project artifacts


@@ -15,6 +15,7 @@
| **CI/CD** | https://ci.xtrm-lab.org |
| **DNS Primary** | dns.xtrm-lab.org |
| **DNS Secondary** | dns2.xtrm-lab.org |
| **Failover VIP** | 192.168.10.250 |
---
@@ -31,6 +32,7 @@ docs/
├── 07-WIFI-CAPSMAN-CONFIG.md # WiFi and CAPsMAN settings
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara workstation
├── CHANGELOG.md # Change history
├── archive/ # Completed/legacy docs
│ └── vlan-migration/ # VLAN migration project artifacts
@@ -46,6 +48,7 @@ docs/
|--------|-----|------|
| HAP1 | 192.168.10.1 | Router, DNS, WiFi Controller |
| XTRM-U | 192.168.10.20 | Production Server (Unraid) |
| XTRM-Nobara | 192.168.10.103 | Failover Node (Nobara Linux) |
| CSS1 | 192.168.10.3 | Distribution Switch |
| ZX1 | 192.168.10.4 | Core Switch (2.5G) |
| CAP | 192.168.10.6 | Wireless Access Point |
@@ -60,6 +63,9 @@ ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422
# MikroTik Router
ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.1
# Nobara (failover node)
ssh nobara
```
---
@@ -69,7 +75,8 @@ ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.1
1. **DNS down?** → Automatic failover to 192.168.10.10 (secondary), see `08-DNS-ARCHITECTURE.md`
2. **Internet down?** → Check HAP1 at 192.168.10.1
3. **Services down?** → Check Unraid at 192.168.10.20
4. **Full outage?** → See `02-SERVICES-CRITICAL.md` startup order
4. **Unraid maintenance?** → VRRP failover to Nobara (192.168.10.250 VIP), see `10-FAILOVER-NOBARA.md`
5. **Full outage?** → See `02-SERVICES-CRITICAL.md` startup order
---


@@ -1,6 +1,6 @@
# Network Map - xtrm-lab.org
**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-14
**Domain:** xtrm-lab.org
**WAN IP:** 62.73.120.142
@@ -39,7 +39,7 @@ flowchart TB
end
subgraph Wireless["WiFi"]
CAP["CAP | cAP XL ac<br/>192.168.10.6"]
CAP["CAP | cAP XL ac<br/>192.168.10.2"]
end
ISP -->|"ether1 WAN"| HAP1
@@ -116,9 +116,10 @@ flowchart TB
| 192.168.10.1 | HAP1 \| hAP ax³ | Router |
| 192.168.10.3 | CSS1 \| CSS326-24G-2S+ | Switch |
| 192.168.10.4 | ZX1 \| ZX-SWTGW218AS | Switch |
| 192.168.10.6 | CAP \| cAP XL ac | Access Point |
| 192.168.10.2 | CAP \| cAP XL ac | Access Point |
| 192.168.10.10 | AdGuard Home (Unraid macvlan) | DNS Secondary |
| 192.168.10.20 | XTRM-U | Server |
| 192.168.10.103 | XTRM-Nobara | Failover Node |
| 192.168.10.200 | NanoKVM | Remote KVM |
For complete device-to-VLAN mapping, see `06-VLAN-DEVICE-ASSIGNMENT.md`.
@@ -301,10 +302,9 @@ flowchart TB
| SSID | Band | Security | Purpose |
|------|------|----------|---------|
| XTRM | 5GHz | WPA2/WPA3 | Primary devices |
| XTRM | 2.4GHz | WPA/WPA2 | Legacy support |
| XTRM2 | 2.4GHz | WPA/WPA2 | IoT devices |
**CAPsMAN:** HAP1 manages CAP access point
**CAPsMAN:** HAP1 manages CAP XL ac (192.168.10.2) - both 2.4GHz and 5GHz radios active
---
@@ -356,6 +356,14 @@ flowchart TB
---
## SMB Shares
| Share | Path | Size | Access | Consumers |
|-------|------|------|--------|-----------|
| roms | /mnt/user/roms | 2.3 TB | Guest (read-only) | Nobara (/mnt/roms), Recalbox (network mount) |
---
## Shared Databases
### PostgreSQL 17 (172.18.0.13)


@@ -204,6 +204,25 @@ When recovering from full outage:
---
## Active Failover: XTRM-Nobara
Critical services are replicated on the Nobara workstation with automatic VRRP failover:
| Service | Primary (XTRM-U) | Failover (XTRM-Nobara) |
|---------|-------------------|------------------------|
| Traefik | 192.168.10.20 | 192.168.10.103 |
| Vaultwarden | 192.168.10.20 | 192.168.10.103 |
| Authentik | 192.168.10.20 | 192.168.10.103 |
| AdGuard Home | 192.168.10.20 | 192.168.10.103 |
**VIP:** 192.168.10.250 (floats between XTRM-U and XTRM-Nobara via Keepalived VRRP)
**Failover time:** ~4 seconds
See: `10-FAILOVER-NOBARA.md` for full documentation.
---
## Future: XTRM-N1 Survival Node
When hardware upgrade completes, these services will have replicas on XTRM-N1:


@@ -1,6 +1,6 @@
# Other Services
**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-24
Non-critical services that enhance functionality but don't affect core network operation.
@@ -104,6 +104,26 @@ Non-critical services that enhance functionality but don't affect core network o
---
## Gaming
### Minecraft Server
| Component | IP | Port | Address |
|-----------|-----|------|---------|
| minecraft | 172.18.0.80 | 25565 | minecraft.xtrm-lab.org |
**Image:** itzg/minecraft-server (Java Edition)
**Version:** Latest (1.21.11)
**Mode:** Survival, Normal difficulty, PVP enabled
**Max Players:** 10
**RAM:** 2 GB
**Online Mode:** Yes (requires paid account)
**Data:** `/mnt/user/appdata/minecraft/data`
**NAT:** WAN:25565 → 192.168.10.20:25565 + hairpin NAT
**Dockge Stack:** `minecraft`
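The hairpin NAT mentioned above lets LAN clients reach the server via the WAN address. A minimal RouterOS sketch, assuming the default `WAN` interface list and a `192.168.0.0/16` LAN supernet (both assumptions, not confirmed by the docs):

```routeros
# dst-nat: forward WAN port 25565 to the Unraid host
/ip firewall nat add chain=dstnat protocol=tcp dst-port=25565 \
    in-interface-list=WAN action=dst-nat to-addresses=192.168.10.20
# hairpin: masquerade LAN clients hitting the forwarded port so replies return via the router
/ip firewall nat add chain=srcnat protocol=tcp dst-port=25565 \
    src-address=192.168.0.0/16 dst-address=192.168.10.20 action=masquerade
```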
---
## Media
### Plex
@@ -130,6 +150,23 @@ Non-critical services that enhance functionality but don't affect core network o
**Purpose:** Torrent client
### Roms (SMB Share)
| Property | Value |
|----------|-------|
| Share Path | /mnt/user/roms |
| Protocol | SMB (guest access, read-only) |
| Size | 2.3 TB (49 systems) |
**Consumers:**
| Device | Mount Point | Method |
|--------|-------------|--------|
| Nobara | /mnt/roms | fstab (CIFS, guest, systemd.automount) |
| Recalbox | /recalbox/share/roms_network | custom.sh boot script (CIFS) |
**Recalbox:** Network roms are bind-mounted over local rom directories at boot via `/recalbox/share/system/custom.sh`. Local roms were deleted from SD card to save space.
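The Nobara fstab entry described above might look like this; options beyond `guest`, `ro`, and the systemd automount are assumptions:

```
# /etc/fstab on Nobara (sketch)
//192.168.10.20/roms  /mnt/roms  cifs  guest,ro,x-systemd.automount,_netdev  0  0
```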
---
## Productivity
@@ -263,3 +300,8 @@ Non-critical services that enhance functionality but don't affect core network o
| Pi-hole | Replaced by AdGuard Home | Removed |
| Pangolin | Not in use | Removed |
| Slurp'it | Replaced by Diode | Removed |
| binhex-plexpass | Duplicate of Plex | Removed |
| HomeAssistant_inabox | Duplicate of Home-Assistant-Container | Removed |
| Docker-WebUI | Unused, non-functional | Removed |
| hass-unraid | No config, unused | Removed |
| nextcloud-aio-mastercontainer | Replaced by Nextcloud container | Removed |


@@ -1,6 +1,6 @@
# Hardware Inventory
**Last Updated:** 2026-01-31
**Last Updated:** 2026-02-24
---
@@ -75,12 +75,15 @@
|----------|-------|
| **Role** | Wireless Access Point |
| **Location** | Corridor (ceiling) |
| **IP** | 192.168.10.6 |
| **IP** | 192.168.10.2 |
| **MAC** | 18:FD:74:54:3D:BC |
| **OS** | RouterOS 7.x |
| **OS** | RouterOS 7.21.1 |
| **Serial** | HCT085KBH8B |
| **SSH** | `ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2` |
**Managed by:** HAP1 CAPsMAN
**Radios:** wifi1 (2.4GHz XTRM2), wifi2 (5GHz XTRM) - both active
**Factory reset:** 2026-02-13 (CAPsMAN certificate regenerated)
---
@@ -106,18 +109,27 @@
| **IP** | 192.168.10.20 |
| **OS** | Unraid 6.x |
**Motherboard:** Replaced 2026-02-24 (new board, details TBD)
**Network:**
| Interface | MAC | Speed |
|-----------|-----|-------|
| eth1 | A8:B8:E0:02:B6:15 | 2.5G |
| eth2 | A8:B8:E0:02:B6:16 | 2.5G |
| eth3 | A8:B8:E0:02:B6:17 | 2.5G |
| eth4 | A8:B8:E0:02:B6:18 | 2.5G |
| **bond0** | (virtual) | 5G aggregate |
| br0 | 38:05:25:35:8E:7A | 2.5G |
**Storage:**
- Cache: (current NVMe)
- Array: 3.5" HDDs
**Storage:**
| Device | Model | Size | Role | Status |
|--------|-------|------|------|--------|
| sdb | HUH721010ALE601 (serial 7PHBNYZC) | 10TB | Parity | OK |
| disk1 | HUH721010ALE601 (serial 2TKK3K1D) | 10TB | Data (ZFS) | **FAILED** — clicking/head crash, emulated from parity |
| nvme0n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
| nvme1n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
| nvme2n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
**ZFS Pools:**
| Pool | Devices | Profile | Usable | Purpose |
|------|---------|---------|--------|---------|
| disk1 | md1p1 (parity-emulated) | single | 9.1TB | Main data (roms, media, appdata, backups) |
| cache | 3x Samsung 990 EVO Plus 1TB NVMe | RAIDZ1 | ~1.8TB | Docker, containers |
**Virtual IPs:**
| IP | Purpose |
@@ -160,16 +172,56 @@
---
## Workstations
### XTRM-Nobara | Nobara Linux Workstation
| Property | Value |
|----------|-------|
| **Role** | Workstation + Failover Node |
| **Location** | Main Bedroom |
| **IP** | 192.168.10.103 |
| **MAC** | 08:92:04:C6:07:C5 |
| **OS** | Nobara Linux (Fedora 43 based) |
| **CPU** | AMD Ryzen 9 6900HX (8C/16T) |
| **RAM** | 16 GB |
| **Storage** | 477GB NVMe (OS) + 1.8TB NVMe (btrfs pool with OS drive) |
| **Network** | enp5s0 (2.5G Ethernet) |
| **Switch Port** | CSS1-20 via PP1 M2 |
| **SSH** | `ssh nobara` (key: ~/.ssh/id_ed25519_nobara) |
**Failover Services:** Traefik, Vaultwarden, Authentik, AdGuard Home
**Keepalived:** systemd service, BACKUP priority 100, VIP 192.168.10.250
---
## End Devices (Wired)
| Device | Room | Outlet | Switch Port | MAC |
|--------|------|--------|-------------|-----|
| LGTV | Living Room | L3 | CSS1-24 | - |
| XTRM-Nobara | Main Bedroom | M2 | CSS1-20 | 08:92:04:C6:07:C5 |
| Dell Display | Main Bedroom | M3 | CSS1-21 | - |
| Dancho | Boys Room | B1 | CSS1-18 | - |
| KVM Switch | - | Direct | CSS1-2 | - |
## End Devices (WiFi)
### Recalbox | Raspberry Pi 3
| Property | Value |
|----------|-------|
| **Role** | Retro Gaming Console |
| **Location** | Living Room |
| **IP** | 192.168.25.30 |
| **MAC** | B8:27:EB:32:B2:13 |
| **OS** | Recalbox |
| **VLAN** | 25 (Kids) |
| **SSID** | XTRM2 (2.4GHz) |
| **SSH** | `ssh root@192.168.25.30` (password: `recalboxroot`) |
**Roms:** Network-mounted from Unraid SMB share (//192.168.10.20/roms)
**Boot script:** `/recalbox/share/system/custom.sh` (mounts roms at boot)
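The boot script could look roughly like this; the mount options and the bind-mount loop are assumptions reconstructed from the notes above:

```shell
#!/bin/sh
# /recalbox/share/system/custom.sh (sketch)
SRC=/recalbox/share/roms_network
DST=/recalbox/share/roms
# Mount the Unraid SMB share read-only as guest
mount -t cifs -o guest,ro //192.168.10.20/roms "$SRC"
# Bind-mount each network system directory over its local counterpart
for d in "$SRC"/*/; do
  sys=$(basename "$d")
  mount --bind "$d" "$DST/$sys"
done
```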
---
## Future Hardware (Planned)
@@ -180,6 +232,7 @@ See: `wip/UPGRADE-2026-HARDWARE.md`
|--------|------|--------|
| XTRM-N5 (Minisforum N5 Air) | Production server | Planned |
| XTRM-N1 (N100 ITX) | Survival node | Planned |
| 3x Samsung 990 EVO Plus 1TB | XTRM-N5 NVMe pool | Planned |
| 3x Samsung 990 EVO Plus 1TB | XTRM-U cache pool (RAIDZ1) | **Installed** 2026-02-24 |
| 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned |
| 1x 10TB+ HDD | Replace failed disk1 | **Needed** |
| MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future |


@@ -1,6 +1,6 @@
# VLAN Device Assignment Map
**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-14
**Purpose:** Complete inventory of all network devices with VLAN assignments
---
@@ -29,6 +29,7 @@
| 192.168.10.3 | F4:1E:57:C9:BD:09 | CSS326-24G-2S+ | 24-port switch | Room distribution |
| 192.168.10.4 | 1C:2A:A3:1E:78:67 | ZX1 (ZX-SWTGW218AS) | 8-port 2.5G switch | Server rack |
| 192.168.10.20 | A8:B8:E0:02:B6:15 | XTRM-U (Unraid) | Main server | Docker host, NAS |
| 192.168.10.103 | 08:92:04:C6:07:C5 | XTRM-Nobara | Failover node | Keepalived BACKUP |
| 192.168.10.200 | 48:DA:35:6F:BE:50 | NanoKVM | Remote KVM | IPMI alternative |
| 172.17.0.2 | 46:D0:27:F7:1F:CA | AdGuard (MikroTik) | DNS (Router) | Primary DNS, DoH/DoT |
| 172.17.0.3 | 0C:AB:39:8D:8C:FC | Tailscale (MikroTik) | VPN container | Remote access |
@@ -59,6 +60,7 @@
| 192.168.25.14 | 90:91:64:70:0D:86 | Notebook | Kimi | |
| 192.168.25.15 | 2A:2B:BA:86:D4:AF | iPhone | Kimi | |
| 192.168.25.18 | A4:D1:D2:7B:52:BE | iPad | Compusbg | Work tablet |
| 192.168.25.30 | B8:27:EB:32:B2:13 | Recalbox (RPi3) | Gaming | Retro gaming, WiFi XTRM2 |
---
@@ -67,9 +69,10 @@
| IP | MAC Address | Device | Location | Comment |
|----|-------------|--------|----------|---------|
| 192.168.30.10 | 50:2C:C6:7A:55:39 | Air Conditioner | Living Room | GREE Electric |
| 192.168.30.11 | B0:37:95:79:AF:9B | LG TV | Living Room | LAN (not connected) |
| 192.168.30.12 | DC:03:98:6B:5A:3A | LG TV | Living Room | WiFi (active) |
| 192.168.30.13 | D0:E7:82:F7:65:DD | Chromecast | Living Room | Streaming |
| 192.168.30.30 | 64:4E:D7:D8:43:3E | HP LaserJet M110w | Office | WiFi printer |
| 192.168.30.40 | B0:37:95:79:AF:9B | LG TV (Ethernet) | Living Room | Use ONE interface only for AirPlay |
| 192.168.30.41 | DC:03:98:6B:5A:3A | LG TV (WiFi) | Living Room | Use ONE interface only for AirPlay |
| 192.168.30.42 | D0:E7:82:F7:65:DD | Chromecast | Living Room | Requires WPA2+AES (no TKIP) |
| 192.168.30.14 | B0:4A:39:3F:9A:14 | Roborock S7 Vacuum | Living Room | Needs cloud access |
| 192.168.30.20 | 94:27:70:1E:0C:EE | Bosch Smart Oven | Kitchen | Home Connect app |
| 192.168.30.21 | C8:D7:78:40:65:40 | Bosch Dishwasher | Kitchen | Home Connect app |
@@ -95,7 +98,7 @@
| IP | MAC Address | Device | Purpose | Comment |
|----|-------------|--------|---------|---------|
| 192.168.40.19 | 64:4E:D7:D8:43:3E | HP LaserJet | Network printer | Wired connection |
| — | — | — | — | Printer moved to VLAN 30 |
---
@@ -123,6 +126,7 @@ A8:B8:E0:02:B6:15 XTRM-U
F4:1E:57:C9:BD:09 CSS326
1C:2A:A3:1E:78:67 ZX1
48:DA:35:6F:BE:50 NanoKVM
08:92:04:C6:07:C5 XTRM-Nobara (Failover)
```
**VLAN 20 - Trusted:**
@@ -140,7 +144,8 @@ A4:D1:D2:7B:52:BE Compusbg iPad
**VLAN 30 - IoT:**
```
B0:37:95:79:AF:9B LG TV (LAN)
64:4E:D7:D8:43:3E HP LaserJet M110w
B0:37:95:79:AF:9B LG TV (Ethernet)
DC:03:98:6B:5A:3A LG TV (WiFi)
D0:E7:82:F7:65:DD Chromecast
B0:4A:39:3F:9A:14 Roborock Vacuum
@@ -163,7 +168,7 @@ FC:D5:D9:EB:6A:82 Settop Box (LAN)
**VLAN 40 - Servers:**
```
64:4E:D7:D8:43:3E HP LaserJet
(empty - printer moved to VLAN 30)
```
**VLAN 50 - Guest:**
@@ -180,14 +185,14 @@ D0:C9:07:8C:C9:46 Private Vendor 2
| VLAN | Device Count | Comment |
|------|--------------|---------|
| 10 - Mgmt | 9 | Infrastructure only |
| 10 - Mgmt | 10 | Infrastructure + failover |
| 20 - Trusted | 9 | Family devices |
| 25 - Kids | 4 | Kids devices (subset of 20) |
| 25 - Kids | 5 | Kids devices + Recalbox |
| 30 - IoT | 14 | Smart home devices |
| 35 - Cameras | 1 | Security |
| 40 - Servers | 1 | Services |
| 50 - Guest | 4 | Unknown/unidentified devices |
| **Total** | **38** | All devices categorized |
| **Total** | **40** | All devices categorized |
---


@@ -1,6 +1,6 @@
# WiFi and CAPsMAN Configuration
**Last Updated:** 2026-02-02
**Last Updated:** 2026-02-26
**Purpose:** Document WiFi network settings, CAPsMAN configuration, and device compatibility requirements
---
@@ -23,8 +23,8 @@
| SSID | XTRM |
| Band | 5GHz |
| Mode | 802.11ax (WiFi 6) |
| Channel | Auto (DFS enabled) |
| Width | 80MHz |
| Channel | 5180 MHz (ch 36) |
| Width | 40MHz |
| Security | WPA2-PSK + WPA3-PSK |
| Cipher | CCMP (AES) |
| 802.11r (FT) | Enabled |
@@ -98,47 +98,75 @@ If devices still can't connect, use WPA-only with TKIP-only:
| Interfaces | bridge, vlan10-mgmt |
| Certificate | Auto-generated |
### CAP Device (CAP XL ac - 192.168.10.2)
### CAP Device (cAP XL ac - 192.168.10.2)
| Setting | Value |
|---------|-------|
| caps-man-addresses | 192.168.10.1 |
| discovery-interfaces | bridgeLocal |
| slaves-datapath | capdp (bridge=bridgeLocal, vlan-id=40) |
| certificate | request |
| RouterOS | 7.21.1 |
| SSH Port | 2222 |
| SSH (via proxy) | See ProxyJump command below |
**SSH Access:** Direct SSH to CAP is unreliable. Use ProxyJump through Unraid:
```bash
ssh -o ProxyCommand="ssh -i ~/.ssh/id_ed25519_unraid -p 422 -W %h:%p root@192.168.10.20" -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2
```
### CAP Bridge VLAN Filtering
The CAP runs bridge VLAN filtering to properly tag/untag WiFi client traffic before sending it to the HAP over the trunk link (ether1):
| Setting | Value |
|---------|-------|
| bridgeLocal | vlan-filtering=yes, pvid=10 |
| ether1 (trunk) | bridge port, PVID=10 |
| wifi1, wifi2 | dynamic bridge ports, PVID=40 (set by datapath vlan-id) |
**Bridge VLAN Table:**
| VLAN | ether1 | wifi1 | wifi2 | bridgeLocal | Purpose |
|------|--------|-------|-------|-------------|---------|
| 10 | untagged | - | - | untagged | Management |
| 20 | tagged | tagged | tagged | - | Trusted |
| 25 | tagged | tagged | tagged | - | Kids |
| 30 | tagged | tagged | tagged | - | IoT |
| 35 | tagged | tagged | tagged | - | Cameras |
| 40 | tagged | untagged | untagged | - | CatchAll (default) |
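The table above corresponds roughly to this RouterOS config on the CAP (a sketch; the existing bridge-port entries and dynamic WiFi ports may differ in detail):

```routeros
# Enable VLAN filtering; the bridge itself stays untagged in VLAN 10 (management)
/interface bridge set bridgeLocal vlan-filtering=yes pvid=10
/interface bridge port set [find interface=ether1] pvid=10
# Tagged VLANs on the trunk and the WiFi ports
/interface bridge vlan add bridge=bridgeLocal tagged=ether1,wifi1,wifi2 vlan-ids=20,25,30,35
# CatchAll: tagged toward the HAP, untagged toward WiFi clients (datapath vlan-id=40)
/interface bridge vlan add bridge=bridgeLocal tagged=ether1 untagged=wifi1,wifi2 vlan-ids=40
```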
### CAP Interfaces
| Interface | Radio | Band | SSID | Status |
|-----------|-------|------|------|--------|
| cap-wifi1 | wifi1 | 2.4GHz | XTRM2 | Working |
| cap-wifi2 | wifi2 | 5GHz | XTRM | Channel issues (disabled) |
| Interface | Radio | Band | SSID | Security | Status |
|-----------|-------|------|------|----------|--------|
| cap-wifi1 | wifi2 | 5GHz | XTRM | WPA2/WPA3-PSK, CCMP | Working (Ch 52/5260, 40MHz, DFS) |
| cap-wifi2 | wifi1 | 2.4GHz | XTRM2 | WPA2-PSK, CCMP | Working (Ch 6/2437, 20MHz) |
### CAP Access List Rule
CAP clients bypass VLAN assignment (go to VLAN 10):
```routeros
/interface wifi access-list add \
interface=cap-wifi1 \
action=accept \
comment="CAP clients - no VLAN" \
place-before=0
```
**Note:** cap-wifi2 uses WPA2+CCMP only (not WPA+TKIP like HAP's local wifi2). Legacy IoT devices requiring TKIP will only work on HAP1's local wifi2.
---
## WiFi Access List (VLAN Assignment)
## WiFi Access List
Devices are assigned to VLANs based on MAC address:
**Status:** VLAN assignment via access list is **active**. Each entry has a `vlan-id` that assigns the device to the correct VLAN upon WiFi association. This works on both HAP (local) and CAP (remote, via bridge VLAN filtering).
| VLAN | Purpose | Example Devices |
|------|---------|-----------------|
| 20 | Trusted | MacBooks, iPhones, Samsung phones |
| 25 | Kids | Kids devices |
| 30 | IoT | Smart home devices, Chromecast, Bosch appliances |
| 40 | Catch-All | Unknown devices (default) |
**30+ entries** configured (MAC-based accept rules with VLAN IDs + 1 default catch-all):
### Current Access List
| # | MAC | Device | VLAN |
|---|-----|--------|------|
| 0 | AA:ED:8B:2A:40:F1 | Samsung S25 Ultra - Kaloyan | 20 |
| 1 | 82:6D:FB:D9:E0:47 | MacBook Air - Nora | 20 |
| 12 | CE:B8:11:EA:8D:55 | MacBook - Kaloyan | 20 |
| 13 | BE:A7:95:87:19:4A | MacBook 5GHz - Kaloyan | 20 |
| 27 | B8:27:EB:32:B2:13 | RecalBox RPi3 | 25 |
| 28 | CC:5E:F8:D3:37:D3 | ASUS ROG Ally - Kaloyan | 20 |
| 31 | C8:5C:CC:40:B4:AA | Xiaomi Air Purifier 2 | 30 |
| 32 | (any) | Default - VLAN40 | 40 (catch-all) |
**Default behavior:** Devices not in the access list get VLAN 40 (CatchAll) via the default rule and the datapath `vlan-id=40`.
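Adding a device follows the same pattern as the entries above; the MAC here is a placeholder, and `place-before` must point at the current catch-all rule (entry 32 in the table above, but the number shifts as entries are added):

```routeros
/interface wifi access-list add mac-address=AA:BB:CC:DD:EE:FF \
    action=accept vlan-id=30 comment="Example IoT device" place-before=32
```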
### Show Full Access List
```routeros
/interface wifi access-list print


@@ -1,6 +1,6 @@
# DNS Architecture with AdGuard Failover
**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-26
---
@@ -194,8 +194,10 @@ Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.
### Sync Container
Container: `adguardhome-sync` at 192.168.10.11 (br0 macvlan, static IP)
```yaml
# /mnt/user/appdata/adguard-sync/adguardhome-sync.yaml
# /mnt/user/appdata/dockge/stacks/adguard-sync/adguardhome-sync.yaml
cron: "*/30 * * * *"
runOnStart: true
@@ -204,22 +206,13 @@ origin:
username: jazzymc
password: 7RqWElENNbZnPW
replicas:
- url: http://192.168.10.1:3000
username: jazzymc
password: 7RqWElENNbZnPW
features:
dns:
serverConfig: false
accessLists: true
rewrites: true
filters: true
clientSettings: true
services: true
replica:
url: http://192.168.10.1:3000
username: jazzymc
password: 7RqWElENNbZnPW
```
**Note:** The sync container must be connected to both `dockerproxy` and `br0` networks to reach both AdGuard instances.
**Note:** The sync container is on the `br0` macvlan network with a static IP to avoid conflicts with infrastructure devices.
---

docs/10-FAILOVER-NOBARA.md (new file, 276 lines)

@@ -0,0 +1,276 @@
# Failover Infrastructure - Nobara (XTRM-Nobara)
**Last Updated:** 2026-02-13
**Purpose:** Temporary failover for critical services during Unraid maintenance windows.
---
## Overview
A Docker-based replica of critical services runs on the Nobara Linux workstation (XTRM-Nobara) with automatic failover via Keepalived VRRP. When Unraid goes offline, the virtual IP floats to Nobara and services continue operating.
```
Clients → 192.168.10.250 (VIP) → XTRM-U (MASTER, priority 150)
↓ failover (~4 seconds)
XTRM-Nobara (BACKUP, priority 100)
```
---
## Machines
| Role | Host | IP | Interface | Priority |
|------|------|-----|-----------|----------|
| **MASTER** | XTRM-U (Unraid) | 192.168.10.20 | br0 | 150 |
| **BACKUP** | XTRM-Nobara | 192.168.10.103 | enp5s0 | 100 |
| **VIP** | Shared | 192.168.10.250 | — | — |
---
## Replicated Services
| Service | Image | Ports (Nobara) | Domain |
|---------|-------|----------------|--------|
| **Traefik** | traefik:latest | 80, 443, 8080 | *.xtrm-lab.org |
| **Vaultwarden** | vaultwarden/server:latest | internal:80 | vault.xtrm-lab.org |
| **Authentik** | ghcr.io/goauthentik/server:2025.8.1 | internal:9000 | auth.xtrm-lab.org |
| **Authentik Worker** | ghcr.io/goauthentik/server:2025.8.1 | — | — |
| **PostgreSQL** | postgres:17 | internal:5432 | — |
| **Redis** | redis:7-alpine | internal:6379 | — |
| **AdGuard Home** | adguard/adguardhome:latest | 192.168.10.103:53, 3000 | — |
---
## File Locations
### Nobara (XTRM-Nobara)
| Path | Contents |
|------|----------|
| `/home/failover/docker-compose.yml` | Main compose stack |
| `/home/failover/traefik/` | Traefik config, certs, acme.json |
| `/home/failover/vaultwarden/` | Vaultwarden data (copy from Unraid) |
| `/home/failover/authentik/` | Authentik media & templates |
| `/home/failover/postgres/` | PostgreSQL data + initial dump |
| `/home/failover/redis/` | Redis data |
| `/home/failover/adguard/` | AdGuard conf & work dirs |
| `/etc/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/usr/local/bin/check_failover.sh` | Health check script |
| `/usr/local/bin/failover-notify.sh` | State change notification script |
| `/var/log/keepalived-failover.log` | Failover event log |
### Unraid (XTRM-U)
| Path | Contents |
|------|----------|
| `/mnt/user/appdata/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/mnt/user/appdata/keepalived/check_services.sh` | Health check script |
---
## Keepalived Configuration
### VRRP Parameters
| Parameter | Value |
|-----------|-------|
| Virtual Router ID | 51 |
| Auth Type | PASS |
| Auth Password | xtrm2026 |
| Advertisement Interval | 1 second |
| Health Check Interval | 5 seconds |
| Fail Threshold | 3 missed checks |
| Recovery Threshold | 2 successful checks |
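Assembled from the parameters above, the MASTER side of `keepalived.conf` looks roughly like this (the tracking-script wiring is a sketch; only the values in the table are confirmed):

```
vrrp_instance VI_1 {
    state MASTER
    interface br0
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass xtrm2026
    }
    virtual_ipaddress {
        192.168.10.250
    }
    track_script {
        check_services   # weight 2, interval 5, fall 3, rise 2
    }
}
```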
### Unraid (MASTER)
- Runs as Docker container: `local/keepalived` (built from alpine + keepalived + curl)
- Priority: 150 (+ health check weight 2 = 152 when healthy)
- Health check: curls `http://localhost:8183/api/overview` (Traefik dashboard)
- Preemption: enabled (will reclaim VIP from Nobara when healthy)
```bash
# Start/stop on Unraid
docker start keepalived
docker stop keepalived
docker logs keepalived
```
### Nobara (BACKUP)
- Runs as systemd service: `keepalived.service`
- Priority: 100 (+ health check weight 2 = 102 when healthy)
- Health check: verifies Traefik and Vaultwarden containers are running
- `nopreempt` set (won't fight for VIP if Unraid is healthy)
```bash
# Start/stop on Nobara
sudo systemctl start keepalived
sudo systemctl stop keepalived
sudo journalctl -u keepalived -f
```
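The priority arithmetic works out as follows (a toy sketch; additive check weight per the values above):

```shell
#!/bin/sh
# Effective VRRP priority = base priority + track-script weight when the
# health check passes (assumption: the weight is applied additively)
effective_priority() {
  base=$1; weight=$2; healthy=$3
  if [ "$healthy" -eq 1 ]; then
    echo $((base + weight))
  else
    echo "$base"
  fi
}
effective_priority 150 2 1   # MASTER healthy: 152
effective_priority 100 2 1   # BACKUP healthy: 102
```

Note that with a small positive weight, a failed check alone drops the MASTER only to 150, still above the BACKUP's 102; the Test Failover procedure therefore stops keepalived outright, letting VRRP advertisements time out.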
---
## DNS Strategy
**Approach:** Local DNS override via AdGuard Home.
To route traffic through the VIP for internal clients, configure AdGuard DNS rewrite rules to resolve `*.xtrm-lab.org` → `192.168.10.250`. External (Cloudflare) DNS remains pointed at Unraid's public IP.
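In AdGuard Home's config the override is a DNS rewrite entry; a sketch (where this section lives in `AdGuardHome.yaml` varies by version, and the same rewrite can be added in the web UI under Filters → DNS rewrites):

```yaml
rewrites:
  - domain: '*.xtrm-lab.org'
    answer: 192.168.10.250
```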
---
## Operations
### Before Maintenance (Data Sync)
Run these commands from the Mac to sync latest data to Nobara:
```bash
# 1. Sync Vaultwarden data
ssh unraid "tar czf - -C /mnt/user/appdata vaultwarden/" | \
ssh nobara "tar xzf - -C /home/failover/"
# 2. Dump and sync Authentik database
ssh unraid "docker exec postgresql17 pg_dump -U authentik_user authentik_db" | \
ssh nobara "cat > /home/failover/postgres/authentik_dump.sql"
# 3. Sync AdGuard config
ssh unraid "tar czf - -C /mnt/user/appdata/adguardhome conf/ work/" | \
ssh nobara "tar xzf - -C /home/failover/adguard/"
# 4. Sync Traefik config and certs
ssh unraid "tar czf - -C /mnt/user/appdata/traefik traefik.yml dynamic.yml acme.json certs/" | \
ssh nobara "tar xzf - -C /home/failover/traefik/"
```
**Note:** `ssh unraid` = `ssh -i ~/.ssh/id_ed25519_unraid -p 422 root@192.168.10.20`
### Start Failover Services
```bash
# On Nobara
cd /home/failover
sudo docker compose up -d
sudo systemctl start keepalived
```
### Stop Failover Services
```bash
# On Nobara
cd /home/failover
sudo docker compose down
sudo systemctl stop keepalived
```
### Test Failover
```bash
# 1. Check VIP location
ssh unraid "ip addr show br0 | grep inet"
ssh nobara "ip addr show enp5s0 | grep inet"
# 2. Simulate Unraid failure
ssh unraid "docker stop keepalived"
# 3. Verify VIP moved to Nobara (wait ~4 seconds)
ssh nobara "ip addr show enp5s0 | grep inet"
# 4. Restore Unraid
ssh unraid "docker start keepalived"
# 5. Verify VIP returned to Unraid
ssh unraid "ip addr show br0 | grep inet"
```
### Check Status
```bash
# Nobara service status
ssh nobara "sudo docker ps --format 'table {{.Names}}\t{{.Status}}'"
# Nobara keepalived state
ssh nobara "sudo journalctl -u keepalived -n 10 --no-pager"
# Unraid keepalived state
ssh unraid "docker logs keepalived --tail 10"
# Which machine holds the VIP?
ping -c 1 192.168.10.250
```
---
## Traefik Configuration (Failover)
The Nobara Traefik instance has a **reduced** dynamic.yml that only serves the four critical services:
| Router | Domain | Backend |
|--------|--------|---------|
| vaultwarden-secure | vault.xtrm-lab.org | http://vaultwarden:80 |
| authentik-secure | auth.xtrm-lab.org | http://authentik:9000 |
| traefik-secure | traefik.xtrm-lab.org | api@internal |
TLS certificates are shared (copied from Unraid's acme.json + static certs).
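The reduced `dynamic.yml` might look like this for one of the routers above; the entrypoint name and TLS details are assumptions:

```yaml
http:
  routers:
    vaultwarden-secure:
      rule: "Host(`vault.xtrm-lab.org`)"
      entryPoints: [websecure]
      service: vaultwarden
      tls: {}
  services:
    vaultwarden:
      loadBalancer:
        servers:
          - url: "http://vaultwarden:80"
```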
---
## Limitations
- **Data is a point-in-time snapshot.** Changes made on Unraid after the last sync are not reflected on Nobara. Re-sync before maintenance.
- **No real-time replication.** Vaultwarden passwords saved during failover will not sync back to Unraid automatically.
- **Only critical services replicated.** Other services (Plex, Gitea, NetBox, etc.) will be offline during maintenance.
- **External DNS not updated.** Failover only works for clients using the local DNS (AdGuard) that resolves to the VIP. External access via Cloudflare will not failover.
---
## SSH Access
```bash
# From Mac to Nobara (passwordless, key-based)
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103
# Sudo on Nobara requires password: (check password manager)
```
---
## Recovery After Maintenance
1. Bring Unraid back online
2. Verify all Unraid services are running: `docker ps`
3. Keepalived on Unraid will auto-reclaim VIP (preemption)
4. Stop failover on Nobara: `cd /home/failover && sudo docker compose down`
5. If Vaultwarden was used during failover, manually export/import any new entries
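The checks in steps 2-4 can be collected into a small dry-run script. This is a sketch, not the documented procedure: by default it only prints the commands (the `unraid`/`nobara` aliases come from `~/.ssh/config`); set `DRY_RUN=0` to actually execute them.

```shell
# Dry-run wrapper: prints each command instead of running it unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run ssh unraid "docker ps --format '{{.Names}}'"               # step 2: services running?
run ssh unraid "docker logs keepalived --tail 5"               # step 3: VIP reclaimed?
run ssh nobara "cd /home/failover && sudo docker compose down" # step 4: stop failover stack
echo "step 5: export/import any Vaultwarden entries created during failover"
```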
---
## Architecture Diagram
```
                 ┌─────────────────────┐
                 │   192.168.10.250    │
                 │     (VRRP VIP)      │
                 └──────────┬──────────┘
            ┌───────────────┴───────────────┐
            │                               │
 ┌──────────▼─────────┐          ┌──────────▼─────────┐
 │  XTRM-U (Unraid)   │          │    XTRM-Nobara     │
 │   192.168.10.20    │          │   192.168.10.103   │
 │    MASTER (150)    │          │    BACKUP (100)    │
 │                    │          │                    │
 │  ┌──────────────┐  │          │  ┌──────────────┐  │
 │  │ Traefik      │  │          │  │ Traefik      │  │
 │  │ Vaultwarden  │  │          │  │ Vaultwarden  │  │
 │  │ Authentik    │  │          │  │ Authentik    │  │
 │  │ AdGuard      │  │          │  │ AdGuard      │  │
 │  │ + 25 more    │  │          │  │ PostgreSQL   │  │
 │  └──────────────┘  │          │  │ Redis        │  │
 │                    │          │  └──────────────┘  │
 │ Keepalived (Docker)│          │Keepalived (systemd)│
 └────────────────────┘          └────────────────────┘
```

---
# Cross-VLAN Casting & Streaming
Configuration for casting/streaming from VLANs 10 (Mgmt), 20 (Trusted), and 25 (Kids) to devices on VLAN 30 (IoT).
## Casting Devices
| Device | MAC (Ethernet) | MAC (WiFi) | Static IP | VLAN |
|--------|---------------|------------|-----------|------|
| LG TV (webOS) | B0:37:95:79:AF:9B | DC:03:98:6B:5A:3A | .40 (eth) / .41 (wifi) | 30 |
| Chromecast | — | D0:E7:82:F7:65:DD | .42 | 30 |
All IPs in subnet `192.168.30.0/24`.
## What Works
| Feature | From VLAN 20/25/10 | Notes |
|---------|-------------------|-------|
| AirPlay (Mac → LG TV) | Yes | TV must use ONE interface only (see below) |
| Smart View (Samsung → LG TV) | Yes | Works without issues |
| YouTube Cast (phone → TV/Chromecast) | Yes | Via TV Link Code, not device discovery |
| Chromecast casting | Yes | Requires mDNS repeater |
## What Doesn't Work
| Feature | Reason |
|---------|--------|
| LG ThinQ remote app | Client-side subnet check — app refuses if phone and TV are on different subnets. No workaround. |
## MikroTik Configuration
### 1. Address List
```routeros
/ip/firewall/address-list
add list=casting-devices address=192.168.30.40 comment="LG TV Ethernet"
add list=casting-devices address=192.168.30.41 comment="LG TV WiFi"
add list=casting-devices address=192.168.30.42 comment="Chromecast"
```
### 2. Firewall Rules (Forward Chain)
Bidirectional rules — casting devices need to initiate connections back (AirPlay uses separate UDP channels for timing/control).
```routeros
/ip/firewall/filter
# Forward: source VLANs → IoT
add chain=forward action=accept src-address=192.168.20.0/24 dst-address=192.168.30.0/24 comment="Allow Trusted to IoT (casting)"
add chain=forward action=accept src-address=192.168.25.0/24 dst-address=192.168.30.0/24 comment="Allow Kids to IoT (casting)"
add chain=forward action=accept src-address=192.168.10.0/24 dst-address=192.168.30.0/24 comment="Allow Mgmt to IoT"
# Return: casting devices → source VLANs
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.20.0/24 comment="Allow casting devices to Trusted (casting return)"
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.25.0/24 comment="Allow casting devices to Kids (casting return)"
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.10.0/24 comment="Allow casting devices to Mgmt (casting return)"
```
These rules must be **before** the IoT block rules:
```routeros
# Block IoT → other VLANs (AFTER the return rules above)
add chain=forward action=drop src-address=192.168.30.0/24 dst-address=192.168.10.0/24 comment="Block IoT to Management"
add chain=forward action=drop src-address=192.168.30.0/24 dst-address=192.168.20.0/24 comment="Block IoT to Trusted"
```
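Rule order matters because RouterOS evaluates the filter chain top-down. If the accept rules were added after the drops, they can be moved up from the CLI; the item numbers below are placeholders, so check `print` output first (the `move` syntax shown is an assumption, verify on your RouterOS version):

```routeros
/ip/firewall/filter
print
# note the item numbers of the new accept rules and of the first IoT drop rule,
# then move the accepts above it, e.g. rules 12-14 in front of rule 5:
move 12,13,14 destination=5
```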
### 3. FastTrack Exclusion (Mangle)
FastTrack bypasses conntrack/firewall — must exclude inter-VLAN casting traffic.
```routeros
/ip/firewall/mangle
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.20.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Trusted<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.20.0/24 comment="No FastTrack: IoT<->Trusted (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.25.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Kids<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.25.0/24 comment="No FastTrack: IoT<->Kids (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.10.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Mgmt<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.10.0/24 comment="No FastTrack: IoT<->Mgmt (casting)"
```
The FastTrack rule must match only unmarked connections (`connection-mark=no-mark`), so the marked casting traffic stays on the normal firewall path:
```routeros
/ip/firewall/filter
add chain=forward action=fasttrack-connection connection-state=established,related connection-mark=no-mark comment="defconf: fasttrack"
```
### 4. mDNS Repeater
Enables cross-VLAN device discovery (AirPlay, Chromecast).
```routeros
/ip/dns/set mdns-repeat-ifaces=1-vlan10-mgmt,2-vlan20-trusted,3-vlan25-family,4-vlan30-iot
```
### 5. IGMP Proxy
Enables multicast forwarding (SSDP/UPnP discovery).
```routeros
/routing/igmp-proxy/interface
add interface=4-vlan30-iot upstream=yes threshold=1
add interface=2-vlan20-trusted upstream=no threshold=1
add interface=3-vlan25-family upstream=no threshold=1
add interface=1-vlan10-mgmt upstream=no threshold=1
```
### 6. DHCP Static Leases
```routeros
/ip/dhcp-server/lease
add address=192.168.30.40 mac-address=B0:37:95:79:AF:9B server=dhcp-vlan30 comment="LG TV Ethernet"
add address=192.168.30.41 mac-address=DC:03:98:6B:5A:3A server=dhcp-vlan30 comment="LG TV WiFi"
add address=192.168.30.42 mac-address=D0:E7:82:F7:65:DD server=dhcp-vlan30 comment="Chromecast"
```
### 7. WiFi Access List
```routeros
/interface/wifi/access-list
add mac-address=DC:03:98:6B:5A:3A action=accept vlan-id=30 comment="LG TV WiFi"
add mac-address=D0:E7:82:F7:65:DD action=accept vlan-id=30 comment="Chromecast"
```
## Troubleshooting
### AirPlay Black Screen on LG TV
**Root cause**: LG TV connected via both Ethernet AND WiFi simultaneously.
The TV advertises AirPlay via mDNS on one interface but streams on the other, creating asymmetric routing. The Mac connects to one IP, but the TV sends return traffic from a different IP.
**Fix**: Use only ONE connection on the TV — either Ethernet or WiFi, not both. Disconnect the unused one in TV settings.
- Ethernet MAC: `B0:37:95:79:AF:9B` → 192.168.30.40
- WiFi MAC: `DC:03:98:6B:5A:3A` → 192.168.30.41
### Do NOT Use Masquerade NAT
Masquerade (srcnat) was tried to make cross-VLAN traffic appear local. This breaks AirPlay because:
- AirPlay negotiates separate UDP feedback channels (port 7010, control on 6001, timing on 6002)
- With masquerade, TV sends feedback to the router IP (192.168.30.1) instead of the Mac's real IP
- Result: control channel works but video/audio never arrives → black screen
### Chromecast Setup Issues
The Chromecast can only be set up via the Google Home app (no web interface).
**Common setup failure**: Google Home app finds the Chromecast via Bluetooth, connects to its setup WiFi hotspot, but then says "Could not communicate with your Chromecast."
**Fix** (on phone before setup):
1. Disable mobile data
2. Disable VPN
3. Turn off "Switch to mobile data when WiFi is unstable"
4. Enable Location services (required by Google Home)
5. Clear Google Home app cache
**WiFi requirements**: Chromecast requires **WPA2 with AES/CCMP** encryption. It will NOT connect to networks using TKIP. The XTRM2 (2.4GHz) security profile was changed from TKIP to CCMP to support this:
```routeros
/interface/wifi/security/set sec-xtrm2 encryption=ccmp
```
### VPN Interference
If your Mac is connected to WireGuard VPN, the VPN overrides the default route — local traffic bypasses WiFi and goes through the VPN tunnel. Disconnect VPN before casting.
### CAP VLAN Limit
The CAP XL ac may show "maximum VLAN count for interface was reached." If a device can't connect to WiFi, try disabling the CAP interfaces temporarily to force connection to the HAP's radio directly.

---
# Development Environment
**Last Updated:** 2026-03-08
Web-based development environment running directly on Unraid, providing VS Code IDE with full host access to Claude Code, Cooperator CLI, Docker, and all project repositories.
---
## OpenVSCode Server
| Property | Value |
|----------|-------|
| **URL** | https://code.xtrm-lab.org |
| **Auth** | Authentik forward auth (SSO) |
| **Port** | 3100 (host-native, not a container) |
| **Binary** | `/mnt/user/appdata/openvscode/current/` (symlink) |
| **Config** | `/mnt/user/appdata/openvscode/config/` |
| **Boot Script** | `/mnt/user/appdata/openvscode/start.sh` |
| **Log** | `/mnt/user/appdata/openvscode/server.log` |
**Why host-native?** Running directly on Unraid (not in a container) means the VS Code terminal has full access to `claude`, `cooperator`, `node`, `npm`, `docker`, `git`, and all host tools. No volume mount hacks or container-breaking updates.
### Persistence
All data lives on the array (`/mnt/user/`) — survives Unraid OS updates:
| Component | Path | Purpose |
|-----------|------|---------|
| Server binary | `/mnt/user/appdata/openvscode/openvscode-server-v1.109.5-linux-x64/` | VS Code server |
| Symlink | `/mnt/user/appdata/openvscode/current` → version dir | Easy version switching |
| VS Code config | `/mnt/user/appdata/openvscode/config/` | Extensions, settings, themes |
| Start script | `/mnt/user/appdata/openvscode/start.sh` | Startup with PATH setup |
### Updating OpenVSCode Server
```bash
# Download new version
cd /mnt/user/appdata/openvscode
curl -fsSL "https://github.com/gitpod-io/openvscode-server/releases/download/openvscode-server-vX.Y.Z/openvscode-server-vX.Y.Z-linux-x64.tar.gz" -o new.tar.gz
tar xzf new.tar.gz && rm new.tar.gz
# Switch symlink and restart
ln -sfn openvscode-server-vX.Y.Z-linux-x64 current
pkill -f "openvscode-server.*--port 3100"
/mnt/user/appdata/openvscode/start.sh
```
Extensions and settings are preserved (stored separately in `config/`).
### Traefik Routing
Defined in `/mnt/user/appdata/traefik/dynamic.yml`:
```yaml
openvscode-secure:
rule: "Host(`code.xtrm-lab.org`)"
entryPoints: [https]
middlewares: [default-headers, authentik-forward-auth]
tls:
certResolver: cloudflare
service: openvscode
# ...
openvscode:
loadBalancer:
servers:
- url: "http://192.168.10.20:3100"
```
---
## Claude Code
| Property | Value |
|----------|-------|
| **Version** | 2.1.71 |
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/claude` |
| **Symlink** | `/root/.local/bin/claude` |
| **Config** | `/mnt/user/appdata/claude-code/.claude.json` → `/root/.claude.json` |
| **Settings** | `/mnt/user/appdata/claude-code/.claude/` → `/root/.claude/` |
| **Boot Script** | `/mnt/user/appdata/claude-code/install-claude.sh` |
### Persistence
npm global prefix set to `/mnt/user/appdata/claude-code/.npm-global/` (array-backed). Boot script creates symlinks from `/root/` to persistent paths.
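This pattern can be sketched generically. The following is an assumed reconstruction of what `install-claude.sh` does, with paths parameterized so the demo runs against throwaway directories rather than `/root`:

```shell
# Assumed persistence pattern: array-backed originals under appdata,
# RAM-disk symlinks under the home dir recreated on every boot.
link_persistent() {
  persist="$1"   # e.g. /mnt/user/appdata/claude-code
  home="$2"      # e.g. /root
  mkdir -p "$home/.local/bin"
  ln -sfn "$persist/.npm-global/bin/claude" "$home/.local/bin/claude"
  ln -sfn "$persist/.claude"      "$home/.claude"
  ln -sfn "$persist/.claude.json" "$home/.claude.json"
}

# Demo against a temp dir instead of the real paths
tmp=$(mktemp -d)
mkdir -p "$tmp/persist/.npm-global/bin" "$tmp/persist/.claude" "$tmp/home"
touch "$tmp/persist/.npm-global/bin/claude" "$tmp/persist/.claude.json"
link_persistent "$tmp/persist" "$tmp/home"
readlink "$tmp/home/.claude.json"   # prints the persistent path
```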
### Updating Claude Code
```bash
source /root/.bashrc
npm install -g @anthropic-ai/claude-code
claude --version
```
---
## Cooperator CLI
| Property | Value |
|----------|-------|
| **Version** | 3.36.1 |
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/cooperator` |
| **Config** | `~/.cooperator/.env` (Shortcut token, Confluence, git config) |
| **Registry** | `@ampeco:registry=https://gitlab.com/api/v4/projects/71775017/packages/npm/` |
| **npm auth** | `/root/.npmrc` (GitLab PAT) |
### What Cooperator Install Sets Up
- **Commands** — `~/.claude/commands/cooperator` → cooperator's claude-commands
- **Agents** — `~/.claude/agents/implementation-task-executor.md`
- **Skills** — 12 cooperator skills (shortcut-operations, create-feature-story, gitlab-operations, etc.)
- **Shortcut API** — validated via `~/.cooperator/.env` token
### Updating Cooperator
```bash
source /root/.bashrc
npm install -g @ampeco/cooperator
cooperator --version
```
**Note:** `/root/.npmrc` is in RAM and recreated on boot if needed. The GitLab PAT is stored in `/boot/config/go`; a persistent `.npmrc` setup would be needed if the token changes frequently.
---
## GitLab CLI (glab)
| Property | Value |
|----------|-------|
| **Version** | 1.89.0 |
| **Binary** | `/usr/local/bin/glab` (RAM — lost on reboot) |
| **Config** | `~/.config/glab-cli/config.yml` |
| **Auth** | GitLab PAT (same as npm registry token) |
**Note:** glab binary at `/usr/local/bin/` is lost on Unraid reboot. Add to boot script or persist to appdata.
---
## Python (via uv)
| Property | Value |
|----------|-------|
| **uv** | `/root/.local/bin/uv` |
| **Python** | 3.12.13 (managed by uv) |
| **mikrotik-mcp venv** | `/mnt/user/projects/mikrotik-mcp/venv/` |
| **unraid-mcp venv** | `/mnt/user/projects/unraid-mcp/.venv/` |
---
## Custom Skills
6 custom skills synced from Mac to `/mnt/user/appdata/claude-code/custom-skills/`:
| Skill | Description |
|-------|-------------|
| ev-compliance-story | EV regulatory compliance story creation |
| ev-protocol-expert | OCPP/OCPI/AFIR protocol expertise |
| frontend-designer | Nova/Vue component design |
| mikrotik-admin | MikroTik router management via MCP |
| prd-generator | Product requirements documents |
| unraid-admin | Unraid server management via MCP |
Symlinked to `~/.claude/skills/` alongside 12 cooperator skills (18 total).
---
## MCP Servers
### Pending Registration (TODO)
The following MCP servers need to be registered via `claude mcp add` on Unraid:
| Server | Command | Status |
|--------|---------|--------|
| **shortcut** | `node /mnt/user/appdata/claude-code/mcp-server-shortcut/dist/index.js` | Built, needs `claude mcp add` |
| **mikrotik** | `/mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server` | Venv ready, needs `claude mcp add` |
| **unraid** | `/mnt/user/projects/unraid-mcp/.venv/bin/python -m unraid_mcp.main` | Venv ready, needs `claude mcp add` |
| **playwright** | `npx -y @playwright/mcp@latest --isolated` | npx available, needs `claude mcp add` |
| **smartbear** | `npx -y @smartbear/mcp@latest` | npx available, needs `claude mcp add` |
### Environment Variables for MCPs
- **mikrotik**: `DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json`
- **unraid**: `UNRAID_API_URL`, `UNRAID_API_KEY`, `UNRAID_MCP_TRANSPORT=stdio`, `UNRAID_VERIFY_SSL=false`
- **shortcut**: `SHORTCUT_API_TOKEN` (from `~/.cooperator/.env`)
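Once an interactive session is available, the registrations above should reduce to something like the following. This is printed as a dry run (drop the leading `echo` to execute), and the `claude mcp add <name> -e KEY=VAL -- <command> [args]` flag syntax is an assumption to confirm via `claude mcp add --help` first:

```shell
# Dry run of the pending MCP registrations; nothing is registered by accident.
echo claude mcp add shortcut \
  -e SHORTCUT_API_TOKEN='<from ~/.cooperator/.env>' \
  -- node /mnt/user/appdata/claude-code/mcp-server-shortcut/dist/index.js
echo claude mcp add mikrotik \
  -e DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json \
  -- /mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server
echo claude mcp add playwright -- npx -y '@playwright/mcp@latest' --isolated
```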
---
## Projects Workspace
All projects at `/mnt/user/projects/`, opened as default folder in VS Code.
### Personal Projects (Gitea)
| Project | Gitea Repo | Description |
|---------|-----------|-------------|
| infrastructure | jazzymc/infrastructure | This repo — home lab documentation |
| claude-skills | jazzymc/claude-skills | Claude Code custom skills |
| mikrotik-mcp | jazzymc/mikrotik-mcp | MikroTik MCP server |
| unraid-mcp | jazzymc/unraid-mcp | Unraid MCP server |
| unraid-glass | jazzymc/unraid-glass | Unraid dashboard plugin |
| openclaw | jazzymc/openclaw | OpenClaw game project |
| nanobot-mcp | jazzymc/nanobot-mcp | Nanobot MCP server |
| nanobot-hkuds | jazzymc/nanobot-hkuds | Nanobot HKU DS |
| xtrm-agent | jazzymc/xtrm-agent | AI agent framework |
| geekmagic-smalltv | jazzymc/geekmagic-smalltv | SmallTV firmware |
| homarr | jazzymc/homarr | Homarr dashboard fork |
| shortcut-daily-sync | jazzymc/shortcut-daily-sync | Shortcut sync tool |
**Remote URL format:** `https://jazzymc:<token>@git.xtrm-lab.org/jazzymc/<repo>.git`
### AMPECO Work Projects
| Project | Source | Type |
|---------|--------|------|
| backend | GitLab (ampeco/apps/charge/backend) | Git clone |
| crm | GitLab (ampeco/apps/charge/crm) | Git clone |
| marketplace | GitLab (ampeco/apps/charge/marketplace) | Git clone |
| mobile-2 | GitLab (ampeco/apps/charge/mobile-2) | Git clone |
| ad-hoc-payment-web-app | GitLab (ampeco/apps/charge/external-apps/) | Git clone |
| dev-proxy | GitLab (ampeco/apps/shared/dev-proxy) | Git clone |
| ampeco-custom-dashboard-widgets-boilerplate | GitHub (ampeco/) | Git clone |
| docs | Local rsync | Reference docs |
| stories | Local rsync | Product stories |
| booking-ewa | Local rsync | Booking app |
| ewa-ui | Local rsync | EWA frontend |
| design-tokens | Local rsync | Design system tokens |
| ampeco-backup | Local rsync | Configuration backups |
| central_registry | Local rsync | Service registry |
| CCode-UI-Distribution-1.0.0 | Local rsync | UI distribution |
| automations | Local rsync | Automation scripts |
**GitLab auth:** OAuth2 PAT in remote URLs.
---
## Boot Sequence
`/boot/config/go` triggers on Unraid boot:
1. **Wait for array** — polls for `/mnt/user/appdata/claude-code` (up to 5 min)
2. **Claude Code setup**: runs `/mnt/user/appdata/claude-code/install-claude.sh`
- Creates symlinks (`/root/.local/bin/claude`, `/root/.claude`, `/root/.claude.json`)
- Writes `.bashrc` with persistent npm PATH
3. **OpenVSCode Server**: runs `/mnt/user/appdata/openvscode/start.sh`
- Kills any existing instance
- Starts on port 3100 with persistent config dir
- Sources Claude/Cooperator PATH for terminal sessions
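Step 1's wait-for-array logic amounts to a poll loop. A minimal sketch; the real `go` script's exact timeout and interval are assumptions:

```shell
# Poll until a path exists or a timeout (seconds) elapses; returns 1 on timeout.
wait_for_path() {
  path="$1"; timeout="${2:-300}"; waited=0
  while [ ! -e "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 5
    waited=$((waited + 5))
  done
}

# In /boot/config/go this would be something like:
#   wait_for_path /mnt/user/appdata/claude-code 300 || exit 1
wait_for_path /tmp 10 && echo "path ready"   # prints: path ready
```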
---
## Architecture Diagram
```
Browser → https://code.xtrm-lab.org
Traefik (443) → Authentik SSO check
OpenVSCode Server (:3100, host-native)
Unraid Host Shell
├── claude (2.1.71)
├── cooperator (3.36.1)
├── glab (1.89.0)
├── node (22.18.0) / npm (10.9.3) / bun (1.3.10)
├── uv + python 3.12
├── docker / docker compose
├── git
└── /mnt/user/projects/
├── ampeco/ (18 AMPECO work projects)
├── infrastructure/
├── claude-skills/
├── mikrotik-mcp/
└── ... (12 personal repos)
```

---
**Purpose:** Major infrastructure events only. Minor changes are in git commit messages.
---
## 2026-02-28
### Docker Container Audit & Migration to Dockge
- **[DOCKER]** Removed 4 orphan images: nextcloud/all-in-one, olprog/unraid-docker-webui, ghcr.io/ich777/doh-server, ghcr.io/idmedia/hass-unraid
- **[DOCKER]** Removed ancient pgAdmin4 v2.1 (status=Created) and fenglc/pgadmin4 image
- **[DOCKER]** Removed spaceinvaderone/ha_inabox image (replaced by Home-Assistant-Container)
- **[TRAEFIK]** Removed Docker provider constraint (`traefik.constraint=valid`) — Docker labels now auto-discovered
- **[TRAEFIK]** Cleaned up dynamic.yml: removed 14 stale/migrated router+service pairs (pangolin, pihole, doh, netbox, and services now using Docker labels)
- **[TRAEFIK]** Added dockge-secure router to dynamic.yml
- **[DOCKER]** Created 6 new Dockge stacks: docker-socket-proxy, tuyagateway, firefly, seekandwatch, ha-time-machine, homeassistant (replaced inabox with Container)
- **[DOCKER]** Migrated ALL 53 containers from dockerman to Dockge compose stacks (100% coverage)
- **[DOCKER]** Fixed Nextcloud Traefik rule: empty Host() → Host(`cloud.xtrm-lab.org`)
- **[DOCKER]** Fixed UptimeKuma Traefik rule: empty Host() → Host(`uptime.xtrm-lab.org`)
- **[DOCKER]** Fixed Homarr domain: `homarr.xtrm-lab.org` → `xtrm-lab.org` (root domain)
- **[DOCKER]** Fixed Netdisco entrypoint: `websecure` → `https`
- **[DOCKER]** Removed stale `traefik.constraint=valid` from Dockhand
- **[DOCKER]** Fixed Transmission middleware: removed non-existent `transmission-headers@file`
- **[DOCKER]** Added Authentik forward auth middleware to: n8n, homarr, transmission, speedtest-tracker, uptime-kuma, firefly, seekandwatch, open-webui, traefik dashboard, dockge, netalertx, urbackup, unimus
- **[DOCKER]** Added Traefik labels to: vaultwarden, open-webui (ai.xtrm-lab.org), firefly, seekandwatch
- **[DOCKER]** Added missing Unraid labels (icon, managed, webui) to: ntfy, timemachine, ollama, docker-socket-proxy, tuyagateway, all new stacks
- **[DOCKER]** Moved ollama + open-webui from bridge to dockerproxy network
- **[DOCKER]** Moved fireflyiii + firefly-data-importer from none to dockerproxy network
- **[DOCKER]** Moved SeekAndWatch from bridge to dockerproxy network
- **[DOCKER]** Removed traefik labels from host-network containers (plex, netalertx) — routed via dynamic.yml only
- **[DOCKER]** Fixed NetAlertX: added read_only, proper capabilities (NET_RAW/NET_ADMIN), and UID 20211
- **[DOCKER]** Removed empty netbox stack directory
## 2026-03-09
### Claude Code Tooling Completion
- **[SERVICE]** Installed Cooperator CLI v3.36.1 on Unraid (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Ran `cooperator install --non-interactive` — symlinked commands, agents, 12 skills to `~/.claude/`
- **[SERVICE]** Created `~/.cooperator/.env` with Shortcut API token, Confluence token, git config
- **[SERVICE]** Installed glab CLI v1.89.0 on Unraid (`/usr/local/bin/glab`) — authenticated as kaloyan.danchev
- **[SERVICE]** Installed uv package manager + Python 3.12.13 on Unraid
- **[SERVICE]** Created Python venvs for mikrotik-mcp and unraid-mcp projects
- **[SERVICE]** Copied MikroTik SSH key from Mac to Unraid — SSH to HAP ax3 verified working
- **[SERVICE]** Synced 6 custom Claude skills to `/mnt/user/appdata/claude-code/custom-skills/` (ev-compliance-story, ev-protocol-expert, frontend-designer, mikrotik-admin, prd-generator, unraid-admin)
- **[SERVICE]** Built shortcut MCP server at `/mnt/user/appdata/claude-code/mcp-server-shortcut/`
- **[SERVICE]** Enabled Claude plugins: ralph-loop, claude-md-management, playground
- **[DOCS]** Updated 12-DEVELOPMENT-ENVIRONMENT.md with Cooperator, glab, Python, skills, MCP sections
#### TODO — MCP Server Registration
The following MCP servers are built/ready but need `claude mcp add` registration (requires interactive Claude session on Unraid):
- shortcut, mikrotik, unraid, playwright, smartbear
## 2026-03-08
### Development Environment Setup
- **[SERVICE]** Installed OpenVSCode Server as host-native process (port 3100, not a container) — accessible at https://code.xtrm-lab.org
- **[SERVICE]** Traefik route added in dynamic.yml with Authentik forward auth
- **[SERVICE]** Boot auto-start via `/boot/config/go` → `/mnt/user/appdata/openvscode/start.sh`
- **[SERVICE]** Claude Code updated to v2.1.71, persistent at `/mnt/user/appdata/claude-code/.npm-global/`
- **[SERVICE]** Cooperator CLI v3.36.1 installed globally (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Created `/mnt/user/projects/` workspace with 12 personal repos (Gitea) + 18 AMPECO work projects (GitLab)
- **[DOCS]** Added `12-DEVELOPMENT-ENVIRONMENT.md` documenting full dev environment setup
### Docker Maintenance
- **[DOCKER]** Created Unraid Docker Manager XML templates for 11 containers missing them (adguardhome, gitea, minecraft, ntfy, ollama, open-webui, etc.)
- **[DOCKER]** Pulled new images for all 30 active Dockge stacks, 14 containers received updates
- **[DOCKER]** Cleaned up dangling images: 10.95 GB reclaimed
- **[DOCKER]** Organized all 42 containers into Docker Folders (12 folders: Infrastructure, Security, Monitoring, DevOps, Media, etc.)
- **[DOCKER]** Pushed 6 local-only projects to Gitea (claude-skills, mikrotik-mcp, unraid-mcp, nanobot-mcp, nanobot-hkuds, openclaw)
### Service Fixes
- **[FIX]** Gitea DB connection: fixed hardcoded PostgreSQL IP (172.18.0.13) → hostname `postgresql17` in compose and app.ini
- **[FIX]** Traefik: removed stale stopped container blocking restart
- **[FIX]** Redis: removed stale stopped container blocking recreate
## 2026-02-26
### WiFi & CAP VLAN Fixes
- **[WIFI]** Fixed 5GHz channel overlap: HAP wifi1 reduced from 80MHz to 40MHz at 5180MHz, CAP cap-wifi1 at 5220MHz (no overlap)
- **[WIFI]** Restored all 29 WiFi access-list MAC→VLAN entries (were missing/lost)
- **[WIFI]** Fixed cap-wifi2 band mismatch: was `band=2ghz-n` with frequency=5220 (5GHz), corrected to frequency=2412
- **[CAPSMAN]** Enabled bridge VLAN filtering on CAP (cAP XL ac) — all VLANs now properly tagged through CAP
- **[CAPSMAN]** CAP bridgeLocal config: vlan-filtering=yes, pvid=10, VLANs 10/20/25/30/35/40 with proper tagged/untagged members
- **[CAPSMAN]** Set `capdp` datapath vlan-id=40 for default PVID on dynamic wifi bridge ports
- **[CAPSMAN]** VLAN assignment through CAP now working — access-list vlan-id entries propagate correctly
- **[NETWORK]** Fixed AdGuard Home IP conflict: container was at 192.168.10.2 (CAP's IP), now static at 192.168.10.10
- **[NETWORK]** Fixed adguardhome-sync IP conflict: was at 192.168.10.3 (CSS326's IP), now static at 192.168.10.11
- **[WIFI]** Added Xiaomi Air Purifier 2 (C8:5C:CC:40:B4:AA) to access-list as VLAN 30 (IoT)
### WiFi Quality Optimization
- **[WIFI]** Fixed 2.4GHz co-channel interference: HAP on ch 1 (2412), CAP moved from ch 1 to ch 6 (2437)
- **[WIFI]** Fixed 5GHz overlap: HAP stays ch 36 (5180, 40MHz), CAP moved from ch 44 (5220) to ch 52 (5260, DFS)
- **[WIFI]** Fixed CAP 2.4GHz width from 40MHz to 20MHz for IoT compatibility
- **[WIFI]** TX power kept at defaults (17/16 dBm) — reduction caused kitchen coverage loss through concrete walls
## 2026-02-24
### Motherboard Replacement & NVMe Cache Pool
- **[HARDWARE]** Replaced XTRM-U motherboard — new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
- **[HARDWARE]** Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead — clicking heads, fails on multiple SATA ports and new motherboard
- **[STORAGE]** Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
- **[STORAGE]** Pool settings: autotrim=on, compression=on
- **[DOCKER]** Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
- **[DOCKER]** Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
- **[DOCKER]** Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
- **[DOCKER]** Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
- **[STORAGE]** Deleted old `docker.img` (200GB) from disk1
- **[INCIDENT]** disk1 still running in parity-emulated mode — replacement drive needed
### Post-Migration Container Cleanup
- **[NETWORK]** Fixed Traefik unreachable: removed stale Docker bridge (duplicate 172.18.0.0/16 subnet) + 7 orphaned bridges
- **[DOCKER]** Removed deprecated containers: DoH-Server, binhex-plexpass (duplicate of Plex)
- **[DOCKER]** Removed obsolete containers: HomeAssistant_inabox, Docker-WebUI, hass-unraid
- **[DOCKER]** Removed nextcloud-aio-mastercontainer (replaced by Nextcloud container)
- **[SERVICE]** Fixed adguardhome-sync: recreated config file (was directory from migration), switched to br0 network for macvlan reachability
- **[SERVICE]** Fixed diode stack: recreated .env, nginx.conf, OAuth2 client config; ran Hydra DB migration and client bootstrap
- **[SERVICE]** Fixed diode-agent: corrected YAML format, secrets, and Hydra authentication
- **[SERVICE]** Started unmarr (Homarr fork, 172.18.0.81) and rustfs (S3-compatible storage)
- **[DOCKER]** Final state: 53 containers running, pgAdmin4 stopped (utility)
- **[DOCS]** Updated 03-SERVICES-OTHER.md with removed containers
---
## 2026-02-14
### CAP XL ac Recovery
- **[WIRELESS]** Factory reset CAP XL ac (lost credentials)
- **[WIRELESS]** Reconfigured CAPsMAN: regenerated certificate, CAP re-enrolled with `certificate=request`
- **[WIRELESS]** Both CAP radios now active: wifi1 (2.4GHz XTRM2) + wifi2 (5GHz XTRM)
- **[WIRELESS]** CAP now running RouterOS 7.21.1
- **[WIRELESS]** Enabled SSH on CAP port 2222 for user xtrm with mikrotik key
- **[WIRELESS]** Confirmed WiFi access list has no VLAN assignment (rolled back Jan 27)
### Roms Network Share
- **[SERVICE]** Shared /mnt/user/roms (2.3TB, 49 systems) via SMB from Unraid
- **[SERVICE]** Mounted on Nobara at /mnt/roms (fstab, CIFS guest, systemd.automount)
- **[SERVICE]** Mounted on Recalbox via custom.sh boot script (CIFS bind mounts)
- **[SERVICE]** Deleted local roms from Recalbox SD card (~12.5GB freed)
### WiFi DHCP Fix
- **[NETWORK]** Fixed DHCP not working on HAP1 local WiFi (wifi1/wifi2)
- **[NETWORK]** Root cause: VLAN 40 had wifi1/wifi2 as **tagged** instead of **untagged** — DHCP responses had 802.1Q tags clients couldn't process
- **[NETWORK]** Fix: `/interface bridge vlan set` wifi1,wifi2 to untagged for VLAN 40
### Minecraft Server Deployed
- **[SERVICE]** Deployed Minecraft Java Edition (itzg/minecraft-server) on Unraid
- **[SERVICE]** Version 1.21.11, Survival mode, 2GB RAM, max 10 players
- **[SERVICE]** Docker IP 172.18.0.80, port 25565, Dockge stack `minecraft`
- **[NETWORK]** NAT port forward WAN:25565 → 192.168.10.20:25565
- **[NETWORK]** Hairpin NAT for internal access via minecraft.xtrm-lab.org
- **[SERVICE]** Added Unraid labels with Minecraft icon
### Documentation Updates
- **[DOCS]** Updated 07-WIFI-CAPSMAN-CONFIG.md: CAP both radios working, access list status
- **[DOCS]** Updated 01-NETWORK-MAP.md: Fixed CAP IP (.6→.2), added Nobara and SMB shares
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md: CAP details, added Recalbox device
- **[DOCS]** Updated 06-VLAN-DEVICE-ASSIGNMENT.md: Added Nobara (VLAN 10) and Recalbox (VLAN 25)
- **[DOCS]** Updated 03-SERVICES-OTHER.md: Added Roms SMB share, Minecraft server section
---
## 2026-02-13
### Failover Infrastructure Deployed
- **[SERVICE]** Deployed Docker failover stack on XTRM-Nobara (Traefik, Vaultwarden, Authentik, AdGuard Home)
- **[SERVICE]** Installed Docker CE 29.2.1 + Docker Compose 5.0.2 on Nobara
- **[SERVICE]** Deployed Keepalived VRRP for automatic failover (VIP: 192.168.10.250)
- **[SERVICE]** Unraid: Keepalived as Docker container (local/keepalived, MASTER priority 150)
- **[SERVICE]** Nobara: Keepalived as systemd service (BACKUP priority 100)
- **[SERVICE]** Replicated data: Vaultwarden DB, Authentik PostgreSQL dump (864MB), AdGuard config, Traefik certs
- **[NETWORK]** Added VRRP protocol to Nobara firewall (firewalld)
- **[NETWORK]** Configured SSH key auth to Nobara (id_ed25519_nobara, passwordless)
- **[NETWORK]** Added SSH config alias: `ssh nobara`
- **[DOCS]** Created 10-FAILOVER-NOBARA.md with full failover documentation
- **[DOCS]** Updated 02-SERVICES-CRITICAL.md with failover section
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md with XTRM-Nobara specs
- **[DOCS]** Updated README.md and CLAUDE.md with Nobara references
---
## 2026-02-06

---
# Incident: Disk1 Hardware Failure (Clicking / SATA Link Failure)
**Date:** 2026-02-20
**Severity:** P2 - Degraded (no redundancy)
**Status:** Open — awaiting replacement drive (motherboard replaced, NVMe cache pool added Feb 24)
**Affected:** XTRM-U (Unraid NAS) — disk1 (data drive)
---
## Summary
disk1 (10TB HGST Ultrastar HUH721010ALE601, serial `2TKK3K1D`) has physically failed. The drive dropped off the SATA bus on Feb 18 at 19:15 and is now exhibiting clicking (head failure). The Unraid md array is running in **degraded/emulated mode**, reconstructing disk1 data from parity on the fly. All data is intact but there is **zero redundancy**.
---
## Timeline
| When | What |
|------|------|
| Feb 18 ~19:15 | `ata5: qc timeout` → multiple hard/soft resets → `reset failed, giving up` → `ata5.00: disable device` |
| Feb 18 19:17 | `super.dat` updated — md array marked disk1 as `DISK_DSBL` (213 errors) |
| Feb 20 13:14 | Investigation started. `sdc` completely absent from `/dev/`. ZFS pool `disk1` running on emulated `md1p1` with 0 errors |
| Feb 20 ~13:30 | Server rebooted, disk moved to new SATA port (ata5 → ata6). Same failure: `ata6: reset failed, giving up`. Clicking noise confirmed |
| Feb 24 | Motherboard replaced. Dead drive confirmed still dead on new hardware. New SATA port assignment. Drive is mechanically failed (clicking heads) |
| Feb 24 | New cache pool created: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1. Docker migrated from HDD loopback to NVMe ZFS |
## Drive Details
| Field | Value |
|-------|-------|
| Model | HUH721010ALE601 (HGST/WD Ultrastar He10) |
| Serial | 2TKK3K1D |
| Capacity | 10TB (9766436812 × 1K blocks) |
| Array slot | disk1 (slot 1) |
| Filesystem | ZFS (on md1p1) |
| Last known device | sdc |
| Accumulated md errors | 213 |
## Current State
- **Array**: STARTED, degraded — disk1 emulated from parity (`sdb`)
- **ZFS pool `disk1`**: ONLINE, 0 errors, mounted on `md1p1` (parity reconstruction)
- **Parity drive** (`sdb`, serial `7PHBNYZC`): DISK_OK, 0 errors
- **All services**: Running normally (Docker containers, VMs)
- **Risk**: If parity drive fails, data is **unrecoverable**
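This state can be confirmed from the Unraid shell. A minimal sketch, assuming Unraid's `mdcmd status` output format (`rdevStatus.N=DISK_DSBL` for a disabled slot); the parse helper is hypothetical and is demonstrated against a captured sample so it can be checked offline:

```shell
#!/bin/bash
# Hypothetical helper: returns success if a `mdcmd status` dump (on stdin)
# shows any array slot marked DISK_DSBL (i.e. the array is degraded).
has_disabled_disk() {
  grep -q 'rdevStatus\.[0-9]*=DISK_DSBL'
}

# Live usage (Unraid only; also check the ZFS pool on the emulated device):
#   mdcmd status | has_disabled_disk && echo "array degraded"
#   zpool status -x disk1    # prints "pool 'disk1' is healthy" when clean

# Worked example against a sample matching this incident:
sample='rdevStatus.0=DISK_OK
rdevStatus.1=DISK_DSBL
rdevNumErrors.1=213'
printf '%s\n' "$sample" | has_disabled_disk && echo degraded   # prints: degraded
```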
## Diagnosis
- Drive fails on multiple SATA ports → not a port/cable issue
- Clicking noise on boot → mechanical head failure
- dmesg shows link responds but device never becomes ready → drive electronics partially functional, platters/heads dead
- Drive is beyond DIY repair
## Root Cause
Mechanical failure of the hard drive (clicking = head crash or seized actuator). Not related to cache drive migration that happened around the same time — confirmed by syslog showing clean SATA link failure.
---
## Recovery Plan
### Step 1: Get Replacement Drive
- Must be 10TB or larger
- Check WD warranty for serial `2TKK3K1D` (model `HUH721010ALE601`) at https://support-en.wd.com/app/warrantycheck
- Any 3.5" SATA drive works (doesn't need to match the model, but must be at least 10TB and no larger than the parity drive)
### Step 2: Install & Rebuild
1. Power off the server
2. Remove dead drive, install replacement in any SATA port
3. Boot Unraid
4. Go to **Main** → click on **Disk 1** (will show as "Not installed" or unmapped)
5. Stop the array
6. Assign the new drive to the **Disk 1** slot
7. Start the array — Unraid will prompt to **rebuild** from parity
8. Rebuild will take many hours for 10TB — do NOT interrupt
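As a rough sanity check on "many hours": at a sustained rebuild speed of ~150 MB/s (an assumed figure; real speeds vary with drive, controller, and concurrent load), one full 10TB pass works out to about 18 hours:

```shell
# Rough rebuild-time estimate (assumed ~150 MB/s sustained; actual varies)
SIZE_BYTES=$((10 * 1000 ** 4))   # 10 TB, decimal
SPEED_BPS=$((150 * 1000 ** 2))   # 150 MB/s
SECONDS_TOTAL=$((SIZE_BYTES / SPEED_BPS))
echo "$((SECONDS_TOTAL / 3600)) hours"   # prints: 18 hours
```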
### Step 3: Post-Rebuild
1. Verify ZFS pool `disk1` is healthy: `zpool status disk1`
2. Run parity check from Unraid UI
3. Run SMART extended test on new drive: `smartctl -t long /dev/sdX`
4. Verify all ZFS datasets are intact
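The post-rebuild checks can be swept in one pass. A sketch, assuming this server's names (`disk1`, `/dev/sdX`); the hypothetical error-count parser is shown against a sample `zpool status` config line so the logic is verifiable offline:

```shell
#!/bin/bash
# Hypothetical helper: sum the READ/WRITE/CKSUM columns of one `zpool status`
# config line, e.g. "disk1  ONLINE  0  0  0" -> 0 (healthy)
zpool_errors() {
  echo "$1" | awk '{print $3 + $4 + $5}'
}

# Live usage after the rebuild (Unraid shell):
#   zpool status -x disk1       # prints "pool 'disk1' is healthy" when clean
#   smartctl -t long /dev/sdX   # later, inspect results: smartctl -a /dev/sdX

# Worked examples:
zpool_errors "disk1  ONLINE  0  0  0"     # prints: 0
zpool_errors "disk1  DEGRADED  1  0  2"   # prints: 3
```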
---
## Notes
- Server is safe to run in degraded mode indefinitely, but a second drive failure (parity) means data loss
- Avoid heavy writes if possible to reduce risk to parity drive
- New cache pool (3x Samsung 990 EVO Plus 1TB, ZFS RAIDZ1) now hosts all Docker containers
- Old docker.img loopback deleted from disk1 (200GB freed)
- Since disk1 uses ZFS on md, the rebuild reconstructs the raw block device — ZFS doesn't need any separate repair