Compare commits
14 Commits
ec9659d0cb...main

| SHA1 |
|---|
| 6320c0f8d9 |
| 8aef54992a |
| 7867b5c950 |
| cdb961f943 |
| 877aa71d3e |
| bf6a62a275 |
| 0119c4d4d8 |
| 2a522d56d2 |
| 4e726a4963 |
| ecbce1ca94 |
| d2f49e9130 |
| 4305657ad0 |
| 5af3c9478b |
| c93f7da733 |

12
CLAUDE.md
@@ -7,6 +7,17 @@ When user says "connect unraid", use this command:
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422
```

## Connect to Nobara (Failover Node)

```bash
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103
# sudo password: (same as SSH login)
```

Failover stack: `/home/failover/docker-compose.yml`
Keepalived: `systemctl status keepalived`

## Connect to MikroTik HAP ax³

SSH port is **2222** (not 22):
@@ -56,6 +67,7 @@ infrastructure/
├── 07-WIFI-CAPSMAN-CONFIG.md # WiFi and CAPsMAN settings
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara
├── CHANGELOG.md # Change history
├── archive/ # Completed/legacy docs
│ └── vlan-migration/ # VLAN migration project artifacts

@@ -15,6 +15,7 @@
| **CI/CD** | https://ci.xtrm-lab.org |
| **DNS Primary** | dns.xtrm-lab.org |
| **DNS Secondary** | dns2.xtrm-lab.org |
| **Failover VIP** | 192.168.10.250 |

---

@@ -31,6 +32,7 @@ docs/
├── 07-WIFI-CAPSMAN-CONFIG.md # WiFi and CAPsMAN settings
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara workstation
├── CHANGELOG.md # Change history
├── archive/ # Completed/legacy docs
│ └── vlan-migration/ # VLAN migration project artifacts
@@ -46,6 +48,7 @@ docs/
|--------|-----|------|
| HAP1 | 192.168.10.1 | Router, DNS, WiFi Controller |
| XTRM-U | 192.168.10.20 | Production Server (Unraid) |
| XTRM-Nobara | 192.168.10.103 | Failover Node (Nobara Linux) |
| CSS1 | 192.168.10.3 | Distribution Switch |
| ZX1 | 192.168.10.4 | Core Switch (2.5G) |
| CAP | 192.168.10.6 | Wireless Access Point |
@@ -60,6 +63,9 @@ ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422

# MikroTik Router
ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.1

# Nobara (failover node)
ssh nobara
```

---
@@ -69,7 +75,8 @@ ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.1
1. **DNS down?** → Automatic failover to 192.168.10.10 (secondary), see `08-DNS-ARCHITECTURE.md`
2. **Internet down?** → Check HAP1 at 192.168.10.1
3. **Services down?** → Check Unraid at 192.168.10.20
4. **Full outage?** → See `02-SERVICES-CRITICAL.md` startup order
4. **Unraid maintenance?** → VRRP failover to Nobara (192.168.10.250 VIP), see `10-FAILOVER-NOBARA.md`
5. **Full outage?** → See `02-SERVICES-CRITICAL.md` startup order

---

@@ -1,6 +1,6 @@
# Network Map - xtrm-lab.org

**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-14
**Domain:** xtrm-lab.org
**WAN IP:** 62.73.120.142

@@ -39,7 +39,7 @@ flowchart TB
end

subgraph Wireless["WiFi"]
CAP["CAP | cAP XL ac<br/>192.168.10.6"]
CAP["CAP | cAP XL ac<br/>192.168.10.2"]
end

ISP -->|"ether1 WAN"| HAP1
@@ -116,9 +116,10 @@ flowchart TB
| 192.168.10.1 | HAP1 \| hAP ax³ | Router |
| 192.168.10.3 | CSS1 \| CSS326-24G-2S+ | Switch |
| 192.168.10.4 | ZX1 \| ZX-SWTGW218AS | Switch |
| 192.168.10.6 | CAP \| cAP XL ac | Access Point |
| 192.168.10.2 | CAP \| cAP XL ac | Access Point |
| 192.168.10.10 | AdGuard Home (Unraid macvlan) | DNS Secondary |
| 192.168.10.20 | XTRM-U | Server |
| 192.168.10.103 | XTRM-Nobara | Failover Node |
| 192.168.10.200 | NanoKVM | Remote KVM |

For complete device-to-VLAN mapping, see `06-VLAN-DEVICE-ASSIGNMENT.md`.
@@ -301,10 +302,9 @@ flowchart TB
| SSID | Band | Security | Purpose |
|------|------|----------|---------|
| XTRM | 5GHz | WPA2/WPA3 | Primary devices |
| XTRM | 2.4GHz | WPA/WPA2 | Legacy support |
| XTRM2 | 2.4GHz | WPA/WPA2 | IoT devices |

**CAPsMAN:** HAP1 manages CAP access point
**CAPsMAN:** HAP1 manages CAP XL ac (192.168.10.2) - both 2.4GHz and 5GHz radios active

---

@@ -356,6 +356,14 @@ flowchart TB

---

## SMB Shares

| Share | Path | Size | Access | Consumers |
|-------|------|------|--------|-----------|
| roms | /mnt/user/roms | 2.3 TB | Guest (read-only) | Nobara (/mnt/roms), Recalbox (network mount) |

---

## Shared Databases

### PostgreSQL 17 (172.18.0.13)

@@ -204,6 +204,25 @@ When recovering from full outage:

---

## Active Failover: XTRM-Nobara

Critical services are replicated on the Nobara workstation with automatic VRRP failover:

| Service | Primary (XTRM-U) | Failover (XTRM-Nobara) |
|---------|-------------------|------------------------|
| Traefik | 192.168.10.20 | 192.168.10.103 |
| Vaultwarden | 192.168.10.20 | 192.168.10.103 |
| Authentik | 192.168.10.20 | 192.168.10.103 |
| AdGuard Home | 192.168.10.20 | 192.168.10.103 |

**VIP:** 192.168.10.250 (floats between XTRM-U and XTRM-Nobara via Keepalived VRRP)

**Failover time:** ~4 seconds

See: `10-FAILOVER-NOBARA.md` for full documentation.

---

## Future: XTRM-N1 Survival Node

When the hardware upgrade completes, these services will have replicas on XTRM-N1:

@@ -1,6 +1,6 @@
# Other Services

**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-24

Non-critical services that enhance functionality but don't affect core network operation.

@@ -104,6 +104,26 @@ Non-critical services that enhance functionality but don't affect core network o

---

## Gaming

### Minecraft Server

| Component | IP | Port | Address |
|-----------|-----|------|---------|
| minecraft | 172.18.0.80 | 25565 | minecraft.xtrm-lab.org |

**Image:** itzg/minecraft-server (Java Edition)
**Version:** Latest (1.21.11)
**Mode:** Survival, Normal difficulty, PVP enabled
**Max Players:** 10
**RAM:** 2 GB
**Online Mode:** Yes (requires paid account)
**Data:** `/mnt/user/appdata/minecraft/data`
**NAT:** WAN:25565 → 192.168.10.20:25565 + hairpin NAT
**Dockge Stack:** `minecraft`

---

## Media

### Plex
@@ -130,6 +150,23 @@ Non-critical services that enhance functionality but don't affect core network o

**Purpose:** Torrent client

### Roms (SMB Share)

| Property | Value |
|----------|-------|
| Share Path | /mnt/user/roms |
| Protocol | SMB (guest access, read-only) |
| Size | 2.3 TB (49 systems) |

**Consumers:**

| Device | Mount Point | Method |
|--------|-------------|--------|
| Nobara | /mnt/roms | fstab (CIFS, guest, systemd.automount) |
| Recalbox | /recalbox/share/roms_network | custom.sh boot script (CIFS) |

**Recalbox:** Network roms are bind-mounted over local rom directories at boot via `/recalbox/share/system/custom.sh`. Local roms were deleted from the SD card to save space.

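The boot script itself is not reproduced here. Below is a minimal sketch of what it could look like. It assumes that Recalbox runs `custom.sh start` at boot, that `mount -t cifs` with `guest,ro` works against the Unraid share, and that each system folder on the share (for example `snes`) should replace the matching folder under `/recalbox/share/roms`. The function names and the `RUN` dry-run switch are illustrative.

```bash
#!/bin/bash
# Sketch of /recalbox/share/system/custom.sh (hypothetical; not the deployed script).
# Mounts the Unraid roms share, then bind-mounts each system folder over the local one.

SHARE="${SHARE:-//192.168.10.20/roms}"
NET_MNT="${NET_MNT:-/recalbox/share/roms_network}"
LOCAL_ROMS="${LOCAL_ROMS:-/recalbox/share/roms}"
RUN="${RUN:-}"   # RUN=echo prints the commands instead of running them

mount_share() {
  $RUN mkdir -p "$NET_MNT"
  # Guest, read-only CIFS mount (mount options are assumptions)
  $RUN mount -t cifs "$SHARE" "$NET_MNT" -o guest,ro
}

bind_systems() {
  local dir sys
  for dir in "$NET_MNT"/*/; do
    [ -d "$dir" ] || continue        # glob matched nothing
    sys=$(basename "$dir")           # e.g. snes, megadrive
    $RUN mkdir -p "$LOCAL_ROMS/$sys"
    $RUN mount --bind "$NET_MNT/$sys" "$LOCAL_ROMS/$sys"
  done
}

# Recalbox passes "start" at boot (assumption); other arguments are ignored.
case "${1:-}" in
  start) mount_share && bind_systems ;;
esac
```

The script binds each system folder separately instead of mounting the share over `/recalbox/share/roms` as a whole. This keeps any local-only system folders visible.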
---

## Productivity
@@ -263,3 +300,8 @@ Non-critical services that enhance functionality but don't affect core network o
| Pi-hole | Replaced by AdGuard Home | Removed |
| Pangolin | Not in use | Removed |
| Slurp'it | Replaced by Diode | Removed |
| binhex-plexpass | Duplicate of Plex | Removed |
| HomeAssistant_inabox | Duplicate of Home-Assistant-Container | Removed |
| Docker-WebUI | Unused, non-functional | Removed |
| hass-unraid | No config, unused | Removed |
| nextcloud-aio-mastercontainer | Replaced by Nextcloud container | Removed |

@@ -1,6 +1,6 @@
# Hardware Inventory

**Last Updated:** 2026-01-31
**Last Updated:** 2026-02-24

---

@@ -75,12 +75,15 @@
|----------|-------|
| **Role** | Wireless Access Point |
| **Location** | Corridor (ceiling) |
| **IP** | 192.168.10.6 |
| **IP** | 192.168.10.2 |
| **MAC** | 18:FD:74:54:3D:BC |
| **OS** | RouterOS 7.x |
| **OS** | RouterOS 7.21.1 |
| **Serial** | HCT085KBH8B |
| **SSH** | `ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2` |

**Managed by:** HAP1 CAPsMAN
**Radios:** wifi1 (2.4GHz XTRM2), wifi2 (5GHz XTRM) - both active
**Factory reset:** 2026-02-13 (CAPsMAN certificate regenerated)

---

@@ -106,18 +109,27 @@
| **IP** | 192.168.10.20 |
| **OS** | Unraid 6.x |

**Motherboard:** Replaced 2026-02-24 (new board, details TBD)

**Network:**
| Interface | MAC | Speed |
|-----------|-----|-------|
| eth1 | A8:B8:E0:02:B6:15 | 2.5G |
| eth2 | A8:B8:E0:02:B6:16 | 2.5G |
| eth3 | A8:B8:E0:02:B6:17 | 2.5G |
| eth4 | A8:B8:E0:02:B6:18 | 2.5G |
| **bond0** | (virtual) | 5G aggregate |
| br0 | 38:05:25:35:8E:7A | 2.5G |

**Storage:**
- Cache: (current NVMe)
- Array: 3.5" HDDs
| Device | Model | Size | Role | Status |
|--------|-------|------|------|--------|
| sdb | HUH721010ALE601 (serial 7PHBNYZC) | 10TB | Parity | OK |
| disk1 | HUH721010ALE601 (serial 2TKK3K1D) | 10TB | Data (ZFS) | **FAILED** — clicking/head crash, emulated from parity |
| nvme0n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
| nvme1n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
| nvme2n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |

**ZFS Pools:**
| Pool | Devices | Profile | Usable | Purpose |
|------|---------|---------|--------|---------|
| disk1 | md1p1 (parity-emulated) | single | 9.1TB | Main data (roms, media, appdata, backups) |
| cache | 3x Samsung 990 EVO Plus 1TB NVMe | RAIDZ1 | ~1.8TB | Docker, containers |

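The usable sizes in the table come from simple arithmetic. RAIDZ1 spends one drive's worth of space on parity, so three 1 TB drives leave 2 TB (decimal). ZFS reports that in binary units as roughly 1.82 TiB, before metadata and slop-space overhead. The small helper below checks such figures. The function name is illustrative, and it deliberately ignores ZFS overhead.

```bash
#!/usr/bin/env bash
# Approximate usable capacity of a RAIDZ vdev, in TiB.
# Ignores ZFS overhead (metadata, slop space, padding), so real pools report slightly less.
# $1 = number of drives, $2 = drive size in decimal TB, $3 = parity drives (default 1 = RAIDZ1)
raidz_usable_tib() {
  local drives=$1 size_tb=$2 parity=${3:-1}
  awk -v n="$drives" -v s="$size_tb" -v p="$parity" \
    'BEGIN { printf "%.2f\n", (n - p) * s * 1e12 / 2^40 }'
}

raidz_usable_tib 3 1      # cache pool, 3x 1TB RAIDZ1 → 1.82
raidz_usable_tib 1 10 0   # disk1, single 10TB drive  → 9.09
```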
**Virtual IPs:**
| IP | Purpose |
@@ -160,16 +172,56 @@

---

## Workstations

### XTRM-Nobara | Nobara Linux Workstation

| Property | Value |
|----------|-------|
| **Role** | Workstation + Failover Node |
| **Location** | Main Bedroom |
| **IP** | 192.168.10.103 |
| **MAC** | 08:92:04:C6:07:C5 |
| **OS** | Nobara Linux (Fedora 43 based) |
| **CPU** | AMD Ryzen 9 6900HX (8C/16T) |
| **RAM** | 16 GB |
| **Storage** | 477GB NVMe (OS) + 1.8TB NVMe (btrfs pool with OS drive) |
| **Network** | enp5s0 (2.5G Ethernet) |
| **Switch Port** | CSS1-20 via PP1 M2 |
| **SSH** | `ssh nobara` (key: ~/.ssh/id_ed25519_nobara) |

**Failover Services:** Traefik, Vaultwarden, Authentik, AdGuard Home
**Keepalived:** systemd service, BACKUP priority 100, VIP 192.168.10.250

---

## End Devices (Wired)

| Device | Room | Outlet | Switch Port | MAC |
|--------|------|--------|-------------|-----|
| LGTV | Living Room | L3 | CSS1-24 | - |
| XTRM-Nobara | Main Bedroom | M2 | CSS1-20 | 08:92:04:C6:07:C5 |
| Dell Display | Main Bedroom | M3 | CSS1-21 | - |
| Dancho | Boys Room | B1 | CSS1-18 | - |
| KVM Switch | - | Direct | CSS1-2 | - |

## End Devices (WiFi)

### Recalbox | Raspberry Pi 3

| Property | Value |
|----------|-------|
| **Role** | Retro Gaming Console |
| **Location** | Living Room |
| **IP** | 192.168.25.30 |
| **MAC** | B8:27:EB:32:B2:13 |
| **OS** | Recalbox |
| **VLAN** | 25 (Kids) |
| **SSID** | XTRM2 (2.4GHz) |
| **SSH** | `ssh root@192.168.25.30` (password: `recalboxroot`) |

**Roms:** Network-mounted from Unraid SMB share (//192.168.10.20/roms)
**Boot script:** `/recalbox/share/system/custom.sh` (mounts roms at boot)

---

## Future Hardware (Planned)
@@ -180,6 +232,7 @@ See: `wip/UPGRADE-2026-HARDWARE.md`
|--------|------|--------|
| XTRM-N5 (Minisforum N5 Air) | Production server | Planned |
| XTRM-N1 (N100 ITX) | Survival node | Planned |
| 3x Samsung 990 EVO Plus 1TB | XTRM-N5 NVMe pool | Planned |
| 3x Samsung 990 EVO Plus 1TB | XTRM-U cache pool (RAIDZ1) | **Installed** 2026-02-24 |
| 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned |
| 1x 10TB+ HDD | Replace failed disk1 | **Needed** |
| MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future |

@@ -1,6 +1,6 @@
# VLAN Device Assignment Map

**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-14
**Purpose:** Complete inventory of all network devices with VLAN assignments

---
@@ -29,6 +29,7 @@
| 192.168.10.3 | F4:1E:57:C9:BD:09 | CSS326-24G-2S+ | 24-port switch | Room distribution |
| 192.168.10.4 | 1C:2A:A3:1E:78:67 | ZX1 (ZX-SWTGW218AS) | 8-port 2.5G switch | Server rack |
| 192.168.10.20 | A8:B8:E0:02:B6:15 | XTRM-U (Unraid) | Main server | Docker host, NAS |
| 192.168.10.103 | 08:92:04:C6:07:C5 | XTRM-Nobara | Failover node | Keepalived BACKUP |
| 192.168.10.200 | 48:DA:35:6F:BE:50 | NanoKVM | Remote KVM | IPMI alternative |
| 172.17.0.2 | 46:D0:27:F7:1F:CA | AdGuard (MikroTik) | DNS (Router) | Primary DNS, DoH/DoT |
| 172.17.0.3 | 0C:AB:39:8D:8C:FC | Tailscale (MikroTik) | VPN container | Remote access |
@@ -59,6 +60,7 @@
| 192.168.25.14 | 90:91:64:70:0D:86 | Notebook | Kimi | |
| 192.168.25.15 | 2A:2B:BA:86:D4:AF | iPhone | Kimi | |
| 192.168.25.18 | A4:D1:D2:7B:52:BE | iPad | Compusbg | Work tablet |
| 192.168.25.30 | B8:27:EB:32:B2:13 | Recalbox (RPi3) | Gaming | Retro gaming, WiFi XTRM2 |

---

@@ -67,9 +69,10 @@
| IP | MAC Address | Device | Location | Comment |
|----|-------------|--------|----------|---------|
| 192.168.30.10 | 50:2C:C6:7A:55:39 | Air Conditioner | Living Room | GREE Electric |
| 192.168.30.11 | B0:37:95:79:AF:9B | LG TV | Living Room | LAN (not connected) |
| 192.168.30.12 | DC:03:98:6B:5A:3A | LG TV | Living Room | WiFi (active) |
| 192.168.30.13 | D0:E7:82:F7:65:DD | Chromecast | Living Room | Streaming |
| 192.168.30.30 | 64:4E:D7:D8:43:3E | HP LaserJet M110w | Office | WiFi printer |
| 192.168.30.40 | B0:37:95:79:AF:9B | LG TV (Ethernet) | Living Room | Use ONE interface only for AirPlay |
| 192.168.30.41 | DC:03:98:6B:5A:3A | LG TV (WiFi) | Living Room | Use ONE interface only for AirPlay |
| 192.168.30.42 | D0:E7:82:F7:65:DD | Chromecast | Living Room | Requires WPA2+AES (no TKIP) |
| 192.168.30.14 | B0:4A:39:3F:9A:14 | Roborock S7 Vacuum | Living Room | Needs cloud access |
| 192.168.30.20 | 94:27:70:1E:0C:EE | Bosch Smart Oven | Kitchen | Home Connect app |
| 192.168.30.21 | C8:D7:78:40:65:40 | Bosch Dishwasher | Kitchen | Home Connect app |
@@ -95,7 +98,7 @@

| IP | MAC Address | Device | Purpose | Comment |
|----|-------------|--------|---------|---------|
| 192.168.40.19 | 64:4E:D7:D8:43:3E | HP LaserJet | Network printer | Wired connection |
| — | — | — | — | Printer moved to VLAN 30 |

---

@@ -123,6 +126,7 @@ A8:B8:E0:02:B6:15 XTRM-U
F4:1E:57:C9:BD:09 CSS326
1C:2A:A3:1E:78:67 ZX1
48:DA:35:6F:BE:50 NanoKVM
08:92:04:C6:07:C5 XTRM-Nobara (Failover)
```

**VLAN 20 - Trusted:**
@@ -140,7 +144,8 @@ A4:D1:D2:7B:52:BE Compusbg iPad

**VLAN 30 - IoT:**
```
B0:37:95:79:AF:9B LG TV (LAN)
64:4E:D7:D8:43:3E HP LaserJet M110w
B0:37:95:79:AF:9B LG TV (Ethernet)
DC:03:98:6B:5A:3A LG TV (WiFi)
D0:E7:82:F7:65:DD Chromecast
B0:4A:39:3F:9A:14 Roborock Vacuum
@@ -163,7 +168,7 @@ FC:D5:D9:EB:6A:82 Settop Box (LAN)

**VLAN 40 - Servers:**
```
64:4E:D7:D8:43:3E HP LaserJet
(empty - printer moved to VLAN 30)
```

**VLAN 50 - Guest:**
@@ -180,14 +185,14 @@ D0:C9:07:8C:C9:46 Private Vendor 2

| VLAN | Device Count | Comment |
|------|--------------|---------|
| 10 - Mgmt | 9 | Infrastructure only |
| 10 - Mgmt | 10 | Infrastructure + failover |
| 20 - Trusted | 9 | Family devices |
| 25 - Kids | 4 | Kids devices (subset of 20) |
| 25 - Kids | 5 | Kids devices + Recalbox |
| 30 - IoT | 14 | Smart home devices |
| 35 - Cameras | 1 | Security |
| 40 - Servers | 1 | Services |
| 50 - Guest | 4 | Unknown/unidentified devices |
| **Total** | **38** | All devices categorized |
| **Total** | **40** | All devices categorized |

---

@@ -1,6 +1,6 @@
# WiFi and CAPsMAN Configuration

**Last Updated:** 2026-02-02
**Last Updated:** 2026-02-26
**Purpose:** Document WiFi network settings, CAPsMAN configuration, and device compatibility requirements

---
@@ -23,8 +23,8 @@
| SSID | XTRM |
| Band | 5GHz |
| Mode | 802.11ax (WiFi 6) |
| Channel | Auto (DFS enabled) |
| Width | 80MHz |
| Channel | 5180 MHz (ch 36) |
| Width | 40MHz |
| Security | WPA2-PSK + WPA3-PSK |
| Cipher | CCMP (AES) |
| 802.11r (FT) | Enabled |
@@ -98,47 +98,75 @@ If devices still can't connect, use WPA-only with TKIP-only:
| Interfaces | bridge, vlan10-mgmt |
| Certificate | Auto-generated |

### CAP Device (CAP XL ac - 192.168.10.2)
### CAP Device (cAP XL ac - 192.168.10.2)

| Setting | Value |
|---------|-------|
| caps-man-addresses | 192.168.10.1 |
| discovery-interfaces | bridgeLocal |
| slaves-datapath | capdp (bridge=bridgeLocal, vlan-id=40) |
| certificate | request |
| RouterOS | 7.21.1 |
| SSH Port | 2222 |
| SSH (via proxy) | See ProxyCommand below |

**SSH Access:** Direct SSH to the CAP is unreliable. Tunnel through Unraid with `ProxyCommand`:
```bash
ssh -o ProxyCommand="ssh -i ~/.ssh/id_ed25519_unraid -p 422 -W %h:%p root@192.168.10.20" -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2
```

### CAP Bridge VLAN Filtering

The CAP runs bridge VLAN filtering to tag/untag WiFi client traffic before sending it to the HAP over the trunk link (ether1):

| Setting | Value |
|---------|-------|
| bridgeLocal | vlan-filtering=yes, pvid=10 |
| ether1 (trunk) | bridge port, PVID=10 |
| wifi1, wifi2 | dynamic bridge ports, PVID=40 (set by datapath vlan-id) |

**Bridge VLAN Table:**

| VLAN | ether1 | wifi1 | wifi2 | bridgeLocal | Purpose |
|------|--------|-------|-------|-------------|---------|
| 10 | untagged | - | - | untagged | Management |
| 20 | tagged | tagged | tagged | - | Trusted |
| 25 | tagged | tagged | tagged | - | Kids |
| 30 | tagged | tagged | tagged | - | IoT |
| 35 | tagged | tagged | tagged | - | Cameras |
| 40 | tagged | untagged | untagged | - | CatchAll (default) |

### CAP Interfaces

| Interface | Radio | Band | SSID | Status |
|-----------|-------|------|------|--------|
| cap-wifi1 | wifi1 | 2.4GHz | XTRM2 | Working |
| cap-wifi2 | wifi2 | 5GHz | XTRM | Channel issues (disabled) |
| Interface | Radio | Band | SSID | Security | Status |
|-----------|-------|------|------|----------|--------|
| cap-wifi1 | wifi2 | 5GHz | XTRM | WPA2/WPA3-PSK, CCMP | Working (Ch 52/5260, 40MHz, DFS) |
| cap-wifi2 | wifi1 | 2.4GHz | XTRM2 | WPA2-PSK, CCMP | Working (Ch 6/2437, 20MHz) |

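The channel/frequency pairs above (ch 36 = 5180 MHz, ch 52 = 5260 MHz, ch 6 = 2437 MHz) follow the standard 802.11 numbering. 2.4 GHz channels 1-13 sit at 2407 + 5 × ch MHz, and channel 14 is a special case at 2484 MHz. 5 GHz channels sit at 5000 + 5 × ch MHz. RouterOS reports frequencies in MHz, so a small converter helps when reading its output. The function name is illustrative, and 6 GHz is deliberately left out.

```bash
#!/usr/bin/env bash
# Convert an 802.11 centre frequency in MHz to a channel number.
# Covers 2.4 GHz (including channel 14) and 5 GHz; 6 GHz is out of scope.
freq_to_channel() {
  local f=$1
  if [ "$f" -eq 2484 ]; then
    echo 14
  elif [ "$f" -ge 2412 ] && [ "$f" -le 2472 ]; then
    echo $(( (f - 2407) / 5 ))
  elif [ "$f" -ge 5160 ] && [ "$f" -le 5885 ]; then
    echo $(( (f - 5000) / 5 ))
  else
    echo "unsupported frequency: $f" >&2
    return 1
  fi
}

freq_to_channel 5180   # HAP 5GHz        → 36
freq_to_channel 5260   # CAP 5GHz (DFS)  → 52
freq_to_channel 2437   # CAP 2.4GHz      → 6
```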
### CAP Access List Rule

CAP clients bypass VLAN assignment (go to VLAN 10):

```routeros
/interface wifi access-list add \
    interface=cap-wifi1 \
    action=accept \
    comment="CAP clients - no VLAN" \
    place-before=0
```
**Note:** cap-wifi2 uses WPA2+CCMP only (not WPA+TKIP like HAP's local wifi2). Legacy IoT devices requiring TKIP will only work on HAP1's local wifi2.

---

## WiFi Access List (VLAN Assignment)
## WiFi Access List

Devices are assigned to VLANs based on MAC address:
**Status:** VLAN assignment via access list is **active**. Each entry has a `vlan-id` that assigns the device to the correct VLAN upon WiFi association. This works on both HAP (local) and CAP (remote, via bridge VLAN filtering).

| VLAN | Purpose | Example Devices |
|------|---------|-----------------|
| 20 | Trusted | MacBooks, iPhones, Samsung phones |
| 25 | Kids | Kids devices |
| 30 | IoT | Smart home devices, Chromecast, Bosch appliances |
| 40 | Catch-All | Unknown devices (default) |
**30+ entries** configured (MAC-based accept rules with VLAN IDs + 1 default catch-all):

### Current Access List
| # | MAC | Device | VLAN |
|---|-----|--------|------|
| 0 | AA:ED:8B:2A:40:F1 | Samsung S25 Ultra - Kaloyan | 20 |
| 1 | 82:6D:FB:D9:E0:47 | MacBook Air - Nora | 20 |
| 12 | CE:B8:11:EA:8D:55 | MacBook - Kaloyan | 20 |
| 13 | BE:A7:95:87:19:4A | MacBook 5GHz - Kaloyan | 20 |
| 27 | B8:27:EB:32:B2:13 | RecalBox RPi3 | 25 |
| 28 | CC:5E:F8:D3:37:D3 | ASUS ROG Ally - Kaloyan | 20 |
| 31 | C8:5C:CC:40:B4:AA | Xiaomi Air Purifier 2 | 30 |
| 32 | (any) | Default - VLAN40 | 40 (catch-all) |

**Default behavior:** Devices not in the access list get VLAN 40 (CatchAll) via the default rule and the datapath `vlan-id=40`.

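RouterOS walks the WiFi access list top to bottom and applies the first matching rule. That is why the catch-all entry sits last. The sketch below mirrors that lookup for a hand-copied subset of the rows above, so you can check where a MAC should land before adding a rule. It does not read the router, and the function name and data layout are illustrative.

```bash
#!/usr/bin/env bash
# Hand-copied subset of the access list: "MAC VLAN", in rule order.
# The last entry is the catch-all ("any" matches every MAC).
ACCESS_LIST="
AA:ED:8B:2A:40:F1 20
82:6D:FB:D9:E0:47 20
B8:27:EB:32:B2:13 25
C8:5C:CC:40:B4:AA 30
any 40
"

# First-match lookup: the first rule that matches decides the VLAN.
vlan_for_mac() {
  local mac rule_mac vlan
  mac=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')   # normalise case
  while read -r rule_mac vlan; do
    [ -n "$rule_mac" ] || continue                        # skip blank lines
    if [ "$rule_mac" = any ] || [ "$rule_mac" = "$mac" ]; then
      echo "$vlan"
      return 0
    fi
  done <<< "$ACCESS_LIST"
  return 1   # no rule matched (cannot happen while the catch-all is present)
}

vlan_for_mac b8:27:eb:32:b2:13   # Recalbox        → 25
vlan_for_mac 00:11:22:33:44:55   # unknown device  → 40
```

The order matters. A specific MAC rule placed after the `any` entry would never be reached.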
### Show Full Access List

```routeros
/interface wifi access-list print

@@ -1,6 +1,6 @@
# DNS Architecture with AdGuard Failover

**Last Updated:** 2026-02-06
**Last Updated:** 2026-02-26

---

@@ -194,8 +194,10 @@ Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.

### Sync Container

Container: `adguardhome-sync` at 192.168.10.11 (br0 macvlan, static IP)

```yaml
# /mnt/user/appdata/adguard-sync/adguardhome-sync.yaml
# /mnt/user/appdata/dockge/stacks/adguard-sync/adguardhome-sync.yaml
cron: "*/30 * * * *"
runOnStart: true

@@ -204,22 +206,13 @@ origin:
  username: jazzymc
  password: 7RqWElENNbZnPW

replicas:
  - url: http://192.168.10.1:3000
replica:
  url: http://192.168.10.1:3000
  username: jazzymc
  password: 7RqWElENNbZnPW

features:
  dns:
    serverConfig: false
    accessLists: true
    rewrites: true
  filters: true
  clientSettings: true
  services: true
```

**Note:** The sync container must be connected to both `dockerproxy` and `br0` networks to reach both AdGuard instances.
**Note:** The sync container is on the `br0` macvlan network with a static IP to avoid conflicts with infrastructure devices.

---

276
docs/10-FAILOVER-NOBARA.md
Normal file
@@ -0,0 +1,276 @@
# Failover Infrastructure - Nobara (XTRM-Nobara)

**Last Updated:** 2026-02-13

**Purpose:** Temporary failover for critical services during Unraid maintenance windows.

---

## Overview

A Docker-based replica of critical services runs on the Nobara Linux workstation (XTRM-Nobara) with automatic failover via Keepalived VRRP. When Unraid goes offline, the virtual IP floats to Nobara and services continue operating.

```
Clients → 192.168.10.250 (VIP) → XTRM-U (MASTER, priority 150)
                                 ↓ failover (~4 seconds)
                                 XTRM-Nobara (BACKUP, priority 100)
```

---

## Machines

| Role | Host | IP | Interface | Priority |
|------|------|-----|-----------|----------|
| **MASTER** | XTRM-U (Unraid) | 192.168.10.20 | br0 | 150 |
| **BACKUP** | XTRM-Nobara | 192.168.10.103 | enp5s0 | 100 |
| **VIP** | Shared | 192.168.10.250 | — | — |

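Keepalived gives the VIP to the node with the highest effective priority. A `vrrp_script` with a positive `weight` adds that weight to the base priority only while the check passes. The sketch below does that arithmetic for the values documented here (base 150/100, weight +2 on both nodes), assuming both track scripts use a positive weight as the Keepalived Configuration section describes.

This shows a consequence worth double-checking in the real configs. A failing health check on XTRM-U only drops it from 152 to 150, which is still far above Nobara's 102. On its own, a failed check therefore does not move the VIP. Only a silent XTRM-U (host down or keepalived stopped) does. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Effective VRRP priority with a positive track_script weight:
# the weight is added only while the health check passes.
# $1 = base priority, $2 = weight, $3 = check result ("ok" or "fail")
effective_priority() {
  local base=$1 weight=$2 check=$3
  if [ "$check" = ok ]; then
    echo $(( base + weight ))
  else
    echo "$base"
  fi
}

# Which node should hold the VIP? Higher effective priority wins.
# Pass "down" for XTRM-U when it sends no adverts at all (host down, keepalived stopped).
# $1 = XTRM-U check, $2 = XTRM-Nobara check
vip_owner() {
  local u n
  if [ "$1" = down ]; then echo "XTRM-Nobara (XTRM-U silent)"; return; fi
  u=$(effective_priority 150 2 "$1")
  n=$(effective_priority 100 2 "$2")
  if [ "$u" -gt "$n" ]; then
    echo "XTRM-U ($u vs $n)"
  else
    echo "XTRM-Nobara ($n vs $u)"
  fi
}

vip_owner ok ok      # → XTRM-U (152 vs 102)
vip_owner fail ok    # → XTRM-U (150 vs 102): a failed check alone does not fail over
vip_owner down ok    # → XTRM-Nobara (XTRM-U silent)
```

For a failed Traefik check on XTRM-U to hand over the VIP, the check would need to close the 50-point gap. Two ways to do that are a negative weight larger than 50, or a check action that stops keepalived.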
---

## Replicated Services

| Service | Image | Ports (Nobara) | Domain |
|---------|-------|----------------|--------|
| **Traefik** | traefik:latest | 80, 443, 8080 | *.xtrm-lab.org |
| **Vaultwarden** | vaultwarden/server:latest | internal:80 | vault.xtrm-lab.org |
| **Authentik** | ghcr.io/goauthentik/server:2025.8.1 | internal:9000 | auth.xtrm-lab.org |
| **Authentik Worker** | ghcr.io/goauthentik/server:2025.8.1 | — | — |
| **PostgreSQL** | postgres:17 | internal:5432 | — |
| **Redis** | redis:7-alpine | internal:6379 | — |
| **AdGuard Home** | adguard/adguardhome:latest | 192.168.10.103:53, 3000 | — |

---

## File Locations

### Nobara (XTRM-Nobara)

| Path | Contents |
|------|----------|
| `/home/failover/docker-compose.yml` | Main compose stack |
| `/home/failover/traefik/` | Traefik config, certs, acme.json |
| `/home/failover/vaultwarden/` | Vaultwarden data (copy from Unraid) |
| `/home/failover/authentik/` | Authentik media & templates |
| `/home/failover/postgres/` | PostgreSQL data + initial dump |
| `/home/failover/redis/` | Redis data |
| `/home/failover/adguard/` | AdGuard conf & work dirs |
| `/etc/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/usr/local/bin/check_failover.sh` | Health check script |
| `/usr/local/bin/failover-notify.sh` | State change notification script |
| `/var/log/keepalived-failover.log` | Failover event log |

### Unraid (XTRM-U)

| Path | Contents |
|------|----------|
| `/mnt/user/appdata/keepalived/keepalived.conf` | Keepalived VRRP config |
| `/mnt/user/appdata/keepalived/check_services.sh` | Health check script |

---

## Keepalived Configuration

### VRRP Parameters

| Parameter | Value |
|-----------|-------|
| Virtual Router ID | 51 |
| Auth Type | PASS |
| Auth Password | xtrm2026 |
| Advertisement Interval | 1 second |
| Health Check Interval | 5 seconds |
| Fail Threshold | 3 missed checks |
| Recovery Threshold | 2 successful checks |

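The ~4 second failover figure is consistent with the VRRP master-down timer. A backup declares the master dead after 3 × Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds at a 1-second interval (RFC 3768; RFC 5798 scales it by the interval). For Nobara at 100/102 that comes to about 3.6 s, before clients pick up the gratuitous ARP.

Two cases behave differently:

- **Graceful stop** (such as `docker stop keepalived`): usually faster, because the departing master advertises priority 0 and the backup waits only Skew_Time.
- **Health check:** reacts only after 3 failed runs 5 s apart (roughly 15 s).

The helper below only does the arithmetic. Its name is illustrative.

```bash
#!/usr/bin/env bash
# VRRP master-down interval (seconds) seen by a backup router:
#   3 * advert_interval + skew, with skew = (256 - priority) * advert_interval / 256
# For a 1 s interval this matches both RFC 3768 (VRRPv2) and RFC 5798 (VRRPv3).
# $1 = backup's effective priority, $2 = advertisement interval in seconds (default 1)
master_down_interval() {
  local prio=$1 adv=${2:-1}
  awk -v p="$prio" -v a="$adv" \
    'BEGIN { skew = (256 - p) * a / 256; printf "%.3f\n", 3 * a + skew }'
}

master_down_interval 102   # Nobara, health check passing → 3.602
master_down_interval 100   # Nobara, health check failing → 3.609
```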
### Unraid (MASTER)

- Runs as Docker container: `local/keepalived` (built from alpine + keepalived + curl)
- Priority: 150 (+ health check weight 2 = 152 when healthy)
- Health check: curls `http://localhost:8183/api/overview` (Traefik dashboard)
- Preemption: enabled (will reclaim VIP from Nobara when healthy)

```bash
# Start/stop on Unraid
docker start keepalived
docker stop keepalived
docker logs keepalived
```

### Nobara (BACKUP)

- Runs as systemd service: `keepalived.service`
- Priority: 100 (+ health check weight 2 = 102 when healthy)
- Health check: verifies Traefik and Vaultwarden containers are running
- `nopreempt` set (won't fight for VIP if Unraid is healthy)

```bash
# Start/stop on Nobara
sudo systemctl start keepalived
sudo systemctl stop keepalived
sudo journalctl -u keepalived -f
```

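The Nobara health check is described only as verifying that the Traefik and Vaultwarden containers are running. Below is a minimal sketch of such a `check_failover.sh`. The container names are passed as arguments, and the names `traefik` and `vaultwarden` are assumptions to confirm with `docker ps` on Nobara. Keepalived treats exit status 0 as healthy and anything else as a failed check.

```bash
#!/usr/bin/env bash
# Sketch of /usr/local/bin/check_failover.sh (hypothetical; the deployed script may differ).
# Usage: check_failover.sh <container>...   e.g. check_failover.sh traefik vaultwarden
# Exits 0 only if every named container is running.

container_running() {
  # docker inspect prints "true" for a running container and fails if it does not exist
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}

check_all() {
  local c
  [ "$#" -gt 0 ] || return 1          # nothing to check counts as unhealthy
  for c in "$@"; do
    container_running "$c" || return 1
  done
}

# Run only when container names are passed, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
  check_all "$@"
  exit $?
fi
```

In `keepalived.conf`, reference the script from a `vrrp_script` block, for example `script "/usr/local/bin/check_failover.sh traefik vaultwarden"` with `interval 5`, `fall 3` and `rise 2`, to match the parameter table.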
---

## DNS Strategy

**Approach:** Local DNS override via AdGuard Home.

To route traffic through the VIP for internal clients, configure AdGuard DNS rewrite rules to resolve `*.xtrm-lab.org` → `192.168.10.250`. External (Cloudflare) DNS remains pointed at Unraid's public IP.

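To confirm the rewrite is in effect, resolve the failover hostnames against the local resolvers and check that every answer is the VIP. The sketch below uses `dig +short`. The default resolver (AdGuard on HAP1 at 192.168.10.1) and the function name are assumptions.

```bash
#!/usr/bin/env bash
# Check that a hostname resolves to the failover VIP through a given DNS server.
VIP="192.168.10.250"

# $1 = hostname, $2 = DNS server (default 192.168.10.1, AdGuard on HAP1; an assumption)
resolves_to_vip() {
  local name=$1 server=${2:-192.168.10.1} answers
  answers=$(dig +short A "$name" "@$server")
  [ -n "$answers" ] || return 1               # no answer at all
  ! grep -qvxF "$VIP" <<<"$answers"           # fail if any answer differs from the VIP
}
```

For example, `for h in vault auth traefik; do resolves_to_vip "$h.xtrm-lab.org" || echo "$h: not on VIP"; done`. Repeat it with `192.168.10.10` as the second argument to cover the secondary resolver.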
---

## Operations

### Before Maintenance (Data Sync)

Run these commands from the Mac to sync the latest data to Nobara:

```bash
# 1. Sync Vaultwarden data
ssh unraid "tar czf - -C /mnt/user/appdata vaultwarden/" | \
  ssh nobara "tar xzf - -C /home/failover/"

# 2. Dump and sync Authentik database
ssh unraid "docker exec postgresql17 pg_dump -U authentik_user authentik_db" | \
  ssh nobara "cat > /home/failover/postgres/authentik_dump.sql"

# 3. Sync AdGuard config
ssh unraid "tar czf - -C /mnt/user/appdata/adguardhome conf/ work/" | \
  ssh nobara "tar xzf - -C /home/failover/adguard/"

# 4. Sync Traefik config and certs
ssh unraid "tar czf - -C /mnt/user/appdata/traefik traefik.yml dynamic.yml acme.json certs/" | \
  ssh nobara "tar xzf - -C /home/failover/traefik/"
```

**Note:** `ssh unraid` = `ssh -i ~/.ssh/id_ed25519_unraid -p 422 root@192.168.10.20`

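One caveat with these pipelines: by default a shell reports only the exit status of the last command in a pipeline. If `pg_dump` fails on Unraid, `cat > authentik_dump.sql` on Nobara still succeeds and silently replaces the previous dump with an empty or partial file.

The fail-fast wrapper sketched below addresses this in two ways:

- It enables `pipefail`, so a failure on either side of a pipeline stops the script.
- It stages the database dump in a temp file on the Mac before copying it, so a failed dump leaves the old one on Nobara untouched.

It reuses the `unraid`/`nobara` SSH aliases and the paths above. The function names, the `run` argument and `DRY_RUN` are illustrative.

```bash
#!/usr/bin/env bash
# Sketch of a fail-fast pre-maintenance sync (run from the Mac).
# Assumes the "unraid" and "nobara" SSH aliases; paths are from the commands above.
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"   # DRY_RUN=1 prints the steps instead of running them

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    bash -o pipefail -c "$*"   # the child shell needs pipefail too
  fi
}

# Stream a tar archive from Unraid straight into a directory on Nobara.
# $1 = source dir on Unraid, $2 = target dir on Nobara, rest = entries to copy
copy_tree() {
  local src=$1 dst=$2; shift 2
  run "ssh unraid 'tar czf - -C $src $*' | ssh nobara 'tar xzf - -C $dst'"
}

# Dump to a local temp file first; only a dump that completed is copied to Nobara.
copy_db_dump() {
  local tmp
  tmp=$(mktemp)
  run "ssh unraid 'docker exec postgresql17 pg_dump -U authentik_user authentik_db' > $tmp"
  run "ssh nobara 'cat > /home/failover/postgres/authentik_dump.sql' < $tmp"
  rm -f "$tmp"
}

sync_all() {
  copy_tree /mnt/user/appdata /home/failover/ vaultwarden/
  copy_db_dump
  copy_tree /mnt/user/appdata/adguardhome /home/failover/adguard/ conf/ work/
  copy_tree /mnt/user/appdata/traefik /home/failover/traefik/ traefik.yml dynamic.yml acme.json certs/
}

# Only sync when called with "run", e.g. ./sync-to-nobara.sh run (hypothetical file name).
if [ "${1:-}" = run ]; then
  sync_all
fi
```

`DRY_RUN=1 ./sync-to-nobara.sh run` previews the five commands without touching either host. If a step fails, the script stops there. In that case the temp file from a failed dump may be left behind on the Mac.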
### Start Failover Services

```bash
# On Nobara
cd /home/failover
sudo docker compose up -d
sudo systemctl start keepalived
```

### Stop Failover Services

```bash
# On Nobara
cd /home/failover
sudo docker compose down
sudo systemctl stop keepalived
```

### Test Failover

```bash
# 1. Check VIP location
ssh unraid "ip addr show br0 | grep inet"
ssh nobara "ip addr show enp5s0 | grep inet"

# 2. Simulate Unraid failure
ssh unraid "docker stop keepalived"

# 3. Verify VIP moved to Nobara (wait ~4 seconds)
ssh nobara "ip addr show enp5s0 | grep inet"

# 4. Restore Unraid
ssh unraid "docker start keepalived"

# 5. Verify VIP returned to Unraid
ssh unraid "ip addr show br0 | grep inet"
```

### Check Status

```bash
# Nobara service status (-t gives sudo a terminal for the password prompt)
ssh -t nobara "sudo docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Nobara keepalived state
ssh -t nobara "sudo journalctl -u keepalived -n 10 --no-pager"

# Unraid keepalived state
ssh unraid "docker logs keepalived --tail 10"

# Which machine holds the VIP? Ping to populate the ARP cache,
# then compare the MAC with the hosts' interface MACs.
ping -c 1 192.168.10.250
arp -n 192.168.10.250
```

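Keepalived answers ARP for the VIP with the real MAC of the interface that holds it (unless `use_vmac` is configured). After a ping, the Mac's ARP cache therefore identifies the holder.

The sketch maps that MAC to a host name using the MACs from the hardware inventory: XTRM-Nobara enp5s0 and XTRM-U br0. XTRM-U's br0 MAC may have changed with the 2026-02-24 motherboard replacement, so verify it first. macOS `arp -n` may print MACs without leading zeros (for example `8:92:4:c6:7:c5`), which the function normalises. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Map the MAC that ARP reports for the VIP to a host name.

# Normalise a MAC to upper case with two hex digits per octet
# (8:92:4:c6:7:c5 → 08:92:04:C6:07:C5).
normalise_mac() {
  local IFS=: octet
  local -a out=()
  for octet in $1; do                 # splits on ":" because IFS is ":"
    out+=("$(printf '%02X' "0x$octet")")
  done
  echo "${out[*]}"                    # joined with ":" for the same reason
}

vip_holder() {
  case "$(normalise_mac "$1")" in
    08:92:04:C6:07:C5) echo "XTRM-Nobara" ;;   # enp5s0, from the inventory
    38:05:25:35:8E:7A) echo "XTRM-U" ;;        # br0, from the inventory (verify)
    *) echo "unknown"; return 1 ;;
  esac
}

# On macOS, field 4 of "arp -n <ip>" is the MAC:
#   vip_holder "$(arp -n 192.168.10.250 | awk '{print $4}')"
```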
---
|
||||
|
||||
## Traefik Configuration (Failover)
|
||||
|
||||
The Nobara Traefik instance has a **reduced** dynamic.yml that only serves the four critical services:
|
||||
|
||||
| Router | Domain | Backend |
|
||||
|--------|--------|---------|
|
||||
| vaultwarden-secure | vault.xtrm-lab.org | http://vaultwarden:80 |
|
||||
| authentik-secure | auth.xtrm-lab.org | http://authentik:9000 |
|
||||
| traefik-secure | traefik.xtrm-lab.org | api@internal |
|
||||
|
||||
TLS certificates are shared (copied from Unraid's acme.json + static certs).
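
`curl --resolve` can confirm that the failover Traefik answers for each hostname in the table without touching DNS. It pins a name to the VIP while keeping the real hostname for TLS. In this sketch the hostname list is copied from the table above, and `check_routes` is a hypothetical helper. The helper only reports the HTTP status code. Any code other than `000` means TLS and routing worked. `000` means curl got no HTTP response.

```bash
#!/usr/bin/env bash
# Sketch: request each failover hostname through the VIP, bypassing DNS.
VIP=${VIP:-192.168.10.250}
HOSTS=(vault.xtrm-lab.org auth.xtrm-lab.org traefik.xtrm-lab.org)

check_routes() {
  local host code failed=0
  for host in "${HOSTS[@]}"; do
    # --resolve maps HOST:443 to the VIP; certificate checks still use the real name.
    # curl prints 000 when no HTTP response arrived (refused, TLS error, timeout).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      --resolve "${host}:443:${VIP}" "https://${host}/")
    echo "${host} ${code}"
    if [ "$code" = "000" ]; then failed=1; fi
  done
  return "$failed"
}
```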

---

## Limitations

- **Data is a point-in-time snapshot.** Changes made on Unraid after the last sync are not reflected on Nobara. Re-sync before maintenance.
- **No real-time replication.** Vaultwarden passwords saved during failover will not sync back to Unraid automatically.
- **Only critical services replicated.** Other services (Plex, Gitea, NetBox, etc.) will be offline during maintenance.
- **External DNS not updated.** Failover only works for clients using the local DNS (AdGuard) that resolves to the VIP. External access via Cloudflare will not failover.

---

## SSH Access

```bash
# From Mac to Nobara (passwordless, key-based)
ssh nobara
# or: ssh -i ~/.ssh/id_ed25519_nobara jazzymc@192.168.10.103

# sudo on Nobara requires a password (stored in the password manager)
```

---

## Recovery After Maintenance

1. Bring Unraid back online
2. Verify all Unraid services are running: `docker ps`
3. Keepalived on Unraid will auto-reclaim VIP (preemption)
4. Stop failover on Nobara: `cd /home/failover && sudo docker compose down`
5. If Vaultwarden was used during failover, manually export/import any new entries
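
Step 2 can be made explicit by checking for the containers that Nobara stood in for. The names `traefik`, `vaultwarden`, `authentik` and `adguardhome` are assumptions based on the services on this page; adjust them to the actual container names on Unraid. `missing_containers` is a hypothetical helper. It reads `docker ps` names from stdin and prints each expected name it cannot find.

```bash
#!/usr/bin/env bash
# Sketch: list expected containers that are absent from `docker ps` names on stdin.
# Usage: docker ps --format '{{.Names}}' | missing_containers NAME...
missing_containers() {
  local running name missing=0
  running=$(cat)
  for name in "$@"; do
    # -x: whole-line match, -F: treat the name literally
    if ! grep -qxF "$name" <<<"$running"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `ssh unraid "docker ps --format '{{.Names}}'" | missing_containers traefik vaultwarden authentik adguardhome && echo "all running"`.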

---

## Architecture Diagram

```
                 ┌─────────────────────┐
                 │   192.168.10.250    │
                 │     (VRRP VIP)      │
                 └──────────┬──────────┘
                            │
             ┌──────────────┴──────────────┐
             │                             │
  ┌──────────▼───────────┐      ┌──────────▼───────────┐
  │ XTRM-U (Unraid)      │      │ XTRM-Nobara          │
  │ 192.168.10.20        │      │ 192.168.10.103       │
  │ MASTER (150)         │      │ BACKUP (100)         │
  │                      │      │                      │
  │ ┌──────────────────┐ │      │ ┌──────────────────┐ │
  │ │ Traefik          │ │      │ │ Traefik          │ │
  │ │ Vaultwarden      │ │      │ │ Vaultwarden      │ │
  │ │ Authentik        │ │      │ │ Authentik        │ │
  │ │ AdGuard          │ │      │ │ AdGuard          │ │
  │ │ + 25 more        │ │      │ │ PostgreSQL       │ │
  │ │                  │ │      │ │ Redis            │ │
  │ └──────────────────┘ │      │ └──────────────────┘ │
  │                      │      │                      │
  │ Keepalived (Docker)  │      │ Keepalived (systemd) │
  └──────────────────────┘      └──────────────────────┘
```
167
docs/11-CROSS-VLAN-CASTING.md
Normal file
@@ -0,0 +1,167 @@
# Cross-VLAN Casting & Streaming

Configuration for casting/streaming from VLANs 10 (Mgmt), 20 (Trusted), and 25 (Kids) to devices on VLAN 30 (IoT).

## Casting Devices

| Device | MAC (Ethernet) | MAC (WiFi) | Static IP | VLAN |
|--------|---------------|------------|-----------|------|
| LG TV (webOS) | B0:37:95:79:AF:9B | DC:03:98:6B:5A:3A | .40 (eth) / .41 (wifi) | 30 |
| Chromecast | — | D0:E7:82:F7:65:DD | .42 | 30 |

All IPs in subnet `192.168.30.0/24`.

## What Works

| Feature | From VLAN 20/25/10 | Notes |
|---------|-------------------|-------|
| AirPlay (Mac → LG TV) | Yes | TV must use ONE interface only (see below) |
| Smart View (Samsung → LG TV) | Yes | Works without issues |
| YouTube Cast (phone → TV/Chromecast) | Yes | Via TV Link Code, not device discovery |
| Chromecast casting | Yes | Requires mDNS repeater |

## What Doesn't Work

| Feature | Reason |
|---------|--------|
| LG ThinQ remote app | Client-side subnet check — app refuses if phone and TV are on different subnets. No workaround. |

## MikroTik Configuration

### 1. Address List

```routeros
/ip/firewall/address-list
add list=casting-devices address=192.168.30.40 comment="LG TV Ethernet"
add list=casting-devices address=192.168.30.41 comment="LG TV WiFi"
add list=casting-devices address=192.168.30.42 comment="Chromecast"
```

### 2. Firewall Rules (Forward Chain)

Bidirectional rules — casting devices need to initiate connections back (AirPlay uses separate UDP channels for timing/control).

```routeros
/ip/firewall/filter
# Forward: source VLANs → IoT
add chain=forward action=accept src-address=192.168.20.0/24 dst-address=192.168.30.0/24 comment="Allow Trusted to IoT (casting)"
add chain=forward action=accept src-address=192.168.25.0/24 dst-address=192.168.30.0/24 comment="Allow Kids to IoT (casting)"
add chain=forward action=accept src-address=192.168.10.0/24 dst-address=192.168.30.0/24 comment="Allow Mgmt to IoT"

# Return: casting devices → source VLANs
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.20.0/24 comment="Allow casting devices to Trusted (casting return)"
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.25.0/24 comment="Allow casting devices to Kids (casting return)"
add chain=forward action=accept src-address-list=casting-devices dst-address=192.168.10.0/24 comment="Allow casting devices to Mgmt (casting return)"
```

These rules must be **before** the IoT block rules:
```routeros
# Block IoT → other VLANs (AFTER the return rules above)
add chain=forward action=drop src-address=192.168.30.0/24 dst-address=192.168.10.0/24 comment="Block IoT to Management"
add chain=forward action=drop src-address=192.168.30.0/24 dst-address=192.168.20.0/24 comment="Block IoT to Trusted"
```

### 3. FastTrack Exclusion (Mangle)

FastTrack bypasses connection tracking and the firewall, so inter-VLAN casting traffic must be excluded from it.

```routeros
/ip/firewall/mangle
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.20.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Trusted<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.20.0/24 comment="No FastTrack: IoT<->Trusted (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.25.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Kids<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.25.0/24 comment="No FastTrack: IoT<->Kids (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.10.0/24 dst-address=192.168.30.0/24 comment="No FastTrack: Mgmt<->IoT (casting)"
add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=192.168.30.0/24 dst-address=192.168.10.0/24 comment="No FastTrack: IoT<->Mgmt (casting)"
```

FastTrack rule must use `connection-mark=no-mark`:
```routeros
/ip/firewall/filter
add chain=forward action=fasttrack-connection connection-state=established,related connection-mark=no-mark comment="defconf: fasttrack"
```
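
The six mangle rules in step 3 follow one pattern: two rules, one per direction, for each source VLAN paired with IoT. The sketch below prints them from a list of `NAME=SUBNET` pairs, which may help if another VLAN later needs casting access. `gen_no_fasttrack` is a hypothetical helper. Review its output before pasting it into the router terminal.

```bash
#!/usr/bin/env bash
# Sketch: print RouterOS mangle rules that mark both directions of NAME<->IoT
# traffic with the no-fasttrack connection mark.
# Usage: gen_no_fasttrack IOT_SUBNET NAME=SUBNET...
gen_no_fasttrack() {
  local iot=$1 pair name net
  shift
  echo "/ip/firewall/mangle"
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    net=${pair#*=}     # text after the first "="
    printf 'add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=%s dst-address=%s comment="No FastTrack: %s<->IoT (casting)"\n' "$net" "$iot" "$name"
    printf 'add chain=forward action=mark-connection new-connection-mark=no-fasttrack passthrough=yes src-address=%s dst-address=%s comment="No FastTrack: IoT<->%s (casting)"\n' "$iot" "$net" "$name"
  done
}
```

For example, `gen_no_fasttrack 192.168.30.0/24 Trusted=192.168.20.0/24 Kids=192.168.25.0/24 Mgmt=192.168.10.0/24` reproduces the six rules in step 3.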

### 4. mDNS Repeater

Enables cross-VLAN device discovery (AirPlay, Chromecast).

```routeros
/ip/dns/set mdns-repeat-ifaces=1-vlan10-mgmt,2-vlan20-trusted,3-vlan25-family,4-vlan30-iot
```

### 5. IGMP Proxy

Enables multicast forwarding (SSDP/UPnP discovery).

```routeros
/routing/igmp-proxy/interface
add interface=4-vlan30-iot upstream=yes threshold=1
add interface=2-vlan20-trusted upstream=no threshold=1
add interface=3-vlan25-family upstream=no threshold=1
add interface=1-vlan10-mgmt upstream=no threshold=1
```

### 6. DHCP Static Leases

```routeros
/ip/dhcp-server/lease
add address=192.168.30.40 mac-address=B0:37:95:79:AF:9B server=dhcp-vlan30 comment="LG TV Ethernet"
add address=192.168.30.41 mac-address=DC:03:98:6B:5A:3A server=dhcp-vlan30 comment="LG TV WiFi"
add address=192.168.30.42 mac-address=D0:E7:82:F7:65:DD server=dhcp-vlan30 comment="Chromecast"
```

### 7. WiFi Access List

```routeros
/interface/wifi/access-list
add mac-address=DC:03:98:6B:5A:3A action=accept vlan-id=30 comment="LG TV WiFi"
add mac-address=D0:E7:82:F7:65:DD action=accept vlan-id=30 comment="Chromecast"
```

## Troubleshooting

### AirPlay Black Screen on LG TV

**Root cause**: LG TV connected via both Ethernet AND WiFi simultaneously.

The TV advertises AirPlay via mDNS on one interface but streams on the other, creating asymmetric routing. The Mac connects to one IP, but the TV sends return traffic from a different IP.

**Fix**: Use only ONE connection on the TV — either Ethernet or WiFi, not both. Disconnect the unused one in TV settings.

- Ethernet MAC: `B0:37:95:79:AF:9B` → 192.168.30.40
- WiFi MAC: `DC:03:98:6B:5A:3A` → 192.168.30.41

### Do NOT Use Masquerade NAT

Masquerade (srcnat) was tried to make cross-VLAN traffic appear local. This breaks AirPlay because:

- AirPlay negotiates separate UDP feedback channels (port 7010, plus control on 6001 and timing on 6002)
- With masquerade, the TV sends feedback to the router IP (192.168.30.1) instead of the Mac's real IP
- Result: the control channel works but video/audio never arrives → black screen

### Chromecast Setup Issues

The Chromecast can only be set up via the Google Home app (no web interface).

**Common setup failure**: Google Home app finds the Chromecast via Bluetooth, connects to its setup WiFi hotspot, but then says "Could not communicate with your Chromecast."

**Fix** (on phone before setup):
1. Disable mobile data
2. Disable VPN
3. Turn off "Switch to mobile data when WiFi is unstable"
4. Enable Location services (required by Google Home)
5. Clear Google Home app cache

**WiFi requirements**: Chromecast requires **WPA2 with AES/CCMP** encryption. It will NOT connect to networks using TKIP. The XTRM2 (2.4GHz) security profile was changed from TKIP to CCMP to support this:

```routeros
/interface/wifi/security/set sec-xtrm2 encryption=ccmp
```

### VPN Interference

If your Mac is connected to WireGuard VPN, the VPN overrides the default route — local traffic bypasses WiFi and goes through the VPN tunnel. Disconnect VPN before casting.
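
On macOS, `route -n get <address>` prints the interface a route would use, which shows whether the VPN is capturing casting traffic. This sketch flags a tunnel interface. It assumes the WireGuard client appears as a `utunN` interface, as macOS VPN clients typically do. `route_iface` and `check_not_vpn` are hypothetical helpers that read the command's output from stdin.

```bash
#!/usr/bin/env bash
# Sketch: extract the "interface:" field from macOS `route -n get HOST` output on stdin.
route_iface() {
  awk '$1 == "interface:" {print $2}'
}

# Fail if the route to a casting device leaves through a tunnel (utunN) interface.
# Usage: route -n get 192.168.30.42 | check_not_vpn
check_not_vpn() {
  local iface
  iface=$(route_iface)
  case "$iface" in
    utun*)
      echo "route uses VPN interface $iface; disconnect the VPN before casting" >&2
      return 1
      ;;
    *)
      echo "route uses $iface"
      ;;
  esac
}
```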

### CAP VLAN Limit

The CAP XL ac may show "maximum VLAN count for interface was reached." If a device can't connect to WiFi, try disabling the CAP interfaces temporarily to force connection to the HAP's radio directly.
275
docs/12-DEVELOPMENT-ENVIRONMENT.md
Normal file
@@ -0,0 +1,275 @@
# Development Environment

**Last Updated:** 2026-03-08

Web-based development environment running directly on Unraid, providing a VS Code IDE with full host access to Claude Code, Cooperator CLI, Docker, and all project repositories.

---

## OpenVSCode Server

| Property | Value |
|----------|-------|
| **URL** | https://code.xtrm-lab.org |
| **Auth** | Authentik forward auth (SSO) |
| **Port** | 3100 (host-native, not a container) |
| **Binary** | `/mnt/user/appdata/openvscode/current/` (symlink) |
| **Config** | `/mnt/user/appdata/openvscode/config/` |
| **Boot Script** | `/mnt/user/appdata/openvscode/start.sh` |
| **Log** | `/mnt/user/appdata/openvscode/server.log` |

**Why host-native?** Running directly on Unraid (not in a container) means the VS Code terminal has full access to `claude`, `cooperator`, `node`, `npm`, `docker`, `git`, and all host tools. No volume mount hacks or container-breaking updates.

### Persistence

All data lives on the array (`/mnt/user/`) — survives Unraid OS updates:

| Component | Path | Purpose |
|-----------|------|---------|
| Server binary | `/mnt/user/appdata/openvscode/openvscode-server-v1.109.5-linux-x64/` | VS Code server |
| Symlink | `/mnt/user/appdata/openvscode/current` → version dir | Easy version switching |
| VS Code config | `/mnt/user/appdata/openvscode/config/` | Extensions, settings, themes |
| Start script | `/mnt/user/appdata/openvscode/start.sh` | Startup with PATH setup |

### Updating OpenVSCode Server

```bash
# Set VERSION to the release to install (X.Y.Z is a placeholder)
VERSION=X.Y.Z

# Download and unpack the new version
cd /mnt/user/appdata/openvscode
curl -fsSL "https://github.com/gitpod-io/openvscode-server/releases/download/openvscode-server-v${VERSION}/openvscode-server-v${VERSION}-linux-x64.tar.gz" -o new.tar.gz
tar xzf new.tar.gz && rm new.tar.gz

# Switch symlink and restart
ln -sfn "openvscode-server-v${VERSION}-linux-x64" current
pkill -f "openvscode-server.*--port 3100"
/mnt/user/appdata/openvscode/start.sh
```

Extensions and settings are preserved (stored separately in `config/`).

### Traefik Routing

Defined in `/mnt/user/appdata/traefik/dynamic.yml`:

```yaml
http:
  routers:
    openvscode-secure:
      rule: "Host(`code.xtrm-lab.org`)"
      entryPoints: [https]
      middlewares: [default-headers, authentik-forward-auth]
      tls:
        certResolver: cloudflare
      service: openvscode
    # ...
  services:
    openvscode:
      loadBalancer:
        servers:
          - url: "http://192.168.10.20:3100"
```

---

## Claude Code

| Property | Value |
|----------|-------|
| **Version** | 2.1.71 |
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/claude` |
| **Symlink** | `/root/.local/bin/claude` |
| **Config** | `/mnt/user/appdata/claude-code/.claude.json` → `/root/.claude.json` |
| **Settings** | `/mnt/user/appdata/claude-code/.claude/` → `/root/.claude/` |
| **Boot Script** | `/mnt/user/appdata/claude-code/install-claude.sh` |

### Persistence

The npm global prefix is set to `/mnt/user/appdata/claude-code/.npm-global/` (array-backed). The boot script creates symlinks from `/root/` to the persistent paths.
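
The symlinking step can be sketched as follows. This illustrates the pattern described above; it is not the contents of `install-claude.sh`. `link_persistent` is a hypothetical helper. It refuses to run in two cases: when the array-backed target is missing (for example, the array is not mounted yet), and when the RAM-side path already exists as something other than a symlink. Either way, nothing is silently overwritten.

```bash
#!/usr/bin/env bash
# Sketch: point a RAM-backed path (e.g. under /root) at its array-backed copy.
# Usage: link_persistent PERSISTENT_PATH RAM_PATH
link_persistent() {
  local target=$1 link=$2
  if [ ! -e "$target" ]; then
    echo "missing: $target (array not mounted?)" >&2
    return 1
  fi
  if [ -e "$link" ] && [ ! -L "$link" ]; then
    echo "refusing to replace existing non-symlink: $link" >&2
    return 1
  fi
  mkdir -p "$(dirname "$link")"
  # -f replaces an existing link; -n stops ln from descending into a linked directory
  ln -sfn "$target" "$link"
}
```

For example: `link_persistent /mnt/user/appdata/claude-code/.claude /root/.claude`.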

### Updating Claude Code

```bash
source /root/.bashrc
npm install -g @anthropic-ai/claude-code
claude --version
```

---

## Cooperator CLI

| Property | Value |
|----------|-------|
| **Version** | 3.36.1 |
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/cooperator` |
| **Config** | `~/.cooperator/.env` (Shortcut token, Confluence, git config) |
| **Registry** | `@ampeco:registry=https://gitlab.com/api/v4/projects/71775017/packages/npm/` |
| **npm auth** | `/root/.npmrc` (GitLab PAT) |

### What Cooperator Install Sets Up

- **Commands** — `~/.claude/commands/cooperator` → cooperator's claude-commands
- **Agents** — `~/.claude/agents/implementation-task-executor.md`
- **Skills** — 12 cooperator skills (shortcut-operations, create-feature-story, gitlab-operations, etc.)
- **Shortcut API** — validated via `~/.cooperator/.env` token

### Updating Cooperator

```bash
source /root/.bashrc
npm install -g @ampeco/cooperator
cooperator --version
```

**Note:** `/root/.npmrc` lives in RAM and is recreated on boot if needed. The GitLab PAT is stored in `/boot/config/go`; if the token changes frequently, a persistent `.npmrc` setup is needed instead.
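
One way to recreate `/root/.npmrc` on boot is to write it from the PAT. This sketch assumes the token is available to the script as `GITLAB_PAT`, a hypothetical variable, since this page does not show how `/boot/config/go` stores the token. `NPMRC` is a hypothetical override for the output path. The `_authToken` line follows GitLab's per-project npm registry format and reuses the registry URL from the table above.

```bash
#!/usr/bin/env bash
# Sketch: write an .npmrc for the @ampeco scope from a GitLab PAT.
# GITLAB_PAT and NPMRC are assumptions; NPMRC defaults to /root/.npmrc.
write_npmrc() {
  local file=${NPMRC:-/root/.npmrc}
  local registry="gitlab.com/api/v4/projects/71775017/packages/npm/"
  if [ -z "${GITLAB_PAT:-}" ]; then
    echo "GITLAB_PAT is not set" >&2
    return 1
  fi
  (
    umask 077 # the file holds a secret, so create it owner-only
    {
      printf '@ampeco:registry=https://%s\n' "$registry"
      printf '//%s:_authToken=%s\n' "$registry" "$GITLAB_PAT"
    } > "$file"
  ) || return 1
  chmod 600 "$file" # also tighten permissions if the file already existed
}
```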

---

## GitLab CLI (glab)

| Property | Value |
|----------|-------|
| **Version** | 1.89.0 |
| **Binary** | `/usr/local/bin/glab` (RAM — lost on reboot) |
| **Config** | `~/.config/glab-cli/config.yml` |
| **Auth** | GitLab PAT (same as npm registry token) |

**Note:** The glab binary in `/usr/local/bin/` is lost on Unraid reboot. Either reinstall it from the boot script or persist it to appdata.
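
The boot-time restore suggested in the note could be sketched like this. It assumes the binary has first been copied to an array-backed location such as `/mnt/user/appdata/claude-code/bin/glab`, which is a hypothetical path. `restore_binary` is a hypothetical helper that could be called from `/boot/config/go`.

```bash
#!/usr/bin/env bash
# Sketch: restore a binary from array-backed storage into a RAM-backed bin dir at boot.
# Usage: restore_binary PERSISTENT_BINARY DEST_DIR
restore_binary() {
  local src=$1 dest_dir=$2
  if [ ! -x "$src" ]; then
    echo "not found or not executable: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  # install copies the file and sets mode 755 in one step
  install -m 755 "$src" "$dest_dir/$(basename "$src")"
}
```

For example: `restore_binary /mnt/user/appdata/claude-code/bin/glab /usr/local/bin`.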

---

## Python (via uv)

| Property | Value |
|----------|-------|
| **uv** | `/root/.local/bin/uv` |
| **Python** | 3.12.13 (managed by uv) |
| **mikrotik-mcp venv** | `/mnt/user/projects/mikrotik-mcp/venv/` |
| **unraid-mcp venv** | `/mnt/user/projects/unraid-mcp/.venv/` |

---

## Custom Skills

6 custom skills synced from Mac to `/mnt/user/appdata/claude-code/custom-skills/`:

| Skill | Description |
|-------|-------------|
| ev-compliance-story | EV regulatory compliance story creation |
| ev-protocol-expert | OCPP/OCPI/AFIR protocol expertise |
| frontend-designer | Nova/Vue component design |
| mikrotik-admin | MikroTik router management via MCP |
| prd-generator | Product requirements documents |
| unraid-admin | Unraid server management via MCP |

Symlinked to `~/.claude/skills/` alongside 12 cooperator skills (18 total).

---

## MCP Servers

### Registration (TODO)

The following MCP servers need to be registered via `claude mcp add` on Unraid:

| Server | Command | Status |
|--------|---------|--------|
| **shortcut** | `node /mnt/user/appdata/claude-code/mcp-server-shortcut/dist/index.js` | Built, needs `claude mcp add` |
| **mikrotik** | `/mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server` | Venv ready, needs `claude mcp add` |
| **unraid** | `/mnt/user/projects/unraid-mcp/.venv/bin/python -m unraid_mcp.main` | Venv ready, needs `claude mcp add` |
| **playwright** | `npx -y @playwright/mcp@latest --isolated` | npx available, needs `claude mcp add` |
| **smartbear** | `npx -y @smartbear/mcp@latest` | npx available, needs `claude mcp add` |

### Environment Variables for MCPs

- **mikrotik**: `DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json`
- **unraid**: `UNRAID_API_URL`, `UNRAID_API_KEY`, `UNRAID_MCP_TRANSPORT=stdio`, `UNRAID_VERIFY_SSL=false`
- **shortcut**: `SHORTCUT_API_TOKEN` (from `~/.cooperator/.env`)
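
Below is a sketch of the pending registrations, built from the commands and environment variables listed above. It assumes the Claude Code CLI accepts the form `claude mcp add NAME -e KEY=VALUE -- COMMAND [ARGS...]`; confirm with `claude mcp add --help` on the installed version. It also assumes `SHORTCUT_API_TOKEN`, `UNRAID_API_URL` and `UNRAID_API_KEY` are already exported, for example the Shortcut token taken from `~/.cooperator/.env`. `run`, `register_mcp_servers` and `DRY_RUN` are hypothetical names. Setting `DRY_RUN=1` prints the commands for review instead of executing them.

```bash
#!/usr/bin/env bash
# Sketch: register the MCP servers from the table above with Claude Code.
# Assumes the three secrets are exported; DRY_RUN=1 only prints the commands.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

register_mcp_servers() {
  run claude mcp add shortcut -e "SHORTCUT_API_TOKEN=${SHORTCUT_API_TOKEN}" \
    -- node /mnt/user/appdata/claude-code/mcp-server-shortcut/dist/index.js
  run claude mcp add mikrotik -e DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json \
    -- /mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server
  run claude mcp add unraid \
    -e "UNRAID_API_URL=${UNRAID_API_URL}" -e "UNRAID_API_KEY=${UNRAID_API_KEY}" \
    -e UNRAID_MCP_TRANSPORT=stdio -e UNRAID_VERIFY_SSL=false \
    -- /mnt/user/projects/unraid-mcp/.venv/bin/python -m unraid_mcp.main
  run claude mcp add playwright -- npx -y @playwright/mcp@latest --isolated
  run claude mcp add smartbear -- npx -y @smartbear/mcp@latest
}
```

By default, `claude mcp add` registers a server in the `local` scope, which covers the current project only. Adding `-s user` makes a server available in every project, which may suit servers such as `mikrotik` and `unraid`.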

---

## Projects Workspace

All projects live in `/mnt/user/projects/`, which VS Code opens as the default folder.

### Personal Projects (Gitea)

| Project | Gitea Repo | Description |
|---------|-----------|-------------|
| infrastructure | jazzymc/infrastructure | This repo — home lab documentation |
| claude-skills | jazzymc/claude-skills | Claude Code custom skills |
| mikrotik-mcp | jazzymc/mikrotik-mcp | MikroTik MCP server |
| unraid-mcp | jazzymc/unraid-mcp | Unraid MCP server |
| unraid-glass | jazzymc/unraid-glass | Unraid dashboard plugin |
| openclaw | jazzymc/openclaw | OpenClaw game project |
| nanobot-mcp | jazzymc/nanobot-mcp | Nanobot MCP server |
| nanobot-hkuds | jazzymc/nanobot-hkuds | Nanobot HKU DS |
| xtrm-agent | jazzymc/xtrm-agent | AI agent framework |
| geekmagic-smalltv | jazzymc/geekmagic-smalltv | SmallTV firmware |
| homarr | jazzymc/homarr | Homarr dashboard fork |
| shortcut-daily-sync | jazzymc/shortcut-daily-sync | Shortcut sync tool |

**Remote URL format:** `https://jazzymc:<token>@git.xtrm-lab.org/jazzymc/<repo>.git`

### AMPECO Work Projects

| Project | Source | Type |
|---------|--------|------|
| backend | GitLab (ampeco/apps/charge/backend) | Git clone |
| crm | GitLab (ampeco/apps/charge/crm) | Git clone |
| marketplace | GitLab (ampeco/apps/charge/marketplace) | Git clone |
| mobile-2 | GitLab (ampeco/apps/charge/mobile-2) | Git clone |
| ad-hoc-payment-web-app | GitLab (ampeco/apps/charge/external-apps/) | Git clone |
| dev-proxy | GitLab (ampeco/apps/shared/dev-proxy) | Git clone |
| ampeco-custom-dashboard-widgets-boilerplate | GitHub (ampeco/) | Git clone |
| docs | Local rsync | Reference docs |
| stories | Local rsync | Product stories |
| booking-ewa | Local rsync | Booking app |
| ewa-ui | Local rsync | EWA frontend |
| design-tokens | Local rsync | Design system tokens |
| ampeco-backup | Local rsync | Configuration backups |
| central_registry | Local rsync | Service registry |
| CCode-UI-Distribution-1.0.0 | Local rsync | UI distribution |
| automations | Local rsync | Automation scripts |

**GitLab auth:** OAuth2 PAT in remote URLs.

---

## Boot Sequence

`/boot/config/go` runs on Unraid boot and performs these steps:

1. **Wait for array** — polls for `/mnt/user/appdata/claude-code` (up to 5 min)
2. **Claude Code setup** — `/mnt/user/appdata/claude-code/install-claude.sh`
   - Creates symlinks (`/root/.local/bin/claude`, `/root/.claude`, `/root/.claude.json`)
   - Writes `.bashrc` with persistent npm PATH
3. **OpenVSCode Server** — `/mnt/user/appdata/openvscode/start.sh`
   - Kills any existing instance
   - Starts on port 3100 with persistent config dir
   - Sources Claude/Cooperator PATH for terminal sessions
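
Step 1 can be sketched as a bounded polling loop. This illustrates the described behaviour, which is to poll for a path and give up after five minutes. It is not the actual contents of `/boot/config/go`. `wait_for_path` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: wait until PATH exists, polling every INTERVAL seconds for up to TIMEOUT seconds.
# Usage: wait_for_path PATH [TIMEOUT=300] [INTERVAL=5]
wait_for_path() {
  local path=$1 timeout=${2:-300} interval=${3:-5} waited=0
  until [ -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up waiting for $path after ${timeout}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

In a boot script this could gate step 2, e.g. `wait_for_path /mnt/user/appdata/claude-code && /mnt/user/appdata/claude-code/install-claude.sh`.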

---

## Architecture Diagram

```
Browser → https://code.xtrm-lab.org
  ↓
Traefik (443) → Authentik SSO check
  ↓
OpenVSCode Server (:3100, host-native)
  ↓
Unraid Host Shell
  ├── claude (2.1.71)
  ├── cooperator (3.36.1)
  ├── glab (1.89.0)
  ├── node (22.18.0) / npm (10.9.3) / bun (1.3.10)
  ├── uv + python 3.12
  ├── docker / docker compose
  ├── git
  └── /mnt/user/projects/
      ├── ampeco/ (18 AMPECO work projects)
      ├── infrastructure/
      ├── claude-skills/
      ├── mikrotik-mcp/
      └── ... (12 personal repos)
```
@@ -2,10 +2,187 @@

**Purpose:** Major infrastructure events only. Minor changes are in git commit messages.

---
## 2026-02-28

### Docker Container Audit & Migration to Dockge
- **[DOCKER]** Removed 4 orphan images: nextcloud/all-in-one, olprog/unraid-docker-webui, ghcr.io/ich777/doh-server, ghcr.io/idmedia/hass-unraid
- **[DOCKER]** Removed ancient pgAdmin4 v2.1 (status=Created) and fenglc/pgadmin4 image
- **[DOCKER]** Removed spaceinvaderone/ha_inabox image (replaced by Home-Assistant-Container)
- **[TRAEFIK]** Removed Docker provider constraint (`traefik.constraint=valid`) — Docker labels now auto-discovered
- **[TRAEFIK]** Cleaned up dynamic.yml: removed 14 stale/migrated router+service pairs (pangolin, pihole, doh, netbox, and services now using Docker labels)
- **[TRAEFIK]** Added dockge-secure router to dynamic.yml
- **[DOCKER]** Created 6 new Dockge stacks: docker-socket-proxy, tuyagateway, firefly, seekandwatch, ha-time-machine, homeassistant (replaced inabox with Container)
- **[DOCKER]** Migrated ALL 53 containers from dockerman to Dockge compose stacks (100% coverage)
- **[DOCKER]** Fixed Nextcloud Traefik rule: empty Host() → Host(`cloud.xtrm-lab.org`)
- **[DOCKER]** Fixed UptimeKuma Traefik rule: empty Host() → Host(`uptime.xtrm-lab.org`)
- **[DOCKER]** Fixed Homarr domain: `homarr.xtrm-lab.org` → `xtrm-lab.org` (root domain)
- **[DOCKER]** Fixed Netdisco entrypoint: `websecure` → `https`
- **[DOCKER]** Removed stale `traefik.constraint=valid` from Dockhand
- **[DOCKER]** Fixed Transmission middleware: removed non-existent `transmission-headers@file`
- **[DOCKER]** Added Authentik forward auth middleware to: n8n, homarr, transmission, speedtest-tracker, uptime-kuma, firefly, seekandwatch, open-webui, traefik dashboard, dockge, netalertx, urbackup, unimus
- **[DOCKER]** Added Traefik labels to: vaultwarden, open-webui (ai.xtrm-lab.org), firefly, seekandwatch
- **[DOCKER]** Added missing Unraid labels (icon, managed, webui) to: ntfy, timemachine, ollama, docker-socket-proxy, tuyagateway, all new stacks
- **[DOCKER]** Moved ollama + open-webui from bridge to dockerproxy network
- **[DOCKER]** Moved fireflyiii + firefly-data-importer from none to dockerproxy network
- **[DOCKER]** Moved SeekAndWatch from bridge to dockerproxy network
- **[DOCKER]** Removed traefik labels from host-network containers (plex, netalertx) — routed via dynamic.yml only
- **[DOCKER]** Fixed NetAlertX: added read_only, proper capabilities (NET_RAW/NET_ADMIN), and UID 20211
- **[DOCKER]** Removed empty netbox stack directory

## 2026-03-09

### Claude Code Tooling Completion
- **[SERVICE]** Installed Cooperator CLI v3.36.1 on Unraid (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Ran `cooperator install --non-interactive` — symlinked commands, agents, 12 skills to `~/.claude/`
- **[SERVICE]** Created `~/.cooperator/.env` with Shortcut API token, Confluence token, git config
- **[SERVICE]** Installed glab CLI v1.89.0 on Unraid (`/usr/local/bin/glab`) — authenticated as kaloyan.danchev
- **[SERVICE]** Installed uv package manager + Python 3.12.13 on Unraid
- **[SERVICE]** Created Python venvs for mikrotik-mcp and unraid-mcp projects
- **[SERVICE]** Copied MikroTik SSH key from Mac to Unraid — SSH to HAP ax3 verified working
- **[SERVICE]** Synced 6 custom Claude skills to `/mnt/user/appdata/claude-code/custom-skills/` (ev-compliance-story, ev-protocol-expert, frontend-designer, mikrotik-admin, prd-generator, unraid-admin)
- **[SERVICE]** Built shortcut MCP server at `/mnt/user/appdata/claude-code/mcp-server-shortcut/`
- **[SERVICE]** Enabled Claude plugins: ralph-loop, claude-md-management, playground
- **[DOCS]** Updated 12-DEVELOPMENT-ENVIRONMENT.md with Cooperator, glab, Python, skills, MCP sections

#### TODO — MCP Server Registration
The following MCP servers are built/ready but need `claude mcp add` registration (requires interactive Claude session on Unraid):
- shortcut, mikrotik, unraid, playwright, smartbear

## 2026-03-08

### Development Environment Setup
- **[SERVICE]** Installed OpenVSCode Server as host-native process (port 3100, not a container) — accessible at https://code.xtrm-lab.org
- **[SERVICE]** Traefik route added in dynamic.yml with Authentik forward auth
- **[SERVICE]** Boot auto-start via `/boot/config/go` → `/mnt/user/appdata/openvscode/start.sh`
- **[SERVICE]** Claude Code updated to v2.1.71, persistent at `/mnt/user/appdata/claude-code/.npm-global/`
- **[SERVICE]** Cooperator CLI v3.36.1 installed globally (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Created `/mnt/user/projects/` workspace with 12 personal repos (Gitea) + 18 AMPECO work projects (GitLab)
- **[DOCS]** Added `12-DEVELOPMENT-ENVIRONMENT.md` documenting full dev environment setup

### Docker Maintenance
- **[DOCKER]** Created Unraid Docker Manager XML templates for 11 containers missing them (adguardhome, gitea, minecraft, ntfy, ollama, open-webui, etc.)
- **[DOCKER]** Pulled new images for all 30 active Dockge stacks, 14 containers received updates
- **[DOCKER]** Cleaned up dangling images: 10.95 GB reclaimed
- **[DOCKER]** Organized all 42 containers into Docker Folders (12 folders: Infrastructure, Security, Monitoring, DevOps, Media, etc.)
- **[DOCKER]** Pushed 6 local-only projects to Gitea (claude-skills, mikrotik-mcp, unraid-mcp, nanobot-mcp, nanobot-hkuds, openclaw)

### Service Fixes
- **[FIX]** Gitea DB connection: fixed hardcoded PostgreSQL IP (172.18.0.13) → hostname `postgresql17` in compose and app.ini
- **[FIX]** Traefik: removed stale stopped container blocking restart
- **[FIX]** Redis: removed stale stopped container blocking recreate

## 2026-02-26

### WiFi & CAP VLAN Fixes
- **[WIFI]** Fixed 5GHz channel overlap: HAP wifi1 reduced from 80MHz to 40MHz at 5180MHz, CAP cap-wifi1 at 5220MHz (no overlap)
- **[WIFI]** Restored all 29 WiFi access-list MAC→VLAN entries (were missing/lost)
- **[WIFI]** Fixed cap-wifi2 band mismatch: was `band=2ghz-n` with frequency=5220 (5GHz), corrected to frequency=2412
- **[CAPSMAN]** Enabled bridge VLAN filtering on CAP (cAP XL ac) — all VLANs now properly tagged through CAP
- **[CAPSMAN]** CAP bridgeLocal config: vlan-filtering=yes, pvid=10, VLANs 10/20/25/30/35/40 with proper tagged/untagged members
- **[CAPSMAN]** Set `capdp` datapath vlan-id=40 for default PVID on dynamic wifi bridge ports
- **[CAPSMAN]** VLAN assignment through CAP now working — access-list vlan-id entries propagate correctly
- **[NETWORK]** Fixed AdGuard Home IP conflict: container was at 192.168.10.2 (CAP's IP), now static at 192.168.10.10
- **[NETWORK]** Fixed adguardhome-sync IP conflict: was at 192.168.10.3 (CSS326's IP), now static at 192.168.10.11
- **[WIFI]** Added Xiaomi Air Purifier 2 (C8:5C:CC:40:B4:AA) to access-list as VLAN 30 (IoT)

### WiFi Quality Optimization
- **[WIFI]** Fixed 2.4GHz co-channel interference: HAP on ch 1 (2412), CAP moved from ch 1 to ch 6 (2437)
- **[WIFI]** Fixed 5GHz overlap: HAP stays ch 36 (5180, 40MHz), CAP moved from ch 44 (5220) to ch 52 (5260, DFS)
- **[WIFI]** Fixed CAP 2.4GHz width from 40MHz to 20MHz for IoT compatibility
- **[WIFI]** TX power kept at defaults (17/16 dBm) — reduction caused kitchen coverage loss through concrete walls

## 2026-02-24

### Motherboard Replacement & NVMe Cache Pool
- **[HARDWARE]** Replaced XTRM-U motherboard — new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
- **[HARDWARE]** Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead — clicking heads, fails on multiple SATA ports and new motherboard
- **[STORAGE]** Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
- **[STORAGE]** Pool settings: autotrim=on, compression=on
- **[DOCKER]** Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
- **[DOCKER]** Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
- **[DOCKER]** Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
- **[DOCKER]** Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
- **[STORAGE]** Deleted old `docker.img` (200GB) from disk1
- **[INCIDENT]** disk1 still running in parity-emulated mode — replacement drive needed

### Post-Migration Container Cleanup
- **[NETWORK]** Fixed Traefik unreachable: removed stale Docker bridge (duplicate 172.18.0.0/16 subnet) + 7 orphaned bridges
- **[DOCKER]** Removed deprecated containers: DoH-Server, binhex-plexpass (duplicate of Plex)
- **[DOCKER]** Removed obsolete containers: HomeAssistant_inabox, Docker-WebUI, hass-unraid
- **[DOCKER]** Removed nextcloud-aio-mastercontainer (replaced by Nextcloud container)
- **[SERVICE]** Fixed adguardhome-sync: recreated config file (was directory from migration), switched to br0 network for macvlan reachability
- **[SERVICE]** Fixed diode stack: recreated .env, nginx.conf, OAuth2 client config; ran Hydra DB migration and client bootstrap
- **[SERVICE]** Fixed diode-agent: corrected YAML format, secrets, and Hydra authentication
- **[SERVICE]** Started unmarr (Homarr fork, 172.18.0.81) and rustfs (S3-compatible storage)
- **[DOCKER]** Final state: 53 containers running, pgAdmin4 stopped (utility)
- **[DOCS]** Updated 03-SERVICES-OTHER.md with removed containers
|
||||
|
||||
---
|
||||
|
||||
## 2026-02-14

### CAP XL ac Recovery
- **[WIRELESS]** Factory reset CAP XL ac (lost credentials)
- **[WIRELESS]** Reconfigured CAPsMAN: regenerated certificate, CAP re-enrolled with `certificate=request`
- **[WIRELESS]** Both CAP radios now active: wifi1 (2.4GHz XTRM2) + wifi2 (5GHz XTRM)
- **[WIRELESS]** CAP now running RouterOS 7.21.1
- **[WIRELESS]** Enabled SSH on CAP port 2222 for user xtrm with mikrotik key
- **[WIRELESS]** Confirmed WiFi access list has no VLAN assignment (rolled back Jan 27)

### Roms Network Share
- **[SERVICE]** Shared /mnt/user/roms (2.3TB, 49 systems) via SMB from Unraid
- **[SERVICE]** Mounted on Nobara at /mnt/roms (fstab, CIFS guest, systemd.automount)
- **[SERVICE]** Mounted on Recalbox via custom.sh boot script (CIFS bind mounts)
- **[SERVICE]** Deleted local roms from Recalbox SD card (~12.5GB freed)

### WiFi DHCP Fix
- **[NETWORK]** Fixed DHCP not working on HAP1 local WiFi (wifi1/wifi2)
- **[NETWORK]** Root cause: VLAN 40 had wifi1/wifi2 as **tagged** instead of **untagged** — DHCP responses had 802.1Q tags clients couldn't process
- **[NETWORK]** Fix: `/interface bridge vlan set` wifi1,wifi2 to untagged for VLAN 40

### Minecraft Server Deployed
- **[SERVICE]** Deployed Minecraft Java Edition (itzg/minecraft-server) on Unraid
- **[SERVICE]** Version 1.21.11, Survival mode, 2GB RAM, max 10 players
- **[SERVICE]** Docker IP 172.18.0.80, port 25565, Dockge stack `minecraft`
- **[NETWORK]** NAT port forward WAN:25565 → 192.168.10.20:25565
- **[NETWORK]** Hairpin NAT for internal access via minecraft.xtrm-lab.org
- **[SERVICE]** Added Unraid labels with Minecraft icon

### Documentation Updates
- **[DOCS]** Updated 07-WIFI-CAPSMAN-CONFIG.md: CAP both radios working, access list status
- **[DOCS]** Updated 01-NETWORK-MAP.md: Fixed CAP IP (.6→.2), added Nobara and SMB shares
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md: CAP details, added Recalbox device
- **[DOCS]** Updated 06-VLAN-DEVICE-ASSIGNMENT.md: Added Nobara (VLAN 10) and Recalbox (VLAN 25)
- **[DOCS]** Updated 03-SERVICES-OTHER.md: Added Roms SMB share, Minecraft server section

---
## 2026-02-13

### Failover Infrastructure Deployed
- **[SERVICE]** Deployed Docker failover stack on XTRM-Nobara (Traefik, Vaultwarden, Authentik, AdGuard Home)
- **[SERVICE]** Installed Docker CE 29.2.1 + Docker Compose 5.0.2 on Nobara
- **[SERVICE]** Deployed Keepalived VRRP for automatic failover (VIP: 192.168.10.250)
- **[SERVICE]** Unraid: Keepalived as Docker container (local/keepalived, MASTER priority 150)
- **[SERVICE]** Nobara: Keepalived as systemd service (BACKUP priority 100)
- **[SERVICE]** Replicated data: Vaultwarden DB, Authentik PostgreSQL dump (864MB), AdGuard config, Traefik certs
- **[NETWORK]** Added VRRP protocol to Nobara firewall (firewalld)
- **[NETWORK]** Configured SSH key auth to Nobara (id_ed25519_nobara, passwordless)
- **[NETWORK]** Added SSH config alias: `ssh nobara`
- **[DOCS]** Created 10-FAILOVER-NOBARA.md with full failover documentation
- **[DOCS]** Updated 02-SERVICES-CRITICAL.md with failover section
- **[DOCS]** Updated 04-HARDWARE-INVENTORY.md with XTRM-Nobara specs
- **[DOCS]** Updated README.md and CLAUDE.md with Nobara references

---
## 2026-02-06

### Unraid Flash Drive Failure
- **[INCIDENT]** Unraid flash drive crashing - migration procedure created
- **[DOCS]** Created incident report with full flash drive replacement procedure

### Documentation Restructure
- **[DOCS]** Restructured docs/ from 23 files to clean 9-doc structure
- **[DOCS]** Archived 12 completed VLAN migration project docs to archive/vlan-migration/
200
docs/incidents/2026-02-06-unraid-flash-drive-failure.md
Normal file
@@ -0,0 +1,200 @@
# Incident: Unraid Flash Drive Failure

**Date:** 2026-02-06
**Severity:** P1 - Server at risk
**Status:** In Progress
**Affected:** XTRM-U (Unraid NAS)

---

## Symptoms

The Unraid flash drive is crashing and unstable, with a risk of complete failure and loss of the boot configuration.

---

## Migration Procedure: Replace Flash Drive
### Step 1: Retrieve Flash Backup

Try these options in order of preference:

**Option A - Fresh backup from WebGUI (if server still boots):**
1. Open http://192.168.10.20 in a browser
2. Go to **Main** tab → click on **Flash** device
3. Under Flash Device Settings, click **FLASH BACKUP**
4. Download the ZIP file to your Mac

**Option B - Google Drive (daily Rclone backup):**
```bash
# From Mac (if rclone is installed)
rclone copy drive:Backups/unraid-flash ~/Desktop/unraid-flash-backup/

# Or download manually from Google Drive web UI
# Folder: Backups/unraid-flash
```

**Option C - Local backup on Unraid (if the server boots but the WebGUI is broken):**
```bash
# On the server: confirm the backup is there, then log out
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422
ls /mnt/user/Backup/unraid-flash/
exit

# Back on the Mac: copy it off the server (quote the remote glob so the
# local shell, e.g. zsh, does not try to expand it)
mkdir -p ~/Desktop/unraid-flash-backup
scp -r -P 422 -i ~/.ssh/id_ed25519_unraid 'root@192.168.10.20:/mnt/user/Backup/unraid-flash/*' ~/Desktop/unraid-flash-backup/
```

**Option D - Direct copy from failing drive:**
1. Shut down server
2. Remove flash drive, insert into Mac
3. Copy entire contents to `~/Desktop/unraid-flash-backup/`
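
Before writing anything to the new drive, a quick sanity check on the backup can save a second attempt. A minimal sketch, assuming the backup has been extracted (or copied) so that `config/` sits at the top of the backup directory; it looks for two files Unraid normally keeps there, `config/super.dat` (array disk assignments) and a `*.key` license file. `check_flash_backup` is a hypothetical helper, not part of Unraid:

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an extracted flash backup directory.
# Assumes config/ is at the top level of the directory passed in.
check_flash_backup() {
  local dir="$1" missing=0
  if [[ ! -d "$dir/config" ]]; then
    echo "MISSING: $dir/config/"
    return 1
  fi
  if [[ ! -f "$dir/config/super.dat" ]]; then
    echo "MISSING: config/super.dat (array disk assignments)"
    missing=1
  fi
  # compgen -G succeeds only if the glob matches at least one file
  if ! compgen -G "$dir/config/*.key" > /dev/null; then
    echo "MISSING: config/*.key (license key file)"
    missing=1
  fi
  if (( missing == 0 )); then
    echo "OK: $dir looks like a complete flash backup"
  fi
  return "$missing"
}

# Example (ZIP from Option A):
#   unzip -q ~/Downloads/flash-backup.zip -d ~/Desktop/unraid-flash-backup/
#   check_flash_backup ~/Desktop/unraid-flash-backup
```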
---

### Step 2: Prepare New USB Drive

**Requirements:**
- USB 2.0 recommended (more reliable than USB 3.0 for this purpose)
- Capacity: 4 GB minimum, 32 GB maximum
- Reputable brand (SanDisk, Samsung, Kingston)
- Must have a unique hardware GUID, since the Unraid license is bound to it

**Write the backup to the new drive:**

1. Download [Unraid USB Flash Creator](https://unraid.net/download) for macOS
2. Insert the new USB drive into your Mac
3. Open Flash Creator
4. For **Operating System**, scroll down and select **"Use custom"**
5. Browse to your backup ZIP file from Step 1
6. Select the new USB drive as the destination
7. Click **Write** and wait for completion

**If you don't have a backup ZIP** (only raw files from Option D):
1. In Flash Creator, select the Unraid OS version matching your current install
2. Write a fresh Unraid install to the new drive
3. After writing, mount the drive and copy your backed-up `config/` folder onto it, replacing the default one
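
The raw-files path can be scripted from the Mac. A minimal sketch, assuming the backup is in `~/Desktop/unraid-flash-backup` and the freshly written drive is mounted at `/Volumes/UNRAID` (the volume label Unraid expects). As a design choice it renames the default `config/` to `config.default/` rather than deleting it, so the fresh install can be restored if the copied config turns out to be broken. `restore_flash_config` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: put a backed-up config/ onto a freshly written flash drive.
# Usage: restore_flash_config <backup dir containing config/> <flash mount point>
restore_flash_config() {
  local backup="$1" flash="$2"
  [[ -d "$backup/config" ]] || { echo "no config/ in $backup" >&2; return 1; }
  [[ -d "$flash" ]] || { echo "flash mount $flash not found" >&2; return 1; }
  # Keep the default config as a fallback instead of deleting it outright
  if [[ -d "$flash/config" ]]; then
    rm -rf "$flash/config.default"
    mv "$flash/config" "$flash/config.default"
  fi
  mkdir -p "$flash/config"
  # "src/." copies the directory's contents, including dotfiles
  cp -R "$backup/config/." "$flash/config/"
}

# Example:
#   restore_flash_config ~/Desktop/unraid-flash-backup /Volumes/UNRAID
#   diskutil eject /Volumes/UNRAID
```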
---

### Step 3: Swap Drives and Boot

1. Shut down XTRM-U if still running
2. Remove the old (failing) flash drive
3. Insert the new USB drive
4. Power on the server
5. Wait for boot (1-2 minutes)
6. Try accessing WebGUI at http://192.168.10.20

**If WebGUI doesn't load:**
- Connect a monitor to the server to check boot messages
- Verify the USB drive is detected in BIOS
- Ensure boot order has USB first

---
### Step 4: Transfer License

After booting from the new drive you will see an "Invalid, missing or expired registration key" message. This is expected: the license is still tied to the old drive's GUID.

1. In WebGUI, go to **Tools → Registration**
2. Click **Replace Key**
3. Enter the email address associated with your Unraid account
4. Check your email for the confirmation/license key
5. Follow the link or paste the key file URL into the Registration page
6. Click **Done**

**Important warnings:**
- Replacing the key **permanently blacklists** the old USB drive: it can never be used with Unraid again
- The first license transfer can be done at any time
- Subsequent transfers: once per 12 months via the automated system
- If you need another transfer within 12 months, contact [Unraid support](https://unraid.net/contact) with old GUID, new GUID, license key, and purchase email

**If you can't find your license:**
- Log into https://account.unraid.net to view your keys
- Check email for original purchase confirmation

---
### Step 5: Post-Migration Verification

Run through this checklist after the server is back up:

**Array & Storage:**
- [ ] WebGUI loads at http://192.168.10.20
- [ ] Array starts normally (Main tab → Start)
- [ ] All disks show healthy status
- [ ] Shares are accessible

**Docker & Services:**
```bash
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422

# Check all containers
docker ps -a --format 'table {{.Names}}\t{{.Status}}'

# Start any stopped critical containers (in order),
# giving the database and cache time to come up first:
docker start postgresql17
sleep 30
docker start Redis
sleep 10
docker start traefik
docker start authentik authentik-worker
docker start vaultwarden
```

**Network:**
- [ ] SSH works: `ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422`
- [ ] DNS failover AdGuard reachable: http://192.168.10.10:3000
- [ ] AdGuard sync working (check `docker logs adguardhome-sync --tail 5`)
- [ ] External URLs working (https://xtrm-lab.org)

**Services checklist:**
- [ ] Traefik reverse proxy (https://xtrm-lab.org)
- [ ] Authentik SSO (https://auth.xtrm-lab.org)
- [ ] Gitea (https://git.xtrm-lab.org)
- [ ] Uptime Kuma (https://uptime.xtrm-lab.org)
- [ ] Vaultwarden (https://vault.xtrm-lab.org)
- [ ] Plex (https://plex.xtrm-lab.org)

**Backup:**
- [ ] Verify Rclone config still present: `rclone listremotes` (should show `drive:`)
- [ ] Test flash backup: trigger manual backup from WebGUI or User Scripts
- [ ] Verify cron schedule for flash backup is active
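
The external-URL items can be checked in one pass from any machine with `curl`. A sketch, assuming that any 2xx or 3xx answer counts as "up" (SSO-protected apps commonly redirect to a login page rather than returning 200); the `SERVICE_URLS` list is copied from the Services checklist and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical one-shot check of the public service URLs.
SERVICE_URLS=(
  https://xtrm-lab.org
  https://auth.xtrm-lab.org
  https://git.xtrm-lab.org
  https://uptime.xtrm-lab.org
  https://vault.xtrm-lab.org
  https://plex.xtrm-lab.org
)

# Succeeds for HTTP status codes that mean the service answered (2xx or 3xx).
http_code_ok() {
  [[ "$1" =~ ^[23][0-9][0-9]$ ]]
}

check_services() {
  local url code failed=0
  for url in "${SERVICE_URLS[@]}"; do
    # curl prints 000 when the connection itself fails
    code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
    if http_code_ok "$code"; then
      printf 'OK    %s  %s\n' "$code" "$url"
    else
      printf 'FAIL  %s  %s\n' "$code" "$url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_services
```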
---

### Step 6: Prevention

After successful migration:

1. **Enable Unraid Connect** (if not already) for automated cloud flash backup:
   - Settings → Management Access → Unraid Connect
   - Sign in with your unraid.net account
   - Enable Flash Backup

2. **Verify Rclone cron** is scheduled:
   ```bash
   # Check user scripts plugin for flash backup script
   ls /boot/config/plugins/user.scripts/scripts/
   # Schedules are assumed to be stored alongside the scripts:
   cat /boot/config/plugins/user.scripts/schedule.json
   ```

3. **Keep a spare USB drive** prepared with a fresh Unraid install; this makes future recovery faster

4. **Test backup restoration** periodically; don't wait for a failure to discover your backup is incomplete

---
## References

- [Unraid Docs: Changing the Flash Device](https://docs.unraid.net/unraid-os/system-administration/maintain-and-update/changing-the-flash-device/)
- [Unraid Docs: Licensing FAQ](https://docs.unraid.net/unraid-os/troubleshooting/licensing-faq/)
- Internal: `docs/02-SERVICES-CRITICAL.md` (startup order)

---

## Resolution

*Update this section when migration is complete:*

- **Date resolved:**
- **New USB drive:**
- **License transferred:** Yes/No
- **Services verified:** Yes/No
- **Backup reconfigured:** Yes/No
91
docs/incidents/2026-02-20-disk1-hardware-failure.md
Normal file
@@ -0,0 +1,91 @@
# Incident: Disk1 Hardware Failure (Clicking / SATA Link Failure)

**Date:** 2026-02-20
**Severity:** P2 - Degraded (no redundancy)
**Status:** Open — awaiting replacement drive (motherboard replaced, NVMe cache pool added Feb 24)
**Affected:** XTRM-U (Unraid NAS) — disk1 (data drive)

---

## Summary

disk1 (10TB HGST Ultrastar HUH721010ALE601, serial `2TKK3K1D`) has physically failed. The drive dropped off the SATA bus on Feb 18 at 19:15 and is now clicking (head failure). The Unraid md array is running in **degraded/emulated mode**, reconstructing disk1 data from parity on the fly. All data is intact, but there is **zero redundancy**.

---

## Timeline

| When | What |
|------|------|
| Feb 18 ~19:15 | `ata5: qc timeout` → multiple hard/soft resets → `reset failed, giving up` → `ata5.00: disable device` |
| Feb 18 19:17 | `super.dat` updated — md array marked disk1 as `DISK_DSBL` (213 errors) |
| Feb 20 13:14 | Investigation started. `sdc` completely absent from `/dev/`. ZFS pool `disk1` running on emulated `md1p1` with 0 errors |
| Feb 20 ~13:30 | Server rebooted, disk moved to new SATA port (ata5 → ata6). Same failure: `ata6: reset failed, giving up`. Clicking noise confirmed |
| Feb 24 | Motherboard replaced. Drive still fails on the new hardware and a different SATA port: mechanical failure (clicking heads) confirmed |
| Feb 24 | New cache pool created: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1. Docker migrated from HDD loopback to NVMe ZFS |
## Drive Details

| Field | Value |
|-------|-------|
| Model | HUH721010ALE601 (HGST/WD Ultrastar He10) |
| Serial | 2TKK3K1D |
| Capacity | 10TB (9766436812 1-KiB blocks) |
| Array slot | disk1 (slot 1) |
| Filesystem | ZFS (on md1p1) |
| Last known device | sdc |
| Accumulated md errors | 213 |

## Current State

- **Array**: STARTED, degraded — disk1 emulated from parity (`sdb`)
- **ZFS pool `disk1`**: ONLINE, 0 errors, mounted on `md1p1` (parity reconstruction)
- **Parity drive** (`sdb`, serial `7PHBNYZC`): DISK_OK, 0 errors
- **All services**: Running normally (Docker containers, VMs)
- **Risk**: If the parity drive fails, the emulated disk1 data is **unrecoverable**

## Diagnosis

- Drive fails on multiple SATA ports → not a port/cable issue
- Clicking noise on boot → mechanical head failure
- dmesg shows the link responding but the device never becoming ready → drive electronics partially functional, heads/platters failed
- Drive is beyond DIY repair

## Root Cause

Mechanical failure of the hard drive (clicking = head crash or seized actuator). It is not related to the cache drive migration that happened around the same time: the syslog shows a clean SATA link failure.

---
## Recovery Plan

### Step 1: Get Replacement Drive
- Must be at least 10TB and no larger than the parity drive (Unraid does not accept a data disk larger than parity)
- Check WD warranty for serial `2TKK3K1D` (model HUH721010ALE601) at https://support-en.wd.com/app/warrantycheck
- Any 3.5" SATA drive works (doesn't need to match model)
### Step 2: Install & Rebuild
1. Power off the server
2. Remove dead drive, install replacement in any SATA port
3. Boot Unraid
4. Go to **Main** and find the **Disk 1** slot (will show as "Not installed" or unmapped)
5. Stop the array (disk assignments can only be changed while it is stopped)
6. Assign the new drive to the **Disk 1** slot
7. Start the array — Unraid will prompt to **rebuild** from parity
8. Rebuild will take many hours for 10TB — do NOT interrupt
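
To put a number on "many hours": a rebuild has to write the entire replacement disk, so its duration is roughly capacity divided by sustained throughput. The Drive Details capacity figure (9766436812) only matches 10TB when read as 1-KiB blocks, which is how this sketch treats it. The 150 and 180 MB/s speeds are assumptions for a 7200rpm 10TB drive, not measurements from this server, and concurrent array I/O will slow things further:

```bash
#!/usr/bin/env bash
# Back-of-the-envelope rebuild duration: size / sustained speed.
# Usage: rebuild_hours <size in KiB> <speed in MB/s>  -> whole hours, rounded down
rebuild_hours() {
  local size_kib="$1" mb_per_s="$2"
  local bytes=$(( size_kib * 1024 ))
  local seconds=$(( bytes / (mb_per_s * 1000 * 1000) ))
  echo $(( seconds / 3600 ))
}

# disk1: 9766436812 KiB, about 10.0 TB
#   rebuild_hours 9766436812 180   # prints 15
#   rebuild_hours 9766436812 150   # prints 18
```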
### Step 3: Post-Rebuild
1. Verify ZFS pool `disk1` is healthy: `zpool status disk1`
2. Run parity check from Unraid UI
3. Run SMART extended test on new drive: `smartctl -t long /dev/sdX`
4. Verify all ZFS datasets are intact
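
The `/dev/sdX` placeholder can be resolved from the replacement drive's serial number, and the pool check reduced to a yes/no. A sketch, assuming a SATA drive (so its `/dev/disk/by-id` link is named `ata-<model>_<serial>`) and assuming `zpool status -x <pool>` prints `pool '<pool>' is healthy` for a healthy pool; both helper names and the `NEWSERIAL` placeholder are hypothetical:

```bash
#!/usr/bin/env bash
# Resolve a drive serial to its /dev/sdX node via /dev/disk/by-id.
# The optional second argument overrides the by-id directory.
dev_by_serial() {
  local serial="$1" byid="${2:-/dev/disk/by-id}" link
  # Partition links end in -partN, so they do not match *_<serial>
  for link in "$byid"/ata-*_"$serial"; do
    # An unmatched glob stays literal, so only accept real symlinks
    [[ -L "$link" ]] || continue
    readlink -f "$link"
    return 0
  done
  echo "no drive with serial $serial under $byid" >&2
  return 1
}

# True if the text printed by `zpool status -x <pool>` reports the pool healthy.
zpool_output_healthy() {
  local pool="$1" output="$2"
  [[ "$output" == "pool '$pool' is healthy" ]]
}

# Usage on Unraid:
#   zpool_output_healthy disk1 "$(zpool status -x disk1)" && echo "disk1 healthy"
#   dev=$(dev_by_serial NEWSERIAL) && smartctl -t long "$dev"
#   smartctl -l selftest "$dev"   # later: progress and result of the test
```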
---

## Notes

- The server can keep running in degraded mode, but without parity protection: if the parity drive also fails, disk1's data is lost
- Avoid heavy writes if possible to reduce risk to parity drive
- New cache pool (3x Samsung 990 EVO Plus 1TB, ZFS RAIDZ1) now hosts all Docker containers
- Old docker.img loopback deleted from disk1 (200GB freed)
- Since disk1 uses ZFS on md, the rebuild reconstructs the raw block device — ZFS doesn't need any separate repair
120
docs/wip/HOME-ASSISTANT-SETUP.md
Normal file
@@ -0,0 +1,120 @@
# Home Assistant Setup

**Status:** IN PROGRESS
**Priority:** High
**Started:** 2026-02-07

## Overview

Home Assistant OS (HAOS) running as a libvirt VM on Unraid, with custom dashboards, themes, and smart home integrations.

## Infrastructure

| Component | Detail |
|-----------|--------|
| VM | HAOS on libvirt (Unraid host) |
| IP | 192.168.10.50 (VLAN 10 — Management) |
| Access | Web UI at `http://192.168.10.50:8123` |
| VM Management | `virsh qemu-agent-command "Home Assistant"` from Unraid |
| Container | `homeassistant` Docker container inside HAOS |
## Completed

### Integrations

| Integration | Type | Devices | Notes |
|-------------|------|---------|-------|
| Gree AC (manual) | Custom component (`gree_manual`) | 1 AC unit | Cross-VLAN via L3 unicast; generic key `a3K8Bx%2r8Y7#xDh` |
| Xiaomi Miot | HACS integration (`xiaomi_miot`) | 11 devices | 2FA verification required; BLE sensors work via cloud |
| Tuya / Smart Life | Built-in | Smart Curtain, switches, lights | Paired via Smart Life QR code |
| Roborock | Via Xiaomi Miot | S7 Pro Ultra | On 192.168.31.x (Xiaomi router subnet) |
| Bosch Home Connect | Built-in (`home_connect`) | Oven, Washing Machine | OAuth2 via developer.home-connect.com |

**Xiaomi devices discovered:**
- 4x Miaomiaoce BLE temp/humidity sensors (Living Room, Kitchen, Boys Room, Bedroom)
- 2x Air Purifiers (Living Room — zhimi.mc1, Boys Room — cpa4)
- 1x Air Purifier 4 Compact (on VLAN 30)
- 1x Humidifier (Living Room — deerma.jsq2w)
- 1x Roborock S7 Pro Ultra
- 1x Xiaomi Router
- 1x Mi Smart Home Hub

**Bosch Home Connect devices:**
- Bosch Oven HRG7784B1 — temperature, door state, programs, child lock, remote control
- Bosch Washing Machine WGB24400BY — programs, progress, spin speed, temperature, remaining time, door state, child lock
### Theme — visionOS

| File | Path (inside HA) | Source / Purpose |
|------|-------------------|--------|
| visionOS theme | `/config/themes/visionos.yaml` | [homeassistant-visionos-theme](https://github.com/Nezz/homeassistant-visionos-theme) |
| Liquid Glass theme | `/config/themes/liquid_glass.yaml` | Same repo |
| card-mod.js (v4.2.0) | `/config/www/card-mod.js` | Enables backdrop-filter CSS effects |
| Mushroom Cards (v5.0.10) | `/config/www/mushroom.js` | Clean card collection for dashboards |

- Both JS files registered as Lovelace resources via websocket API (`lovelace/resources/create`)
- visionOS set as default theme via startup automation in `automations.yaml`
### Dashboards

Two custom Lovelace dashboards (not the default Overview):

| Dashboard | URL Path | Columns | Optimized For |
|-----------|----------|---------|---------------|
| Mobile | `/dash-mobile` | 2 | Phone screens |
| Desktop | `/dash-desktop` | 4 | Desktop/tablet |

Both use the **Sections** view type with **Mushroom Cards**, and both were created via the HA websocket API (`lovelace/dashboards/create` + `lovelace/config/save`).

**Sections on both dashboards:**

1. **Header** — Title card + chips (weather, outside temp, vacuum status, phone battery)
2. **Climate Control** — Living Room thermostat, Kitchen thermostat, Gree AC
3. **Temperatures** — 4 indoor Miaomiaoce BLE sensors + outside temp/humidity
4. **Radiators** — Living Room, Main Bedroom, Girls Room
5. **Bosch Appliances** — Oven (status, temp, door) + Washer (status, time left, progress)
6. **Lights** — Living Room, Dining Room, Bedroom LED strip, Picture Frame lamp
7. **Switches** — Entrance (x2), Bathroom (x2), Kids Bathroom (x2), Boys Lamp
8. **Curtain & Vacuum** — Smart Curtain, Roborock S7, Vacuum battery

Desktop dashboard has expanded Bosch sections with program selectors, child lock, spin speed, wash temperature, and remote control status.
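
Since the dashboards were created over the websocket API, a sketch of the two messages may help when recreating them. It assumes an already-authenticated websocket session to `ws://192.168.10.50:8123/api/websocket` (not shown), treats the table's Columns value as the Sections view's `max_columns` option, and saves a placeholder view with no sections; the helper names are hypothetical. The hyphen check mirrors the URL-path rule under Technical Notes:

```bash
#!/usr/bin/env bash
# Conservative url_path check: lowercase words joined by hyphens,
# with at least one hyphen (HA rejects paths like "mobile").
valid_url_path() {
  [[ "$1" =~ ^[a-z0-9]+(-[a-z0-9]+)+$ ]]
}

# lovelace/dashboards/create message (title must not contain double quotes)
dashboard_create_msg() {
  local id="$1" url_path="$2" title="$3"
  if ! valid_url_path "$url_path"; then
    echo "invalid url_path: $url_path (needs a hyphen)" >&2
    return 1
  fi
  printf '{"id":%d,"type":"lovelace/dashboards/create","url_path":"%s","title":"%s","show_in_sidebar":true}\n' \
    "$id" "$url_path" "$title"
}

# lovelace/config/save message with a placeholder Sections view;
# max_columns is assumed to match the "Columns" value in the table
dashboard_save_msg() {
  local id="$1" url_path="$2" columns="$3"
  printf '{"id":%d,"type":"lovelace/config/save","url_path":"%s","config":{"views":[{"type":"sections","title":"Home","max_columns":%d,"sections":[]}]}}\n' \
    "$id" "$url_path" "$columns"
}

# Example (message ids must increase within one websocket session):
#   dashboard_create_msg 1 dash-mobile Mobile
#   dashboard_save_msg 2 dash-mobile 2
```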
### Startup Automation

```yaml
# /config/automations.yaml
- id: set_visionos_theme
  alias: "Set visionOS Theme on Startup"
  trigger:
    - platform: homeassistant
      event: start
  action:
    - service: frontend.set_theme
      data:
        name: visionos
```
## Known Issues

| Issue | Detail | Workaround |
|-------|--------|------------|
| Xiaomi devices on 192.168.31.x unreachable | Air purifiers, humidifier, Roborock on Xiaomi router subnet | Switch to cloud-only polling in xiaomi_miot config |
| Cross-VLAN broadcast discovery | UDP broadcasts don't cross VLANs | Use manual IP config (e.g., `gree_manual` component) |
| `/config/www/` not served after creation | HA needs full core restart to detect new `www` directory | `ha core restart` from HAOS VM |
| `automations.yaml` syntax | Appending list items after the default empty `[]` produces invalid YAML | Always overwrite the whole file, never append after `[]` |
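
One way to follow the "always overwrite" rule is to stage the complete file and swap it in with a single rename. A minimal sketch with a hypothetical `write_automations` helper; it assumes the temporary file can be created in the same directory as the target, which is what makes the final `mv` an atomic replace:

```bash
#!/usr/bin/env bash
# Replace automations.yaml with a complete, freshly written file in one step,
# instead of appending to whatever is already there.
write_automations() {
  local target="$1" content="$2" tmp
  # Temp file in the same directory, so the final mv is an atomic rename
  tmp=$(mktemp "$(dirname "$target")/.automations.XXXXXX") || return 1
  if ! printf '%s\n' "$content" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 644 "$tmp"   # mktemp creates 0600 files
  mv -f "$tmp" "$target"
}

# Example (automations.full.yaml is a hypothetical staged copy):
#   write_automations /config/automations.yaml "$(cat automations.full.yaml)"
```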
## Pending Work

- [ ] Switch Xiaomi air purifiers/humidifier to cloud-only mode for reliable polling
- [ ] Add more dashboard sections as new devices are added
- [ ] Evaluate HACS frontend cards (mini-graph-card, apexcharts-card) for richer data display
- [ ] Set up HA mobile app companion for phone notifications and presence detection
## Technical Notes

- **File transfer to HA:** base64 encode → virsh guest-exec → docker exec -i tee (chunk at 50KB for large files)
- **HAOS minimal toolset:** Only `/usr/bin/curl`, `/sbin/ip` available — no wget/nc/python3/which
- **Dashboard URL paths:** Must contain a hyphen (e.g., `dash-mobile`, not `mobile`)
- **Lovelace resources:** Must be registered via websocket API, not by writing storage files directly
- **HA API token:** Generated a JWT signed with HMAC-SHA256 using the refresh token's `jwt_key` from `/config/.storage/auth`
- **Home Connect OAuth2:** Register app at developer.home-connect.com, use redirect URI `https://my.home-assistant.io/redirect/oauth`, disable "One Time Token Mode" (breaks HA token refresh)
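
A sketch of the file-transfer note, split into a chunking step and a request builder. It assumes the 50KB limit applies to the base64 text (51200 characters, a multiple of 4, so each chunk decodes on its own), that the docker CLI is at `/usr/bin/docker` inside HAOS, and that requests go through the QEMU guest agent's `guest-exec` command, which base64-decodes `input-data` onto the process's stdin. Helper names and local paths are hypothetical:

```bash
#!/usr/bin/env bash
# 51200 is a multiple of 4, so every chunk is valid base64 by itself and the
# decoded chunks concatenate back to the original file.
CHUNK_CHARS=51200

# Split the base64 encoding of $1 into chunk.aaaa, chunk.aaab, ... in the
# (empty) directory $2, and print how many chunks were written.
b64_chunks() {
  local src="$1" outdir="$2"
  mkdir -p "$outdir"
  base64 < "$src" | tr -d '\n' | fold -w "$CHUNK_CHARS" |
    split -l 1 -a 4 - "$outdir/chunk."
  ls "$outdir" | wc -l | tr -d ' '
}

# Build a guest-exec request that pipes one chunk into
# `docker exec -i homeassistant tee -a <dest>` inside the HAOS VM.
guest_exec_json() {
  local chunk_file="$1" dest="$2"
  printf '{"execute":"guest-exec","arguments":{"path":"/usr/bin/docker","arg":["exec","-i","homeassistant","tee","-a","%s"],"input-data":"%s","capture-output":false}}' \
    "$dest" "$(tr -d '\n' < "$chunk_file")"
}

# Usage on Unraid. tee -a appends, so truncate the destination first.
# guest-exec returns a PID immediately: poll guest-exec-status for it
# before sending the next chunk so the appends stay in order.
#   n=$(b64_chunks card-mod.js /tmp/cm)
#   for c in /tmp/cm/chunk.*; do
#     virsh qemu-agent-command "Home Assistant" "$(guest_exec_json "$c" /config/www/card-mod.js)"
#   done
```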
@@ -30,6 +30,12 @@ Planned changes, evaluations, and ideas not yet implemented.
| [CONSOLE-PORT-ETHER5.md](CONSOLE-PORT-ETHER5.md) | EVALUATING | Low | Console/serial port on HAP1 ether5 |
| [KVM-SWITCH-MAC-NOBARA.md](KVM-SWITCH-MAC-NOBARA.md) | EVALUATING | Medium | Software KVM for Mac/Nobara switching |

### Smart Home

| Document | Status | Priority | Description |
|----------|--------|----------|-------------|
| [HOME-ASSISTANT-SETUP.md](HOME-ASSISTANT-SETUP.md) | IN PROGRESS | High | HAOS VM, dashboards, themes, integrations |

### Applications

| Document | Status | Priority | Description |
66
scripts/vw-sync.sh
Executable file
@@ -0,0 +1,66 @@
#!/bin/bash
# Vaultwarden Sync: Unraid → MikroTik (cold standby)
# Run this from your Mac (must have VPN/network access to both devices)
#
# Usage: ./vw-sync.sh
# Syncs the Vaultwarden database from Unraid to the MikroTik standby instance.
# The MikroTik container must be STOPPED during sync.
#
# Note: db.sqlite3 is copied while Vaultwarden on Unraid keeps running, so
# writes still held in db.sqlite3-wal are not part of the copy.

set -euo pipefail

# SSH commands as arrays so every option stays a separate, correctly quoted word
UNRAID_SSH=(ssh -i "$HOME/.ssh/id_ed25519_unraid" -p 422 root@192.168.10.20)
MIKROTIK_SSH=(ssh -i "$HOME/.ssh/mikrotik_key" -p 2222 xtrm@192.168.10.1)
UNRAID_IP="192.168.10.20"
UNRAID_VW_PATH="/mnt/user/appdata/vaultwarden"
MIKROTIK_USB_PATH="usb1/vaultwarden/data"
HTTP_PORT=8888
PHP_SERVER="php -S 0.0.0.0:$HTTP_PORT"
# "[p]hp" matches the server process but not the remote shell running pkill
PHP_KILL_PATTERN="[p]hp -S 0.0.0.0:$HTTP_PORT"

# Print the standby container status; RouterOS output may end lines with \r
container_status() {
    "${MIKROTIK_SSH[@]}" ':foreach c in=[/container/find where name~"server"] do={:put [/container/get $c status]}' 2>/dev/null | tr -d '\r' || echo "unknown"
}

stop_http_server() {
    "${UNRAID_SSH[@]}" "pkill -f '$PHP_KILL_PATTERN'" 2>/dev/null || true
}

# Download one file from the temporary HTTP server onto the MikroTik USB disk
fetch_to_mikrotik() {
    local file="$1"
    "${MIKROTIK_SSH[@]}" "/tool/fetch url=\"http://$UNRAID_IP:$HTTP_PORT/$file\" dst-path=\"$MIKROTIK_USB_PATH/$file\""
}

echo "=== Vaultwarden Sync: Unraid → MikroTik ==="
echo ""

# 1. Check MikroTik container is stopped
echo "[1/5] Checking MikroTik Vaultwarden container status..."
STATUS=$(container_status)
if [[ "$STATUS" == *running* ]]; then
    echo "  Container is running. Stopping it..."
    "${MIKROTIK_SSH[@]}" '/container/stop [find where name~"server"]'
    sleep 5
    STATUS=$(container_status)
fi
if [[ "$STATUS" == *running* ]]; then
    echo "  ERROR: Container is still running. Aborting."
    exit 1
fi
echo "  Container is stopped."

# 2. Start temporary HTTP server on Unraid
echo "[2/5] Starting temp HTTP server on Unraid (port $HTTP_PORT)..."
trap stop_http_server EXIT   # shut the server down even if a later step fails
"${UNRAID_SSH[@]}" "cd '$UNRAID_VW_PATH' && nohup $PHP_SERVER </dev/null >/dev/null 2>&1 &"
sleep 2

# Verify it's responding
HTTP_CODE=$("${UNRAID_SSH[@]}" "curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:$HTTP_PORT/db.sqlite3" 2>/dev/null || true)
if [[ "$HTTP_CODE" != "200" ]]; then
    echo "  ERROR: HTTP server not responding (got '$HTTP_CODE'). Aborting."
    exit 1
fi
echo "  HTTP server ready."

# 3. Fetch database to MikroTik
echo "[3/5] Syncing database to MikroTik..."
fetch_to_mikrotik db.sqlite3
echo ""

# 4. Fetch RSA key and config
echo "[4/5] Syncing RSA key and config..."
fetch_to_mikrotik rsa_key.pem
fetch_to_mikrotik config.json
echo ""

# 5. Cleanup
echo "[5/5] Stopping HTTP server on Unraid..."
stop_http_server
trap - EXIT

echo ""
echo "=== Sync complete! ==="
echo ""
echo "To START the standby Vaultwarden:"
echo "  ${MIKROTIK_SSH[*]} '/container/start [find where name~\"server\"]'"
echo ""
echo "To STOP it after maintenance:"
echo "  ${MIKROTIK_SSH[*]} '/container/stop [find where name~\"server\"]'"
echo ""
echo "Access URL: http://192.168.10.1:4743"