Compare commits: 0119c4d4d8...main (7 commits)

| Author | SHA1 | Date |
|---|---|---|
| | d0b4fae25e | |
| | 6320c0f8d9 | |
| | 8aef54992a | |
| | 7867b5c950 | |
| | cdb961f943 | |
| | 877aa71d3e | |
| | bf6a62a275 | |
@@ -68,6 +68,7 @@ infrastructure/
|
|||||||
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
|
├── 08-DNS-ARCHITECTURE.md # DNS failover architecture
|
||||||
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
|
├── 09-TAILSCALE-VPN.md # Tailscale VPN setup
|
||||||
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara
|
├── 10-FAILOVER-NOBARA.md # VRRP failover to Nobara
|
||||||
|
├── 12-WIFI-TROUBLESHOOTING.md # WiFi/CAPsMAN troubleshooting guide
|
||||||
├── CHANGELOG.md # Change history
|
├── CHANGELOG.md # Change history
|
||||||
├── archive/ # Completed/legacy docs
|
├── archive/ # Completed/legacy docs
|
||||||
│ └── vlan-migration/ # VLAN migration project artifacts
|
│ └── vlan-migration/ # VLAN migration project artifacts
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# Other Services
|
# Other Services
|
||||||
|
|
||||||
**Last Updated:** 2026-02-14
|
**Last Updated:** 2026-02-24
|
||||||
|
|
||||||
Non-critical services that enhance functionality but don't affect core network operation.
|
Non-critical services that enhance functionality but don't affect core network operation.
|
||||||
|
|
||||||
@@ -300,3 +300,8 @@ Non-critical services that enhance functionality but don't affect core network o
|
|||||||
| Pi-hole | Replaced by AdGuard Home | Removed |
|
| Pi-hole | Replaced by AdGuard Home | Removed |
|
||||||
| Pangolin | Not in use | Removed |
|
| Pangolin | Not in use | Removed |
|
||||||
| Slurp'it | Replaced by Diode | Removed |
|
| Slurp'it | Replaced by Diode | Removed |
|
||||||
|
| binhex-plexpass | Duplicate of Plex | Removed |
|
||||||
|
| HomeAssistant_inabox | Duplicate of Home-Assistant-Container | Removed |
|
||||||
|
| Docker-WebUI | Unused, non-functional | Removed |
|
||||||
|
| hass-unraid | No config, unused | Removed |
|
||||||
|
| nextcloud-aio-mastercontainer | Replaced by Nextcloud container | Removed |
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# Hardware Inventory
|
# Hardware Inventory
|
||||||
|
|
||||||
**Last Updated:** 2026-02-14
|
**Last Updated:** 2026-02-24
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -109,18 +109,27 @@
|
|||||||
| **IP** | 192.168.10.20 |
|
| **IP** | 192.168.10.20 |
|
||||||
| **OS** | Unraid 6.x |
|
| **OS** | Unraid 6.x |
|
||||||
|
|
||||||
|
**Motherboard:** Replaced 2026-02-24 (new board, details TBD)
|
||||||
|
|
||||||
**Network:**
|
**Network:**
|
||||||
| Interface | MAC | Speed |
|
| Interface | MAC | Speed |
|
||||||
|-----------|-----|-------|
|
|-----------|-----|-------|
|
||||||
| eth1 | A8:B8:E0:02:B6:15 | 2.5G |
|
| br0 | 38:05:25:35:8E:7A | 2.5G |
|
||||||
| eth2 | A8:B8:E0:02:B6:16 | 2.5G |
|
|
||||||
| eth3 | A8:B8:E0:02:B6:17 | 2.5G |
|
|
||||||
| eth4 | A8:B8:E0:02:B6:18 | 2.5G |
|
|
||||||
| **bond0** | (virtual) | 5G aggregate |
|
|
||||||
|
|
||||||
**Storage:**
|
**Storage:**
|
||||||
- Cache: (current NVMe)
|
| Device | Model | Size | Role | Status |
|
||||||
- Array: 3.5" HDDs
|
|--------|-------|------|------|--------|
|
||||||
|
| sdb | HUH721010ALE601 (serial 7PHBNYZC) | 10TB | Parity | OK |
|
||||||
|
| disk1 | HUH721010ALE601 (serial 2TKK3K1D) | 10TB | Data (ZFS) | **FAILED** — clicking/head crash, emulated from parity |
|
||||||
|
| nvme0n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
|
||||||
|
| nvme1n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
|
||||||
|
| nvme2n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
|
||||||
|
|
||||||
|
**ZFS Pools:**
|
||||||
|
| Pool | Devices | Profile | Usable | Purpose |
|
||||||
|
|------|---------|---------|--------|---------|
|
||||||
|
| disk1 | md1p1 (parity-emulated) | single | 9.1TB | Main data (roms, media, appdata, backups) |
|
||||||
|
| cache | 3x Samsung 990 EVO Plus 1TB NVMe | RAIDZ1 | ~1.8TB | Docker, containers |
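
The `~1.8TB` usable figure for the cache pool is consistent with simple RAIDZ1 arithmetic: one drive's worth of capacity goes to parity and the remainder is reported in binary units. A minimal sketch of that calculation, assuming drive sizes are given in decimal TB and ignoring ZFS metadata and reserved space (the function name is illustrative, not part of any tool):

```bash
# Approximate RAIDZ1 usable capacity in TiB.
# Assumption: usable = (disks - 1) * size, with sizes in decimal TB.
# ZFS metadata, padding, and reserved slop space are ignored.
raidz1_usable_tib() {
  awk -v n="$1" -v tb="$2" \
    'BEGIN { printf "%.2f\n", (n - 1) * tb * 1e12 / (1024 ^ 4) }'
}

raidz1_usable_tib 3 1   # three 1TB NVMe drives
```

The same decimal-to-binary conversion applied to a single 10TB disk gives roughly 9.1, which matches the `disk1` row.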
|
||||||
|
|
||||||
**Virtual IPs:**
|
**Virtual IPs:**
|
||||||
| IP | Purpose |
|
| IP | Purpose |
|
||||||
@@ -223,6 +232,7 @@ See: `wip/UPGRADE-2026-HARDWARE.md`
|
|||||||
|--------|------|--------|
|
|--------|------|--------|
|
||||||
| XTRM-N5 (Minisforum N5 Air) | Production server | Planned |
|
| XTRM-N5 (Minisforum N5 Air) | Production server | Planned |
|
||||||
| XTRM-N1 (N100 ITX) | Survival node | Planned |
|
| XTRM-N1 (N100 ITX) | Survival node | Planned |
|
||||||
| 3x Samsung 990 EVO Plus 1TB | XTRM-N5 NVMe pool | Planned |
|
| 3x Samsung 990 EVO Plus 1TB | XTRM-U cache pool (RAIDZ1) | **Installed** 2026-02-24 |
|
||||||
| 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned |
|
| 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned |
|
||||||
|
| 1x 10TB+ HDD | Replace failed disk1 | **Needed** |
|
||||||
| MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future |
|
| MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future |
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# WiFi and CAPsMAN Configuration
|
# WiFi and CAPsMAN Configuration
|
||||||
|
|
||||||
**Last Updated:** 2026-02-14
|
**Last Updated:** 2026-03-12
|
||||||
**Purpose:** Document WiFi network settings, CAPsMAN configuration, and device compatibility requirements
|
**Purpose:** Document WiFi network settings, CAPsMAN configuration, and device compatibility requirements
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -23,11 +23,12 @@
|
|||||||
| SSID | XTRM |
|
| SSID | XTRM |
|
||||||
| Band | 5GHz |
|
| Band | 5GHz |
|
||||||
| Mode | 802.11ax (WiFi 6) |
|
| Mode | 802.11ax (WiFi 6) |
|
||||||
| Channel | Auto (DFS enabled) |
|
| Channel | 5745 MHz (ch 149) |
|
||||||
| Width | 80MHz |
|
| Width | 20/40/80MHz |
|
||||||
| Security | WPA2-PSK + WPA3-PSK |
|
| Security | WPA2-PSK + WPA3-PSK |
|
||||||
| Cipher | CCMP (AES) |
|
| Cipher | CCMP (AES) |
|
||||||
| 802.11r (FT) | Enabled |
|
| 802.11r (FT) | Disabled |
|
||||||
|
| Skip DFS | All |
|
||||||
| Password | `M0stW4nt3d@home` |
|
| Password | `M0stW4nt3d@home` |
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -44,12 +45,14 @@ Some devices (Tuya JMWZG1 gateway, Amazfit TREX3, iPad 2) require legacy setting
|
|||||||
|---------|-------|--------|
|
|---------|-------|--------|
|
||||||
| SSID | XTRM2 | |
|
| SSID | XTRM2 | |
|
||||||
| Band | 2.4GHz | IoT compatibility |
|
| Band | 2.4GHz | IoT compatibility |
|
||||||
| Mode | **802.11g** | Legacy device support |
|
| Mode | **802.11g** | Legacy device support (NOT 802.11n — breaks IoT) |
|
||||||
| Channel | **1 (2412 MHz)** | Most compatible |
|
| Channel | **1 (2412 MHz)** | Most compatible |
|
||||||
| Width | **20MHz** | Required for old devices |
|
| Width | **20MHz** | Required for old devices |
|
||||||
| Security | **WPA-PSK + WPA2-PSK** | WPA needed for legacy |
|
| Security | **WPA-PSK + WPA2-PSK** | WPA needed for legacy |
|
||||||
| Cipher | **TKIP + CCMP** | TKIP required for old devices |
|
| Cipher | **TKIP + CCMP** | TKIP required for old devices |
|
||||||
| 802.11r (FT) | **Disabled** | Causes issues with IoT |
|
| 802.11r (FT) | **Disabled** | Causes issues with IoT |
|
||||||
|
|
||||||
|
**CRITICAL:** Security must be set explicitly on the interface, not just the profile. Empty `security.authentication-types=""` means OPEN network, not "inherit from profile." See [12-WIFI-TROUBLESHOOTING.md](12-WIFI-TROUBLESHOOTING.md).
|
||||||
| Password | `M0stW4nt3d@IoT` | |
|
| Password | `M0stW4nt3d@IoT` | |
|
||||||
|
|
||||||
### Devices Requiring WPA + TKIP
|
### Devices Requiring WPA + TKIP
|
||||||
@@ -98,44 +101,73 @@ If devices still can't connect, use WPA-only with TKIP-only:
|
|||||||
| Interfaces | bridge, vlan10-mgmt |
|
| Interfaces | bridge, vlan10-mgmt |
|
||||||
| Certificate | Auto-generated |
|
| Certificate | Auto-generated |
|
||||||
|
|
||||||
### CAP Device (CAP XL ac - 192.168.10.2)
|
### CAP Device (cAP XL ac - 192.168.10.2)
|
||||||
|
|
||||||
| Setting | Value |
|
| Setting | Value |
|
||||||
|---------|-------|
|
|---------|-------|
|
||||||
| caps-man-addresses | 192.168.10.1 |
|
| caps-man-addresses | 192.168.10.1 |
|
||||||
|
| discovery-interfaces | bridgeLocal |
|
||||||
|
| slaves-datapath | capdp (bridge=bridgeLocal, vlan-id=40) |
|
||||||
| certificate | request |
|
| certificate | request |
|
||||||
| RouterOS | 7.21.1 |
|
| RouterOS | 7.21.1 |
|
||||||
| SSH Port | 2222 |
|
| SSH Port | 2222 |
|
||||||
| SSH | `ssh -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2` |
|
| SSH (via proxy) | See ProxyJump command below |
|
||||||
|
|
||||||
**Note:** CAP was factory reset on 2026-02-13. CAPsMAN certificate was regenerated and CAP re-enrolled with `certificate=request`.
|
**SSH Access:** Direct SSH to the CAP is unreliable. Jump through Unraid instead (the command below uses `ProxyCommand` with `-W`, which is what ProxyJump does under the hood):
|
||||||
|
```bash
|
||||||
|
ssh -o ProxyCommand="ssh -i ~/.ssh/id_ed25519_unraid -p 422 -W %h:%p root@192.168.10.20" -i ~/.ssh/mikrotik_key -p 2222 xtrm@192.168.10.2
|
||||||
|
```
|
||||||
|
|
||||||
|
### CAP Bridge VLAN Filtering
|
||||||
|
|
||||||
|
The CAP runs bridge VLAN filtering to properly tag/untag WiFi client traffic before sending it to the HAP over the trunk link (ether1):
|
||||||
|
|
||||||
|
| Setting | Value |
|
||||||
|
|---------|-------|
|
||||||
|
| bridgeLocal | vlan-filtering=yes, pvid=10 |
|
||||||
|
| ether1 (trunk) | bridge port, PVID=10 |
|
||||||
|
| wifi1, wifi2 | dynamic bridge ports, PVID=40 (set by datapath vlan-id) |
|
||||||
|
|
||||||
|
**Bridge VLAN Table:**
|
||||||
|
|
||||||
|
| VLAN | ether1 | wifi1 | wifi2 | bridgeLocal | Purpose |
|
||||||
|
|------|--------|-------|-------|-------------|---------|
|
||||||
|
| 10 | untagged | - | - | untagged | Management |
|
||||||
|
| 20 | tagged | tagged | tagged | - | Trusted |
|
||||||
|
| 25 | tagged | tagged | tagged | - | Kids |
|
||||||
|
| 30 | tagged | tagged | tagged | - | IoT |
|
||||||
|
| 35 | tagged | tagged | tagged | - | Cameras |
|
||||||
|
| 40 | tagged | untagged | untagged | - | CatchAll (default) |
|
||||||
|
|
||||||
### CAP Interfaces
|
### CAP Interfaces
|
||||||
|
|
||||||
| Interface | Radio | Band | SSID | Security | Status |
|
| Interface | Radio | Band | SSID | Security | Status |
|
||||||
|-----------|-------|------|------|----------|--------|
|
|-----------|-------|------|------|----------|--------|
|
||||||
| cap-wifi1 | wifi1 | 2.4GHz | XTRM2 | WPA2-PSK, CCMP | Working |
|
| cap-wifi1 | MAC :BE | 2.4GHz | XTRM2 | WPA+WPA2, TKIP+CCMP | Working (Ch 13/2472, 20MHz, 802.11g) |
|
||||||
| cap-wifi2 | wifi2 | 5GHz | XTRM | WPA2/WPA3-PSK | Working (Ch 5220, 20/40MHz) |
|
| cap-wifi2 | MAC :BF | 5GHz | XTRM | WPA2/WPA3-PSK, CCMP | Working (Ch 36/5180, 20/40/80MHz, 802.11ac) |
|
||||||
|
|
||||||
**Note:** cap-wifi1 uses cfg-xtrm2 but with WPA2+CCMP only (not WPA+TKIP like the local wifi2). Legacy IoT devices requiring TKIP will only work on HAP1's local wifi2.
|
**Note:** CAP radios swapped after CAPsMAN re-provisioning. Identify by MAC address, not interface name. See [12-WIFI-TROUBLESHOOTING.md](12-WIFI-TROUBLESHOOTING.md) for re-provisioning procedures.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## WiFi Access List
|
## WiFi Access List
|
||||||
|
|
||||||
**Status:** VLAN assignment via access list is **not active** (rolled back 2026-01-27). All entries use `action=accept` without VLAN ID. Devices get their VLAN via DHCP static leases on the bridge.
|
**Status:** VLAN assignment via access list is **active**. Each entry has a `vlan-id` that assigns the device to the correct VLAN upon WiFi association. This works on both HAP (local) and CAP (remote, via bridge VLAN filtering).
|
||||||
|
|
||||||
**29 entries** configured (MAC-based accept rules + 1 default catch-all):
|
**30+ entries** configured (MAC-based accept rules with VLAN IDs + 1 default catch-all):
|
||||||
|
|
||||||
| # | MAC | Device | Notes |
|
| # | MAC | Device | VLAN |
|
||||||
|---|-----|--------|-------|
|
|---|-----|--------|------|
|
||||||
| 0 | AA:ED:8B:2A:40:F1 | Samsung S25 Ultra - Kaloyan | |
|
| 0 | AA:ED:8B:2A:40:F1 | Samsung S25 Ultra - Kaloyan | 20 |
|
||||||
| 1 | 82:6D:FB:D9:E0:47 | MacBook Air - Nora | |
|
| 1 | 82:6D:FB:D9:E0:47 | MacBook Air - Nora | 20 |
|
||||||
| 12 | CE:B8:11:EA:8D:55 | MacBook - Kaloyan | |
|
| 12 | CE:B8:11:EA:8D:55 | MacBook - Kaloyan | 20 |
|
||||||
| 13 | BE:A7:95:87:19:4A | MacBook 5GHz - Kaloyan | |
|
| 13 | BE:A7:95:87:19:4A | MacBook 5GHz - Kaloyan | 20 |
|
||||||
| 27 | B8:27:EB:32:B2:13 | RecalBox RPi3 | VLAN 25 (Kids) |
|
| 27 | B8:27:EB:32:B2:13 | RecalBox RPi3 | 25 |
|
||||||
| 28 | CC:5E:F8:D3:37:D3 | ASUS ROG Ally - Kaloyan | |
|
| 28 | CC:5E:F8:D3:37:D3 | ASUS ROG Ally - Kaloyan | 20 |
|
||||||
| 29 | (any) | Default - VLAN40 | Catch-all |
|
| 31 | C8:5C:CC:40:B4:AA | Xiaomi Air Purifier 2 | 30 |
|
||||||
|
| 32 | (any) | Default - VLAN40 | 40 (catch-all) |
|
||||||
|
|
||||||
|
**Default behavior:** Devices not in the access list get VLAN 40 (CatchAll) via the default rule and the datapath `vlan-id=40`.
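
The lookup described above can be modelled as a first-match table with a fallback. The following is a toy model only, not RouterOS code; it uses a few MACs from the table, and the function name is hypothetical:

```bash
# Toy model of access-list VLAN assignment: a known MAC gets its VLAN,
# anything else falls through to the catch-all VLAN 40.
vlan_for_mac() {
  mac=$(printf '%s' "$1" | tr 'a-f' 'A-F')   # normalise to upper case
  case "$mac" in
    AA:ED:8B:2A:40:F1) echo 20 ;;  # Samsung S25 Ultra
    B8:27:EB:32:B2:13) echo 25 ;;  # RecalBox RPi3
    C8:5C:CC:40:B4:AA) echo 30 ;;  # Xiaomi Air Purifier 2
    *) echo 40 ;;                  # default catch-all
  esac
}

vlan_for_mac b8:27:eb:32:b2:13
```

An unlisted MAC passed to `vlan_for_mac` yields 40, mirroring the default rule.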
|
||||||
|
|
||||||
### Show Full Access List
|
### Show Full Access List
|
||||||
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# DNS Architecture with AdGuard Failover
|
# DNS Architecture with AdGuard Failover
|
||||||
|
|
||||||
**Last Updated:** 2026-02-06
|
**Last Updated:** 2026-02-26
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -194,8 +194,10 @@ Settings are synced from Unraid (source of truth) to MikroTik every 30 minutes.
|
|||||||
|
|
||||||
### Sync Container
|
### Sync Container
|
||||||
|
|
||||||
|
Container: `adguardhome-sync` at 192.168.10.11 (br0 macvlan, static IP)
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
# /mnt/user/appdata/adguard-sync/adguardhome-sync.yaml
|
# /mnt/user/appdata/dockge/stacks/adguard-sync/adguardhome-sync.yaml
|
||||||
cron: "*/30 * * * *"
|
cron: "*/30 * * * *"
|
||||||
runOnStart: true
|
runOnStart: true
|
||||||
|
|
||||||
@@ -204,22 +206,13 @@ origin:
|
|||||||
username: jazzymc
|
username: jazzymc
|
||||||
password: 7RqWElENNbZnPW
|
password: 7RqWElENNbZnPW
|
||||||
|
|
||||||
replicas:
|
replica:
|
||||||
- url: http://192.168.10.1:3000
|
url: http://192.168.10.1:3000
|
||||||
username: jazzymc
|
username: jazzymc
|
||||||
password: 7RqWElENNbZnPW
|
password: 7RqWElENNbZnPW
|
||||||
|
|
||||||
features:
|
|
||||||
dns:
|
|
||||||
serverConfig: false
|
|
||||||
accessLists: true
|
|
||||||
rewrites: true
|
|
||||||
filters: true
|
|
||||||
clientSettings: true
|
|
||||||
services: true
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**Note:** The sync container must be connected to both `dockerproxy` and `br0` networks to reach both AdGuard instances.
|
**Note:** The sync container is on the `br0` macvlan network with a static IP to avoid conflicts with infrastructure devices.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
275
docs/12-DEVELOPMENT-ENVIRONMENT.md
Normal file
@@ -0,0 +1,275 @@
|
|||||||
|
# Development Environment
|
||||||
|
|
||||||
|
**Last Updated:** 2026-03-08
|
||||||
|
|
||||||
|
Web-based development environment running directly on Unraid. It provides a VS Code IDE with full host access to Claude Code, Cooperator CLI, Docker, and all project repositories.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## OpenVSCode Server
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
|----------|-------|
|
||||||
|
| **URL** | https://code.xtrm-lab.org |
|
||||||
|
| **Auth** | Authentik forward auth (SSO) |
|
||||||
|
| **Port** | 3100 (host-native, not a container) |
|
||||||
|
| **Binary** | `/mnt/user/appdata/openvscode/current/` (symlink) |
|
||||||
|
| **Config** | `/mnt/user/appdata/openvscode/config/` |
|
||||||
|
| **Boot Script** | `/mnt/user/appdata/openvscode/start.sh` |
|
||||||
|
| **Log** | `/mnt/user/appdata/openvscode/server.log` |
|
||||||
|
|
||||||
|
**Why host-native?** Running directly on Unraid (not in a container) means the VS Code terminal has full access to `claude`, `cooperator`, `node`, `npm`, `docker`, `git`, and all host tools. No volume mount hacks or container-breaking updates.
|
||||||
|
|
||||||
|
### Persistence
|
||||||
|
|
||||||
|
All data lives on the array (`/mnt/user/`) — survives Unraid OS updates:
|
||||||
|
|
||||||
|
| Component | Path | Purpose |
|
||||||
|
|-----------|------|---------|
|
||||||
|
| Server binary | `/mnt/user/appdata/openvscode/openvscode-server-v1.109.5-linux-x64/` | VS Code server |
|
||||||
|
| Symlink | `/mnt/user/appdata/openvscode/current` → version dir | Easy version switching |
|
||||||
|
| VS Code config | `/mnt/user/appdata/openvscode/config/` | Extensions, settings, themes |
|
||||||
|
| Start script | `/mnt/user/appdata/openvscode/start.sh` | Startup with PATH setup |
|
||||||
|
|
||||||
|
### Updating OpenVSCode Server
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Download new version
|
||||||
|
cd /mnt/user/appdata/openvscode
|
||||||
|
curl -fsSL "https://github.com/gitpod-io/openvscode-server/releases/download/openvscode-server-vX.Y.Z/openvscode-server-vX.Y.Z-linux-x64.tar.gz" -o new.tar.gz
|
||||||
|
tar xzf new.tar.gz && rm new.tar.gz
|
||||||
|
|
||||||
|
# Switch symlink and restart
|
||||||
|
ln -sfn openvscode-server-vX.Y.Z-linux-x64 current
|
||||||
|
pkill -f "openvscode-server.*--port 3100"
|
||||||
|
/mnt/user/appdata/openvscode/start.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Extensions and settings are preserved (stored separately in `config/`).
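
Because each release unpacks into its own directory, switching or rolling back is only a matter of re-pointing the `current` symlink. A minimal sketch of a guarded switch, assuming the directory layout shown above (the helper name `switch_current` is hypothetical):

```bash
# Point "<base>/current" at a version directory, refusing to switch
# if the target directory does not exist.
switch_current() {
  base=$1 target=$2
  [ -d "$base/$target" ] || { echo "missing: $base/$target" >&2; return 1; }
  ln -sfn "$target" "$base/current"
  readlink "$base/current"   # print the new link target for confirmation
}

# Example:
# switch_current /mnt/user/appdata/openvscode openvscode-server-v1.109.5-linux-x64
```

The server still needs to be restarted afterwards, as in the update steps above.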
|
||||||
|
|
||||||
|
### Traefik Routing
|
||||||
|
|
||||||
|
Defined in `/mnt/user/appdata/traefik/dynamic.yml`:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
openvscode-secure:
|
||||||
|
rule: "Host(`code.xtrm-lab.org`)"
|
||||||
|
entryPoints: [https]
|
||||||
|
middlewares: [default-headers, authentik-forward-auth]
|
||||||
|
tls:
|
||||||
|
certResolver: cloudflare
|
||||||
|
service: openvscode
|
||||||
|
|
||||||
|
# ...
|
||||||
|
openvscode:
|
||||||
|
loadBalancer:
|
||||||
|
servers:
|
||||||
|
- url: "http://192.168.10.20:3100"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Claude Code
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
|----------|-------|
|
||||||
|
| **Version** | 2.1.71 |
|
||||||
|
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/claude` |
|
||||||
|
| **Symlink** | `/root/.local/bin/claude` |
|
||||||
|
| **Config** | `/mnt/user/appdata/claude-code/.claude.json` → `/root/.claude.json` |
|
||||||
|
| **Settings** | `/mnt/user/appdata/claude-code/.claude/` → `/root/.claude/` |
|
||||||
|
| **Boot Script** | `/mnt/user/appdata/claude-code/install-claude.sh` |
|
||||||
|
|
||||||
|
### Persistence
|
||||||
|
|
||||||
|
The npm global prefix is set to `/mnt/user/appdata/claude-code/.npm-global/` (array-backed). The boot script creates symlinks from `/root/` to the persistent paths.
|
||||||
|
|
||||||
|
### Updating Claude Code
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source /root/.bashrc
|
||||||
|
npm install -g @anthropic-ai/claude-code
|
||||||
|
claude --version
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cooperator CLI
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
|----------|-------|
|
||||||
|
| **Version** | 3.36.1 |
|
||||||
|
| **Binary** | `/mnt/user/appdata/claude-code/.npm-global/bin/cooperator` |
|
||||||
|
| **Config** | `~/.cooperator/.env` (Shortcut token, Confluence, git config) |
|
||||||
|
| **Registry** | `@ampeco:registry=https://gitlab.com/api/v4/projects/71775017/packages/npm/` |
|
||||||
|
| **npm auth** | `/root/.npmrc` (GitLab PAT) |
|
||||||
|
|
||||||
|
### What Cooperator Install Sets Up
|
||||||
|
|
||||||
|
- **Commands** — `~/.claude/commands/cooperator` → cooperator's claude-commands
|
||||||
|
- **Agents** — `~/.claude/agents/implementation-task-executor.md`
|
||||||
|
- **Skills** — 12 cooperator skills (shortcut-operations, create-feature-story, gitlab-operations, etc.)
|
||||||
|
- **Shortcut API** — validated via `~/.cooperator/.env` token
|
||||||
|
|
||||||
|
### Updating Cooperator
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source /root/.bashrc
|
||||||
|
npm install -g @ampeco/cooperator
|
||||||
|
cooperator --version
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note:** `/root/.npmrc` lives in RAM and is recreated on boot if needed. The GitLab PAT is stored in `/boot/config/go`; a persistent `.npmrc` setup would be needed if the token changes frequently.
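
One way the boot-time recreation could look is sketched below. This is an illustration under assumptions, not the existing boot script: the token is passed in as an argument (for example from a hypothetical `GITLAB_PAT` variable), and the `:_authToken` line follows GitLab's documented npm registry format.

```bash
# Recreate an npm config for the @ampeco GitLab registry.
# Runs in a subshell so the umask change does not leak into the caller.
write_npmrc() (
  target=$1 token=$2
  registry="gitlab.com/api/v4/projects/71775017/packages/npm/"
  umask 077   # keep the token file private
  {
    printf '@ampeco:registry=https://%s\n' "$registry"
    printf '//%s:_authToken=%s\n' "$registry" "$token"
  } > "$target"
)

# Example (GITLAB_PAT is a hypothetical variable name):
# write_npmrc /root/.npmrc "$GITLAB_PAT"
```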
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## GitLab CLI (glab)
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
|----------|-------|
|
||||||
|
| **Version** | 1.89.0 |
|
||||||
|
| **Binary** | `/usr/local/bin/glab` (RAM — lost on reboot) |
|
||||||
|
| **Config** | `~/.config/glab-cli/config.yml` |
|
||||||
|
| **Auth** | GitLab PAT (same as npm registry token) |
|
||||||
|
|
||||||
|
**Note:** The glab binary at `/usr/local/bin/` is lost on every Unraid reboot. Reinstall it from the boot script or keep a copy in appdata.
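
A possible shape for the "persist to appdata" option is sketched here. It assumes a copy of the binary has already been saved on the array; the source path in the usage comment is hypothetical and the helper name is illustrative.

```bash
# Copy a persisted binary back into the RAM-backed PATH at boot.
restore_binary() {
  src=$1 dest=$2
  [ -f "$src" ] || { echo "not found: $src" >&2; return 1; }
  install -m 0755 "$src" "$dest"   # copy and mark executable
}

# Example (hypothetical source path):
# restore_binary /mnt/user/appdata/glab/glab /usr/local/bin/glab
```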
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Python (via uv)
|
||||||
|
|
||||||
|
| Property | Value |
|
||||||
|
|----------|-------|
|
||||||
|
| **uv** | `/root/.local/bin/uv` |
|
||||||
|
| **Python** | 3.12.13 (managed by uv) |
|
||||||
|
| **mikrotik-mcp venv** | `/mnt/user/projects/mikrotik-mcp/venv/` |
|
||||||
|
| **unraid-mcp venv** | `/mnt/user/projects/unraid-mcp/.venv/` |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Custom Skills
|
||||||
|
|
||||||
|
6 custom skills synced from Mac to `/mnt/user/appdata/claude-code/custom-skills/`:
|
||||||
|
|
||||||
|
| Skill | Description |
|
||||||
|
|-------|-------------|
|
||||||
|
| ev-compliance-story | EV regulatory compliance story creation |
|
||||||
|
| ev-protocol-expert | OCPP/OCPI/AFIR protocol expertise |
|
||||||
|
| frontend-designer | Nova/Vue component design |
|
||||||
|
| mikrotik-admin | MikroTik router management via MCP |
|
||||||
|
| prd-generator | Product requirements documents |
|
||||||
|
| unraid-admin | Unraid server management via MCP |
|
||||||
|
|
||||||
|
Symlinked to `~/.claude/skills/` alongside 12 cooperator skills (18 total).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## MCP Servers
|
||||||
|
|
||||||
|
### Pending Registration (TODO)
|
||||||
|
|
||||||
|
The following MCP servers need to be registered via `claude mcp add` on Unraid:
|
||||||
|
|
||||||
|
| Server | Command | Status |
|
||||||
|
|--------|---------|--------|
|
||||||
|
| **shortcut** | `node /mnt/user/appdata/claude-code/mcp-server-shortcut/dist/index.js` | Built, needs `claude mcp add` |
|
||||||
|
| **mikrotik** | `/mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server` | Venv ready, needs `claude mcp add` |
|
||||||
|
| **unraid** | `/mnt/user/projects/unraid-mcp/.venv/bin/python -m unraid_mcp.main` | Venv ready, needs `claude mcp add` |
|
||||||
|
| **playwright** | `npx -y @playwright/mcp@latest --isolated` | npx available, needs `claude mcp add` |
|
||||||
|
| **smartbear** | `npx -y @smartbear/mcp@latest` | npx available, needs `claude mcp add` |
|
||||||
|
|
||||||
|
### Environment Variables for MCPs
|
||||||
|
|
||||||
|
- **mikrotik**: `DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json`
|
||||||
|
- **unraid**: `UNRAID_API_URL`, `UNRAID_API_KEY`, `UNRAID_MCP_TRANSPORT=stdio`, `UNRAID_VERIFY_SSL=false`
|
||||||
|
- **shortcut**: `SHORTCUT_API_TOKEN` (from `~/.cooperator/.env`)
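
The registration step could be scripted roughly as follows. This is a dry-run sketch under assumptions: it presumes `claude mcp add` accepts `--env KEY=value` and a `--` separator before the server command, and option placement relative to the server name has varied between releases, so `claude mcp add --help` should be checked first. The `run` and `register_mcps` helpers are hypothetical.

```bash
# Print (or run) registration commands for two of the servers above.
# DRY_RUN defaults to 1, so by default commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

register_mcps() {
  run claude mcp add \
    --env DEVICES_PATH=/mnt/user/projects/mikrotik-mcp/devices.json \
    mikrotik -- \
    /mnt/user/projects/mikrotik-mcp/venv/bin/python -m mikrotik_mcp.server
  run claude mcp add playwright -- npx -y @playwright/mcp@latest --isolated
}

register_mcps
```

Setting `DRY_RUN=0` would execute the commands instead of printing them; the remaining servers follow the same pattern.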
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Projects Workspace
|
||||||
|
|
||||||
|
All projects live at `/mnt/user/projects/`, which opens as the default folder in VS Code.
|
||||||
|
|
||||||
|
### Personal Projects (Gitea)
|
||||||
|
|
||||||
|
| Project | Gitea Repo | Description |
|
||||||
|
|---------|-----------|-------------|
|
||||||
|
| infrastructure | jazzymc/infrastructure | This repo — home lab documentation |
|
||||||
|
| claude-skills | jazzymc/claude-skills | Claude Code custom skills |
|
||||||
|
| mikrotik-mcp | jazzymc/mikrotik-mcp | MikroTik MCP server |
|
||||||
|
| unraid-mcp | jazzymc/unraid-mcp | Unraid MCP server |
|
||||||
|
| unraid-glass | jazzymc/unraid-glass | Unraid dashboard plugin |
|
||||||
|
| openclaw | jazzymc/openclaw | OpenClaw game project |
|
||||||
|
| nanobot-mcp | jazzymc/nanobot-mcp | Nanobot MCP server |
|
||||||
|
| nanobot-hkuds | jazzymc/nanobot-hkuds | Nanobot HKU DS |
|
||||||
|
| xtrm-agent | jazzymc/xtrm-agent | AI agent framework |
|
||||||
|
| geekmagic-smalltv | jazzymc/geekmagic-smalltv | SmallTV firmware |
|
||||||
|
| homarr | jazzymc/homarr | Homarr dashboard fork |
|
||||||
|
| shortcut-daily-sync | jazzymc/shortcut-daily-sync | Shortcut sync tool |
|
||||||
|
|
||||||
|
**Remote URL format:** `https://jazzymc:<token>@git.xtrm-lab.org/jazzymc/<repo>.git`
|
||||||
|
|
||||||
|
### AMPECO Work Projects
|
||||||
|
|
||||||
|
| Project | Source | Type |
|
||||||
|
|---------|--------|------|
|
||||||
|
| backend | GitLab (ampeco/apps/charge/backend) | Git clone |
|
||||||
|
| crm | GitLab (ampeco/apps/charge/crm) | Git clone |
|
||||||
|
| marketplace | GitLab (ampeco/apps/charge/marketplace) | Git clone |
|
||||||
|
| mobile-2 | GitLab (ampeco/apps/charge/mobile-2) | Git clone |
|
||||||
|
| ad-hoc-payment-web-app | GitLab (ampeco/apps/charge/external-apps/) | Git clone |
|
||||||
|
| dev-proxy | GitLab (ampeco/apps/shared/dev-proxy) | Git clone |
|
||||||
|
| ampeco-custom-dashboard-widgets-boilerplate | GitHub (ampeco/) | Git clone |
|
||||||
|
| docs | Local rsync | Reference docs |
|
||||||
|
| stories | Local rsync | Product stories |
|
||||||
|
| booking-ewa | Local rsync | Booking app |
|
||||||
|
| ewa-ui | Local rsync | EWA frontend |
|
||||||
|
| design-tokens | Local rsync | Design system tokens |
|
||||||
|
| ampeco-backup | Local rsync | Configuration backups |
|
||||||
|
| central_registry | Local rsync | Service registry |
|
||||||
|
| CCode-UI-Distribution-1.0.0 | Local rsync | UI distribution |
|
||||||
|
| automations | Local rsync | Automation scripts |
|
||||||
|
|
||||||
|
**GitLab auth:** OAuth2 PAT in remote URLs.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Boot Sequence
|
||||||
|
|
||||||
|
`/boot/config/go` triggers on Unraid boot:
|
||||||
|
|
||||||
|
1. **Wait for array** — polls for `/mnt/user/appdata/claude-code` (up to 5 min)
|
||||||
|
2. **Claude Code setup** — `/mnt/user/appdata/claude-code/install-claude.sh`
|
||||||
|
- Creates symlinks (`/root/.local/bin/claude`, `/root/.claude`, `/root/.claude.json`)
|
||||||
|
- Writes `.bashrc` with persistent npm PATH
|
||||||
|
3. **OpenVSCode Server** — `/mnt/user/appdata/openvscode/start.sh`
|
||||||
|
- Kills any existing instance
|
||||||
|
- Starts on port 3100 with persistent config dir
|
||||||
|
- Sources Claude/Cooperator PATH for terminal sessions
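
The "wait for array" step in the sequence above can be sketched as a bounded polling loop. The 300-second limit mirrors the 5 minutes mentioned in step 1; the 5-second interval and the function name are assumptions, and the actual `go` script may differ.

```bash
# Poll until a path exists, giving up after a timeout (in seconds).
# Returns 0 once the path appears, 1 if the timeout is reached first.
wait_for_path() {
  path=$1 timeout=${2:-300} interval=${3:-5}
  waited=0
  while [ ! -e "$path" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Example:
# wait_for_path /mnt/user/appdata/claude-code 300 5 \
#   && /mnt/user/appdata/claude-code/install-claude.sh
```

Gating the later steps on the return value avoids running the setup scripts against an array that never mounted.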
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Diagram
|
||||||
|
|
||||||
|
```
|
||||||
|
Browser → https://code.xtrm-lab.org
|
||||||
|
↓
|
||||||
|
Traefik (443) → Authentik SSO check
|
||||||
|
↓
|
||||||
|
OpenVSCode Server (:3100, host-native)
|
||||||
|
↓
|
||||||
|
Unraid Host Shell
|
||||||
|
├── claude (2.1.71)
|
||||||
|
├── cooperator (3.36.1)
|
||||||
|
├── glab (1.89.0)
|
||||||
|
├── node (22.18.0) / npm (10.9.3) / bun (1.3.10)
|
||||||
|
├── uv + python 3.12
|
||||||
|
├── docker / docker compose
|
||||||
|
├── git
|
||||||
|
└── /mnt/user/projects/
|
||||||
|
├── ampeco/ (18 AMPECO work projects)
|
||||||
|
├── infrastructure/
|
||||||
|
├── claude-skills/
|
||||||
|
├── mikrotik-mcp/
|
||||||
|
└── ... (12 personal repos)
|
||||||
|
```
|
||||||
237
docs/12-WIFI-TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,237 @@
|
|||||||
|
# WiFi / CAPsMAN Troubleshooting Guide
|
||||||
|
|
||||||
|
**Last Updated:** 2026-03-12
|
||||||
|
**Purpose:** Document known pitfalls, root causes, and diagnostic procedures for WiFi and CAPsMAN issues on the MikroTik HAP ax³ + cAP XL ac setup.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pitfall 1: Empty Inline Security Overrides = Open Network
|
||||||
|
|
||||||
|
**Severity:** CRITICAL
|
||||||
|
|
||||||
|
**Problem:** Setting `security.authentication-types=""` on a WiFi interface does NOT mean "inherit from security profile." RouterOS interprets an empty string as **no authentication (open network)**.
|
||||||
|
|
||||||
|
**Symptoms:**
|
||||||
|
- Devices connect but show empty AUTH-TYPE in registration table
|
||||||
|
- IoT devices that try WPA/WPA2 handshake silently fail — router logs show ZERO connection attempts
|
||||||
|
- Other devices connect fine (they accept open)
|
||||||
|
|
||||||
|
**Root Cause:** Attempting to "clear inline overrides" by setting empty values, in the hope that the interface would then inherit from the security profile. RouterOS treats an empty string as an explicit "no auth."
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```routeros
|
||||||
|
# Always set explicit values on the interface
|
||||||
|
/interface wifi set wifi2 \
|
||||||
|
security.authentication-types=wpa-psk,wpa2-psk \
|
||||||
|
security.encryption=tkip,ccmp
|
||||||
|
```
|
||||||
|
|
||||||
|
**Rule:** NEVER set `security.authentication-types=""` or `security.encryption=""`. Always use explicit values matching the security profile.
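
A saved configuration export can be checked for this mistake with a simple text search. The sketch below assumes the export contains the settings in the same `security.authentication-types=""` form shown above, which may not hold for every RouterOS version; the function and file names are hypothetical.

```bash
# Flag empty inline security overrides in a saved RouterOS export.
# Prints matching lines with line numbers; exit status 0 means a
# suspicious line was found, 1 means none.
find_empty_security() {
  grep -nE 'security\.(authentication-types|encryption)=""' "$1"
}

# Example (hypothetical file name):
# find_empty_security wifi-export.rsc && echo "WARNING: possible open network"
```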
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pitfall 2: CAPsMAN Re-Provisioning Wipes Interface Config
|
||||||
|
|
||||||
|
**Severity:** HIGH
|
||||||
|
|
||||||
|
**Problem:** Running `/interface wifi capsman remote-cap provision` clears all configuration from cap-wifi interfaces — security, channel, datapath, and SSID are all removed. Interfaces show "SSID not set" and remain inactive.
|
||||||
|
|
||||||
|
**Fix:** After re-provisioning, manually re-apply full config:
|
||||||
|
```routeros
|
||||||
|
# 2.4GHz (cap-wifi1 = MAC :BE = 2.4GHz radio)
|
||||||
|
/interface wifi set cap-wifi1 \
|
||||||
|
configuration=cfg-xtrm2 security=sec-xtrm2 datapath=dp-cap \
|
||||||
|
channel.frequency=2472 channel.band=2ghz-g channel.width=20mhz
|
||||||
|
|
||||||
|
# 5GHz (cap-wifi2 = MAC :BF = 5GHz radio)
|
||||||
|
/interface wifi set cap-wifi2 \
|
||||||
|
configuration=cfg-xtrm security=sec-xtrm datapath=dp-cap \
|
||||||
|
channel.frequency=5180 channel.band=5ghz-ac \
|
||||||
|
channel.width=20/40/80mhz channel.skip-dfs-channels=all
|
||||||
|
|
||||||
|
# Re-enable both
|
||||||
|
/interface wifi enable cap-wifi1
|
||||||
|
/interface wifi enable cap-wifi2
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pitfall 3: Interface IDs Change After Re-Provisioning
|
||||||
|
|
||||||
|
**Severity:** HIGH
|
||||||
|
|
||||||
|
**Problem:** After CAPsMAN re-provisioning, cap-wifi interface internal IDs change (e.g., `*20`/`*21` become `*22`/`*23`). Access-list rules referencing old IDs stop matching.
|
||||||
|
|
||||||
|
**Symptom:** `client was disconnected because could not assign vlan` error on CAP interfaces.
|
||||||
|
|
||||||
|
**Fix:**
|
||||||
|
```routeros
|
||||||
|
# Check current IDs
|
||||||
|
:foreach i in=[/interface wifi find where name~"cap"] do={
|
||||||
|
:put ([/interface wifi get $i name] . " = " . $i)
|
||||||
|
}
|
||||||
|
|
||||||
|
# Update access-list rules
|
||||||
|
/interface wifi access-list set [find where interface=*OLD] interface=*NEW
|
||||||
|
```
|
||||||
|
|
||||||
|
**Best practice:** Don't use CAP-specific access-list rules. Let all clients (HAP and CAP) use the same MAC-based access list. The HAP handles VLAN assignment uniformly via CAPsMAN.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Pitfall 4: CAP Radio-to-Interface Mapping Swap
|
||||||
|
|
||||||
|
**Severity:** MEDIUM
|
||||||
|
|
||||||
|
**Problem:** After re-provisioning, `cap-wifi1` and `cap-wifi2` may swap which physical radio they map to. Assigning the 5GHz config to the 2.4GHz radio (or vice versa) causes a "no available channels" error.
|
||||||
|
|
||||||
|
**Identification:** Check MAC addresses:
|
||||||
|
|
||||||
|
| MAC suffix | Radio | Must receive |
|
||||||
|
|------------|-------|--------------|
|
||||||
|
| :BE | 2.4GHz | 2.4GHz config (XTRM2) |
|
||||||
|
| :BF | 5GHz | 5GHz config (XTRM) |
|
||||||
|
|
||||||
|
**Fix:** Match config to the correct radio MAC, not the interface name.
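
The MAC-based rule can be expressed as a small lookup that picks the configuration from the radio's MAC suffix rather than from the interface name. This is an illustrative helper only (the function name is hypothetical); the configuration names are the ones used in Pitfall 2.

```bash
# Choose the CAPsMAN configuration from the radio MAC suffix.
config_for_radio_mac() {
  mac=$(printf '%s' "$1" | tr 'a-f' 'A-F')   # normalise to upper case
  case "$mac" in
    *:BE) echo "cfg-xtrm2" ;;  # 2.4GHz radio -> XTRM2
    *:BF) echo "cfg-xtrm" ;;   # 5GHz radio -> XTRM
    *) return 1 ;;             # unknown radio: do not guess
  esac
}
```

Any MAC that ends in neither suffix returns a non-zero status, so a script using it would stop instead of applying the wrong band.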
|
||||||
|
|
||||||
|
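The rule above (choose the configuration by radio MAC, not by interface name) can be sketched as a small lookup. This is an illustration in Python, not a RouterOS command: the function name `config_for_radio` and the example MAC addresses are hypothetical. Only the `BE`/`BF` suffixes and the `cfg-xtrm2`/`cfg-xtrm` configuration names come from this guide.

```python
# Hypothetical helper: pick the CAP configuration from the radio MAC suffix.
# Only the suffixes (BE/BF) and the config names are taken from this guide.
RADIO_BY_SUFFIX = {
    "BE": ("2.4GHz", "cfg-xtrm2"),
    "BF": ("5GHz", "cfg-xtrm"),
}


def config_for_radio(mac: str) -> tuple[str, str]:
    """Return (radio, configuration) for a CAP radio MAC address."""
    suffix = mac.strip().upper().split(":")[-1]
    if suffix not in RADIO_BY_SUFFIX:
        raise ValueError(f"unknown radio MAC suffix: {suffix}")
    return RADIO_BY_SUFFIX[suffix]


# The MACs below are placeholders, not the real CAP addresses.
print(config_for_radio("AA:BB:CC:DD:EE:BE"))
print(config_for_radio("aa:bb:cc:dd:ee:bf"))
```

Keying the lookup on the MAC suffix keeps the result stable even when `cap-wifi1` and `cap-wifi2` swap after re-provisioning.
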
---

## Pitfall 5: CAP Band Must Be AC, Not AX

**Severity:** MEDIUM

**Problem:** The cAP XL ac only supports 802.11ac. Setting band to `5ghz-ax` results in the radio not starting.

**Fix:** Always use `5ghz-ac` for the CAP 5GHz channel configuration.

---

## Pitfall 6: IoT Devices Need Legacy WiFi Settings

**Severity:** HIGH

**Problem:** Many IoT devices (vacuums, smart gateways, ovens) require legacy WiFi settings to connect. Using 802.11n-only or WPA2-only silently prevents connections: the router sees zero attempts.

**Required XTRM2 (2.4GHz) settings:**

| Setting | Value | Reason |
|---------|-------|--------|
| Band | `2ghz-g` | NOT `2ghz-n`; IoT devices may only support 802.11g |
| Auth | `wpa-psk,wpa2-psk` | Some devices need WPA1 available |
| Encryption | `tkip,ccmp` | Some devices need TKIP |
| Channel width | `20mhz` | Maximum compatibility |
| FT (802.11r) | Disabled | Causes issues with IoT |

**Known devices requiring legacy support:**
- Roborock S7 Vacuum (B0:4A:39:3F:9A:14)
- Tuya Smart Gateway JMWZG1 (38:1F:8D:04:6F:E4)
- Bosch Oven (94:27:70:1E:0C:EE)
- Various other IoT appliances

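The required-settings table above can also be read as a checklist. The following is a minimal Python sketch of that idea, assuming the current values have already been copied by hand into a flat dictionary. The key names, the `legacy_mismatches` function, and the sample values are hypothetical and are not RouterOS output.

```python
# Required XTRM2 (2.4GHz) settings, transcribed from the table above.
# Key names are illustrative; they are not an exact RouterOS property list.
REQUIRED_XTRM2 = {
    "band": "2ghz-g",
    "authentication-types": frozenset({"wpa-psk", "wpa2-psk"}),
    "encryption": frozenset({"tkip", "ccmp"}),
    "width": "20mhz",
    "ft": "no",
}


def legacy_mismatches(current: dict[str, str]) -> list[str]:
    """Return the setting names whose current value differs from the table."""
    problems = []
    for key, wanted in REQUIRED_XTRM2.items():
        got = current.get(key, "")
        if isinstance(wanted, frozenset):
            # Comma-separated lists are compared as sets, so order is ignored.
            ok = frozenset(v for v in got.split(",") if v) == wanted
        else:
            ok = got == wanted
        if not ok:
            problems.append(key)
    return problems


# Sample values (made up): 802.11n band and an empty authentication list.
sample = {
    "band": "2ghz-n",
    "authentication-types": "",
    "encryption": "ccmp,tkip",
    "width": "20mhz",
    "ft": "no",
}
print(legacy_mismatches(sample))
```

An empty authentication list is reported as a mismatch, which matches the failure mode described in the changelog (empty inline security running the interface as an open network).
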
---

## 5GHz Channel Separation

HAP and CAP must use different 5GHz channels to avoid co-channel interference:

| Device | Channel | Frequency | Band |
|--------|---------|-----------|------|
| HAP wifi1 | 149 | 5745 MHz | 5ghz-ax |
| CAP cap-wifi2 | 36 | 5180 MHz | 5ghz-ac |

Both use `skip-dfs-channels=all` to avoid radar detection disconnects.

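The channel numbers and frequencies in the table follow the standard 5 GHz rule, frequency = 5000 + 5 × channel (MHz). The sketch below is a Python illustration, not part of the router configuration. It checks that rule and confirms that the two 80 MHz blocks do not overlap. The helper names are hypothetical, and the block table lists only the two non-DFS 80 MHz blocks.

```python
def channel_to_freq_5ghz(channel: int) -> int:
    """Center frequency (MHz) of a 20 MHz channel in the 5 GHz band."""
    return 5000 + 5 * channel


# Non-DFS 80 MHz blocks as (first, last) 20 MHz channel numbers.
NON_DFS_80MHZ_BLOCKS = [(36, 48), (149, 161)]


def block_range_mhz(channel: int) -> tuple[int, int]:
    """Return (low, high) edge frequencies of the 80 MHz block holding `channel`."""
    for first, last in NON_DFS_80MHZ_BLOCKS:
        if first <= channel <= last:
            # Each 20 MHz channel extends 10 MHz either side of its center.
            return (channel_to_freq_5ghz(first) - 10,
                    channel_to_freq_5ghz(last) + 10)
    raise ValueError(f"channel {channel} is not in a non-DFS 80 MHz block")


hap = block_range_mhz(149)  # HAP wifi1
cap = block_range_mhz(36)   # CAP cap-wifi2
overlap = hap[0] < cap[1] and cap[0] < hap[1]

print(channel_to_freq_5ghz(36), channel_to_freq_5ghz(149))  # 5180 5745
print(cap, hap, "overlap:", overlap)
```

With `width=20/40/80mhz`, channel 36 occupies 5170 to 5250 MHz and channel 149 occupies 5735 to 5815 MHz, so the two radios stay clear of each other even at full width.
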
---

## Diagnostic Checklist

When devices can't connect to WiFi, check in this order:

### Step 1: Check Security (Most Common Issue)
```routeros
# Check if AUTH-TYPE is empty in registration table (= open network!)
/interface wifi registration-table print

# Check inline security overrides
:put [/interface wifi get wifi2 security.authentication-types]
:put [/interface wifi get wifi2 security.encryption]
# If empty → security is broken, set explicit values
```

### Step 2: Check Band Compatibility
```routeros
/interface wifi monitor wifi2 once
# If channel shows /n → change to /g for IoT compatibility
```

### Step 3: Enable Debug Logging
```routeros
/system logging add topics=wireless,debug action=memory
# Then check: /log print where topics~"wireless"
```

### Step 4: Check CAP Interface IDs
```routeros
# Verify access-list rules reference current IDs
:foreach i in=[/interface wifi find where name~"cap"] do={
    :put ([/interface wifi get $i name] . " = " . $i)
}
/interface wifi access-list print where interface~"\\*"
```

### Step 5: Check Radio-MAC Mapping
```routeros
# Verify cap interfaces are assigned to correct radios
/interface wifi print where name~"cap" proplist=name,mac-address,channel.band
```

### Step 6: If Router Sees ZERO Attempts
This means one of the following:
- **Security mismatch**: device won't even try (most likely empty auth = open)
- **Band incompatibility**: 802.11n-only blocks 802.11g devices
- **Device-side issue**: power cycle device, re-do WiFi setup from scratch

---

## Quick Recovery Commands

### Restore XTRM2 (2.4GHz) to known working state
```routeros
/interface wifi security set sec-xtrm2 \
    authentication-types=wpa-psk,wpa2-psk encryption=tkip,ccmp

/interface wifi set wifi2 \
    security.authentication-types=wpa-psk,wpa2-psk \
    security.encryption=tkip,ccmp \
    security.ft=no security.ft-over-ds=no

/interface wifi channel set ch-2g-hap \
    frequency=2412 band=2ghz-g width=20mhz
```

### Restore XTRM (5GHz) to known working state
```routeros
/interface wifi security set sec-xtrm \
    authentication-types=wpa2-psk,wpa3-psk encryption=ccmp

/interface wifi set wifi1 \
    security.authentication-types=wpa2-psk,wpa3-psk \
    security.encryption=ccmp \
    security.ft=no security.ft-over-ds=no

/interface wifi channel set ch-5g-hap \
    frequency=5745 band=5ghz-ax width=20/40/80mhz skip-dfs-channels=all
```

### Restore CAP interfaces after re-provisioning
```routeros
/interface wifi set cap-wifi1 \
    configuration=cfg-xtrm2 security=sec-xtrm2 datapath=dp-cap \
    channel.frequency=2472 channel.band=2ghz-g channel.width=20mhz
/interface wifi set cap-wifi2 \
    configuration=cfg-xtrm security=sec-xtrm datapath=dp-cap \
    channel.frequency=5180 channel.band=5ghz-ac \
    channel.width=20/40/80mhz channel.skip-dfs-channels=all
/interface wifi enable cap-wifi1
/interface wifi enable cap-wifi2
```

@@ -2,6 +2,138 @@

**Purpose:** Major infrastructure events only. Minor changes are in git commit messages.

---
## 2026-03-12

### WiFi Optimization & Troubleshooting
- **[WIFI]** Moved HAP 5GHz from ch 36 (5180) to ch 149 (5745), skip-dfs-channels=all
- **[WIFI]** Moved CAP 5GHz from ch 52 (5260) to ch 36 (5180), band corrected from ax to ac
- **[WIFI]** Separated HAP/CAP 5GHz channels to avoid co-channel interference
- **[WIFI]** Fixed sec-xtrm2 security: WPA+WPA2 with TKIP+CCMP for IoT compatibility
- **[WIFI]** Fixed critical bug: empty inline security.authentication-types="" caused wifi2 to run as open network; IoT devices silently failed to connect
- **[WIFI]** Set explicit encryption on all interfaces and security profiles (never leave empty)
- **[WIFI]** Removed CAP-specific access-list catch-all rules; all clients now use unified MAC-based access list
- **[WIFI]** Fixed CAP interface IDs in access-list after re-provisioning (*20/*21 → *22/*23)
- **[WIFI]** Restored 2.4GHz band to 2ghz-g (was changed to 2ghz-n, breaking IoT devices)
- **[WIFI]** Disabled FT (802.11r) on wifi1 (5GHz) for stability
- **[DOCS]** Added 12-WIFI-TROUBLESHOOTING.md with diagnostic checklist and recovery commands

---
## 2026-02-28

### Docker Container Audit & Migration to Dockge
- **[DOCKER]** Removed 4 orphan images: nextcloud/all-in-one, olprog/unraid-docker-webui, ghcr.io/ich777/doh-server, ghcr.io/idmedia/hass-unraid
- **[DOCKER]** Removed ancient pgAdmin4 v2.1 (status=Created) and fenglc/pgadmin4 image
- **[DOCKER]** Removed spaceinvaderone/ha_inabox image (replaced by Home-Assistant-Container)
- **[TRAEFIK]** Removed Docker provider constraint (`traefik.constraint=valid`); Docker labels now auto-discovered
- **[TRAEFIK]** Cleaned up dynamic.yml: removed 14 stale/migrated router+service pairs (pangolin, pihole, doh, netbox, and services now using Docker labels)
- **[TRAEFIK]** Added dockge-secure router to dynamic.yml
- **[DOCKER]** Created 6 new Dockge stacks: docker-socket-proxy, tuyagateway, firefly, seekandwatch, ha-time-machine, homeassistant (replaced inabox with Container)
- **[DOCKER]** Migrated ALL 53 containers from dockerman to Dockge compose stacks (100% coverage)
- **[DOCKER]** Fixed Nextcloud Traefik rule: empty Host() → Host(`cloud.xtrm-lab.org`)
- **[DOCKER]** Fixed UptimeKuma Traefik rule: empty Host() → Host(`uptime.xtrm-lab.org`)
- **[DOCKER]** Fixed Homarr domain: `homarr.xtrm-lab.org` → `xtrm-lab.org` (root domain)
- **[DOCKER]** Fixed Netdisco entrypoint: `websecure` → `https`
- **[DOCKER]** Removed stale `traefik.constraint=valid` from Dockhand
- **[DOCKER]** Fixed Transmission middleware: removed non-existent `transmission-headers@file`
- **[DOCKER]** Added Authentik forward auth middleware to: n8n, homarr, transmission, speedtest-tracker, uptime-kuma, firefly, seekandwatch, open-webui, traefik dashboard, dockge, netalertx, urbackup, unimus
- **[DOCKER]** Added Traefik labels to: vaultwarden, open-webui (ai.xtrm-lab.org), firefly, seekandwatch
- **[DOCKER]** Added missing Unraid labels (icon, managed, webui) to: ntfy, timemachine, ollama, docker-socket-proxy, tuyagateway, all new stacks
- **[DOCKER]** Moved ollama + open-webui from bridge to dockerproxy network
- **[DOCKER]** Moved fireflyiii + firefly-data-importer from none to dockerproxy network
- **[DOCKER]** Moved SeekAndWatch from bridge to dockerproxy network
- **[DOCKER]** Removed traefik labels from host-network containers (plex, netalertx); routed via dynamic.yml only
- **[DOCKER]** Fixed NetAlertX: added read_only, proper capabilities (NET_RAW/NET_ADMIN), and UID 20211
- **[DOCKER]** Removed empty netbox stack directory

## 2026-03-09

### Claude Code Tooling Completion
- **[SERVICE]** Installed Cooperator CLI v3.36.1 on Unraid (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Ran `cooperator install --non-interactive`: symlinked commands, agents, 12 skills to `~/.claude/`
- **[SERVICE]** Created `~/.cooperator/.env` with Shortcut API token, Confluence token, git config
- **[SERVICE]** Installed glab CLI v1.89.0 on Unraid (`/usr/local/bin/glab`), authenticated as kaloyan.danchev
- **[SERVICE]** Installed uv package manager + Python 3.12.13 on Unraid
- **[SERVICE]** Created Python venvs for mikrotik-mcp and unraid-mcp projects
- **[SERVICE]** Copied MikroTik SSH key from Mac to Unraid; SSH to HAP ax3 verified working
- **[SERVICE]** Synced 6 custom Claude skills to `/mnt/user/appdata/claude-code/custom-skills/` (ev-compliance-story, ev-protocol-expert, frontend-designer, mikrotik-admin, prd-generator, unraid-admin)
- **[SERVICE]** Built shortcut MCP server at `/mnt/user/appdata/claude-code/mcp-server-shortcut/`
- **[SERVICE]** Enabled Claude plugins: ralph-loop, claude-md-management, playground
- **[DOCS]** Updated 12-DEVELOPMENT-ENVIRONMENT.md with Cooperator, glab, Python, skills, MCP sections

#### TODO: MCP Server Registration
The following MCP servers are built/ready but need `claude mcp add` registration (requires interactive Claude session on Unraid):
- shortcut, mikrotik, unraid, playwright, smartbear

## 2026-03-08

### Development Environment Setup
- **[SERVICE]** Installed OpenVSCode Server as host-native process (port 3100, not a container), accessible at https://code.xtrm-lab.org
- **[SERVICE]** Traefik route added in dynamic.yml with Authentik forward auth
- **[SERVICE]** Boot auto-start via `/boot/config/go` → `/mnt/user/appdata/openvscode/start.sh`
- **[SERVICE]** Claude Code updated to v2.1.71, persistent at `/mnt/user/appdata/claude-code/.npm-global/`
- **[SERVICE]** Cooperator CLI v3.36.1 installed globally (`npm install -g @ampeco/cooperator`)
- **[SERVICE]** Created `/mnt/user/projects/` workspace with 12 personal repos (Gitea) + 18 AMPECO work projects (GitLab)
- **[DOCS]** Added `12-DEVELOPMENT-ENVIRONMENT.md` documenting full dev environment setup

### Docker Maintenance
- **[DOCKER]** Created Unraid Docker Manager XML templates for 11 containers missing them (adguardhome, gitea, minecraft, ntfy, ollama, open-webui, etc.)
- **[DOCKER]** Pulled new images for all 30 active Dockge stacks, 14 containers received updates
- **[DOCKER]** Cleaned up dangling images: 10.95 GB reclaimed
- **[DOCKER]** Organized all 42 containers into Docker Folders (12 folders: Infrastructure, Security, Monitoring, DevOps, Media, etc.)
- **[DOCKER]** Pushed 6 local-only projects to Gitea (claude-skills, mikrotik-mcp, unraid-mcp, nanobot-mcp, nanobot-hkuds, openclaw)

### Service Fixes
- **[FIX]** Gitea DB connection: fixed hardcoded PostgreSQL IP (172.18.0.13) → hostname `postgresql17` in compose and app.ini
- **[FIX]** Traefik: removed stale stopped container blocking restart
- **[FIX]** Redis: removed stale stopped container blocking recreate

## 2026-02-26

### WiFi & CAP VLAN Fixes
- **[WIFI]** Fixed 5GHz channel overlap: HAP wifi1 reduced from 80MHz to 40MHz at 5180MHz, CAP cap-wifi1 at 5220MHz (no overlap)
- **[WIFI]** Restored all 29 WiFi access-list MAC→VLAN entries (were missing/lost)
- **[WIFI]** Fixed cap-wifi2 band mismatch: was `band=2ghz-n` with frequency=5220 (5GHz), corrected to frequency=2412
- **[CAPSMAN]** Enabled bridge VLAN filtering on CAP (cAP XL ac); all VLANs now properly tagged through CAP
- **[CAPSMAN]** CAP bridgeLocal config: vlan-filtering=yes, pvid=10, VLANs 10/20/25/30/35/40 with proper tagged/untagged members
- **[CAPSMAN]** Set `capdp` datapath vlan-id=40 for default PVID on dynamic wifi bridge ports
- **[CAPSMAN]** VLAN assignment through CAP now working; access-list vlan-id entries propagate correctly
- **[NETWORK]** Fixed AdGuard Home IP conflict: container was at 192.168.10.2 (CAP's IP), now static at 192.168.10.10
- **[NETWORK]** Fixed adguardhome-sync IP conflict: was at 192.168.10.3 (CSS326's IP), now static at 192.168.10.11
- **[WIFI]** Added Xiaomi Air Purifier 2 (C8:5C:CC:40:B4:AA) to access-list as VLAN 30 (IoT)

### WiFi Quality Optimization
- **[WIFI]** Fixed 2.4GHz co-channel interference: HAP on ch 1 (2412), CAP moved from ch 1 to ch 6 (2437)
- **[WIFI]** Fixed 5GHz overlap: HAP stays ch 36 (5180, 40MHz), CAP moved from ch 44 (5220) to ch 52 (5260, DFS)
- **[WIFI]** Fixed CAP 2.4GHz width from 40MHz to 20MHz for IoT compatibility
- **[WIFI]** TX power kept at defaults (17/16 dBm); reduction caused kitchen coverage loss through concrete walls

## 2026-02-24

### Motherboard Replacement & NVMe Cache Pool
- **[HARDWARE]** Replaced XTRM-U motherboard; new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
- **[HARDWARE]** Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead: clicking heads, fails on multiple SATA ports and new motherboard
- **[STORAGE]** Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
- **[STORAGE]** Pool settings: autotrim=on, compression=on
- **[DOCKER]** Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
- **[DOCKER]** Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
- **[DOCKER]** Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
- **[DOCKER]** Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
- **[STORAGE]** Deleted old `docker.img` (200GB) from disk1
- **[INCIDENT]** disk1 still running in parity-emulated mode; replacement drive needed

### Post-Migration Container Cleanup
- **[NETWORK]** Fixed Traefik unreachable: removed stale Docker bridge (duplicate 172.18.0.0/16 subnet) + 7 orphaned bridges
- **[DOCKER]** Removed deprecated containers: DoH-Server, binhex-plexpass (duplicate of Plex)
- **[DOCKER]** Removed obsolete containers: HomeAssistant_inabox, Docker-WebUI, hass-unraid
- **[DOCKER]** Removed nextcloud-aio-mastercontainer (replaced by Nextcloud container)
- **[SERVICE]** Fixed adguardhome-sync: recreated config file (was directory from migration), switched to br0 network for macvlan reachability
- **[SERVICE]** Fixed diode stack: recreated .env, nginx.conf, OAuth2 client config; ran Hydra DB migration and client bootstrap
- **[SERVICE]** Fixed diode-agent: corrected YAML format, secrets, and Hydra authentication
- **[SERVICE]** Started unmarr (Homarr fork, 172.18.0.81) and rustfs (S3-compatible storage)
- **[DOCKER]** Final state: 53 containers running, pgAdmin4 stopped (utility)
- **[DOCS]** Updated 03-SERVICES-OTHER.md with removed containers

---

## 2026-02-14

91
docs/incidents/2026-02-20-disk1-hardware-failure.md
Normal file
@@ -0,0 +1,91 @@
# Incident: Disk1 Hardware Failure (Clicking / SATA Link Failure)

**Date:** 2026-02-20
**Severity:** P2 - Degraded (no redundancy)
**Status:** Open, awaiting replacement drive (motherboard replaced, NVMe cache pool added Feb 24)
**Affected:** XTRM-U (Unraid NAS), disk1 (data drive)

---

## Summary

disk1 (10TB HGST Ultrastar HUH721010ALE601, serial `2TKK3K1D`) has physically failed. The drive dropped off the SATA bus on Feb 18 at 19:15 and is now exhibiting clicking (head failure). The Unraid md array is running in **degraded/emulated mode**, reconstructing disk1 data from parity on the fly. All data is intact but there is **zero redundancy**.

---

## Timeline

| When | What |
|------|------|
| Feb 18 ~19:15 | `ata5: qc timeout` → multiple hard/soft resets → `reset failed, giving up` → `ata5.00: disable device` |
| Feb 18 19:17 | `super.dat` updated: md array marked disk1 as `DISK_DSBL` (213 errors) |
| Feb 20 13:14 | Investigation started. `sdc` completely absent from `/dev/`. ZFS pool `disk1` running on emulated `md1p1` with 0 errors |
| Feb 20 ~13:30 | Server rebooted, disk moved to new SATA port (ata5 → ata6). Same failure: `ata6: reset failed, giving up`. Clicking noise confirmed |
| Feb 24 | Motherboard replaced. Dead drive confirmed still dead on new hardware. New SATA port assignment. Drive is mechanically failed (clicking heads) |
| Feb 24 | New cache pool created: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1. Docker migrated from HDD loopback to NVMe ZFS |

## Drive Details

| Field | Value |
|-------|-------|
| Model | HUH721010ALE601 (HGST/WD Ultrastar He10) |
| Serial | 2TKK3K1D |
| Capacity | 10TB (9766436812 1 KiB blocks) |
| Array slot | disk1 (slot 1) |
| Filesystem | ZFS (on md1p1) |
| Last known device | sdc |
| Accumulated md errors | 213 |

## Current State

- **Array**: STARTED, degraded; disk1 emulated from parity (`sdb`)
- **ZFS pool `disk1`**: ONLINE, 0 errors, mounted on `md1p1` (parity reconstruction)
- **Parity drive** (`sdb`, serial `7PHBNYZC`): DISK_OK, 0 errors
- **All services**: Running normally (Docker containers, VMs)
- **Risk**: If parity drive fails, data is **unrecoverable**

## Diagnosis

- Drive fails on multiple SATA ports → not a port/cable issue
- Clicking noise on boot → mechanical head failure
- dmesg shows link responds but device never becomes ready → drive electronics partially functional, platters/heads dead
- Drive is beyond DIY repair

## Root Cause

Mechanical failure of the hard drive (clicking = head crash or seized actuator). Not related to the cache drive migration that happened around the same time, as confirmed by syslog showing a clean SATA link failure.

---

## Recovery Plan

### Step 1: Get Replacement Drive
- Must be 10TB or larger
- Check WD warranty: serial `HUH721010ALE601_2TKK3K1D` at https://support-en.wd.com/app/warrantycheck
- Any 3.5" SATA drive works (doesn't need to match model)

### Step 2: Install & Rebuild
1. Power off the server
2. Remove dead drive, install replacement in any SATA port
3. Boot Unraid
4. Go to **Main** → click on **Disk 1** (will show as "Not installed" or unmapped)
5. Stop the array
6. Assign the new drive to the **Disk 1** slot
7. Start the array; Unraid will prompt to **rebuild** from parity
8. Rebuild will take many hours for 10TB; do NOT interrupt

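To put a rough number on "many hours", the Python sketch below estimates rebuild time from an assumed average write speed. The speeds are illustrative assumptions, not measurements from this server, and `rebuild_hours` is a hypothetical helper name. Actual rebuild speed depends on the slowest drive in the array and on concurrent I/O.

```python
def rebuild_hours(capacity_tb: float, avg_mb_per_s: float) -> float:
    """Hours needed to write capacity_tb (decimal TB) at avg_mb_per_s (decimal MB/s)."""
    seconds = capacity_tb * 1_000_000 / avg_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# Assumed average speeds for a 10TB drive; not measured values.
for speed in (100, 150, 200):
    print(f"{speed} MB/s -> {rebuild_hours(10, speed):.1f} h")
```

Under these assumptions a 10TB rebuild lands somewhere between roughly 14 and 28 hours, so it is worth planning for at least a full day without reboots or heavy writes.
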
### Step 3: Post-Rebuild
1. Verify ZFS pool `disk1` is healthy: `zpool status disk1`
2. Run parity check from Unraid UI
3. Run SMART extended test on new drive: `smartctl -t long /dev/sdX`
4. Verify all ZFS datasets are intact

---

## Notes

- Server is safe to run in degraded mode indefinitely, just without parity protection
- Avoid heavy writes if possible to reduce risk to parity drive
- New cache pool (3x Samsung 990 EVO Plus 1TB, ZFS RAIDZ1) now hosts all Docker containers
- Old docker.img loopback deleted from disk1 (200GB freed)
- Since disk1 uses ZFS on md, the rebuild reconstructs the raw block device, so ZFS doesn't need any separate repair

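For the RAIDZ1 cache pool mentioned in the notes, usable space is roughly the capacity of all drives minus one drive's worth of parity. The Python sketch below works that out for 3x 1TB drives. It ignores ZFS metadata and padding overhead, so treat the result as an upper bound; `raidz1_usable_bytes` is a hypothetical helper name.

```python
def raidz1_usable_bytes(drives: int, drive_bytes: int) -> int:
    """Approximate RAIDZ1 usable space: one drive's worth of capacity holds parity."""
    if drives < 2:
        raise ValueError("RAIDZ1 needs at least 2 drives")
    return (drives - 1) * drive_bytes


usable = raidz1_usable_bytes(3, 10**12)  # 3x 1 TB (decimal) NVMe drives
print(f"{usable / 10**12:.2f} TB = {usable / 2**40:.2f} TiB")  # 2.00 TB = 1.82 TiB
```

The 1.82 TiB result is consistent with a pool of this layout reporting about 1.8TB usable once binary units are used.
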