From 877aa71d3e7599a12335626a9fce0120441c74fe Mon Sep 17 00:00:00 2001
From: Kaloyan Danchev
Date: Tue, 24 Feb 2026 14:47:07 +0200
Subject: [PATCH] Update docs: motherboard swap, NVMe cache pool, Docker migration

- New motherboard installed, MAC/DHCP updated
- 3x Samsung 990 EVO Plus 1TB NVMe cache pool (ZFS RAIDZ1)
- Docker migrated from HDD loopback to NVMe ZFS storage driver
- disk1 confirmed dead (clicking heads), still on parity emulation
- Hardware inventory, changelog, and incident report updated

Co-Authored-By: Claude Opus 4.6
---
 docs/04-HARDWARE-INVENTORY.md            | 30 ++++++++++++-------
 docs/CHANGELOG.md                        | 16 ++++++++++
 .../2026-02-20-disk1-hardware-failure.md |  7 +++--
 3 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/docs/04-HARDWARE-INVENTORY.md b/docs/04-HARDWARE-INVENTORY.md
index 6087760..4736e23 100644
--- a/docs/04-HARDWARE-INVENTORY.md
+++ b/docs/04-HARDWARE-INVENTORY.md
@@ -1,6 +1,6 @@
 # Hardware Inventory

-**Last Updated:** 2026-02-14
+**Last Updated:** 2026-02-24

 ---

@@ -109,18 +109,27 @@
 | **IP** | 192.168.10.20 |
 | **OS** | Unraid 6.x |

+**Motherboard:** Replaced 2026-02-24 (new board, details TBD)
+
 **Network:**
 | Interface | MAC | Speed |
 |-----------|-----|-------|
-| eth1 | A8:B8:E0:02:B6:15 | 2.5G |
-| eth2 | A8:B8:E0:02:B6:16 | 2.5G |
-| eth3 | A8:B8:E0:02:B6:17 | 2.5G |
-| eth4 | A8:B8:E0:02:B6:18 | 2.5G |
-| **bond0** | (virtual) | 5G aggregate |
+| br0 | 38:05:25:35:8E:7A | 2.5G |

-**Storage:**
-- Cache: (current NVMe)
-- Array: 3.5" HDDs
+**Storage:**
+| Device | Model | Size | Role | Status |
+|--------|-------|------|------|--------|
+| sdb | HUH721010ALE601 (serial 7PHBNYZC) | 10TB | Parity | OK |
+| disk1 | HUH721010ALE601 (serial 2TKK3K1D) | 10TB | Data (ZFS) | **FAILED** — clicking/head crash, emulated from parity |
+| nvme0n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
+| nvme1n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
+| nvme2n1 | Samsung 990 EVO Plus 1TB | 1TB | Cache pool (RAIDZ1) | OK |
+
+**ZFS Pools:**
+| Pool | Devices | Profile | Usable | Purpose |
+|------|---------|---------|--------|---------|
+| disk1 | md1p1 (parity-emulated) | single | 9.1TB | Main data (roms, media, appdata, backups) |
+| cache | 3x Samsung 990 EVO Plus 1TB NVMe | RAIDZ1 | ~1.8TB | Docker, containers |

 **Virtual IPs:**
 | IP | Purpose |
@@ -223,6 +232,7 @@ See: `wip/UPGRADE-2026-HARDWARE.md`
 |--------|------|--------|
 | XTRM-N5 (Minisforum N5 Air) | Production server | Planned |
 | XTRM-N1 (N100 ITX) | Survival node | Planned |
-| 3x Samsung 990 EVO Plus 1TB | XTRM-N5 NVMe pool | Planned |
+| 3x Samsung 990 EVO Plus 1TB | XTRM-U cache pool (RAIDZ1) | **Installed** 2026-02-24 |
 | 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned |
+| 1x 10TB+ HDD | Replace failed disk1 | **Needed** |
 | MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future |
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 7049758..f1879e4 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -4,6 +4,22 @@

 ---

+## 2026-02-24
+
+### Motherboard Replacement & NVMe Cache Pool
+- **[HARDWARE]** Replaced XTRM-U motherboard — new MAC `38:05:25:35:8E:7A`, DHCP lease updated on MikroTik
+- **[HARDWARE]** Confirmed disk1 (10TB HGST HUH721010ALE601, serial 2TKK3K1D) mechanically dead — clicking heads, fails on multiple SATA ports and new motherboard
+- **[STORAGE]** Created new Unraid-managed cache pool: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1 (~1.8TB usable)
+- **[STORAGE]** Pool settings: autotrim=on, compression=on
+- **[DOCKER]** Migrated Docker from btrfs loopback image (disk1 HDD) to ZFS on NVMe cache pool
+- **[DOCKER]** Docker now uses ZFS storage driver directly on `cache/system/docker` dataset
+- **[DOCKER]** Recreated `dockerproxy` bridge network, rebuilt all 39 container templates
+- **[DOCKER]** Restarted Dockge and critical stacks (adguardhome, ntfy, gitea, woodpecker, etc.)
+- **[STORAGE]** Deleted old `docker.img` (200GB) from disk1
+- **[INCIDENT]** disk1 still running in parity-emulated mode — replacement drive needed
+
+---
+
 ## 2026-02-14

 ### CAP XL ac Recovery
diff --git a/docs/incidents/2026-02-20-disk1-hardware-failure.md b/docs/incidents/2026-02-20-disk1-hardware-failure.md
index 6bb8c0f..e51c1f1 100644
--- a/docs/incidents/2026-02-20-disk1-hardware-failure.md
+++ b/docs/incidents/2026-02-20-disk1-hardware-failure.md
@@ -2,7 +2,7 @@

 **Date:** 2026-02-20
 **Severity:** P2 - Degraded (no redundancy)
-**Status:** Open — awaiting replacement drive
+**Status:** Open — awaiting replacement drive (motherboard replaced, NVMe cache pool added Feb 24)
 **Affected:** XTRM-U (Unraid NAS) — disk1 (data drive)

 ---
@@ -21,6 +21,8 @@ disk1 (10TB HGST Ultrastar HUH721010ALE601, serial `2TKK3K1D`) has physically fa
 | Feb 18 19:17 | `super.dat` updated — md array marked disk1 as `DISK_DSBL` (213 errors) |
 | Feb 20 13:14 | Investigation started. `sdc` completely absent from `/dev/`. ZFS pool `disk1` running on emulated `md1p1` with 0 errors |
 | Feb 20 ~13:30 | Server rebooted, disk moved to new SATA port (ata5 → ata6). Same failure: `ata6: reset failed, giving up`. Clicking noise confirmed |
+| Feb 24 | Motherboard replaced. Dead drive confirmed still dead on new hardware. New SATA port assignment. Drive is mechanically failed (clicking heads) |
+| Feb 24 | New cache pool created: 3x Samsung 990 EVO Plus 1TB NVMe, ZFS RAIDZ1. Docker migrated from HDD loopback to NVMe ZFS |

 ## Drive Details

@@ -84,5 +86,6 @@ Mechanical failure of the hard drive (clicking = head crash or seized actuator).

 - Server is safe to run in degraded mode indefinitely, just without parity protection
 - Avoid heavy writes if possible to reduce risk to parity drive
-- The two NVMe SSDs (cache pool, ZFS mirror) are unaffected
+- New cache pool (3x Samsung 990 EVO Plus 1TB, ZFS RAIDZ1) now hosts all Docker containers
+- Old docker.img loopback deleted from disk1 (200GB freed)
 - Since disk1 uses ZFS on md, the rebuild reconstructs the raw block device — ZFS doesn't need any separate repair
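
The `~1.8TB` usable figure quoted for the 3x 1TB RAIDZ1 cache pool can be sanity-checked with a rough sketch. It assumes each drive is exactly 10^12 bytes and that the figure is reported in binary units (TiB), and it ignores ZFS metadata, padding, and reserved space, so the numbers the pool actually reports will differ somewhat. The helper name `raidz1_usable_bytes` is hypothetical and not part of any ZFS tooling.

```python
TB = 10**12   # decimal terabyte, as printed on drive labels (assumed exact)
TIB = 2**40   # binary tebibyte, which many tools display as "T"/"TB"


def raidz1_usable_bytes(n_drives: int, drive_bytes: int) -> int:
    """Approximate RAIDZ1 capacity: one drive's worth of space holds parity.

    This is a first-order estimate only; it does not model ZFS allocation
    overhead, padding, or reserved (slop) space.
    """
    if n_drives < 2:
        raise ValueError("RAIDZ1 needs at least 2 drives")
    return (n_drives - 1) * drive_bytes


# Cache pool from this patch: 3 drives of 1 TB each.
usable = raidz1_usable_bytes(3, 1 * TB)
print(f"{usable / TB:.2f} TB = {usable / TIB:.2f} TiB")  # 2.00 TB = 1.82 TiB
```

Under these assumptions, two drives' worth of data space (2 TB decimal) works out to about 1.82 TiB, which is consistent with the `~1.8TB` entry in the ZFS Pools table if that value was read from a tool reporting binary units.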