HGST Ultrastar 10TB drive (serial 2TKK3K1D) failed on Feb 18. Array running degraded on parity emulation. Recovery plan documented.
Incident: Disk1 Hardware Failure (Clicking / SATA Link Failure)
Date: 2026-02-20
Severity: P2 - Degraded (no redundancy)
Status: Open, awaiting replacement drive
Affected: XTRM-U (Unraid NAS), disk1 (data drive)
Summary
disk1 (10TB HGST Ultrastar HUH721010ALE601, serial 2TKK3K1D) has physically failed. The drive dropped off the SATA bus on Feb 18 at 19:15 and is now exhibiting clicking (head failure). The Unraid md array is running in degraded/emulated mode, reconstructing disk1 data from parity on the fly. All data is intact but there is zero redundancy.
Timeline
| When | What |
|---|---|
| Feb 18 ~19:15 | ata5: qc timeout → multiple hard/soft resets → reset failed, giving up → ata5.00: disable device |
| Feb 18 19:17 | super.dat updated — md array marked disk1 as DISK_DSBL (213 errors) |
| Feb 20 13:14 | Investigation started. sdc completely absent from /dev/. ZFS pool disk1 running on emulated md1p1 with 0 errors |
| Feb 20 ~13:30 | Server rebooted, disk moved to new SATA port (ata5 → ata6). Same failure: ata6: reset failed, giving up. Clicking noise confirmed |
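The failure signature in the timeline can be pulled straight from the kernel log. The excerpt below is an illustrative reconstruction (message text follows the dmesg lines quoted above; timestamps are approximate); on the server the real source is `/var/log/syslog`:

```shell
# Illustrative reconstruction of the Feb 18 kernel messages (timestamps
# approximate); on the Unraid host, grep /var/log/syslog instead.
log='Feb 18 19:15:02 kernel: ata5.00: qc timeout (cmd 0xec)
Feb 18 19:15:21 kernel: ata5: hard resetting link
Feb 18 19:15:40 kernel: ata5: reset failed, giving up
Feb 18 19:15:40 kernel: ata5.00: disable device'

printf '%s\n' "$log" | grep -c 'ata5'          # 4 lines mention the port
printf '%s\n' "$log" | grep 'disable device'   # the kernel gave up on the drive
```

On the live system the same filter is `grep 'ata5' /var/log/syslog`; after the port swap, `ata6`.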
Drive Details
| Field | Value |
|---|---|
| Model | HUH721010ALE601 (HGST/WD Ultrastar He10) |
| Serial | 2TKK3K1D |
| Capacity | 10TB (9766436812 KiB, as reported by Unraid) |
| Array slot | disk1 (slot 1) |
| Filesystem | ZFS (on md1p1) |
| Last known device | sdc |
| Accumulated md errors | 213 |
Current State
- Array: STARTED, degraded; disk1 emulated from parity (`sdb`)
- ZFS pool `disk1`: ONLINE, 0 errors, mounted on `md1p1` (parity reconstruction)
- Parity drive (`sdb`, serial `7PHBNYZC`): DISK_OK, 0 errors
- All services: running normally (Docker containers, VMs)
- Risk: If parity drive fails, data is unrecoverable
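The state above can be confirmed from the command line. A sketch assuming Unraid's md driver field names (`rdevStatus.N`, `rdevNumErrors.N`) in `/proc/mdstat`; the snippet is illustrative, with the 213-error count taken from this incident:

```shell
# Hypothetical /proc/mdstat excerpt; field names follow Unraid's md driver
# (rdevStatus.N / rdevNumErrors.N), values taken from this incident.
mdstat='rdevStatus.0=DISK_OK
rdevNumErrors.0=0
rdevStatus.1=DISK_DSBL
rdevNumErrors.1=213'

# disk1 (index 1) is disabled, with the 213 accumulated errors:
printf '%s\n' "$mdstat" | grep 'rdevStatus.1'
printf '%s\n' "$mdstat" | grep 'rdevNumErrors.1'
```

On the server: `grep -E 'rdevStatus|rdevNumErrors' /proc/mdstat`, plus `zpool status disk1` for pool health.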
Diagnosis
- Drive fails on multiple SATA ports → not a port/cable issue
- Clicking noise on boot → mechanical head failure
- dmesg shows link responds but device never becomes ready → drive electronics partially functional, platters/heads dead
- Drive is beyond DIY repair
Root Cause
Mechanical failure of the hard drive (clicking = head crash or seized actuator). Not related to cache drive migration that happened around the same time — confirmed by syslog showing clean SATA link failure.
Recovery Plan
Step 1: Get Replacement Drive
- Must be 10TB or larger (Unraid also requires data disks to be no larger than parity, so anything bigger than the 10TB parity drive would mean upgrading parity first)
- Check WD warranty: serial `HUH721010ALE601_2TKK3K1D` at https://support-en.wd.com/app/warrantycheck
- Any 3.5" SATA drive works (doesn't need to match model)
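Before ordering, a quick size sanity check helps. The byte counts below are illustrative (10TB-class drives are typically 10,000,831,348,736 bytes); on real hardware, `blockdev --getsize64 /dev/sdX` reports the exact figure:

```shell
# Illustrative sizes in bytes; on real hardware: blockdev --getsize64 /dev/sdX
failed_bytes=10000831348736   # the dead 10TB He10
parity_bytes=10000831348736   # sdb, also 10TB
cand_bytes=10000831348736     # hypothetical same-class replacement

# Unraid can rebuild onto the candidate only if it is at least as large as
# the failed disk and no larger than parity.
if [ "$cand_bytes" -ge "$failed_bytes" ] && [ "$cand_bytes" -le "$parity_bytes" ]; then
  echo 'OK: rebuildable in place'
else
  echo 'NO: must be >= failed disk and <= parity'
fi
```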
Step 2: Install & Rebuild
- Power off the server
- Remove dead drive, install replacement in any SATA port
- Boot Unraid
- Go to Main → click on Disk 1 (will show as "Not installed" or unmapped)
- Stop the array
- Assign the new drive to the Disk 1 slot
- Start the array — Unraid will prompt to rebuild from parity
- Rebuild will take many hours for 10TB — do NOT interrupt
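"Many hours" can be put in rough numbers. A back-of-envelope estimate assuming a ~150 MB/s average rebuild rate (the real rate varies with drive zones, controller, and concurrent array activity):

```shell
# Back-of-envelope rebuild time; the 150 MB/s average rate is an assumption.
bytes=10000831348736      # 10TB to reconstruct
rate=150000000            # assumed average bytes/sec
secs=$((bytes / rate))
printf 'approx %s hours\n' $((secs / 3600))   # prints: approx 18 hours
```

Progress can be watched on the Main page or in `/proc/mdstat`.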
Step 3: Post-Rebuild
- Verify ZFS pool `disk1` is healthy: `zpool status disk1`
- Run parity check from Unraid UI
- Run SMART extended test on new drive: `smartctl -t long /dev/sdX`
- Verify all ZFS datasets are intact
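A minimal pass/fail wrapper for the pool check. `zpool status -x` prints the one-liner below when no pool is degraded; the captured string here is illustrative, run the real command on the server:

```shell
# Expected `zpool status -x` output on success (string captured for illustration).
summary='all pools are healthy'
case "$summary" in
  'all pools are healthy') echo 'pool check: PASS' ;;
  *)                       echo 'pool check: INVESTIGATE' ;;
esac
```

After `smartctl -t long` completes (several hours on a 10TB drive), read the result with `smartctl -a /dev/sdX`.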
Notes
- Server is safe to run in degraded mode indefinitely, just without parity protection
- Avoid heavy writes if possible to reduce risk to parity drive
- The two NVMe SSDs (cache pool, ZFS mirror) are unaffected
- Since disk1 uses ZFS on md, the rebuild reconstructs the raw block device — ZFS doesn't need any separate repair
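The last note follows from the stack layering: the pool's only vdev is the md device, so rebuilding `md1p1` transparently restores ZFS's backing store. A sketch (the status row below is illustrative):

```shell
# Stack, bottom-up: physical disk (sdX) -> Unraid md/parity layer (md1p1)
#                   -> ZFS pool "disk1"
# Confirm on the server with: zpool status disk1 | grep md1p1
vdev_row='        md1p1   ONLINE       0     0     0'   # illustrative status row
echo "$vdev_row" | grep -c 'md1p1'   # prints: 1
```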