diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
index 18f9479..caed8f9 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -6,6 +6,10 @@
 
 ## 2026-02-06
 
+### Unraid Flash Drive Failure
+- **[INCIDENT]** Unraid flash drive crashing - migration procedure created
+- **[DOCS]** Created incident report with full flash drive replacement procedure
+
 ### Documentation Restructure
 - **[DOCS]** Restructured docs/ from 23 files to clean 9-doc structure
 - **[DOCS]** Archived 12 completed VLAN migration project docs to archive/vlan-migration/
diff --git a/docs/incidents/2026-02-06-unraid-flash-drive-failure.md b/docs/incidents/2026-02-06-unraid-flash-drive-failure.md
new file mode 100644
index 0000000..6e6aea5
--- /dev/null
+++ b/docs/incidents/2026-02-06-unraid-flash-drive-failure.md
@@ -0,0 +1,200 @@
# Incident: Unraid Flash Drive Failure

**Date:** 2026-02-06
**Severity:** P1 - Server at risk
**Status:** In Progress
**Affected:** XTRM-U (Unraid NAS)

---

## Symptoms

The Unraid flash drive is crashing and unstable. If it fails completely, the boot configuration stored on it (including the `config/` folder) could be lost.

---

## Migration Procedure: Replace Flash Drive

### Step 1: Retrieve Flash Backup

Try these options in order of preference:

**Option A - Fresh backup from WebGUI (if the server still boots):**
1. Open http://192.168.10.20 in a browser
2. Go to the **Main** tab → click the **Flash** device
3. Under **Flash Device Settings**, click **FLASH BACKUP**
4. Download the ZIP file to your Mac

**Option B - Google Drive (daily Rclone backup):**
```bash
# From the Mac (requires rclone on the Mac with a `drive:` remote configured)
rclone copy drive:Backups/unraid-flash ~/Desktop/unraid-flash-backup/

# Or download manually from the Google Drive web UI
# Folder: Backups/unraid-flash
```

**Option C - Local backup on Unraid (if the server boots but the WebGUI is broken):**
```bash
# Log in to the server and confirm the backup exists
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422
ls /mnt/user/Backup/unraid-flash/
exit

# Back on the Mac: copy the backup off the server
# (quote the remote path so the Mac's shell does not try to expand the *)
mkdir -p ~/Desktop/unraid-flash-backup
scp -r -P 422 -i ~/.ssh/id_ed25519_unraid 'root@192.168.10.20:/mnt/user/Backup/unraid-flash/*' ~/Desktop/unraid-flash-backup/
```

**Option D - Direct copy from the failing drive:**
1. Shut down the server
2. Remove the flash drive and insert it into the Mac
3. Copy its entire contents to `~/Desktop/unraid-flash-backup/`

---

### Step 2: Prepare New USB Drive

**Requirements:**
- USB 2.0 recommended (more reliable than USB 3.0 for this purpose)
- Capacity: 4 GB minimum, 32 GB maximum
- Reputable brand (SanDisk, Samsung, Kingston)
- Must have a unique hardware GUID (the license is tied to it)

**Write the backup to the new drive:**

1. Download the [Unraid USB Flash Creator](https://unraid.net/download) for macOS
2. Insert the new USB drive into the Mac
3. Open Flash Creator
4. For **Operating System**, scroll down and select **"Use custom"**
5. Browse to your backup ZIP file from Step 1
6. Select the new USB drive as the destination
7. Click **Write** and wait for it to complete

**If you don't have a backup ZIP** (for example, only the raw files copied in Option D):
1. In Flash Creator, select the Unraid OS version that matches your current install
2. Write a fresh Unraid install to the new drive
3. After writing, mount the drive on the Mac and copy your backed-up `config/` folder onto it, replacing the default one

---

### Step 3: Swap Drives and Boot

1. Shut down XTRM-U if it is still running
2. Remove the old (failing) flash drive
3. Insert the new USB drive
4. Power on the server
5. Wait for it to boot (1-2 minutes)
6. Try accessing the WebGUI at http://192.168.10.20

**If the WebGUI doesn't load:**
- Connect a monitor to the server to check the boot messages
- Verify that the USB drive is detected in the BIOS
- Ensure the boot order lists the USB drive first

---

### Step 4: Transfer License

You will see an "Invalid, missing or expired registration key" message. This is expected, because the license key is tied to the old drive's GUID.

1. In the WebGUI, go to **Tools → Registration**
2. Click **Replace Key**
3. Enter the email address associated with your Unraid account
4. Check your email for the confirmation/license key
5. Follow the link, or paste the key file URL into the Registration page
6. Click **Done**

**Important warnings:**
- Replacing the key **permanently blacklists** the old USB drive. It can never be used with Unraid again
- The first license transfer can be done at any time
- Subsequent transfers: once per 12 months via the automated system
- If you need another transfer within 12 months, contact [Unraid support](https://unraid.net/contact) with the old GUID, new GUID, license key and purchase email

**If you can't find your license:**
- Log in to https://account.unraid.net to view your keys
- Check your email for the original purchase confirmation

---

### Step 5: Post-Migration Verification

Work through this checklist once the server is back up:

**Array & Storage:**
- [ ] WebGUI loads at http://192.168.10.20
- [ ] Array starts normally (Main tab → Start)
- [ ] All disks show a healthy status
- [ ] Shares are accessible

**Docker & Services:**
```bash
ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422

# Then, on the server: check all containers
docker ps -a --format 'table {{.Names}}\t{{.Status}}'

# Start any stopped critical containers, in dependency order:
docker start postgresql17 && sleep 30   # give PostgreSQL time to come up
docker start Redis && sleep 10          # give Redis time to come up
docker start traefik
docker start authentik authentik-worker
docker start vaultwarden
```

**Network:**
- [ ] SSH works: `ssh -i ~/.ssh/id_ed25519_unraid root@192.168.10.20 -p 422`
- [ ] DNS failover AdGuard reachable: http://192.168.10.10:3000
- [ ] AdGuard sync working (check `docker logs adguardhome-sync --tail 5`)
- [ ] External URLs working (https://xtrm-lab.org)

**Services checklist:**
- [ ] Traefik reverse proxy (https://xtrm-lab.org)
- [ ] Authentik SSO (https://auth.xtrm-lab.org)
- [ ] Gitea (https://git.xtrm-lab.org)
- [ ] Uptime Kuma (https://uptime.xtrm-lab.org)
- [ ] Vaultwarden (https://vault.xtrm-lab.org)
- [ ] Plex (https://plex.xtrm-lab.org)

**Backup:**
- [ ] Verify the Rclone config is still present: `rclone listremotes` (should list `drive:`)
- [ ] Test the flash backup: trigger a manual backup from the WebGUI or User Scripts
- [ ] Verify that the cron schedule for the flash backup is active

---

### Step 6: Prevention

After a successful migration:

1. **Enable Unraid Connect** (if not already enabled) for automated cloud flash backups:
   - Settings → Management Access → Unraid Connect
   - Sign in with your unraid.net account
   - Enable Flash Backup

2. **Verify the Rclone cron** is scheduled:
   ```bash
   # On the server: list the User Scripts and confirm the flash backup script exists,
   # then check its schedule on the User Scripts plugin page
   ls /boot/config/plugins/user.scripts/scripts/
   ```

3. **Keep a spare USB drive** prepared with a fresh Unraid install. This makes future recovery faster

4. **Test backup restoration** periodically. Don't wait for a failure to discover that your backup is incomplete

---

## References

- [Unraid Docs: Changing the Flash Device](https://docs.unraid.net/unraid-os/system-administration/maintain-and-update/changing-the-flash-device/)
- [Unraid Docs: Licensing FAQ](https://docs.unraid.net/unraid-os/troubleshooting/licensing-faq/)
- Internal: `docs/02-SERVICES-CRITICAL.md` (startup order)

---

## Resolution

*Update this section when the migration is complete:*

- **Date resolved:**
- **New USB drive:**
- **License transferred:** Yes/No
- **Services verified:** Yes/No
- **Backup reconfigured:** Yes/No
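
---

The ordered container start-up in Step 5 can also be wrapped in a small function, so the dependency order and wait times live in one place. This is a minimal sketch that assumes the container names and the 30 s / 10 s waits given in Step 5. The names `start_critical_containers` and `CRITICAL_START_ORDER` are hypothetical and do not refer to an existing script on XTRM-U. The function stops at the first container that fails to start, so later services are not started against a missing dependency.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: start XTRM-U's critical containers in the Step 5 order.
# Assumption: names and wait times match Step 5; edit the array if they change.

# Each entry is "<container name(s)>:<seconds to wait after starting>".
CRITICAL_START_ORDER=(
  "postgresql17:30"
  "Redis:10"
  "traefik:0"
  "authentik authentik-worker:0"
  "vaultwarden:0"
)

start_critical_containers() {
  local entry names delay
  local -a containers
  for entry in "${CRITICAL_START_ORDER[@]}"; do
    names=${entry%:*}     # everything before the last colon
    delay=${entry##*:}    # everything after the last colon
    read -r -a containers <<< "$names"

    # Stop at the first failure so dependants are not started without it
    if ! docker start "${containers[@]}" >/dev/null; then
      echo "failed to start: $names" >&2
      return 1
    fi
    echo "started: $names"

    if (( delay > 0 )); then
      sleep "$delay"      # let the dependency finish starting up
    fi
  done
}
```

To use it, `source` the file in an SSH session on XTRM-U and run `start_critical_containers`. The fixed waits mirror the manual procedure. A stricter variant could poll each service's own readiness check instead of sleeping.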
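
---

Step 6's advice to test backup restoration can start with a quick structural check of a backup before it is actually needed. This sketch assumes the standard Unraid flash layout, in which `config/` holds the license key file (`*.key`) and `super.dat` (array disk assignments). `check_flash_backup` is a hypothetical helper. A ZIP backup must be extracted first (for example with `unzip <backup>.zip -d <dir>`). Passing this check does not prove the backup will boot. It only shows that the files a restore depends on are present.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check an extracted Unraid flash backup.
# Assumes the standard layout: config/*.key (license) and config/super.dat
# (array disk assignments). Returns non-zero if anything is missing.
check_flash_backup() {
  local dir=$1 status=0

  if [[ ! -d "$dir/config" ]]; then
    echo "missing: config/ (not a flash backup?)" >&2
    return 1
  fi

  # compgen -G succeeds only if the glob matches at least one file
  if ! compgen -G "$dir/config/*.key" >/dev/null; then
    echo "missing: config/*.key (license key file)" >&2
    status=1
  fi

  if [[ ! -f "$dir/config/super.dat" ]]; then
    echo "missing: config/super.dat (array disk assignments)" >&2
    status=1
  fi

  if (( status == 0 )); then
    echo "ok: $dir"
  fi
  return "$status"
}
```

For example, `check_flash_backup ~/Desktop/unraid-flash-backup` on the Mac after Step 1 reports each missing item on stderr.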