From b250493d5a4f829891443340c27f1873ee1abb8c Mon Sep 17 00:00:00 2001 From: XTRM-Unraid Date: Sun, 25 Jan 2026 11:17:33 +0200 Subject: [PATCH] Major documentation restructure - consolidated docs New Structure: - 01-NETWORK-MAP.md - Network topology, IPs, Docker networks, services - 02-SERVICES-CRITICAL.md - DNS, Auth, Routing (P0/P1 services) - 03-SERVICES-OTHER.md - All non-critical services - 04-HARDWARE-INVENTORY.md - Physical devices and specs - 05-CHANGELOG.md - Major events only New Folders: - docs/archive/ - Legacy docs (read-only reference) - docs/wip/ - Planned changes and ideas - UPGRADE-2026-HARDWARE.md - N5 Air + N100 migration plan - GITOPS-CONTAINERS.md - Phase 2 container GitOps Changes: - Moved all 22 legacy docs to archive/ - Consolidated container IPs, physical map, and services into single network map - Extracted critical vs non-critical service classification - Simplified changelog to major events only Co-Authored-By: Claude Opus 4.5 --- README.md | 98 +++-- docs/01-NETWORK-MAP.md | 340 ++++++++++++++++++ docs/02-SERVICES-CRITICAL.md | 212 +++++++++++ docs/03-SERVICES-OTHER.md | 262 ++++++++++++++ docs/04-HARDWARE-INVENTORY.md | 184 ++++++++++ docs/05-CHANGELOG.md | 71 ++++ docs/{ => archive}/00-CHANGELOG.md | 0 docs/{ => archive}/00-CURRENT-STATE.md | 0 .../01-PHASE1-DNS-PORTABILITY.md | 0 .../02-PHASE2-FOSSORIAL-STACK.md | 0 .../03-PHASE3-AUTHENTIK-ZEROTRUST.md | 0 docs/{ => archive}/04-PHASE4-REMOTE-GAMING.md | 0 docs/{ => archive}/05-PHASE5-RUSTDESK.md | 0 docs/{ => archive}/06-CHANGELOG.md | 0 .../06-PHASE6-PORTAINER-MANAGEMENT.md | 0 docs/{ => archive}/07-CHANGELOG.md | 0 docs/{ => archive}/08-PHASE7-GITEA-GITOPS.md | 0 .../{ => archive}/09-MIKROTIK-WIFI-CAPSMAN.md | 0 .../10-VLAN-NETWORK-SEGMENTATION.md | 0 .../11-NETWORK-ASSET-INVENTORY.md | 0 .../12-PHASE8-NETDISCO-INTEGRATION.md | 0 .../13-CONTAINER-IP-ASSIGNMENTS.md | 0 docs/{ => archive}/AGENT-CREDENTIALS.md | 0 docs/{ => archive}/INFRASTRUCTURE-DIAGRAM.md | 0 docs/{ => 
archive}/NETBOX-DRAFT.md | 0 docs/{ => archive}/NETWORK-PHYSICAL-MAP.md | 0 docs/archive/README.md | 16 + docs/{ => archive}/unraid-claude.md | 0 docs/wip/GITOPS-CONTAINERS.md | 180 ++++++++++ docs/wip/README.md | 16 + docs/wip/UPGRADE-2026-HARDWARE.md | 229 ++++++++++++ 31 files changed, 1585 insertions(+), 23 deletions(-) create mode 100644 docs/01-NETWORK-MAP.md create mode 100644 docs/02-SERVICES-CRITICAL.md create mode 100644 docs/03-SERVICES-OTHER.md create mode 100644 docs/04-HARDWARE-INVENTORY.md create mode 100644 docs/05-CHANGELOG.md rename docs/{ => archive}/00-CHANGELOG.md (100%) rename docs/{ => archive}/00-CURRENT-STATE.md (100%) rename docs/{ => archive}/01-PHASE1-DNS-PORTABILITY.md (100%) rename docs/{ => archive}/02-PHASE2-FOSSORIAL-STACK.md (100%) rename docs/{ => archive}/03-PHASE3-AUTHENTIK-ZEROTRUST.md (100%) rename docs/{ => archive}/04-PHASE4-REMOTE-GAMING.md (100%) rename docs/{ => archive}/05-PHASE5-RUSTDESK.md (100%) rename docs/{ => archive}/06-CHANGELOG.md (100%) rename docs/{ => archive}/06-PHASE6-PORTAINER-MANAGEMENT.md (100%) rename docs/{ => archive}/07-CHANGELOG.md (100%) rename docs/{ => archive}/08-PHASE7-GITEA-GITOPS.md (100%) rename docs/{ => archive}/09-MIKROTIK-WIFI-CAPSMAN.md (100%) rename docs/{ => archive}/10-VLAN-NETWORK-SEGMENTATION.md (100%) rename docs/{ => archive}/11-NETWORK-ASSET-INVENTORY.md (100%) rename docs/{ => archive}/12-PHASE8-NETDISCO-INTEGRATION.md (100%) rename docs/{ => archive}/13-CONTAINER-IP-ASSIGNMENTS.md (100%) rename docs/{ => archive}/AGENT-CREDENTIALS.md (100%) rename docs/{ => archive}/INFRASTRUCTURE-DIAGRAM.md (100%) rename docs/{ => archive}/NETBOX-DRAFT.md (100%) rename docs/{ => archive}/NETWORK-PHYSICAL-MAP.md (100%) create mode 100644 docs/archive/README.md rename docs/{ => archive}/unraid-claude.md (100%) create mode 100644 docs/wip/GITOPS-CONTAINERS.md create mode 100644 docs/wip/README.md create mode 100644 docs/wip/UPGRADE-2026-HARDWARE.md diff --git a/README.md b/README.md index 
0dba197..1154264 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,83 @@ -# XTRM-Lab Infrastructure +# XTRM Home Lab Infrastructure -This repository contains infrastructure documentation and CI/CD pipelines for the xtrm-lab.org homelab. +**Domain:** xtrm-lab.org +**Repository:** https://git.xtrm-lab.org/jazzymc/infrastructure -## Documentation +--- -All infrastructure documentation is in the `docs/` folder: +## Quick Reference -| Document | Description | -|----------|-------------| -| [00-CURRENT-STATE.md](docs/00-CURRENT-STATE.md) | Current infrastructure state | -| [01-PHASE1-DNS-PORTABILITY.md](docs/01-PHASE1-DNS-PORTABILITY.md) | DNS & Pi-hole setup | -| [02-PHASE2-FOSSORIAL-STACK.md](docs/02-PHASE2-FOSSORIAL-STACK.md) | Pangolin tunnel stack | -| [03-PHASE3-AUTHENTIK-ZEROTRUST.md](docs/03-PHASE3-AUTHENTIK-ZEROTRUST.md) | Authentik SSO | -| [04-PHASE4-REMOTE-GAMING.md](docs/04-PHASE4-REMOTE-GAMING.md) | Sunshine/Moonlight | -| [05-PHASE5-RUSTDESK.md](docs/05-PHASE5-RUSTDESK.md) | RustDesk setup | -| [06-PHASE6-PORTAINER-MANAGEMENT.md](docs/06-PHASE6-PORTAINER-MANAGEMENT.md) | Portainer | -| [08-PHASE7-GITEA-GITOPS.md](docs/08-PHASE7-GITEA-GITOPS.md) | Gitea & Woodpecker CI | +| Resource | Address | +|----------|---------| +| **Dashboard** | https://xtrm-lab.org | +| **NetBox** | https://netbox.xtrm-lab.org | +| **Git** | https://git.xtrm-lab.org | +| **CI/CD** | https://ci.xtrm-lab.org | +| **DNS Primary** | dns.xtrm-lab.org | +| **DNS Secondary** | dns2.xtrm-lab.org | + +--- + +## Documentation Structure + +``` +docs/ +├── 01-NETWORK-MAP.md # Network topology, IPs, Docker networks +├── 02-SERVICES-CRITICAL.md # DNS, Auth, Routing - must stay up +├── 03-SERVICES-OTHER.md # All other services +├── 04-HARDWARE-INVENTORY.md # Physical devices, specs, serials +├── 05-CHANGELOG.md # Major events only +├── wip/ # Planned changes & ideas +│ ├── UPGRADE-2026-HARDWARE.md +│ └── GITOPS-CONTAINERS.md +└── archive/ # Legacy docs (read-only) +``` + +--- + +## Key Devices + 
+| Device | IP | Role | +|--------|-----|------| +| HAP1 | 192.168.31.1 | Router, DNS, WiFi Controller | +| XTRM-U | 192.168.31.2 | Production Server (Unraid) | +| CSS1 | 192.168.31.9 | Distribution Switch | +| ZX1 | 192.168.31.7 | Core Switch (2.5G) | +| CAP | 192.168.31.6 | Wireless Access Point | + +--- + +## SSH Access + +```bash +# Unraid +ssh -i ~/.ssh/id_ed25519_unraid root@192.168.31.2 -p 422 + +# MikroTik Router +ssh -i ~/.ssh/mikrotik_key -p 2222 unraid@192.168.31.1 +``` + +--- + +## Emergency Recovery + +1. **DNS down?** → Clients fall back to 192.168.31.4 (secondary) +2. **Internet down?** → Check HAP1 at 192.168.31.1 +3. **Services down?** → Check Unraid at 192.168.31.2 +4. **Full outage?** → See startup order in `docs/02-SERVICES-CRITICAL.md` + +--- + +## Change Management + +- **Major changes:** Document in `docs/05-CHANGELOG.md` +- **Minor changes:** Git commit messages only +- **Planned work:** Create doc in `docs/wip/` folder + +--- ## CI/CD -This repo uses Woodpecker CI. See [.woodpecker.yml](.woodpecker.yml) for the pipeline configuration. +Woodpecker CI at https://ci.xtrm-lab.org -## Services - -| Service | URL | -|---------|-----| -| Gitea | https://git.xtrm-lab.org | -| Woodpecker CI | https://ci.xtrm-lab.org | -| Authentik | https://auth.xtrm-lab.org | -| Traefik | https://traefik.xtrm-lab.org | +Pipelines trigger on push to this repository.
diff --git a/docs/01-NETWORK-MAP.md b/docs/01-NETWORK-MAP.md new file mode 100644 index 0000000..852e999 --- /dev/null +++ b/docs/01-NETWORK-MAP.md @@ -0,0 +1,340 @@ +# Network Map - xtrm-lab.org + +**Last Updated:** 2026-01-25 +**Domain:** xtrm-lab.org +**WAN IP:** 62.73.120.142 + +--- + +## Quick Reference + +| Resource | Address | +|----------|---------| +| **Dashboard** | https://xtrm-lab.org | +| **DNS Primary** | dns.xtrm-lab.org (HAP1) | +| **DNS Secondary** | dns2.xtrm-lab.org (XTRM-U) | +| **Unraid SSH** | `ssh -i ~/.ssh/id_ed25519_unraid root@192.168.31.2 -p 422` | +| **MikroTik SSH** | `ssh -i ~/.ssh/mikrotik_key -p 2222 unraid@192.168.31.1` | + +--- + +## Network Topology + +```mermaid +flowchart TB + subgraph Internet["Internet"] + ISP["ISP Fiber Gateway
(Vivacom)
62.73.120.x"] + end + + subgraph Rack19["19#quot; Rack (3U)"] + HAP1["HAP1 | hAP ax³
192.168.31.1"] + PP1["PP1 | 24-port"] + CSS1["CSS1 | CSS326-24G-2S+
192.168.31.9"] + end + + subgraph Rack10["10#quot; Rack (9U)"] + ZX1["ZX1 | ZX-SWTGW218AS
192.168.31.7"] + PP2["PP2 | 12-port"] + XTRMU["XTRM-U
192.168.31.2"] + end + + subgraph Wireless["WiFi"] + CAP["CAP | cAP XL ac
192.168.31.6"] + end + + ISP -->|"H-1 WAN"| HAP1 + HAP1 -->|"H-4 → ZX1-1"| ZX1 + HAP1 -->|"H-3 → CSS1-1"| CSS1 + ZX1 <-->|"⚡ 10G SFP+ ⚡"| CSS1 + ZX1 -->|"ZX1-2/3 via PP2"| XTRMU + HAP1 -->|"H-2 via PP1"| CAP + CSS1 -->|"Ports 16-24"| PP1 +``` + +--- + +## Physical Infrastructure + +### Rack Layout + +#### 10" Rack (9U) + +| U | Device | Model | IP | Notes | +|---|--------|-------|-----|-------| +| U9 | Shelf + ISP Gateway | Vivacom ONT | 62.73.120.2 | WAN | +| U8 | PP2 | 10" 12-port Cat6a | - | Patch panel | +| U7 | Shelf + ZX1 | ZX-SWTGW218AS | 192.168.31.7 | 8x2.5G + 2x10G SFP+ | +| U6 | (empty) | - | - | Reserved for XTRM-N1 | +| U1-U4 | XTRM-U | NAS Server | 192.168.31.2 | 4x 2.5GbE bond | + +#### 19" Rack (3U) + +| U | Device | Model | IP | Notes | +|---|--------|-------|-----|-------| +| U3 | Shelf + HAP1 | hAP ax³ | 192.168.31.1 | Router + WiFi controller | +| U2.5 | PP1 | 19" 24-port Cat6a | - | Room connections | +| U1 | CSS1 | CSS326-24G-2S+ | 192.168.31.9 | 24x1G + 2x10G SFP+ | + +### Backbone Links + +| Link | From | To | Speed | Type | +|------|------|----|-------|------| +| **Primary** | ZX1-SFP1 | CSS1-SFP1 | 10G | SFP+ DAC | +| Router→Core | HAP1 H-4 | ZX1-1 | 2.5G | Cat6a | +| Router→Dist | HAP1 H-3 | CSS1-1 | 1G | Cat6a | +| Server Bond | ZX1-2/3 | XTRM-U via PP2 | 2x 2.5G | Cat6a | + +--- + +## IP Address Allocation + +### Network: 192.168.31.0/24 + +#### Infrastructure Devices + +| IP | Device | Type | MAC | +|----|--------|------|-----| +| 192.168.31.1 | HAP1 \| hAP ax³ | Router | 78:9A:18:2C:A5:48 | +| 192.168.31.2 | XTRM-U | Server | A8:B8:E0:02:B6:15 | +| 192.168.31.6 | CAP \| cAP XL ac | Access Point | 18:FD:74:54:3D:BC | +| 192.168.31.7 | ZX1 \| ZX-SWTGW218AS | Switch | 1C:2A:A3:1E:78:67 | +| 192.168.31.9 | CSS1 \| CSS326-24G-2S+ | Switch | F4:1E:57:C9:BD:09 | + +#### Containers (br0 Macvlan) + +| IP | Container | Purpose | +|----|-----------|---------| +| 192.168.31.4 | AdGuard Home | DNS Secondary | +| 192.168.31.5 | Unbound | Recursive 
DNS (stopped) | +| 192.168.31.12 | TimeMachine | macOS backups | + +#### DHCP Ranges + +| Range | Purpose | +|-------|---------| +| 192.168.31.10-99 | Reserved (static) | +| 192.168.31.100-200 | DHCP Pool | +| 192.168.31.201-254 | Reserved | + +--- + +## Docker Networks + +### HAP1 (MikroTik Router) + +**Network:** 172.17.0.0/16 (bridge) + +| Container | IP | Purpose | +|-----------|-----|---------| +| AdGuard Home | 172.17.0.5 | DNS Primary (DoH/DoT/DoQ) | +| Tailscale | 172.17.0.4 | VPN mesh | + +### XTRM-U (Unraid Server) + +#### dockerproxy (172.18.0.0/16) + +**Static IP Assignments:** + +| Range | Purpose | +|-------|---------| +| 172.18.0.2-10 | Core Infrastructure | +| 172.18.0.11-15 | Security | +| 172.18.0.16-30 | Productivity | +| 172.18.0.31-40 | DevOps | +| 172.18.0.41-50 | NetDisco | +| 172.18.0.61-69 | NetBox | +| 172.18.0.70-79 | Diode Discovery | + +**Core Infrastructure (172.18.0.2-10)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.2 | dockersocket | Docker socket proxy | +| 172.18.0.3 | traefik | Reverse proxy | +| 172.18.0.4 | homarr | Dashboard | + +**Security (172.18.0.11-15)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.11 | authentik | Identity provider | +| 172.18.0.12 | authentik-worker | Background tasks | +| 172.18.0.13 | postgresql17 | Shared database | +| 172.18.0.14 | Redis | Shared cache/queue | +| 172.18.0.15 | vaultwarden | Password manager | + +**Productivity (172.18.0.16-30)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.16 | actual-budget | Budget tracking | +| 172.18.0.17 | n8n | Workflow automation | +| 172.18.0.18 | Uptime-Kuma-API | Monitoring API | +| 172.18.0.19 | AutoKuma | Auto-monitor | +| 172.18.0.20 | UptimeKuma | Uptime monitoring | +| 172.18.0.21 | speedtest-tracker | Speed tests | +| 172.18.0.23 | Libation | Audiobooks | +| 172.18.0.24 | Nextcloud | Cloud storage | +| 172.18.0.25 | karakeep | Bookmarks | +| 172.18.0.26 | 
transmission | Torrent | +| 172.18.0.27 | adguardhome-sync | DNS sync | + +**DevOps (172.18.0.31-40)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.31 | gitea | Git server | +| 172.18.0.32 | woodpecker-server | CI/CD server | +| 172.18.0.33 | woodpecker-agent | CI/CD agent | + +**NetDisco (172.18.0.41-50)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.41 | netdisco-web | Web UI | +| 172.18.0.42 | netdisco-backend | SNMP poller | + +**NetBox (172.18.0.61-69)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.61 | netbox | Web UI (DCIM/IPAM) | +| 172.18.0.62 | netbox-worker | Background tasks | +| 172.18.0.64 | netbox-redis-cache | Query cache | + +**Diode Discovery (172.18.0.70-79)** + +| IP | Container | Purpose | +|----|-----------|---------| +| 172.18.0.70 | diode-ingress | API Gateway | +| 172.18.0.71 | diode-ingester | Data ingestion | +| 172.18.0.72 | diode-reconciler | NetBox sync | +| 172.18.0.73 | diode-hydra | OAuth2 | +| 172.18.0.74 | diode-auth | Token service | + +#### Host Network Containers + +| Container | Purpose | +|-----------|---------| +| plex | Media server (:32400) | +| unimus | Network config backup | +| UrBackup | Backup server | +| NetAlertX | Network scanner | +| HomeAssistant | Home automation | + +#### Bridge Network (172.17.0.0/16) + +| Container | Purpose | +|-----------|---------| +| portainer | Container management | +| rustdesk-hbbs | RustDesk signaling | +| rustdesk-hbbr | RustDesk relay | + +--- + +## Port Forwarding (NAT) + +| External Port | Destination | Service | +|---------------|-------------|---------| +| 80 | 192.168.31.2:8001 | Traefik HTTP | +| 443 | 192.168.31.2:44301 | Traefik HTTPS | +| 853 | 172.17.0.5:853 | AdGuard DoT | +| 8853 | 172.17.0.5:8853 | AdGuard DoQ | +| 32400 | 192.168.31.2:32400 | Plex | +| 51413 | 192.168.31.2:51413 | Transmission | +| 21115-21119 | 192.168.31.2 | RustDesk | + +--- + +## DNS Architecture + +```mermaid 
+flowchart TB + subgraph External["External Access"] + DOH["DoH: dns.xtrm-lab.org"] + DOT["DoT: dns.xtrm-lab.org:853"] + end + + subgraph HAP1["HAP1 (Primary)"] + AGH1["AdGuard Home
172.17.0.5"] + end + + subgraph XTRMU["XTRM-U (Secondary)"] + AGH2["AdGuard Home
192.168.31.4"] + end + + subgraph Sync["Sync"] + SYNC["adguardhome-sync
Every 30 min"] + end + + DOH --> AGH1 + DOT --> AGH1 + AGH1 <-.->|sync| SYNC + SYNC <-.->|sync| AGH2 + AGH1 --> Q9["Quad9 DoH"] + AGH2 --> Q9 +``` + +--- + +## WiFi Networks + +| SSID | Band | Security | Purpose | +|------|------|----------|---------| +| XTRM | 5GHz | WPA2/WPA3 | Primary devices | +| XTRM | 2.4GHz | WPA/WPA2 | Legacy support | +| XTRM2 | 2.4GHz | WPA/WPA2 | IoT devices | + +**CAPsMAN:** HAP1 manages CAP access point + +--- + +## External URLs + +| Service | URL | +|---------|-----| +| Dashboard | https://xtrm-lab.org | +| Auth | https://auth.xtrm-lab.org | +| Git | https://git.xtrm-lab.org | +| CI/CD | https://ci.xtrm-lab.org | +| NetBox | https://netbox.xtrm-lab.org | +| Uptime | https://uptime.xtrm-lab.org | +| Plex | https://plex.xtrm-lab.org | +| Nextcloud | https://cloud.xtrm-lab.org | +| Vault | https://vault.xtrm-lab.org | +| NetDisco | https://netdisco.xtrm-lab.org | + +--- + +## Room Outlets + +| Room | Outlets | Switch Ports | Status | +|------|---------|--------------|--------| +| Living Room | L1, L2, L3 | CSS1-22/23/24 | Active | +| Main Bedroom | M1, M2, M3 | CSS1-19/20/21 | Active | +| Boys Room | B1, B2 | CSS1-17/18 | Active | +| Girls Room | G1 | CSS1-16 | Unused | +| Corridor | C1 (CAP) | HAP1 H-2 | Active | + +--- + +## Shared Databases + +### PostgreSQL 17 (172.18.0.13) + +| Database | User | Consumer | +|----------|------|----------| +| authentik_db | authentik_user | Authentik | +| netbox | netbox_user | NetBox | +| gitea | gitea_user | Gitea | +| netdisco_db | netdisco_user | NetDisco | +| diode | diode_user | Diode Reconciler | +| hydra | hydra_user | Diode Hydra | + +### Redis (172.18.0.14) + +| Consumer | Purpose | +|----------|---------| +| Authentik | Session cache | +| NetBox Worker | Task queue | +| Diode | Ingestion queue | diff --git a/docs/02-SERVICES-CRITICAL.md b/docs/02-SERVICES-CRITICAL.md new file mode 100644 index 0000000..230ec07 --- /dev/null +++ b/docs/02-SERVICES-CRITICAL.md @@ -0,0 +1,212 @@ +# Critical 
Services + +**Last Updated:** 2026-01-25 + +Services that must remain operational for network functionality and security. + +--- + +## Priority Levels + +| Priority | Meaning | Recovery Target | +|----------|---------|-----------------| +| **P0** | Network offline without this | < 5 minutes | +| **P1** | Major functionality impacted | < 1 hour | + +--- + +## P0 - Network Core + +### DNS (AdGuard Home) + +| Instance | Host | IP | Role | +|----------|------|-----|------| +| Primary | HAP1 | 172.17.0.5 | Main DNS, DoH/DoT/DoQ | +| Secondary | XTRM-U | 192.168.31.4 | Failover DNS | + +**Endpoints:** +- DoH: `https://dns.xtrm-lab.org/dns-query` +- DoT: `tls://dns.xtrm-lab.org:853` +- DoQ: `quic://dns.xtrm-lab.org:8853` + +**Config Sync:** adguardhome-sync (every 30 min) + +**Upstream:** Quad9 DoH (`https://dns10.quad9.net/dns-query`) + +**Recovery:** +1. If primary fails → clients use secondary (192.168.31.4) +2. Restart container on HAP1: `/container/start adguardhome` + +--- + +### Routing (HAP1) + +| Function | Details | +|----------|---------| +| WAN | 62.73.120.142 via Vivacom fiber | +| LAN | 192.168.31.0/24 | +| NAT | Port forwarding to XTRM-U | +| Firewall | RouterOS firewall rules | + +**Recovery:** +1. Physical access to HAP1 +2. Reset: hold reset button 5s +3. 
Reconfigure via WinBox or SSH + +--- + +### DHCP (HAP1) + +| Pool | Range | +|------|-------| +| Dynamic | 192.168.31.100-200 | +| Lease Time | 24 hours | + +**Static Leases:** Managed in RouterOS DHCP server + +--- + +## P1 - Authentication & Secrets + +### Authentik (SSO) + +| Component | IP | Purpose | +|-----------|-----|---------| +| authentik | 172.18.0.11 | Web UI + OIDC | +| authentik-worker | 172.18.0.12 | Background tasks | + +**URL:** https://auth.xtrm-lab.org + +**Protects:** +- Traefik forward auth (all *.xtrm-lab.org) +- Gitea OAuth +- Woodpecker OAuth +- NetBox OAuth +- NetDisco SSO + +**Recovery:** +```bash +cd /mnt/user/appdata/authentik +docker compose up -d +``` + +**Database:** postgresql17 (authentik_db) + +--- + +### Vaultwarden (Passwords) + +| Component | IP | Purpose | +|-----------|-----|---------| +| vaultwarden | 172.18.0.15 | Password manager | + +**URL:** https://vault.xtrm-lab.org + +**Data:** `/mnt/user/appdata/vaultwarden/` + +**Recovery:** +```bash +docker start vaultwarden +``` + +**Backup:** Part of Unraid flash backup + +--- + +## P1 - Reverse Proxy + +### Traefik + +| Component | IP | Ports | +|-----------|-----|-------| +| traefik | 172.18.0.3 | 8001→80, 44301→443 | + +**Config:** `/mnt/user/appdata/traefik/` +- `traefik.yml` - Static config +- `dynamic.yml` - Routers & services + +**TLS:** Let's Encrypt wildcard for *.xtrm-lab.org + +**Recovery:** +```bash +docker start traefik +``` + +--- + +## Shared Infrastructure + +### PostgreSQL 17 + +| IP | Databases | +|----|-----------| +| 172.18.0.13 | authentik_db, netbox, gitea, netdisco_db, diode, hydra | + +**Data:** `/mnt/user/appdata/postgresql17/` + +**Recovery:** +```bash +docker start postgresql17 +# Wait for DB to be ready before starting dependents +``` + +### Redis + +| IP | Consumers | +|----|-----------| +| 172.18.0.14 | Authentik, NetBox, Diode | + +**Recovery:** +```bash +docker start Redis +``` + +--- + +## Startup Order + +When recovering from full outage: + +1. 
**postgresql17** - Database (wait 30s) +2. **Redis** - Cache/queue (wait 10s) +3. **traefik** - Reverse proxy +4. **authentik** + **authentik-worker** - SSO +5. **vaultwarden** - Passwords +6. All other services + +--- + +## Monitoring + +### Uptime Kuma + +| URL | Monitors | +|-----|----------| +| https://uptime.xtrm-lab.org | 27 services | + +**Alerts:** Configured per service (email/webhook) + +--- + +## Backup Strategy + +| Data | Location | Frequency | +|------|----------|-----------| +| Unraid Flash | Google Drive | Daily | +| PostgreSQL | `/mnt/user/Backup/` | Daily | +| Vaultwarden | Unraid Flash | With flash backup | +| Authentik | PostgreSQL + `/mnt/user/appdata/authentik/` | Daily | + +--- + +## Future: XTRM-N1 Survival Node + +When hardware upgrade completes, these services will have replicas on XTRM-N1: + +| Service | Primary | Replica | +|---------|---------|---------| +| DNS | HAP1 | XTRM-N1 | +| Vaultwarden | XTRM-N5 | XTRM-N1 | +| Authentik | XTRM-N5 | XTRM-N1 | + +See: `wip/UPGRADE-2026-HARDWARE.md` diff --git a/docs/03-SERVICES-OTHER.md b/docs/03-SERVICES-OTHER.md new file mode 100644 index 0000000..d9837ca --- /dev/null +++ b/docs/03-SERVICES-OTHER.md @@ -0,0 +1,262 @@ +# Other Services + +**Last Updated:** 2026-01-25 + +Non-critical services that enhance functionality but don't affect core network operation. 
+ +--- + +## DevOps + +### Gitea (Git Server) + +| Component | IP | URL | +|-----------|-----|-----| +| gitea | 172.18.0.31 | https://git.xtrm-lab.org | + +**Ports:** 2222→22 (SSH), 3005→3000 (HTTP) +**Auth:** Authentik OAuth2 +**Database:** postgresql17 (gitea) + +### Woodpecker CI + +| Component | IP | URL | +|-----------|-----|-----| +| woodpecker-server | 172.18.0.32 | https://ci.xtrm-lab.org | +| woodpecker-agent | 172.18.0.33 | - | + +**Auth:** Gitea OAuth2 +**Purpose:** CI/CD pipelines for infrastructure repo + +--- + +## Network Management + +### NetBox (DCIM/IPAM) + +| Component | IP | URL | +|-----------|-----|-----| +| netbox | 172.18.0.61 | https://netbox.xtrm-lab.org | +| netbox-worker | 172.18.0.62 | - | +| netbox-redis-cache | 172.18.0.64 | - | + +**Database:** postgresql17 (netbox) +**Plugins:** diode, nextbox-ui, dns, inventory, interface-sync, routing + +### NetDisco (Network Discovery) + +| Component | IP | URL | +|-----------|-----|-----| +| netdisco-web | 172.18.0.41 | https://netdisco.xtrm-lab.org | +| netdisco-backend | 172.18.0.42 | - | + +**Database:** postgresql17 (netdisco_db) +**Purpose:** SNMP-based device discovery, MAC/ARP tracking + +### Diode (NetBox Discovery) + +| Component | IP | Purpose | +|-----------|-----|---------| +| diode-ingress | 172.18.0.70 | API Gateway | +| diode-ingester | 172.18.0.71 | Data ingestion | +| diode-reconciler | 172.18.0.72 | NetBox sync | +| diode-hydra | 172.18.0.73 | OAuth2 | +| diode-auth | 172.18.0.74 | Token service | +| diode-agent | host | Network scanner | + +**Discovery:** 192.168.31.0/24 every 30 minutes + +### Unimus + +| Network | URL | +|---------|-----| +| host | https://unimus.xtrm-lab.org | + +**Purpose:** Network device configuration backup (MikroTik) + +--- + +## Monitoring + +### Uptime Kuma + +| Component | IP | URL | +|-----------|-----|-----| +| UptimeKuma | 172.18.0.20 | https://uptime.xtrm-lab.org | +| Uptime-Kuma-API | 172.18.0.18 | - | +| AutoKuma | 172.18.0.19 | - | + 
+**Monitors:** 27 services configured + +### Speedtest Tracker + +| Component | IP | URL | +|-----------|-----|-----| +| speedtest-tracker | 172.18.0.21 | https://speedtest.xtrm-lab.org | + +### NetAlertX + +| Network | URL | +|---------|-----| +| host | https://netalert.xtrm-lab.org | + +**Purpose:** Network device discovery and alerting + +--- + +## Media + +### Plex + +| Network | Port | URL | +|---------|------|-----| +| host | 32400 | https://plex.xtrm-lab.org | + +**Libraries:** Movies, TV Shows, Music + +### Libation + +| Component | IP | +|-----------|-----| +| Libation | 172.18.0.23 | + +**Purpose:** Audiobook management + +### Transmission + +| Component | IP | Ports | +|-----------|-----|-------| +| transmission | 172.18.0.26 | 9091, 51413 | + +**Purpose:** Torrent client + +--- + +## Productivity + +### Nextcloud + +| Component | IP | URL | +|-----------|-----|-----| +| Nextcloud | 172.18.0.24 | https://cloud.xtrm-lab.org | + +**Purpose:** File sync, calendar, contacts + +### Actual Budget + +| Component | IP | URL | +|-----------|-----|-----| +| actual-budget | 172.18.0.16 | https://actual.xtrm-lab.org | + +**Purpose:** Personal finance tracking + +### n8n + +| Component | IP | URL | +|-----------|-----|-----| +| n8n | 172.18.0.17 | https://n8n.xtrm-lab.org | + +**Purpose:** Workflow automation + +### Karakeep + +| Component | IP | URL | +|-----------|-----|-----| +| karakeep | 172.18.0.25 | https://karakeep.xtrm-lab.org | + +**Purpose:** Bookmark manager + +### Homarr (Dashboard) + +| Component | IP | URL | +|-----------|-----|-----| +| homarr | 172.18.0.4 | https://xtrm-lab.org | + +**Purpose:** Home dashboard with Docker integration + +--- + +## Storage & Backup + +### TimeMachine + +| Network | IP | +|---------|-----| +| br0 macvlan | 192.168.31.12 | + +**Purpose:** macOS Time Machine backup target + +### UrBackup + +| Network | URL | +|---------|-----| +| host | https://urbackup.xtrm-lab.org | + +**Purpose:** File and image backup for PCs + +### 
RustFS + +| Network | +|---------| +| bridge | + +**Purpose:** Rust-based filesystem utilities + +--- + +## Remote Access + +### RustDesk + +| Component | Ports | +|-----------|-------| +| rustdesk-hbbs | 21115-21116 (TCP), 21116 (UDP) | +| rustdesk-hbbr | 21117-21119 | + +**Purpose:** Self-hosted remote desktop + +### Tailscale + +| Host | IP | +|------|-----| +| HAP1 | 172.17.0.4 | + +**Purpose:** Mesh VPN for remote access + +--- + +## Smart Home + +### Home Assistant + +| Network | URL | +|---------|-----| +| host | https://ha.xtrm-lab.org | + +**Purpose:** Home automation hub + +--- + +## Container Management + +### Portainer + +| Network | Port | +|---------|------| +| bridge | 9002 | + +**Purpose:** Docker container management UI +**Access:** http://192.168.31.2:9002 or via Tailscale + +--- + +## Stopped/Disabled Services + +| Service | Reason | Status | +|---------|--------|--------| +| Unbound | Redundant (AdGuard upstream) | Stopped | +| DoH-Server | Redundant (AdGuard built-in) | Removed | +| stunnel-dot | Redundant (AdGuard built-in) | Removed | +| Pi-hole | Replaced by AdGuard Home | Removed | +| Pangolin | Not in use | Removed | +| Slurp'it | Replaced by Diode | Removed | diff --git a/docs/04-HARDWARE-INVENTORY.md b/docs/04-HARDWARE-INVENTORY.md new file mode 100644 index 0000000..0b6ab22 --- /dev/null +++ b/docs/04-HARDWARE-INVENTORY.md @@ -0,0 +1,184 @@ +# Hardware Inventory + +**Last Updated:** 2026-01-25 + +--- + +## Network Devices + +### HAP1 | MikroTik hAP ax³ + +| Property | Value | +|----------|-------| +| **Role** | Router, WiFi Controller, DNS | +| **Location** | 19" Rack U3 (on shelf) | +| **IP** | 192.168.31.1 | +| **MAC** | 78:9A:18:2C:A5:48 | +| **OS** | RouterOS 7.20.6 | +| **Serial** | - | + +**Ports:** +| Port | Speed | Connected To | +|------|-------|--------------| +| H-1 | 2.5G | ISP Gateway (WAN) | +| H-2 | 1G | PP1-3 → CAP | +| H-3 | 1G | CSS1-1 | +| H-4 | 2.5G | ZX1-1 | +| H-5 | 1G | Unused | + +**Containers:** AdGuard Home, 
Tailscale + +--- + +### CSS1 | MikroTik CSS326-24G-2S+ + +| Property | Value | +|----------|-------| +| **Role** | Distribution Switch | +| **Location** | 19" Rack U1 | +| **IP** | 192.168.31.9 | +| **MAC** | F4:1E:57:C9:BD:09 | +| **OS** | SwOS | +| **Serial** | - | + +**Ports:** 24x 1G RJ45, 2x 10G SFP+ +- SFP1: 10G DAC to ZX1 +- Ports 16-24: Room outlets via PP1 + +--- + +### ZX1 | ZX-SWTGW218AS + +| Property | Value | +|----------|-------| +| **Role** | Core Switch (2.5GbE) | +| **Location** | 10" Rack U7 (on shelf) | +| **IP** | 192.168.31.7 | +| **MAC** | 1C:2A:A3:1E:78:67 | +| **Serial** | - | + +**Ports:** 8x 2.5G RJ45, 2x 10G SFP+ +| Port | Connected To | +|------|--------------| +| ZX1-1 | HAP1 H-4 | +| ZX1-2 | PP2-1 → XTRM-U XU-1 | +| ZX1-3 | PP2-2 → XTRM-U XU-2 | +| SFP1 | CSS1-SFP1 (10G backbone) | + +--- + +### CAP | MikroTik cAP XL ac + +| Property | Value | +|----------|-------| +| **Role** | Wireless Access Point | +| **Location** | Corridor (ceiling) | +| **IP** | 192.168.31.6 | +| **MAC** | 18:FD:74:54:3D:BC | +| **OS** | RouterOS 7.x | +| **Serial** | HCT085KBH8B | + +**Managed by:** HAP1 CAPsMAN + +--- + +### ISP Gateway | Vivacom Fiber ONT + +| Property | Value | +|----------|-------| +| **Role** | WAN Connection | +| **Location** | 10" Rack U9 (on shelf) | +| **WAN IP** | 62.73.120.142 | +| **MAC** | 9C:E0:41:BB:5E:32 | + +--- + +## Servers + +### XTRM-U | Unraid NAS + +| Property | Value | +|----------|-------| +| **Role** | Production Server | +| **Location** | 10" Rack U1-U4 | +| **IP** | 192.168.31.2 | +| **OS** | Unraid 6.x | + +**Network:** +| Interface | MAC | Speed | +|-----------|-----|-------| +| eth1 | A8:B8:E0:02:B6:15 | 2.5G | +| eth2 | A8:B8:E0:02:B6:16 | 2.5G | +| eth3 | A8:B8:E0:02:B6:17 | 2.5G | +| eth4 | A8:B8:E0:02:B6:18 | 2.5G | +| **bond0** | (virtual) | 5G aggregate | + +**Storage:** +- Cache: (current NVMe) +- Array: 3.5" HDDs + +**Virtual IPs:** +| IP | Purpose | +|----|---------| +| 192.168.31.4 | AdGuard Home 
(macvlan) | +| 192.168.31.15 | (reserved) | + +--- + +## Patch Panels + +### PP1 | 19" 24-Port + +| Property | Value | +|----------|-------| +| **Location** | 19" Rack U2.5 | +| **Type** | Cat6a Keystone | +| **Height** | 0.5U | + +**Active Ports:** 3 (CAP), 12, 16-24 (rooms) + +### PP2 | 10" 12-Port + +| Property | Value | +|----------|-------| +| **Location** | 10" Rack U8 | +| **Type** | Cat6a Keystone | +| **Height** | 1U | + +**Active Ports:** 1-2 (XTRM-U) + +--- + +## Rack Shelves + +| Location | Size | Holds | +|----------|------|-------| +| 10" Rack U9 | 10" | ISP Gateway | +| 10" Rack U7 | 10" | ZX1 Switch | +| 19" Rack U3 | 19" | HAP1 Router | + +--- + +## End Devices (Wired) + +| Device | Room | Outlet | Switch Port | MAC | +|--------|------|--------|-------------|-----| +| LGTV | Living Room | L3 | CSS1-24 | - | +| XTRM-Nobara | Main Bedroom | M2 | CSS1-20 | 08:92:04:C6:07:C5 | +| Dell Display | Main Bedroom | M3 | CSS1-21 | - | +| Dancho | Boys Room | B1 | CSS1-18 | - | +| KVM Switch | - | Direct | CSS1-2 | - | + +--- + +## Future Hardware (Planned) + +See: `wip/UPGRADE-2026-HARDWARE.md` + +| Device | Role | Status | +|--------|------|--------| +| XTRM-N5 (Minisforum N5 Air) | Production server | Planned | +| XTRM-N1 (N100 ITX) | Survival node | Planned | +| 3x Samsung 990 EVO Plus 1TB | XTRM-N5 NVMe pool | Planned | +| 2x Fikwot FX501Pro 512GB | XTRM-N1 mirror | Planned | +| MikroTik CRS310-8G+2S+IN | Replace ZX1 | Future | diff --git a/docs/05-CHANGELOG.md b/docs/05-CHANGELOG.md new file mode 100644 index 0000000..d64391c --- /dev/null +++ b/docs/05-CHANGELOG.md @@ -0,0 +1,71 @@ +# Infrastructure Changelog + +**Purpose:** Major infrastructure events only. Minor changes are in git commit messages. 
+ +--- + +## 2026-01 + +### 2026-01-25 +- **[DOCS]** Restructured documentation - consolidated into 5 core docs + archive +- **[NETBOX]** Added shelf devices for rack organization (U9, U7, U3) + +### 2026-01-24 +- **[NETBOX]** Standardized device names to NetBox convention (HAP1, CSS1, ZX1) +- **[DOCS]** Created NETWORK-PHYSICAL-MAP.md with complete port maps + +### 2026-01-23 +- **[SERVICE]** Deployed Diode network discovery stack +- **[SERVICE]** Removed Slurp'it (replaced by Diode + NetDisco) +- **[SERVICE]** Consolidated NetBox Redis to shared instance +- **[SERVICE]** Removed redundant DNS services (Unbound, DoH-Server, stunnel-dot) + +### 2026-01-22 +- **[SERVICE]** Migrated NetBox to shared PostgreSQL 17 +- **[SERVICE]** Deployed AdGuard Home on MikroTik (primary DNS) +- **[SERVICE]** Deployed AdGuard Home on Unraid (secondary DNS) +- **[SERVICE]** Removed Pi-hole (replaced by AdGuard Home) +- **[DOCS]** Created INFRASTRUCTURE-DIAGRAM.md + +### 2026-01-21 +- **[BACKUP]** Configured Rclone sync to Google Drive + +### 2026-01-19 +- **[SERVICE]** Deployed NetBox IPAM/DCIM +- **[SERVICE]** Deployed NetDisco network discovery +- **[NETWORK]** Enabled SNMP on all MikroTik devices + +### 2026-01-18 +- **[SERVICE]** Deployed Gitea git server +- **[SERVICE]** Deployed Woodpecker CI +- **[NETWORK]** Configured CAPsMAN on HAP1 +- **[WIRELESS]** CAP added to CAPsMAN management + +### 2026-01-17 +- **[SERVICE]** Deployed Portainer CE + +--- + +## Format Guide + +```markdown +### YYYY-MM-DD +- **[CATEGORY]** Brief description + +Categories: +- [DEVICE] - Hardware added/removed/changed +- [SERVICE] - Container/service deployed/removed +- [NETWORK] - Network topology/config changes +- [WIRELESS] - WiFi/CAPsMAN changes +- [BACKUP] - Backup configuration +- [DOCS] - Major documentation changes +``` + +--- + +## Previous History + +For detailed history before 2026-01-17, see archived changelogs: +- `archive/06-CHANGELOG.md` +- `archive/07-CHANGELOG.md` +- 
`archive/00-CHANGELOG.md` diff --git a/docs/00-CHANGELOG.md b/docs/archive/00-CHANGELOG.md similarity index 100% rename from docs/00-CHANGELOG.md rename to docs/archive/00-CHANGELOG.md diff --git a/docs/00-CURRENT-STATE.md b/docs/archive/00-CURRENT-STATE.md similarity index 100% rename from docs/00-CURRENT-STATE.md rename to docs/archive/00-CURRENT-STATE.md diff --git a/docs/01-PHASE1-DNS-PORTABILITY.md b/docs/archive/01-PHASE1-DNS-PORTABILITY.md similarity index 100% rename from docs/01-PHASE1-DNS-PORTABILITY.md rename to docs/archive/01-PHASE1-DNS-PORTABILITY.md diff --git a/docs/02-PHASE2-FOSSORIAL-STACK.md b/docs/archive/02-PHASE2-FOSSORIAL-STACK.md similarity index 100% rename from docs/02-PHASE2-FOSSORIAL-STACK.md rename to docs/archive/02-PHASE2-FOSSORIAL-STACK.md diff --git a/docs/03-PHASE3-AUTHENTIK-ZEROTRUST.md b/docs/archive/03-PHASE3-AUTHENTIK-ZEROTRUST.md similarity index 100% rename from docs/03-PHASE3-AUTHENTIK-ZEROTRUST.md rename to docs/archive/03-PHASE3-AUTHENTIK-ZEROTRUST.md diff --git a/docs/04-PHASE4-REMOTE-GAMING.md b/docs/archive/04-PHASE4-REMOTE-GAMING.md similarity index 100% rename from docs/04-PHASE4-REMOTE-GAMING.md rename to docs/archive/04-PHASE4-REMOTE-GAMING.md diff --git a/docs/05-PHASE5-RUSTDESK.md b/docs/archive/05-PHASE5-RUSTDESK.md similarity index 100% rename from docs/05-PHASE5-RUSTDESK.md rename to docs/archive/05-PHASE5-RUSTDESK.md diff --git a/docs/06-CHANGELOG.md b/docs/archive/06-CHANGELOG.md similarity index 100% rename from docs/06-CHANGELOG.md rename to docs/archive/06-CHANGELOG.md diff --git a/docs/06-PHASE6-PORTAINER-MANAGEMENT.md b/docs/archive/06-PHASE6-PORTAINER-MANAGEMENT.md similarity index 100% rename from docs/06-PHASE6-PORTAINER-MANAGEMENT.md rename to docs/archive/06-PHASE6-PORTAINER-MANAGEMENT.md diff --git a/docs/07-CHANGELOG.md b/docs/archive/07-CHANGELOG.md similarity index 100% rename from docs/07-CHANGELOG.md rename to docs/archive/07-CHANGELOG.md diff --git a/docs/08-PHASE7-GITEA-GITOPS.md 
b/docs/archive/08-PHASE7-GITEA-GITOPS.md similarity index 100% rename from docs/08-PHASE7-GITEA-GITOPS.md rename to docs/archive/08-PHASE7-GITEA-GITOPS.md diff --git a/docs/09-MIKROTIK-WIFI-CAPSMAN.md b/docs/archive/09-MIKROTIK-WIFI-CAPSMAN.md similarity index 100% rename from docs/09-MIKROTIK-WIFI-CAPSMAN.md rename to docs/archive/09-MIKROTIK-WIFI-CAPSMAN.md diff --git a/docs/10-VLAN-NETWORK-SEGMENTATION.md b/docs/archive/10-VLAN-NETWORK-SEGMENTATION.md similarity index 100% rename from docs/10-VLAN-NETWORK-SEGMENTATION.md rename to docs/archive/10-VLAN-NETWORK-SEGMENTATION.md diff --git a/docs/11-NETWORK-ASSET-INVENTORY.md b/docs/archive/11-NETWORK-ASSET-INVENTORY.md similarity index 100% rename from docs/11-NETWORK-ASSET-INVENTORY.md rename to docs/archive/11-NETWORK-ASSET-INVENTORY.md diff --git a/docs/12-PHASE8-NETDISCO-INTEGRATION.md b/docs/archive/12-PHASE8-NETDISCO-INTEGRATION.md similarity index 100% rename from docs/12-PHASE8-NETDISCO-INTEGRATION.md rename to docs/archive/12-PHASE8-NETDISCO-INTEGRATION.md diff --git a/docs/13-CONTAINER-IP-ASSIGNMENTS.md b/docs/archive/13-CONTAINER-IP-ASSIGNMENTS.md similarity index 100% rename from docs/13-CONTAINER-IP-ASSIGNMENTS.md rename to docs/archive/13-CONTAINER-IP-ASSIGNMENTS.md diff --git a/docs/AGENT-CREDENTIALS.md b/docs/archive/AGENT-CREDENTIALS.md similarity index 100% rename from docs/AGENT-CREDENTIALS.md rename to docs/archive/AGENT-CREDENTIALS.md diff --git a/docs/INFRASTRUCTURE-DIAGRAM.md b/docs/archive/INFRASTRUCTURE-DIAGRAM.md similarity index 100% rename from docs/INFRASTRUCTURE-DIAGRAM.md rename to docs/archive/INFRASTRUCTURE-DIAGRAM.md diff --git a/docs/NETBOX-DRAFT.md b/docs/archive/NETBOX-DRAFT.md similarity index 100% rename from docs/NETBOX-DRAFT.md rename to docs/archive/NETBOX-DRAFT.md diff --git a/docs/NETWORK-PHYSICAL-MAP.md b/docs/archive/NETWORK-PHYSICAL-MAP.md similarity index 100% rename from docs/NETWORK-PHYSICAL-MAP.md rename to docs/archive/NETWORK-PHYSICAL-MAP.md diff --git 
a/docs/archive/README.md b/docs/archive/README.md new file mode 100644 index 0000000..0130faf --- /dev/null +++ b/docs/archive/README.md @@ -0,0 +1,16 @@ +# Archived Documentation + +> ⚠️ **OBSOLETE - DO NOT UPDATE** + +These documents are from the legacy documentation structure (pre-2026-01-25). +They are kept for historical reference only. + +**For current documentation, see the parent `docs/` folder:** +- `01-NETWORK-MAP.md` - Network topology, IPs, services +- `02-SERVICES-CRITICAL.md` - Essential services +- `03-SERVICES-OTHER.md` - Non-critical services +- `04-HARDWARE-INVENTORY.md` - Hardware details +- `05-CHANGELOG.md` - Major events + +**Do not reference these archived documents for current state.** +All relevant information has been migrated to the new structure. diff --git a/docs/unraid-claude.md b/docs/archive/unraid-claude.md similarity index 100% rename from docs/unraid-claude.md rename to docs/archive/unraid-claude.md diff --git a/docs/wip/GITOPS-CONTAINERS.md b/docs/wip/GITOPS-CONTAINERS.md new file mode 100644 index 0000000..768bfde --- /dev/null +++ b/docs/wip/GITOPS-CONTAINERS.md @@ -0,0 +1,180 @@ +# GitOps for Container Management + +**Status:** 💡 IDEA +**Depends On:** Hardware upgrade completion +**Author:** Kaloyan + +--- + +## Overview + +Version control all container configurations to: +1. Track changes over time +2. Maintain consistency between XTRM-N5 and XTRM-N1 +3. Enable automated deployments via Woodpecker CI +4. Recover from disasters quickly + +--- + +## Repository Structure + +``` +infrastructure/ +├── configs/ +│ ├── common/ # Shared configs +│ │ ├── traefik/ +│ │ │ └── dynamic.yml +│ │ └── authentik/ +│ │ └── blueprints/ +│ │ +│ ├── xtrm-n5/ # Production server +│ │ ├── docker/ +│ │ │ ├── compose/ # docker-compose files +│ │ │ │ ├── netbox.yml +│ │ │ │ ├── gitea.yml +│ │ │ │ └── ... 
+│ │ │ ├── templates/ # Unraid XML templates +│ │ │ └── env/ # Environment files (.env.example) +│ │ ├── network/ +│ │ │ └── docker-networks.json +│ │ └── unraid/ +│ │ ├── shares.json +│ │ └── users.json +│ │ +│ └── xtrm-n1/ # Survival node +│ ├── docker/ +│ │ └── compose/ +│ │ ├── adguard.yml +│ │ ├── vaultwarden.yml +│ │ └── authentik-replica.yml +│ └── proxmox/ +│ └── vm-configs/ +│ +└── .woodpecker.yml +``` + +--- + +## Workflow + +### 1. Change Detection + +```mermaid +flowchart LR + A[Edit config in Git] --> B[Push to main] + B --> C[Woodpecker CI triggers] + C --> D{Validate configs} + D -->|Pass| E[Deploy to target server] + D -->|Fail| F[Notify & block] +``` + +### 2. Drift Detection + +```mermaid +flowchart LR + A[Scheduled job] --> B[Export current state] + B --> C{Compare to Git} + C -->|Match| D[All good] + C -->|Drift| E[Alert + PR with diff] +``` + +--- + +## Implementation Phases + +### Phase 2.1: Export Current State + +1. Export all docker-compose files +2. Export Unraid container templates (XML → YAML) +3. Export network configurations +4. Create initial commit + +### Phase 2.2: CI Pipeline + +```yaml +# .woodpecker.yml +pipeline: + validate: + image: docker:latest + commands: + - for f in configs/xtrm-n5/docker/compose/*.yml; do docker compose -f "$f" config --quiet || exit 1; done + + deploy-n5: + image: alpine/ssh + when: + path: configs/xtrm-n5/** + commands: + - ssh root@192.168.31.2 "cd /path && docker compose up -d" + secrets: [ssh_key] + + deploy-n1: + image: alpine/ssh + when: + path: configs/xtrm-n1/** + commands: + - ssh root@xtrm-n1 "cd /path && docker compose up -d" + secrets: [ssh_key] +``` + +### Phase 2.3: Drift Detection + +Scheduled Woodpecker job: +1. SSH to each server +2. Export current docker/network state +3. Compare to Git configs +4.
Create issue/PR if drift detected + +### Phase 2.4: Unraid GUI Sync + +**Challenge:** Changes made in Unraid GUI need to sync to Git + +**Solution Options:** + +| Option | Pros | Cons | +|--------|------|------| +| **A: Webhook on change** | Real-time sync | Complex, needs Unraid plugin | +| **B: Scheduled export** | Simple, reliable | Delay between change and commit | +| **C: Prohibit GUI changes** | Clean workflow | User friction | + +**Recommended:** Option B with daily scheduled exports + +```bash +# Cron job on Unraid +0 4 * * * /boot/config/scripts/export-docker-config.sh +``` + +--- + +## Secrets Management + +**Options:** + +| Tool | Integration | Complexity | +|------|-------------|------------| +| Woodpecker Secrets | Native | Low | +| Vaultwarden API | Via script | Medium | +| HashiCorp Vault | Enterprise | High | + +**Recommended:** Woodpecker Secrets for CI, `.env.example` in Git + +```yaml +# In docker-compose +services: + app: + env_file: + - .env # Not in Git, created from .env.example + secrets +``` + +--- + +## Rollback Strategy + +1. **Git revert** - Revert commit, CI redeploys previous version +2. **Tagged releases** - Deploy specific tag +3. **Manual override** - SSH and docker compose down/up + +--- + +## Related Documents + +- `UPGRADE-2026-HARDWARE.md` - Hardware prerequisite diff --git a/docs/wip/README.md b/docs/wip/README.md new file mode 100644 index 0000000..dd9462c --- /dev/null +++ b/docs/wip/README.md @@ -0,0 +1,16 @@ +# Work In Progress + +This folder contains planned changes, evaluations, and ideas that are not yet implemented. 
+ +## Document Status + +| Status | Meaning | +|--------|---------| +| 📋 PLANNED | Approved, waiting for resources/time | +| 🔬 EVALUATING | Under investigation/research | +| 💡 IDEA | Concept, needs further definition | + +## Current Items + +- `UPGRADE-2026-HARDWARE.md` - Hardware upgrade plan (N5 Air + N100) +- `GITOPS-CONTAINERS.md` - Container GitOps implementation (Phase 2) diff --git a/docs/wip/UPGRADE-2026-HARDWARE.md b/docs/wip/UPGRADE-2026-HARDWARE.md new file mode 100644 index 0000000..c5a92e4 --- /dev/null +++ b/docs/wip/UPGRADE-2026-HARDWARE.md @@ -0,0 +1,229 @@ +# Hardware Upgrade Plan 2026 + +**Status:** 📋 PLANNED +**Target:** Q1 2026 +**Author:** Kaloyan + +--- + +## Overview + +Migration from single-server architecture to dual-server with survival capabilities. + +## Device Naming Convention + +| Name | Hardware | Role | Notes | +|------|----------|------|-------| +| **XTRM-U** | Current NAS chassis | Current production | Will be decommissioned | +| **XTRM-N5** | Minisforum N5 Air | New production NAS | Inherits XTRM-U storage, OS, services + new NVMe pool | +| **XTRM-N1** | N100 ITX (XTRM-U board) | Survival node | Same 4x2.5GbE NIC, 1U 10" case, no HDDs | + +--- + +## Hardware Specifications + +### XTRM-N5 (Minisforum N5 Air) - Production Server + +| Component | Specification | +|-----------|---------------| +| **CPU** | Intel N95/N100 or similar | +| **RAM** | TBD | +| **Storage (Fast)** | 3x Samsung 990 EVO Plus 1TB (NVMe ZFS RAID-Z1) | +| **Storage (Capacity)** | Existing 3.5" HDD Array (from XTRM-U) | +| **Network** | 10GbE Primary + 5GbE Secondary | +| **OS** | Unraid 7 (internal boot from NVMe pool) | + +### XTRM-N1 (N100 ITX) - Survival Node + +| Component | Specification | +|-----------|---------------| +| **Board** | XTRM-U motherboard (4x 2.5GbE Intel NICs) | +| **Chassis** | 1U 10" Rackmount (modular/3D printed) | +| **Storage** | 2x Fikwot FX501Pro 512GB (NVMe mirror) | +| **Power** | PicoPSU + 12V External Brick | +| **OS** | Proxmox 
VE or Debian | + +--- + +## Migration Flow + +``` +XTRM-U (Current State) +├── Hardware (board, 4x2.5GbE) ──────────────► XTRM-N1 (1U 10" case) +│ └── New role: Survival node +│ └── New storage: 2x Fikwot 512GB +│ +├── Storage (3.5" HDDs) ─────────────────────► XTRM-N5 +├── OS (Unraid + config) ────────────────────► XTRM-N5 +├── Services (containers) ───────────────────► XTRM-N5 +│ └── New: Samsung NVMe pool +│ +└── Chassis ─────────────────────────────────► Retired +``` + +--- + +## Post-Migration Architecture + +### XTRM-N1 (Survival Node) + +**Purpose:** Keep network alive if production server is offline + +| Service | Purpose | Priority | +|---------|---------|----------| +| OPNsense | Firewall/Router (replaces HAP1 routing) | P0 | +| AdGuard Home | DNS | P0 | +| Vaultwarden | Password manager | P0 | +| Authentik (replica) | SSO backup | P1 | + +### XTRM-N5 (Production Server) + +**Purpose:** Primary services and storage + +| Service | Purpose | Priority | +|---------|---------|----------| +| Nextcloud | File sync/storage | P1 | +| Plex | Media streaming | P2 | +| Authentik (primary) | SSO | P1 | +| Gitea | Git hosting | P2 | +| Woodpecker CI | CI/CD | P2 | +| NetBox | Network documentation | P2 | +| All other services | Various | P3 | + +--- + +## Rack Layout (Post-Migration) + +### 10" Rack (9U) + +| U | Device | Notes | +|---|--------|-------| +| U9 | Shelf + ISP Gateway | Unchanged | +| U8 | PP2 (12-port) | Unchanged | +| U7 | Shelf + ZX1 (or CRS310) | Future: upgrade to CRS310 | +| U6 | **XTRM-N1** | NEW: Survival node | +| U5 | (empty) | | +| U4 | (XTRM-N5 continued) | | +| U3 | (XTRM-N5 continued) | | +| U2 | (XTRM-N5 continued) | | +| U1 | **XTRM-N5** | NEW: Production server | + +--- + +## Software Configuration + +### XTRM-N5: Unraid 7 Features + +1. 
**Internal Boot Pool** + - Use "Add Bootable Pool" feature + - ZFS RAID-Z1 with 3x Samsung 990 EVO Plus + - Partition 1: 8GB (OS boot) + - Partition 2: ~1.8TB usable (data/containers) + - Remove USB boot stick + +2. **Storage Pools** + - **nvme_pool**: 3x Samsung 990 EVO Plus (~1.8TB usable) + - **hdd_pool**: Existing 3.5" HDDs (capacity tier) + +### XTRM-N1: Proxmox Configuration + +1. **Storage** + - ZFS mirror: 2x Fikwot 512GB + - Boot + VMs + Containers + +2. **VMs/Containers** + - OPNsense VM (if routing moves here) + - AdGuard LXC + - Vaultwarden LXC + - Authentik LXC (replica) + +--- + +## Network Changes + +### Current (HAP1 as router) + +``` +Internet → HAP1 (Router) → ZX1 → XTRM-U + → CSS1 → Room outlets +``` + +### Option A: Keep HAP1 as Router + +``` +Internet → HAP1 (Router) → ZX1 → XTRM-N5 + ↓ → XTRM-N1 (services only) + → CSS1 → Room outlets +``` + +### Option B: Move Routing to XTRM-N1 + +``` +Internet → HAP1 (Bridge) → XTRM-N1 (OPNsense) → ZX1 → XTRM-N5 + → CSS1 → Rooms +``` + +**Decision:** TBD based on performance testing + +--- + +## Thermal Management + +| Strategy | Implementation | +|----------|----------------| +| Samsung NVMe in N5 | "Thermal Champions" - run cooler than Fikwot | +| Physical spacing | 1U gap between XTRM-N1 and XTRM-N5 | +| Active cooling | Exhaust fans tied to drive temp sensors | +| HDD hibernation | Aggressive spindown, NVMe handles active I/O | + +--- + +## Future Milestone: CRS310 Upgrade + +When budget allows, replace ZX1 with MikroTik CRS310-8G+2S+IN: + +| Feature | Benefit | +|---------|---------| +| 10G DAC to CSS326 | Proper backbone | +| L3 offloading | Inter-VLAN routing off HAP1 | +| 8x 2.5G ports | All servers at full speed | + +--- + +## Migration Checklist + +### Pre-Migration +- [ ] Order XTRM-N5 hardware +- [ ] Order 3x Samsung 990 EVO Plus 1TB +- [ ] Order 2x Fikwot FX501Pro 512GB +- [ ] Design/print 1U 10" chassis for N1 +- [ ] Order PicoPSU + 12V brick + +### XTRM-N1 Setup (First) +- [ ] Extract XTRM-U 
motherboard +- [ ] Install in 1U chassis +- [ ] Install Fikwot NVMe drives +- [ ] Install Proxmox VE +- [ ] Deploy survival services (AdGuard, Vaultwarden) +- [ ] Test network survival (unplug XTRM-U) + +### XTRM-N5 Setup +- [ ] Unbox and install Samsung NVMe drives +- [ ] Create ZFS boot pool +- [ ] Migrate Unraid config from USB +- [ ] Transfer HDD array +- [ ] Verify all services running +- [ ] Update DNS/IPs as needed + +### Post-Migration +- [ ] Update NetBox with new devices +- [ ] Update documentation +- [ ] Decommission old XTRM-U chassis +- [ ] Performance testing +- [ ] Thermal monitoring + +--- + +## Related Documents + +- `GITOPS-CONTAINERS.md` - Phase 2: Container configuration in Git
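
---

## Capacity Check: nvme_pool

The plan quotes roughly 1.8TB usable for the 3x Samsung 990 EVO Plus RAID-Z1 pool. The sketch below shows one way that figure can be reproduced. It is only an estimate under stated assumptions, none of which come from the plan itself: drive labels use decimal terabytes, RAID-Z1 spends one drive's worth of space on parity, the 8GB boot partition is carved out of every drive, and ZFS metadata and reserved space are ignored. The function name `raidz1_usable_bytes` is hypothetical.

```python
# Rough capacity estimate for the planned nvme_pool (RAID-Z1, 3 x 1 TB).
# Assumptions (not from the plan): decimal-TB drive labels, one drive's worth
# of parity, an 8 GB boot partition on every drive, no ZFS overhead modelled.

TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as most OS tools report it
GB = 10**9    # decimal gigabyte


def raidz1_usable_bytes(drive_count: int, drive_bytes: int, boot_bytes: int = 0) -> int:
    """Approximate usable bytes of a single RAID-Z1 vdev.

    boot_bytes is subtracted from every drive before the data partition
    is counted, mirroring the "Partition 1 / Partition 2" split in the plan.
    """
    if drive_count < 2:
        raise ValueError("RAID-Z1 needs at least 2 drives")
    data_partition = drive_bytes - boot_bytes
    # One drive's worth of capacity is consumed by parity.
    return (drive_count - 1) * data_partition


usable = raidz1_usable_bytes(drive_count=3, drive_bytes=1 * TB, boot_bytes=8 * GB)
print(f"{usable / TB:.2f} TB = {usable / TIB:.2f} TiB")
```

Under these assumptions the result is about 1.98 TB decimal, or about 1.80 TiB, which suggests the "~1.8TB usable" figure is the binary value an OS would report. Real usable space will be somewhat lower once ZFS overhead is included.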