diff --git a/docs/07-CHANGELOG.md b/docs/07-CHANGELOG.md
index fc8dfc0..c769477 100644
--- a/docs/07-CHANGELOG.md
+++ b/docs/07-CHANGELOG.md
@@ -1,3 +1,29 @@
+## 2026-01-19 - NetDisco Web UI Fixed
+
+### Task 8.4: Traefik Ingress - VERIFIED WORKING
+
+**Root Cause:**
+- [BUG] session_cookie_key was missing from database
+- NetDisco generates this key via netdisco-deploy, but our external PostgreSQL setup skipped this step
+- Error: "The setting session_cookie_key must be defined"
+
+**Fix Applied:**
+- [DB] Manually inserted dancer_session_cookie_key into sessions table:
+  ```sql
+  INSERT INTO sessions (id, a_session) VALUES ('dancer_session_cookie_key', md5(random()::text));
+  ```
+
+**Verification:**
+- [TEST] http://netdisco-web:5000 - WORKING (returns HTML)
+- [TEST] https://netdisco.xtrm-lab.org - WORKING (302 redirect to Authentik)
+
+**Access:**
+- External URL: https://netdisco.xtrm-lab.org (SSO via Authentik)
+- Internal URL: http://192.168.31.2:5000 (direct)
+- Database: session_cookie_key stored in PostgreSQL sessions table
+
+---
+
 # Infrastructure Changelog
 
 ## 2026-01-19 - NetDisco Traefik Integration
@@ -292,534 +318,4 @@ Both containers recreated to apply labels. Services verified working after recre
 - Phase 6.3: Unified dashboard connection
 
 ### Security Considerations
-- MikroTik firewall rules to restrict Docker API access to Unraid only
-- Unauthenticated API requires network-level security
-
-### Status
-- [PHASE 6] IN PROGRESS - Phase 6.1 completed
-
----
-
-
-## 2026-01-17 - Status Audit
-
-### Verified Working
-- [PHASE 1] Tailscale on Unraid - WORKING (100.100.208.70)
-- [PHASE 1] nebula-sync Pi-hole sync - HEALTHY (was unhealthy, now fixed)
-- [PHASE 1] stunnel-dot (DoT on 853) - WORKING
-- [PHASE 2] Pangolin controller - RUNNING (13 days uptime)
-- [PHASE 3] Authentik main container - HEALTHY
-
-### Issues Found
-- [PHASE 1] DoH-Server - RUNNING but BROKEN (can't reach Pi-hole upstream from dockerproxy network)
-- [PHASE 2] gerbil - CRASHED (Pangolin returns empty CIDR for WG config)
-- [PHASE 3] authentik-worker - CRASHED (PostgreSQL DNS resolution failure)
-- [PHASE 5] RustDesk - NOT DEPLOYED
-
-### MikroTik NAT Status
-- Port 853 (DoT) -> Unraid stunnel - OK
-- Port 5443 (DoH) -> MikroTik Pi-hole (wrong target, should be Unraid DoH-Server)
-- Port 51820 (Fossorial WG) - NOT CONFIGURED
-- Ports 21115-21119 (RustDesk) - NOT CONFIGURED
-
----
-
-## Template for Future Entries
-
-## YYYY-MM-DD - Description
-
-### Changes
-- [PHASE X] description - STATUS
-- [SERVICE] name: what changed
-
-### Issues
-- description of any problems found
-
-### Notes
-- any relevant context
-
-## 2026-01-17 - DNS Infrastructure Fixes
-
-### DoH-Server
-- [PHASE 1] DoH-Server - WORKING at `doh.xtrm-lab.org` (not `dns.xtrm-lab.org` as documented)
-- [ISSUE] Infrastructure docs reference `dns.xtrm-lab.org` but container uses `doh.xtrm-lab.org`
-- [ACTION NEEDED] Either update docs OR add Traefik route for `dns.xtrm-lab.org`
-
-### Unraid Unbound - FIXED
-- [PHASE 1] Replaced broken klutchell/unbound with mvance/unbound:latest
-- [ROOT CAUSE 1] Original image missing root.hints/root.key files (distroless image issue)
-- [ROOT CAUSE 2] MikroTik NAT rules were hijacking Unbound's outbound DNS (192.168.31.0/24 -> Pi-hole)
-- [ROOT CAUSE 3] IPv6 not working on br0 macvlan, causing timeout loops
-
-### MikroTik NAT Changes
-- Added rule 6: "Allow Unraid Unbound" - accept UDP from 192.168.31.5 port 53
-- Added rule 8: "Allow Unraid Unbound TCP" - accept TCP from 192.168.31.5 port 53
-- These rules placed BEFORE the "Force DNS to Pi-hole" rules
-
-### Unbound Configuration
-- Location: /mnt/user/appdata/unbound-mvance/
-- Custom config: a-records.conf (disables IPv6, sets logging)
-- Image: mvance/unbound:latest
-- Network: br0 (macvlan) at 192.168.31.5
-
-### Verified Working
-- Unraid Unbound (192.168.31.5) - RESOLVED google.com, github.com, cloudflare.com
-- Unraid Pi-hole upstreams: 172.17.0.3 (MikroTik Unbound) + 192.168.31.5 (Unraid Unbound)
-- DoH endpoint working at doh.xtrm-lab.org
-- stunnel-dot (DoT) - already working
-
-### Still Pending
-- MikroTik Pi-hole upstream config needs verification (check if it uses both Unbounds)
-- Docs need update: dns.xtrm-lab.org vs doh.xtrm-lab.org
-
-### MikroTik Pi-hole Upstreams - FIXED
-- [PHASE 1] MikroTik Pi-hole was using Google DNS (8.8.8.8, 8.8.4.4) instead of local Unbounds
-- Changed upstreams via Pi-hole v6 API to:
-  - 172.17.0.3#53 - MikroTik local Unbound
-  - 192.168.31.5#53 - Unraid Unbound
-- DNS resolution tested and working
-
-### Full DNS Redundancy Now Achieved
-- Unraid Pi-hole upstreams: 172.17.0.3, 192.168.31.5
-- MikroTik Pi-hole upstreams: 172.17.0.3, 192.168.31.5
-- Both Unbounds working as recursive resolvers
-- nebula-sync keeps blocklists in sync between Pi-holes
-
----
-
-## 2026-01-17 - Gerbil Investigation: Feature Not Available
-
-### Issue
-- Gerbil kept crashing with "invalid CIDR address" error
-- Exit node was correctly configured in database
-- API returned empty data despite valid configuration
-
-### Root Cause
-- **Pangolin 1.14 Community Edition does not include Exit Nodes feature**
-- Exit Nodes / Gerbil functionality requires paid Pangolin license
-- The API endpoint exists but returns empty data for CE users
-
-### Resolution
-- Removed gerbil container (feature not available)
-- Existing MikroTik WireGuard VPN provides equivalent remote access functionality
-- Phase 2 (Fossorial Stack) marked as blocked pending license upgrade
-
-### Status
-- [PHASE 2] gerbil - REMOVED (requires paid Pangolin license)
-- [PHASE 2] Pangolin controller - RUNNING (limited to CE features)
-
----
-
-## 2026-01-18 - Actual Budget OIDC Integration with Authentik
-
-### Problem
-- Actual Budget OIDC login failing with multiple errors
-
-### Fixes Applied
-
-#### 1. DNS Resolution (EAI_AGAIN)
-- **Issue:** Container couldn't resolve auth.xtrm-lab.org
-- **Fix:** Added `--add-host=auth.xtrm-lab.org:` to container
-- **Template:** /boot/config/plugins/dockerMan/templates-user/my-actual-budget.xml
-
-#### 2. JWT Signing Algorithm (HS256 vs RS256)
-- **Issue:** Authentik signed tokens with HS256, Actual Budget expected RS256
-- **Root Cause:** OAuth2 provider had no signing key configured
-- **Fix:** Set signing_key_id to 'authentik Internal JWT Certificate' in database
-- **SQL:** `UPDATE authentik_providers_oauth2_oauth2provider SET signing_key_id = '48203833-f562-4ec6-b782-f566e6d960d5' WHERE client_id = 'actual-budget';`
-
-#### 3. Insufficient Scope
-- **Issue:** Provider had no scope mappings assigned
-- **Fix:** Added openid, email, profile scopes to provider
-- **SQL:** `INSERT INTO authentik_core_provider_property_mappings (provider_id, propertymapping_id) VALUES (3, 'a24eea06-...'), (3, '4394c150-...'), (3, '7272ab52-...');`
-
-### Traefik Static IP
-- **Issue:** Traefik IP was dynamic, would break actual-budget on restart
-- **Fix:** Assigned static IP 172.18.0.10 to Traefik on dockerproxy network
-- **Template:** Added `--ip=172.18.0.10` to ExtraParams in my-traefik.xml
-
-### Final Configuration
-
-| Component | Setting |
-|-----------|---------|
-| Traefik | 172.18.0.10 (static) on dockerproxy |
-| Actual Budget | --add-host=auth.xtrm-lab.org:172.18.0.10 |
-| Authentik Provider | actual-budget with RS256 signing + scopes |
-
-### Actual Budget OIDC Environment
-```
-ACTUAL_OPENID_DISCOVERY_URL=https://auth.xtrm-lab.org/application/o/actual-budget/.well-known/openid-configuration
-ACTUAL_OPENID_CLIENT_ID=actual-budget
-ACTUAL_OPENID_CLIENT_SECRET=
-ACTUAL_OPENID_SERVER_HOSTNAME=https://actual.xtrm-lab.org
-```
-
-### Status
-- [PHASE 3] Actual Budget OIDC - WORKING
-- [SERVICE] traefik: Static IP 172.18.0.10 configured
-- [SERVICE] actual-budget: OIDC login via Authentik working
-
-## 2026-01-18 - Phase 5 Completed: RustDesk Self-Hosted Deployment
-
-### Keypair Generation
-- [PHASE 5] Generated Ed25519 keypair for encrypted connections
-- Public Key: `+Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=`
-- Data directory: /mnt/user/appdata/rustdesk-server/
-
-### Containers Deployed
-- [SERVICE] rustdesk-hbbs: ID/Rendezvous server on ports 21115-21116 (TCP), 21116 (UDP), 21118-21119
-- [SERVICE] rustdesk-hbbr: Relay server on port 21117
-- Both containers configured with `-k _` for mandatory encryption
-- AutoKuma labels added for Uptime Kuma monitoring
-
-### MikroTik Configuration
-- Added NAT rules 24-27 for RustDesk ports
-- Added firewall forward rules (Allow RustDesk TCP/UDP)
-- Ports forwarded: 21115 (NAT test), 21116 (TCP+UDP), 21117 (Relay)
-
-### DNS
-- rustdesk.xtrm-lab.org already resolving to 62.73.120.142 (DNS only, no proxy)
-
-### Verification
-- All TCP ports (21115, 21116, 21117) accessible externally
-- Both containers running healthy
-- Logs show successful startup with keypair loaded
-
-### Client Configuration
-| Setting | Value |
-|---------|-------|
-| ID Server | rustdesk.xtrm-lab.org |
-| Relay Server | rustdesk.xtrm-lab.org |
-| Public Key | +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8= |
-
-### Status
-- [PHASE 5] RustDesk Self-Hosted - COMPLETED
-
-## 2026-01-18 - Vaultwarden 502 Fix
-
-### Issue
-- Vaultwarden returning unexpected error when creating new logins
-- Traefik logs showed 502 Bad Gateway errors
-
-### Root Cause
-- Traefik config pointed to `http://192.168.31.2:4743`
-- Vaultwarden container had no port 4743 mapping (port 80/tcp was not published)
-- Both containers on `dockerproxy` network but config used host IP
-
-### Fix
-- Updated `/mnt/user/appdata/traefik/dynamic.yml`
-- Changed: `url: "http://192.168.31.2:4743"` → `url: "http://vaultwarden:80"`
-- Uses Docker internal DNS which resolves to container IP on dockerproxy network
-
-### Status
-- [SERVICE] vaultwarden: Working - can create/edit logins
-
-## 2026-01-18 - Progress Summary
-
-### Completed Phases
-- [PHASE 1] DNS Portability - COMPLETE (DoH, DoT, Unbound redundancy)
-- [PHASE 5] RustDesk Self-Hosted - COMPLETE (hbbs/hbbr deployed)
-- [PHASE 6] Portainer Management - COMPLETE (6.2/6.3 cancelled - MikroTik incompatible)
-
-### In Progress
-- [PHASE 3] Authentik Zero Trust - Actual Budget integrated, more services pending
-
-### Blocked
-- [PHASE 2] Fossorial Stack - Gerbil requires paid Pangolin license
-
-### Not Started
-- [PHASE 4] Remote Gaming (Sunshine/Moonlight) - Starting now
-
-### Known Issues
-- HomeAssistant_inabox: Exited (1) 3 days ago
-- pgAdmin4: Exited (137) 2 weeks ago
-
-## 2026-01-18 - Phase 4 Started: MacBook Prepared
-
-### MacBook Setup Complete
-- [PHASE 4] Moonlight v6.1.0 already installed
-- [PHASE 4] Tailscale connected (100.68.118.59)
-
-### Pending - Nobara Setup
-- Install Sunshine on Nobara
-- Configure VA-API encoding
-- Pair with Moonlight
-
-### Instructions Saved
-- MacBook: ~/Documents/NOBARA-SUNSHINE-SETUP.md
-
-### Status
-- [PHASE 4] MacBook client ready, awaiting Nobara server setup
-
-## 2026-01-18 - NetAlertX & Uptime Kuma Fixes (Partial)
-
-### Uptime Kuma - FIXED
-- [SERVICE] Added Traefik route for uptime.xtrm-lab.org
-- Protected with Authentik forward auth
-- Service URL: http://192.168.31.2:3001
-
-### NetAlertX - IN PROGRESS
-- [ISSUE] Container not scanning network - shows 0 devices
-- [ROOT CAUSE] Multiple config files exist:
-  - /app/config/app.conf (mounted from host) - updated correctly
-  - /app/back/app.conf (container internal) - has old value '--localnet'
-- [ATTEMPTED] Updated /mnt/user/appdata/netalertx/config/app.conf
-- [ATTEMPTED] Updated database Settings table
-- [ATTEMPTED] Deleted database to force reload
-- [DISCOVERED] App reads from /app/back/app.conf which is generated at startup
-
-### NetAlertX Fix Required
-1. The /app/back/app.conf needs to be updated to:
-   SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
-2. This file is regenerated on container start from /app/config/app.conf
-3. May need to use Settings UI at https://netalert.xtrm-lab.org to change SCAN_SUBNETS
-
-### Manual ARP Scan Test - WORKS
-Command: docker exec NetAlertX arp-scan --localnet --interface=br0
-Result: Found 20 devices on 192.168.31.0/24
-
-### Pending Tasks
-- Fix NetAlertX to use correct subnet config
-- Add Tailscale network scanning (may not work - ARP doesn't work over tunnels)
-- User requested: RustFS for personal CDN (assets hosting)
-
-### Status
-- [SERVICE] uptime.xtrm-lab.org - WORKING
-- [SERVICE] netalertx - PARTIALLY BROKEN (config issue)
-
-## 2026-01-18 - NetAlertX FIXED
-
-### Resolution
-- [SERVICE] NetAlertX now scanning network correctly - found 21 devices
-- [FIX] Updated config in multiple locations:
-  - /data/config/app.conf (runtime config inside container)
-  - /app/back/app.conf (plugin reads from here)
-  - /mnt/user/appdata/netalertx/config/app.conf (host mount for persistence)
-
-### Config Change
-```
-SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
-```
-
-### Root Cause Summary
-- NetAlertX has complex config handling with multiple config file locations
-- /app/config (mounted) -> copied to /data/config on startup
-- /data/config/app.conf is read by the app
-- /app/back/app.conf is read by plugins at runtime
-- All three needed to be updated
-
-### Verified Working
-- ARP scan found 21 devices on 192.168.31.0/24
-- Devices visible at https://netalert.xtrm-lab.org/devices.php
-
-### Note on Tailscale Scanning
-- ARP scanning does NOT work over Tailscale (point-to-point tunnel, no broadcast)
-- Tailscale devices need to be added manually or via different discovery method
-
-## 2026-01-18 - RustFS CDN Deployed
-
-### Service Details
-- [SERVICE] RustFS - S3-compatible object storage for personal CDN
-- Image: rustfs/rustfs:latest
-- Ports: 9010 (S3 API), 9011 (Console)
-- Data: /mnt/user/appdata/rustfs/data
-- Logs: /mnt/user/appdata/rustfs/logs
-
-### Access URLs
-- S3 API: https://cdn.xtrm-lab.org
-- Console: http://192.168.31.2:9011/rustfs/console/
-- Credentials stored in: /mnt/user/appdata/rustfs/CREDENTIALS.txt
-
-### Traefik Route
-- Host: cdn.xtrm-lab.org
-- No Authentik protection (public CDN for assets)
-- S3 authentication handles access control
-
-### Usage
-Create bucket and upload assets via:
-- RustFS Console at port 9011
-- S3-compatible CLI tools (aws-cli, rclone, etc.)
-- Direct S3 API calls
-
-### Example S3 CLI Usage
-```bash
-# Configure aws-cli
-aws configure set aws_access_key_id
-aws configure set aws_secret_access_key
-
-# Create bucket
-aws --endpoint-url https://cdn.xtrm-lab.org s3 mb s3://assets
-
-# Upload file
-aws --endpoint-url https://cdn.xtrm-lab.org s3 cp image.png s3://assets/
-
-# Public URL (after setting bucket policy)
-https://cdn.xtrm-lab.org/assets/image.png
-```
-
-### Status
-- [SERVICE] rustfs - RUNNING
-- [PHASE N/A] Personal CDN - COMPLETED
-
-## 2026-01-18 - PostgreSQL Data Path Restored & Phase 3 Verified
-
-### Root Cause Analysis
-- [INCIDENT] PostgreSQL container was recreated at 07:40 UTC with wrong data path
-- [CAUSE] Container used default path `/mnt/user/appdata/postgresql17` instead of configured `/mnt/user/appdata/postgresql`
-- [IMPACT] Authentik started with empty database, all configuration appeared lost
-- [DATA] Original data was safe in `/mnt/user/appdata/postgresql/` the entire time
-
-### Resolution
-- [FIX] Stopped postgresql17 container
-- [FIX] Recreated container with correct volume mount: `/mnt/user/appdata/postgresql:/var/lib/postgresql/data`
-- [FIX] Restarted Authentik containers
-- [VERIFIED] All Authentik data restored (users, groups, applications, providers)
-
-### Other Fixes This Session
-- [SERVICE] Uptime-Kuma-API: Added missing ADMIN_PASSWORD environment variable
-- [TRAEFIK] Added Docker provider constraint to filter broken container labels
-- [TRAEFIK] Added missing routes: authentik, transmission, nextcloud to dynamic.yml
-
-### Phase 3 Verification - COMPLETED
-Verified Authentik Zero Trust configuration:
-- Users: akadmin, admin, jazzymc (3 active users)
-- Groups: authentik Admins, authentik Read-only
-- Outpost: Embedded Outpost running (proxy type)
-- Applications: XTRM-Lab Protected Services, Actual Budget
-- Proxy Provider: forward_domain mode for auth.xtrm-lab.org
-- 2FA: 2 TOTP devices configured
-- Protected Services: 12 routes using authentik-forward-auth middleware
-
-### Services Status
-- [SERVICE] auth.xtrm-lab.org - WORKING (302 redirect to login)
-- [SERVICE] uptime.xtrm-lab.org - WORKING (forward auth active)
-- [SERVICE] ph2.xtrm-lab.org - WORKING (forward auth active)
-- [SERVICE] All forward-auth protected services - WORKING
-
-### Documentation Updated
-- [DOC] 03-PHASE3-AUTHENTIK-ZEROTRUST.md - Marked as COMPLETED with verified state
-
-## 2026-01-18 - Phase 5 RustDesk Verified
-
-### Server-Side Verification Complete
-- [x] Keypair exists: /mnt/user/appdata/rustdesk-server/id_ed25519*
-- [x] Public Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
-- [x] hbbs container: Up 10+ hours
-- [x] hbbr container: Up 10+ hours
-- [x] MikroTik NAT: 4 rules configured (21115-21117 TCP, 21116 UDP)
-- [x] DNS: rustdesk.xtrm-lab.org → 62.73.120.142
-- [x] Port 21116 TCP: Externally accessible (verified via nc)
-- [x] Port 21117 TCP: Externally accessible (verified via nc)
-
-### Client Configuration
-ID Server: rustdesk.xtrm-lab.org
-Relay Server: rustdesk.xtrm-lab.org
-Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
-
-### Documentation Updated
-- [DOC] 05-PHASE5-RUSTDESK.md - Rewritten with verified state, marked SERVER-SIDE COMPLETE
-- [PENDING] Client-side testing (user to verify remote sessions work)
-
-## 2026-01-18 - Phase 7 GitOps Plan Created
-
-### New Phase: Gitea + Woodpecker CI
-- [DOC] Created 08-PHASE7-GITEA-GITOPS.md
-- [PLAN] Lightweight GitOps stack for infrastructure management
-- [COMPONENTS] Gitea (~200MB) + Woodpecker Server/Agent (~200MB)
-- [TOTAL RESOURCES] ~400MB RAM, ~700MB storage
-
-### Planned Features
-- Git version control for all configs
-- Automated YAML validation
-- CI/CD pipeline for deployments
-- Auto-rollback on health check failure
-- Authentik SSO integration
-- Safe AI agent integration
-
-### URLs (Planned)
-- git.xtrm-lab.org - Gitea web UI
-- ci.xtrm-lab.org - Woodpecker CI dashboard
-
-## 2026-01-18 - Phase 5 RustDesk FULLY COMPLETED
-
-### Client Testing - SUCCESS
-- [PHASE 5] Nobara → Mac connection verified WORKING
-- [FIX] macOS required Accessibility permission for keyboard/mouse control
-- Video, keyboard, and mouse all functioning
-
-### Registered Clients
-| ID | Platform | Registered |
-|----|----------|------------|
-| 527588949 | macOS | 16:47 UTC |
-| 20116399 | Nobara | 17:21 UTC |
-
-### Status
-- [PHASE 5] RustDesk Self-Hosted - **FULLY COMPLETE**
-
-## 2026-01-19 - Phase 4 Remote Gaming Progress
-
-### Nobara Setup
-- [PHASE 4] Tailscale installed on Nobara PC (xtrm-pc)
-- Tailscale IP: 100.98.57.73
-- [PHASE 4] VA-API verified working on AMD RX 6600
-  - H.264 encoding: ✅ VAEntrypointEncSlice
-  - HEVC encoding: ✅ VAEntrypointEncSlice (Main + Main10)
-  - Driver: Mesa Gallium 25.3.2 (radeonsi, navi23)
-
-### Sunshine Flatpak Issues
-- [ISSUE] Flatpak Sunshine failed to initialize capture
-- wlr capture: Missing wlr-export-dmabuf protocol (KDE Plasma incompatible)
-- pw (PipeWire) capture: Portal permissions blocked
-- KMS capture: setcap not applicable to Flatpak sandboxed binaries
-
-### Resolution
-- [ACTION] Switching to native Sunshine installation via COPR
-- Native install allows proper setcap for KMS capture
-- Commands:
-
-
- 1. dev.lizardbyte.app.Sunshine stable r
-
-
-Uninstalling…
-Uninstall complete.
-
-### Tailscale Network Status
-| Device | Tailscale IP | Status |
-|--------|--------------|--------|
-| xtrm-pc (Nobara) | 100.98.57.73 | Online |
-| kaloyans-macbook-air | 100.68.118.59 | Online |
-| xtrm-unraid | 100.100.208.70 | Online |
-
-### Status
-- [PHASE 4] IN PROGRESS - Pending native Sunshine installation
-
-## 2026-01-19 - Phase 4 Remote Gaming Progress
-
-### Nobara Setup
-- [PHASE 4] Tailscale installed on Nobara PC (xtrm-pc)
-- Tailscale IP: 100.98.57.73
-- [PHASE 4] VA-API verified working on AMD RX 6600
-  - H.264 encoding: VAEntrypointEncSlice
-  - HEVC encoding: VAEntrypointEncSlice (Main + Main10)
-  - Driver: Mesa Gallium 25.3.2 (radeonsi, navi23)
-
-### Sunshine Flatpak Issues
-- [ISSUE] Flatpak Sunshine failed to initialize capture
-- wlr capture: Missing wlr-export-dmabuf protocol (KDE Plasma incompatible)
-- pw (PipeWire) capture: Portal permissions blocked
-- KMS capture: setcap not applicable to Flatpak sandboxed binaries
-
-### Resolution
-- [ACTION] Switching to native Sunshine installation via COPR
-- Native install allows proper setcap for KMS capture
-
-### Tailscale Network Status
-| Device | Tailscale IP | Status |
-|--------|--------------|--------|
-| xtrm-pc (Nobara) | 100.98.57.73 | Online |
-| kaloyans-macbook-air | 100.68.118.59 | Online |
-| xtrm-unraid | 100.100.208.70 | Online |
-
-### Status
-- [PHASE 4] IN PROGRESS - Pending native Sunshine installation
+-
\ No newline at end of file