Infrastructure Changelog
Track all changes to services, configurations, and phase progress.
2026-01-18
- [INFRA] Added pending task: Static IP assignment for critical services on dockerproxy and bridge networks
- [SERVICE] postgresql17: Recreated container (was stopped due to port conflict)
- [SERVICE] authentik + authentik-worker: Restarted after PostgreSQL fix
- [TEMPLATE] Added RustDesk container templates with icons
- [TEMPLATE] Updated Pi-hole template with proper Unraid CA metadata
2026-01-17 - Homarr + Portainer Integration
Portainer App Added to Homarr
- [SERVICE] homarr: Added Portainer app to dashboard
- Section: Monitoring
- URL: http://100.100.208.70:9002 (Tailscale)
- Ping URL: http://192.168.31.2:9002 (LAN)
Docker Integration Added
- [SERVICE] homarr: Added Docker integration via socket
- Integration name: Docker (Unraid)
- Socket: unix:///var/run/docker.sock
- Linked to Portainer app for container status display
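A socket integration like this can be sanity-checked from the Unraid host; the Docker Engine API answers directly over the same socket (sketch, run where the socket exists):

```shell
# Ask the Docker Engine API for its version over the socket Homarr uses
curl --unix-socket /var/run/docker.sock http://localhost/version
```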
Database Changes
- Added app record for Portainer
- Added item and item_layout for Monitoring section
- Added integration record for Docker
- Linked integration to Portainer item
Access
- Homarr: https://xtrm-lab.org
- Portainer visible in Monitoring section
2026-01-17 - Phase 6.2/6.3 Cancelled: MikroTik Incompatible
Discovery
- MikroTik RouterOS containers are NOT Docker-based
- No /var/run/docker.sock exists on MikroTik
- Portainer cannot connect to MikroTik's container runtime
What Was Attempted
- Created veth-socat interface (172.17.0.5)
- Deployed alpine/socat container
- Added firewall and NAT rules for port 2375
- Socat failed: "No such file or directory" for docker.sock
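For reference, the alpine/socat deployment was a standard TCP-to-UNIX-socket proxy, something equivalent to the sketch below; it can only work where the socket actually exists:

```shell
# Expose a local Docker socket on TCP 2375
# (fails on RouterOS: /var/run/docker.sock is absent)
socat TCP-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
```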
Cleanup Performed
- Removed socat container
- Removed veth-socat interface and bridge port
- Removed docker_sock mount
- Removed firewall/NAT rules for port 2375
Conclusion
- Phase 6.2 and 6.3 are NOT FEASIBLE
- MikroTik containers must be managed via RouterOS CLI/WebFig
- Portainer remains useful for Unraid-only container management
Status Update
- [PHASE 6.1] COMPLETED - Portainer managing Unraid
- [PHASE 6.2] CANCELLED - MikroTik incompatible
- [PHASE 6.3] CANCELLED - MikroTik incompatible
2026-01-17 - Unraid Container Labels Fixed
Containers Updated
- [SERVICE] unbound: Added Unraid labels (net.unraid.docker.managed, net.unraid.docker.icon)
- [SERVICE] portainer: Added Unraid labels + Tailscale labels
Portainer Labels
- net.unraid.docker.managed=dockerman
- net.unraid.docker.icon - Portainer icon
- net.unraid.docker.webui=http://100.100.208.70:9002
- tailscale.expose=true
- tailscale.host=100.100.208.70
- tailscale.port=9002
Unbound Labels
- net.unraid.docker.managed=dockerman
- net.unraid.docker.icon - Unbound icon
Note
Both containers recreated to apply labels. Services verified working after recreation.
2026-01-17 - Phase 6.1 Completed: Portainer CE Deployed
Portainer CE Installation
- [PHASE 6.1] Portainer CE deployed on Unraid - COMPLETED
- Container: portainer/portainer-ce:latest
- HTTP Port: 9002 (changed from 9000 due to Authentik conflict)
- HTTPS Port: 9444
- Data: /mnt/user/appdata/portainer
- LAN URL: http://192.168.31.2:9002
- Tailscale URL: http://100.100.208.70:9002
Port Conflict Resolution
- Original plan: port 9000
- Conflict: Authentik already using port 9000
- Resolution: Mapped to port 9002 (HTTP) and 9444 (HTTPS)
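The remapping works because Portainer CE listens on fixed internal ports (9000 HTTP, 9443 HTTPS); a sketch of the equivalent docker run, with paths taken from the entry above:

```shell
# Host ports 9002/9444 map to container ports 9000/9443
docker run -d --name portainer \
  -p 9002:9000 -p 9444:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /mnt/user/appdata/portainer:/data \
  portainer/portainer-ce:latest
```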
Next Steps
- Phase 6.2: Deploy Socat proxy on MikroTik (port 2375)
- Phase 6.3: Connect MikroTik environment to Portainer
Status
- [PHASE 6.1] COMPLETED - Portainer running, needs initial setup via web UI
- [PHASE 6.2] NOT STARTED
- [PHASE 6.3] NOT STARTED
2026-01-17 - Phase 6 Added: Multi-Host Docker Management
New Documentation
- [PHASE 6] Created 06-PHASE6-PORTAINER-MANAGEMENT.md
- Portainer CE deployment plan for unified Docker management
- Covers Unraid local setup and MikroTik remote API via Socat
Phase 6 Components
- Phase 6.1: Portainer CE installation on Unraid (port 9002)
- Phase 6.2: MikroTik Socat proxy for Docker API exposure (port 2375)
- Phase 6.3: Unified dashboard connection
Security Considerations
- MikroTik firewall rules to restrict Docker API access to Unraid only
- Unauthenticated API requires network-level security
Status
- [PHASE 6] IN PROGRESS - Phase 6.1 completed
2026-01-17 - Status Audit
Verified Working
- [PHASE 1] Tailscale on Unraid - WORKING (100.100.208.70)
- [PHASE 1] nebula-sync Pi-hole sync - HEALTHY (was unhealthy, now fixed)
- [PHASE 1] stunnel-dot (DoT on 853) - WORKING
- [PHASE 2] Pangolin controller - RUNNING (13 days uptime)
- [PHASE 3] Authentik main container - HEALTHY
Issues Found
- [PHASE 1] DoH-Server - RUNNING but BROKEN (can't reach Pi-hole upstream from dockerproxy network)
- [PHASE 2] gerbil - CRASHED (Pangolin returns empty CIDR for WG config)
- [PHASE 3] authentik-worker - CRASHED (PostgreSQL DNS resolution failure)
- [PHASE 5] RustDesk - NOT DEPLOYED
MikroTik NAT Status
- Port 853 (DoT) -> Unraid stunnel - OK
- Port 5443 (DoH) -> MikroTik Pi-hole (wrong target, should be Unraid DoH-Server)
- Port 51820 (Fossorial WG) - NOT CONFIGURED
- Ports 21115-21119 (RustDesk) - NOT CONFIGURED
Template for Future Entries
YYYY-MM-DD - Description
Changes
- [PHASE X] description - STATUS
- [SERVICE] name: what changed
Issues
- description of any problems found
Notes
- any relevant context
2026-01-17 - DNS Infrastructure Fixes
DoH-Server
- [PHASE 1] DoH-Server - WORKING at doh.xtrm-lab.org (not dns.xtrm-lab.org as documented)
- [ISSUE] Infrastructure docs reference dns.xtrm-lab.org but the container uses doh.xtrm-lab.org
- [ACTION NEEDED] Either update the docs OR add a Traefik route for dns.xtrm-lab.org
Unraid Unbound - FIXED
- [PHASE 1] Replaced broken klutchell/unbound with mvance/unbound:latest
- [ROOT CAUSE 1] Original image missing root.hints/root.key files (distroless image issue)
- [ROOT CAUSE 2] MikroTik NAT rules were hijacking Unbound's outbound DNS (192.168.31.0/24 -> Pi-hole)
- [ROOT CAUSE 3] IPv6 not working on br0 macvlan, causing timeout loops
MikroTik NAT Changes
- Added rule 6: "Allow Unraid Unbound" - accept UDP from 192.168.31.5 port 53
- Added rule 8: "Allow Unraid Unbound TCP" - accept TCP from 192.168.31.5 port 53
- These rules placed BEFORE the "Force DNS to Pi-hole" rules
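In RouterOS terms the two exemption rules look roughly like this (sketch; rule comments come from the log, everything else is an assumption):

```
/ip firewall nat add chain=dstnat action=accept protocol=udp \
    src-address=192.168.31.5 dst-port=53 comment="Allow Unraid Unbound"
/ip firewall nat add chain=dstnat action=accept protocol=tcp \
    src-address=192.168.31.5 dst-port=53 comment="Allow Unraid Unbound TCP"
# Both must sit above the "Force DNS to Pi-hole" redirect rules,
# e.g. via place-before= when adding them
```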
Unbound Configuration
- Location: /mnt/user/appdata/unbound-mvance/
- Custom config: a-records.conf (disables IPv6, sets logging)
- Image: mvance/unbound:latest
- Network: br0 (macvlan) at 192.168.31.5
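A sketch of what a-records.conf likely contains, given the notes above (these are standard Unbound directives; the deployed file may differ):

```
server:
    do-ip6: no        # IPv6 disabled (br0 macvlan IPv6 was broken)
    verbosity: 1      # basic logging
    log-queries: yes
```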
Verified Working
- Unraid Unbound (192.168.31.5) - RESOLVED google.com, github.com, cloudflare.com
- Unraid Pi-hole upstreams: 172.17.0.3 (MikroTik Unbound) + 192.168.31.5 (Unraid Unbound)
- DoH endpoint working at doh.xtrm-lab.org
- stunnel-dot (DoT) - already working
Still Pending
- MikroTik Pi-hole upstream config needs verification (check if it uses both Unbounds)
- Docs need update: dns.xtrm-lab.org vs doh.xtrm-lab.org
MikroTik Pi-hole Upstreams - FIXED
- [PHASE 1] MikroTik Pi-hole was using Google DNS (8.8.8.8, 8.8.4.4) instead of local Unbounds
- Changed upstreams via Pi-hole v6 API to:
- 172.17.0.3#53 - MikroTik local Unbound
- 192.168.31.5#53 - Unraid Unbound
- DNS resolution tested and working
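The Pi-hole v6 change boils down to a single config PATCH; a hedged sketch of the payload (the /api/config endpoint and sid-based auth follow the v6 API, but the exact call wasn't captured in the log):

```shell
# JSON body that sets both local Unbounds as upstreams
PAYLOAD='{"config":{"dns":{"upstreams":["172.17.0.3#53","192.168.31.5#53"]}}}'
echo "$PAYLOAD"
# Apply it with a session id obtained from /api/auth, e.g.:
# curl -sk -X PATCH https://pi.hole/api/config \
#   -H "sid: $SID" -H "Content-Type: application/json" -d "$PAYLOAD"
```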
Full DNS Redundancy Now Achieved
- Unraid Pi-hole upstreams: 172.17.0.3, 192.168.31.5
- MikroTik Pi-hole upstreams: 172.17.0.3, 192.168.31.5
- Both Unbounds working as recursive resolvers
- nebula-sync keeps blocklists in sync between Pi-holes
2026-01-17 - Gerbil Investigation: Feature Not Available
Issue
- Gerbil kept crashing with "invalid CIDR address" error
- Exit node was correctly configured in database
- API returned empty data despite valid configuration
Root Cause
- Pangolin 1.14 Community Edition does not include Exit Nodes feature
- Exit Nodes / Gerbil functionality requires paid Pangolin license
- The API endpoint exists but returns empty data for CE users
Resolution
- Removed gerbil container (feature not available)
- Existing MikroTik WireGuard VPN provides equivalent remote access functionality
- Phase 2 (Fossorial Stack) marked as blocked pending license upgrade
Status
- [PHASE 2] gerbil - REMOVED (requires paid Pangolin license)
- [PHASE 2] Pangolin controller - RUNNING (limited to CE features)
2026-01-18 - Actual Budget OIDC Integration with Authentik
Problem
- Actual Budget OIDC login failing with multiple errors
Fixes Applied
1. DNS Resolution (EAI_AGAIN)
- Issue: Container couldn't resolve auth.xtrm-lab.org
- Fix: Added --add-host=auth.xtrm-lab.org:<traefik-ip> to the container
- Template: /boot/config/plugins/dockerMan/templates-user/my-actual-budget.xml
2. JWT Signing Algorithm (HS256 vs RS256)
- Issue: Authentik signed tokens with HS256, Actual Budget expected RS256
- Root Cause: OAuth2 provider had no signing key configured
- Fix: Set signing_key_id to 'authentik Internal JWT Certificate' in database
- SQL:
UPDATE authentik_providers_oauth2_oauth2provider SET signing_key_id = '48203833-f562-4ec6-b782-f566e6d960d5' WHERE client_id = 'actual-budget';
3. Insufficient Scope
- Issue: Provider had no scope mappings assigned
- Fix: Added openid, email, profile scopes to provider
- SQL:
INSERT INTO authentik_core_provider_property_mappings (provider_id, propertymapping_id) VALUES (3, 'a24eea06-...'), (3, '4394c150-...'), (3, '7272ab52-...');
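Fixes 2 and 3 can be confirmed from the provider's OIDC discovery document (standard discovery metadata fields; assumes jq on the querying host):

```shell
curl -s https://auth.xtrm-lab.org/application/o/actual-budget/.well-known/openid-configuration \
  | jq '{algs: .id_token_signing_alg_values_supported, scopes: .scopes_supported}'
```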
Traefik Static IP
- Issue: Traefik IP was dynamic, would break actual-budget on restart
- Fix: Assigned static IP 172.18.0.10 to Traefik on dockerproxy network
- Template: Added --ip=172.18.0.10 to ExtraParams in my-traefik.xml
Final Configuration
| Component | Setting |
|---|---|
| Traefik | 172.18.0.10 (static) on dockerproxy |
| Actual Budget | --add-host=auth.xtrm-lab.org:172.18.0.10 |
| Authentik Provider | actual-budget with RS256 signing + scopes |
Actual Budget OIDC Environment
ACTUAL_OPENID_DISCOVERY_URL=https://auth.xtrm-lab.org/application/o/actual-budget/.well-known/openid-configuration
ACTUAL_OPENID_CLIENT_ID=actual-budget
ACTUAL_OPENID_CLIENT_SECRET=<secret>
ACTUAL_OPENID_SERVER_HOSTNAME=https://actual.xtrm-lab.org
Status
- [PHASE 3] Actual Budget OIDC - WORKING
- [SERVICE] traefik: Static IP 172.18.0.10 configured
- [SERVICE] actual-budget: OIDC login via Authentik working
2026-01-18 - Phase 5 Completed: RustDesk Self-Hosted Deployment
Keypair Generation
- [PHASE 5] Generated Ed25519 keypair for encrypted connections
- Public Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
- Data directory: /mnt/user/appdata/rustdesk-server/
Containers Deployed
- [SERVICE] rustdesk-hbbs: ID/Rendezvous server on ports 21115-21116 (TCP), 21116 (UDP), 21118-21119
- [SERVICE] rustdesk-hbbr: Relay server on port 21117
- Both containers configured with -k _ for mandatory encryption
- AutoKuma labels added for Uptime Kuma monitoring
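A sketch of the equivalent docker run commands (-k _ makes clients present the server key; host networking is one assumed way to expose the 21115-21119 range):

```shell
docker run -d --name rustdesk-hbbs --net=host \
  -v /mnt/user/appdata/rustdesk-server:/root \
  rustdesk/rustdesk-server hbbs -k _
docker run -d --name rustdesk-hbbr --net=host \
  -v /mnt/user/appdata/rustdesk-server:/root \
  rustdesk/rustdesk-server hbbr -k _
```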
MikroTik Configuration
- Added NAT rules 24-27 for RustDesk ports
- Added firewall forward rules (Allow RustDesk TCP/UDP)
- Ports forwarded: 21115 (NAT test), 21116 (TCP+UDP), 21117 (Relay)
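In RouterOS terms the forwards look roughly like this (sketch; 192.168.31.2 is the Unraid host per earlier entries, and the WAN interface-list name is an assumption):

```
/ip firewall nat add chain=dstnat action=dst-nat in-interface-list=WAN \
    protocol=tcp dst-port=21115-21117 to-addresses=192.168.31.2 \
    comment="RustDesk TCP"
/ip firewall filter add chain=forward action=accept protocol=tcp \
    dst-port=21115-21117 comment="Allow RustDesk TCP"
```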
DNS
- rustdesk.xtrm-lab.org already resolving to 62.73.120.142 (DNS only, no proxy)
Verification
- All TCP ports (21115, 21116, 21117) accessible externally
- Both containers running healthy
- Logs show successful startup with keypair loaded
Client Configuration
| Setting | Value |
|---|---|
| ID Server | rustdesk.xtrm-lab.org |
| Relay Server | rustdesk.xtrm-lab.org |
| Public Key | +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8= |
Status
- [PHASE 5] RustDesk Self-Hosted - COMPLETED
2026-01-18 - Vaultwarden 502 Fix
Issue
- Vaultwarden returning unexpected error when creating new logins
- Traefik logs showed 502 Bad Gateway errors
Root Cause
- Traefik config pointed to http://192.168.31.2:4743
- Vaultwarden container had no port 4743 mapping (port 80/tcp was not published)
- Both containers are on the dockerproxy network, but the config used the host IP
Fix
- Updated /mnt/user/appdata/traefik/dynamic.yml
- Changed: url: "http://192.168.31.2:4743" → url: "http://vaultwarden:80"
- Uses Docker internal DNS, which resolves to the container IP on the dockerproxy network
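The one-line change can be reproduced mechanically; a self-contained demo on a scratch copy (the real edit targets /mnt/user/appdata/traefik/dynamic.yml):

```shell
# Scratch file standing in for the relevant line of dynamic.yml
cat > /tmp/dynamic.yml <<'EOF'
          - url: "http://192.168.31.2:4743"
EOF
# Point the service at the container name; Docker DNS resolves it on dockerproxy
sed -i 's|http://192.168.31.2:4743|http://vaultwarden:80|' /tmp/dynamic.yml
cat /tmp/dynamic.yml
```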
Status
- [SERVICE] vaultwarden: Working - can create/edit logins
2026-01-18 - Progress Summary
Completed Phases
- [PHASE 1] DNS Portability - COMPLETE (DoH, DoT, Unbound redundancy)
- [PHASE 5] RustDesk Self-Hosted - COMPLETE (hbbs/hbbr deployed)
- [PHASE 6] Portainer Management - COMPLETE (6.2/6.3 cancelled - MikroTik incompatible)
In Progress
- [PHASE 3] Authentik Zero Trust - Actual Budget integrated, more services pending
Blocked
- [PHASE 2] Fossorial Stack - Gerbil requires paid Pangolin license
Not Started
- [PHASE 4] Remote Gaming (Sunshine/Moonlight) - Starting now
Known Issues
- HomeAssistant_inabox: Exited (1) 3 days ago
- pgAdmin4: Exited (137) 2 weeks ago
2026-01-18 - Phase 4 Started: MacBook Prepared
MacBook Setup Complete
- [PHASE 4] Moonlight v6.1.0 already installed
- [PHASE 4] Tailscale connected (100.68.118.59)
Pending - Nobara Setup
- Install Sunshine on Nobara
- Configure VA-API encoding
- Pair with Moonlight
Instructions Saved
- MacBook: ~/Documents/NOBARA-SUNSHINE-SETUP.md
Status
- [PHASE 4] MacBook client ready, awaiting Nobara server setup
2026-01-18 - NetAlertX & Uptime Kuma Fixes (Partial)
Uptime Kuma - FIXED
- [SERVICE] Added Traefik route for uptime.xtrm-lab.org
- Protected with Authentik forward auth
- Service URL: http://192.168.31.2:3001
NetAlertX - IN PROGRESS
- [ISSUE] Container not scanning network - shows 0 devices
- [ROOT CAUSE] Multiple config files exist:
- /app/config/app.conf (mounted from host) - updated correctly
- /app/back/app.conf (container internal) - has old value '--localnet'
- [ATTEMPTED] Updated /mnt/user/appdata/netalertx/config/app.conf
- [ATTEMPTED] Updated database Settings table
- [ATTEMPTED] Deleted database to force reload
- [DISCOVERED] App reads from /app/back/app.conf which is generated at startup
NetAlertX Fix Required
- The /app/back/app.conf needs to be updated to: SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
- This file is regenerated on container start from /app/config/app.conf
- May need to use Settings UI at https://netalert.xtrm-lab.org to change SCAN_SUBNETS
Manual ARP Scan Test - WORKS
- Command: docker exec NetAlertX arp-scan --localnet --interface=br0
- Result: Found 20 devices on 192.168.31.0/24
Pending Tasks
- Fix NetAlertX to use correct subnet config
- Add Tailscale network scanning (may not work - ARP doesn't work over tunnels)
- User requested: RustFS for personal CDN (assets hosting)
Status
- [SERVICE] uptime.xtrm-lab.org - WORKING
- [SERVICE] netalertx - PARTIALLY BROKEN (config issue)
2026-01-18 - NetAlertX FIXED
Resolution
- [SERVICE] NetAlertX now scanning network correctly - found 21 devices
- [FIX] Updated config in multiple locations:
- /data/config/app.conf (runtime config inside container)
- /app/back/app.conf (plugin reads from here)
- /mnt/user/appdata/netalertx/config/app.conf (host mount for persistence)
Config Change
SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
Root Cause Summary
- NetAlertX has complex config handling with multiple config file locations
- /app/config (mounted) -> copied to /data/config on startup
- /data/config/app.conf is read by the app
- /app/back/app.conf is read by plugins at runtime
- All three needed to be updated
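The same SCAN_SUBNETS edit applied to each file can be scripted; a self-contained demo on a scratch copy (the real targets are the three paths listed above):

```shell
# Scratch file standing in for one of the app.conf copies
cat > /tmp/app.conf <<'EOF'
SCAN_SUBNETS=['--localnet']
EOF
# Pin the scan to the LAN subnet on br0
sed -i "s|^SCAN_SUBNETS=.*|SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']|" /tmp/app.conf
cat /tmp/app.conf
```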
Verified Working
- ARP scan found 21 devices on 192.168.31.0/24
- Devices visible at https://netalert.xtrm-lab.org/devices.php
Note on Tailscale Scanning
- ARP scanning does NOT work over Tailscale (point-to-point tunnel, no broadcast)
- Tailscale devices need to be added manually or via different discovery method
2026-01-18 - RustFS CDN Deployed
Service Details
- [SERVICE] RustFS - S3-compatible object storage for personal CDN
- Image: rustfs/rustfs:latest
- Ports: 9010 (S3 API), 9011 (Console)
- Data: /mnt/user/appdata/rustfs/data
- Logs: /mnt/user/appdata/rustfs/logs
Access URLs
- S3 API: https://cdn.xtrm-lab.org
- Console: http://192.168.31.2:9011/rustfs/console/
- Credentials stored in: /mnt/user/appdata/rustfs/CREDENTIALS.txt
Traefik Route
- Host: cdn.xtrm-lab.org
- No Authentik protection (public CDN for assets)
- S3 authentication handles access control
Usage
Create bucket and upload assets via:
- RustFS Console at port 9011
- S3-compatible CLI tools (aws-cli, rclone, etc.)
- Direct S3 API calls
Example S3 CLI Usage
```shell
# Configure aws-cli
aws configure set aws_access_key_id <access_key>
aws configure set aws_secret_access_key <secret_key>

# Create bucket
aws --endpoint-url https://cdn.xtrm-lab.org s3 mb s3://assets

# Upload file
aws --endpoint-url https://cdn.xtrm-lab.org s3 cp image.png s3://assets/

# Public URL (after setting bucket policy):
# https://cdn.xtrm-lab.org/assets/image.png
```
Status
- [SERVICE] rustfs - RUNNING
- [PHASE N/A] Personal CDN - COMPLETED
2026-01-18 - PostgreSQL Data Path Restored & Phase 3 Verified
Root Cause Analysis
- [INCIDENT] PostgreSQL container was recreated at 07:40 UTC with wrong data path
- [CAUSE] Container used the default path /mnt/user/appdata/postgresql17 instead of the configured /mnt/user/appdata/postgresql
- [IMPACT] Authentik started with an empty database; all configuration appeared lost
- [DATA] Original data was safe in /mnt/user/appdata/postgresql/ the entire time
Resolution
- [FIX] Stopped postgresql17 container
- [FIX] Recreated container with the correct volume mount: /mnt/user/appdata/postgresql:/var/lib/postgresql/data
- [FIX] Restarted Authentik containers
- [VERIFIED] All Authentik data restored (users, groups, applications, providers)
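A sketch of the corrected recreate (image tag and env vars are assumptions; the volume mapping is the part that mattered):

```shell
docker run -d --name postgresql17 \
  -v /mnt/user/appdata/postgresql:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=<existing-password> \
  postgres:17
```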
Other Fixes This Session
- [SERVICE] Uptime-Kuma-API: Added missing ADMIN_PASSWORD environment variable
- [TRAEFIK] Added Docker provider constraint to filter broken container labels
- [TRAEFIK] Added missing routes: authentik, transmission, nextcloud to dynamic.yml
Phase 3 Verification - COMPLETED
Verified Authentik Zero Trust configuration:
- Users: akadmin, admin, jazzymc (3 active users)
- Groups: authentik Admins, authentik Read-only
- Outpost: Embedded Outpost running (proxy type)
- Applications: XTRM-Lab Protected Services, Actual Budget
- Proxy Provider: forward_domain mode for auth.xtrm-lab.org
- 2FA: 2 TOTP devices configured
- Protected Services: 12 routes using authentik-forward-auth middleware
Services Status
- [SERVICE] auth.xtrm-lab.org - WORKING (302 redirect to login)
- [SERVICE] uptime.xtrm-lab.org - WORKING (forward auth active)
- [SERVICE] ph2.xtrm-lab.org - WORKING (forward auth active)
- [SERVICE] All forward-auth protected services - WORKING
Documentation Updated
- [DOC] 03-PHASE3-AUTHENTIK-ZEROTRUST.md - Marked as COMPLETED with verified state
2026-01-18 - Phase 5 RustDesk Verified
Server-Side Verification Complete
- Keypair exists: /mnt/user/appdata/rustdesk-server/id_ed25519*
- Public Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
- hbbs container: Up 10+ hours
- hbbr container: Up 10+ hours
- MikroTik NAT: 4 rules configured (21115-21117 TCP, 21116 UDP)
- DNS: rustdesk.xtrm-lab.org → 62.73.120.142
- Port 21116 TCP: Externally accessible (verified via nc)
- Port 21117 TCP: Externally accessible (verified via nc)
Client Configuration
- ID Server: rustdesk.xtrm-lab.org
- Relay Server: rustdesk.xtrm-lab.org
- Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
Documentation Updated
- [DOC] 05-PHASE5-RUSTDESK.md - Rewritten with verified state, marked SERVER-SIDE COMPLETE
- [PENDING] Client-side testing (user to verify remote sessions work)
2026-01-18 - Phase 7 GitOps Plan Created
New Phase: Gitea + Woodpecker CI
- [DOC] Created 08-PHASE7-GITEA-GITOPS.md
- [PLAN] Lightweight GitOps stack for infrastructure management
- [COMPONENTS] Gitea (~200MB) + Woodpecker Server/Agent (~200MB)
- [TOTAL RESOURCES] ~400MB RAM, ~700MB storage
Planned Features
- Git version control for all configs
- Automated YAML validation
- CI/CD pipeline for deployments
- Auto-rollback on health check failure
- Authentik SSO integration
- Safe AI agent integration
URLs (Planned)
- git.xtrm-lab.org - Gitea web UI
- ci.xtrm-lab.org - Woodpecker CI dashboard