Infrastructure Changelog
Track all changes to services, configurations, and phase progress.
2026-01-18
- [INFRA] Added pending task: Static IP assignment for critical services on dockerproxy and bridge networks
- [SERVICE] postgresql17: Recreated container (was stopped due to port conflict)
- [SERVICE] authentik + authentik-worker: Restarted after PostgreSQL fix
- [TEMPLATE] Added RustDesk container templates with icons
- [TEMPLATE] Updated Pi-hole template with proper Unraid CA metadata
2026-01-17 - Homarr + Portainer Integration
Portainer App Added to Homarr
- [SERVICE] homarr: Added Portainer app to dashboard
- Section: Monitoring
- URL: http://100.100.208.70:9002 (Tailscale)
- Ping URL: http://192.168.31.2:9002 (LAN)
Docker Integration Added
- [SERVICE] homarr: Added Docker integration via socket
- Integration name: Docker (Unraid)
- Socket: unix:///var/run/docker.sock
- Linked to Portainer app for container status display
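A socket integration like this can be sanity-checked from the Unraid host; the Docker Engine API answers directly over the same socket (sketch, run where the socket exists):

```shell
# Ask the Docker Engine API for its version over the socket Homarr uses
curl --unix-socket /var/run/docker.sock http://localhost/version
```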
Database Changes
- Added app record for Portainer
- Added item and item_layout for Monitoring section
- Added integration record for Docker
- Linked integration to Portainer item
Access
- Homarr: https://xtrm-lab.org
- Portainer visible in Monitoring section
2026-01-17 - Phase 6.2/6.3 Cancelled: MikroTik Incompatible
Discovery
- MikroTik RouterOS containers are NOT Docker-based
- No /var/run/docker.sock exists on MikroTik
- Portainer cannot connect to MikroTik's container runtime
What Was Attempted
- Created veth-socat interface (172.17.0.5)
- Deployed alpine/socat container
- Added firewall and NAT rules for port 2375
- Socat failed: "No such file or directory" for docker.sock
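For reference, the alpine/socat deployment was a standard TCP-to-UNIX-socket proxy, something equivalent to the sketch below; it can only work where the socket actually exists:

```shell
# Expose a local Docker socket on TCP 2375
# (fails on RouterOS: /var/run/docker.sock is absent)
socat TCP-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock
```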
Cleanup Performed
- Removed socat container
- Removed veth-socat interface and bridge port
- Removed docker_sock mount
- Removed firewall/NAT rules for port 2375
Conclusion
- Phase 6.2 and 6.3 are NOT FEASIBLE
- MikroTik containers must be managed via RouterOS CLI/WebFig
- Portainer remains useful for Unraid-only container management
Status Update
- [PHASE 6.1] COMPLETED - Portainer managing Unraid
- [PHASE 6.2] CANCELLED - MikroTik incompatible
- [PHASE 6.3] CANCELLED - MikroTik incompatible
2026-01-17 - Unraid Container Labels Fixed
Containers Updated
- [SERVICE] unbound: Added Unraid labels (net.unraid.docker.managed, net.unraid.docker.icon)
- [SERVICE] portainer: Added Unraid labels + Tailscale labels
Portainer Labels
- net.unraid.docker.managed=dockerman
- net.unraid.docker.icon - Portainer icon
- net.unraid.docker.webui=http://100.100.208.70:9002
- tailscale.expose=true
- tailscale.host=100.100.208.70
- tailscale.port=9002
Unbound Labels
- net.unraid.docker.managed=dockerman
- net.unraid.docker.icon - Unbound icon
Note
Both containers recreated to apply labels. Services verified working after recreation.
2026-01-17 - Phase 6.1 Completed: Portainer CE Deployed
Portainer CE Installation
- [PHASE 6.1] Portainer CE deployed on Unraid - COMPLETED
- Container: portainer/portainer-ce:latest
- HTTP Port: 9002 (changed from 9000 due to Authentik conflict)
- HTTPS Port: 9444
- Data: /mnt/user/appdata/portainer
- LAN URL: http://192.168.31.2:9002
- Tailscale URL: http://100.100.208.70:9002
Port Conflict Resolution
- Original plan: port 9000
- Conflict: Authentik already using port 9000
- Resolution: Mapped to port 9002 (HTTP) and 9444 (HTTPS)
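The remapping works because Portainer CE listens on fixed internal ports (9000 HTTP, 9443 HTTPS); a sketch of the equivalent docker run, with paths taken from the entry above:

```shell
# Host ports 9002/9444 map to container ports 9000/9443
docker run -d --name portainer \
  -p 9002:9000 -p 9444:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /mnt/user/appdata/portainer:/data \
  portainer/portainer-ce:latest
```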
Next Steps
- Phase 6.2: Deploy Socat proxy on MikroTik (port 2375)
- Phase 6.3: Connect MikroTik environment to Portainer
Status
- [PHASE 6.1] COMPLETED - Portainer running, needs initial setup via web UI
- [PHASE 6.2] NOT STARTED
- [PHASE 6.3] NOT STARTED
2026-01-17 - Phase 6 Added: Multi-Host Docker Management
New Documentation
- [PHASE 6] Created 06-PHASE6-PORTAINER-MANAGEMENT.md
- Portainer CE deployment plan for unified Docker management
- Covers Unraid local setup and MikroTik remote API via Socat
Phase 6 Components
- Phase 6.1: Portainer CE installation on Unraid (port 9002)
- Phase 6.2: MikroTik Socat proxy for Docker API exposure (port 2375)
- Phase 6.3: Unified dashboard connection
Security Considerations
- MikroTik firewall rules to restrict Docker API access to Unraid only
- Unauthenticated API requires network-level security
Status
- [PHASE 6] IN PROGRESS - Phase 6.1 completed
2026-01-17 - Status Audit
Verified Working
- [PHASE 1] Tailscale on Unraid - WORKING (100.100.208.70)
- [PHASE 1] nebula-sync Pi-hole sync - HEALTHY (was unhealthy, now fixed)
- [PHASE 1] stunnel-dot (DoT on 853) - WORKING
- [PHASE 2] Pangolin controller - RUNNING (13 days uptime)
- [PHASE 3] Authentik main container - HEALTHY
Issues Found
- [PHASE 1] DoH-Server - RUNNING but BROKEN (can't reach Pi-hole upstream from dockerproxy network)
- [PHASE 2] gerbil - CRASHED (Pangolin returns empty CIDR for WG config)
- [PHASE 3] authentik-worker - CRASHED (PostgreSQL DNS resolution failure)
- [PHASE 5] RustDesk - NOT DEPLOYED
MikroTik NAT Status
- Port 853 (DoT) -> Unraid stunnel - OK
- Port 5443 (DoH) -> MikroTik Pi-hole (wrong target, should be Unraid DoH-Server)
- Port 51820 (Fossorial WG) - NOT CONFIGURED
- Ports 21115-21119 (RustDesk) - NOT CONFIGURED
Template for Future Entries
YYYY-MM-DD - Description
Changes
- [PHASE X] description - STATUS
- [SERVICE] name: what changed
Issues
- description of any problems found
Notes
- any relevant context
2026-01-17 - DNS Infrastructure Fixes
DoH-Server
- [PHASE 1] DoH-Server - WORKING at doh.xtrm-lab.org (not dns.xtrm-lab.org as documented)
- [ISSUE] Infrastructure docs reference dns.xtrm-lab.org but the container uses doh.xtrm-lab.org
- [ACTION NEEDED] Either update the docs OR add a Traefik route for dns.xtrm-lab.org
Unraid Unbound - FIXED
- [PHASE 1] Replaced broken klutchell/unbound with mvance/unbound:latest
- [ROOT CAUSE 1] Original image missing root.hints/root.key files (distroless image issue)
- [ROOT CAUSE 2] MikroTik NAT rules were hijacking Unbound's outbound DNS (192.168.31.0/24 -> Pi-hole)
- [ROOT CAUSE 3] IPv6 not working on br0 macvlan, causing timeout loops
MikroTik NAT Changes
- Added rule 6: "Allow Unraid Unbound" - accept UDP from 192.168.31.5 port 53
- Added rule 8: "Allow Unraid Unbound TCP" - accept TCP from 192.168.31.5 port 53
- These rules placed BEFORE the "Force DNS to Pi-hole" rules
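In RouterOS terms the two exemption rules look roughly like this (sketch; rule comments come from the log, everything else is an assumption):

```
/ip firewall nat add chain=dstnat action=accept protocol=udp \
    src-address=192.168.31.5 dst-port=53 comment="Allow Unraid Unbound"
/ip firewall nat add chain=dstnat action=accept protocol=tcp \
    src-address=192.168.31.5 dst-port=53 comment="Allow Unraid Unbound TCP"
# Both must sit above the "Force DNS to Pi-hole" redirect rules,
# e.g. via place-before= when adding them
```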
Unbound Configuration
- Location: /mnt/user/appdata/unbound-mvance/
- Custom config: a-records.conf (disables IPv6, sets logging)
- Image: mvance/unbound:latest
- Network: br0 (macvlan) at 192.168.31.5
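A sketch of what a-records.conf likely contains, given the notes above (these are standard Unbound directives; the deployed file may differ):

```
server:
    do-ip6: no        # IPv6 disabled (br0 macvlan IPv6 was broken)
    verbosity: 1      # basic logging
    log-queries: yes
```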
Verified Working
- Unraid Unbound (192.168.31.5) - RESOLVED google.com, github.com, cloudflare.com
- Unraid Pi-hole upstreams: 172.17.0.3 (MikroTik Unbound) + 192.168.31.5 (Unraid Unbound)
- DoH endpoint working at doh.xtrm-lab.org
- stunnel-dot (DoT) - already working
Still Pending
- MikroTik Pi-hole upstream config needs verification (check if it uses both Unbounds)
- Docs need update: dns.xtrm-lab.org vs doh.xtrm-lab.org
MikroTik Pi-hole Upstreams - FIXED
- [PHASE 1] MikroTik Pi-hole was using Google DNS (8.8.8.8, 8.8.4.4) instead of local Unbounds
- Changed upstreams via Pi-hole v6 API to:
- 172.17.0.3#53 - MikroTik local Unbound
- 192.168.31.5#53 - Unraid Unbound
- DNS resolution tested and working
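The Pi-hole v6 change boils down to a single config PATCH; a hedged sketch of the payload (the /api/config endpoint and sid-based auth follow the v6 API, but the exact call wasn't captured in the log):

```shell
# JSON body that sets both local Unbounds as upstreams
PAYLOAD='{"config":{"dns":{"upstreams":["172.17.0.3#53","192.168.31.5#53"]}}}'
echo "$PAYLOAD"
# Apply it with a session id obtained from /api/auth, e.g.:
# curl -sk -X PATCH https://pi.hole/api/config \
#   -H "sid: $SID" -H "Content-Type: application/json" -d "$PAYLOAD"
```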
Full DNS Redundancy Now Achieved
- Unraid Pi-hole upstreams: 172.17.0.3, 192.168.31.5
- MikroTik Pi-hole upstreams: 172.17.0.3, 192.168.31.5
- Both Unbounds working as recursive resolvers
- nebula-sync keeps blocklists in sync between Pi-holes
2026-01-17 - Gerbil Investigation: Feature Not Available
Issue
- Gerbil kept crashing with "invalid CIDR address" error
- Exit node was correctly configured in database
- API returned empty data despite valid configuration
Root Cause
- Pangolin 1.14 Community Edition does not include Exit Nodes feature
- Exit Nodes / Gerbil functionality requires paid Pangolin license
- The API endpoint exists but returns empty data for CE users
Resolution
- Removed gerbil container (feature not available)
- Existing MikroTik WireGuard VPN provides equivalent remote access functionality
- Phase 2 (Fossorial Stack) marked as blocked pending license upgrade
Status
- [PHASE 2] gerbil - REMOVED (requires paid Pangolin license)
- [PHASE 2] Pangolin controller - RUNNING (limited to CE features)
2026-01-18 - Actual Budget OIDC Integration with Authentik
Problem
- Actual Budget OIDC login failing with multiple errors
Fixes Applied
1. DNS Resolution (EAI_AGAIN)
- Issue: Container couldn't resolve auth.xtrm-lab.org
- Fix: Added --add-host=auth.xtrm-lab.org:<traefik-ip> to the container
- Template: /boot/config/plugins/dockerMan/templates-user/my-actual-budget.xml
2. JWT Signing Algorithm (HS256 vs RS256)
- Issue: Authentik signed tokens with HS256, Actual Budget expected RS256
- Root Cause: OAuth2 provider had no signing key configured
- Fix: Set signing_key_id to 'authentik Internal JWT Certificate' in database
- SQL:
UPDATE authentik_providers_oauth2_oauth2provider SET signing_key_id = '48203833-f562-4ec6-b782-f566e6d960d5' WHERE client_id = 'actual-budget';
3. Insufficient Scope
- Issue: Provider had no scope mappings assigned
- Fix: Added openid, email, profile scopes to provider
- SQL:
INSERT INTO authentik_core_provider_property_mappings (provider_id, propertymapping_id) VALUES (3, 'a24eea06-...'), (3, '4394c150-...'), (3, '7272ab52-...');
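Fixes 2 and 3 can be confirmed from the provider's OIDC discovery document (standard discovery metadata fields; assumes jq on the querying host):

```shell
curl -s https://auth.xtrm-lab.org/application/o/actual-budget/.well-known/openid-configuration \
  | jq '{algs: .id_token_signing_alg_values_supported, scopes: .scopes_supported}'
```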
Traefik Static IP
- Issue: Traefik IP was dynamic, would break actual-budget on restart
- Fix: Assigned static IP 172.18.0.10 to Traefik on dockerproxy network
- Template: Added --ip=172.18.0.10 to ExtraParams in my-traefik.xml
Final Configuration
| Component | Setting |
|---|---|
| Traefik | 172.18.0.10 (static) on dockerproxy |
| Actual Budget | --add-host=auth.xtrm-lab.org:172.18.0.10 |
| Authentik Provider | actual-budget with RS256 signing + scopes |
Actual Budget OIDC Environment
ACTUAL_OPENID_DISCOVERY_URL=https://auth.xtrm-lab.org/application/o/actual-budget/.well-known/openid-configuration
ACTUAL_OPENID_CLIENT_ID=actual-budget
ACTUAL_OPENID_CLIENT_SECRET=<secret>
ACTUAL_OPENID_SERVER_HOSTNAME=https://actual.xtrm-lab.org
Status
- [PHASE 3] Actual Budget OIDC - WORKING
- [SERVICE] traefik: Static IP 172.18.0.10 configured
- [SERVICE] actual-budget: OIDC login via Authentik working
2026-01-18 - Phase 5 Completed: RustDesk Self-Hosted Deployment
Keypair Generation
- [PHASE 5] Generated Ed25519 keypair for encrypted connections
- Public Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
- Data directory: /mnt/user/appdata/rustdesk-server/
Containers Deployed
- [SERVICE] rustdesk-hbbs: ID/Rendezvous server on ports 21115-21116 (TCP), 21116 (UDP), 21118-21119
- [SERVICE] rustdesk-hbbr: Relay server on port 21117
- Both containers configured with -k _ for mandatory encryption
- AutoKuma labels added for Uptime Kuma monitoring
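A sketch of the equivalent docker run commands (-k _ makes clients present the server key; host networking is one assumed way to expose the 21115-21119 range):

```shell
docker run -d --name rustdesk-hbbs --net=host \
  -v /mnt/user/appdata/rustdesk-server:/root \
  rustdesk/rustdesk-server hbbs -k _
docker run -d --name rustdesk-hbbr --net=host \
  -v /mnt/user/appdata/rustdesk-server:/root \
  rustdesk/rustdesk-server hbbr -k _
```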
MikroTik Configuration
- Added NAT rules 24-27 for RustDesk ports
- Added firewall forward rules (Allow RustDesk TCP/UDP)
- Ports forwarded: 21115 (NAT test), 21116 (TCP+UDP), 21117 (Relay)
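In RouterOS terms the forwards look roughly like this (sketch; 192.168.31.2 is the Unraid host per earlier entries, and the WAN interface-list name is an assumption):

```
/ip firewall nat add chain=dstnat action=dst-nat in-interface-list=WAN \
    protocol=tcp dst-port=21115-21117 to-addresses=192.168.31.2 \
    comment="RustDesk TCP"
/ip firewall filter add chain=forward action=accept protocol=tcp \
    dst-port=21115-21117 comment="Allow RustDesk TCP"
```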
DNS
- rustdesk.xtrm-lab.org already resolving to 62.73.120.142 (DNS only, no proxy)
Verification
- All TCP ports (21115, 21116, 21117) accessible externally
- Both containers running healthy
- Logs show successful startup with keypair loaded
Client Configuration
| Setting | Value |
|---|---|
| ID Server | rustdesk.xtrm-lab.org |
| Relay Server | rustdesk.xtrm-lab.org |
| Public Key | +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8= |
Status
- [PHASE 5] RustDesk Self-Hosted - COMPLETED
2026-01-18 - Vaultwarden 502 Fix
Issue
- Vaultwarden returning unexpected error when creating new logins
- Traefik logs showed 502 Bad Gateway errors
Root Cause
- Traefik config pointed to http://192.168.31.2:4743
- Vaultwarden container had no port 4743 mapping (port 80/tcp was not published)
- Both containers are on the dockerproxy network, but the config used the host IP
Fix
- Updated /mnt/user/appdata/traefik/dynamic.yml
- Changed: url: "http://192.168.31.2:4743" → url: "http://vaultwarden:80"
- Uses Docker internal DNS, which resolves to the container IP on the dockerproxy network
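The one-line change can be reproduced mechanically; a self-contained demo on a scratch copy (the real edit targets /mnt/user/appdata/traefik/dynamic.yml):

```shell
# Scratch file standing in for the relevant line of dynamic.yml
cat > /tmp/dynamic.yml <<'EOF'
          - url: "http://192.168.31.2:4743"
EOF
# Point the service at the container name; Docker DNS resolves it on dockerproxy
sed -i 's|http://192.168.31.2:4743|http://vaultwarden:80|' /tmp/dynamic.yml
cat /tmp/dynamic.yml
```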
Status
- [SERVICE] vaultwarden: Working - can create/edit logins
2026-01-18 - Progress Summary
Completed Phases
- [PHASE 1] DNS Portability - COMPLETE (DoH, DoT, Unbound redundancy)
- [PHASE 5] RustDesk Self-Hosted - COMPLETE (hbbs/hbbr deployed)
- [PHASE 6] Portainer Management - COMPLETE (6.2/6.3 cancelled - MikroTik incompatible)
In Progress
- [PHASE 3] Authentik Zero Trust - Actual Budget integrated, more services pending
Blocked
- [PHASE 2] Fossorial Stack - Gerbil requires paid Pangolin license
Not Started
- [PHASE 4] Remote Gaming (Sunshine/Moonlight) - Starting now
Known Issues
- HomeAssistant_inabox: Exited (1) 3 days ago
- pgAdmin4: Exited (137) 2 weeks ago
2026-01-18 - Phase 4 Started: MacBook Prepared
MacBook Setup Complete
- [PHASE 4] Moonlight v6.1.0 already installed
- [PHASE 4] Tailscale connected (100.68.118.59)
Pending - Nobara Setup
- Install Sunshine on Nobara
- Configure VA-API encoding
- Pair with Moonlight
Instructions Saved
- MacBook: ~/Documents/NOBARA-SUNSHINE-SETUP.md
Status
- [PHASE 4] MacBook client ready, awaiting Nobara server setup
2026-01-18 - NetAlertX & Uptime Kuma Fixes (Partial)
Uptime Kuma - FIXED
- [SERVICE] Added Traefik route for uptime.xtrm-lab.org
- Protected with Authentik forward auth
- Service URL: http://192.168.31.2:3001
NetAlertX - IN PROGRESS
- [ISSUE] Container not scanning network - shows 0 devices
- [ROOT CAUSE] Multiple config files exist:
- /app/config/app.conf (mounted from host) - updated correctly
- /app/back/app.conf (container internal) - has old value '--localnet'
- [ATTEMPTED] Updated /mnt/user/appdata/netalertx/config/app.conf
- [ATTEMPTED] Updated database Settings table
- [ATTEMPTED] Deleted database to force reload
- [DISCOVERED] App reads from /app/back/app.conf which is generated at startup
NetAlertX Fix Required
- The /app/back/app.conf needs to be updated to: SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
- This file is regenerated on container start from /app/config/app.conf
- May need to use Settings UI at https://netalert.xtrm-lab.org to change SCAN_SUBNETS
Manual ARP Scan Test - WORKS
- Command: docker exec NetAlertX arp-scan --localnet --interface=br0
- Result: Found 20 devices on 192.168.31.0/24
Pending Tasks
- Fix NetAlertX to use correct subnet config
- Add Tailscale network scanning (may not work - ARP doesn't work over tunnels)
- User requested: RustFS for personal CDN (assets hosting)
Status
- [SERVICE] uptime.xtrm-lab.org - WORKING
- [SERVICE] netalertx - PARTIALLY BROKEN (config issue)
2026-01-18 - NetAlertX FIXED
Resolution
- [SERVICE] NetAlertX now scanning network correctly - found 21 devices
- [FIX] Updated config in multiple locations:
- /data/config/app.conf (runtime config inside container)
- /app/back/app.conf (plugin reads from here)
- /mnt/user/appdata/netalertx/config/app.conf (host mount for persistence)
Config Change
SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']
Root Cause Summary
- NetAlertX has complex config handling with multiple config file locations
- /app/config (mounted) -> copied to /data/config on startup
- /data/config/app.conf is read by the app
- /app/back/app.conf is read by plugins at runtime
- All three needed to be updated
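The same SCAN_SUBNETS edit applied to each file can be scripted; a self-contained demo on a scratch copy (the real targets are the three paths listed above):

```shell
# Scratch file standing in for one of the app.conf copies
cat > /tmp/app.conf <<'EOF'
SCAN_SUBNETS=['--localnet']
EOF
# Pin the scan to the LAN subnet on br0
sed -i "s|^SCAN_SUBNETS=.*|SCAN_SUBNETS=['192.168.31.0/24 --interface=br0']|" /tmp/app.conf
cat /tmp/app.conf
```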
Verified Working
- ARP scan found 21 devices on 192.168.31.0/24
- Devices visible at https://netalert.xtrm-lab.org/devices.php
Note on Tailscale Scanning
- ARP scanning does NOT work over Tailscale (point-to-point tunnel, no broadcast)
- Tailscale devices need to be added manually or via different discovery method
2026-01-18 - RustFS CDN Deployed
Service Details
- [SERVICE] RustFS - S3-compatible object storage for personal CDN
- Image: rustfs/rustfs:latest
- Ports: 9010 (S3 API), 9011 (Console)
- Data: /mnt/user/appdata/rustfs/data
- Logs: /mnt/user/appdata/rustfs/logs
Access URLs
- S3 API: https://cdn.xtrm-lab.org
- Console: http://192.168.31.2:9011/rustfs/console/
- Credentials stored in: /mnt/user/appdata/rustfs/CREDENTIALS.txt
Traefik Route
- Host: cdn.xtrm-lab.org
- No Authentik protection (public CDN for assets)
- S3 authentication handles access control
Usage
Create bucket and upload assets via:
- RustFS Console at port 9011
- S3-compatible CLI tools (aws-cli, rclone, etc.)
- Direct S3 API calls
Example S3 CLI Usage
```shell
# Configure aws-cli
aws configure set aws_access_key_id <access_key>
aws configure set aws_secret_access_key <secret_key>

# Create bucket
aws --endpoint-url https://cdn.xtrm-lab.org s3 mb s3://assets

# Upload file
aws --endpoint-url https://cdn.xtrm-lab.org s3 cp image.png s3://assets/

# Public URL (after setting bucket policy):
# https://cdn.xtrm-lab.org/assets/image.png
```
Status
- [SERVICE] rustfs - RUNNING
- [PHASE N/A] Personal CDN - COMPLETED
2026-01-18 - PostgreSQL Data Path Restored & Phase 3 Verified
Root Cause Analysis
- [INCIDENT] PostgreSQL container was recreated at 07:40 UTC with wrong data path
- [CAUSE] Container used the default path /mnt/user/appdata/postgresql17 instead of the configured /mnt/user/appdata/postgresql
- [IMPACT] Authentik started with an empty database; all configuration appeared lost
- [DATA] Original data was safe in /mnt/user/appdata/postgresql/ the entire time
Resolution
- [FIX] Stopped postgresql17 container
- [FIX] Recreated container with the correct volume mount: /mnt/user/appdata/postgresql:/var/lib/postgresql/data
- [FIX] Restarted Authentik containers
- [VERIFIED] All Authentik data restored (users, groups, applications, providers)
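A sketch of the corrected recreate (image tag and env vars are assumptions; the volume mapping is the part that mattered):

```shell
docker run -d --name postgresql17 \
  -v /mnt/user/appdata/postgresql:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=<existing-password> \
  postgres:17
```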
Other Fixes This Session
- [SERVICE] Uptime-Kuma-API: Added missing ADMIN_PASSWORD environment variable
- [TRAEFIK] Added Docker provider constraint to filter broken container labels
- [TRAEFIK] Added missing routes: authentik, transmission, nextcloud to dynamic.yml
Phase 3 Verification - COMPLETED
Verified Authentik Zero Trust configuration:
- Users: akadmin, admin, jazzymc (3 active users)
- Groups: authentik Admins, authentik Read-only
- Outpost: Embedded Outpost running (proxy type)
- Applications: XTRM-Lab Protected Services, Actual Budget
- Proxy Provider: forward_domain mode for auth.xtrm-lab.org
- 2FA: 2 TOTP devices configured
- Protected Services: 12 routes using authentik-forward-auth middleware
Services Status
- [SERVICE] auth.xtrm-lab.org - WORKING (302 redirect to login)
- [SERVICE] uptime.xtrm-lab.org - WORKING (forward auth active)
- [SERVICE] ph2.xtrm-lab.org - WORKING (forward auth active)
- [SERVICE] All forward-auth protected services - WORKING
Documentation Updated
- [DOC] 03-PHASE3-AUTHENTIK-ZEROTRUST.md - Marked as COMPLETED with verified state
2026-01-18 - Phase 5 RustDesk Verified
Server-Side Verification Complete
- Keypair exists: /mnt/user/appdata/rustdesk-server/id_ed25519*
- Public Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
- hbbs container: Up 10+ hours
- hbbr container: Up 10+ hours
- MikroTik NAT: 4 rules configured (21115-21117 TCP, 21116 UDP)
- DNS: rustdesk.xtrm-lab.org → 62.73.120.142
- Port 21116 TCP: Externally accessible (verified via nc)
- Port 21117 TCP: Externally accessible (verified via nc)
Client Configuration
- ID Server: rustdesk.xtrm-lab.org
- Relay Server: rustdesk.xtrm-lab.org
- Key: +Xlxh96tqwh9tD58ctOmB05Qpfs0ByCoLQcF+yCw0J8=
Documentation Updated
- [DOC] 05-PHASE5-RUSTDESK.md - Rewritten with verified state, marked SERVER-SIDE COMPLETE
- [PENDING] Client-side testing (user to verify remote sessions work)
2026-01-18 - Phase 7 GitOps Plan Created
New Phase: Gitea + Woodpecker CI
- [DOC] Created 08-PHASE7-GITEA-GITOPS.md
- [PLAN] Lightweight GitOps stack for infrastructure management
- [COMPONENTS] Gitea (~200MB) + Woodpecker Server/Agent (~200MB)
- [TOTAL RESOURCES] ~400MB RAM, ~700MB storage
Planned Features
- Git version control for all configs
- Automated YAML validation
- CI/CD pipeline for deployments
- Auto-rollback on health check failure
- Authentik SSO integration
- Safe AI agent integration
URLs (Planned)
- git.xtrm-lab.org - Gitea web UI
- ci.xtrm-lab.org - Woodpecker CI dashboard