arr-stack Kubernetes Migration Assessment¶
Status: Assessment Phase
Priority: Low (Backlog)
Created: 2025-12-02
Owner: Infrastructure Team
Executive Summary¶
This document assesses the feasibility, benefits, risks, and implementation approach for migrating the arr-stack media automation system from Docker Compose on VM 100 to Kubernetes.
Current State: arr-stack runs on VM 100 (10.89.97.50) using Docker Compose with 11 services, stable and operational.
Key Finding: Migration is technically feasible but introduces significant complexity with moderate benefits. The VPN networking requirement (Gluetun) is the primary technical challenge.
Recommendation: Defer migration until one of these conditions is met:
1. VM 100 experiences reliability issues
2. A need arises for advanced K8s features (autoscaling, canary deployments)
3. Unified K8s management becomes a critical operational requirement
4. The VPN sidecar networking approach is proven in a homelab context
Table of Contents¶
- Current Architecture
- Migration Drivers
- Technical Challenges
- Proposed Kubernetes Architecture
- Implementation Options
- Resource Requirements
- Migration Plan
- Risk Assessment
- Cost-Benefit Analysis
- Recommendation
Current Architecture¶
Infrastructure¶
VM: VM 100 (10.89.97.50)
OS: Debian 12
Orchestration: Docker Compose
Location: /opt/arr-stack/docker-compose.yml
Storage: /mnt mounted from NAS (LXC 101)
Service Topology¶
VM 100 (Docker Compose)
├── Gluetun (VPN container - Mullvad WireGuard)
│     • Routes all download traffic through the VPN
│     • Exposes ports for SABnzbd (8080) & Deluge
│   ├── SABnzbd (Usenet)   network_mode: service:gluetun
│   └── Deluge (Torrent)   network_mode: service:gluetun
├── Sonarr
├── Radarr
├── Lidarr
├── Bazarr
├── Prowlarr
├── Overseerr
├── Jellyseerr
└── Watchtower (auto-updates daily at 3:00 AM)
│
▼
/mnt (NFS from LXC 101)
├── media/
│ ├── tv/
│ ├── movies/
│ ├── music/
│ └── torrents/
└── downloads/
Key Characteristics¶
VPN Dependency:
- Gluetun container provides VPN tunnel
- SABnzbd and Deluge use network_mode: "service:gluetun"
- All download traffic routes through Mullvad WireGuard
Storage:
- Config data: /opt/arr-stack/configs/ (local to VM)
- Media data: /mnt (NFS mount from NAS)
- Total media size: ~2TB
- Config size: ~500MB
Networking:
- All services exposed via direct port mapping
- Forward auth via K8s Ingress (already implemented)
- Services communicate via Docker bridge network
Updates:
- Watchtower auto-updates containers daily at 3:00 AM
- LinuxServer.io images (well-maintained)
Migration Drivers¶
Benefits of Migrating to Kubernetes¶
1. Unified Management¶
- Single orchestration platform (K8s) for all services
- Consistent deployment patterns across homelab
- Centralized configuration management
2. Improved Observability¶
- Native Prometheus metrics scraping
- Grafana dashboards for arr-stack services
- Centralized logging via Loki (already deployed)
- Better visibility into resource usage
3. Enhanced Reliability¶
- Automatic pod restarts on failure
- Health checks and readiness probes
- Resource limits and requests enforced
- Better isolation between services
4. Advanced Features¶
- Horizontal pod autoscaling (if needed)
- Rolling updates with zero downtime
- Canary deployments for testing
- Network policies for security
5. Disaster Recovery¶
- Declarative manifests in git (GitOps)
- Easier backup/restore via Velero
- Consistent with other homelab apps
Why Current Setup Works Well¶
Stability: Docker Compose is battle-tested, no issues in production
Simplicity: Single docker-compose.yml, easy to understand and modify
VPN Integration: network_mode: service works perfectly for routing traffic
Resource Efficiency: No K8s overhead, direct access to host resources
Update Strategy: Watchtower handles updates automatically
Low Maintenance: "Set and forget" - hasn't required intervention
Technical Challenges¶
1. VPN Networking (Primary Challenge)¶
Problem: Kubernetes doesn't support network_mode: "service:container" directly.
Current Approach:
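On the Compose side this is just network_mode wiring; a minimal sketch (service names and published ports are assumed to match the actual docker-compose.yml):

services:
  gluetun:
    image: qmcgaw/gluetun:latest
    cap_add:
      - NET_ADMIN
    ports:
      - "8080:8080"   # SABnzbd web UI, published via the VPN container
      - "8112:8112"   # Deluge web UI
  sabnzbd:
    image: lscr.io/linuxserver/sabnzbd:latest
    network_mode: "service:gluetun"   # share Gluetun's network namespace
  deluge:
    image: lscr.io/linuxserver/deluge:latest
    network_mode: "service:gluetun"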
K8s Options:
Option A: Sidecar Container Pattern¶
- Deploy Gluetun as sidecar in each download client pod
- Use shared network namespace (pod-level networking)
- Pros: Clean, K8s-native approach
- Cons: Duplicate VPN connections, more resource usage
Option B: Shared Network Namespace¶
- Deploy Gluetun in separate pod with hostNetwork
- Route download client traffic through Gluetun pod IP
- Pros: Single VPN connection
- Cons: Complex networking, requires CNI plugin support
Option C: VPN Gateway Service¶
- Create dedicated VPN gateway pod
- Use K8s Service with specific routing rules
- Pros: Centralized VPN management
- Cons: Requires advanced networking configuration
Option D: Keep Gluetun on VM, Connect via Network Policy¶
- Leave VPN container on VM 100
- Connect K8s pods to VM via external service
- Pros: Minimal changes to working VPN setup
- Cons: Defeats purpose of full migration, hybrid complexity
Recommendation: Start with Option A (Sidecar) for simplicity and K8s-native approach.
2. Storage Migration¶
Challenge: Large media library and config data need persistent storage.
Current:
- Config: Local to VM at /opt/arr-stack/configs/ (~500MB)
- Media: NFS mount from LXC 101 at /mnt (~2TB)
K8s Options:
Config Storage¶
- Option 1: Longhorn PersistentVolumes (current K8s storage class)
  - Pros: Integrated, replicated, backed up
  - Cons: Overhead for small config files
- Option 2: ConfigMaps for read-only configs
  - Pros: K8s-native, version controlled
  - Cons: Only for non-sensitive, read-only data
- Option 3: hostPath on a K8s worker node
  - Pros: Fast, local storage
  - Cons: Node-specific, not portable
Recommendation: Use Longhorn PVCs for config data (proper K8s pattern).
Media Storage¶
- Option 1: NFS StorageClass pointing to LXC 101
  - Pros: No data migration needed, shared across nodes
  - Cons: Requires NFS StorageClass setup (not yet configured)
- Option 2: Direct NFS PersistentVolume
  - Pros: Explicit control over the mount
  - Cons: Manual PV creation per app
- Option 3: Keep media on the VM, mount via external service
  - Pros: No migration needed
  - Cons: Hybrid architecture, defeats purpose
Recommendation: Create NFS StorageClass for /vault/subvol-101-disk-0/media.
3. Port Management¶
Current: Direct port exposure via VM (8080, 8112, 8989, etc.)
K8s: Services need LoadBalancer IPs or Ingress routing (already have Ingress for auth).
Solution: Use existing Ingress configuration (forward auth already implemented).
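For reference, a hedged sketch of what a per-service Ingress could look like once the backend Service lives in the cluster. The hostname, ingress class, and Authentik forward-auth annotation values are assumptions and must match whatever the existing Ingress objects already use:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sonarr
  namespace: arr-stack
  annotations:
    # Assumed ingress-nginx + Authentik outpost wiring; copy the real values from the existing Ingress
    nginx.ingress.kubernetes.io/auth-url: "http://authentik-outpost.authentik.svc.cluster.local:9000/outpost.goauthentik.io/auth/nginx"
    nginx.ingress.kubernetes.io/auth-signin: "https://sonarr.internal/outpost.goauthentik.io/start?rd=$escaped_request_uri"
spec:
  ingressClassName: nginx   # assumed
  rules:
    - host: sonarr.internal   # assumed *.internal hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: sonarr
                port:
                  number: 8989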
4. State and Data Consistency¶
Concerns:
- Database files in configs/ (SQLite for most arr services)
- Download state in SABnzbd/Deluge
- Queue state in Sonarr/Radarr
Mitigation:
1. Full backup before migration
2. Quiesce services (pause downloads, let in-progress items complete)
3. Migrate config data to K8s PVCs
4. Test with read-only mounts first
5. Validate data integrity post-migration (checksum comparison; see the sketch below)
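A minimal sketch of the checksum comparison (run once against the source tree on VM 100, once against the migrated copy, then diff):

# On VM 100, before migration (relative paths so the two lists are comparable)
(cd /opt/arr-stack/configs && find . -type f -exec md5sum {} + | sort -k 2) > /root/arr-configs.before.md5
# After copying into the PVCs, generate arr-configs.after.md5 the same way against the copy, then:
diff /root/arr-configs.before.md5 /root/arr-configs.after.md5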
5. Auto-Updates (Watchtower Replacement)¶
Current: Watchtower updates containers daily at 3:00 AM.
K8s Options:
- Manual image updates with kubectl set image
- ArgoCD image updater (if using GitOps)
- Renovate bot for manifest updates
- Custom CronJob to check for image updates
Recommendation: Manual updates or ArgoCD image updater (if deploying ArgoCD).
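The manual path is a couple of commands per service; a sketch, assuming the Deployments proposed below with :latest tags:

# With :latest tags, restarting the Deployment re-pulls the image (imagePullPolicy defaults to Always for :latest)
kubectl -n arr-stack rollout restart deployment/sonarr
kubectl -n arr-stack rollout status deployment/sonarr
# With pinned tags, bump the image explicitly instead (tag shown is a placeholder)
kubectl -n arr-stack set image deployment/sonarr sonarr=lscr.io/linuxserver/sonarr:<new-tag>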
Proposed Kubernetes Architecture¶
Namespace Structure¶
apiVersion: v1
kind: Namespace
metadata:
  name: arr-stack
  labels:
    name: arr-stack
    monitoring: enabled
Pod Architecture (Sidecar Pattern)¶
SABnzbd Pod (with Gluetun Sidecar)¶
apiVersion: v1
kind: Pod
metadata:
  name: sabnzbd
  namespace: arr-stack
spec:
  shareProcessNamespace: true
  containers:
    # VPN Sidecar
    - name: gluetun
      image: qmcgaw/gluetun:latest
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
      env:
        - name: VPN_SERVICE_PROVIDER
          value: "mullvad"
        - name: VPN_TYPE
          value: "wireguard"
        - name: WIREGUARD_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: vpn-credentials
              key: wireguard-key
        - name: SERVER_CITIES
          value: "Boston MA"
      # Health check for VPN connectivity
      livenessProbe:
        exec:
          command: ["sh", "-c", "wget -q --spider https://api.ipify.org"]
        initialDelaySeconds: 30
        periodSeconds: 60
    # SABnzbd Application
    - name: sabnzbd
      image: lscr.io/linuxserver/sabnzbd:latest
      env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        - name: TZ
          value: "America/New_York"
      volumeMounts:
        - name: config
          mountPath: /config
        - name: media
          mountPath: /data
      ports:
        - containerPort: 8080
          name: http
  volumes:
    - name: config
      persistentVolumeClaim:
        claimName: sabnzbd-config
    - name: media
      persistentVolumeClaim:
        claimName: arr-stack-media   # Shared NFS volume
Key Points:
- Containers in a pod always share the same network namespace, so SABnzbd's traffic egresses through Gluetun's tunnel without any network_mode equivalent
- shareProcessNamespace: true additionally shares the PID namespace between the containers (optional; not required for the shared networking)
- Gluetun provides the VPN tunnel; SABnzbd routes through it
- The liveness probe validates VPN connectivity
Storage Architecture¶
Config Storage (Longhorn PVC)¶
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sonarr-config
  namespace: arr-stack
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 5Gi
Media Storage (NFS PVC)¶
apiVersion: v1
kind: PersistentVolume
metadata:
  name: arr-stack-media
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.89.97.237   # LXC 101 NAS
    path: /vault/subvol-101-disk-0/media
  mountOptions:
    - nfsvers=4.1
    - hard
    - timeo=600
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: arr-stack-media
  namespace: arr-stack
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""   # Static PV binding
  volumeName: arr-stack-media
  resources:
    requests:
      storage: 5Ti
Service Deployment Pattern¶
Each arr service (Sonarr, Radarr, etc.) follows this pattern:
- Deployment: Manages replica set and rolling updates
- PVC: Persistent config storage (Longhorn)
- Service: ClusterIP for internal communication
- Ingress: Forward auth via Authentik (already configured)
Example (Sonarr):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sonarr
  namespace: arr-stack
spec:
  replicas: 1
  strategy:
    type: Recreate   # SQLite databases can't handle concurrent access
  selector:
    matchLabels:
      app: sonarr
  template:
    metadata:
      labels:
        app: sonarr
    spec:
      containers:
        - name: sonarr
          image: lscr.io/linuxserver/sonarr:latest
          env:
            - name: PUID
              value: "1000"
            - name: PGID
              value: "1000"
            - name: TZ
              value: "America/New_York"
          volumeMounts:
            - name: config
              mountPath: /config
            - name: media
              mountPath: /data
          ports:
            - containerPort: 8989
              name: http
          livenessProbe:
            httpGet:
              path: /ping
              port: 8989
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ping
              port: 8989
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: sonarr-config
        - name: media
          persistentVolumeClaim:
            claimName: arr-stack-media
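A minimal sketch of the matching ClusterIP Service; the Ingress and the other arr services would target this name:

apiVersion: v1
kind: Service
metadata:
  name: sonarr
  namespace: arr-stack
spec:
  type: ClusterIP
  selector:
    app: sonarr
  ports:
    - name: http
      port: 8989
      targetPort: 8989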
Implementation Options¶
Option 1: Full Migration (All Services)¶
Scope: Migrate all 11 arr-stack services to K8s at once.
Pros:
- Clean cutover, no hybrid state
- Unified management from day one
- Simpler to reason about
Cons:
- Higher risk (all eggs in one basket)
- Longer downtime window
- Harder to roll back if issues arise
Downtime: 2-4 hours (backup, quiesce, migrate, validate)
Option 2: Phased Migration (Service by Service)¶
Phase 1: Management services (Prowlarr, Overseerr, Jellyseerr)
- No VPN dependency
- Lower risk
- Validate K8s patterns
Phase 2: Content management (Sonarr, Radarr, Lidarr, Bazarr)
- Core functionality
- Test storage integration
- Validate service communication
Phase 3: Download clients (SABnzbd, Deluge + Gluetun)
- VPN complexity
- Most critical for privacy
- Highest-risk components
Pros:
- Lower risk per phase
- Easier rollback
- Learn and adapt between phases
Cons:
- Hybrid architecture during the transition
- Longer total migration time
- Service intercommunication across VM/K8s
Downtime per phase: 30-60 minutes
Option 3: Parallel Deployment (Keep VM as Fallback)¶
Approach:
1. Deploy arr-stack to K8s alongside the VM deployment
2. Run both in parallel with separate configs
3. Test the K8s version thoroughly
4. Cut over when confident
5. Keep the VM as a hot standby for 1-2 weeks
Pros:
- Zero-downtime migration
- Easy rollback (just switch back)
- Full validation before cutover
Cons:
- Duplicate downloads during testing
- More complex (manage two instances)
- Requires duplicate storage for configs
Downtime: None (cutover via DNS/Ingress)
Resource Requirements¶
Compute Resources¶
| Service | Current (VM) | K8s Requests | K8s Limits | Notes |
|---|---|---|---|---|
| Gluetun (x2) | - | 100m / 128Mi | 200m / 256Mi | Per sidecar |
| SABnzbd | - | 250m / 512Mi | 1000m / 1Gi | CPU-intensive (extraction) |
| Deluge | - | 250m / 512Mi | 500m / 1Gi | Torrent handling |
| Sonarr | - | 250m / 512Mi | 1000m / 1Gi | API-heavy |
| Radarr | - | 250m / 512Mi | 1000m / 1Gi | API-heavy |
| Lidarr | - | 250m / 512Mi | 1000m / 1Gi | API-heavy |
| Prowlarr | - | 250m / 256Mi | 500m / 512Mi | Lightweight |
| Bazarr | - | 250m / 256Mi | 500m / 512Mi | Subtitle processing |
| Overseerr | - | 250m / 256Mi | 500m / 512Mi | Request management |
| Jellyseerr | - | 250m / 256Mi | 500m / 512Mi | Request management |
| Total | ~2 cores, 4GB | ~2.5 cores, 4.5GB | ~7 cores, 9GB | With VPN sidecars |
Cluster Capacity:
- Current: 3 worker nodes, ~12 cores, ~24GB RAM total
- arr-stack would use: ~21% of CPU requests, ~19% of memory requests
- Verdict: Sufficient capacity exists
Storage Resources¶
| Type | Size | K8s Storage | Notes |
|---|---|---|---|
| Config (Sonarr) | 50-100MB | Longhorn PVC (5Gi) | SQLite database |
| Config (Radarr) | 50-100MB | Longhorn PVC (5Gi) | SQLite database |
| Config (Lidarr) | 50-100MB | Longhorn PVC (5Gi) | SQLite database |
| Config (Prowlarr) | 10-20MB | Longhorn PVC (1Gi) | Lightweight |
| Config (Bazarr) | 50-100MB | Longhorn PVC (5Gi) | Subtitle cache |
| Config (SABnzbd) | 50MB | Longhorn PVC (5Gi) | Queue/history |
| Config (Deluge) | 50MB | Longhorn PVC (5Gi) | Torrent state |
| Config (Overseerr) | 50MB | Longhorn PVC (5Gi) | Request database |
| Config (Jellyseerr) | 50MB | Longhorn PVC (5Gi) | Request database |
| Config (Gluetun) | 1MB | ConfigMap | VPN config |
| Media | ~2TB | NFS PV | Shared, no migration |
| Total Longhorn | ~500MB | ~41Gi provisioned | Overprovisioned for growth |
Storage Impact:
- Longhorn: +41Gi provisioned (~10Gi actual usage)
- NFS: No impact (existing mount)
- Verdict: Minimal impact on Longhorn capacity
Network Resources¶
MetalLB IPs: Already using K8s Ingress (no additional IPs needed)
Bandwidth: Same as current (downloads via VPN, LAN access)
DNS: Already configured (*.internal domain)
Migration Plan¶
Prerequisites¶
- NFS StorageClass setup (see Appendix A)
- Secrets creation (VPN credentials; see the sketch below)
- Backup of the current state on VM 100 (see the sketch below)
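Hedged sketches of the secrets and backup steps; the Secret name and key match the manifests above, and the paths come from the current VM layout:

# Create the namespace and the WireGuard key Secret referenced by the Gluetun sidecar
kubectl create namespace arr-stack
kubectl -n arr-stack create secret generic vpn-credentials \
  --from-literal=wireguard-key='<mullvad-wireguard-private-key>'

# Full backup of the current configs on VM 100 (run on the VM)
tar czf /root/arr-stack-configs-$(date +%F).tar.gz -C /opt/arr-stack configs docker-compose.yml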
Phase 1: Non-VPN Services (Low Risk)¶
Services: Prowlarr, Overseerr, Jellyseerr
Steps:
- Create the namespace and storage (namespace manifest and PVCs as defined above)
- Migrate config data from VM 100 into the new PVCs (see the sketch after this list)
- Deploy the services
- Validate:
  - Services start successfully
  - Configs loaded correctly
  - Web UI accessible via Ingress
  - Authentication works (Authentik forward auth)
- Monitor for 24-48 hours
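One way to seed a config PVC, as a sketch: assume a temporary helper pod (called config-loader here) that mounts the target PVC at /config, then copy the data in with kubectl cp:

# Copy Prowlarr's config from VM 100 into the prowlarr-config PVC (helper pod is an assumption)
scp -r root@10.89.97.50:/opt/arr-stack/configs/prowlarr /tmp/prowlarr
kubectl cp /tmp/prowlarr/. arr-stack/config-loader:/config/
kubectl -n arr-stack delete pod config-loader   # remove the helper once the copy is verified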
Rollback: Restart services on VM 100, update Ingress endpoints back to VM.
Phase 2: Content Management (Medium Risk)¶
Services: Sonarr, Radarr, Lidarr, Bazarr
Steps:
- Quiesce services:
  - Pause all monitoring/searching in Sonarr/Radarr/Lidarr
  - Let current downloads complete
  - Wait for an idle state
- Migrate config data (same process as Phase 1)
- Deploy the services
- Validate:
  - All series/movie/music libraries intact
  - API keys still valid (check connections to Prowlarr and the download clients)
  - Queue processing resumes
  - File imports work correctly
- Monitor for 24-48 hours
Rollback: Restart on VM, copy back any changed config data.
Phase 3: Download Clients (High Risk)¶
Services: Gluetun, SABnzbd, Deluge
Steps:
- Pause all downloads:
  - Pause the SABnzbd queue
  - Pause all torrents in Deluge
  - Wait for an idle state
- Validate VPN connectivity on K8s (see the test pod in Appendix B)
- Deploy the download clients
- Validate:
  - VPN connection active (check the egress IP via the Gluetun logs)
  - SABnzbd/Deluge accessible via the K8s Service
  - Download history preserved
  - Test a download through the VPN
  - Verify the download completes and imports into Sonarr/Radarr
- Update the Sonarr/Radarr/Lidarr download client endpoints: change from http://10.89.97.50:8080 to http://sabnzbd.arr-stack.svc.cluster.local:8080 (and the equivalent for Deluge)
- Resume operations and monitor closely for 48 hours
Rollback: Critical path; VPN credentials are stored in K8s Secrets and remain in the original VM config, so the download clients can be restarted on VM 100 quickly if needed.
Phase 4: Decommission VM 100¶
After 1-2 weeks of stable operation:
- Take a final backup of VM 100
- Stop Docker Compose on the VM (see the sketch below)
- Optionally repurpose VM 100, or shut it down to save resources
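A minimal sketch of the final backup and shutdown commands on VM 100 (Compose v2 syntax assumed):

# Final backup of configs and the compose file, then stop the stack
tar czf /root/arr-stack-final-$(date +%F).tar.gz -C /opt/arr-stack configs docker-compose.yml
docker compose -f /opt/arr-stack/docker-compose.yml down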
Do NOT delete VM until 30+ days of stable K8s operation.
Timeline Estimate¶
| Phase | Duration | Downtime | Dependencies |
|---|---|---|---|
| Prerequisites | 2-4 hours | None | NFS StorageClass, secrets |
| Phase 1 (Non-VPN) | 4-6 hours | 30 min/service | Prerequisites complete |
| Phase 2 (Content Mgmt) | 4-6 hours | 1 hour | Phase 1 stable for 48h |
| Phase 3 (Download Clients) | 6-8 hours | 2 hours | Phase 2 stable for 48h |
| Phase 4 (Decommission) | 1 hour | None | Phase 3 stable for 2 weeks |
| Total | 17-25 hours | ~4 hours total | 3-4 weeks elapsed |
Risk Assessment¶
High Risks¶
1. VPN Connectivity Issues¶
Risk: Gluetun sidecar fails to establish VPN connection in K8s.
Impact: Download clients exposed to ISP (privacy leak), downloads fail.
Probability: Medium (new networking pattern, untested in homelab)
Mitigation:
- Test the VPN sidecar pattern extensively before migration
- Add liveness probes to validate VPN connectivity
- Fail closed: block downloads if the VPN is down (see the sketch below)
- Keep VM 100 as a hot standby during the initial rollout
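Gluetun's built-in firewall provides the fail-closed behavior; a sketch of the extra sidecar environment variables (the LAN subnet is an assumption based on the 10.89.97.x addresses above):

# Added to the Gluetun sidecar's env list (FIREWALL defaults to on in Gluetun; shown for clarity)
- name: FIREWALL
  value: "on"
- name: FIREWALL_OUTBOUND_SUBNETS
  value: "10.89.97.0/24"   # assumed LAN subnet, so Sonarr/Radarr and users can reach the web UIs
- name: FIREWALL_INPUT_PORTS
  value: "8080,8112"       # SABnzbd and Deluge UI ports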
Rollback: Immediately switch back to VM 100, investigate K8s networking.
2. Data Corruption During Migration¶
Risk: Config database corruption during PVC migration.
Impact: Loss of series/movie metadata, download history, custom settings.
Probability: Low (well-tested migration process)
Mitigation:
- Full backup before migration
- Quiesce all services before copying data
- Validate data integrity post-migration (checksum comparison)
- Keep the VM backup for 30+ days
Rollback: Restore from backup, restart on VM 100.
3. Storage Performance Degradation¶
Risk: NFS storage slower than local disk on VM.
Impact: Slower imports, higher CPU usage, potential timeout issues.
Probability: Medium (network-attached storage inherently slower)
Mitigation:
- Benchmark NFS performance before migration
- Use NFSv4.1 with optimized mount options
- Monitor import times and adjust if needed
- Consider a caching layer if performance issues persist
Rollback: Move back to VM 100 local storage.
Medium Risks¶
4. Service Intercommunication Issues¶
Risk: Sonarr/Radarr can't reach SABnzbd/Deluge after migration.
Impact: Downloads fail to trigger, imports fail.
Probability: Low (K8s DNS is reliable)
Mitigation:
- Test service discovery before full migration
- Use K8s Service DNS names consistently
- Add readiness probes to ensure services are reachable
Rollback: Update service endpoints back to VM IPs.
5. Resource Contention¶
Risk: Arr-stack pods compete for CPU/memory with other K8s apps.
Impact: Performance degradation, OOM kills.
Probability: Low (sufficient cluster capacity)
Mitigation:
- Set appropriate resource requests/limits
- Monitor cluster resource usage
- Scale the cluster if needed (add a worker node)
Rollback: Reduce replica count or restart on VM.
Low Risks¶
6. Ingress Forward Auth Issues¶
Risk: Authentik forward auth breaks after migration.
Impact: Can't access arr-stack web UIs.
Probability: Very Low (forward auth already working, Ingress config unchanged)
Mitigation:
- No changes to Ingress manifests needed
- Test forward auth after each phase
Rollback: Update Ingress endpoints back to VM IPs.
7. Auto-Update Disruption¶
Risk: Loss of Watchtower auto-updates.
Impact: Manual update process required.
Probability: High (Watchtower not migrating)
Mitigation:
- Document the manual update process
- Consider ArgoCD Image Updater for automation
- Set up alerts for outdated images
No rollback needed: Manual updates are acceptable.
Cost-Benefit Analysis¶
Costs (Effort & Risk)¶
| Category | Effort | Risk | Notes |
|---|---|---|---|
| Planning & Documentation | 8 hours | Low | This document |
| NFS StorageClass Setup | 2 hours | Low | One-time, reusable |
| Manifest Creation | 16 hours | Medium | 11 services + shared resources |
| Testing VPN Pattern | 8 hours | High | Critical path, unknown unknowns |
| Phase 1 Migration | 6 hours | Low | Non-VPN services |
| Phase 2 Migration | 6 hours | Medium | Content management |
| Phase 3 Migration | 8 hours | High | VPN-dependent downloads |
| Validation & Monitoring | 20 hours | Medium | Over 3-4 weeks |
| Documentation Updates | 4 hours | Low | Update arr-stack.md |
| Total Effort | ~78 hours | Medium-High | ~2 weeks full-time |
Benefits (Qualitative)¶
| Benefit | Value | Timeframe | Notes |
|---|---|---|---|
| Unified Management | Medium | Immediate | Single platform for all services |
| Improved Observability | Medium | Immediate | Prometheus/Grafana integration |
| Better Reliability | Low-Medium | 1-3 months | Health checks, auto-restarts |
| GitOps-Ready | Low | Long-term | If implementing ArgoCD |
| Disaster Recovery | Low-Medium | Long-term | Velero backups, declarative config |
| Resource Efficiency | Negative | Immediate | K8s overhead vs Docker Compose |
| Operational Simplicity | Negative | Short-term | More complex during learning curve |
Quantitative Analysis¶
Cost: ~78 hours @ $0/hour = $0 (homelab, personal time)
Benefit: Moderate improvement in observability and management
Break-Even: Unclear - benefits are mostly qualitative
Opportunity Cost: 78 hours could instead be spent on:
- New homelab applications (AI subtitle generator, etc.)
- Improving existing apps (trip-planner budget tracking)
- Infrastructure improvements (SSL/TLS, backups, monitoring)
Recommendation¶
Primary Recommendation: DEFER MIGRATION¶
Rationale:
- Current State is Stable: arr-stack on VM 100 has been running reliably with zero issues. "If it ain't broke, don't fix it."
- High Effort, Moderate Benefits: ~78 hours of migration effort for benefits that are mostly qualitative. The improvement in observability and management doesn't justify the risk and effort.
- VPN Networking Uncertainty: The sidecar pattern for Gluetun is untested in this homelab, which introduces unknown risks for a privacy-critical component.
- Better Alternatives Exist: The same 78 hours could deliver higher-value improvements:
  - SSL/TLS for all services (security improvement)
  - Backup strategy with Velero (disaster recovery)
  - New applications (AI subtitle generator, knowledge base)
  - Enhanced monitoring and alerting
- Hybrid Architecture Complexity: During a phased migration, managing services split across VM and K8s adds operational complexity.
- Resource Efficiency Loss: Docker Compose is more efficient than K8s for this use case (no orchestration overhead).
Conditions for Reconsidering Migration¶
Revisit this decision if any of these conditions arise:
1. VM 100 Reliability Issues: Hardware failures, persistent Docker problems, or stability concerns
2. Need for K8s-Specific Features: Requirements for autoscaling, canary deployments, or advanced traffic management
3. Unified Management Becomes Critical: If managing the hybrid VM/K8s split becomes operationally painful
4. Proven VPN Pattern: If the Gluetun sidecar pattern is successfully implemented and validated in another project
5. ArgoCD Deployment: If implementing GitOps for other apps, arr-stack could benefit from inclusion
Alternative: Incremental Improvements to Current Setup¶
Instead of full migration, consider these low-effort improvements:
- Add Prometheus Exporters:
  - Deploy node exporter on VM 100
  - Scrape arr service metrics via existing APIs
  - Create Grafana dashboards
- Backup Automation (see the sketch after this list):
  - Schedule daily config backups to /vault
  - Test the restoration procedure
- Monitoring Integration:
  - Send Docker logs to Loki (already deployed)
  - Set up alerts for service failures
- GitOps for Config:
  - Version control docker-compose.yml in the tower-fleet repo
  - Document update procedures
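For the backup item, a sketch of a nightly cron entry on VM 100 (the destination path under the NAS mount is an assumption):

# /etc/cron.d/arr-stack-backup on VM 100 (sketch; lands on the NAS /vault share via the /mnt mount)
0 2 * * * root tar czf /mnt/backups/arr-stack-configs-$(date +\%F).tar.gz -C /opt/arr-stack configs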
Effort: ~8-10 hours total
Benefit: Improved observability with minimal risk
Cost: $0 (no downtime, no migration risk)
Appendices¶
Appendix A: NFS StorageClass Manifest¶
# /root/tower-fleet/manifests/storage/nfs-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-media
provisioner: kubernetes.io/no-provisioner   # Static provisioning
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: arr-stack-media
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.89.97.237
    path: /vault/subvol-101-disk-0/media
  mountOptions:
    - nfsvers=4.1
    - hard
    - timeo=600
    - retrans=2
    - noresvport
Appendix B: Gluetun Test Pod¶
# /root/tower-fleet/manifests/arr-stack/gluetun-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gluetun-test
  namespace: arr-stack
spec:
  shareProcessNamespace: true
  containers:
    - name: gluetun
      image: qmcgaw/gluetun:latest
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
      env:
        - name: VPN_SERVICE_PROVIDER
          value: "mullvad"
        - name: VPN_TYPE
          value: "wireguard"
        - name: WIREGUARD_PRIVATE_KEY
          valueFrom:
            secretKeyRef:
              name: vpn-credentials
              key: wireguard-key
        - name: SERVER_CITIES
          value: "Boston MA"
    - name: test
      image: busybox
      command: ["sleep", "3600"]
Test VPN connectivity:
kubectl exec -n arr-stack gluetun-test -c test -- wget -qO- https://api.ipify.org
# Should return Mullvad exit node IP, not home IP
Appendix C: Reference Links¶
- Gluetun Documentation
- Kubernetes Sidecar Pattern
- NFS Persistent Volumes
- arr-stack Current Documentation
- arr-stack SSO Implementation
Document Status: Draft
Next Review: When reconsidering migration (see conditions above)
Maintained By: Infrastructure Team