SubtitleAI Production Deployment Plan¶
Overview¶
Deploy SubtitleAI to the k3s cluster, following the same patterns as home-portal and money-tracker, with additional components for background processing.
Architecture Components¶
1. Web Application (Next.js)¶
- Container: Next.js 16 app
- Deployment: Standard k8s Deployment
- Resources: 256Mi RAM, 100m CPU (like money-tracker)
- Ports: 3000 (HTTP)
- Features: UI for upload, job management, download
2. Worker Service (Python/Celery)¶
- Container: Custom Python image with Whisper
- Deployment: k8s Deployment (scalable)
- Resources: 2Gi RAM, 1000m CPU (transcription is CPU-intensive)
- Dependencies:
- ffmpeg
- OpenAI Whisper (~3.5GB models)
- PyTorch
- Supabase Python SDK
- Volume Mount: /vault for future media library scanning
3. Poller Service (Python)¶
- Container: Lightweight Python poller
- Deployment: k8s Deployment (single replica)
- Resources: 128Mi RAM, 50m CPU
- Function: Polls database every 5s for pending jobs
4. Redis (Message Queue)¶
- Deployment: StatefulSet or use existing Redis if available
- Resources: 256Mi RAM, 100m CPU
- Persistence: Optional (jobs stored in Supabase)
Deployment Workflow¶
Phase 1: Prerequisites ✓¶
- [x] App developed in LXC 180 (dev environment)
- [x] Worker tested in LXC 181
- [x] Supabase schema configured (subtitleai)
- [x] Storage buckets created (subtitleai-uploads, subtitleai-outputs)
Phase 2: Containerization¶
Location: LXC 180 (has Docker)
2.1 Create Dockerfiles¶
- Next.js App: Standard multi-stage build (like money-tracker)
- Worker: Python 3.11 + Whisper + dependencies
- Poller: Lightweight Python image (shares base with worker)
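A minimal sketch of what Dockerfile.web could look like, assuming the app uses Next.js standalone output (output: "standalone" in next.config); treat it as a starting point rather than the actual file:
# Dockerfile.web (sketch; assumes Next.js standalone output)
FROM node:20-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
# Standalone output bundles server.js plus a pruned node_modules
COPY --from=build /app/.next/standalone ./
COPY --from=build /app/.next/static ./.next/static
COPY --from=build /app/public ./public
EXPOSE 3000
CMD ["node", "server.js"]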
2.2 Build Images Locally¶
# In LXC 180
cd /root/subtitleai
# Build Next.js app
docker build -t subtitleai-web:v1.0.0 -f Dockerfile.web .
# Build worker (includes poller)
docker build -t subtitleai-worker:v1.0.0 -f Dockerfile.worker ./worker
2.3 Push to Private Registry¶
REGISTRY="10.89.97.201:30500"
# Tag and push web app
docker tag subtitleai-web:v1.0.0 ${REGISTRY}/subtitleai-web:v1.0.0
docker push ${REGISTRY}/subtitleai-web:v1.0.0
# Tag and push worker
docker tag subtitleai-worker:v1.0.0 ${REGISTRY}/subtitleai-worker:v1.0.0
docker push ${REGISTRY}/subtitleai-worker:v1.0.0
Phase 3: Kubernetes Manifests¶
Location: /root/tower-fleet/manifests/apps/subtitleai/
3.1 Create Manifest Files¶
- namespace.yaml - subtitleai namespace
- web-deployment.yaml - Next.js app deployment
- web-service.yaml - LoadBalancer service (port 80→3000)
- web-ingress.yaml - Ingress (subtitles.internal)
- worker-deployment.yaml - Celery worker deployment
- poller-deployment.yaml - Database poller deployment
- redis-statefulset.yaml - Redis for Celery
- redis-service.yaml - ClusterIP service for Redis
- configmap.yaml - Shared config (Redis URL, poll interval)
- secret.yaml - Supabase credentials (from sealed-secrets)
3.2 Apply Manifests¶
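A minimal apply sequence, assuming the manifest directory above (namespace and secrets go first so the deployments can reference them):
# From /root/tower-fleet
kubectl apply -f manifests/apps/subtitleai/namespace.yaml
kubectl apply -f manifests/apps/subtitleai/secret.yaml
kubectl apply -f manifests/apps/subtitleai/
kubectl get pods -n subtitleai -w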
Phase 4: Deployment Script¶
Location: /root/tower-fleet/scripts/deploy-subtitleai.sh
Pattern: Follow deploy-home-portal.sh structure
Key Steps:
1. Pull latest code from git (LXC 180)
2. Update .env.production with Supabase keys from k8s secrets
3. Build Docker images (web + worker)
4. Tag images with auto-incremented semver
5. Push to private registry
6. Update k8s deployments with new image tags
7. Wait for rollout completion
8. Run health checks
Versioning Strategy:
- Web app: v1.0.0 → v1.0.1 (auto-increment)
- Worker: Same version as web app (keep in sync)
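A condensed sketch of the script, following the key steps and versioning strategy above; the deployment and container names (subtitleai-web, subtitleai-worker, subtitleai-poller) are assumptions to be matched against the manifests, and the real script should auto-increment the version like deploy-home-portal.sh:
#!/usr/bin/env bash
set -euo pipefail

REGISTRY="10.89.97.201:30500"
VERSION="${1:?usage: deploy-subtitleai.sh <version>}"   # auto-incremented in the real script

# 1-2. Pull latest code (env refresh from k8s secrets omitted in this sketch)
cd /root/subtitleai && git pull

# 3-5. Build, tag, push
docker build -t ${REGISTRY}/subtitleai-web:${VERSION} -f Dockerfile.web .
docker build -t ${REGISTRY}/subtitleai-worker:${VERSION} -f Dockerfile.worker ./worker
docker push ${REGISTRY}/subtitleai-web:${VERSION}
docker push ${REGISTRY}/subtitleai-worker:${VERSION}

# 6-7. Roll out new images and wait (names assumed)
kubectl -n subtitleai set image deployment/subtitleai-web web=${REGISTRY}/subtitleai-web:${VERSION}
kubectl -n subtitleai set image deployment/subtitleai-worker worker=${REGISTRY}/subtitleai-worker:${VERSION}
kubectl -n subtitleai set image deployment/subtitleai-poller poller=${REGISTRY}/subtitleai-worker:${VERSION}
kubectl -n subtitleai rollout status deployment/subtitleai-web --timeout=300s
kubectl -n subtitleai rollout status deployment/subtitleai-worker --timeout=300s

# 8. Basic health check
curl -fsS http://subtitles.internal/ > /dev/null && echo "web health check OK"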
Phase 5: Initial Deployment¶
5.1 Environment Variables¶
# Web App (.env.production)
NEXT_PUBLIC_SUPABASE_URL=http://10.89.97.214:8000
NEXT_PUBLIC_SUPABASE_ANON_KEY=<from k8s secrets>
# Worker (ConfigMap/Secret)
SUPABASE_URL=http://10.89.97.214:8000
SUPABASE_SERVICE_KEY=<from k8s secrets>
REDIS_URL=redis://subtitleai-redis:6379/0
POLL_INTERVAL=5
5.2 Resource Allocation¶
# Web App
requests: { cpu: 100m, memory: 256Mi }
limits: { cpu: 500m, memory: 512Mi }
# Worker (transcription intensive)
requests: { cpu: 1000m, memory: 2Gi }
limits: { cpu: 2000m, memory: 4Gi }
# Poller (lightweight)
requests: { cpu: 50m, memory: 128Mi }
limits: { cpu: 100m, memory: 256Mi }
# Redis
requests: { cpu: 100m, memory: 256Mi }
limits: { cpu: 200m, memory: 512Mi }
5.3 Ingress Configuration¶
host: subtitles.internal
annotations:
  nginx.ingress.kubernetes.io/proxy-body-size: "2048m"    # 2GB file uploads
  nginx.ingress.kubernetes.io/proxy-read-timeout: "600"   # 10 min for long uploads
Phase 6: Testing¶
6.1 Smoke Tests¶
- [ ] Web UI loads at http://subtitles.internal
- [ ] Can log in with Supabase auth
- [ ] Upload small video file
- [ ] Worker picks up job
- [ ] Transcription completes
- [ ] Download SRT file
6.2 Load Testing¶
- [ ] Multiple concurrent uploads
- [ ] Worker scales (test with kubectl scale; see the example below)
- [ ] Large file uploads (2GB max)
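The scaling test can be as simple as the following (deployment name assumed from the manifests above):
kubectl -n subtitleai scale deployment/subtitleai-worker --replicas=3
kubectl -n subtitleai get pods -w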
Phase 7: Monitoring & Observability¶
7.1 Add ServiceMonitor (Prometheus)¶
# Similar to home-portal-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: subtitleai-web
  namespace: subtitleai
spec:
  selector:
    matchLabels:
      app: subtitleai-web
  endpoints:
    - port: http
      path: /api/metrics
7.2 Logging¶
- Use Loki for log aggregation (already in cluster)
- Query: {namespace="subtitleai"}
- Worker logs: Celery task output
- Poller logs: Job polling activity
7.3 Alerts (Optional)¶
- Worker pod crashes
- High memory usage (>3.5Gi worker)
- Failed jobs rate
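A hedged sketch of matching PrometheusRule entries, assuming the standard cAdvisor and kube-state-metrics metric names exposed by the cluster's Prometheus stack; the pod name pattern and thresholds are assumptions:
# subtitleai-alerts.yaml (sketch)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: subtitleai-alerts
  namespace: subtitleai
spec:
  groups:
    - name: subtitleai
      rules:
        - alert: SubtitleAIWorkerRestarting
          expr: increase(kube_pod_container_status_restarts_total{namespace="subtitleai"}[15m]) > 0
          for: 5m
          labels:
            severity: warning
        - alert: SubtitleAIWorkerHighMemory
          expr: container_memory_working_set_bytes{namespace="subtitleai", pod=~"subtitleai-worker.*", container!=""} > 3.5 * 1024 * 1024 * 1024
          for: 10m
          labels:
            severity: warning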
Volume Mounts (Future)¶
Media Library Access¶
For scanning existing media files, mount /vault into worker pods:
# worker-deployment.yaml
volumes:
  - name: media
    hostPath:
      path: /vault/subvol-101-disk-0/media
      type: Directory
volumeMounts:
  - name: media
    mountPath: /media
    readOnly: true  # Read-only for safety
Note: Requires nodeSelector or nodeAffinity to ensure pods run on nodes with /vault access.
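For example, with a hand-applied node label (the label key here is hypothetical):
# Label the node that exposes /vault, then pin the worker to it
# kubectl label node <storage-node> storage/vault=present
nodeSelector:
  storage/vault: present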
Bazarr Integration (Future)¶
Option 1: API Integration¶
- Bazarr at http://10.89.97.50:6767
- Use the Bazarr API to query subtitle status
- No volume mounts needed
Option 2: File System Scanning¶
- Mount the same /vault path as Bazarr
- Scan for .srt, .ass, and .vtt files (see the sketch below)
- Build our own subtitle inventory
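A first scan pass could be as simple as the following, assuming the /media mount path from the volume section above:
# Inventory existing subtitle files under the mounted media path
find /media -type f \( -iname "*.srt" -o -iname "*.ass" -o -iname "*.vtt" \)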
Migration Plan (Dev → Prod)¶
Current State¶
- Dev: LXC 180 (Next.js on port 3000)
- Worker: LXC 181 (systemd services)
Transition Steps¶
- Deploy to k3s (new namespace)
- Test thoroughly in k3s
- Update DNS/ingress to point to k3s
- Decommission LXC dev environments (or keep for development)
Rollback Plan¶
- Keep LXC 180/181 running during initial k3s deployment
- If k8s deployment fails, traffic still on LXC
- Use kubectl rollout undo for k8s rollbacks (example below)
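Rollback sketch (deployment names assumed to match the manifests):
kubectl -n subtitleai rollout undo deployment/subtitleai-web
kubectl -n subtitleai rollout undo deployment/subtitleai-worker
kubectl -n subtitleai rollout status deployment/subtitleai-web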
Checklist¶
Pre-Deployment¶
- [ ] Review and finalize this plan
- [ ] Create Dockerfiles (web + worker)
- [ ] Create k8s manifests (namespace, deployments, services, ingress)
- [ ] Create deployment script
- [ ] Test build locally in LXC 180
Deployment¶
- [ ] Apply namespace and secrets
- [ ] Deploy Redis
- [ ] Deploy worker and poller
- [ ] Deploy web app
- [ ] Configure ingress
- [ ] Run smoke tests
Post-Deployment¶
- [ ] Add monitoring/alerts
- [ ] Document in tower-fleet repo
- [ ] Update /root/PROJECTS.md
- [ ] Add to home-portal dashboard (optional)
Lessons Learned (Post-Deployment)¶
Issue 1: TypeScript Build Errors Not Caught in Development¶
Problem: Next.js 16 production build (npm run build) failed with TypeScript errors that weren't caught during development (npm run dev).
Root Cause:
- Dev mode uses lenient TypeScript checking and runtime type coercion
- The production build enforces strict TypeScript compilation
- The errors existed all along but only surfaced during the Docker build
Specific Errors Fixed:
1. Async params in Next.js 16: Dynamic route params are now Promise<{ id: string }> instead of sync objects
   - Fixed in: /api/jobs/[id]/route.ts, /api/jobs/[id]/retry/route.ts, /api/subtitles/[id]/download/route.ts
   - Solution: Change the params type and await it: const { id } = await params
2. Supabase foreign key joins return arrays: Joins like jobs!inner(user_id) return arrays, not single objects
   - Fixed in: /app/jobs/page.tsx, /app/jobs/[id]/page.tsx, /api/subtitles/[id]/download/route.ts
   - Solution: Use the !inner hint and handle array access: const video = Array.isArray(job.videos) ? job.videos[0] : job.videos
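For reference, the async-params pattern looks roughly like this (illustrative handler, not the actual file contents):
// app/api/jobs/[id]/route.ts (sketch of the Next.js async-params fix)
import { NextResponse } from "next/server";

export async function GET(
  _req: Request,
  { params }: { params: Promise<{ id: string }> }  // params is now a Promise
) {
  const { id } = await params;                      // await before using it
  return NextResponse.json({ id });
}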
Prevention: Run npm run build locally before deploying to catch TypeScript errors early.
Issue 2: Missing Environment Variables in Poller¶
Problem: Poller pod crashed on startup with "Connection refused" to Redis.
Root Cause: REDIS_URL environment variable not configured in poller deployment manifest.
Fix: Added to manifests/apps/subtitleai/poller-deployment.yaml:
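The missing entries looked roughly like this; the ConfigMap name and the decision to source REDIS_URL from it are assumptions based on configmap.yaml above:
# poller-deployment.yaml (env section, sketch)
env:
  - name: REDIS_URL
    valueFrom:
      configMapKeyRef:
        name: subtitleai-config   # assumed ConfigMap name
        key: REDIS_URL
  - name: POLL_INTERVAL
    valueFrom:
      configMapKeyRef:
        name: subtitleai-config
        key: POLL_INTERVAL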
Prevention: Cross-reference all os.getenv() calls in code with deployment manifest env vars.
Issue 3: Docker Registry HTTP vs HTTPS¶
Problem: docker push failed with "server gave HTTP response to HTTPS client".
Root Cause: Private registry at 10.89.97.201:30500 uses HTTP, not HTTPS.
Fix: Configured Docker daemon in LXC 180:
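The change amounts to marking the registry as insecure in the Docker daemon config (merge with any existing daemon.json settings rather than overwriting):
# In LXC 180
cat > /etc/docker/daemon.json <<'EOF'
{
  "insecure-registries": ["10.89.97.201:30500"]
}
EOF
systemctl restart docker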
Note: This is already configured in build environments, but new containers need this config.
Issue 4: Silent Poller (Not Actually an Issue)¶
Observation: Poller logs only showed startup messages, no polling activity.
Explanation: Poller is designed to only log when it finds pending jobs. Silent operation is normal when no jobs are pending.
Verification: Manually tested database query - confirmed poller polls every 5s but only logs on job discovery.
Development Workflow¶
Local Development (LXC 180)¶
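Roughly, using the paths and scripts referenced elsewhere in this doc:
cd /root/subtitleai
npm run dev     # dev server on port 3000
npm run build   # run before deploying to catch strict TypeScript errors (see Lessons Learned)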
Production Deployment (K8s Cluster)¶
cd /root/tower-fleet
./scripts/deploy-subtitleai.sh # Build, push, deploy
kubectl get pods -n subtitleai # Verify deployment
LXC 181 Status¶
- Previous: Ran worker and poller via systemd for testing
- Current: Decommissioned - poller service stopped/disabled
- Reason: All services now running in k8s production
Next Steps¶
- Create Dockerfiles - ✅ DONE (web + worker)
- Build and test locally - ✅ DONE (LXC 180)
- Create k8s manifests - ✅ DONE (11 manifest files)
- Create deployment script - ✅ DONE (deploy-subtitleai.sh)
- Deploy to k3s - ✅ DONE (v1.0.0 live at 10.89.97.213)
- Test end-to-end - ✅ DONE (upload → transcription → download working)