SubtitleAI Implementation Documentation

Last Updated: 2025-11-24
Version: v1.0.1 (Production)
Status: Active Development - Phase 2 Features


Overview

SubtitleAI is an automated subtitle generation system using OpenAI Whisper for speech-to-text transcription. The application supports multi-language transcription, multiple output formats, and advanced styling features.

Production URL: http://subtitles.internal (via Ingress) / http://10.89.97.213 (LoadBalancer)
Development: LXC 180 at 10.89.97.161:3000
K8s Namespace: subtitleai


Architecture

High-Level Flow

User Upload → Next.js Web App → Supabase Storage
                     ↓
            Job Created (pending)
                     ↓
            Poller (5s interval)
                     ↓
     Celery Worker → Whisper Transcription
                     ↓
     Output Storage → Database Record
                     ↓
               User Download

Components

  1. Next.js Web App (subtitleai-web)
     • User interface for upload, configuration, and job management
     • API routes for file upload, job creation, and downloads
     • Supabase auth integration with RLS

  2. Python Worker (subtitleai-worker)
     • Celery task queue for async processing
     • OpenAI Whisper integration (base model)
     • Video download → transcription → upload pipeline

  3. Poller (subtitleai-poller)
     • Monitors the database for pending jobs (5s interval)
     • Submits jobs to the Celery queue via Redis

  4. Redis (subtitleai-redis)
     • Task queue backend for Celery
     • StatefulSet with persistent storage

  5. Supabase (K8s shared instance)
     • PostgreSQL with dedicated subtitleai schema
     • Authentication with RLS policies
     • Storage buckets: subtitleai-uploads, subtitleai-outputs
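
The poller's job, as described above, can be sketched in a few lines. This is an illustrative outline, not the code in worker/poller.py: fetch_pending, submit, and mark_processing are hypothetical stand-ins for the real Supabase queries and Celery send_task call, injected here so the dispatch logic stays testable.

```python
# Minimal sketch of the poller loop; helper callables are hypothetical
# stand-ins for the real Supabase queries and Celery task submission.
import time

def dispatch_pending(fetch_pending, submit, mark_processing):
    """One poll cycle: submit every pending job and record its Celery task id."""
    dispatched = []
    for job in fetch_pending():                      # rows with status = 'pending'
        task_id = submit(job["id"], job["config"])   # e.g. a Celery send_task call
        mark_processing(job["id"], task_id)          # sets status + celery_task_id
        dispatched.append(job["id"])
    return dispatched

def run_forever(fetch_pending, submit, mark_processing, interval=5.0):
    """Poll on the documented 5 s interval until killed."""
    while True:
        dispatch_pending(fetch_pending, submit, mark_processing)
        time.sleep(interval)
```

Recording celery_task_id at dispatch time is what lets the jobs table correlate a database row with the in-flight Celery task.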

Database Schema

Tables (all in subtitleai schema)

videos

Stores uploaded video metadata.

CREATE TABLE subtitleai.videos (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT NOT NULL,
  file_path TEXT NOT NULL,  -- Storage path (bucket/filename)
  duration INTEGER,
  uploaded_at TIMESTAMPTZ DEFAULT now(),
  source_language TEXT,
  metadata JSONB
);

jobs

Tracks subtitle generation jobs.

CREATE TABLE subtitleai.jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  video_id UUID REFERENCES subtitleai.videos(id) ON DELETE CASCADE,
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'complete', 'failed')),
  progress INTEGER DEFAULT 0,
  config JSONB NOT NULL,  -- Job configuration (languages, styles, etc.)
  celery_task_id TEXT,
  created_at TIMESTAMPTZ DEFAULT now(),
  completed_at TIMESTAMPTZ,
  error TEXT
);

Config Structure:

{
  "sourceLanguage": "en",
  "targetLanguage": "es",  // For dual-language mode
  "styleProfile": "default",  // default, learning, enhanced, accessibility
  "emotions": true,  // Emotion-based styling (not implemented)
  "entities": true,  // Entity recognition (not implemented)
  "dualLanguage": false  // Dual-language subtitle mode
}
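
Since config is stored as free-form JSONB, the API route has to validate it before creating a job. The helper below is illustrative only (validate_config is not part of the codebase); the field names and allowed values are taken from the structure documented above and the jobs table constraints.

```python
# Hypothetical validator for the job config JSON above; field names and
# allowed values come from this document, the function is illustrative.
ALLOWED_PROFILES = {"default", "learning", "enhanced", "accessibility"}

def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not config.get("sourceLanguage"):
        errors.append("sourceLanguage is required")
    if config.get("styleProfile") not in ALLOWED_PROFILES:
        errors.append(f"styleProfile must be one of {sorted(ALLOWED_PROFILES)}")
    if config.get("dualLanguage") and not config.get("targetLanguage"):
        errors.append("dualLanguage mode requires a targetLanguage")
    return errors
```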

transcripts

Stores raw transcription data (currently unused - planned for Phase 2).

CREATE TABLE subtitleai.transcripts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
  language TEXT NOT NULL,
  segments JSONB NOT NULL,  -- Whisper segments with timestamps
  entities JSONB,  -- Extracted entities (characters, locations, etc.)
  emotions JSONB,  -- Emotion analysis per segment
  created_at TIMESTAMPTZ DEFAULT now()
);

subtitles

Stores generated subtitle files.

CREATE TABLE subtitleai.subtitles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
  format TEXT NOT NULL CHECK (format IN ('ass', 'vtt', 'srt')),
  style_profile TEXT,
  file_path TEXT NOT NULL,  -- Path in subtitleai-outputs bucket
  file_size INTEGER,
  download_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now()
);

style_profiles

User-saved subtitle style presets (not fully implemented).

CREATE TABLE subtitleai.style_profiles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  name TEXT NOT NULL,
  description TEXT,
  config JSONB NOT NULL,  -- ASS styling configuration
  is_public BOOLEAN DEFAULT false,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(user_id, name)
);

Current Implementation (v1.0.1)

Supported Features ✅

  • Video Upload: MP4, MKV, AVI (up to 2GB)
  • Speech-to-Text: OpenAI Whisper (base model)
  • Multi-Language: 12 languages (EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI)
  • Output Format: SRT only
  • Authentication: Supabase Auth with RLS policies
  • Job Queue: Celery + Redis for async processing
  • Monitoring: Prometheus metrics (45 metrics)
  • Production: K8s deployment with Ingress routing

UI Components

  1. Generate Page (/generate)
     • Video upload with drag-and-drop
     • Language selection (source language)
     • Dual-language toggle (UI only - not functional)
     • Style configurator (4 profiles)
     • Advanced features checkboxes (emotions, entities)

  2. Jobs Page (/jobs)
     • Job history with status
     • Progress tracking
     • Download completed subtitles

  3. Job Details (/jobs/[id])
     • Detailed job information
     • Real-time progress updates
     • Error messages
     • Download button

Worker Implementation

File: /root/projects/subtitleai/worker/tasks.py

Process:

  1. Update job status to "processing"
  2. Fetch job details and video path from the database
  3. Download the video from the subtitleai-uploads bucket
  4. Run the Whisper CLI:

     whisper input_video --model base --output_format srt \
       --language <sourceLanguage>

  5. Upload the generated SRT to the subtitleai-outputs bucket
  6. Insert a subtitle record in the database
  7. Mark the job as "complete"
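
In Python, step 4 amounts to building that argument list and shelling out. This is a sketch of one way to do it, not the exact code in worker/tasks.py; build_whisper_cmd and run_transcription are hypothetical names.

```python
# Sketch of invoking the Whisper CLI from the worker (step 4 above).
import subprocess

def build_whisper_cmd(video_path: str, language: str, out_dir: str = ".") -> list[str]:
    """Mirror the documented CLI invocation as an argv list."""
    return [
        "whisper", video_path,
        "--model", "base",
        "--output_format", "srt",
        "--language", language,
        "--output_dir", out_dir,
    ]

def run_transcription(video_path: str, language: str, out_dir: str = ".") -> None:
    # check=True raises CalledProcessError, letting the caller mark the
    # job "failed" and record the error column on the jobs row.
    subprocess.run(build_whisper_cmd(video_path, language, out_dir), check=True)
```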

Limitations:

  • Only generates SRT format
  • No dual-language support
  • No emotion/entity analysis
  • No transcript storage for reuse
  • No ASS format support


Phase 2 Features (In Development)

1. ASS Format Support (Current Focus)

Goal: Generate Advanced SubStation Alpha (ASS) subtitles with styling support.

Requirements:

  • Parse Whisper output to extract timestamps
  • Generate ASS format with customizable styles
  • Support style profiles (default, learning, enhanced, accessibility)
  • Store alongside SRT format (multiple formats per job)

ASS Format Structure:

[Script Info]
Title: Video Title
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,20,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello world!
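
A minimal writer for the [Events] section above could look like this. It is a sketch of the planned feature, assuming Whisper-style segments ({"start": ..., "end": ..., "text": ...}); real style-profile support would swap in different [V4+ Styles] blocks rather than the single hardcoded style name here.

```python
# Sketch of the planned ASS writer: converts Whisper-style segments
# into [Events] Dialogue lines. Style handling is deliberately simplified.

def ass_time(seconds: float) -> str:
    """Format seconds as H:MM:SS.cc (centiseconds), as ASS timestamps require."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

def ass_dialogue(segment: dict, style: str = "Default") -> str:
    """Render one segment as a Dialogue event line."""
    start, end = ass_time(segment["start"]), ass_time(segment["end"])
    text = segment["text"].strip().replace("\n", "\\N")  # \N is the ASS line break
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{text}"
```

Note the two timestamp conventions in play: ASS uses H:MM:SS.cc with centiseconds, while SRT uses HH:MM:SS,mmm with milliseconds, so the writer cannot reuse SRT timestamps verbatim.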

2. Dual-Language Subtitles (Current Focus)

Goal: Display source language + translated language simultaneously.

Requirements:

  • Translate the transcription using an LLM (OpenAI GPT-4 or Claude)
  • Generate dual-language SRT/ASS with stacked text
  • Support learning mode features:
    • CEFR difficulty highlighting (A1-C2)
    • Vocabulary extraction
    • Synchronized bilingual display

Example Output (SRT):

1
00:00:00,000 --> 00:00:05,000
Hello, how are you?
Hola, ¿cómo estás?

2
00:00:05,000 --> 00:00:10,000
I'm doing well, thank you.
Estoy bien, gracias.

Example Output (ASS with Styling):

Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello, how are you?
Dialogue: 0,0:00:00.00,0:00:05.00,Translation,,0,0,0,,Hola, ¿cómo estás?
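
The stacked SRT output above can be produced by pairing each Whisper segment with its translated line. This is a sketch under the assumption that translations arrive as a list parallel to the segments; the function names are illustrative, not existing code.

```python
# Sketch of the planned dual-language SRT writer: stacks the source
# line over its translation, matching the example output above.

def srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm for SRT cue timestamps."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def dual_language_srt(segments: list[dict], translations: list[str]) -> str:
    """Emit numbered cues with the source text stacked above the translation."""
    cues = []
    for i, (seg, tr) in enumerate(zip(segments, translations), start=1):
        cues.append(
            f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n{tr.strip()}\n"
        )
    return "\n".join(cues)
```

For the ASS variant, the same pairing would instead emit two Dialogue events per segment with identical timestamps, one per style, as in the example above.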

3. LLM Enhancement Service (High Priority)

Goal: Post-process Whisper output for higher quality.

Features:

  • Grammar and punctuation cleanup
  • Translation for dual-language mode
  • Emotion detection (for color-coding)
  • Entity recognition (characters, locations)
  • Context-aware improvements

API Integration:

  • OpenAI GPT-4o for translation + enhancement
  • Anthropic Claude for complex analysis
  • Cost: ~$0.01-0.02 per video
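
One practical concern for the translation step is mapping the LLM's response back onto timed segments. A simple contract is one numbered line in, one numbered line out; the hypothetical helper below parses that shape and falls back to the untranslated source line when the model drops an index. This is a design sketch, not existing code.

```python
# Hypothetical helper for re-aligning an LLM translation with segments.
# Assumes the model was prompted to answer with "N. translated text"
# lines, one per input segment.

def align_translations(segments: list[dict], llm_output: str) -> list[str]:
    """Map 'N. text' lines from the model back to segments by index."""
    by_index = {}
    for line in llm_output.splitlines():
        line = line.strip()
        if "." in line and line.split(".", 1)[0].isdigit():
            n, text = line.split(".", 1)
            by_index[int(n)] = text.strip()
    return [
        by_index.get(i, seg["text"].strip())  # fall back to the source line
        for i, seg in enumerate(segments, start=1)
    ]
```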

4. Additional Planned Features

  • VTT Format: WebVTT for web players
  • Whisper Model Selection: base / small / medium
  • Bazarr Integration: Media library scanning + batch processing
  • Worker Auto-Scaling: HPA based on queue depth
  • Larger Files: Support >2GB uploads
  • Manual Correction UI: Edit generated subtitles

File Locations

Development (LXC 180)

/root/projects/subtitleai/
├── src/
│   ├── app/              # Next.js pages and API routes
│   ├── components/       # React components
│   └── lib/              # Utilities, Supabase client
├── worker/
│   ├── tasks.py          # Celery task definitions
│   ├── poller.py         # Job polling service
│   └── worker.py         # Celery worker entry point
├── supabase/
│   └── migrations/       # Database migrations
└── Dockerfile            # Multi-stage build (web app)

Production (K8s)

/root/tower-fleet/
├── k8s/subtitleai/
│   ├── deployment-web.yaml
│   ├── deployment-worker.yaml
│   ├── deployment-poller.yaml
│   ├── statefulset-redis.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── scripts/
│   ├── deploy-subtitleai.sh    # Deployment automation
│   └── migrate-app.sh          # Database migrations
└── docs/applications/
    └── subtitleai.md           # Production documentation

Development Workflow

Local Development

# Enter container
pct enter 180

# Start dev server
cd /root/projects/subtitleai
npm run dev  # Port 3000

# Run worker locally (optional)
cd worker
celery -A worker worker --loglevel=info

Making Changes

  1. Make code changes in LXC 180
  2. Test locally with dev server
  3. Commit changes:
    git add -A
    git commit -m "feat: description"
    git push
    
  4. Deploy to K8s:
    cd /root/tower-fleet
    ./scripts/deploy-subtitleai.sh
    

Database Migrations

# Create new migration (from LXC 180)
cd /root/projects/subtitleai
npx supabase migration new <description>

# Edit migration in supabase/migrations/

# Apply to K8s Supabase
/root/tower-fleet/scripts/migrate-app.sh subtitleai

API Endpoints

POST /api/upload

Upload video file to Supabase Storage.

Request: FormData with a file field
Response: { fullPath: string, filename: string }

POST /api/jobs

Create new subtitle generation job.

Request:

{
  "videoTitle": "video.mp4",
  "videoPath": "user-id/video.mp4",
  "sourceLanguage": "en",
  "config": {
    "targetLanguage": "es",
    "styleProfile": "default",
    "emotions": true,
    "entities": true,
    "dualLanguage": false
  }
}

Response:

{
  "success": true,
  "job": {
    "id": "uuid",
    "video_id": "uuid",
    "status": "pending",
    "progress": 0,
    "created_at": "2025-11-24T00:00:00Z"
  }
}

GET /api/jobs

List user's jobs.

Response:

{
  "jobs": [
    {
      "id": "uuid",
      "status": "complete",
      "progress": 100,
      "created_at": "2025-11-24T00:00:00Z",
      "videos": {
        "title": "video.mp4",
        "source_language": "en"
      }
    }
  ]
}

GET /api/jobs/[id]

Get job details.

POST /api/jobs/[id]/retry

Retry failed job.

GET /api/subtitles/[id]/download

Download generated subtitle file.

GET /api/metrics

Prometheus metrics endpoint (unauthenticated).


Monitoring

Prometheus Metrics

Endpoint: http://10.89.97.213/api/metrics

Key Metrics:

  • subtitleai_jobs_total{status} - Total jobs by status
  • subtitleai_videos_uploaded_total - Total videos uploaded
  • subtitleai_subtitles_downloaded_total - Total downloads
  • subtitleai_processing_duration_seconds - Job processing time

Logs

# Web app logs
kubectl logs -n subtitleai -l app=subtitleai-web -f

# Worker logs
kubectl logs -n subtitleai -l app=subtitleai-worker -f

# Poller logs
kubectl logs -n subtitleai -l app=subtitleai-poller -f

Health Check

kubectl get pods -n subtitleai
kubectl top pods -n subtitleai

Dependencies

Web App (Next.js)

  • next - ^15.3.0
  • react - ^19.1.0
  • @supabase/supabase-js - ^2.49.2
  • @supabase/ssr - ^0.9.1
  • prom-client - ^15.1.3 (Prometheus metrics)

Worker (Python)

  • celery - Task queue
  • redis - Queue backend
  • openai-whisper - Speech-to-text
  • supabase - Database/storage client

Known Issues

  1. Dual-Language Mode: UI exists but backend not implemented
  2. Emotion/Entity Styling: Checkboxes present but no processing
  3. Style Profiles: Database table exists but not fully utilized
  4. Transcript Storage: Table exists but worker doesn't populate it
  5. Single Format: Only SRT generated, no ASS/VTT yet

Next Steps (Phase 2)

  1. Implement ASS Format Generation ← Current Focus
     • Add ASS writer function in worker
     • Parse Whisper segments for timestamps
     • Apply style profiles to ASS output
     • Generate both SRT and ASS per job

  2. Implement Dual-Language Feature ← Current Focus
     • Add translation service (OpenAI API)
     • Modify worker to translate segments
     • Generate dual-language SRT/ASS
     • Update UI to show translation options

  3. Add LLM Enhancement Service
     • Create enhancement task in worker
     • Integrate OpenAI/Claude APIs
     • Add quality scoring
     • Make it optional (premium feature)

  4. VTT Format Support
     • Add VTT writer function
     • Update format selection in UI

  5. Bazarr Integration
     • API client for Bazarr
     • Media library scanning
     • Batch job creation

References

  • Production Docs: /root/tower-fleet/docs/applications/subtitleai.md
  • Deployment Plan: /root/tower-fleet/docs/applications/subtitleai-deployment-plan.md
  • Testing Guide: /root/projects/subtitleai/TESTING.md
  • Worker README: /root/projects/subtitleai/worker/README.md
  • K8s Manifests: /root/tower-fleet/k8s/subtitleai/

Maintained By: Claude Code
Last Review: 2025-11-24