SubtitleAI Implementation Documentation

Last Updated: 2025-11-24
Version: v1.0.1 (Production)
Status: Active Development - Phase 2 Features


Overview

SubtitleAI is an automated subtitle generation system using OpenAI Whisper for speech-to-text transcription. The application supports multi-language transcription, multiple output formats, and advanced styling features.

Production URL: http://subtitles.internal (via Ingress) / http://10.89.97.213 (LoadBalancer)
Development: LXC 180 at 10.89.97.161:3000
K8s Namespace: subtitleai


Architecture

High-Level Flow

User Upload → Next.js Web App → Supabase Storage
                     ↓
            Job Created (pending)
                     ↓
            Poller (5s interval)
                     ↓
     Celery Worker → Whisper Transcription
                     ↓
     Output Storage → Database Record
                     ↓
               User Download

Components

  1. Next.js Web App (subtitleai-web)
     • User interface for upload, configuration, and job management
     • API routes for file upload, job creation, and downloads
     • Supabase auth integration with RLS

  2. Python Worker (subtitleai-worker)
     • Celery task queue for async processing
     • OpenAI Whisper integration (base model)
     • Video download → transcription → upload pipeline

  3. Poller (subtitleai-poller)
     • Monitors the database for pending jobs (5s interval)
     • Submits jobs to the Celery queue via Redis

  4. Redis (subtitleai-redis)
     • Task queue backend for Celery
     • StatefulSet with persistent storage

  5. Supabase (K8s shared instance)
     • PostgreSQL with dedicated subtitleai schema
     • Authentication with RLS policies
     • Storage buckets: subtitleai-uploads, subtitleai-outputs
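
The poller's job, as described above, can be sketched in a few lines. This is an illustrative outline, not the code in worker/poller.py: fetch_pending, submit, and mark_processing are hypothetical stand-ins for the real Supabase queries and Celery send_task call, injected here so the dispatch logic stays testable.

```python
# Minimal sketch of the poller loop; helper callables are hypothetical
# stand-ins for the real Supabase queries and Celery task submission.
import time

def dispatch_pending(fetch_pending, submit, mark_processing):
    """One poll cycle: submit every pending job and record its Celery task id."""
    dispatched = []
    for job in fetch_pending():                      # rows with status = 'pending'
        task_id = submit(job["id"], job["config"])   # e.g. a Celery send_task call
        mark_processing(job["id"], task_id)          # sets status + celery_task_id
        dispatched.append(job["id"])
    return dispatched

def run_forever(fetch_pending, submit, mark_processing, interval=5.0):
    """Poll on the documented 5 s interval until killed."""
    while True:
        dispatch_pending(fetch_pending, submit, mark_processing)
        time.sleep(interval)
```

Recording celery_task_id at dispatch time is what lets the jobs table correlate a database row with the in-flight Celery task.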

Database Schema

Tables (all in subtitleai schema)

videos

Stores uploaded video metadata.

CREATE TABLE subtitleai.videos (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  title TEXT NOT NULL,
  file_path TEXT NOT NULL,  -- Storage path (bucket/filename)
  duration INTEGER,
  uploaded_at TIMESTAMPTZ DEFAULT now(),
  source_language TEXT,
  metadata JSONB
);

jobs

Tracks subtitle generation jobs.

CREATE TABLE subtitleai.jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  video_id UUID REFERENCES subtitleai.videos(id) ON DELETE CASCADE,
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'complete', 'failed')),
  progress INTEGER DEFAULT 0,
  config JSONB NOT NULL,  -- Job configuration (languages, styles, etc.)
  celery_task_id TEXT,
  created_at TIMESTAMPTZ DEFAULT now(),
  completed_at TIMESTAMPTZ,
  error TEXT
);

Config Structure:

{
  "sourceLanguage": "en",
  "targetLanguage": "es",  // For dual-language mode
  "styleProfile": "default",  // default, learning, enhanced, accessibility
  "emotions": true,  // Emotion-based styling (not implemented)
  "entities": true,  // Entity recognition (not implemented)
  "dualLanguage": false  // Dual-language subtitle mode
}
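
Since config is stored as free-form JSONB, the API route has to validate it before creating a job. The helper below is illustrative only (validate_config is not part of the codebase); the field names and allowed values are taken from the structure documented above and the jobs table constraints.

```python
# Hypothetical validator for the job config JSON above; field names and
# allowed values come from this document, the function is illustrative.
ALLOWED_PROFILES = {"default", "learning", "enhanced", "accessibility"}

def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not config.get("sourceLanguage"):
        errors.append("sourceLanguage is required")
    if config.get("styleProfile") not in ALLOWED_PROFILES:
        errors.append(f"styleProfile must be one of {sorted(ALLOWED_PROFILES)}")
    if config.get("dualLanguage") and not config.get("targetLanguage"):
        errors.append("dualLanguage mode requires a targetLanguage")
    return errors
```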

transcripts

Stores raw transcription data (currently unused - planned for Phase 2).

CREATE TABLE subtitleai.transcripts (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
  language TEXT NOT NULL,
  segments JSONB NOT NULL,  -- Whisper segments with timestamps
  entities JSONB,  -- Extracted entities (characters, locations, etc.)
  emotions JSONB,  -- Emotion analysis per segment
  created_at TIMESTAMPTZ DEFAULT now()
);

subtitles

Stores generated subtitle files.

CREATE TABLE subtitleai.subtitles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
  format TEXT NOT NULL CHECK (format IN ('ass', 'vtt', 'srt')),
  style_profile TEXT,
  file_path TEXT NOT NULL,  -- Path in subtitleai-outputs bucket
  file_size INTEGER,
  download_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT now()
);

style_profiles

User-saved subtitle style presets (not fully implemented).

CREATE TABLE subtitleai.style_profiles (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
  name TEXT NOT NULL,
  description TEXT,
  config JSONB NOT NULL,  -- ASS styling configuration
  is_public BOOLEAN DEFAULT false,
  created_at TIMESTAMPTZ DEFAULT now(),
  UNIQUE(user_id, name)
);

Current Implementation (v1.0.1)

Supported Features ✅

  • Video Upload: MP4, MKV, AVI (up to 2GB)
  • Speech-to-Text: OpenAI Whisper (base model)
  • Multi-Language: 12 languages (EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI)
  • Output Format: SRT only
  • Authentication: Supabase Auth with RLS policies
  • Job Queue: Celery + Redis for async processing
  • Monitoring: Prometheus metrics (45 metrics)
  • Production: K8s deployment with Ingress routing

UI Components

  1. Generate Page (/generate)
     • Video upload with drag-and-drop
     • Language selection (source language)
     • Dual-language toggle (UI only - not functional)
     • Style configurator (4 profiles)
     • Advanced features checkboxes (emotions, entities)

  2. Jobs Page (/jobs)
     • Job history with status
     • Progress tracking
     • Download completed subtitles

  3. Job Details (/jobs/[id])
     • Detailed job information
     • Real-time progress updates
     • Error messages
     • Download button

Worker Implementation

File: /root/projects/subtitleai/worker/tasks.py

Process:

  1. Update job status to "processing"
  2. Fetch job details and video path from the database
  3. Download the video from the subtitleai-uploads bucket
  4. Run the Whisper CLI:

     whisper input_video --model base --output_format srt \
       --language <sourceLanguage>

  5. Upload the generated SRT to the subtitleai-outputs bucket
  6. Insert a subtitle record in the database
  7. Mark the job as "complete"
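
In Python, step 4 amounts to building that argument list and shelling out. This is a sketch of one way to do it, not the exact code in worker/tasks.py; build_whisper_cmd and run_transcription are hypothetical names.

```python
# Sketch of invoking the Whisper CLI from the worker (step 4 above).
import subprocess

def build_whisper_cmd(video_path: str, language: str, out_dir: str = ".") -> list[str]:
    """Mirror the documented CLI invocation as an argv list."""
    return [
        "whisper", video_path,
        "--model", "base",
        "--output_format", "srt",
        "--language", language,
        "--output_dir", out_dir,
    ]

def run_transcription(video_path: str, language: str, out_dir: str = ".") -> None:
    # check=True raises CalledProcessError, letting the caller mark the
    # job "failed" and record the error column on the jobs row.
    subprocess.run(build_whisper_cmd(video_path, language, out_dir), check=True)
```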

Limitations:

  • Only generates SRT format
  • No dual-language support
  • No emotion/entity analysis
  • No transcript storage for reuse
  • No ASS format support


Phase 2 Features (In Development)

1. ASS Format Support (Current Focus)

Goal: Generate Advanced SubStation Alpha (ASS) subtitles with styling support.

Requirements:

  • Parse Whisper output to extract timestamps
  • Generate ASS format with customizable styles
  • Support style profiles (default, learning, enhanced, accessibility)
  • Store alongside SRT format (multiple formats per job)

ASS Format Structure:

[Script Info]
Title: Video Title
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,20,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello world!
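
A minimal writer for the [Events] section above could look like this. It is a sketch of the planned feature, assuming Whisper-style segments ({"start": ..., "end": ..., "text": ...}); real style-profile support would swap in different [V4+ Styles] blocks rather than the single hardcoded style name here.

```python
# Sketch of the planned ASS writer: converts Whisper-style segments
# into [Events] Dialogue lines. Style handling is deliberately simplified.

def ass_time(seconds: float) -> str:
    """Format seconds as H:MM:SS.cc (centiseconds), as ASS timestamps require."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

def ass_dialogue(segment: dict, style: str = "Default") -> str:
    """Render one segment as a Dialogue event line."""
    start, end = ass_time(segment["start"]), ass_time(segment["end"])
    text = segment["text"].strip().replace("\n", "\\N")  # \N is the ASS line break
    return f"Dialogue: 0,{start},{end},{style},,0,0,0,,{text}"
```

Note the two timestamp conventions in play: ASS uses H:MM:SS.cc with centiseconds, while SRT uses HH:MM:SS,mmm with milliseconds, so the writer cannot reuse SRT timestamps verbatim.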

2. Dual-Language Subtitles (Current Focus)

Goal: Display source language + translated language simultaneously.

Requirements:

  • Translate the transcription using an LLM (OpenAI GPT-4 or Claude)
  • Generate dual-language SRT/ASS with stacked text
  • Support learning mode features:
    • CEFR difficulty highlighting (A1-C2)
    • Vocabulary extraction
    • Synchronized bilingual display

Example Output (SRT):

1
00:00:00,000 --> 00:00:05,000
Hello, how are you?
Hola, ¿cómo estás?

2
00:00:05,000 --> 00:00:10,000
I'm doing well, thank you.
Estoy bien, gracias.

Example Output (ASS with Styling):

Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello, how are you?
Dialogue: 0,0:00:00.00,0:00:05.00,Translation,,0,0,0,,Hola, ¿cómo estás?
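
The stacked SRT output above can be produced by pairing each Whisper segment with its translated line. This is a sketch under the assumption that translations arrive as a list parallel to the segments; the function names are illustrative, not existing code.

```python
# Sketch of the planned dual-language SRT writer: stacks the source
# line over its translation, matching the example output above.

def srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm for SRT cue timestamps."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def dual_language_srt(segments: list[dict], translations: list[str]) -> str:
    """Emit numbered cues with the source text stacked above the translation."""
    cues = []
    for i, (seg, tr) in enumerate(zip(segments, translations), start=1):
        cues.append(
            f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n{tr.strip()}\n"
        )
    return "\n".join(cues)
```

For the ASS variant, the same pairing would instead emit two Dialogue events per segment with identical timestamps, one per style, as in the example above.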

3. LLM Enhancement Service (High Priority)

Goal: Post-process Whisper output for higher quality.

Features:

  • Grammar and punctuation cleanup
  • Translation for dual-language mode
  • Emotion detection (for color-coding)
  • Entity recognition (characters, locations)
  • Context-aware improvements

API Integration:

  • OpenAI GPT-4o for translation + enhancement
  • Anthropic Claude for complex analysis
  • Cost: ~$0.01-0.02 per video
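
One practical concern for the translation step is mapping the LLM's response back onto timed segments. A simple contract is one numbered line in, one numbered line out; the hypothetical helper below parses that shape and falls back to the untranslated source line when the model drops an index. This is a design sketch, not existing code.

```python
# Hypothetical helper for re-aligning an LLM translation with segments.
# Assumes the model was prompted to answer with "N. translated text"
# lines, one per input segment.

def align_translations(segments: list[dict], llm_output: str) -> list[str]:
    """Map 'N. text' lines from the model back to segments by index."""
    by_index = {}
    for line in llm_output.splitlines():
        line = line.strip()
        if "." in line and line.split(".", 1)[0].isdigit():
            n, text = line.split(".", 1)
            by_index[int(n)] = text.strip()
    return [
        by_index.get(i, seg["text"].strip())  # fall back to the source line
        for i, seg in enumerate(segments, start=1)
    ]
```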

4. Additional Planned Features

  • VTT Format: WebVTT for web players
  • Whisper Model Selection: base / small / medium
  • Bazarr Integration: Media library scanning + batch processing
  • Worker Auto-Scaling: HPA based on queue depth
  • Larger Files: Support >2GB uploads
  • Manual Correction UI: Edit generated subtitles

File Locations

Development (LXC 180)

/root/projects/subtitleai/
├── src/
│   ├── app/              # Next.js pages and API routes
│   ├── components/       # React components
│   └── lib/              # Utilities, Supabase client
├── worker/
│   ├── tasks.py          # Celery task definitions
│   ├── poller.py         # Job polling service
│   └── worker.py         # Celery worker entry point
├── supabase/
│   └── migrations/       # Database migrations
└── Dockerfile            # Multi-stage build (web app)

Production (K8s)

/root/tower-fleet/
├── k8s/subtitleai/
│   ├── deployment-web.yaml
│   ├── deployment-worker.yaml
│   ├── deployment-poller.yaml
│   ├── statefulset-redis.yaml
│   ├── service.yaml
│   └── ingress.yaml
├── scripts/
│   ├── deploy-subtitleai.sh    # Deployment automation
│   └── migrate-app.sh          # Database migrations
└── docs/applications/
    └── subtitleai.md           # Production documentation

Development Workflow

Local Development

# Enter container
pct enter 180

# Start dev server
cd /root/projects/subtitleai
npm run dev  # Port 3000

# Run worker locally (optional)
cd worker
celery -A worker worker --loglevel=info

Making Changes

  1. Make code changes in LXC 180
  2. Test locally with dev server
  3. Commit changes:
    git add -A
    git commit -m "feat: description"
    git push
    
  4. Deploy to K8s:
    cd /root/tower-fleet
    ./scripts/deploy-subtitleai.sh
    

Database Migrations

# Create new migration (from LXC 180)
cd /root/projects/subtitleai
npx supabase migration new <description>

# Edit migration in supabase/migrations/

# Apply to K8s Supabase
/root/tower-fleet/scripts/migrate-app.sh subtitleai

API Endpoints

POST /api/upload

Upload video file to Supabase Storage.

Request: FormData with a file field
Response: { fullPath: string, filename: string }

POST /api/jobs

Create new subtitle generation job.

Request:

{
  "videoTitle": "video.mp4",
  "videoPath": "user-id/video.mp4",
  "sourceLanguage": "en",
  "config": {
    "targetLanguage": "es",
    "styleProfile": "default",
    "emotions": true,
    "entities": true,
    "dualLanguage": false
  }
}

Response:

{
  "success": true,
  "job": {
    "id": "uuid",
    "video_id": "uuid",
    "status": "pending",
    "progress": 0,
    "created_at": "2025-11-24T00:00:00Z"
  }
}

GET /api/jobs

List user's jobs.

Response:

{
  "jobs": [
    {
      "id": "uuid",
      "status": "complete",
      "progress": 100,
      "created_at": "2025-11-24T00:00:00Z",
      "videos": {
        "title": "video.mp4",
        "source_language": "en"
      }
    }
  ]
}

GET /api/jobs/[id]

Get job details.

POST /api/jobs/[id]/retry

Retry failed job.

GET /api/subtitles/[id]/download

Download generated subtitle file.

GET /api/metrics

Prometheus metrics endpoint (unauthenticated).


Monitoring

Prometheus Metrics

Endpoint: http://10.89.97.213/api/metrics

Key Metrics:

  • subtitleai_jobs_total{status} - Total jobs by status
  • subtitleai_videos_uploaded_total - Total videos uploaded
  • subtitleai_subtitles_downloaded_total - Total downloads
  • subtitleai_processing_duration_seconds - Job processing time

Logs

# Web app logs
kubectl logs -n subtitleai -l app=subtitleai-web -f

# Worker logs
kubectl logs -n subtitleai -l app=subtitleai-worker -f

# Poller logs
kubectl logs -n subtitleai -l app=subtitleai-poller -f

Health Check

kubectl get pods -n subtitleai
kubectl top pods -n subtitleai

Dependencies

Web App (Next.js)

  • next - ^15.3.0
  • react - ^19.1.0
  • @supabase/supabase-js - ^2.49.2
  • @supabase/ssr - ^0.9.1
  • prom-client - ^15.1.3 (Prometheus metrics)

Worker (Python)

  • celery - Task queue
  • redis - Queue backend
  • openai-whisper - Speech-to-text
  • supabase - Database/storage client

Known Issues

  1. Dual-Language Mode: UI exists but backend not implemented
  2. Emotion/Entity Styling: Checkboxes present but no processing
  3. Style Profiles: Database table exists but not fully utilized
  4. Transcript Storage: Table exists but worker doesn't populate it
  5. Single Format: Only SRT generated, no ASS/VTT yet

Next Steps (Phase 2)

  1. Implement ASS Format Generation ← Current Focus
     • Add ASS writer function in worker
     • Parse Whisper segments for timestamps
     • Apply style profiles to ASS output
     • Generate both SRT and ASS per job

  2. Implement Dual-Language Feature ← Current Focus
     • Add translation service (OpenAI API)
     • Modify worker to translate segments
     • Generate dual-language SRT/ASS
     • Update UI to show translation options

  3. Add LLM Enhancement Service
     • Create enhancement task in worker
     • Integrate OpenAI/Claude APIs
     • Add quality scoring
     • Make it optional (premium feature)

  4. VTT Format Support
     • Add VTT writer function
     • Update format selection in UI

  5. Bazarr Integration
     • API client for Bazarr
     • Media library scanning
     • Batch job creation

References

  • Production Docs: /root/tower-fleet/docs/applications/subtitleai.md
  • Deployment Plan: /root/tower-fleet/docs/applications/subtitleai-deployment-plan.md
  • Testing Guide: /root/projects/subtitleai/TESTING.md
  • Worker README: /root/projects/subtitleai/worker/README.md
  • K8s Manifests: /root/tower-fleet/k8s/subtitleai/

Maintained By: Claude Code
Last Review: 2025-11-24