SubtitleAI Implementation Documentation¶
Last Updated: 2025-11-24
Version: v1.0.1 (Production)
Status: Active Development - Phase 2 Features
Overview¶
SubtitleAI is an automated subtitle generation system using OpenAI Whisper for speech-to-text transcription. The application supports multi-language transcription, multiple output formats, and advanced styling features.
Production URL: http://subtitles.internal (via Ingress) / http://10.89.97.213 (LoadBalancer)
Development: LXC 180 at 10.89.97.161:3000
K8s Namespace: subtitleai
Architecture¶
High-Level Flow¶
User Upload → Next.js Web App → Supabase Storage
↓
Job Created (pending)
↓
Poller (5s interval)
↓
Celery Worker → Whisper Transcription
↓
Output Storage → Database Record
↓
User Download
Components¶
- Next.js Web App (subtitleai-web)
  - User interface for upload, configuration, job management
  - API routes for file upload, job creation, downloads
  - Supabase auth integration with RLS
- Python Worker (subtitleai-worker)
  - Celery task queue for async processing
  - OpenAI Whisper integration (base model)
  - Video download → transcription → upload pipeline
- Poller (subtitleai-poller)
  - Monitors database for pending jobs (5s interval)
  - Submits jobs to Celery queue via Redis
- Redis (subtitleai-redis)
  - Task queue backend for Celery
  - StatefulSet with persistent storage
- Supabase (K8s shared instance)
  - PostgreSQL with dedicated subtitleai schema
  - Authentication with RLS policies
  - Storage buckets: subtitleai-uploads, subtitleai-outputs
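The poller's role in the component list above can be sketched as a loop that fetches pending jobs and hands each one to the queue. This is a minimal sketch under stated assumptions, not the code in poller.py: `fetch_pending` and `submit` are hypothetical callables standing in for the Supabase query and the Celery submission, and only the 5s interval comes from this document.

```python
import time
from typing import Callable, Iterable, List, Optional

POLL_INTERVAL_SECONDS = 5  # the 5s interval described in this document


def poll_once(
    fetch_pending: Callable[[], Iterable[dict]],
    submit: Callable[[dict], str],
) -> List[str]:
    """Submit every pending job once and return the task ids that came back."""
    task_ids = []
    for job in fetch_pending():
        # `submit` is a hypothetical stand-in for queueing the Celery task.
        task_ids.append(submit(job))
    return task_ids


def run_poller(
    fetch_pending: Callable[[], Iterable[dict]],
    submit: Callable[[dict], str],
    max_cycles: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call poll_once every POLL_INTERVAL_SECONDS; max_cycles=None loops forever."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        poll_once(fetch_pending, submit)
        sleep(POLL_INTERVAL_SECONDS)
        cycle += 1
```

A real poller would also need to record each submission (for example by storing the returned id in jobs.celery_task_id) so the same pending job is not queued twice on the next cycle.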
Database Schema¶
Tables (all in subtitleai schema)¶
videos¶
Stores uploaded video metadata.
CREATE TABLE subtitleai.videos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
title TEXT NOT NULL,
file_path TEXT NOT NULL, -- Storage path (bucket/filename)
duration INTEGER,
uploaded_at TIMESTAMPTZ DEFAULT now(),
source_language TEXT,
metadata JSONB
);
jobs¶
Tracks subtitle generation jobs.
CREATE TABLE subtitleai.jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
video_id UUID REFERENCES subtitleai.videos(id) ON DELETE CASCADE,
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
status TEXT NOT NULL CHECK (status IN ('pending', 'processing', 'complete', 'failed')),
progress INTEGER DEFAULT 0,
config JSONB NOT NULL, -- Job configuration (languages, styles, etc.)
celery_task_id TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
completed_at TIMESTAMPTZ,
error TEXT
);
Config Structure:
{
"sourceLanguage": "en",
"targetLanguage": "es", // For dual-language mode
"styleProfile": "default", // default, learning, enhanced, accessibility
"emotions": true, // Emotion-based styling (not implemented)
"entities": true, // Entity recognition (not implemented)
"dualLanguage": false // Dual-language subtitle mode
}
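To make the config shape concrete, here is a minimal sketch of how a worker could parse and validate it. The `JobConfig` class, its defaults, and the rule that dual-language mode requires a target language are assumptions made for illustration; only the key names and the four style profiles come from the structure above.

```python
from dataclasses import dataclass
from typing import Optional

# Profile names taken from the config comment above.
STYLE_PROFILES = {"default", "learning", "enhanced", "accessibility"}


@dataclass
class JobConfig:
    """Hypothetical typed view of jobs.config; defaults are assumptions."""

    source_language: str
    target_language: Optional[str] = None
    style_profile: str = "default"
    emotions: bool = False
    entities: bool = False
    dual_language: bool = False

    @classmethod
    def from_json(cls, data: dict) -> "JobConfig":
        """Build a JobConfig from the camelCase JSON keys and validate it."""
        config = cls(
            source_language=data["sourceLanguage"],
            target_language=data.get("targetLanguage"),
            style_profile=data.get("styleProfile", "default"),
            emotions=bool(data.get("emotions", False)),
            entities=bool(data.get("entities", False)),
            dual_language=bool(data.get("dualLanguage", False)),
        )
        config.validate()
        return config

    def validate(self) -> None:
        """Reject unknown profiles and dual-language jobs without a target."""
        if self.style_profile not in STYLE_PROFILES:
            raise ValueError(f"unknown style profile: {self.style_profile}")
        if self.dual_language and not self.target_language:
            raise ValueError("dualLanguage requires targetLanguage")
```

Validating at the start of a task means a malformed config can fail the job with a clear error before any video is downloaded.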
transcripts¶
Stores raw transcription data (currently unused - planned for Phase 2).
CREATE TABLE subtitleai.transcripts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
language TEXT NOT NULL,
segments JSONB NOT NULL, -- Whisper segments with timestamps
entities JSONB, -- Extracted entities (characters, locations, etc.)
emotions JSONB, -- Emotion analysis per segment
created_at TIMESTAMPTZ DEFAULT now()
);
subtitles¶
Stores generated subtitle files.
CREATE TABLE subtitleai.subtitles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES subtitleai.jobs(id) ON DELETE CASCADE,
format TEXT NOT NULL CHECK (format IN ('ass', 'vtt', 'srt')),
style_profile TEXT,
file_path TEXT NOT NULL, -- Path in subtitleai-outputs bucket
file_size INTEGER,
download_count INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT now()
);
style_profiles¶
User-saved subtitle style presets (not fully implemented).
CREATE TABLE subtitleai.style_profiles (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
name TEXT NOT NULL,
description TEXT,
config JSONB NOT NULL, -- ASS styling configuration
is_public BOOLEAN DEFAULT false,
created_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(user_id, name)
);
Current Implementation (v1.0.1)¶
Supported Features ✅¶
- Video Upload: MP4, MKV, AVI (up to 2GB)
- Speech-to-Text: OpenAI Whisper (base model)
- Multi-Language: 12 languages (EN, ES, FR, DE, IT, PT, RU, JA, KO, ZH, AR, HI)
- Output Format: SRT only
- Authentication: Supabase Auth with RLS policies
- Job Queue: Celery + Redis for async processing
- Monitoring: Prometheus metrics (45 metrics)
- Production: K8s deployment with Ingress routing
UI Components¶
- Generate Page (/generate)
  - Video upload with drag-and-drop
  - Language selection (source language)
  - Dual-language toggle (UI only - not functional)
  - Style configurator (4 profiles)
  - Advanced features checkboxes (emotions, entities)
- Jobs Page (/jobs)
  - Job history with status
  - Progress tracking
  - Download completed subtitles
- Job Details (/jobs/[id])
  - Detailed job information
  - Real-time progress updates
  - Error messages
  - Download button
Worker Implementation¶
File: /root/projects/subtitleai/worker/tasks.py
Process:
1. Update job status to "processing"
2. Fetch job details and video path from database
3. Download video from subtitleai-uploads bucket
4. Run the Whisper CLI (base model) to transcribe the video
5. Upload the generated SRT file to the subtitleai-outputs bucket
6. Insert subtitle record in database
7. Mark job as "complete"
Limitations:
- Only generates SRT format
- No dual-language support
- No emotion/entity analysis
- No transcript storage for reuse
- No ASS format support
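The processing sequence described above can be sketched as a single function with its dependencies injected. This is not the code in tasks.py: the callables `download_video`, `transcribe_to_srt`, and `upload_subtitle` are hypothetical stand-ins for the Supabase and Whisper calls, and the in-memory `jobs` and `subtitles` structures stand in for the database tables. The failure branch is an assumption based on the 'failed' status and error column in the jobs schema.

```python
from typing import Callable, Dict, List


def process_job(
    job_id: str,
    jobs: Dict[str, dict],
    download_video: Callable[[str], bytes],
    transcribe_to_srt: Callable[[bytes], str],
    upload_subtitle: Callable[[str, str], str],
    subtitles: List[dict],
) -> None:
    """Run the processing steps for one job, recording failure on any error."""
    job = jobs[job_id]
    job["status"] = "processing"
    try:
        # Fetch the video from storage using the path stored with the job.
        video = download_video(job["file_path"])
        # Transcribe; in the real worker this is where Whisper runs.
        srt_text = transcribe_to_srt(video)
        # Store the result and remember where it landed.
        path = upload_subtitle(job_id, srt_text)
        subtitles.append({"job_id": job_id, "format": "srt", "file_path": path})
        job["status"] = "complete"
        job["progress"] = 100
    except Exception as exc:
        # Assumed behaviour: surface the error text on the job record.
        job["status"] = "failed"
        job["error"] = str(exc)
```

Passing the storage and transcription steps in as callables keeps the sequencing testable without Supabase, Redis, or a Whisper model.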
Phase 2 Features (In Development)¶
1. ASS Format Support (Current Focus)¶
Goal: Generate Advanced SubStation Alpha (ASS) subtitles with styling support.
Requirements:
- Parse Whisper output to extract timestamps
- Generate ASS format with customizable styles
- Support style profiles (default, learning, enhanced, accessibility)
- Store alongside SRT format (multiple formats per job)
ASS Format Structure:
[Script Info]
Title: Video Title
ScriptType: v4.00+
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, ...
Style: Default,Arial,20,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello world!
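As a rough illustration of the planned ASS writer, the sketch below turns a list of segments into a file with the structure shown above. It assumes each segment is a dict with `start` and `end` in seconds and a `text` string, which is the shape of Whisper's segment output; the function names are hypothetical. The styles Format line is written out in full using the standard V4+ field list, and the single Default style is the one from the example above.

```python
from typing import List

ASS_HEADER = """[Script Info]
Title: {title}
ScriptType: v4.00+

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,Arial,20,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,2,10,10,10,1

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
"""


def ass_timestamp(seconds: float) -> str:
    """Format seconds as H:MM:SS.cc (centisecond precision)."""
    total_cs = int(round(seconds * 100))
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{cs:02d}"


def segments_to_ass(segments: List[dict], title: str = "Video Title") -> str:
    """Render segments as one Dialogue line each under a fixed header."""
    events = []
    for seg in segments:
        # ASS uses \N for a line break inside a Dialogue text field.
        text = seg["text"].strip().replace("\n", "\\N")
        events.append(
            f"Dialogue: 0,{ass_timestamp(seg['start'])},"
            f"{ass_timestamp(seg['end'])},Default,,0,0,0,,{text}"
        )
    return ASS_HEADER.format(title=title) + "\n".join(events) + "\n"
```

Supporting the style profiles would then mostly be a matter of emitting different Style lines in the header while keeping the event generation unchanged.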
2. Dual-Language Subtitles (Current Focus)¶
Goal: Display source language + translated language simultaneously.
Requirements:
- Translate transcription using LLM (OpenAI GPT-4 or Claude)
- Generate dual-language SRT/ASS with stacked text
- Support learning mode features:
  - CEFR difficulty highlighting (A1-C2)
  - Vocabulary extraction
  - Synchronized bilingual display
Example Output (SRT):
1
00:00:00,000 --> 00:00:05,000
Hello, how are you?
Hola, ¿cómo estás?
2
00:00:05,000 --> 00:00:10,000
I'm doing well, thank you.
Estoy bien, gracias.
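The stacked SRT output above could be produced along these lines. This is a sketch that assumes the translations have already been obtained, one string per segment in the same order; the translation call itself is out of scope here. Segments are assumed to be dicts with `start`, `end` (seconds), and `text`, and both function names are hypothetical.

```python
from typing import List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SRT time format)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def dual_language_srt(segments: List[dict], translations: List[str]) -> str:
    """Stack the source line above its translation in each SRT cue."""
    if len(segments) != len(translations):
        raise ValueError("each segment needs exactly one translation")
    blocks = []
    for index, (seg, translated) in enumerate(zip(segments, translations), start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
            f"{translated.strip()}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)
```

Keeping translation separate from rendering means the same segment and translation pair can feed both the SRT and the ASS writers.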
Example Output (ASS with Styling):
Dialogue: 0,0:00:00.00,0:00:05.00,Default,,0,0,0,,Hello, how are you?
Dialogue: 0,0:00:00.00,0:00:05.00,Translation,,0,0,0,,Hola, ¿cómo estás?
3. LLM Enhancement Service (High Priority)¶
Goal: Post-process Whisper output for higher quality.
Features:
- Grammar and punctuation cleanup
- Translation for dual-language mode
- Emotion detection (for color-coding)
- Entity recognition (characters, locations)
- Context-aware improvements

API Integration:
- OpenAI GPT-4o for translation + enhancement
- Anthropic Claude for complex analysis
- Cost: ~$0.01-0.02 per video
4. Additional Planned Features¶
- VTT Format: WebVTT for web players
- Whisper Model Selection: base / small / medium
- Bazarr Integration: Media library scanning + batch processing
- Worker Auto-Scaling: HPA based on queue depth
- Larger Files: Support >2GB uploads
- Manual Correction UI: Edit generated subtitles
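For the planned VTT format, a writer could be as small as the sketch below. It assumes the same segment shape as Whisper's output (dicts with `start`, `end` in seconds, and `text`); the function names are hypothetical and no cue identifiers or styling are emitted.

```python
from typing import List


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm (WebVTT uses a dot before milliseconds)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments: List[dict]) -> str:
    """Render segments as WebVTT cues after the required WEBVTT header line."""
    parts = ["WEBVTT\n"]
    for seg in segments:
        parts.append(
            f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates the header and each cue.
    return "\n".join(parts)
```

Adding VTT would also require extending the format CHECK constraint usage in the worker's insert, since the subtitles table already allows 'vtt'.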
File Locations¶
Development (LXC 180)¶
/root/projects/subtitleai/
├── src/
│ ├── app/ # Next.js pages and API routes
│ ├── components/ # React components
│ └── lib/ # Utilities, Supabase client
├── worker/
│ ├── tasks.py # Celery task definitions
│ ├── poller.py # Job polling service
│ └── worker.py # Celery worker entry point
├── supabase/
│ └── migrations/ # Database migrations
└── Dockerfile # Multi-stage build (web app)
Production (K8s)¶
/root/tower-fleet/
├── k8s/subtitleai/
│ ├── deployment-web.yaml
│ ├── deployment-worker.yaml
│ ├── deployment-poller.yaml
│ ├── statefulset-redis.yaml
│ ├── service.yaml
│ └── ingress.yaml
├── scripts/
│ ├── deploy-subtitleai.sh # Deployment automation
│ └── migrate-app.sh # Database migrations
└── docs/applications/
└── subtitleai.md # Production documentation
Development Workflow¶
Local Development¶
# Enter container
pct enter 180
# Start dev server
cd /root/projects/subtitleai
npm run dev # Port 3000
# Run worker locally (optional)
cd worker
celery -A worker worker --loglevel=info
Making Changes¶
- Make code changes in LXC 180
- Test locally with the dev server
- Commit changes
- Deploy to K8s (deployment automation lives in /root/tower-fleet/scripts/deploy-subtitleai.sh)
Database Migrations¶
# Create new migration (from LXC 180)
cd /root/projects/subtitleai
npx supabase migration new <description>
# Edit migration in supabase/migrations/
# Apply to K8s Supabase
/root/tower-fleet/scripts/migrate-app.sh subtitleai
API Endpoints¶
POST /api/upload¶
Upload video file to Supabase Storage.
Request: FormData with file field
Response: { fullPath: string, filename: string }
POST /api/jobs¶
Create new subtitle generation job.
Request:
{
"videoTitle": "video.mp4",
"videoPath": "user-id/video.mp4",
"sourceLanguage": "en",
"config": {
"targetLanguage": "es",
"styleProfile": "default",
"emotions": true,
"entities": true,
"dualLanguage": false
}
}
Response:
{
"success": true,
"job": {
"id": "uuid",
"video_id": "uuid",
"status": "pending",
"progress": 0,
"created_at": "2025-11-24T00:00:00Z"
}
}
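A client-side sketch of the job creation request, using only the standard library. The helper name `build_job_request` is hypothetical, and `BASE_URL` reuses the production Ingress URL from the Overview. How the Supabase session is passed to the API is not documented here, so authentication is deliberately left out.

```python
import json
import urllib.request

BASE_URL = "http://subtitles.internal"  # production Ingress URL from the Overview


def build_job_request(
    video_title: str,
    video_path: str,
    source_language: str,
    **config,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/jobs request with a JSON body."""
    payload = {
        "videoTitle": video_title,
        "videoPath": video_path,
        "sourceLanguage": source_language,
        "config": config,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)` once the appropriate authentication headers or cookies have been added.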
GET /api/jobs¶
List user's jobs.
Response:
{
"jobs": [
{
"id": "uuid",
"status": "complete",
"progress": 100,
"created_at": "2025-11-24T00:00:00Z",
"videos": {
"title": "video.mp4",
"source_language": "en"
}
}
]
}
GET /api/jobs/[id]¶
Get job details.
POST /api/jobs/[id]/retry¶
Retry failed job.
GET /api/subtitles/[id]/download¶
Download generated subtitle file.
GET /api/metrics¶
Prometheus metrics endpoint (unauthenticated).
Monitoring¶
Prometheus Metrics¶
Endpoint: http://10.89.97.213/api/metrics
Key Metrics:
- subtitleai_jobs_total{status} - Total jobs by status
- subtitleai_videos_uploaded_total - Total videos uploaded
- subtitleai_subtitles_downloaded_total - Total downloads
- subtitleai_processing_duration_seconds - Job processing time
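For a quick check without a dashboard, the text returned by the metrics endpoint can be summarized with a few lines of Python. The sketch below assumes the endpoint serves the standard Prometheus text exposition format, which is what prom-client produces; the function name is hypothetical, and the simple regular expressions do not handle escaped quotes or braces inside label values.

```python
import re
from typing import Dict

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def jobs_by_status(metrics_text: str) -> Dict[str, float]:
    """Sum subtitleai_jobs_total samples per status label."""
    counts: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group("name") != "subtitleai_jobs_total":
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        status = labels.get("status", "unknown")
        counts[status] = counts.get(status, 0.0) + float(match.group("value"))
    return counts
```

The input text can be fetched with `urllib.request.urlopen("http://10.89.97.213/api/metrics").read().decode()` from a host that can reach the LoadBalancer address.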
Logs¶
# Web app logs
kubectl logs -n subtitleai -l app=subtitleai-web -f
# Worker logs
kubectl logs -n subtitleai -l app=subtitleai-worker -f
# Poller logs
kubectl logs -n subtitleai -l app=subtitleai-poller -f
Health Check¶
Dependencies¶
Web App (Next.js)¶
- next - ^15.3.0
- react - ^19.1.0
- @supabase/supabase-js - ^2.49.2
- @supabase/ssr - ^0.9.1
- prom-client - ^15.1.3 (Prometheus metrics)
Worker (Python)¶
- celery - Task queue
- redis - Queue backend
- openai-whisper - Speech-to-text
- supabase - Database/storage client
Known Issues¶
- Dual-Language Mode: UI exists but backend not implemented
- Emotion/Entity Styling: Checkboxes present but no processing
- Style Profiles: Database table exists but not fully utilized
- Transcript Storage: Table exists but worker doesn't populate it
- Single Format: Only SRT generated, no ASS/VTT yet
Next Steps (Phase 2)¶
- Implement ASS Format Generation ← Current Focus
  - Add ASS writer function in worker
  - Parse Whisper segments for timestamps
  - Apply style profiles to ASS output
  - Generate both SRT and ASS per job
- Implement Dual-Language Feature ← Current Focus
  - Add translation service (OpenAI API)
  - Modify worker to translate segments
  - Generate dual-language SRT/ASS
  - Update UI to show translation options
- Add LLM Enhancement Service
  - Create enhancement task in worker
  - Integrate OpenAI/Claude APIs
  - Add quality scoring
  - Make it optional (premium feature)
- VTT Format Support
  - Add VTT writer function
  - Update format selection in UI
- Bazarr Integration
  - API client for Bazarr
  - Media library scanning
  - Batch job creation
References¶
- Production Docs: /root/tower-fleet/docs/applications/subtitleai.md
- Deployment Plan: /root/tower-fleet/docs/applications/subtitleai-deployment-plan.md
- Testing Guide: /root/projects/subtitleai/TESTING.md
- Worker README: /root/projects/subtitleai/worker/README.md
- K8s Manifests: /root/tower-fleet/k8s/subtitleai/
Maintained By: Claude Code
Last Review: 2025-11-24