Skip to content

Incident: Database Migration Blocked by Idle Transaction

Date: 2025-01-04 Severity: P2 Duration: ~30 minutes Status: Resolved


Summary

Database schema migration (narrative clocks for palimpsest) was blocked by an "idle in transaction" session holding a lock on the rpg.threads table. Multiple stalled kubectl exec processes compounded the issue.


Timeline

Time (UTC) Event
17:37 Migration attempt started via kubectl exec
17:37-17:50 Multiple kubectl exec commands timeout and get backgrounded
17:50 Investigation reveals ~20 stalled kubectl exec processes
17:55 Switched to port-forward approach to diagnose
18:05 Discovered idle transaction (pid 766041) blocking all ALTERs
18:06 Terminated blocking transaction
18:06 All pending ALTERs cascaded through successfully
18:10 Constraints and index added, migration complete

Impact

  • Migration delayed by ~30 minutes
  • No data loss
  • No user-facing impact (palimpsest-api continued serving with old schema)

Root Cause

Two compounding issues:

  1. Idle transaction lock: A previous API session (likely from local development or a crashed connection) left a transaction open with a SELECT on rpg.entities. This acquired a lock that blocked DDL operations on rpg.threads.

  2. kubectl exec stalling: Claude Code's bash tool auto-backgrounds commands taking >1-2 seconds. Each stalled kubectl exec was backgrounded and a new one started, creating ~20 zombie processes all waiting for the same lock.


Resolution

  1. Killed stalled kubectl exec processes: pkill -f "kubectl exec -n supabase-sandbox postgres-0"
  2. Used port-forward instead: kubectl port-forward -n supabase-sandbox svc/postgres 5436:5432
  3. Identified blocking transaction via pg_stat_activity
  4. Terminated blocking session: SELECT pg_terminate_backend(766041);
  5. Applied migration SQL directly via psql

Lessons Learned

  1. kubectl exec is unreliable for long operations: Port-forward + psql is more robust for migrations
  2. Check for locks before DDL: Query pg_stat_activity for blocking sessions before applying migrations
  3. Idle transaction timeout: PostgreSQL should have idle_in_transaction_session_timeout configured
  4. Background process accumulation: Monitor for zombie kubectl processes during troubleshooting

Action Items

Item Owner Status
Add "check for blocking transactions" to migration runbook - Done (this incident)
Configure idle_in_transaction_session_timeout on postgres - Pending
Update database-migrations.md with port-forward preference - Pending
Consider connection pooler (PgBouncer) to prevent orphaned connections - Backlog

Commands Reference

# Check for blocking transactions
PGPASSWORD=postgres psql -h 127.0.0.1 -p 5436 -U postgres -d postgres -c \
  "SELECT pid, state, query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY query_start;"

# Terminate blocking session
PGPASSWORD=postgres psql -h 127.0.0.1 -p 5436 -U postgres -d postgres -c \
  "SELECT pg_terminate_backend(<pid>);"

# Check for stale kubectl exec processes
ps aux | grep "kubectl exec" | head -10

# Kill stale kubectl exec processes
pkill -f "kubectl exec -n supabase-sandbox postgres-0"