Incident: Database Migration Blocked by Idle Transaction¶
Date: 2025-01-04 Severity: P2 Duration: ~30 minutes Status: Resolved
Summary¶
Database schema migration (narrative clocks for palimpsest) was blocked by an "idle in transaction" session holding a lock on the rpg.threads table. Multiple stalled kubectl exec processes compounded the issue.
Timeline¶
| Time (UTC) | Event |
|---|---|
| 17:37 | Migration attempt started via kubectl exec |
| 17:37-17:50 | Multiple kubectl exec commands timeout and get backgrounded |
| 17:50 | Investigation reveals ~20 stalled kubectl exec processes |
| 17:55 | Switched to port-forward approach to diagnose |
| 18:05 | Discovered idle transaction (pid 766041) blocking all ALTERs |
| 18:06 | Terminated blocking transaction |
| 18:06 | All pending ALTERs cascaded through successfully |
| 18:10 | Constraints and index added, migration complete |
Impact¶
- Migration delayed by ~30 minutes
- No data loss
- No user-facing impact (palimpsest-api continued serving with old schema)
Root Cause¶
Two compounding issues:
-
Idle transaction lock: A previous API session (likely from local development or a crashed connection) left a transaction open with a SELECT on
rpg.entities. This acquired a lock that blocked DDL operations onrpg.threads. -
kubectl exec stalling: Claude Code's bash tool auto-backgrounds commands taking >1-2 seconds. Each stalled
kubectl execwas backgrounded and a new one started, creating ~20 zombie processes all waiting for the same lock.
Resolution¶
- Killed stalled
kubectl execprocesses:pkill -f "kubectl exec -n supabase-sandbox postgres-0" - Used port-forward instead:
kubectl port-forward -n supabase-sandbox svc/postgres 5436:5432 - Identified blocking transaction via
pg_stat_activity - Terminated blocking session:
SELECT pg_terminate_backend(766041); - Applied migration SQL directly via psql
Lessons Learned¶
- kubectl exec is unreliable for long operations: Port-forward + psql is more robust for migrations
- Check for locks before DDL: Query
pg_stat_activityfor blocking sessions before applying migrations - Idle transaction timeout: PostgreSQL should have
idle_in_transaction_session_timeoutconfigured - Background process accumulation: Monitor for zombie kubectl processes during troubleshooting
Action Items¶
| Item | Owner | Status |
|---|---|---|
| Add "check for blocking transactions" to migration runbook | - | Done (this incident) |
Configure idle_in_transaction_session_timeout on postgres |
- | Pending |
| Update database-migrations.md with port-forward preference | - | Pending |
| Consider connection pooler (PgBouncer) to prevent orphaned connections | - | Backlog |
Commands Reference¶
# Check for blocking transactions
PGPASSWORD=postgres psql -h 127.0.0.1 -p 5436 -U postgres -d postgres -c \
"SELECT pid, state, query FROM pg_stat_activity WHERE state <> 'idle' ORDER BY query_start;"
# Terminate blocking session
PGPASSWORD=postgres psql -h 127.0.0.1 -p 5436 -U postgres -d postgres -c \
"SELECT pg_terminate_backend(<pid>);"
# Check for stale kubectl exec processes
ps aux | grep "kubectl exec" | head -10
# Kill stale kubectl exec processes
pkill -f "kubectl exec -n supabase-sandbox postgres-0"