
Supabase Sandbox Deployment

Purpose: Deploy a complete Supabase sandbox environment for staging, backup testing, and safe experimentation.

Last Updated: 2025-12-03
Validated: Full deployment tested end-to-end


Overview

The Supabase sandbox is a complete replica of production running in the same Kubernetes cluster but in an isolated namespace. It enables:

  • Staging environment: Test risky schema changes before production
  • Backup validation: Verify backups can be restored successfully
  • Safe experimentation: Break things without affecting production
  • Disaster recovery practice: Rehearse complete recovery procedures

Quick Reference

| Task | Command | Time |
|------|---------|------|
| Deploy sandbox | /root/scripts/deploy-sandbox-complete.sh | 2-3 min |
| Refresh with latest prod data | /root/scripts/deploy-sandbox-complete.sh --destroy-first | 3-4 min |
| Destroy sandbox | kubectl delete namespace supabase-sandbox | 1 min |
| Access Kong API | kubectl port-forward -n supabase-sandbox svc/kong 8000:8000 | - |
| Access Studio | kubectl port-forward -n supabase-sandbox svc/studio 3000:3000 | - |
| Check status | kubectl get pods -n supabase-sandbox | - |

Jump to:

  • Initial Deployment - First time setup
  • Sandbox Refresh - Sync with production data
  • Using the Sandbox - Connect apps and test changes
  • Troubleshooting - Fix common problems


Initial Deployment

One-command deployment from production backup:

/root/scripts/deploy-sandbox-complete.sh

This script:

  1. Deploys all Supabase components to supabase-sandbox namespace
  2. Waits for PostgreSQL to be ready
  3. Restores latest production backup
  4. Fixes migration tracking tables
  5. Restarts services
  6. Validates deployment

Expected time: 2-3 minutes (optimized restore using kubectl cp + psql inside pod)
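
For reference, the "kubectl cp + psql inside pod" optimization mentioned above boils down to the following pattern. This is a sketch only - the file names and paths are illustrative, not the script's exact internals:

# 1. Copy the decrypted SQL dump into the pod's filesystem
kubectl cp /tmp/restore.sql supabase-sandbox/postgres-0:/tmp/restore.sql

# 2. Execute it with psql inside the pod (avoids streaming 50M+ of SQL through the API server)
kubectl exec -n supabase-sandbox postgres-0 -- \
    psql -U postgres -d postgres -f /tmp/restore.sql

# 3. Clean up the copy
kubectl exec -n supabase-sandbox postgres-0 -- rm /tmp/restore.sql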


Architecture

Components Deployed

| Component | Purpose | Resources |
|-----------|---------|-----------|
| PostgreSQL | Database (StatefulSet) | 100m CPU, 256Mi RAM, 10Gi storage |
| GoTrue | Authentication | 50m CPU, 64Mi RAM |
| PostgREST | REST API | 50m CPU, 64Mi RAM |
| Storage API | File storage | 50m CPU, 64Mi RAM |
| postgres-meta | DB introspection | 50m CPU, 128Mi RAM |
| Kong | API Gateway | 100m CPU, 256Mi RAM ⚠️ |
| Studio | Dashboard | 50m CPU, 128Mi RAM |

Total: ~500m CPU, ~1.5Gi RAM

⚠️ Critical: Kong requires minimum 512Mi memory limit or it will be OOMKilled.

Network Configuration

Services: All ClusterIP (internal-only)

Access via port-forward:

# Kong API
kubectl port-forward -n supabase-sandbox svc/kong 8000:8000

# Studio Dashboard
kubectl port-forward -n supabase-sandbox svc/studio 3000:3000

Why ClusterIP?

  • MetalLB pool typically exhausted (production uses 10.89.97.210-220)
  • Sandbox doesn't need external access
  • Port-forwarding works fine for testing


Manual Deployment (Step-by-Step)

Prerequisites

Check cluster capacity:

kubectl top nodes
# Need: ~500m CPU, ~1.5Gi RAM available
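
kubectl top nodes reports current usage; to see how much of the cluster is already committed via requests and limits (which is what actually blocks scheduling), this also helps:

# Per-node view of requested vs. allocatable resources
kubectl describe nodes | grep -A 8 "Allocated resources"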

Check if sandbox exists:

kubectl get namespace supabase-sandbox
# If exists: kubectl delete namespace supabase-sandbox

Step 1: Generate JWT Secrets

/root/scripts/generate-sandbox-secrets.sh

Creates:

  • /root/tower-fleet/manifests/supabase-sandbox/secrets.yaml
  • /root/.supabase-sandbox-secrets.txt (plaintext reference)
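
The anon and service-role keys needed later (e.g. for .env.local) can be pulled from the plaintext reference file. The variable names matched below are an assumption - inspect the file directly if the grep comes back empty:

grep -iE 'anon|service' /root/.supabase-sandbox-secrets.txt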

Step 2: Deploy Sandbox

/root/scripts/deploy-supabase-sandbox.sh --deploy

This deploys all manifests in correct order:

  1. Namespace, ConfigMap, Secrets
  2. PostgreSQL init scripts (creates schemas and roles automatically)
  3. PostgreSQL StatefulSet
  4. Dependent services (GoTrue, PostgREST, Storage, Kong, Studio)

Wait for deployment:

kubectl get pods -n supabase-sandbox -w
# All pods should reach Running (except Storage/Kong may restart once)

Step 3: Restore Production Data

/root/scripts/sync-production-to-sandbox.sh --latest

This script:

  • Decrypts latest production backup
  • Restores to sandbox PostgreSQL
  • Verifies schemas and table counts

⚠️ Known Issue: Storage migrations will fail after restore.

Step 4: Fix Migration Tracking

Problem: Production backup includes migrated data but not all migration tracking records.

Solution:

# Copy auth migrations
kubectl exec -n supabase postgres-0 -- \
  psql -U postgres -d postgres -c "COPY (SELECT * FROM auth.schema_migrations) TO STDOUT;" | \
kubectl exec -i -n supabase-sandbox postgres-0 -- \
  psql -U postgres -d postgres -c "COPY auth.schema_migrations FROM STDIN;"

# Copy storage migrations
kubectl exec -n supabase postgres-0 -- \
  psql -U postgres -d postgres -c "COPY (SELECT * FROM storage.migrations WHERE id >= 2) TO STDOUT;" | \
kubectl exec -i -n supabase-sandbox postgres-0 -- \
  psql -U postgres -d postgres -c "COPY storage.migrations FROM STDIN;"
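
To confirm the copy landed, a quick spot-check of the tracking tables on both sides (not part of the scripts, just a sanity check):

# Row counts of the migration tracking tables in production vs. sandbox
for ns in supabase supabase-sandbox; do
  echo "=== $ns ==="
  kubectl exec -n $ns postgres-0 -- psql -U postgres -d postgres -tAc \
    "SELECT 'auth.schema_migrations: ' || count(*) FROM auth.schema_migrations;"
  kubectl exec -n $ns postgres-0 -- psql -U postgres -d postgres -tAc \
    "SELECT 'storage.migrations: ' || count(*) FROM storage.migrations;"
done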

Step 5: Restart Services

kubectl delete pod -n supabase-sandbox -l app=storage
kubectl delete pod -n supabase-sandbox -l app=kong

Wait for pods to become Running (20-30 seconds).
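
Instead of polling, kubectl wait can block until the replacement pods are Ready:

kubectl wait --for=condition=Ready pod -l app=storage -n supabase-sandbox --timeout=120s
kubectl wait --for=condition=Ready pod -l app=kong -n supabase-sandbox --timeout=120s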

Step 6: Validate Deployment

# Check all pods running
kubectl get pods -n supabase-sandbox
# All should be Running (7 pods total)

# Test Kong API
kubectl port-forward -n supabase-sandbox svc/kong 8000:8000 &
curl http://localhost:8000/
# Should return Kong response

# Access Studio
kubectl port-forward -n supabase-sandbox svc/studio 3000:3000
# Open http://localhost:3000 in browser
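
A 404 from Kong at / still proves the gateway is up; a slightly deeper check is to hit the auth service through Kong. The path below assumes the stock Supabase route layout (/auth/v1/ proxied to GoTrue):

# Health check routed through Kong to GoTrue
curl http://localhost:8000/auth/v1/health
# Expect a small JSON payload with the GoTrue name/version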

Common Issues & Solutions

Issue 1: Storage Pod CrashLoopBackOff

Symptom:

Error: relation "storage.objects" does not exist

Cause: Storage migrations trying to run before tables exist.

Solution: Ensure Step 3 (restore production data) completed before storage starts.
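
A quick way to confirm the storage tables exist before restarting the pod:

# If this lists objects, buckets, and migrations, the restore created the storage schema
kubectl exec -n supabase-sandbox postgres-0 -- \
    psql -U postgres -d postgres -c "\dt storage.*"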


Issue 2: Storage Migration "Column Already Exists"

Symptom:

Error: column "path_tokens" of relation "objects" already exists

Cause: Production backup restored data but not migration tracking.

Solution: Run Step 4 (fix migration tracking).


Issue 3: Kong Pod OOMKilled

Symptom:

kubectl get pods -n supabase-sandbox
# kong pod shows CrashLoopBackOff

kubectl describe pod -n supabase-sandbox -l app=kong | grep OOMKilled
# Shows: reason: OOMKilled

Cause: Kong memory limit too low (256Mi insufficient).

Solution: Kong manifest should have:

resources:
  limits:
    memory: 512Mi  # Not 256Mi!
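
If the sandbox is already running with the lower limit, the limit can be raised in place and the pod allowed to restart. This assumes Kong runs as a Deployment named kong - adjust if yours differs:

# Raise the memory limit on the live Deployment (workload name assumed)
kubectl set resources deployment kong -n supabase-sandbox --limits=memory=512Mi
kubectl rollout status deployment kong -n supabase-sandbox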


Issue 4: LoadBalancer IPs Pending

Symptom:

kubectl get svc -n supabase-sandbox kong studio
# EXTERNAL-IP shows <pending>

Cause: MetalLB IP pool exhausted.

Check pool status:

kubectl get ipaddresspool -n metallb-system -o yaml | grep -A 5 "addresses:"
kubectl get svc -A | grep LoadBalancer | wc -l

Solution: Change to ClusterIP (already done in manifests):

spec:
  type: ClusterIP  # Not LoadBalancer


Using the Sandbox

Connect Applications

Update .env.local to point to sandbox:

# Option 1: Via port-forward
kubectl port-forward -n supabase-sandbox svc/kong 8000:8000
NEXT_PUBLIC_SUPABASE_URL=http://localhost:8000
NEXT_PUBLIC_SUPABASE_ANON_KEY=<see /root/.supabase-sandbox-secrets.txt>

# Option 2: Via cluster DNS (from within cluster)
NEXT_PUBLIC_SUPABASE_URL=http://kong.supabase-sandbox.svc.cluster.local:8000

Test Schema Changes

# 1. Port-forward to sandbox
kubectl port-forward -n supabase-sandbox svc/kong 8000:8000

# 2. Create migration
cd /root/projects/your-app
npx supabase migration new test_change

# 3. Apply to sandbox via Studio
kubectl port-forward -n supabase-sandbox svc/studio 3000:3000
# Open http://localhost:3000 → SQL Editor → Run migration

# 4. Test in app
npm run dev
# App connects to localhost:8000 (sandbox)

# 5. If works, apply to production
/root/scripts/migrate-app.sh your-app

Sandbox Refresh (Sync with Production)

Use case: Sandbox is running but you want fresh production data (e.g., after making schema changes in production).

Option 1: Full Refresh (Destroy and Redeploy)

Destroys and recreates sandbox with latest production data:

/root/scripts/deploy-sandbox-complete.sh --destroy-first

When to use:

  • ✅ First refresh of the day/week
  • ✅ After major production schema changes
  • ✅ Want guaranteed clean slate
  • ✅ Sandbox has issues (CrashLoopBackOff, etc.)

Time: 60-120 minutes (full deployment + restore)

Option 2: In-Place Refresh (Faster)

Keeps sandbox running, only replaces database:

# WARNING: This uses pipe approach and may hang on large backups
# Consider using Option 1 instead for reliability

/root/scripts/sync-production-to-sandbox.sh --latest

When to use:

  • ⚠️ Small backups only (< 10MB compressed)
  • ⚠️ Quick data refresh without infrastructure changes
  • ⚠️ Known to work in your environment

Time: 30-90 minutes (restore only)

⚠️ Known Issue: This script uses the pipe approach (gpg | gunzip | kubectl exec psql) which can hang on large backups. If it hangs, use Option 1 instead.

Option 3: Manual Refresh (For Testing)

Step-by-step manual process:

# 1. Find latest backup
ls -lth /vault/backups/databases/supabase-backup-*.sql.gz.gpg | head -5

# 2. Decrypt to temp file (avoids pipe hang)
BACKUP="/vault/backups/databases/supabase-backup-20251202-221700.sql.gz.gpg"
gpg --batch --yes --passphrase-file /root/.database-backup-passphrase \
    --decrypt "$BACKUP" 2>/dev/null | gunzip > /tmp/manual-restore.sql

# 3. Restore to sandbox
kubectl exec -i -n supabase-sandbox postgres-0 -- \
    psql -U postgres -d postgres < /tmp/manual-restore.sql

# 4. Verify schemas
kubectl exec -n supabase-sandbox postgres-0 -- \
    psql -U postgres -d postgres -c "\dn"

# 5. Fix migration tracking (if needed)
kubectl exec -n supabase postgres-0 -- \
    psql -U postgres -d postgres -c "COPY (SELECT * FROM storage.migrations WHERE id >= 2) TO STDOUT;" | \
kubectl exec -i -n supabase-sandbox postgres-0 -- \
    psql -U postgres -d postgres -c "COPY storage.migrations FROM STDIN;"

# 6. Restart services
kubectl delete pod -n supabase-sandbox -l app=storage
kubectl delete pod -n supabase-sandbox -l app=kong

# 7. Cleanup
rm /tmp/manual-restore.sql

Refresh Frequency Recommendations

Daily development:

  • No refresh needed - sandbox persists changes
  • Use sandbox for testing migrations before production

Weekly testing:

  • Refresh Monday morning to sync with production
  • Ensures sandbox has latest production data

After production changes:

  • Refresh immediately after schema migrations
  • Validates migration works on production-like data

Ad-hoc:

  • Before testing RBAC Phase 3 migration
  • Before major feature development
  • When sandbox data is stale (> 1 week)
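
If the Monday refresh should happen without anyone remembering it, a cron entry along these lines works (the schedule and log path are suggestions, not something the scripts configure for you):

# root's crontab: full sandbox refresh every Monday at 06:00
0 6 * * 1 /root/scripts/deploy-sandbox-complete.sh --destroy-first >> /var/log/sandbox-refresh.log 2>&1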


Performance Notes

Backup Restore Time

Observed: 60-90 minutes for 52M SQL dump (8.2M compressed)

Why so slow?

  1. kubectl exec overhead: The 52M SQL file is streamed through the Kubernetes API to the pod
  2. Sequential processing: PostgreSQL processes each DDL statement (CREATE TABLE, CREATE FUNCTION, etc.) one at a time
  3. Complex schema: Production backup includes 100+ tables, functions, triggers, RLS policies across multiple schemas

Breakdown:

  • Steps 1-3: ~2 minutes (fast - deployment and PostgreSQL ready)
  • Step 4 (restore): 60-90 minutes (slow - kubectl exec bottleneck)
  • Steps 5-6: ~5 minutes (restarts and validation)

Why not faster?

  • Considered: Direct PostgreSQL connection (requires exposing port, security risk)
  • Considered: Copy file into pod first (adds complexity, still slow to execute SQL)
  • Current approach: Simple, secure, works reliably - just requires patience

Recommendation:

  • Run deploy-sandbox-complete.sh in background or tmux session
  • Let it run overnight if needed
  • The process is reliable - it won't hang (temp file approach prevents pipe issues)
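
While a long restore is running, rough progress can be watched from another terminal by counting how many user tables have appeared so far (an indicator only, not a percentage):

# Re-checks every 30s; the count climbs as the restore works through the dump
watch -n 30 "kubectl exec -n supabase-sandbox postgres-0 -- psql -U postgres -d postgres -tAc 'SELECT count(*) FROM pg_stat_user_tables;'"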


Maintenance

Destroy Sandbox

/root/scripts/deploy-supabase-sandbox.sh --destroy

Deletes namespace, pods, PVCs, and all data.

Redeploy Sandbox

# Clean slate
/root/scripts/deploy-supabase-sandbox.sh --destroy
/root/scripts/deploy-sandbox-complete.sh

# Or update existing
kubectl apply -f /root/tower-fleet/manifests/supabase-sandbox/

Check Sandbox Status

/root/scripts/deploy-supabase-sandbox.sh --status

Shows:

  • Namespace status
  • Pod health
  • Service endpoints
  • PVC usage


Resource Requirements (Validated)

Minimum (tested and working):

  • CPU: 500m total
  • Memory: 1.5Gi total
  • Storage: 10Gi PVC

Per-service breakdown:

| Service | CPU Request | Memory Limit | Notes |
|---------|-------------|--------------|-------|
| PostgreSQL | 100m | 1Gi | Can go lower but not recommended |
| Kong | 100m | 512Mi | ⚠️ Less = OOMKilled |
| Studio | 50m | 256Mi | |
| GoTrue | 50m | 128Mi | |
| PostgREST | 50m | 256Mi | |
| Storage | 50m | 256Mi | |
| postgres-meta | 50m | 256Mi | |

What happens if you go too low:

  • Kong < 512Mi → OOMKilled, CrashLoopBackOff
  • PostgreSQL < 256Mi → Slow queries, potential crashes
  • Others < 64Mi → May work but unstable
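
To compare the limits above against what the pods actually consume (requires metrics-server, the same dependency as kubectl top nodes):

kubectl top pods -n supabase-sandbox --containers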


Comparison: Sandbox vs Production

| Aspect | Production | Sandbox |
|--------|------------|---------|
| Namespace | supabase | supabase-sandbox |
| Services | LoadBalancer (external IPs) | ClusterIP (internal only) |
| Storage | 20Gi PVC | 10Gi PVC |
| Resources | Full (750m CPU, 2Gi RAM) | Reduced (500m CPU, 1.5Gi RAM) |
| Data | Live production | Restored from backup |
| Access | http://10.89.97.214:8000 | Port-forward only |

Troubleshooting Checklist

If deployment fails:

  1. ✅ Check cluster capacity: kubectl top nodes
  2. ✅ Check namespace clean: kubectl get namespace supabase-sandbox
  3. ✅ Check secrets generated: ls /root/.supabase-sandbox-secrets.txt
  4. ✅ Check init scripts applied: kubectl get configmap -n supabase-sandbox postgres-init-scripts
  5. ✅ Check PostgreSQL running: kubectl get pods -n supabase-sandbox -l app=postgres
  6. ✅ Check schemas created: kubectl exec -n supabase-sandbox postgres-0 -- psql -U postgres -c "\dn"

If services crash:

  1. ✅ Check logs: kubectl logs -n supabase-sandbox -l app=<service>
  2. ✅ Check resource limits: kubectl describe pod -n supabase-sandbox -l app=<service>
  3. ✅ Check for OOMKilled: kubectl get events -n supabase-sandbox --sort-by='.lastTimestamp'

If migrations fail:

  1. ✅ Check migration tables exist: kubectl exec -n supabase-sandbox postgres-0 -- psql -U postgres -c "SELECT * FROM storage.migrations;"
  2. ✅ Copy from production if needed (Step 4)
  3. ✅ Restart services: kubectl delete pod -n supabase-sandbox -l app=storage


Success Criteria

Sandbox deployment is successful when:

  • ✅ All 7 pods in Running state
  • ✅ Kong API responds: curl http://localhost:8000/ (via port-forward)
  • ✅ Studio accessible: http://localhost:3000 (via port-forward)
  • ✅ Production data restored with all schemas present
  • ✅ No CrashLoopBackOff or OOMKilled errors
  • ✅ Can connect app by changing .env.local
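
The checks above can be rolled into one small script for a quick pass/fail signal after each deployment. A sketch (it assumes nothing else is already listening on local port 8000):

#!/usr/bin/env bash
# Quick sandbox validation sketch - adjust the expectations if components change
set -euo pipefail

NS=supabase-sandbox

# 1. Any pods not Running?
NOT_RUNNING=$(kubectl get pods -n "$NS" --no-headers | awk '$3 != "Running"' | wc -l)
echo "Pods not Running: $NOT_RUNNING (expect 0)"

# 2. Kong answers through a temporary port-forward
kubectl port-forward -n "$NS" svc/kong 8000:8000 >/dev/null 2>&1 &
PF_PID=$!
sleep 3
curl -s -o /dev/null -w "Kong HTTP status: %{http_code}\n" http://localhost:8000/ || echo "Kong unreachable"
kill "$PF_PID" 2>/dev/null || true

# 3. Expected schemas present after restore (auth, storage, public, etc.)
kubectl exec -n "$NS" postgres-0 -- psql -U postgres -d postgres -c "\dn"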