Operational Runbooks — v0 Stubs

Purpose: Lightweight operational procedures for incident response, support escalation, and emergency controls.
Status: Stub document for Phase 0.5. Expand before production.


1. Incident Response

1.1 Key Compromise Response

Scenario: Release keys (stored in SealedSecrets) may have been compromised.

Severity: CRITICAL

Immediate Actions:
1. Pause all trigger executions (see 3.1 Big Red Button: Pause All Triggers)
2. Identify scope of compromise:

# List all vaults with release keys
kubectl get secrets -n vault-core -l type=release-key -o name

# Check audit log for unauthorized access
kubectl logs -n vault-core deployment/trigger-executor --since=24h | grep "release-key"
3. Audit all recent executions:
SELECT * FROM vault_core.audit_log
WHERE type IN ('master_key_unwrapped', 'class_b_decryption')
AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY timestamp DESC;
4. Notify affected users if unauthorized decryption detected
5. File incident report with timeline and root cause

Key Rotation Procedure (Phase 0.5 - Server Escrow):

#!/bin/bash
# rotate-release-key.sh - Rotate release key for a single vault
# Usage: ./rotate-release-key.sh <vault_id>
set -euo pipefail

VAULT_ID="${1:?Usage: $0 <vault_id>}"
NAMESPACE="vault-core"
SECRET_NAME="vault-${VAULT_ID}-release-key"

# The ID is interpolated into SQL and resource names below, so reject
# anything unexpected (character set is a conservative assumption)
if [[ ! "$VAULT_ID" =~ ^[A-Za-z0-9_-]+$ ]]; then
  echo "Invalid vault ID: $VAULT_ID" >&2
  exit 1
fi

# 1. Generate new release key
NEW_RELEASE_KEY=$(openssl rand -base64 32)

# 2. Get current escrowed master key from database
# (-A -t: unaligned, tuples-only output with no padding)
ESCROWED_MASTER_KEY=$(psql -At -v ON_ERROR_STOP=1 -c "
  SELECT encode(escrowed_master_key, 'base64')
  FROM vault_core.vaults WHERE id = '$VAULT_ID'
")
if [[ -z "$ESCROWED_MASTER_KEY" ]]; then
  echo "No escrowed master key found for vault $VAULT_ID" >&2
  exit 1
fi

# 3. Decrypt master key with OLD release key (requires old key still available)
OLD_RELEASE_KEY=$(kubectl get secret -n "$NAMESPACE" "$SECRET_NAME" \
  -o jsonpath='{.data.key}' | base64 -d)

# 4. Re-encrypt master key with NEW release key
# (This step requires the crypto service - shown as pseudocode)
# NEW_ESCROWED_MASTER_KEY = encrypt(decrypt(ESCROWED_MASTER_KEY, OLD_RELEASE_KEY), NEW_RELEASE_KEY)
NEW_ESCROWED_MASTER_KEY=""  # TODO: set from the crypto service call above
if [[ -z "$NEW_ESCROWED_MASTER_KEY" ]]; then
  echo "Step 4 not implemented; refusing to overwrite escrowed master key" >&2
  exit 1
fi

# 5. Update database with new escrowed master key
psql -v ON_ERROR_STOP=1 -c "
  UPDATE vault_core.vaults
  SET escrowed_master_key = decode('$NEW_ESCROWED_MASTER_KEY', 'base64'),
      release_key_rotated_at = NOW()
  WHERE id = '$VAULT_ID'
"

# 6. Create new sealed secret (namespace set explicitly so it is sealed for vault-core)
kubectl create secret generic "$SECRET_NAME" -n "$NAMESPACE" \
  --from-literal=key="$NEW_RELEASE_KEY" \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > "${SECRET_NAME}-sealed.yaml"

# 7. Apply new sealed secret
kubectl apply -f "${SECRET_NAME}-sealed.yaml" -n "$NAMESPACE"

# 8. Delete old secret (after verification)
# No-op unless a backup copy named ${SECRET_NAME}-old exists
kubectl delete secret "${SECRET_NAME}-old" -n "$NAMESPACE" 2>/dev/null || true

# 9. Audit the rotation
psql -v ON_ERROR_STOP=1 -c "
  INSERT INTO vault_core.audit_log (type, resource_type, resource_id, payload)
  VALUES ('release_key_rotated', 'vault', '$VAULT_ID',
    '{\"reason\": \"key_compromise_response\", \"rotated_by\": \"operator\"}'::jsonb)
"

echo "Release key rotated for vault $VAULT_ID"

Batch Rotation (all vaults):

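#!/bin/bash
# rotate-all-release-keys.sh - Emergency rotation of all release keys
set -uo pipefail

# 1. Get all vault IDs (-A -t: one bare ID per line)
VAULT_IDS=$(psql -At -v ON_ERROR_STOP=1 -c "SELECT id FROM vault_core.vaults WHERE escrowed_master_key IS NOT NULL") || exit 1

# 2. Rotate each vault; continue on failure so one bad vault does not block the rest
FAILED=()
for VAULT_ID in $VAULT_IDS; do
  echo "Rotating vault: $VAULT_ID"
  if ! ./rotate-release-key.sh "$VAULT_ID"; then
    FAILED+=("$VAULT_ID")
  fi
done

# 3. Invalidate any pending release packages (they used old keys)
psql -v ON_ERROR_STOP=1 -c "
  UPDATE vault_core.release_packages
  SET status = 'revoked',
      revoke_reason = 'key_rotation_emergency'
  WHERE status IN ('pending', 'accessed')
  AND created_at < NOW()
" || exit 1

# 4. Report any vaults that still need manual rotation
if [ "${#FAILED[@]}" -gt 0 ]; then
  echo "Rotation FAILED for vaults: ${FAILED[*]}" >&2
  exit 1
fi

echo "All release keys rotated. Pending packages revoked."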

Recovery:
- Verify rotation completed for all affected vaults
- Test trigger execution on a test vault
- Resume trigger execution after verification
- Monitor for any access failures
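The first recovery check can be scripted against rows exported from vault_core.vaults. The following is a minimal sketch, assuming each row carries the vault `id` and the `release_key_rotated_at` value written by the rotation script, and that the incident start time is known. `VaultRow` and `findUnrotatedVaults` are hypothetical names, not part of the existing codebase.

```typescript
// Hypothetical row shape: a subset of vault_core.vaults columns.
interface VaultRow {
  id: string;
  releaseKeyRotatedAt: Date | null; // release_key_rotated_at; null if never rotated
}

// Returns the IDs of vaults whose release key has not been rotated
// since the incident began (never rotated, or rotated before it).
function findUnrotatedVaults(vaults: VaultRow[], incidentStart: Date): string[] {
  return vaults
    .filter(
      (v) =>
        v.releaseKeyRotatedAt === null ||
        v.releaseKeyRotatedAt.getTime() < incidentStart.getTime()
    )
    .map((v) => v.id);
}
```

An empty result suggests every listed vault was rotated after the incident started; any returned ID still needs rotation before triggers resume.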


1.2 Trigger Misfire Response

Scenario: Trigger executed when it shouldn't have (false positive).

Severity: HIGH

Immediate Actions:
1. Attempt abort if trigger is in abortable state:

await abortTrigger(triggerId, {
  reason: 'operator_intervention',
  bypassAuth: true,  // Emergency override
});
2. If already released:
   - Revoke release package immediately:
UPDATE vault_core.release_packages
SET status = 'revoked',
    revoke_reason = '<reason>'
WHERE trigger_id = '<trigger_id>';
   - Contact executor to delete any downloaded content
   - Cannot un-decrypt, but can limit further access
3. Audit the failure:
   - Why did multi-signal verification fail?
   - Were all reminder channels attempted?
   - Was challenge window honored?
4. Contact user to verify status and explain what happened

Post-Incident:
- Adjust trigger configuration if needed
- Review multi-signal requirements
- Document in incident log


1.3 Data Breach Response

Scenario: Unauthorized access to database or storage.

Severity: CRITICAL

Immediate Actions:
1. Assess scope:
   - Which tables/buckets were accessed?
   - Was it read-only or read-write?
   - Class C documents: ciphertext only (limited impact)
   - Class B documents: escrow keys needed (check SealedSecrets)
   - Class A documents: potential plaintext exposure
2. Contain breach:
   - Rotate database credentials
   - Revoke compromised service accounts
   - Review network policies
3. Notify users per data breach requirements:
   - Class A data: notify all affected users
   - Class B/C data: notify that ciphertext was accessed (limited impact)
4. Preserve evidence for forensics
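The per-class assessment and notification rules above can be captured as a small lookup table so that responders apply them consistently. This is an illustrative sketch only; `DocClass`, `BREACH_IMPACT`, and `notificationsFor` are hypothetical names, and the strings simply restate the bullets in this section.

```typescript
type DocClass = "A" | "B" | "C";

interface BreachImpact {
  exposure: string;     // what an attacker could have obtained
  notification: string; // what affected users are told
}

// Restates the scope and notification bullets of this section.
const BREACH_IMPACT: Record<DocClass, BreachImpact> = {
  A: {
    exposure: "potential plaintext exposure",
    notification: "notify all affected users",
  },
  B: {
    exposure: "ciphertext; escrow keys needed (check SealedSecrets)",
    notification: "notify that ciphertext was accessed",
  },
  C: {
    exposure: "ciphertext only",
    notification: "notify that ciphertext was accessed",
  },
};

// Returns the distinct notifications needed for the classes that were accessed.
function notificationsFor(accessed: DocClass[]): string[] {
  const out: string[] = [];
  for (const c of accessed) {
    const n = BREACH_IMPACT[c].notification;
    if (out.indexOf(n) === -1) out.push(n);
  }
  return out;
}
```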


2. Support Escalation Paths

2.1 Contested Trigger (User Says "I'm Not Dead")

Scenario: User contacts support claiming trigger fired incorrectly while they're alive.

Priority: URGENT (time-sensitive)

Verification Steps:
1. Verify identity:
   - Require MFA verification
   - Security questions if MFA unavailable
   - Video call for high-stakes situations
2. Check trigger state:
   - If abortable: abort immediately upon verification
   - If released: revoke release package, contact executor
3. Document verification method and outcome

Escalation:
- If identity cannot be verified: DO NOT abort without management approval
- If executor contests: require legal documentation
- If user claims coercion: activate safety protocol (see ExitMap)
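The core of this procedure is a two-input decision: whether identity was verified, and which state the trigger is in. A minimal sketch of that decision follows. The type and function names are hypothetical, and sending the unverified-and-released case to escalation is an assumption, since the runbook only states the unverified rule for aborts.

```typescript
type TriggerState = "abortable" | "released";

type SupportAction =
  | "abort_trigger"
  | "revoke_package_and_contact_executor"
  | "escalate_for_management_approval";

// Maps the verification outcome and trigger state to the next support action.
function contestedTriggerAction(
  identityVerified: boolean,
  state: TriggerState
): SupportAction {
  // Unverified identity: never act unilaterally (assumed to cover both states).
  if (!identityVerified) return "escalate_for_management_approval";
  return state === "abortable"
    ? "abort_trigger"
    : "revoke_package_and_contact_executor";
}
```

Executor contests and coercion claims are handled by the escalation rules above and are deliberately left out of this sketch.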


2.2 Executor Access Issues

Scenario: Executor cannot access released documents.

Troubleshooting: 1. Check release package status:

SELECT * FROM vault_core.release_packages
WHERE executor_id = '<executor_id>'
AND trigger_id = '<trigger_id>';
2. Common issues:
   - Link expired (expires_at < NOW())
   - Download limit exceeded (download_count >= max_downloads)
   - Package revoked (status = 'revoked')
3. Resolution options:
   - Regenerate access link (if within policy)
   - Extend expiry (requires audit entry)
   - Increase download limit (requires audit entry)
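The three common issues map directly onto columns of the release package row, so the diagnosis can be automated. The sketch below assumes the row has been loaded into an object with the fields shown. `ReleasePackage`, `AccessIssue`, and `diagnoseAccess` are hypothetical names.

```typescript
// Hypothetical in-memory view of a vault_core.release_packages row.
interface ReleasePackage {
  status: string;        // e.g. 'pending', 'accessed', 'revoked'
  expiresAt: Date;       // expires_at
  downloadCount: number; // download_count
  maxDownloads: number;  // max_downloads
}

type AccessIssue = "package_revoked" | "link_expired" | "download_limit_exceeded";

// Returns every condition that would block the executor, not just the first,
// so support can resolve them in one pass.
function diagnoseAccess(pkg: ReleasePackage, now: Date): AccessIssue[] {
  const issues: AccessIssue[] = [];
  if (pkg.status === "revoked") issues.push("package_revoked");
  if (pkg.expiresAt.getTime() < now.getTime()) issues.push("link_expired");
  if (pkg.downloadCount >= pkg.maxDownloads) issues.push("download_limit_exceeded");
  return issues;
}
```

An empty result means none of the common issues apply and the problem lies elsewhere, for example on the executor's side.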


2.3 Death Verification Disputes

Scenario: Parties disagree about whether death verification is legitimate.

Approach:
1. We do not adjudicate truth. We record process.
2. Document that:
   - Party A submitted certificate on [date]
   - Party B contested on [date]
   - Challenge window was [extended/honored]
3. If parties cannot resolve:
   - Extend challenge window
   - Require additional verification (notarized documents, court order)
   - DO NOT release until dispute resolved
4. Escalate to legal counsel if court involvement likely


3. Emergency Controls

3.1 Big Red Button: Pause All Triggers

Use Case: System-wide emergency requiring all trigger execution to stop.

Command:

# Set emergency pause flag
kubectl create configmap trigger-emergency -n vault-core \
  --from-literal=PAUSE_ALL_EXECUTIONS=true

# Restart trigger executor to pick up flag
kubectl rollout restart deployment/trigger-executor -n vault-core

Verification:

# Check no executions are running (expect no output once the restart has completed)
kubectl logs -n vault-core deployment/trigger-executor --since=5m | grep "execution_started"

Resume:

kubectl delete configmap trigger-emergency -n vault-core
kubectl rollout restart deployment/trigger-executor -n vault-core

Audit Requirement: Every pause/resume MUST be logged with:
- Who initiated
- Reason
- Duration
- Triggers affected


3.2 Pause Single Vault/Trigger

Use Case: Specific vault or trigger needs to be paused without system-wide impact.

Command:

-- Pause specific trigger
UPDATE vault_core.triggers
SET paused = true,
    paused_at = NOW(),
    paused_by = '<operator_id>',
    pause_reason = '<reason>'
WHERE id = '<trigger_id>';

-- Audit entry created automatically via a database trigger

Invariant: Paused triggers MUST NOT fire. Trigger executor must check paused flag.
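One way the executor could enforce this invariant is a single guard evaluated before every execution. The sketch below is illustrative: it assumes the `PAUSE_ALL_EXECUTIONS` value from the emergency ConfigMap reaches the executor as an environment variable, which this runbook does not confirm, and `TriggerRow` and `mayFire` are hypothetical names.

```typescript
// Hypothetical subset of a vault_core.triggers row.
interface TriggerRow {
  id: string;
  paused: boolean;
}

// Returns true only when neither the system-wide pause flag nor the
// per-trigger paused flag is set. `env` would typically be process.env.
function mayFire(
  trigger: TriggerRow,
  env: Record<string, string | undefined>
): boolean {
  const globalPause = env["PAUSE_ALL_EXECUTIONS"] === "true";
  return !globalPause && !trigger.paused;
}
```

Checking both flags in one place keeps the system-wide pause and the single-trigger pause from drifting apart.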


3.3 Emergency User Account Lock

Use Case: Suspected account compromise or coercion.

Command:

-- Lock user account
UPDATE auth.users
SET locked = true,
    locked_at = NOW(),
    locked_reason = '<reason>'
WHERE id = '<user_id>';

-- Pause all user's triggers (same pause fields as in 3.2)
UPDATE vault_core.triggers
SET paused = true,
    paused_at = NOW(),
    paused_by = '<operator_id>',
    pause_reason = 'account_locked'
WHERE vault_id IN (
  SELECT id FROM vault_core.vaults WHERE owner_id = '<user_id>'
);

Recovery:
- Requires identity verification
- Requires management approval
- All paused triggers reviewed before resuming


4. Operational Monitoring

4.1 Key Metrics to Watch

| Metric | Alert Threshold | Action |
|---|---|---|
| Failed check-ins (24h) | > 10% of active triggers | Review notification delivery |
| Trigger execution failures | Any | Investigate immediately |
| Audit log gaps | Any | CRITICAL - investigate immediately |
| Release package access errors | > 5% | Review executor onboarding |
| Class B decryptions outside triggers | Any | SECURITY INCIDENT |
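The thresholds in this table can be evaluated mechanically from a metrics snapshot. The following sketch shows one possible shape. `MetricsSnapshot`, `Alert`, and `evaluateAlerts` are hypothetical names, and treating the 5% access-error threshold as a share of access attempts is an assumption, since the table does not name the denominator.

```typescript
// Hypothetical snapshot of the counters behind the table above.
interface MetricsSnapshot {
  activeTriggers: number;
  failedCheckins24h: number;
  triggerExecutionFailures: number;
  auditLogGaps: number;
  releasePackageAccessAttempts: number; // assumed denominator for the 5% rule
  releasePackageAccessErrors: number;
  classBDecryptionsOutsideTriggers: number;
}

interface Alert {
  metric: string;
  action: string;
}

function evaluateAlerts(m: MetricsSnapshot): Alert[] {
  const alerts: Alert[] = [];
  // Integer comparisons avoid floating-point edge cases at exactly 10% and 5%.
  if (m.failedCheckins24h * 10 > m.activeTriggers) {
    alerts.push({ metric: "Failed check-ins (24h)", action: "Review notification delivery" });
  }
  if (m.triggerExecutionFailures > 0) {
    alerts.push({ metric: "Trigger execution failures", action: "Investigate immediately" });
  }
  if (m.auditLogGaps > 0) {
    alerts.push({ metric: "Audit log gaps", action: "CRITICAL - investigate immediately" });
  }
  if (m.releasePackageAccessErrors * 20 > m.releasePackageAccessAttempts) {
    alerts.push({ metric: "Release package access errors", action: "Review executor onboarding" });
  }
  if (m.classBDecryptionsOutsideTriggers > 0) {
    alerts.push({ metric: "Class B decryptions outside triggers", action: "SECURITY INCIDENT" });
  }
  return alerts;
}
```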

4.2 Daily Operational Checks

  • [ ] Audit log integrity verification passes
  • [ ] No stuck triggers (armed > 30 days without check-in or execution)
  • [ ] SealedSecrets sync healthy
  • [ ] Notification delivery rates normal
  • [ ] No unauthorized access attempts logged
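The stuck-trigger check is the item on this list with a concrete numeric rule, so it lends itself to automation. The sketch below encodes one reading of that rule: a trigger is stuck if it was armed more than 30 days ago, has never executed, and has had no check-in since it was armed. The field names, `ArmedTrigger`, and `isStuck` are hypothetical.

```typescript
// Hypothetical view of an armed trigger; field names are assumptions.
interface ArmedTrigger {
  id: string;
  armedAt: Date;
  lastCheckinAt: Date | null;
  executedAt: Date | null;
}

const STUCK_AFTER_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

function isStuck(t: ArmedTrigger, now: Date): boolean {
  if (t.executedAt !== null) return false; // it executed, so it is not stuck
  const armedTooLong = now.getTime() - t.armedAt.getTime() > STUCK_AFTER_MS;
  const noCheckinSinceArmed =
    t.lastCheckinAt === null || t.lastCheckinAt.getTime() < t.armedAt.getTime();
  return armedTooLong && noCheckinSinceArmed;
}
```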

5. Governance

5.1 Who Can Do What

| Action | Requires |
|---|---|
| Pause single trigger | On-call operator |
| Pause all triggers | Senior operator + documented reason |
| Abort trigger | User request + identity verification |
| Emergency abort (no user) | Management approval |
| Revoke release package | Operator + documented reason |
| Extend release package | Operator + audit entry |
| Access audit logs | Any operator (read-only) |
| Modify audit logs | PROHIBITED |
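If operator tooling is built around this matrix, the requirements can be stored as data and checked before an action runs. The sketch below is one possible encoding. The action keys, requirement labels, and `missingRequirements` are hypothetical names that restate the table rather than an existing API.

```typescript
type OperatorAction =
  | "pause_single_trigger"
  | "pause_all_triggers"
  | "abort_trigger"
  | "emergency_abort"
  | "revoke_release_package"
  | "extend_release_package"
  | "read_audit_logs"
  | "modify_audit_logs";

// null marks an action that is prohibited outright.
const REQUIREMENTS: Record<OperatorAction, string[] | null> = {
  pause_single_trigger: ["on-call operator"],
  pause_all_triggers: ["senior operator", "documented reason"],
  abort_trigger: ["user request", "identity verification"],
  emergency_abort: ["management approval"],
  revoke_release_package: ["operator", "documented reason"],
  extend_release_package: ["operator", "audit entry"],
  read_audit_logs: ["operator"],
  modify_audit_logs: null,
};

// Returns the requirements not yet satisfied; throws for prohibited actions.
function missingRequirements(action: OperatorAction, satisfied: string[]): string[] {
  const required = REQUIREMENTS[action];
  if (required === null) {
    throw new Error(action + " is prohibited");
  }
  return required.filter((r) => satisfied.indexOf(r) === -1);
}
```

An action would proceed only when the returned list is empty.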

5.2 Audit Requirements

Every operator action must be logged with:
- Operator identity
- Action taken
- Reason/justification
- Affected resources
- Timestamp

Operator actions are subject to the same hash-chaining as user actions.
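This runbook does not specify the hash-chaining scheme, so the following is a generic sketch of the idea rather than the system's actual format: each entry's hash covers the previous entry's hash plus the five required fields, so editing any earlier entry breaks verification. SHA-256, the JSON serialization, the `GENESIS` sentinel, and all type and function names are assumptions.

```typescript
import { createHash } from "node:crypto";

interface OperatorAuditEntry {
  operatorId: string;
  action: string;
  reason: string;
  affectedResources: string[];
  timestamp: string; // ISO 8601
}

interface ChainedEntry extends OperatorAuditEntry {
  prevHash: string;
  hash: string;
}

const GENESIS = "GENESIS"; // sentinel used as prevHash of the first entry (assumption)

// Hashes the previous hash together with the entry fields in a fixed order.
function entryHash(prevHash: string, e: OperatorAuditEntry): string {
  const canonical = JSON.stringify([
    prevHash, e.operatorId, e.action, e.reason, e.affectedResources, e.timestamp,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Returns a new chain with the entry appended; the input chain is not mutated.
function appendEntry(chain: ChainedEntry[], e: OperatorAuditEntry): ChainedEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const chained: ChainedEntry = { ...e, prevHash, hash: entryHash(prevHash, e) };
  return chain.concat([chained]);
}

// Recomputes every hash from the start; false means a gap or modification.
function verifyChain(chain: ChainedEntry[]): boolean {
  let prevHash = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prevHash || entry.hash !== entryHash(prevHash, entry)) {
      return false;
    }
    prevHash = entry.hash;
  }
  return true;
}
```

A verification pass of this kind is what the daily "audit log integrity verification" check would run, under the same assumptions.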


This is a v0 stub. Expand with specific procedures, contact lists, and detailed playbooks before production launch.