Operational Runbooks: v0 Stubs
Purpose: Lightweight operational procedures for incident response, support escalation, and emergency controls.
Status: Stub document for Phase 0.5. Expand before production.
1. Incident Response
1.1 Key Compromise Response
Scenario: Release keys (stored in SealedSecrets) may have been compromised.
Severity: CRITICAL
Immediate Actions:
1. Pause all trigger executions (see "Big Red Button: Pause All Triggers" in section 3.1)
2. Identify scope of compromise:
# List all vaults with release keys
kubectl get secrets -n vault-core -l type=release-key -o name
# Check audit log for unauthorized access
kubectl logs -n vault-core deployment/trigger-executor --since=24h | grep "release-key"
-- Review audit events from the last 24 hours that indicate key use
SELECT * FROM vault_core.audit_log
WHERE type IN ('master_key_unwrapped', 'class_b_decryption')
  AND timestamp > NOW() - INTERVAL '24 hours'
ORDER BY timestamp DESC;
Key Rotation Procedure (Phase 0.5 - Server Escrow):
#!/bin/bash
# rotate-release-key.sh - Rotate release key for a single vault
# Usage: ./rotate-release-key.sh <vault_id>
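set -euo pipefail

VAULT_ID="${1:?Usage: $0 <vault_id>}"
NAMESPACE="vault-core"

# Reject IDs that could break out of the SQL string literals below
# (assumption: vault IDs contain only alphanumerics and hyphens, e.g. UUIDs)
if [[ ! "$VAULT_ID" =~ ^[A-Za-z0-9-]+$ ]]; then
  echo "Invalid vault ID: $VAULT_ID" >&2
  exit 1
fi

# 1. Generate new release key
NEW_RELEASE_KEY=$(openssl rand -base64 32)

# 2. Get current escrowed master key from database (-A -t: unaligned, tuples only)
ESCROWED_MASTER_KEY=$(psql -At -c "
  SELECT encode(escrowed_master_key, 'base64')
  FROM vault_core.vaults WHERE id = '$VAULT_ID'
")

# 3. Decrypt master key with OLD release key (requires old key still available)
OLD_RELEASE_KEY=$(kubectl get secret -n "$NAMESPACE" "vault-$VAULT_ID-release-key" \
  -o jsonpath='{.data.key}' | base64 -d)

# 4. Re-encrypt master key with NEW release key
# (This step requires the crypto service - shown as pseudocode)
# NEW_ESCROWED_MASTER_KEY = encrypt(decrypt(ESCROWED_MASTER_KEY, OLD_RELEASE_KEY), NEW_RELEASE_KEY)
# Abort here until step 4 is implemented; otherwise step 5 would overwrite
# the escrowed master key with an empty value.
: "${NEW_ESCROWED_MASTER_KEY:?step 4 (re-encryption) is not implemented}"
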
# 5. Update database with new escrowed master key
psql -c "
UPDATE vault_core.vaults
SET escrowed_master_key = decode('$NEW_ESCROWED_MASTER_KEY', 'base64'),
release_key_rotated_at = NOW()
WHERE id = '$VAULT_ID'
"
# 6. Create new sealed secret
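# Set the namespace on the Secret so kubeseal binds the SealedSecret to it
kubectl create secret generic "vault-$VAULT_ID-release-key" \
  -n "$NAMESPACE" \
  --from-literal=key="$NEW_RELEASE_KEY" \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > "vault-$VAULT_ID-release-key-sealed.yaml"

# 7. Apply new sealed secret
kubectl apply -f "vault-$VAULT_ID-release-key-sealed.yaml" -n "$NAMESPACE"

# 8. Delete old secret (after verification)
# Note: nothing above creates a "-old" copy, so this is a no-op unless an
# operator saved one manually before rotating.
kubectl delete secret "vault-$VAULT_ID-release-key-old" -n "$NAMESPACE" 2>/dev/null || true
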
# 9. Audit the rotation
psql -c "
INSERT INTO vault_core.audit_log (type, resource_type, resource_id, payload)
VALUES ('release_key_rotated', 'vault', '$VAULT_ID',
'{\"reason\": \"key_compromise_response\", \"rotated_by\": \"operator\"}'::jsonb)
"
echo "Release key rotated for vault $VAULT_ID"
Batch Rotation (all vaults):
#!/bin/bash
# rotate-all-release-keys.sh - Emergency rotation of all release keys
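set -euo pipefail

# 1. Get all vault IDs (-A -t: unaligned, tuples only)
VAULT_IDS=$(psql -At -c "SELECT id FROM vault_core.vaults WHERE escrowed_master_key IS NOT NULL")

# 2. Rotate each vault; keep going on failure so one bad vault does not block the rest
FAILED=""
for VAULT_ID in $VAULT_IDS; do
  echo "Rotating vault: $VAULT_ID"
  ./rotate-release-key.sh "$VAULT_ID" || FAILED="$FAILED $VAULT_ID"
done

# 3. Invalidate any pending release packages (they used old keys)
psql -c "
  UPDATE vault_core.release_packages
  SET status = 'revoked',
      revoke_reason = 'key_rotation_emergency'
  WHERE status IN ('pending', 'accessed')
    AND created_at < NOW()
"

# Report vaults whose rotation failed so they can be retried
if [ -n "$FAILED" ]; then
  echo "Pending packages revoked, but rotation FAILED for:$FAILED" >&2
  exit 1
fi
echo "All release keys rotated. Pending packages revoked."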
Recovery:
- Verify rotation completed for all affected vaults
- Test trigger execution on a test vault
- Resume trigger execution after verification
- Monitor for any access failures
1.2 Trigger Misfire Response
Scenario: Trigger executed when it shouldn't have (false positive).
Severity: HIGH
Immediate Actions:
1. Attempt abort if trigger is in abortable state:
await abortTrigger(triggerId, {
  reason: 'operator_intervention',
  bypassAuth: true, // Emergency override
});
Post-Incident:
- Adjust trigger configuration if needed
- Review multi-signal requirements
- Document in incident log
1.3 Data Breach Response
Scenario: Unauthorized access to database or storage.
Severity: CRITICAL
Immediate Actions:
1. Assess scope:
   - Which tables/buckets were accessed?
   - Was it read-only or read-write?
   - Class C documents: ciphertext only (limited impact)
   - Class B documents: escrow keys needed (check SealedSecrets)
   - Class A documents: potential plaintext exposure
2. Contain breach:
   - Rotate database credentials
   - Revoke compromised service accounts
   - Review network policies
3. Notify users per data breach requirements:
   - Class A data: notify all affected users
   - Class B/C data: notify that ciphertext was accessed (limited impact)
4. Preserve evidence for forensics
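The per-class guidance above can be captured in a small lookup so that responders apply it consistently. This is a minimal sketch; the function name `breach_guidance` and its output wording are illustrative assumptions, not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a document class (A, B, or C) to the exposure
# assessment and notification guidance from the data breach runbook.
breach_guidance() {
  case "$1" in
    A) echo "potential plaintext exposure: notify all affected users" ;;
    B) echo "escrow keys needed to decrypt: check SealedSecrets, notify that ciphertext was accessed" ;;
    C) echo "ciphertext only: notify that ciphertext was accessed" ;;
    *) echo "unknown document class: $1" >&2; return 1 ;;
  esac
}
```

For example, `breach_guidance B` prints the Class B guidance, and an unrecognized class returns a non-zero status so that a calling script stops instead of guessing.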
2. Support Escalation Paths
2.1 Contested Trigger (User Says "I'm Not Dead")
Scenario: User contacts support claiming trigger fired incorrectly while they're alive.
Priority: URGENT (time-sensitive)
Verification Steps:
1. Verify identity:
   - Require MFA verification
   - Security questions if MFA unavailable
   - Video call for high-stakes situations
2. Check trigger state:
   - If abortable: abort immediately upon verification
   - If released: revoke release package, contact executor
3. Document verification method and outcome
Escalation:
- If identity cannot be verified: DO NOT abort without management approval
- If executor contests: require legal documentation
- If user claims coercion: activate safety protocol (see ExitMap)
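The verification and escalation rules above reduce to a short decision function. The sketch below is an assumption about how support tooling might encode them; the function name and the `yes`/`no` argument convention are hypothetical, while the `abortable` and `released` states come from the steps above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide the next action for a contested trigger.
# Args: identity_verified (yes|no), trigger_state (abortable|released)
contested_trigger_action() {
  local verified="$1" state="$2"
  # Unverified identity always escalates, whatever the trigger state
  if [ "$verified" != "yes" ]; then
    echo "escalate: do not abort without management approval"
    return 0
  fi
  case "$state" in
    abortable) echo "abort trigger immediately" ;;
    released)  echo "revoke release package and contact executor" ;;
    *)         echo "unknown trigger state: $state" >&2; return 1 ;;
  esac
}
```

Checking identity first means an unverified caller can never reach the abort branch.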
2.2 Executor Access Issues
Scenario: Executor cannot access released documents.
Troubleshooting:
1. Check release package status:
SELECT * FROM vault_core.release_packages
WHERE executor_id = '<executor_id>'
AND trigger_id = '<trigger_id>';
2. Check for common issues:
   - Package expired (expires_at < NOW())
   - Download limit exceeded (download_count >= max_downloads)
   - Package revoked (status = 'revoked')
3. Resolution options:
   - Regenerate access link (if within policy)
   - Extend expiry (requires audit entry)
   - Increase download limit (requires audit entry)
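The conditions listed above can be checked mechanically once the row from `vault_core.release_packages` is in hand. The following sketch assumes `expires_at` has been converted to Unix epoch seconds; the function name and the order of checks are illustrative choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify why an executor cannot access a package.
# Args: status, expires_at (epoch seconds), download_count, max_downloads, [now]
diagnose_release_package() {
  local status="$1" expires_at="$2" downloads="$3" max_downloads="$4"
  local now="${5:-$(date +%s)}"   # defaults to the current time
  if [ "$status" = "revoked" ]; then
    echo "revoked"
  elif [ "$expires_at" -lt "$now" ]; then
    echo "expired"
  elif [ "$downloads" -ge "$max_downloads" ]; then
    echo "download_limit_exceeded"
  else
    echo "ok"
  fi
}
```

Revocation is tested first because extending the expiry or raising the download limit would not help a revoked package.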
2.3 Death Verification Disputes
Scenario: Parties disagree about whether death verification is legitimate.
Approach:
1. We do not adjudicate truth. We record process.
2. Document that:
   - Party A submitted certificate on [date]
   - Party B contested on [date]
   - Challenge window was [extended/honored]
3. If parties cannot resolve:
   - Extend challenge window
   - Require additional verification (notarized documents, court order)
   - DO NOT release until dispute resolved
4. Escalate to legal counsel if court involvement likely
3. Emergency Controls
3.1 Big Red Button: Pause All Triggers
Use Case: System-wide emergency requiring all trigger execution to stop.
Command:
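# Set emergency pause flag (apply is idempotent, so this also works if the ConfigMap already exists)
kubectl create configmap trigger-emergency -n vault-core \
  --from-literal=PAUSE_ALL_EXECUTIONS=true \
  --dry-run=client -o yaml | kubectl apply -f -
# Restart trigger executor to pick up flag
kubectl rollout restart deployment/trigger-executor -n vault-core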
Verification:
# Check no executions are running
kubectl logs -n vault-core deployment/trigger-executor | grep "execution_started"
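The manual `grep` above relies on an operator reading the output. A small wrapper can turn it into a pass/fail check; this is a sketch under the assumption that every execution logs a line containing `execution_started`, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and succeed only if none
# contain "execution_started".
assert_no_executions() {
  local count
  # grep -c prints the match count; it exits 1 on zero matches, hence || true
  count=$(grep -c "execution_started" || true)
  if [ "$count" -gt 0 ]; then
    echo "FAIL: $count execution(s) started" >&2
    return 1
  fi
  echo "OK: no executions started"
}
```

It could be used as `kubectl logs -n vault-core deployment/trigger-executor --since=5m | assert_no_executions`, where `--since` limits the check to lines logged after the restart.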
Resume:
kubectl delete configmap trigger-emergency -n vault-core
kubectl rollout restart deployment/trigger-executor -n vault-core
Audit Requirement: Every pause/resume MUST be logged with:
- Who initiated
- Reason
- Duration
- Triggers affected
3.2 Pause Single Vault/Trigger
Use Case: Specific vault or trigger needs to be paused without system-wide impact.
Command:
-- Pause specific trigger
UPDATE vault_core.triggers
SET paused = true,
paused_at = NOW(),
paused_by = '<operator_id>',
pause_reason = '<reason>'
WHERE id = '<trigger_id>';
-- Audit entry created automatically via trigger
Invariant: Paused triggers MUST NOT fire. Trigger executor must check paused flag.
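One way to honor this invariant is a fail-closed guard that the executor evaluates before every firing. The sketch below is illustrative only: the function name is hypothetical, and it assumes the per-trigger `paused` column and the system-wide `PAUSE_ALL_EXECUTIONS` flag are both available as the strings `true` or `false`.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only when neither pause control is set.
# Args: trigger_paused (true|false), pause_all (true|false)
# Fail closed: any value other than an explicit "false" blocks execution.
may_execute_trigger() {
  local trigger_paused="$1" pause_all="$2"
  [ "$trigger_paused" = "false" ] && [ "$pause_all" = "false" ]
}
```

Requiring an explicit `false` for both inputs means a missing or malformed flag blocks the trigger rather than letting it fire.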
3.3 Emergency User Account Lock
Use Case: Suspected account compromise or coercion.
Command:
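-- Run both updates in one transaction so the account is never locked
-- while its triggers are still active
BEGIN;
-- Lock user account
UPDATE auth.users
SET locked = true,
    locked_at = NOW(),
    locked_reason = '<reason>'
WHERE id = '<user_id>';
-- Pause all user's triggers
UPDATE vault_core.triggers
SET paused = true,
    paused_at = NOW(),
    paused_by = '<operator_id>',
    pause_reason = 'account_locked'
WHERE vault_id IN (
    SELECT id FROM vault_core.vaults WHERE owner_id = '<user_id>'
);
COMMIT;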
Recovery:
- Requires identity verification
- Requires management approval
- All paused triggers reviewed before resuming
4. Operational Monitoring
4.1 Key Metrics to Watch
| Metric | Alert Threshold | Action |
|---|---|---|
| Failed check-ins (24h) | > 10% of active triggers | Review notification delivery |
| Trigger execution failures | Any | Investigate immediately |
| Audit log gaps | Any | CRITICAL - investigate immediately |
| Release package access errors | > 5% | Review executor onboarding |
| Class B decryptions outside triggers | Any | SECURITY INCIDENT |
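The percentage thresholds in the table can be evaluated with integer arithmetic, which avoids floating-point handling in shell. This is a minimal sketch for the failed check-in row; the function name is an assumption, and the counts would come from whatever metrics source is in use.

```bash
#!/usr/bin/env bash
# Hypothetical check: alert when failed check-ins exceed 10% of active triggers.
# Args: failed_checkins, active_triggers. Exit status 0 means "alert".
checkin_alert() {
  local failed="$1" active="$2"
  # failed/active > 10/100 is equivalent to failed*100 > active*10
  [ $((failed * 100)) -gt $((active * 10)) ]
}
```

With 100 active triggers, 11 failures trigger the alert and exactly 10 do not, which matches the strict "> 10%" threshold.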
4.2 Daily Operational Checks
- [ ] Audit log integrity verification passes
- [ ] No stuck triggers (armed > 30 days without check-in or execution)
- [ ] SealedSecrets sync healthy
- [ ] Notification delivery rates normal
- [ ] No unauthorized access attempts logged
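The stuck-trigger check in this list is a simple age comparison. The sketch below assumes the time of the last check-in or execution is available as Unix epoch seconds; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: a trigger is "stuck" when more than 30 days have
# passed since its last check-in or execution.
# Args: last_activity (epoch seconds), [now] (epoch seconds, defaults to current time)
is_stuck_trigger() {
  local last_activity="$1"
  local now="${2:-$(date +%s)}"
  local threshold=$((30 * 24 * 60 * 60))   # 30 days in seconds
  [ $((now - last_activity)) -gt "$threshold" ]
}
```

Passing `now` explicitly keeps the check reproducible when it is run against a database snapshot.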
5. Governance
5.1 Who Can Do What
| Action | Requires |
|---|---|
| Pause single trigger | On-call operator |
| Pause all triggers | Senior operator + documented reason |
| Abort trigger | User request + identity verification |
| Emergency abort (no user) | Management approval |
| Revoke release package | Operator + documented reason |
| Extend release package | Operator + audit entry |
| Access audit logs | Any operator (read-only) |
| Modify audit logs | PROHIBITED |
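Operator tooling could consult the table above before running an action. The following sketch encodes it as a lookup; the action keys (for example `pause_all_triggers`) and the function name are hypothetical, while the requirement strings are copied from the table.

```bash
#!/usr/bin/env bash
# Hypothetical lookup: print the approval required for an operator action.
# Returns non-zero for prohibited or unknown actions.
required_approval() {
  case "$1" in
    pause_single_trigger)    echo "On-call operator" ;;
    pause_all_triggers)      echo "Senior operator + documented reason" ;;
    abort_trigger)           echo "User request + identity verification" ;;
    emergency_abort)         echo "Management approval" ;;
    revoke_release_package)  echo "Operator + documented reason" ;;
    extend_release_package)  echo "Operator + audit entry" ;;
    access_audit_logs)       echo "Any operator (read-only)" ;;
    modify_audit_logs)       echo "PROHIBITED"; return 1 ;;
    *)                       echo "unknown action: $1" >&2; return 1 ;;
  esac
}
```

Returning a non-zero status for `modify_audit_logs` lets a wrapper script refuse the action outright.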
5.2 Audit Requirements
Every operator action must be logged with:
- Operator identity
- Action taken
- Reason/justification
- Affected resources
- Timestamp
Operator actions are subject to the same hash-chaining as user actions.
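This document does not specify how the hash chain is computed. As an illustration of the general idea only, the sketch below links each entry to its predecessor with SHA-256, so changing any earlier entry changes the final hash. The function names, the `genesis` seed, the `|` separator, and the use of `sha256sum` from GNU coreutils are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of hash-chaining audit entries.
# chain_hash PREV_HASH ENTRY prints the SHA-256 hex digest of "PREV_HASH|ENTRY".
chain_hash() {
  printf '%s|%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

# Reads audit entries from stdin (one per line) and prints the chain head,
# the hash that depends on every entry and on their order.
chain_head() {
  local head="genesis" entry
  while IFS= read -r entry; do
    head=$(chain_hash "$head" "$entry")
  done
  echo "$head"
}
```

Recomputing the chain head over the stored entries and comparing it with a separately recorded value is one way an integrity verification could detect a modified or removed entry.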
This is a v0 stub. Expand with specific procedures, contact lists, and detailed playbooks before production launch.