
Infrastructure Audit — Current State Analysis


Overview

This document captures the current infrastructure capabilities relevant to building the Vault Platform, identifying what exists and what needs to be built.

Audit Date: 2025-12-16


Current Infrastructure Summary

What We Have

| Capability | Status | Notes |
| --- | --- | --- |
| K8s Cluster | Production | 3-node k3s, namespace isolation |
| Authentik SSO | Production | OAuth/OIDC, MFA, group-based RBAC |
| Supabase | Production | PostgreSQL, Storage, schema isolation |
| Sealed Secrets | Production | Encrypted K8s secrets |
| Background Workers | Pattern exists | Celery + Redis (subtitleai) |
| Monitoring | Production | Prometheus, Grafana, Loki |
| Deploy Scripts | Automated | Per-app deployment patterns |

What We Don't Have

| Capability | Status | Priority |
| --- | --- | --- |
| End-to-End Encryption | Not built | CRITICAL |
| Field-Level Encryption | Not built | CRITICAL |
| Audit Logging | Not built | CRITICAL |
| Dead Man's Switch | Not built | CRITICAL |
| Notification Service | Not built | HIGH |
| Document Generation | Not built | HIGH |
| Multi-Party Access | Partial | HIGH |
| Triggered Actions | Not built | HIGH |
| Key Management | Basic | MEDIUM |
| Data Export | Not built | MEDIUM |

Detailed Assessment

Kubernetes Cluster

Strengths:

  • Namespace isolation per app
  • Longhorn distributed storage
  • Private container registry
  • Ingress with TLS termination

Gaps for Vault Platform:

  • No network policies (pod-to-pod communication not restricted)
  • No mTLS between services
  • No dedicated namespace for sensitive workloads

Recommendation: Create vault-core namespace with stricter network policies.
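As a starting point for the stricter policies recommended above, the sketch below builds two NetworkPolicy manifests for the vault-core namespace: a default deny for all traffic, and an allowance for ingress between pods in the same namespace. The policy names, the helper function names, and the decision to allow same-namespace ingress are assumptions made for illustration; they are not part of the audited setup.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Listing both types with no rules blocks traffic in both directions.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_same_namespace_policy(namespace: str) -> dict:
    """Allow ingress only from pods that live in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A bare podSelector under "from" selects pods in this namespace only.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            default_deny_policy("vault-core"),
            allow_same_namespace_policy("vault-core"),
        ],
    }
    # kubectl accepts JSON as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Because the first policy also denies egress, pods in the namespace lose DNS resolution and outbound access until explicit egress rules are added, so a real rollout would need further allowances for DNS, the database, and any external services.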


Authentication (Authentik)

Strengths:

  • MFA required on first login
  • Group-based RBAC
  • OAuth/OIDC standard compliance
  • WebAuthn support (phishing-resistant)

Gaps for Vault Platform:

  • No "executor" role type
  • No delegation/power-of-attorney patterns
  • No death verification flow
  • No secondary contact verification

Recommendation: Extend Authentik groups for vault-specific roles; build custom flows for death verification.


Database (Supabase)

Strengths:

  • Schema-based multi-app isolation
  • Row-Level Security (RLS)
  • Storage API for files
  • Real-time subscriptions

Gaps for Vault Platform:

  • No encryption at rest beyond standard PostgreSQL
  • No audit table patterns
  • No trigger-based change tracking
  • RLS policies need extension for multi-party access

Recommendation:

  • Create vault_core schema for shared infrastructure
  • Implement append-only audit tables with triggers
  • Extend RLS for conditional access patterns


Secret Management

Strengths:

  • Sealed Secrets for K8s secrets
  • Secrets not in git (gitignored)

Gaps for Vault Platform:

  • No user key management
  • No key rotation automation
  • No escrow/recovery system
  • No HSM or external KMS

Recommendation:

  • Build key management service in application layer
  • Consider external KMS for production (AWS KMS, HashiCorp Vault)
  • Implement M-of-N recovery in application code
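The audit does not say how M-of-N recovery would be implemented. One common technique is Shamir's Secret Sharing, and the sketch below illustrates it under that assumption: a key is split into N shares, and any M of them reconstruct it. The function names, the choice of prime field, and the 3-of-5 parameters in the usage note are illustrative only, and a production system would more likely rely on an audited library than on hand-written field arithmetic.

```python
import secrets

# 2**521 - 1 is a Mersenne prime, large enough to hold any secret of up to 64 bytes.
PRIME = 2**521 - 1


def split_secret(secret: bytes, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    value = int.from_bytes(secret, "big")
    if value >= PRIME:
        raise ValueError("secret is too large for the field")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]], length: int) -> bytes:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total.to_bytes(length, "big")
```

For example, `split_secret(key, threshold=3, shares=5)` yields five shares, and passing any three of them to `recover_secret(..., length=len(key))` returns the original key. Fewer than three shares reveal nothing about the key, which is the property an escrow or recovery design would depend on.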


Background Processing

Strengths:

  • Celery + Redis pattern exists (subtitleai)
  • Worker deployment patterns established

Gaps for Vault Platform:

  • No scheduled job management
  • No trigger evaluation loop
  • No notification queue
  • No retry/failure handling patterns

Recommendation:

  • Extend Celery for trigger evaluation (runs every 5 minutes)
  • Add dedicated notification worker
  • Implement dead letter queue for failed jobs
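To make the trigger evaluation recommendation concrete, here is a minimal sketch of the check a periodic task might run for a dead man's switch. The three states, the field names, and the grace period are assumptions; the audit only specifies that evaluation runs every 5 minutes. The `BEAT_SCHEDULE` dictionary follows the shape of a Celery `beat_schedule` entry, with a hypothetical task name, and is written as plain data so the example runs without Celery installed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class TriggerState(Enum):
    ARMED = "armed"      # owner checked in recently, nothing to do
    WARNING = "warning"  # check-in overdue, still inside the grace period
    FIRED = "fired"      # grace period elapsed, run the triggered actions


@dataclass(frozen=True)
class DeadMansSwitch:
    last_check_in: datetime
    check_in_interval: timedelta
    grace_period: timedelta


def evaluate_switch(switch: DeadMansSwitch, now: datetime) -> TriggerState:
    """Pure function: derive the state from timestamps, with no side effects."""
    overdue_at = switch.last_check_in + switch.check_in_interval
    if now < overdue_at:
        return TriggerState.ARMED
    if now < overdue_at + switch.grace_period:
        return TriggerState.WARNING
    return TriggerState.FIRED


# Shape of a Celery beat entry that would run the evaluation every 5 minutes.
BEAT_SCHEDULE = {
    "evaluate-triggers": {
        "task": "vault.tasks.evaluate_triggers",  # hypothetical task name
        "schedule": 300.0,  # seconds
    },
}
```

Keeping the evaluation a pure function of timestamps means a missed or duplicated run does not change the outcome: the next run derives the same state from the stored check-in time.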


Monitoring & Observability

Strengths:

  • Prometheus metrics collection
  • Grafana dashboards
  • Loki log aggregation
  • AlertManager for alerts

Gaps for Vault Platform:

  • No audit-specific dashboards
  • No trigger execution monitoring
  • No security event alerting
  • No compliance reporting

Recommendation:

  • Create vault-specific Grafana dashboards
  • Add alerts for security-relevant events
  • Implement audit trail integrity monitoring


Existing Patterns to Leverage

From Home Portal

  • Authentik OAuth integration
  • Group-based RBAC
  • Supabase RLS patterns
  • JWT bridge for auth

Reusable: Auth middleware, session management, RLS helpers

From Money Tracker

  • Financial data handling
  • User data isolation
  • Export patterns (if any)

Reusable: Data export approach, privacy patterns

From Subtitleai

  • Background worker architecture
  • Job queue patterns
  • Long-running task handling

Reusable: Celery worker setup, polling patterns

From RMS

  • AI integration patterns
  • Complex form handling
  • Multi-step workflows

Reusable: Form patterns, workflow state management


New Infrastructure Required

Tier 1: Foundation (Must Build)

| Component | Description | Effort |
| --- | --- | --- |
| Encryption Service | Client/server encryption, all three classes | 1 week |
| Key Management | User keys, escrow, recovery | 1 week |
| Audit Logger | Append-only, hash-chained, signed | 1 week |
| Trigger Engine | State machine, evaluation loop | 1-2 weeks |
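The "hash-chained" property listed for the Audit Logger can be sketched as follows: each entry stores the hash of the previous entry, so altering or removing any record invalidates every hash after it. The entry layout, the all-zero genesis value, and the function names are assumptions for illustration. Signing, which the table also lists, is not covered here.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> dict:
    """Append a new entry linked to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash from the start; any edit or deletion breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A periodic job could run `verify_chain` over the stored log and raise an alert on a `False` result. On its own the chain does not detect truncation of the most recent entries, which is the gap that signing or external anchoring of the latest hash would close.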

Tier 2: Services (Should Build)

| Component | Description | Effort |
| --- | --- | --- |
| Notification Service | Email, SMS, push, templates | 1 week |
| Document Generation | PDF generation, templates | 3 days |
| Export Service | Data export in open format | 3 days |

Tier 3: Enhancement (Nice to Have)

| Component | Description | Effort |
| --- | --- | --- |
| External KMS Integration | AWS KMS or HashiCorp Vault | 1 week |
| Blockchain Anchoring | Periodic hash anchoring | 3 days |
| Multi-region | Disaster recovery | 2+ weeks |

Database Schema Requirements

New Schema: vault_core

-- Core vault infrastructure shared by all products

-- Documents with encryption metadata
CREATE TABLE vault_core.documents (...);

-- User encryption keys
CREATE TABLE vault_core.user_keys (...);

-- Append-only audit log
CREATE TABLE vault_core.audit_log (...);

-- Triggers (dead man's switch, scheduled, etc.)
CREATE TABLE vault_core.triggers (...);

-- Check-ins for dead man's switch
CREATE TABLE vault_core.check_ins (...);

-- Access grants (multi-party sharing)
CREATE TABLE vault_core.access_grants (...);

-- Notifications sent
CREATE TABLE vault_core.notifications (...);

Product-Specific Schemas

Each product gets its own schema for product-specific data:

  • witness_vault.*
  • final_file.*
  • last_mile.*
  • parent_trap.*
  • exit_map.*
  • ghost_protocol.*


External Services Required

Tier 1: Essential

| Service | Purpose | Options |
| --- | --- | --- |
| Email | Notifications, alerts | Resend, SendGrid, SES |
| SMS | Check-in reminders, alerts | Twilio, SNS |

Tier 2

| Service | Purpose | Options |
| --- | --- | --- |
| Push Notifications | Mobile alerts | Firebase, OneSignal |
| Error Tracking | Production debugging | Sentry |

Tier 3: Future

| Service | Purpose | Options |
| --- | --- | --- |
| External KMS | Key management | AWS KMS, HashiCorp Vault |
| E-Signature | Legal documents | DocuSign, HelloSign |
| Video Processing | Legacy recordings | Cloudflare Stream, Mux |

Security Hardening Required

Before Production

  • [ ] Network policies for vault namespace
  • [ ] Pod Security Standards, enforced via Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
  • [ ] Secret rotation procedures
  • [ ] Penetration testing
  • [ ] Security audit of encryption implementation
  • [ ] Incident response runbook

Production Monitoring

  • [ ] Failed authentication alerting
  • [ ] Trigger execution monitoring
  • [ ] Audit log integrity checks
  • [ ] Anomaly detection for access patterns

Estimated Infrastructure Investment

Phase 0.5 (Thin Slice)

  • Minimal new infrastructure
  • Use existing Supabase, add encryption library
  • Basic worker for trigger evaluation
  • Email via Resend

Effort: 1-2 developer-weeks

Phase 1 (Foundation)

  • Full encryption service
  • Key management service
  • Audit logger with integrity
  • Trigger engine
  • Notification service

Effort: 4-5 developer-weeks

Production Hardening

  • Security audit
  • Network policies
  • Monitoring dashboards
  • Runbooks

Effort: 2-3 developer-weeks


Recommendations

Immediate Actions

  1. Set up vault-core namespace in K8s with network policies
  2. Create Resend account for email notifications
  3. Create Twilio account for SMS
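  4. Select encryption library (recommendation: a NaCl-family library, either libsodium or TweetNaCl; note that TweetNaCl is a separate implementation of the NaCl API, not a libsodium binding)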

Short-Term

  1. Build encryption service as first foundation component
  2. Extend Supabase schema with vault_core tables
  3. Set up trigger worker using existing Celery patterns

Medium-Term

  1. Evaluate external KMS for production key management
  2. Implement blockchain anchoring for audit trail
  3. Consider multi-region for disaster recovery

This audit identifies the gap between current infrastructure and Vault Platform requirements. Most gaps are addressable with application-layer development on existing infrastructure.