Phase 6: Hardening - Implementation Plan

Target Version: 0.7.0
Status: Planning
Created: 2025-12-23

Overview

Phase 6 transforms the intent system from "working" to "production-ready" with:

- Reproducibility (replay from audit logs)
- Integrity verification (automated chain checks)
- Observability (metrics, alerting)
- Usability (documentation)


Task Breakdown

Task 1: Replay Capability (P1)

Goal: Re-run any past intent execution from its audit log.

Use Cases:

- Debugging: "Why did deploy fail last Tuesday?"
- Disaster recovery: "Re-run the migration that worked"
- Testing: "Replay this intent against staging"

Script: scripts/replay-intent.sh

Interface:

# Replay exact execution (same params, context)
./scripts/replay-intent.sh <request_id>

# Replay with dry-run first
./scripts/replay-intent.sh <request_id> --dry-run

# Replay with fresh context (new git sha, etc.)
./scripts/replay-intent.sh <request_id> --fresh-context

# Replay with modified params
./scripts/replay-intent.sh <request_id> --override app=money-tracker

Implementation Steps:

  1. Extract intent and params from audit log

    # Locate the audit log for <request_id> (path layout assumed), then
    # read the intent and params from the intent_received event
    log_file=$(find logs/ -name "${request_id}*.jsonl" | head -1)
    intent=$(jq -r 'select(.event=="intent_received") | .intent' < "$log_file")
    params=$(jq -c 'select(.event=="intent_received") | .params' < "$log_file")
    intent_file=$(find intents/ -name "${intent}.yaml" | head -1)
    

  2. Reconstruct execution command

    # Build a key=value string from the params object
    params_str=$(echo "$params" | jq -r 'to_entries | map("\(.key)=\(.value)") | join(" ")')

    # Execute (word-splitting of $params_str is intentional; values
    # containing spaces would need per-param quoting)
    ./scripts/run-intent.sh "$intent_file" --params $params_str
    

  3. Add replay metadata to audit log

    {
      "event": "intent_received",
      "replay_of": "req_original_123",
      "replay_mode": "exact|fresh-context|override"
    }
    

  4. Handle edge cases (a combined sketch follows this list)

     - Intent file changed since original run → warn user
     - Context capture commands fail → use cached values or fail
     - Original run used --confirm → require again

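Putting the steps together, a minimal sketch of the replay flow. The logs/ path layout, the intent_sha field used for the drift check, and the --override merge semantics are illustrative assumptions, not existing behavior:

#!/bin/bash
# replay-intent.sh <request_id> [--override k=v] -- minimal sketch
set -euo pipefail

request_id="$1"
log_file=$(find logs/ -name "${request_id}*.jsonl" | head -1)
[[ -n "$log_file" ]] || { echo "ERROR: no audit log for ${request_id}" >&2; exit 1; }

# Step 1: recover intent and params from the original intent_received event
intent=$(jq -r 'select(.event=="intent_received") | .intent' < "$log_file")
params=$(jq -c 'select(.event=="intent_received") | .params' < "$log_file")
intent_file=$(find intents/ -name "${intent}.yaml" | head -1)

# --override k=v: merge a new value into the original params (assumed semantics)
if [[ "${2:-}" == "--override" ]]; then
  ovr="${3:?usage: --override key=value}"
  params=$(echo "$params" | jq -c --arg k "${ovr%%=*}" --arg v "${ovr#*=}" '. + {($k): $v}')
fi

# Edge case: warn if the intent file drifted, assuming a hypothetical
# intent_sha field was recorded at original execution time
orig_sha=$(jq -r 'select(.event=="intent_received") | .intent_sha // empty' < "$log_file")
curr_sha=$(sha256sum "$intent_file" | cut -d' ' -f1)
if [[ -n "$orig_sha" && "$orig_sha" != "$curr_sha" ]]; then
  echo "WARN: ${intent_file} changed since the original run" >&2
fi

# Step 2: rebuild the params string and re-execute
params_str=$(echo "$params" | jq -r 'to_entries | map("\(.key)=\(.value)") | join(" ")')
./scripts/run-intent.sh "$intent_file" --params $params_str
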
Deliverable: scripts/replay-intent.sh (~100 lines)

Acceptance Criteria:

- [ ] Can replay any completed intent from audit log
- [ ] Replay is logged with reference to original request
- [ ] --dry-run shows what would execute
- [ ] --fresh-context captures new environment state


Task 2: Audit Integrity Check (P1)

Goal: Automatically verify hash chains haven't been tampered with.

Existing: audit-viewer.sh verify <request_id> - verifies a single log

New: scripts/audit-integrity-check.sh - batch verification + cron

Interface:

# Check all logs from today
./scripts/audit-integrity-check.sh

# Check specific date
./scripts/audit-integrity-check.sh --date 2025-12-22

# Check last N days
./scripts/audit-integrity-check.sh --days 7

# Output format
./scripts/audit-integrity-check.sh --format json

Implementation Steps (a runnable sketch of the batch loop follows this list):

  1. Create batch verification script

    #!/bin/bash
    # For each log file in date range:
    #   - Verify hash chain
    #   - Record result
    #   - Alert on failure
    

  2. Add cron job

    # /etc/cron.d/intent-audit-check
    0 2 * * * root /root/tower-fleet/scripts/audit-integrity-check.sh --days 1 --alert
    

  3. Alert on failure

     - Write to /root/tower-fleet/logs/alerts/audit-integrity.log
     - Exit non-zero (for cron email)
     - Future: webhook notification

  4. Summary output

    Audit Integrity Check - 2025-12-23
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    Logs checked: 18
    Valid chains: 18
    Broken chains: 0
    Status: PASS
    

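A minimal sketch combining steps 1 and 3, assuming audit logs live under logs/audit/<date>/ as <request_id>.jsonl (a hypothetical layout) and reusing the existing audit-viewer.sh verify command:

#!/bin/bash
# audit-integrity-check.sh -- batch-verify hash chains for one date (sketch)
set -euo pipefail

date="${1:-$(date +%F)}"
alert_log="/root/tower-fleet/logs/alerts/audit-integrity.log"
checked=0 broken=0

for log in logs/audit/"$date"/*.jsonl; do
  [[ -e "$log" ]] || continue                    # no logs for this date
  req_id=$(basename "$log" .jsonl)
  checked=$((checked + 1))
  if ! ./scripts/audit-viewer.sh verify "$req_id" >/dev/null 2>&1; then
    broken=$((broken + 1))
    echo "$(date -u +%FT%TZ) BROKEN CHAIN: ${req_id}" >> "$alert_log"
  fi
done

echo "Logs checked: ${checked}  Broken chains: ${broken}"
[[ "$broken" -eq 0 ]]   # non-zero exit on any broken chain (for cron email)
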
Deliverables:

- scripts/audit-integrity-check.sh (~80 lines)
- Cron configuration in scripts/cron/audit-check.cron

Acceptance Criteria:

- [ ] Batch verification of all logs in date range
- [ ] Cron job runs daily at 2am
- [ ] Broken chain alerts written to log
- [ ] Exit code reflects success/failure


Task 3: Failure Alerting (P2)

Goal: Notify when intents fail so issues don't go unnoticed.

Approach: Start simple (log file), add webhook later.

Implementation Steps (a sketch of the log_failure helper follows this list):

  1. Add failure hook to run-intent.sh

    # On execution_completed with outcome=failed:
    log_failure "$request_id" "$intent" "$failed_step" "$error"
    

  2. Create failure log

    # /root/tower-fleet/logs/alerts/intent-failures.log
    2025-12-23T15:30:00Z | req_abc123 | deploy-app | build_image | Exit code 1
    

  3. Add daily failure summary (optional)

    # scripts/intent-failure-summary.sh
    # Summarizes failures from last 24h
    

  4. Future: Webhook integration

    # intents/config.yaml
    alerts:
      webhook_url: "https://discord.com/api/webhooks/..."
      on_failure: true
      on_rollback: true
    

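A minimal sketch of the log_failure helper from step 1, emitting the pipe-delimited line format shown in step 2:

# Failure hook for run-intent.sh (sketch)
ALERT_DIR="/root/tower-fleet/logs/alerts"

log_failure() {
  local request_id="$1" intent="$2" failed_step="$3" error="$4"
  mkdir -p "$ALERT_DIR"
  printf '%s | %s | %s | %s | %s\n' \
    "$(date -u +%FT%TZ)" "$request_id" "$intent" "$failed_step" "$error" \
    >> "$ALERT_DIR/intent-failures.log"
}

# Example call: log_failure "req_abc123" "deploy-app" "build_image" "Exit code 1"
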
Deliverables:

- Failure logging in run-intent.sh (modify existing)
- logs/alerts/ directory structure
- Optional: scripts/intent-failure-summary.sh

Acceptance Criteria:

- [ ] All failures logged to logs/alerts/intent-failures.log
- [ ] Log includes: timestamp, request_id, intent, step, error
- [ ] Easy to grep/tail for monitoring


Task 4: Metrics Collection (P2)

Goal: Track success rates, durations, and patterns.

Script: scripts/intent-metrics.sh

Interface:

# Summary for today
./scripts/intent-metrics.sh

# Summary for date range
./scripts/intent-metrics.sh --from 2025-12-20 --to 2025-12-23

# Per-intent breakdown
./scripts/intent-metrics.sh --by-intent

# JSON output (for dashboards)
./scripts/intent-metrics.sh --format json

Metrics to Collect:

| Metric | Description |
| --- | --- |
| total_executions | Count of intent runs |
| success_count | Completed successfully |
| failure_count | Failed (with or without rollback) |
| rollback_count | Failures that triggered rollback |
| avg_duration_ms | Average execution time |
| p95_duration_ms | 95th percentile duration |
| by_intent | Breakdown per intent type |
| by_step | Which steps fail most often |

Implementation Steps (an aggregation sketch follows this list):

  1. Parse audit logs

    # For each log file:
    #   - Extract intent name
    #   - Extract outcome (success/failed)
    #   - Calculate duration (first to last timestamp)
    #   - Record failed step if applicable
    

  2. Aggregate statistics

    # Use awk/jq to compute:
    #   - Counts by outcome
    #   - Duration percentiles
    #   - Failure hotspots
    

  3. Format output

    Intent System Metrics - 2025-12-23
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
    
    Overall:
      Total executions: 18
      Success rate: 94.4% (17/18)
      Avg duration: 45.2s
    
    By Intent:
      deploy-app:    8 runs, 87.5% success, avg 52s
      observe-app:   6 runs, 100% success, avg 3s
      restart-app:   4 runs, 100% success, avg 15s
    
    Top Failure Steps:
      build_image: 1 failure (npm install timeout)
    

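A minimal sketch of steps 1 and 2. The per-event ts field, the execution_completed outcome values, and the logs/audit/<date>/ layout are assumptions to check against the real log schema; durations are computed in seconds here rather than milliseconds:

#!/bin/bash
# intent-metrics.sh -- aggregate outcomes and durations from audit logs (sketch)
set -euo pipefail

total=0 ok=0
durations=$(mktemp)

for log in logs/audit/*/*.jsonl; do
  [[ -e "$log" ]] || continue
  total=$((total + 1))
  outcome=$(jq -r 'select(.event=="execution_completed") | .outcome' < "$log" | tail -1)
  [[ "$outcome" == "success" ]] && ok=$((ok + 1))
  # Duration = first event timestamp to last event timestamp (ts field assumed)
  jq -rs '(.[-1].ts | fromdateiso8601) - (.[0].ts | fromdateiso8601)' < "$log" >> "$durations"
done

avg=$(awk '{s+=$1} END{if (NR) printf "%.1f", s/NR}' "$durations")
p95=$(sort -n "$durations" | awk '{a[NR]=$1} END{if (NR) {i=int(NR*0.95); if (i<NR*0.95) i++; print a[i]}}')
rm -f "$durations"

echo "Total executions: ${total}"
awk -v o="$ok" -v t="$total" 'BEGIN{if (t) printf "Success rate: %.1f%% (%d/%d)\n", 100*o/t, o, t}'
echo "Avg duration: ${avg}s  p95: ${p95}s"
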
Deliverable: scripts/intent-metrics.sh (~150 lines)

Acceptance Criteria:

- [ ] Aggregates metrics from audit logs
- [ ] Shows success rate, duration, by-intent breakdown
- [ ] JSON output available for programmatic use
- [ ] Date range filtering works


Task 5: Documentation (P2)

Goal: Complete user guide for the intent system.

Document: docs/guides/intent-system-user-guide.md

Outline:

# Intent System User Guide

## Quick Start
- Running your first intent
- Dry-run mode
- Understanding the output

## Available Intents
- deploy-app
- observe-app
- check-logs
- restart-app
- scale-app
- migrate-schema
- create-nextjs-app

## Using Slash Commands
- Command reference
- Examples

## Natural Language (LLM) Integration
- How intent matching works
- When to use slash commands vs natural language

## Audit Logs
- Viewing execution history
- Verifying integrity
- Replaying past executions

## Troubleshooting
- Common errors
- Lock issues
- Template resolution problems

## Extending the System
- Creating new intents
- Adding policy rules
- Custom verification checks

Deliverable: docs/guides/intent-system-user-guide.md (~300 lines)

Acceptance Criteria:

- [ ] Quick start gets user running intent in <5 minutes
- [ ] All 7 intents documented with examples
- [ ] Troubleshooting covers common issues
- [ ] Syncs to OtterWiki


Implementation Order

Based on dependencies and priority:

Week 1: P1 Tasks (Core Reliability)
├── Task 2: Audit Integrity Check (foundation for trust)
└── Task 1: Replay Capability (depends on stable audit logs)

Week 2: P2 Tasks (Observability)
├── Task 3: Failure Alerting (simple, high value)
└── Task 4: Metrics Collection (builds on alerting)

Week 3: P2 Tasks (Usability)
└── Task 5: Documentation (captures learnings from above)

Estimated Effort

| Task | New Code | Modify Existing | Total Effort |
| --- | --- | --- | --- |
| Replay Capability | ~100 lines | ~20 lines | ~2 hours |
| Audit Integrity | ~80 lines | cron setup | ~1.5 hours |
| Failure Alerting | ~30 lines | ~40 lines | ~1 hour |
| Metrics Collection | ~150 lines | - | ~2 hours |
| Documentation | ~300 lines | - | ~2 hours |
| Total | ~660 lines | ~60 lines | ~8.5 hours |

Success Criteria for Phase 6

Phase 6 is complete when:

  1. Replay works: Any past intent can be re-executed from its audit log
  2. Integrity verified daily: Cron job checks all audit chains
  3. Failures visible: All failures logged to dedicated alert log
  4. Metrics available: Can answer "what's our deploy success rate?"
  5. Documented: New user can run intents from reading the guide

Files to Create/Modify

New Files:

scripts/
├── replay-intent.sh           # Task 1
├── audit-integrity-check.sh   # Task 2
├── intent-metrics.sh          # Task 4
└── cron/
    └── audit-check.cron       # Task 2

logs/alerts/                   # Task 3
├── intent-failures.log
└── audit-integrity.log

docs/guides/
└── intent-system-user-guide.md  # Task 5

Modified Files:

scripts/run-intent.sh          # Task 3: Add failure logging hook
docs/architecture/intent-system-roadmap.md  # Update status


Next Action

Start with Task 2: Audit Integrity Check because:

  1. It's foundational (ensures audit logs are trustworthy)
  2. It reuses existing audit-viewer.sh verify logic
  3. It's a quick win (cron setup is straightforward)
  4. It enables Task 1 (replay depends on trusted logs)

Ready to implement?