Specification: Locking
- Spec ID: tower-fleet/lock/v1
- Status: Normative
- Version: 1.0.0
- Created: 2025-12-18
Abstract
This specification defines the locking mechanism for tower-fleet intent execution. It establishes lock identity, file format, acquisition/release semantics, heartbeat protocol, and stale lock recovery.
Conformance
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
1. Goals
- Prevent concurrent mutation - Two executions MUST NOT mutate the same resource simultaneously
- Survive crashes - Locks MUST be recoverable after executor failure
- Be diagnosable - Lock state MUST be inspectable and understandable
- Support timeouts - Stale locks MUST be detectable and recoverable
2. Lock Identity
2.1 Lock Name
The lock name is derived from the intent's controls.lock field after template resolution.
For example, if controls.lock combines params.app and params.environment, then with params.app = "money-tracker" and params.environment = "production":
- Lock name: money-tracker-production
2.2 Lock Name Constraints
Lock names MUST:
- Contain only: [a-z0-9_-]
- Be between 1 and 128 characters
- Be filesystem-safe (no path separators)

Lock names MUST NOT:
- Contain uppercase letters
- Contain spaces or special characters
- Start or end with a hyphen or underscore
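All of the constraints above fit in one regular expression. The following is a minimal bash sketch. The helper name is_valid_lock_name is hypothetical and is not part of the lock manager script. The letters are spelled out rather than written as a range, so the check does not depend on locale collation.

```shell
# Hypothetical validator for resolved lock names (section 2.2).
is_valid_lock_name() {
  local alnum='abcdefghijklmnopqrstuvwxyz0123456789'
  # First and last characters must be [a-z0-9]; the middle may also contain
  # '_' and '-'. Total length is 1 to 128 characters. '/' is never allowed.
  local re="^[${alnum}]([${alnum}_-]{0,126}[${alnum}])?\$"
  [[ $1 =~ $re ]]
}
```

Running the check on the resolved name, before building the lock path, keeps arbitrary strings (uppercase, spaces, path separators) out of the lock directory.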
2.3 Lock File Path
Lock files are stored at /root/tower-fleet/logs/locks/<lock_name>.lock.
Example: /root/tower-fleet/logs/locks/money-tracker-production.lock
3. Lock File Format
3.1 Schema
Lock files MUST contain valid JSON with the following structure:

```json
{
  "lock_version": "v1",
  "lock_name": "money-tracker-production",
  "request_id": "req_abc123def456",
  "actor": "claude-code",
  "intent": "deploy-app",
  "intent_version": "1.2.0",
  "host_id": "tower-01",
  "pid": 12345,
  "created_at": "2025-12-18T10:30:10Z",
  "last_heartbeat_at": "2025-12-18T10:30:40Z",
  "ttl_seconds": 900,
  "metadata": {}
}
```
3.2 Field Definitions

| Field | Type | Required | Description |
|---|---|---|---|
| lock_version | string | MUST | Always "v1" for this spec |
| lock_name | string | MUST | The resolved lock name |
| request_id | string | MUST | Unique identifier for this execution |
| actor | string | MUST | Who initiated the execution |
| intent | string | MUST | Intent name being executed |
| intent_version | string | MUST | Version of the intent definition |
| host_id | string | MUST | Identifier of the executing host |
| pid | integer | MUST | Process ID of the executor |
| created_at | string | MUST | ISO-8601 UTC timestamp of lock creation |
| last_heartbeat_at | string | MUST | ISO-8601 UTC timestamp of last heartbeat |
| ttl_seconds | integer | MUST | Maximum seconds between heartbeats before stale |
| metadata | object | MAY | Additional context (optional) |
3.3 Default TTL
If the intent does not specify a TTL, the default is 900 seconds (15 minutes).
Intents MAY override this default in their definition.
4. Acquisition Semantics
4.1 Acquisition Flow

```text
Check if lock file exists
├── No  ─► Create lock atomically ─► Emit audit: lock_acquired
└── Yes ─► Check stale
           ├── No  ─► BLOCKED
           └── Yes ─► --force-lock?
                      ├── Yes ─► Steal lock
                      └── No  ─► BLOCKED (suggest --force-lock)
```
4.2 Atomic Creation
Lock creation MUST be atomic to prevent race conditions.
Method 1: Exclusive file creation (recommended)
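The recommended exclusive-creation method can be sketched in bash with the noclobber option (set -C). With noclobber set, bash refuses to redirect onto an existing regular file, and it opens a missing file with O_EXCL, so only one concurrent writer can create the lock. The helper name create_lock_exclusive is hypothetical. The JSON is written through the newly created file, so a concurrent reader may briefly see an empty lock.

```shell
# Hypothetical Method 1 helper: exclusive creation via bash noclobber.
create_lock_exclusive() {
  local lock_path="$1"
  local lock_json="$2"
  # The subshell keeps set -C from leaking into the caller's shell.
  # The redirection fails (non-zero status) if the lock file already exists.
  ( set -C; printf '%s\n' "$lock_json" > "$lock_path" ) 2>/dev/null
}
```

A caller treats a non-zero status as "lock already held" and falls through to the stale check.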
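Method 2: Rename from temp file

```shell
temp_path="${lock_path}.${request_id}.tmp"
echo "$lock_json" > "$temp_path"
# Some mv versions exit 0 when -n skips the move; a leftover temp file means the lock is taken.
mv -n "$temp_path" "$lock_path" 2>/dev/null || true
if [[ -e "$temp_path" ]]; then rm -f "$temp_path"; exit 1; fi
```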
4.3 Acquisition Audit Event
On successful acquisition, emit:

```json
{
  "event": "lock_acquired",
  "request_id": "req_abc123",
  "timestamp": "2025-12-18T10:30:10Z",
  "lock_name": "money-tracker-production",
  "lock_path": "/root/tower-fleet/logs/locks/money-tracker-production.lock",
  "ttl_seconds": 900
}
```
4.4 Blocked Response
When blocked by an active lock, the executor MUST:
- NOT proceed with execution
- Return an error containing the lock holder's information:

```json
{
  "error": "lock_blocked",
  "lock_name": "money-tracker-production",
  "held_by": {
    "request_id": "req_xyz789",
    "actor": "claude-code",
    "intent": "deploy-app",
    "created_at": "2025-12-18T10:25:00Z",
    "last_heartbeat_at": "2025-12-18T10:29:45Z"
  },
  "suggestion": "Wait for completion or use --force-lock if stale"
}
```
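The blocked response can be built directly from the current lock file with jq, copying only the holder fields listed above. The following is a minimal sketch. The function name build_blocked_response is hypothetical.

```shell
# Hypothetical helper: build the lock_blocked error from an existing lock file.
build_blocked_response() {
  local lock_path="$1"
  # {field} is jq shorthand for {field: .field} on the lock file's JSON.
  jq '{
    error: "lock_blocked",
    lock_name,
    held_by: {request_id, actor, intent, created_at, last_heartbeat_at},
    suggestion: "Wait for completion or use --force-lock if stale"
  }' "$lock_path"
}
```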
5. Heartbeat Protocol
5.1 Purpose
Heartbeats allow detection of crashed executors. The executor periodically updates last_heartbeat_at, and a lock whose heartbeat stops advancing eventually becomes stale.
5.2 Heartbeat Interval
The interval between heartbeat updates MUST NOT exceed ttl_seconds / 3. For the default TTL of 900 seconds, the executor must therefore send a heartbeat at least every 300 seconds (5 minutes).
RECOMMENDED: Use a shorter interval of 15-30 seconds for better responsiveness.
5.3 Heartbeat Update
Heartbeat updates MUST:
1. Be atomic (write temp then rename, or in-place with flock)
2. Update only the last_heartbeat_at field
3. Preserve all other fields

```shell
update_heartbeat() {
  local lock_path="$1"
  local temp_path="${lock_path}.heartbeat.tmp"
  # Only replace the lock if jq succeeded; otherwise an empty or partial
  # temp file would be renamed over the lock.
  if jq --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        '.last_heartbeat_at = $ts' \
        "$lock_path" > "$temp_path"; then
    mv "$temp_path" "$lock_path"
  else
    rm -f "$temp_path"
    return 1
  fi
}
```
5.4 Heartbeat Failure
If a heartbeat update fails:
1. Log a warning but continue execution
2. Retry on the next interval
3. After 3 consecutive failures, emit a heartbeat_failed audit event
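The failure policy above can be sketched as a background loop. The helper name heartbeat_loop is hypothetical. It assumes the caller passes the interval and the update command, for example update_heartbeat from section 5.3. The audit emission is a placeholder that prints the event to stderr.

```shell
# Hypothetical heartbeat driver: runs "$@" every <interval> seconds and
# applies the section 5.4 failure policy.
heartbeat_loop() {
  local interval="$1"
  shift
  local failures=0
  while sleep "$interval"; do
    if "$@"; then
      failures=0
    else
      failures=$((failures + 1))
      # Rules 1 and 2: warn, keep executing, retry on the next tick.
      echo "WARN: heartbeat update failed (${failures} consecutive)" >&2
      if (( failures == 3 )); then
        # Rule 3: placeholder for the executor's real audit emitter.
        echo '{"event":"heartbeat_failed"}' >&2
      fi
    fi
  done
}
```

A caller would typically start it with `heartbeat_loop 30 update_heartbeat "$lock_path" &` and kill that background job after the lock is released. The event fires once per streak of failures; a successful update resets the counter.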
6. Stale Detection
6.1 Definition
A lock is stale if the time since its last heartbeat exceeds its TTL, that is, if now - last_heartbeat_at > ttl_seconds (both times measured in seconds since the epoch).
6.2 Stale Check Implementation

```shell
is_lock_stale() {
  local lock_path="$1"
  local last_heartbeat
  last_heartbeat=$(jq -r '.last_heartbeat_at' "$lock_path")
  local ttl
  ttl=$(jq -r '.ttl_seconds' "$lock_path")
  # date -d is GNU date syntax; BSD/macOS date needs different flags.
  local last_epoch
  last_epoch=$(date -d "$last_heartbeat" +%s)
  local now_epoch
  now_epoch=$(date +%s)
  local age=$((now_epoch - last_epoch))
  if [[ $age -gt $ttl ]]; then
    echo "stale"
    return 0
  else
    echo "active"
    return 1
  fi
}
```
6.3 Stale Lock Handling
When a stale lock is detected:
Default behavior (no --force-lock):

```json
{
  "error": "lock_stale",
  "lock_name": "money-tracker-production",
  "stale_since": "2025-12-18T10:15:00Z",
  "age_seconds": 1800,
  "ttl_seconds": 900,
  "held_by": {
    "request_id": "req_old123",
    "actor": "claude-code",
    "host_id": "tower-01",
    "pid": 12345
  },
  "suggestion": "Use --force-lock to acquire (stale lock will be logged)"
}
```
With --force-lock:
1. Read existing lock metadata
2. Delete existing lock file
3. Create new lock
4. Emit lock_stolen audit event
7. Lock Stealing
7.1 When Allowed
Lock stealing is ONLY allowed when:
1. Lock is stale (heartbeat expired), AND
2. --force-lock flag is provided
Lock stealing is NEVER allowed for active (non-stale) locks.
7.2 Stolen Lock Audit Event

```json
{
  "event": "lock_stolen",
  "request_id": "req_new456",
  "timestamp": "2025-12-18T10:45:00Z",
  "lock_name": "money-tracker-production",
  "previous_lock": {
    "request_id": "req_old123",
    "actor": "claude-code",
    "intent": "deploy-app",
    "created_at": "2025-12-18T10:00:00Z",
    "last_heartbeat_at": "2025-12-18T10:15:00Z",
    "host_id": "tower-01",
    "pid": 12345
  },
  "previous_lock_hash": "sha256:abc123...",
  "reason": "stale_lock_forced"
}
```
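The lock_stolen event must be built from the stale lock before that lock is deleted. The following sketch assumes that previous_lock_hash is the SHA-256 of the raw lock file bytes; the spec does not say what is hashed. It uses GNU sha256sum. The function name build_lock_stolen_event is hypothetical.

```shell
# Hypothetical helper: build lock_stolen from the stale lock file (read before deletion).
build_lock_stolen_event() {
  local lock_path="$1"
  local request_id="$2"   # request that is stealing the lock
  local hash
  # Assumption: hash the file bytes exactly as stored on disk.
  hash="sha256:$(sha256sum < "$lock_path" | cut -d' ' -f1)"
  jq --arg request_id "$request_id" \
     --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
     --arg hash "$hash" \
     '{
        event: "lock_stolen",
        request_id: $request_id,
        timestamp: $ts,
        lock_name,
        previous_lock: {request_id, actor, intent, created_at,
                        last_heartbeat_at, host_id, pid},
        previous_lock_hash: $hash,
        reason: "stale_lock_forced"
      }' "$lock_path"
}
```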
8. Release Semantics
8.1 Normal Release
On successful completion or controlled abort:
- Remove the lock file
- Emit a lock_released audit event

```json
{
  "event": "lock_released",
  "request_id": "req_abc123",
  "timestamp": "2025-12-18T10:45:00Z",
  "lock_name": "money-tracker-production",
  "held_duration_seconds": 900,
  "result": "success"
}
```
8.2 Release on Failure
If execution fails:
- Attempt rollback (if defined)
- Release the lock regardless of the rollback outcome
- Emit lock_released with failure information

```json
{
  "event": "lock_released",
  "request_id": "req_abc123",
  "timestamp": "2025-12-18T10:45:00Z",
  "lock_name": "money-tracker-production",
  "held_duration_seconds": 600,
  "result": "failure",
  "failure_step": "deploy_manifests"
}
```
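Both release events can come from one builder that computes held_duration_seconds from created_at. The builder adds failure_step only when a failure step is given. The following is a minimal sketch assuming GNU date. The function name build_lock_released_event is hypothetical. It must run before the lock file is removed, because it reads request_id, lock_name and created_at from that file.

```shell
# Hypothetical helper: build lock_released (success or failure) from the lock file.
build_lock_released_event() {
  local lock_path="$1"
  local result="$2"            # "success" or "failure"
  local failure_step="${3:-}"  # included only when non-empty
  local now created held
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  created=$(jq -r '.created_at' "$lock_path")
  # GNU date parses the ISO-8601 UTC timestamps stored in lock files.
  held=$(( $(date -d "$now" +%s) - $(date -d "$created" +%s) ))
  jq --arg ts "$now" \
     --arg result "$result" \
     --arg failure_step "$failure_step" \
     --argjson held "$held" \
     '{event: "lock_released", request_id, timestamp: $ts, lock_name,
       held_duration_seconds: $held, result: $result}
      + (if $failure_step != "" then {failure_step: $failure_step} else {} end)' \
     "$lock_path"
}
```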
8.3 Release Failure
If the lock file cannot be removed:
- Log an error with the errno and message
- Emit a lock_release_failed audit event
- Continue with execution result reporting

```json
{
  "event": "lock_release_failed",
  "request_id": "req_abc123",
  "timestamp": "2025-12-18T10:45:00Z",
  "lock_name": "money-tracker-production",
  "lock_path": "/root/tower-fleet/logs/locks/money-tracker-production.lock",
  "error": "EACCES: permission denied",
  "action": "manual_cleanup_required"
}
```
9. Mode-Based Locking Requirements
9.1 Locking by Intent Mode

| Mode | Lock Required | Rationale |
|---|---|---|
| observe | MUST NOT lock | Read-only operations are safe to parallelize |
| mutate | MUST lock if controls.lock is defined | Mutations need serialization |
| reconcile | SHOULD lock | Prevents conflicting reconciliation |
9.2 Observe Mode Bypass
Intents with mode: observe:
- MUST NOT acquire locks
- MUST NOT be blocked by existing locks
- MAY read lock status for reporting
9.3 Missing Lock Configuration
For mode: mutate intents without controls.lock:
- Executor SHOULD warn: "Mutate intent without lock - concurrent execution possible"
- Execution MAY proceed (operator's choice)
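The rules in sections 9.1 to 9.3 reduce to a small decision function. The following is a sketch with a hypothetical name, lock_action_for_mode. Its second argument is the resolved lock name, left empty when controls.lock is not defined. The reconcile branch is an assumption: the spec only says SHOULD lock, so this sketch acquires the lock whenever a lock name exists.

```shell
# Hypothetical helper: map intent mode + resolved lock name to a locking action.
lock_action_for_mode() {
  local mode="$1"
  local lock_name="${2:-}"
  case "$mode" in
    observe)
      echo "none" ;;                  # MUST NOT lock or be blocked
    mutate)
      if [[ -n "$lock_name" ]]; then
        echo "acquire"
      else
        echo "WARN: Mutate intent without lock - concurrent execution possible" >&2
        echo "none"                   # MAY proceed (operator's choice)
      fi ;;
    reconcile)
      # Assumption: SHOULD lock means "acquire whenever a lock name exists".
      if [[ -n "$lock_name" ]]; then echo "acquire"; else echo "none"; fi ;;
    *)
      echo "ERROR: unknown mode: $mode" >&2
      return 1 ;;
  esac
}
```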
10. Implementation Reference
10.1 Directory Setup
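Directory setup amounts to creating the lock directory with the permissions that section 12.1 recommends. The following is a minimal sketch. The function name setup_lock_dir is hypothetical. It defaults to the path from section 2.3 and should run as the executor user, so that user owns the directory.

```shell
# Hypothetical setup helper: create the lock directory, private to its owner.
setup_lock_dir() {
  local dir="${1:-/root/tower-fleet/logs/locks}"
  mkdir -p "$dir"
  chmod 700 "$dir"   # rwx------ : owner only, not world-writable
}
```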
10.2 Lock Manager Script

```shell
#!/bin/bash
# /root/tower-fleet/scripts/lock-manager.sh
set -euo pipefail

LOCK_DIR="/root/tower-fleet/logs/locks"
DEFAULT_TTL=900
HEARTBEAT_INTERVAL=30

acquire_lock() {
  local lock_name="$1"
  local request_id="$2"
  local intent="$3"
  local intent_version="$4"
  local force="${5:-false}"
  local lock_path="${LOCK_DIR}/${lock_name}.lock"

  # Check existing lock
  if [[ -f "$lock_path" ]]; then
    if is_lock_stale "$lock_path"; then
      if [[ "$force" == "true" ]]; then
        steal_lock "$lock_path" "$request_id" || return 1
      else
        echo "ERROR: Lock is stale. Use --force-lock to acquire." >&2
        cat "$lock_path" >&2
        return 1
      fi
    else
      echo "ERROR: Lock is held by active process." >&2
      cat "$lock_path" >&2
      return 1
    fi
  fi

  # Create lock atomically: write a temp file, then move it into place
  # without overwriting. Some mv versions exit 0 when -n skips the move,
  # so a leftover temp file is what signals a lost race.
  local lock_json
  lock_json=$(create_lock_json "$lock_name" "$request_id" "$intent" "$intent_version")
  local temp_path="${lock_path}.${request_id}.tmp"
  echo "$lock_json" > "$temp_path"
  mv -n "$temp_path" "$lock_path" 2>/dev/null || true
  if [[ -e "$temp_path" ]]; then
    rm -f "$temp_path"
    echo "ERROR: Failed to acquire lock (race condition)" >&2
    return 1
  fi

  echo "Lock acquired: $lock_name"
  return 0
}

steal_lock() {
  # Minimal stale-lock removal (section 6.3): log the previous holder, then
  # delete the file so acquire_lock can create a fresh lock. A full executor
  # also emits the lock_stolen audit event here (section 7.2).
  local lock_path="$1"
  local request_id="$2"
  echo "WARN: ${request_id} is stealing a stale lock; previous holder:" >&2
  cat "$lock_path" >&2
  rm -f "$lock_path"
}

release_lock() {
  local lock_name="$1"
  local request_id="$2"
  local lock_path="${LOCK_DIR}/${lock_name}.lock"

  if [[ ! -f "$lock_path" ]]; then
    echo "WARN: Lock file not found: $lock_path" >&2
    return 0
  fi

  # Verify we own the lock
  local held_by
  held_by=$(jq -r '.request_id' "$lock_path")
  if [[ "$held_by" != "$request_id" ]]; then
    echo "ERROR: Lock held by different request: $held_by" >&2
    return 1
  fi

  rm -f "$lock_path"
  echo "Lock released: $lock_name"
  return 0
}

create_lock_json() {
  local lock_name="$1"
  local request_id="$2"
  local intent="$3"
  local intent_version="$4"
  local now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)

  jq -n \
    --arg lock_version "v1" \
    --arg lock_name "$lock_name" \
    --arg request_id "$request_id" \
    --arg actor "${ACTOR:-unknown}" \
    --arg intent "$intent" \
    --arg intent_version "$intent_version" \
    --arg host_id "$(hostname)" \
    --argjson pid "$$" \
    --arg created_at "$now" \
    --arg last_heartbeat_at "$now" \
    --argjson ttl_seconds "$DEFAULT_TTL" \
    '{
      lock_version: $lock_version,
      lock_name: $lock_name,
      request_id: $request_id,
      actor: $actor,
      intent: $intent,
      intent_version: $intent_version,
      host_id: $host_id,
      pid: $pid,
      created_at: $created_at,
      last_heartbeat_at: $last_heartbeat_at,
      ttl_seconds: $ttl_seconds,
      metadata: {}
    }'
}

is_lock_stale() {
  local lock_path="$1"
  local last_heartbeat ttl last_epoch now_epoch age
  last_heartbeat=$(jq -r '.last_heartbeat_at' "$lock_path")
  ttl=$(jq -r '.ttl_seconds' "$lock_path")
  # An unparseable timestamp falls back to epoch 0, so the lock reads as
  # stale; stealing it still requires --force-lock.
  last_epoch=$(date -d "$last_heartbeat" +%s 2>/dev/null || echo 0)
  now_epoch=$(date +%s)
  age=$((now_epoch - last_epoch))
  [[ $age -gt $ttl ]]
}

# Export functions (and the settings they read) for use in executor subshells
export LOCK_DIR DEFAULT_TTL HEARTBEAT_INTERVAL
export -f acquire_lock steal_lock release_lock create_lock_json is_lock_stale
```
11. Operational Commands
11.1 List Active Locks

```shell
# List all locks
ls -la /root/tower-fleet/logs/locks/*.lock 2>/dev/null

# Show lock details
for lock in /root/tower-fleet/logs/locks/*.lock; do
  [[ -e "$lock" ]] || continue   # glob matched nothing: no locks held
  echo "=== $(basename "$lock") ==="
  jq . "$lock"
done
```
11.2 Check Specific Lock
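The following sketch prints one lock and classifies it with the section 6.1 rule (now - last_heartbeat_at > ttl_seconds). The function name check_lock and its optional directory argument are hypothetical. It uses GNU date.

```shell
# Hypothetical operator helper: show one lock and whether it is stale.
check_lock() {
  local lock_name="$1"
  local lock_dir="${2:-/root/tower-fleet/logs/locks}"
  local lock_path="${lock_dir}/${lock_name}.lock"
  if [[ ! -f "$lock_path" ]]; then
    echo "No lock held: $lock_name"
    return 1
  fi
  jq . "$lock_path"
  local last_heartbeat ttl age
  last_heartbeat=$(jq -r '.last_heartbeat_at' "$lock_path")
  ttl=$(jq -r '.ttl_seconds' "$lock_path")
  age=$(( $(date +%s) - $(date -d "$last_heartbeat" +%s) ))
  if (( age > ttl )); then
    echo "Status: stale (heartbeat ${age}s ago, ttl ${ttl}s)"
  else
    echo "Status: active (heartbeat ${age}s ago, ttl ${ttl}s)"
  fi
}
```

For example, `check_lock money-tracker-production` inspects the lock used throughout this spec.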
11.3 Force Release (Emergency)

```shell
# Manual force release (audit this!)
rm /root/tower-fleet/logs/locks/money-tracker-production.lock
# Prefer: if the lock is stale, use the executor with --force-lock for a proper audit trail
```
12. Security Considerations
12.1 Lock Directory Permissions
The lock directory SHOULD:
- Be owned by the executor user
- Have mode 700 (rwx------)
- Not be world-writable
12.2 Lock File Integrity
Lock files:
- SHOULD be validated on read (JSON parse + schema check)
- MUST NOT be trusted blindly (validate request_id format, timestamps)
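Validation on read can be done with a single jq -e filter that checks the section 3.2 schema and the section 2.2 name rule. The following is a sketch with a hypothetical name, validate_lock_file. The timestamp check only enforces the YYYY-MM-DDTHH:MM:SSZ shape used in this spec. jq -e exits non-zero when the filter yields false or null, and also when jq fails: invalid JSON, a missing file, or test() applied to a non-string.

```shell
# Hypothetical validator: parse a lock file and check its schema.
validate_lock_file() {
  local lock_path="$1"
  jq -e '
    .lock_version == "v1"
    and (.lock_name | test("^[a-z0-9]([a-z0-9_-]{0,126}[a-z0-9])?$"))
    and ([.request_id, .actor, .intent, .intent_version, .host_id]
         | all(type == "string" and length > 0))
    and (.pid | type == "number")
    and (.ttl_seconds | type == "number" and . > 0)
    and ([.created_at, .last_heartbeat_at]
         | all(test("^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$")))
    and (.metadata == null or (.metadata | type == "object"))
  ' "$lock_path" >/dev/null 2>&1
}
```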
12.3 PID Verification
The pid field is informational only. Do NOT use it for:
- Process existence checks (PIDs can be recycled)
- Kill signals (dangerous across hosts)
Use heartbeat expiry as the authority for staleness.