# Storage Verification Guide
This guide helps verify that persistent storage is correctly configured for stateful applications in Kubernetes.
## Why This Matters

Some Kubernetes workloads use ephemeral storage (`emptyDir`) by default, which means:

- ❌ Data is lost when pods restart
- ❌ Data is lost when pods are rescheduled to different nodes
- ❌ No persistence across pod lifecycles
For critical services like Prometheus, databases, and registries, you must use PersistentVolumeClaims (PVCs).
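For reference, a standalone PVC looks like the sketch below. The `longhorn` storage class, name, namespace, and size are placeholders for illustration, not values from this cluster:

```yaml
# Minimal PersistentVolumeClaim sketch - all names and sizes are examples
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumes a Longhorn StorageClass exists
  resources:
    requests:
      storage: 5Gi
```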
## Common Issue: Duplicate YAML Keys

Problem: when a YAML file contains the same key twice at the root level, most parsers (including Helm's) silently keep only the last occurrence, so the second block overwrites the first.
### Example of BROKEN configuration

```yaml
# ❌ WRONG - Second prometheus: block overwrites the first!
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi

# ... other config ...

prometheus:
  service:
    type: ClusterIP  # This OVERWRITES everything above!
```
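After parsing, all Helm actually sees from the file above is the second block; the storage configuration is silently gone:

```yaml
# Effective values once the duplicate key wins
prometheus:
  service:
    type: ClusterIP
```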
### Correct configuration

```yaml
# ✅ CORRECT - Single prometheus: block with all settings
prometheus:
  service:
    type: ClusterIP
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```
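Duplicate keys are easy to catch mechanically before they bite. One option, assuming `yamllint` is available (its `key-duplicates` rule is enabled by default):

```bash
# Reports "duplication of key" errors if any key appears twice in the same map
yamllint values.yaml
```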
## Verification Steps

### Step 1: Check Helm Values Were Applied

```bash
# View what Helm actually applied
helm get values <release-name> -n <namespace>

# Example:
helm get values kube-prometheus-stack -n monitoring
```
What to look for:
- Verify `storageSpec` or `storage` sections are present
- Check that all expected configuration blocks exist
- Watch for missing sections (this indicates duplicate keys in the values file); a scripted check follows below
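To check a specific path instead of eyeballing the output, something like this works (a sketch assuming the Go `yq` is installed; `null` means the block never made it into the release):

```bash
# Print the storageSpec Helm actually recorded; "null" means it is missing
helm get values kube-prometheus-stack -n monitoring -o yaml \
  | yq '.prometheus.prometheusSpec.storageSpec'
```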
### Step 2: Check PVCs Exist

```bash
# List all PVCs in namespace
kubectl get pvc -n <namespace>

# Example for monitoring:
kubectl get pvc -n monitoring
```
Expected output for kube-prometheus-stack:
```
NAME                              STATUS   VOLUME    CAPACITY
prometheus-...-prometheus-0       Bound    pvc-xxx   30Gi
alertmanager-...-alertmanager-0   Bound    pvc-yyy   2Gi
kube-prometheus-stack-grafana     Bound    pvc-zzz   5Gi
```
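Every PVC should report `Bound`. To surface stragglers across all namespaces (a plain grep sketch; empty output means everything is bound):

```bash
# Show any PVC that is not Bound (Pending, Lost, ...)
kubectl get pvc -A --no-headers | grep -v Bound
```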
### Step 3: Check Pod Volume Mounts

```bash
# Describe pod to see volumes
kubectl describe pod <pod-name> -n <namespace>

# Check volumes section
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.volumes}' | jq '.'
```
What to look for (a bulk check for a whole namespace follows below):

- `persistentVolumeClaim` - ✅ Good! Data persists
- `emptyDir: {}` - ❌ Bad! Data is ephemeral
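To sweep every pod in a namespace at once, a small `jq` sketch (some charts mount `emptyDir` legitimately for scratch space, so treat hits as leads rather than proof):

```bash
# List every pod in the namespace that mounts at least one emptyDir volume
kubectl get pods -n <namespace> -o json \
  | jq -r '.items[] | select(any(.spec.volumes[]?; has("emptyDir"))) | .metadata.name'
```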
### Step 4: Verify StatefulSet VolumeClaimTemplates

For StatefulSets (Prometheus, Alertmanager, databases):

```bash
# Check volumeClaimTemplates
kubectl get statefulset <name> -n <namespace> -o yaml | grep -A 20 volumeClaimTemplates
```
Note: Kubernetes does NOT allow `volumeClaimTemplates` to change after a StatefulSet is created. If you change Helm values after the initial install, you must delete and recreate the StatefulSet (see Issue 2 below).
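A grep-free way to pull just the requested sizes (a jsonpath sketch; empty output means no templates are defined):

```bash
# Print the storage request of each volumeClaimTemplate
kubectl get statefulset <name> -n <namespace> \
  -o jsonpath='{range .spec.volumeClaimTemplates[*]}{.metadata.name}: {.spec.resources.requests.storage}{"\n"}{end}'
```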
## Fixing Storage Issues

### Issue 1: emptyDir Instead of PVC

Symptoms:

- No PVCs created for the stateful app
- Data lost on pod restart
- `emptyDir: {}` in pod volumes
Root Cause:

- Helm values not applied correctly (duplicate keys)
- Chart version doesn't support persistent storage
- Storage disabled in values

Fix:

1. Correct the Helm values file (remove duplicate keys)
2. Uninstall and reinstall the Helm release:

```bash
helm uninstall <release> -n <namespace>
helm install <release> <chart> -n <namespace> --values <fixed-values.yaml>
```
### Issue 2: StatefulSet Not Using New Storage Config

Symptoms:

- Updated Helm values
- Ran `helm upgrade`
- Still no PVCs, or still using `emptyDir`
Root Cause:

StatefulSets do NOT update `volumeClaimTemplates` on upgrade; the field is immutable after creation (a Kubernetes limitation).
Fix:
```bash
# Either way, anything stored in the old emptyDir volumes is lost.

# Option 1: Delete only the StatefulSet object and let Helm recreate it
# (--cascade=orphan leaves the pods running; delete them afterwards so the
# new StatefulSet recreates them with PVC-backed volumes)
kubectl delete statefulset <name> -n <namespace> --cascade=orphan
helm upgrade <release> <chart> -n <namespace> --values <values.yaml>

# Option 2: Uninstall and reinstall (recommended for a fresh start)
helm uninstall <release> -n <namespace>
kubectl delete pvc --all -n <namespace>  # Clean up old PVCs (destroys their data!)
helm install <release> <chart> -n <namespace> --values <values.yaml>
```
### Issue 3: Insufficient Storage

Symptoms:

- PVC stuck in "Pending"
- Volume shows "faulted" in the Longhorn UI
- Error: "precheck new replica failed: insufficient storage"

Root Cause:

- Requested storage × replica count exceeds available space
- Longhorn reserves 25% of each disk by default
- Previous PVCs are still consuming space
Fix:

1. Check available storage:

    ```bash
    # Via kubectl
    kubectl get nodes.longhorn.io -n longhorn-system -o yaml | grep storageAvailable

    # Or via the Longhorn UI
    # http://10.89.97.210 → Node tab
    ```

2. Reduce PVC sizes in Helm values.

3. Delete unnecessary PVCs.

4. Consider adjusting Longhorn settings:

    ```bash
    # Allow more over-provisioning (default 100%) so more replicas can be scheduled
    kubectl patch setting storage-over-provisioning-percentage -n longhorn-system \
      --type='json' -p='[{"op": "replace", "path": "/value", "value": "200"}]'

    # Reduce the reserved percentage (default 25%)
    kubectl patch setting storage-minimal-available-percentage -n longhorn-system \
      --type='json' -p='[{"op": "replace", "path": "/value", "value": "15"}]'
    ```
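To confirm the new values took effect (a sketch; Longhorn exposes each setting's current value in the `value` field of its Setting resource):

```bash
# Print the current values of the two settings patched above
kubectl get setting storage-over-provisioning-percentage -n longhorn-system -o jsonpath='{.value}{"\n"}'
kubectl get setting storage-minimal-available-percentage -n longhorn-system -o jsonpath='{.value}{"\n"}'
```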
## Storage Capacity Planning

### Tower Fleet k3s Cluster

Hardware:

- 3 nodes × 80GB = 240GB raw storage
- Longhorn 2-replica = 120GB usable (50% overhead)
- 25% reserved = 90GB available for allocation
Current Allocations:

| Service         | PVC Size × Replicas | Total Used |
|-----------------|---------------------|------------|
| Prometheus      | 30GB × 2            | 60GB       |
| Grafana         | 5GB × 2             | 10GB       |
| Alertmanager    | 2GB × 2             | 4GB        |
| Docker Registry | 10GB × 2            | 20GB       |
| **Total**       |                     | **94GB** of 90GB usable |

Note that the 94GB total slightly exceeds the 90GB available for allocation; this only schedules because Longhorn allows over-provisioning (see Issue 3 above).
Recommendations:

- Prometheus: 20-30GB for 14-day retention (a sizing rule of thumb follows below)
- Grafana: 1-5GB (dashboards are tiny)
- Alertmanager: 1-2GB (alerts are small)
- Docker Registry: 5-10GB (homelab with ~20 images)
- PostgreSQL: 10-20GB per database
- Redis: 1-5GB (if persisted)
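The usual back-of-the-envelope for Prometheus disk is retention time × ingestion rate × bytes per sample. The 10,000 samples/s rate and ~1.5 bytes/sample compression figure below are assumptions for illustration; check your own rate via the `prometheus_tsdb_head_samples_appended_total` metric:

```bash
# 14 days x 86400 s/day x 10,000 samples/s x ~1.5 bytes/sample, in GiB
echo "$(( 14 * 86400 * 10000 * 15 / 10 / 1024**3 ))GB"   # ≈ 16GB
```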
## Quick Reference
Check storage status:
```bash
# Longhorn capacity
kubectl get nodes.longhorn.io -n longhorn-system

# PVCs by namespace
kubectl get pvc -A

# Storage usage in Longhorn UI
open http://10.89.97.210
```
Verify persistence:
```bash
# Check pod volumes
kubectl describe pod <name> -n <namespace> | grep -A 10 "Volumes:"

# Should see persistentVolumeClaim, NOT emptyDir
```
Common fix commands:
```bash
# Reinstall Helm chart with correct values
helm uninstall <release> -n <namespace>
helm install <release> <chart> -n <namespace> --values <values.yaml>

# Verify PVCs created
kubectl get pvc -n <namespace>

# Check pod is using PVC
kubectl describe pod <name> -n <namespace>
```
## Related Documentation
- Troubleshooting Guide - Pod and storage issues
- Core Infrastructure - Initial setup
- Longhorn Documentation