# Storage Verification Guide
This guide helps verify that persistent storage is correctly configured for stateful applications in Kubernetes.
## Why This Matters

Some Kubernetes workloads use ephemeral storage (`emptyDir`) by default, which means:

- ❌ Data is lost when pods restart
- ❌ Data is lost when pods are rescheduled to different nodes
- ❌ No persistence across pod lifecycles
For critical services like Prometheus, databases, and registries, you must use PersistentVolumeClaims (PVCs).
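For reference, a standalone PVC looks like the sketch below. The `longhorn` storage class, name, namespace, and size are placeholders for illustration, not values from this cluster:

```yaml
# Minimal PersistentVolumeClaim sketch - all names and sizes are examples
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # assumes a Longhorn StorageClass exists
  resources:
    requests:
      storage: 5Gi
```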
## Common Issue: Duplicate YAML Keys

Problem: when a YAML file contains the same key twice at the root level, most parsers (including Helm's) silently keep only the last occurrence, so the second block overwrites the first.
### Example of BROKEN configuration

```yaml
# ❌ WRONG - Second prometheus: block overwrites the first!
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi

# ... other config ...

prometheus:
  service:
    type: ClusterIP  # This OVERWRITES everything above!
```
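After parsing, all Helm actually sees from the file above is the second block; the storage configuration is silently gone:

```yaml
# Effective values once the duplicate key wins
prometheus:
  service:
    type: ClusterIP
```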
### Correct configuration

```yaml
# ✅ CORRECT - Single prometheus: block with all settings
prometheus:
  service:
    type: ClusterIP
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```
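Duplicate keys are easy to catch mechanically before they bite. One option, assuming `yamllint` is available (its `key-duplicates` rule is enabled by default):

```bash
# Reports "duplication of key" errors if any key appears twice in the same map
yamllint values.yaml
```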
## Verification Steps

### Step 1: Check Helm Values Were Applied

```bash
# View what Helm actually applied
helm get values <release-name> -n <namespace>

# Example:
helm get values kube-prometheus-stack -n monitoring
```
What to look for:
- Verify `storageSpec` or `storage` sections are present
- Check that all expected configuration blocks exist
- Watch for missing sections (this indicates duplicate keys in the values file); a scripted check follows below
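To check a specific path instead of eyeballing the output, something like this works (a sketch assuming the Go `yq` is installed; `null` means the block never made it into the release):

```bash
# Print the storageSpec Helm actually recorded; "null" means it is missing
helm get values kube-prometheus-stack -n monitoring -o yaml \
  | yq '.prometheus.prometheusSpec.storageSpec'
```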
### Step 2: Check PVCs Exist

```bash
# List all PVCs in namespace
kubectl get pvc -n <namespace>

# Example for monitoring:
kubectl get pvc -n monitoring
```
Expected output for kube-prometheus-stack:
```
NAME                              STATUS   VOLUME    CAPACITY
prometheus-...-prometheus-0       Bound    pvc-xxx   30Gi
alertmanager-...-alertmanager-0   Bound    pvc-yyy   2Gi
kube-prometheus-stack-grafana     Bound    pvc-zzz   5Gi
```
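Every PVC should report `Bound`. To surface stragglers across all namespaces (a plain grep sketch; empty output means everything is bound):

```bash
# Show any PVC that is not Bound (Pending, Lost, ...)
kubectl get pvc -A --no-headers | grep -v Bound
```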
### Step 3: Check Pod Volume Mounts

```bash
# Describe pod to see volumes
kubectl describe pod <pod-name> -n <namespace>

# Check volumes section
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.volumes}' | jq '.'
```
What to look for (a bulk check for a whole namespace follows below):

- `persistentVolumeClaim` - ✅ Good! Data persists
- `emptyDir: {}` - ❌ Bad! Data is ephemeral
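To sweep every pod in a namespace at once, a small `jq` sketch (some charts mount `emptyDir` legitimately for scratch space, so treat hits as leads rather than proof):

```bash
# List every pod in the namespace that mounts at least one emptyDir volume
kubectl get pods -n <namespace> -o json \
  | jq -r '.items[] | select(any(.spec.volumes[]?; has("emptyDir"))) | .metadata.name'
```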
### Step 4: Verify StatefulSet VolumeClaimTemplates

For StatefulSets (Prometheus, Alertmanager, databases):

```bash
# Check volumeClaimTemplates
kubectl get statefulset <name> -n <namespace> -o yaml | grep -A 20 volumeClaimTemplates
```
Note: Kubernetes does NOT allow `volumeClaimTemplates` to change after a StatefulSet is created. If you change Helm values after the initial install, you must delete and recreate the StatefulSet (see Issue 2 below).
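A grep-free way to pull just the requested sizes (a jsonpath sketch; empty output means no templates are defined):

```bash
# Print the storage request of each volumeClaimTemplate
kubectl get statefulset <name> -n <namespace> \
  -o jsonpath='{range .spec.volumeClaimTemplates[*]}{.metadata.name}: {.spec.resources.requests.storage}{"\n"}{end}'
```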
## Fixing Storage Issues

### Issue 1: emptyDir Instead of PVC

Symptoms:

- No PVCs created for the stateful app
- Data lost on pod restart
- `emptyDir: {}` in pod volumes
Root Cause:

- Helm values not applied correctly (duplicate keys)
- Chart version doesn't support persistent storage
- Storage disabled in values

Fix:

1. Correct the Helm values file (remove duplicate keys)
2. Uninstall and reinstall the Helm release:

```bash
helm uninstall <release> -n <namespace>
helm install <release> <chart> -n <namespace> --values <fixed-values.yaml>
```
### Issue 2: StatefulSet Not Using New Storage Config

Symptoms:

- Updated Helm values
- Ran `helm upgrade`
- Still no PVCs, or still using `emptyDir`
Root Cause:

StatefulSets do NOT update `volumeClaimTemplates` on upgrade; the field is immutable after creation (a Kubernetes limitation).
Fix:
```bash
# Either way, anything stored in the old emptyDir volumes is lost.

# Option 1: Delete only the StatefulSet object and let Helm recreate it
# (--cascade=orphan leaves the pods running; delete them afterwards so the
# new StatefulSet recreates them with PVC-backed volumes)
kubectl delete statefulset <name> -n <namespace> --cascade=orphan
helm upgrade <release> <chart> -n <namespace> --values <values.yaml>

# Option 2: Uninstall and reinstall (recommended for a fresh start)
helm uninstall <release> -n <namespace>
kubectl delete pvc --all -n <namespace>  # Clean up old PVCs (destroys their data!)
helm install <release> <chart> -n <namespace> --values <values.yaml>
```
### Issue 3: Insufficient Storage

Symptoms:

- PVC stuck in "Pending"
- Volume shows "faulted" in the Longhorn UI
- Error: "precheck new replica failed: insufficient storage"

Root Cause:

- Requested storage × replica count exceeds available space
- Longhorn reserves 25% of each disk by default
- Previous PVCs are still consuming space
Fix:

1. Check available storage:

    ```bash
    # Via kubectl
    kubectl get nodes.longhorn.io -n longhorn-system -o yaml | grep storageAvailable

    # Or via the Longhorn UI
    # http://10.89.97.210 → Node tab
    ```

2. Reduce PVC sizes in Helm values.

3. Delete unnecessary PVCs.

4. Consider adjusting Longhorn settings:

    ```bash
    # Allow more over-provisioning (default 100%) so more replicas can be scheduled
    kubectl patch setting storage-over-provisioning-percentage -n longhorn-system \
      --type='json' -p='[{"op": "replace", "path": "/value", "value": "200"}]'

    # Reduce the reserved percentage (default 25%)
    kubectl patch setting storage-minimal-available-percentage -n longhorn-system \
      --type='json' -p='[{"op": "replace", "path": "/value", "value": "15"}]'
    ```
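To confirm the new values took effect (a sketch; Longhorn exposes each setting's current value in the `value` field of its Setting resource):

```bash
# Print the current values of the two settings patched above
kubectl get setting storage-over-provisioning-percentage -n longhorn-system -o jsonpath='{.value}{"\n"}'
kubectl get setting storage-minimal-available-percentage -n longhorn-system -o jsonpath='{.value}{"\n"}'
```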
## Storage Capacity Planning

### Tower Fleet k3s Cluster

Hardware:

- 3 nodes × 80GB = 240GB raw storage
- Longhorn 2-replica = 120GB usable (50% overhead)
- 25% reserved = 90GB available for allocation
Current Allocations:

| Service         | PVC Size × Replicas | Total Used |
|-----------------|---------------------|------------|
| Prometheus      | 30GB × 2            | 60GB       |
| Grafana         | 5GB × 2             | 10GB       |
| Alertmanager    | 2GB × 2             | 4GB        |
| Docker Registry | 10GB × 2            | 20GB       |
| **Total**       |                     | **94GB** of 90GB usable |

Note that the 94GB total slightly exceeds the 90GB available for allocation; this only schedules because Longhorn allows over-provisioning (see Issue 3 above).
Recommendations:

- Prometheus: 20-30GB for 14-day retention (a sizing rule of thumb follows below)
- Grafana: 1-5GB (dashboards are tiny)
- Alertmanager: 1-2GB (alerts are small)
- Docker Registry: 5-10GB (homelab with ~20 images)
- PostgreSQL: 10-20GB per database
- Redis: 1-5GB (if persisted)
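The usual back-of-the-envelope for Prometheus disk is retention time × ingestion rate × bytes per sample. The 10,000 samples/s rate and ~1.5 bytes/sample compression figure below are assumptions for illustration; check your own rate via the `prometheus_tsdb_head_samples_appended_total` metric:

```bash
# 14 days x 86400 s/day x 10,000 samples/s x ~1.5 bytes/sample, in GiB
echo "$(( 14 * 86400 * 10000 * 15 / 10 / 1024**3 ))GB"   # ≈ 16GB
```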
## Quick Reference
Check storage status:
```bash
# Longhorn capacity
kubectl get nodes.longhorn.io -n longhorn-system

# PVCs by namespace
kubectl get pvc -A

# Storage usage in Longhorn UI
open http://10.89.97.210
```
Verify persistence:
```bash
# Check pod volumes
kubectl describe pod <name> -n <namespace> | grep -A 10 "Volumes:"

# Should see persistentVolumeClaim, NOT emptyDir
```
Common fix commands:
```bash
# Reinstall Helm chart with correct values
helm uninstall <release> -n <namespace>
helm install <release> <chart> -n <namespace> --values <values.yaml>

# Verify PVCs created
kubectl get pvc -n <namespace>

# Check pod is using PVC
kubectl describe pod <name> -n <namespace>
```
## Related Documentation
- Troubleshooting Guide - Pod and storage issues
- Core Infrastructure - Initial setup
- Longhorn Documentation