Longhorn Storage Monitoring¶
Complete guide for monitoring Longhorn storage usage via Prometheus and Grafana.
Last Updated: 2025-12-08
Quick Access¶
- Grafana UI: http://10.89.97.211 (credentials: admin / admin)
- Prometheus: ClusterIP only; accessible via kubectl port-forward
Current Storage Status (2025-12-08)¶
By Namespace (Actual Usage)¶
immich: 56Gi (largest consumer)
monitoring: 31Gi (Prometheus + Loki)
docker-registry: 10Gi
supabase-sandbox: 0Gi
supabase: 0Gi
romm: 0Gi
otterwiki: 0Gi
authentik: 0Gi
subtitleai: 0Gi
By Node (Disk Usage)¶
k3s-worker-1: 143Gi used (UNSCHEDULABLE - 93% full)
k3s-worker-2: 127Gi used (85% full)
k3s-master: 89Gi used (61% full)
Note: k3s-worker-1 is marked unschedulable due to disk pressure (276GB scheduled + 30% Longhorn reservation > 295GB capacity).
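The headroom calculation in the note can be sketched in PromQL using only the disk metrics documented below (note: `longhorn_disk_usage_bytes` reflects actual usage, which is a proxy for the *scheduled* size Longhorn's scheduler actually compares):

```promql
# Approximate remaining schedulable space per disk:
# raw capacity minus the Longhorn reservation minus space already used
longhorn_disk_capacity_bytes
  - longhorn_disk_reservation_bytes
  - longhorn_disk_usage_bytes
```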
Prometheus Metrics¶
Longhorn exports metrics to Prometheus via ServiceMonitor:
# Check ServiceMonitor
kubectl get servicemonitor -n longhorn-system
# NAME: longhorn-prometheus-servicemonitor
# Available metrics
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/label/__name__/values' | \
jq -r '.data[] | select(startswith("longhorn"))'
Key Metrics¶
Volume Metrics:
- longhorn_volume_actual_size_bytes - Actual disk space used by volume
- longhorn_volume_capacity_bytes - Provisioned volume size
- longhorn_volume_state - Volume state (1=creating, 2=attached, 3=detached, 4=attaching, 5=detaching, 6=deleting)
- longhorn_volume_robustness - Volume health (0=unknown, 1=healthy, 2=degraded, 3=faulted)
Node/Disk Metrics:
- longhorn_disk_capacity_bytes - Total disk capacity
- longhorn_disk_usage_bytes - Actual disk usage
- longhorn_disk_reservation_bytes - Reserved space for Longhorn overhead
- longhorn_disk_status - Disk schedulability (schedulable=1, unschedulable=0)
Replica Metrics:
- longhorn_replica_actual_size_bytes - Replica disk usage
- longhorn_replica_state - Replica state
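These metrics combine into useful expressions; for example, per-node disk utilisation as a percentage of the capacity Longhorn can actually schedule (a sketch using only the metrics listed above):

```promql
# Usage as a percentage of capacity minus Longhorn's reservation, per node
(
  longhorn_disk_usage_bytes
  / (longhorn_disk_capacity_bytes - longhorn_disk_reservation_bytes)
) * 100
```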
Querying Prometheus¶
From Prometheus Pod¶
# Storage by namespace
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | \
jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"'
# Disk usage by node
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_usage_bytes' | \
jq -r '.data.result[] | "\(.metric.node): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi used"'
# Unschedulable nodes
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_status{condition="schedulable"}==0' | \
jq -r '.data.result[] | .metric.node'
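The jq byte-to-Gi conversion used in these commands can be sanity-checked offline against a mocked API response (the sample value is chosen so 56Gi is exact):

```shell
# Mocked /api/v1/query response with one series:
# pvc_namespace="immich", value 60129542144 bytes = exactly 56 GiB
sample='{"data":{"result":[{"metric":{"pvc_namespace":"immich"},"value":[1733600000,"60129542144"]}]}}'

# Same jq filter as above: bytes -> GiB, floored
echo "$sample" | jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"'
# prints: immich: 56Gi
```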
Via kubectl port-forward¶
# Port-forward Prometheus
kubectl port-forward -n monitoring prometheus-kube-prometheus-stack-prometheus-0 9090:9090 &
# Query from host
curl -s 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | jq
# Stop port-forward
pkill -f 'kubectl port-forward.*prometheus'
Grafana Dashboards¶
Accessing Grafana¶
- Open Grafana: http://10.89.97.211
- Login: admin / admin
- Navigate: Dashboards → Browse
Import Longhorn Dashboard¶
Longhorn provides an official Grafana dashboard: https://grafana.com/grafana/dashboards/13032
Import steps:
- Open Grafana → Dashboards → New → Import
- Enter dashboard ID: 13032
- Click "Load"
- Select data source: Prometheus
- Click "Import"
Dashboard includes:
- Volume capacity and usage
- Node disk utilization
- Replica status
- I/O performance metrics
- Snapshot usage
Useful Grafana Queries¶
Top 10 volumes by size:
Storage by namespace:
Disk usage percentage:
Faulted volumes:
Unschedulable nodes:
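The panel queries themselves were lost from this page; the following PromQL sketches reconstruct them from the metrics documented in the Key Metrics section (value mappings follow the Longhorn metrics reference):

```promql
# Top 10 volumes by size
topk(10, longhorn_volume_actual_size_bytes)

# Storage by namespace
sum(longhorn_volume_actual_size_bytes) by (pvc_namespace)

# Disk usage percentage
(longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100

# Faulted volumes (robustness 3 = faulted)
longhorn_volume_robustness == 3

# Unschedulable nodes
longhorn_disk_status{condition="schedulable"} == 0
```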
CLI Tools¶
Longhorn Storage Usage Script¶
Location: /root/scripts/longhorn-storage-usage.sh
#!/bin/bash
# Quick storage overview
echo "=== Storage by Namespace ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | \
jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"' | \
sort -t: -k2 -rn
echo ""
echo "=== Disk Usage by Node ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_usage_bytes' | \
jq -r '.data.result[] | "\(.metric.node): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi used"'
echo ""
echo "=== Unschedulable Nodes ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_status{condition="schedulable"}==0' | \
jq -r '.data.result[] | .metric.node' | grep . || echo "None"
Usage: `bash /root/scripts/longhorn-storage-usage.sh`
Quick Commands¶
# Total storage used
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)' | \
jq -r '.data.result[0].value[1] | tonumber / 1024 / 1024 / 1024 | floor'
# Largest PVCs by requested (provisioned) size
kubectl get pvc -A --sort-by=.spec.resources.requests.storage | tail -10
# Node disk usage
kubectl get nodes.longhorn.io -n longhorn-system -o json | \
jq -r '.items[] | .metadata.name as $n | .status.diskStatus | to_entries[] |
"\($n): \(.value.storageScheduled / 1024 / 1024 / 1024 | floor)GB / \(.value.storageMaximum / 1024 / 1024 / 1024 | floor)GB"'
Alerting¶
Prometheus Alerts¶
Check if Longhorn alerts are configured: `kubectl get prometheusrules -A | grep -i longhorn`
Common alerts to configure:
- Disk Space Warning - Node >80% full
- Disk Space Critical - Node >90% full
- Volume Faulted - Any volume in faulted state
- Unschedulable Node - Node cannot schedule new volumes
Creating Custom Alert¶
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-storage-alerts
  namespace: longhorn-system
spec:
  groups:
    - name: longhorn-storage
      interval: 30s
      rules:
        - alert: LonghornDiskSpaceCritical
          expr: (longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100 > 90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn disk space critical on {{ $labels.node }}"
            description: "Disk usage is {{ $value | humanize }}% on node {{ $labels.node }}"
        - alert: LonghornVolumeFaulted
          expr: longhorn_volume_robustness == 3  # 3 = faulted
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is faulted"
            description: "Volume {{ $labels.volume }} in namespace {{ $labels.pvc_namespace }} is in faulted state"
Troubleshooting¶
Issue: Prometheus Not Scraping Longhorn¶
Check the ServiceMonitor exists: `kubectl get servicemonitor -n longhorn-system`
Check if Prometheus can reach Longhorn:
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus | grep longhorn
Issue: Missing Metrics in Grafana¶
Check Prometheus data source:
1. Grafana → Configuration → Data Sources
2. Select "Prometheus"
3. Click "Test" - should show "Data source is working"
Verify metrics exist:
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=up{job="longhorn-backend"}' | jq
Issue: Grafana Dashboard Shows No Data¶
Check time range: Ensure Grafana time range includes recent data (e.g., Last 1 hour)
Check query: Go to panel → Edit → View query inspector
Verify metric availability:
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/label/__name__/values' | grep longhorn_volume
Related Documentation¶
- Post-Reboot Recovery
- Longhorn Storage Usage Script
- Prometheus Operator Documentation
- Longhorn Monitoring Guide
Change Log¶
- 2025-12-08: Initial version with Prometheus queries and Grafana setup