Longhorn Storage Monitoring¶
Complete guide for monitoring Longhorn storage usage via Prometheus and Grafana.
Last Updated: 2025-12-08
Quick Access¶
- Grafana UI: http://10.89.97.211 (credentials: admin / admin)
- Prometheus: ClusterIP only; accessible via kubectl port-forward
Current Storage Status (2025-12-08)¶
By Namespace (Actual Usage)¶
immich: 56Gi (largest consumer)
monitoring: 31Gi (Prometheus + Loki)
docker-registry: 10Gi
supabase-sandbox: 0Gi
supabase: 0Gi
romm: 0Gi
otterwiki: 0Gi
authentik: 0Gi
subtitleai: 0Gi
By Node (Disk Usage)¶
k3s-worker-1: 143Gi used (UNSCHEDULABLE - 93% full)
k3s-worker-2: 127Gi used (85% full)
k3s-master: 89Gi used (61% full)
Note: k3s-worker-1 is marked unschedulable due to disk pressure (276GB scheduled + 30% Longhorn reservation > 295GB capacity).
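The headroom calculation in the note can be sketched in PromQL using only the disk metrics documented below (note: `longhorn_disk_usage_bytes` reflects actual usage, which is a proxy for the *scheduled* size Longhorn's scheduler actually compares):

```promql
# Approximate remaining schedulable space per disk:
# raw capacity minus the Longhorn reservation minus space already used
longhorn_disk_capacity_bytes
  - longhorn_disk_reservation_bytes
  - longhorn_disk_usage_bytes
```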
Prometheus Metrics¶
Longhorn exports metrics to Prometheus via ServiceMonitor:
# Check ServiceMonitor
kubectl get servicemonitor -n longhorn-system
# NAME: longhorn-prometheus-servicemonitor
# Available metrics
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/label/__name__/values' | \
jq -r '.data[] | select(startswith("longhorn"))'
Key Metrics¶
Volume Metrics:
- longhorn_volume_actual_size_bytes - Actual disk space used by volume
- longhorn_volume_capacity_bytes - Provisioned volume size
- longhorn_volume_state - Volume state (1=creating, 2=attached, 3=detached, 4=attaching, 5=detaching, 6=deleting)
- longhorn_volume_robustness - Volume health (0=unknown, 1=healthy, 2=degraded, 3=faulted)
Node/Disk Metrics:
- longhorn_disk_capacity_bytes - Total disk capacity
- longhorn_disk_usage_bytes - Actual disk usage
- longhorn_disk_reservation_bytes - Reserved space for Longhorn overhead
- longhorn_disk_status - Disk schedulability (schedulable=1, unschedulable=0)
Replica Metrics:
- longhorn_replica_actual_size_bytes - Replica disk usage
- longhorn_replica_state - Replica state
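These metrics combine into useful expressions; for example, per-node disk utilisation as a percentage of the capacity Longhorn can actually schedule (a sketch using only the metrics listed above):

```promql
# Usage as a percentage of capacity minus Longhorn's reservation, per node
(
  longhorn_disk_usage_bytes
  / (longhorn_disk_capacity_bytes - longhorn_disk_reservation_bytes)
) * 100
```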
Querying Prometheus¶
From Prometheus Pod¶
# Storage by namespace
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | \
jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"'
# Disk usage by node
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_usage_bytes' | \
jq -r '.data.result[] | "\(.metric.node): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi used"'
# Unschedulable nodes
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_status{condition="schedulable"}==0' | \
jq -r '.data.result[] | .metric.node'
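The jq byte-to-Gi conversion used in these commands can be sanity-checked offline against a mocked API response (the sample value is chosen so 56Gi is exact):

```shell
# Mocked /api/v1/query response with one series:
# pvc_namespace="immich", value 60129542144 bytes = exactly 56 GiB
sample='{"data":{"result":[{"metric":{"pvc_namespace":"immich"},"value":[1733600000,"60129542144"]}]}}'

# Same jq filter as above: bytes -> GiB, floored
echo "$sample" | jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"'
# prints: immich: 56Gi
```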
Via kubectl port-forward¶
# Port-forward Prometheus
kubectl port-forward -n monitoring prometheus-kube-prometheus-stack-prometheus-0 9090:9090 &
# Query from host
curl -s 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | jq
# Stop port-forward
pkill -f 'kubectl port-forward.*prometheus'
Grafana Dashboards¶
Accessing Grafana¶
- Open Grafana: http://10.89.97.211
- Login: admin / admin
- Navigate: Dashboards → Browse
Import Longhorn Dashboard¶
Longhorn provides an official Grafana dashboard: https://grafana.com/grafana/dashboards/13032
Import steps:
- Open Grafana → Dashboards → New → Import
- Enter dashboard ID: 13032
- Click "Load"
- Select data source: Prometheus
- Click "Import"
Dashboard includes:
- Volume capacity and usage
- Node disk utilization
- Replica status
- I/O performance metrics
- Snapshot usage
Useful Grafana Queries¶
Top 10 volumes by size:
Storage by namespace:
Disk usage percentage:
Faulted volumes:
Unschedulable nodes:
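The panel queries themselves were lost from this page; the following PromQL sketches reconstruct them from the metrics documented in the Key Metrics section (value mappings follow the Longhorn metrics reference):

```promql
# Top 10 volumes by size
topk(10, longhorn_volume_actual_size_bytes)

# Storage by namespace
sum(longhorn_volume_actual_size_bytes) by (pvc_namespace)

# Disk usage percentage
(longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100

# Faulted volumes (robustness 3 = faulted)
longhorn_volume_robustness == 3

# Unschedulable nodes
longhorn_disk_status{condition="schedulable"} == 0
```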
CLI Tools¶
Longhorn Storage Usage Script¶
Location: /root/scripts/longhorn-storage-usage.sh
#!/bin/bash
# Quick storage overview
echo "=== Storage by Namespace ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)by(pvc_namespace)' | \
jq -r '.data.result[] | "\(.metric.pvc_namespace): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi"' | \
sort -t: -k2 -rn
echo ""
echo "=== Disk Usage by Node ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_usage_bytes' | \
jq -r '.data.result[] | "\(.metric.node): \((.value[1] | tonumber / 1024 / 1024 / 1024) | floor)Gi used"'
echo ""
echo "=== Unschedulable Nodes ==="
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=longhorn_disk_status{condition="schedulable"}==0' | \
jq -r '.data.result[] | .metric.node' | grep . || echo "None"
Usage: `bash /root/scripts/longhorn-storage-usage.sh`
Quick Commands¶
# Total storage used
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=sum(longhorn_volume_actual_size_bytes)' | \
jq -r '.data.result[0].value[1] | tonumber / 1024 / 1024 / 1024 | floor'
# Largest PVCs by requested (provisioned) size
kubectl get pvc -A --sort-by=.spec.resources.requests.storage | tail -10
# Node disk usage
kubectl get nodes.longhorn.io -n longhorn-system -o json | \
jq -r '.items[] | .metadata.name as $n | .status.diskStatus | to_entries[] |
"\($n): \(.value.storageScheduled / 1024 / 1024 / 1024 | floor)GB / \(.value.storageMaximum / 1024 / 1024 / 1024 | floor)GB"'
Alerting¶
Prometheus Alerts¶
Check if Longhorn alerts are configured: `kubectl get prometheusrules -A | grep -i longhorn`
Common alerts to configure:
- Disk Space Warning - Node >80% full
- Disk Space Critical - Node >90% full
- Volume Faulted - Any volume in faulted state
- Unschedulable Node - Node cannot schedule new volumes
Creating Custom Alert¶
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-storage-alerts
  namespace: longhorn-system
spec:
  groups:
    - name: longhorn-storage
      interval: 30s
      rules:
        - alert: LonghornDiskSpaceCritical
          expr: (longhorn_disk_usage_bytes / longhorn_disk_capacity_bytes) * 100 > 90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn disk space critical on {{ $labels.node }}"
            description: "Disk usage is {{ $value | humanize }}% on node {{ $labels.node }}"
        - alert: LonghornVolumeFaulted
          expr: longhorn_volume_robustness == 3  # 3 = faulted
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Longhorn volume {{ $labels.volume }} is faulted"
            description: "Volume {{ $labels.volume }} in namespace {{ $labels.pvc_namespace }} is in faulted state"
Troubleshooting¶
Issue: Prometheus Not Scraping Longhorn¶
Check the ServiceMonitor exists: `kubectl get servicemonitor -n longhorn-system`
Check if Prometheus can reach Longhorn:
kubectl logs -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus | grep longhorn
Issue: Missing Metrics in Grafana¶
Check Prometheus data source:
1. Grafana → Configuration → Data Sources
2. Select "Prometheus"
3. Click "Test" - should show "Data source is working"
Verify metrics exist:
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=up{job="longhorn-backend"}' | jq
Issue: Grafana Dashboard Shows No Data¶
Check time range: Ensure Grafana time range includes recent data (e.g., Last 1 hour)
Check query: Go to panel → Edit → View query inspector
Verify metric availability:
kubectl exec -n monitoring prometheus-kube-prometheus-stack-prometheus-0 -c prometheus -- \
wget -qO- 'http://localhost:9090/api/v1/label/__name__/values' | grep longhorn_volume
Related Documentation¶
- Post-Reboot Recovery
- Longhorn Storage Usage Script
- Prometheus Operator Documentation
- Longhorn Monitoring Guide
Change Log¶
- 2025-12-08: Initial version with Prometheus queries and Grafana setup