
Expanding Cluster Storage

Guide for expanding k3s cluster storage capacity by resizing VM disks.


When to Expand

Expand storage when:

  • Current allocation approaches usable capacity
  • Planning to add new stateful applications
  • Longhorn shows low disk space warnings
  • PVCs fail to provision due to insufficient storage

Check current usage:

# Via Longhorn UI
open http://10.89.97.210
# Navigate to Node tab, check "Allocatable" vs "Reserved"

# Via kubectl
kubectl get nodes.longhorn.io -n longhorn-system -o yaml | \
  grep -E "storageAvailable|storageMaximum"

# Check PVC allocations
kubectl get pvc -A -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
SIZE:.spec.resources.requests.storage
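
To reduce the PVC listing to a single number, the requests can be summed directly (a minimal sketch; it assumes every request is expressed in Gi and ignores replica overhead):

# Sum all PVC storage requests cluster-wide (Gi units assumed)
kubectl get pvc -A -o jsonpath='{range .items[*]}{.spec.resources.requests.storage}{"\n"}{end}' | \
  sed 's/Gi//' | awk '{sum+=$1} END {print sum " Gi requested (before replica overhead)"}'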


Tower Fleet Current Configuration

After expansion (November 2025):

  • VM disk size: 150GB per node
  • Raw capacity: 450GB (3 nodes × 150GB)
  • Usable capacity: 225GB (2-replica = 50% overhead)
  • Reserved: 56GB (25% minimum free space)
  • Available for allocation: 169GB

Current allocations:

Service              Size    × Replicas   Total
─────────────────────────────────────────────────
Prometheus           30GB    × 2          60GB
Grafana              5GB     × 2          10GB
Alertmanager         2GB     × 2          4GB
Docker Registry      10GB    × 2          20GB
─────────────────────────────────────────────────
Total Used:                               94GB
Remaining:                                75GB
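
A quick arithmetic check of the figures above (a minimal shell sketch; the 2-replica factor, 25% reserve, and 94GB of allocations are taken from the configuration and table above):

# Capacity math for 3 × 150GB nodes with 2 replicas and a 25% reserve
raw=$((3 * 150))                  # 450GB raw
usable=$((raw / 2))               # 225GB usable (each volume stored twice)
reserved=$((usable * 25 / 100))   # 56GB reserved (25% minimum free)
available=$((usable - reserved))  # 169GB available for allocation
remaining=$((available - 94))     # 75GB left after the current 94GB of allocations
echo "usable=${usable}GB reserved=${reserved}GB available=${available}GB remaining=${remaining}GB"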


Expansion Process

Prerequisites

  • Proxmox VE access (root on host)
  • SSH access to all k3s nodes
  • Sufficient space in ZFS pool

Check ZFS availability:

pvesm status | grep local-zfs
# "Available" must exceed the total planned expansion (here 3 × 70GB = 210GB)

Step 1: Resize VMs at Proxmox Level

This operation is instant (just metadata change) and safe (no downtime):

# Resize all three k3s VMs (example: add 70GB to current 80GB = 150GB)
qm resize 201 scsi0 +70G
qm resize 202 scsi0 +70G
qm resize 203 scsi0 +70G

# Verify new sizes
qm config 201 | grep scsi0
qm config 202 | grep scsi0
qm config 203 | grep scsi0

# Expected output:
# scsi0: local-zfs:vm-201-disk-0,size=150G
# scsi0: local-zfs:vm-202-disk-0,size=150G
# scsi0: local-zfs:vm-203-disk-0,size=150G

Notes:

  • Use +XG notation to add space (e.g., +70G)
  • VMs remain running during this operation
  • Pods and services continue running without interruption
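
The same step can be done in one pass with a loop on the Proxmox host (a sketch; assumes the k3s VMs are IDs 201-203 and each gets the same +70G):

# Resize and verify all three k3s VMs in one loop
for vmid in 201 202 203; do
  qm resize "$vmid" scsi0 +70G
  qm config "$vmid" | grep scsi0
done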

Step 2: Extend Partitions Inside VMs

Now extend the filesystem to use the new space:

# Extend all three nodes
ssh root@10.89.97.201 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.202 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.203 'growpart /dev/sda 1 && resize2fs /dev/sda1'

# Verify expansion
ssh root@10.89.97.201 'df -h / | tail -1'
ssh root@10.89.97.202 'df -h / | tail -1'
ssh root@10.89.97.203 'df -h / | tail -1'

# Expected output (for 150GB disks):
# /dev/sda1       148G  8.0G  134G   6% /

Commands explained:

  • growpart /dev/sda 1 - extends partition 1 to fill the available space
  • resize2fs /dev/sda1 - extends the ext4 filesystem to fill the partition
  • Both commands are online (no unmount required)
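
If a node's root filesystem were XFS rather than ext4 (not the case on this fleet, going by the resize2fs commands above), the filesystem step would use xfs_growfs on the mount point instead (a hedged sketch):

# XFS variant: grow the partition, then grow the mounted filesystem
ssh root@<node-ip> 'growpart /dev/sda 1 && xfs_growfs /'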

Step 3: Verify Longhorn Detected New Space

Longhorn automatically detects increased disk space within 1-2 minutes:

# Wait for Longhorn to detect the new space (automatic; can take up to 1-2 minutes)
sleep 30

# Check new capacity
kubectl get nodes.longhorn.io -n longhorn-system -o yaml | \
  grep -E "name:|storageAvailable:|storageMaximum:"

# Example output (after expanding to 150GB per node):
# name: k3s-master
#   storageAvailable: 149736652800  # ~139GB available
#   storageMaximum: 158312947712    # ~147GB total

Calculate usable capacity:

# Total raw: 147GB × 3 nodes = 441GB
# Usable (2-replica): 441GB ÷ 2 = 220GB
# Reserved (25%): 220GB × 0.25 = 55GB
# Available: 220GB - 55GB = 165GB
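
To avoid eyeballing the raw byte counts, the same figures can be converted to GiB directly (a sketch building on the grep used above):

# Print each node's total Longhorn capacity in GiB
kubectl get nodes.longhorn.io -n longhorn-system -o yaml | \
  grep storageMaximum | \
  awk '{printf "%.0f GiB\n", $2 / 1024 / 1024 / 1024}'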

Check in Longhorn UI:

open http://10.89.97.210
# Navigate to: Node tab
# Verify "Allocatable" increased


Common Scenarios

Scenario 1: Double Current Capacity

From 80GB to 160GB per node:

# Add 80GB to each VM
qm resize 201 scsi0 +80G
qm resize 202 scsi0 +80G
qm resize 203 scsi0 +80G

# Extend filesystems
ssh root@10.89.97.201 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.202 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.203 'growpart /dev/sda 1 && resize2fs /dev/sda1'

Result: 480GB raw → 240GB usable

Scenario 2: Add 50GB to Each Node

qm resize 201 scsi0 +50G
qm resize 202 scsi0 +50G
qm resize 203 scsi0 +50G

ssh root@10.89.97.201 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.202 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.203 'growpart /dev/sda 1 && resize2fs /dev/sda1'

Result: 390GB raw → 195GB usable (from an 80GB baseline, as in Scenario 1)

While a smaller uniform bump like this works the same way, avoid expanding only a subset of nodes: uneven node sizes complicate capacity planning, and with 2-replica volumes the usable capacity is effectively limited by the smaller nodes.

Better approach: expand all nodes equally.


Troubleshooting

Issue: growpart Says "NOCHANGE"

Symptoms:

NOCHANGE: partition 1 is size X. it cannot be grown

Cause: the Proxmox resize didn't complete, the guest kernel hasn't rescanned the disk, or the partition is already expanded.

Fix:

# Check if Proxmox resize was applied
qm config <vmid> | grep scsi0

# Check current partition size
ssh root@<node-ip> 'parted /dev/sda print'

# If sizes match, filesystem might need extending
ssh root@<node-ip> 'resize2fs /dev/sda1'
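
If qm config already shows the larger disk but the partition table still reports the old size, the guest kernel may not have noticed the resize yet; asking it to rescan the virtual disk usually helps (a sketch; assumes scsi0 appears as /dev/sda inside the VM):

# Force the guest kernel to re-read the disk size, then retry the grow
ssh root@<node-ip> 'echo 1 > /sys/class/block/sda/device/rescan'
ssh root@<node-ip> 'growpart /dev/sda 1 && resize2fs /dev/sda1'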

Issue: Longhorn Not Detecting New Space

Symptoms:

  • Filesystem shows 150GB
  • Longhorn still reports old capacity

Fix:

# Check node-disk status
kubectl get nodes.longhorn.io -n longhorn-system

# Restart Longhorn manager (it will re-scan)
kubectl rollout restart daemonset longhorn-manager -n longhorn-system

# Wait 2 minutes for discovery
sleep 120

# Re-check capacity
kubectl get nodes.longhorn.io -n longhorn-system -o yaml | \
  grep storageMaximum
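
To confirm the managers came back up after the restart before re-checking capacity (a sketch; the app=longhorn-manager label matches a default Longhorn install):

# Wait for the daemonset rollout, then list the manager pods
kubectl rollout status daemonset/longhorn-manager -n longhorn-system
kubectl get pods -n longhorn-system -l app=longhorn-manager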

Issue: PVCs Still Fail After Expansion

Cause: Existing PVC requests might still exceed new capacity with replica overhead.

Fix:

  1. Calculate the total needed: PVC size × replica count
  2. Ensure available capacity > total needed (a worked example follows the snippet below)
  3. Check the Longhorn over-provisioning setting:

kubectl get setting storage-over-provisioning-percentage \
  -n longhorn-system -o jsonpath='{.value}'
# Default: 100 (no over-subscription; a value of 200 would allow 2×)
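
A worked example of steps 1-2, using the 75GB remaining from the allocation table above and a hypothetical 20GB PVC with 2 replicas (the PVC size here is illustrative, not from the cluster):

# Does a new 20GB, 2-replica PVC fit into 75GB of remaining capacity?
pvc_size=20
replicas=2
remaining=75
needed=$((pvc_size * replicas))   # 40GB actually consumed on disk
echo "needed=${needed}GB remaining=${remaining}GB"
[ "$needed" -le "$remaining" ] && echo "fits" || echo "does not fit"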


Capacity Planning Guidelines

Homelab Scale (3-10 apps)

Recommended: 150-200GB per node

  • Raw: 450-600GB
  • Usable: 225-300GB
  • Comfortable for databases, registries, monitoring

Medium Scale (10-20 apps)

Recommended: 250-300GB per node

  • Raw: 750-900GB
  • Usable: 375-450GB
  • Multiple databases, large registries, extensive logging

Large Scale (20+ apps)

Recommended: 400GB+ per node, or add more nodes

  • Consider adding a 4th node for better redundancy
  • Or migrate large storage needs to NFS/external storage
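
Working backward from a target, the required per-node disk can be estimated from the same assumptions used throughout this guide: 3 nodes, 2 replicas, 25% reserve (a rough sketch, not a Longhorn formula):

# Per-node disk needed to reach a target "available for allocation"
target_available=300                        # GB wanted for PVCs (example value)
# available ≈ per_node × 3 nodes ÷ 2 replicas × 0.75 reserve factor = per_node × 1.125
per_node=$((target_available * 1000 / 1125))
echo "roughly ${per_node}GB per node for ${target_available}GB available"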


Expansion History

November 10, 2025

Change: Expanded from 80GB to 150GB per node

Reason:

  • PVC allocations had reached 94GB against roughly 90GB available on the original 80GB disks (over capacity)
  • Insufficient room for upcoming applications
  • Supabase PostgreSQL needs 10-20GB
  • Future growth planning

Result:

  • Available capacity increased from 90GB to 169GB
  • Comfortable headroom for 10+ applications
  • No downtime during expansion

Commands used:

qm resize 201 scsi0 +70G
qm resize 202 scsi0 +70G
qm resize 203 scsi0 +70G

ssh root@10.89.97.201 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.202 'growpart /dev/sda 1 && resize2fs /dev/sda1'
ssh root@10.89.97.203 'growpart /dev/sda 1 && resize2fs /dev/sda1'