Monitoring

MeshStor exposes volume health through Kubernetes custom resources. This page explains what to watch and how to set up alerts.

Volume Health

The primary health indicator is the MeshStorVolume (msvol) custom resource:

kubectl get msvol
NAME            PHASE    MDSTATE    TOTAL   ACTIVE   FAILED   DOWN   SYNC     AGE
pvc-abc123..    Synced   active     2       2        0        0               1h
pvc-def456..    Syncing  recovering 2       1        0        1      45.2%    30m
pvc-ghi789..    Synced   degraded   3       2        1        0               2h

Key Fields

Field           Healthy Value          Alert When
Phase           Synced                 Stuck in Requested, Syncing, Expanding, or Replacing for more than 10 minutes
MDState         active or clean        degraded, recovering, or missing
ActiveDevices   Equal to TotalDevices  Less than TotalDevices
FailedDevices   0                      Greater than 0
DownDevices     0                      Greater than 0
SyncPercentage  Empty (fully synced)   Present for extended periods (rebuild stalled)
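
The rules in the Key Fields table can be expressed as a small check. This is a minimal sketch, not part of MeshStor: the dictionary keys (`phase`, `mdState`, `totalDevices`, `activeDevices`, `failedDevices`, `downDevices`) are hypothetical names that mirror the column labels, and the real field paths in the msvol resource may differ. `phase_age_minutes` is an assumed input that the caller computes. The SyncPercentage rule is left out because "extended periods" has no fixed threshold here.

```python
from typing import List

HEALTHY_MD_STATES = {"active", "clean"}
TRANSIENT_PHASES = {"Requested", "Syncing", "Expanding", "Replacing"}
STUCK_AFTER_MINUTES = 10  # threshold from the Key Fields table


def volume_issues(vol: dict, phase_age_minutes: float) -> List[str]:
    """Return human-readable reasons why a volume needs attention.

    `vol` uses hypothetical key names; adapt them to the actual resource.
    """
    issues = []
    if vol["phase"] in TRANSIENT_PHASES and phase_age_minutes > STUCK_AFTER_MINUTES:
        issues.append(
            f"phase {vol['phase']} for more than {STUCK_AFTER_MINUTES} minutes"
        )
    if vol["mdState"] not in HEALTHY_MD_STATES:
        issues.append(f"MD state is {vol['mdState']}")
    if vol["activeDevices"] < vol["totalDevices"]:
        issues.append(
            f"only {vol['activeDevices']} of {vol['totalDevices']} devices active"
        )
    if vol["failedDevices"] > 0:
        issues.append(f"{vol['failedDevices']} failed device(s)")
    if vol["downDevices"] > 0:
        issues.append(f"{vol['downDevices']} down device(s)")
    return issues
```

An empty list means none of the table's alert conditions matched. Returning reasons rather than a single boolean keeps the output usable as an alert message.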

Detailed Volume Inspection

For a specific volume, inspect the partition-level status:

kubectl get msvol <volume-name> -o yaml

Key fields in .status.partitions[]:

Field      Description
nodeID     Node hosting this partition
state      Requested, Created, Syncing, Synced, Faulty, Missing, Replacing, Deleting
sizeBytes  Partition size
updatedAt  Last state change timestamp
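
To summarize partition states for one volume, the JSON form of the same resource can be parsed. The sketch below assumes the input is the parsed output of `kubectl get msvol <volume-name> -o json` and relies only on the `nodeID` and `state` fields listed above. Treating every state other than `Synced` as worth a look is an assumption made for illustration, and the sample document is made-up data.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def summarize_partitions(volume: dict) -> Tuple[Dict[str, int], List[str]]:
    """Count partitions per state and list nodes whose partition is not Synced."""
    partitions = volume.get("status", {}).get("partitions", [])
    counts = Counter(p["state"] for p in partitions)
    # Assumption: anything other than Synced deserves a closer look.
    attention = [p["nodeID"] for p in partitions if p["state"] != "Synced"]
    return dict(counts), attention


# Illustrative data only; the values are invented for this example.
sample = json.loads("""
{
  "status": {
    "partitions": [
      {"nodeID": "node1", "state": "Synced", "sizeBytes": 10737418240},
      {"nodeID": "node2", "state": "Faulty", "sizeBytes": 10737418240}
    ]
  }
}
""")

counts, attention = summarize_partitions(sample)
print(counts)     # per-state partition counts
print(attention)  # node IDs to investigate
```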

Device Health

Monitor NVMe drive usage and free space via MeshStorNodeDevice (msnd):

kubectl get msnd
NAME              NODE    DEVICE    MODEL              SIZE      FREE       VOLS   AGE
node1-nvme0n1     node1   nvme0n1   Samsung 990 PRO    1.0TB     200.0GB    8      7d
node2-nvme0n1     node2   nvme0n1   Samsung 990 PRO    1.0TB     800.0GB    2      7d

What to Watch

Condition                     Action
FREE approaching 0            Add drives or rebalance volumes. New volumes will fail to provision if no free space exists.
VOLS much higher on one node  Placement is skewed; check if node labels or network issues are limiting placement options.
Device disappears from msnd   Node may have lost the drive or the node pod is not running.
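
The free-space condition can be checked by comparing the FREE and SIZE columns. This sketch rests on some assumptions: it treats the printed sizes as decimal units (1 TB = 1000 GB), which the output above does not confirm, and the `name`, `size`, and `free` keys are hypothetical names for values copied from the `kubectl get msnd` columns. The default 10% threshold matches the warning level recommended under Alerting Recommendations.

```python
import re
from typing import Dict, List

# Assumption: SIZE and FREE use decimal (SI) units. Swap in powers of 1024
# if the columns turn out to be binary.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}
_SIZE_RE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([KMGTP]?B)\s*$")


def parse_size(text: str) -> float:
    """Convert a string such as '200.0GB' to a number of bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def low_free_devices(devices: List[Dict[str, str]], threshold: float = 0.10) -> List[str]:
    """Return names of devices whose free fraction is below `threshold`."""
    low = []
    for dev in devices:
        total = parse_size(dev["size"])
        if total <= 0:
            continue  # skip malformed rows rather than divide by zero
        if parse_size(dev["free"]) / total < threshold:
            low.append(dev["name"])
    return low
```

With the two devices from the sample output above, neither falls under the 10% threshold; node1-nvme0n1 sits at roughly 20% free.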

Volume Conditions

CSI volume conditions are reported to Kubernetes and visible in PV events:

kubectl describe pv <pv-name>

MeshStor sets abnormal: true when the MD state is neither active nor clean, and includes details about failed or down devices.

Alerting Recommendations

Critical

  • Any volume with FailedDevices > 0 — a member has been permanently marked faulty
  • Any volume stuck in Replacing for more than 30 minutes — replacement may be blocked

Warning

  • Any volume with DownDevices > 0 — a member is temporarily unreachable
  • Any device with less than 10% free space — new volumes may fail to provision
  • Any volume in Syncing phase for more than 1 hour — rebuild may be stalled
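
The volume-level recommendations above map onto a severity function. This is a sketch under stated assumptions rather than a shipped alert rule: the keys `phase`, `failedDevices`, and `downDevices` are hypothetical names, and `phase_age_minutes` is an assumed input measuring how long the volume has been in its current phase. The device free-space warning is per device, not per volume, so it is not covered here.

```python
from typing import Optional

REPLACING_CRITICAL_MINUTES = 30  # Critical: stuck in Replacing
SYNCING_WARNING_MINUTES = 60     # Warning: Syncing for more than 1 hour


def alert_severity(vol: dict, phase_age_minutes: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for one volume.

    Critical conditions are checked first so they take precedence.
    """
    if vol["failedDevices"] > 0:
        return "critical"
    if vol["phase"] == "Replacing" and phase_age_minutes > REPLACING_CRITICAL_MINUTES:
        return "critical"
    if vol["downDevices"] > 0:
        return "warning"
    if vol["phase"] == "Syncing" and phase_age_minutes > SYNCING_WARNING_MINUTES:
        return "warning"
    return None
```

Checking critical conditions before warnings means a volume with both a failed and a down member is reported once, at the higher severity.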

Example: Watch All Volumes

kubectl get msvol -w

This streams changes as they happen — useful for monitoring during maintenance windows.

What's Next