Monitoring¶

MeshStor exposes volume health through Kubernetes custom resources. This page explains what to watch and how to set up alerts.

Volume Health¶

The primary health indicator is the `MeshStorVolume` (`msvol`) custom resource:

kubectl get msvol

NAME            PHASE    MDSTATE    TOTAL   ACTIVE   FAILED   DOWN   SYNC     AGE
pvc-abc123..    Synced   active     2       2        0        0               1h
pvc-def456..    Syncing  recovering 2       1        0        1      45.2%    30m
pvc-ghi789..    Synced   degraded   3       2        1        0               2h

Key Fields¶

Field	Healthy Value	Alert When
`Phase`	`Synced`	Stuck in `Requested`, `Syncing`, `Expanding`, or `Replacing` for more than 10 minutes
`MDState`	`active` or `clean`	`degraded`, `recovering`, or missing
`ActiveDevices`	Equal to `TotalDevices`	Less than `TotalDevices`
`FailedDevices`	`0`	Greater than `0`
`DownDevices`	`0`	Greater than `0`
`SyncPercentage`	Empty (fully synced)	Still present after an extended period (rebuild may be stalled)
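The device and MD state checks in this table can be applied to the default `kubectl get msvol` output with a small filter. The following is a minimal sketch, not a MeshStor tool: `unhealthy_volumes` is a hypothetical helper name, and it assumes the column order shown in the sample output on this page. It does not measure how long a volume has been in a phase, so it cannot detect a stuck `Phase`.

```shell
# Hypothetical helper: reads `kubectl get msvol --no-headers` rows on stdin
# and prints one line per volume that violates the Key Fields table.
# Assumes the column order NAME PHASE MDSTATE TOTAL ACTIVE FAILED DOWN [SYNC] AGE.
unhealthy_volumes() {
  awk '
    # An empty MDSTATE shifts the columns left, so $3 becomes the TOTAL count.
    $3 ~ /^[0-9]+$/ { print $1 ": mdstate=missing"; next }
    {
      reason = ""
      if ($3 != "active" && $3 != "clean") reason = reason " mdstate=" $3
      if ($5 + 0 < $4 + 0)                 reason = reason " active=" $5 "/" $4
      if ($6 + 0 > 0)                      reason = reason " failed=" $6
      if ($7 + 0 > 0)                      reason = reason " down=" $7
      if (reason != "") print $1 ":" reason
    }
  '
}
```

Run it as `kubectl get msvol --no-headers | unhealthy_volumes`. Parsing the human-readable table is fragile if the printed columns ever change, so treat this as a quick check rather than a basis for production alerting.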

Detailed Volume Inspection¶

For a specific volume, inspect the partition-level status:

kubectl get msvol <volume-name> -o yaml

Key fields in `.status.partitions[]`:

Field	Description
`nodeID`	Node hosting this partition
`state`	One of `Requested`, `Created`, `Syncing`, `Synced`, `Faulty`, `Missing`, `Replacing`, `Deleting`
`sizeBytes`	Partition size in bytes
`updatedAt`	Last state change timestamp
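To see only the partition fields instead of the full YAML, kubectl's JSONPath output can print one row per partition. This is a sketch under the assumption that the fields appear in the resource exactly as named in the table above; `partition_rows` and `unsynced_partitions` are hypothetical helper names.

```shell
# Hypothetical helpers built on the .status.partitions[] fields listed above.

# Print one tab-separated row per partition: nodeID, state, sizeBytes, updatedAt.
partition_rows() {
  kubectl get msvol "$1" \
    -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.state}{"\t"}{.sizeBytes}{"\t"}{.updatedAt}{"\n"}{end}'
}

# Filter rows from partition_rows, keeping partitions that are not Synced.
unsynced_partitions() {
  awk -F '\t' '$2 != "Synced" { print $1 ": " $2 " since " $4 }'
}
```

For example, `partition_rows <volume-name> | unsynced_partitions` lists each partition that is in any state other than `Synced`, along with its last state change timestamp.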

Device Health¶

Monitor NVMe drive usage and free space via `MeshStorNodeDevice` (`msnd`):

kubectl get msnd

NAME              NODE    DEVICE    MODEL              SIZE      FREE       VOLS   AGE
node1-nvme0n1     node1   nvme0n1   Samsung 990 PRO    1.0TB     200.0GB    8      7d
node2-nvme0n1     node2   nvme0n1   Samsung 990 PRO    1.0TB     800.0GB    2      7d

What to Watch¶

Condition	Action
`FREE` approaching 0	Add drives or rebalance volumes. New volumes will fail to provision if no free space exists.
`VOLS` much higher on one node	Placement is skewed. Check whether node labels or network issues are limiting placement options.
Device disappears from `msnd`	The node may have lost the drive, or the node pod may not be running.
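The `FREE` check can be expressed as a percentage of `SIZE` per device. The sketch below is one way to do it from the default `kubectl get msnd` output; `low_free_devices` is a hypothetical helper name, and it assumes sizes are printed as a number followed by a decimal unit (`MB`, `GB`, `TB`, with 1 TB = 1000 GB), as in the sample output. Because `MODEL` can contain spaces, the columns are indexed from the right.

```shell
# Hypothetical helper: reads `kubectl get msnd --no-headers` rows on stdin and
# prints devices whose free space is below a percentage threshold (default 10).
# Assumes SIZE and FREE look like "1.0TB" or "200.0GB" with decimal units.
low_free_devices() {
  awk -v limit="${1:-10}" '
    # Convert a value such as "1.0TB" to gigabytes; unknown units yield -1.
    function to_gb(v,    n, u) {
      n = v + 0                      # numeric prefix, e.g. 1.0
      u = v; gsub(/[0-9.]/, "", u)   # remaining unit suffix, e.g. TB
      if (u == "TB") return n * 1000
      if (u == "GB") return n
      if (u == "MB") return n / 1000
      return -1
    }
    {
      # MODEL may contain spaces, so index SIZE and FREE from the right.
      size = to_gb($(NF - 3)); free = to_gb($(NF - 2))
      if (size <= 0 || free < 0) { print $1 ": cannot parse " $(NF - 3) "/" $(NF - 2); next }
      pct = 100 * free / size
      if (pct < limit) printf "%s: %.1f%% free (%s of %s)\n", $1, pct, $(NF - 2), $(NF - 3)
    }
  '
}
```

Run it as `kubectl get msnd --no-headers | low_free_devices`, or pass a different threshold as the first argument, for example `low_free_devices 25`.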

Volume Conditions¶

CSI volume conditions are reported to Kubernetes and visible in PV events:

kubectl describe pv <pv-name>

MeshStor sets `abnormal: true` when the MD state is neither `active` nor `clean`, and includes details about failed or down devices.

Alerting Recommendations¶

Critical¶

Any volume with `FailedDevices` > 0: a member has been permanently marked faulty
Any volume stuck in `Replacing` for more than 30 minutes: replacement may be blocked

Warning¶

Any volume with `DownDevices` > 0: a member is temporarily unreachable
Any device with less than 10% free space: new volumes may fail to provision
Any volume in `Syncing` phase for more than 1 hour: rebuild may be stalled
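The duration-based alerts need to know how long a volume has been in its current phase, which the `kubectl get msvol` table does not show directly. One possible approach, sketched below, is to poll periodically and record when each volume was first seen in its current phase. `stuck_volumes` and the state file format are hypothetical, and the thresholds are the 30-minute and 1-hour values from the lists above. Durations are measured from the first poll that observed the phase, so they can undercount.

```shell
# Hypothetical helper, meant to be run periodically (for example from cron)
# with `kubectl get msvol --no-headers` on stdin.
# Usage: stuck_volumes STATE_FILE [NOW_EPOCH_SECONDS]
stuck_volumes() {
  awk -v state="$1" -v now="${2:-$(date +%s)}" '
    BEGIN {
      # Load "name phase first_seen" lines saved by the previous run, if any.
      while ((getline line < state) > 0) {
        split(line, f, " ")
        phase[f[1]] = f[2]; since[f[1]] = f[3]
      }
      close(state)
      newfile = state ".new"
      printf "" > newfile          # create the file even if stdin is empty
    }
    {
      # Keep the old timestamp only while the phase is unchanged.
      first = (phase[$1] == $2) ? since[$1] : now
      print $1, $2, first > newfile
      mins = int((now - first) / 60)
      if ($2 == "Replacing" && mins > 30) print "CRITICAL " $1 " Replacing for " mins "m"
      if ($2 == "Syncing"   && mins > 60) print "WARNING " $1 " Syncing for " mins "m"
    }
  ' && mv "$1.new" "$1"
}
```

A typical invocation would be `kubectl get msvol --no-headers | stuck_volumes /var/tmp/msvol-phase.state`, where the state file path is only an example. The optional second argument overrides the current time, which is mainly useful for testing.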

Example: Watch All Volumes¶

kubectl get msvol -w

This streams changes as they happen — useful for monitoring during maintenance windows.

What's Next¶

Self-Healing — automatic recovery from node and network failures
Volume Relocation — how volumes move between nodes
Volume Expansion — grow a volume online without downtime
Common Issues — troubleshoot problems you discover while monitoring