Common Issues

Symptom-driven troubleshooting for the most frequently encountered problems.

PVC Stuck in Pending

Symptom: PVC stays Pending and never binds to a PV.

Possible causes:

| Cause | Diagnosis | Fix |
|---|---|---|
| No free space on any node | kubectl get msnd (check the FREE column) | Add drives or delete unused volumes |
| No nodes with matching NVMe drives | kubectl get msnd (no resources listed) | Verify NVMe drives exist and node pods are running |
| StorageClass misconfigured | kubectl describe pvc <name> (check events) | Verify provisioner is io.meshstor.csi.mesh |
| Controller not running | kubectl -n meshstor get pods (controller pod missing or crash-looping) | Check controller logs: kubectl -n meshstor logs statefulset/meshstor-csi-controller -c csi-plugin |
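
The StorageClass check from the table can be scripted. The following is a minimal sketch, not part of MeshStor: the function name pvc_provisioner_ok is hypothetical, and it relies only on the standard Kubernetes fields spec.storageClassName (on the PVC) and provisioner (on the StorageClass).

```shell
# Hypothetical helper: confirm a PVC's StorageClass uses the MeshStor provisioner.
# Usage: pvc_provisioner_ok <pvc-name> [namespace]
pvc_provisioner_ok() {
  pvc_name=$1
  pvc_ns=${2:-default}
  expected="io.meshstor.csi.mesh"   # provisioner name from the table above

  # Standard Kubernetes fields: PVC spec.storageClassName, StorageClass provisioner.
  sc=$(kubectl get pvc "$pvc_name" -n "$pvc_ns" -o jsonpath='{.spec.storageClassName}')
  if [ -z "$sc" ]; then
    echo "PVC $pvc_name has no storageClassName; check the PVC spec"
    return 2
  fi

  prov=$(kubectl get storageclass "$sc" -o jsonpath='{.provisioner}')
  if [ "$prov" = "$expected" ]; then
    echo "OK: StorageClass $sc uses $expected"
  else
    echo "MISMATCH: StorageClass $sc uses '$prov', expected $expected"
    return 1
  fi
}
```

A mismatch means the PVC is being handled by a different provisioner, so the MeshStor controller never sees the request.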

Volume Stuck in Requested Phase

Symptom: kubectl get msvol shows phase Requested for more than a few minutes.

Cause: Remote nodes cannot create the requested partitions. The controller selected nodes, but those nodes haven't acted on the request.

Diagnosis:

# Check which nodes were selected for partitions
kubectl get msvol <name> -o jsonpath='{range .status.partitions[*]}{.nodeID} {.state}{"\n"}{end}'

Fix:

  • Verify the target node pods are running: kubectl -n meshstor get pods -o wide | grep <node-name>
  • Check node pod logs for errors: kubectl -n meshstor logs <node-pod> -c csi-plugin | tail -50
  • Verify NVMe-oF kernel modules are loaded on the target node: ls /sys/kernel/config/nvmet/
  • Verify the target node has free space: kubectl get msnd | grep <node-name>
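
The first check in the list can be repeated for every node that holds a partition of the volume. This is a rough sketch under one loud assumption: the nodeID reported in the volume status is the same string as the Kubernetes node name shown by kubectl get pods -o wide. The function name check_partition_nodes is hypothetical.

```shell
# Hypothetical helper: list the meshstor pods behind each partition of a volume.
# ASSUMPTION: .nodeID in the volume status equals the Kubernetes node name;
# adjust the grep if your cluster reports a different identifier.
# Usage: check_partition_nodes <volume-name>
check_partition_nodes() {
  vol=$1
  pods=$(kubectl -n meshstor get pods -o wide --no-headers)

  kubectl get msvol "$vol" \
    -o jsonpath='{range .status.partitions[*]}{.nodeID} {.state}{"\n"}{end}' |
  awk 'NF { print $1 }' | sort -u |
  while read -r node; do
    echo "== $node"
    # -w matches whole words only, -F treats the node name as a literal string
    printf '%s\n' "$pods" | grep -wF -- "$node" || echo "no meshstor pod found on $node"
  done
}
```

A node that prints "no meshstor pod found" is the first place to look, since it cannot act on the partition request.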

Volume Stuck in Syncing Phase

Symptom: kubectl get msvol shows phase Syncing with a syncPercentage that isn't progressing.

Cause: MD RAID rebuild is stalled, often due to a disconnected or slow member.

Diagnosis:

# Watch sync progress
kubectl get msvol <name> -w

# Check partition states — look for Missing or Faulty
kubectl get msvol <name> -o jsonpath='{range .status.partitions[*]}{.nodeID} {.state}{"\n"}{end}'

Fix:

  • If a partition is Missing: the remote node may be down or NVMe-oF connection dropped. Check node status and network connectivity.
  • If sync percentage is advancing slowly: this is normal for large volumes. MD rebuild speed depends on I/O load and network throughput.
  • If sync percentage is stuck at 0%: check if the remote partition is actually connected. Look for NVMe-oF errors in the node pod logs.
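
For volumes with many partitions, the partition listing from the diagnosis step can be filtered down to the entries that need attention. A minimal sketch follows; the function name flag_bad_partitions is hypothetical, and it only recognizes the two states named above (Missing and Faulty).

```shell
# Hypothetical filter: read "nodeID state" lines on stdin and print only the
# partitions whose state is Missing or Faulty.
# Returns non-zero when at least one such partition is found.
#
# Intended use (pipe the jsonpath output from the diagnosis step into it):
#   kubectl get msvol <name> -o jsonpath='...' | flag_bad_partitions
flag_bad_partitions() {
  awk '
    $2 == "Missing" || $2 == "Faulty" { print "ATTENTION: " $1 " is " $2; bad = 1 }
    END { exit bad + 0 }
  '
}
```

The non-zero return status makes the filter usable in a loop or a monitoring check without parsing its output.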

Mount Fails with Unavailable

Symptom: Pod cannot start, events show FailedMount with gRPC code Unavailable.

Cause: No partitions are reachable — neither local nor remote.

Diagnosis:

# Check volume partition states
kubectl get msvol <name> -o yaml | grep -A5 partitions

# Check node connectivity
kubectl -n meshstor logs <node-pod> -c csi-plugin | grep -iE 'nvme|connect|import'

Fix:

  • Verify NVMe-oF node annotations are set: kubectl get node <name> -o jsonpath='{.metadata.annotations}'
  • Verify ports 4420/4421 are open between nodes
  • Verify the remote node's NVMe-oF target is running: ls /sys/kernel/config/nvmet/subsystems/
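
The port check can be run from the node that fails to mount. The sketch below assumes the NVMe-oF connections use the TCP transport and that nc (netcat) is installed on the node; neither is confirmed by this page. The function names probe_port and check_nvmeof_ports are hypothetical.

```shell
# Hypothetical helper: probe the NVMe-oF ports on a remote node.
# ASSUMPTIONS: TCP transport, and `nc` available on the machine running this.

# Returns 0 when a TCP connection to host:port succeeds within 3 seconds.
probe_port() {
  nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# Usage: check_nvmeof_ports <remote-node-address>
check_nvmeof_ports() {
  host=$1
  failed=0
  for port in 4420 4421; do   # ports listed in the fix above
    if probe_port "$host" "$port"; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

A port that is not reachable points at a firewall or network policy, or at a target that is not listening on the remote node.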

Volume Expansion Fails

Symptom: PVC resize request is rejected or stays in Expanding phase.

Possible causes:

| Cause | Diagnosis | Fix |
|---|---|---|
| drivesPerCopy > 1 | Check StorageClass parameters | Expansion is only supported when drivesPerCopy=1. This is a design limitation. |
| Not enough free space after partition | kubectl get msnd (check FREE on the partition's node) | Free space on the same drive must be contiguous and large enough for the growth |
| Volume not yet synced | kubectl get msvol <name> (phase is not Synced) | Wait for the volume to finish syncing before expanding |
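
Two of the three causes can be checked before submitting a resize. This is a sketch only: the function name can_expand is hypothetical, and it assumes the volume phase is exposed at .status.phase of the msvol resource, which this page does not confirm. The free-space condition still has to be checked by hand.

```shell
# Hypothetical pre-check before resizing a PVC.
# ASSUMPTION: the phase shown by `kubectl get msvol` lives at .status.phase.
# Usage: can_expand <storageclass-name> <volume-name>
can_expand() {
  sc=$1
  vol=$2
  blocked=0

  # drivesPerCopy is read from the StorageClass parameters map.
  drives=$(kubectl get storageclass "$sc" -o jsonpath='{.parameters.drivesPerCopy}')
  if [ -n "$drives" ] && [ "$drives" != "1" ]; then
    echo "BLOCKED: drivesPerCopy=$drives (expansion requires drivesPerCopy=1)"
    blocked=1
  fi

  phase=$(kubectl get msvol "$vol" -o jsonpath='{.status.phase}')
  if [ "$phase" != "Synced" ]; then
    echo "BLOCKED: volume phase is '$phase' (wait for Synced)"
    blocked=1
  fi

  if [ "$blocked" -eq 0 ]; then
    echo "No blocking condition found; still check FREE with: kubectl get msnd"
  fi
  return "$blocked"
}
```

If drivesPerCopy is not set on the StorageClass, the sketch skips that check rather than guessing the default.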

Pod Evicted from Node

Symptom: Pod is unexpectedly evicted with a MeshStor-related event.

Cause: MeshStor evicts pods to enforce single-node-writer semantics. This happens when:

  • A volume is mounted on a node, but the volume's owner node changes (e.g., during node failure and recovery)
  • An unmanaged pod (no controller like Deployment/StatefulSet) is using a volume that needs to be unmounted

Fix:

  • For managed pods (Deployments, StatefulSets): the pod will be rescheduled automatically
  • For unmanaged pods: recreate the pod after the volume has been safely unmounted
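
To tell in advance which of the two cases applies to a pod, its owner references can be inspected. A minimal sketch, using the standard metadata.ownerReferences field of the Pod object; the function name pod_owner is hypothetical, and the check only works while the pod object still exists.

```shell
# Hypothetical helper: report whether a pod is managed by a controller.
# Usage: pod_owner <pod-name> [namespace]
pod_owner() {
  pod=$1
  ns=${2:-default}
  # Empty output means the pod has no owning controller.
  owners=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.metadata.ownerReferences[*].kind}')
  if [ -n "$owners" ]; then
    echo "managed by: $owners (will be rescheduled automatically)"
  else
    echo "unmanaged: recreate the pod manually after the volume is unmounted"
  fi
}
```

Pods created by a Deployment report ReplicaSet as their owner, because the Deployment manages them through a ReplicaSet.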

Gathering Diagnostics

When filing a bug report, collect:

# Cluster state
kubectl get nodes -o wide
kubectl get -A pods,pvc,pv,msvol,msnd

# Controller logs
kubectl -n meshstor logs statefulset/meshstor-csi-controller -c csi-plugin --tail=200

# Node logs (for the affected node)
kubectl -n meshstor logs <node-pod> -c csi-plugin --tail=200

# Volume detail
kubectl get msvol <volume-name> -o yaml

# Events
kubectl get events --sort-by='.lastTimestamp' | grep -i meshstor
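
The commands above can be wrapped so that their output lands in one directory that is easy to attach to a report. This is a convenience sketch, not a MeshStor tool; the function name collect_meshstor_diag and the output file names are hypothetical.

```shell
# Hypothetical wrapper: save the diagnostics listed above into one directory.
# Usage: collect_meshstor_diag <node-pod> <volume-name> [output-dir]
collect_meshstor_diag() {
  node_pod=$1
  volume=$2
  out=${3:-meshstor-diag}
  mkdir -p "$out" || return 1

  # A failing command must not stop the collection, so each one is followed
  # by `|| true`; stderr goes to the same file to record what went wrong.
  kubectl get nodes -o wide > "$out/nodes.txt" 2>&1 || true
  kubectl get -A pods,pvc,pv,msvol,msnd > "$out/resources.txt" 2>&1 || true
  kubectl -n meshstor logs statefulset/meshstor-csi-controller -c csi-plugin --tail=200 \
    > "$out/controller.log" 2>&1 || true
  kubectl -n meshstor logs "$node_pod" -c csi-plugin --tail=200 \
    > "$out/node.log" 2>&1 || true
  kubectl get msvol "$volume" -o yaml > "$out/volume.yaml" 2>&1 || true
  kubectl get events --sort-by='.lastTimestamp' 2>&1 | grep -i meshstor \
    > "$out/events.txt" || true

  echo "Diagnostics written to $out"
}
```

Review the collected files before sharing them, since logs and resource listings can contain cluster-specific names and addresses.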

What's Next