Volume Relocation¶
MeshStor can move volume data between nodes without downtime. Relocation happens automatically during node drain and member replacement. This page covers all relocation scenarios, how to observe them, and how to troubleshoot common issues.
How Relocation Is Possible¶
Even with numberOfCopies=1, MeshStor creates a 2-slot RAID1 array: one slot holds the active partition, the other is a placeholder ("missing"). This placeholder slot allows data migration — a new partition fills the empty slot, syncs from the existing member, and then the original member is removed.
With numberOfCopies>=2, the array already has multiple active members. A new partition is added as a spare, and mdadm --replace gracefully swaps it in without any degraded window.
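Both mechanics can be sketched with raw `mdadm` commands. This is a conceptual sketch, not an operator procedure: MeshStor performs these steps itself, and the device paths are illustrative.

```shell
# numberOfCopies=1: a 2-slot RAID1 where one slot is the "missing" placeholder.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p3 missing

# Migration fills the empty slot, lets MD sync, then drops the original member:
mdadm --manage /dev/md0 --add /dev/nvme1n1p3      # new partition syncs in
mdadm --manage /dev/md0 --fail /dev/nvme0n1p3     # after sync completes
mdadm --manage /dev/md0 --remove /dev/nvme0n1p3

# numberOfCopies>=2: add the new partition as a spare and swap it in gracefully.
mdadm --manage /dev/md0 --add /dev/nvme1n1p3
mdadm --manage /dev/md0 --replace /dev/nvme0n1p3  # spare takes over; no degraded window
```

The `missing` keyword is what reserves the second slot, and `--replace` is what lets the multi-copy case avoid ever running degraded.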
Consumer Node Drain¶
When you drain the node that hosts the MD RAID device (the consumer node), MeshStor migrates the volume to whichever node Kubernetes reschedules the pod to.
What Happens¶
```mermaid
sequenceDiagram
    participant K as Kubernetes
    participant Old as Old Node
    participant New as New Node
    participant R as Reconciler (Old)
    K->>Old: kubectl drain (evicts pod)
    Old->>Old: NodeUnstageVolume: clear NodeName, stop MD
    K->>New: Schedule pod on new node
    New->>New: NodeStageVolume (attempt 1): no remote partitions yet, set NodeName=New, fail
    R->>R: Sees NodeName=New, exports partition via NVMe-oF
    New->>New: NodeStageVolume (retry): connect remote, assemble MD, create local partition, add to MD
    New->>New: MD syncs data from remote to local
    New->>New: Reconciler: detect excess member, remove old partition
```
1. `kubectl drain` evicts the pod from the consumer node.
2. NodeUnstageVolume on the old node clears `NodeName` in the CR and stops the MD device. The partition remains on disk.
3. NodeStageVolume on the new node connects to the old partition via NVMe-oF, assembles the MD device from it, creates a new local partition, and adds it to the array. MD begins syncing data.
4. The reconciler detects more members than expected, waits for all members to be in sync, then removes the old remote partition.
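From the operator's side, the whole flow reduces to a drain plus a watch. Node and volume names here are illustrative, and `--delete-emptydir-data` is only needed if the pod uses emptyDir volumes:

```shell
# Evict the consumer pod; Kubernetes reschedules it and MeshStor follows.
kubectl drain worker-2 --ignore-daemonsets --delete-emptydir-data

# Watch the volume migrate.
kubectl get msvol my-volume -w
```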
numberOfCopies=1 vs numberOfCopies>=2¶
| Aspect | numberOfCopies=1 | numberOfCopies>=2 |
|---|---|---|
| MD state during migration | 2 active members (was [local, missing]) | numberOfCopies+1 members (local added as spare) |
| Removal method | `mdadm --fail` + `mdadm --remove` (instant) | `mdadm --replace` with spare (graceful) |
| Degraded window | Brief moment during fail+remove | None; spare syncs before old member is removed |
| Final state | [local, missing], same as before drain | numberOfCopies active members |
numberOfCopies=1 State Diagram¶
```mermaid
flowchart LR
    subgraph "Before Drain"
        A1["Node A: [local, missing]<br/>Pod running"]
    end
    subgraph "During Migration"
        B1["Node B: [remote(A), local(B)]<br/>MD syncing"]
    end
    subgraph "After Cleanup"
        C1["Node B: [local, missing]<br/>Pod running<br/>Node A partition removed"]
    end
    A1 -->|"kubectl drain A"| B1
    B1 -->|"sync complete"| C1
```
**Note:** The new node is selected by Kubernetes pod scheduling, not by MeshStor. MeshStor creates a local partition on whichever node the pod lands on.
Member Replacement¶
When a partition is permanently lost — node offline, drive failure, or persistent network partition — MeshStor automatically replaces it after a configurable timeout. See Self-Healing: Automatic Replacement for the full flow, state transitions, and tuning options.
Provider Node Drain¶
Draining a node that only provides remote partitions (not the MD device host) has no immediate effect on volumes:
- `kubectl drain --ignore-daemonsets` keeps the MeshStor DaemonSet pod running.
- The DaemonSet continues exporting partitions via NVMe-oF.
- The consumer node maintains its connections normally.
If the provider node is later shut down or decommissioned, the exported partitions become unreachable. This is handled by the self-healing flow after memberMissingTimeout.
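To confirm that a drained provider node is still exporting its partitions, a quick check (node name illustrative):

```shell
kubectl drain worker-3 --ignore-daemonsets

# The MeshStor DaemonSet pod should still be Running on the drained node:
kubectl get pods -n meshstor -o wide | grep worker-3
```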
Observing Relocation¶
Watch Volume Phases¶
During a drain migration, `kubectl get msvol my-volume -w` shows the phase transitions:

```
NAME        PHASE       MDSTATE      SYNC
my-volume   Synced      active
my-volume   Syncing     recovering   12.5%
my-volume   Replacing   recovering   67.3%
my-volume   Synced      active
```
Inspect Partition Details¶
```shell
kubectl get msvol my-volume -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.state}{"\n"}{end}'
```
During migration (both old and new partitions visible):
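A hypothetical example of the mid-migration output (node IDs and state values illustrative):

```
node-a	Synced
node-b	Syncing
```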
After cleanup:
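A hypothetical example of the post-cleanup output, with only the new node's partition remaining (node ID and state value illustrative):

```
node-b	Synced
```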
Check Sync Progress¶
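On the consumer node, the kernel's own `/proc/mdstat` reports MD resync/recovery progress directly. A minimal sketch that extracts the completion percentage from a captured sample (device name and numbers illustrative; on a live node, replace the `echo` with `cat /proc/mdstat`):

```shell
# Sample /proc/mdstat content during a recovery (values illustrative).
mdstat='md127 : active raid1 nvme1n1p3[2] nvme0n1p3[0]
      52428800 blocks super 1.2 [2/1] [U_]
      [==========>..........]  recovery = 52.1% (27318016/52428800) finish=3.2min speed=130560K/sec'

# Extract the completion percentage from the recovery line.
echo "$mdstat" | awk -F'recovery = ' '/recovery/ {split($2, a, " "); print a[1]}'
# prints: 52.1%
```

The same percentage surfaces in the volume's SYNC column, so `/proc/mdstat` is mainly useful when you need the kernel's view directly.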
Troubleshooting¶
Volume Stuck in Replacing¶
The old partition is not being removed after drain migration.
- Check sync progress: the old member is removed only after the new member finishes syncing. Watch `kubectl get msvol -w` for sync progress.
- Check NVMe-oF connectivity: verify the consumer node can reach the replacement node on the NVMe-oF port (TCP 4420 or RDMA 4421).
- Check that the reconciler is running: verify the MeshStor DaemonSet pod is healthy on the consumer node with `kubectl get pods -n meshstor -o wide`.
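One quick way to test the connectivity item above, assuming `nc` (netcat) is available on the consumer node (target address illustrative):

```shell
# From the consumer node: can we reach the NVMe-oF target's TCP port?
nc -zv 10.0.0.12 4420
```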
No Replacement Node Available¶
All nodes already host a partition for this volume, or no node has sufficient free space.
- Check available capacity: `kubectl get msnd` and look for nodes with free space.
- Add capacity: add a new node with NVMe drives, or free up space on existing nodes by deleting unused volumes.
Partition Stuck in Missing¶
The partition is marked Missing but replacement has not started.
- Check the timeout: `kubectl get msvol my-volume -o jsonpath='{.spec.memberMissingTimeout}'`; replacement waits for this timeout to expire.
- Check the timestamp: `kubectl get msvol my-volume -o yaml` and inspect `.status.partitions[].updatedAt` to see when the partition was marked Missing.
- Check the consumer DaemonSet: the reconciler runs on the consumer node. If that pod is unhealthy, replacement will not trigger.
What's Next¶
- Self-Healing — automatic recovery from node and network failures
- Volume Expansion — grow a volume while the pod is running
- Monitoring — observe relocation progress and health
- Replication — how MD RAID replication enables non-disruptive moves