Volume Relocation

MeshStor can move volume data between nodes without downtime. Relocation happens automatically during node drain and member replacement. This page covers all relocation scenarios, how to observe them, and how to troubleshoot common issues.

How Relocation Is Possible

Even with numberOfCopies=1, MeshStor creates a 2-slot RAID1 array: one slot holds the active partition, the other is a placeholder ("missing"). This placeholder slot allows data migration — a new partition fills the empty slot, syncs from the existing member, and then the original member is removed.

With numberOfCopies>=2, the array already has multiple active members. A new partition is added as a spare, and mdadm --replace gracefully swaps it in without any degraded window.
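The two mechanisms can be sketched with plain mdadm commands. This is an illustrative sketch, not MeshStor's actual code: the array path /dev/md/msvol0 and the /dev/nvme* partition names are hypothetical, and the sequences are wrapped in shell functions because they only make sense on a node with the real devices.

```shell
# Hypothetical sketch of the mdadm steps behind relocation.
# Device and array names are illustrative, not MeshStor's real naming.

# numberOfCopies=1: the array is created with a "missing" placeholder slot,
# so a second member can be added and synced without recreating the array.
migrate_single_copy() {
  # Initial state: one active partition, one placeholder slot.
  mdadm --create /dev/md/msvol0 --level=1 --raid-devices=2 \
        /dev/nvme0n1p1 missing
  # Migration: add the new local partition; MD syncs it from the old member.
  mdadm --manage /dev/md/msvol0 --add /dev/nvme1n1p1
  # After sync completes: drop the old member (brief degraded window).
  mdadm --manage /dev/md/msvol0 --fail /dev/nvme0n1p1 --remove /dev/nvme0n1p1
}

# numberOfCopies>=2: the new partition joins as a spare and is swapped in
# with --replace, so the array never runs degraded.
migrate_multi_copy() {
  mdadm --manage /dev/md/msvol0 --add-spare /dev/nvme1n1p1
  mdadm --manage /dev/md/msvol0 --replace /dev/nvme0n1p1 --with /dev/nvme1n1p1
}
```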

Consumer Node Drain

When you drain the node that hosts the MD RAID device (the consumer node), MeshStor migrates the volume to whichever node Kubernetes reschedules the pod to.

What Happens

sequenceDiagram
    participant K as Kubernetes
    participant Old as Old Node
    participant New as New Node
    participant R as Reconciler (Old)

    K->>Old: kubectl drain (evicts pod)
    Old->>Old: NodeUnstageVolume: clear NodeName, stop MD
    K->>New: Schedule pod on new node
    New->>New: NodeStageVolume (attempt 1): no remote partitions yet, set NodeName=New, fail
    R->>R: Sees NodeName=New, exports partition via NVMe-oF
    New->>New: NodeStageVolume (retry): connect remote, assemble MD, create local partition, add to MD
    New->>New: MD syncs data from remote to local
    New->>New: Reconciler: detect excess member, remove old partition

  1. kubectl drain evicts the pod from the consumer node.
  2. NodeUnstageVolume on the old node clears NodeName in the CR and stops the MD device. The partition remains on disk.
  3. NodeStageVolume on the new node first fails (no remote partition is exported yet) and records the new NodeName. The reconciler on the old node sees the new NodeName and exports the partition via NVMe-oF. On retry, NodeStageVolume connects to the old partition, assembles the MD device from it, creates a new local partition, and adds it to the array. MD begins syncing data.
  4. The reconciler detects more members than expected, waits for all to be in sync, then removes the old remote partition.
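
The "excess member" condition in step 4 can be checked by hand. A minimal sketch, assuming the tab-separated output of the partition-details query shown later on this page (node names and counts are sample data):

```shell
# Sample partition listing as produced during a migration:
# two partitions visible while the volume's numberOfCopies is 1.
partitions='node-a  Synced
node-b  Synced'
copies=1

count=$(printf '%s\n' "$partitions" | wc -l)
synced=$(printf '%s\n' "$partitions" | grep -c 'Synced')

# The reconciler only removes the old member once every member is in sync.
if [ "$count" -gt "$copies" ] && [ "$synced" -eq "$count" ]; then
  echo "excess member fully synced: old partition is safe to remove"
fi
```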

numberOfCopies=1 vs numberOfCopies>=2

| Aspect | numberOfCopies=1 | numberOfCopies>=2 |
| --- | --- | --- |
| MD state during migration | 2 active members (was [local, missing]) | numberOfCopies+1 members (local added as spare) |
| Removal method | mdadm --fail + mdadm --remove (instant) | mdadm --replace with spare (graceful) |
| Degraded window | Brief moment during fail+remove | None — spare syncs before old member is removed |
| Final state | [local, missing] — same as before drain | numberOfCopies active members |

numberOfCopies=1 State Diagram

flowchart LR
    subgraph "Before Drain"
        A1["Node A: [local, missing]<br/>Pod running"]
    end

    subgraph "During Migration"
        B1["Node B: [remote(A), local(B)]<br/>MD syncing"]
    end

    subgraph "After Cleanup"
        C1["Node B: [local, missing]<br/>Pod running<br/>Node A partition removed"]
    end

    A1 -->|"kubectl drain A"| B1
    B1 -->|"sync complete"| C1

Note

The new node is selected by Kubernetes pod scheduling, not by MeshStor. MeshStor creates a local partition on whichever node the pod lands on.
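
Because the target is chosen by the scheduler, you can steer it with ordinary Kubernetes scheduling controls. A sketch of one approach (node names node-a and node-c are hypothetical; this is standard kubectl, nothing MeshStor-specific), wrapped in a function since it acts on a live cluster:

```shell
# Steer where the pod (and thus the volume) lands by cordoning the
# nodes you want to exclude before draining. Node names are illustrative.
steer_drain_target() {
  kubectl cordon node-c                       # exclude node-c from scheduling
  kubectl drain node-a --ignore-daemonsets    # pod reschedules elsewhere
  kubectl uncordon node-c                     # restore normal scheduling
}
```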

Member Replacement

When a partition is permanently lost — node offline, drive failure, or persistent network partition — MeshStor automatically replaces it after a configurable timeout. See Self-Healing: Automatic Replacement for the full flow, state transitions, and tuning options.

Provider Node Drain

Draining a node that only provides remote partitions (not the MD device host) has no immediate effect on volumes:

  • kubectl drain --ignore-daemonsets keeps the MeshStor DaemonSet pod running.
  • The DaemonSet continues exporting partitions via NVMe-oF.
  • The consumer node maintains its connections normally.

If the provider node is later shut down or decommissioned, the exported partitions become unreachable. This is handled by the self-healing flow after memberMissingTimeout.

Observing Relocation

Watch Volume Phases

kubectl get msvol -w

During a drain migration, you will see:

NAME        PHASE       MDSTATE      SYNC
my-volume   Synced      active
my-volume   Syncing     recovering   12.5%
my-volume   Replacing   recovering   67.3%
my-volume   Synced      active

Inspect Partition Details

kubectl get msvol my-volume -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.state}{"\n"}{end}'

During migration (both old and new partitions visible):

node-a    Synced
node-b    Created

After cleanup:

node-b    Synced

Check Sync Progress

kubectl get msvol my-volume -o wide
NAME        PHASE     MDSTATE      TOTAL   ACTIVE   FAILED   DOWN   SYNC      AGE
my-volume   Syncing   recovering   2       1        0        0      45.2%     1h
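
The SYNC percentage ultimately comes from the kernel's MD resync state, so you can also watch it on the consumer node via /proc/mdstat. A sketch that extracts the recovery percentage; the md127 device name and the sample output are illustrative:

```shell
# Sample /proc/mdstat fragment captured during a migration (illustrative);
# on a real node you would read /proc/mdstat directly.
mdstat='md127 : active raid1 nvme1n1p1[2] nvme0n1p1[0]
      10477568 blocks super 1.2 [2/1] [U_]
      [=========>...........]  recovery = 45.2% (4736000/10477568) finish=0.5min speed=160000K/sec'

# Pull out the recovery percentage.
progress=$(printf '%s\n' "$mdstat" | grep -o 'recovery = [0-9.]*%' | awk '{print $3}')
echo "$progress"   # 45.2%
```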

Troubleshooting

Volume Stuck in Replacing

The old partition is not being removed after drain migration.

  • Check sync progress: the old member is only removed after the new member finishes syncing. Watch kubectl get msvol -w for sync progress.
  • Check NVMe-oF connectivity: verify the consumer node can reach the replacement node on the NVMe-oF port (TCP 4420 or RDMA 4421).
  • Check the reconciler is running: verify the MeshStor DaemonSet pod is healthy on the consumer node: kubectl get pods -n meshstor -o wide.

No Replacement Node Available

All nodes already host a partition for this volume, or no node has sufficient free space.

  • Check available capacity: kubectl get msnd — look for nodes with free space.
  • Add capacity: add a new node with NVMe drives, or free up space on existing nodes by deleting unused volumes.

Partition Stuck in Missing

The partition is marked Missing but replacement has not started.

  • Check the timeout: kubectl get msvol my-volume -o jsonpath='{.spec.memberMissingTimeout}' — replacement waits for this timeout to expire.
  • Check the timestamp: kubectl get msvol my-volume -o yaml — inspect .status.partitions[].updatedAt to see when the partition was marked Missing.
  • Check the consumer DaemonSet: the reconciler runs on the consumer node. If that pod is unhealthy, replacement will not trigger.
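
Putting the first two checks together, the time remaining before replacement triggers is simply updatedAt plus memberMissingTimeout minus now. A sketch with fixed sample timestamps (real values come from the jsonpath queries above; GNU date is assumed for -d parsing):

```shell
# Illustrative values: in practice, read updatedAt and memberMissingTimeout
# from the volume CR as shown above. "now" is pinned for reproducibility.
updated_at='2024-05-01T10:00:00Z'   # when the partition was marked Missing
timeout_s=300                        # memberMissingTimeout of 5m, in seconds
now='2024-05-01T10:03:00Z'

elapsed=$(( $(date -u -d "$now" +%s) - $(date -u -d "$updated_at" +%s) ))
remaining=$(( timeout_s - elapsed ))
echo "${remaining}s until replacement"   # 120s until replacement
```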

What's Next

  • Self-Healing — automatic recovery from node and network failures
  • Volume Expansion — grow a volume while the pod is running
  • Monitoring — observe relocation progress and health
  • Replication — how MD RAID replication enables non-disruptive moves