Volume Expansion

MeshStor supports online volume expansion — you can grow a PVC while the pod is running, with no downtime. The driver grows each partition on every node, then expands the MD RAID array and XFS filesystem in place.

How It Works

  1. You patch the PVC with a larger storage request.
  2. The controller validates the request, updates spec.capacityBytes, and sets the volume phase to Expanding.
  3. The consumer node (where the PVC is mounted) grows its local partition.
  4. Each remote node (where a replica lives) grows its partition independently via the reconciler.
  5. Once all partitions are at the new size, the consumer node runs mdadm --grow and xfs_growfs to expand the MD array and filesystem.
  6. The volume phase returns to Synced.
Sequence diagram
sequenceDiagram
    participant User
    participant API as Kubernetes API
    participant csi-resizer
    participant Controller
    participant Remote as Remote Node
    participant kubelet
    participant Consumer as Consumer Node

    User->>API: Patch PVC with larger size
    API-->>csi-resizer: csi-resizer watches API changes
    csi-resizer->>Controller: ControllerExpandVolume
    Controller->>API: Update CR capacityBytes, set phase=Expanding
    API-->>Remote: Reconciler watches API changes
    Remote->>Remote: Reconciler: grow replica partition
    Remote->>API: Update partition SizeBytes in CR
    API-->>kubelet: kubelet watches API changes
    kubelet->>Consumer: NodeExpandVolume
    Consumer->>Consumer: Grow local partition in-place
    Consumer->>Consumer: Verify all partitions expanded
    Consumer->>Consumer: mdadm --grow (expand MD array)
    Consumer->>Consumer: xfs_growfs (expand filesystem)
    Consumer->>API: Set phase=Synced

In-Place Growth vs Replacement

MeshStor first tries to grow each partition in place by extending it into adjacent free space on the same drive. This is fast and requires no data sync.

If a drive lacks adjacent free space, MeshStor creates a replacement partition on another drive on the same node, swaps it into the MD array via mdadm --replace, and removes the old partition. This path is slower because the replacement must sync all data from the other members.

Expansion-replacement shares the state machine that drives ordinary faulty-member replacement. While it runs, the volume's .status.partitions[] can temporarily hold two entries for the same node — the original (still-small) partition and the new grown one — disambiguated by their nvmeofNamespaceID. The old entry is only removed once the new one has finished syncing, so the volume keeps full redundancy throughout the replacement.
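To spot a replacement in progress, one option is to look for nodes that appear more than once in .status.partitions[]. The following is a minimal sketch, not part of MeshStor: the function name nodes_in_replacement is hypothetical, and it assumes one "nodeID nvmeofNamespaceID sizeBytes" line per partition on stdin.

```shell
# Sketch: print nodes that currently have more than one partition entry.
# Per the text above, two entries for one node suggest an
# expansion-replacement (or a faulty-member replacement) is in flight.
# Input: "nodeID nvmeofNamespaceID sizeBytes", one partition per line.
nodes_in_replacement() {
    awk 'NF >= 1 { count[$1]++ }
         END { for (n in count) if (count[n] > 1) print n }' | sort
}

# Possible usage (field names taken from this page):
#   kubectl get msvol my-volume -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.nvmeofNamespaceID}{"\t"}{.sizeBytes}{"\n"}{end}' \
#     | nodes_in_replacement
```

An empty result means every node has a single partition entry.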

Issuing a further resize while an expansion is still in flight is accepted: the driver compares the new target against the current per-partition sizes and converges toward it. Each xfs_growfs is gated on the MD array having all members at the new size, so the filesystem never races ahead of the underlying partitions.

Prerequisites

  • StorageClass must have allowVolumeExpansion: true. All default examples except RAID10 have this enabled.
  • stripeWidth must be 1. Expansion is not supported for striped volumes (stripeWidth > 1).
  • Sufficient free space on the drives hosting partitions. Each partition must be able to grow by the requested delta — either in place, or on another drive of the same node via replacement.
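The first prerequisite can be checked before patching the PVC. This is a minimal sketch: allowVolumeExpansion is a standard top-level StorageClass field, while the function name expansion_allowed and the StorageClass name passed to it are placeholders.

```shell
# Sketch: succeed only if the given StorageClass permits volume expansion.
expansion_allowed() {
    sc=$1
    # The field is absent (empty output) when expansion is not enabled.
    allowed=$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')
    if [ "$allowed" = "true" ]; then
        echo "StorageClass $sc allows expansion"
    else
        echo "StorageClass $sc does not allow expansion" >&2
        return 1
    fi
}

# Possible usage:
#   expansion_allowed "$(kubectl get pvc my-pvc -o jsonpath='{.spec.storageClassName}')"
```

The stripeWidth and free-space prerequisites are not covered by this check.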

Expanding a Volume

Edit the PVC to request a larger size:

kubectl patch pvc my-pvc -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

Or use kubectl edit:

kubectl edit pvc my-pvc
# Change spec.resources.requests.storage to the new size

Note

The actual allocated size may be slightly larger than requested due to 1 MiB alignment.
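To estimate the effect of alignment, the sketch below rounds a byte count up to the next 1 MiB boundary. It assumes the driver rounds up, which the note implies but does not spell out; align_up_mib is a hypothetical helper name.

```shell
# Sketch: round a byte count up to the next multiple of 1 MiB (1048576 bytes).
# Assumption: allocation rounds up; the driver's exact rule may differ.
align_up_mib() {
    mib=$((1024 * 1024))
    echo $(( ($1 + mib - 1) / mib * mib ))
}

align_up_mib 21474836480   # 20Gi is already aligned: prints 21474836480
align_up_mib 20000000000   # 20G (decimal) is not: prints 20000538624
```

Requests expressed in binary units such as Gi are already multiples of 1 MiB, so this rounding would only change decimal or odd-sized requests.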

Observing Expansion

Watch Volume Phases

kubectl get msvol -w

Typical output during expansion:

NAME               PHASE       MDSTATE   READY   DEGRADED   SYNC   NODE       AGE
pvc-cd1038a7-...   Expanding   active    2/2     0                 mf-01-02   1h
pvc-cd1038a7-...   Synced      active    2/2     0                 mf-01-02   1h

If replacement partitions are needed (no adjacent free space), the phase stays Expanding while MDSTATE shows recovering and SYNC reports the sync progress:

NAME               PHASE       MDSTATE      READY   DEGRADED   SYNC    NODE       AGE
pvc-cd1038a7-...   Expanding   active       2/2     0                  mf-01-02   1h
pvc-cd1038a7-...   Expanding   recovering   1/2     1          12.5%   mf-01-02   1h
pvc-cd1038a7-...   Expanding   recovering   1/2     1          67.3%   mf-01-02   1h
pvc-cd1038a7-...   Expanding   active       2/2     0                  mf-01-02   1h
pvc-cd1038a7-...   Synced      active       2/2     0                  mf-01-02   1h
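Scripts that must wait for an expansion to finish can poll the phase instead of watching it interactively. The sketch below is illustrative only: get_phase and wait_for_synced are hypothetical names, and the .status.phase field path is an assumption based on the PHASE column above.

```shell
# Sketch: block until a volume reports phase Synced, or time out.
# Assumption: the phase is exposed at .status.phase on the msvol resource.
get_phase() {
    kubectl get msvol "$1" -o jsonpath='{.status.phase}'
}

wait_for_synced() {
    vol=$1
    timeout_s=${2:-600}    # give up after this many seconds
    interval_s=${3:-5}     # delay between polls
    elapsed=0
    while [ "$elapsed" -le "$timeout_s" ]; do
        phase=$(get_phase "$vol") || phase=unknown
        if [ "$phase" = "Synced" ]; then
            echo "volume $vol is Synced"
            return 0
        fi
        echo "volume $vol phase=$phase, waiting..." >&2
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    echo "timed out waiting for $vol" >&2
    return 1
}

# Possible usage: wait_for_synced my-volume 1800 10
```

When the replacement path is taken, a generous timeout is advisable because the full data sync can take a long time.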

Check Partition Sizes

Verify that all partitions have been expanded:

kubectl get msvol my-volume -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.sizeBytes}{"\t"}{.state}{"\n"}{end}'
node-a    21474836480    Synced
node-b    21474836480    Synced
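For volumes with many partitions, the same output can be filtered to show only the ones still below the target size. This is a minimal sketch; lagging_partitions is a hypothetical helper that reads the "nodeID sizeBytes state" lines printed by the jsonpath query above.

```shell
# Sketch: print partitions whose sizeBytes is below the target size.
# Exits non-zero if at least one partition is lagging.
# Input: "nodeID sizeBytes state", one partition per line, on stdin.
lagging_partitions() {
    awk -v target="$1" '
        NF >= 2 && $2 + 0 < target + 0 { print $1, $2; lagging = 1 }
        END { exit (lagging ? 1 : 0) }
    '
}

# Possible usage, with a 20Gi target:
#   kubectl get msvol my-volume -o jsonpath='{range .status.partitions[*]}{.nodeID}{"\t"}{.sizeBytes}{"\t"}{.state}{"\n"}{end}' \
#     | lagging_partitions 21474836480
```

No output and a zero exit status mean every listed partition has reached the target.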

Verify Filesystem Size

From inside the pod:

df -h /mount/path

The filesystem size should reflect the new capacity (minus a small overhead for XFS metadata).

Constraints and Limitations

| Constraint | Detail |
|---|---|
| stripeWidth > 1 | Expansion is not supported for striped (RAID10) volumes. Set allowVolumeExpansion: false in the StorageClass. |
| Shrinking | Volume shrinking is not supported. Requests for a smaller size are rejected. |
| Consecutive expansion | A second resize issued before the first has reached Synced is accepted; the driver converges partitions toward the latest requested size. |
| Replacement sync time | If in-place growth fails on any partition, the replacement path requires a full data sync. Duration depends on volume size and network throughput. |

Troubleshooting

Expansion Stuck in Expanding Phase

The volume stays in Expanding and never transitions to Synced.

Check partition sizes in the CR:

kubectl get msvol my-volume -o yaml

Look at status.partitions[*].sizeBytes. If any partition still shows the old size, that node has not completed its expansion.

Check the node logs for the lagging partition:

# Find which node has the undersized partition, then check the logs of the
# meshstor-csi-node DaemonSet pod on that node. kubectl logs accepts either
# a resource name or a selector, not both:
kubectl logs -n meshstor --selector=... | grep -i expand

Common causes:

  • No free space on the drive. The partition cannot grow in place and no other drive has enough room for a replacement. Check available capacity with kubectl get msnd.
  • Node is down. The remote node's reconciler cannot run. Verify the DaemonSet pod is healthy on that node.
  • NVMe-oF connectivity issue. The consumer node cannot reach the remote node to import the replacement partition. Check network connectivity on TCP port 4420 (or RDMA port 4421).
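A quick TCP reachability test from the consumer node can help rule out the connectivity cause. The sketch below uses bash's /dev/tcp redirection and the coreutils timeout command; nvmeof_port_open is a hypothetical helper, and the host name in the usage line is a placeholder. It only tests TCP, so it says nothing about the RDMA transport.

```shell
# Sketch: succeed if a TCP connection to host:port can be opened within 3 s.
# Defaults to port 4420, the TCP port mentioned above.
nvmeof_port_open() {
    host=$1
    port=${2:-4420}
    # /dev/tcp/<host>/<port> is a bash feature, so run the probe under bash.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Possible usage:
#   nvmeof_port_open remote-node.example && echo reachable || echo unreachable
```

A successful connection shows only that the port is open, not that the NVMe-oF target is exporting the expected namespace.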

Expansion Rejected

The kubectl patch command returns an error.

  • allowVolumeExpansion not set: The StorageClass must have allowVolumeExpansion: true.
  • stripeWidth > 1: Expansion is only supported for stripeWidth=1. RAID10 expansion is part of the roadmap.
  • Size not larger: The requested size must be strictly larger than the current size.
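The last condition can be checked locally before patching. The sketch below is illustrative: to_bytes and is_valid_expansion are hypothetical helpers, and only the binary suffixes Ki, Mi, Gi and Ti are handled, although Kubernetes accepts more quantity formats.

```shell
# Sketch: convert a quantity with a binary suffix to bytes.
# Plain numbers are passed through unchanged.
to_bytes() {
    case $1 in
        *Ki) echo $(( ${1%Ki} * 1024 )) ;;
        *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
        *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
        *Ti) echo $(( ${1%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
        *)   echo "$1" ;;
    esac
}

# Succeeds only if the new size ($2) is strictly larger than the current ($1).
is_valid_expansion() {
    [ "$(to_bytes "$2")" -gt "$(to_bytes "$1")" ]
}

# Possible usage:
#   current=$(kubectl get pvc my-pvc -o jsonpath='{.status.capacity.storage}')
#   is_valid_expansion "$current" 20Gi || echo "20Gi is not larger than $current"
```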

Filesystem Size Does Not Match

After expansion completes (phase=Synced), df inside the pod shows a smaller size than expected. This is usually XFS metadata overhead, which accounts for a small percentage of the raw capacity; a difference under 10% is normal.
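To compare the observed difference against that 10% guideline, the shortfall can be computed from two byte counts. This is a minimal sketch; overhead_pct is a hypothetical helper, and the df invocation in the usage comment assumes GNU coreutils inside the pod.

```shell
# Sketch: percentage by which the filesystem size falls short of the
# requested capacity. Arguments: requested bytes, filesystem bytes.
overhead_pct() {
    awk -v raw="$1" -v fs="$2" 'BEGIN { printf "%.1f\n", (raw - fs) * 100 / raw }'
}

overhead_pct 1000 950   # prints 5.0

# Possible usage inside the pod (GNU df assumed):
#   overhead_pct 21474836480 "$(df -B1 --output=size /mount/path | tail -n 1)"
```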

What's Next