Disk Cleanup

LLM-generated draft — not proofread

This page was drafted by an LLM and has not been reviewed by a human. Treat every claim as unverified until a maintainer signs off.

meshstor-cleanup is a node-local helper bundled in the CSI image that retires meshstor partitions whose MeshStorVolume CR is gone (or no longer claims them).

Safety First

MeshStor never automatically deletes user data. No background reconciler ever erases an on-disk partition. The only path that removes a partition's GPT entry is an operator running meshstor-cleanup --apply on a specific node. The defaults make it hard to lose data by accident:

  • Dry-run by default. Without --apply, the tool only prints a plan; nothing on disk is touched.
  • Label-gated. Only partitions with the meshstor label prefix are inspected. Pre-existing partitions on the same drive — your OS partition, foreign filesystems, anything you put there yourself — are never even read.
  • Local only. The tool acts only on drives owned by the pod's NODE_NAME. It cannot affect other nodes.
  • TOCTOU-guarded. On --apply, every candidate is re-classified against a fresh MeshStorVolume list immediately before retire, so a partition the controller has just re-adopted is skipped.
  • Holder-safe. If any kernel holder remains after the MD/nvmet tear-down attempt, the tool skips that partition and leaves its GPT entry in place rather than corrupting state.

Always inspect before applying

The tool's classification tells you whether a MeshStorVolume CR currently references the partition — not whether the data on it is still valuable. After a Kubernetes reinstall or a CR loss, every partition on disk will look unknown even though the filesystems still hold real application data. Mount each candidate read-only and check the contents (see Inspect Before Removing) before passing --apply.

When to Run

Signal | Cause
--- | ---
UNKNOWNPARTITIONS > 0 in kubectl get msnd | A drive holds ex-MeshStor partitions whose volume CR is gone. See Monitoring.
A previously offline node returns after member replacement | The original partition is still on disk but no longer referenced by the volume. See Volume Relocation: Member Replacement.
Kubernetes was reinstalled on the same hardware | The new cluster has no MeshStorVolume CRs, but the drives still carry partitions from the previous cluster. Every meshstor partition will classify as unknown; review carefully and recover any data you still need before applying.
Volumes deleted while a node was offline | The volume CR was removed cluster-wide, but the partition entry stayed on the offline node's GPT.
reclaimPolicy: Retain orphans | Drives still carry the partitions after the PV/PVC objects are gone.

The dry-run plan is safe to run at any time — it is a no-op when nothing matches, and never modifies state.

How Classification Works

The tool runs on the node it is exec'd into (it reads NODE_NAME from the pod). For every meshstor-labeled partition on every local MeshStorNodeDevice drive, it classifies the partition into one of four buckets:

Bucket | Meaning | Default action | With --force
--- | --- | --- | ---
claimed | A MeshStorVolume CR's .status.partitions[] lists this (nodeID, partitionUUID) pair | leave | leave
unknown | No MeshStorVolume CR matches the label's volume prefix (the volume is gone) | remove | remove
stale | A MeshStorVolume CR exists for this volume prefix, but does not claim this (node, UUID) | leave (skipped) | remove
malformed | The partition label does not parse as meshstor-NNNN-pvc-… | remove | remove

claimed is never retired. stale is left alone by default because a CR-driven flow may still be working on it; --force is the override for cases where you have already verified the volume's state.

Classification is a CR-state question, not a data-value question — see the warning above.
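
The bucket rules in the table can be sketched as a small shell function. This is one illustrative reading of the table, not the tool's source. The classify name, the flattened "volumeName nodeID partitionUUID" claim lines, the rule that a CR matches when its name starts with the label text after meshstor-NNNN-, and the choice to test for a claim before parsing the label are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the four buckets; not the real meshstor-cleanup logic.
# $4 holds one line per .status.partitions[] entry, flattened for illustration as
#   <volumeName> <nodeID> <partitionUUID>
# A volume with no partitions is a line carrying the name only.

classify() {
  local label="$1" node="$2" uuid="$3" claims="$4"
  local vol n u prefix="" seen=0

  # Assumed label shape: meshstor-NNNN-pvc-<rest>; the volume prefix is "pvc-<rest>".
  case "$label" in
    meshstor-[0-9][0-9][0-9][0-9]-pvc-?*) prefix="${label#meshstor-????-}" ;;
  esac

  while read -r vol n u; do
    [ -n "$vol" ] || continue
    # Any CR listing this (nodeID, partitionUUID) pair makes the partition claimed.
    if [ "$n" = "$node" ] && [ "$u" = "$uuid" ]; then
      echo claimed
      return 0
    fi
    # Remember whether some CR exists for this label's volume prefix.
    if [ -n "$prefix" ]; then
      case "$vol" in "$prefix"*) seen=1 ;; esac
    fi
  done <<EOF
$claims
EOF

  if [ -z "$prefix" ]; then
    echo malformed
  elif [ "$seen" -eq 1 ]; then
    echo stale
  else
    echo unknown
  fi
}
```

Called as classify <label> <node> <uuid> "<claim lines>", the function prints exactly one bucket name. Checking for a claim first keeps the sketch consistent with the rule that claimed is never retired.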

Inspect Before Removing

Before applying cleanup, mount each candidate read-only and confirm there is nothing on it that you still need. This is especially important after a Kubernetes reinstall, where every meshstor partition will look unknown even though the on-disk filesystems are intact.

The MeshStor node pod ships with mdadm, mount, and xfsprogs already installed and runs privileged with the host's block devices visible — exec into it to inspect a partition without touching the host directly:

kubectl exec -it -n <meshstor-namespace> -c csi-plugin \
  pod/<meshstor-node-pod> -- bash

Inside the pod, for each candidate partition (replace nvme0n1p3 with the partition name from the cleanup plan):

# 1. Confirm the MD metadata — array UUID, member count, role
mdadm --examine /dev/nvme0n1p3

# 2. Assemble the array read-only. --run starts it even though the array is degraded
#    (a single RAID1 member is enough to read the filesystem).
mdadm --assemble --readonly --run /dev/md/inspect /dev/nvme0n1p3

# 3. Mount the XFS filesystem read-only. norecovery skips log replay so nothing is written.
mkdir -p /mnt/inspect
mount -o ro,norecovery /dev/md/inspect /mnt/inspect

# 4. Look at the contents. The PVC name embedded in the partition label points back to
#    the workload that originally owned the volume.
ls -la /mnt/inspect/

# 5. Always tear down the inspection setup, even if you decide to keep the data.
umount /mnt/inspect
mdadm --stop /dev/md/inspect

If the data is still needed, copy it out (tar, rsync, etc.) before removing the partition, or leave the partition in place and recover it through your application's normal restore flow. Only proceed to --apply once every partition you plan to remove has been confirmed empty or expendable.
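
Because the tear-down in step 5 is easy to forget when a mount fails halfway, the same commands can be wrapped in a function that always stops the array. This is a minimal sketch: the inspect_partition and run names and the DRY_RUN switch are hypothetical, and the mdadm and mount invocations are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the inspection steps above.
# With DRY_RUN set to a non-empty value, commands are printed instead of executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

inspect_partition() {
  local part="$1"                 # e.g. nvme0n1p3, taken from the cleanup plan
  local md=/dev/md/inspect mnt=/mnt/inspect

  run mdadm --examine "/dev/$part" || return 1
  run mdadm --assemble --readonly --run "$md" "/dev/$part" || return 1
  run mkdir -p "$mnt"
  if run mount -o ro,norecovery "$md" "$mnt"; then
    run ls -la "$mnt/"
    run umount "$mnt"
  fi
  # Stop the array whether or not the mount succeeded.
  run mdadm --stop "$md"
}
```

Inside the node pod, setting DRY_RUN=1 before calling inspect_partition nvme0n1p3 prints the seven commands without touching any device; unset it to run them for real.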

RAID10 volumes (stripeWidth > 1)

A single partition holds only one stripe's share of the data, so inspecting an isolated stripe member produces unreadable XFS. To recover data from a RAID10 volume, at least one surviving member of every stripe must be assembled together, which only works if enough members survived. If they did not, the data is unrecoverable from the leftovers and the partition is safe to remove.

Running the Tool

Find the MeshStor node pod for the affected node:

kubectl get pods -n <meshstor-namespace> -o wide -l app.kubernetes.io/component=node

Dry-Run (default)

kubectl exec -n <meshstor-namespace> -c csi-plugin \
  pod/<meshstor-node-pod> -- meshstor-cleanup

Sample output:

node: mf-01-03, drives from MeshStorNodeDevice: [nvme0n1 nvme1n1]

drive nvme0n1
  nvme0n1p1  uuid=…  label=meshstor-0001-pvc-cd1038a7-…  status=claimed   action=leave
  nvme0n1p2  uuid=…  label=meshstor-0002-pvc-2af5b9e3-…  status=unknown   action=will-remove

drive nvme1n1
  (no meshstor partitions)

summary: 2 partitions scanned, 1 claimed, 1 unknown, 0 stale, 0 malformed
plan: would remove 1 partition(s) (use --apply to commit)
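
To get a list of partitions to inspect, the plan can be filtered for lines whose last field is action=will-remove. The sketch below assumes the plan keeps the line shape of the sample output above; will_remove is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical filter: reads a dry-run plan on stdin and prints the partition
# name (first field) of every line whose last field is action=will-remove.
will_remove() {
  awk '$NF == "action=will-remove" { print $1 }'
}
```

Piping the dry-run command's output into will_remove yields one partition name per line, ready to be checked one by one as described in Inspect Before Removing.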

Apply

After inspecting every will-remove partition (see Inspect Before Removing) and confirming no needed data remains, re-run with --apply:

kubectl exec -n <meshstor-namespace> -c csi-plugin \
  pod/<meshstor-node-pod> -- meshstor-cleanup --apply

Before each removal the tool re-lists volumes and re-classifies the partition against the fresh CR state. If the partition has become claimed (or stale without --force) since the plan was printed, it is skipped and reported in the summary. This guards against deleting a partition that a controller has just re-adopted.
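
The re-check can be pictured as a loop that classifies each queued partition again immediately before acting on it. The sketch below is an assumption-laden illustration, not the tool's code: fresh_bucket stands in for re-listing MeshStorVolume CRs, and the FRESH variable, the apply_plan name, and the printed messages are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the re-check before each retire; not the tool's real code.

# Stand-in for "re-list MeshStorVolume CRs and classify again".
# FRESH is an illustrative space-separated list of partition=bucket pairs;
# a partition that is not listed is treated as unknown.
fresh_bucket() {
  local pair
  for pair in $FRESH; do
    case "$pair" in "$1"=*) echo "${pair#*=}"; return 0 ;; esac
  done
  echo unknown
}

# apply_plan FORCE PARTITION...   (FORCE is "force" or "noforce")
apply_plan() {
  local force="$1" part bucket
  shift
  for part in "$@"; do
    bucket=$(fresh_bucket "$part")   # classified again right before acting
    case "$bucket:$force" in
      unknown:*|malformed:*|stale:force) echo "retire $part" ;;
      *) echo "skip $part (re-classified as $bucket)" ;;
    esac
  done
}
```

The decision uses only the fresh bucket, never the one printed in the earlier plan, which is what makes a just-re-adopted partition safe.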

Force (Stale Partitions)

To also retire stale partitions, add --force:

kubectl exec -n <meshstor-namespace> -c csi-plugin \
  pod/<meshstor-node-pod> -- meshstor-cleanup --force            # plan only
kubectl exec -n <meshstor-namespace> -c csi-plugin \
  pod/<meshstor-node-pod> -- meshstor-cleanup --force --apply    # commit

Warning

--force removes partitions whose volume CR exists but does not currently claim this (node, UUID) pair. That can race with reconciliation work. Verify the volume's .status.partitions[] does not include this node before forcing; otherwise the reconciler may be in the middle of re-using the partition.

What Tear-Down Does

For each retire, the tool runs:

  1. NVMe-oF unexport — removes the nvmet namespace exporting this partition (no-op if not exported).
  2. MD stop — for any md* device listed as a kernel holder of the partition, mdadm --stop is called. Without this, the partition's GPT entry cannot be removed.
  3. Holder re-check — sysfs is re-read; if any holder remains, the tool aborts this partition with partition still has holders after tear-down and continues to the next one.
  4. GPT remove — the partition entry is deleted from the parent drive's GPT table. The partition's data blocks are not zeroed; they are simply no longer addressable through GPT.
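
The holder re-check in step 3 amounts to looking at the partition's holders directory in sysfs. A minimal sketch, assuming the standard /sys/class/block/<partition>/holders layout; the holders_of and no_holders names and the optional sysfs-root argument are hypothetical and exist only to make the check easy to try out.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the holder re-check; not the tool's real code.
# The optional second argument overrides the sysfs root (useful for testing only).

holders_of() {
  local part="$1" root="${2:-/sys}"
  local dir="$root/class/block/$part/holders"
  [ -d "$dir" ] || return 0
  ls "$dir"
}

# Succeeds only when no kernel holder is left on the partition.
no_holders() {
  [ -z "$(holders_of "$@")" ]
}
```

For example, no_holders nvme0n1p2 || echo "still held" reports a partition that an MD array or device-mapper target is still using.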

Exit Codes

Code | Meaning
--- | ---
0 | Plan succeeded, or apply removed all queued partitions
1 | At least one partition failed to retire on apply (the rest were attempted)
2 | Usage error: NODE_NAME unset, kube client failed to build, or list calls returned an error
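
When the tool is driven from a script, the exit code can be mapped back to the table above. This is a small sketch; explain_exit is a hypothetical helper, and it assumes kubectl exec passes the remote command's exit status through to the caller.

```shell
#!/bin/sh
# Hypothetical helper that turns the documented exit codes into a short message.
explain_exit() {
  case "$1" in
    0) echo "ok: plan succeeded, or apply removed all queued partitions" ;;
    1) echo "partial: at least one partition failed to retire" ;;
    2) echo "setup: NODE_NAME unset, kube client failed, or a list call errored" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}
```

A wrapper script would run the kubectl exec command and then call explain_exit "$?" on the next line.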

Troubleshooting

partition still has holders after tear-down

Something other than the MD array the tool stopped is still holding the partition (e.g. a stuck device-mapper target or an externally assembled mdadm array). Inspect the holders directly:

ls /sys/class/block/<partition>/holders/

Resolve the holder manually (stop the MD device, remove the dm target, etc.), then re-run meshstor-cleanup --apply.

nvmet unexport: …

The kernel's configfs nvmet tree could not be modified. Confirm the node was set up for NVMe-oF (the Prerequisites page covers what the CSI driver needs in /sys/kernel/config/nvmet). If the node has been re-purposed and nvmet is gone, there is no namespace left to unexport and the failure is harmless. The tool acts only on the local node's drives, so running it from another node will not retire this partition.

re-classified as claimed after fresh CR list

Between plan and apply, the controller (or another node) re-attached the partition to a volume. This is the TOCTOU guard doing its job — the partition is left in place. Re-run the dry-run plan to see the current state.

What's Next

  • Monitoring — how to see unknown-volume counts per drive
  • Volume Relocation — when a returned node leaves leftover partitions
  • Self-Healing — the controller-driven flow that produces unknown partitions in the first place