Disk Cleanup
LLM-generated draft — not proofread
This page was drafted by an LLM and has not been reviewed by a human. Treat every claim as unverified until a maintainer signs off.
meshstor-cleanup is a node-local helper bundled in the CSI image that retires meshstor partitions whose MeshStorVolume CR is gone (or no longer claims them).
Safety First
MeshStor never automatically deletes user data. No background reconciler ever erases an on-disk partition. The only path that removes a partition's GPT entry is an operator running meshstor-cleanup --apply on a specific node. The defaults make it hard to lose data by accident:
- Dry-run by default. Without --apply, the tool only prints a plan; nothing on disk is touched.
- Label-gated. Only partitions with the meshstor label prefix are inspected. Pre-existing partitions on the same drive (your OS partition, foreign filesystems, anything you put there yourself) are never even read.
- Local only. The tool acts only on drives owned by the pod's NODE_NAME. It cannot affect other nodes.
- TOCTOU-guarded. On --apply, every candidate is re-classified against a fresh MeshStorVolume list immediately before it is retired, so a partition the controller has just re-adopted is skipped.
- Holder-safe. Tear-down aborts a partition if any kernel holder remains after the MD/nvmet teardown attempt, rather than corrupting state.
Always inspect before applying
The tool's classification tells you whether a MeshStorVolume CR currently references the partition — not whether the data on it is still valuable. After a Kubernetes reinstall or a CR loss, every partition on disk will look unknown even though the filesystems still hold real application data. Mount each candidate read-only and check the contents (see Inspect Before Removing) before passing --apply.
When to Run
| Signal | Cause |
|---|---|
| UNKNOWNPARTITIONS > 0 in kubectl get msnd | A drive holds ex-MeshStor partitions whose volume CR is gone. See Monitoring. |
| A previously offline node returns after member replacement | The original partition is still on disk but no longer referenced by the volume. See Volume Relocation: Member Replacement. |
| Kubernetes was reinstalled on the same hardware | The new cluster has no MeshStorVolume CRs, but the drives still carry partitions from the previous cluster. Every meshstor partition will classify as unknown; review carefully and recover any data you still need before applying. |
| Volumes deleted while a node was offline | The volume CR was removed cluster-wide, but the partition entry stayed on the offline node's GPT. |
| reclaimPolicy: Retain orphans | Drives still carry the partitions after the PV/PVC objects are gone. |
The dry-run plan is safe to run at any time — it is a no-op when nothing matches, and never modifies state.
How Classification Works
The tool runs on the node it is exec'd into (it reads NODE_NAME from the pod). For every meshstor-labeled partition on every local MeshStorNodeDevice drive, it classifies the partition into one of four buckets:
| Bucket | Meaning | Default action | With --force |
|---|---|---|---|
| claimed | A MeshStorVolume CR's .status.partitions[] lists this (nodeID, partitionUUID) pair | leave | leave |
| unknown | No MeshStorVolume CR matches the label's volume prefix; the volume is gone | remove | remove |
| stale | A MeshStorVolume CR exists for this volume prefix, but does not claim this (node, UUID) | leave (skipped) | remove |
| malformed | The partition label does not parse as meshstor-NNNN-pvc-… | remove | remove |
claimed is never retired. stale is left alone by default because a CR-driven flow may still be working on it; --force is the override for cases where you have already verified the volume's state.
Classification is a CR-state question, not a data-value question — see the warning above.
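The bucket rules can be read as a small decision procedure. The sketch below is illustrative only and is not the tool's actual code: the function name, the argument layout, the order of the checks, and the assumption that the volume prefix is the label with its leading meshstor-NNNN- removed are all hypothetical.

```shell
# Hypothetical sketch of the bucket decision; not the tool's real implementation.
#   $1  partition label, e.g. meshstor-0002-pvc-2af5b9e3
#   $2  "node/partitionUUID" key identifying this partition
#   $3  space-separated volume prefixes that currently have a MeshStorVolume CR
#   $4  space-separated "node/partitionUUID" keys listed in .status.partitions[]
classify_partition() {
  label=$1
  key=$2
  known_volumes=$3
  claimed_keys=$4

  # claimed: some CR lists this exact (node, UUID) pair. Checked first
  # (an assumption) because claimed partitions are never retired.
  case " $claimed_keys " in
    *" $key "*) echo claimed; return ;;
  esac

  # malformed: label does not look like meshstor-NNNN-pvc-<something>.
  case $label in
    meshstor-[0-9][0-9][0-9][0-9]-pvc-?*) ;;
    *) echo malformed; return ;;
  esac

  # Assumed definition of "volume prefix": the label minus "meshstor-NNNN-".
  volume=${label#meshstor-????-}
  case " $known_volumes " in
    *" $volume "*) echo stale ;;    # CR exists but does not claim this pair
    *)             echo unknown ;;  # no CR for this volume at all
  esac
}
```

For example, `classify_partition meshstor-0002-pvc-bbb n1/u2 "pvc-aaa" "n1/u1"` falls through to the last branch because no CR exists for pvc-bbb.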
Inspect Before Removing
Before applying cleanup, mount each candidate read-only and confirm there is nothing on it that you still need. This is especially important after a Kubernetes reinstall, where every meshstor partition will look unknown even though the on-disk filesystems are intact.
The MeshStor node pod ships with mdadm, mount, and xfsprogs already installed and runs privileged with the host's block devices visible — exec into it to inspect a partition without touching the host directly:
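The exec command itself is not shown on this page. A plausible form, modelled on the kubectl exec lines in the Apply section, is sketched here; the wrapper function name is hypothetical, and it assumes the csi-plugin container provides sh.

```shell
# Open an interactive shell in the MeshStor node pod.
# Assumptions: container name csi-plugin (as used by the Apply commands on
# this page) and that sh exists in the image.
meshstor_node_shell() {
  ns=$1
  pod=$2
  kubectl exec -it -n "$ns" -c csi-plugin "pod/$pod" -- sh
}

# Usage: meshstor_node_shell <meshstor-namespace> <meshstor-node-pod>
```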
Inside the pod, for each candidate partition (replace nvme0n1p3 with the partition name from the cleanup plan):
# 1. Confirm the MD metadata — array UUID, member count, role
mdadm --examine /dev/nvme0n1p3
# 2. Assemble the array read-only. --run starts it even though the array is degraded
# (a single RAID1 member is enough to read the filesystem).
mkdir -p /mnt/inspect
mdadm --assemble --readonly --run /dev/md/inspect /dev/nvme0n1p3
# 3. Mount the XFS filesystem read-only. norecovery skips log replay so nothing is written.
mount -o ro,norecovery /dev/md/inspect /mnt/inspect
# 4. Look at the contents. The PVC name embedded in the partition label points back to
# the workload that originally owned the volume.
ls -la /mnt/inspect/
# 5. Always tear down the inspection setup, even if you decide to keep the data.
umount /mnt/inspect
mdadm --stop /dev/md/inspect
If the data is still needed, copy it out (tar, rsync, etc.) before removing the partition, or leave the partition in place and recover it through your application's normal restore flow. Only proceed to --apply once every partition you plan to remove has been confirmed empty or expendable.
RAID10 volumes (stripeWidth > 1)
A single partition holds only one stripe of the data. Inspecting an isolated stripe member produces unreadable XFS. To recover data from a RAID10 volume, all partitions of one mirror set must be assembled together — which only works if enough members survived. If they did not, the data is unrecoverable from the leftovers and the partition is safe to remove.
Running the Tool
Find the MeshStor node pod for the affected node:
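One way to do this, sketched with standard kubectl flags, is to filter pods by the node they are scheduled on. The function name is hypothetical, and the output will include every pod in that namespace on the node, so pick the MeshStor node pod from the list.

```shell
# List pods in the MeshStor namespace that are scheduled on one node.
# spec.nodeName is a standard pod field selector; the namespace and node
# name are placeholders you must supply.
find_meshstor_node_pod() {
  ns=$1
  node=$2
  kubectl get pods -n "$ns" -o wide --field-selector "spec.nodeName=$node"
}

# Usage: find_meshstor_node_pod <meshstor-namespace> <node-name>
```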
Dry-Run (default)
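The dry-run invocation is the Apply command without --apply. A minimal sketch, using the same placeholders and container name as the Apply section (the wrapper function name is hypothetical):

```shell
# Print the cleanup plan only; without --apply nothing on disk is modified.
meshstor_cleanup_plan() {
  ns=$1
  pod=$2
  kubectl exec -n "$ns" -c csi-plugin "pod/$pod" -- meshstor-cleanup
}

# Usage: meshstor_cleanup_plan <meshstor-namespace> <meshstor-node-pod>
```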
Sample output:
node: mf-01-03, drives from MeshStorNodeDevice: [nvme0n1 nvme1n1]
drive nvme0n1
nvme0n1p1 uuid=… label=meshstor-0001-pvc-cd1038a7-… status=claimed action=leave
nvme0n1p2 uuid=… label=meshstor-0002-pvc-2af5b9e3-… status=unknown action=will-remove
drive nvme1n1
(no meshstor partitions)
summary: 2 partitions scanned, 1 claimed, 1 unknown, 0 stale, 0 malformed
plan: would remove 1 partition(s) (use --apply to commit)
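To build the list of partitions that need inspecting, the plan output can be filtered for action=will-remove. This is a sketch that assumes the line layout of the sample above, where the partition name is the first whitespace-separated field; the helper name is hypothetical.

```shell
# Read a dry-run plan on stdin and print the name of every partition
# marked action=will-remove (layout assumed from the sample output).
list_will_remove() {
  awk '/action=will-remove/ { print $1 }'
}

# Usage: pipe the dry-run output into it, e.g.
#   kubectl exec ... -- meshstor-cleanup | list_will_remove
```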
Apply
After inspecting every will-remove partition (see Inspect Before Removing) and confirming no needed data remains, re-run with --apply:
kubectl exec -n <meshstor-namespace> -c csi-plugin \
pod/<meshstor-node-pod> -- meshstor-cleanup --apply
Before each removal the tool re-lists volumes and re-classifies the partition against the fresh CR state. If the partition has become claimed (or stale without --force) since the plan was printed, it is skipped and reported in the summary. This guards against deleting a partition that a controller has just re-adopted.
Force (Stale Partitions)
To also retire stale partitions, add --force:
kubectl exec -n <meshstor-namespace> -c csi-plugin \
pod/<meshstor-node-pod> -- meshstor-cleanup --force # plan only
kubectl exec -n <meshstor-namespace> -c csi-plugin \
pod/<meshstor-node-pod> -- meshstor-cleanup --force --apply # commit
Warning
--force removes partitions whose volume CR exists but does not currently claim this node. That can race with reconciliation work. Verify the volume's .status.partitions[] does not include this node before forcing — otherwise the reconciler may be in the middle of re-using the partition.
What Tear-Down Does
For each retire, the tool runs:
- NVMe-oF unexport: removes the nvmet namespace exporting this partition (no-op if not exported).
- MD stop: for any md* device listed as a kernel holder of the partition, mdadm --stop is called. Without this, the partition's GPT entry cannot be removed.
- Holder re-check: sysfs is re-read; if any holder remains, the tool aborts this partition with "partition still has holders after tear-down" and continues to the next one.
- GPT remove: the partition entry is deleted from the parent drive's GPT table. The partition's data blocks are not zeroed; they are simply no longer addressable through GPT.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Plan succeeded, or apply removed all queued partitions |
| 1 | At least one partition failed to retire on apply (the rest were attempted) |
| 2 | Usage error: NODE_NAME unset, kube client failed to build, or list calls returned an error |
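When the tool is driven from a script, the exit code can be turned into a log message. The helper below is a hypothetical sketch based only on the table above; it assumes kubectl exec passes the remote command's exit status through to the caller.

```shell
# Map a meshstor-cleanup exit status to a short log message.
describe_cleanup_exit() {
  case $1 in
    0) echo "ok: plan succeeded or all queued partitions removed" ;;
    1) echo "partial: at least one partition failed to retire" ;;
    2) echo "usage: NODE_NAME unset, kube client or list call failed" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Usage: run the cleanup command, then call: describe_cleanup_exit "$?"
```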
Troubleshooting
partition still has holders after tear-down
Something other than the MD array the tool stopped is still holding the partition (e.g. a stuck device-mapper target or an externally assembled mdadm array). Inspect the holders directly:
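The kernel lists a block device's holders under its sysfs holders directory. The helper below is a sketch to run inside the node pod; the function name and the SYSFS_ROOT override are hypothetical, and nvme0n1p3 stands for the partition named in the error.

```shell
# List the kernel holders of a partition by reading sysfs.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_holders() {
  part=$1
  dir="${SYSFS_ROOT:-/sys}/class/block/$part/holders"
  [ -d "$dir" ] || { echo "no such block device: $part" >&2; return 1; }
  ls "$dir"
}

# Usage: list_holders nvme0n1p3
```

Names beginning with md are MD arrays and can be cross-checked in /proc/mdstat; names beginning with dm- are device-mapper devices.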
Resolve the holder manually (stop the MD device, remove the dm target, etc.), then re-run meshstor-cleanup --apply.
nvmet unexport: …
The kernel's configfs nvmet tree could not be modified. Confirm the node was set up for NVMe-oF (the Prerequisites page covers what the CSI driver needs in /sys/kernel/config/nvmet). If the node has been re-purposed and nvmet is gone, the failure is harmless — re-run the tool from a node that still has the namespace.
re-classified as claimed after fresh CR list
Between plan and apply, the controller (or another node) re-attached the partition to a volume. This is the TOCTOU guard doing its job — the partition is left in place. Re-run the dry-run plan to see the current state.
What's Next
- Monitoring: how to see unknown-volume counts per drive
- Volume Relocation: when a returned node leaves leftover partitions
- Self-Healing: the controller-driven flow that produces unknown partitions in the first place