Custom Kernel Patches¶
MeshStor maintains a small set of out-of-tree patches to the Linux MD driver. The driver runs on a stock kernel — these patches are not required for correctness — but each one closes a specific performance or operability gap that matters for the local-NVMe + NVMe-oF topology MeshStor uses. The patches are tracked separately on the upstream submission path; this page summarises what each one does and what changes for MeshStor when it is loaded.
For supported distributions and stock kernel versions, see Compatibility.
Latency-aware read balance (md-latency-ewma)¶
Linux branch: wip/md-raid1-raid10-latency
Files: drivers/md/md.h, drivers/md/raid1.c, drivers/md/raid10.c, drivers/md/md.c
Adds a per-rdev exponentially weighted moving average of read completion latency (latency_ewma_ns) and uses it as the cost function for read selection in choose_best_rdev (RAID1) and the equivalent path in RAID10.
The disk with the lowest cost wins. When costs are within 12.5% of each other, selection falls back to the existing closest-distance heuristic so that symmetric mirrors stay stable. Sample latencies are clamped at 10 ms before blending. The EWMA uses α = 1/16 (shift 4), and the first completed read on each rdev seeds the EWMA directly, so the cost function works from the very first IO with no convergence window. The sequential-read short-circuit, WriteMostly handling, and resync round-robin paths are unchanged. Per-rdev latency_ewma_ns is exposed via sysfs for debugging.
Effect on MeshStor. With local NVMe (~10 µs) and remote NVMe-oF (~30 µs over RDMA, ~100–200 µs over TCP), the cost ratio keeps steady-state reads on the local replica and spills to a remote replica only when the local queue grows deep enough that pending × local_latency exceeds remote_latency. Without this patch, MD's stock heuristic for non-rotational arrays balances reads by nr_pending alone, splitting read traffic roughly evenly across replicas regardless of fabric and making remote NVMe-oF latency directly visible to the pod.
In-place RAID1 → RAID10 takeover (md-raid1-to-raid10-takeover)¶
Linux branch: wip/md-raid1-to-raid10-takeover
Files: drivers/md/raid10.c, drivers/md/md.c
Adds raid10_takeover_raid1(), a zero-copy personality swap from a healthy RAID1 to RAID10 with a near_copies = N layout over N disks. When near_copies == raid_disks, the on-disk byte layout under RAID1 and RAID10-near is provably identical: every byte lives at the same physical offset on every disk. Under that invariant no data moves and no resync runs; the takeover is a pure personality swap committed via level_store (writing "raid10" to /sys/block/mdX/md/level).
Strict preconditions are enforced before any state mutation: v1.x metadata, no in-flight reshape, no external_size set, at least two disks, not degraded, and no WriteMostly member. Each failed precondition emits a distinct pr_warn so operators can identify which check fired without a debugger.
Effect on MeshStor. Enables in-place expansion from replicaCount=N, stripeWidth=1 (RAID1) to replicaCount=N, stripeWidth>1 (RAID10) without rebuilding the array. This is the kernel-side dependency for the RAID1→RAID10 reshape feature listed under Project Status → Available to paid customers.
Per-bucket resync barriers in RAID10 (per-bucket-arrays)¶
Linux branch: per-bucket-arrays
Files: drivers/md/raid10.c, drivers/md/raid1-10.c
Replaces RAID10's single global barrier / nr_pending / nr_waiting / nr_queued scalars with arrays of BARRIER_BUCKETS_NR atomic_t elements (one bucket per 64 MiB region, sector-hashed via sector_to_idx()). RAID1 already uses per-bucket barriers; this change ports the same mechanism to RAID10. The seqlock-based fast path is preserved. freeze_array / unfreeze_array continue to drain the whole array via a separate array_freeze_pending flag; raid10_quiesce uses new raise_barrier_all / lower_barrier_all helpers.
Effect on MeshStor. Resync of one 64 MiB region only blocks application I/O hitting the same region, instead of stalling the entire array. On RAID10 volumes that are degraded and resyncing — common during member replacement and drain migration — foreground I/O continues unimpeded for sectors outside the active resync window. Without this patch, a single in-flight resync stalls every concurrent application read or write to the array.
What's Next¶
- Compatibility — supported distributions and stock kernel versions
- Architecture — where the EWMA-driven read selection shows up in the data path
- Project Status — roadmap items that depend on these patches