Why Your RAID 5 Rebuild Took 36 Hours and ZFS Wouldn’t
You’ve read the RAID rebuild math. You’ve maybe even looked at RAID 50 and RAID 60 nesting — stacking mdadm arrays to reduce rebuild risk. Both answer the same problem: drives are huge now, rebuilds are slow, and the failure exposure window has stretched from hours into days.
Here’s the thing. mdadm does RAID as a block-layer abstraction. The filesystem has no idea parity is happening. It writes, mdadm shuffles bits, life goes on — until a power blip mid-write leaves parity inconsistent. Write hole. Now you’re praying the UPS held.
ZFS takes a different approach. The filesystem is the RAID layer. Checksums, snapshots, copy-on-write, and parity all live in the same engine. That changes what’s actually possible.
Let’s talk about RAID-Z, why it’s not just RAID 5 with a rebrand, and when to reach for dRAID instead.
Full example: Clone the working files at github.com/KingPin/sumguy-examples/linux/raid-z-and-draid
RAID-Z vs mdadm Parity: Three Key Differences
Before we get into Z1/Z2/Z3, there are three things that genuinely separate RAID-Z from mdadm RAID 5/6. Not marketing — actual architectural differences.
1. Variable-width stripes (no read-modify-write)
Traditional RAID 5 has a fixed stripe width. When you write a chunk smaller than that stripe, the array has to read the old data and the old parity, recompute parity in memory, then write the new data and the new parity back. That’s four I/Os for what looked like one write. That’s the write penalty.
RAID-Z uses variable-width stripes. Every write is its own complete stripe, sized to exactly the data being written. If you write a 4KB block, the stripe is 4KB of data plus the parity. ZFS never does read-modify-write. The write penalty you’ve been warned about with RAID 5 does not exist here.
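To put numbers on the variable-width idea, here is a rough sketch of how much raw space one block takes on a RAID-Z vdev. The arithmetic follows my reading of the OpenZFS allocation logic (data sectors, plus parity for each row, padded to a multiple of parity + 1). The function name `raidz_asize_kib` and its arguments are made up for illustration, so treat it as an approximation, not the canonical formula.

```shell
#!/usr/bin/env bash
# Sketch: raw space consumed by a single block on a RAID-Z vdev.
# Assumption: data + per-row parity + padding, as described in the lead-in.
# raidz_asize_kib BLOCK_KIB ASHIFT WIDTH PARITY  (hypothetical helper)
raidz_asize_kib() {
  local block_kib=$1 ashift=$2 width=$3 parity=$4
  local sector=$(( 1 << ashift ))                  # bytes per sector
  local bytes=$(( block_kib * 1024 ))
  local data=$(( (bytes + sector - 1) / sector ))  # data sectors, rounded up
  local cols=$(( width - parity ))                 # data columns per row
  local rows=$(( (data + cols - 1) / cols ))       # rows this block spans
  local total=$(( data + parity * rows ))          # add parity for each row
  local pad=$(( parity + 1 ))
  total=$(( (total + pad - 1) / pad * pad ))       # pad to a multiple of parity+1
  echo $(( total * sector / 1024 ))
}

# 6-drive RAID-Z2 with 4 KiB sectors (ashift=12)
raidz_asize_kib 4 12 6 2     # prints 12  (1 data + 2 parity sectors)
raidz_asize_kib 128 12 6 2   # prints 192 (32 data + 16 parity sectors)
```

The stripe really is sized to the write: the 4 KiB block never touches a wider stripe. The flip side is that small blocks pay proportionally more parity (12 KiB raw for 4 KiB of data) than large ones (192 KiB raw for 128 KiB).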
2. No write hole (COW + transactional writes)
The write hole — where a power failure during a RAID 5 stripe write leaves parity inconsistent — can’t happen in RAID-Z. ZFS writes are transactional. Data lands in new blocks first (copy-on-write), then the metadata is atomically updated to point to the new blocks. Either the whole transaction commits, or nothing does. Old blocks stay untouched until the transaction succeeds.
You still want a UPS. But you’re not losing sleep over write-hole corruption.
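If the write-new-then-flip ordering feels abstract, the same pattern exists in everyday shell scripting. This is an analogy only, not how ZFS is implemented: a temp file plays the role of the new blocks, and a rename plays the role of the atomic metadata update. The helper name `atomic_replace` is hypothetical.

```shell
#!/usr/bin/env bash
# Analogy for copy-on-write commits: never overwrite live data in place.
# atomic_replace TARGET CONTENT  (hypothetical helper)
atomic_replace() {
  local target=$1 content=$2 tmp
  # Step 1: new data lands in a fresh location; the live file is untouched.
  tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX")
  printf '%s\n' "$content" > "$tmp"
  # Step 2: one rename repoints the name at the new data. Within a single
  # POSIX filesystem, readers see either the old file or the new one.
  mv -f "$tmp" "$target"
}
```

If the script dies during step 1, the old file is still intact and only a stray temp file is left behind. That is the property RAID-Z gets from copy-on-write, applied to every block and every parity update.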
3. Per-block checksums
Every block in ZFS has a checksum. When you read data, ZFS verifies the checksum. If it doesn’t match, ZFS knows it’s corrupt. If you have redundancy (parity or mirror), ZFS heals it automatically from the good copy. It logs what it found and fixed.
mdadm knows nothing about this. If a bit flips on a RAID 5 drive and the read comes back wrong, mdadm serves you the corrupted data and calls it a day. Silent corruption is real, and it’s why a monthly zpool scrub is worth more than a year of crossed fingers.
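Here is a toy model of that read path, assuming two plain files stand in for redundant copies of one block. Real ZFS keeps the checksum in the parent block pointer and reconstructs from parity or a mirror; the function names `sum_of` and `verified_read` are illustrative only, and `sha256sum` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Toy model: verify a checksum on read, heal from a second copy if it fails.
sum_of() { sha256sum "$1" | cut -d' ' -f1; }

# verified_read PRIMARY REPLICA EXPECTED_SHA256  (hypothetical helper)
verified_read() {
  local primary=$1 replica=$2 expected=$3
  if [ "$(sum_of "$primary")" != "$expected" ]; then
    if [ "$(sum_of "$replica")" != "$expected" ]; then
      echo "unrecoverable: both copies fail the checksum" >&2
      return 1
    fi
    echo "checksum mismatch on $primary, healing from $replica" >&2
    cp "$replica" "$primary"          # overwrite the bad copy with the good one
  fi
  cat "$primary"                      # only verified data is returned
}
```

The important part is the ordering: data is never handed back until it matches its checksum. A plain parity array has no expected value to compare against on a normal read, so it cannot tell that a copy is wrong.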
RAID-Z1, Z2, Z3
These map roughly to RAID 5, RAID 6, and triple parity (sometimes informally called RAID 7):
| Level | Parity drives | Can survive | Min drives |
|---|---|---|---|
| RAID-Z1 | 1 | 1 drive failure | 3 |
| RAID-Z2 | 2 | 2 simultaneous drive failures | 4 |
| RAID-Z3 | 3 | 3 simultaneous drive failures | 5 |
“Roughly” is doing real work there. Unlike RAID 6, RAID-Z2 doesn’t place parity in a fixed rotating pattern. Where parity lands depends on each block’s variable-width stripe. The logical behavior is the same (lose two drives, still alive), but the on-disk layout is completely different.
For home labs and small businesses, Z2 is the sweet spot. Z3 is for large arrays where the math on simultaneous failures starts looking less academic.
The Expansion Problem (Historic)
For a long time — and I mean a decade-plus — RAID-Z had a notorious limitation: you could not add drives to an existing vdev. Once you created a 6-drive RAID-Z2 vdev, that was your geometry forever.
Want more space? Add a new vdev. Your pool now has two separate RAID-Z2 vdevs. Pool size doubled, but so did your vdev count, and your data was striped across both. You couldn’t consolidate. You couldn’t take six drives and turn them into seven without destroying and recreating the pool.
This was a genuine adoption killer for home labs. “I’ll just add a drive when I need space” is how people think about their NAS. ZFS said no. You plan your vdev geometry at creation and you live with it.
For years, the workaround was: plan your vdev sizes carefully upfront, or buy a bunch of drives at once and build the right-sized vdev. Not ideal. Definitely annoying.
RAID-Z Expansion in OpenZFS 2.3
OpenZFS 2.3, released in early 2025, shipped RAID-Z expansion: the ability to add a single drive to an existing RAID-Z vdev. This was years in the making and a genuinely big deal.
Here’s how it works:
    # Check current pool status before expansion
    zpool status mypool

    # Add a drive to an existing RAID-Z2 vdev
    # This converts a 6-drive Z2 into a 7-drive Z2
    zpool attach mypool raidz2-0 /dev/sdg

    # Monitor the expansion progress
    zpool status mypool
      pool: mypool
     state: ONLINE
    status: One or more devices is currently being resilvered.
      scan: resilver in progress since Tue Jul 8 09:15:00 2026
            3.68T scanned at 1.12G/s, 821G issued at 250M/s
    config:

        NAME          STATE     READ WRITE CKSUM
        mypool        ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            sda       ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdf       ONLINE       0     0     0
            sdg       ONLINE       0     0     0  (expanding)

Here’s the caveat nobody tells you upfront: existing data keeps the old parity ratio until it’s rewritten. If you had a 6-drive Z2 vdev and you add a 7th drive, the blocks that were written under the 6-drive geometry keep their 4-data-to-2-parity ratio. New writes use the 7-drive geometry. Old blocks only pick up the new ratio when they are rewritten, either naturally over time or because you copy the data yourself. A scrub won’t do it: a scrub verifies blocks, it doesn’t rewrite them.
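This means you don’t get the full space efficiency of a 7-drive vdev the moment the expansion finishes. The new drive’s capacity is there, but old blocks keep paying 6-drive parity overhead until they are rewritten. Check zpool status for the reflow progress if you’re impatient.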
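A quick back-of-envelope shows how much space is at stake. This sketch assumes ideal full-width stripes and ignores padding and metadata; `raw_tib` is a hypothetical helper, and the 4 TiB figure is just an example.

```shell
#!/usr/bin/env bash
# raw_tib DATA_TIB WIDTH PARITY: raw space needed to hold DATA_TIB of data.
# Assumption: every stripe is full width; padding and metadata are ignored.
raw_tib() {
  awk -v d="$1" -v w="$2" -v p="$3" 'BEGIN { printf "%.2f\n", d * w / (w - p) }'
}

raw_tib 4 6 2   # prints 6.00: 4 TiB written under the 6-drive Z2 geometry
raw_tib 4 7 2   # prints 5.60: the same 4 TiB once rewritten at 7 drives wide
```

Under those assumptions, rewriting 4 TiB of old data would hand back roughly 0.4 TiB of raw space. Whether that is worth the churn depends on how full the pool is.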
The expand-raidz.sh script in the example repo walks through this with loop devices so you can test it without touching real hardware. See the link at the top of this article.
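If you would rather see the shape of such a rehearsal before cloning anything, here is a minimal sketch. It is not the repo’s script: it uses sparse files as vdevs instead of loop devices, and the pool name `testpool`, the `/tmp/zfs-sandbox` path, and the `DRY_RUN` switch are all assumptions of mine. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: rehearse RAID-Z expansion on sparse files instead of real disks.
# With DRY_RUN=1 (the default) nothing is executed, commands are only printed.
# Running for real needs root and OpenZFS 2.3 or newer.
DRY_RUN=${DRY_RUN:-1}
SANDBOX=${SANDBOX:-/tmp/zfs-sandbox}

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

expansion_rehearsal() {
  local i disks=()
  run mkdir -p "$SANDBOX"
  for i in 1 2 3 4 5 6 7; do
    run truncate -s 1G "$SANDBOX/disk$i.img"   # sparse backing file
  done
  for i in 1 2 3 4 5 6; do
    disks+=("$SANDBOX/disk$i.img")
  done
  run zpool create testpool raidz2 "${disks[@]}"            # 6-wide Z2
  run zpool attach testpool raidz2-0 "$SANDBOX/disk7.img"   # grow to 7-wide
  run zpool status testpool
  # Clean up afterwards with: zpool destroy testpool && rm -r "$SANDBOX"
}

expansion_rehearsal
```

Read the printed commands first, then rerun with `DRY_RUN=0` as root once they look right for your system.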
vdev Sizing Rules of Thumb
Pool IOPS in ZFS doesn’t scale with drive count — it scales with vdev count. This is the single most important thing to understand about ZFS pool design, and it trips up people coming from mdadm.
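In mdadm RAID 6, adding two more drives to your array increases your random read throughput, because small reads land on individual drives and more drives can serve more of them in parallel. In RAID-Z, a block is striped across the drives in its vdev, so reading it back ties up the vdev as a whole. A single RAID-Z2 vdev with 10 drives has roughly the same random IOPS as one with 6 drives. You’re getting more capacity, not more parallelism.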
Want more IOPS? Add more vdevs. Two 6-drive Z2 vdevs in a pool will give you roughly double the IOPS of one 12-drive Z2 vdev. This is the ZFS equivalent of going from RAID 6 to RAID 60 — the same concept we covered in the RAID 50/60 article, but baked into how ZFS pools are designed rather than stacked mdadm arrays.
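As a back-of-envelope model, assume each RAID-Z vdev delivers about one drive’s worth of random IOPS. The 150 IOPS default is a placeholder for a 7200 rpm disk, not a measurement, and `pool_random_iops` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# pool_random_iops VDEVS [PER_DRIVE_IOPS]
# Assumption: one RAID-Z vdev is worth roughly one drive of random IOPS.
pool_random_iops() {
  local vdevs=$1 per_drive_iops=${2:-150}
  echo $(( vdevs * per_drive_iops ))
}

pool_random_iops 1   # prints 150: one 12-drive Z2 vdev
pool_random_iops 2   # prints 300: two 6-drive Z2 vdevs
```

Notice that drive count never appears in the formula. That is the whole point: under this model, twelve drives in one vdev and six drives in one vdev land on the same number.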
Practical sizing guide for RAID-Z2:
| Drives per vdev | Usable | Notes |
|---|---|---|
| 4 drives | 50% | Tight. Small arrays only. |
| 6 drives | 67% | Sweet spot. |
| 8 drives | 75% | Good. Rebuild time starts to grow on large drives. |
| 12 drives | 83% | Split into two 6-drive vdevs instead. |
The sweet spot is 4–8 drives per Z2 vdev. Above 8, you’re better off splitting into multiple smaller vdevs for IOPS and shorter rebuild windows.
12 drives? Two 6-drive RAID-Z2 vdevs. Double the IOPS and shorter rebuilds, at the cost of two extra drives of parity: 67% usable instead of 83%.
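The percentages in the table are just (drives minus parity) divided by drives. Here is a small sketch that reproduces them and compares the two 12-drive layouts, ignoring padding and metadata overhead. The helper names `usable_pct` and `usable_drives` are mine.

```shell
#!/usr/bin/env bash
# usable_pct WIDTH PARITY: data share of one vdev's raw capacity, in percent.
usable_pct() {
  awk -v w="$1" -v p="$2" 'BEGIN { printf "%.0f\n", (w - p) / w * 100 }'
}

# usable_drives WIDTH PARITY [VDEVS]: drives' worth of data across the pool.
usable_drives() {
  echo $(( ($1 - $2) * ${3:-1} ))
}

for width in 4 6 8 12; do
  echo "$width-drive Z2: $(usable_pct "$width" 2)% usable"
done
echo "1 x 12-drive Z2: $(usable_drives 12 2 1) drives of data"
echo "2 x 6-drive Z2:  $(usable_drives 6 2 2) drives of data"
```

Splitting twelve drives into two vdevs spends four drives on parity instead of two, so you end up with 8 drives of data rather than 10. That is the price of the extra IOPS and the shorter rebuilds.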
dRAID: Declustered Parity
dRAID is what happens when the ZFS team looked at large arrays — 24, 48, 60+ drives — and said “the traditional resilver model is going to take a week, that’s unacceptable, let’s rethink this.”
In a standard RAID-Z pool, when a drive fails, the resilver (rebuild) reads from every other drive in the failed drive’s vdev and writes to a hot spare. One hot spare, reading from N-1 drives. The bottleneck is the spare itself: every rebuilt block has to be written to that one drive, so the resilver can’t go faster than a single disk can write.
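That single-writer limit gives you a floor on rebuild time you can work out on a napkin. The sketch below assumes a full drive and a sustained write speed of 150 MB/s, which is a placeholder, not a benchmark. Real resilvers copy only allocated data and compete with normal pool I/O, so actual times vary in both directions. `rebuild_hours` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# rebuild_hours CAPACITY_TB WRITE_MBPS: best-case hours to fill one drive.
# Assumption: the replacement drive writes flat out for its full capacity.
rebuild_hours() {
  awk -v tb="$1" -v mbps="$2" \
    'BEGIN { printf "%.1f\n", tb * 1000000 / mbps / 3600 }'
}

rebuild_hours 8 150    # prints 14.8
rebuild_hours 20 150   # prints 37.0
```

Adding more drives to the vdev does nothing to these numbers, because the one drive being written is the limit.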
dRAID distributes the data, parity, and spares differently from the start. Spare capacity is spread across all drives in the array — not sitting idle on dedicated spare drives. When a drive fails, every other drive in the pool participates in the rebuild simultaneously. The work fans out instead of funneling through a single spare.
The result: resilver times that used to take days drop to hours. On a 60-drive array, the difference is dramatic.
dRAID was introduced in OpenZFS 2.1 (2021). The command syntax reflects the more complex layout:
    # dRAID2:8d:1s = 2 parity, 8 data drives per stripe, 1 distributed spare
    # This needs at least 11 drives (8 data + 2 parity + 1 spare)
    zpool create mydraidpool draid2:8d:1s \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd \
      /dev/sde /dev/sdf /dev/sdg /dev/sdh \
      /dev/sdi /dev/sdj /dev/sdk

    zpool status mydraidpool
      pool: mydraidpool
     state: ONLINE
    config:

        NAME                STATE     READ WRITE CKSUM
        mydraidpool         ONLINE       0     0     0
          draid2:8d:1s-0    ONLINE       0     0     0
            sda             ONLINE       0     0     0
            sdb             ONLINE       0     0     0
            sdc             ONLINE       0     0     0
            sdd             ONLINE       0     0     0
            sde             ONLINE       0     0     0
            sdf             ONLINE       0     0     0
            sdg             ONLINE       0     0     0
            sdh             ONLINE       0     0     0
            sdi             ONLINE       0     0     0
            sdj             ONLINE       0     0     0
            sdk             ONLINE       0     0     0
        spares
          draid2-0-0        AVAIL

Note that draid2-0-0 shows as an available spare even though there’s no physically separate spare drive. It’s virtual capacity distributed across all 11 drives. When a real drive fails, that distributed spare capacity activates and all surviving drives contribute to the rebuild.
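Before you type a dRAID spec into zpool create, it helps to sanity-check the drive count. This sketch parses only the fully spelled-out form used above (parity, data, and spares all given) and adds them up. The helper name `draid_min_children` is hypothetical, and it deliberately ignores the optional children field and the defaults the real syntax allows.

```shell
#!/usr/bin/env bash
# draid_min_children SPEC: minimum drives for a spec like draid2:8d:1s.
# Assumption: only the explicit draid<P>:<D>d:<S>s form is handled.
draid_min_children() {
  local spec=$1
  if [[ $spec =~ ^draid([1-3]):([0-9]+)d:([0-9]+)s$ ]]; then
    # parity + data + distributed spares
    echo $(( BASH_REMATCH[1] + BASH_REMATCH[2] + BASH_REMATCH[3] ))
  else
    echo "unsupported spec: $spec" >&2
    return 1
  fi
}

draid_min_children draid2:8d:1s   # prints 11
```

If the number it prints is larger than the number of drives you have, shrink the data width or the spare count before going further.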
Where dRAID shines:
- 16+ drives in a single pool
- 24+ drives where traditional resilver times are measured in days
- Enterprise-ish deployments where rebuild windows matter for SLA reasons
- Home labs with large drive counts (that 24-bay Supermicro chassis someone in your homelab Discord definitely has)
Where dRAID is overkill:
- 4–12 drive home arrays — the complexity outweighs the benefit
- Pools where you’ll be adding drives frequently — layout is fixed at creation
- Anywhere that RAID-Z2 rebuild times are already comfortable
RAID-Z vs dRAID: Which One
Here’s the honest decision guide. No hedging.
| | RAID-Z2 | dRAID2 |
|---|---|---|
| Drive count | 4–12 drives | 16+ drives |
| Rebuild speed | Slow on large arrays | Dramatically faster |
| Hot spares | Separate spare drive(s) | Distributed in layout |
| Complexity | Low — easy to reason about | Higher — layout geometry matters |
| vdev expansion | Supported (OpenZFS 2.3+) | Not supported |
| Random IOPS | Scales with vdev count | Same |
| Comparable mdadm | RAID 6 or RAID 60 | No direct mdadm equivalent |
| Best for | Home labs, small NAS | Large arrays, data centers |
If you have a 6-bay or 8-bay NAS, RAID-Z2 is your answer. It’s simpler, well-understood, and the rebuild times on 8TB drives are uncomfortable but survivable. Two 6-drive Z2 vdevs if you want more IOPS — same concept as RAID 60 from the RAID 50/60 article, but done in ZFS natively.
If you’re building something with 24+ drives — maybe you’re running TrueNAS or OMV on real server hardware — dRAID2 is worth understanding seriously. The rebuild time difference at that scale is not academic. A 36-hour resilver window on a 24-drive pool running degraded is a 36-hour window where a second failure ends everything. dRAID shrinks that window to hours.
When NOT to Use ZFS
ZFS is great. ZFS is also not for everyone.
RAM constraints. The ARC cache is not optional. Rule of thumb: 4–8GB ARC handles most home pools fine, but if you’re running a 4GB server where every gigabyte fights for space, ZFS is going to lose that fight. Btrfs or ext4+mdadm will serve you better. The ZFS vs Btrfs article has the full breakdown.
Single-drive systems. ZFS without redundancy is just ZFS with extra RAM usage and a learning curve. Use ext4.
Distros with kernel licensing anxiety. ZFS is CDDL-licensed, Linux is GPL, and the two are arguably incompatible. Ubuntu ships ZFS anyway. Some distros don’t. If ZFS requires significant manual installation work on your system and you’re not committed to learning it, don’t bother.
You just want a quick NAS. If you’re spinning up a two-drive mirror for Plex media and you don’t want to learn pool management, mdadm RAID 1 + ext4 gets the job done in five minutes. ZFS is a commitment. Make sure you want to make it before you format six drives.
Real Talk
ZFS isn’t magic. It’s a specific set of trade-offs: more RAM, more concepts to learn, in exchange for checksums on every block, no write hole, COW snapshots, and replication that actually works. If those trade-offs match your use case, ZFS is genuinely excellent. If they don’t, it’s a forklift for your couch.
RAID-Z2 is the right default for 4–12 drive home lab arrays. dRAID is the right answer when traditional resilver times stop being theoretical and start being your Saturday problem.
Snapshots and replication are where ZFS really pulls away from mdadm. zfs send | zfs receive is one of the most elegant backup primitives in Linux. Covered in the ZFS replication with Syncoid and Sanoid article.
Picking a NAS OS to run ZFS — TrueNAS Scale, OpenMediaVault, Unraid — is its own rabbit hole. The TrueNAS vs OMV vs Unraid comparison sorts it out.
And as always: RAID is not a backup. RAID-Z2 survives two simultaneous drive failures. It does nothing when you rm -rf the wrong directory, ransomware encrypts your pool, or the server falls off a shelf. Run ZFS. Run replication. Run an offsite backup.
Your 2 AM self will appreciate the distinction.