RAID-Z is one of the best tools for keeping your data as error-free as possible, even on the cheapest collection of disks. It is part of OpenZFS, an open-source, enterprise-grade file system available on Linux, FreeBSD, Mac OS X, SmartOS, Illumos, and other major operating systems. If you haven't heard of OpenZFS before, you can learn the basics in this brief article.
But first... what is RAID?
RAID stands for Redundant Array of Independent (or Inexpensive) Disks. It refers to the industry-wide practice of storing data not just on one disk but across multiple disks, so that even when a disk fails, the data can be reconstructed from the remaining disks. The way data is spread across the disks differs between redundancy schemes, which is why they are named RAID 0, RAID 1, and so on. We are not going to deal with those here. We will focus on RAID-Z, which is specific to OpenZFS.
RAID (and also RAID-Z) is not the same as writing copies of data to a backup disk. When you have two or more disks set up in RAID, the data is written to them simultaneously, and all the disks stay active and online. This is why RAID is different from backups and, more importantly, why RAID is not a substitute for backups. If your entire server burns down, all the online disks can go with the server, but backups will save your day. Conversely, if a single disk fails and some data wasn't backed up yet (because you can't take backups constantly), RAID lets you recover that information.
Backups are periodically taken copies of relevant data, while RAID is real-time redundancy. There are several ways in which data is stored in traditional RAID systems, but we will not go into them here. Instead, we will dive deep into RAID-Z, which is one of the coolest features of OpenZFS.
One last thing before we get started: traditional RAID setups often use a dedicated hardware device to do the RAID. This leaves the operating system and file system unaware of the RAID mechanisms that are in place. But the RAID card (the dedicated hardware) can itself fail, leaving your entire disk array essentially useless.
To avoid this single point of failure, you should use OpenZFS without any hardware RAID controller.
RAID-Z1, RAID-Z2, RAID-Z3
ZFS combines the tasks of a volume manager and a file system. This means you can specify the device nodes for your disks while creating a new pool, and ZFS will combine them into one logical pool; on top of that volume, you can then create datasets for different uses, like /home, /usr, etc.
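As a hypothetical sketch of what that looks like (the pool name `tank` and the device names are placeholders for your own hardware, and the commands require ZFS installed and root privileges):

```shell
# Combine two disks (placeholder device names) into one logical pool.
zpool create tank /dev/sdb /dev/sdc

# Create datasets for different uses on top of that pool; they all
# share the pool's space but can have individual properties.
zfs create -o mountpoint=/home tank/home
zfs create tank/usr

# List the datasets and where they are mounted.
zfs list
```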
Setting up RAID-Z requires at least three disks. The storage providers can be something else too, such as network-attached storage or virtual block devices, but let's stick to three disks of equal size as a simple example.
The three disks can be combined into a virtual device (vdev). This is the building block of a zpool. If you are starting with only 3 disks, you have 1 vdev in your zpool. You can have 2 vdevs with 6 disks and so on.
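If you want to experiment without three spare disks, one common trick is to back the vdev with sparse files instead of real devices. This is a hedged sketch (the pool name `demo` is made up, and it still requires ZFS installed and root privileges):

```shell
# Create three 256 MB sparse files to stand in for real disks.
truncate -s 256M /tmp/disk1 /tmp/disk2 /tmp/disk3

# Combine them into a single RAID-Z1 vdev backing a new pool.
zpool create demo raidz1 /tmp/disk1 /tmp/disk2 /tmp/disk3

# Inspect the layout: one raidz1 vdev with three members.
zpool status demo

# Destroy the pool when you are done experimenting.
zpool destroy demo
```

File-backed pools are only for experimentation; never store real data on them.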
Suppose you have a 1GB file which you want to store on this pool. RAID-Z splits it into two equal chunks of 512MB and then performs a mathematical operation on them which generates a third 512MB chunk (called the parity block). The three chunks then get written to the three separate disks of the vdev. So the file ends up taking 1.5GB of space in total.
The advantage, however, is that if one of the disks fails, say the first chunk is lost, then the second chunk and the parity block can be used to recreate the first one. Similarly, if the second chunk is lost, the first and third can be used to recreate the second one.
Your files use 50% more space than necessary, but you can withstand the failure of one disk per vdev. This is RAID-Z1.
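The parity arithmetic in this simplified two-chunk picture boils down to XOR. Here is a toy shell sketch where single bytes stand in for the 512MB chunks; it illustrates the principle only, not how ZFS actually lays out data:

```shell
#!/bin/sh
# Two data "chunks", each a single byte for illustration.
a=90
b=60

# The parity chunk is the bitwise XOR of the data chunks.
p=$((a ^ b))
echo "parity = $p"

# If either data chunk is lost, XOR-ing the parity block with the
# surviving chunk recovers the missing one.
echo "recovered a = $((p ^ b))"
echo "recovered b = $((p ^ a))"
```

Running this prints `parity = 102`, `recovered a = 90`, and `recovered b = 60`, showing that any one of the three pieces can be rebuilt from the other two.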
But a ZFS pool can grow, and eventually you will need more space. Well, you can't add more disks directly to a vdev (that feature is proposed and could very well be under development right now). However, you can add a vdev. This means you can add disks in sets of three and treat each new set as a single logical vdev.
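Growing a pool by one more three-disk RAID-Z1 vdev would look something like this (the pool name `tank` and the device names are placeholders; note that `zpool add` is permanent, so double-check the command before running it):

```shell
# Add a second raidz1 vdev of three new disks to an existing pool.
zpool add tank raidz1 /dev/sde /dev/sdf /dev/sdg

# The pool now stripes new data across both vdevs.
zpool status tank
```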
You can now tolerate a single disk failure in this new vdev, as well as a single disk failure in the older one. But if more than one disk fails within a single vdev, that's not recoverable: your entire pool is rendered useless, even the healthier vdevs.
This is a really over-simplified model. Files are never split exactly in half; instead, data is treated as blocks of fixed length. Moreover, you can use more than 3 disks (but 3 is the minimum) per vdev, and RAID-Z1 will ensure that each unique block of data is written such that it can recover from the failure of any single disk per vdev. Thankfully, you don't have to worry about these internal details; that's ZFS' responsibility. Once the pool is set up, data is automatically spread across it in the most optimal way.
The failure tolerance is still limited to one disk failure per vdev. To go beyond that, we need RAID-Z2. RAID-Z2 works in a similar way, but it creates two parity blocks and two data blocks from a single piece of information. This allows it to withstand up to 2 disk failures per vdev. Also, a vdev must have at least 4 disks if it is going to implement a RAID-Z2 setup.
Similarly, RAID-Z3 requires at least 5 disks per vdev and can withstand the failure of 3 of them. RAID-Z3 is less space-efficient than RAID-Z2, which in turn is less space-efficient than RAID-Z1.
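Creating RAID-Z2 and RAID-Z3 pools only changes the vdev type keyword and the minimum disk count. Again, a hedged sketch with placeholder pool and device names:

```shell
# RAID-Z2: needs at least 4 disks per vdev; survives 2 failures per vdev.
zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# RAID-Z3: needs at least 5 disks per vdev; survives 3 failures per vdev.
# (A different pool name and disks, since a disk belongs to one pool.)
zpool create vault raidz3 /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj
```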
Conclusion
With RAID-Z we see a trade-off between the usable space offered by the individual disks and the reliability that the collection of such disks can offer. With a greater number of disks, the probability of multiple disks failing simultaneously also increases.
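To make the trade-off concrete, here is a quick back-of-the-envelope calculation for a hypothetical six-disk vdev of equal-size disks (ignoring ZFS metadata overhead):

```shell
#!/bin/sh
# Usable capacity of a 6-disk vdev at each RAID-Z level: the parity
# level is the number of disks' worth of space given up for redundancy.
disks=6
for parity in 1 2 3; do
  usable=$((disks - parity))
  echo "RAID-Z$parity: $usable of $disks disks usable, survives $parity failure(s)"
done
```

So with the same six disks, RAID-Z1 gives you five disks' worth of space but only one disk's worth of protection, while RAID-Z3 gives you three of each.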
The best way to counter this is to choose a RAID-Z strategy that offers the reliability you need at the best bang for your buck. Let us know if you found this tutorial useful or if you have any questions about RAID-Z!