What is RAID? Unveiling the Pros and Cons of 15 RAID Configurations!
Time to read 13 min
Storage space matters more and more as the amount of data grows. Nowadays, AAA games may require hundreds of gigabytes of storage space, while a 4K Blu-ray movie may reach 70 to 80 gigabytes. Even the HDR images and videos we take with our cell phones take up a lot of storage space.
We usually buy another hard drive when our storage space is insufficient. However, with more than one hard disk, each disk appears as its own partition, so files have to be managed across several separate volumes. For important data, we may also need to manually back up files to another hard drive to increase data security.
So, how can we make use of multiple hard drives? Using RAID allows us to use multiple hard drives to increase storage capacity and data security.
RAID was proposed in 1988 by Prof. D. A. Patterson and his colleagues at the University of California, Berkeley. It originally stood for "redundant array of inexpensive disks" and is now usually expanded as "redundant array of independent disks".
RAID combines multiple independent disks into one large-capacity disk group. Spreading data across the individual disks adds their throughput together to improve the performance of the whole disk system, and storing redundant data increases its fault tolerance.
In short, RAID turns several independent hard disks into one large-capacity group that can significantly increase read and write speeds while adding data protection.
RAID has different levels and variations, including RAID 0, RAID 1, RAID 2, RAID 3, RAID 4, RAID 5, RAID 6, RAID F1, RAIDZ, JBOD, SHR, SHR2, and hybrid RAID.
RAID can be categorized into two main types -- hardware RAID and software RAID.
Hardware RAID comes in two forms. An external RAID enclosure is particularly expensive and is usually used for enterprise storage. Another form is adding a RAID controller card to the computer.
RAID controller cards offer faster speeds and greater stability. The cache on the card can improve read and write speeds, and good-quality controller cards are equipped with batteries. The battery ensures that cached data can still be written to the hard disk drive after a power loss, thus protecting the data. New controller cards are expensive, so older, cheaper cards are an option for those on a budget.
Despite the many advantages of hardware RAID, one potential pitfall to avoid is the RAID feature built into a motherboard. The array may be lost in the event of a motherboard problem, such as failed overclocking or a dead battery. Motherboard RAID does not even come close to the second type of RAID, software RAID.
Software RAID uses software to implement RAID. Early software RAID implementations were not very stable and were slower than hardware RAID. However, with continuous optimization, there is no longer a large performance gap between software RAID and hardware RAID. Software RAID is a common choice in many NAS devices for home use.
RAID levels are numbered starting at 0; the standard levels run from 0 to 6, and the number 7 has been used for a non-standard level. Let's start with RAID 0. Imagine hard disks as buckets, and read/write operations as the process of filling these buckets with water. This analogy makes it easier to understand RAID.
Short facts on RAID 0
Pros: Fast sequential read/write speeds (random read/write speeds are not improved); utilizes all hard drive space
Cons: No data protection
Tip: RAID 0 should not be used to store important data
With only one hard disk, the read/write speed is limited by the interface of that disk, just as the size of a bucket's opening limits how fast it can be filled with water. With two hard disks, you can fill both buckets at the same time, and the read/write speed is up to twice that of a single hard disk. This is the RAID 0 storage level.
RAID 0 combines two or more disks to form one large logical disk with a total capacity equivalent to all the hard drive capacities. When data is written, it is segmented and stored in separate disks, allowing multiple disks to handle read and write operations simultaneously.
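The segmenting described above can be illustrated with a minimal sketch. It is not how any particular controller works: the function names `stripe` and `unstripe`, the in-memory lists standing in for disks, and the tiny 4-byte chunk size are all assumptions made for illustration.

```python
def stripe(data: bytes, num_disks: int, chunk_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them out to the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        # Chunk 0 goes to disk 0, chunk 1 to disk 1, and so on, wrapping around.
        disks[chunk_index % num_disks].append(data[start:start + chunk_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading the chunks back in the same round-robin order."""
    chunks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


disks = stripe(b"ABCDEFGHIJKLMNOP", num_disks=2)
print(disks)            # chunks alternate between the two simulated disks
print(unstripe(disks))  # the original bytes, rebuilt from both disks
```

Because every disk holds only every n-th chunk, losing one simulated disk leaves gaps throughout the data, which is why a single failure is fatal for the whole array.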
RAID 0 has a major drawback despite having the fastest speed and largest capacity. Its speed comes at the cost of redundancy and fault tolerance. When one of the hard disks in the array is damaged, all data is lost and cannot be recovered: since data is stored in segments spread across the disks, losing any one disk leaves every file incomplete. Therefore, it is not recommended to use RAID 0 for storing important data.
Short facts on RAID 1
Pros: High security, no data loss even if one of the disks is damaged
Cons: Low disk utilization, no increase in write speeds
Which RAID is suitable for storing important data? RAID 1 is the most secure, using two hard disks that mirror each other to store the same data on each disk. All the data can be read as long as one of the hard disks in the array is undamaged. Because reads can be served from either copy, RAID 1 read speeds can approach those of RAID 0 and can increase further if more hard disks are used. However, every write must go to all disks, so the write speed is the same as a single hard disk and cannot be increased.
A damaged hard disk in RAID 1 can be unplugged and replaced, and the array will automatically copy the data to the newly inserted hard disk, a process known as rebuilding the array.
The problem with RAID 1 is the relatively low price-performance ratio. Even if you use 100 hard disks in RAID 1, the final capacity is only equivalent to the capacity of one hard disk. If the size of each hard disk is different, the final capacity will be based on the capacity of the smallest hard disk. The overall utilization rate of RAID 1 is the lowest among all RAID levels. RAID 0 and RAID 1 can be regarded as two extremes, with RAID 0 offering ultra-fast speeds while RAID 1 is ultra-secure.
If you want to increase both capacity and security and are less concerned with speed, you can consider RAID 2, 3, 4, 5 or 6.
RAID 2, 3, and 4 were designed for specific applications, but they are rarely used and many controller cards do not support these levels due to various shortcomings.
RAID 2 requires at least three hard disks. When reading and writing data, it encodes the data in real time with an error-correcting (Hamming) code and writes the segmented data to different hard disks, so the total amount of data stored is larger than the original data. RAID 2 also checks the data in real time during read/write operations. Because the error-correcting algorithm is relatively complex, the hardware overhead is high.
RAID 3 requires at least three hard disks and has a relatively low hardware overhead because its parity algorithm is simpler. During read and write operations, data is written to different hard disks in segments, while the parity data (checksums) is stored separately on a dedicated hard disk. However, the parity disk has to be accessed for every write operation, so it runs under high load for long periods and is more likely to fail. Once the parity disk is damaged, the array has no redundancy left, and data on a failed data disk can no longer be recovered.
RAID 4 is similar to RAID 3 and also stores parity data on a separate hard disk, but it segments the data in a different way. RAID 4 stripes data in blocks, the size of which is determined by the system and is usually much larger than the byte-sized units used by RAID 3. Thus, small files are written faster in RAID 4 than in RAID 3. However, if a disk is corrupted and no parity is available, the chance of recovering its data is lower than in RAID 3. In both RAID 3 and RAID 4, a failed data disk cannot be recovered once the parity disk is corrupted.
RAID 5 is similar to RAID 3, except that RAID 3 stores parity data on a single hard disk, while RAID 5 distributes parity data across all the hard disks. When one hard disk is damaged, the data and parity on the remaining hard disks work together to recover it. Unlike RAID 3, no single heavily loaded parity disk carries all of the array's redundancy.
RAID 5 requires at least three hard disks. With three disks, one-third of the space stores redundant information (parity, or checksums) while two-thirds stores raw data; in general, the equivalent of one disk's capacity is used for parity. The read speed of RAID 5 is similar to that of RAID 0, but the write speed may not be as fast because parity has to be calculated and written. In return, the data and parity on the other hard disks can be used to fully recover the data even if one of the hard disks in the array is corrupted. Thus, RAID 5 is more secure than RAID 0.
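Single-disk recovery of this kind is usually based on XOR parity. The sketch below is a simplified illustration under stated assumptions: two data blocks and one parity block stand in for a three-disk stripe, the helper name `xor_blocks` is made up for this example, and the rotation of parity across disks is left out.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR several equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-disk array: two data blocks plus parity.
data_blocks = [b"RAID", b"FIVE"]
parity = xor_blocks(data_blocks)

# Suppose the disk holding b"FIVE" fails. XOR-ing the surviving data block
# with the parity block reproduces the missing block.
rebuilt = xor_blocks([data_blocks[0], parity])
print(rebuilt)  # b'FIVE'
```

The same operation works whichever block is lost, including the parity block itself, but only one missing block per stripe can be reconstructed this way.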
In a corporate environment, RAID 5 is often considered for the storage system of a file server. A file server is the central repository for all the company's shared files, such as business documents, financial reports, and project files. It needs to ensure that these files are accessible to authorized employees at all times. RAID 5's ability to recover from a single-disk failure makes it an attractive choice for file servers. When a hard disk fails in the file server's RAID 5 array, the server can continue to operate and provide access to the files, at reduced performance, while the array is rebuilding.
RAID 5 offers better security, but it also has disadvantages. First, there is a small probability that a mechanical hard disk will encounter an unrecoverable read error (URE) when reading data; consumer drives are commonly rated at about one URE per 10^14 bits read, which works out to roughly once every 12 TB of data. It takes only one URE for RAID 5 to decide that something is wrong with the data and start rebuilding the array.
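To get a feel for the "once every 12 TB" figure, the sketch below estimates the chance of hitting at least one URE while reading a given amount of data. It assumes a constant error rate of one per 10^14 bits and independent errors, which is a datasheet-style simplification rather than a measurement of real drives; the function name `p_ure_during_read` and the example disk sizes are made up for illustration.

```python
import math


def p_ure_during_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading bytes_read bytes.

    Assumes every bit fails independently with probability ure_per_bit.
    """
    bits = bytes_read * 8
    # 1 - (1 - p) ** bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 1e12  # decimal terabyte, as used on drive labels

# During a RAID 5 rebuild, every surviving disk has to be read in full.
for surviving_disks, size_tb in [(2, 4), (3, 4), (3, 8)]:
    p = p_ure_during_read(surviving_disks * size_tb * TB)
    print(f"{surviving_disks} x {size_tb} TB read during rebuild: {p:.0%} chance of a URE")
```

Under these assumptions the risk grows quickly with array size, which is one reason larger arrays tend to favor levels that tolerate two failures.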
Multiple rebuilds can cause the hard disks to operate under high load for long periods. If the disks were bought at the same time and one disk fails, the state of the other disks may also become unstable, leading to further disk damage.
Since RAID 5 tolerates the failure of only one hard disk, all data will be lost if another hard disk is damaged during the rebuilding process. RAID 5 is therefore not recommended, as it has a lower rebuild success rate and weaker protection than RAID 6.
RAID levels are an essential factor when choosing a storage solution. For NAS devices, whether it’s a 2-bay or 4-bay configuration, the appropriate RAID setup must be considered. Understanding these key factors will help you make an informed decision.
RAID F1 is Synology's array layout based on RAID 5 and is designed for solid-state drives (SSDs). To account for SSD write wear, RAID F1 places more of the parity data on one SSD than on the others, so that the drives do not all approach their write limit at the same time. The system supports automatic data transfer when an SSD is approaching its write limit. However, RAID F1 is rarely used by ordinary consumers because of high SSD prices.
RAID 6 and RAID 5 are two common RAID levels. Compared to RAID 5, RAID 6 sets aside a second disk's worth of space for parity data and requires at least four hard disks.
RAID 6 has high data security because it uses two different parity algorithms, ensuring full data recovery even when two hard disks are corrupted. In comparison, RAID 5 is less secure. However, RAID 6 stores twice as much parity as RAID 5, and calculating two sets of parity is more computationally intensive. This means that RAID 6 is generally slower than RAID 5, especially for writes.
If you are not satisfied with RAID 0 to 6, and want to know if there are any RAID types that feature the advantages of all levels, then read on to learn more about hybrid RAID.
RAID 10 is the most common hybrid RAID. Combining the RAID 1 and RAID 0 modes, RAID 10 ensures data security and dramatically increases read and write speeds, but the usable capacity is only half of the total capacity.
RAID 10 requires at least four hard disk drives. The disks are first paired into RAID 1 mirrors, and the mirrored pairs are then striped together as RAID 0.
In addition to RAID 10, there are RAID 50 and RAID 60 arrays. RAID 50 is a combination of RAID 5 and RAID 0, which uses RAID 5 to store data and parity information on multiple hard disks, and combines these hard disks to form a large striped storage space using RAID 0, thus increasing storage capacity and read/write performance. RAID 60 is a combination of RAID 6 and RAID 0, and adopts a similar approach to increase storage capacity and read/write performance while providing better data redundancy and security.
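The capacity trade-offs of the common levels can be compared with a small calculator. This is a rough sketch, not a vendor formula: the function name `usable_capacity` is made up, it covers only RAID 0, 1, 5, 6 and 10, and it assumes every member contributes only as much space as the smallest disk, which is how classic RAID implementations behave.

```python
def usable_capacity(level: str, disk_sizes_tb: list[float]) -> float:
    """Return the usable capacity in TB for a few common RAID levels.

    Assumption: each disk contributes only as much space as the smallest disk.
    """
    minimum_disks = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    n = len(disk_sizes_tb)
    if n < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == "10" and n % 2 != 0:
        raise ValueError("this sketch assumes RAID 10 uses an even number of disks")

    smallest = min(disk_sizes_tb)
    # Number of disks' worth of space left for data after redundancy.
    data_disks = {
        "0": n,         # striping only, no redundancy
        "1": 1,         # every disk holds the same copy
        "5": n - 1,     # one disk's worth of parity
        "6": n - 2,     # two disks' worth of parity
        "10": n // 2,   # half the disks are mirrors
    }[level]
    return data_disks * smallest


disks = [4, 4, 4, 4]  # four hypothetical 4 TB drives
for level in ["0", "1", "5", "6", "10"]:
    print(f"RAID {level}: {usable_capacity(level, disks)} TB usable of {sum(disks)} TB")
```

With four equal disks, RAID 6 and RAID 10 end up with the same usable space; they differ in which combinations of failed disks they survive and in how fast they write.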
JBOD, which stands for "just a bunch of disks", uses a different data storage model. In JBOD, data is stored sequentially on the disks, starting with the first disk. Only one huge partition containing the capacity of all the hard disks is visible to the system. However, data becomes inaccessible if any of the hard disk drives fails. Even worse, the entire array fails if the first hard disk is damaged, because it is the only place where the information about how the data is laid out is stored.
The advantage of JBOD is that it treats multiple hard disks as a whole, with the disks' total capacity available for use. In addition, only one drive is used per write operation, so the other drives sit idle during read and write operations and are spared unnecessary wear. However, JBOD has significant drawbacks. First, its data security is relatively low, as the failure of any hard disk can lead to data loss. Second, read and write speeds do not improve and remain the same as those of a single hard disk. Hence, JBOD is not recommended for applications that require high data security and read/write performance.
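The sequential layout can be pictured as simple address arithmetic: a position in the combined volume maps to exactly one disk. The sketch below assumes plain end-to-end concatenation; the function name `locate` and the disk sizes are invented for the example.

```python
def locate(offset: int, disk_sizes: list[int]) -> tuple[int, int]:
    """Map an offset in the combined volume to (disk index, offset on that disk)."""
    if offset < 0:
        raise ValueError("offset must not be negative")
    for index, size in enumerate(disk_sizes):
        if offset < size:
            return index, offset
        # Not on this disk: skip past it and keep looking on the next one.
        offset -= size
    raise ValueError("offset is beyond the end of the volume")


sizes = [100, 250, 50]     # three disks of different sizes, in arbitrary units
print(locate(0, sizes))    # (0, 0)
print(locate(120, sizes))  # (1, 20)
print(locate(399, sizes))  # (2, 49)
```

Each request touches a single disk, which matches the point that the other drives stay idle and that throughput is no better than one drive.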
Unraid is a Linux-based operating system whose storage model is similar to JBOD but supports data redundancy. In Unraid, one or two hard disks can be set up as parity disks for data verification. A parity disk must be at least as large as the largest data disk so that data can be recovered if one or two hard disks fail.
Unraid features convenient capacity expansion, as users only need to insert a new hard disk into the system to expand storage space. Even if more hard disks fail than the parity disks can cover, only the data on the failed hard disks is lost. The other disks in the array can still operate normally.
However, Unraid has two major drawbacks that limit its use. First, it is paid software, although prices ranging from 59 to 129 are relatively inexpensive. Second, like JBOD, Unraid does not speed up writes, and the extra parity calculation adds to the writing time. In fact, Unraid write speeds may be even slower than JBOD, the slowest of all arrays.
Nonetheless, Unraid offers a lot of storage space, making it ideal for users who don't require high performance.
Synology Hybrid RAID (SHR) is a unique array mode, which is mainly for new users who don't know anything about arrays. SHR can automatically determine and use the appropriate RAID mode according to the number and capacity of hard disk drives.
In SHR, the capacity of one hard disk is used to store verification data by default. Assuming every hard disk has the same capacity: with only one hard disk, SHR is equivalent to a normal hard disk without any data protection; with two hard drives, SHR adopts a mode similar to RAID 1; with three hard disks, SHR is similar to RAID 5. SHR 2 uses two hard drives' worth of space for checksums and requires four hard drives, similar to RAID 6.
SHR makes it easy to upgrade from one-disk to two-disk redundancy, providing more flexibility than conventional RAID. However, because SHR is a mode specific to Synology, data recovery is normally only possible on a Synology device. Hard drives moved to another computer may not be directly readable, so specialized software is needed for data recovery, and such software has its own limitations.
RAID Z is a software RAID based on the ZFS file system. ZFS is a 128-bit file system that supports a range of advanced features.
RAID Z is one of the features of ZFS, so no additional software or hardware is needed to implement RAID. There are three types of RAID Z: RAID Z1, RAID Z2 and RAID Z3, which tolerate the failure of one, two and three hard disks respectively.
Coupled with the features of ZFS, RAID Z is an excellent option. However, RAID Z also has some drawbacks. The first is high memory usage. ZFS uses a lot of memory for caching; as a rule of thumb, each 1 TB of storage space should be matched by about 1 GB of memory, or performance will suffer. It is recommended to start with at least 8 GB. It is also important to use error-correcting code (ECC) memory; without it, there is a small chance of data errors.
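The memory guideline amounts to simple arithmetic, sketched below. The function name `suggested_ram_gb` and its parameters are made up for this example, and the numbers only encode the rule of thumb from the text (about 1 GB per TB, with an 8 GB starting point), not an official ZFS requirement.

```python
def suggested_ram_gb(pool_capacity_tb: float,
                     gb_per_tb: float = 1.0,
                     floor_gb: float = 8.0) -> float:
    """Apply the '1 GB of memory per 1 TB of storage, at least 8 GB' rule of thumb."""
    return max(floor_gb, pool_capacity_tb * gb_per_tb)


for capacity_tb in [4, 16, 48]:
    print(f"{capacity_tb} TB pool: about {suggested_ram_gb(capacity_tb):.0f} GB of memory")
```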
The second disadvantage is capacity expansion. For example, if three hard disks have already been formed into a RAID Z1 group, capacity is expanded by adding another whole group of disks (bringing the total to as many as six) rather than a single disk. This is not as convenient as RAID 5 and RAID 6, where a single new hard disk can be used to expand capacity.
In this article, we outlined the concept of RAID, as well as the advantages and disadvantages of 15 different RAID configurations. Understanding the characteristics of the various RAID levels allows users to choose a storage solution that suits their needs and improves data security and reliability.