Understanding Disk Fault Tolerance and RAID Features
This article explains disk fault tolerance mechanisms, detailing how RAID levels (1, 5, 6, 10, 50, 60) provide redundancy, the concepts of consistency checking, hot‑spare and emergency‑spare, rebuild processes, read/write policies, power‑loss protection, striping, mirroring, external configurations, energy‑saving, and JBOD functionality.
Disk fault tolerance ensures data integrity and processing capability when a subsystem experiences hard‑disk errors or failures. RAID controllers achieve this on RAID 1, 5, 6, 10, 50, and 60 by using redundant disk groups.
In RAID 1, data is mirrored on paired disks, so a failure of one disk does not cause data loss. RAID 5 tolerates one disk failure, while RAID 6 tolerates two.
For multi‑sub‑group configurations, RAID 10 and RAID 50 tolerate as many failed disks as there are sub‑groups, provided each sub‑group loses at most one disk. RAID 60 tolerates up to twice as many failed disks as there are sub‑groups, provided each sub‑group loses at most two disks.
RAID 0 does not support fault tolerance; a disk failure causes the entire array to fail and data to be lost.
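The per‑level tolerance rules above can be summarized in a short sketch. This is illustrative only, assuming the worst‑case guarantees described in this article; the function name and interface are not a real controller API:

```python
def max_tolerable_failures(level: int, sub_groups: int = 1) -> int:
    """Worst-case number of failed disks a RAID level survives without data loss.

    RAID 10/50 tolerate one failure per sub-group; RAID 60 tolerates two
    per sub-group. RAID 0 has no redundancy and tolerates none.
    """
    if level == 0:
        return 0
    if level in (1, 5):
        return 1
    if level == 6:
        return 2
    if level in (10, 50):
        return sub_groups           # one failed disk per sub-group
    if level == 60:
        return 2 * sub_groups       # two failed disks per sub-group
    raise ValueError(f"unsupported RAID level: {level}")

print(max_tolerable_failures(5))                  # 1
print(max_tolerable_failures(60, sub_groups=3))   # 6
```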
Fault tolerance improves system availability by allowing the system to continue operating through disk failures, making it a critical part of recovery processes.
1. Consistency Check
For RAID levels with redundancy (1, 5, 6, 10, 50, 60), the RAID controller can perform consistency checks on the disk data, comparing it with redundant copies and automatically repairing any inconsistencies while logging errors. RAID 0 does not support consistency checks.
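The core of a consistency check on a parity‑based level such as RAID 5 is verifying that each stripe's parity equals the XOR of its data blocks. The sketch below illustrates only that idea; the byte‑level layout and function names are hypothetical, not a controller's actual on‑disk format:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR same-sized byte blocks together (RAID 5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def check_stripe(data_blocks, parity_block):
    """True if parity is consistent; a controller would repair and log on mismatch."""
    return xor_blocks(data_blocks) == parity_block

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data)
print(check_stripe(data, parity))          # consistent stripe
print(check_stripe(data, b"\x00\x00"))     # inconsistent stripe
```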
2. Hot Spare
The hot‑spare feature is implemented via hot‑spare disks and emergency‑spare functionality.
Hot Spare
A hot‑spare is an independent disk that automatically replaces a failed member disk in a RAID group and reconstructs the data onto the spare.
In the controller’s management interface or CLI, an idle disk whose capacity is equal to or greater than that of the member disks, and whose media type and interface match theirs, can be designated as a hot‑spare.
Two types of hot‑spares are supported:
Global hot‑spare: shared by all configured RAID groups on the controller; a single controller can configure one or more global hot‑spares. Any member disk failure in any RAID group can be automatically replaced by a global hot‑spare.
Local hot‑spare: dedicated to a specific RAID group; each RAID group can configure one or more local hot‑spares, which replace failed member disks only within that group.
Characteristics of hot‑spares:
Applicable only to RAID groups with redundancy (RAID 1, 5, 6, 10, 50, 60).
Hot‑spares replace only failed disks on the same RAID controller.
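The selection rule implied above can be sketched as follows: a local hot‑spare dedicated to the failed disk's group is preferred, then any eligible global hot‑spare, where eligibility means equal‑or‑greater capacity and matching media type. The data structures here are hypothetical, for illustration only:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Disk:
    capacity_gb: int
    media: str                       # e.g. "HDD" or "SSD"
    spare_scope: str                 # "local", "global", or "none"
    group_id: Optional[int] = None   # set for local spares and group members

def pick_spare(failed: Disk, spares: List[Disk]) -> Optional[Disk]:
    """Prefer a local hot-spare of the failed disk's group, then a global one."""
    def eligible(s: Disk) -> bool:
        return s.capacity_gb >= failed.capacity_gb and s.media == failed.media

    local = [s for s in spares
             if s.spare_scope == "local"
             and s.group_id == failed.group_id and eligible(s)]
    if local:
        return local[0]
    global_ = [s for s in spares if s.spare_scope == "global" and eligible(s)]
    return global_[0] if global_ else None

failed = Disk(capacity_gb=4000, media="HDD", spare_scope="none", group_id=1)
spares = [Disk(8000, "HDD", "global"), Disk(4000, "HDD", "local", group_id=1)]
print(pick_spare(failed, spares).spare_scope)   # the local spare wins
```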
Emergency Spare
If a RAID group with redundancy experiences a member‑disk failure and no hot‑spare is assigned, an idle disk under the controller will automatically replace the failed disk and start reconstruction, preventing data loss. The replacement disk must have capacity equal to or greater than the member disk and the same media type.
3. RAID Rebuild
When a disk in a RAID group fails, the controller’s data‑reconstruction function can rebuild the lost data onto a new disk. Rebuild is only applicable to RAID levels with redundancy (1, 5, 6, 10, 50, 60).
If a hot‑spare is configured, it automatically replaces the failed member and starts reconstruction. Without a hot‑spare, reconstruction can only begin after a new disk is manually installed. During reconstruction, the failed member is marked as removable. If the system powers off during reconstruction, the controller resumes the task on restart.
The rebuild rate (percentage of CPU resources allocated to reconstruction) can be set from 0 % to 100 %; 0 % means reconstruction runs only when the system is idle, while 100 % uses all CPU resources. Users should choose an appropriate value based on system load.
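For a parity‑based level such as RAID 5, rebuilding a lost block amounts to XOR‑ing the surviving data blocks with the parity block; a real rebuild repeats this for every stripe onto the replacement or hot‑spare disk. A minimal sketch, with illustrative names and toy block sizes:

```python
def rebuild_block(surviving_blocks, parity_block):
    """Recover a lost RAID 5 block: XOR of parity with all surviving data blocks."""
    out = bytearray(parity_block)
    for blk in surviving_blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)

d1, d2, d3 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
parity = bytes(a ^ b ^ c for a, b, c in zip(d1, d2, d3))

# Suppose the disk holding d2 fails; recover its data from d1, d3 and parity:
recovered = rebuild_block([d1, d3], parity)
print(recovered == d2)   # True
```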
4. Virtual Disk Read/Write Policies
When creating a virtual disk, a read/write policy must be defined to control how data is accessed.
Read Policy
The controller supports two read policies:
Read‑Ahead (pre‑read): On a read request, the controller also reads the subsequent data into cache, reducing seek time and improving sequential read performance. Controllers label this option variously as “Always Read Ahead”, “Read Ahead”, or “Ahead”. This policy requires the controller to have power‑loss protection; otherwise, a capacitor failure may cause loss of cached data.
Non‑Read‑Ahead: Data is read from the virtual disk only when the controller receives a read command; no pre‑fetching occurs.
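The benefit of read‑ahead can be sketched with a toy cache: on a cache miss, the next block is prefetched alongside the requested one, so a sequential reader hits cache on every other request. This is a simplification with invented names; the counter counts disk accesses, assuming the prefetch piggybacks on the same access:

```python
class ReadAheadCache:
    """Toy model of a controller's read-ahead cache over a dict-backed 'disk'."""

    def __init__(self, disk, read_ahead=True):
        self.disk = disk              # block number -> data
        self.cache = {}
        self.read_ahead = read_ahead
        self.disk_reads = 0           # number of disk accesses performed

    def read(self, block):
        if block in self.cache:
            return self.cache.pop(block)       # cache hit: no disk access
        self.disk_reads += 1
        data = self.disk[block]
        if self.read_ahead and block + 1 in self.disk:
            self.cache[block + 1] = self.disk[block + 1]   # prefetch next block
        return data

disk = {n: f"blk{n}" for n in range(4)}
c = ReadAheadCache(disk)
for n in range(4):                    # sequential read of 4 blocks
    c.read(n)
print(c.disk_reads)                   # 2 disk accesses instead of 4
```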
Write Policy
The controller supports several write policies:
Write‑Back: Data is first written to cache; when enough data accumulates, it is flushed to the virtual disk, improving write performance. This policy also requires power‑loss protection.
Write‑Through (Direct Write): Data is written directly to the virtual disk without passing through the cache; this requires no power‑loss protection but offers lower write speed.
Write‑Back with BBU: When a Battery Backup Unit (BBU) is present and healthy, writes go through the cache (write‑back). If the BBU is absent or faulty, the controller automatically switches to write‑through.
Write‑Back Enforce: Forces write‑back mode even if the controller lacks a capacitor or the capacitor is damaged; not recommended, because data may be lost on unexpected power failure.
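The fallback behavior of “Write‑Back with BBU” can be captured in a few lines. The policy names and function are illustrative, not any vendor's actual configuration keys:

```python
def effective_write_policy(configured: str, bbu_healthy: bool) -> str:
    """Resolve the write policy actually in effect, given BBU/capacitor health."""
    if configured == "write_back_with_bbu":
        # Cache is used only while the battery/capacitor can protect it.
        return "write_back" if bbu_healthy else "write_through"
    if configured == "write_back_enforce":
        return "write_back"    # forced even without protection -- risky
    return configured           # plain "write_back" or "write_through"

print(effective_write_policy("write_back_with_bbu", bbu_healthy=True))   # write_back
print(effective_write_policy("write_back_with_bbu", bbu_healthy=False))  # write_through
```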
5. Data Power‑Loss Protection
Principle
Data is written to the controller’s high‑speed cache far faster than to the disks, so enabling the cache boosts overall write performance; cached data is flushed to the disks when the cache fills or write pressure eases. However, any data residing only in cache is lost if power fails suddenly.
To safeguard cached data, a super‑capacitor module can be added. Upon unexpected power loss, the capacitor supplies enough power to transfer the cached data to an onboard NAND flash, where it is preserved in non‑volatile storage until power is restored.
Super‑Capacitor Calibration
The controller automatically calibrates the capacitor through a three‑stage charge‑discharge cycle to maintain a stable voltage range and extend capacitor life.
Charge the capacitor to its maximum voltage.
Automatically start a calibration process that fully discharges the capacitor.
Recharge until the maximum voltage is reached again.
During calibration, the write policy switches to “Write‑Through” to ensure data integrity, which may reduce performance. Calibration duration depends on the capacitor’s charge‑discharge speed.
6. Disk Striping
When multiple processes access a disk simultaneously, contention can occur due to limits on I/O operations per second and data transfer rates. Striping distributes I/O load across multiple physical disks, dividing a continuous data stream into smaller segments stored on different disks, enabling parallel access and improving performance.
In a four‑disk RAID 0 array, for example, the first data block is written to disk 1, the second to disk 2, and so on, allowing concurrent writes that greatly boost system performance, though without redundancy.
Key striping concepts:
Stripe Width: The number of disks participating in striping (e.g., 4 for a four‑disk group).
RAID Group Stripe Size: The size of the data chunk written across all disks in the group in one stripe.
Disk Stripe Size: The size of the data chunk written to each individual disk.
Example: If a RAID group’s stripe size is 1 MB and each disk receives 64 KB chunks, the per‑disk stripe size is 64 KB and the stripe width is 1 MB ÷ 64 KB = 16 disks; in general, group stripe size = stripe width × per‑disk stripe size.
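Using the terms above, the mapping from a logical byte offset to a (disk, disk offset) pair in a RAID 0 group can be sketched as a round‑robin over the stripe width. The function and its parameters are illustrative:

```python
def map_logical(offset: int, stripe_width: int, disk_stripe: int):
    """Map a logical byte offset to (disk index, byte offset on that disk)
    for a RAID 0 group striped round-robin in disk_stripe-sized chunks."""
    chunk = offset // disk_stripe          # which chunk overall
    disk = chunk % stripe_width            # round-robin across the disks
    row = chunk // stripe_width            # stripe row on that disk
    return disk, row * disk_stripe + offset % disk_stripe

# 4 disks with a 64 KB per-disk stripe (group stripe size = 4 x 64 KB = 256 KB):
print(map_logical(0, 4, 65536))        # (0, 0)      -- first chunk, disk 0
print(map_logical(65536, 4, 65536))    # (1, 0)      -- second chunk, disk 1
print(map_logical(262144, 4, 65536))   # (0, 65536)  -- wraps to disk 0, row 1
```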
7. Disk Mirroring
Mirroring, applicable to RAID 1 and RAID 10, writes identical data to two disks simultaneously, achieving 100 % redundancy. If one disk fails, the other continues to provide uninterrupted data access.
While mirroring guarantees data safety, it doubles storage cost because each piece of data requires a dedicated backup disk.
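The mirroring behavior described above reduces to two rules: every write lands on both disks, and a read succeeds as long as at least one mirror is healthy. A minimal sketch with dict‑backed toy disks:

```python
class Mirror:
    """Toy RAID 1 pair: identical writes to both disks, reads from any healthy one."""

    def __init__(self):
        self.disks = [dict(), dict()]   # block number -> data
        self.healthy = [True, True]

    def write(self, block, data):
        for i, d in enumerate(self.disks):
            if self.healthy[i]:
                d[block] = data          # identical copy on each healthy disk

    def read(self, block):
        for i, d in enumerate(self.disks):
            if self.healthy[i]:
                return d[block]          # first healthy mirror serves the read
        raise IOError("both mirrors failed")

m = Mirror()
m.write(0, b"data")
m.healthy[0] = False                     # one disk fails...
print(m.read(0))                         # ...data is still accessible
```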
8. External Configuration
External configuration refers to RAID metadata that exists on a disk but is not currently managed by the active RAID controller. It appears in the UI as “Foreign Configuration”.
Scenarios that generate external configurations include:
A newly installed physical disk contains RAID metadata.
After a RAID controller replacement, the new controller detects existing RAID metadata on the member disks.
Hot‑plugging a member disk that carries RAID metadata.
Administrators can delete unwanted external configurations or import them into a new controller to retain previous RAID settings.
9. Disk Power‑Saving
The controller’s disk‑power‑saving feature spins down idle SAS or SATA mechanical disks based on configuration and I/O activity.
When enabled, idle disks and idle hot‑spares enter a low‑power state; any operation that requires the disk (e.g., creating a RAID group or hot‑spare) wakes it.
10. Disk Pass‑Through (JBOD)
JBOD (Just a Bunch Of Disks) allows direct command pass‑through to disks without RAID processing, enabling the OS or management software to access raw disks.
Enabling pass‑through lets the OS use the physical disk as an installation medium, whereas without JBOD only configured virtual disks are visible.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.