RAID

Redundant Arrays of Inexpensive Disks (RAID) – disk organization techniques that exploit large numbers of inexpensive, mass-market disks.

Originally, RAID was a cost-effective alternative to large, expensive disks.

Today RAIDs are used for their higher reliability and bandwidth, rather than for economic reasons. Hence the "I" is interpreted as independent, instead of inexpensive.

Improvement in Performance via Parallelism

Two main goals of parallelism in a disk system:

1. Load balance multiple small accesses to increase throughput

2. Parallelize large accesses to reduce response time

Improve transfer rate by striping data across multiple disks.

Bit-level striping – split the bits of each byte across multiple disks

– In an array of eight disks, write bit i of each byte to disk i.

– Each access can read data at eight times the rate of a single disk.

– But seek/access time is worse than for a single disk.
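The eight-disk example above can be sketched in a few lines of Python. This is only an illustration: the function names `stripe_bits` and `gather_bits` are made up here, each "disk" is modeled as a plain list of bits, and disk 0 is assumed to hold the least significant bit.

```python
N_DISKS = 8  # one disk per bit of a byte, as in the example above


def stripe_bits(data: bytes) -> list[list[int]]:
    """Write bit i of every byte to disk i (disk 0 = least significant bit)."""
    disks = [[] for _ in range(N_DISKS)]
    for byte in data:
        for i in range(N_DISKS):
            disks[i].append((byte >> i) & 1)
    return disks


def gather_bits(disks: list[list[int]]) -> bytes:
    """Rebuild each byte by reading one bit from every disk."""
    out = bytearray()
    for k in range(len(disks[0])):
        byte = 0
        for i, disk in enumerate(disks):
            byte |= disk[k] << i
        out.append(byte)
    return bytes(out)


disks = stripe_bits(b"RAID")
restored = gather_bits(disks)
```

Every byte touches all eight disks, which is why a single access can proceed at roughly eight times the transfer rate, but also why every disk must seek for every access.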

Block-level striping – with n disks, block i of a file goes to disk (i mod n) + 1.
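The mapping rule can be tried out directly. The sketch below assumes blocks are numbered from 0 and disks from 1, which is what the "+ 1" in the formula suggests; the helper names `disk_for_block` and `stripe_blocks` are illustrative only.

```python
def disk_for_block(i: int, n: int) -> int:
    """Disk number (1-based) that holds logical block i when striping over n disks."""
    return (i % n) + 1


def stripe_blocks(blocks: list, n: int) -> dict[int, list]:
    """Distribute a file's blocks over n disks in round-robin order."""
    disks = {d: [] for d in range(1, n + 1)}
    for i, block in enumerate(blocks):
        disks[disk_for_block(i, n)].append(block)
    return disks


placement = stripe_blocks(list("ABCDEF"), 4)
```

With four disks, blocks A and E land on disk 1, B and F on disk 2, C on disk 3 and D on disk 4, so a read of four consecutive blocks can use all four disks at once.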

RAID Levels

Schemes to provide redundancy at lower cost by using disk striping combined with parity bits

Different RAID organizations, or RAID levels, have differing cost, performance and reliability characteristics

RAID 0

In a RAID 0 system, data are split up in blocks that get written across all the drives in the array. By using multiple disks (at least 2) at the same time, RAID 0 offers superior I/O performance. This performance can be enhanced further by using multiple controllers, ideally one controller per disk.

Advantages

RAID 0 offers great performance, both in read and write operations. There is no overhead caused by parity controls.

All storage capacity can be used; there is no disk overhead.

The technology is easy to implement.

Disadvantages

RAID 0 is not fault-tolerant. If one disk fails, all data in the RAID 0 array are lost. It should not be used on mission-critical systems.

Ideal use

RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, e.g. on a Photoshop image retouching station.

RAID 1: mirroring

Data are stored twice by writing them to both the data disk (or set of data disks) and a mirror disk (or set of disks). If a disk fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation. You need at least 2 disks for a RAID 1 array.
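A toy model may make the mirroring behaviour concrete. The class `MirroredPair` and its methods are hypothetical names invented for this sketch; real controllers work on physical sectors and handle many more failure cases.

```python
class MirroredPair:
    """Two in-memory 'disks' that always receive the same writes."""

    def __init__(self, n_blocks: int):
        self.disks = [[None] * n_blocks, [None] * n_blocks]
        self.failed = [False, False]

    def write(self, block_no: int, data: bytes) -> None:
        # Every write goes to each disk that is still working.
        for d in range(2):
            if not self.failed[d]:
                self.disks[d][block_no] = data

    def read(self, block_no: int) -> bytes:
        # Serve the read from the first disk that is still working.
        for d in range(2):
            if not self.failed[d]:
                return self.disks[d][block_no]
        raise IOError("both disks have failed")

    def fail(self, d: int) -> None:
        self.failed[d] = True
        self.disks[d] = None  # contents of the failed disk are gone

    def replace(self, d: int) -> None:
        # Recovery is a plain copy from the surviving disk.
        survivor = 1 - d
        if self.failed[survivor]:
            raise IOError("no surviving copy to rebuild from")
        self.disks[d] = list(self.disks[survivor])
        self.failed[d] = False


mirror = MirroredPair(4)
mirror.write(0, b"ledger")
mirror.fail(0)                    # data disk dies
still_there = mirror.read(0)      # served from the mirror
```

Operation continues after a single failure because the surviving disk already holds a complete copy.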

RAID 1 systems are often combined with RAID 0 to improve performance. Such a system is sometimes referred to by the combined number: a RAID 10 system.

Advantages

RAID 1 offers excellent read speed and a write-speed that is comparable to that of a single disk.

In case a disk fails, data do not have to be rebuilt; they just have to be copied to the replacement disk.

RAID 1 is a very simple technology.

Disadvantages

The main disadvantage is that the effective storage capacity is only half of the total disk capacity because all data get written twice.

Software RAID 1 solutions do not always allow a hot swap of a failed disk (that is, replacing the disk while the server keeps running). Ideally a hardware controller is used.

Ideal use

RAID 1 is ideal for mission-critical storage, for instance for accounting systems. It is also suitable for small servers in which only two disks will be used.

RAID 2

Description: Level 2 is the "black sheep" of the RAID family, because it is the only RAID level that does not use one or more of the "standard" techniques of mirroring, striping and/or parity. RAID 2 uses something similar to striping with parity, but not the same as what is used by RAID levels 3 to 7. It is implemented by splitting data at the bit level and spreading it over a number of data disks and a number of redundancy disks. The redundant bits are calculated using Hamming codes, a form of error correcting code (ECC). Each time something is to be written to the array, these codes are calculated and written alongside the data to dedicated ECC disks; when the data is read back, these ECC codes are read as well to confirm that no errors have occurred since the data was written. If a single-bit error occurs, it can be corrected "on the fly". If this sounds similar to the way that ECC is used within hard disks today, that's for a good reason: it's pretty much exactly the same. It's also the same concept used for ECC protection of system memory.
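To illustrate the single-bit correction, here is a minimal sketch using the textbook Hamming(7,4) code. The choice of this particular code is an assumption made for the example (the text only says "Hamming codes"), and the function names are invented. In RAID 2 terms, the four data bits would sit on four data disks and the three parity bits on three ECC disks.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into 7 bits; parity bits sit at positions 1, 2 and 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code: list[int]) -> tuple[list[int], int]:
    """Return (data bits, error position); position 0 means no error was seen."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4   # syndrome = 1-based position of the flipped bit
    if pos:
        c[pos - 1] ^= 1          # correct the bit "on the fly"
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                     # simulate a single-bit error at position 5
data, where = hamming74_correct(word)
```

The syndrome identifies which bit (and therefore which disk) returned a wrong value, so the controller needs no outside knowledge of where the error occurred.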

Level 2 is the only RAID level of the ones defined by the original Berkeley document that is not used today, for a variety of reasons. It is expensive and often requires many drives. The controller required was complex, specialized and expensive. The performance of RAID 2 is also rather substandard in transactional environments due to the bit-level striping. But most of all, level 2 was obviated by the use of ECC within a hard disk; essentially, much of what RAID 2 provides you now get for "free" within each hard disk, with other RAID levels providing protection above and beyond ECC.

Due to its cost and complexity, level 2 never really "caught on". Therefore, much of the information about it is based upon theoretical analysis, not empirical evidence.

RAID 3

Bit-Interleaved Parity; a single parity bit can be used for error correction, not just detection, because the disk controller can tell which disk has failed.

– When writing data, parity bit must also be computed and written

– Faster data transfer than with a single disk, but fewer I/Os per second since every disk has to participate in every I/O.

– Subsumes Level 2 (provides all its benefits, at lower cost).

On RAID 3 systems, data blocks are subdivided (striped) and written in parallel on two or more drives. An additional drive stores parity information. You need at least 3 disks for a RAID 3 array.

Since parity is used, a RAID 3 stripe set can withstand a single disk failure without losing data or access to data.
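The recovery relies on a property of XOR: the parity of all data strips, XORed with the surviving strips, yields the missing one. The sketch below models each disk's share of a stripe as a `bytes` object; the helper names `xor_blocks` and `reconstruct` are illustrative, not part of any real RAID tool.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving) + [parity])


data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # strips on three data disks
parity = xor_blocks(data)                        # stored on the parity disk

# Suppose the second data disk fails: rebuild its strip from the rest.
recovered = reconstruct([data[0], data[2]], parity)
```

This only works because the controller knows which disk is missing; parity alone says that something is wrong, not where.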

Advantages

RAID 3 provides high throughput (both read and write) for large data transfers.

Disk failures do not significantly slow down throughput.

Disadvantages

This technology is fairly complex and too resource intensive to be done in software.

Performance is slower for random, small I/O operations.

Ideal use

RAID 3 is not that common in prepress.

RAID 4

Block-Interleaved Parity; uses block-level striping, and keeps a parity block on a separate disk for corresponding blocks from N other disks.

– Provides higher I/O rates for independent block reads than Level 3 (block read goes to a single disk, so blocks stored on different disks can be read in parallel)

– Provides high transfer rates for reads of multiple blocks

– However, parity block becomes a bottleneck for independent block writes since every block write also writes to parity disk
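The write bottleneck can be seen in a small model. A common way to update parity on a single-block write is to XOR the old parity with the old and new data; that read-modify-write shortcut is an assumption of this sketch rather than something the notes spell out, and `Raid4Stripe` is a hypothetical class name.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid4Stripe:
    """One stripe: a block per data disk plus a block on the dedicated parity disk."""

    def __init__(self, data_blocks: list[bytes]):
        self.data = list(data_blocks)
        self.parity = bytes(len(self.data[0]))
        for block in self.data:
            self.parity = xor_bytes(self.parity, block)
        self.parity_writes = 0   # counts how often the parity disk is written

    def read_block(self, disk: int) -> bytes:
        # An independent read touches only one data disk.
        return self.data[disk]

    def write_block(self, disk: int, new_data: bytes) -> None:
        # new parity = old parity XOR old data XOR new data
        old_data = self.data[disk]
        self.parity = xor_bytes(xor_bytes(self.parity, old_data), new_data)
        self.data[disk] = new_data
        self.parity_writes += 1  # every block write also hits the parity disk


stripe = Raid4Stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
stripe.write_block(0, b"\xaa\xbb")
stripe.write_block(2, b"\x00\x00")
```

The two writes go to different data disks, yet both had to update the same parity disk, so they cannot proceed fully in parallel.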

Description: RAID 4 improves performance by striping data across many disks in blocks, and provides fault tolerance through a dedicated parity disk. This makes it in some ways the "middle sibling" in a family of close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it uses blocks instead of bytes for striping, and like RAID 5 except that it uses dedicated parity instead of distributed parity. Going from byte to block striping improves random access performance compared to RAID 3, but the dedicated parity disk remains a bottleneck, especially for random write performance. Fault tolerance, format efficiency and many other attributes are the same as for RAID 3 and RAID 5.

RAID 5

RAID 5 is the most common secure RAID level. It is similar to RAID 3 except that data are transferred to disks by independent read and write operations (not in parallel). The data chunks that are written are also larger. Instead of a dedicated parity disk, parity information is spread across all the drives. You need at least 3 disks for a RAID 5 array.
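One way to picture distributed parity is to print which disk holds the parity block of each stripe. Real implementations use several different rotation patterns; the simple rule below (parity for stripe s on disk s mod n) is an assumption chosen for readability, and the function names are made up.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk (0-based) holding the parity block of a stripe, under a simple rotation."""
    return stripe % n_disks


def layout(n_stripes: int, n_disks: int) -> list[list[str]]:
    """Table of stripes: 'P' marks parity, 'Dk' marks the k-th data block."""
    rows = []
    block = 0
    for s in range(n_stripes):
        p = parity_disk(s, n_disks)
        row = []
        for d in range(n_disks):
            if d == p:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


for row in layout(3, 3):
    print("  ".join(row))
```

Because the parity block moves from stripe to stripe, parity updates for different stripes land on different disks, which removes the single parity-disk bottleneck of RAID 4.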

A RAID 5 array can withstand a single disk failure without losing data or access to data. Although RAID 5 can be achieved in software, a hardware controller is recommended. Often extra cache memory is used on these controllers to improve the write performance.

Advantages

Read data transactions are very fast, while write data transactions are somewhat slower (due to the parity that has to be calculated).

Disadvantages

Disk failures have an effect on throughput, although this is still acceptable.

Like RAID 3, this is complex technology.

Ideal use

RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance. It is ideal for file and application servers.

RAID 6

P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against multiple disk failures. Better reliability than Level 5 at a higher cost; not used as widely.
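The notes do not say how P and Q are computed. A widely used approach, assumed here purely for illustration, takes P as plain XOR parity and Q as a Reed-Solomon style sum over the finite field GF(2^8) with generator 2 and reducing polynomial 0x11D. The sketch works on one byte per data disk, and all function names are invented.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (reducing polynomial 0x11D, an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The nonzero field elements form a group of order 255, so a^254 = a^-1.
    return gf_pow(a, 254)


def compute_pq(data: list[int]) -> tuple[int, int]:
    """P is XOR parity; Q weights the byte on disk i by 2^i before XORing."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list, x: int, y: int, p: int, q: int) -> tuple[int, int]:
    """Recover the bytes of failed data disks x and y from the survivors, P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx


data = [0x12, 0xAB, 0x7F, 0x00]
p, q = compute_pq(data)
damaged = [0x12, None, 0x7F, None]          # data disks 1 and 3 have failed
d1, d3 = recover_two(damaged, 1, 3, p, q)
```

Two independent equations (P and Q) allow two unknowns to be solved, which is why this scheme survives two simultaneous disk failures where single parity survives only one.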

Fig: RAID Levels