Redundant Array of Inexpensive Disks: Data Protection and Downtime Elimination Christopher Lambert December 2, 2000 P. Alexander Management Information Systems University of North Alabama

Redundant Array

of Inexpensive Disks:

Data Protection and Downtime Elimination

Christopher Lambert

December 2, 2000

P. Alexander

Management Information Systems

University of North Alabama

As information technology shifts into the future, one

aspect continues to remain the same: the importance and

value of data. Unlike hardware and software components,

data cannot be easily replaced; therefore, for a business

to be successful, measures to adequately protect data are

essential. It is the duty of IT management to institute

procedures for protecting the business’s data. Because

hardware and software failures are inevitable and unpredictable,

information technology departments have, in the past, made

do by building simple backup activities into daily

standard operating procedures. However, solely backing data

up onto tapes has become an insufficient means of data

protection since restoration requires downtime of

information systems, and downtime can be devastating to a

business depending upon the importance of the inaccessible

data. It has been stated that “94% of businesses that had

suffered a catastrophic non-recoverable failure in their

corporate IT storage systems went out of business within 2

years.” In an attempt to counter these unpredictable and

serious situations, RAID solutions should be implemented in

addition to periodic backups.

RAID is an acronym that stands for redundant array of

inexpensive disks. David A. Patterson, Garth Gibson, and

Randy H. Katz are credited with theorizing this technology

in 1987. RAID not only offers protection of data but

also grants businesses higher levels of data integrity and

hardware/software fault tolerance. The benefits from using

RAID include increased system uptime and system performance,

two extremely important issues for IT managers, which cannot

be ignored in today’s business place. However, the benefits

from RAID extend far beyond these two issues because of the

impact it has on the entire business and, more directly, the

staff of IT departments. Because of increased system

uptime, IT employees can devote their sometimes expensive

time to other important business issues, and, when

supporting the business 24 hours a day by being on call,

data loss crises are minimized and so is after hours

support. Furthermore, businesses in continuous operation

are not obliged to schedule downtime to restore or back up

data when a RAID technology is implemented. Since the

utilization of a RAID technology is extremely advantageous

to a business that places great importance on its data, the

question of whether or not to implement the technology is

easy to answer. That being obvious, the next question to be

answered is which of the many levels and variations of RAID

is suitable for the business.

There are many different applications of RAID, all with

different levels of protection and with a different focus on

the protection of the information system. The most basic

version, known as RAID 0, and more advanced versions, such

as RAID 5 and 7, may be implemented depending upon the level

and focus of protection required by the business. Each

level offers different performance, fault tolerance, and

cost. Managers must be aware of each type of RAID, the

advantages and disadvantages of each, and be able to make an

educated decision on which one best suits their business.

In addition to the different levels of RAID, there are

software and hardware versions of most of them. The

two approaches differ considerably, and these

options should also be taken into consideration.

Nonetheless, the first decision should be to determine which

level, or levels, would be the best practice for the

organization.

RAID 0 was created in the early advancements of the

RAID technology. This level is also known as striping.

Taking two or more disk drives, preferably five, and

striping them together to create one virtual disk will

accomplish this level of RAID. Data is then written to what

is known as the stripe set and is spanned across the volume,

where each drive operates in parallel with the others. RAID 0 is

commonly used in environments where files are large and the

data is sequential. The benefits of RAID 0 are fairly

straightforward. Data access performance is increased

because data request queues are shortened for each disk

drive. Disk utilization is decreased because there are more

drives to help take on the load of data access. This is

achieved by writing data sequentially across the drive set

so the data can later be retrieved by each drive

simultaneously.
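The striping described above can be illustrated with a minimal sketch. It assumes the simplest possible layout, in which consecutive logical blocks rotate across the drives one block at a time; the function name `locate_block` and the block-sized stripe unit are hypothetical choices made for illustration, not details taken from any particular RAID product.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes a stripe unit of exactly one block (an illustrative simplification).
    """
    if num_disks < 2:
        raise ValueError("striping needs at least two disks")
    disk = logical_block % num_disks      # consecutive blocks rotate across disks
    offset = logical_block // num_disks   # row (stripe) number on each disk
    return disk, offset


# Eight consecutive blocks on a four-drive stripe set land on
# drives 0, 1, 2, 3 and then wrap around to drive 0 again.
for block in range(8):
    print(block, locate_block(block, 4))
```

Because neighboring blocks sit on different drives, a large sequential request can be serviced by all of the drives at once, which is the source of the performance gain.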

However, the increased performance of RAID 0 only applies to

applications using sequential access because it involves no

indexing of the data. Furthermore, striping the drives

together does nothing to protect the information stored on

the drives; therefore there is no data redundancy. In spite

of this, RAID 0 can be combined with other levels of RAID to

not only increase performance, but also employ data

redundancy and fault tolerance.

RAID 1 encompasses the potential for data redundancy

and is commonly known as mirroring, which is a response

to the reliability issues of RAID 0. In lieu of writing

the data across the set of drives, as in RAID 0, mirroring

duplicates the data across the set. For example, in the

simplest of cases, a system may have two hard disk drives

operating on the same controller. The same data written to

disk 0 would be simultaneously written to disk 1.

The RAID 1 scenario grants the user data protection, in that

when one drive fails, there is a replica, which can be

immediately brought online, depending upon the

sophistication of the environment, to eliminate any

downtime. Additionally, the failed disk drive can be

replaced during a more convenient time. Common uses for

RAID 1 include very sensitive data or data that is mandatory

for a system to operate, such as the boot drive, and where

data is not sequential. Because data is written twice as

often with RAID 1, it may seem that writes to the drive set

would take twice as long, but this is a myth. On the

contrary, writes to a mirrored set generally take only 15%

to 20% longer than writes to a single member. Some write

performance to the mirrored array may be lost; however, as

in RAID 0, lowering disk utilization increases performance.

One other drawback of implementing RAID 1 is the higher

costs it demands, since disk drive requirements double.

Implementing RAID 1 and RAID 0 is a fairly simple task, but

they only lay the groundwork for the absolute potential of

RAID.

Within the next couple of variations of the RAID

technology, specifically RAID 3 and RAID 5, a new concept

known as parity is introduced. RAID 3 uses the same theory as

RAID 0, but adds an extra drive to the array, which

maintains parity information about the data in the stripe

set. It divides the data across the stripe as in RAID 0,

and extra information is written to the additional disk in

corresponding blocks, which is the computed parity for the

data blocks residing on each of the members of the stripe unit.

Given this parity information and all but one of the blocks

of data, the contents of the destroyed or failed drive can be

recomputed. This technology adds fault tolerance to the stripe

set, which is the “ability of a system to continue to

perform reads and writes in the event of a hard disk

failure.” Although this protection is not as great as

having a full mirror of the data, it does reduce the amount

of expensive downtime. RAID 3 is frequently used in

situations where large amounts of data are accessed

sequentially. On the other hand, this RAID level does not

work well with database management systems since they

usually exercise random access. The reason RAID 3 and RAID

0 operate more efficiently in circumstances where large

quantities of data are being read is because of the physical

arrangement of the drive set. Every write to a drive in

a parity-based level such as RAID 3 requires a write to the parity drive;

therefore, this overhead is best amortized when large amounts of data

are being requested and is most costly when small amounts are

being requested.
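The parity mechanism can be sketched in a few lines. The paper does not name the parity function, so this example assumes the conventional choice of a byte-wise exclusive OR (XOR); the helper `xor_blocks` and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Blocks held by three data drives (made-up contents of equal length).
data = [b"RAID", b"disk", b"data"]

# Block written to the dedicated parity drive.
parity = xor_blocks(data)

# Suppose the second data drive fails: its block is recomputed from
# the surviving data blocks plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

XOR works here because combining the parity block with all but one of the data blocks cancels every surviving block and leaves only the missing one.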

RAID 5, on the other hand, solves the problem RAID 3

has with overutilizing the parity drive. In this level of

RAID, sometimes referred to as rotated parity, the parity

information is distributed across the stripe set in consecutive,

yet different, locations. By doing this, the parity and the

data functions are shared by each member in the set.
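One way to picture rotated parity is to print which drive holds the parity block for each stripe. The rotation rule below is only one of several possible schemes and is chosen here as an assumption for illustration; the names `parity_disk` and `stripe_layout` are hypothetical.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Return the drive that holds parity for a stripe (one simple rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def stripe_layout(stripe: int, num_disks: int) -> list[str]:
    """Label each drive in a stripe as data ("D") or parity ("P")."""
    p = parity_disk(stripe, num_disks)
    return ["P" if disk == p else "D" for disk in range(num_disks)]


# Parity moves to a different drive on each successive stripe,
# so no single drive absorbs every parity write.
for stripe in range(4):
    print(stripe, " ".join(stripe_layout(stripe, 4)))
```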

RAID 5 performs just as well as RAID 3 when it comes to

sequential reads, and RAID 5 also outperforms the random

read performance of RAID 0. However, write

performance suffers because RAID 5 adds some data integrity,

or the ability to ensure data is written correctly, into

data management. This data integrity is accomplished by a

series of short steps. First, the members of the drive

array are read and the parity information is computed. In

this step, each member disk is read in parallel. After the

new parity block is figured, the data, parity, and block

identities are written to a log, which is completed in one

input/output operation. The data and parity information is

then written to the member disks in parallel after the log

has been updated in case of a catastrophe, such as a power

outage. Finally, the data associated with the write

operation is removed from the log. This process is often

compared to the commit function of a DBMS. If a disaster

occurs during the write of the data, information exists in

the log to determine whether the data was, or was not, written

correctly, thus the term data integrity.
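The logged write sequence described above can be sketched as a small in-memory model. This is an illustration under stated assumptions rather than a description of any vendor's controller: parity is assumed to be XOR, the log holds at most one pending write, and recovery is assumed to replay whatever the log still contains. The class `LoggedStripe` and its methods are hypothetical names.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length blocks together, byte by byte (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class LoggedStripe:
    """One stripe: data blocks, one parity block, and a single-entry log."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = xor_blocks(self.data)
        self.log = None  # pending write, if any

    def _apply(self, index, block, parity):
        self.data[index] = block
        self.parity = parity

    def write(self, index, new_block, crash_before_disks=False):
        # Step 1: read the members and compute the new parity.
        updated = self.data[:index] + [new_block] + self.data[index + 1:]
        new_parity = xor_blocks(updated)
        # Step 2: record data, parity, and block identity in the log.
        self.log = (index, new_block, new_parity)
        if crash_before_disks:
            return  # simulated power outage: disks untouched, log intact
        # Step 3: write data and parity to the member disks.
        self._apply(index, new_block, new_parity)
        # Step 4: remove the completed write from the log.
        self.log = None

    def recover(self):
        """After a restart, replay any write still recorded in the log."""
        if self.log is not None:
            self._apply(*self.log)
            self.log = None


stripe = LoggedStripe([b"aaaa", b"bbbb", b"cccc"])
stripe.write(1, b"BBBB", crash_before_disks=True)  # outage after logging
stripe.recover()                                   # log completes the write
print(stripe.data[1], stripe.parity == xor_blocks(stripe.data))
```

The extra log writes in steps 2 and 4 are what make the operation recoverable, and they are also part of the reason write performance suffers.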

This protection in RAID 5, however, does not come

without a price. After the read for the parity information,

the parity computation, the two writes for the two-phase

commit log, the write for the parity, and one or two writes

for the data members, the write performance is greatly

diminished, by an estimated 60%. The benefit of trading

off this write performance is data integrity, read

performance, and data protection. If one of the drives in

the RAID 5 configuration fails, the missing data can be

reconstructed on the fly. This is known as degraded mode.

For every read and every write in degraded mode, each disk

drive in the array must be accessed to compute the missing

data. Depending upon the size of the array, this can result

in an astounding amount of overhead. Therefore, it is a

common practice to limit the number of drives in a RAID 5

arrangement to six (6) drives to safeguard performance

during degraded mode. RAID 5 is one of the more confusing

and complex levels of RAID, yet it is still the most common

and works well in most environments. Still, other options

exist if the previous, most common, levels of RAID do not

meet the needs of the organization.

RAID 7 is a fairly unconventional level of RAID that

has been trademarked by Storage Computer Corporation. This

level of RAID, although proprietary, is used significantly

in the market and is worthy of being explained alongside

the others. RAID 7 takes advantage of the framework of RAID

3 and RAID 4, which is not much different than RAID 3, but

greatly improves on their shortcomings. The greatest

difference with RAID 7 is its heavy use of cache, or a

technique to buffer data in an attempt to supply a provisional

storage area that will allow a faster disk drive to operate

without being hindered by a slower device. Through the use

of large amounts of cache, RAID 7 allows many functions to

be performed simultaneously, greatly improving performance

while continuing to support fault tolerance. As in RAID 3

and 4, RAID 7 has a dedicated parity drive, yet does not

suffer from the same dilemma as the other levels using a

dedicated drive for parity because of the

asynchronous I/O transfers it supports. It has been

reported that RAID 7 performance is 1.5 to 6 times better

than the other levels of RAID, and its write performance is

25% to 90% better than using a single member. The downsides

to RAID 7 include an extremely high cost per megabyte of

storage, a lack of user serviceability, and no support for

the two-phase commit of RAID 5. Because of the high

cache usage of RAID 7, it is recommended to implement a UPS,

or uninterruptible power supply, as well. This is one of

the most expensive implementations of the RAID technology, and

comparable results can be attained through the

implementation of a combined RAID technology, known as RAID

0+1, which will be discussed later.

Concluding the single technology RAID levels, RAID 10

combines high performance with high reliability. RAID 10 is

a combination of RAID 1 and RAID 0, but is not the same as

RAID 0+1. In this scenario, two RAID 1 arrays are striped.

This level of RAID offers the same protection as RAID 1;

however, striping the array boosts performance.

Furthermore, under some circumstances, RAID 10 is known to

maintain uptime in the event of multiple drive failures.

Nonetheless, RAID 10 carries with it the same expensive

quality as RAID 1 because the number of disks doubles.

Because it stripes two RAID 1 arrays, four (4) drives is

the minimum needed to implement this level. Ultimately, the

data space available is only 50% of the total

drive space. To avoid the high costs of the upper, more

complex levels of RAID, the IT department may opt to simply

combine the lower levels of RAID, as in RAID 0+1.
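The capacity cost of the levels discussed so far can be compared with a short calculation. The sketch assumes two-way mirroring for the mirrored levels and a single drive's worth of parity for RAID 3 and RAID 5; the function name `usable_fraction` is hypothetical, and proprietary levels such as RAID 7 are left out.

```python
def usable_fraction(level: str, num_disks: int) -> float:
    """Fraction of raw disk space left for data at a given RAID level."""
    if level == "0":
        return 1.0                            # striping only, no redundancy
    if level in ("1", "10", "0+1"):
        return 0.5                            # every block is stored twice
    if level in ("3", "5"):
        return (num_disks - 1) / num_disks    # one disk's worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


for level, disks in [("0", 4), ("1", 2), ("5", 4), ("10", 4), ("0+1", 4)]:
    share = usable_fraction(level, disks)
    print(f"RAID {level:>3} on {disks} disks: {share:.0%} usable")
```

The figures show why mirrored levels are the expensive option: half of the purchased capacity holds copies, whereas a parity-based array gives up only one drive's worth of space regardless of its size.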

RAID 0+1 is very similar to RAID 10, but reverses the

order of operations. Instead of striping the mirrored arrays as in RAID

10, RAID 0+1 mirrors two striped sets. In this

configuration, there are actually two RAID 0 arrays. Each

array is striped, and then one is mirrored onto the other.

In contrast to RAID 10, this level has the same fault

tolerance as RAID 5, but has higher I/O rates as a result of

the multiple stripe sets. However, if one drive fails in

either set, this configuration will, in essence, break the

mirror capability and become a RAID 0 array, which only

supports striping. This RAID solution is excellent for

organizations that need higher performance than RAID 5, but

do not need the extended reliability.
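The difference in reliability between RAID 10 and RAID 0+1 can be checked by enumerating drive failures. This sketch assumes the minimum four-drive configuration, with drives 0 and 1 forming one group and drives 2 and 3 the other; the function names are hypothetical.

```python
from itertools import combinations

# In RAID 10 each pair is a mirror; in RAID 0+1 each pair is a stripe set.
PAIRS = [(0, 1), (2, 3)]


def raid10_survives(failed: set[int]) -> bool:
    """Striped mirrors: every mirror must keep at least one working drive."""
    return all(any(d not in failed for d in pair) for pair in PAIRS)


def raid01_survives(failed: set[int]) -> bool:
    """Mirrored stripes: at least one stripe set must be fully intact."""
    return any(all(d not in failed for d in pair) for pair in PAIRS)


double_failures = [set(c) for c in combinations(range(4), 2)]
raid10_ok = sum(raid10_survives(f) for f in double_failures)
raid01_ok = sum(raid01_survives(f) for f in double_failures)
print(f"RAID 10 survives {raid10_ok} of {len(double_failures)} double failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(double_failures)} double failures")
```

Both arrangements survive any single failure, but after the first failure a RAID 0+1 array is left running on one unprotected stripe set, so fewer combinations of a second failure are survivable.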

It is fairly obvious many variations and configurations

for RAID exist. In fact, there are other levels not

mentioned here, such as RAID 53, a combination of,

surprisingly, 0 and 3, and RAID 6, 5+1, 1+5, 5+0, and 0+5,

all of which have specific advantages and disadvantages,

however rare they may be. Making the determination of which

variation to implement is only the first decision for the

management in putting RAID into action. Following are some

of the other items of interest managers should observe when

implementing RAID, as they are important components and

technologies.

Merely implementing a RAID technology does not

eliminate the fact drives will fail even though it does

reduce the pains of data recovery. To further address the

issue of device MTBF, or mean time between failures, hot-swapping

has become a favorable practice. This technology,

also known as hot spares, is quite impressive. In addition

to the drives in the drive array, additional spare disks are

attached to the system, albeit inactive, waiting for one of

the active drives to fail. The spare drive, in the event of

a failure and given the environment supports some type of

autorecovery, will immediately take the place of the failed

disk. In the event the environment does not support

autorecovery, some employee intervention would be required,

but the same result of instant repair is achieved. These

spares, in some cases known as a pool, can either be

dedicated or non-dedicated, meaning their role has either

been pre-defined or their use is left up to the system in

the event of a failure. This technology can, fundamentally,

be applied to other components in the storage system and, in

this sense, is known as duplexing.

Duplexing involves adding redundant pieces of equipment

outside the RAID array. More specifically, it is common

practice and is recommended to duplex array controllers, or

adapters. These components are the array’s interface to the

I/O structure of the system. If a system is only configured

with one controller and it expires, the efforts to maintain

high data availability are in vain. This is not acceptable

in the IT industry. Duplexing the array controllers

counteracts this possible disaster and, in addition,

increases the system performance by expanding the bus for

data input/output. The term duplexing generally refers to

array adapters, yet can be applied to any component in the

system, including the system itself! Other common

components to duplex include fans, power supplies, and

processors.

Adding these extra, but highly important, devices can,

in some cases, as much as double the total cost of

implementing a RAID solution. However, without adapter

duplexing and hot spare drives, the RAID

solution selected is still vulnerable to failure. If they cannot

be afforded, though, a basic RAID implementation is better

than no performance boost or data protection at all. That

being said, one more decision must be made before

installing a RAID solution, which relates to overall cost,

future total cost of ownership, and data availability, and

that is whether to install a hardware or software based

solution.

Hardware-based RAID solutions generally offer greater data

availability and serviceability than software-based

solutions. When protection of data and data availability are

imperative, a hardware-based solution is the only viable

choice. These types of solutions usually have the ability to

detect more bit errors than software solutions, thus

increasing the system’s data integrity, which is important to

almost all organizations. Furthermore, hardware-based

solutions typically offer more robust fault tolerance

measures. It is not uncommon for a hardware-based solution

to come standard with hot spare disk pools and duplexed

controllers. Another major advantage hardware solutions

have over software is their ability to take advantage of

bootable arrays. Not necessarily the most important part of

a storage system, it is an essential component, and having

the ability to stripe or mirror the boot drive is a great

advantage to uptime. Hardware solutions, more often than

not, are capable of automatically detecting a disk failure.

This automatic detection advantage can avert hours of

downtime, depending upon the time of the failure. Also

associated with hardware solutions are reductions in CPU

interrupts and in main PCI bus traffic, which in turn yield

better overall system performance. Hardware-based

solutions, perhaps unexpectedly, have a lower total cost of

ownership than software-based solutions in spite of the

higher purchase costs that accompany hardware RAID solutions.

Finally, if the management team selects any RAID

configuration other than RAID 0 or RAID 1, a hardware-based

solution is the only choice, because of the extreme demand

levels 3, 5, 7, 10, and 0+1 require of a system.

If one of the lower levels of RAID meets the needs of

the organization and the IT department, a software-based

solution may be appropriate. The major benefit of software

RAID solutions is their costs. In some cases, software-

based solutions are free, as with Windows NT Server and some

network operating systems. The low front-end costs reflect

the low system performance and limited functionality of

software-based solutions. Error protection and bit error

detection are performed by the system’s CPU, which takes

processing power away from business applications.

Furthermore, software-based solutions are not capable of

correcting data errors. Software RAID solutions can only

detect errors. In order to detect the bit errors, the

software-based solution relies upon the functionality of the

adapter itself, thus decreasing I/O performance. Because of

the high level of operator involvement, the total cost of

ownership is actually higher than that of hardware-based

solutions, in the long run.

One of the main considerations management should keep

in mind is the type of applications the RAID solution will

be supporting, I/O bound or CPU bound applications, for

example. With hardware-based solutions, the number of

CPU interrupts is much lower than with software-based

solutions, which frees the CPU to perform computationally

intensive functions. In addition, by minimizing the I/O of

the PCI bus, other activities, such as network traffic, can

be processed much more efficiently. In a CPU bound

environment, a hardware-based solution is much more

appropriate than a software-based solution, because RAID 5

parity checks and secondary writes in RAID 1 are offloaded

onto a RAID coprocessor. Moreover, software based

solutions, such as those incorporated into the Windows NT

and Novell Netware operating systems, do not support the

ability to set the priority for drive spares, which

speeds up array reconstruction. Given the determination that

hardware-based solutions are more advantageous, although

more expensive, IT departments would be wise to implement

hardware RAID solutions over software solutions when

possible. As expected, there are many solutions available to

IT departments from many different vendors.

The two solutions to be compared and contrasted here

are Compaq’s RAID Array 4100 and Dell’s Powervault 650F.

Both of these systems are targeted toward medium sized

businesses with at least 400 employees. These two systems

connect to an existing network via the Fibre Channel

interface. This interface is a fairly new technology designed

to overcome some difficulties with existing interfaces into

network storage devices. It operates at 100MB per second,

200MB per second given existing network full duplex

capacity, is designed on the SCSI framework by individuals

who knew the shortcomings of storage interfaces, is a serial

interface, is 2.5 times faster than the existing UltraSCSI

interface (40MB), and can be connected by either twisted

pair cable or fiber optics. The two systems, furthermore,

come with management software to ease the process of

installation and maintenance. The interface required by

both systems is PCI, yet an EISA interface is available for

the Compaq solution. Given these similarities, their

differences should be carefully considered.

Compaq Computer Corporation’s RAID Array 4100 is the

newest model in the RAID Array family, replacing the RAID

Array 4000.

This system has 64MB total cache memory, which is comprised

of 16MB ECC protected read and 48MB battery assisted user

selectable read/write memory. This system supports up to

twelve (12) one inch Ultra2 universal drives with support

for both Wide-Ultra SCSI3 and Fast-Wide SCSI2 drive

interfaces. It supports RAID levels 0, 1, 4, and 5. In

regard to high availability features, the RAID Array 4100

supports hot-pluggable, redundant power supplies, redundant

fans, and hot-pluggable hard drives. This feature grants

the system operator the ability to swap failed drives at the

time of failure without having to schedule downtime.

Furthermore, the redundant, hot-pluggable power supplies

grant further protection against blackouts and brownouts,

and, if one power supply fails, the other is more than

capable of keeping the system up until such time a new one

can be installed. There is a standard Compaq one-year

warranty accompanying this product; however, extended

support may be purchased. This solution is redundantly

supported by the following operating systems: Microsoft

Windows NT® 4.0, Windows NT® Enterprise Edition, and

Microsoft Cluster Server. In addition, Novell NetWare

versions 3.12 to 5.1, Novell’s IntraNetWare, SCO OpenServer,

UnixWare 2.1, UnixWare 7, Banyan Vines 6.x and 7.x, OS/2 SMP

2.11, and the OS/2 Warp Server Family non-redundantly

support it.

In contrast, Dell’s Powervault 650F also has many

impressive features.

The Powervault has a remarkable total cache memory capacity

of 512MB to support the 400MHz processors onboard the RAID

controllers. It has the capacity for 10 Fibre Channel disk

drives, not to mention a pre-configured expansion unit

available for an additional 10 drives. Without the

expansion unit, the Powervault 650F has a maximum capacity

of 4TB, or 4000 gigabytes. Drive form factors supported by

this system include not only the one-inch, but, also, the

1.6 inch variations. Data protection presented with this

system includes RAID levels 0, 1, 5, and 10. As with

Compaq’s RAID Array 4100, the Powervault 650F has hot

swappable drives, redundant, hot-swappable power supplies,

and redundant cooling fans. This Dell solution comes with a

limited three-year warranty and a one-year warranty for

parts replacement.

Overall, these two systems would make good solutions

for IT departments supporting sites with fewer than 400 end

users. The overall reliability of the two exceeds that of

most others reviewed; however, cost information was

unavailable for the Compaq RA4100 without an RFQ. The Dell

Powervault 650F does offer great scalability as far as

potential RAID levels and drive space expandability.

Likewise, the Compaq RA4100 is supported by many industry

standard operating systems. Truly, a sales representative

should be contacted and questioned before making a purchase

of this magnitude.

Summing up, RAID technology has proven its importance

to the information technology industry time and time again

since its theorization in 1987. The management of the

information systems staff should take the advantages of each

level of RAID to heart and consider the benefit of adding

the corresponding components to further increase the uptime

and protection of the data. Failing hardware will be an

issue to be dealt with for many years into the future;

therefore, RAID will continue to be a popular option to

offset the peril of MTBF.

Bibliography

Adaptec, Inc. “ABC’s of RAID.”

http://www.adaptec.com/products/guide/abcraid.html

November 20, 2000.

Adaptec, Inc. “Hardware v. Software RAID.”

http://www.adaptec.com/technology/whitepapers/

raid_hw_sw01.html

November 20, 2000.

Angel, Jonathan. Network Magazine. “Lesson 144: RAID.”

San Francisco. July 2000. Vol 15 Issue 7. Pg. 34.

Compaq Computer Corp Web Site. “Compaq RA4100.”

www.compaq.com November 30, 2000.

Dell, Inc. Web Site. “Powervault 650F”

http://support.dell.com/docs/systems/sjade/650F/

5867c0.pdf November 30, 2000.

Dell, Inc. Web Site. “RAID Technology.”

http://www.dell.com/us/en/biz/topics/vectors_1999-

raid.htm

November 30, 2000.

Grigonis, Richard. Computer Telephony. “American ProImage

‘RAIDs’ the Industry.” San Francisco. October 1999. Vol

7 Issue 10. Pg. 141.

Patterson, David A., Gibson, Garth, and Katz, Randy. “A Case for

Redundant Arrays of Inexpensive Disks.” Berkeley. 1987.

Planet IT Web Site. “Right RAID For You.”

http://www.planetit.com/techcenters/docs/Storage/

expert/PIT19990113S0013

November 28, 2000.

RAID7 Web Site. “RAID 7 Architecture.”

http://www.raid7.com/wp_raid7afa.html

November 19, 2000.

Rapaport, Lowell. Imaging & Document Solutions. “RAID

today…and tomorrow.” San Francisco. January 1999. Vol 8

Issue 1. Pg. 55.

Soran, Phil. Inform. “RAID is the answer for fast, secure

storage.” Silver Spring. April 1999. Vol 13 Issue 4. Pg.

8-9.

Wong, Brian. “RAID: What does it mean to me?”

http://www.sunworld.com/sunworldonline/swol-09-1995/

swol-09-raid5_p.html.

November 29, 2000.

Yager, Tom. Unix Review’s Performance Computing. “RAID!”

San Francisco. April 1999. Vol 17 Issue 4. Pg. 21-24.
