Redundant Array
of Inexpensive Disks:
Data Protection and Downtime Elimination
Christopher Lambert
December 2, 2000
P. Alexander
Management Information Systems
University of North Alabama
As information technology advances, one
aspect remains the same: the importance and
value of data. Unlike hardware and software components,
data cannot be easily replaced; therefore, for a business
to be successful, measures to adequately protect data are
essential. It is the duty of IT management to institute
procedures for protecting the business’s data. Because
hardware and software failures are inevitable and unpredictable,
information technology departments have in the past made
do by incorporating simple backup activities into daily
standard operating procedures. However, backing data
up onto tapes alone has become an insufficient means of data
protection, since restoration requires downtime of
information systems, and downtime can be devastating to a
business depending upon the importance of the inaccessible
data. It has been stated that “94% of businesses that had
suffered a catastrophic non-recoverable failure in their
corporate IT storage systems went out of business within 2
years.” In an attempt to counter these unpredictable and
serious situations, RAID solutions should be implemented in
addition to periodic backups.
RAID is an acronym that stands for redundant array of
inexpensive disks. David A. Patterson, Garth Gibson, and
Randy H. Katz are credited with proposing this technology
in 1987. RAID not only protects data but also grants
businesses higher levels of data integrity and
hardware/software fault tolerance. The benefits of using
RAID include increased system uptime and system performance,
two extremely important issues for IT managers that cannot
be ignored in today’s business world. However, the benefits
of RAID extend far beyond these two issues because of the
impact it has on the entire business and, more directly, on
the staff of IT departments. Because of increased system
uptime, IT employees can devote their sometimes expensive
time to other important business issues, and, when staff
support the business 24 hours a day by being on call,
data loss crises and after-hours support are both minimized.
Furthermore, businesses in continuous operation
are not obliged to schedule downtime to restore or back up
data when a RAID technology is implemented. Since a RAID
technology is extremely advantageous
to a business that places great importance on its data, the
question of whether to implement the technology is
easy to answer. The next question to be
answered is which of the many levels and variations of RAID
is suitable for the business.
There are many different implementations of RAID, each with
a different level of protection and a different focus in
protecting the information system. The most basic
version, known as RAID 0, or more advanced versions, such
as RAID 5 and 7, may be implemented depending upon the level
and focus of protection required by the business. Each
level offers different performance, fault tolerance, and
cost. Managers must be aware of each type of RAID and the
advantages and disadvantages of each, and be able to make an
educated decision on which one best suits their business.
In addition to the different levels of RAID, there are
software and hardware versions of most of them. The
two differ considerably, and these
options should also be taken into consideration.
Nonetheless, the first decision should be to determine which
level, or levels, would be the best practice for the
organization.
RAID 0 dates from the early development of
RAID technology. This level is also known as striping.
Taking two or more disk drives, preferably five, and
striping them together to create one virtual disk
accomplishes this level of RAID. Data is then written to what
is known as the stripe set and is spread across the volume,
where each drive operates in parallel with the others. RAID 0 is
commonly used in environments where files are large and the
data is sequential. The benefits of RAID 0 are fairly
straightforward. Data access performance is increased
because data request queues are shortened for each disk
drive. Utilization of each disk is decreased because there are more
drives to share the load of data access. This is
achieved by writing data sequentially across the drive set
so that each drive can later retrieve its portion of the data
simultaneously.
However, the increased performance of RAID 0 applies only to
applications using sequential access because it involves no
indexing of the data. Furthermore, striping the drives
together does nothing to protect the information stored on
the drives; there is no data redundancy, so the failure of a
single drive destroys the entire stripe set. In spite
of this, RAID 0 can be combined with other levels of RAID to
not only increase performance but also provide data
redundancy and fault tolerance.
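The round-robin placement behind striping can be sketched in a few
lines of Python. This is only an illustration under simplifying
assumptions: one block per stripe unit, and the names locate_block
and stripe are hypothetical rather than taken from any RAID product.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 2:
        raise ValueError("striping needs at least two disks")
    return logical_block % num_disks, logical_block // num_disks


def stripe(blocks: list[bytes], num_disks: int) -> list[list[bytes]]:
    """Distribute blocks round-robin across the disks of a stripe set."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disk, _ = locate_block(i, num_disks)
        disks[disk].append(block)
    return disks


if __name__ == "__main__":
    # Six blocks over three disks: each disk holds every third block.
    print(stripe([b"a", b"b", b"c", b"d", b"e", b"f"], 3))
```

Because consecutive blocks land on different drives, a long
sequential read can keep every member busy at once, which is the
source of the performance gain described above.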
RAID 1 provides data redundancy
and is commonly known as mirroring, which is the response
to the reliability issues of RAID 0. Instead of writing
the data across the set of drives, as in RAID 0, mirroring
duplicates the data across the set. For example, in the
simplest of cases, a system may have two hard disk drives
operating on the same controller. The same data written to
disk 0 would be simultaneously written to disk 1.
The RAID 1 scenario grants the user data protection, in that
when one drive fails, there is a replica, which can be
brought online immediately, depending upon the
sophistication of the environment, to eliminate any
downtime. Additionally, the failed disk drive can be
replaced at a more convenient time. Common uses for
RAID 1 include very sensitive data, data that is mandatory
for a system to operate, such as the boot drive, and
data that is not sequential. Because data is written twice
with RAID 1, it may seem that writes to the drive set
would take twice as long, but this is a myth. On the
contrary, writes to a mirrored set generally take only 15%
to 20% longer than writes to a single member. Some write
performance to the mirrored array may be lost; however, as
in RAID 0, lowering disk utilization increases performance.
One other drawback to implementing RAID 1 is the higher
cost it demands, since disk drive requirements double.
Implementing RAID 1 and RAID 0 is a fairly simple task, but
they only lay the groundwork for the full potential of
RAID.
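As a minimal sketch of mirroring, the toy class below duplicates
every write to each healthy member and serves reads from the first
surviving one. The class name MirroredSet and its methods are
hypothetical, and the in-memory dictionaries merely stand in for
real disks.

```python
class MirroredSet:
    """Toy RAID 1 set: every write goes to all healthy members."""

    def __init__(self, num_disks: int = 2) -> None:
        if num_disks < 2:
            raise ValueError("mirroring needs at least two disks")
        # Each "disk" is a mapping from block number to block contents.
        self.disks: list[dict[int, bytes]] = [{} for _ in range(num_disks)]
        self.failed: set[int] = set()

    def write(self, block_no: int, data: bytes) -> None:
        """Duplicate the block onto every member that is still healthy."""
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def fail(self, disk_index: int) -> None:
        """Mark a member as failed; its replica keeps serving requests."""
        self.failed.add(disk_index)

    def read(self, block_no: int) -> bytes:
        """Return the block from the first healthy member."""
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block_no]
        raise IOError("all members of the mirror have failed")


if __name__ == "__main__":
    mirror = MirroredSet()
    mirror.write(0, b"boot sector")
    mirror.fail(0)            # disk 0 dies
    print(mirror.read(0))     # still readable from disk 1
```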
With the next two variations of the RAID
technology, specifically RAID 3 and RAID 5, a new concept
known as parity is introduced. RAID 3 uses the same approach
as RAID 0, but adds an extra drive to the array, which
maintains parity information about the data in the stripe
set. It divides the data across the stripe as in RAID 0,
and extra information, the computed parity for the
data blocks residing on each of the members of the stripe unit,
is written to the additional disk in corresponding blocks.
Given this parity information and all but one of the blocks
of data, the contents of the destroyed or failed drive can be
recomputed. This technology adds fault tolerance to the stripe
set, which is the “ability of a system to continue to
perform reads and writes in the event of a hard disk
failure.” Although this protection is not as great as
having a full mirror of the data, it does reduce the amount
of expensive downtime. RAID 3 is frequently used in
situations where large amounts of data are accessed
sequentially. On the other hand, this RAID level does not
work well with database management systems, since they
usually exercise random access. The reason RAID 3 and RAID
0 operate more efficiently in circumstances where large
quantities of data are being read lies in the physical
arrangement of the drive set. In RAID 3, every write to a
data drive also requires a write to the parity drive;
therefore, the cost of seeking is spread thin when large
amounts of data are being transferred and dominates when
small amounts are being requested.
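The recomputation of a lost block from parity can be illustrated
with a short Python sketch. It assumes the parity is a bytewise
exclusive-or (XOR) of the data blocks, which is the usual choice
but is not stated in the sources consulted; the function names
xor_blocks and rebuild_missing are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (the assumed parity function)."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
    parity = xor_blocks(data)
    # Pretend the second drive failed and rebuild its block.
    print(rebuild_missing([data[0], data[2]], parity) == data[1])
```

XOR works here because applying it twice with the same value
cancels out, so combining the parity with every surviving block
leaves only the missing one.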
RAID 5, on the other hand, solves the problem RAID 3
has with overutilizing the parity drive. In this level of
RAID, sometimes referred to as rotated parity, the parity
information is distributed across the stripe set, occupying a
different member for each consecutive stripe. By doing this,
the parity and the
data functions are shared by each member in the set.
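One possible rotation scheme can be sketched as follows. The
particular rule used here, in which parity moves one disk to the
left with each stripe, is an assumption chosen for illustration;
actual products may rotate parity differently, and the names
parity_disk and layout are hypothetical.

```python
def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Disk holding parity for a stripe (moves one disk left per stripe)."""
    return (num_disks - 1 - stripe_index) % num_disks


def layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Mark each position in each stripe as data ("D") or parity ("P")."""
    rows = []
    for s in range(num_stripes):
        p = parity_disk(s, num_disks)
        rows.append(["P" if d == p else "D" for d in range(num_disks)])
    return rows


if __name__ == "__main__":
    for row in layout(4, 4):
        print(" ".join(row))
```

Over any run of as many stripes as there are disks, each member
holds parity exactly once, so no single drive carries the whole
parity load.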
RAID 5 performs just as well as RAID 3 when it comes to
sequential reads, and RAID 5 also outperforms the random
read performance of RAID 0. Write
performance suffers, however, because RAID 5 adds some data
integrity, or the ability to ensure data is written correctly, to
data management. This data integrity is accomplished by a
series of short steps. First, the members of the drive
array are read and the parity information is computed. In
this step, each member disk is read in parallel. After the
new parity block is computed, the data, parity, and block
identities are written to a log, which is completed in one
input/output operation. Once the log has been updated, as a
safeguard against a catastrophe such as a power
outage, the data and parity information are
written to the member disks in parallel. Finally, the data
associated with the write
operation is removed from the log. This process is often
compared to the commit function of a DBMS. If a disaster
occurs during the write of the data, information exists in
the log to determine whether or not the data was written
correctly, thus the term data integrity.
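The logged write sequence described above can be sketched as a toy
model. Everything here is a simplification made for illustration:
a single stripe, XOR parity, an in-memory list standing in for the
log, and a flag that simulates a power failure. The class
LoggedStripe and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedStripe:
    """One stripe of a toy array: data blocks, a parity block, and a write log."""
    data: list[bytes]
    parity: bytes = b""
    log: list[tuple[int, bytes, bytes]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.parity = self._parity_of(self.data)

    @staticmethod
    def _parity_of(blocks: list[bytes]) -> bytes:
        """Bytewise XOR of equally sized blocks (assumed parity function)."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    def _apply(self, index: int, block: bytes, parity: bytes) -> None:
        self.data[index] = block
        self.parity = parity

    def write(self, index: int, block: bytes, crash_before_commit: bool = False) -> None:
        # Step 1: read the members and compute the new parity.
        updated = list(self.data)
        updated[index] = block
        new_parity = self._parity_of(updated)
        # Step 2: record the block identity, data, and parity in the log.
        self.log.append((index, block, new_parity))
        if crash_before_commit:
            return  # simulated power failure after the log write
        # Step 3: write data and parity to the member disks.
        self._apply(index, block, new_parity)
        # Step 4: remove the completed entry from the log.
        self.log.pop()

    def recover(self) -> None:
        """Replay any entries left in the log after a failure."""
        while self.log:
            self._apply(*self.log.pop(0))


if __name__ == "__main__":
    stripe = LoggedStripe([b"\x01", b"\x02", b"\x04"])
    stripe.write(1, b"\x08", crash_before_commit=True)
    stripe.recover()
    print(stripe.data, stripe.parity)
```

An entry that is still in the log after a restart marks a write
that may not have reached the disks, which is what allows the
array to finish it rather than leave data and parity out of step.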
This protection in RAID 5, however, does not come
without a price. After the read for the parity information,
the parity computation, the two writes for the two-phase
commit log, the write for the parity, and one or two writes
for the data members, the write performance is greatly
diminished, by an estimated 60%. The benefits gained by trading
off this write performance are data integrity, read
performance, and data protection. If one of the drives in
the RAID 5 configuration fails, the missing data can be
reconstructed on the fly. This is known as degraded mode.
For every read and every write in degraded mode, each surviving
disk drive in the array must be accessed to compute the missing
data. Depending upon the size of the array, this can result
in an astounding amount of overhead. Therefore, it is
common practice to limit a RAID 5
arrangement to six (6) drives to safeguard performance
during degraded mode. RAID 5 is one of the more confusing
and complex levels of RAID, yet it is still the most common
and works well in most environments. Still, other options
exist if the previous, most common, levels of RAID do not
meet the needs of the organization.
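The growth of degraded-mode overhead with array size can be made
concrete with a small calculation. The sketch below counts only
the disks that must be read to return one block whose home drive
has failed, which is the worst case; the function names are
hypothetical and no timing figures are implied.

```python
def disks_read_per_block(num_disks: int, degraded: bool) -> int:
    """Disks accessed to return one block that lived on the failed drive."""
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    # Healthy: the block is read directly from its own disk.
    # Degraded: every surviving member (data and parity) must be read
    # so that the missing block can be recomputed.
    return num_disks - 1 if degraded else 1


def overhead_table(max_disks: int = 12) -> dict[int, int]:
    """Degraded-mode reads per block for arrays of 3..max_disks drives."""
    return {n: disks_read_per_block(n, degraded=True)
            for n in range(3, max_disks + 1)}


if __name__ == "__main__":
    for size, reads in overhead_table().items():
        print(f"{size} drives: {reads} reads per lost block")
```

A six-drive array needs five reads to serve such a block, while a
twelve-drive array needs eleven, which illustrates why smaller
arrays are preferred.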
RAID 7 is a fairly unconventional level of RAID that
has been trademarked by Storage Computer Corporation. This
level of RAID, although proprietary, is used significantly
in the market and is worthy of being explained alongside
the others. RAID 7 takes advantage of the framework of RAID
3 and RAID 4, which is not much different from RAID 3, but
greatly improves on their shortcomings. The greatest
difference with RAID 7 is its heavy use of cache, a
technique that buffers data in a provisional
storage area so that a faster component can operate
without being hindered by a slower device. Through the use
of large amounts of cache, RAID 7 allows many functions to
be performed simultaneously, greatly improving performance
while continuing to support fault tolerance. As in RAID 3
and 4, RAID 7 has a dedicated parity drive, yet it does not
suffer from the same bottleneck as the other levels using a
dedicated parity drive because of the
asynchronous I/O transfers it supports. It has been
reported that RAID 7 performance is 1.5 to 6 times better
than the other levels of RAID, and its write performance is
25% to 90% better than that of a single member. The downsides
to RAID 7 are that it has an extremely high cost per megabyte
of storage, is not user serviceable, and does not exploit
the two-phase commit of RAID 5. Because of the high
cache usage of RAID 7, it is recommended to implement a UPS,
or uninterruptible power supply, as well. This is one of
the most expensive implementations of the RAID technology, and
comparable results can be attained through the
implementation of a combined RAID technology, known as RAID
0+1, which will be discussed later.
Concluding the individually numbered RAID levels, RAID 10
combines high performance with high reliability. RAID 10 is
a combination of RAID 1 and RAID 0, but is not the same as
RAID 0+1. In this scenario, two RAID 1 arrays are striped.
This level of RAID offers the same protection as RAID 1;
however, striping the array boosts performance.
Furthermore, under some circumstances, RAID 10 is known to
maintain uptime in the event of multiple drive failures.
Nonetheless, RAID 10 carries with it the same expense
as RAID 1 because the number of disks doubles.
Because it stripes two RAID 1 arrays, four (4) drives are
the minimum needed to implement this level. Ultimately, the
data space available is only 50% of the total
drive space, since every block is stored twice. To avoid the
high costs of the upper, more
complex levels of RAID, the IT department may opt to simply
combine the lower levels of RAID, as in RAID 0+1.
RAID 0+1 is very similar to RAID 10, but is constructed in the
opposite order. Instead of striping mirrored arrays as in RAID
10, RAID 0+1 mirrors two striped sets. In a four-drive
configuration, there are two RAID 0 arrays: the drives
are striped in pairs, and then the two stripe sets are mirrored.
In contrast to RAID 10, this level has the same fault
tolerance as RAID 5, but has higher I/O rates as a result of
the multiple stripe sets. However, if one drive fails in
either set, this configuration will, in essence, lose
its mirror capability and become a RAID 0 array, which only
supports striping. This RAID solution is excellent for
organizations that need higher performance than RAID 5, but
do not need the extended reliability.
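The difference in fault tolerance between RAID 10 and RAID 0+1 can
be checked by enumerating every possible pair of failed drives in
a four-drive array. The sketch below assumes drives 0 and 1 form
one pair and drives 2 and 3 the other; the layout constant and the
function names are hypothetical.

```python
from itertools import combinations

# Hypothetical four-drive layout: drives (0, 1) and (2, 3) are paired.
# In RAID 10 each pair is a mirror; in RAID 0+1 each pair is a stripe set.
PAIRS = [(0, 1), (2, 3)]


def raid10_survives(failed: set[int]) -> bool:
    """Striped mirrors: every mirrored pair must keep at least one drive."""
    return all(any(d not in failed for d in pair) for pair in PAIRS)


def raid01_survives(failed: set[int]) -> bool:
    """Mirrored stripes: at least one stripe set must be fully intact."""
    return any(all(d not in failed for d in pair) for pair in PAIRS)


def surviving_double_failures(survives) -> int:
    """Count the two-drive failures (out of six) the array survives."""
    return sum(survives(set(pair)) for pair in combinations(range(4), 2))


if __name__ == "__main__":
    print("RAID 10 :", surviving_double_failures(raid10_survives), "of 6")
    print("RAID 0+1:", surviving_double_failures(raid01_survives), "of 6")
```

Under these assumptions RAID 10 survives four of the six possible
double failures and RAID 0+1 only two, although both survive any
single failure.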
It is fairly obvious that many variations and configurations
of RAID exist. In fact, there are other levels not
mentioned here, such as RAID 53, a combination of,
surprisingly, 0 and 3, and RAID 6, 5+1, 1+5, 5+0, and 0+5,
all of which have specific advantages and disadvantages,
however rare they may be. Determining which
variation to implement is only the first decision for
management in putting RAID into action. Following are some
of the other items managers should consider when
implementing RAID, as they are important components and
technologies.
Merely implementing a RAID technology does not
eliminate the fact that drives will fail, even though it does
reduce the pains of data recovery. To further address the
issue of device MTBF, or mean time between failures,
hot-swapping has become a favored practice. A closely related
technology, known as hot spares, is quite impressive. In addition
to the drives in the drive array, additional spare disks are
attached to the system, albeit inactive, waiting for one of
the active drives to fail. In the event of
a failure, and given that the environment supports some type of
autorecovery, the spare drive will immediately take the place
of the failed disk. If the environment does not support
autorecovery, some employee intervention is required,
but the same result of rapid repair is achieved. These
spares, in some cases known as a pool, can be either
dedicated or non-dedicated, meaning their role has either
been pre-defined or been left up to the system in
the event of a failure. The same principle of redundancy can
be applied to other components in the storage system, where
it is known as duplexing.
Duplexing involves adding redundant pieces of equipment
outside the RAID array. More specifically, it is common
and recommended practice to duplex array controllers, or
adapters. These components are the array’s interface to the
I/O structure of the system. If a system is configured
with only one controller and it fails, the efforts to maintain
high data availability are in vain. This is not acceptable
in the IT industry. Duplexing the array controllers
counteracts this possible disaster and, in addition,
increases system performance by expanding the bus available for
data input/output. The term duplexing generally refers to
array adapters, yet can be applied to any component in the
system, including the system itself. Other common
components to duplex include fans, power supplies, and
processors.
Adding these extra, but highly important, devices can,
in some cases, as much as double the total cost of
implementing a RAID solution. However, without adapter
duplexing and available hot spare drives, the RAID
solution selected is still vulnerable to failure. If they cannot
be afforded, though, a basic RAID implementation is better
than no performance boost or data protection at all. That
being said, there is one more decision that must be made before
installing a RAID solution, one that affects overall cost,
future total cost of ownership, and data availability:
whether to install a hardware- or software-based
solution.
Hardware-based RAID solutions generally offer greater data
availability and serviceability than software-based
solutions. When protection of data and data availability are
imperative, a hardware-based solution is the only viable
choice. These types of solutions usually have the ability to
detect more bit errors than software solutions, thus
increasing the system’s data integrity, which is important to
almost all organizations. Furthermore, hardware-based
solutions typically offer more robust fault tolerance
measures. It is not uncommon for a hardware-based solution
to come standard with hot spare disk pools and duplexed
controllers. Another major advantage hardware solutions
have over software is their ability to support
bootable arrays. Although the boot drive is not necessarily
the most important part of
a storage system, it is an essential component, and having
the ability to stripe or mirror the boot drive is a great
advantage to uptime. Hardware solutions, more often than
not, are capable of automatically detecting a disk failure.
This automatic detection can avert hours of
downtime, depending upon the time of the failure. Also
associated with hardware solutions are reductions in CPU
interrupts and in main PCI bus traffic, which in turn yield
better overall system performance. Hardware-based
solutions, perhaps unexpectedly, have a lower total cost of
ownership than software-based solutions in spite of the
higher purchase costs that accompany them.
Finally, if the management team selects any RAID
configuration other than RAID 0 or RAID 1, a hardware-based
solution is the only choice, because of the extreme demands
levels 3, 5, 7, 10, and 0+1 place on a system.
If one of the lower levels of RAID meets the needs of
the organization and the IT department, a software-based
solution may be appropriate. The major benefit of software
RAID solutions is their cost. In some cases, software-based
solutions are free, as with Windows NT Server and some
network operating systems. The low up-front costs reflect
the low system performance and limited functionality of
software-based solutions. Error protection and bit error
detection are performed by the system’s CPU, which takes
processing power away from business applications.
Furthermore, software-based solutions are not capable of
correcting data errors; they can only
detect them. In order to detect the bit errors, the
software-based solution relies upon the functionality of the
adapter itself, thus decreasing I/O performance. Because of
the high level of operator involvement, the total cost of
ownership is, in the long run, actually higher than that of
hardware-based solutions.
One of the main considerations management should keep
in mind is the type of applications the RAID solution will
be supporting: I/O-bound or CPU-bound applications, for
example. With hardware-based solutions, the number of
CPU interrupts is much lower than with software-based
solutions, which frees the CPU to perform computationally
intensive functions. In addition, by minimizing the I/O on
the PCI bus, other activities, such as network traffic, can
be processed much more efficiently. In a CPU-bound
environment, a hardware-based solution is much more
appropriate than a software-based solution, because RAID 5
parity checks and secondary writes in RAID 1 are offloaded
onto a RAID coprocessor. Moreover, software-based
solutions, such as those incorporated into the Windows NT
and Novell NetWare operating systems, do not support
setting the priority for drive spares, which
speeds up array reconstruction. Given that
hardware-based solutions are more advantageous, although
more expensive, IT departments would be wise to implement
hardware RAID solutions over software solutions when
possible. As expected, there are many solutions available to
IT departments from many different vendors.
The two solutions to be compared and contrasted here
are Compaq’s RAID Array 4100 and Dell’s PowerVault 650F.
Both of these systems are targeted toward medium-sized
businesses with at least 400 employees. These two systems
connect to an existing network via the Fibre Channel
interface. This interface is a fairly new technology designed
to overcome some difficulties with existing interfaces to
network storage devices. It operates at 100MB per second,
or 200MB per second where the network supports full duplex
operation. It was designed on the SCSI framework by individuals
who knew the shortcomings of storage interfaces, is a serial
interface, is 2.5 times faster than the existing UltraSCSI
interface (40MB per second), and can be connected by either
twisted pair cable or fiber optics. The two systems, furthermore,
come with management software to ease the process of
installation and maintenance. The host interface required by
both systems is PCI, yet an EISA interface is available for
the Compaq solution. Given these similarities, their
differences should be carefully considered.
Compaq Computer Corporation’s RAID Array 4100 is the
newest model in the RAID Array family, replacing the RAID
Array 4000.
This system has 64MB of total cache memory, which comprises
16MB of ECC-protected read memory and 48MB of battery-assisted,
user-selectable read/write memory. The system supports up to
twelve (12) one-inch Ultra2 universal drives with support
for both Wide-Ultra SCSI3 and Fast-Wide SCSI2 drive
interfaces. It supports RAID levels 0, 1, 4, and 5. In
regard to high availability features, the RAID Array 4100
supports hot-pluggable, redundant power supplies, redundant
fans, and hot-pluggable hard drives. Hot-pluggable drives grant
the system operator the ability to swap failed drives at the
time of failure without having to schedule downtime.
Furthermore, the redundant, hot-pluggable power supplies
grant further protection against blackouts and brownouts,
and, if one power supply fails, the other is more than
capable of keeping the system up until a new one
can be installed. A standard Compaq one-year
warranty accompanies this product; however, extended
support may be purchased. This solution is supported in
redundant configurations by the following operating systems:
Microsoft Windows NT® 4.0, Windows NT® Enterprise Edition, and
Microsoft Cluster Server. In addition, Novell NetWare
versions 3.12 to 5.1, Novell’s IntraNetWare, SCO OpenServer,
UnixWare 2.1, UnixWare 7, Banyan Vines 6.x and 7.x, OS/2 SMP
2.11, and the OS/2 Warp Server Family support it in
non-redundant configurations.
By contrast, Dell’s PowerVault 650F also has many
impressive features.
The PowerVault has a remarkable total cache memory capacity
of 512MB to support the 400MHz processors onboard the RAID
controllers. It has capacity for 10 Fibre Channel disk
drives, not to mention a pre-configured expansion unit
available for an additional 10 drives. Without the
expansion unit, the PowerVault 650F has a maximum capacity
of 4TB, or 4000 gigabytes. Drive form factors supported by
this system include not only the one-inch but also the
1.6-inch variations. Data protection offered by this
system includes RAID levels 0, 1, 5, and 10. As with
Compaq’s RAID Array 4100, the PowerVault 650F has
hot-swappable drives, redundant, hot-swappable power supplies,
and redundant cooling fans. This Dell solution comes with a
limited three-year warranty and a one-year warranty for
parts replacement.
Overall, these two systems would make good solutions
for IT departments supporting sites with fewer than 400 end
users. The overall reliability of the two exceeds that of
most others reviewed; however, cost information was
unavailable for the Compaq RA4100 without an RFQ. The Dell
PowerVault 650F does offer great scalability in terms of
available RAID levels and drive space expandability.
Likewise, the Compaq RA4100 is supported by many
industry-standard operating systems. Ultimately, a sales
representative should be contacted and questioned before
making a purchase of this magnitude.
Summing up, RAID technology has proven its importance
to the information technology industry time and time again
since its introduction in 1987. The management of the
information systems staff should take the advantages of each
level of RAID to heart and consider the benefit of adding
the corresponding components to further increase uptime
and protection of the data. Failing hardware will be an
issue to be dealt with for many years into the future;
therefore, RAID will continue to be a popular option to
offset the peril of hardware failure.
Bibliography
Adaptec, Inc. “ABC’s of RAID.”
http://www.adaptec.com/products/guide/abcraid.html
November 20, 2000.
Adaptec, Inc. “Hardware v. Software RAID.”
http://www.adaptec.com/technology/whitepapers/raid_hw_sw01.html
November 20, 2000.
Angel, Jonathan. Network Magazine. “Lesson 144: RAID.”
San Francisco. July 2000. Vol 15 Issue 7. Pg. 34.
Compaq Computer Corp Web Site. “Compaq RA4100.”
www.compaq.com November 30, 2000.
Dell, Inc. Web Site. “Powervault 650F.”
http://support.dell.com/docs/systems/sjade/650F/5867c0.pdf
November 30, 2000.
Dell, Inc. Web Site. “RAID Technology.”
http://www.dell.com/us/en/biz/topics/vectors_1999-raid.htm
November 30, 2000.
Grigonis, Richard. Computer Telephony. “American ProImage
‘RAIDs’ the Industry.” San Francisco. October 1999. Vol
7 Issue 10. Pg. 141.
Patterson, David A., Gibson, Garth, and Katz, Randy H. “A Case
for Redundant Arrays of Inexpensive Disks.” Berkeley. 1987.
Planet IT Web Site. “Right RAID For You.”
http://www.planetit.com/techcenters/docs/Storage/expert/PIT19990113S0013
November 28, 2000.
RAID7 Web Site. “RAID 7 Architecture.”
http://www.raid7.com/wp_raid7afa.html
November 19, 2000.
Rapaport, Lowell. Imaging & Document Solutions. “RAID
today…and tomorrow.” San Francisco. January 1999. Vol 8
Issue 1. Pg. 55.
Soran, Phil. Inform. “RAID is the answer for fast, secure
storage.” Silver Spring. April 1999. Vol 13 Issue 4. Pg.
8-9.
Wong, Brian. “RAID: What does it mean to me?”
http://www.sunworld.com/sunworldonline/swol-09-1995/swol-09-raid5_p.html
November 29, 2000.
Yager, Tom. Unix Review’s Performance Computing. “RAID!”
San Francisco. April 1999. Vol 17 Issue 4. Pg. 21-24.