INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

ASSIGNMENT 1

RAID(REDUNDANT ARRAY OF INDEPENDENT DISKS)

Group Members : Nur Hidayah Binti Mohd Din 0822614

: Nur Farhana Binti Noordin 0823844

: Nur Khairunnisa Binti Juarah 0824780

: Nor Iffah Binti Md Najib 0828770

Group : COAD 09

Section : 1

Lecturer : Moaiad Ahmad Khder

Subject : CSC1401---Introduction to Computer Organization.

Introduction

An Overall description

The purpose of this paper is to give an overview of the history of RAID, what RAID actually is, why we need it, how it works, the different levels of RAID, and the advantages and disadvantages of each level.

History

Back in the mid-1980s, the SLED (Single Large Expensive Disk) was the most popular medium for storing data. Disk drives at that time had nowhere near the storage capacity or performance of the disks available today. To store a large amount of data, one therefore needed many disks, which was an inconvenient and often messy way of handling data. On top of that, the disks were very expensive, which explains the name SLED.

Another big problem was, and still is, the loss of data caused by disk failure, and a solution was badly needed. IBM, which had received a patent in this problem area in 1978, co-sponsored the University of California, Berkeley to build a disk array subsystem to help solve these problems. In 1987 Randy Katz and Dave Patterson, both working at Berkeley, succeeded. They called the solution RAID, which stands for “Redundant Array of Inexpensive Disks”, although some people prefer to replace the original word ‘Inexpensive’ with ‘Independent’. Randy and Dave had clustered multiple smaller and less expensive disks into an array.

By doing this, all the disks appear to the rest of the world as one single large disk. The result was compared to SLED in terms of cost versus performance. It turned out that RAID had the same or superior performance as SLED, but with a theoretical Mean Time Before Data Loss (MTBDL) that was reduced to an unacceptably low level, because an array of many disks has more components that can fail. The search for a way of increasing the MTBDL then started. There was a need for a way to prevent single disk drive failures from causing data loss within the array of disks. The result was the seven RAID levels 0 through 6. In addition, further variants are still being developed, such as RAID 7, 10, 12, and 15.

In computing, RAID is a system which uses multiple hard drives to share or replicate data among the drives. Depending on the version chosen, the benefit of RAID is one or more of increased data integrity, fault-tolerance, throughput or capacity compared to single drives. In its original implementations (in which it was an abbreviation for "redundant array of inexpensive disks"), its key advantage was the ability to combine multiple low-cost devices using older technology into an array that offered greater capacity, reliability, speed, or a combination of these things, than was affordably available in a single device using the newest technology.

At the very simplest level, RAID combines multiple hard drives into a single logical unit. Thus, instead of seeing several different hard drives, the operating system sees only one. RAID is typically used on server computers, and is usually (but not necessarily) implemented with identically-sized disk drives. With decreases in hard drive prices and wider availability of RAID options built into motherboard chipsets, RAID is also being found and offered as an option in more advanced user computers. This is especially true in computers dedicated to storage-intensive tasks, such as video and audio editing.

The original RAID specification suggested a number of prototype "RAID levels", or combinations of disks. Each had theoretical advantages and disadvantages. Over the years, different implementations of the RAID concept have appeared. Most differ substantially from the original idealized RAID levels, but the numbered names have remained.

Why do we need RAID?

For a long time RAID was something found only in large server systems, but cheaper RAID controller cards have lately made it possible to build a RAID system even for small servers and home computers. These cheaper cards will of course not have all the features of the more expensive ones. Different levels of RAID have different advantages and disadvantages, so one must analyse the workload before deciding what to buy. The choice also depends largely on the quality attributes needed. Some examples of quality attributes one can get by using a RAID system are data redundancy, fault tolerance, increased capacity and increased performance.

How does RAID work?

The main idea behind RAID is, as mentioned in the introduction, to take some inexpensive disks and group them together so that the system sees them as one single disk. This is done by using a RAID controller card that handles all I/O to the disks and knows where the stored data can be found. RAID uses three different techniques to provide the quality attributes mentioned above: mirroring, striping and parity. Each can be used either on its own or combined with one or more of the others, which is why RAID is divided into different levels.

Mirroring

The easiest way to get both availability and fault tolerance is to keep a copy of all data on a second disk. This is called mirroring, and you normally get one MB of usable space for every two MB of physical disk space. If one disk fails, you can always read from the other. The disadvantages of this method are the wasted disk space and the lack of any gain in write performance, since every write must go to both disks. You can, however, get higher read performance because reads can be served by both drives simultaneously.
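The behaviour described above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class name `MirroredPair` and its methods are hypothetical, each "disk" is a Python list of blocks, and a real controller would also balance reads across both disks.

```python
class MirroredPair:
    """Toy model of two mirrored disks (hypothetical, for illustration only)."""

    def __init__(self, num_blocks):
        # Two identical disks, each a list of blocks.
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block, data):
        # Every write goes to every working disk, so writes gain no speed.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                disk[block] = data

    def read(self, block):
        # Any working disk can serve the read.
        for index, disk in enumerate(self.disks):
            if not self.failed[index]:
                return disk[block]
        raise RuntimeError("both disks have failed")

    def fail(self, index):
        # Simulate the loss of one disk.
        self.failed[index] = True


pair = MirroredPair(num_blocks=4)
pair.write(0, b"payroll")
pair.fail(0)                      # first disk dies
survivor_copy = pair.read(0)      # data is still readable from the mirror
```

Half of the physical space holds copies, which matches the one-MB-for-every-two-MB figure given above.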

Striping

Whereas mirroring and parity deal with improving reliability, striping is used to get higher performance. The idea is to split data into small pieces, which are then distributed across the disks. This way the disks can work in parallel on different pieces of data. You do not lose any disk space, as you do with mirroring and parity. One big disadvantage of this method is that if one disk breaks, all data will be lost. It is therefore seldom used alone, but rather in combination with mirroring or parity. Storing data with striping alone is recommended only for data that the application can recreate, such as cache or other temporary data.
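As a rough illustration of how the pieces might be distributed, the sketch below assigns blocks to disks in round-robin order. The function names `locate_block` and `stripe` and the tiny block size are assumptions made for this example, not part of any particular RAID product.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    disk = logical_block % num_disks      # round-robin across the disks
    offset = logical_block // num_disks   # position of the block on that disk
    return disk, offset


def stripe(data, num_disks, block_size):
    """Split data into blocks and distribute them across the disks."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        disk, _ = locate_block(start // block_size, num_disks)
        disks[disk].append(data[start:start + block_size])
    return disks


layout = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
# layout == [[b"AB", b"GH"], [b"CD", b"IJ"], [b"EF", b"KL"]]
```

Consecutive blocks land on different disks, so a large sequential transfer keeps all three disks busy at once. It also shows why losing one disk is fatal: every third block of the data would be gone.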

Parity

Mirroring and striping are fairly easy to understand. Parity, however, is a bit more complicated. Like mirroring, it is used to improve availability, but without wasting as much space. If you have X data elements, they can be used to compute one parity element, typically the bitwise exclusive OR (XOR) of the data, so that you end up with X+1 elements. Any single lost element can always be recovered from the others. The advantage of parity is that no single disk is a single point of failure. The price is the extra computation needed on every write.
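A minimal sketch of the idea, assuming XOR parity over equally sized blocks (the helper name `xor_blocks` and the sample bytes are made up for illustration):

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of several equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# X = 3 data elements produce one parity element, giving X + 1 in total.
data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] is lost: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, the same function both creates the parity and rebuilds a missing element, whichever one is lost.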

Different levels of RAID

1) RAID 0

RAID 0 emerged early in the development of RAID technology. This level is also known as striping. It is accomplished by taking at least two disk drives, preferably more, and striping them together to create one virtual disk. Data is then written to what is known as the stripe set and is spread across the volume, with each drive operating in parallel with the others. RAID 0 is commonly used in environments where files are large and the data is sequential.

The benefits of RAID 0 are fairly straightforward. Data access performance is increased because the request queue for each disk drive is shorter. The utilization of each disk is decreased because there are more drives to share the load of data access. This is achieved by writing data sequentially across the drive set so that the data can later be retrieved from all drives simultaneously.

However, the increased performance of RAID 0 only applies to applications using sequential access because it involves no indexing of the data. Furthermore, striping the drives together does nothing to protect the information stored on the drives; therefore there is no data redundancy. In spite of this, RAID 0 can be combined with other levels of RAID to not only increase performance, but also employ data redundancy and fault tolerance.

2) RAID 1

RAID 1 provides data redundancy and is commonly known as mirroring, which is the response to the reliability issues of RAID 0. Instead of writing the data across the set of drives, as in RAID 0, mirroring duplicates the data across the set. For example, in the simplest case, a system may have two hard disk drives operating on the same controller. The same data written to disk 0 would be simultaneously written to disk 1.

The RAID 1 scenario grants the user data protection: when one drive fails there is a replica which, depending on the sophistication of the environment, can be brought online immediately to eliminate any downtime. The failed disk drive can then be replaced at a more convenient time. Common uses for RAID 1 include very sensitive data or data that a system needs in order to operate, such as the boot drive, and cases where data is not sequential. Because data is written twice with RAID 1, it may seem that writes to the drive set would take twice as long, but this is a myth. In fact, writes to a mirrored set generally take only 15% to 20% longer than writes to a single member. Some write performance may be lost on the mirrored array; however, as in RAID 0, lowering disk utilization increases performance. Another drawback of implementing RAID 1 is its higher cost, since the number of disk drives required doubles. Implementing RAID 1 and RAID 0 is a fairly simple task, but they only lay the groundwork for the full potential of RAID.

3) RAID 2

Level 2 is the "black sheep" of the RAID family, because it is the only RAID level that does not use one or more of the "standard" techniques of mirroring, striping and/or parity. RAID 2 uses something similar to striping with parity, but not the same as what is used by RAID levels 3 to 7. It is implemented by splitting data at the bit level and spreading it over a number of data disks and a number of redundancy disks. The redundant bits are calculated using Hamming codes, a form of error correcting code (ECC).

RAID Level 2, which uses Hamming error correction codes, is intended for use with drives which do not have built-in error detection. Disks are synchronized and striped in very small stripes, often in single bytes/words. Hamming codes contain parity for distinct overlapping subsets of components. Each data word has its Hamming Code ECC word recorded on the ECC disks.

Each time something is written to the array, these codes are calculated and written alongside the data to dedicated ECC disks. When the data is read back, the ECC codes are read as well to confirm that no errors have occurred since the data was written. If a single-bit error occurs, it can be corrected "on the fly".

In one version of this scheme, four data disks require three redundant disks, one less than mirroring. Hamming-code error correction is calculated across corresponding bits on the disks and is stored on multiple parity disks. All SCSI drives support built-in error detection, so this level is of little use with SCSI drives. RAID 2 is seldom used today since ECC is embedded in almost all modern disk drives.

For a number of reasons, including the fact that modern disk drives contain their own internal ECC, RAID 2 is not a practical disk array scheme. If a single component fails, several of the parity components will have inconsistent values, and the failed component is the one held in common by each incorrect subset. The lost information is recovered by reading the other components in a subset, including the parity component, and setting the missing bit to 0 or 1 to create proper parity value for that subset. Thus, multiple redundant disks are needed to identify the failed disk, but only one is needed to recover the lost information.
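The four-data-disk, three-redundant-disk arrangement corresponds to the classic Hamming(7,4) code. The sketch below is an illustration under assumptions, not a description of any real RAID 2 controller: it treats each of the seven bits as living on its own disk, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits; positions 1, 2 and 4 hold parity."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code_bits):
    """Return (corrected bits, 1-based position of the flipped bit or 0)."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The failing parity subsets overlap in exactly one position.
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1   # flip the bad bit back
    return c, position


word = hamming74_encode([1, 0, 1, 1])   # one bit per disk
damaged = list(word)
damaged[4] ^= 1                          # a single-bit error at position 5
fixed, bad_position = hamming74_correct(damaged)
```

The overlapping parity subsets are what let the code identify which component is wrong, which is the point made above about needing several redundant disks to locate a failure but only one to repair it.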

4) RAID 3

RAID 3 implements byte-level striping with parity. It requires a minimum of 3 disks. Data to be written is divided into stripes, and stripe parity is calculated for every write operation. The stripe parity is stored on a separate parity disk. RAID 3 provides fault tolerance, and its disk usage is better than that of mirroring. The controller design is quite complex. Write operations are slow because of the overhead of calculating parity and writing it to a separate disk. Read operations are faster than writes.

RAID 3 can be used in data intensive or single-user environments which access long sequential records to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives in order to avoid performance degradation with short records. Byte-level striping requires hardware support for efficient use.

RAID 3 is also described as a striped set with dedicated parity, bit-interleaved parity, or byte-level parity. This mechanism provides improved performance and fault tolerance similar to RAID 5, but with a dedicated parity disk rather than rotated parity stripes. The single parity disk is a bottleneck for writing, since every write requires updating the parity data. One minor benefit is that if the dedicated parity drive fails, operation continues without parity and without a performance penalty.

The parity information is sent to a dedicated parity disk, but the failure of any disk in the array can be tolerated (i.e., the dedicated parity disk doesn't represent a single point of failure in the array). The dedicated parity disk does generally serve as a performance bottleneck, especially for random writes, because it must be accessed any time anything is sent to the array. This contrasts with distributed-parity levels such as RAID 5, which improve write performance by using distributed parity (though they still suffer from large overheads on writes). RAID 3 differs from RAID 4 only in the size of the stripes sent to the various disks.

One can improve upon memory-style ECC disk arrays by noting that, unlike memory component failures, disk controllers can easily identify which disk has failed. Thus, one can use a single parity disk rather than a set of parity disks to recover lost information.

5) RAID 4

RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive. This makes it in some ways the "middle sibling" in a family of close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it uses blocks instead of bytes for striping, and like RAID 5 except that it uses dedicated parity instead of distributed parity. Going from byte to block striping improves random access performance compared to RAID 3, but the dedicated parity disk remains a bottleneck, especially for random write performance. Fault tolerance, format efficiency and many other attributes are the same as for RAID 3 and RAID 5.

Each entire block is written onto a data disk. Parity for same-rank blocks is generated on writes, recorded on the parity disk and checked on reads. RAID 4 requires a minimum of 3 drives to implement.

This type uses large stripes, which means you can read records from any single drive. In this setup, files can be distributed between multiple disks. Each disk operates independently which allows I/O requests to be performed in parallel, though data transfer speeds can suffer due to the type of parity. The error detection is achieved through dedicated parity and is stored in a separate, single disk unit.

The parity information allows recovery from the failure of any single drive. The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes in particular, though large writes or sequential writes are fairly fast. The controller design is quite complex. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low. It is difficult to rebuild data in case of a disk failure. RAID 4 offers no advantages over RAID 5 and does not support multiple simultaneous write operations.
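To see why small writes are costly, the following sketch shows one common way a parity update can be done: read the old data and old parity, then write the new data and new parity. It assumes XOR parity; the names `xor` and `small_write` and the in-memory lists standing in for disks are hypothetical.

```python
def xor(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, block, new_data):
    """Update one block and its parity: two reads plus two writes."""
    old_data = data_disks[disk][block]      # read 1: old data
    old_parity = parity_disk[block]         # read 2: old parity
    # Remove the old data from the parity, then add the new data.
    parity_disk[block] = xor(xor(old_parity, old_data), new_data)
    data_disks[disk][block] = new_data


# Three data disks with one block each, plus a dedicated parity disk.
data_disks = [[b"\x01\x02"], [b"\x10\x20"], [b"\xff\x00"]]
parity_disk = [xor(xor(data_disks[0][0], data_disks[1][0]), data_disks[2][0])]

small_write(data_disks, parity_disk, disk=1, block=0, new_data=b"\xab\xcd")
```

Every such write touches the one parity disk, whichever data disk is being updated, which is the bottleneck described above.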

6) RAID 5

RAID 5 is similar to RAID 4 except that it exchanges the dedicated parity drive for a distributed parity algorithm, writing data and parity blocks across all the drives in the array. This removes the bottleneck that the dedicated parity drive represents, improving write performance slightly and allowing better parallelism in a multiple-transaction environment, though the overhead necessary in dealing with the parity continues to bog down writes. Fault tolerance is maintained by ensuring that the parity information for any given block of data is placed on a drive separate from those used to store the data itself. The performance of a RAID 5 array can be adjusted by trying different stripe sizes until one is found that is well matched to the application being used.
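The sketch below shows one possible way of rotating the parity block from stripe to stripe. The rotation order is an assumption chosen for illustration, since real implementations use several different layouts, and `raid5_layout` is a hypothetical helper.

```python
def raid5_layout(num_disks, num_stripes):
    """Return rows of labels, one row per stripe: 'P' for parity, 'Dn' for data."""
    rows = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and moves left.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{block}")
                block += 1
        rows.append(row)
    return rows


layout = raid5_layout(num_disks=4, num_stripes=4)
# ['D0', 'D1', 'D2', 'P']
# ['D3', 'D4', 'P',  'D5']
# ['D6', 'P',  'D7', 'D8']
# ['P',  'D9', 'D10', 'D11']
```

Over four stripes every disk holds parity exactly once, so parity updates are spread over all the drives instead of queuing on one.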

7) RAID 6

RAID 6 stripes blocks of data and parity across an array of drives like RAID 5, except that it calculates two sets of parity information for each parcel of data. The goal of this duplication is solely to improve fault tolerance; RAID 6 can handle the failure of any two drives in the array while other single RAID levels can handle at most one fault. Performance-wise, RAID 6 is generally slightly worse than RAID 5 in terms of writes due to the added overhead of more parity calculations, but may be slightly faster in random reads due to spreading of data over one more disk. As with RAID levels 4 and 5, performance can be adjusted by experimenting with different stripe sizes.

Additional Information

8) RAID 7

Unlike the other RAID levels, RAID 7 isn't an open industry standard. It is a trademarked marketing term of Storage Computer Corporation, used to describe their proprietary RAID design. RAID 7 is based on concepts used in RAID levels 3 and 4, but greatly enhanced to address some of the limitations of those levels. Of particular note is the inclusion of a great deal of cache arranged into multiple levels, and a specialized real-time processor for managing the array asynchronously. This hardware allows the array to handle many simultaneous operations, greatly improving performance of all sorts while maintaining fault tolerance. In particular, RAID 7 offers much improved random read and write performance over RAID 3 or RAID 4 because the dependence on the dedicated parity disk is greatly reduced through the added hardware. The increased performance of RAID 7 of course comes at a cost.

The advantages and disadvantages of each RAID

Since each RAID level has a unique design, each has its own specialties and drawbacks.

RAID 0

Specialties:

RAID 0 has a very simple design and is easy to implement. It offers the best performance and is cheap because no parity is used. It also has a very high data transfer capacity. In addition, it reduces the queuing time of I/O requests, which is much better than having a single large disk -- for instance, if there are two I/O requests for two different blocks of data, there is a good chance that the blocks are on different disks. The two requests can then be issued simultaneously, which shortens the queuing time.

Drawbacks:

Since RAID 0 is non-redundant, there is a high risk that data will be lost if any disk fails. RAID 0 therefore does not provide a fault-tolerant environment, and critical files or workloads are simply not suitable for storage at this level of RAID.

RAID 1

Specialties:

RAID 1 provides 100% data redundancy. It duplicates all the data on the logical disk, mapping each logical strip to two separate strips on the physical disks. Hence, every disk has its mirror disk.

RAID 1 also offers a fault-tolerant environment -- there is a real-time back-up of all data.

It also increases data transfer performance for reads -- a read request can be served by either of the two disks that contain the desired data (the disk and its mirror), so both disks can share the read load.

Drawbacks:

It is expensive, as it needs twice the space of the logical disk it supports (for the mirror disks). RAID 1 is therefore usually used only to store system software or other critical files.

RAID 2

Specialties:

Extremely high data transfer rates, the highest of all the RAID levels.

It requires fewer disks than RAID 1. The data availability is very good as it employs "on the fly" error correction.

Drawbacks:

It is very expensive.

RAID 3

Specialties:

RAID 3 is very good in sequential read and write. It is even faster than RAID 5.

Its striping performs at almost the same level as RAID 0, but with the additional capability of data protection.

Very high data transfer rates. Disk failure has an insignificant impact to the application process.

Drawbacks:

RAID 3 is very poor in random reads and writes. RAID 3 does not allow multiple I/O operations to be overlapped.

RAID 4

Specialties:

RAID 4 has a high I/O request rate. Since it uses an independent access technique, each member disk is able to operate independently. Hence, separate I/O requests can be satisfied in parallel.

Drawbacks:

It is not really suitable for applications that require a high data transfer rate. Data rebuild is difficult and inefficient in the event of disk failure.

The block read transfer rate is equal to that of a single disk. RAID 4 does not support multiple simultaneous write operations.

RAID 5

Specialties:

RAID 5 has high data availability. It has the highest read data transaction rate.

Drawbacks:

It has the most complex controller design. It is difficult to rebuild if a disk failure occurs (compared to RAID 1).

RAID 6

Specialties:

Offers the highest data availability -- three disks need to fail in order for data to be lost.

RAID 6 allows extra fault-tolerance by using a second independent parity scheme.

Perfect solution for mission critical applications.

Drawbacks:

More complex controller design. The controller overhead to compute parity addresses is extremely high.

RAID 7

Specialties:

Secure storage system. Hugely reduced manual handling.

The best RAID

As we all know, there are two basic design goals in using RAID technology: one is performance, and the other is data protection. Here are some of the characteristics that differentiate the levels of RAID, with a suggestion of the best RAID available for each characteristic or attribute:

No. | Attribute | Best RAID
1 | Data transfer rate | RAID 2 and RAID 3. Both have the highest data transfer rate.
2 | Data protection and reliability | RAID 6, as it has dual disk drive-failure protection. At least 3 disks must fail before data is lost.
3 | Price | RAID 0 is the cheapest compared to the other levels of RAID.
4 | I/O request rate | RAID 0 has the highest I/O request rate as it balances the I/O load across multiple disks.
5 | Minimum disk drives needed | RAID 0 and RAID 1 need the fewest disk drives, only 2 each. RAID 0 needs at least 2 disks for striping, since otherwise it would be no different from a single disk. In RAID 1, one disk is the mirror of the other.
6 | Performance | In RAID 0, data is split across drives, resulting in higher data throughput, and because no redundant information is stored, performance is very good.
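The trade-off between capacity and protection in the comparison above can be made concrete with a little arithmetic. The sketch below is a rough calculator under stated assumptions: `summarize` is a hypothetical helper, all disks have the same size, RAID 1 is assumed to use two-way mirrored pairs, and the four-disk minimum for RAID 6 is a commonly quoted figure rather than something stated in this paper.

```python
def summarize(level, num_disks, disk_gb):
    """Return usable capacity and guaranteed tolerated failures for an array."""
    if level == 0:
        min_disks, redundant, tolerated = 2, 0, 0
    elif level == 1:
        # Assumption: two-way mirroring, so half the disks hold copies.
        min_disks, redundant, tolerated = 2, num_disks // 2, 1
    elif level in (3, 4, 5):
        # One disk's worth of parity, dedicated or distributed.
        min_disks, redundant, tolerated = 3, 1, 1
    elif level == 6:
        # Two independent parity sets (assumed minimum of four disks).
        min_disks, redundant, tolerated = 4, 2, 2
    else:
        raise ValueError(f"level {level} is not covered by this sketch")
    if num_disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    if level == 1 and num_disks % 2:
        raise ValueError("two-way mirroring needs an even number of disks")
    return {
        "usable_gb": (num_disks - redundant) * disk_gb,
        "tolerated_failures": tolerated,
    }


# Four 500 GB disks under different levels.
raid0 = summarize(0, 4, 500)   # {'usable_gb': 2000, 'tolerated_failures': 0}
raid5 = summarize(5, 4, 500)   # {'usable_gb': 1500, 'tolerated_failures': 1}
raid6 = summarize(6, 4, 500)   # {'usable_gb': 1000, 'tolerated_failures': 2}
```

With the same four disks, each step up in protection costs one more disk's worth of capacity, which is why the best level depends on which attribute matters most.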

In order to determine which RAID is the best, a few questions need to be answered about what is most important to the person:

Cost of disk storage? Data protection or data availability? Or high performance?

For a student, I think RAID 0 would be the best choice among all the levels of RAID because of its high performance. RAID 0 has the fastest read and write performance, as no redundant data is stored. It also offers the highest I/O request rate, as it balances the I/O load across multiple disks. Besides, it needs only a minimum of two drives to operate, and it is lower in cost than the other levels of RAID. Although RAID 0 offers no fault tolerance, as it has no redundant data or back-up disks, this should not be a major problem nowadays because secondary storage to act as back-up is easily available at an affordable price. Furthermore, as a student I don’t think there are any so-called critical files that need a back-up in case of disk failure other than assignments and projects, which are usually also copied to removable secondary storage such as an external hard disk or a pen drive. That is why I think RAID 0 suits students best.
