51
ECE 6160: Advanced Computer Networks Introduction to Storage Devices Instructor: Dr. Xubin (Ben) He Email: [email protected] Tel: 931-372-3462

Storage devices and disk arrays

Embed Size (px)

Citation preview

Page 1: Storage devices and disk arrays

ECE 6160: Advanced Computer Networks

Introduction to Storage Devices

Instructor: Dr. Xubin (Ben) He

Email: [email protected]

Tel: 931-372-3462

Page 2: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

2

Prev…

• Gigabit Ethernet– MAC

– PHY

• VLAN

Page 3: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

3

Anatomy: Key components for a typical computer

Page 4: Storage devices and disk arrays

4ECE6160:Advanced Computer Networks

Motivation: Who Cares About I/O?

• CPU Performance: 60% per year• I/O system performance limited by mechanical delays

(disk I/O)< 10% per year (IO per sec)

• Amdahl's Law: system speed-up limited by the slowest part!

10% IO & 10x CPU => 5x Performance (lose 50%)10% IO & 100x CPU => 10x Performance (lose 90%)

• I/O bottleneck: Diminishing fraction of time in CPUDiminishing value of faster CPUs

Page 5: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

5

I/O Systems

Processor

Cache

Memory - I/O Bus

MainMemory

I/OController

Disk Disk

I/OController

I/OController

Graphics Network

interruptsinterrupts

Page 6: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

6

Storage Technology Drivers

• Driven by the prevailing computing paradigm– 1950s: migration from batch to on-line processing

– 1990s: migration to ubiquitous computing

» computers in phones, books, cars, video cameras, …

» nationwide fiber optical network with wireless tails

• Effects on storage industry:– Embedded storage

» smaller, cheaper, more reliable, lower power

– Data utilities

» high capacity, hierarchically managed storage

Page 7: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

7

Outline

• Disk Basics/Disk History

• Current Disk options

• Disk fallacies and performance

• FLASH

• Tapes

• RAID

Page 8: Storage devices and disk arrays

8

Disk Device Terminology

• Several platters, with information recorded magnetically on both surfaces (usually)

• Actuator moves head (end of arm,1/surface) over track (“seek”), select surface, wait for sector rotate under head, then read or write

– “Cylinder”: all tracks under heads

• Bits recorded in tracks, which in turn divided into sectors (e.g., 512 Bytes)

Platter

OuterTrack

InnerTrackSector

Actuator

HeadArm

Page 9: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

9

Tracks, Sectors, and Cylinders

A cylinder is comprised of the set of tracks described by all the heads (on separate platters) at a single seek position

Page 10: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

10

Photo of Disk Head, Arm, Actuator

Actuator

ArmHead

Platters (12)

{Spindle

source: www.seagate.com

Page 11: Storage devices and disk arrays

11ECE6160:Advanced Computer Networks

Disk Device Performance

Platter

Arm

Actuator

HeadSectorInnerTrack

OuterTrack

• Disk Latency = Seek Time + Rotation Time + Transfer Time + Controller Overhead

• Seek Time? depends on tracks move arm, seek speed of disk

• Rotation Time? depends on speed disk rotates, how far sector is from head

• Transfer Time? depends on data rate (bandwidth) of disk (bit density), size of request

ControllerSpindle

Page 12: Storage devices and disk arrays

12ECE6160:Advanced Computer Networks

Disk Device Performance

• Average rotation time:– Average distance sector from head?– 1/2 time of a rotation

» 10000 Revolutions Per Minute 166.67 Rev/sec

» 1 revolution = 1/ 166.67 sec 6.00 milliseconds

» 1/2 rotation (revolution) 3.00 ms

• Average seek time?– Sum all possible seek distances

from all possible tracks / # possible

» Assumes average seek distance is random

– Disk industry standard benchmark

– About 10ms (ATA) and 5ms (SCSI)

Page 13: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

13

Data Rate: Inner vs. Outer Tracks

• To keep things simple, orginally kept same number of sectors per track

– Since outer track longer, lower bits per inch

• Competition decided to keep BPI the same for all tracks (“constant bit density”)

More capacity per disk

More of sectors per track towards edge

Since disk spins at constant speed, outer tracks have faster data rate

• Bandwidth outer track 1.7X inner track!– Inner track highest density, outer track lowest, so not really constant

– 2.1X length of track outer / inner, 1.7X bits outer / inner

Page 14: Storage devices and disk arrays

14ECE6160:Advanced Computer Networks

Disk Performance Model /Trends

• Capacity+ 100%/year (2X / 1.0 yrs)

• Transfer rate (BW)+ 40%/year (2X / 2.0 yrs)

• Rotation + Seek time– 8%/ year (1/2 in 10 yrs)

• MB/$> 100%/year (2X / 1.0 yrs)

60GB/$300 => $5/GB =>0.5c/MB

Page 15: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

15

State of the Art: Barracuda 180

– 181.6 GB, 3.5 inch disk

– 12 platters, 24 surfaces

– 24,247 cylinders

– 7,200 RPM;

– 7.4/8.2 ms avg. seek (r/w)

– 64 to 35 MB/s (internal)

– 0.1 ms controller time

– 10.3 watts (idle)

source: www.seagate.com

Latency = Queuing Time + Controller time +Seek Time + Rotation Time + Size / Bandwidth

per access

per byte{+

Sector

Track

Cylinder

Head PlatterArmTrack Buffer

Q:Calculate time to read 64 KB (128 sectors) for Barracuda 180, sector is on outer track.

Page 16: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

16

Disk Performance Example

• Calculate time to read 64 KB (128 sectors) for Barracuda 180 X using advertised performance; sector is on outer track

Disk latency = average seek time + average rotational delay + transfer time + controller overhead

= 7.4 ms + 0.5 * 1/(7200 RPM) + 64 KB / (64 MB/s) + 0.1 ms

= 7.4 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / (64 KB/ms) + 0.1 ms

= 7.4 + 4.2 + 1.0 + 0.1 ms = 12.7 ms

Page 17: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

17

Areal Density: linear density x track density

• Bits recorded along a track– Metric is Bits Per Inch (BPI) along a track

• Number of tracks per surface– Metric is Tracks Per Inch (TPI)

• Disk Designs Brag about bit density per unit area– Metric is Bits Per Square Inch

– Called Areal Density

– Areal Density = BPI x TPI

Page 18: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

18

Areal DensityYear Areal Density

1973 1.71979 7.71989 631997 30902000 17100

1

10

100

1000

10000

100000

1970 1980 1990 2000

Year

Are

al D

ensity

– Areal Density = BPI x TPI

– Change slope 30%/yr to 60%/yr about 1991

Page 19: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

19

Historical Perspective

• 1956 IBM Ramac — early 1970s Winchester– Developed for mainframe computers, proprietary interfaces– Steady shrink in form factor: 27 in. to 14 in

• Form factor and capacity drives market, more than performance• 1970s: Mainframes 14 inch diameter disks• 1980s: Minicomputers,Servers 8”,5 1/4” diameter• PCs, workstations Late 1980s/Early 1990s:

– Mass market disk drives become a reality» industry standards: SCSI, IPI, IDE

– Pizzabox PCs 3.5 inch diameter disks– Laptops, notebooks 2.5 inch disks– Palmtops didn’t use disks,

so 1.8 inch diameter disks didn’t make it

• 2000s:– 1 inch for cameras, cell phones?

Page 20: Storage devices and disk arrays

20

Disk History

Data densityMbit/sq. in.

Capacity ofUnit ShownMegabytes

1973:1. 7 Mbit/sq. in140 MBytes

1979:7. 7 Mbit/sq. in2,300 MBytes

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even more data into even smaller spaces”

Page 21: Storage devices and disk arrays

21

Disk History

1989:63 Mbit/sq. in60,000 MBytes

1997:1450 Mbit/sq. in2300 MBytes

source: New York Times, 2/23/98, page C3, “Makers of disk drives crowd even mroe data into even smaller spaces”

1997:3090 Mbit/sq. in8100 MBytes

Page 22: Storage devices and disk arrays

22

1 inch disk drive!

• 2003 IBM MicroDrive:– 1.7” x 1.4” x 0.2”

– 1 GB, 3600 RPM, 13.3 MB/s, 15 ms seek

– Digital camera, PalmPC?

• 2003 USB mini flash drive– 75mmx28mmx12mm, 17g

– 256MB

– 0.5-1MB/s

• 2006 MicroDrive?– 9 GB, 50 MB/s!

» Assuming it finds a niche in a successful product

» Assuming past trends continue

Page 23: Storage devices and disk arrays

23

Disk Characteristics in 2003

  Seagate Cheetah3146807LW

Hitachi TravelStar 7K60

IBM Microdrive (07N5574)

Dimension 3.5”, 1.5 lbs 2.5”, 0.2 lbs 1.7”, 0.6 lbs

Interface Ultra 320 SCSI ATA-100 (Ultra) CF+

Capacity 146.8GB 60GB 1GB

Data Transfer Rate

320MB/s 100MB/s 13.3MB/s

Average seek time

4.7ms 10ms 15ms

Spindle Speed 10000RPM 7200RPM 3600RPM

Buffer 8MB 8MB 128KB

Price (warehouse.com)

$899.95 $299.95 $319.95

Page 24: Storage devices and disk arrays

24

Disk Characteristics in 2007

  Seagate CheetahST3146707LW

Hitachi TravelStar 7K60

Seagate Barracuda ESST3750640NS

Dimension 3.5” 2.5”, 0.2 lbs 3.5”

Interface Ultra 320 SCSI ATA-100 (Ultra) Serial ATA-300

Capacity 146.8GB 60GB 750GB

Data Transfer Rate

320MB/s 100MB/s

Average seek time

4.7ms 10ms

Spindle Speed 10000RPM 7200RPM 7200RPM

Buffer 8MB 8MB 16MB

Price (warehouse.com)

$259 $93.99 (80GB) $319.95

Page 25: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

25

Today’s Hard Drive (09/29/2008)(warehouse.com)

• Seagate Barracuda 7200.11 - hard drive - 750 GB - SATA-300 Hard drive - 750 GB - internal - 3.5" - SATA-300 - 7200 rpm - buffer: 32 MB, $116

• Seagate FreeAgent Pro - hard drive - 500 GB - FireWire/Hi-Speed USB/eSATA Hard drive – 500 GB – external – 3.5” – USB 2.0, eSATA-300, FireWire interfaces – 7200 rpm – buffer: 32 MB, $100

Page 26: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

26

Fallacy: Use Data Sheet “Average Seek” Time

• Manufacturers needed standard for fair comparison (“benchmark”)

– Calculate all seeks from all tracks, divide by number of seeks => “average”

• Real average would be based on how data laid out on disk, where seek in real applications, then measure performance

– Usually, tend to seek to tracks nearby, not to random track

• Rule of Thumb: observed average seek time is typically about 1/4 to 1/3 of quoted seek time (i.e., 3X-4X faster)

– Barracuda 180 X avg. seek: 7.4 ms 2.5 ms

Page 27: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

27

Fallacy: Use Data Sheet Transfer Rate

• Manufacturers quote the speed off the data rate off the surface of the disk.

• Sectors contain an error detection and correction field (can be 20% of sector size) plus sector number as well as data

• There are gaps between sectors on track

• Rule of Thumb: disks deliver about 3/4 of internal media rate (1.3X slower) for data

• For example, Barracuda 180X quotes 64 to 35 MB/sec internal media rate

47 to 26 MB/sec external data rate (74%)

Page 28: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

28

Disk Performance Example: revision

• Calculate time to read 64 KB for UltraStar 72 again, this time using 1/3 quoted seek time, 3/4 of internal outer track bandwidth; (12.7 ms before)

Disk latency = average seek time + average rotational delay + transfer time + controller overhead

= (0.33 * 7.4 ms) + 0.5 * 1/(7200 RPM) + 64 KB / (0.75 * 65 MB/s) + 0.1 ms

= 2.5 ms + 0.5 /(7200 RPM/(60000ms/M)) + 64 KB / (47 KB/ms) + 0.1 ms

= 2.5 + 4.2 + 1.4 + 0.1 ms = 8.2 ms (64% of 12.7)

Page 29: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

29

Future Disk Size and Performance

• Continued advance in capacity (60%/yr) and bandwidth (40%/yr).

• Slow improvement in seek, rotation (8%/yr).

• Time to read whole disk.

Year Sequentially Randomly (1 sector/seek)

1990 4 minutes 6 hours

2000 12 minutes 1 week(!)

• 3.5” form factor make sense in 5 yrs?– What is capacity, bandwidth, seek time, RPM?

– Assume today 80 GB, 100 MB/sec, 6 ms, 10000 RPM

» 838GB, 537MB/s, 4ms, 15000RPM

Page 30: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

30

What about FLASH

• Compact Flash Cards– Intel Strata Flash

» 16 Mb in 1 square cm. (.6 mm thick)– 100,000 write/erase cycles.– Standby current = 100uA, write = 45mA– Compact Flash 256MB~=$120 512MB~=$542– Transfer @ 3.5MB/s

• IBM Microdrive 1G~370– Standby current = 20mA, write = 250mA– Efficiency advertised in wats/MB

• VS. Disks– Nearly instant standby wake-up time– Random access to data stored– Tolerant to shock and vibration (1000G of operating shock)

Page 31: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

31

Disk Operations

• Seek: move head to track;

• Rotation: wait for sector under head;

• Transfer:Move data to/from disks;

• Overhead: – Controller delay

– Queuing delay

Access time = seek time + rotational delay + transfer time+overhead

Page 32: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

32

Improving disk performance.

• Use large sectors to improve bandwidth

• Use track caches and read ahead;– Read entire track into on-controller cache

– Exploit locality (improves both latency and BW)

• Design file systems to maximize locality– Allocate files sequentially on disks (exploit track cache)

– Locate similar files in same cylinder (reduce seeks)

– Locate simlar files in near-by cylinders (reduce seek distance)

• Pack bits closer together to improve transfer rate and density.

• Use a collection of small disks to form a large, high performance one--->disk array

Stripping data across multiple disks to allow parallel I/O, thus improving performance.

Page 33: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

33

Use Arrays of Small Disks?

14”10”5.25”3.5”

3.5”

Disk Array: 1 disk design

Conventional: 4 disk designs

Low End High End

•Katz and Patterson asked in 1987: •Can smaller disks be used to close gap in performance between disks and CPUs?

Page 34: Storage devices and disk arrays

34

Replace Small Number of Large Disks with Large Number of Small Disks! (1988 Disks)

Capacity

Volume

Power

Data Rate

I/O Rate

MTTF

Cost

IBM 3390K

20 GBytes

97 cu. ft.

3 KW

15 MB/s

600 I/Os/s

250 KHrs

$250K

IBM 3.5" 0061

320 MBytes

0.1 cu. ft.

11 W

1.5 MB/s

55 I/Os/s

50 KHrs

$2K

x70

23 GBytes

11 cu. ft.

1 KW

110 MB/s

3900 IOs/s

??? Hrs

$150K

Disk Arrays have potential for large data and I/O rates, high MB per cu. ft., high MB per KW, but what about reliability?

9X

3X

8X

6X

Page 35: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

35

Array Reliability

•MTTF: Mean Time To Failure: average time that a non - repairable component will operate before experiencing failure.

•Reliability of N disks = Reliability of 1 Disk ÷N

50,000 Hours ÷ 70 disks = 700 hours Disk system MTTF: Drops from 6 years to 1 month!

• Arrays without redundancy too unreliable to be useful!

Solution: redundancy.

Page 36: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

36

Redundant Arrays of (Inexpensive) Disks

• Replicate data over several disks so that no data will be lost if one disk fails.

• Redundancy yields high data availability– Availability: service still provided to user, even if some components

failed

• Disks will still fail• Contents reconstructed from data redundantly stored

in the array Capacity penalty to store redundant info

Bandwidth penalty to update redundant info

Page 37: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

37

Page 38: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

38

Levels of RAID

• Original RAID paper described five categories (RAID levels 1-5). (Patterson et al, “A case for redundant arrays of inexpensive disks (RAID)”, ACM SIGMOD, 1988)

• Disk striping with no redundant now is called RAID0 or JBOD(Just a bunch of disks).

• Other kinds have been proposed in literature,Level 6 (P+Q Redundancy), Level 10, RAID53, etc.

• Except RAID0, all the RAID levels trade disk capacity for reliability, and the extra reliability makes parallism a practical way to improve performance.

Page 39: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

39

RAID 0: Nonredundant (JBOD)

file data block 1block 0 block 2 block 3

Disk 1Disk 0 Disk 2 Disk 3

• High I/O performance.•Data is not save redundantly.•Single copy of data is striped across multiple disks.

•Low cost.•Lack of redundancy.

•Least reliable: single disk failure leads to data loss.

Page 40: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

40

                                                                                                      

         

RAID Level 0 requires a minimum of 2 drives to implement

RAID 0: Striped Disk Array without Fault Tolerance

Page 41: Storage devices and disk arrays

41

Redundant Arrays of Inexpensive DisksRAID 1: Disk Mirroring/Shadowing

• Each disk is fully duplicated onto its “mirror” Very high availability can be achieved• Bandwidth sacrifice on write: Logical write = two physical writes

• Reads may be optimized, minimize the queue and disk search time

• Most expensive solution: 100% capacity overhead

recoverygroup

Targeted for high I/O rate , high availability environments

Page 42: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

42

RAID 1: Mirroring and Duplexing

                                                                                                      

         

For Highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair.

RAID Level 1 requires a minimum of 2 drives to implement

Page 43: Storage devices and disk arrays

43

RAID 4: Block Interleaved Parity

block 0

block 4

block 8

block 12

block 1

block 5

block 9

block 13

block 2

block 6

block 10

block 14

block 3

block 7

block 11

block 15

P(0-3)

P(4-7)

P(8-11)

P(12-15)

•Blocks: striping units •Allow for parallel access by multiple I/O requests, high I/O rates • Doing multiple small reads is now faster than before. (allows small read requests to be restricted to a single disk).• Large writes(full stripe), update the parity:

P’ = d0’ + d1’ + d2’ + d3’; • Small writes(eg. write on d0), update the parity:

P = d0 + d1 + d2 + d3P’ = d0’ + d1 + d2 + d3 = P + d0’ + d0;

• However, writes are still very slow since the parity disk is the bottleneck.

Page 44: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

44

RAID 4: Independent Data disks with shared Parity disk

                                                                                                      

         

Each entire block is written onto a data disk. Parity for same rank blocks is generated on Writes, recorded on the parity disk and checked on Reads.

RAID Level 4 requires a minimum of 3 drives to implement

Page 45: Storage devices and disk arrays

45

Inspiration for RAID 5

• RAID 4 works well for small reads• Small writes (write to one disk):

– Option 1: read other data disks, create new sum and write to Parity Disk

– Option 2: since P has old sum, compare old data to new data, add the difference to P

• Small writes are limited by Parity Disk: Write to D0, D5 both also write to P disk. Parity disk must be updated for every write operation!

D0 D1 D2 D3 P

D4 D5 D6 PD7

Page 46: Storage devices and disk arrays

46

Redundant Arrays of Inexpensive Disks RAID 5: High I/O Rate Interleaved Parity

Independent writespossible because ofinterleaved parity

Independent writespossible because ofinterleaved parity

D0 D1 D2 D3 P

D4 D5 D6 P D7

D8 D9 P D10 D11

D12 P D13 D14 D15

P D16 D17 D18 D19

D20 D21 D22 D23 P

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.Disk Columns

IncreasingLogical

Disk Addresses

Example: write to D0, D5 uses disks 0, 1, 3, 4

Page 47: Storage devices and disk arrays

47

RAID 5: Block Interleaved Distributed-Parity

block 0

block 4

block 8

block 12

P(16-19)

block 1

block 5

block 9

P(12-15)

block 16

block 2

block 6

P(8-11)

block 13

block 17

block 3

P(4-7)

block 10

block 14

block 18

P(0-3)

block 7

block 11

block 15

block 19

• Parity disk = (block number/4) mod 5 • Eliminate the parity disk bottleneck of RAID 4• Best small read, large read and large write performance• Can correct any single self-identifying failure• Small logical writes take two physical reads and two physical writes.• Recovering needs reading all nonfailed disks

Left Symmetric Distribution

Page 48: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

48

RAID 5: Independent Data disks with distributed parity blocks

                                                                                                     

          

Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads.

RAID Level 5 requires a minimum of 3 drives to implement

Page 49: Storage devices and disk arrays

49

Comparison of RAID Levels (N disks, each with capacity of C)

RAID level

Technique Capacity Advantage Disadvantage

0 Striping NxC Maximum data transfer rate and sizes

No redundancy

1 Mirroring (N/2)xC High performance,

fastest write

cost

3 Bit(byte)-level parity

(N-1)xC Easy to implement, high error recoverability

Low performance

4 Block-level parity

(N-1)xC High redundancy and better performance

Write-related bottleneck

5 Interleave parity

(N-1)xC High performance, reliability

Small write problem

Page 50: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

50

Other RAIDs

• HP AutoRAID

• AFRAID

• RAPID

• SwiftRAID

• TickerTAIP

• SMDA

• ……

Page 51: Storage devices and disk arrays

ECE6160:Advanced Computer Networks

51

Berkeley History: RAID-I

• RAID-I (1989) – Consisted of a Sun 4/280

workstation with 128 MB of DRAM, four dual-string SCSI controllers, 28 5.25-inch SCSI disks and specialized disk striping software

• Today RAID is $19 billion dollar industry, 80% nonPC disks sold in RAIDs