Parity Lost and Parity Regained

Andrew Krioukov, Lakshmi N. Bairavasundaram,

Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

University of Wisconsin - Madison

Garth R. Goodson, Kiran Srinivasan, Randy Thelen

Bare-bones RAID

• Stripe data across multiple drives

• Store redundant parity data

• Can reconstruct data with any single disk failure

• Will RAID protect data in all single failure cases?

[Figure: RAID stripe with blocks A, B, C on Data 1, Data 2, Data 3 and P(ABC) on the Parity disk]

Bare-bones RAID Problems

• Stripe contains file ABC consisting of 3 blocks

• RAID has redundancy to recover data

• RAID does not detect corruption

[Figure: RAID stripe with A, B, C on Data 1, Data 2, Data 3 and P(ABC) on Parity. Block C is corrupted (@#$%); "Read file ABC" returns the corrupt file.]
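A minimal sketch of the scenario above, assuming byte-wise XOR parity and hypothetical 4-byte blocks standing in for A, B and C (the block contents and helper name are illustrative, not from the talk). It shows that parity can rebuild a missing block, while a plain read of a silently corrupted block returns the bad data unchecked.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Hypothetical 4-byte blocks standing in for A, B, C.
a, b, c = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([a, b, c])            # P(ABC)

# A complete disk failure is recoverable: rebuild C from A, B and parity.
rebuilt_c = xor_blocks([a, b, parity])

# Silent corruption: the disk returns garbage for C with no error code.
stripe = [a, b, b"@#$%"]
file_abc = b"".join(stripe)               # a bare-bones read returns this as-is

# Parity no longer matches, but nothing on the read path checks it.
parity_still_matches = xor_blocks(stripe) == parity
```

The redundancy needed to repair C is present; what is missing is any step on the read path that notices C is wrong.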

Bare-bones RAID Problems

• RAID cannot detect partial disk failures:
– Corruptions
– Torn writes
– Lost writes
– Misdirected writes

• RAID only protects against:
– Complete disk failures
– Errors reported by the disk (e.g. Latent Sector Errors)

Data Protection Techniques

• Need improvements to bare-bones RAID

– Techniques needed to help detect errors

• Checksums are common

– Many kinds: block, sector, parent checksums

• Which types of checksums are used?

• We examined real systems to determine protection schemes

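Enterprise RAID Systems

• Mixed bag of protections

Protections surveyed: Scrub, Sector Cksum, Block Cksum, Parent Cksum, Write Verify, Phys Ident, Logical Ident, Write Stamp

Dell Powervault: √ √ √
Hitachi Thunder: √ √ √
NetApp ONTAP: √ √ √ √ √
Sun ZFS: √ √

[Table: the column position of each check mark was lost in extraction; each row shows only how many protections the system uses.]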

Question

• Which errors do these systems protect against?

• How can we ensure complete data protection?

• Need method to identify all corruption & data loss scenarios in a design


Model Checking Solution

• Create a model of storage system design using primitives

• Checker exhaustively searches space of all possible states
– Start with clean RAID stripe
– Apply single disk error
– Apply any number of disk operations (e.g. write)

• Identifies all possible data loss scenarios

Results Summary

• Applied model checking on enterprise RAID system designs

• For all designs, a single error can cause data loss

• Identified a common problem, parity pollution
– Partial disk failure goes undetected
– The erroneous data is used to compute parity
– Recovery is no longer possible

• Presented a design that protects against all single failures

Outline

• Introduction
• Background: Storage Errors
• Model Checking Approach
• Data Protection Design & Analysis
• Conclusion

Storage Errors

• Latent Sector Errors
– Data is inaccessible
– Explicit error code returned
– Affect 19% of nearline, 2% of enterprise disks in 2 years [Bairavasundaram et al. SIGMETRICS’07]

• Corruptions
– Data is silently corrupted
– Affect 0.6% of nearline and 0.06% of enterprise disks in 17 months [Bairavasundaram et al. FAST’08]

• Reality: Partial disk failures happen

Storage Errors (Cont’d)

• Torn Write
– Only part of a block is written
– Some sectors are lost
– Write returns success code

• Lost Writes
– Write returns success code
– Data not reflected on disk

[Figure (one per error type): a block holding A receives "Write B", which returns "Success"]

Storage Errors (Cont’d)

• Misdirected Writes
– Write goes to wrong location (either wrong block or wrong disk)
– Combination of lost write and corruption

[Figure: "Overwrite A → A’" returns Success, but A’ lands on the block that held B, and the block holding A is unchanged]


Modeling Storage System

• Use primitives to describe:
– On-disk layout in terms of sectors
– Data protections

• Checker uses built-in models:
– Storage errors
– Disk operations (e.g. Read/Write)
– Basic RAID functionality

Model Checking

• Assumptions
– Single RAID stripe
– Single storage error
– Single parity protection
– Data disks are interchangeable

• Apply error followed by any number of disk operations

• Generate state diagram with all data loss states
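The search described above can be illustrated with a toy exhaustive explorer. This is only a sketch under strong assumptions, not the authors' checker: blocks are small integers, the single error is a silent corruption of data disk 0, the only operations are writes with additive or subtractive parity updates, and the state labels and function names are invented for the example.

```python
from itertools import product

N_DATA = 3                      # data disks 0..2; index 3 holds parity

def parity(values):
    p = 0
    for v in values:
        p ^= v
    return p

def classify(disk, truth):
    """Label a stripe state by comparing disk contents with the intended data."""
    data, par = disk[:N_DATA], disk[N_DATA]
    if data == truth:
        return "Clean" if par == parity(truth) else "Parity Error"
    # Wrong data whose parity agrees with it can no longer be reconstructed.
    return "Polluted Parity" if par == parity(data) else "Disk Error"

def write(disk, truth, x, value, additive):
    """Write `value` to data disk x with an additive or subtractive parity update."""
    disk, truth = list(disk), list(truth)
    if additive:                # read the other data disks and recompute parity
        others = [disk[i] for i in range(N_DATA) if i != x]
        disk[N_DATA] = parity(others + [value])
    else:                       # subtractive: old parity ^ old data ^ new data
        disk[N_DATA] ^= disk[x] ^ value
    disk[x] = truth[x] = value
    return tuple(disk), tuple(truth)

def explore(max_ops=2):
    """Corrupt disk 0 once, then try every write sequence up to max_ops long."""
    truth = (1, 2, 4)
    disk = (8, 2, 4, parity(truth))      # single error: disk 0 silently corrupted
    ops = list(product(range(N_DATA), (True, False)))   # (target disk, additive?)
    found = {}
    for length in range(max_ops + 1):
        for seq in product(ops, repeat=length):
            d, t = disk, truth
            for step, (x, additive) in enumerate(seq):
                d, t = write(d, t, x, 16 << step, additive)
            found.setdefault(classify(d, t), seq)       # keep the first witness
    return found

states = explore()
```

In this toy model a single additive write to a different disk is enough to reach the polluted state, because the parity computation reads the corrupted block. An additive write to the corrupted disk itself happens to return the stripe to a clean state.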

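State Diagram Example

• Bare-bones RAID state diagram

[Figure: state diagram. States: Clean, Parity Error, Disk x Error, Corrupt Data, Polluted Parity. Errors Corrupt(p), Torn(p), Lost(p), Misdir(p) lead to Parity Error; errors Corrupt(x), Torn(x), Lost(x), Misdir(x) lead to Disk x Error. Operation labels on the remaining edges: Wadd(x+), Wsub(x+), R(x), Wadd(), W(x+), Wadd(!x).]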


Data Protection Design

• Need fault tolerance for all partial failures

• Bare-bones RAID handles latent sector errors and complete disk failures

• Corruption is next most common failure

• Add protections cumulatively until design has complete protection


Protections

Protections in red will be discussed in the talk

• Scrubbing
• Sector checksums
• Block checksums
• Parental checksums
• Write verify
• Physical identity
• Logical identity
• Version mirroring

Checksums

• Checksum per data block

• Checksum per sector

• Parent checksum
– Checksum stored in parent inode

[Figure: block A stored with cksum(A); sectors a1 and a2 stored with ck(a1) and ck(a2)]

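Checksum Example

• Corruption scenario is now fixed

[Figure: stripe with A, B, C and cksum(A), cksum(B), cksum(C) on Data 1, Data 2, Data 3, and P(ABC) with cksum(P) on Parity. C is corrupted (@#$%) while cksum(C) is intact. "Read file ABC" detects the mismatch, performs reconstruction of C from A, B and P(ABC), and returns a valid file to the user.]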
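The read path in this example can be sketched as follows. The sketch assumes CRC32 as the block checksum, XOR parity, and hypothetical 4-byte blocks; the talk does not specify a checksum function, and `read_block` is an illustrative name.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def read_block(stripe, sums, parity, i):
    """Return block i, rebuilding it from the rest of the stripe if its checksum fails."""
    block = stripe[i]
    if zlib.crc32(block) == sums[i]:
        return block
    others = [b for j, b in enumerate(stripe) if j != i]
    rebuilt = xor_blocks(others + [parity])
    if zlib.crc32(rebuilt) != sums[i]:
        raise IOError("data loss: reconstruction does not match checksum")
    return rebuilt

good = [b"AAAA", b"BBBB", b"CCCC"]
sums = [zlib.crc32(b) for b in good]      # cksum(A), cksum(B), cksum(C)
parity = xor_blocks(good)                 # P(ABC)

on_disk = [b"AAAA", b"BBBB", b"@#$%"]     # C silently corrupted, checksum intact
file_abc = b"".join(read_block(on_disk, sums, parity, i) for i in range(3))
```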

Checksum Problems

• Great for protecting against corruption errors

• Fails to protect when data and checksum are lost together:
– Lost write (with any type of checksums)
– Torn write (only with sector checksums)

• Parity pollution can occur

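Checksum Problems – Lost Write

• Block checksums

[Figure: stripe with A, B, C and their block checksums on Data 1, Data 2, Data 3, and P(ABC) with cksum(P) on Parity. "Overwrite C→C’" updates parity to P(ABC’), but the write to Data 3 is lost, leaving C with cksum(C). "Read file ABC’" passes the checksum test and returns corrupt data (ABC, with C instead of C’).]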
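A small sketch of why the block checksum cannot help here, assuming CRC32 and a checksum stored alongside the data in the same block (both are illustrative assumptions). Because the lost write drops the new data and the new checksum together, the stale block still verifies.

```python
import zlib

def make_block(data):
    """Store the checksum alongside the data, as a block checksum would be."""
    return {"data": data, "cksum": zlib.crc32(data)}

def verify(block):
    return zlib.crc32(block["data"]) == block["cksum"]

disk3 = make_block(b"C")            # Data 3 currently holds C with cksum(C)
new_block = make_block(b"C'")       # overwrite C -> C'

write_lost = True                   # disk reports success but drops the write
if not write_lost:
    disk3 = new_block

passes_check = verify(disk3)        # old data and old checksum still agree
returned = disk3["data"]            # stale C is handed back as if it were C'
```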

Write Verify

• Attempt to solve lost write problem

• Costly solution, expect good protection

• Procedure:
1. Write data to disk
2. Read back to verify
3. If lost write detected, write again or remap to new location

[Figure: "Overwrite C→C’" is lost, leaving C with cksum(C). Read back returns C, so the lost write is detected and C’ is written again. The block then holds C’ with cksum(C’). Success.]
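The procedure above can be sketched with a hypothetical in-memory disk that silently drops its first write. `FlakyDisk` and `write_verify` are names invented for this illustration, and the retry limit is an assumption.

```python
class FlakyDisk:
    """Hypothetical disk that silently drops the first `lose` writes."""

    def __init__(self, lose=1):
        self.blocks = {}
        self.lose = lose
        self.io_count = 0

    def write(self, addr, data):
        self.io_count += 1
        if self.lose > 0:
            self.lose -= 1          # lost write: success reported, nothing stored
            return True
        self.blocks[addr] = data
        return True

    def read(self, addr):
        self.io_count += 1
        return self.blocks.get(addr)

def write_verify(disk, addr, data, max_tries=3):
    """Write, read back, and rewrite until the data on disk matches."""
    for _ in range(max_tries):
        disk.write(addr, data)
        if disk.read(addr) == data:
            return True
    return False

disk = FlakyDisk(lose=1)
disk.blocks[3] = b"C"               # block 3 initially holds C
ok = write_verify(disk, 3, b"C'")
```

Even when nothing goes wrong, every write costs a write plus a read, which is the I/O overhead the talk refers to.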

Write Verify Problems

• Protects against lost writes

• Susceptible to misdirected writes
– Cannot detect/recover the overwritten data

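Write Verify – Misdirected Write

[Figure: two stripes across Data 1, Data 2, Data 3, Parity. One holds A, B, C with block checksums and P(ABC) with cksum(P); the other holds X, Y, Z and P(XYZ). Initially: "Overwrite X→X’" is misdirected onto the block holding A, which now contains X’ with cksum(X’), and the second stripe's parity becomes P(X’YZ). Read back of X returns the old X, so the write is treated as lost and X is re-written as X’. Later: "Read file ABC" reads X’ in place of A and returns corrupt data (A has been corrupted).]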

Physical Identity

• Protection against misdirected writes

• Store disk & block number of destination in each block

[Figure: each block stores (Data, Block Number): block 1 holds (A, 1) and block 2 holds (B, 2). "Overwrite Block 1: A → A’" is misdirected and lands on block 2 as (A’, 1). "Read Block 2" returns (A’, 1); the block number does not match (1≠2), so the misdirected write is detected.]
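A minimal sketch of the check, assuming each block stores a (data, block number) pair and modeling the disk as a dictionary. The `lands_on` parameter is a hypothetical hook used only to simulate the misdirected write.

```python
def write_block(disk, target, data, lands_on=None):
    """Write (data, target) to `target`; `lands_on` simulates a misdirected write."""
    actual = target if lands_on is None else lands_on
    disk[actual] = (data, target)        # identity names the intended destination

def read_block(disk, block_num):
    """Return the data only if the stored block number matches the one requested."""
    data, stored_num = disk[block_num]
    if stored_num != block_num:
        raise IOError(f"misdirected write detected ({stored_num} != {block_num})")
    return data

disk = {1: ("A", 1), 2: ("B", 2)}
write_block(disk, 1, "A'", lands_on=2)   # overwrite of block 1 lands on block 2

try:
    read_block(disk, 2)
    detected = False
except IOError:
    detected = True
```

The identity check catches the overwritten victim block. Block 1 still holds the old A with a matching block number, so the lost-write half of the misdirected write needs a separate protection.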

Problem Solved?

• Write verify with block checksums and physical identity offers complete protection

• But… twice the I/O cost!

• Need a more efficient solution

Logical Identity

• Less expensive protection against lost writes

• Store file identifier (e.g. inode number) in each data block

• Test that file identifier matches on a read

[Figure: a block holds A with cksum(A), tagged File 0. "Overwrite File 0 with File 1 (X)" is lost, so the block still holds A tagged File 0. "Read File 1": logical ID does not match, lost write detected.]
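The read-time test can be sketched as below, assuming each block stores a (data, file identifier) pair. The block address 7 and the function name are hypothetical.

```python
def read_file_block(disk, addr, file_id):
    """Return the data at addr only if it is tagged with the file being read."""
    data, stored_id = disk[addr]
    if stored_id != file_id:
        raise IOError(f"logical ID mismatch (File {stored_id} != File {file_id})")
    return data

disk = {7: ("A", 0)}                 # block 7 holds A, owned by File 0

# File 1 overwrites the block with X, but the write is lost.
write_lost = True
if not write_lost:
    disk[7] = ("X", 1)

try:
    read_file_block(disk, 7, file_id=1)
    detected = False
except IOError:
    detected = True
```

The check needs to know which file is being read, so it only runs on file reads and not on reads issued at the RAID level.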

Logical Identity Problem

• Cannot be verified when re-computing parity
– Not reading a file

• Parity pollution may occur

Parity Pollution Example

[Figure: stripe with A, B, C (each tagged File 0, with block checksums) on Data 1, Data 2, Data 3, and P(ABC) with cksum(P) on Parity.]

1. Write File 1: overwrite C→C’. Parity becomes P(ABC’), but the write to Data 3 is lost, so Data 3 still holds C tagged File 0.
2. Later… Write File 2: overwrite AB→A’B’ (tagged File 2). Data 3 is read to compute the new parity and returns C, so the new parity is P(A’B’C). Parity is now consistent with invalid data.
3. Later… Read File 1: logical ID mismatch (File 0 ≠ File 1). Reconstruction yields C again and the data looks consistent, so the system can only report data loss.

What should be on the disk: A’ (File 2), B’ (File 2), C’ (File 1), P(A’B’C’)
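The sequence above can be traced with XOR arithmetic. This is a sketch that uses small integers as stand-ins for block contents; the values and dictionary keys are arbitrary assumptions chosen so each block has a distinct bit.

```python
def xor(*values):
    result = 0
    for v in values:
        result ^= v
    return result

A, B, C = 1, 2, 4
A2, B2, C2 = 16, 32, 64              # stand-ins for A', B', C'

disk = {"d1": A, "d2": B, "d3": C, "p": xor(A, B, C)}

# Write File 1: C -> C'. Parity is updated to P(ABC'), but the data write is lost.
disk["p"] = xor(disk["p"], disk["d3"], C2)
# disk["d3"] is left holding the stale C.

# At this point C' is still recoverable from A, B and parity.
recoverable_before = xor(disk["d1"], disk["d2"], disk["p"]) == C2

# Later, write File 2: A, B -> A', B'. Additive parity reads Data 3, which is stale.
disk["d1"], disk["d2"] = A2, B2
disk["p"] = xor(A2, B2, disk["d3"])  # P(A'B'C): parity is now polluted

# Reconstruction of Data 3 now reproduces the stale C, not C'.
reconstructed = xor(disk["d1"], disk["d2"], disk["p"])
```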

Version Mirroring

• Lost write protection

• Verifiable at RAID level

• Store a version number in each data block

• Mirror the version numbers on parity disk

• Version numbers verified on read

[Figure: A, B, C each tagged Ver0 with block checksums; the parity block holds P(ABC), cksum(P) and the mirrored versions 0,0,0]

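Parity Pollution Solved

[Figure: stripe with A, B, C (each at Ver0, with block checksums) on Data 1, Data 2, Data 3, and P(ABC) with cksum(P) and mirrored versions 0,0,0 on Parity.]

1. Write File 1: overwrite C→C’. Parity becomes P(ABC’) with mirrored versions 0,0,1, but the write to Data 3 is lost, so Data 3 still holds C at Ver0.
2. Later… Write File 2: overwrite AB→A’B’. Reading Data 3 to compute the new parity reveals a version mismatch (Ver0 on the data disk, 1 on the parity disk), so Data 3 is reconstructed from A, B and P(ABC’), restoring C’ with cksum(C’) at Ver1.
3. The new parity is P(A’B’C’) with mirrored versions 1,1,1.

What should be on the disk: A’ (Ver1), B’ (Ver1), C’ (Ver1), P(A’B’C’)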
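A sketch of the same sequence with version mirroring, again using small integers for block contents. The field names, the helper functions, and the point at which versions are compared are illustrative assumptions rather than details from the talk.

```python
def xor(*values):
    result = 0
    for v in values:
        result ^= v
    return result

# Each data block carries a version; the parity block mirrors all three.
data = [{"val": 1, "ver": 0}, {"val": 2, "ver": 0}, {"val": 4, "ver": 0}]
par = {"val": xor(1, 2, 4), "vers": [0, 0, 0]}

def stale_blocks(data, par):
    """Indices of data blocks whose version disagrees with the mirrored copy."""
    return [i for i, blk in enumerate(data) if blk["ver"] != par["vers"][i]]

def reconstruct(data, par, i):
    """Rebuild block i from parity and the other data blocks."""
    others = [b["val"] for j, b in enumerate(data) if j != i]
    data[i] = {"val": xor(par["val"], *others), "ver": par["vers"][i]}

# Write File 1: C -> C' (64). Parity and mirrored version update; data write is lost.
par["val"] ^= data[2]["val"] ^ 64
par["vers"][2] = 1

# Later, write File 2: versions are checked before Data 3 feeds the parity update.
stale = stale_blocks(data, par)
for i in stale:
    reconstruct(data, par, i)

data[0] = {"val": 16, "ver": 1}      # A'
data[1] = {"val": 32, "ver": 1}      # B'
par["val"] = xor(*(b["val"] for b in data))
par["vers"] = [b["ver"] for b in data]
```

Because the version comparison needs only the data block and the parity block, it can run during the RAID-level read that feeds the parity update, which is where the logical identity check could not.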

Problem Solved… Efficiently

• Version mirroring with block checksums and physical identity provides complete protection

• Use with logical identity for efficiency

• More efficient than write verify

Conclusion

• Applied model checking on real system designs
– For all designs, a single error can cause data loss
– Parity pollution is a common problem
– Version mirroring is a key technique for offering complete and efficient data protection

• Partial failures are complex, no obvious data protection solution
– Model checking is useful

ADvanced Systems Laboratory
www.cs.wisc.edu/adsl

Advanced Technology Group
http://www.netapp.com/company/research/