Erasure Code with Shingled Local Parity Groups for Efficient Recovery from Multiple Disk Failures

Takeshi Miyamae, Takanori Nakao, Kensuke Shiozawa

Fujitsu Laboratories Ltd.

October 5th, 2014 (HotDep’14)

Copyright 2014 FUJITSU LABORATORIES LIMITED

Shingled Erasure Code (SHEC) at HotDep'14



Contents

1. Background and Our Proposal

2. SHEC's Theoretical Analysis

3. SHEC's Experimental Evaluation

4. Summary

1. Background and Our Proposal

Background (1)

Erasure codes for content data

Content data for ICT services is ever-growing

Demand for higher space efficiency and durability

Reed Solomon code (the de facto standard erasure code) improves both

[Figure: Triple Replication (old style) stores the content data plus two copies, using 3x space. Reed Solomon code stores the content data plus parity chunks, using 1.5x space.]

However, Reed Solomon code is not very recovery-efficient

Background (2)

Local parity improves recovery efficiency

Data recovery should be as efficient as possible

• in order to avoid multiple disk failures and data loss

Reed Solomon code is improved by local parity methods

• data read from disks is reduced during recovery

[Figure: data chunks and parity chunks read from disks during recovery, for Reed Solomon code (no local parities) and for a local parity method (with local parities)]

However, multiple disk failures are not taken into consideration

Our Goal

Local parity method for multiple disk failures

Existing methods are optimized for single disk failure

• e.g. Microsoft MS-LRC, Facebook Xorbas

However, their recovery overhead is large in case of multiple disk failures

• because they may have to use global parities for recovery

[Figure: a local parity method under multiple disk failures]

Our goal is a method that handles multiple disk failures efficiently

Our Proposed Method (SHEC)

SHEC (= Shingled Erasure Code)

An erasure code with local parity groups only

• to improve recovery efficiency in case of multiple disk failures

The calculation ranges of local parities are shifted and partly overlap with each other (like the shingles on a roof)

• to keep enough durability

[Figure: SHEC layout with k = 10 data chunks, m = 6 parity chunks, and calculation range l = 5]
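The shingled layout for the k = 10, m = 6, l = 5 example can be sketched as follows. The slides give only k, m and l; the start offset of each calculation range (here the ceiling of i·k/m, with wrap-around at k) and the helper name `shec_ranges` are assumptions made for illustration, not the authors' stated construction.

```python
def shec_ranges(k, m, l):
    """Return, for each parity chunk, the set of 0-based data-chunk indices it covers.

    Assumption (not from the slides): parity i starts at ceil(i * k / m) and
    covers l consecutive data chunks, wrapping around at k.
    """
    starts = [(i * k + m - 1) // m for i in range(m)]  # integer ceiling of i*k/m
    return [{(s + j) % k for j in range(l)} for s in starts]


ranges = shec_ranges(10, 6, 5)
for i, r in enumerate(ranges, start=1):
    print(f"P{i} covers", ", ".join(f"D{d + 1}" for d in sorted(r)))

# Number of parity chunks covering each data chunk under this assumed layout.
coverage = [sum(d in r for r in ranges) for d in range(10)]
print("coverage per data chunk:", coverage)
```

Under this assumed layout every data chunk is covered by the same number of parity chunks, which is what the overlapping (shingled) ranges are meant to achieve.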

2. SHEC's Theoretical Analysis


Erasure Code’s Properties

We picked three properties of erasure codes for SHEC’s theoretical analysis

Space Efficiency: The ratio of user data

Durability: Probability of Data Loss (PDL)

Recovery Efficiency: The ratio of data read during recovery

Three-Way Trade-Off

The properties satisfy a three-way trade-off relationship
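As a small worked example of the first property, space efficiency can be computed from the chunk counts. This sketch assumes that "the ratio of user data" means k / (k + m) for k data chunks and m parity chunks, and that triple replication can be modelled as one data chunk plus two copies; the function names are illustrative only.

```python
def space_efficiency(k, m):
    """Fraction of stored chunks that hold user data (assumed definition: k / (k + m))."""
    return k / (k + m)


def space_overhead(k, m):
    """Stored size relative to user data size, e.g. 3.0 means '3x space'."""
    return (k + m) / k


# Triple replication modelled as 1 data chunk + 2 copies (an assumption).
print("Triple replication:", space_overhead(1, 2), "x space")
# RAID6 = RS(4,2): 4 data chunks + 2 parity chunks.
print("RS(4,2):", space_overhead(4, 2), "x space")
# SHEC(10,6,5): 10 data chunks + 6 parity chunks.
print("SHEC(10,6,5) space efficiency:", space_efficiency(10, 6))
```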

SHEC’s Recovery Efficiency

High Recovery Efficiency from Multiple Disk Failures

The amount of data read from disks is minimized

• e.g. when D6 and D9 fail, SHEC selects P3 and P4 for recovery

[Figure: P3 and P4 form the minimum union of calculation ranges that includes D6 and D9; chunks outside that union do not need to be read]

Recovery efficiency is one of the biggest features of SHEC
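The parity selection in the D6/D9 example can be illustrated with a brute-force search. This is a sketch under stated assumptions, not the authors' algorithm: the layout (range i starts at the ceiling of i·k/m and wraps around) is assumed, only data chunks are taken to have failed, and a parity subset is treated as usable when every failed chunk can be paired with its own covering parity. The names `shec_ranges` and `pick_parities` are hypothetical.

```python
from itertools import combinations, permutations


def shec_ranges(k, m, l):
    """Assumed layout: parity i covers l data chunks starting at ceil(i*k/m), wrapping at k."""
    starts = [(i * k + m - 1) // m for i in range(m)]
    return [{(s + j) % k for j in range(l)} for s in starts]


def pick_parities(ranges, failed):
    """Return (parity indices, union of their ranges) with the smallest union.

    Tries every subset with as many parities as failed data chunks and keeps
    those where each failed chunk can be matched to a distinct covering parity.
    """
    best = None
    for combo in combinations(range(len(ranges)), len(failed)):
        matchable = any(
            all(d in ranges[p] for d, p in zip(failed, perm))
            for perm in permutations(combo)
        )
        if not matchable:
            continue
        union = set().union(*(ranges[p] for p in combo))
        if best is None or len(union) < len(best[1]):
            best = (combo, union)
    return best


failed = [5, 8]  # 0-based indices of D6 and D9
combo, union = pick_parities(shec_ranges(10, 6, 5), failed)
chunks_read = len(union - set(failed)) + len(combo)
print("parities:", [f"P{p + 1}" for p in combo])
print("chunks read:", chunks_read)
```

With this assumed layout the search picks P3 and P4, matching the example on the slide, and reads fewer chunks than the k = 10 chunks a Reed Solomon decode would need.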

Comparison with Other Methods

SHEC is expected to recover more efficiently than the other methods in case of multiple disk failures

Other methods: Reed Solomon, MS-LRC and Xorbas

[Figure: comparison with the other methods under multiple disk failures]

SHEC’s Durability

Durability Estimator (= ml/k)

Indicates how many disk failures can be tolerated

Therefore, ml/k + 1 disk failures can cause data loss

• e.g. the durability estimator of SHEC(10,6,5) is three. Therefore, the four failures D1/P1/P5/P6 cause data loss, because D1 cannot be recovered from the remaining chunks

[Figure: SHEC layout with k = 10, m = 6, l = 5; durability estimator ml/k = 6 × 5 / 10 = 3]
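The durability estimator and the D1/P1/P5/P6 example can be checked with a short sketch. The layout used here (range i starts at the ceiling of i·k/m and wraps around) is an assumption, since the slides do not state the offsets, and the function names are illustrative.

```python
def shec_ranges(k, m, l):
    """Assumed layout: parity i covers l data chunks starting at ceil(i*k/m), wrapping at k."""
    starts = [(i * k + m - 1) // m for i in range(m)]
    return [{(s + j) % k for j in range(l)} for s in starts]


def durability_estimator(k, m, l):
    """ml/k: the average number of parity chunks covering each data chunk."""
    return m * l / k


k, m, l = 10, 6, 5
estimator = durability_estimator(k, m, l)
print("durability estimator:", estimator)

# Parity chunks whose calculation range includes D1 (0-based index 0).
covering_d1 = [i for i, r in enumerate(shec_ranges(k, m, l)) if 0 in r]
print("D1 is covered by:", [f"P{i + 1}" for i in covering_d1])

# If D1 and every parity covering it fail, no surviving parity involves D1,
# so ml/k + 1 failures are enough to lose data.
failures_needed = len(covering_d1) + 1
print("failures that lose D1:", failures_needed)
```

Under this assumed layout D1 is covered by P1, P5 and P6, so losing those three parities together with D1 leaves nothing from which D1 could be rebuilt.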

Property Map of Reed Solomon code

Upper area becomes sparse

Reed Solomon code has few recovery-efficient layouts

[Figure: property map of Reed Solomon code. Axes: Space Efficiency and Recovery Efficiency; Durability (PDL) scale from 1e-0 to 1e-44. RAID6 = RS(4,2) is marked; the upper area is sparse.]

Property Map of SHEC

Upper area is filled with SHEC-specific layouts

SHEC provides many recovery-efficient layouts

[Figure: property map of SHEC. Axes: Space Efficiency and Recovery Efficiency; Durability (PDL) scale from 1e-0 to 1e-44. RAID6 = RS(4,2) and SHEC(6,5,2) are marked; the upper area is dense.]

SHEC is more adjustable than Reed Solomon code

Comparison with MS-LRC (1)

Single disk failure case

MS-LRC is plotted farther from the origin (= superior)

SHEC is plotted in a broader area (= more flexible)

(conditions: 16 OSDs)

[Figure: two property maps, SHEC and MS-LRC emulation. Axes: Space Efficiency and Recovery Efficiency, with durability also indicated.]

Comparison with MS-LRC (2)

Double disk failures case

Both are plotted at the same distance from the origin

SHEC is plotted in a broader area (= more flexible)

(conditions: 16 OSDs)

[Figure: two property maps, MS-LRC emulation and SHEC. Axes: Space Efficiency and Recovery Efficiency, with durability also indicated.]

3. SHEC's Experimental Evaluation


SHEC’s Implementation on Ceph

SHEC is implemented as an erasure code plugin of Ceph, an open source scalable object storage system

[Figure: 4 MB objects are split into data/parity chunks and distributed over OSDs; the encode/decode logic (SHEC plugin) is separated from the main part of Ceph]

Experiment on Recovery Efficiency

Experiment Overview

Test items: Recovery completion time / Resource profiles

Failure degree: Double disk failures

Comparison: Reed Solomon RS(6,4) / SHEC(6,4,3)

Hardware and Software Setup

Recovery Completion Time

SHEC’s recovery completion time was 18.6% faster

On the other hand, the total amount of data read from disks decreased by 26% (= theoretical improvement)

[Figure: recovery completion time; SHEC is 18.6% faster]

Why were these figures not the same?

The Reason (= Disk utilization)

Disks were only partly (65%) bottlenecked

[Figure: disk utilization; 65% bottlenecked time ratio]

There is 35% room for recovery time improvement

4. Summary


Summary

1. We proposed Shingled Erasure Code (SHEC)

SHEC is recovery-efficient, especially in case of multiple disk failures

2. We found SHEC is more adjustable than Reed Solomon code

because SHEC provides many recovery-efficient layouts, including Reed Solomon codes

3. We confirmed SHEC’s recovery efficiency in an experiment

SHEC’s recovery time was 18.6% faster than Reed Solomon code in case of double disk failures