© 2012 A. Datta & F. Oggier, NTU Singapore
Redundantly Grouped Cross-object Coding
for Repairable Storage
Anwitaman Datta & Frédérique Oggier, NTU Singapore
APSYS 2012, Seoul
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
Distributed Storage Systems
What is this work about?
• Huge volume of data: scale-out is a fact of life
• Failures are inevitable!
• Fault-tolerance via redundancy (e.g. erasure coding) comes with overheads
• Over time, lost redundancy needs repairing
The story so far …
What is this work about? The story so far …
B1, B2, …, Bn: n encoded blocks under an (n,k) code
When a block Bx is lost:
• Retrieve some k'' blocks (k'' = 2 … n-1) to recreate the lost block Bx
• Re-insert in (new) storage devices, so that there are (again) n encoded blocks
• Design space
  – Repair fan-in k''
  – Data tx. per node
  – Overall data tx.
  – Storage per node
  – …
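The repair path above can be sketched with a toy (n=3, k=2) XOR code. This is a hypothetical stand-in for any MDS erasure code, chosen only because XOR keeps the sketch short; the names are illustrative:

```python
def bxor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Toy (n=3, k=2) code: B1, B2 are the data halves, B3 = B1 ^ B2 is redundancy.
B1, B2 = b"hello!", b"world."
B3 = bxor(B1, B2)

# Losing any single block Bx: retrieve k'' = 2 surviving blocks, recompute
# the lost one, and re-insert it on a (new) storage device.
lost = B2
recreated = bxor(B1, B3)   # B1 ^ (B1 ^ B2) = B2
assert recreated == lost
```

With a real (n,k) code the shape is the same: fetch k'' surviving blocks, decode/recompute the lost block, re-insert it, restoring n encoded blocks.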
Related works: a non-exhaustive list
An Overview of Codes Tailor-made for Networked Distributed Data Storage. Anwitaman Datta, Frédérique Oggier. arXiv:1109.2317
• Codes on codes, e.g. Hierarchical & Pyramid codes
• Locally repairable codes, e.g. Self-repairing codes
• Network coding, e.g. Regenerating codes
• Array codes
• …
Most of these works design new codes with inherent repairability properties.
This work: an engineering approach. Can we achieve good repairability using existing (mature) techniques? (Our solution is similar to "codes on codes".)
Separation of concerns
• Two distinct design objectives for distributed storage systems
  – Fault-tolerance
  – Repairability
• Related works: codes with inherent repairability properties
  – Achieve both objectives together
• There is nothing fundamentally wrong with that
  – E.g., we continue to work on self-repairing codes
• This work: an extremely simple idea
  – Introduce two different kinds of redundancy
    • Any (standard) erasure code, for fault-tolerance
    • RAID-4-like parity (across encoded pieces of different objects), for repairability
Redundantly Grouped Cross-object Coding (RGC)
[Figure: the RGC layout. Each of m objects is erasure coded into n pieces; e_ij denotes piece j of object i. The pieces form an m x n grid: row i holds the pieces of object i ("erasure coding of individual objects"), and each column j = 1 … n additionally carries a parity p_j computed over e_1j, e_2j, …, e_mj ("RAID-4 of erasure coded pieces of different objects").]
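The two layers of redundancy can be sketched in a few lines of Python. A toy XOR-based (3,2) code stands in for the per-object erasure code (any MDS code, e.g. Reed-Solomon, would do in practice); the cross-object RAID-4 parity is a plain XOR down each column. All names here are illustrative:

```python
from functools import reduce
from operator import xor

def bxor(*blocks):
    """XOR any number of equal-length byte strings."""
    return bytes(reduce(xor, t) for t in zip(*blocks))

def encode_object(data, k=2):
    """Toy (k+1, k) erasure code: split into k pieces, append one XOR parity.
    A stand-in for any (n, k) MDS code."""
    size = len(data) // k
    pieces = [data[i * size:(i + 1) * size] for i in range(k)]
    pieces.append(bxor(*pieces))          # redundancy piece
    return pieces

# m = 3 objects, each erasure coded into n = 3 pieces: coded[i][j] = e_ij
objects = [b"object", b"stored", b"safely"]
coded = [encode_object(o) for o in objects]

# RAID-4 parity across objects: p_j = e_1j ^ e_2j ^ ... ^ e_mj
parity = [bxor(*(coded[i][j] for i in range(len(coded))))
          for j in range(len(coded[0]))]

# Repairing a lost piece e_21 touches only its parity group (m pieces),
# not k pieces of object 2:
lost = coded[1][0]
repaired = bxor(parity[0], coded[0][0], coded[2][0])
assert repaired == lost
```

Fault-tolerance still comes from the per-object erasure code; the column parities exist purely to make single-piece repair cheap.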
RGC repairability
• Choosing a suitable m < k
  – Reduction in data transfer for repair
  – Repair fan-in disentangled from the base code parameter "k"
    • Large "k" may be desirable for faster (parallel) data access
    • Codes typically have trade-offs between repair fan-in, the code parameter "k" and the code's storage overhead (n/k)
• However: the gains from reduced fan-in are probabilistic
  – For i.i.d. failures with probability "f"
• Possible to reduce repair time
  – By pipelining data through the live nodes and computing partial parities
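The pipelining idea can be sketched as follows (a minimal simulation, with illustrative names): an accumulator is forwarded hop by hop through the live nodes of the parity group, and each node XORs in its local piece, so no single node's I/O, bandwidth or compute becomes a bottleneck.

```python
def bxor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_repair(live_pieces):
    """live_pieces: the group parity p_j plus the m-1 surviving pieces.
    The accumulator visits each node once; the final value is the lost piece."""
    acc = bytes(len(live_pieces[0]))      # all-zero accumulator
    for piece in live_pieces:             # one hop per live node
        acc = bxor(acc, piece)            # node folds in its partial parity
    return acc

# Demo: a parity group with m = 3 pieces a, b, c and parity p = a ^ b ^ c.
a, b, c = b"AAAA", b"BBBB", b"CCCC"
p = bxor(bxor(a, b), c)
restored = pipelined_repair([p, a, c])    # piece b was lost
assert restored == b
```

Each hop only XORs two equal-length buffers, which is also why the per-repair computation is cheaper than a full decode/re-encode.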
Parameter “m” choice
• Smaller m: lower repair cost, larger storage overhead
• Is there an optimal choice of m? If so, how to determine it?
  – A rule of thumb, rationalized by supporting r simultaneous (multiple) repairs
  – E.g. for an (n=15, k=10) code: m < 5
• m = 3 or 4 implies
  – Repair bandwidth saving of 40-50% even for f = 0.1
    • Typically, in stable environments, f will be much smaller, and the relative repair gains much greater
  – Relatively low storage overhead of 2x or 1.875x (respectively)
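The overhead figures follow from a simple count (my derivation, not spelled out on the slide): each of m unit-size objects is stored as n pieces of size 1/k, and the n parity pieces (also size 1/k each) are shared by all m objects, giving a per-object overhead of (m+1)·n/(m·k):

```python
def rgc_overhead(n, k, m):
    """Per-object storage overhead of RGC: m objects, each erasure coded
    into n pieces of size 1/k, plus n shared cross-object parity pieces."""
    return (m + 1) * n / (m * k)

# (n=15, k=10): plain erasure coding alone would cost n/k = 1.5x
print(rgc_overhead(15, 10, 3))   # 2.0
print(rgc_overhead(15, 10, 4))   # 1.875
```

This reproduces the 2x (m=3) and 1.875x (m=4) figures quoted above; larger m amortizes the parity row over more objects, at the price of a bigger repair group.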
Further discussions
• Possibility to localize repair traffic
  – Within a storage rack, by placing a whole parity group in the same rack
  – Without introducing any correlated failures of pieces of the same object
• Many unexplored issues
  – Soft errors (flipped bits)
  – Object update, deletion, …
  – Non-i.i.d./correlated failures
Concluding remarks
• RAID-4 parity of erasure encoded pieces of multiple objects
  – Lowers the cost of data transfer for a repair
  – Reduces repair fan-in
  – Possibility to localize repairs (and save precious interconnect BW)
    • w/o introducing correlated failures w.r.t. a single object
  – Pipelining the repair traffic helps realize very fast repairs
    • Since the repairing node's I/O, bandwidth or compute does not become a bottleneck
    • Also, the computations for repair are cheaper than decoding/encoding
  – Retains comparable storage overhead for comparable static resilience relative to using erasure coding alone (surprisingly so!)
    • At least for the specific code parameter choices we tried
• Opens up many interesting questions that can be investigated experimentally as well as theoretically
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage