© 2012 A. Datta & F. Oggier, NTU Singapore
Redundantly Grouped Cross-object Coding
for Repairable Storage
Anwitaman Datta & Frédérique Oggier, NTU Singapore
APSYS 2012, Seoul
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage
Distributed Storage Systems
What is this work about?
• Huge volume of data: scale-out is a fact of life
• Failures are inevitable!
• Fault-tolerance via redundancy (e.g. erasure coding) comes with overheads
• Over time, lost redundancy needs repairing
The story so far …
What is this work about? The story so far …
B1, B2, …, Bn: n encoded blocks under an (n,k) code
When a block Bx is lost:
• Retrieve some k'' blocks (k'' = 2 … n-1) to recreate the lost block Bx
• Re-insert in (new) storage devices, so that there are (again) n encoded blocks
• Design space
  – Repair fan-in k''
  – Data tx. per node
  – Overall data tx.
  – Storage per node
  – …
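The repair path above can be sketched with a toy (n=3, k=2) XOR code. This is a hypothetical stand-in for any MDS erasure code, chosen only because XOR keeps the sketch short; the names are illustrative:

```python
def bxor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Toy (n=3, k=2) code: B1, B2 are the data halves, B3 = B1 ^ B2 is redundancy.
B1, B2 = b"hello!", b"world."
B3 = bxor(B1, B2)

# Losing any single block Bx: retrieve k'' = 2 surviving blocks, recompute
# the lost one, and re-insert it on a (new) storage device.
lost = B2
recreated = bxor(B1, B3)   # B1 ^ (B1 ^ B2) = B2
assert recreated == lost
```

With a real (n,k) code the shape is the same: fetch k'' surviving blocks, decode/recompute the lost block, re-insert it, restoring n encoded blocks.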
Related works: a non-exhaustive list
An Overview of Codes Tailor-made for Networked Distributed Data Storage. Anwitaman Datta, Frédérique Oggier. arXiv:1109.2317
• Codes on codes, e.g. Hierarchical & Pyramid codes
• Locally repairable codes, e.g. Self-repairing codes
• Network coding, e.g. Regenerating codes
• Array codes
• …
Most of these works design new codes with inherent repairability properties.
This work: an engineering approach. Can we achieve good repairability using existing (mature) techniques? (Our solution is similar to "codes on codes".)
Separation of concerns
• Two distinct design objectives for distributed storage systems
  – Fault-tolerance
  – Repairability
• Related works: codes with inherent repairability properties
  – Achieve both objectives together
• There is nothing fundamentally wrong with that
  – E.g., we continue to work on self-repairing codes
• This work: an extremely simple idea
  – Introduce two different kinds of redundancy
    • Any (standard) erasure code, for fault-tolerance
    • RAID-4-like parity (across encoded pieces of different objects), for repairability
Redundantly Grouped Cross-object Coding (RGC)
[Figure: the RGC layout. Each of m objects is erasure coded into n pieces; e_ij denotes piece j of object i. The pieces form an m x n grid: row i holds the pieces of object i ("erasure coding of individual objects"), and each column j = 1 … n additionally carries a parity p_j computed over e_1j, e_2j, …, e_mj ("RAID-4 of erasure coded pieces of different objects").]
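The two layers of redundancy can be sketched in a few lines of Python. A toy XOR-based (3,2) code stands in for the per-object erasure code (any MDS code, e.g. Reed-Solomon, would do in practice); the cross-object RAID-4 parity is a plain XOR down each column. All names here are illustrative:

```python
from functools import reduce
from operator import xor

def bxor(*blocks):
    """XOR any number of equal-length byte strings."""
    return bytes(reduce(xor, t) for t in zip(*blocks))

def encode_object(data, k=2):
    """Toy (k+1, k) erasure code: split into k pieces, append one XOR parity.
    A stand-in for any (n, k) MDS code."""
    size = len(data) // k
    pieces = [data[i * size:(i + 1) * size] for i in range(k)]
    pieces.append(bxor(*pieces))          # redundancy piece
    return pieces

# m = 3 objects, each erasure coded into n = 3 pieces: coded[i][j] = e_ij
objects = [b"object", b"stored", b"safely"]
coded = [encode_object(o) for o in objects]

# RAID-4 parity across objects: p_j = e_1j ^ e_2j ^ ... ^ e_mj
parity = [bxor(*(coded[i][j] for i in range(len(coded))))
          for j in range(len(coded[0]))]

# Repairing a lost piece e_21 touches only its parity group (m pieces),
# not k pieces of object 2:
lost = coded[1][0]
repaired = bxor(parity[0], coded[0][0], coded[2][0])
assert repaired == lost
```

Fault-tolerance still comes from the per-object erasure code; the column parities exist purely to make single-piece repair cheap.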
RGC repairability
• Choosing a suitable m < k
  – Reduction in data transfer for repair
  – Repair fan-in disentangled from the base code parameter "k"
    • Large "k" may be desirable for faster (parallel) data access
    • Codes typically have trade-offs between repair fan-in, the code parameter "k" and the code's storage overhead (n/k)
• However: the gains from reduced fan-in are probabilistic
  – For i.i.d. failures with probability "f"
• Possible to reduce repair time
  – By pipelining data through the live nodes and computing partial parities
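The pipelining idea can be sketched as follows (a minimal simulation, with illustrative names): an accumulator is forwarded hop by hop through the live nodes of the parity group, and each node XORs in its local piece, so no single node's I/O, bandwidth or compute becomes a bottleneck.

```python
def bxor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def pipelined_repair(live_pieces):
    """live_pieces: the group parity p_j plus the m-1 surviving pieces.
    The accumulator visits each node once; the final value is the lost piece."""
    acc = bytes(len(live_pieces[0]))      # all-zero accumulator
    for piece in live_pieces:             # one hop per live node
        acc = bxor(acc, piece)            # node folds in its partial parity
    return acc

# Demo: a parity group with m = 3 pieces a, b, c and parity p = a ^ b ^ c.
a, b, c = b"AAAA", b"BBBB", b"CCCC"
p = bxor(bxor(a, b), c)
restored = pipelined_repair([p, a, c])    # piece b was lost
assert restored == b
```

Each hop only XORs two equal-length buffers, which is also why the per-repair computation is cheaper than a full decode/re-encode.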
Parameter “m” choice
• Smaller m: lower repair cost, larger storage overhead
• Is there an optimal choice of m? If so, how to determine it?
  – A rule of thumb, rationalized by supporting r simultaneous (multiple) repairs
  – E.g. for an (n=15, k=10) code: m < 5
• m = 3 or 4 implies
  – Repair bandwidth saving of 40-50% even for f = 0.1
    • Typically, in stable environments, f will be much smaller, and the relative repair gains much greater
  – Relatively low storage overhead of 2x or 1.875x (respectively)
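The overhead figures follow from a simple count (my derivation, not spelled out on the slide): each of m unit-size objects is stored as n pieces of size 1/k, and the n parity pieces (also size 1/k each) are shared by all m objects, giving a per-object overhead of (m+1)·n/(m·k):

```python
def rgc_overhead(n, k, m):
    """Per-object storage overhead of RGC: m objects, each erasure coded
    into n pieces of size 1/k, plus n shared cross-object parity pieces."""
    return (m + 1) * n / (m * k)

# (n=15, k=10): plain erasure coding alone would cost n/k = 1.5x
print(rgc_overhead(15, 10, 3))   # 2.0
print(rgc_overhead(15, 10, 4))   # 1.875
```

This reproduces the 2x (m=3) and 1.875x (m=4) figures quoted above; larger m amortizes the parity row over more objects, at the price of a bigger repair group.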
Further discussions
• Possibility to localize repair traffic
  – Within a storage rack, by placing a whole parity group in the same rack
  – Without introducing any correlated failures of pieces of the same object
• Many unexplored issues
  – Soft errors (flipped bits)
  – Object update, deletion, …
  – Non-i.i.d./correlated failures
Concluding remarks
• RAID-4 parity of erasure encoded pieces of multiple objects
  – Lowers the cost of data transfer for a repair
  – Reduces repair fan-in
  – Possibility to localize repairs (and save precious interconnect BW)
    • w/o introducing correlated failures w.r.t. a single object
  – Pipelining the repair traffic helps realize very fast repairs
    • Since the repairing node's I/O, bandwidth or compute does not become a bottleneck
    • Also, the computations for repair are cheaper than decoding/encoding
  – Retains comparable storage overhead for comparable static resilience relative to using erasure coding alone (surprisingly so!)
    • At least for the specific code parameter choices we tried
• Opens up many interesting questions that can be investigated experimentally as well as theoretically
http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage