
my presentation of the paper "FAST'12 NCCloud"


1. Background – 2. Solution – 3. Evaluation – 4. Conclusion


FAST’12
NCCloud: Applying Network Coding for the Storage Repair in a Cloud-of-Clouds

Yuchong Hu, Henry C. H. Chen, Patrick P. C. Lee, Yang Tang

presented by Shuai YUAN

NTHU LSA lab

presented by Shuai YUAN FAST’12: NCCloud 1/20


.. Outline

Background
- Background
- Erasure Codes
- Regenerating Codes

Solution
- the Paper’s Contributions
- F-MSR Code

Evaluation
- Repair Traffic Analysis
- Cost Analysis
- Experiments

Conclusion


.. Background

Cloud storage is an emerging service model for remote backup and data synchronization.

Single-cloud storage raises concerns:
- Cloud outage
- Vendor lock-in: costly to switch cloud providers


.. Proposed Solution

Multi-cloud storage

Deploy a proxy between users and multiple clouds

Stripe data across multiple clouds


Problem: Multi-cloud storage repair
The proxy reads the essential data pieces from other surviving clouds, reconstructs new data pieces, and writes these new pieces to a new cloud.


Table: Monthly price plans (in US dollars) for Amazon S3 (US Standard), Rackspace Cloud Files and Windows Azure Storage, as of September 2011.

                      Amazon S3   Rackspace   Azure
Storage (per GB)      $0.14       $0.15       $0.15

Replication is expensive!
e.g. To achieve double-fault tolerance, we need 2 extra replicas (3 copies in total). ⇒ Use erasure codes instead.


.. Erasure Codes

Given a message/file of k equal-size native chunks, encode it into n code chunks (n > k).

Optimal erasure codes: any k out of the n code chunks are sufficient to recover the original message/file.
- i.e. can tolerate arbitrary (n − k) failures.
- We call this the (n, k) MDS (maximum distance separable) property.
- e.g. Reed-Solomon Codes [6]

Near-optimal erasure codes: require (1 + ε)k code chunks to recover (ε > 0).
- i.e. cannot tolerate arbitrary (n − k) failures.
- e.g. Local Reconstruction Codes (used in Windows Azure Storage System) [4]


.. Reed-Solomon Codes (RAID-6) [6]

Achieve double-fault tolerance:
- Reed-Solomon Codes: additional data size = M
- Replication: additional data size = 2M (2 replicas)


How do Reed-Solomon Codes work?
Given file F: divide it into k equal-size native chunks: $F = [F_i]_{i=1,2,\ldots,k}$. Encode them into (n − k) code chunks: $C = [C_i]_{i=1,2,\ldots,n-k}$.
Use the encoding matrix $EM_{(n-k)\times k}$ to produce the code chunks:

$$C^T = EM \times F^T$$

$C_i$ is a linear combination of $F_1, F_2, \ldots, F_k$.


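In our case,

$$F = \begin{pmatrix} A & B \end{pmatrix}, \qquad EM = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

$$C^T = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} A + B \\ A + 2B \end{pmatrix}$$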


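We can rewrite it in another equivalent representation:
Let $P = [P_i]_{i=1,2,\ldots,n} = [F_1, F_2, \ldots, F_k, C_1, C_2, \ldots, C_{n-k}]$ be the n chunks in storage, and let $EM' = \begin{pmatrix} I \\ EM \end{pmatrix}$, where I is the k × k identity matrix:

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

then

$$P^T = EM' \times F^T = \begin{pmatrix} F^T \\ C^T \end{pmatrix}$$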


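In our case,

$$EM' = \begin{pmatrix} I \\ EM \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}$$

$$P^T = EM' \times F^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \end{pmatrix} \times \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} A \\ B \\ A + B \\ A + 2B \end{pmatrix}$$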
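The four stored chunks A, B, A + B, A + 2B can be tried out in a few lines. The following is only a minimal sketch: the function names, the byte-wise layout and the reduction polynomial 0x11D for GF(2^8) are assumptions made for illustration, not something the slides or the NCCloud code prescribe. It encodes two chunks and rebuilds them from any two of the four stored chunks.

```python
# Sketch of the (n = 4, k = 2) example; all names here are illustrative.

def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); 0x11D is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce back into 8 bits
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force (the field has only 256 elements)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

# EM' from the slide: the 2x2 identity stacked on top of EM.
EM_PRIME = [[1, 0], [0, 1], [1, 1], [1, 2]]

def encode(a: bytes, b: bytes) -> list[bytes]:
    """Return the four stored chunks [A, B, A+B, A+2B], computed byte by byte."""
    return [bytes(gf_mul(row[0], x) ^ gf_mul(row[1], y) for x, y in zip(a, b))
            for row in EM_PRIME]

def recover(i: int, pi: bytes, j: int, pj: bytes) -> tuple[bytes, bytes]:
    """Rebuild (A, B) from stored chunks i and j by inverting a 2x2 submatrix."""
    (p, q), (r, s) = EM_PRIME[i], EM_PRIME[j]
    det_inv = gf_inv(gf_mul(p, s) ^ gf_mul(q, r))   # subtraction is also XOR
    a = bytes(gf_mul(det_inv, gf_mul(s, x) ^ gf_mul(q, y)) for x, y in zip(pi, pj))
    b = bytes(gf_mul(det_inv, gf_mul(r, x) ^ gf_mul(p, y)) for x, y in zip(pi, pj))
    return a, b

chunks = encode(b"hello", b"world")
print(recover(2, chunks[2], 3, chunks[3]))   # rebuilt from the two code chunks only
```

Because every 2 × 2 submatrix of EM′ is invertible in this field, `recover` works for any pair of chunk indices, which is exactly the double-fault tolerance of the example.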


Why do Reed-Solomon Codes have the MDS property?
$EM'$ is the key to the MDS property!


Theorem (necessary and sufficient condition)
The code has the MDS property if and only if every possible k × k submatrix obtained by removing (n − k) rows from $EM'$ has full rank.
Equivalent expressions of full rank:
- rank = k
- non-singular
- ...

Alternative view:
Consider the linear space spanned by $P = [P_i]_{i=1,2,\ldots,n} = [F_1, F_2, \ldots, F_k, C_1, C_2, \ldots, C_{n-k}]$: its dimension is k, and any k out of the n vectors form a basis of the linear space.
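The condition in the theorem can be checked mechanically by enumerating all k-row submatrices. The sketch below does this over GF(2^8); the helper names and the reduction polynomial 0x11D are assumptions for illustration, and the brute-force enumeration is only practical for small n.

```python
from itertools import combinations

def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); 0x11D is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse by brute force over the 255 non-zero elements."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def gf_rank(rows: list[list[int]]) -> int:
    """Rank of a matrix over GF(2^8), computed by Gauss-Jordan elimination."""
    m = [row[:] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = gf_inv(m[rank][col])
        m[rank] = [gf_mul(inv, v) for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                factor = m[r][col]
                m[r] = [v ^ gf_mul(factor, w) for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_mds(em_prime: list[list[int]], k: int) -> bool:
    """True if every choice of k rows of EM' has full rank k."""
    return all(gf_rank([em_prime[i] for i in rows]) == k
               for rows in combinations(range(len(em_prime)), k))

print(is_mds([[1, 0], [0, 1], [1, 1], [1, 2]], 2))   # the example matrix from the slides
print(is_mds([[1, 0], [0, 1], [1, 1], [1, 1]], 2))   # two identical code rows
```

The second call uses a deliberately broken matrix: its two code rows are identical, so losing both native chunks leaves a rank-1 system and the check fails.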


Reed-Solomon Codes use a Vandermonde matrix V as EM:

$$V = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & k \\ 1^2 & 2^2 & 3^2 & \cdots & k^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1^{n-k-1} & 2^{n-k-1} & 3^{n-k-1} & \cdots & k^{n-k-1} \end{pmatrix}$$

So the matrix $EM'$ is:

$$EM' = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 3 & \cdots & k \\ 1^2 & 2^2 & 3^2 & \cdots & k^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1^{n-k-1} & 2^{n-k-1} & 3^{n-k-1} & \cdots & k^{n-k-1} \end{pmatrix}$$


Remark:
- All arithmetic operations are performed in a Galois Field $GF(2^x)$, so every number is less than $2^x$.
- For more details on Galois Fields, please refer to [7] and an implementation tutorial [5].
- $EM'$ satisfies the MDS property.


.. Problem of Traditional Erasure Codes (e.g. R-S Codes)


Definition (Repair traffic)
The amount of outbound data being read from other surviving clouds during single-cloud failure recovery.

Why do we only consider outbound traffic without taking inbound traffic into account?
Because inbound data transfer is free in all three price plans, while outbound transfer is charged:

                             Amazon S3   Rackspace   Azure
Data transfer in (per GB)    free        free        free
Data transfer out (per GB)   $0.12       $0.18       $0.15


Problem of Reed-Solomon Codes
To repair a single failed node, we need to download an amount of data equal to the size of the whole file. ⇒ Repair traffic is high!


.. Regenerating Codes [1] [2] [3]

Goal:
Minimize repair traffic while maintaining the same fault tolerance as erasure codes ⇒ recover faster than erasure codes

Idea:
- Do not reconstruct the whole file as in erasure codes
- Instead, read only the chunks (smaller than the whole file) that are needed to recover the lost chunks

Build on network coding:
- network bandwidth is a more critical resource than disk access.
- encode chunks in storage nodes


Problem of Regenerating Codes
In our case, the storage nodes provided by cloud vendors:
- Support read/write
- Do not have encoding/decoding capabilities

Next we will see how the NCCloud paper solves these problems.


.. the Paper’s Contributions

Build NCCloud, a proxy-based storage system that applies regenerating codes in multiple-cloud storage

Design goals:
- Propose an implementable design of the functional minimum-storage regenerating (F-MSR) code
- Support basic read/write operations and the repair function
- Preserve the storage overhead of MDS codes, while reducing repair traffic

Implement and evaluate NCCloud in a real storage setting
- focus on double-fault tolerance (k = n − 2)
- focus on single-fault recovery


.. F-MSR Key Idea

- non-systematic: don’t keep the original data as systematic codes do.
- functional repair: don’t need to exactly regenerate the failed chunks; only require that the repaired system maintains the MDS property.
- Let the proxy do the encoding/decoding.


.. File Upload

Given file F: divide it into k(n − k) equal-size native chunks: $F = [F_i]_{i=1,2,\ldots,k(n-k)}$. Encode them into n(n − k) code chunks: $P = [P_i]_{i=1,2,\ldots,n(n-k)}$.


Upload process:

- Encode
  Produce the encoding matrix (use a Vandermonde matrix):
  $EM = [ECV_1, ECV_2, \ldots, ECV_{n(n-k)}] = [\alpha_{i,j}]_{n(n-k) \times k(n-k)}$

  Then produce the code chunks:

  $$P^T = EM \times F^T$$

  For $i = 1, 2, \ldots, n(n-k)$:

  $$P_i = ECV_i \times F^T = \sum_{j=1}^{k(n-k)} \alpha_{i,j} F_j$$

  $ECV_i$: encoding coefficient vector of $P_i$.
  All arithmetic operations are in the Galois Field $GF(2^8)$.

- Duplicate the encoding matrix to all nodes as metadata.


.. File Download

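Download process:
- Download all the chunks from any k of the n clouds
- Multiply the inverse of the encoding matrix (restricted to the rows of the k(n − k) downloaded chunks, so that it is square) by the downloaded chunks:

$$F^T = EM^{-1} \times P^T$$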
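The upload and download steps can be sketched end to end for n = 4, k = 2. This is a minimal sketch under stated assumptions, not NCCloud's implementation: the function names, the zero padding, the reduction polynomial 0x11D, and the particular Vandermonde layout (row i is built from the distinct point i + 1) are all choices made for illustration. The repair procedure and its checks are not covered.

```python
N, K = 4, 2
PER_NODE = N - K            # code chunks stored on each node
NATIVE = K * (N - K)        # number of native chunks
CODE = N * (N - K)          # number of code chunks

def gf_mul(a: int, b: int, poly: int = 0x11D) -> int:
    """Multiply two bytes in GF(2^8); 0x11D is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def gf_pow(a: int, e: int) -> int:
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

# Row i is ECV_i = [x^0, x^1, ..., x^(NATIVE-1)] with x = i + 1 (assumed layout).
EM = [[gf_pow(i + 1, j) for j in range(NATIVE)] for i in range(CODE)]

def combine(coeffs: list[int], chunks: list[bytes]) -> bytes:
    """Byte-wise linear combination sum_j coeffs[j] * chunks[j] over GF(2^8)."""
    out = bytearray(len(chunks[0]))
    for c, chunk in zip(coeffs, chunks):
        for pos, byte in enumerate(chunk):
            out[pos] ^= gf_mul(c, byte)
    return bytes(out)

def upload(data: bytes) -> list[list[bytes]]:
    """Split into native chunks, encode with EM, group code chunks per node."""
    size = -(-len(data) // NATIVE)                  # ceiling division
    padded = data.ljust(size * NATIVE, b"\0")       # zero padding is an assumption
    native = [padded[i * size:(i + 1) * size] for i in range(NATIVE)]
    code = [combine(ecv, native) for ecv in EM]
    return [code[t * PER_NODE:(t + 1) * PER_NODE] for t in range(N)]

def gf_invert(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inverse over GF(2^8); assumes the matrix is invertible."""
    size = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(inv, v) for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [v ^ gf_mul(factor, w) for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def download(nodes: list[list[bytes]], picked: list[int], length: int) -> bytes:
    """Recover the file from the chunks held by any k nodes listed in `picked`."""
    rows = [t * PER_NODE + c for t in picked for c in range(PER_NODE)]
    chunks = [nodes[t][c] for t in picked for c in range(PER_NODE)]
    inverse = gf_invert([EM[r] for r in rows])      # square k(n-k) x k(n-k) submatrix
    return b"".join(combine(row, chunks) for row in inverse)[:length]

data = b"NCCloud stripes a file over four clouds."
nodes = upload(data)
print(download(nodes, [1, 3], len(data)) == data)
```

With distinct evaluation points, any k(n − k) rows of this Vandermonde matrix are invertible, so the file can be decoded from any two of the four nodes. Note that no node holds a native chunk, which is the non-systematic property mentioned earlier.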


.. File Repair

6 steps:


Simulations
Consider multiple rounds of permanent node failures for different values of n. In each round, we randomly pick a node to permanently fail and trigger a repair.

Simulation result
If the loop of Steps 2 to 5 is repeated more than 10 times ⇒ bad repair.
When only the MDS property is checked, we see a bad repair very quickly: after no more than 7 and 2 rounds for n = 8 and n = 12, respectively.


Solution: Two-phase checking
- MDS property check: the current repair maintains the MDS property
- Repair MDS property check: the next repair for any possible failure maintains the MDS property


Cost of two-phase checking
- MDS property check: enumerate the $C_n^k$ subsets of k nodes (out of the n nodes) to see whether each of their corresponding encoding matrices has full rank.
- Repair MDS property check: for any failed node (out of n nodes), we collect any one out of the (n − k) chunks from each of the other (n − 1) surviving nodes to reconstruct; therefore the cost is $n (n-k)^{n-1} C_n^k$.

We have to perform more checks for the current repair, but bad repairs will be rare in future iterative repairs.
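To get a feel for how quickly these counts grow, the two expressions can be evaluated directly. This is a small sketch; the function names are hypothetical, and the counts are the number of rank checks implied by the formulas above rather than measured running times.

```python
from math import comb

def mds_checks(n: int, k: int) -> int:
    """Number of k-node subsets whose encoding matrix must have full rank."""
    return comb(n, k)

def repair_mds_checks(n: int, k: int) -> int:
    """n failed-node choices x (n-k)^(n-1) chunk selections x C(n, k) rank checks."""
    return n * (n - k) ** (n - 1) * comb(n, k)

for n in (4, 8, 12):
    k = n - 2   # double-fault tolerance, as in the slides
    print(n, k, mds_checks(n, k), repair_mds_checks(n, k))
```

For k = n − 2 the factor (n − k)^(n−1) = 2^(n−1) dominates, so the repair MDS property check grows exponentially in n.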


.. Repair Traffic

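Native data size: M
k(n − k) native chunks, each of size $\frac{M}{k(n-k)}$

A repair reads one chunk from each of the (n − 1) surviving nodes:

Repair traffic: $\frac{M}{k(n-k)} \times (n-1) = \frac{M}{k} \times \left(1 + \frac{k-1}{n-k}\right)$

For k = n − 2, repair traffic: $\frac{M}{2} \times \left(1 + \frac{1}{n-2}\right)$

$$\lim_{n \to \infty} \text{Repair traffic} = \frac{M}{2}$$

Save close to 50% of the repair traffic when n is large.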
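The formula can be checked numerically against the conventional repair, which downloads data equal to the whole file (traffic M). A minimal sketch, with hypothetical function names and the traffic expressed as a fraction of M:

```python
from fractions import Fraction

def fmsr_repair_fraction(n: int, k: int) -> Fraction:
    """F-MSR repair traffic / M: (n - 1) chunks, each of size M / (k (n - k))."""
    return Fraction(n - 1, k * (n - k))

def saving_vs_conventional(n: int, k: int) -> Fraction:
    """Fraction of repair traffic saved relative to downloading M."""
    return 1 - fmsr_repair_fraction(n, k)

for n in (4, 6, 8, 12, 102):
    k = n - 2
    print(n, float(fmsr_repair_fraction(n, k)), float(saving_vs_conventional(n, k)))
```

For n = 4, k = 2 the repair reads 0.75M, a 25% saving; the saving approaches 50% only as n grows.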


.. Cost Analysis

Table: Monthly price plans (in US dollars) for Amazon S3 (US Standard), Rackspace Cloud Files and Windows Azure Storage, as of September 2011.

                               Amazon S3   Rackspace   Azure
Storage (per GB)               $0.14       $0.15       $0.15
Data transfer in (per GB)      free        free        free
Data transfer out (per GB)     $0.12       $0.18       $0.15
PUT, POST (per 10K requests)   $0.10       free        $0.01
GET (per 10K requests)         $0.01       free        $0.01


Metadata of F-MSR
- Metadata size = 160B; file size = several MBs

Overhead due to GET requests during repair
- Assuming the S3 plan in Sep 2011, n = 4, k = 2, file size = 4MB
- Conventional repair: 0.427%
- F-MSR repair: 0.854%

Overhead cost is low.
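One way to arrive at percentages of this size is to divide the GET request cost by the outbound transfer cost of the same repair. The sketch below does that under assumptions that are not stated on the slide: the conventional repair issues 2 GETs for 4MB in total, the F-MSR repair issues 3 GETs for 3MB in total (three 1MB chunks), and 1GB = 1024MB. The prices are the S3 figures from the table.

```python
GET_PRICE = 0.01 / 10_000      # dollars per GET request (S3, Sep 2011)
OUT_PRICE_PER_GB = 0.12        # dollars per GB transferred out (S3, Sep 2011)
MB_PER_GB = 1024               # assumed unit conversion

def get_overhead(num_gets: int, megabytes: float) -> float:
    """GET request cost as a fraction of the outbound transfer cost."""
    transfer_cost = megabytes / MB_PER_GB * OUT_PRICE_PER_GB
    return num_gets * GET_PRICE / transfer_cost

# Assumed request counts and volumes for n = 4, k = 2 and a 4MB file.
print(f"conventional: {get_overhead(2, 4):.3%}")
print(f"F-MSR:        {get_overhead(3, 3):.3%}")
```

Under these assumptions the results land close to the slide's figures, with F-MSR at roughly twice the conventional overhead; both stay below 1% of the transfer cost.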


.. Experiments

NCCloud deployment
- Single machine connected to a cloud-of-clouds
- n = 4, k = 2

Coding schemes
- Reed-Solomon-based RAID-6 vs. F-MSR

Metric
- Response time

Cloud environments:
- Local cloud: OpenStack Swift
- Commercial cloud: multiple containers in Azure


.. Response Time: Local Cloud

F-MSR has a higher response time due to encoding/decoding overhead.

F-MSR has a slightly lower response time in repair, due to less data being downloaded.


.. Response Time: Commercial Cloud

No distinct response time difference, as network fluctuations play a bigger role in the actual response time.


.. Conclusion & Unsolved Problems

Conclusion:

Propose an implementable design of F-MSR:
- Preserve storage cost.
- Use less repair traffic.
- Do not require storage nodes to have encoding capabilities.

Build NCCloud, which realizes F-MSR

Source code: http://ansrlab.cse.cuhk.edu.hk/software/nccloud/


Unsolved problems:

- Repair cost is high when n and k are large: as mentioned before, F-MSR uses two-phase checking, which incurs a lot of checking cost in the current repair phase. Just as Reed-Solomon codes use a Vandermonde matrix to ensure the MDS property, a better algorithm is still being sought to replace the check-after-trying approach.

- The reason why F-MSR chooses to download chunks from all (n − 1) nodes when repairing a file comes from an argument in [1]: the more nodes we download chunks from, the lower the repair traffic is. However, the conclusion of [1] is based on a homogeneous model, while NCCloud’s multi-cloud setting is actually a heterogeneous environment. Such a basis may be invalid.


[1] A.G. Dimakis, P.B. Godfrey, Y. Wu, M.J. Wainwright, and K. Ramchandran. Network coding for distributed storage systems. Information Theory, IEEE Transactions on, 56(9):4539–4551, 2010.

[2] A.G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh. A survey on network codes for distributed storage. Proceedings of the IEEE, 99(3):476–489, 2011.

[3] A. Duminuco and E. Biersack. A practical study of regenerating codes for peer-to-peer backup systems. In Distributed Computing Systems, 2009. ICDCS’09. 29th IEEE International Conference on, pages 376–384. IEEE, 2009.

[4] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin. Erasure coding in Windows Azure Storage. In USENIX Annual Technical Conference (USENIX ATC), 2012.

[5] J.S. Plank et al. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems. Software Practice and Experience, 27(9):995–1012, 1997.

[6] I.S. Reed and G. Solomon. Polynomial codes over certain finite fields. Journal of the Society for Industrial & Applied Mathematics, 8(2):300–304, 1960.

[7] B. Sklar. Reed-Solomon codes. Downloaded from http://www.informit.com/content/images/art_sklar7_reed-solomon/elementLinks/art_sklar7_reed-solomon.pdf, pages 1–33, 2001.