Symmetric Allocations for Distributed Storage Derek Leong 1 , Alexandros G. Dimakis 2 , Tracey Ho 1 1 California Institute of Technology, USA 2 University of Southern California, USA GLOBECOM 2010 2010-12-09


Page 1: Symmetric Allocations for Distributed Storage

Symmetric Allocations for Distributed Storage

Derek Leong¹, Alexandros G. Dimakis², Tracey Ho¹

¹ California Institute of Technology, USA
² University of Southern California, USA

GLOBECOM 2010, 2010-12-09

Page 2: Symmetric Allocations for Distributed Storage

A Motivating Example

Suppose you have a distributed storage system comprising 5 storage devices (“nodes”)…

[Figure: five storage nodes labeled 1 to 5]

Page 3: Symmetric Allocations for Distributed Storage

A Motivating Example

Each node independently fails with probability 1/3, and survives with probability 2/3 …

[Figure: nodes 1 to 5 with nodes 2 and 4 failed; this failure pattern has probability $(1/3)^2 (2/3)^3 \approx 0.0329218$]

Page 4: Symmetric Allocations for Distributed Storage

A Motivating Example

Each node independently fails with probability 1/3, and survives with probability 2/3 …

[Figure: all five nodes failed; this failure pattern has probability $(1/3)^5 \approx 0.00411523$]
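The two pattern probabilities quoted on these slides can be reproduced with a few lines of exact arithmetic. This is a minimal sketch; the helper name `pattern_probability` is hypothetical and not part of the original presentation.

```python
from fractions import Fraction


def pattern_probability(failed, n, fail_prob):
    """Probability that exactly the nodes in `failed` fail and the rest survive.

    Assumes independent failures with the same probability for every node.
    """
    k = len(set(failed))
    return fail_prob ** k * (1 - fail_prob) ** (n - k)


q = Fraction(1, 3)  # per-node failure probability from the slides
two_down = pattern_probability({2, 4}, 5, q)
all_down = pattern_probability({1, 2, 3, 4, 5}, 5, q)

print(two_down, float(two_down))
print(all_down, float(all_down))
```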

Page 5: Symmetric Allocations for Distributed Storage

A Motivating Example

You are given a single data object of unit size, and a total storage budget of 7/3 …

Page 6: Symmetric Allocations for Distributed Storage

A Motivating Example

You can use any coding scheme to store any amount of coded data in each node, as long as the total amount of storage used is at most the given budget 7/3 …

Page 7: Symmetric Allocations for Distributed Storage

A Motivating Example

[Figure: nodes 1 to 5 holding blocks of coded bits of differing lengths]

Page 8: Symmetric Allocations for Distributed Storage

A Motivating Example

[Figure: the same blocks of coded bits across nodes 1 to 5, annotated with “?” and the failure-pattern probability $(1/3)^2 (2/3)^3 \approx 0.0329218$]

Page 9: Symmetric Allocations for Distributed Storage


A Motivating Example

For maximum reliability, we need to find

(1) an optimal allocation of the given budget over the nodes, and

(2) an optimal coding scheme

that jointly maximize the probability of successful recovery

Page 10: Symmetric Allocations for Distributed Storage

A Motivating Example

Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (= size of the original data object).

[Figure: network with source S, storage nodes 1 to 5, and data collectors t1 and t2]


Page 12: Symmetric Allocations for Distributed Storage

A Motivating Example

Recovery probability of three candidate allocations, for p = 2/3:

Allocation   Node 1   Node 2   Node 3   Node 4   Node 5   Recovery probability
A            7/15     7/15     7/15     7/15     7/15     0.79012
B            7/6      7/6      0        0        0        0.88889
C            2/3      2/3      1/3      1/3      1/3      0.90535
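The three recovery probabilities can be checked by brute force. The sketch below assumes that each node is accessed independently with probability p and that recovery succeeds exactly when the accessed nodes hold at least one unit of data in total. The function name `recovery_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def recovery_probability(allocation, p):
    """Exact recovery probability of an allocation by enumerating all node subsets.

    A subset of accessed nodes is successful when its stored amounts sum to
    at least 1 (the size of the data object). Runs in O(2^n) time.
    """
    n = len(allocation)
    total = Fraction(0)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(allocation[i] for i in subset) >= 1:
                total += p ** r * (1 - p) ** (n - r)
    return total


F = Fraction
p = F(2, 3)
allocations = {
    "A": [F(7, 15)] * 5,
    "B": [F(7, 6), F(7, 6), F(0), F(0), F(0)],
    "C": [F(2, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3)],
}
results = {name: recovery_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, prob, round(float(prob), 5))
```

Each allocation uses the full budget of 7/3, yet the non-symmetric allocation C does better than both symmetric ones.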

Page 13: Symmetric Allocations for Distributed Storage

Problem Formulation

Given n nodes, access probability p, and total storage budget T, find an optimal allocation $(x_1, \ldots, x_n)$ that maximizes the probability of successful recovery:

$$\max_{x_1, \ldots, x_n \ge 0} \; \sum_{\mathbf{r} \subseteq \{1, \ldots, n\}} p^{|\mathbf{r}|} (1-p)^{n-|\mathbf{r}|} \cdot \mathbf{1}\!\left[ \sum_{i \in \mathbf{r}} x_i \ge 1 \right] \quad \text{(recovery probability)}$$

$$\text{subject to} \quad \sum_{i=1}^{n} x_i \le T \quad \text{(budget constraint)}$$

The recovery probability is #P-hard to compute for a given allocation and choice of p.

Trivial cases of minimum and maximum budgets:
when T = 1, the allocation (1, 0, …, 0) is optimal;
when T = n, the allocation (1, 1, …, 1) is optimal.

The optimal allocation also tells us whether coding is beneficial for reliable storage.
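As a quick sanity check on the two trivial cases, the recovery probabilities of the stated allocations can be written out directly. This assumes, as in the motivating example, that each node is accessed independently with probability p:

```latex
\begin{align*}
T = 1:&\quad (x_1, \ldots, x_n) = (1, 0, \ldots, 0)
  &&\Rightarrow\quad P[\text{recovery}] = p, \\
T = n:&\quad (x_1, \ldots, x_n) = (1, 1, \ldots, 1)
  &&\Rightarrow\quad P[\text{recovery}] = 1 - (1-p)^n.
\end{align*}
```

In the first case recovery requires the single nonempty node to be accessed; in the second, any one accessed node suffices.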

Page 14: Symmetric Allocations for Distributed Storage

Related Work

Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005

S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005

Page 15: Symmetric Allocations for Distributed Storage

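Symmetric Allocations

We are particularly interested in symmetric allocations because they are easy to describe and implement.

Let $\bar{x}(n, T, m)$ denote the symmetric allocation for n nodes that spreads the budget T evenly over m nonempty nodes, so that each nonempty node stores T/m.

Successful recovery for the symmetric allocation $\bar{x}(n, T, m)$ occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed.

Therefore, the recovery probability of $\bar{x}(n, T, m)$ is

$$P\!\left[ B(m, p) \ge \lceil m/T \rceil \right] = \sum_{r = \lceil m/T \rceil}^{m} \binom{m}{r} p^r (1-p)^{m-r},$$

where $B(m, p)$ is a binomial random variable with m trials and success probability p.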
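The recovery probability of a symmetric allocation reduces to a binomial tail, which is cheap to evaluate. The following is a minimal sketch, assuming each of the m nonempty nodes stores T/m and is accessed independently with probability p. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_recovery_probability(m, T, p):
    """P[B(m, p) >= ceil(m / T)] for a symmetric allocation with m nonempty nodes.

    Each nonempty node stores T / m, so ceil(m / T) accessed nodes are needed
    to collect at least one unit of data.
    """
    k = ceil(Fraction(m) / T)
    return sum(comb(m, r) * p ** r * (1 - p) ** (m - r) for r in range(k, m + 1))


T = Fraction(7, 3)
p = Fraction(2, 3)
spread_all = symmetric_recovery_probability(5, T, p)  # allocation A in the example
spread_two = symmetric_recovery_probability(2, T, p)  # allocation B in the example
print(float(spread_all), float(spread_two))
```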

Page 16: Symmetric Allocations for Distributed Storage

Asymptotic Optimality of Max Spreading

The symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget T is sufficiently large.

RESULT 1
The gap between the recovery probabilities for an optimal allocation and for the symmetric allocation $\bar{x}(n, T, m = n)$, which spreads the budget over all n nodes, is at most

$$pT \cdot P\!\left[ B(n-1, p) \le \lceil n/T \rceil - 2 \right],$$

where $B(n-1, p)$ is a binomial random variable with n − 1 trials and success probability p.

If p and T are fixed such that $T > 1/p$, then this gap approaches zero as $n \to \infty$.
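To see the asymptotic claim numerically, one can evaluate a bound on the gap for growing n. The closed form used below, pT · P[B(n−1, p) ≤ ⌈n/T⌉ − 2], is a reconstruction (the formula did not survive in the slide text), so treat this as a sketch under that assumption. The function name `gap_bound` is hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def gap_bound(n, T, p):
    """Assumed bound on the suboptimality gap of max spreading.

    Computes p * T * P[B(n - 1, p) <= ceil(n / T) - 2] exactly.
    """
    k = ceil(Fraction(n) / T)
    # range(0, k - 1) covers r = 0, ..., k - 2 and is empty when k <= 1.
    tail = sum(
        comb(n - 1, r) * p ** r * (1 - p) ** (n - 1 - r) for r in range(0, k - 1)
    )
    return p * T * tail


p = Fraction(2, 3)
T = Fraction(7, 3)  # satisfies T > 1/p = 3/2
bounds = {n: gap_bound(n, T, p) for n in (5, 50, 200)}
for n, bound in bounds.items():
    print(n, float(bound))
```

With p and T held fixed, the printed values shrink quickly as n grows, which is the behavior the result describes.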

Page 17: Symmetric Allocations for Distributed Storage

Asymptotic Optimality of Max Spreading
Proof Idea: Bounding the optimal recovery probability…

1. By conditioning on the number of accessed nodes r, we can express the probability of successful recovery as

$$\sum_{r=0}^{n} p^r (1-p)^{n-r} \, S_r,$$

where $S_r$ is the number of successful r-subsets.

2. We can in turn bound $S_r$ by observing that we have $S_r$ inequalities of the form $\sum_{i \in \mathbf{r}} x_i \ge 1$, which can be summed up to produce

$$\sum_{i=1}^{n} a_i x_i \ge S_r,$$

where $a_i \le \binom{n-1}{r-1}$ is the number of successful r-subsets that contain node i.

Page 18: Symmetric Allocations for Distributed Storage

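Asymptotic Optimality of Max Spreading
Proof Idea: Bounding the optimal recovery probability…

3. We therefore have

$$S_r \le \min\!\left( \binom{n-1}{r-1} T, \; \binom{n}{r} \right).$$

4. Applying the bound

$$S_r \le \min\!\left( \frac{rT}{n}, 1 \right) \binom{n}{r}$$

to

$$\sum_{r=0}^{n} p^r (1-p)^{n-r} \, S_r$$

leads to the conclusion that the optimal recovery probability is at most

$$\sum_{r=0}^{n} \min\!\left( \frac{rT}{n}, 1 \right) P\!\left[ B(n, p) = r \right],$$

where $B(n, p)$ is a binomial random variable with n trials and success probability p.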
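The upper bound can be compared against the motivating example. The sketch below assumes the bound has the form Σ_r min(rT/n, 1) · P[B(n, p) = r], which is a reconstruction since the formula is missing from the slide text. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def optimal_upper_bound(n, T, p):
    """Assumed upper bound on the recovery probability of any allocation.

    Weights each binomial term P[B(n, p) = r] by min(r * T / n, 1), the largest
    possible fraction of r-subsets that can be successful under budget T.
    """
    return sum(
        min(Fraction(r) * T / n, Fraction(1)) * comb(n, r) * p ** r * (1 - p) ** (n - r)
        for r in range(n + 1)
    )


bound = optimal_upper_bound(5, Fraction(7, 3), Fraction(2, 3))
best_known = Fraction(220, 243)  # allocation C from the example, about 0.90535
print(float(bound), float(best_known))
```

For n = 5, T = 7/3, and p = 2/3 the bound sits above the recovery probability of allocation C, as an upper bound should.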

Page 19: Symmetric Allocations for Distributed Storage

Asymptotic Optimality of Max Spreading
Proof Idea: Bounding the suboptimality gap for max spreading…

1. The recovery probability of the allocation $\bar{x}(n, T, m = n)$ is

$$P\!\left[ B(n, p) \ge \lceil n/T \rceil \right].$$

2. The suboptimality gap for this allocation is therefore at most the difference between the upper bound for the optimal recovery probability and the expression in step 1, which is

$$pT \cdot P\!\left[ B(n-1, p) \le \lceil n/T \rceil - 2 \right].$$

3. For $T > 1/p$, we can apply the Chernoff bound to this binomial tail to obtain an upper bound on the gap that decays exponentially in n.

4. As $n \to \infty$, this upper bound approaches zero.
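The closed form of the gap can be checked with a short calculation. This is a sketch that assumes the upper bound on the optimal recovery probability is $\sum_r \min(rT/n, 1)\, P[B(n, p) = r]$, where $B(n, p)$ is a binomial random variable; that form is a reconstruction and is not taken verbatim from the slides:

```latex
\begin{align*}
\text{gap}
  &\le \sum_{r=0}^{n} \min\!\left(\tfrac{rT}{n}, 1\right) P[B(n,p) = r]
       \;-\; \sum_{r=\lceil n/T \rceil}^{n} P[B(n,p) = r] \\
  &= \sum_{r=1}^{\lceil n/T \rceil - 1} \frac{rT}{n} \binom{n}{r} p^r (1-p)^{n-r}
     && \text{since } \tfrac{rT}{n} < 1 \iff r < \tfrac{n}{T} \\
  &= pT \sum_{r=1}^{\lceil n/T \rceil - 1} \binom{n-1}{r-1} p^{r-1} (1-p)^{(n-1)-(r-1)}
     && \text{since } \tfrac{r}{n} \binom{n}{r} = \binom{n-1}{r-1} \\
  &= pT \cdot P\!\left[ B(n-1, p) \le \lceil n/T \rceil - 2 \right].
\end{align*}
```

When $T > 1/p$, the threshold $\lceil n/T \rceil - 2$ lies below the mean $(n-1)p$ by an amount linear in n, which is what lets a Chernoff bound drive the tail to zero.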

Page 20: Symmetric Allocations for Distributed Storage

Optimal Symmetric Allocation

The problem is nontrivial even when restricted to symmetric allocations…

[Figure: plot with axis label “number of nonempty nodes in the symmetric allocation”]

Page 21: Symmetric Allocations for Distributed Storage

Optimal Symmetric Allocation

Maximal spreading is optimal among symmetric allocations when the budget T is sufficiently large.

RESULT 2
If [condition on p and T; formula lost in extraction], then either $\bar{x}(n, T, m = n)$ or $\bar{x}(n, T, m = \lfloor \lfloor n/T \rfloor T \rfloor)$ is an optimal symmetric allocation.

Page 22: Symmetric Allocations for Distributed Storage

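Optimal Symmetric Allocation

Minimal spreading is optimal among symmetric allocations when the budget T is sufficiently small.

RESULT 3
If [condition on p and T; formula lost in extraction], then $\bar{x}(n, T, m = \lfloor T \rfloor)$ is an optimal symmetric allocation.

Coding is unnecessary for such an allocation, since each nonempty node stores $T / \lfloor T \rfloor \ge 1$, enough to hold the whole data object.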

Page 23: Symmetric Allocations for Distributed Storage

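Optimal Symmetric Allocation
Proof Idea: Finding the optimal symmetric allocation…

Recall that the recovery probability of the symmetric allocation $\bar{x}(n, T, m)$ with m nonempty nodes is given by $P[B(m, p) \ge \lceil m/T \rceil]$, where $B(m, p)$ is a binomial random variable.

For constant p and k, $P[B(m, p) \ge k]$ is a nondecreasing function of m.

1. Observe that we can find an optimal m* from among $\lceil n/T \rceil$ candidates:

$$\lfloor T \rfloor, \; \lfloor 2T \rfloor, \; \ldots, \; \lfloor \lfloor n/T \rfloor T \rfloor, \; n$$

2. For $m = \lfloor kT \rfloor$, where $k \in \mathbb{Z}^+$, the recovery probability is $P[B(\lfloor kT \rfloor, p) \ge k]$.

3. RESULT 2 (max spreading optimal) is a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nondecreasing in k.

4. To obtain RESULT 3 (min spreading optimal), we first establish a sufficient condition on p and T for $P[B(\lfloor kT \rfloor, p) \ge k]$ to be nonincreasing in k; we subsequently expand the condition to include other points for which $\bar{x}(n, T, m = \lfloor T \rfloor)$ remains optimal.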
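The candidate-reduction idea can be sketched in code: instead of trying every m from 1 to n, only the values ⌊kT⌋ and n need to be checked. This is a minimal sketch assuming 1 ≤ T ≤ n and the binomial-tail form of the recovery probability. All function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, floor


def symmetric_recovery_probability(m, T, p):
    """P[B(m, p) >= ceil(m / T)] for a symmetric allocation with m nonempty nodes."""
    k = ceil(Fraction(m) / T)
    return sum(comb(m, r) * p ** r * (1 - p) ** (m - r) for r in range(k, m + 1))


def candidate_counts(n, T):
    """Candidate values of m: floor(k * T) for k = 1..floor(n / T), plus n itself."""
    k_max = floor(Fraction(n) / T)
    return sorted({floor(k * T) for k in range(1, k_max + 1)} | {n})


def best_symmetric(n, T, p):
    """Return (m, probability) for a best candidate; ties go to the smallest m."""
    best_m = max(
        candidate_counts(n, T),
        key=lambda m: symmetric_recovery_probability(m, T, p),
    )
    return best_m, symmetric_recovery_probability(best_m, T, p)


n, T, p = 5, Fraction(7, 3), Fraction(2, 3)
print(candidate_counts(n, T))
print(best_symmetric(n, T, p))
```

For the motivating example the candidates are m = 2, 4, and 5, and the best symmetric recovery probability among them is 8/9, which is still below the 220/243 achieved by the non-symmetric allocation C.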

Page 24: Symmetric Allocations for Distributed Storage

Optimal Symmetric Allocation

[Figure: parameter regions for symmetric allocations: one region where maximal spreading is optimal among symmetric allocations, one region where minimal spreading is optimal among symmetric allocations, and a gap between them where other symmetric allocations may be optimal]

Page 25: Symmetric Allocations for Distributed Storage

Conclusion

The optimal allocation is not necessarily symmetric.

However, the symmetric allocation that spreads the budget maximally over all n nodes is asymptotically optimal when the budget is sufficiently large.

Furthermore, we are able to specify the optimal symmetric allocation for a wide range of parameter values of p and T.

Page 26: Symmetric Allocations for Distributed Storage


Thank you!