Coterie availability in sites. Flavio Junqueira and Keith Marzullo, University of California, San Diego. DISC, Krakow, Poland, September 2005


Page 1: Coterie availability in sites

Coterie availability in sites

Flavio Junqueira and Keith Marzullo

University of California, San Diego

DISC, Krakow, Poland, September 2005

Page 2: Coterie availability in sites

2DISC’05

Multi-site systems

Emerging class of distributed systems
  Collection of sites across a WAN
  Multiple nodes in each site
  Share resources: data sets, computational power
  E.g. BIRN, Geon, TeraGrid, PlanetLab

Site failure
  All the nodes in a site are simultaneously unavailable

Page 3: Coterie availability in sites

Site availability: BIRN

10 sites experience at least one outage

One site under 97%

Page 4: Coterie availability in sites

Improving availability

Better availability through replication

Coteries
  Set system of processes: a set of subsets of processes
  Each subset is called a quorum
  Quorums are minimal sets and pairwise intersect

Coteries are useful
  Distributed mutual exclusion
  Distributed registers
  Consensus through Paxos

Coterie availability in multi-site systems

Page 5: Coterie availability in sites

Roadmap

System model

Availability metrics
  Previous deterministic metrics not necessarily good
  A new metric

Failure model
  Characterize failures using survivor sets
  Survivor sets: more expressive

Quorum construction
  Multi-site hierarchical construction

Practical issues
  Failure model in practice
  PlanetLab experiment

Conclusions

Page 6: Coterie availability in sites

System model

Set P of processes
  Pairwise connected by quasi-reliable asynchronous channels
  Process failure: crash
  Processes can recover

Set B of sites
  Partition of the set of processes
  Site failure: simultaneous failure of all the processes in the site
  Process failures are not independent

Execution
  Sequence of steps of processes
  $\mathcal{E}$: set of all executions

In a step s
  $F(s) = \{p : (p \in P) \wedge (p \text{ is faulty in } s)\}$
  $NF(s) = P \setminus F(s)$
  Available process in s: $p \in P$ is available if $p \notin F(s)$

Page 7: Coterie availability in sites

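Survivor sets

A set $S \subseteq P$ is a survivor set iff
  $\exists E \in \mathcal{E} : \exists s \in E : S = NF(s)$
  $\forall p \in S : \forall E \in \mathcal{E} : \forall s \in E : S \setminus \{p\} \neq NF(s)$

Example (figure): processes grouped into sites, with $\mathcal{E} = \{E_1, E_2, E_3, E_4\}$. $E_1$ and $E_2$ have steps $s_1, s_2$; $E_3$ and $E_4$ have a single step $s_1$. The figure marks $NF(s_i)$ for each step and the resulting survivor sets.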
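The survivor-set definition can be checked mechanically on a small system. The sketch below is only an illustration: it represents each step s by its set NF(s), applies the two conditions literally (S equals some NF(s), and removing any single process from S never yields another NF(s)), and uses invented process names and executions rather than the ones in the slide's figure.

```python
from typing import FrozenSet, Iterable, Set

Process = str
NonFaulty = FrozenSet[Process]  # a step s is represented only by NF(s)


def survivor_sets(executions: Iterable[Iterable[NonFaulty]]) -> Set[NonFaulty]:
    """Sets S that equal NF(s) for some step s, while no S \\ {p}
    equals NF(s') for any step s' of any execution."""
    observed = {frozenset(nf) for execution in executions for nf in execution}
    return {
        s for s in observed
        if all(s - {p} not in observed for p in s)
    }


# Hypothetical data: processes a1, a2 at one site and b1, b2 at another.
E1 = [frozenset({"a1", "a2", "b1", "b2"}), frozenset({"a1", "a2", "b1"})]
E2 = [frozenset({"a1", "a2", "b1", "b2"}), frozenset({"a1", "a2"})]
E3 = [frozenset({"b1", "b2"})]
E4 = [frozenset({"a1", "a2", "b1"})]

result = survivor_sets([E1, E2, E3, E4])
```

In this made-up example the two survivor sets are exactly the two sites, because every larger observed set loses a process in some other step.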

Page 8: Coterie availability in sites

Availability metrics

Traditional deterministic metrics
  Undirected graph: nodes = processes, edges = communication links
  Node vulnerability: minimal number of nodes whose failure leaves no quorum available
  Edge vulnerability: minimal number of edges whose failure leaves no quorum available

Majority is optimal on complete graphs [Barbara and Garcia-Molina’86]

Page 9: Coterie availability in sites

A counterexample

(Figure: processes grouped into sites, with the survivor sets marked.)

Majority
  Quorum: 5 processes
  In some step, no quorum can be formed

Using $S_P$ as quorums
  In every step, at least one quorum can be formed

Majority is not optimal

Page 10: Coterie availability in sites

Availability metrics

Traditional deterministic metrics
  Undirected graph: nodes = processes, edges = communication links
  Node vulnerability: minimal number of nodes whose failure leaves no quorum available
  Edge vulnerability: minimal number of edges whose failure leaves no quorum available

Majority is optimal on complete graphs [Barbara and Garcia-Molina’86]

A new metric
  $A(\mathcal{Q})$, where $\mathcal{Q}$ is a coterie
  Number of survivor sets covered in $\mathcal{Q}$
  A survivor set S is covered in $\mathcal{Q}$ if: $\exists Q \in \mathcal{Q} : Q \subseteq S$
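As a rough illustration of the metric, the sketch below counts covered survivor sets for two candidate coteries. The five-process system and its survivor sets are hypothetical (one site with a1, a2, a3 and one with b1, b2, where the second site and one process of the first may be down together); they are not the system drawn in the slides.

```python
from itertools import combinations
from typing import FrozenSet, Iterable

Quorum = FrozenSet[str]


def is_covered(survivor: FrozenSet[str], coterie: Iterable[Quorum]) -> bool:
    """A survivor set is covered if some quorum is a subset of it."""
    return any(q <= survivor for q in coterie)


def availability(coterie: Iterable[Quorum], survivors: Iterable[FrozenSet[str]]) -> int:
    """The metric A: number of survivor sets covered by the coterie."""
    quorums = list(coterie)
    return sum(1 for s in survivors if is_covered(s, quorums))


# Hypothetical system: site A = {a1, a2, a3}, site B = {b1, b2}.
processes = ["a1", "a2", "a3", "b1", "b2"]
survivors = {frozenset(c) for c in combinations(["a1", "a2", "a3"], 2)}

majority = {frozenset(c) for c in combinations(processes, 3)}  # any 3 of 5
a_majority = availability(majority, survivors)
a_survivors = availability(survivors, survivors)  # survivor sets used as quorums
```

Here the majority coterie covers none of the three survivor sets, while using the (pairwise intersecting) survivor sets as quorums covers all three.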

Page 11: Coterie availability in sites

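Failure model

Multi-site hierarchical model
  A set $F_s$ of subsets of B
    Subsets of simultaneously faulty sites
  An array $F_p$
    One entry per site
    Each entry $F_p[i]$: subsets of processes in site $B_i$ (subsets of simultaneously faulty processes at a site)
  A survivor set S: $\exists F_S \in F_s$ such that
    $\forall B_i \notin F_S : \exists F_P \in F_p[i] : B_i \setminus F_P \subseteq S$
    $\forall B_i \in F_S : B_i \cap S = \emptyset$

Example (figure): processes P in three sites $B_1, B_2, B_3$, three processes (1, 2, 3) per site. Writing $p^m_i$ for process i of site $B_m$ (the slide draws the processes as icons):
  $F_s = \{\{B_1\}, \{B_2\}, \{B_3\}\}$
  $F_p[m] = \{\{p^m_i\} : i \in \{1,2,3\}\}$ for $m = 1, 2, 3$
  $S_P$ is the union of three families, one per pair of sites $B_m, B_n$ with $m < n$: $\{\{p^m_i, p^m_j, p^n_k, p^n_l\} : i, j, k, l \in \{1,2,3\} \wedge i \neq j \wedge k \neq l\}$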
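The survivor sets of the multi-site hierarchical model can be enumerated directly from the sets Fs and Fp. The sketch below assumes one reading of the definition: pick an element of Fs (the faulty sites), pick one element of Fp[i] for every remaining site, and take all other processes of those remaining sites. The function name and the process names a1..c3 are made up for the example.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, List, Set


def msh_survivor_sets(
    sites: Dict[str, FrozenSet[str]],
    Fs: List[FrozenSet[str]],
    Fp: Dict[str, List[FrozenSet[str]]],
) -> Set[FrozenSet[str]]:
    """Enumerate survivor sets: for each set of faulty sites in Fs, combine
    every choice of faulty processes (from Fp) at the sites that stay up."""
    result = set()
    for faulty_sites in Fs:
        up = [name for name in sites if name not in faulty_sites]
        for choice in product(*(Fp[name] for name in up)):
            survivor = frozenset().union(
                *(sites[name] - faulty for name, faulty in zip(up, choice))
            )
            result.add(survivor)
    return result


# The three-site example: any one site, plus one process at each other site.
sites = {
    "B1": frozenset({"a1", "a2", "a3"}),
    "B2": frozenset({"b1", "b2", "b3"}),
    "B3": frozenset({"c1", "c2", "c3"}),
}
Fs = [frozenset({name}) for name in sites]
Fp = {name: [frozenset({p}) for p in sorted(procs)] for name, procs in sites.items()}

Sp = msh_survivor_sets(sites, Fs, Fp)
all_intersect = all(a & b for a, b in combinations(Sp, 2))
```

Under these assumptions the example yields 27 survivor sets of four processes each (two processes from each of two sites), and any two of them intersect.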

Page 12: Coterie availability in sites

Quorum construction

Optimal availability with respect to A
  Coterie $\mathcal{Q}$: $S_P = \mathcal{Q}$ or $\mathcal{Q}$ dominates $S_P$
  Survivor sets in $S_P$ pairwise intersect
  If not, then optimally discarding survivor sets is NP-complete

A special case: Qsite
  $F_s$ contains all subsets of B of size $f_s$
  $F_p[i]$ contains all subsets of $B_i$ of size t, for every i

E.g.: $f_s = 1$, t = 1 (figure: Site 1, Site 2, Site 3, with the resulting quorums marked)
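A possible way to build Qsite in code, assuming the survivor sets themselves are used as quorums: a quorum then takes all but fs sites and, within each of those sites, all but t processes. The helper names `q_site` and `is_coterie` and the process names are hypothetical; the check only tests the two coterie conditions (pairwise intersection and minimality).

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Set


def q_site(sites: Dict[str, FrozenSet[str]], fs: int, t: int) -> Set[FrozenSet[str]]:
    """Quorums made of (|B| - fs) sites with (|Bi| - t) processes from each."""
    names = sorted(sites)
    quorums = set()
    for up in combinations(names, len(names) - fs):
        per_site = [
            [frozenset(c) for c in combinations(sorted(sites[n]), len(sites[n]) - t)]
            for n in up
        ]
        for parts in product(*per_site):
            quorums.add(frozenset().union(*parts))
    return quorums


def is_coterie(quorums: Set[FrozenSet[str]]) -> bool:
    """Every two quorums intersect and no quorum strictly contains another."""
    qs = list(quorums)
    intersect = all(a & b for a, b in combinations(qs, 2))
    minimal = not any(a < b for a in qs for b in qs)
    return intersect and minimal


sites = {
    "Site 1": frozenset({"a1", "a2", "a3"}),
    "Site 2": frozenset({"b1", "b2", "b3"}),
    "Site 3": frozenset({"c1", "c2", "c3"}),
}
quorums = q_site(sites, fs=1, t=1)        # two sites, two processes from each
ok = is_coterie(quorums)
too_weak = is_coterie(q_site(sites, fs=2, t=1))  # single-site quorums
```

With fs = 1 and t = 1 the 27 quorums form a coterie. With fs = 2 the quorums live in a single site each, so quorums from different sites are disjoint and the check fails, which is the situation where survivor sets would have to be discarded.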

Page 13: Coterie availability in sites

Model in practice

Qsite
  $f_s$: threshold on site failures
    Data on site availability
  t: threshold on process failures
    Markov chains: one Markov chain for each site

Transitions
  Failure transitions: same probability, homogeneous processes
  Repair transitions: variable probability, depending on the amount of resources used

(Figure: Markov chain with failure transitions and repair transitions.)
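The slides do not give the structure of the per-site Markov chain, so the following is only one plausible sketch. It assumes a discrete-time birth-death chain whose state is the number of failed processes at a site, a single failure probability `p_fail` for every state, and a state-dependent repair probability `p_repair[k]`. It computes the stationary distribution and then picks the smallest threshold t whose tail probability is below a target `epsilon`. All names and numbers are assumptions.

```python
from typing import List


def stationary_failed_counts(n: int, p_fail: float, p_repair: List[float]) -> List[float]:
    """Stationary distribution over k = 0..n failed processes.
    p_fail: probability of k -> k+1 in one step (same for all k < n).
    p_repair[k]: probability of k -> k-1 in one step (k >= 1; index 0 unused)."""
    if any(p_repair[k] <= 0 for k in range(1, n + 1)):
        raise ValueError("repair probabilities must be positive")
    # Detailed balance for a birth-death chain:
    # pi[k] * p_fail = pi[k + 1] * p_repair[k + 1]
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * p_fail / p_repair[k])
    total = sum(weights)
    return [w / total for w in weights]


def smallest_threshold(pi: List[float], epsilon: float) -> int:
    """Smallest t such that P(more than t processes failed) <= epsilon."""
    tail = 1.0
    for t, mass in enumerate(pi):
        tail -= mass
        if tail <= epsilon:
            return t
    return len(pi) - 1


# Hypothetical site with three processes; repairs slow down as more fail.
pi = stationary_failed_counts(3, p_fail=0.01, p_repair=[0.0, 0.5, 0.4, 0.3])
t = smallest_threshold(pi, epsilon=0.001)
```

With these invented parameters the chain spends about 98% of the time with no failed process, and t = 1 is the smallest threshold that is exceeded with probability at most 0.001.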

Page 14: Coterie availability in sites

PlanetLab experiment

Toy application
  Paxos: quorums of acceptors
  Client accessing quorums

Hosts used
  Three sites: three hosts from each site
  One UCSD host: proposer, learner

Three settings
  3Sites: one acceptor per site
    Quorum: two hosts
  3SitesMaj: all hosts
    Quorum: four hosts, majority from each of two sites
  SimpleMaj: all hosts
    Quorum: any five processes

(Figure: map of the hosts at UC Davis, UT Austin, Duke and UC San Diego.)

SimpleMaj has worse availability

3SitesMaj has better availability
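The three settings can be compared on a single failure scenario. The sketch below assumes that the three acceptor sites are UC Davis, UT Austin and Duke (the UCSD host being the proposer and learner), and it uses invented host names. Each function only answers whether some quorum is fully available among the hosts that are up.

```python
from typing import Dict, Set

# Hypothetical host names, three acceptors per site.
SITES: Dict[str, Set[str]] = {
    "UC Davis": {"davis-1", "davis-2", "davis-3"},
    "UT Austin": {"austin-1", "austin-2", "austin-3"},
    "Duke": {"duke-1", "duke-2", "duke-3"},
}
# 3Sites uses a single acceptor per site.
ONE_PER_SITE: Set[str] = {"davis-1", "austin-1", "duke-1"}


def three_sites(up: Set[str]) -> bool:
    """3Sites: any two of the three per-site acceptors."""
    return len(ONE_PER_SITE & up) >= 2


def three_sites_maj(up: Set[str]) -> bool:
    """3SitesMaj: a majority (two of three hosts) from each of two sites."""
    sites_with_majority = sum(1 for hosts in SITES.values() if len(hosts & up) >= 2)
    return sites_with_majority >= 2


def simple_maj(up: Set[str]) -> bool:
    """SimpleMaj: any five of the nine hosts."""
    return len(up) >= 5


# Scenario: UT Austin is down as a site, plus one host at each other site.
up = {"davis-1", "davis-2", "duke-1", "duke-2"}
available = {
    "3Sites": three_sites(up),
    "3SitesMaj": three_sites_maj(up),
    "SimpleMaj": simple_maj(up),
}
```

In this scenario only four hosts are up, so SimpleMaj cannot form a quorum, while 3Sites and 3SitesMaj still can.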

Page 15: Coterie availability in sites

The Bimodal model

Sites are survivor sets
  $S_P$ is not a coterie

"Throw out" survivor sets
  In general, optimal solution is NP-complete
  Simple solution for this model

Practical issues
  Practical for two sites
  More than two sites: open problem

(Figure: state diagram whose states are labeled by pairs 00, 01, ..., 0t, ..., 0n, 10, 11, ..., 1t, ..., 1n, up to n0, n1, ..., nt, ..., nn.)

Page 16: Coterie availability in sites

Conclusions

Coteries for multi-site systems
  Site failures: process failures not independent

A new metric
  Counts covered survivor sets

Multi-site hierarchical construction
  Practical
  Illustrated with Markov model
  Experiment shows better availability

Using majority quorums is not a good idea
  Not optimal
  Poor performance

Future work
  More experiments, more constructions, real deployment

Page 17: Coterie availability in sites


END

Page 18: Coterie availability in sites


Backup Slides

Page 19: Coterie availability in sites

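Failure models

The multi-site hierarchical model
  A set $F_s$ of subsets of B
  An array $F_p$
    One entry per site
    Each entry: subsets of processes in the site
  A survivor set S: $\exists F_S \in F_s$ such that
    $\forall B_i \notin F_S : \exists F_P \in F_p[i] : B_i \setminus F_P \subseteq S$
    $\forall B_i \in F_S : B_i \cap S = \emptyset$

The bimodal model
  A set $F_s$ of subsets of B
    There is one site that is in no element of $F_s$
  An array $F_p$
  A survivor set S
    As in the previous model, OR
    $\exists B_i \in B : S = B_i$

Example (figure): processes in two sites $B_1$ and $B_2$, three processes (1, 2, 3) per site. Writing $p^m_i$ for process i of site $B_m$ (the slide draws the processes as icons):
  $F_s$ = [value missing in the source]
  $F_p[1] = \{\{p^1_i\} : i \in \{1,2,3\}\}$
  $F_p[2] = \{\{p^2_i\} : i \in \{1,2,3\}\}$
  MSH: $S_P = \{\{p^1_i, p^1_j, p^2_k, p^2_l\} : i, j, k, l \in \{1,2,3\} \wedge i \neq j \wedge k \neq l\}$
  Bimodal: $S_P = \{\{p^1_i, p^1_j, p^2_k, p^2_l\} : i, j, k, l \in \{1,2,3\} \wedge i \neq j \wedge k \neq l\} \cup B$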

Page 20: Coterie availability in sites

Bimodal construction

Bimodal model
  By construction: not all pairs of survivor sets intersect

Discard survivor sets until the remaining ones intersect
  Selecting optimally is NP-complete

Solution: remove |B|-1 survivor sets
  Survivor sets containing processes from multiple sites pairwise intersect
  Construction is also optimal with respect to metric A

A special case: Bsite
  All elements of $F_s$ have size $f_s$
  All elements of $F_p[i]$ have the same size t, for every i

E.g.: $f_s = 1$, t = 1 (figure: sites $B_1$ and $B_2$, with the resulting quorums marked)
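A small sketch of the "remove |B|-1 survivor sets" step, under the assumption that the discarded sets are the single-site survivor sets and that any one of them may be kept. It uses a hypothetical two-site system with three processes per site, where the multi-site survivor sets take two processes from each site. The choice of which site to keep is arbitrary here and is not taken from the slides.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def pairwise_intersect(sets: Iterable[FrozenSet[str]]) -> bool:
    """True if every two sets share at least one process."""
    return all(a & b for a, b in combinations(list(sets), 2))


def bimodal_quorums(
    sites: Dict[str, FrozenSet[str]],
    multi_site_survivors: Set[FrozenSet[str]],
) -> Set[FrozenSet[str]]:
    """Keep all multi-site survivor sets and one single-site survivor set,
    i.e. discard |B| - 1 of the site survivor sets."""
    kept_site = sites[sorted(sites)[0]]  # arbitrary choice of the site to keep
    return set(multi_site_survivors) | {kept_site}


sites = {
    "B1": frozenset({"a1", "a2", "a3"}),
    "B2": frozenset({"b1", "b2", "b3"}),
}
multi_site = {
    frozenset(x) | frozenset(y)
    for x in combinations(sorted(sites["B1"]), 2)
    for y in combinations(sorted(sites["B2"]), 2)
}
bimodal_sp = multi_site | set(sites.values())  # sites are survivor sets too

before = pairwise_intersect(bimodal_sp)   # B1 and B2 are disjoint
quorums = bimodal_quorums(sites, multi_site)
after = pairwise_intersect(quorums)
```

Before discarding, the two site survivor sets are disjoint; after dropping one of them, the ten remaining sets pairwise intersect.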

Page 21: Coterie availability in sites

Site availability

Goals
  Show that sites are unavailable frequently enough

BIRN - Biomedical Informatics Research Network
  Test bed projects centered around brain imaging
  Currently: 19 universities, 26 research groups

Availability
  Monthly basis
  Pings (BIRN-CC)
  Storage broker logs

Site availability, Jan/04-Aug/04
  Availability under 100%: on average in 5 out of the 8 months

$\text{Availability} = \frac{\text{Total hours} - \text{Unplanned outages}}{\text{Total hours}} \times 100$
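The availability formula is a plain percentage and can be computed as below. The month length and outage duration in the example are made-up numbers, not BIRN measurements.

```python
def availability_percent(total_hours: float, unplanned_outage_hours: float) -> float:
    """Availability = (total hours - unplanned outages) / total hours * 100."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= unplanned_outage_hours <= total_hours:
        raise ValueError("outage hours must lie between 0 and total_hours")
    return (total_hours - unplanned_outage_hours) / total_hours * 100


# Hypothetical month of 31 days (744 hours) with 24 hours of unplanned outage.
monthly = availability_percent(744, 24)
```

With these numbers the site is available 720 of 744 hours, which is a little under 97%.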

Page 22: Coterie availability in sites

Causes of site failures

Misconfigured software

Shared resources
  1. Storage
  2. Power circuits
  3. Cooling pipes
  4. Air conditioning
  5. Network