A World of (Im)Possibilities Nancy Lynch Celebration: Sixty and Beyond Hagit Attiya, Technion Jennifer Welch, Texas A&M University

A World of (Im)Possibilities
Nancy Lynch Celebration: Sixty and Beyond

Hagit Attiya, Technion

Jennifer Welch, Texas A&M University

PODC/Concur 2008 World of (Im)Possibilities 2

Introduction

One of the main themes of Nancy's work has been proving lower bounds and impossibility results for problems that arise in distributed computing.

  • Overview of some of Nancy's results
      • Less-known results, hidden gems closer to our hearts
  • Emphasize their meaning and implications
      • How they influenced the development of the field and of distributed systems
      • Concentrating on their positive impact

Best-Known Example: FLP

  • Impossibility of asynchronous fault-tolerant consensus [Fischer, Lynch, Paterson]
  • Motivated work on
      • strengthening models of computation
          • partially synchronous models [Dwork, Lynch, Stockmeyer]
          • unreliable failure detectors [Chandra, Toueg]
      • weakening the problem definition
          • k-set agreement [Chaudhuri]
          • renaming [Attiya et al.]
          • condition-based approaches [Raynal, Rajsbaum et al.]

FLP: Impact

  • Related practical problems:
      • transaction commit
      • leader election
      • atomic broadcast
      • maintaining consistent replicated data
  • The wait-free hierarchy (classify concurrent abstract data types) [Herlihy]
  • Attempts to solve k-set agreement and renaming led to the application of topology in distributed computing [Chaudhuri] [Borowsky, Gafni] [Saks, Zaharoglou] [Herlihy, Shavit]

2nd Example: Brewer's Conjecture

[Brewer, PODC 2000 invited talk]

A web service cannot provide all three guarantees:
  • Consistency
  • Availability
  • Partition-tolerance

What Does This Mean?

[Gilbert, Lynch, SIGACT News 2002]

A web service cannot provide all three guarantees:
  • Consistency: atomicity of (read / write) operations
  • Availability: a request by a nonfaulty client gets a response
  • Partition-tolerance: even when lost messages create two partitioned components in the network

Proof Idea
adapted from [Attiya, Bar-Noy, Dolev]

[Figure: timelines of p0 and p1; X marks messages lost between them]
  • Exec 1: p0 writes 1
  • Exec 2: p1 reads 0
  • Exec 3: p0 writes 1, then p1 reads 0
  • Exec 3 looks the same to p1 as Exec 2
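
The indistinguishability argument can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the `Node` class, the `partitioned` flag, and the policy names `"available"` and `"consistent"` are hypothetical and do not come from the talk or from [Gilbert, Lynch]. During a partition, p1 receives nothing from p0, so it either answers with its old value (stale read, consistency lost) or refuses to answer (availability lost).

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer during a partition."""


class Node:
    def __init__(self, policy):
        self.value = 0            # initial register value
        self.policy = policy      # "available" or "consistent" (hypothetical names)
        self.peer = None
        self.partitioned = False

    def write(self, v):
        self.value = v
        if not self.partitioned:
            self.peer.value = v   # the update message reaches the other replica

    def read(self):
        if self.partitioned and self.policy == "consistent":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.value


def run(policy):
    """Replay the third execution: p0 writes 1, then p1 reads, all messages lost."""
    p0, p1 = Node(policy), Node(policy)
    p0.peer, p1.peer = p1, p0
    p0.partitioned = p1.partitioned = True
    p0.write(1)
    try:
        return p1.read()
    except Unavailable:
        return "no response"
```

Under these assumptions, `run("available")` returns the stale value 0 after the write of 1 has completed, and `run("consistent")` returns `"no response"`.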

Brewer's Conjecture: Implications

  • Traditional database services maintain consistency and fail to provide availability in the face of partitions
  • Relax the consistency guarantees of the web service
      • Sometimes miss values or return stale data (Internet queries) [PIER: Huebsch, Hellerstein, Lanham, Loo, Shenker, Stoica]
      • Allow partitions to evolve separately, and build mechanisms to cope when this happens (stream processing) [Medusa: Balazinska, Balakrishnan, Stonebraker]
  • Sacrifice availability, but not often (stream processing)… [BOREALIS: Balazinska, Balakrishnan, Madden, Stonebraker]
  • Assume a mechanism to guard against partitions… [CQ: Shah, Hellerstein, Brewer]

3rd Example: Best-Case Cost of Fault-Tolerant Algorithms

  • Does making an algorithm fault-tolerant incur a cost even when the system is well-behaved?
  • Previous investigation focused on the synchronous case
      • early stopping algorithms for consensus: 2 rounds vs. 1 round for a non-fault-tolerant algorithm [Dolev, Reischuk, Strong] [Dwork, Moses] [Moses, Tuttle]
      • non-blocking commit: twice as many rounds as for blocking commit [Dwork, Skeen]
  • What about the asynchronous case?

Are Wait-Free Algorithms Fast? [Attiya, Lynch, Shavit]

  • Studies the best-case complexity of an algorithm
      • When there are no failures, although the algorithm can tolerate any number of crashes (is wait-free)
      • When the execution is synchronized, although the algorithm works in asynchronous executions also
  • Complexity measure of interest is running time
      • Time is measured by synchronized rounds
  • Problem of interest is approximate agreement

[Figure label: n = 6]

Wait-Free Algorithms are not Fast

  • A non-fault-tolerant algorithm takes O(1) time
      • one process writes its input and the rest read it
      • achieves perfect agreement (ε = 0)
  • Prove an Ω(log n) time lower bound for wait-free approximate agreement
  • So there are problems for which being wait-free in the asynchronous model imposes more than constant additional cost even when failures do not occur.

Proof Idea

[Figure: execution in which all inputs are 0 and a process decides 0; labels "< log n" and "< n"; annotation: this process cannot influence the decision]

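Proof Idea

[Figure: the same execution with one input changed to 1 and the other inputs 0; labels "< 1", "< log n" and "< n"; outcomes "decide 0" and "decide 1"]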

The Best-Case Cost of Fault-Tolerance

  • Formalize the idea of "designing for the normal / common case" and show its cost [Lampson, "Hints for computer system design"]
  • The idea of accommodating the worst case & measuring the best / normal / common case has become standard.
      • message cost of consensus in failure-free runs [Halpern, Hadzilacos]
      • contention-free step complexity [Alur, Taubenfeld]
      • obstruction-free step complexity [Ellen, Luchangco, Moir, Shavit]

Interleaving Algorithms

  • Also an approximate agreement algorithm matching the Ω(log n) time lower bound
  • Interleaves two algorithms:
      • One guarantees fault-tolerance
      • Another guarantees best-case time complexity
      • Need to coordinate results…
      • Using a “virtual” two-process approximate agreement algorithm
  • Similar applications of interleaving, especially in randomized consensus [Saks, Shavit, Woll]
      • E.g., this morning's session [Aspnes, Attiya, Censor]

Application: Replicated Storage

[Yu and Vahdat]
  • Emulates a shared memory
  • Replication-based implementations of wide-area data access services need automatic regeneration of failed replicas and reconfiguration of groups
  • Probabilistic guarantee: reads may return stale values with a small probability
  • Optimizes for best case:
      • Failure-free reconfiguration is quick and cheap
      • Failure-induced reconfiguration calls a consensus protocol [Saks, Shavit, Woll] for replicas to agree on the next configuration

4th Example: Clock Synchronization

In a distributed system with n nodes that experiences variable message delays, how closely can the nodes' clocks be synchronized?

Clock Synchronization Lower Bound [Lundelius, Lynch]

  • No algorithm can synchronize n clocks closer than (1-1/n)u
      • For a clique with the same message delay uncertainty u on all links (u = max delay - min delay)
      • Even if no failures and no clock drift
  • Proof introduced the shifting technique

[Figure: two timelines for p0 and p1 with message delays d-u and d, before and after shifting p0 backwards by u]
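
The shifting step can be sketched in a few lines of Python. This is an illustration rather than the construction from [Lundelius, Lynch]: the `Message` record, the `shift` and `admissible` helpers, and the concrete values of d and u are all assumptions made for the example. It moves every event of p0 earlier by u in real time and checks that all message delays remain within [d-u, d], so the shifted execution is still legal while no process can tell the two executions apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    sent: float      # real time at which the message is sent
    received: float  # real time at which the message is received

    @property
    def delay(self):
        return self.received - self.sent


def shift(messages, process, amount):
    """Move every event of `process` earlier by `amount` in real time."""
    shifted = []
    for m in messages:
        sent = m.sent - amount if m.sender == process else m.sent
        received = m.received - amount if m.receiver == process else m.received
        shifted.append(Message(m.sender, m.receiver, sent, received))
    return shifted


def admissible(messages, d, u):
    """Every delay must stay within the allowed range [d - u, d]."""
    return all(d - u <= m.delay <= d for m in messages)


def clique_lower_bound(n, u):
    """The (1 - 1/n) u bound stated on the slide."""
    return (1 - 1 / n) * u


d, u = 10.0, 4.0  # hypothetical maximum delay and uncertainty
execution = [
    Message("p0", "p1", sent=20.0, received=20.0 + (d - u)),  # p0 -> p1 takes d - u
    Message("p1", "p0", sent=20.0, received=20.0 + d),        # p1 -> p0 takes d
]
shifted = shift(execution, "p0", u)
```

Here the two delays swap from (d-u, d) to (d, d-u), so both executions are admissible. For n = 2 the formula gives u/2, which is 2.0 with these numbers.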

What About Other Topologies?

[Halpern, Megiddo, Munshi]

  • Arbitrary topologies and nonuniform uncertainties
  • Adversary's optimal strategy is to maximize a certain quantity involving neighboring nodes' initial clock values and the delays between them, subject to constraints on message uncertainty
  • Bound is expressed as a system of equations, and this linear program is solved using optimization techniques
      • Shifting notion is captured in the linear program
      • Not in closed form except for a few special cases
  • Bound is tight

What About Closed Form Bounds? [Biaz, Welch]

  • If uncertainties are symmetric (same in both directions of a link), then the lower bound is diam/2, where diam is the diameter of the graph w.r.t. uncertainties

[Figure: example graph on nodes a, b, c, d, e, f with edge uncertainties 2, 3, 4, 3, 5, 2, 4, 1, 5; diam = 9]

Shifting Equivalent Clique

  • Arbitrary topology G with arbitrary uncertainties is equivalent to clique G' with the same nodes, where the uncertainty between any two nodes is the length of the shortest path between them in G (w.r.t. uncertainties) [Halpern, Megiddo, Munshi]
  • Shift a carefully chosen execution on the clique, for 2 nodes diam apart, to get the diam/2 lower bound.

[Figure: graph on nodes a, b, c, d, e, f and its equivalent clique, with uncertainties on the edges]
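
The equivalent clique and the resulting diam/2 bound can be computed with an all-pairs shortest-path pass. The sketch below assumes symmetric uncertainties and uses Floyd-Warshall; the function name `equivalent_clique` and the four-node graph (nodes p, q, r, s) are hypothetical and are not the graph drawn on the slide.

```python
from itertools import product


def equivalent_clique(nodes, edges):
    """All-pairs shortest paths (Floyd-Warshall) over symmetric uncertainties."""
    dist = {(a, b): 0.0 if a == b else float("inf")
            for a, b in product(nodes, repeat=2)}
    for (a, b), w in edges.items():
        dist[a, b] = dist[b, a] = min(dist[a, b], w)
    # product() varies its first component slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


# Hypothetical example graph: edge -> uncertainty
nodes = ["p", "q", "r", "s"]
edges = {("p", "q"): 2, ("q", "r"): 3, ("r", "s"): 4, ("p", "r"): 6}

clique = equivalent_clique(nodes, edges)
diam = max(clique.values())   # diameter w.r.t. uncertainties
lower_bound = diam / 2        # the diam/2 lower bound
```

In this example the direct p-r link (uncertainty 6) is replaced in the clique by the shorter path through q (uncertainty 5), the diameter is 9 (between p and s), and the lower bound is 4.5.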

What About Upper Bounds?

  • For arbitrary graph and arbitrary topology, the radius is an upper bound [Halpern, Megiddo, Munshi]
  • Since radius ≤ diam, within factor of 2
  • Tight & almost tight closed form upper bounds for some specific common topologies with uniform uncertainties [Biaz, Welch]

[Figure: example graph on nodes a, b, c, d, e, f with edge uncertainties 2, 3, 4, 3, 5, 2, 4, 1, 5; diam = 9, radius = 5]

External Clock Synchronization

  • What about external synchronization, when some clocks have outside time sources?
      • Previous results are for internal synchronization
  • The tight bound on how close a node's clock can get to the source time is half the shortest path distance (w.r.t. uncertainties) from the node to a source [Attiya, Hay, Welch]

[Figure: example graph on nodes a, b, c, d, e, f with a and d marked as sources; bounds are b: 3/2, c: 1/2, e: 3/2, f: 5/2]
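
The per-node bound (half the shortest-path distance to the nearest source) can be computed with a multi-source Dijkstra search. This is a minimal sketch: the function name `external_sync_bounds` and the small graph below are hypothetical, and the graph is not the one from the slide.

```python
import heapq


def external_sync_bounds(edges, sources):
    """Half the shortest-path distance (w.r.t. uncertainties) to the nearest source."""
    adjacency = {}
    for (a, b), w in edges.items():
        adjacency.setdefault(a, []).append((b, w))
        adjacency.setdefault(b, []).append((a, w))

    # Start the search from every source at distance 0.
    dist = {s: 0.0 for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in adjacency.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return {node: d / 2 for node, d in dist.items() if node not in sources}


# Hypothetical example graph: edge -> uncertainty, with p and s as sources
edges = {("p", "q"): 2, ("q", "r"): 3, ("r", "s"): 4, ("p", "r"): 6}
bounds = external_sync_bounds(edges, sources={"p", "s"})
```

With p and s as sources, q is at distance 2 from p and r is at distance 4 from s, so the bounds come out as 1.0 for q and 2.0 for r.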

Optimal Synchronization Per Execution

  • Given information collected in a specific execution, by some algorithm strategy, find the tightest possible synchronization
      • internal synchronization, offline algorithm [Attiya, Herzberg, Rajsbaum]
      • external synchronization, online algorithm [Patt-Shamir, Rajsbaum]
      • extended to handle clock drift [Ostrovsky, Patt-Shamir]

Gradient Clock Synchronization

  • The clock skew between any pair of nodes should be a function of the distance between them [Fan, Lynch]

[Figure: graph on nodes a, b, c, d, e, f; clocks of a and d need not be as tightly synch'ed as those of a and b]

Gradient Clock Synchronization

  • Motivated by problems in sensor networks, or more generally, large scale networks, where nodes in the same locality need to be more tightly synchronized
      • data fusion
      • target tracking

http://www.mikalac.com/mis/missile.html

Gradient Clock Synch Lower Bound

  • Closest that two nodes' clocks can get (in worst case) is Ω(log D / log log D)
      • D is diameter of network
      • global influence
  • Algorithms requiring a fixed maximum skew for nearby nodes may not scale well
      • E.g., TDMA

http://www.dsna-dti.aviation-civile.gouv.fr/actualities/revuesgb/revue64gb/64pgarticle2gb/telecom_c2gb.html

Gradient Clock Synch Lower Bound: Assumption 1

Nonzero clock drift: (hardware) clocks can run fast or slow, within known bounds

[Figure: hardware clock time plotted against real time; max slope 1+ρ, min slope (1+ρ)^-1]
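
As a small worked example of this assumption, the snippet below computes how far apart two hardware clocks can drift when one runs at the maximum rate and the other at the minimum rate. The drift bound is called `rho` here, which is an assumed name, and `max_divergence` is a hypothetical helper rather than anything from [Fan, Lynch].

```python
def max_divergence(rho, elapsed):
    """Largest gap two hardware clocks can open up over `elapsed` real time
    when each runs at a rate between 1/(1+rho) and 1+rho."""
    fastest = 1 + rho
    slowest = 1 / (1 + rho)
    return (fastest - slowest) * elapsed
```

For instance, with rho = 0.25 the rates range from 0.8 to 1.25, so over 100 units of real time the two clocks can move apart by about 45 units unless the algorithm corrects them.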

Gradient Clock Synch Lower Bound: Assumption 2

Algorithm must ensure that (logical) clocks always increase at some minimum positive rate

[Figure: logical clock time plotted against real time, with a minimum slope]

Gradient Clock Synch LB: Simple Case

  • Consider a simple algorithm in which the clock value of p1 is periodically propagated down the chain
  • Can construct an execution in which pn-1's new clock value is larger than pn's old clock value by an amount depending on D
      • carefully choose message delays
      • manipulate clock drift rates
      • cause nodes to suddenly jump to higher values without synchronizing with their neighbors
  • Insight in the paper is generalizing this to any algorithm

[Figure: chain of nodes p1, p2, p3, …, pn]

Is the Lower Bound Tight?

  • Recall lower bound is Ω(log D / log log D)
  • Several pre-existing algorithms have O(D)
  • Then upper bound improved to O(√D) [Locher, Wattenhofer]
  • Recently upper bound improved to O(log D) [Lenzen, Locher, Wattenhofer]
  • Still a small gap; can the lower bound be improved?
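
To get a feel for the size of the remaining gap, the growth terms can be tabulated for a concrete diameter. This is a rough sketch only: it ignores the constant factors hidden by the asymptotic notation, uses base-2 logarithms as an arbitrary choice, and the helper name `gradient_bounds` is hypothetical.

```python
import math


def gradient_bounds(diameter):
    """Growth terms only; constant factors are ignored. Requires diameter > 2."""
    log_d = math.log2(diameter)
    return {
        "lower: log D / log log D": log_d / math.log2(log_d),
        "upper: log D": log_d,
        "upper: sqrt D": math.sqrt(diameter),
        "upper: D": float(diameter),
    }


terms = gradient_bounds(2 ** 16)
```

For D = 2^16 the terms are 4 for the lower bound, against 16, 256, and 65536 for the three upper bounds, so the gap between the lower bound and the O(log D) algorithm is only the log log D factor.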

How Long Can Large Difference Last?

  • In the simple diffusion algorithm on the chain, the large difference between pn-1 and pn only lasts while a message is in transit
  • Perhaps difficulties could be avoided by keeping track of the “generation” of a clock value and only comparing apples with apples (clocks of the same generation)?
      • but this could be complicated

And There’s a Lot More…

  • Lower bounds on space for mutual exclusion [Burns, Lynch]
  • Lower bound on number of messages for leader election in synchronous rings [Frederickson, Lynch]
  • Impossibility results for data link layer and connection management [Fekete, Lynch, Mansour, Spinelli] [Kleinberg, Attiya, Lynch]
  • Lower bound on time for consensus in partially synchronous models [Attiya, Dwork, Lynch, Stockmeyer]
  • Lower bound on time for synchronous k-set agreement [Chaudhuri, Herlihy, Lynch, Tuttle]
  • Tradeoff between safety and liveness for randomized coordinated attack [Varghese, Lynch]
  • Impossibility of boosting fault tolerance [Attie, Guerraoui, Kouznetsov, Lynch, Rajsbaum]
  • …

Final Observations

  • Strive to make the results relevant
      • Natural problems
      • Practical architectural assumptions
      • Realistic performance measures (for lower bounds)
  • Crisp arguments (ingenious but clear)
      • Easy to understand and verify
      • Simple to extend and lead to follow-ups

Take-Home Message

Impossibility results help the development of the area

Understanding inherent limits guides efforts in the appropriate directions

And setting boundaries is good for everyone…

Thanks for your attention

Thank you, Nancy!