51
Distributed Computing Group The Consensus Problem Roger Wattenhofer a lot of kudos to Maurice Herlihy and Costas Busch for some of their slides Distributed Computing Group Roger Wattenhofer 2 Sequential Computation memory object object thread Distributed Computing Group Roger Wattenhofer 3 Concurrent Computation memory object object threads Distributed Computing Group Roger Wattenhofer 4 Asynchrony Sudden unpredictable delays – Cache misses (short) – Page faults (long) – Scheduling quantum used up (really long)

The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

DistributedComputing

Group

The Consensus ProblemRoger Wattenhofer

a lot of kudos to Maurice Herlihy

and Costas Busch for some of their slides

Distributed Computing Group Roger Wattenhofer 2

Sequential Computation

memory

object object

thread

Distributed Computing Group Roger Wattenhofer 3

Concurrent Computation

memory

object object

thre

ads

Distributed Computing Group Roger Wattenhofer 4

Asynchrony

• Sudden unpredictable delays– Cache misses (short)– Page faults (long)– Scheduling quantum used up (really long)

Page 2: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 5

Model Summary

• Multiple threads– Sometimes called processes

• Single shared memory• Objects live in memory• Unpredictable asynchronous delays

Distributed Computing Group Roger Wattenhofer 6

Road Map

• We are going to focus on principles– Start with idealized models– Look at a simplistic problem– Emphasize correctness over pragmatism– “Correctness may be theoretical, but

incorrectness has practical impact”

Distributed Computing Group Roger Wattenhofer 7

You may ask yourself …

I’m no theory weenie - why all the theorems and proofs?

Distributed Computing Group Roger Wattenhofer 8

Fundamentalism

• Distributed & concurrent systems are hard– Failures– Concurrency

• Easier to go from theory to practice than vice-versa

Page 3: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 9

The Two Generals

Red army winsIf both sides

attack together

Distributed Computing Group Roger Wattenhofer 10

Communications

Red armies send messengers across valley

Distributed Computing Group Roger Wattenhofer 11

Communications

Messengersdon’t always make it

Distributed Computing Group Roger Wattenhofer 12

Your Mission

Design a protocol to ensure that red armies attack

simultaneously

Page 4: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 13

Theorem

There is no non-trivial protocol that ensures the red armies attacks simultaneously

Distributed Computing Group Roger Wattenhofer 14

Proof Strategy

• Assume a protocol exists• Reason about its properties• Derive a contradiction

Distributed Computing Group Roger Wattenhofer 15

Proof

1. Consider the protocol that sends fewest messages

2. It still works if last message lost3. So just don’t send it

– Messengers’ union happy4. But now we have a shorter protocol!5. Contradicting #1

Distributed Computing Group Roger Wattenhofer 16

Fundamental Limitation

• Need an unbounded number of messages

• Or possible that no attack takes place

Page 5: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 17

You May Find Yourself …

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

Distributed Computing Group Roger Wattenhofer 18

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

Yes, Ma’am, right away!Yes, Ma’am, right away!

Distributed Computing Group Roger Wattenhofer 19

You might say

I want a real-time dot-net compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

Yes, Ma’am, right away!

Advantage:•Buys time to find another job•No one expects software to work anyway

Advantage:•Buys time to find another job•No one expects software to work anyway

Distributed Computing Group Roger Wattenhofer 20

You might say

I want a real-time dot-net compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

Yes, Ma’am, right away!

Advantage:•Buys time to find another job•No one expects software to work anyway

Disadvantage:•You’re doomed•Without this course, you may not even know you’re doomed

Disadvantage:•You’re doomed•Without this course, you may not even know you’re doomed

Page 6: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 21

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser.

Distributed Computing Group Roger Wattenhofer 22

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

Advantage:•No need to take course

Advantage:•No need to take course

Distributed Computing Group Roger Wattenhofer 23

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

I can’t find a fault-tolerant algorithm, I guess I’m just a

pathetic loser

Advantage:•No need to take course

Advantage:•No need to take course

Disadvantage:•Boss fires you, hires University St. Gallen graduate

Disadvantage:•Boss fires you, hires University St. Gallen graduate

Distributed Computing Group Roger Wattenhofer 24

You might say

I want a real-time YAFA compliant Two Generals

protocol using UDP datagramsrunning on our enterprise-level

fiber tachyion network ...

Using skills honed in course, I can avert certain disaster!

•Rethink problem spec, or•Weaken requirements, or•Build on different platform

Using skills honed in course, I can avert certain disaster!

•Rethink problem spec, or•Weaken requirements, or•Build on different platform

Page 7: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 25

Consensus: Each Thread has a Private Input

32 1921

Distributed Computing Group Roger Wattenhofer 26

They Communicate

Distributed Computing Group Roger Wattenhofer 27

They Agree on Some Thread’s Input

1919 19

Distributed Computing Group Roger Wattenhofer 28

Consensus is important

• With consensus, you can implement anything you can imagine…

• Examples: with consensus you can decide on a leader, implement mutual exclusion, or solve the two generals problem

Page 8: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 29

You gonna learn

• In some models, consensus is possible• In some other models, it is not

• Goal of this and next lecture: to learn whether for a given model consensus is possible or not … and prove it!

Distributed Computing Group Roger Wattenhofer 30

Consensus #1shared memory

• n processors, with n > 1• Processors can atomically read or

write (not both) a shared memory cell

Distributed Computing Group Roger Wattenhofer 31

Protocol (Algorithm?)

• There is a designated memory cell c.• Initially c is in a special state “?”• Processor 1 writes its value v1 into c,

then decides on v1.• A processor j (j not 1) reads c until j

reads something else than “?”, and then decides on that.

Distributed Computing Group Roger Wattenhofer 32

Unexpected Delay

Swapped outback at

??? ???

Page 9: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 33

Heterogeneous Architectures

??? ???

PentiumPentium286

yawn

(1) Distributed Computing Group Roger Wattenhofer 34

Fault-Tolerance

??? ???

Distributed Computing Group Roger Wattenhofer 35

Consensus #2wait-free shared memory

• n processors, with n > 1• Processors can atomically read or

write (not both) a shared memory cell• Processors might crash (halt)• Wait-free implementation… huh?

Distributed Computing Group Roger Wattenhofer 36

Wait-Free Implementation

• Every process (method call) completes in a finite number of steps

• Implies no mutual exclusion• We assume that we have wait-free

atomic registers (that is, reads and writes to same register do not overlap)

Page 10: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 37

A wait-free algorithm…

• There is a cell c, initially c=“?”• Every processor i does the following

r = Read(c);

if (r == “?”) then

Write(c, vi); decide vi;

else

decide r;

Distributed Computing Group Roger Wattenhofer 38

Is the algorithm correct?

time

cell c32 17???

321732! 17!

Distributed Computing Group Roger Wattenhofer 39

Theorem:No wait-free consensus

??? ???

Distributed Computing Group Roger Wattenhofer 40

Proof Strategy

• Make it simple– n = 2, binary input

• Assume that there is a protocol• Reason about the properties of any

such protocol• Derive a contradiction

Page 11: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 41

Wait-Free Computation

• Either A or B “moves”• Moving means

– Register read– Register write

A moves B moves

Distributed Computing Group Roger Wattenhofer 42

The Two-Move TreeInitial state

Final states

Distributed Computing Group Roger Wattenhofer 43

Decision Values

1 0 0 1 1 1Distributed Computing Group Roger Wattenhofer 44

Bivalent: Both Possible

1 1 1

bivalent

1 0 0

Page 12: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 45

Univalent: Single Value Possible

1 1 1

univalent

1 0 0Distributed Computing Group Roger Wattenhofer 46

1-valent: Only 1 Possible

0 1 1 1

1-valent

01

Distributed Computing Group Roger Wattenhofer 47

0-valent: Only 0 possible

1 1 1

0-valent

1 0 0Distributed Computing Group Roger Wattenhofer 48

Summary

• Wait-free computation is a tree• Bivalent system states

– Outcome not fixed• Univalent states

– Outcome is fixed– May not be “known” yet– 1-Valent and 0-Valent states

Page 13: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 49

Claim

Some initial system state is bivalent

(The outcome is not always fixed from the start.)

Distributed Computing Group Roger Wattenhofer 50

A 0-Valent Initial State

• All executions lead to decision of 0

0 0

Distributed Computing Group Roger Wattenhofer 51

A 0-Valent Initial State

• Solo execution by A also decides 0

0

Distributed Computing Group Roger Wattenhofer 52

A 1-Valent Initial State

• All executions lead to decision of 1

1 1

Page 14: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 53

A 1-Valent Initial State

• Solo execution by B also decides 1

1

Distributed Computing Group Roger Wattenhofer 54

A Univalent Initial State?

• Can all executions lead to the same decision?

0 1

Distributed Computing Group Roger Wattenhofer 55

State is Bivalent

• Solo execution by Amust decide 0

• Solo execution by B must decide 1

0 1

Distributed Computing Group Roger Wattenhofer 56

0-valent

Critical States

1-valent

critical

Page 15: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 57

Critical States

• Starting from a bivalent initial state• The protocol can reach a critical

state– Otherwise we could stay bivalent

forever– And the protocol is not wait-free

Distributed Computing Group Roger Wattenhofer 58

From a Critical State

c

If A goes first, protocol decides 0

If B goes first, protocol decides 1

0-valent 1-valent

Distributed Computing Group Roger Wattenhofer 59

Model Dependency

• So far, memory-independent!• True for

– Registers– Message-passing– Carrier pigeons– Any kind of asynchronous computation

Distributed Computing Group Roger Wattenhofer 60

What are the Threads Doing?

• Reads and/or writes• To same/different registers

Page 16: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 61

Possible Interactions

????y.write()

????x.write()

????y.read()

????x.read()

y.write()x.write()y.read()x.read()

Distributed Computing Group Roger Wattenhofer 62

Reading Registers

A runs solo, decides 0

B reads x

1

0A runs solo, decides 1

c

States look the same to A

Distributed Computing Group Roger Wattenhofer 63

Possible Interactions

??nonoy.write()

??nonox.write()

nonononoy.read()

nonononox.read()

y.write()x.write()y.read()x.read()

Distributed Computing Group Roger Wattenhofer 64

Writing Distinct Registers

A writes y B writes x

10

c

The song remains the same

A writes yB writes x

Page 17: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 65

Possible Interactions

?nononoy.write()

no?nonox.write()

nonononoy.read()

nonononox.read()

y.write()x.write()y.read()x.read()

Distributed Computing Group Roger Wattenhofer 66

Writing Same Registers

States look the same to A

A writes x B writes x

1A runs solo, decides 1

c

0

A runs solo, decides 0 A writes x

Distributed Computing Group Roger Wattenhofer 67

That’s All, Folks!

nonononoy.write()

nonononox.write()

nonononoy.read()

nonononox.read()

y.write()x.write()y.read()x.read()

Distributed Computing Group Roger Wattenhofer 68

Theorem

• It is impossible to solve consensus using read/write atomic registers– Assume protocol exists– It has a bivalent initial state– Must be able to reach a critical state– Case analysis of interactions

• Reads vs others• Writes vs writes

Page 18: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 69

What Does Consensus have to do with Distributed Systems?

Distributed Computing Group Roger Wattenhofer 70

We want to build a Concurrent FIFO Queue

Distributed Computing Group Roger Wattenhofer 71

With Multiple Dequeuers!

Distributed Computing Group Roger Wattenhofer 72

A Consensus Protocol

2-element array

FIFO Queue with red and black balls

8

Coveted red ball Dreaded black ball

Page 19: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 73

Protocol: Write Value to Array

0 10

(3) Distributed Computing Group Roger Wattenhofer 74

0

Protocol: Take Next Item from Queue

0 18

Distributed Computing Group Roger Wattenhofer 75

0 1

Protocol: Take Next Item from Queue

I got the coveted red ball, so I will decide

my value

I got the dreaded black ball, so I will decide the other’s value from the

array8

Distributed Computing Group Roger Wattenhofer 76

Why does this Work?

• If one thread gets the red ball• Then the other gets the black ball• Winner can take her own value• Loser can find winner’s value in array

– Because threads write arraybefore dequeuing from queue

Page 20: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 77

Implication

• We can solve 2-thread consensus using only– A two-dequeuer queue– Atomic registers

Distributed Computing Group Roger Wattenhofer 78

Implications

• Assume there exists– A queue implementation from atomic registers

• Given– A consensus protocol from queue and registers

• Substitution yields– A wait-free consensus protocol from atomic

registers

contradi

ction

Distributed Computing Group Roger Wattenhofer 79

Corollary

• It is impossible to implement a two-dequeuer wait-free FIFO queue with read/write shared memory.

• This was a proof by reduction; important beyond NP-completeness…

Distributed Computing Group Roger Wattenhofer 80

Consensus #3read-modify-write shared mem.• n processors, with n > 1• Wait-free implementation• Processors can atomically read and

write a shared memory cell in one atomic step: the value written can depend on the value read

• We call this a RMW register

Page 21: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 81

Protocol

• There is a cell c, initially c=“?”• Every processor i does the following

RMW(c), with

if (c == “?”) then

Write(c, vi); decide vi;

else

decide c;

atomic stepDistributed Computing Group Roger Wattenhofer 82

Discussion

• Protocol works correctly– One processor accesses c as the first;

this processor will determine decision• Protocol is wait-free• RMW is quite a strong primitive

– Can we achieve the same with a weaker primitive?

Distributed Computing Group Roger Wattenhofer 83

Read-Modify-Write more formally

• Method takes 2 arguments:– Variable x– Function f

• Method call:– Returns value of x– Replaces x with f(x)

Distributed Computing Group Roger Wattenhofer 84

public abstract class RMW {private int value;

public void rmw(function f) {int prior = this.value;this.value = f(this.value);return prior;

}

}

Read-Modify-Write

Return prior value

Apply function

Page 22: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 85

public abstract class RMW {private int value;

public void read() {int prior = this.value;this.value = this.value;return prior;

}

}

Example: Read

identity functionDistributed Computing Group Roger Wattenhofer 86

public abstract class RMW {private int value;

public void TAS() {int prior = this.value;this.value = 1;return prior;

}

}

Example: test&set

constant function

Distributed Computing Group Roger Wattenhofer 87

public abstract class RMW {private int value;

public void fai() {int prior = this.value;this.value = this.value+1;return prior;

}

}

Example: fetch&inc

increment functionDistributed Computing Group Roger Wattenhofer 88

public abstract class RMW {private int value;

public void faa(int x) {int prior = this.value;this.value = this.value+x;return prior;

}

}

Example: fetch&add

addition function

Page 23: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 89

public abstract class RMW {private int value;

public void swap(int x) {int prior = this.value;this.value = x;return prior;

}

}

Example: swap

constant functionDistributed Computing Group Roger Wattenhofer 90

public abstract class RMW {private int value;

public void CAS(int old, int new) {int prior = this.value;if (this.value == old)

this.value = new;return prior;

}

}

Example: compare&swap

complex function

Distributed Computing Group Roger Wattenhofer 91

“Non-trivial” RMW

• Not simply read• But

– test&set, fetch&inc, fetch&add, swap, compare&swap, general RMW

• Definition: A RMW is non-trivial if there exists a value v such that v ≠ f(v)

Distributed Computing Group Roger Wattenhofer 92

Consensus Numbers (Herlihy)

• An object has consensus number n– If it can be used

• Together with atomic read/write registers– To implement n-thread consensus

• But not (n+1)-thread consensus

Page 24: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 93

Consensus Numbers

• Theorem– Atomic read/write registers have

consensus number 1

• Proof– Works with 1 process– We have shown impossibility with 2

Distributed Computing Group Roger Wattenhofer 94

Consensus Numbers

• Consensus numbers are a useful way of measuring synchronization power

• Theorem– If you can implement X from Y– And X has consensus number c– Then Y has consensus number at least c

Distributed Computing Group Roger Wattenhofer 95

Synchronization Speed Limit

• Conversely– If X has consensus number c– And Y has consensus number d < c– Then there is no way to construct a

wait-free implementation of X by Y• This theorem will be very useful

– Unforeseen practical implications!

Distributed Computing Group Roger Wattenhofer 96

Theorem

• Any non-trivial RMW object has consensus number at least 2

• Implies no wait-free implementation of RMW registers from read/write registers

• Hardware RMW instructions not just a convenience

Page 25: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 97

Proofpublic class RMWConsensusFor2

implements Consensus {private RMW r;

public Object decide() {int i = Thread.myIndex();if (r.rmw(f) == v)

return this.announce[i];else

return this.announce[1-i]; }}

Initialized to v

Am I first?

Yes, return my input

No, return other’s input Distributed Computing Group Roger Wattenhofer 98

Proof

• We have displayed– A two-thread consensus protocol– Using any non-trivial RMW object

Distributed Computing Group Roger Wattenhofer 99

Interfering RMW

• Let F be a set of functions such that for all fi and fj, either– They commute: fi(fj(x))=fj(fi(x))– They overwrite: fi(fj(x))=fi(x)

• Claim: Any such set of RMW objects has consensus number exactly 2

Distributed Computing Group Roger Wattenhofer 100

Examples

• Test-and-Set– Overwrite

• Swap– Overwrite

• Fetch-and-inc– Commute

Page 26: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 101

Meanwhile Back at the Critical State

c

0-valent 1-valent

A about to apply fA

B about to apply fB

Distributed Computing Group Roger Wattenhofer 102

Maybe the Functions Commute

c

0-valent

A applies fA B applies fB

A applies fAB applies fB

0 1C runs solo C runs solo

1-valent

Distributed Computing Group Roger Wattenhofer 103

Maybe the Functions Commute

c

0-valent

A applies fA B applies fB

A applies fAB applies fB

0 1C runs solo C runs solo

1-valent

These states look the same to CThese states look the same to C

Distributed Computing Group Roger Wattenhofer 104

Maybe the Functions Overwrite

c

0-valent

A applies fA B applies fB

A applies fA

0

1

C runs solo

C runs solo

1-valent

Page 27: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 105

Maybe the Functions Overwrite

c

0-valent

A applies fA B applies fB

A applies fA

0

1

C runs solo

C runs solo

1-valent

These states look the same to CThese states look the same to C

Distributed Computing Group Roger Wattenhofer 106

Impact

• Many early machines used these “weak” RMW instructions– Test-and-set (IBM 360)– Fetch-and-add (NYU Ultracomputer)– Swap

• We now understand their limitations– But why do we want consensus anyway?

Distributed Computing Group Roger Wattenhofer 107

public class RMWConsensusimplements Consensus {

private RMW r;

public Object decide() {int i = Thread.myIndex();int j = r.CAS(-1,i);if (j == -1)

return this.announce[i];else

return this.announce[j]; }}

CAS has Unbounded Consensus NumberInitialized to -1

Am I first?

Yes, return my input

No, return other’s input Distributed Computing Group Roger Wattenhofer 108

The Consensus Hierarchy

1 Read/Write Registers, …

2 T&S, F&I, Swap, …

∞ CAS, …

.

.

.

Page 28: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 109

Consensus #4Synchronous Systems

• In real systems, one can sometimes tell if a processor had crashed– Timeouts– Broken TCP connections

• Can one solve consensus at least in synchronous systems?

Distributed Computing Group Roger Wattenhofer 110

Communication Model

• Complete graph• Synchronous

1p

2p

3p

4p5p

Distributed Computing Group Roger Wattenhofer 111

1p

2p

3p

4p5p

aa

aa

Send a message to all processors in one round: Broadcast

Distributed Computing Group Roger Wattenhofer 112

At the end of the round:everybody receives a

1p

2p

3p

4p5p

a

a

a

a

Page 29: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 113

1p

2p

3p

4p5p

a

a

aab

b

b

b

Broadcast: Two or more processes can broadcast in the same round

Distributed Computing Group Roger Wattenhofer 114

1p

2p

3p

4p5p

a,b

a

ba,b

a,b

At end of round...

Distributed Computing Group Roger Wattenhofer 115

Crash Failures

Faulty processor 1p

2p

3p

4p5p

aa

aa

Distributed Computing Group Roger Wattenhofer 116

1p

2p

3p

4p5p

a

a

Some of the messages are lost,they are never received

Faulty processor

Page 30: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 117

1p

2p

3p

4p5p

a

a

Effect

Faulty processor

Distributed Computing Group Roger Wattenhofer 118

Failure

1p

2p

3p

4p

5p

Round1

1p

2p

3p

4p

5p

1p

2p

3p

4p

5p

Round2

Round3

1p

2p

4p

5p

Round4

1p

2p

4p

5p

Round5

3p 3p

After a failure, the process disappears from the network

Distributed Computing Group Roger Wattenhofer 119

Consensus: Everybody has an initial value

0

1

2 3

4

Start

Distributed Computing Group Roger Wattenhofer 120

3

3

3 3

3

Finish

Everybody must decide on the same value

Page 31: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 121

1

1

1 1

1

Start

If everybody starts with the same valuethey must decide on that value

Finish1

1

1 1

1

Validity condition:

Distributed Computing Group Roger Wattenhofer 122

A simple algorithm

1. Broadcasts value to all processors

2. Decides on the minimum

Each processor:

(only one round is needed)

Distributed Computing Group Roger Wattenhofer 123

0

1

2 3

4

Start

Distributed Computing Group Roger Wattenhofer 124

0

1

2 3

4

Broadcast values0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

Page 32: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 125

0

0

0 0

0

Decide on minimum

0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

0,1,2,3,4

Distributed Computing Group Roger Wattenhofer 126

0

0

0 0

0

Finish

Distributed Computing Group Roger Wattenhofer 127

This algorithm satisfies the validity condition

1

1

1 1

1

Start Finish1

1

1 1

1

If everybody starts with the same initial value,everybody sticks to that value (minimum)

Distributed Computing Group Roger Wattenhofer 128

Consensus with Crash Failures

1. Broadcasts value to all processors

2. Decides on the minimum

Each processor:

The simple algorithm doesn’t work

Page 33: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 129

0

1

2 3

4

Start

fail

The failed processor doesn’t broadcast its value to all processors

0

0

Distributed Computing Group Roger Wattenhofer 130

0

1

2 3

40,1,2,3,4

1,2,3,4

fail

0,1,2,3,4

1,2,3,4

Broadcasted values

Distributed Computing Group Roger Wattenhofer 131

0

0

1 0

10,1,2,3,4

1,2,3,4

fail

0,1,2,3,4

1,2,3,4

Decide on minimum

Distributed Computing Group Roger Wattenhofer 132

0

0

1 0

1

fail

Finish - No Consensus!

Page 34: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 133

If an algorithm solves consensus for f failed processes we say it is

an f-resilient consensus algorithm

Distributed Computing Group Roger Wattenhofer 134

0

1

4 3

2

Start Finish1

1

Example: The input and output of a 3-resilient consensus algorithm

Distributed Computing Group Roger Wattenhofer 135

New validity condition:if all non-faulty processes start with the same value then all non-faulty processesdecide on that value

1

1

1 1

1

Start Finish1

1Distributed Computing Group Roger Wattenhofer 136

An f-resilient algorithm

Round 1:Broadcast my value

Round 2 to round f+1:Broadcast any new received values

End of round f+1:Decide on the minimum value received

Page 35: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 137

0

1

2 3

4

Start

Example: f=1 failures, f+1=2 rounds needed

Distributed Computing Group Roger Wattenhofer 138

0

1

2 3

4

Round 1

0

0fail

Example: f=1 failures, f+1 = 2 rounds needed

Broadcast all values to everybody

0,1,2,3,4

1,2,3,4 0,1,2,3,4

1,2,3,4

(new values)

Distributed Computing Group Roger Wattenhofer 139

Example: f=1 failures, f+1 = 2 rounds needed

Round 2 Broadcast all new values to everybody

0,1,2,3,4

0,1,2,3,4 0,1,2,3,4

0,1,2,3,41

2 3

4

Distributed Computing Group Roger Wattenhofer 140

Example: f=1 failures, f+1 = 2 rounds needed

Finish Decide on minimum value

0

0 0

00,1,2,3,4

0,1,2,3,4 0,1,2,3,4

0,1,2,3,4

Page 36: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 141

0

1

2 3

4

Start

Example: f=2 failures, f+1 = 3 rounds needed

Example of execution with 2 failures

Distributed Computing Group Roger Wattenhofer 142

0

1

2 3

4

Round 1

0

Failure 1

Broadcast all values to everybody

1,2,3,4

1,2,3,4 0,1,2,3,4

1,2,3,4

Example: f=2 failures, f+1 = 3 rounds needed

Distributed Computing Group Roger Wattenhofer 143

0

1

2 3

4

Round 2

Failure 1

Broadcast new values to everybody

0,1,2,3,4

1,2,3,4 0,1,2,3,4

1,2,3,4

Failure 2

Example: f=2 failures, f+1 = 3 rounds needed

Distributed Computing Group Roger Wattenhofer 144

0

1

2 3

4

Round 3

Failure 1

Broadcast new values to everybody

0,1,2,3,4

0,1,2,3,4 0,1,2,3,4

O,1,2,3,4

Failure 2

Example: f=2 failures, f+1 = 3 rounds needed

Page 37: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 145

0

0

0 3

0

Finish

Failure 1

Decide on the minimum value

0,1,2,3,4

0,1,2,3,4 0,1,2,3,4

O,1,2,3,4

Failure 2

Example: f=2 failures, f+1 = 3 rounds needed

Distributed Computing Group Roger Wattenhofer 146

Example:5 failures,6 rounds

1 2

No failure

3 4 5 6Round

If there are f failures and f+1 rounds then there is a round with no failed process

Distributed Computing Group Roger Wattenhofer 147

• Every (non faulty) process knows about all the values of all the other participating processes

•This knowledge doesn’t change untilthe end of the algorithm

At the end of theround with no failure:

Distributed Computing Group Roger Wattenhofer 148

Everybody would decide on the same value

However, as we don’t know the exact position of this round,

we have to let the algorithm execute for f+1 rounds

Therefore, at the end of theround with no failure:

Page 38: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 149

when all processes start with the sameinput value then the consensus is that value

This holds, since the value decided fromeach process is some input value

Validity of algorithm:

Distributed Computing Group Roger Wattenhofer 150

A Lower Bound

Any f-resilient consensus algorithmrequires at least f+1 rounds

Theorem:

Distributed Computing Group Roger Wattenhofer 151

Proof sketch:

Assume for contradiction that f or less rounds are enough

Worst case scenario:

There is a process that fails in each round

Distributed Computing Group Roger Wattenhofer 152

Round

a

1before process fails, it sends its value a to only one process

ip

kp

ip

kp

Worst case scenario

Page 39: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 153

Round

a

1before process fails, it sends value a to only one process

mp

kp

kp

mp

Worst case scenario

2

Distributed Computing Group Roger Wattenhofer 154

Round 1

fp

Worst case scenario

2

………

a np

f3At the end of round fonly one processknows about value a

np

Distributed Computing Group Roger Wattenhofer 155

Round 1

Worst case scenario

2

………

f3Process may decide on a, and all other processes may decide on another value (b)

np

npa

b

decide

Distributed Computing Group Roger Wattenhofer 156

Round 1

Worst case scenario

2

………

f3

npa

b

decideTherefore f rounds are not enoughAt least f+1 rounds are needed

Page 40: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 157

Consensus #5Byzantine Failures

Faulty processor 1p

2p

3p

4p5p

ab

ac

Different processes receive different valuesDistributed Computing Group Roger Wattenhofer 158

1p

2p

3p

4p5p

a

a

A Byzantine process can behave like a Crashed-failed process

Some messages may be lost

Faulty processor

Distributed Computing Group Roger Wattenhofer 159

Failure

1p

2p

3p

4p

5p

Round1

1p

2p

3p

4p

5p

1p

2p

3p

4p

5p

Round2

Round3

1p

2p

4p

5p

Round4

1p

2p

4p

5p

Round5

After failure the process continuesfunctioning in the network

3p 3p

Failure

1p

2p

4p

5p

Round6

3p

Distributed Computing Group Roger Wattenhofer 160

Consensus with Byzantine Failures

solves consensus for f failed processes

f-resilient consensus algorithm:

Page 41: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 161

The input and output of a 1-resilient consensus algorithm

0

1

4 3

2

Start Finish3

3

Example:

3 3

Distributed Computing Group Roger Wattenhofer 162

Validity condition:if all non-faulty processes start withthe same value then all non-faulty processesdecide on that value

1

1

1 1

1

Start Finish1

1

1 1

Distributed Computing Group Roger Wattenhofer 163

Any f-resilient consensus algorithm requires at leastf+1 rounds

Theorem:

follows from the crash failure lower bound

Proof:

Lower bound on number of rounds

Distributed Computing Group Roger Wattenhofer 164

There is no f-resilient algorithm

for n processes, where f ≥ n/3

Theorem:

Plan: First we prove the 3 process case,and then the general case

Upper bound on failed processes

Page 42: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 165

There is no 1-resilient algorithmfor 3 processes

Lemma:

Proof: Assume for contradiction that there is a 1-resilient algorithm for 3 processes

The 3 processes case

Distributed Computing Group Roger Wattenhofer 166

0p

1p 2p

A(0)

B(1) C(0)

Initial value

Localalgorithm

Distributed Computing Group Roger Wattenhofer 167

0p

1p 2p

1

1 1

Decision value

Distributed Computing Group Roger Wattenhofer 168

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

Assume 6 processes are in a ring

(just for fun)

Page 43: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 169

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

B(1)1p

0pA(1)

2pfaulty

C(1)

C(0)Processes think they are in a triangle

Distributed Computing Group Roger Wattenhofer 170

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

11p

0p1

2pfaulty

(validity condition)

Distributed Computing Group Roger Wattenhofer 171

3p

4p

2pA(0) C(1)

1p

5p 0pA(1)C(0)

B(0)

0p1

1p

2pC(0)

B(0)

0pA(0)

A(1)

faulty

B(1)

Distributed Computing Group Roger Wattenhofer 172

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

0p1

1p

2p0

0

0pfaulty

(validity condition)

Page 44: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 173

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

0p1

2p0

2p 0pA(1)C(0)

1pB(1)B(0)

faultyDistributed Computing Group Roger Wattenhofer 174

3p

4p

2pA(0)

B(1)

C(1)

1p

5p 0pA(1)C(0)

B(0)

0p1

2p0

2p 0p10

1p faulty

Distributed Computing Group Roger Wattenhofer 175

2p 0p10

1p faulty

Impossibility

Distributed Computing Group Roger Wattenhofer 176

There is no algorithm that solvesconsensus for 3 processesin which 1 is a byzantine process

Conclusion

Page 45: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 177

Assume for contradiction thatthere is an f -resilient algorithm Afor n processes, where f ≥ n/3

We will use algorithm A to solve consensusfor 3 processes and 1 failure (which is impossible, thus we have a contradiction)

The n processes case

Distributed Computing Group Roger Wattenhofer 178

1p

0 1

2p np

1

… …

2 21 0 00 1 1start

failures

1p

1 1

2p np… …

1 1 11 1finish

Algorithm A

Distributed Computing Group Roger Wattenhofer 179

31 npp K

1q

2q3q321

3nn pp K

+nn pp K1

32 +

Each process q simulates algorithm A

on n/3 of “p” processes

Distributed Computing Group Roger Wattenhofer 180

31 npp K1q

2q3q321

3nn pp K

+nn pp K1

32 +

fails

When a single q is byzantine, then n/3 of

the “p” processes are byzantine too.

Page 46: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 181

31 npp K

1q

2q3q321

3nn pp K

+nn pp K1

32 +

fails

algorithm A tolerates n/3 failures

Finish of algorithm A

kkk

k kk

k

k

kkkk

k

all decide k

Distributed Computing Group Roger Wattenhofer 182

1q

2q3q

fails

Final decision k

k

We reached consensus with 1 failureImpossible!!!

Distributed Computing Group Roger Wattenhofer 183

There is no f-resilient algorithm

for n processes with f ≥ n/3

Conclusion

Distributed Computing Group Roger Wattenhofer 184

The King Algorithm

solves consensus with n processes andf failures where f < n/4 in f +1 “phases”

There are f+1 phasesEach phase has two roundsIn each phase there is a different king

Page 47: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 185

Example: 12 processes, 2 faults, 3 kings

0 1 1 2 21 0 00 1 1 0

initial values

Faulty

Distributed Computing Group Roger Wattenhofer 186

Example: 12 processes, 2 faults, 3 kings

Remark: There is a king that is not faulty

0 1 1 2 21 0 00 1 1 0

initial values

King 1 King 2 King 3

Distributed Computing Group Roger Wattenhofer 187

Each processor has a preferred valueip iv

In the beginning, the preferred value

is set to the initial value

The King algorithm

Distributed Computing Group Roger Wattenhofer 188

Round 1, processor :ip

• Broadcast preferred value

• Set to the majority of values received

iviv

The King algorithm: Phase k

Page 48: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 189

•If had majority of less than

Round 2, king :kp

•Broadcast new preferred value

Round 2, process :ipkv

iv fn +2

then set to iv kv

The King algorithm: Phase k

Distributed Computing Group Roger Wattenhofer 190

End of Phase f+1:

Each process decides on preferred value

The King algorithm

Distributed Computing Group Roger Wattenhofer 191

Example: 6 processes, 1 fault

Faulty

0 1

king 1

king 20

11

2

Distributed Computing Group Roger Wattenhofer 192

0 1

king 1

0

11

2

Phase 1, Round 1

2,1,1,0,0,0

2,1,1,1,0,0

2,1,1,1,0,0 2,1,1,0,0,0

2,1,1,0,0,0

0

1

1 0

0

Everybody broadcasts

Page 49: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 193

1 0

king 1

0

11

0

Phase 1, Round 1 Choose the majority

Each majority population was 42

3 =+≤ fn

On round 2, everybody will choose the king’s value

Distributed Computing Group Roger Wattenhofer 194

Phase 1, Round 2

1 0

0

11

00

1

0 1

2

king 1

The king broadcasts

Distributed Computing Group Roger Wattenhofer 195

Phase 1, Round 2

0 1

0

11

2

king 1

Everybody chooses the king’s value

Distributed Computing Group Roger Wattenhofer 196

0 1

king 20

11

2

Phase 2, Round 1

2,1,1,0,0,0

2,1,1,1,0,0

2,1,1,1,0,0 2,1,1,0,0,0

2,1,1,0,0,0

0

1

1 0

0

Everybody broadcasts

Page 50: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 197

1 0

0

11

0

Phase 2, Round 1 Choose the majority

Each majority population is 42

3 =+≤ fn

On round 2, everybody will choose the king’s value

king 2

2,1,1,1,0,0

Distributed Computing Group Roger Wattenhofer 198

Phase 2, Round 2

1 0

0

11

0

The king broadcasts

king 2

000

0 0

Distributed Computing Group Roger Wattenhofer 199

Phase 2, Round 2

0 0

0

10

0king 2

Everybody chooses the king’s valueFinal decision

Distributed Computing Group Roger Wattenhofer 200

In the round where the king is non-faulty, everybody will choose the king’s value v

After that round, the majority will

remain value v with a majority populationwhich is at least fnfn +>−

2

Invariant / Conclusion

Page 51: The Consensus Problem...Consensus is important • With consensus, you can implement anything you can imagine… • Examples: with consensus you can decide on a leader, implement

Distributed Computing Group Roger Wattenhofer 201

Exponential Algorithm

solves consensus with n processes andf failures where f < n/3 in f +1 “phases”

But: uses messages with exponential size

Distributed Computing Group Roger Wattenhofer 202

Atomic Broadcast

• One process wants to broadcast message to all other processes

• Either everybody should receive the (same) message, or nobody should receive the message

• Closely related to Consensus: First send the message to all, then agree!

Distributed Computing Group Roger Wattenhofer 203

Summary

• We have solved consensus in a variety of models; particularly we have seen – algorithms– wrong algorithms– lower bounds– impossibility results– reductions– etc.

DistributedComputing

GroupRoger Wattenhofer

Questions?