Distributed Algorithms: Agreement Protocols

Problems of Agreement

A set of processes need to agree on a value (decision), after one or more processes have proposed what that value (decision) should be

Examples: mutual exclusion, election, transactions

Processes may be correct, crashed, or they may exhibit arbitrary (Byzantine) failures

Messages are exchanged on a one-to-one basis, and they are not signed

Consensus and related problems

System model

N processes {p1, p2, ..., pN}

Communication is reliable but processes may fail

At most f processes out of N may be faulty

Failure types: crash failure, Byzantine (arbitrary) failure

The system is logically fully connected

A receiver process knows the identity of the sender process

Limiting faults solely to the processes simplifies the solution to the agreement problems

More recently, agreement problems have also been studied under the failure of communication channels only, and under the failure of both processes and communication channels

Authenticated & Non-authenticated messages

To reach an agreement, processes have to exchange their values and relay the received values to other processes

Authenticated or signed message system: a (faulty) process cannot forge a message or change the contents of a received message before it relays the message to others.

A process can verify the authenticity of a received message

Non-authenticated, unsigned, or oral message system: a (faulty) process can forge a message and claim to have received it from another process, or change the contents of a received message before it relays the message to others.

A process has no way of verifying the authenticity of a received message
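The difference between the two message systems can be illustrated with a minimal sketch. It assumes an HMAC tag as a stand-in for a signature; a real signed-message system would normally use public-key signatures so that receivers can verify but not forge. The key and the names `sign` and `verify` are hypothetical, chosen only for this example.

```python
import hashlib
import hmac

def sign(key: bytes, sender: str, value: str) -> bytes:
    """Tag binding a value to its sender (HMAC stands in for a signature)."""
    return hmac.new(key, f"{sender}:{value}".encode(), hashlib.sha256).digest()

def verify(key: bytes, sender: str, value: str, tag: bytes) -> bool:
    """True only if the tag matches the claimed sender and value."""
    return hmac.compare_digest(sign(key, sender, value), tag)

# Hypothetical key material for commander p1 (assumption for illustration).
P1_KEY = b"hypothetical-key-for-p1"
tag = sign(P1_KEY, "p1", "attack")

# An honest relay forwards the value unchanged: verification succeeds.
honest_relay = verify(P1_KEY, "p1", "attack", tag)

# A faulty relay changes the value but cannot produce a matching tag.
tampered_relay = verify(P1_KEY, "p1", "retreat", tag)
```

With unsigned (oral) messages there is no tag to check, so the receiver would have to accept "retreat" at face value.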

Two Agreement Problems

Consensus problem: N processes agree on a value (e.g. synchronized action: go / abort)

Consensus may have to be reached in the presence of failure

Process failure: crash/fail-stop, arbitrary failure

Communication failure

Each process i starts in an “undecided” state

Every process i proposes a value vi from a set D while in the undecided state

Process i exchanges messages until it makes decision di and moves to the decided state

A consensus is reached if all correct processes agree on the same value di

Consensus Requirements

Termination: Eventually each correct process sets its decision value

This may not be possible in the presence of process crashes in asynchronous system

Agreement: The decision value is same for all correct processes

Arbitrary (Byzantine) failures may cause inconsistency and prevent agreement

Integrity: If all correct processes propose the same value, then any correct process decides that value

Consensus may involve a proposal stage and an agreement stage
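The three requirements can be read as predicates over the outcome of a single run. The following is a minimal sketch under that reading; the function name `check_consensus` and the dictionary-based representation of a run are assumptions made for this example.

```python
def check_consensus(proposed, decided, correct):
    """Evaluate the three consensus requirements on one finished run.

    proposed: dict process id -> proposed value
    decided:  dict process id -> decision value (None if still undecided)
    correct:  set of ids of the correct (non-faulty) processes
    """
    decisions = [decided.get(p) for p in correct]
    made = [d for d in decisions if d is not None]

    # Termination: every correct process has set its decision value.
    termination = len(made) == len(decisions)

    # Agreement: all decision values of correct processes are equal.
    agreement = len(set(made)) <= 1

    # Integrity: if all correct processes proposed the same value,
    # every correct process that decided has chosen that value.
    proposals = {proposed[p] for p in correct}
    integrity = len(proposals) != 1 or all(d in proposals for d in made)

    return {"termination": termination,
            "agreement": agreement,
            "integrity": integrity}

# p3 crashed before deciding; p1 and p2 are correct and both decide "go".
good_run = check_consensus(
    proposed={"p1": "go", "p2": "go", "p3": "abort"},
    decided={"p1": "go", "p2": "go", "p3": None},
    correct={"p1", "p2"},
)

# Two correct processes decide differently: agreement is violated.
bad_run = check_consensus(
    proposed={"p1": "go", "p2": "abort"},
    decided={"p1": "go", "p2": "abort"},
    correct={"p1", "p2"},
)
```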

Byzantine Generals Problem

Proposed and solved by Lamport et al.

Consider a battleground. A number of generals at different positions want to reach an agreement on their plan of action, i.e., “attack” or “retreat”.

Generals are separated geographically and communicate through messengers. Some of the generals are “loyal” and some are “traitors”.

Upper bound on number of traitors

Pease et al. showed that it is impossible to reach a consensus if f exceeds (N-1)/3


“Byzantine generals” problem: a “commander” process i orders value v.

The “lieutenant” processes must agree on what the commander ordered.

Faulty processes may provide wrong or contradictory messages

Integrity requirement: A distinguished process decides a value for others to agree upon

Solution only exists if N > 3f, where f : #faulty processes

Differs from consensus in that a distinguished process supplies a value that the others are to agree upon, instead of each of them proposing a value


Requirements

Termination: Eventually each process sets its decision variable

Agreement: The decision value of all correct processes is the same

Integrity: If the commander is correct, then all correct processes agree on the value the commander proposed

Note: integrity implies agreement when the commander is correct; but the commander need not be correct

IC: A Variant of Consensus

Interactive Consistency Problem

Every process proposes a single value.

The goal of the algorithm is for the correct processes to agree on a vector of values, one for each process – the “decision vector”

Ex – for each of a set of processes to obtain the same information about their respective states

IC: A Variant of Consensus

Requirements

Termination: Eventually each process sets its decision variable

Agreement: The decision vector of all correct processes is the same

Integrity: If pi is correct, then all correct processes agree on vi as the ith component of their vectors

Relationship between C, BG & IC

Although it is common to consider the BG problem with arbitrary process failures, in fact each of the three problems – C, BG, & IC – is meaningful in the context of either arbitrary or crash failures

Each can be framed assuming either a synchronous or an asynchronous system

It is sometimes possible to derive a solution to one problem using a solution to another


Suppose that there exist solutions to C, BG & IC

Ci(v1, v2, … vN) returns the decision value of pi in a run of the solution to the consensus problem, where v1, v2, … are the values that the processes proposed

BGi(j, v) returns the decision value of pi in a run of the solution to the BG problem, where pj, the commander proposed the value v

ICi(v1, v2, … vN)[ j ] returns the jth value in the decision vector of pi in a run of the solution to the IC problem, where v1, v2, … are the values that the processes proposed

It is possible to construct solutions out of the solutions to other problems


IC can be solved by using BG’s solution by running it N times, once with each process pi (i = 1, 2, … N) acting as the commander:

ICi(v1, v2, … vN)[ j ] = BGi(j, vj) (i, j = 1, 2, … N)

C can be solved by using IC’s solution by running IC to produce a vector of values at each process, then applying an appropriate function on the vector’s values to derive a single value:

Ci(v1, v2, … vN) = majority(ICi(v1, v2, … vN)[1], … ICi(v1, v2, … vN)[N] )

BG can be solved from C as follows:

The commander pj sends its proposed value v to itself and each of the remaining processes

All processes run C with the values v1, v2, … vN that they receive (pj may be faulty)

They derive BGi(j, v) = Ci(v1, v2, … vN) (i = 1, 2, … N)
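The three constructions above can be sketched as function compositions. This is only an illustration of how the solutions plug into each other: `bg_stub` is a hypothetical stand-in for a real BG solution that models a failure-free run, process indices are 0-based, and `None` plays the role of "no majority".

```python
from collections import Counter

def bg_stub(i, j, v):
    """Stand-in for BGi(j, v): in a failure-free run every process pi
    decides the value v proposed by commander pj (assumption)."""
    return v

def ic_via_bg(i, values):
    """ICi(v1..vN)[j] = BGi(j, vj): run BG once per commander pj."""
    return [bg_stub(i, j, values[j]) for j in range(len(values))]

def majority(vector):
    """Most frequent value if it occurs in more than half the slots."""
    value, count = Counter(vector).most_common(1)[0]
    return value if count > len(vector) / 2 else None

def c_via_ic(i, values):
    """Ci(v1..vN) = majority(ICi(v1..vN)[1], ..., ICi(v1..vN)[N])."""
    return majority(ic_via_bg(i, values))

def bg_via_c(i, j, v, n, consensus):
    """BGi(j, v): commander pj sends v to all n processes, which then run
    C on what they received (here everyone received v, no faults)."""
    received = [v] * n
    return consensus(i, received)

decision_vector = ic_via_bg(1, ["go", "go", "abort"])
consensus_value = c_via_ic(0, ["go", "go", "abort"])
order = bg_via_c(0, 2, "attack", 4, c_via_ic)
```

A real solution would replace `bg_stub` with an actual protocol run; the compositions themselves stay the same.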

Consensus

Solving consensus is equivalent to solving reliable and totally ordered multicast

Given a solution to one, we can solve the other

Implementing consensus with RTO-multicast

Collect all processes into a group g

Each process pi performs RTO-multicast(g, vi)

Each process pi chooses di = mi, where mi is the first value that pi RTO-delivers

The termination property follows from the reliability of the multicast

The agreement and integrity properties follow from the reliability and total ordering of multicast delivery

Chandra & Toueg [1996] demonstrate how RTO-multicast can be derived from consensus
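The construction of consensus from RTO-multicast can be sketched with a toy model in which total ordering is simulated by one shared delivery sequence. The shuffle, the seed and the function name are assumptions for this example; a real RTO-multicast would establish the order with a protocol.

```python
import random

def consensus_via_rto_multicast(proposals, seed=0):
    """Each process RTO-multicasts its value and decides on the first
    value it RTO-delivers.

    proposals: dict process id -> proposed value
    Total order is modelled by a single global delivery sequence that
    every process sees identically (assumption for illustration).
    """
    rng = random.Random(seed)
    global_order = list(proposals.values())
    rng.shuffle(global_order)  # some arbitrary but common order

    decisions = {}
    for pid in proposals:
        delivered = list(global_order)   # same sequence at every process
        decisions[pid] = delivered[0]    # di = first RTO-delivered value
    return decisions

decisions = consensus_via_rto_multicast({"p1": "a", "p2": "b", "p3": "c"})
```

Because every process sees the same first message, all decisions coincide, and the decided value is always one that some process proposed.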

Consensus in a synchronous system

We discuss an algorithm that uses only a basic multicast protocol to solve consensus in a synchronous system

The algorithm assumes that up to f of the N processes exhibit crash failures

Communication Model

[Figure: five processes p1, ..., p5 connected as a complete graph]

• Complete graph (i.e. logically fully connected)

• Synchronous network

Multicast

Send a message to all processors in one round

[Figure: one of the five processes p1, ..., p5 sends the message a to every other process]

At the end of round: everybody receives a

Multicast

Two or more processes can multicast at the same round

[Figure: two of the five processes multicast a and b in the same round; at the end of the round the other processes have received both a and b]

Crash Failures

[Figure: a faulty processor starts to multicast a to the other four processes]

Some of the messages are lost; they are never received

[Figure: only some of the copies of a sent by the faulty processor are delivered]
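Consensus

Start: everybody has an initial value

Finish: everybody must decide the same value

[Figure: five processes start with the values 0, 1, 2, 3, 4 and all finish with the value 3]

Validity condition: if everybody starts with the same value, they must decide that value

[Figure: five processes start with the value 1 and all finish with the value 1]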

A simple algorithm using B-multicast

Each processor:

1. B-multicast value to all processors

2. Decide on the minimum

(only one round is needed)

[Figure: five processes start with the values 0, 1, 2, 3, 4; after the B-multicast each process holds 0,1,2,3,4; each process decides on the minimum, 0]

This algorithm satisfies the validity condition

If everybody starts with the same initial value, everybody decides on that value (minimum)

[Figure: five processes start with the value 1 and all finish with the value 1]
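The one-round algorithm is short enough to sketch directly for the failure-free case. The function name and the dictionary representation of the processes are assumptions for this example.

```python
def one_round_min(initial):
    """Failure-free run of the simple algorithm.

    initial: dict process id -> initial value
    Round 1: every process B-multicasts its value, so every process
    ends the round holding the full set of values.
    Decision: each process picks the minimum of what it holds.
    """
    everything = set(initial.values())
    received = {p: set(everything) for p in initial}
    return {p: min(values) for p, values in received.items()}

mixed = one_round_min({"p1": 0, "p2": 1, "p3": 2, "p4": 3, "p5": 4})
same = one_round_min({"p1": 1, "p2": 1, "p3": 1, "p4": 1, "p5": 1})
```

Here `mixed` maps every process to 0, and `same` maps every process to 1, which is the validity condition.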

Consensus with Crash Failures

Each processor:

1. B-multicast value to all processors

2. Decide on the minimum

The simple algorithm doesn’t work

The failed processor doesn’t multicast its value to all processors

[Figure: five processes start with the values 0, 1, 2, 3, 4; the process holding 0 fails after delivering 0 to only two of the others; after the round two processes hold 0,1,2,3,4 and two hold 1,2,3,4; they decide 0 and 1 respectively]

No consensus!
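The failing run can be reproduced with a small simulation. It is a sketch under the assumption that the crashing process delivers its value to a chosen subset of receivers before it stops; the function name and the process ids are hypothetical.

```python
def one_round_min_with_crash(initial, crashed, reached):
    """One-round minimum algorithm where one process crashes mid-multicast.

    initial: dict process id -> initial value
    crashed: id of the process that crashes during its multicast
    reached: set of ids that still received the crashed process's value
    Returns the decisions of the surviving processes.
    """
    decisions = {}
    for p in initial:
        if p == crashed:
            continue  # the crashed process never decides
        received = {
            value
            for q, value in initial.items()
            if q != crashed or p in reached
        }
        decisions[p] = min(received)
    return decisions

decisions = one_round_min_with_crash(
    initial={"p1": 0, "p2": 1, "p3": 2, "p4": 3, "p5": 4},
    crashed="p1",
    reached={"p2", "p4"},
)
```

In this run p2 and p4 decide 0 while p3 and p5 decide 1, so agreement is violated after a single round.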

If an algorithm solves consensus for f failed processes, we say it is an f-resilient consensus algorithm

Example: the input and output of a 3-resilient consensus algorithm

[Figure: five processes start with the values 0, 1, 4, 3, 2; the two processes shown at the finish both decide 1]

New validity condition: if all non-faulty processes start with the same value, then all non-faulty processes decide that value

[Figure: five processes start with the value 1; the two processes shown at the finish both decide 1]

An f-resilient algorithm

Round 1: B-multicast my value

Round 2 to round f+1: Multicast any new received values

End of round f+1: Decide on the minimum value received
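The three steps above can be sketched as a round-based simulation. The crash model is an assumption for this example: a process that crashes in round r delivers its messages of that round only to a given subset of receivers and is silent afterwards. The names `f_resilient_min`, `crashes` and the process ids are hypothetical.

```python
def f_resilient_min(initial, f, crashes=None):
    """Simulate the f-resilient algorithm for f+1 synchronous rounds.

    initial: dict process id -> initial value
    crashes: dict process id -> (round, set of ids reached in that round)
    Returns the decisions of the processes that never crashed.
    """
    crashes = crashes or {}
    known = {p: {v} for p, v in initial.items()}
    to_send = {p: {v} for p, v in initial.items()}  # round 1: own value
    alive = set(initial)

    for rnd in range(1, f + 2):
        inbox = {p: set() for p in alive}
        for sender in sorted(alive):
            crash = crashes.get(sender)
            crashing = crash is not None and crash[0] == rnd
            for receiver in alive:
                if receiver == sender:
                    continue
                if crashing and receiver not in crash[1]:
                    continue  # this copy is lost in the crash
                inbox[receiver] |= to_send[sender]
        alive -= {p for p, (r, _) in crashes.items() if r == rnd}
        for p in alive:
            new = inbox[p] - known[p]
            known[p] |= new
            to_send[p] = new  # later rounds: only newly learned values

    return {p: min(known[p]) for p in alive}

initial = {"p1": 0, "p2": 1, "p3": 2, "p4": 3, "p5": 4}

# f = 1: p1 crashes in round 1 after reaching only p2 and p4.
one_crash = f_resilient_min(initial, 1, {"p1": (1, {"p2", "p4"})})

# f = 2: the value 0 is handed along a chain of crashing processes.
chain = {"p1": (1, {"p2"}), "p2": (2, {"p3"})}
two_crashes = f_resilient_min(initial, 2, chain)
```

In both runs every surviving process decides 0, because the extra round lets the value that escaped a crash spread to everyone.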

Example: f = 1 failure, f+1 = 2 rounds needed

Start

[Figure: five processes hold the values 0, 1, 2, 3, 4]

Round 1: B-multicast all values to everybody

[Figure: the process holding 0 fails during its multicast, so 0 reaches only some processes; two of the remaining processes now know 0,1,2,3,4 and two know 1,2,3,4]

Round 2: B-multicast all new values to everybody

[Figure: all four remaining processes know 0,1,2,3,4]

Finish: Decide on minimum value

[Figure: all four remaining processes decide 0]

Another example execution with 3 failures

Start

[Figure: five processes hold the values 0, 1, 2, 3, 4]

Round 1: Multicast all values to everybody (Failure 1)

[Figure: the process holding 0 fails during its multicast; the value sets known to the other processes are 1,2,3,4 / 1,2,3,4 / 0,1,2,3,4 / 1,2,3,4]

Round 2: Multicast new values to everybody (Failure 2)

[Figure: a second process fails; the value sets shown are 0,1,2,3,4 / 1,2,3,4 / 0,1,2,3,4 / 1,2,3,4]

Round 3

[Figure: every value set shown is 0,1,2,3,4]

Finish: Decide on the minimum value

[Figure: the remaining processes decide 0]

Another example execution with 3 failures

Start

[Figure: five processes hold the values 0, 1, 2, 3, 4]

Round 1: Multicast all values to everybody (Failure 1)

[Figure: the process holding 0 fails during its multicast; the value sets known to the other processes are 1,2,3,4 / 1,2,3,4 / 0,1,2,3,4 / 1,2,3,4]

Round 2 (no failure)

[Figure: every value set shown is 0,1,2,3,4]

Remark: At the end of this round all processes know about all the other values

Round 3 (Failure 2)

[Figure: every value set shown is 0,1,2,3,4]

(no new values are learned in this round)

Finish: Decide on minimum value

[Figure: the remaining processes decide 0]

If there are f failures and f+1 rounds, then there is a round with no failed process

Example: 5 failures, 6 rounds

[Figure: timeline of rounds 1 to 6, with one round marked “No failure”]

In the algorithm, at the end of the round with no failure:

• Every (non-faulty) process knows about all the values of all other participating processes

• This knowledge doesn’t change until the end of the algorithm

Therefore, at the end of the round with no failure, everybody would decide the same value

However, we don’t know the exact position of this round, so we have to let the algorithm execute for f+1 rounds

Validity of algorithm: when all processes start with the same input value, then the consensus is that value

This holds, since the value decided by each process is some input value
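The counting argument behind the failure-free round is a pigeonhole argument: at most f crashes cannot cover f+1 rounds. A minimal sketch, with a hypothetical helper name and an arbitrary example schedule:

```python
def failure_free_rounds(crash_rounds, f):
    """Rounds in 1..f+1 in which no process crashes.

    crash_rounds: the round number of each crash (at most f entries)
    With at most f crashes and f+1 rounds the result is never empty.
    """
    if len(crash_rounds) > f:
        raise ValueError("more crashes than the algorithm tolerates")
    hit = set(crash_rounds)
    return [r for r in range(1, f + 2) if r not in hit]

# 5 failures spread over 6 rounds leave exactly one clean round.
clean = failure_free_rounds([1, 2, 4, 5, 6], f=5)

# Two crashes in the same round leave even more clean rounds.
clustered = failure_free_rounds([2, 2, 3], f=3)
```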

A Lower Bound

Theorem: Any f-resilient consensus algorithm requires at least f+1 rounds

Proof sketch:

Assume for contradiction that f or fewer rounds are enough

Worst case scenario: there is a process that fails in each round

Round 1: before process pi fails, it sends its value a to only one process pk

Round 2: before process pk fails, it sends value a to only one process pm

Rounds 3, ..., f: the same pattern repeats, so at the end of round f only one process pn knows about a

[Figure: timeline of rounds 1, 2, 3, ..., f; in each round the only process that knows a fails after passing a to a single other process]

Process pn may decide a, and all other processes may decide another value (b)

Therefore f rounds are not enough

At least f+1 rounds are needed

Consensus in synchronous systems

Up to f faulty processes

Duration of round: max. delay of B-multicast

Dolev & Strong, 1983: Any algorithm to reach consensus despite up to f failures requires (f+1) rounds.

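Byzantine agreement: synchronous

[Figure, left (p3 faulty): commander p1 sends 1:v to p2 and p3; p2 relays 2:1:v to p3, while faulty p3 relays 3:1:u to p2]

[Figure, right (commander faulty): p1 sends 1:w to p2 and 1:x to p3; p2 relays 2:1:w and p3 relays 3:1:x]

Notation: “3:1:u” means 3 says 1 says ‘u’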

Lamport et al, 1982: No solution for N = 3, f = 1

Nothing can be done to improve a correct process’ knowledge beyond the first stage: it cannot tell which process is faulty.

Pease et al, 1982: No solution for N <= 3*f (assuming private communication channels)

Byzantine agreement for N > 3*f

Example with N=4, f=1:

- 1st round: Commander sends a value to each lieutenant

- 2nd round: Each of the lieutenants sends the value it has received to each of its peers.

- A lieutenant receives a total of (N - 2) + 1 values, of which (N - 2) are correct.

- By majority(), the correct lieutenants compute the same value.
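The two rounds for N = 4, f = 1 can be sketched as a small simulation. The modelling choices are assumptions for this example: a faulty process is represented by a `relay` function that may return anything, `None` stands for "no majority", and the process ids and values are placeholders.

```python
from collections import Counter

def majority(values):
    """Value held by more than half of the entries, else None."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

def four_generals(commander_sends, relay):
    """Two-round agreement among the lieutenants.

    commander_sends: dict lieutenant id -> value received in round 1
    relay(sender, receiver, value): value the sender forwards in round 2
    """
    lieutenants = sorted(commander_sends)
    decisions = {}
    for me in lieutenants:
        heard = [commander_sends[me]]  # round 1: from the commander
        heard += [
            relay(other, me, commander_sends[other])  # round 2: from peers
            for other in lieutenants
            if other != me
        ]
        decisions[me] = majority(heard)  # (N - 2) + 1 = 3 values
    return decisions

def honest(sender, receiver, value):
    return value

def faulty_p3(sender, receiver, value):
    """p3 lies: it tells p2 'u' and p4 'w'; the others relay honestly."""
    if sender == "p3":
        return {"p2": "u", "p4": "w"}[receiver]
    return value

# Case 1: correct commander, faulty lieutenant p3.
case1 = four_generals({"p2": "v", "p3": "v", "p4": "v"}, faulty_p3)

# Case 2: faulty commander sends three different values.
case2 = four_generals({"p2": "u", "p3": "w", "p4": "v"}, honest)
```

In case 1 the correct lieutenants p2 and p4 both decide v; in case 2 all three lieutenants find no majority and therefore still agree with each other.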

[Figure: four Byzantine generals (commander p1, lieutenants p2, p3, p4) exchanging messages in two rounds, once with a faulty lieutenant and once with a faulty commander]

In general, O(N^(f+1)) messages

O(N^2) for signed messages

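Four Byzantine Generals: N = 4, f = 1 in a Synchronous DS

[Figure, left (p3 faulty): commander p1 sends 1:v to p2, p3 and p4; p2 relays 2:1:v and p4 relays 4:1:v, while faulty p3 relays 3:1:u to p2 and 3:1:w to p4]

[Figure, right (commander faulty): p1 sends 1:u to p2, 1:w to p3 and 1:v to p4; p2 relays 2:1:u, p3 relays 3:1:w and p4 relays 4:1:v]

Left: p2 decides on majority(v, u, v) = v; p4 decides on majority(v, v, w) = v

Right: p2, p3, p4 decide on majority(u, v, w) = ⊥ (no majority value exists)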

Asynchronous system

Solutions to the consensus and BG problems (and to IC) exist in synchronous systems

No algorithm can guarantee to reach consensus in an asynchronous system, even with one process crash failure

In an asynchronous system, processes can respond to messages at arbitrary times – so a crashed process is indistinguishable from a slow one

There is always some continuation of the processes’ execution that avoids consensus being reached

Impossibility of (deterministic) consensus in asynchronous systems

M.J. Fischer, N. Lynch, and M. Paterson: “Impossibility of distributed consensus with one faulty process”, J. ACM, 32(2), pp. 374-382, 1985.

A crashed process cannot be distinguished from a slow one, not even with a 100% reliable communication network!

There is always a chance that some continuation of the processes’ execution avoids consensus being reached.

Contd

Note the word “guarantee” in the statement of the impossibility result

The result does not mean that processes can never reach consensus in an asynchronous system if one is faulty – it allows that consensus can be reached with some probability greater than zero

For example, despite the fact that our systems are often effectively asynchronous, transaction systems have been reaching consensus regularly for many years
