Lecture 14 Synchronization (cont)


EECE 411: Design of Distributed Software Applications

Logistics

Project P01 deadline on Wednesday November 3rd. Non-blocking IO lecture: Nov 4th. P02 deadline on Wednesday November 15th.

Next quiz Tuesday 16th.


Roadmap

Clocks cannot be perfectly synchronized. What can I do under these conditions?

  • Figure out how large the drift is. Example: GPS systems.
  • Design the system to take drift into account. Example: server design that provides at-most-once semantics.
  • Do not use physical clocks! Consider only the order of events:
    (1) Logical clocks (Lamport). But these do not capture causality!
    (2) Vector clocks.
  • Mutual exclusion; leader election.


Last time: “Happens-before” relation

The happened-before relation (→) on the set of events in a distributed system:

If a and b are events in the same process, and a occurs before b, then a → b.

If a is the event of a process sending a message, and b is the event of another process receiving that same message, then a → b.

The relation is transitive: if a → b and b → c, then a → c.

Two events a and b are concurrent if neither a → b nor b → a: nothing can be said about the order in which they happened. Happens-before is therefore only a partial order.
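Because → is the transitive closure of process-order and message edges, checking a → b amounts to asking whether b is reachable from a in the event graph. A minimal Python sketch; the function names are hypothetical, and the edges are an assumed encoding of the three-process example that follows (m1 sent at b and received at c, m2 sent at d and received at f).

```python
def happens_before(edges, a, b):
    """True if a -> b, i.e. b is reachable from a along process-order or message edges."""
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        for y in edges.get(x, ()):
            if y == b:
                return True
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def concurrent(edges, a, b):
    """a || b: neither event happens before the other."""
    return not happens_before(edges, a, b) and not happens_before(edges, b, a)


# Hypothetical encoding: process order a->b (p1), c->d (p2), e->f (p3),
# plus message edges b->c (m1) and d->f (m2).
edges = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["f"], "e": ["f"]}

print(happens_before(edges, "a", "f"))  # True (via b, c, d)
print(concurrent(edges, "b", "e"))      # True
```
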


Lamport’s logical clocks

Each process Pi maintains a local counter Ci and adjusts it according to the following rules:

1. For any two successive events that take place within process Pi, the counter Ci is incremented by 1.

2. Each time a message m is sent by process Pi, the message receives the timestamp ts(m) = Ci (after step 1 has been applied, since sending is itself an event).

3. Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max{Cj, ts(m)}, then executes step 1 before passing m to the application.

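[Figure: three processes p1, p2, p3 against physical time. Events a, b on p1; c, d on p2; e, f on p3; message m1 from p1 to p2 and message m2 from p2 to p3. Lamport timestamps: a = 1, b = 2, c = 3, d = 4, e = 1, f = 5.]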
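The three rules can be sketched as a small Python class (the class and method names are hypothetical). Replaying the figure's events, under the assumption that m1 is sent at b and received at c, and m2 is sent at d and received at f, reproduces the timestamps shown.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: increment the counter for every event."""
        self.time += 1
        return self.time

    def send(self):
        """Rule 2: sending is an event; the message carries the new counter value."""
        return self.tick()

    def receive(self, ts):
        """Rule 3: take max{C, ts(m)}, then apply rule 1 before delivery."""
        self.time = max(self.time, ts)
        return self.tick()


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.tick()
b = p1.send()            # m1 carries timestamp b
c = p2.receive(b)
d = p2.send()            # m2 carries timestamp d
e = p3.tick()
f = p3.receive(d)
print(a, b, c, d, e, f)  # 1 2 3 4 1 5
```

Note that e = 1 < b = 2 even though e and b are concurrent, which previews the problem discussed next.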


Updating Lamport’s logical timestamps

[Figure: four processes p1 to p4 against physical time; each event is labelled with its Lamport clock value, and each message with its timestamp.]


Problem with Lamport logical clocks

Notation: timestamp(a) is the Lamport logical clock associated with event a

By construction, if a → b then timestamp(a) < timestamp(b): if a happens before b, then Lamport_timestamp(a) < Lamport_timestamp(b).

Q: Is the converse true? That is, does timestamp(a) < timestamp(b) imply a → b? No: Lamport_timestamp(a) < Lamport_timestamp(b) does NOT imply that a happens before b.


Example

[Figure: the same four-process example (p1 to p4, Lamport clock values and message timestamps against physical time), with logically concurrent events highlighted.]

Note: for the Lamport timestamps, 3 < 7, but the event with timestamp 3 is concurrent with the event with timestamp 7, i.e., the two events are not related by happens-before.


Causality

Lamport timestamps don't capture causality. Example: news postings have multiple independent threads of messages.

To model causality, use vector timestamps.

Intuition: each entry in the vector is a logical clock for one causality thread.


Vector Timestamps

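[Figure: vector timestamps for the same three-process example against physical time: a = (1,0,0), b = (2,0,0) on p1; c = (2,1,0), d = (2,2,0) on p2; e = (0,0,1), f = (2,2,2) on p3; messages m1 and m2.]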


Vector clocks

Each process Pi has an array VCi[1..n] of clocks (all initialized to 0). VCi[j] denotes the number of events that process Pi knows have taken place at process Pj.

Pi increments VCi[i] when an event occurs or when sending a message. The resulting vector value is the timestamp of the event.

When sending: messages sent by Pi include a vector timestamp ts(m). Result: upon arrival, the recipient knows Pi's timestamp.

When Pj receives a message sent by Pi with vector timestamp ts(m):

  • for k ≠ j: VCj[k] = max{VCj[k], ts(m)[k]}
  • for k = j: VCj[j] = VCj[j] + 1

Note: vector timestamps require a static notion of system membership.

Question: What does VCi[j] = k mean in terms of messages sent and received?
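The update rules map onto a few lines of Python. This is a minimal sketch (class and method names are hypothetical); indices are 0-based, so process p1 has index 0. Replaying the three-process example, assuming m1 is sent at b and received at c and m2 is sent at d and received at f, yields the vectors in the figure.

```python
class VectorClock:
    """Vector clock VCi owned by process i (0-based) in a group of n processes."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n  # all entries initialized to 0

    def tick(self):
        """Local event or send: increment VCi[i]; the vector is the event's timestamp."""
        self.v[self.i] += 1
        return tuple(self.v)

    def send(self):
        """Sending is an event; the returned vector is piggybacked as ts(m)."""
        return self.tick()

    def receive(self, ts):
        """Element-wise max with ts(m) for k != i, then increment VCi[i]."""
        for k, t in enumerate(ts):
            if k != self.i:
                self.v[k] = max(self.v[k], t)
        return self.tick()


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.tick()
b = p1.send()        # m1
c = p2.receive(b)
d = p2.send()        # m2
e = p3.tick()
f = p3.receive(d)
print(a, b, c, d, e, f)
# (1, 0, 0) (2, 0, 0) (2, 1, 0) (2, 2, 0) (0, 0, 1) (2, 2, 2)
```
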


Example: Vector Logical Time

[Figure: vector logical clocks on four processes p1 to p4 against physical time; all clocks start at 0,0,0,0, each event is labelled with its vector clock value (n,m,p,q), and each message carries its vector timestamp in parentheses.]


Comparing vector timestamps

  • $VT_1 = VT_2$ (identical) iff $VT_1[i] = VT_2[i]$ for all $i = 1, \dots, n$
  • $VT_1 \le VT_2$ iff $VT_1[i] \le VT_2[i]$ for all $i = 1, \dots, n$
  • $VT_1 < VT_2$ (happens-before relationship) iff $VT_1 \le VT_2$ and $\exists\, j \ (1 \le j \le n)$ such that $VT_1[j] < VT_2[j]$
  • $VT_1$ is concurrent with $VT_2$ iff $\neg(VT_1 \le VT_2) \wedge \neg(VT_2 \le VT_1)$
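The four comparisons translate directly into Python over tuples. The function names are hypothetical; the checks use the vectors from the three-process example.

```python
def vc_equal(v1, v2):
    return all(x == y for x, y in zip(v1, v2))


def vc_leq(v1, v2):
    return all(x <= y for x, y in zip(v1, v2))


def vc_less(v1, v2):
    """v1 < v2 (happens-before): v1 <= v2 and strictly smaller in at least one entry."""
    return vc_leq(v1, v2) and any(x < y for x, y in zip(v1, v2))


def vc_concurrent(v1, v2):
    return not vc_leq(v1, v2) and not vc_leq(v2, v1)


print(vc_less((2, 0, 0), (2, 2, 2)))        # True: b happens before f
print(vc_concurrent((0, 0, 1), (2, 0, 0)))  # True: e is concurrent with b
```
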


Quiz-like problem

Show: a → b if and only if vectorTS(a) < vectorTS(b).


Message delivery for group communication

ASSUMPTIONS:
  • messages are multicast to named process groups
  • channels are reliable and FIFO (from a given source to a given destination)
  • processes don't crash (failure and restart are not considered)
  • processes behave as specified, e.g., they send the same values to all processes (i.e., we are not considering Byzantine behaviour)

[Figure: an application process sits on top of the messaging middleware, which sits on top of the OS communication interface. The middleware may reorder delivery to the application by buffering messages. The application may specify the delivery order to the message service, e.g., total order, FIFO order, or causal order (last time: total order). FIFO order from each source is assumed to be provided by the lower levels.]


[Last time] Totally Ordered Multicast

Process Pi sends a timestamped message msgi to all others. The message itself is put in a local queue queuei.

Any incoming message at Pk is queued in queuek, according to its timestamp, and acknowledged to every other process.

Pk passes a message msgi to its application if:
  • msgi is at the head of queuek, and
  • for each process Px, there is a message msgx in queuek with a larger timestamp.

Note: We are assuming that communication is reliable and FIFO ordered.

Guarantee: all multicast messages are delivered in the same order at all destinations.

Nothing is guaranteed about the actual order!
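A minimal Python sketch of the delivery test at one process Pk, under stated assumptions: acknowledgments count as messages for the second condition, FIFO channels mean it suffices to remember the largest timestamp heard from each process, and equal timestamps are ordered by process id. The class name and interface are hypothetical.

```python
import heapq


class TotalOrderQueue:
    """Delivery test for Lamport-style totally ordered multicast at one process Pk."""

    def __init__(self, all_ids):
        self.queue = []  # heap of (timestamp, sender_id, payload), i.e. queue_k
        # Largest timestamp heard so far from each process, via messages or acks.
        self.latest = {pid: 0 for pid in all_ids}

    def on_message(self, ts, sender, payload):
        heapq.heappush(self.queue, (ts, sender, payload))
        self.latest[sender] = max(self.latest[sender], ts)

    def on_ack(self, ts, sender):
        self.latest[sender] = max(self.latest[sender], ts)

    def deliver_ready(self):
        """Pop, in total order, every message that is safe to pass to the application."""
        ready = []
        while self.queue:
            ts, sender, payload = self.queue[0]
            # Safe once every other process has been heard from with a larger
            # (timestamp, id) pair: with FIFO channels nothing smaller can still arrive.
            if all((self.latest[p], p) > (ts, sender) for p in self.latest if p != sender):
                heapq.heappop(self.queue)
                ready.append(payload)
            else:
                break
        return ready
```

For example, after two messages with the same timestamp 3 from processes 2 and 3, the one from process 2 is delivered first once process 1 has been heard from with a larger timestamp.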


FIFO multicast

FIFO or sender-ordered multicast: messages are delivered in the order they were sent (by any single sender).

[Figure: processes P1 to P4 exchanging multicast messages a, b, c, d, e; delivery of c to P1 is delayed until after b is delivered.]


Implementing FIFO multicast

The basic reliable multicast algorithm has this property.

Without failures, all we need is to run it on FIFO channels (like TCP).

[Later: dealing with node failures]
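If the underlying channels were not FIFO, a common way to obtain the same property is for each sender to number its multicasts and for each receiver to hold back out-of-order messages. The Python sketch below illustrates that alternative technique (it is not the slide's algorithm; the class name and per-sender sequence numbers starting at 0 are assumptions).

```python
class FifoReceiver:
    """Per-sender hold-back queue that delivers each sender's messages in send order."""

    def __init__(self):
        self.next_seq = {}   # sender -> next expected sequence number
        self.holdback = {}   # sender -> {seq: payload} received out of order

    def receive(self, sender, seq, payload):
        """Buffer the message; return the payloads that are now deliverable, in order."""
        pending = self.holdback.setdefault(sender, {})
        pending[seq] = payload
        expected = self.next_seq.get(sender, 0)
        delivered = []
        while expected in pending:
            delivered.append(pending.pop(expected))
            expected += 1
        self.next_seq[sender] = expected
        return delivered
```
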


Causal multicast

Causal or happens-before ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations.

[Figure: processes P1 to P4 exchanging multicast messages a, b, c, d, e. e is sent (causally) after b and c, and concurrently with d. Delivery of c to P1 is delayed until after b is delivered; delivery of e to P3 is delayed until after b and c are delivered; e and d may be delivered to P2 and P3 in any relative order, since they are concurrent.]


Causally ordered multicast

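[Figure: causally ordered multicast among three processes, annotated with vector clock values VC0 = (2,2,0), VC1 = (1,1,0), VC1 = (1,2,0), VC2 = (1,0,1), VC2 = (1,2,2).]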


Implementing causal order

Start with a FIFO multicast. We can strengthen it into a causal multicast by adding vector time. No additional messages are needed!

Advantage: FIFO and causal multicast are asynchronous: the sender doesn't get blocked and can deliver a copy to itself without “stopping” to learn a safe delivery order.
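The slide does not spell out the delivery rule; one common rule, sketched here as an assumption, is: vector entries count multicast sends only, and a message m from Pj is held back at Pk until ts(m)[j] = VCk[j] + 1 (it is the next message from Pj) and ts(m)[i] ≤ VCk[i] for every i ≠ j (Pk has already delivered everything Pj had delivered). Class and method names are hypothetical.

```python
class CausalMember:
    """Causal multicast at process `me`; vc[k] counts multicasts from Pk delivered here."""

    def __init__(self, me, n):
        self.me = me
        self.vc = [0] * n
        self.holdback = []  # received but not yet deliverable: (sender, ts, payload)

    def send(self, payload):
        """Count our own send (our own copy needs no hold-back) and stamp the message."""
        self.vc[self.me] += 1
        return (self.me, tuple(self.vc), payload)

    def _deliverable(self, sender, ts):
        # m must be the next message from its sender ...
        if ts[sender] != self.vc[sender] + 1:
            return False
        # ... and we must already have delivered everything the sender had delivered.
        return all(ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)

    def receive(self, sender, ts, payload):
        """Buffer m, then return every held-back message that has become deliverable."""
        self.holdback.append((sender, ts, payload))
        delivered = []
        progress = True
        while progress:
            progress = False
            for item in list(self.holdback):
                s, t, p = item
                if self._deliverable(s, t):
                    self.holdback.remove(item)
                    self.vc[s] += 1
                    delivered.append(p)
                    progress = True
        return delivered
```

Usage: if P1 replies to P0's message a with b, and P2 receives b before a, then b is held back at P2 until a arrives, after which both are delivered in the order a, b.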


So far …

  • Physical clocks. Two applications: providing at-most-once semantics; Global Positioning Systems.
  • ‘Logical clocks’: where only the ordering of events matters.
  • Other coordination primitives: mutual exclusion; leader election.


Mutual exclusion algorithms

Problem: A number of processes in a distributed system want exclusive access to some resource.

Basic solutions:
  • Via a centralized server.
  • Completely decentralized.
  • Completely distributed, with no roles imposed.
  • Completely distributed along a (logical) ring.

Additional objective: Fairness


Mutual Exclusion: A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.

b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.

c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
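The coordinator's logic fits in a few lines. A minimal Python sketch (class and method names are hypothetical) that queues requests in arrival order, which also makes grants fair; each entry/exit costs a request, a grant, and a release message.

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None      # process currently in the critical region
        self.waiting = deque()  # queued requests, in arrival order

    def request(self, pid):
        """Return True if permission is granted now; otherwise queue it (no reply)."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid):
        """Called by the holder on exit; return the pid granted next, if any."""
        if pid != self.holder:
            raise ValueError("only the current holder may release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```
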


Decentralized Mutual Exclusion

Principle: Assume the resource is replicated n times, with each replica having its own coordinator

Access requires a majority vote from m > n/2 coordinators. A coordinator always responds immediately to a request.

Assumption: When a coordinator crashes, it will recover quickly, but will have forgotten about permissions it had granted.

Correctness: probabilistic!

Issue: How robust is this system?


To quantify this, let p be the probability that a coordinator resets (crashes and recovers) in an interval Δt: p = Δt/T, where T is the average peer lifetime.

Quiz-like question: What is the probability of violating mutual exclusion?


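The probability that k out of m coordinators reset during Δt is

$$P[k] = \binom{m}{k} p^k (1-p)^{m-k}$$

Violation occurs when at least 2m − n of them reset: a reset coordinator forgets its vote, so a second process can collect the n − m votes that were never given plus the k forgotten ones, and n − m + k ≥ m exactly when k ≥ 2m − n. Hence

$$P[\text{violation}] = \sum_{k=2m-n}^{m} \binom{m}{k} p^k (1-p)^{m-k}$$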
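The sum can be evaluated numerically. A minimal Python sketch; the function name is hypothetical, and it uses `math.comb` (available since Python 3.8). The caller supplies p = Δt/T.

```python
from math import comb


def violation_probability(n, m, p):
    """Probability that at least 2m - n of the m granting coordinators reset,
    where n is the number of replicas and p the per-coordinator reset probability."""
    k_min = max(2 * m - n, 0)
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(k_min, m + 1))
```

For instance, with n = 5, m = 3 and p = 0.5 the threshold is k ≥ 1, so the result is 1 − 0.5³ = 0.875; realistic values of Δt/T are far smaller, which is why the scheme is only probabilistically correct.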


Mutual Exclusion: A Distributed Algorithm (Ricart & Agrawala)

Idea: Similar to Lamport's totally ordered multicast, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:

The receiving process has no interest in the shared resource; or

The receiving process is waiting for the resource, but has lower priority (known through comparison of timestamps).

In all other cases, the reply is deferred, which requires some additional local bookkeeping.
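The receiver-side decision can be sketched in Python as follows (class and method names are hypothetical). As an assumption, requests are compared as (timestamp, process id) pairs so that ties are broken by id, and a process that holds the resource defers every request.

```python
class RicartAgrawala:
    def __init__(self, pid):
        self.pid = pid
        self.state = "RELEASED"  # RELEASED, WANTED, or HELD
        self.my_request = None   # (timestamp, pid) of our own pending request
        self.deferred = []       # pids whose requests we have not answered yet

    def request(self, timestamp):
        """Start wanting the resource; the returned pair is multicast to all others."""
        self.state = "WANTED"
        self.my_request = (timestamp, self.pid)
        return self.my_request

    def on_request(self, timestamp, sender):
        """Return True to send OK immediately, False to defer the reply."""
        theirs = (timestamp, sender)
        if self.state == "HELD" or (self.state == "WANTED" and self.my_request < theirs):
            self.deferred.append(sender)
            return False
        return True

    def enter(self):
        # Called once OK replies from all n - 1 other processes have arrived.
        self.state = "HELD"

    def release(self):
        """Leave the critical region; return the pids that now get their OK."""
        self.state = "RELEASED"
        self.my_request = None
        released, self.deferred = self.deferred, []
        return released
```
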


Mutual Exclusion: A Distributed Algorithm (II)

a) Two processes (0 and 2) want to enter the same critical region at the same moment.

b) Process 0 has the lowest timestamp, so it wins.

c) When process 0 is done, it sends an OK as well, so 2 can now enter the critical region.

Question: Is a fully distributed solution, i.e., one without a coordinator, always more robust than any centralized coordinated solution?


Mutual Exclusion: A Token Ring Algorithm

Principle: Organize processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to)
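A minimal Python simulation of the token travelling around a ring of n processes (the function name and the `wants` input are hypothetical). A process enters the critical region only while it holds the token, at most once per visit, and then passes the token to its successor.

```python
def token_ring(wants, rounds=1):
    """Pass the token around processes 0..n-1 for `rounds` full circuits and
    return the order in which processes entered the critical region."""
    pending = list(wants)  # pending[i] is True if process i wants to enter
    entries = []
    for _ in range(rounds):
        for holder in range(len(pending)):  # token goes holder -> (holder + 1) % n
            if pending[holder]:
                entries.append(holder)      # enter, then leave, while holding the token
                pending[holder] = False
    return entries
```
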


Logistics

Project P01 deadline tomorrow. Project/non-blocking IO lecture: Thursday. P02 deadline on Wednesday November 15th.

Next quiz Tuesday 16th.


So far …

  • Physical clocks. Two applications: providing at-most-once semantics; Global Positioning Systems.
  • ‘Logical clocks’: where only the ordering of events matters. Lamport clocks; vector clocks.
  • Other coordination primitives: mutual exclusion; leader election (how do I choose a coordinator?).


[Last time] Mutual exclusion algorithms

Problem: A number of processes in a distributed system want exclusive access to some resource.

Basic solutions:
  • Via a centralized server.
  • Completely decentralized (voting based).
  • Completely distributed, with no roles imposed.
  • Completely distributed along a (logical) ring.

Additional objectives: Fairness; no starvation


Mutual Exclusion: Algorithm Comparison

Algorithm     | Messages per entry/exit      | Delay before entry (in message times) | Problems
Centralized   | 3                            | 2                                     | Coordinator crash
Decentralized | 3mk (k = number of attempts) | 2m                                    | Starvation, low efficiency
Distributed   | 2(n-1)                       | 2(n-1)                                | Crash of any process
Token ring    | 1..∞                         | 0 to n-1                              | Lost token, process crash


Leader election algorithms

Context: An algorithm requires that some process acts as a coordinator.

Question: how to select this special process dynamically.

Note: In many systems the coordinator is chosen by hand (e.g., file servers). This leads to centralized solutions with a single point of failure.


Leader election algorithms

Context: Each process has an associated priority (weight). The process with the highest priority needs to be elected as the coordinator.

Issue: How do we find the ‘heaviest’ process?

Two important assumptions:
  • Processes are uniquely identifiable.
  • All processes know the identity of all participating processes.

Traditional algorithm examples:
  • The bully algorithm
  • Ring-based algorithm


Election by Bullying

Any process can start an election by sending an election message to all heavier processes.

If a process Pheavy receives an election message from a lighter process Plight, it sends a take-over message to Plight. Plight is out of the race.

If a process doesn’t get a take-over message back, it wins, and sends a victory message to all other processes.
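The outcome can be sketched as a recursive Python simulation. The function name and the representation of live processes as a set of ids are hypothetical, and a higher id is assumed to mean heavier. It follows the rules above but does not model messages or timeouts, and its recursion repeats work, which is acceptable for small examples.

```python
def bully_election(initiator, alive):
    """Return the winner of an election started by `initiator`.
    `alive` is the set of ids of processes that are up; higher id = heavier."""
    heavier = [p for p in alive if p > initiator]
    if not heavier:
        return initiator  # no take-over message arrives: initiator wins, sends victory
    # Each live heavier process sends a take-over message and holds its own election.
    return max(bully_election(p, alive) for p in heavier)
```

For example, with processes 0 to 7 where 7 has crashed, an election started by 4 is taken over by 5 and 6, and 6 wins.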


The Bully Algorithm

a) Process 4 detects that 7 has failed and holds an election.

b) Processes 5 and 6 respond, telling 4 to stop.

c) Now 5 and 6 each hold an election (they also send an election message to 7, as they have not yet detected 7's failure).


The Bully Algorithm (2)

d) Process 6 tells 5 to stop.

e) Process 6 wins and announces itself to everyone.


Election in a Ring

Principle: Organize processes into a (logical) ring. Process with the highest priority should be elected as coordinator.

Any process can start an election by sending an election message to its successor. If a successor is down, the message is passed on to the next successor.

If a message is passed on, the sender adds itself to the list.

The initiator sends a coordinator message around the ring containing a list of all living processes. The one with the highest priority is elected as coordinator.
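A minimal Python simulation of one election round, assuming a static ring given as a list of ids in ring order, a set of live ids, and that the highest id has the highest priority (all names are hypothetical). Dead successors are skipped, and the collected list is what the initiator would then circulate in the coordinator message.

```python
def ring_election(ring, initiator, alive):
    """ring: process ids in ring order; alive: set of live ids.
    Return (list collected by the election message, elected coordinator)."""
    n = len(ring)
    start = ring.index(initiator)
    collected = [initiator]
    i = (start + 1) % n
    while ring[i] != initiator:    # the message travels until it is back at the initiator
        if ring[i] in alive:       # a dead successor is skipped
            collected.append(ring[i])
        i = (i + 1) % n
    return collected, max(collected)
```
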


The Ring Algorithm

• Question: What happens if two processes initiate an election at the same time? Does it matter?

• Question: What happens if a process crashes during the election?


Summary so far …

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Components need to:
  • Communicate
    Point to point: sockets, RPC/RMI
    Point to multipoint: multicast, epidemic
  • Cooperate
    Naming, to enable some resource sharing:
      naming systems for flat (unstructured) namespaces: consistent hashing, DHTs
      naming systems for structured namespaces: EECE456 for DNS
    Synchronization: physical clocks, logical clocks, mutual exclusion, leader election