Coordenaçao˜ Aula 9: Sincronizaçao de rel˜ ogios´ Aula ...lucas/teaching/mc714/... · MC714: Sistemas Distribu´ıdos Prof. Lucas Wanner Instituto de Computaçao, Unicamp˜

MC714: Sistemas Distribuıdos

Prof. Lucas Wanner

Instituto de Computacao, Unicamp

Coordenacao

Aula 9: Sincronizacao de relogiosAula 10: Relogios logicos e vetoriais

Aula 11: Exclusao mutuaAula 12: Eleicao de lider

Clock Synchronization

Physical clocksLogical clocksVector clocks

Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 47

Time and Frequency

Types of Information

Time of dayTime intervalFrequency

QuestionHow is time defined?

3 / 47

Time and Frequency

DefinitionSecond:

131,556,925.9747 of the tropical year for 1900.The duration of 9192631770 periods of the radiation corresponding to the transitionbetween the two hyperfine levels of the ground state of the cesium 133 atom

Frequency: Events per second (Hz).

GenerationOscillators can create signals that alternate periodically at a certain frequency

Inductor-Capacitor (LC circuit)Resistor-Capacitor (RC circuit)Crystal Oscillator

5 / 47

Crystal Oscillators

PrinciplesCrystal resonator that strains (expands or contracts) when a voltage is appliedWhen the voltage is reversed, the strain is reversedVoltage signal is taken from the resonator, amplified, and fed back to it.Rate of expansion and contraction is determined by size and cut of the crystal

Sources of innaccuracyCrystal oscillators may deviate from their nominal frequency

Cut (manufacturing)Environmental: temperature, pressure, vibrationAccuracy is measured in PPM–parts per million

6 / 47

Clock Innacuracies

Dimensionless frequency offset values can be converted to units of frequency (Hz) if the nominalfrequency is known. To illustrate this, consider an oscillator with a nominal frequency of 5 MHz and afrequency offset of

+

1.16

×

10

-

11

. To find the frequency offset in hertz, multiply the nominal frequencyby the offset:

(5

×

10

6

) (

+

1.16

×

10

-

11

)

=

5.80

×

10-5 = +0.0000580 Hz

Then, add the offset to the nominal frequency to get the actual frequency:

5,000,000 Hz + 0.0000580 Hz = 5,000,000.0000580 Hz

StabilityStability indicates how well an oscillator can produce the same time or frequency offset over a given timeinterval. It doesn’t indicate whether the time or frequency is “right” or “wrong,” but only whether it staysthe same. In contrast, accuracy indicates how well an oscillator has been set on time or on frequency. Tounderstand this difference, consider that a stable oscillator that needs adjustment might produce afrequency with a large offset. Or, an unstable oscillator that was just adjusted might temporarily producea frequency near its nominal value. Figure 17.7 shows the relationship between accuracy and stability.

Stability is defined as the statistical estimate of the frequency or time fluctuations of a signal over agiven time interval. These fluctuations are measured with respect to a mean frequency or time offset.Short-term stability usually refers to fluctuations over intervals less than 100 s. Long-term stability canrefer to measurement intervals greater than 100 s, but usually refers to periods longer than 1 day.

Stability estimates can be made in either the frequency domain or time domain, and can be calculatedfrom a set of either frequency offset or time interval measurements. In some fields of measurement,stability is estimated by taking the standard deviation of the data set. However, standard deviation only

FIGURE 17.6 A sample phase plot.

FIGURE 17.7 The relationship between accuracy and stability.

©2002 CRC Press LLC

ObservationTwo unsynchronized clocks will drift apart.

7 / 47

Physical clocks

ProblemSometimes we simply need the exact time, not just an ordering.

SolutionUniversal Coordinated Time (UTC):

Based on the number of transitions per second of the cesium 133 atom (pretty accurate).At present, the real time is taken as the average of some 50 cesium-clocks around the world.Introduces a leap second from time to time to compensate that days are getting longer.

NoteUTC is broadcast through short wave radio and satellite. Satellites can give an accuracy of about±0.5 ms.


Physical clocks

ProblemSuppose we have a distributed system with a UTC-receiver somewhere in it⇒ we stillhave to distribute its time to each machine.

Basic principleEvery machine has a timer that generates an interrupt H times per second.There is a clock in machine p that ticks on each timer interrupt. Denote the value ofthat clock by Cp(t), where t is UTC time.Ideally, we have that for each machine p, Cp(t) = t , or, in other words, dC/dt = 1.


Physical clocks

Fast

clo

ck

Perfe

ct clo

ck

Slow clock

Clock time, C

dCdt

> 1dCdt

= 1

dCdt

< 1

UTC, t

In practice: 1−ρ ≤ dCdt ≤ 1 + ρ.

GoalNever let two clocks in any system differ by more than δ time units⇒ synchronize at leastevery δ/(2ρ) seconds.


Global positioning system

Basic ideaYou can get an accurate account of time as a side-effect of GPS.

Height

x

(-6,6)

r = 10

(14,14)r = 16

Point to be ignored



ProblemAssuming that the clocks of the satellites are accurate and synchronized:

It takes a while before a signal reaches the receiverThe receiver’s clock is definitely out of synch with the satellite



Principal operation

∆r : unknown deviation of the receiver’s clock.xr , yr , zr : unknown coordinates of the receiver.Ti : timestamp on a message from satellite i∆i = (Tnow −Ti ) + ∆r : measured delay of the message sent by satellite i .Measured distance to satellite i : c×∆i(c is speed of light)Real distance is

di = c∆i −c∆r =√

(xi −xr )2 + (yi −yr )2 + (zi −zr )2

Observation4 satellites⇒ 4 equations in 4 unknowns (with ∆r as one of them)


Clock synchronization principles

Principle IEvery machine asks a time server for the accurate time at least once every δ/(2ρ)seconds (Network Time Protocol).

NoteOkay, but you need an accurate measure of round trip delay, including interrupt handlingand processing incoming messages.


Clock synchronization principles

Principle IILet the time server scan all machines periodically, calculate an average, and inform eachmachine how it should adjust its time relative to its present time.

NoteOkay, you’ll probably get every machine in sync. You don’t even need to propagate UTCtime.

FundamentalYou’ll have to take into account that setting the time back is never allowed⇒ smoothadjustments.


Cristian’s algorithm

Assuming we have a process P a server S with UTC

Algorithm1 P requests the time from S and measures T1 locally2 After receiving the request from P, S prepares a response and appends the time Ts

from its own clock.3 P receives the response, measures T2 locally and then sets its time to be

Tnew = Ts + (T2−T1)/2

ProblemsSingle point of failure / bottleneckImpostor or faulty server

16 / 47

NTP – Network Time Protocol

17 / 47


Algorithm1 A reads local timestamp t0, sends to A2 B reads local timestamp t1 at reception3 B reads local timestamp t2 at transmision, sends t1 and t2 to B4 B computes round trip delay δ and offset θ .

18 / 47


Round trip delay

δ = (t3− t0)− (t2− t1)

Offset

θ =(t1− t0) + (t2− t3)

2

19 / 47

The Happened-before relationship

ProblemWe first need to introduce a notion of ordering before we can order anything.

The happened-before relation

If a and b are two events in the same process, and a comes before b, then a→ b.If a is the sending of a message, and b is the receipt of that message, then a→ bIf a→ b and b→ c, then a→ c

NoteThis introduces a partial ordering of events in a system with concurrently operating processes.


Logical clocks

ProblemHow do we maintain a global view on the system’s behavior that is consistent with thehappened-before relation?

SolutionAttach a timestamp C(e) to each event e, satisfying the following properties:

P1 If a and b are two events in the same process, and a→ b, then we demand that C(a) < C(b).P2 If a corresponds to sending a message m, and b to the receipt of that message, then also

C(a) < C(b).

ProblemHow to attach a timestamp to an event when there’s no global clock⇒ maintain a consistent set oflogical clocks, one per process.


Logical clocks

SolutionEach process Pi maintains a local counter Ci and adjusts this counter according to the followingrules:

1: For any two successive events that take place within Pi , Ci is incremented by 1.2: Each time a message m is sent by process Pi , the message receives a timestamp ts(m) = Ci .3: Whenever a message m is received by a process Pj , Pj adjusts its local counter Cj to

max{Cj , ts(m)}; then executes step 1 before passing m to the application.

Notes

Property P1 is satisfied by (1); Property P2 by (2) and (3).It can still occur that two events happen at the same time. Avoid this by breaking ties throughprocess IDs.


Logical clocks – example

0 6 12 18 24 30 36 42 48 54 60

0 8

16 24 32 40 48 56 64 72 80

0 10 20 30 40 50 60 70 80 90

100

m1

m2

m3

m4

0 6

12 18 24 30 36 42 48 70 76

0 8 16 24 32 40 48 61 69 77 85

0 10 20 30 40 50 60 70 80 90

100

m1

m2

m3

m4

P adjusts its clock

P adjusts its clock

(b)(a)

P1 P2 P3 P1 P2 P3

2

1


Logical clocks – example

NoteAdjustments take place in the middleware layer

Application layer

Middleware layer

Network layer

Message is delivered to application

Adjust local clock

Message is received

Adjust local clock and timestamp message

Application sends message

Middleware sends message


Example: Totally ordered multicast

ProblemWe sometimes need to guarantee that concurrent updates on a replicated database are seen inthe same order everywhere:

P1 adds $100 to an account (initial value: $1000)P2 increments account by 1%There are two replicas

Update 1 Update 2

Update 1 isperformed before

update 2

Update 2 isperformed before

update 1

Replicated database

ResultIn absence of proper synchronization:replica #1← $1111, while replica #2← $1110.Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 25 / 47

Example: Totally ordered multicast

Solution

Process Pi sends timestamped message msgi to all others. The message itself is put in alocal queue queuei .Any incoming message at Pj is queued in queuej , according to its timestamp, andacknowledged to every other process.

Pj passes a message msgi to its application if:

(1) msgi is at the head of queuej(2) for each process Pk , there is a message msgk in queuej with a larger timestamp.

NoteWe are assuming that communication is reliable and FIFO ordered.


Vector clocks

ObservationLamport’s clocks do not guarantee that if C(a) < C(b) that a causally preceded b

0 6

12 18 24 30 36 42 48 70 76

0 8 16 24 32 40 48 61 69 77 85

0 10 20 30 40 50 60 70 80 90

100

m1m2

m3

m5

m4

P1 P2 P3

ObservationEvent a: m1 is received at T = 16;Event b: m2 is sent at T = 20.

NoteWe cannot conclude that a causally precedes b.


Vector clocks

Solution

Each process Pi has an array VCi [1..n], where VCi [j] denotes the number of events thatprocess Pi knows have taken place at process Pj .When Pi sends a message m, it adds 1 to VCi [i], and sends VCi along with m as vectortimestamp vt(m). Result: upon arrival, recipient knows Pi ’s timestamp.When a process Pj delivers a message m that it received from Pi with vector timestampts(m), it

(1) updates each VCj [k ] to max{VCj [k ], ts(m)[k ]}(2) increments VCj [j] by 1.

QuestionWhat does VCi [j] = k mean in terms of messages sent and received?


Vector Clocks

Comparing events with Vector ClocksVC1 ≤ VC2 iff VC1[i]≤ VC2[i] ∀i = 1, ...,NVC1 < VC2 iffVC1 ≤ VC2 &∃j such that 1≤ j ≤ N and VC1[j] < VC2[j]VC1 and VC2 are causally related if VC1 < VC2VC1|||VC2 iff VC1 � VC2 and VC2 � VC1VC1 and VC2 are concurrent if VC1|||VC2

29 / 47

Causally ordered multicasting

ObservationWe can now ensure that a message is delivered only if all causally preceding messageshave already been delivered.

AdjustmentPi increments VCi [i] only when sending a message, and Pj “adjusts” VCj when receivinga message (i.e., effectively does not change VCj [j]).

Pj postpones delivery of m until:

ts(m)[i] = VCj [i] + 1.ts(m)[k ]≤ VCj [k ] for k 6= i .


Causally ordered multicasting

Example

P0

P1

P2

VC = (0,0,0)2 VC = (1,0,0)2

VC = (1,1,0)1

VC = (1,0,0)0 VC = (1,1,0)0

VC = (1,1,0)2

m

m*

ExampleTake VC2 = [0,2,2], ts(m) = [1,3,0] from P0. What information does P2 have, and whatwill it do when receiving m (from P0)?


Mutual exclusion

ProblemA number of processes in a distributed system want exclusive access to some resource.

Basic solutionsVia a centralized server.Completely decentralized, using a peer-to-peer system.Completely distributed, with no topology imposed.Completely distributed along a (logical) ring.


Mutual exclusion: centralized

(a) (b) (c)

0 0 01 1 1

3 3 3

2 2

2

2

RequestRequest ReleaseOK

OK

Coordinator

Queue isempty

No reply


Decentralized mutual exclusion

PrincipleAssume every resource is replicated n times, with each replica having its own coordinator⇒ access requires a majority vote from m > n/2 coordinators. A coordinator alwaysresponds immediately to a request.

AssumptionWhen a coordinator crashes, it will recover quickly, but will have forgotten aboutpermissions it had granted. In the book: robustness analysis showing low probability ofviolation under reasonable availability assumptions.


Mutual exclusion Ricart & Agrawala

Principle

The same as Lamport except that acknowledgments aren’t sent. Instead, replies (i.e. grants) aresent only when

The receiving process has no interest in the shared resource; orThe receiving process is waiting for the resource, but has lower priority (known throughcomparison of timestamps).In all other cases, reply is deferred, implying some more local administration.

0 0 0

1 1 12 2 2

8

88 12

12

12

OK OK

OK

OK

Accesses resource

Accesses resource

(a) (b) (c)


Mutual exclusion: Token ring algorithm

EssenceOrganize processes in a logical ring, and let a token be passed between them. The onethat holds the token is allowed to enter the critical region (if it wants to).

1

00

2

3

4

5

6

7

2 4 7 1 6 53

(a) (b)


Mutual exclusion: comparison

Algorithm # msgs per Delay before entry Problemsentry/exit (in msg times)

Centralized 3 2 Coordinator crashDecentralized 2mk + m, k = 1,2,... 2mk Starvation, low efficiencyDistributed 2 (n – 1) 2 (n – 1) Crash of any processToken ring 1 to ∞ 0 to n – 1 Lost token, process crash


Election algorithms

Principle

An algorithm requires that some process acts as a coordinator. The question is how to select thisspecial process dynamically.

NoteIn many systems the coordinator is chosen by hand (e.g. file servers). This leads to centralizedsolutions⇒ single point of failure.

QuestionIf a coordinator is chosen dynamically, to what extent can we speak about a centralized ordistributed solution?

QuestionIs a fully distributed solution, i.e. one without a coordinator, always more robust than anycentralized/coordinated solution?Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 38 / 47

Election by bullying

PrincipleEach process has an associated priority (weight). The process with the highest priorityshould always be elected as the coordinator. Issue How do we find the heaviest process?

Any process can just start an election by sending an election message to all otherprocesses (assuming you don’t know the weights of the others).If a process Pheavy receives an election message from a lighter process Plight, itsends a take-over message to Plight. Plight is out of the race.If a process doesn’t get a take-over message back, it wins, and sends a victorymessage to all other processes.


Election by bullying

1

2

4

0

5

6

3

7

1

2

4

0

5

6

3

7

1

2

4

0

5

6

3

7

1

2

4

0

5

6

3

7

Election

Ele

ctio

nElection

Election

OK

OK

Previous coordinatorhas crashed

Electio

n

Election

1

2

4

0

5

6

3

7

OKCoordinator

(a) (b) (c)

(d) (e)


Election in a ring

PrincipleProcess priority is obtained by organizing processes into a (logical) ring. Process with thehighest priority should be elected as coordinator.

Any process can start an election by sending an election message to its successor. Ifa successor is down, the message is passed on to the next successor.If a message is passed on, the sender adds itself to the list. When it gets back to theinitiator, everyone had a chance to make its presence known.The initiator sends a coordinator message around the ring containing a list of allliving processes. The one with the highest priority is elected as coordinator.


Election in a ring

QuestionDoes it matter if two processes initiate an election?

QuestionWhat happens if a process crashes during the election?


Superpeer election

IssueHow can we select superpeers such that:

Normal nodes have low-latency access to superpeersSuperpeers are evenly distributed across the overlay networkThere is be a predefined fraction of superpeersEach superpeer should not need to serve more than a fixed number of normal nodes


Superpeer election

DHTsReserve a fixed part of the ID space for superpeers. Example: if S superpeers areneeded for a system that uses m-bit identifiers, simply reserve the k = dlog2 Se leftmostbits for superpeers. With N nodes, we’ll have, on average, 2k−mN superpeers.

Routing to superpeerSend message for key p to node responsible for p AND 11 · · ·11︸︷︷︸

k

00 · · ·00︸︷︷︸m−k


Exercıcios

1 Considere dois servidores em um sistema distribuıdo. Ambos tem relogios quedeveriam contar 1000 ticks a cada milissegundo. Um deles tem um relogio preciso, eo outro tem um relogio que conta apenas 990 ticks a cada milissegundo. Depois deuma hora, qual sera o desvio de tempo entre os dois servidores?

2 Quando uma maquina sincroniza o seu relogio com uma outra maquina, em geral euma boa ideia manter um historico de medicoes. Porque? De um exemplo onde estehistorico pode ser usado para melhorar o processo de sincronizacao?

3 Explique como relogios logicos (Lamport) funcionam.4 Explique como multicast totalmente ordenado pode ser implementado com relogios

logicos.5 Crie uma estrategia alternativa para multicast totalmente ordenado.6 No multicast completamente ordenado com relogios logicos (Lamport), e

estritamente necessario que a recepcao de cada mensagem seja confirmada (ACK)?

45 / 47

Revisao: Exercıcios

7 Escreva um algoritmo para exclusao mutua decentralizada.8 Considerando seus vetores de relogio, responda se os eventos abaixo sao

concorrentes (a ||| b), se a precede b (a→ b) ou b precede a (b→ a).a(4, 3, 1, 1) b(1, 3, 1, 2)a(4, 3, 1, 1) b(1, 3, 1, 1)a(4, 3, 1, 1) b(4, 4, 1, 2)a(4, 3, 1, 1) b(5, 4, 1, 2)a(4, 3, 1, 1) b(1, 1, 1, 5)

46 / 47

Exercıcios

9 Quatro processos falharam, mas deixaram logs possivelmente incompletos de suasexecucoes. Cada coluna representa o log de um processo, em que estao registradoseventos internos (e) e de envio (send) e recepcao (rec) de mensagens, com o relogiodo evento entre parenteses:

P0 P1 P2 P3e(0) e(0) e(0) e(0)send m1 to P1 (1) rec m1 from P0 (2) e(1) send m3 to P1(1)e(2) send m2 to P2 (3) rec m2 from P1 (4) e(2)e(3) rec m3 from P3 (4) send m4 to P1 (5) rec m5 from P0 (5)

Um corte C do estado de execucao de multiplos processos e consistente se paratodos eventos a, b, (b ∈ C∧a→ b)⇒ (a ∈ C). Intuitivamente, se um evento b e partede um corte, entao todos os eventos que aconteceram antes de b tambem devemser parte do corte.Apresente o diagrama espaco-tempo referente aos logs acima e indique seterminam em um corte consistente. Caso contrario, qual(is) evento(s) precisa(m) serdescartado(s) para se conseguir um estado consistente?

47 / 47

Documents

Coordenaçao˜ Aula 9: Sincronizaçao de rel˜ ogios´ Aula ...lucas/teaching/mc714/... · MC714: Sistemas Distribu´ıdos Prof. Lucas Wanner Instituto de Computaçao, Unicamp˜