CPSC 668 Set 13: Clocks

CPSC 668 Distributed Algorithms and Systems

Fall 2009

Prof. Jennifer Welch


Hardware Clocks

• Suppose processors have access to some approximation of real time.

• Mechanism is through hardware clocks, one at each processor.

• pi's hardware clock HCi is modeled as a function from real times to clock times.

• Consider timed executions: associate a real time with each event (increasing).

• During pi's computation event at real time t, the value of HCi(t) can be used as input to pi's transition function.


Possible H/W Clock Properties

• HCi is increasing

– a minimal property

• HCi(t) = number of steps taken by pi through real time t

– easy to implement in software

• HCi(t) = t

– perfect

• HCi(t) = t + ci

– h/w clock runs at same rate as real time but offset

• HCi(t) = ai·t + bi

– h/w clock drifts away from real time


Adjusted Clocks

• Clocks are particularly useful if they are synchronized.

• But typically hardware clocks cannot be changed.

• Instead, consider adjusted clock, obtained by adding some value to the hardware clock value:

ACi(t) = HCi(t) + adji(t)

• adji is adjustment variable of pi


Measuring Clock Differences

• How to evaluate how close together clocks are?

• Skew: how far apart clock times are at a given real time, or

• Precision: how far apart in real time clocks reach same clock time

• These are the same when there is no drift…


Skew and Precision

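[Figure: ACi and ACj plotted as clock time against real time. Skew is the vertical distance between the two clocks at a real time t; precision is the horizontal distance between the real times at which they reach a clock time T.]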


Synchronizing Clocks

If hardware clocks don't drift, then once clocks are adjusted, they stay the same distance apart.

Achieving ε-synchronized clocks:

• Termination: no processor assigns to its adj variable after some real time tf

• ε-bounded skew: for all i and j, and all real times t ≥ tf, |ACi(t) - ACj(t)| ≤ ε.


Bounded Message Delays

• We'll study the clock synchronization problem in message passing with bounded delays.

• Define a timed execution to be admissible if:

– every processor takes an infinite number of steps (no failures)

– every message has delay in the range [d–u, d]; call u the uncertainty


Two Processor Algorithm

• Consider this simple algorithm:

• p0 uses its hardware clock as its adjusted clock

• p1 adopts (its best estimate of) p0's adjusted clock as its adjusted clock

• How does p1 do this? p0 sends its clock time to p1 in a message

• How to handle uncertain delay? Assume delay is in the middle of the range: d – u/2


Code for Two Processor Algorithm

p0:
    adj0 := 0
    send HC0 to p1

p1:
    when receive T from p0:
        adj1 := (T + d – u/2) – HC1


Analysis of Two Proc. Algorithm

• What is the skew attained by the algorithm?

• If message really did take d – u/2 time to arrive, skew is 0 (best case).

• If message took d or d – u time, skew is u/2 (worst case).

• Can we do better, perhaps with a more complicated algorithm?
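The best and worst cases above can be checked numerically. The following is a minimal sketch, assuming drift-free clocks of the form HC(t) = t + c; the function name, the default clock offsets and the sample values of d and u are illustrative assumptions, not part of the slides.

```python
def two_processor_skew(d, u, delay, c0=100.0, c1=37.0):
    """Skew |AC0 - AC1| after p1 handles p0's message.

    Assumes drift-free hardware clocks HC0(t) = t + c0 and HC1(t) = t + c1
    (c0, c1 are arbitrary illustrative offsets) and that p0 sends at real time 0.
    """
    if not (d - u <= delay <= d):
        raise ValueError("delay must lie in [d - u, d]")
    T = 0.0 + c0                          # HC0 at the send event
    hc1_at_receipt = delay + c1           # HC1 when the message arrives
    adj1 = (T + d - u / 2) - hc1_at_receipt
    t = delay + 5.0                       # any real time after the adjustment
    ac0 = t + c0                          # adj0 = 0
    ac1 = t + c1 + adj1
    return abs(ac0 - ac1)


# Illustrative values: d = 10, u = 4, so delays range over [6, 10].
best = two_processor_skew(10.0, 4.0, delay=8.0)    # delay = d - u/2
worst_slow = two_processor_skew(10.0, 4.0, delay=10.0)
worst_fast = two_processor_skew(10.0, 4.0, delay=6.0)
```

Under these assumptions the skew works out to |d – u/2 – delay|, which is 0 for the middle delay and u/2 at either end of the range.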


Proving Lower Bounds on Skew

• A useful technique for proving lower bounds on skew for clock synchronization is that of shifting executions.

• To define it, we first need to look at some modeling issues.


Modeling Executions: Two Ways

• We've been modeling an execution as a sequence of events.

[Figure: a single timeline of events: step by p2, step by p0, step by p1]



• An alternative approach is to model with a set of sequences, one sequence per processor.

[Figure: three separate timelines of events, one each for p2, p0 and p1]



• Having one sequence per processor is technically convenient for lower bound proofs

• Can convert back and forth between the two modeling styles


Processor Views

• A view of processor pi is:

– an initial state of pi

– a sequence of events (computation and delivery) occurring at pi

– a hardware clock value for each event

• A timed view of pi is a view with a real time associated with each event (increasing)


Views vs. Timed Views

Two different timed views with the same (untimed) view:

Timed view 1:
real times:        11:15  11:20  11:45  11:52
h/w clock times:    3:00   3:05   3:10   4:00

Timed view 2:
real times:         8:08   9:00   9:10  10:10
h/w clock times:    3:00   3:05   3:10   4:00


Extracting Views from Executions

• Given a timed execution, straightforward to extract timed views for all the processors:

– get initial state of a processor from the initial configuration

– get sequence of events occurring at that processor and their times from the events in the execution


Merging Views into an Execution

Given a set of timed views, one per proc:

1. initial config is combination of initial states

2. obtain sequence of events by interleaving events from views in real-time order (break ties with ids)

3. apply events in order to initial config to obtain the other configs.


But is Result Admissible?

• The result might not be admissible.

• Biggest issue is the message delays: must be in range d – u to d.


Why Care About Views?

To prove lower bounds on skew:

1. Start with a (carefully chosen) timed execution

2. Modify processors' views (in a carefully chosen way)

3. Merge resulting views to get a new execution:

• check that it is admissible

• show that it violates some bound


Shifting Timed Executions

Given timed execution α and real numbers x0, x1, …, xn-1,

shift(α,(x0, x1, …, xn-1)) is created by:

• extracting timed views v0, …, vn-1 from α

• adding xi to the real time of each event in each vi

• merging the resulting timed views
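The three steps above can be sketched in a few lines. This is only an illustration: the `TimedEvent` record, its field names and the list-of-views representation are assumptions made for the example, and configurations are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEvent:
    real_time: float   # real time of the event
    proc: int          # index i of the processor pi at which it occurs
    hw_clock: float    # hardware clock value attached to the event
    label: str         # free-form description (hypothetical field)


def shift(views, xs):
    """Shift view i by xs[i] and merge the results in real-time order.

    views[i] is the list of TimedEvents of processor pi. Only real times
    change; hardware clock values stay attached to their events.
    Ties in real time are broken by processor id.
    """
    shifted = [
        [TimedEvent(e.real_time + x, e.proc, e.hw_clock, e.label) for e in view]
        for view, x in zip(views, xs)
    ]
    return sorted(
        (e for view in shifted for e in view),
        key=lambda e: (e.real_time, e.proc),
    )


# p0 sends at real time 1.0, p1 receives at real time 4.0 (delay 3.0).
views = [
    [TimedEvent(1.0, 0, 1.0, "send m")],
    [TimedEvent(4.0, 1, 9.0, "recv m")],
]
earlier = shift(views, (-2.0, 0.0))   # p0 shifted earlier: delay grows to 5.0
too_late = shift(views, (4.0, 0.0))   # p0 shifted later: receipt precedes send
```

In the second call the receive event ends up before the send event in the merged sequence, which is one way the result of a shift can fail to be admissible.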


Shifting Examples

[Figure: three timelines showing h/w clock times against real times. In the original view, HCi(t) = T at real time t. After a shift by a positive amount, and after a shift by a negative amount, the same clock value T is read at real time t + x: HCi(t+x) = T.]


Facts About Shifted Executions

Result of shifting and merging might not be admissible: could shift receipt of a message earlier than its sending, for example.

But these facts hold:

1. New hardware clock HC'i satisfies:

HC'i(t) = HCi(t – xi) = HCi(t) – xi

2. Delay of a msg from pi to pj goes from δ to δ – xi + xj, since msg is sent xi later and received xj later


Lower Bound for 2 Processors

• Let A be any 2-proc. alg that achieves ε-synchronized clocks.

• Let α be the timed admissible execution of A in which

– every msg from p0 to p1 has delay d – u

– every msg from p1 to p0 has delay d

• After A terminates in α,

(1) AC0 ≥ AC1 – ε



[Figure: timelines for p0 and p1 with message delays d–u and d, shown before and after shifting p0 backwards by u.]



• Let α' = shift(α,(–u,0)).

• Shift p0 earlier by u, leave p1 alone.

• In α',

– every msg from p0 to p1 has delay d

– every msg from p1 to p0 has delay d – u

• After A terminates in α',

AC'1 ≥ AC'0 – ε



AC'1 ≥ AC'0 – ε implies

AC1 ≥ (AC0 + u) – ε, since AC'1 = AC1 and AC'0 = AC0 + u

Remember inequality (1):

AC0 ≥ AC1 – ε ≥ (AC0 + u – ε) – ε (from just above)

Implies ε ≥ u/2


Star Algorithm for n Processors

• Assume the network topology is a clique and message delay range for every edge is d – u to d.

• Pick one proc (say p0) and let every other proc try to adopt p0's clock using the 2-processor algorithm.

• Worst-case skew can be as large as u (one proc is u/2 behind p0's clock and another is u/2 ahead)


Improved Algorithm for n Processors

• All processors exchange h/w clock values.

• Each processor estimates the difference between its own h/w clock and that of each other processor.

• Each processor computes the average of the differences and sets its adj variable to the result


Code for Processor pi

initially diffi[i] = 0

send HCi to all procs

when receive T from pj:
    diffi[j] := (T + d – u/2) – HCi

when heard from all procs:
    adji := (1/n) ∑_{k=0}^{n-1} diffi[k]


Analysis of n-Processor Algorithm

• To bound the skew, start with |ACi – ACj|

• Then substitute the formula for each AC from the code: HCi + (1/n)∑diffi[k]

• Then do some algebra (rearranging terms and using properties of absolute value) to get…


Analysis of n-Processor Algorithm

|ACi – ACj| ≤ (X + Y + Z)/n where

• X = |diffj[i] – (HCi – HCj)|

error in pj's estimate of the difference between its own clock and pi's clock, at most u/2

• Y = |diffi[j] – (HCj – HCi)|

error in pi's estimate of the difference between its own clock and pj's clock, at most u/2

• Z = sum over all k other than i and j of

|diffi[k] – (HCk – HCi)| + |diffj[k] – (HCk – HCj)|

error in pi's estimate of pk's clock plus error in pj's estimate of pk's clock, at most u/2 + u/2 = u for each of the n–2 values of k.


Analysis of n-Processor Algorithm

To finish up,

|ACi – ACj| ≤ (u/2 + u/2 + (n–2)u)/n

= u(1 – 1/n).
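The bound can be sanity-checked by simulating the averaging algorithm. The sketch below assumes drift-free clocks HCi(t) = t + ci and that every processor sends at real time 0; both simplifications, and all function and parameter names, are assumptions made for illustration.

```python
def averaging_adjusted_offsets(hc_offsets, delay, d, u):
    """Run the averaging algorithm once and return ACi(t) - t for each pi.

    hc_offsets[i] is ci in HCi(t) = t + ci (drift-free clocks assumed).
    delay[j][i] is the delay of the message from pj to pi; every processor
    is assumed to send at real time 0.
    """
    n = len(hc_offsets)
    adjusted = []
    for i in range(n):
        diff = [0.0] * n                       # diff_i[i] stays 0
        for j in range(n):
            if j == i:
                continue
            T = hc_offsets[j]                  # HCj at the send (real time 0)
            hc_i_at_receipt = delay[j][i] + hc_offsets[i]
            diff[j] = (T + d - u / 2) - hc_i_at_receipt
        adj = sum(diff) / n
        adjusted.append(hc_offsets[i] + adj)
    return adjusted


def max_skew(adjusted):
    """Largest difference between any two adjusted clocks."""
    return max(adjusted) - min(adjusted)


# Illustrative run: n = 4, d = 10, u = 4, delays d - u from lower to higher
# index and d in the other direction.
d, u, n = 10.0, 4.0, 4
delays = [[(d - u if j < i else d) for i in range(n)] for j in range(n)]
skew = max_skew(averaging_adjusted_offsets([0.0] * n, delays, d, u))
```

With these particular delays the simulated skew equals u(1 – 1/n) = 3, so under the stated assumptions the analysis is tight for this algorithm.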


Lower Bound for n-Processor CS

Theorem (6.17): No algorithm can achieve ε-synchronized clocks for ε < u(1–1/n).

Proof:

• Choose any algorithm A that achieves ε-synchronized clocks.

• Let α be a timed admissible exec. s.t.

– every msg from pi to pj has delay d – u, i < j.

– every msg from pj to pi has delay d, i < j.


Example of Reference Execution

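For n = 4, the message delays in α can be represented schematically like this:

[Figure: p0, p1, p2 and p3, each pair connected in both directions. Every message from a lower-indexed to a higher-indexed processor has delay d-u; every message in the opposite direction has delay d.]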


Additive Lemma

Lemma (6.18): AC_{k-1} ≤ AC_k – u + ε, for all k.

Proof:

Take α and shift p0 through pk-1 earlier by u: α' = shift(α,(–u,…, –u,0,…,0))

Verify that α' is admissible by checking that message delays are in range:

– if sender and recipient were both shifted, then delays are same as in α

– if one is shifted and other is not, then delays that used to be d–u become d and delays that used to be d become d–u.
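The admissibility check can be carried out mechanically. The sketch below builds the delays of the reference execution (d – u from a lower to a higher index, d the other way) and applies the shifting fact that a delay from pi to pj changes by – xi + xj; the function names and the sample values n = 4, d = 10, u = 4 are assumptions for illustration.

```python
def reference_delays(n, d, u):
    """Delays of the reference execution, keyed by (sender, recipient)."""
    return {
        (i, j): (d - u if i < j else d)
        for i in range(n)
        for j in range(n)
        if i != j
    }


def shifted_delays(delays, xs):
    """Delay from pi to pj after shifting pi by xs[i] and pj by xs[j]."""
    return {(i, j): delta - xs[i] + xs[j] for (i, j), delta in delays.items()}


def admissible(delays, d, u):
    """True if every delay lies in the range [d - u, d]."""
    return all(d - u <= delta <= d for delta in delays.values())


n, d, u, k = 4, 10, 4, 2
alpha = reference_delays(n, d, u)
# Shift p0 .. p(k-1) earlier by u, leave the rest alone.
alpha_prime = shifted_delays(alpha, [-u] * k + [0] * (n - k))
# Shifting the same processors later instead pushes some delays out of range.
wrong_way = shifted_delays(alpha, [u] * k + [0] * (n - k))
```

Here `admissible(alpha_prime, d, u)` holds while `admissible(wrong_way, d, u)` does not, which illustrates why the direction of the shift has to match the direction of the short delays.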


Example of Shifted Execution

[Figure: the n = 4 delay diagram for p0, p1, p2 and p3, shown before and after shifting p0 and p1 earlier by u. Each diagram labels six messages with delay d-u and six with delay d.]


Additive Lemma Completed

• Since α' is admissible and algorithm achieves ε-synchronized clocks, after termination

AC'_{k-1} ≤ AC'_k + ε

• By shifting facts,

AC'_{k-1} = AC_{k-1} + u and AC'_k = AC_k

• Thus AC_{k-1} ≤ AC_k – u + ε.


Back to Main Lower Bound Proof

After termination in α:

AC_{n-1} ≤ AC_0 + ε by correctness of algorithm

≤ AC_1 – u + 2ε by Additive Lemma

≤ AC_2 – 2u + 3ε by Additive Lemma

…

≤ AC_{n-1} – (n–1)u + nε by Additive Lemma

Thus 0 ≤ –(n–1)u + nε, i.e. ε ≥ u(1 – 1/n).


Message Delays in the Real World

• In reality, message delays are not uniformly distributed between a minimum and a maximum.

• Typically the distribution has a spike close to the minimum and a long tail going to infinity.

• One approach to deal with the lack of a maximum is to fix a "timeout" value d and consider any msg taking longer to be lost.

• But if d is chosen to be fairly large (to reduce the number of slow msgs incorrectly classified as lost), most msgs will take significantly less than d, and even significantly less than d – u/2.


Estimating Clock Differences

• Take advantage of small delays that occur most of the time.

• pi sends a query to pj, which pj answers immediately with its current clock value.

• When pi gets the response, it assumes pj's response took half the round trip time.

• If the round trip time is small, error is reduced compared to original approach.

• pi can query repeatedly until getting a round trip time that is "sufficiently" small.
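The query/response estimate above might look as follows. This is a sketch only: the function names, the `probe` callback and the `max_tries` cutoff are hypothetical, and pi's clock is assumed not to drift noticeably during one round trip.

```python
def estimate_offset(query_sent_hc, response_recv_hc, remote_clock):
    """pi's estimate of (pj's clock - pi's clock) when the response arrives.

    query_sent_hc / response_recv_hc: pi's clock when it sent the query and
    when it received the response. remote_clock: the value pj put in the
    response. Assumes the response took half of the round trip time.
    """
    rtt = response_recv_hc - query_sent_hc
    estimated_remote_now = remote_clock + rtt / 2
    return estimated_remote_now - response_recv_hc, rtt


def query_until_fast(probe, rtt_threshold, max_tries=10):
    """Query repeatedly; stop once a round trip is at most rtt_threshold.

    probe() performs one query and returns
    (query_sent_hc, response_recv_hc, remote_clock).
    Returns the (offset, rtt) pair with the smallest round trip seen.
    """
    best = None
    for _ in range(max_tries):
        offset, rtt = estimate_offset(*probe())
        if best is None or rtt < best[1]:
            best = (offset, rtt)
        if rtt <= rtt_threshold:
            break
    return best


# pj's clock is 50 ahead of pi's. The query takes 1 and the response takes 3,
# so pj answers with 51 and pi receives the answer when its own clock reads 4.
offset, rtt = estimate_offset(0.0, 4.0, 51.0)   # offset 49.0, rtt 4.0
```

In this example the estimate is off by 1, which is half the difference between the two one-way delays; the error can never exceed half the round trip time, which is why a small round trip gives a good estimate.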


Clock Drift

• Hardware clocks typically suffer from drift (gain or lose time).

• Usually the drift is bounded, though.

• Bounded Drift: There exists ρ > 0 such that for all i, and all real times t1 and t2 with t1 ≤ t2,

(1 + ρ)^{-1}(t2 – t1) ≤ HCi(t2) – HCi(t1) ≤ (1 + ρ)(t2 – t1)

• That is, hardware clocks measure elapsed real time approximately correctly.
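As a concrete reading of the bounded drift condition, the sketch below checks it for a clock function on a finite set of sample times. The name `rho` stands for the drift bound; the helper name, the sample times and the example clocks are illustrative assumptions, and a finite check can only refute the condition, not prove it for all real times.

```python
import itertools


def satisfies_bounded_drift(hc, rho, times):
    """Check the bounded drift inequalities for every pair t1 < t2 in times.

    hc is a function from real time to hardware clock time.
    """
    for t1, t2 in itertools.combinations(sorted(set(times)), 2):
        elapsed = t2 - t1
        measured = hc(t2) - hc(t1)
        if not (elapsed / (1 + rho) <= measured <= (1 + rho) * elapsed):
            return False
    return True


sample_times = [0.0, 1.0, 10.0, 250.0, 1000.0]
rho = 1e-6

perfect = satisfies_bounded_drift(lambda t: t + 42.0, rho, sample_times)
slightly_fast = satisfies_bounded_drift(lambda t: (1 + 5e-7) * t, rho, sample_times)
too_fast = satisfies_bounded_drift(lambda t: 1.01 * t, rho, sample_times)
too_slow = satisfies_bounded_drift(lambda t: 0.99 * t, rho, sample_times)
```

A constant offset does not matter here, since the condition only constrains elapsed time; only the rate of the clock does.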


Hardware Clock Drift

For quartz crystal clocks, ρ is about 10^{-6}

[Figure: hardware clock HCi(t) plotted against real time t; its slope stays between a minimum of (1+ρ)^{-1} and a maximum of 1+ρ.]


Clock Synchronization with Drift

• When clocks can drift, processors must continually resynchronize. Two problems:

1. Establish: Get clocks close together.

2. Maintain: Keep clocks close together.

• We will focus on the maintenance problem, assuming clocks are initially within some B of each other.


Maintaining Clock Synchronization with Drift

Clock Agreement: There exists ε s.t. for all i and j, and all real times t:

|ACi(t) – ACj(t)| ≤ ε

Clock Validity: There exists γ > 0 s.t. for all i and all real times t:

(1 + γ)^{-1}(HCi(t) – HCi(0)) ≤ ACi(t) – ACi(0) ≤ (1 + γ)(HCi(t) – HCi(0))

When taking the "long view", adjusted clocks measure elapsed time approximately as well as the hardware clocks.


Byzantine Failures and Clock Synchronization

• Suppose up to f processors can exhibit Byzantine failures.

• Modify definition of maintaining clock synchronization with drift so that clock agreement and clock validity only need to hold for nonfaulty processors.

• To solve the problem, total number of processors n must satisfy n > 3f.


Lower Bound on Number of Processors

• The n > 3f condition is also true of consensus.

• The consensus problem and the clock maintenance problem are similar.

• Can we use the n > 3f bound for consensus via a reduction?

• No one knows how. Instead, we'll do a direct proof, but using familiar ideas

– scaling (similar to shifting)

– specify faulty behavior with a big ring


Scaling Clocks

• Given a timed execution α and a real number s > 0, scale(α,s) is the result of multiplying every real time in α by s.

• If s > 1, scaling causes clocks to slow down and delays to increase.

• If s < 1, scaling causes clocks to speed up and delays to decrease.
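A small numeric sketch of scaling follows. Since every real time is multiplied by s, the scaled clock reads at real time t what the original clock read at real time t/s; the helper names and the sample clocks (3t and 4t, scaled by s = 2) are assumptions chosen for illustration.

```python
def scale_clock(hc, s):
    """Hardware clock after scaling: reads at t what hc read at t / s."""
    return lambda t: hc(t / s)


def scale_times(times, s):
    """Multiply every real time by s."""
    return [s * t for t in times]


s = 2.0
hc0 = lambda t: 3.0 * t
hc1 = lambda t: 4.0 * t
hc0_scaled = scale_clock(hc0, s)      # behaves like (3/2) t
hc1_scaled = scale_clock(hc1, s)      # behaves like 2 t

# A message sent by p0 at real time 2 and received by p1 at real time 3.
send, recv = 2.0, 3.0
send_s, recv_s = scale_times([send, recv], s)

delay_before = recv - send            # 1.0
delay_after = recv_s - send_s         # 2.0, i.e. s times the old delay
```

The clock values attached to the two events are unchanged by scaling (6 at the send, 12 at the receipt), while the clocks run at half their former rate and the delay doubles, matching the case s > 1.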


Scaling Example

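[Figure: scaling a two-processor execution by s = 2. Before scaling, HC0(t) = 3t and HC1(t) = 4t; a message between p0 and p1 is sent at real time 2:00 and received at real time 3:00, at clock times 6:00 and 12:00, so delay = 1:00. After scaling, HC'0(t) = (3/2)t and HC'1(t) = 2t; the same events occur at real times 4:00 and 6:00, still at clock times 6:00 and 12:00, so delay = 2:00.]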


Scaling Clocks

Lemma (13.1): In α' = scale(α,s),

• HCi'(t) = HCi(t/s)

• ACi'(t) = ACi(t/s)

• if a msg has delay δ in α, then it has delay sδ in α'.

Lemma (13.2): If α satisfies ε-clock agreement and γ-clock validity for a set of procs, then so does scale(α,s).


Processor Lower Bound for CS

Assume

• f = 1

– extend to larger f with reduction

• u ≥ d(1 – (1 + ρ)^{-4})

– needed for calculations to work out

– since ρ is tiny, this is not a significant restriction (uncertainty must be at least slightly larger than 0)


Processor Lower Bound for CS

• Assume in contradiction there is an algorithm (A,B,C) for n = 3 and f = 1 that achieves ε-clock agreement and γ-clock validity.

• Consider a ring of k processors, where

– k is a multiple of 3

– (1 + ρ)^{2(k-1)} > (1 + γ)^2 (needed for the calculations to work out)


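Big Ring

[Figure: a ring of k processors p0, p1, p2, p3, …, pi-1, pi, pi+1, …, pk-1. The local algorithms A, B, C are assigned in repeating order around the ring: p0 runs A, p1 runs B, p2 runs C, p3 runs A, and so on up to pk-1, which runs C.]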


Execution on Big Ring

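[Figure: the big ring of p0, …, pk-1, annotated as follows.]

• local algorithms: A, B, C repeating around the ring

• hardware clocks: HCi(t) = t(1+ρ)^{1-2i}; that is, t(1+ρ) at p0, t(1+ρ)^{-1} at p1, t(1+ρ)^{-3} at p2, t(1+ρ)^{-5} at p3, …, t(1+ρ)^{1-2(k-1)} at pk-1

• message delays: d(1+ρ)^{2i-4} between pi-1 and pi; the labeled values are d(1+ρ)^{-4}, d(1+ρ)^{-2}, d(1+ρ)^0 = d, d(1+ρ)^2, …, d(1+ρ)^{2i-4}, d(1+ρ)^{2i-2}, …, d(1+ρ)^{2k-6}

• adj. vars are initially 0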


Execution on Big Ring

• We cannot rely on the big-ring execution satisfying the clock synch properties:

– more than 3 processors

– some h/w clock drift rates are out of range

– some message delays are out of range

• However, we can make some deductions about how processors behave in the big-ring execution:

– show that pieces of the ring "look like" certain systems in which the algorithm is supposed to be correct.


Behavior in Big Ring

Lemma (13.4): In the big-ring execution, for all t:

a) |ACi(t) - ACi+1(t)| ≤ ε

b) (1+γ)^{-1} HCi(t) ≤ ACi(t) ≤ (1+γ) HCi(t)

Proof: Take pi and pi+1 from big ring and put them in a triangle in which 3rd processor is faulty and acts like the rest of the big ring. Call the resulting execution the triangle execution.


Triangle Based on Big Ring

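[Figure: a triangle of pi, pi+1 and a faulty third processor.]

• hardware clocks: t(1+ρ)^{1-2i} at pi; t(1+ρ)^{1-2(i+1)} at pi+1

• message delays: d(1+ρ)^{2(i+1)-4} between pi and pi+1; d(1+ρ)^{2i-4} between pi and the faulty processor; d(1+ρ)^{2(i+2)-4} between pi+1 and the faulty processor

• the faulty processor acts like pi-1 toward pi, and like pi+2 toward pi+1, in the big-ring execution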


Relationship of Triangle and Ring

Claim: pi and pi+1 behave the same in the triangle execution (the execution on the triangle with the Byzantine processor) as they do in the big-ring execution (the execution on the big ring).


Scaled Triangle

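Scale the triangle execution by (1 + ρ)^{-2i} to get the scaled triangle execution:

[Figure: the triangle of pi, pi+1 and the faulty processor after scaling.]

• hardware clocks: t(1+ρ) at pi; t(1+ρ)^{-1} at pi+1

• message delays: d(1+ρ)^{-2} between pi and pi+1; d(1+ρ)^{-4} between pi and the faulty processor, which is ≥ d - u by assump.; d between pi+1 and the faulty processor

• the faulty processor acts like pi-1 toward pi, and like pi+2 toward pi+1, in the big-ring execution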


Relating the Three Executions

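• Since the scaled triangle execution is admissible, it satisfies ε-clock agreement and γ-clock validity for pi and pi+1.

• By Scaling Lemma (13.2), the unscaled triangle execution also satisfies those conditions for pi and pi+1.

• Since the triangle execution and the big-ring execution look the same to pi and pi+1, the big-ring execution also satisfies those conditions for pi and pi+1.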


Finishing the Main Lower Bound

Referring back to the big-ring execution,

AC0(t) ≤ AC1(t) + ε by Lemma 13.4(a)

≤ AC2(t) + 2ε by Lemma 13.4(a)

…

≤ ACk-1(t) + (k-1)ε by Lemma 13.4(a)

So ACk-1(t) ≥ AC0(t) - (k-1)ε

≥ (1+γ)^{-1} HC0(t) - (k-1)ε by Lemma 13.4(b)

= (1+γ)^{-1}(1+ρ)^{2(k-1)} HCk-1(t) - (k-1)ε, since HC0(t) = (1+ρ)^{2(k-1)} HCk-1(t)


Finishing the Main Lower Bound

From previous slide:

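ACk-1(t) ≥ (1+γ)^{-1}(1+ρ)^{2(k-1)} HCk-1(t) - (k-1)ε

By Lemma 13.4(b):

ACk-1(t) ≤ (1+γ) HCk-1(t)

Combining and rearranging gives:

HCk-1(t) [(1+γ)^{-1}(1+ρ)^{2(k-1)} - (1+γ)] ≤ (k-1)ε

– HCk-1(t) grows w/o bound

– the bracketed factor is positive, by assumption about k

– (k-1)ε is constant

So the inequality fails for large enough t, a contradiction.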


Fault-Tolerant Clock Synchronization Algorithms

• Continue to focus on maintenance algorithms.

• Assume clocks are initially close together

– different algorithms state this condition differently

• Processors resynchronize every P time units:

– different algorithms have different constraints on P.


A Fault-Tolerant CS Algorithm

[Welch & Lynch, 1988]

• Assume adjusted clocks reach clock time 0 within B real time of each other

• Resynch every P time units; choose P

– large enough to avoid confusion between resynchronizations

– small enough to prevent skew due to drift from becoming too large


Code for a Processor

when AC = kP (k = 1, 2, …):
    send AC to all
    set timer for (1 + ρ)(B + d) in the future

when receive T msg from pj:
    diff[j] := (T + d – u/2) – AC

when timer goes off:
    adj := adj + midpoint(trim(f,diff))
    clear diff array

trim(f,diff): discard f largest and f lowest values
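The adjustment step can be sketched as below. `trim` follows the description above (discard the f largest and f lowest values); `midpoint` is assumed here to mean the midpoint of the range of the remaining values, and the function names and sample numbers are illustrative.

```python
def trim(f, values):
    """Discard the f largest and f lowest values."""
    ordered = sorted(values)
    if len(ordered) <= 2 * f:
        raise ValueError("need more than 2f values to trim")
    return ordered[f:len(ordered) - f]


def midpoint(values):
    """Midpoint of the range of the values (assumed meaning of 'midpoint')."""
    return (min(values) + max(values)) / 2


def resync_adjustment(f, diff):
    """Amount added to adj when the timer goes off."""
    return midpoint(trim(f, diff))


# n = 4, f = 1: three plausible estimates and one wildly wrong (faulty) one.
honest_case = resync_adjustment(1, [0.0, 1.0, -2.0, 0.5])
with_liar = resync_adjustment(1, [0.0, 1.0, -2.0, 100.0])
```

Because the f most extreme values on each side are thrown away, a single faulty processor reporting 100.0 moves the result only as far as the surviving values allow.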


Explanation of Timer Value

• Why wait (1 + ρ)(B + d) time to collect messages?

• Want to hear from all nonfaulty processors before adjusting.

– All nonfaulty procs will reach clock time kP within B time of each other (true for k = 0 by assumption, shown by induction for k > 0)

– Maximum msg delay is d

– Waiting B + d clock time might not be long enough if your clock is fast. To be safe, wait extra factor of (1 + ρ)


Clock Agreement

Claim: Nonfaulty clocks reach each kP within B real time of each other.

– Proved by induction.

Claim: After adjusting their clocks in each resynch period, the new (nonfaulty) clocks reach kP within real time B/2 + u + O(ρ) of each other. See figure.

– Proved using properties of the trim and midpoint functions: difference is roughly halved.


Figure for Resynchronization

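[Figure: ACi and ACj plotted against real time over one resynchronization period, with clock times kP, kP+(B+d)(1+ρ), (k+1)P and (k+1)P+(B+d)(1+ρ) marked. The two clocks are at most B apart before adjusting and at most B/2 + u + O(ρ) apart afterwards.]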


Clock Agreement

• Due to drift, new clocks reach (k+1)P (start of next resynch) within real time B/2 + u + 2ρP of each other.

• B/2 + u + 2ρP ≤ B

implies B ≥ 2u + 4ρP

= 2u + O(ρ)

• So B cannot be any smaller than 2u plus terms of order ρ.


Clock Agreement

Claim: The algorithm achieves ε-clock agreement, where

ε = B + u/2 + O(ρ)

Using the smallest possible B, the best this algorithm gives is

ε = 5u/2 + O(ρ).


Clock Validity

• Paper analyzes drift of adjusted clocks with respect to real time, not hardware clock time.

• Adjusted clock drift rate is calculated to be ρ + O(1/P), as opposed to ρ for the hardware clocks.

– The more frequently the processors resynchronize, the more they degrade the drift rate (tradeoff with Clock Agreement)

• Careful analysis for the version of clock validity given in textbook is open.
