Distributed Computing 5. Snapshot Shmuel Zaks [email protected] ©


The snapshot algorithm (Chandy and Lamport)


Goal: design a snapshot (=global-state-detection) algorithm that:

will record a collection of states of all system components (which forms a global system state),

will not change the underlying computation,

will not freeze the underlying computation


A process can: record its own state; send and receive messages; record the messages it sends and receives; cooperate with other processes.

Processes do not share clocks or memory.

Processes cannot record their states at precisely the same instant.


Motivation

Many problems in distributed systems can be stated in terms of the problem of detecting global states:

Stable property detection problems : termination detection, deadlock detection etc.

Checkpointing


Stable Property Detection Problem

D: a distributed system.

y: a predicate function defined on the set of global states of D.

S, S’: global states of D.

y is stable if y(S) implies y(S’) for all S’ reachable from S

many distributed algorithms are structured as a sequence of phases

A phase: transient part, then a stable part

phase termination vs. computation termination

Our view of the problem:

i. detect the termination of a phase

ii. initiate a new phase

Notice that “the kth phase has terminated” is a stable property
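The definition of a stable predicate can be checked mechanically on a small, finite state graph. The following is a minimal sketch under stated assumptions: the state names, the `successors` map, and the helper names `reachable` and `is_stable` are hypothetical and do not come from the slides.

```python
from collections import deque

def reachable(start, successors):
    """Return all states reachable from start (including start itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt in successors.get(s, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_stable(y, states, successors):
    """y is stable if y(S) implies y(S') for every S' reachable from S."""
    return all(
        y(s2)
        for s in states if y(s)
        for s2 in reachable(s, successors)
    )

# Hypothetical toy system: a phase is either still working or has terminated.
successors = {"working": ["working", "terminated"], "terminated": ["terminated"]}
states = list(successors)

print(is_stable(lambda s: s == "terminated", states, successors))  # True
print(is_stable(lambda s: s == "working", states, successors))     # False
```

"Terminated" is stable in this toy graph because no transition leaves it, while "working" is not, since a working state can reach a terminated one.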


Model

Distributed system D is a finite, labeled, directed graph.

[Figure: two processes p and q connected by directed channels C1 and C2]

Channels have infinite buffers, are error-free and preserve FIFO

Message delay is finite, but arbitrary and unknown


State of a Channel

[Figure: channel C1 between p and q; messages 1, 2, 3 have been sent and message 1 has been received]

[1, 2, 3] – sequence X of messages that were sent

[1] – sequence Y of received messages ( prefix of X )

[2, 3] – state of C1: X \ Y
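The channel state X \ Y can be computed directly from the two sequences. This is a small sketch; the function name `channel_state` is hypothetical, and it assumes a FIFO channel, so that the received sequence is always a prefix of the sent sequence.

```python
def channel_state(sent, received):
    """Return the messages still in transit: the sent sequence X minus its received prefix Y."""
    sent, received = list(sent), list(received)
    if sent[:len(received)] != received:
        raise ValueError("on a FIFO channel, received must be a prefix of sent")
    return sent[len(received):]

print(channel_state([1, 2, 3], [1]))  # [2, 3]
```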


Example: System

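Distributed system: processes p and q connected by channels C1 and C2.

Initial global state: p is in state B, q is in state A, and both channels are empty (Ø).

State transitions (same for p and q): two states, A and B, with one transition labeled send and one labeled receive.

[Figure: the system and the process state transition diagram]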

Global state transition diagram

[Figure: global state transition diagram of the example system, with transitions labeled p sends, q receives, q sends, p receives]

A computation corresponds to a path in the diagram.

This system is deterministic.

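Example: System

Distributed system: processes p and q connected by channels C1 and C2.

State transitions:

p: states A and B, with transitions labeled send and receive.

q: states C and D, with transitions labeled send and receive.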

Global state transition diagram

[Figure: global state transition diagram of the second example, over the global states (A, C), (B, C), (B, D) and (A, D), with transitions labeled p sends, q sends, p receives, q receives]

This system is non-deterministic.

We look at the following sequence of events: p sends, q sends, p receives.

[Figure: the corresponding path in the global state transition diagram of the second example]

In the snapshot algorithm:

Each process records its own state.

p and q cooperate to record the state of C.

[Figure: channel C between p and q]

Example: System

[Figure: the first example system, with the recording steps Record C, Record q, Record p marked along the computation]

Recorded state: [Figure: recorded states of p, C and q]

No token

Example: System

[Figure: the first example system, with the recording steps Record p, Record C, Record q marked along the computation]

Recorded state: [Figure: recorded states of p, C1 and q]

Two tokens

[Figure: two timelines relating three events: C's state recorded, p sends a message on C, p's state recorded]

[Figure: the two recording orders from the examples: Record p, Record C, Record q; and Record C, Record q, Record p]

q will record the state of C.

q starts recording C after it records its own state.

q stops recording when it receives a marker from p.

p and q have to coordinate, using a special marker.

But: how does q know when to record its own state?

[Figure: channel C from p to q]

The snapshot algorithm (Intuition for the Algorithm)

Who starts? We assume one process.

Hw: extend the discussion and the proof to any number of starters.

Who will record the state of channel C? q.

q starts recording after it records its own state.

How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message.

[Figure: channel C from p to q]

Channel recording

Starts when q records its own state.

Ends when q receives a marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø).

[Figure: channel C from p to q]

State recording

p0 starts.

p0 records its state, and then broadcasts a marker.

Shout-algorithm = PI (Propagation of Information) = hot potato = …

When q receives a marker for the first time, it records its own state.

Marker-Sending Rule for a process p

For each channel c directed away from p:

1. record the state of p

2. send a marker along c before sending any other message

Marker-Receiving Rule for a process q

On receiving a marker along channel c:

if q’s state is not recorded: 1. record q’s state; 2. record c’s state = Ø (the empty sequence);

else: c’s state is the sequence of messages received along c since q recorded its state
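The two marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions rather than a definitive implementation: the class `Process`, the helper `connect`, the `MARKER` value, the use of a token count as the process state, and the channel directions (C1 from p to q, C2 from q to p) are all hypothetical choices made for illustration. FIFO channels are modeled with `collections.deque`.

```python
from collections import deque

MARKER = "MARKER"  # assumed to be distinguishable from every application message

class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens          # toy process state: number of tokens held
        self.inputs = {}              # channel name -> deque of messages arriving here
        self.outputs = {}             # channel name -> deque of messages leaving here
        self.recorded_state = None    # stays None until the process records its state
        self.recorded_channels = {}   # input channel name -> recorded message list
        self.open_channels = set()    # input channels whose recording has not ended

    def send_token(self, channel):
        """Application event: send one token along an outgoing channel."""
        self.tokens -= 1
        self.outputs[channel].append("token")

    def record(self):
        """Marker-sending rule: record own state, then send a marker on every outgoing channel."""
        self.recorded_state = self.tokens
        self.recorded_channels = {c: [] for c in self.inputs}
        self.open_channels = set(self.inputs)
        for out in self.outputs.values():
            out.append(MARKER)

    def receive(self, channel):
        """Remove the head of an input channel and apply the marker-receiving rule."""
        msg = self.inputs[channel].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record()                    # first marker: this channel is recorded as empty
            self.open_channels.discard(channel)  # recording of this channel ends
        else:
            self.tokens += 1
            if channel in self.open_channels:    # received after recording, before the marker
                self.recorded_channels[channel].append(msg)

def connect(sender, receiver, name):
    """Create one FIFO channel from sender to receiver."""
    ch = deque()
    sender.outputs[name] = ch
    receiver.inputs[name] = ch

p, q = Process("p", 1), Process("q", 0)
connect(p, q, "C1")
connect(q, p, "C2")

p.send_token("C1")  # the token is now in transit on C1
q.record()          # q starts the snapshot
q.receive("C1")     # the token arrives while q is recording C1
p.receive("C2")     # marker: p records its state, C2 is recorded as empty
q.receive("C1")     # marker: q stops recording C1

snapshot = {
    "p": p.recorded_state,
    "q": q.recorded_state,
    "C1": q.recorded_channels["C1"],
    "C2": p.recorded_channels["C2"],
}
print(snapshot)  # {'p': 0, 'q': 0, 'C1': ['token'], 'C2': []}
```

In this run the recorded global state contains exactly one token, recorded in transit on C1, in contrast to the "no token" and "two tokens" recordings of the uncoordinated examples.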

Termination

Assumption: No marker remains forever in an input channel.

Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their states in finite time.

Proof: by induction on the distance from a process that has recorded its state.

The Recorded Global State

Ex: System

[Figure: the second example system: p with states A and B, q with states C and D, connected by channels C1 and C2]

[Figure: global state transition diagram of the second example, following the sequence of events p sends, q sends, p receives]

What did we get?


Event e in process p is an atomic action: it can change the state of p, and the state of at most one channel c incident on p (by sending or receiving a message M along c).

e is defined by <p, s, s’, M, c>.

e = <p, s, s’, M, c> may occur in global state S if:

1. the state of p in S is s.

2. a. if c is directed towards p: c’s state has M at its head, and M is deleted after applying e.

   b. if c is directed from p: c’s state has M at its tail after applying e.

3. the state of p after applying e is s’.

Process State and Global State

A process: a set of states, an initial state, and a set of events.

A global state S: a collection of process states and channel states. Initially, each process is in its initial state and all channels are empty.

next(S, e) is the global state after event e is applied to global state S.

Process State and Global State

$seq = (e_i : 0 \le i \le n)$ is a computation of the system iff, for all $0 \le i \le n$:

$e_i$ may occur in $S_i$, and

$S_{i+1} = next(S_i, e_i)$

($S_0$ is the initial global state).
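The definitions of an event, of next(S, e), and of a computation translate almost directly into code. The sketch below is illustrative only. It assumes that C1 is directed from p to q and C2 from q to p, that p moves from A to B by sending and back by receiving, and that q does the same with C and D. The message names `M` and `M2` and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    p: str                    # process in which the event occurs
    s: str                    # state of p before the event
    s_next: str               # state of p after the event (s')
    M: Optional[str] = None   # message sent or received, if any
    c: Optional[str] = None   # channel used, if any

# Assumed topology: channel name -> (source process, destination process)
CHANNELS = {"C1": ("p", "q"), "C2": ("q", "p")}

def may_occur(S, e):
    """Check whether event e may occur in global state S = (process states, channel states)."""
    procs, chans = S
    if procs[e.p] != e.s:
        return False
    if e.c is None:
        return True
    src, dst = CHANNELS[e.c]
    if dst == e.p:                        # receive: M must be at the head of c
        return chans[e.c][:1] == (e.M,)
    return src == e.p                     # send: c must be an outgoing channel of p

def next_state(S, e):
    """Return next(S, e) without modifying S."""
    procs, chans = S
    procs = {**procs, e.p: e.s_next}
    if e.c is not None:
        if CHANNELS[e.c][1] == e.p:       # receive: delete M from the head
            chans = {**chans, e.c: chans[e.c][1:]}
        else:                             # send: append M at the tail
            chans = {**chans, e.c: chans[e.c] + (e.M,)}
    return procs, chans

def is_computation(S0, seq):
    """seq is a computation iff each event may occur in the state reached so far."""
    S = S0
    for e in seq:
        if not may_occur(S, e):
            return False
        S = next_state(S, e)
    return True

S0 = ({"p": "A", "q": "C"}, {"C1": (), "C2": ()})
p_sends = Event("p", "A", "B", "M", "C1")
q_sends = Event("q", "C", "D", "M2", "C2")
p_receives = Event("p", "B", "A", "M2", "C2")

print(is_computation(S0, [p_sends, q_sends, p_receives]))  # True
print(is_computation(S0, [p_receives]))                    # False
```

Global states are kept immutable in spirit: `next_state` builds new dictionaries, so earlier states of a computation remain available for comparison.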

The Recorded Global State

$seq = (e_i : i \ge 0)$: a distributed computation

$S_i$: the state of the system right before $e_i$ occurs

$S_0$: the initial state of the system

$S_t$: the state of the system at the termination of the algorithm

$S^*$: the recorded global state

Definition: Event $e_j$ is called pre-recording if $e_j$ is in a process p and p records its state after $e_j$ in seq.

Event $e_j$ is called post-recording if $e_j$ is in a process p and p records its state before $e_j$ in seq.

Assume that $e_{j-1}$ is a post-recording event that occurs immediately before a pre-recording event $e_j$ in seq.

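Lemma: If applying $e_{j-1}$ and then $e_j$ in global state $S_1$ leads to global state $S_2$, then

a. $e_j$ can be applied in $S_1$, say $S_1 \xrightarrow{e_j} S_3$,

b. $e_{j-1}$ can be applied in $S_3$, say $S_3 \xrightarrow{e_{j-1}} S_4$, and

c. $S_2 = S_4$.

Proof: $e_{j-1}$ occurs in p and $e_j$ in q, and q ≠ p (since $e_{j-1}$ is post-recording and $e_j$ is pre-recording).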

The only scenario that might prevent interchanging the two events is that a message M is sent at $e_{j-1}$ and received at $e_j$.

But this is impossible: if M is sent at $e_{j-1}$, then M is sent after p recorded its state, so a marker was sent to q before M. Hence, when M is received at $e_j$, q has already recorded its state, so $e_j$ is post-recording, a contradiction.

Hence, event $e_j$ can occur in global state $S_{j-1}$. The state of process p is not altered by $e_j$, hence $e_{j-1}$ can occur after $e_j$.

We have to show that the states of all processes and channels are the same in $S_2$ and $S_4$. This clearly holds for processes and channels that do not take part in $e_{j-1}$ and $e_j$.

States: the states of p and q in $S_2$ and in $S_4$ are the same.

Channels: whether $e_{j-1}$ or $e_j$ sends a message along a channel, receives one, or does neither, the same is done in both scenarios, so the states of the channels in $S_2$ and $S_4$ are the same. (End of proof.)

(The Recorded Global State)

Theorem: Given an execution seq and an output $S^*$ of the snapshot algorithm, where the algorithm is initiated in $S_i$ and terminates in $S_t$, there exists a computation $seq' = (e'_j : j \ge 0)$, where

1. For all $j < i$ or $j \ge t$: $e'_j = e_j$.

2. The subsequence $(e'_j \mid i \le j < t)$ is a permutation of the subsequence $(e_j \mid i \le j < t)$.

3. For all $j \le i$ or $j \ge t$: $S'_j = S_j$.

4. There exists $k$, $i \le k \le t$, such that $S^* = S'_k$.

Proof: Using the lemma, swap events until all post-recording events appear after all pre-recording events. The resulting computation is seq’.

All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.

1. Process states

2. Channel states
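The swapping argument of the proof can be sketched as a loop over adjacent pairs. The code below is only an illustration: the function name `reorder`, the tuple encoding of events, and the pre/post labels attached to the three example events are assumptions, and the code does not verify that each swap is legal (that is what the lemma establishes).

```python
def reorder(events, is_pre_recording):
    """Swap adjacent (post-recording, pre-recording) pairs until none remain."""
    seq = list(events)
    swapped = True
    while swapped:
        swapped = False
        for j in range(1, len(seq)):
            if not is_pre_recording(seq[j - 1]) and is_pre_recording(seq[j]):
                seq[j - 1], seq[j] = seq[j], seq[j - 1]
                swapped = True
    return seq

# Assumed labeling of the execution "p sends, q sends, p receives".
seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime = reorder(seq, lambda e: e[1] == "pre")
print(seq_prime)
# [('q sends', 'pre'), ('p sends', 'post'), ('p receives', 'post')]
```

Because only adjacent post/pre pairs are exchanged, the relative order among pre-recording events, and among post-recording events, is preserved.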

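Claim: The state of a channel in S* is (the sequence of messages corresponding to pre-recording sends) minus (the sequence of messages corresponding to pre-recording receives).

Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. The sequence of messages sent by p before the marker is the sequence corresponding to pre-recording sends on c, and the messages received by q before it records its state correspond to pre-recording receives on c. Since c is FIFO, the recorded sequence is exactly the difference between the two.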

[Figure: the execution p sends, q sends, p receives in the second example system, with the events labeled post, pre, post]

(Another execution)

[Figure: the execution q sends, p sends, p receives in the same system, with the events labeled pre, post, post]

What did we get?

A configuration that could have happened


Stable Detection

D: a distributed system.

y: a predicate function defined on the set of global states of D.

S, S’: global states of D.

y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S

Algorithm

Input: a stable property y.

Output: a boolean value b with the property: y(S0) → b and b → y(St).

Algorithm:

begin

  record a global state S*

  b := y(S*)

end
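The detection algorithm is a thin wrapper around the snapshot: record S*, then evaluate y on it. The sketch below assumes a hypothetical callable `record_global_state` that returns the recorded state as a dictionary of process states and channel contents. The `terminated` predicate is one illustrative stable property and is not taken from the slides.

```python
def detect_stable_property(record_global_state, y):
    """Record a global state S* and return b := y(S*)."""
    s_star = record_global_state()
    return y(s_star)

def terminated(snapshot):
    """Illustrative stable property: every process is idle and every channel is empty."""
    all_idle = all(state == "idle" for state in snapshot["processes"].values())
    all_empty = all(len(msgs) == 0 for msgs in snapshot["channels"].values())
    return all_idle and all_empty

# Hypothetical recorded states, standing in for the output of the snapshot algorithm.
done = {"processes": {"p": "idle", "q": "idle"}, "channels": {"C1": [], "C2": []}}
busy = {"processes": {"p": "idle", "q": "idle"}, "channels": {"C1": ["work"], "C2": []}}

print(detect_stable_property(lambda: done, terminated))  # True
print(detect_stable_property(lambda: busy, terminated))  # False
```

The second case shows why channel states matter: both processes are idle, but a message in transit on C1 can still reactivate q.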

Correctness

1. S* is reachable from S0.

2. St is reachable from S*.

3. y(S) → y(S’) for all S’ reachable from S.

S0 → S* → St

If y(S*) = true, then y(St) = true.

If y(S*) = false, then y(S0) = false.

References

K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems.