Distributed Computing 5. Snapshot Shmuel Zaks zaks@cs.technion.ac.il ©



The snapshot algorithm (Chandy and Lamport)


Goal: design a snapshot (=global-state-detection) algorithm that:

will record a collection of states of all system components (which forms a global system state),

will not change the underlying computation,

will not freeze the underlying computation


A process can:

record its own state,

send and receive messages,

record messages it sends and receives,

cooperate with other processes.

Processes do not share clocks or memory.

Processes cannot record their states at precisely the same instant.


Motivation

Many problems in distributed systems can be stated in terms of the problem of detecting global states:

Stable property detection problems: termination detection, deadlock detection, etc.

Checkpointing


Stable Property Detection Problem

D - distributed system

y - a predicate function defined on the set of global states of D

S, S’ - global states of D

y is stable if y(S) implies y(S’) for all S’ reachable from S
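As a small illustration of this definition, the sketch below checks stability of a predicate on an explicitly given, finite state graph. The toy graph, the state names, and the helper names `reachable` and `is_stable` are hypothetical and chosen only for this example; a real distributed system's state space is usually far too large to enumerate this way.

```python
from collections import deque

def reachable(succ, start):
    """Return the set of states reachable from start (including start)."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for nxt in succ.get(s, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_stable(succ, y):
    """y is stable if y(S) implies y(S') for every S' reachable from S."""
    return all(y(s2) for s in succ if y(s) for s2 in reachable(succ, s))

# Hypothetical toy system: "done" is absorbing, "busy" and "idle" alternate.
succ = {"busy": ["idle", "done"], "idle": ["busy"], "done": ["done"]}
terminated = lambda s: s == "done"   # stable: once done, always done
is_idle = lambda s: s == "idle"      # not stable: idle can become busy again
```

Here `is_stable(succ, terminated)` holds, while `is_stable(succ, is_idle)` does not.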


many distributed algorithms are structured as a sequence of phases

A phase: transient part, then a stable part

phase termination vs. computation termination

our view on the problem:

i. detect the termination of a phase

ii. initiate a new phase

Notice that “the kth phase has terminated” is a stable property


Model

Distributed system D is a finite, labeled, directed graph.

[Figure: two processes p and q connected by directed channels C1 and C2]

Channels have infinite buffers, are error-free, and preserve FIFO order

Message delay is bounded, but unknown


State of a Channel

[Figure: p has sent messages 1, 2, 3 to q along C1; q has received only message 1]

[1, 2, 3] – sequence X of messages that were sent

[1] – sequence Y of received messages (a prefix of X)

[2, 3] – state of C1: X \ Y
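The channel state X \ Y can be computed directly from the two sequences. The following is a minimal sketch; the function name `channel_state` is hypothetical, and it assumes the FIFO, error-free channel of the model, so that Y is always a prefix of X.

```python
def channel_state(sent, received):
    """Return X \\ Y: the messages sent but not yet received (FIFO channel)."""
    # On a FIFO, error-free channel the received sequence is a prefix of the sent one.
    if sent[:len(received)] != received:
        raise ValueError("received must be a prefix of sent")
    return sent[len(received):]

X = [1, 2, 3]   # messages p sent along C1
Y = [1]         # messages q has received from C1
in_transit = channel_state(X, Y)
```

With the sequences of the slide, `in_transit` is `[2, 3]`.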


Example: System

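Distributed system: processes p and q, connected by channels C1 and C2.

Initial global state: p = B, q = A, both channels empty (Ø).

State transitions (same for p and q): two states A and B, linked by a send transition and a receive transition.

[Figure: the system and the two-state transition diagram of a process]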


Global state transition diagram (deterministic)

[Figure: a cycle of four global states over p, q, C1, C2; the transitions are p sends, q receives, q sends, p receives]

A computation corresponds to a path in the diagram
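The global state transition diagram of this small example can be generated mechanically. The sketch below is one possible reading of the slide, under these assumptions (none of them stated explicitly in the source): state B means "holds the token", a send moves a process from B to A, a receive moves it from A to B, C1 carries messages from p to q, and C2 from q to p. All function names are hypothetical.

```python
def successors(state):
    """Yield (event, next_state) pairs for the example system."""
    p, q, c1, c2 = state
    if p == "B":                       # p holds the token: it may send on C1
        yield "p sends", ("A", q, c1 + ("token",), c2)
    if q == "B":                       # q holds the token: it may send on C2
        yield "q sends", (p, "A", c1, c2 + ("token",))
    if c1 and q == "A":                # token at the head of C1: q may receive
        yield "q receives", (p, "B", c1[1:], c2)
    if c2 and p == "A":                # token at the head of C2: p may receive
        yield "p receives", ("B", q, c1, c2[1:])

def explore(initial):
    """Return all reachable global states and the labeled transitions between them."""
    seen, order, frontier = {initial}, [initial], [initial]
    edges = []
    while frontier:
        s = frontier.pop()
        for event, nxt in successors(s):
            edges.append((s, event, nxt))
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                frontier.append(nxt)
    return order, edges

initial = ("B", "A", (), ())           # p = B, q = A, both channels empty
states, edges = explore(initial)
```

Under these assumptions every reachable global state has exactly one outgoing transition, which is what makes the diagram deterministic.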


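Example: System

Distributed system: processes p and q, connected by channels C1 and C2.

State transitions:

p: states A and B, with a send transition and a receive transition

q: states C and D, with a send transition and a receive transition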


Global state transition diagram (non-deterministic)

[Figure: global states (A, C), (B, C), (A, D), (B, D) of p and q together with the channel contents; the transitions are p sends, q sends, p receives, q receives]


We look at the following sequence of events: p sends, q sends, p receives.

[Figure: the corresponding path through the global state transition diagram]


In the snapshot algorithm:

Each process records its own state.

p and q cooperate to record the state of C.

[Figure: channel C from p to q]


Example: System

[Figure: the example system, starting from p = B, q = A with empty channels. Recording steps shown: Record C, Record q, Record p.]

Recorded state: No token


Example: System

[Figure: the example system, starting from p = B, q = A with empty channels. Recording steps shown: Record p, Record C, Record q.]

Recorded state: Two tokens


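[Figure: two timelines for recording a channel C that leaves p]

Record C, Record q, Record p: C’s state is recorded, then p sends a message on C, then p’s state is recorded. The message appears in neither recorded state.

Record p, Record C, Record q: p’s state is recorded, then p sends a message on C, then C’s state is recorded. The message appears in both recorded states.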
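The two inconsistent outcomes ("No token" and "Two tokens") can be reproduced with a tiny model of uncoordinated recording. This is only a sketch under assumptions not spelled out in the source: state B means "holds the token", A means "does not hold it", and each component is recorded either before or after the single event "p sends the token on C". The names `run` and `count_tokens` are hypothetical.

```python
def count_tokens(recorded):
    """Count the tokens that appear in a recorded global state {p, q, C}."""
    held = sum(1 for proc in ("p", "q") if recorded[proc] == "B")
    return held + len(recorded["C"])

def run(order):
    """Record p, q and C at the given moments (0 = before p sends, 1 = after).

    `order` maps each component name to 0 or 1.
    """
    before = {"p": "B", "q": "A", "C": []}          # p holds the token
    after = {"p": "A", "q": "A", "C": ["token"]}    # token in transit on C
    moments = (before, after)
    return {name: moments[when][name] for name, when in order.items()}

# C and q are recorded, then p sends, then p is recorded.
no_token = run({"C": 0, "q": 0, "p": 1})
# p is recorded, then p sends, then C and q are recorded.
two_tokens = run({"p": 0, "C": 1, "q": 1})
```

`count_tokens(no_token)` is 0 and `count_tokens(two_tokens)` is 2, although the real system always contains exactly one token. This is the inconsistency the marker is meant to prevent.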


q will record the state of C.

q starts recording C after it records its own state.

q stops recording C when it receives a marker from p.

p and q have to coordinate, using a special marker message.

But: how does q know when to record its state?

[Figure: channel C from p to q]


The snapshot algorithm

Who starts?

We assume that a single process starts.

Homework: extend the discussion and the proof to any number of starters.


(Intuition for the Algorithm)

Who will record the state of channel C? q.

When does q start recording? After it records its own state.

How does q know when to stop recording? p sends a marker right after it records its state, and before sending any other message; q stops when that marker arrives.

[Figure: channel C from p to q]


The snapshot algorithm

Channel recording (for a channel C from p to q):

Starts when q records its own state.

Ends when q receives a marker along C.

Note: for any q ≠ p0, the channel along which the marker arrived first is recorded as empty (Ø).


The snapshot algorithm

State recording

p0 starts.

p0 records its state, and then broadcasts a marker.

(Shout algorithm = PI (propagation of information) = hot potato = …)

When q receives a marker for the first time, it records its own state.


The snapshot algorithm

Marker-Sending Rule for a process p:

1. record the state of p

2. send a marker along each outgoing channel c, before sending any other message along c

Marker-Receiving Rule for a process q, on receiving a marker along channel c:

if q’s state is not recorded:

1. record q’s state (as in the Marker-Sending Rule)

2. record c’s state = Ø (empty)

else: c’s state is the sequence of messages received along c since q recorded its state
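The two rules can be sketched as a small single-threaded simulation. This is an illustrative sketch, not the lecture's own code: the class `Process`, the helper `connect`, the use of integers as process states, and the scheduling of events by explicit calls are all assumptions made for the example. Channels are FIFO queues, as in the model.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state          # application state (here: an integer amount)
        self.out = {}               # destination name -> outgoing channel
        self.inc = {}               # source name -> incoming channel
        self.recorded_state = None  # set once the process records itself
        self.recording = {}         # source -> messages seen since recording
        self.channel_state = {}     # source -> final recorded channel state

    def record(self):
        """Marker-sending rule: record own state, then send a marker on every outgoing channel."""
        self.recorded_state = self.state
        self.recording = {src: [] for src in self.inc}
        for ch in self.out.values():
            ch.append(MARKER)

    def send(self, dst, amount):
        self.state -= amount
        self.out[dst].append(amount)

    def receive(self, src):
        """Deliver the message at the head of the channel from src."""
        msg = self.inc[src].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record()       # first marker: record own state
            # Channel state: messages received since recording (empty on the first marker).
            self.channel_state[src] = self.recording.pop(src)
        else:
            self.state += msg
            if src in self.recording:   # this channel is still being recorded
                self.recording[src].append(msg)

def connect(a, b):
    """Create a FIFO channel from process a to process b."""
    ch = deque()
    a.out[b.name] = ch
    b.inc[a.name] = ch

p, q = Process("p", 5), Process("q", 5)
connect(p, q)
connect(q, p)
q.send("p", 1)      # q holds 4; the amount 1 is in transit to p
p.record()          # p starts the snapshot: records 5, sends a marker to q
p.receive("q")      # p receives 1 while recording the channel from q
q.receive("p")      # q's first marker: records 4, channel from p is empty
p.receive("q")      # p receives the marker: channel from q is [1]
total = (p.recorded_state + q.recorded_state
         + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
```

In this run the recorded states are 5 for p and 4 for q, the channel from q to p is recorded as `[1]`, and the channel from p to q as empty, so the recorded total equals the real total of 10.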


Termination

Assumption: no marker remains forever in an input channel

Claim: If the graph is strongly connected and at least one process records its state, then all processes will record their state in finite time

Proof: by induction


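The Recorded Global State

Example: System

Distributed system: processes p and q, connected by channels C1 and C2.

State transitions:

p: states A and B, with a send transition and a receive transition

q: states C and D, with a send transition and a receive transition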


[Figure: the global state transition diagram of the example, showing the events p sends, q sends, p receives and the recorded states of p and q]


What did we get?


Event e in process p is an atomic action: it can change the state of p, and the state of at most one channel c incident on p (by sending or receiving a message M along c).

e is defined by <p, s, s’, M, c>.

e = <p, s, s’, M, c> may occur in global state S if:

1. the state of p in S is s.

2. a. if c is directed towards p: c’s state has M at its head, and M is deleted after applying e.

b. if c is directed from p: c’s state has M at its tail after applying e.

3. the state of p after applying e is s’.


Process State and Global State

A process: a set of states, an initial state, and a set of events.

A global state S: a collection of process states and channel states.

Initially, each process is in its initial state and all channels are empty.

next(S, e) is the global state after event e is applied to global state S.
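The definitions of an event, of "may occur", and of next(S, e) can be written down almost literally. The sketch below is one possible encoding, with several assumptions that are not in the source: a global state is a dictionary of process states and channel contents, channels are tuples with the head at index 0, and each event carries an extra `kind` tag instead of deriving send or receive from the direction of c. The names `Event`, `may_occur`, and `next_state` are hypothetical.

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    p: str                  # process in which the event occurs
    s: str                  # state of p before the event
    s_next: str             # state of p after the event (s' in the text)
    M: Optional[str]        # message sent or received, or None
    c: Optional[str]        # channel involved, or None
    kind: str               # "send", "receive" or "internal" (hypothetical tag)

def may_occur(S, e):
    """S is a dict {"procs": {name: state}, "chans": {name: tuple_of_messages}}."""
    if S["procs"][e.p] != e.s:
        return False
    if e.kind == "receive":                    # M must be at the head of c
        chan = S["chans"][e.c]
        return bool(chan) and chan[0] == e.M
    return True

def next_state(S, e):
    """Return next(S, e); raise if e may not occur in S."""
    if not may_occur(S, e):
        raise ValueError("event may not occur in this global state")
    procs = dict(S["procs"])
    chans = dict(S["chans"])
    procs[e.p] = e.s_next
    if e.kind == "receive":
        chans[e.c] = chans[e.c][1:]            # delete M from the head
    elif e.kind == "send":
        chans[e.c] = chans[e.c] + (e.M,)       # append M at the tail
    return {"procs": procs, "chans": chans}

S0 = {"procs": {"p": "B", "q": "A"}, "chans": {"C1": (), "C2": ()}}
send = Event("p", "B", "A", "token", "C1", "send")
recv = Event("q", "A", "B", "token", "C1", "receive")
S1 = next_state(S0, send)
S2 = next_state(S1, recv)
```

The receive event may not occur in `S0`, because C1 is empty there; it may occur in `S1`, after the send has placed the message on C1.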


Process State and Global State

seq = (ei : i = 0…n) is a computation of the system iff, for all 0 ≤ i ≤ n:

ei may occur in Si, and

Si+1 = next(Si, ei)

(S0 is the initial global state)


The Recorded Global State

seq = (ei: i ≥ 0) – a distributed computation

Si – the state of the system right before ei occurs

S0 – the initial state of the system

St – the state of the system at the termination of the algorithm

S* – the recorded global state


Definition

Event ej is called pre-recording if ej is in a process p and p records its state after ej in seq.

Event ej is called post-recording if ej is in a process p and p records its state before ej in seq.

Assume that ej-1 is a post-recording event that occurs immediately before a pre-recording event ej in seq.


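Lemma: Let ej-1 be a post-recording event and ej a pre-recording event. If applying ej-1 and then ej to S1 leads to S2, then

a. ej can be applied in S1, say with result S3,

b. ej-1 can be applied in S3, say with result S4, and

c. S2 = S4.

Proof: ej-1 occurs in p and ej in q, and q ≠ p (since ej-1 is post-recording and ej is pre-recording).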


The only scenario that might prevent interchanging the two events is that a message M is sent at ej-1 and received at ej .

But this is not possible: if M is sent at ej-1, then M is sent after p recorded its state, so a marker was sent to q before M. Hence, when M is received in ej, q has already recorded its state, so ej is post-recording, a contradiction.


Hence, event ej can occur in global state Sj-1. The state of process p is not altered by ej, hence ej-1 can occur after ej.


We have to show that the states of all processes and channels are the same in S2 and S4. This clearly holds for processes and channels that do not take part in ej-1 and ej.


Process states: the states of p and q in S2 and in S4 are the same.

Channels: whether ej-1 or ej sends, receives, or does neither along a channel, the same is done in both scenarios, so the states of the channels in S2 and S4 are the same. (End of proof.)


(The Recorded Global State)

Theorem: Given an execution seq and an output S* of the snapshot algorithm, there exists a computation seq’ = (e’j : j ≥ 0), where:

1. For all j < i or j ≥ t: e’j = ej.

2. The subsequence (e’j | i ≤ j < t) is a permutation of the subsequence (ej | i ≤ j < t).

3. For all j < i or j ≥ t: S’j = Sj.

4. There exists k, i ≤ k ≤ t, such that S* = S’k.

(Here Si is the global state in which the snapshot algorithm starts, and St the global state in which it terminates.)


Proof: Using the lemma, swap events until all post-recording events appear after all pre-recording events. The resulting computation is seq’.

All that is left to show: S* is the global state after all pre-recording events and before all post-recording events.

1. Process states

2. Channel states
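The swapping argument of the proof can be pictured as a restricted bubble sort that only ever exchanges an adjacent (post-recording, pre-recording) pair, which is the one case the lemma covers. The sketch below is illustrative only: the function name `move_pre_before_post` is hypothetical, events are plain labeled tuples, and the labels follow the pre and post tags that appear in the example figures of these slides.

```python
def move_pre_before_post(seq, is_pre):
    """Reorder seq by adjacent swaps so every pre-recording event precedes every post-recording one.

    Only an adjacent (post, pre) pair is ever swapped, which is the case the lemma covers.
    Returns the new sequence and the number of swaps performed.
    """
    events = list(seq)
    swaps = 0
    changed = True
    while changed:
        changed = False
        for j in range(1, len(events)):
            if not is_pre(events[j - 1]) and is_pre(events[j]):
                events[j - 1], events[j] = events[j], events[j - 1]
                swaps += 1
                changed = True
    return events, swaps

seq = [("p sends", "post"), ("q sends", "pre"), ("p receives", "post")]
seq_prime, swaps = move_pre_before_post(seq, lambda e: e[1] == "pre")
```

For this input a single swap suffices, and `seq_prime` is q sends, p sends, p receives. The relative order of the pre-recording events among themselves, and of the post-recording events among themselves, is never changed.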


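Claim: The state of a channel in S* is

(sequence of messages corresponding to pre-recording sends) \ (sequence of messages corresponding to pre-recording receives)

Proof: The state of channel c from process p to process q recorded in S* is the sequence of messages received on c by q after q records its state and before q receives a marker on c. Since p sends the marker right after recording its state, the messages that q receives on c before the marker are exactly those corresponding to pre-recording sends on c. Removing the ones that q received before recording its own state (the pre-recording receives) leaves the recorded sequence.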


[Figure: the example execution p sends, q sends, p receives in the global state transition diagram, with the events labeled post, pre, post]


(Another execution)

[Figure: the execution q sends, p sends, p receives in the global state transition diagram, with the events labeled pre, post, post]


What did we get?

A configuration that could have happened


Stable Detection

D - distributed system

y - a predicate function defined on the set of global states of D

S, S’ - global states of D

y is a stable property of D if y(S) implies y(S’) for all S’ reachable from S


Algorithm

Input: a stable property y

Output: a boolean value b with the property: y(S0) → b and b → y(St)

Algorithm:

begin

record a global state S*

b := y(S*)

end
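The detection algorithm is a thin wrapper around the snapshot. The sketch below treats the snapshot as a callable supplied by the caller; the names `detect` and `terminated`, and the dictionary layout of the recorded state, are assumptions made for this example. The termination predicate shown is one common stable property and is not taken from the slides.

```python
def detect(y, record_global_state):
    """Return b = y(S*), where S* is the global state returned by the snapshot."""
    s_star = record_global_state()
    return y(s_star)

def terminated(state):
    """Hypothetical stable predicate: every process is passive and every channel is empty."""
    return (all(s == "passive" for s in state["procs"].values())
            and all(len(msgs) == 0 for msgs in state["chans"].values()))

# A hand-written recorded state standing in for the output of the snapshot algorithm.
snapshot = {"procs": {"p": "passive", "q": "passive"}, "chans": {"C1": [], "C2": []}}
b = detect(terminated, lambda: snapshot)
```

If b is true, the property also holds in St because it is stable; if b is false, the property did not hold in S0.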


Correctness

1. S* is reachable from S0.

2. St is reachable from S*.

3. y(S) → y(S’) for all S’ reachable from S.

S0 → S* → St

y(S*) = true implies y(St) = true

y(S*) = false implies y(S0) = false


References

K. M. Chandy and L. Lamport, Distributed Snapshots: Determining Global States of Distributed Systems
