SNAPSHOT ALGORITHM

A paper by K. Mani Chandy and Leslie Lamport

Presenting: Einat Zuker


WHAT IS A SNAPSHOT - INTUITION

Given a system of processors and communication channels between them, we want each processor to have a “picture” of the global system state.

Each processor, however, can only take a “small picture” of the global system (a picture of itself only).

But, if we put together all the “small pictures”, we would have a complete description of the global state of the system.

The “big picture” we are putting together must be meaningful and informative to be called a snapshot of the system.


SNAPSHOT - WHY DO WE WANT IT

Stability detection

A stable system: if the system in a given state holds a certain property, and all the possible next states of the system hold that property too, then we can call the system stable.

Examples of stability:
  Deadlock
  No tokens in a token ring
  Computation has terminated
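The stability definition above can be checked mechanically on a small system. The sketch below is only an illustration: the function `is_stable`, the two-state system and its `successors` table are hypothetical and do not come from the slides.

```python
def is_stable(prop, states, successors):
    """True iff whenever prop holds in a state it also holds in every next state."""
    return all(prop(t) for s in states if prop(s) for t in successors[s])


# Hypothetical two-state system: a computation that is running or has terminated.
states = ["running", "terminated"]
successors = {"running": ["running", "terminated"], "terminated": ["terminated"]}

terminated_is_stable = is_stable(lambda s: s == "terminated", states, successors)
running_is_stable = is_stable(lambda s: s == "running", states, successors)
```

“Computation has terminated” passes the check because the terminated state has no way back, while “still running” fails it.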


THE DISTRIBUTED SYSTEM MODEL

Representation: a directed graph.
  Vertices represent the processors.
  Edges represent the communication channels.

Assumptions:
  No synchronization (no clocks).
  Channels have infinite buffers.
  Channels are error-free.
  Channels deliver messages in the order sent (FIFO).
  A message in a channel can be delayed for an arbitrary but finite time (all messages will eventually arrive at their destination).

THE DISTRIBUTED SYSTEM MODEL - DEFINITIONS

State of a channel - the sequence of messages sent along the channel, excluding the messages received along the channel.

State of a processor – a single element of some finite set.

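Example, for a channel c from p to q:
  No messages sent: the state of c is empty.
  Processor p sent M1: the state of c is M1.
  Processor p sent M2: the state of c is M2 M1.

[Figure: three copies of p connected to q by the channel c, showing c empty, then holding M1, then holding M2 M1.]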
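The channel-state definition can be sketched in Python. This is a minimal illustration under one assumption: the `Channel` class is hypothetical (it is not part of the slides) and keeps the messages that were sent but not yet received in a `collections.deque`.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO channel: its state is the messages sent but not yet received."""

    def __init__(self):
        self._in_transit = deque()

    def send(self, message):
        # A sent message joins the tail of the channel.
        self._in_transit.append(message)

    def receive(self):
        # A received message leaves from the head (FIFO order).
        return self._in_transit.popleft()

    def state(self):
        # Listed head first, i.e. in the order the messages were sent.
        return list(self._in_transit)


c = Channel()
states = [c.state()]        # no messages sent
c.send("M1")
states.append(c.state())    # p sent M1
c.send("M2")
states.append(c.state())    # p sent M2
first = c.receive()         # q receives the oldest message
states.append(c.state())
```

The slides write the channel with the head on the right (M2 M1); the lists here are head first, so the same state appears as M1, M2.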


THE DISTRIBUTED SYSTEM MODEL – DEFINITIONS CONT’D

Event: an event e is the tuple <p, s, s’, M, c> where:
  p: the processor in which the event occurs
  s: the state of p before the event
  s’: the state of p after the event
  M: the message sent (or received) by p through the channel c (can be null)
  c: the channel whose state was changed by the event (can be null)

Less formally: an event is an atomic action of a processor p that may change the state of p and the state of at most one channel connected to p.

EXAMPLE – THE SINGLE TOKEN CONSERVATION SYSTEM

The system properties:
  Two processors (p, q), two communication channels (c, c’), one token.
  Processor states: s0 - no token; s1 - has token.
  Initial state for p: s1. Initial state for q: s0. Initial state for the channels: empty.

Events in the system can be:
  e1 = <p, s1, s0, pass token, c>
  e2 = <q, s0, s1, receive token, c>
  etc.

[Figure: p and q connected by the channels c and c’. Event e1 takes the system from (p in s1, q in s0) to (p in s0, q in s0, token in c); event e2 then takes it to (p in s0, q in s1).]

THE DISTRIBUTED SYSTEM MODEL – DEFINITIONS CONT’D

Global state: the set of the processor states and the channel states.
  Initial global state: a global state where each processor is in its initial state and each channel is empty.

Next(S, e): a function whose value is the global state immediately after the occurrence of the event e in the global state S.
  Next(S, e) is defined only if the event e can occur in the global state S.
  For a global state S and an event e = <p, s, s’, M, c>, if Next(S, e) = S’ then:
    the state of p in S’ is s’;
    the state of the channel c in S’ is its state in S with the message M added to its tail (M was sent) or removed from its head (M was received).
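The definitions of an event and of the next-state function can be sketched as follows. This is an illustration, not the presenter's code: the `Event` tuple, the extra `kind` field (which says whether M is sent or received, something the slides leave implicit) and the name `next_state` are all assumptions.

```python
from collections import namedtuple

# Hypothetical encoding of an event <p, s, s', M, c>, plus an added "kind" field.
Event = namedtuple("Event", "p s s_next M c kind")


def next_state(S, e):
    """Return the global state after event e occurs in S, or None if e cannot occur."""
    procs, chans = S  # procs: {processor: state}, chans: {channel: tuple, head first}
    if procs[e.p] != e.s:
        return None  # p is not in the state the event requires
    new_procs = {**procs, e.p: e.s_next}
    new_chans = dict(chans)
    if e.c is not None:
        if e.kind == "send":
            new_chans[e.c] = chans[e.c] + (e.M,)   # M added to the tail
        elif chans[e.c][:1] == (e.M,):
            new_chans[e.c] = chans[e.c][1:]        # M removed from the head
        else:
            return None  # M is not at the head of c, so it cannot be received
    return new_procs, new_chans


# The single token conservation system: p holds the token, both channels are empty.
S0 = ({"p": "s1", "q": "s0"}, {"c": (), "c'": ()})
e_pass = Event("p", "s1", "s0", "token", "c", "send")
e_recv = Event("q", "s0", "s1", "token", "c", "recv")
S1 = next_state(S0, e_pass)
S2 = next_state(S1, e_recv)
```

Returning `None` models the statement that Next(S, e) is defined only when e can occur in S; for instance `next_state(S0, e_recv)` is `None` because c is still empty.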

EXAMPLE – THE SINGLE TOKEN CONSERVATION SYSTEM

The possible global states of the single token conservation system:
  S0: p in s1, q in s0, c and c’ empty (token in p)
  S1: p in s0, q in s0, token in c
  S2: p in s0, q in s1, c and c’ empty (token in q)
  S3: p in s0, q in s0, token in c’

The events that move the system between them:
  e0 = <p, s1, s0, pass token, c>       next(S0, e0) = S1
  e1 = <q, s0, s1, receive token, c>    next(S1, e1) = S2
  e2 = <q, s1, s0, pass token, c’>      next(S2, e2) = S3
  e3 = <p, s0, s1, receive token, c’>   next(S3, e3) = S0

THE DISTRIBUTED SYSTEM MODEL – DEFINITIONS CONT’D

Computation of the system: a sequence of events in the system.

More formally: given a sequence of events seq = (e0, e1, …, ei, …, en), seq is a computation of the system iff, for every i, the event ei can occur in state Si and next(Si, ei) = Si+1 (S0 is the initial global state).

In the previous example, the sequence (e0, e1, e2, e3) is a computation of the system, but the sequence (e0, e2) is not: e2 requires q to be in state s1, and after e0 q is still in state s0.
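Whether a sequence of events is a computation can be checked by replaying it from the initial global state. The sketch below assumes a hypothetical tuple encoding of events, (p, s, s_next, M, c, kind), where `kind` is an added field marking a send or a receive; the function names are also assumptions.

```python
def next_state(S, e):
    """Global state after event e = (p, s, s_next, M, c, kind) in S, or None."""
    procs, chans = S
    p, s, s_next, M, c, kind = e
    if procs[p] != s:
        return None  # p is not in the state the event requires
    procs = {**procs, p: s_next}
    if kind == "send":
        chans = {**chans, c: chans[c] + (M,)}   # M joins the tail of c
    elif chans[c][:1] == (M,):
        chans = {**chans, c: chans[c][1:]}      # M leaves the head of c
    else:
        return None  # M is not at the head of c
    return procs, chans


def is_computation(S0, seq):
    """True iff every event of seq can occur in the state reached so far."""
    S = S0
    for e in seq:
        S = next_state(S, e)
        if S is None:
            return False
    return True


S0 = ({"p": "s1", "q": "s0"}, {"c": (), "c'": ()})
e0 = ("p", "s1", "s0", "token", "c", "send")
e1 = ("q", "s0", "s1", "token", "c", "recv")
e2 = ("q", "s1", "s0", "token", "c'", "send")
e3 = ("p", "s0", "s1", "token", "c'", "recv")

valid = is_computation(S0, [e0, e1, e2, e3])
invalid = is_computation(S0, [e0, e2])
```

Here `valid` is true and `invalid` is false, matching the two sequences discussed in the slide.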


THE ALGORITHM REQUIREMENTS

The snapshot algorithm must run concurrently with the system computation.

The snapshot algorithm must not alter the computation in any way.

Any messages sent for recording purpose must not interfere with the computation of the system.


SNAPSHOT ALGORITHM - FIRST IDEA

Each processor will add its state to the recorded snapshot at some point of the computation (let’s assume we can also see the channels’ states and record them in the same fashion).

What can happen?

FIRST IDEA - THE PROBLEM

Let's take a look at the single token conservation system:

  The system is in global state S0 - “token in p”. p decides to record itself (in state s1).
  The event e0 = <p, s1, s0, pass token, c> occurs, and the system moves to global state S1 - “token in c”.
  c, c’ and q decide to record themselves (the token is in c, c’ is empty, q is in s0).

The snapshot received, S*: p in s1, q in s0, token in c, c’ empty.

There is no such global state reachable from S0!

FIRST IDEA - THE PROBLEM CONT’D

What happened?
  p was recorded before it sent a message.
  c was recorded after p sent a message.
  The snapshot had too many messages in it.

Let us denote:
  n - the number of messages in the channel right before its source was recorded
  n’ - the number of messages in the channel right before the channel was recorded

In our case: n = 0, n’ = 1.

Can we conclude that if n < n’ the snapshot is inconsistent?

FIRST IDEA - THE PROBLEM CONT’D

Let's take a look again at the single token conservation system:

  The system is in global state S0 - “token in p”. c decides to record itself (as empty).
  The event e0 = <p, s1, s0, pass token, c> occurs, and the system moves to global state S1 - “token in c”.
  p, c’ and q decide to record themselves (p is in s0, c’ is empty, q is in s0).

The snapshot received, S*: p in s0, q in s0, c and c’ empty.

There is no such global state reachable from S0!

FIRST IDEA - THE PROBLEM CONT’D

What happened?
  c was recorded before p sent a message.
  p was recorded after it sent a message.
  We lost messages in the snapshot.

Remember the notation:
  n - the number of messages in the channel right before its source was recorded
  n’ - the number of messages in the channel right before the channel was recorded

In our case: n = 1, n’ = 0.

Can we conclude that if n > n’ the snapshot is inconsistent?

FIRST IDEA - CONCLUSIONS

The problem in both cases was that we had no means of monitoring the messages that went through the channel while the recording was done.

We need the algorithm to ensure that the snapshot we take reflects the messages passing through the channel.

THE SNAPSHOT ALGORITHM CONDITIONS

Notation: for two processors p, q and a channel c from p to q:
  n - the number of messages sent through c before p was recorded
  n’ - the number of messages sent through c before c was recorded
  m - the number of messages received from c before q was recorded
  m’ - the number of messages received from c before c was recorded

The following conditions are required from the snapshot:
  n = n’
  m = m’
  n’ ≥ m’
  n ≥ m
  If n’ = m’, the recorded state of c must be the empty sequence.
  If n’ > m’, the recorded state of c must contain the messages [tail] (n’), …, (m’+1) [head], that is, the (m’+1)-th through the n’-th messages sent by p along c.

[Figure: the messages M1 to M6 sent by p along c, with the m’-th and the n’-th messages marked.]
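The last two conditions amount to taking a slice of the messages p sent along c. The sketch below is a hedged illustration: the function name `recorded_channel_state` and the particular values of n’ and m’ are assumptions chosen for the example, not values taken from the slide's figure.

```python
def recorded_channel_state(sent, n_prime, m_prime):
    """Messages (m'+1)..n' of `sent`, the list of messages p sent along c in order."""
    if n_prime < m_prime:
        raise ValueError("inconsistent snapshot: more messages received than sent")
    # Python slices are 0-based and end-exclusive, so this is messages m'+1 .. n'.
    return sent[m_prime:n_prime]


sent = ["M1", "M2", "M3", "M4", "M5", "M6"]

# Assumed values: 5 messages sent and 2 received before c was recorded.
in_channel = recorded_channel_state(sent, n_prime=5, m_prime=2)

# n' = m': everything sent was already received, so c is recorded as empty.
empty = recorded_channel_state(sent, n_prime=3, m_prime=3)
```

With these assumed counts the recorded state of c is M3, M4, M5 (head first), and the second call gives the empty sequence.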


THE SNAPSHOT ALGORITHM CONDITIONS CONT’D

[Figure: the messages M6 M5 M4 M3 M2 M1 in c, with the points “p recorded” and “q recorded” marked; the recording of c is the span of messages between the two points.]

In a less formal way:

The recorded state of c must be the sequence of messages sent along c before the state of p is recorded, excluding the sequence of messages received along c before the state of q is recorded.

THE ALGORITHM OUTLINE

p will send a special message called a marker after the n-th message it sent (and before sending any other message).

q will record the state of channel c. The recorded state will be the messages received by q after q recorded its state and before q received the marker.

q will record its state spontaneously, or immediately after the marker is received, that is, before receiving (or sending) any other messages.

THE ALGORITHM CREATORS

K. Mani Chandy

Leslie Lamport

E. W. Dijkstra


THE ALGORITHM

Marker-Sending Rule for a processor p:
  For each channel c directed away from p, p sends one marker along c right after p records its state and before p sends further messages along c.

Marker-Receiving Rule for a processor q:
  On receiving a marker along a channel c:
    if q has not recorded its state then
      q records its state
      q records the state of c as the empty sequence
    else
      q records the state of c as the sequence of messages received along c after q’s state was recorded and before q received the marker along c.
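The two rules can be sketched as a small single-threaded simulation. This is an illustration under stated assumptions, not a definitive implementation: the `Processor` class, the `MARKER` constant and the use of `deque` objects as channels are all hypothetical, and the system's own state changes (passing the token) are done by hand in the driver code.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical marker value, distinct from ordinary messages


class Processor:
    """Hypothetical processor implementing the two marker rules."""

    def __init__(self, state, incoming, outgoing):
        self.state = state
        self.incoming = incoming       # {channel name: deque read by this processor}
        self.outgoing = outgoing       # {channel name: deque written by this processor}
        self.has_recorded = False
        self.recorded_state = None
        self.recorded_channels = {}    # channel name -> recorded message list
        self._open = {}                # incoming channels still being recorded

    def record(self):
        # Record own state, then apply the marker-sending rule on every outgoing channel.
        self.has_recorded = True
        self.recorded_state = self.state
        self._open = {name: [] for name in self.incoming}
        for chan in self.outgoing.values():
            chan.append(MARKER)

    def receive(self, name):
        msg = self.incoming[name].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if not self.has_recorded:
                self.record()
            # Empty if recorded just now, else the messages seen since recording.
            self.recorded_channels[name] = self._open.pop(name)
        elif name in self._open:
            self._open[name].append(msg)   # arrived after recording, before the marker
        return msg


c, c_rev = deque(), deque()            # c: p to q, c_rev: q to p (c' in the slides)
p = Processor("s1", incoming={"c'": c_rev}, outgoing={"c": c})
q = Processor("s0", incoming={"c": c}, outgoing={"c'": c_rev})

c.append("token")                      # p sends the token...
p.state = "s0"
p.record()                             # ...then records s0 and sends a marker on c
q.receive("c")                         # q receives the token
q.state = "s1"
q.receive("c")                         # q receives the marker: records s1, c = empty
p.receive("c'")                        # p receives the marker: records c' = empty

snapshot = {
    "p": p.recorded_state, "q": q.recorded_state,
    "c": q.recorded_channels["c"], "c'": p.recorded_channels["c'"],
}
```

In this run the snapshot contains exactly one token (p in s0, q in s1, both channels empty), unlike the two failed attempts of the first idea.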


THE ALGORITHM - RUNNING EXAMPLE

  p sends the token, then records itself.
  p sends a marker.
  q receives the token, and then receives the marker.
  q records itself and the incoming channel c.
  q sends a marker.
  p receives the marker. It has already recorded itself, so it only needs to record the state of its incoming channel c’.

The snapshot:
  c’: empty    q: s1 – has token    c: empty    p: s0 – no token

[Figure: the global states of the system during the run, from (p in s1, q in s0) through (p in s0, q in s0, token in c) to (p in s0, q in s1).]

SOME NOTES ABOUT THE ALGORITHM

The algorithm can be initiated by one or more processors. Each initiating processor records its state spontaneously (without receiving markers from other processors).

The collection of the snapshot “pieces” from each processor is a topic for a separate discussion, but if we recall the synchronization algorithm for asynchronous systems (with some variations), we can come up with ways to form the “big picture” for each processor.

TERMINATION OF THE ALGORITHM

Do we have a snapshot of the system in finite time? That is, do we have a recording of each processor and channel in finite time?

Lemma 1: if there is a path in the system from p to q, and p recorded itself, then q will record itself in finite time.

Proof: if p is directly connected to q, then p will send a marker to q, and q will record itself once the marker arrives, if it has not done so already (remember that all messages sent through a channel reach their destination in finite time).

So, if p records its state and there is a path from p to q, then q will record its state in finite time because, by induction, every processor along the path will record its state in finite time and will send a marker on all of its outgoing channels.
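Lemma 1 says that recording spreads along directed paths, so the set of processors that eventually record is the set reachable from the initiators. The sketch below computes that set with a breadth-first search; it ignores timing entirely, and the function name and the three-processor graphs (with an extra processor r) are hypothetical.

```python
from collections import deque


def processors_that_record(channels, initiators):
    """Processors reached by markers: everything reachable from the initiators.

    channels: iterable of (source, destination) pairs, one per directed channel.
    """
    outgoing = {}
    for src, dst in channels:
        outgoing.setdefault(src, []).append(dst)
    recorded = set(initiators)
    frontier = deque(initiators)
    while frontier:
        p = frontier.popleft()
        for q in outgoing.get(p, []):
            if q not in recorded:      # q records when its first marker arrives
                recorded.add(q)
                frontier.append(q)
    return recorded


ring = [("p", "q"), ("q", "r"), ("r", "p")]
one_way = [("p", "q"), ("q", "r")]
all_recorded = processors_that_record(ring, ["q"])
partial = processors_that_record(one_way, ["q"])
```

In the ring every processor records, while in the one-way chain p is never reached from q and would have to record spontaneously.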

TERMINATION OF THE ALGORITHM CONT’D

Lemma 2: the algorithm terminates in finite time, with a recording of each processor and channel.

Proof: all the processors will eventually record their state (spontaneously, or because some other processor recorded itself, as we know from Lemma 1).

This means every processor will send a marker through all of its outgoing channels, so a marker will be sent through every channel.

Once the marker reaches its destination, the channel is recorded. This is true for all channels, since all of them had a marker sent through them.

Thus, all the channels are recorded in finite time too.

EXAMPLE – NON DETERMINISTIC SYSTEM

The system properties:
  Two processors: p, q.
  Two communication channels: c, c’.
  p has 2 states {A, B}; q has 2 states {C, D}.
  p can send the message M while in state A. Sending the message causes it to move to state B.
  p can receive the message N while in state B. Receiving the message causes it to move back to state A.
  q works symmetrically to p.

A possible computation of the system:
  S0: the initial global state (p in A, q in C, both channels empty)
  e0 = <p, A, B, M, c>     leads to S1 (p in B, q in C, M in c)
  e1 = <q, C, D, N, c’>    leads to S2 (p in B, q in D, M in c, N in c’)
  e2 = <p, B, A, N, c’>    leads to S3 (p in A, q in D, M in c)

Note that the computation in this case is not deterministic. For example, the event that occurred in S0 could also have been e0 = <q, C, D, N, c’>.

THE ALGORITHM - RUNNING EXAMPLE 2

  The system is in global state S0.
  p records itself and sends the marker.
  The system goes to global state S1.
  The system goes to global state S2.
  The system goes to global state S3.
  q receives the marker. q records itself and the incoming channel c. q sends the marker.
  p receives the marker. It has already recorded itself, so it needs to record the state of c’.

The snapshot:
  c’: N    q: D    c: empty    p: A

What is strange in this snapshot?

THE NON DETERMINISTIC EXAMPLE - ANALYSIS

The snapshot the algorithm takes is not necessarily a global state the system was in.

So, what does the snapshot represent then?

The answer is that the snapshot is a reachable global state of the system.

In addition, if the events were to occur in a different order, the snapshot would be one of the global states reached.

This makes the snapshot consistent with its system.

THEOREM

Given:
  seq = (ei, i ≥ 0), a computation of some system
  Si, the global state of the system before event ei
  Sj, the global state of the system when the algorithm was initiated
  Sk, the global state of the system when the algorithm terminated (0 ≤ j ≤ k)
  S*, the global state the algorithm recorded (the snapshot)

Then there is a computation of the system seq’ such that:
  for all i, i < j or i ≥ k: ei’ = ei
  for all i, i ≤ j or i ≥ k: Si’ = Si
  the subsequence (ei’, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
  there exists some t, j ≤ t ≤ k, such that S* = St’

[Figure: the computation seq drawn as a timeline e0, e1, …, ej-1, ej, …, ek-1, ek, … with the states Sj and Sk marked.]

PROOF - DEFINITIONS

Pre-recording event: an event that occurred in a processor p before p recorded its state.

Post-recording event: an event that occurred in a processor p after p recorded its state.

Note: for an event ei in seq:
  if i < j then ei is a pre-recording event
  if i ≥ k then ei is a post-recording event

Note: for an event ei in seq such that j < i < k:
  the event ei-1 can be a post-recording event and the event ei can be a pre-recording event if they occurred in different processors;
  if they occurred in the same processor and ei-1 is a post-recording event, then both must be post-recording events.

PROOF - DETAILS

Let's denote ei-1 = <p, a, b, M, c> and ei = <q, a’, b’, M’, c’>, and let's assume:
  ei-1 is a post-recording event
  ei is a pre-recording event

Can M = M’ and c’ = c? That is, can q be receiving the message p sent?

The answer is no.

ei-1 is a post-recording event, which means that a marker was sent along c before M was sent.

Because the channel is FIFO, the same marker was received by q before M reached it. When q received the marker it recorded itself, so if ei = <q, a’, b’, M, c> it can only be a post-recording event, in contradiction to the fact that ei is a pre-recording event.

PROOF – DETAILS CONT’D

We saw that ei-1 and ei are independent of each other.

This means we can swap their order in the computation seq.

The new computation ei-2, ei, ei-1 will end with the same global state as the original computation ei-2, ei-1, ei.

[Figure: the sequence ej, ej+1, …, ei-1, ei, …, ek-1, ek before and after swapping ei-1 and ei. The states Si-1, Si+1 and Sk are the same in both; only the intermediate state Si becomes S’i.]

PROOF – DETAILS CONT’D

Let seq’ be a computation in which every post-recording event that occurs right before a pre-recording event is swapped with it.

We repeat the swapping until seq’ has all pre-recording events before all post-recording events.

Note:
  seq’ is a computation of the system
  for all i, i < j or i ≥ k: ei’ = ei
  for all i, i ≤ j or i ≥ k: Si’ = Si

[Figure: the sequence e0, …, ej, ej+1, …, ei-1, ei, …, ek-1, ek and, after the swaps, the sequence e0, …, e’j, e’j+1, …, e’i-1, e’i, …, e’k-1, ek.]

PROOF – DETAILS CONT’D

Let's look at the global system state in seq’ after the last pre-recording event and before the first post-recording event. We will denote this state St (j ≤ t ≤ k).

For some processor p, let us assume the last state p was in before recording is a (that means p recorded a as its state).

In the global state St we will see that p is in state a.

In the snapshot S* we also see that the state of p is a (because p recorded a).

We conclude that the state of each processor in St is the same as in S*.

PROOF – DETAILS CONT’D

For some channel c from p to q:

In St, the messages in c are the ones p sent before sending a marker along c (before p recorded itself), without the messages q received before recording itself.

In the snapshot S*, c contains all the messages q received along c after it recorded itself and before it received a marker along c.

We conclude that the messages in c in the global state St and in the snapshot S* are the same.

PROOF – CONCLUSIONS

It is now clear that we have proven our theorem: there is a computation of the system seq’ such that:
  for all i, i < j or i ≥ k: ei’ = ei
  for all i, i ≤ j or i ≥ k: Si’ = Si
  the subsequence (ei’, j ≤ i < k) is a permutation of the subsequence (ei, j ≤ i < k)
  there exists some t, j ≤ t ≤ k, such that S* = St’

EXAMPLE – PERMUTE A COMPUTATION

Recall the non deterministic example. The computation we saw was:
  S0:  e0 = <p, A, B, M, c>     Next(S0, e0) = S1    post-recording
  S1:  e1 = <q, C, D, N, c’>    Next(S1, e1) = S2    pre-recording
  S2:  e2 = <p, B, A, N, c’>    Next(S2, e2) = S3    post-recording

and the recorded global state was:
  S*:  c’: N    q: D    c: empty    p: A

Now, let's swap the events so that all pre-recording events precede all post-recording events:
  S’0:       e’0 = <q, C, D, N, c’>    Next(S’0, e’0) = S’1    pre-recording
  S’1 = S*:  e’1 = <p, A, B, M, c>     Next(S’1, e’1) = S’2    post-recording
  S’2:       e’2 = <p, B, A, N, c’>    Next(S’2, e’2) = S’3    post-recording

The global state S’1 of this computation is exactly the snapshot of the original computation.
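The permutation argument can be replayed on this example. The sketch below is an illustration under assumptions: events use a hypothetical tuple encoding (p, s, s_next, M, c, kind) with an added send/receive field, the pre/post labels are supplied by hand, and the helper names are not from the slides.

```python
def apply(S, e):
    """Apply event e = (p, s, s_next, M, c, kind) to S; assumes e can occur in S."""
    procs, chans = S
    p, s, s_next, M, c, kind = e
    assert procs[p] == s
    procs = {**procs, p: s_next}
    if kind == "send":
        chans = {**chans, c: chans[c] + (M,)}   # M joins the tail of c
    else:
        assert chans[c][:1] == (M,)
        chans = {**chans, c: chans[c][1:]}      # M leaves the head of c
    return procs, chans


def pre_before_post(seq):
    """Swap adjacent (post, pre) pairs until all pre-recording events come first."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, len(seq)):
            if seq[i - 1][1] == "post" and seq[i][1] == "pre":
                seq[i - 1], seq[i] = seq[i], seq[i - 1]
                swapped = True
    return seq


S0 = ({"p": "A", "q": "C"}, {"c": (), "c'": ()})
seq = [
    (("p", "A", "B", "M", "c", "send"), "post"),   # e0
    (("q", "C", "D", "N", "c'", "send"), "pre"),   # e1
    (("p", "B", "A", "N", "c'", "recv"), "post"),  # e2
]
snapshot = ({"p": "A", "q": "D"}, {"c": (), "c'": ("N",)})

permuted = pre_before_post(seq)
S = S0
for event, label in permuted:
    if label == "post":
        break
    S = apply(S, event)      # S ends as the state after the last pre-recording event
```

After the loop, `S` equals the recorded snapshot, and replaying all three events in either order ends in the same final global state.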


THE ALGORITHM - FINAL CONCLUSIONS

We saw that St = S*. From this we can see:
  that the snapshot S* is reachable from Sj
  that Sk is reachable from the snapshot S*

We saw that S* could have been a global state of the computation if events were to occur in a different order.

This means the snapshot is indeed valuable and informative when judging the stability of a system: if a stable property holds in S*, it also holds in Sk.

REFERENCES

Chandy, K. M. and Lamport, L. Distributed Snapshots: Determining Global States of Distributed Systems.

Dijkstra, E. W. The distributed snapshot of K. M. Chandy and L. Lamport.