Global States in a Distributed System By John Kor and Yvonne Cheng

Preview:

Citation preview

Global States in a Distributed System

By John Kor and Yvonne Cheng

Initial Problem Example

Garbage CollectorFree’s up memory which is no longer in useCheck’s if a reference to memory still exists

What about in a distributed system

Initial Problem Example (cont’d)

A distributed system consists of multiple processes

Each process is located on a different computerNo sharing of processor or memory

Initial Problem Example (cont’d)

Each process can only determine its own “state”

Problem: How do we determine when to garbage collect in a distributed system?How do we check whether a reference to

memory still exists?

System Model

A distributed system consists of multiple processes Each process is located on a different computer Each process consists of “events” An event is either sending a message, receiving a

message, or changing the value of some variable Each process has a communication channel in and

out

Our Garbage Collection Problem

In order to test whether a certain property of our system is true, we cannot just look at each process individually

A “snapshot” of the entire system must be taken to test whether a certain property of the system is true

This “snapshot” is called a Global State

Definition

The global state of a distributed system is the set of local states of each individual processes involved in the system plus the state of the communication channels.

Determinism

Deterministic ComputationAt any point in computation there is at most

one event that can happen next.

Non-Deterministic ComputationAt any point in computation there can be

more than one event that can happen next.

Deterministic Computation

Non-Deterministic Computation

Determinism

Deterministic computationA local event would reveal everything about

the global state!The process will know other process’ state

Non-Deterministic computationBecause of branching, a local event cannot

reveal what the next step will be

Simple Algorithm

Create a new process that collects the states of every other process

Every process will save their state at an arbitrary time and send it to this new process

Advantages

Very simple

Easy to implement

Problems?

Based on the assumption that all processes work on a synchronized global clock

Wrong assumption!

Problems (cont’d)

State recorded by p

m

p q

Problems (cont’d)

p q

m

Problems (cont’d)

State recorded by q

p q

m

Problems (cont’d)

Global state recorded

m

p q

m

Another view

p

q

m

Another view

Process p has no record of sending m

Process q HAS record of receiving mProblem?Global state does not show p sending m,

therefore there is confusion as to where m came from

Breaks the Consistency concept

Consistency

A global state is consistent if it could have been observed by an external observer

If e e` , then both e and e` must reside within the same state

For a successful Global State, all states must be consistent

Solution

Need to develop an asynchronous algorithm

Cannot depend on a clock

Must ensure consistency in all global states

Assumptions

Distributed system: Finite set of processes and channels; described by graph

Processes Set of states, initial state, set of events

Channels FIFO, error-free, infinite buffers, arbitrary but finite

delay

PART 2

Presented By: Yvonne

Idea of a global state recording algorithm

- each process records its own state

- the two processes incident by one channel cooperate in recording the channel state

Challenge

- No global clock

- Need a meaningful result

- Superimposed on underlying computation

Meaningful: The notion of Consistency

- it could have been observed by an external observer

- All feasible states are consistent

An Example

p q

p

q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2

m3

A Consistent State?

p q

Sp1 Sq

1

p

q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2

m3

Yes

p

q

p q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2

m3

Sp1 Sq

1

A Consistent State?

p q

Sp2 Sq

3

m3

p

q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2 m3

Yes

p

q

p q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2 m3

Sp2 Sq

3

m3

An inconsistent State

p

q

p q

Sp0 Sp

1 Sp2 Sp

3

Sq0 Sq

1 Sq2 Sq

3

m1

m2

m3

Sp1 Sq

3

Conducting algorithm: Using An Example

- Processes: p and q- Channels: c and c’- Token: t

p q

c

c’

An Example

- p records its state

t

p q

c

c’

An Example

- q, c, and c’ record their states

t

p q

c

c’

An Example

- The composite global state!

t

p q

c

c’

t

An Example

- n: number of messages sent along c before p’s state is recorded

- n’: number of message sent along c before c’s state is recorded

p q

c

c’

An Example

- Reason of inconsistency: n<n’

t

p q

c

c’

t

p q

c

c’

n = 0

n’ = 1

Similar scenario

c is recorded when the token is at process p.

p sends the token through channel c, and the states of c’, p, and q are recorded.

The recorded global state : no tokens in the system.

The reason of inconsistency : n>n’

Conclusion from the example

A consistent global state

requires

n = n’

Similar Conclusion

m : number of messages received along c before q’s state is recorded

m’ : number of messages received along c before c’s state is recorded

To be consistency: m=m’

Some other equations

m’ : number of messages received along c before c’s state is recorded

n’ : number of messages sent along c before c’s state is recorded

m : number of messages received along c before p’s state is recorded

n : number of messages sent along c before p’s state is recorded

n = n’ m = m’

n’ >= m’

n >= m

Other Fact

The state of channel c that is recorded must be the sequence of messages sent along the channel before the sender’s state is recorded, excluding the sequence of messages received along the channel before the receiver’s state is recorded.

Two cases:n’=m’ : c is emptyn’>m’: c must be the (m’+1)st…n’th messages sent by p

along c

Put All Together:A brief sketch of the algorithm

p sends a marker message along all its outgoing channels after it records its state and before it sends any other messages.

On receipt of a marker message from channel celse

state ( c ) = messages received on c since it had recorded its state excluding the marker.

if p has not recorded its staterecord the statestate ( c ) = EMPTY

Chandy and Lamport Algorithm

Features: Does not promise us to give us exactly what is

there But gives us consistent state!!

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

q records state as Sq1 , sends marker to p

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

p records state as Sp2, channel state as empty

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

q records channel state as m3

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

Recorded Global State = ((Sp2, Sq

1), (0,m3) )

Algorithm in Action

p

qSq

0 Sq1 Sq

2 Sq3

Sp0 Sp

1 Sp2 Sp

3

m1 m2 m3

Recorded Global State = ((Sp2, Sq

1), (0,m3) )

Computation may not even have passed through the state recorded!

What have we recorded

The recorded consistent state can be anything!

Properties of the recorded global state

Si : global state when the algorithm starts

Sj : global state when the algorithm finishs

S*: state recorded by the algorithm

Then S* is reachable from Si

Sj is reachable from S*

S* Is reachable from Si

Si

Sj

Sj Is reachable from S*

Si

Sj

Still what good is it?

Stable PropertiesA property Y is called a stable property

iff for all states S` reachable from S Y(S) -> Y(S’)

Detection of Stable Properties

Outcome = false;

while ( outcome == false )

{

determine Global State S;

outcome = Y (S);

}

Checkpoint

S* serves as a checkpoint

On a failure, restart the computation from S*

Problem! Not able to restore to Sj

Si

Sj

S*

Solution: Publishing

A Broadcast medium

A central recorder process records all the messages received by each process

Processes record their states at their own time and send it to the recorder

Determining Global State

Recorder can construct global state from Checkpointed States of all processes

Plus Messages recorded since last checkpoint

Problems

Publishing keeps track of all messages received by each process

Expensive!Solution

recorder takes checkpoint of process p at time t

deletes all messages recd by p before t.

Comparison

SNAPSHOT PUBLISHING

NetworkStronglyconnected

Need not be

Mode Distributed Centralized

Scalability Yes No

Restorability No Yes

Conclusion

Global State detection difficult in Distributed Systems

Snapshot algorithm may not give an actual state but is very helpful in detecting Stable Properties

Publishing gives an asynchronous way of determining global states but is unscalable