Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems. Guided by: Shri. R.S. Sharma, Associate Prof, CSE. Presented by: Ashutosh Jaiswal, M.Tech 2nd Yr.

Flexible Symmetric Global Snapshot

Page 1: Flexible Symmetric Global Snapshot

Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems

Guided by: Shri. R.S. Sharma, Associate Prof, CSE

Presented by: Ashutosh Jaiswal, M.Tech 2nd Yr

Page 2: Flexible Symmetric Global Snapshot

What is a snapshot?

A snapshot of a distributed system is a global state in which the local states of all processes and of all communication channels are recorded simultaneously.

Page 3: Flexible Symmetric Global Snapshot

Where is a snapshot used?

Detection of deadlock in a distributed system.

Computing monotonic functions of the global state, such as lower bounds on the simulation time.

Checkpointing and recovery of distributed databases.

Monitoring and debugging of distributed systems.

Page 4: Flexible Symmetric Global Snapshot

Determining Global States

Global State

“The global state of a distributed system is the set of local states of all individual processes involved in the computation plus the state of the communication channels.”

Page 5: Flexible Symmetric Global Snapshot

Why is global state determination difficult in distributed systems?

Distributed state: information that is spread across several machines has to be collected.

Only local knowledge: a process in the computation does not know the state of other processes.

The lack of globally shared memory and of a global clock, together with unpredictable message delays, makes this problem non-trivial.

Page 6: Flexible Symmetric Global Snapshot

Consistent and Inconsistent cuts

A cut is consistent if no message arrow starts in the future and ends in the past (e.g. cut AB). Otherwise it is inconsistent (e.g. cut CD).

[Figure: space-time diagram showing cuts AB and CD, with labels 1 to 4]
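The consistency condition can be checked mechanically. The following is a minimal sketch, assuming each process's events are numbered locally and a cut is given as the number of events per process that lie in its past; the function name `is_consistent` and the message encoding are illustrative assumptions, not part of the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (sender, send_index, receiver, recv_index), where the
# indices are positions in each process's local event sequence.
Message = Tuple[str, int, str, int]


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] is the number of events of process p that lie in the past of the cut."""
    for sender, send_idx, receiver, recv_idx in messages:
        sent_in_past = send_idx < cut[sender]
        received_in_past = recv_idx < cut[receiver]
        if received_in_past and not sent_in_past:
            # The message arrow starts in the future and ends in the past.
            return False
    return True


msgs = [("p1", 0, "p2", 1), ("p2", 2, "p1", 3)]
print(is_consistent({"p1": 1, "p2": 2}, msgs))  # True
print(is_consistent({"p1": 4, "p2": 2}, msgs))  # False: second message received but not yet sent
```

A message that is sent in the past and received in the future does not violate consistency; it is simply in transit across the cut.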

Page 7: Flexible Symmetric Global Snapshot

Global States of Consistent Cuts

A global state computed along a consistent cut is correct.

The global state of a consistent cut comprises the local state of each process at the time the cut event happens and the set of all messages sent but not yet received.

The snapshot problem consists in designing an efficient protocol that yields only consistent cuts and collects the local state information.

Page 8: Flexible Symmetric Global Snapshot

Issues in recording a global state

How to distinguish the messages to be recorded in the snapshot from those not to be recorded:

- Any message that is sent by a process before recording its snapshot must be recorded in the global snapshot.

- Any message that is sent by a process after recording its snapshot must not be recorded in the global snapshot.

Page 9: Flexible Symmetric Global Snapshot

How to determine the instant when a process takes its snapshot

A process pj must record its snapshot before processing a message mij that was sent by process pi after pi recorded its snapshot.

Page 10: Flexible Symmetric Global Snapshot

Models of communication

There are two models of communication: FIFO and non-FIFO.

In the FIFO model, each channel acts as a first-in first-out message queue, and thus message ordering is preserved by a channel.

In the non-FIFO model, a channel acts like a set in which the sender process adds messages and the receiver process removes messages in a random order.
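The two channel models can be contrasted with a small sketch. The class names `FifoChannel` and `NonFifoChannel` are hypothetical, and the random removal is only one way to model "a set with arbitrary removal order".

```python
import random
from collections import deque


class FifoChannel:
    """Channel as a first-in first-out queue: delivery order equals send order."""

    def __init__(self):
        self._queue = deque()

    def send(self, msg):
        self._queue.append(msg)

    def receive(self):
        return self._queue.popleft()


class NonFifoChannel:
    """Channel as a bag: the receiver removes some pending message, in no fixed order."""

    def __init__(self, seed=None):
        self._bag = []
        self._rng = random.Random(seed)

    def send(self, msg):
        self._bag.append(msg)

    def receive(self):
        index = self._rng.randrange(len(self._bag))
        return self._bag.pop(index)


fifo = FifoChannel()
for m in ["a", "b", "c"]:
    fifo.send(m)
print([fifo.receive() for _ in range(3)])  # ['a', 'b', 'c']
```

On a `NonFifoChannel` the same three messages all arrive, but a control message sent after "a" may be delivered before it, which is why a marker cannot separate messages there.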

Page 11: Flexible Symmetric Global Snapshot

System Model for Global Snapshots

The system consists of a collection of n processes p1, p2, ..., pn that are connected by channels.

There is no globally shared memory and no physical global clock; processes communicate by passing messages through communication channels.

Cij denotes the channel from process pi to process pj, and its state is denoted by SCij.

The actions performed by a process are modeled as three types of events: internal events, message send events and message receive events.

For a message mij that is sent by process pi to process pj, let send(mij) and rec(mij) denote its send and receive events.

Page 12: Flexible Symmetric Global Snapshot

Process States and Messages in transit

At any instant, the state of process pi, denoted by LSi, is the result of the sequence of all the events executed by pi up to that instant.

For an event e and a process state LSi, e ∈ LSi iff e belongs to the sequence of events that have taken process pi to state LSi.

For an event e and a process state LSi, e ∉ LSi iff e does not belong to the sequence of events that have taken process pi to state LSi.

Page 13: Flexible Symmetric Global Snapshot

Snapshot algorithms for FIFO channels

Chandy-Lamport algorithm

The Chandy-Lamport algorithm uses a control message, called a marker, whose role in a FIFO system is to separate messages in the channels.

After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.

A marker separates the messages in the channel into those to be included in the snapshot and those not to be recorded in the snapshot.

A process must record its snapshot no later than when it receives a marker on any of its incoming channels.
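The marker rules above can be sketched for two processes connected by FIFO channels. This is an illustrative sketch, not the original paper's pseudocode: the `Process` class, the `MARKER` constant and the money-transfer scenario are assumptions made for the example.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical control-message value


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.outgoing = {}          # peer name -> deque used as FIFO channel
        self.recorded_state = None  # local snapshot, once taken
        self.channel_state = {}     # incoming peer -> messages recorded in transit
        self.recording = set()      # incoming peers whose channel is still recorded

    def record_snapshot(self, incoming_peers):
        # Record the local state, then send a marker on every outgoing
        # channel before any further computation message.
        self.recorded_state = self.state
        self.recording = set(incoming_peers)
        self.channel_state = {peer: [] for peer in incoming_peers}
        for channel in self.outgoing.values():
            channel.append(MARKER)

    def receive(self, sender, msg, incoming_peers):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record_snapshot(incoming_peers)
            self.recording.discard(sender)  # channel from sender is now closed
        else:
            if sender in self.recording:
                self.channel_state[sender].append(msg)  # message was in transit
            self.state += msg


p, q = Process("p", 10), Process("q", 20)
pq, qp = deque(), deque()
p.outgoing["q"], q.outgoing["p"] = pq, qp

p.state -= 5
pq.append(5)             # p transfers 5 to q
q.state -= 3
qp.append(3)             # q transfers 3 to p
p.record_snapshot(["q"])  # p initiates the snapshot

while pq or qp:
    if pq:
        q.receive("p", pq.popleft(), ["p"])
    if qp:
        p.receive("q", qp.popleft(), ["q"])

total = (p.recorded_state + q.recorded_state
         + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
print(total)  # 30
```

The recorded total equals the initial 10 + 20, because the transfer of 3 that was still in the channel from q to p is captured as channel state rather than lost.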

Page 14: Flexible Symmetric Global Snapshot
Page 15: Flexible Symmetric Global Snapshot

Snapshot algorithms for non-FIFO channels

In a non-FIFO system, a marker cannot be used to separate the messages to be recorded in the global state from those not to be recorded in the global state.

Page 16: Flexible Symmetric Global Snapshot

Lai-Yang algorithm

The Lai-Yang algorithm fulfills this role of a marker in a non-FIFO system by using a coloring scheme on computation messages that works as follows:

Every process is initially white and turns red while taking a snapshot. The equivalent of the “Marker Sending Rule” is executed when a process turns red.

Every message sent by a white (red) process is colored white (red), indicating whether it was sent before (after) the snapshot.

Each process (which is initially white) becomes red as soon as it receives a red message for the first time, and starts a virtual broadcast algorithm to ensure that all processes will eventually become red.
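The coloring scheme can be sketched as follows. The class `LaiYangProcess` and its integer state are assumptions for illustration; the virtual broadcast that makes every process turn red eventually is omitted.

```python
WHITE, RED = "white", "red"


class LaiYangProcess:
    def __init__(self, name, state=0):
        self.name = name
        self.color = WHITE
        self.state = state
        self.snapshot = None

    def take_snapshot(self):
        # Turning red and recording the local state happen together.
        if self.color == WHITE:
            self.snapshot = self.state
            self.color = RED

    def send(self, payload):
        # Every computation message carries the sender's current color.
        return (self.color, payload)

    def receive(self, message):
        color, payload = message
        if color == RED:
            # Snapshot before processing the first red message.
            self.take_snapshot()
        self.state += payload


a, b = LaiYangProcess("a", 1), LaiYangProcess("b", 2)
white_msg = a.send(10)   # sent before a's snapshot
a.take_snapshot()
red_msg = a.send(100)    # sent after a's snapshot
b.receive(red_msg)       # non-FIFO: the red message overtakes the white one
b.receive(white_msg)
print(b.snapshot, b.state)  # 2 112
```

Because the red message arrives first, b records its state before applying it, so the post-snapshot message is not reflected in b's snapshot. The white message that arrives afterwards belongs to the channel state.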

Page 17: Flexible Symmetric Global Snapshot

Determining Messages in transit

Every white process records a history of the white messages it has sent or received on each channel.

When a process turns red, it sends these histories along with its snapshot to the initiator process that collects the global snapshot.

The initiator process evaluates transit(LSi, LSj) to compute the state of a channel Cij:

SCij = white messages sent by pi on Cij − white messages received by pj on Cij

= {mij | send(mij) ∈ LSi} − {mij | rec(mij) ∈ LSj}.
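The channel-state computation performed by the initiator is a difference of two message histories. A minimal sketch, assuming messages are hashable identifiers and using a multiset so that repeated payloads are handled; the function name `channel_state` is hypothetical.

```python
from collections import Counter


def channel_state(white_sent, white_received):
    """White messages sent by pi on Cij minus white messages received by pj on Cij."""
    return Counter(white_sent) - Counter(white_received)


sent_by_pi = ["m1", "m2", "m3"]
received_by_pj = ["m2"]
in_transit = channel_state(sent_by_pi, received_by_pj)
print(sorted(in_transit.elements()))  # ['m1', 'm3']
```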

Page 18: Flexible Symmetric Global Snapshot

The Proposed Algorithm

Assumptions:

The author assumes a pre-established spanning tree on the distributed system, along which an initiator performs a one-to-all broadcast to inform all other processes of the initiation of a new global snapshot.

This ensures that, even without receiving any red message, a white process will learn that a snapshot execution has been initiated and will then execute the global-snapshot algorithm to construct the global snapshot cooperatively with all other processes.

Page 19: Flexible Symmetric Global Snapshot

There are three phases in the algorithm.

Phase 1: This is the phase of snapshot initiation. The initiator performs a one-to-all broadcast of INIT control messages along the pre-established spanning tree to inform all other processes.

Page 20: Flexible Symmetric Global Snapshot

Phase 2: This is the phase of accumulating numbers of white messages. Upon receipt of an INIT control message or a red computation message, a white process turns red and records its local state.

Subsequently, the algorithm directs all processes to calculate, in a symmetrical manner, the total number of white messages that each process is supposed to receive.

In particular, every process pi maintains a vector of size N, wmsg_sent_i[], to count in wmsg_sent_i[j] the number of white messages that it has sent to another process pj.

Page 21: Flexible Symmetric Global Snapshot

Another vector of size N, sum_wmsg_i[], is used to accumulate for process pj, in sum_wmsg_i[j], the numbers of white messages sent to pj from distinct processes; it is initialized to wmsg_sent_i[].

At the end of the last round, the total number of white messages sent to any process pi is accumulated in sum_wmsg_i[i], i.e. sum_wmsg_i[i] = Σ_{j=1}^{N} wmsg_sent_j[i].

Thus if wmsg_received_i = sum_wmsg_i[i], namely all white messages supposed to be received have been received, pi turns white and then terminates the algorithm locally.
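The accumulation can be illustrated with a small simulation. The slides do not say how the vectors are exchanged in each round, so this sketch assumes a recursive-doubling pattern (in each round, process i adds the vector of partner i XOR 2^r) and N a power of two; both are assumptions for illustration, and `accumulate_white_counts` is a hypothetical name.

```python
def accumulate_white_counts(wmsg_sent):
    """wmsg_sent[i][j] = number of white messages pi sent to pj (0-based indices).

    Returns sum_wmsg, where every row ends up holding the column sums of wmsg_sent.
    """
    n = len(wmsg_sent)
    assert n > 0 and (n & (n - 1)) == 0, "this sketch assumes N is a power of two"
    sum_wmsg = [list(row) for row in wmsg_sent]  # initialized to wmsg_sent_i[]
    step = 1
    while step < n:
        previous = [list(row) for row in sum_wmsg]  # vectors at the start of the round
        for i in range(n):
            partner = i ^ step
            sum_wmsg[i] = [a + b for a, b in zip(previous[i], previous[partner])]
        step *= 2
    return sum_wmsg


wmsg_sent = [
    [0, 1, 0, 2],
    [1, 0, 0, 0],
    [0, 3, 0, 1],
    [0, 0, 0, 0],
]
sum_wmsg = accumulate_white_counts(wmsg_sent)
print([sum_wmsg[i][i] for i in range(4)])  # [1, 4, 0, 3]
```

Every process runs the same steps and ends with the same vector, so no single process has to collect all the counts. Process pi then only needs the entry sum_wmsg_i[i] to know how many white messages to wait for.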

Page 22: Flexible Symmetric Global Snapshot

Phase 3: This is the phase of recording channel states. When a red process pi receives a white computation message along a channel, the message is added to the state of that channel.

In addition, when the second phase is done and all white messages supposed to be received by pi have been received, i.e. wmsg_received_i = sum_wmsg_i[i], pi turns white and then terminates the algorithm locally.
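The recording and termination rule of this phase can be sketched for a single process. The class `RedProcess` and its method names are hypothetical; the expected count is taken as given from Phase 2.

```python
class RedProcess:
    def __init__(self, expected_white, received_white=0):
        self.expected_white = expected_white  # sum_wmsg_i[i] from Phase 2
        self.received_white = received_white  # wmsg_received_i so far
        self.channel_state = {}               # sender -> white messages recorded
        self.color = "red"

    def on_message(self, sender, color, payload):
        if self.color == "red" and color == "white":
            # A white message reaching a red process was in transit
            # when the snapshot was taken.
            self.channel_state.setdefault(sender, []).append(payload)
            self.received_white += 1
        self._maybe_terminate()

    def _maybe_terminate(self):
        if self.color == "red" and self.received_white == self.expected_white:
            self.color = "white"  # local termination of the snapshot


# pi expects 3 white messages in total and had received 1 before turning red.
proc = RedProcess(expected_white=3, received_white=1)
proc.on_message("p2", "white", "a")
proc.on_message("p3", "red", "x")    # red messages are not part of any channel state
proc.on_message("p3", "white", "b")
print(proc.channel_state, proc.color)  # {'p2': ['a'], 'p3': ['b']} white
```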

Page 23: Flexible Symmetric Global Snapshot

Algorithm

[Figure: example network of four processes, 1 to 4; process 1 is the initiator]

Page 24: Flexible Symmetric Global Snapshot

[Figure: processes 1 to 4, initiator 1; control messages and a red message are sent]

Page 25: Flexible Symmetric Global Snapshot

[Figure: processes 1 to 4, initiator 1; the red message is received at 2]

Page 26: Flexible Symmetric Global Snapshot

[Figure: processes 1 to 4, initiator 1; 3 sends a white message to 1; the control message from 1 reaches 3]

Page 27: Flexible Symmetric Global Snapshot

[Figure: processes 1 to 4, initiator 1; the white message is received at 1; the red control message is received]

Page 28: Flexible Symmetric Global Snapshot

Summary

Global state detection is difficult in distributed systems.

A snapshot algorithm may not give an actual state, but it is very helpful in detecting stable properties.

Page 29: Flexible Symmetric Global Snapshot

References

[1] Jichiang Tsai, “Flexible Symmetrical Global-Snapshot Algorithms for Large-Scale Distributed Systems,” IEEE Trans. Parallel and Distributed Systems, vol. Xx, no. Y, Apr. 2012.

[2] R. Garg, V. Garg, and Y. Sabharwal, “Efficient Algorithms for Global Snapshots in Large Distributed Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 5, pp. 620-630, May 2010.

[3] A. D. Kshemkalyani, “Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems,” IEEE Trans. Parallel and Distributed Systems, vol. 21, no. 9, pp. 1281-1289, Sept. 2010.

[4] F. Mattern, “Efficient Algorithms for Distributed Snapshots and Global Virtual Time Approximation,” J. Parallel and Distributed Computing, pp. 423-434, Aug. 1993.

[5] K.M. Chandy and L. Lamport, “Distributed Snapshots: Determining Global States of Distributed Systems,” ACM Trans. Computer Systems, vol. 3, no. 1, pp. 63-75, Feb. 1985.

Page 30: Flexible Symmetric Global Snapshot

Thank You !