34
Chandy-Lamport Snapshotting COS 418: Distributed Systems Precept 8 Themis Melissaris and Daniel Suo [Content adapted from I. Gupta]

Chandy-Lamport Snapshotting

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chandy-Lamport Snapshotting

Chandy-Lamport Snapshotting

COS 418: Distributed SystemsPrecept 8

Themis Melissaris and Daniel Suo

[Content adapted from I. Gupta]

Page 2: Chandy-Lamport Snapshotting

Agenda

• Whatareglobalsnapshots?• TheChandy-Lamport algorithm• WhydoesChandy-Lamport work?

2

Page 3: Chandy-Lamport Snapshotting

Globalsnapshots

3

Page 4: Chandy-Lamport Snapshotting

Exampleofaglobalsnapshot

4

Page 5: Chandy-Lamport Snapshotting

Butthatwaseasy

• Inoursystemofworldleaders,wewereabletocapturetheir‘state’(i.e.,likeness)easily– Synchronizedinspace– Synchronizedintime

• Howwouldwetakeaglobalsnapshotiftheleaderswereallathome?

• WhatifObamatoldTrudeauthatheshouldreallyputonashirt?

• Thismessageispartofoursystemstate!5

Page 6: Chandy-Lamport Snapshotting

Globalsnapshotisglobalstate

• Eachdistributedapplicationhasanumberofprocesses(leaders)runningonanumberofphysicalservers

• Theseprocessescommunicatewitheachotherviachannels(textmessaging)

• Asnapshot capturesthelocalstatesofeachprocess(e.g.,programvariables)alongwiththestateofeachcommunicationchannel

6

Page 7: Chandy-Lamport Snapshotting

Whydoweneedsnapshots?

• Checkpointing:restartiftheapplicationfails• Collectinggarbage:removeobjectsthatdon’thaveanyreferences

• Detectingdeadlocks:canexaminethecurrentapplicationstate

• Otherdebugging:alittleeasiertoworkwiththanprintf…

7

Page 8: Chandy-Lamport Snapshotting

Wecouldjustsynchronizeclocks

• Eachprocessrecordsstateattimesomeagreedupont– Butclocksskew– Andwewouldn’trecordmessages

• Doweneedsynchronization?• WhatdidLamport realizeaboutorderingevents?

8

Page 9: Chandy-Lamport Snapshotting

• Twoprocesses:P1andP2

Exampleofglobalsnapshotsv2

9

P1 P2

Page 10: Chandy-Lamport Snapshotting

• ChannelC12 fromP1toP2• ChannelC21 fromP2toP1

Exampleofglobalsnapshotsv2

10

P1 P2

C12

C21

Page 11: Chandy-Lamport Snapshotting

• ProcessstatesforP1andP2

Exampleofglobalsnapshotsv2

11

P1 P2

C12

C21

X:0Y:0Z:0

X:1Y:2Z:3

Page 12: Chandy-Lamport Snapshotting

• Channelstates(i.e.,messages)forC12andC21• Thisisourinitialglobalstate• Alsoaglobalsnapshot

Exampleofglobalsnapshotsv2

12

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:1Y2:2Z2:3

Page 13: Chandy-Lamport Snapshotting

• P1 tellsP2 tochangeitsstatevariable,X2,from1to4

• Thisisanotherglobalsnapshot

Exampleofglobalsnapshotsv2

13

P1 P2

C12:[X2 → 4]

C21:[Empty]

X1:0Y1:0Z1:0

X2:1Y2:2Z2:3

Page 14: Chandy-Lamport Snapshotting

• P2 receivesthemessagefromP1• Anotherglobalsnapshot

Exampleofglobalsnapshotsv2

14

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:1Y2:2Z2:3

X2 → 4

Page 15: Chandy-Lamport Snapshotting

• P2 changesitsstatevariable,X2,from1to4• Andanotherglobalsnapshot

Exampleofglobalsnapshotsv2

15

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

Page 16: Chandy-Lamport Snapshotting

• Theglobalstatechangeswheneveraneventhappens– Processsendsmessage– Processreceivesmessage– Processtakesastep

• Movingfromstatetostateobeyscausality

Summary

16

Page 17: Chandy-Lamport Snapshotting

Chandy-Lamport algorithm

17

Page 18: Chandy-Lamport Snapshotting

• Problem:recordaglobalsnapshot(stateforeachprocessandchannel)

• Model– N processesinthesystemwithnofailures– TherearetwoFIFOunidirectionalchannelsbetweeneveryprocesspair(Pi →Pj andPj →Pi)

– Allmessagesarrive,intact,notduplicated• Futureworkrelaxestheseassumptions

Systemmodel

18

Page 19: Chandy-Lamport Snapshotting

• Takingasnapshotshouldn’tinterferewithnormalapplicationbehavior– Don’tstopsendingmessages– Don’tstoptheapplication!

• Eachprocesscanrecorditsownstate• Collectstateinadistributedmanner• Anyprocesscaninitiateasnapshot

Systemrequirements

19

Page 20: Chandy-Lamport Snapshotting

• Let’ssayprocessPi initiatesthesnapshot• Pi recordsitsownstateandpreparesaspecialmarkermessage(distinctfromapplicationmessages)

• Sendthemarkermessagetoallotherprocesses(usingN-1 outboundchannels)

• StartrecordingallincomingmessagesfromchannelsCji forj notequaltoi

Initiatingasnapshot

20

Page 21: Chandy-Lamport Snapshotting

• ForallprocessesPj (includingtheinitiator),consideramessageonchannelCkj

• Ifweseemarkermessageforthefirsttime– Pj recordsownstateandmarksCkj asempty– Sendthemarkermessagetoallotherprocesses(usingN-1 outboundchannels)

– StartrecordingallincomingmessagesfromchannelsClj forl notequaltojork

• Elseaddallmessagesfrominboundchannelssincewebeganrecordingtotheirstates

Propagatingasnapshot

21

Page 22: Chandy-Lamport Snapshotting

• Allprocesseshavereceivedamarker(andrecordedtheirownstate)

• AllprocesseshavereceivedamarkeronalltheN-1 incomingchannels(andrecordedtheirstates)

• Later,acentralservercangatherthepartialstatetobuildaglobalsnapshot

Terminatingasnapshot

22

Page 23: Chandy-Lamport Snapshotting

• P1 initiatesasnapshot

Example

23

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

Page 24: Chandy-Lamport Snapshotting

• First,P1 recordsitsstate

Example

24

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

Page 25: Chandy-Lamport Snapshotting

• Then,P1 sendsamarkermessagetoP2 andbeginsrecordingallmessagesoninboundchannels

• Meanwhile,P2 sentamessagetoP1

Example

25

P1 P2

C12:[<marker>]

C21:[M1]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

Page 26: Chandy-Lamport Snapshotting

• P2 receivesamarkermessageforthefirsttime,sorecordsitsstate

• P2 thensendsamarkermessagetoP1

Example

26

P1 P2

C12:[Empty]

C21:[<marker>]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

<marker>

M1

Page 27: Chandy-Lamport Snapshotting

• P1 hasalreadysentamarkermessage,soitrecordsallmessagesitreceivedoninboundchannelstotheappropriatechannel’sstate

Example

27

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

M1

Page 28: Chandy-Lamport Snapshotting

• Bothprocesseshaverecordedtheirstateandallthestateofallincomingchannels

• Oursnapshottedstateishighlightedinblue

Example

28

P1 P2

C12:[Empty]

C21:[Empty]

X1:0Y1:0Z1:0

X2:4Y2:2Z2:3

M1

Page 29: Chandy-Lamport Snapshotting

ReasoningabouttheChandy-Lamport algorithm

29

Page 30: Chandy-Lamport Snapshotting

• RelatedtotheLamport clockpartialordering• Aneventispresnapshot ifitoccursbeforethelocalsnapshotonaprocess

• Postsnapshot ifafterwards• IfeventA happenscausallybeforeeventB,andB ispresnapshot,thenA istoo

Causalconsistency

30

Page 31: Chandy-Lamport Snapshotting

• IfA andB happenonthesameprocess,thenthisistriviallytrue

• ConsiderwhenA isthesendandB isthecorrespondingreceiveeventonprocessespandq,respectively– SinceB ispresnapshot,q can’thavereceivedamarkerandp can’thavesentamarker

– Amustalsohappenpresnapshot• SimilarlogicforA happeningpostsnapshot

Proof

31

Page 32: Chandy-Lamport Snapshotting

• Inorderforanapplicationmessagem inthechannelfromprocessp toprocessq tobeinthesnapshot– Musthappenafterq hasreceiveditsfirstmarker– Beforep hassentitsmarkertoq

• Amessagem willonlybeinthesnapshotifthesendingprocesswaspresnapshot andthereceivingprocesswaspostsnapshot

Pokingtheproof:PartI

32

Page 33: Chandy-Lamport Snapshotting

• Howdoweorderconcurrentevents?– Remember,allprocessescommunicate

• Whatifaprocessreceivesamarkerinbetweensendingamarkerandsomeevent?– Theseshouldhappenatomically

• Whatifsomethinghappensonaprocessindependentlyofmessagesafterthewall-clocktimeofwhenthesnapshotstarts?– Snapshotsarecausallyconsistent

Pokingtheproof:PartII

33

Page 34: Chandy-Lamport Snapshotting

Mondaytopic:StreamingDataProcessing

34