UNDERSTANDING AND IMPLEMENTING CHORD Pamela Zave Princeton University Princeton, New Jersey


THE CHORD PROTOCOL MAINTAINS A PEER-TO-PEER NETWORK

[Figure: a Chord ring with members 1, 8, 14, 21, 32, 42, and 51; pointers labeled successor, successor2, and predecessor. Here, the identifier space consists of all 6-bit binary values.]

the identifier of a node (assumed unique) is an m-bit hash of its IP address

nodes are arranged in a ring, each node having a successor pointer to the next node (in integer order with wraparound at 0)

the protocol preserves the ring structure as nodes join, leave silently, or fail

redundant pointers support fault-tolerance (extra successors, predecessors)

“consistent hashing” means that all IP addresses are spread evenly around the identifier space
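The identifier and successor rules above can be sketched in a few lines of Python. This is only an illustration: the slides do not name the hash function, so the use of SHA-1 here is an assumption, and `chord_id` and `ring_successor` are hypothetical helper names.

```python
import hashlib

M = 6  # identifier length in bits, matching the 6-bit example on the slide


def chord_id(ip_address: str, m: int = M) -> int:
    """Map an IP address to an m-bit identifier (SHA-1 is an assumption)."""
    digest = hashlib.sha1(ip_address.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def ring_successor(node: int, members: list[int]) -> int:
    """Return the next member after `node` in integer order, wrapping at 0."""
    ordered = sorted(members)
    for candidate in ordered:
        if candidate > node:
            return candidate
    return ordered[0]  # no larger identifier exists, so wrap around


members = [1, 8, 14, 21, 32, 42, 51]
print(ring_successor(21, members))  # 32
print(ring_successor(51, members))  # 1
print(0 <= chord_id("192.0.2.7") < 2 ** M)  # True
```

The wraparound case (51 to 1) is the only place where integer order and ring order differ.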

CHORD AS A DISTRIBUTED HASH TABLE

[Figure: a Chord ring with members 1, 8, 14, 21, 32, 42, and 51.]

a hash table stores (key, value) pairs

the keys are the same as identifiers

keys 22 through 32 are stored at 32

keys 33 through 42 are stored at 42, etc.

to look up a value, you only need to know the IP address of one Chord member

a list of accessible members is the only central administration needed!

your query to that member is forwarded around the ring until it gets to the member that has the value

as an optimization, “finger” pointers go across the ring and make lookup faster
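A minimal sketch of the lookup described above, without finger pointers: each member forwards the query to its successor until the key falls between a member and its successor. The names `in_half_open` and `lookup` are hypothetical, and the `successor` dictionary stands in for the per-member pointers.

```python
def in_half_open(a: int, x: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], going clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past 0


def lookup(start: int, key: int, successor: dict[int, int]) -> tuple[int, list[int]]:
    """Forward a query successor by successor; return (owner, members visited)."""
    current = start
    visited = [current]
    while not in_half_open(current, key, successor[current]):
        current = successor[current]
        visited.append(current)
    return successor[current], visited


ring = [1, 8, 14, 21, 32, 42, 51]
successor = {m: ring[(i + 1) % len(ring)] for i, m in enumerate(ring)}

print(lookup(8, 22, successor))   # (32, [8, 14, 21])
print(lookup(8, 40, successor))   # (42, [8, 14, 21, 32])
print(lookup(42, 60, successor))  # (1, [42, 51])
```

Without fingers the number of hops grows linearly with the number of members, which is why the slide calls fingers an optimization rather than a correctness feature.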

OPERATIONS OF THE PROTOCOL

[Figure: snapshots of members 7, 10, and 16, with the transitions labeled 10 JOINS, 10 STABILIZES, 16 GETS NOTIFIED, 7 STABILIZES, and 10 GETS NOTIFIED.]

an operation changes the state of one member

Join and Stabilize are scheduled autonomously; GetsNotified is caused by another member’s Stabilize

now 10 is completely incorporated into the ring!
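The sequence in the figure can be replayed with a small shared-state simulation. This is a sketch under assumptions: the rules follow the usual reading of the original Chord operations, the `Member` class and function names are hypothetical, and `stabilize` calls `gets_notified` directly instead of sending a message.

```python
from dataclasses import dataclass
from typing import Optional


def between(a: int, x: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise around the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past 0


@dataclass
class Member:
    successor: int
    predecessor: Optional[int] = None


def join(net: dict, new_id: int, known_id: int) -> None:
    """The new member finds its successor via a known member; no predecessor yet."""
    current = known_id
    while not between(current, new_id, net[current].successor):
        current = net[current].successor
    net[new_id] = Member(successor=net[current].successor)


def gets_notified(net: dict, ident: int, candidate: int) -> None:
    """Adopt candidate as predecessor if it is closer than the current one."""
    m = net[ident]
    if m.predecessor is None or between(m.predecessor, candidate, ident):
        m.predecessor = candidate


def stabilize(net: dict, ident: int) -> None:
    """Adopt the successor's predecessor if it is closer, then notify the successor."""
    m = net[ident]
    x = net[m.successor].predecessor
    if x is not None and between(ident, x, m.successor):
        m.successor = x
    gets_notified(net, m.successor, ident)


# Two-member ring of 7 and 16, then the steps from the figure.
net = {7: Member(successor=16, predecessor=16), 16: Member(successor=7, predecessor=7)}
join(net, 10, known_id=7)  # 10 JOINS
stabilize(net, 10)         # 10 STABILIZES, 16 GETS NOTIFIED
stabilize(net, 7)          # 7 STABILIZES, 10 GETS NOTIFIED
print(net[7].successor, net[10].successor, net[16].predecessor, net[10].predecessor)
# 10 16 10 7
```

After these steps every pointer involving 10 is in place, which is what the slide means by "completely incorporated".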

MORE OPERATIONS OF THE PROTOCOL

[Figure: snapshots of members 9, 16, 22, and 35, with the transitions labeled 22 FAILS OR LEAVES, 16 UPDATES, 35 FLUSHES, and 9 RECONCILES.]

22 has no pointers and does not respond to queries, so other nodes know that it has failed

now the hole left by 22 is repaired
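One plausible reading of the repair sequence is sketched below. The slides give only the operation names, so the rules are assumptions: Update replaces a dead successor with successor2, Flush discards a dead predecessor, and Reconcile copies the successor's successor into successor2. The `Member` class and the `live` set are hypothetical modeling devices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    successor: int
    successor2: int
    predecessor: Optional[int]


def update(net: dict, live: set, ident: int) -> None:
    """Assumed rule: replace a dead successor with the second successor."""
    m = net[ident]
    if m.successor not in live:
        m.successor = m.successor2


def flush(net: dict, live: set, ident: int) -> None:
    """Assumed rule: forget a predecessor that no longer responds."""
    m = net[ident]
    if m.predecessor is not None and m.predecessor not in live:
        m.predecessor = None


def reconcile(net: dict, live: set, ident: int) -> None:
    """Assumed rule: refresh successor2 from the live successor's successor."""
    m = net[ident]
    if m.successor in live:
        m.successor2 = net[m.successor].successor


# Ring 9 -> 16 -> 22 -> 35 -> 9 with one redundant successor per member.
net = {
    9: Member(successor=16, successor2=22, predecessor=35),
    16: Member(successor=22, successor2=35, predecessor=9),
    22: Member(successor=35, successor2=9, predecessor=16),
    35: Member(successor=9, successor2=16, predecessor=22),
}
live = {9, 16, 22, 35}

live.discard(22)         # 22 FAILS OR LEAVES
update(net, live, 16)    # 16 UPDATES
flush(net, live, 35)     # 35 FLUSHES
reconcile(net, live, 9)  # 9 RECONCILES

print(net[16].successor, net[35].predecessor, net[9].successor2)  # 35 None 35
```

In this sketch no remaining pointer refers to 22 after the four steps; 35 regains a predecessor the next time 16 stabilizes.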

WHAT YOUR IMPLEMENTATION MIGHT DO

[Figure: snapshots of members 3, 16, and 20, with the transitions labeled 16 FAILS and 3 STABILIZES.]

now 3 has no pointer to another Chord member

the ring is broken, and the protocol cannot recover it

nodes should reconcile more often, so their redundant successors have good information

before over-writing [node] with [node], [node] should have checked that the new node is live
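The failure and the suggested liveness check can be illustrated with a small scenario. The starting state is an assumption (a ring of 3 and 20 that 16 has joined, with 3's second successor not yet reconciled), and `Member`, `stabilize`, and `has_live_pointer_to_other` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


def between(a: int, x: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise around the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b


@dataclass
class Member:
    successor: int
    successor2: int
    predecessor: Optional[int] = None


def stabilize(net: dict, live: set, ident: int, check_live: bool) -> None:
    """Adopt the successor's predecessor as the new successor if it is closer.

    With check_live=True the candidate is adopted only if it still responds.
    """
    m = net[ident]
    x = net[m.successor].predecessor
    if x is None or not between(ident, x, m.successor):
        return
    if check_live and x not in live:
        return
    m.successor = x  # successor2 stays stale until the next reconcile


def has_live_pointer_to_other(net: dict, live: set, ident: int) -> bool:
    m = net[ident]
    return any(p in live and p != ident for p in (m.successor, m.successor2))


def scenario(check_live: bool) -> bool:
    # Assumed state: 16 joined between 3 and 20 and has already notified 20.
    net = {
        3: Member(successor=20, successor2=3, predecessor=20),
        20: Member(successor=3, successor2=20, predecessor=16),
        16: Member(successor=20, successor2=3, predecessor=None),
    }
    live = {3, 20}                       # 16 FAILS
    stabilize(net, live, 3, check_live)  # 3 STABILIZES
    return has_live_pointer_to_other(net, live, 3)


print(scenario(check_live=False))  # False
print(scenario(check_live=True))   # True
```

Without the check, 3 ends up with a dead successor and a second successor that points to itself; with the check, it keeps 20.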

DO YOU THINK YOU WOULD FIND AND FIX ALL THESE LITTLE PROBLEMS?

DHTs (there are others besides Chord) have a reputation for being unreliable

when a distributed system fails, it is very difficult to find out why!

Donald Kossmann writes about an experiment he did with M.S. students in computer science at ETH in 2008:

“As part of a lab project on distributed systems, I asked the 8 students to implement the Chord protocol based on the Chord paper [same one you read].

Each student had his/her own implementation. The ETH students are pretty good systems builders so they all did fairly well and got something running: We had a small test suite and all implementations passed that.

Then, I asked the students to do the second part of the project: Run a Chord ring with your own implementation X.

Then add a new node to this ring, but that new node uses Chord implementation Y from another student. There was no combination of X and Y (not even if there was only one X node and one Y node) in which two nodes with different Chord implementations could even exchange a message. The interesting thing was that the problem was not serialization or message format or use of different TCP libraries or so. Those would have been easy fixes. There was no way for the students to fix the problem and get anything running even though they would get extra credit if they succeeded.

We allocated 10 weeks for the first part and 4 weeks for the second part. 4 weeks were not enough!”

WHY IS CHORD IMPORTANT TO YOU?

the 2001 SIGCOMM paper introducing Chord is one of the most-referenced papers in computer science, . . .

. . . and won SIGCOMM’s 2011 Test of Time Award

APPLICATIONS

allows millions of ad hoc peers to cooperate

the best-known application is BitTorrent

WIDELY used as a building block in distributed fault-tolerant applications

if you are a Ph.D. student, your dissertation may use Chord!

WHY MIGHT YOU CHOOSE CHORD TO IMPLEMENT YOUR DISSERTATION IDEA?

“Three features that distinguish Chord from many other peer-to-peer lookup protocols are . . .

. . . its simplicity,

. . . provable correctness,

. . . and provable performance.”

[The previous slide is shown again with three overlaid annotations: “OOPS!” once and “TRUE!” twice.]

THE CLAIMS

Correctness Property:

In any execution state, IF there are no subsequent Join or Fail events, . . .

. . . THEN eventually . . .

. . . all pointers in the network will be globally correct, and remain so.

THE REALITY

even with simple bugs fixed and optimistic assumptions about atomicity, the original protocol is not correct

of the seven properties claimed invariant of the original version, not one is actually an invariant

not surprisingly, due to sloppy informal specification and proof

I found these problems by analyzing a small Alloy model

Chris Newcombe and others at AWS credit this work with overcoming their bias against formal methods, which they now use to find bugs. [CACM, April 2015]
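The correctness property stated under THE CLAIMS can be read as a formula of linear temporal logic. The rendering below is one possible formalization, not taken from the slides: $\mathit{Join}$ and $\mathit{Fail}$ stand for the occurrence of those events, $\mathit{Ideal}$ is a hypothetical name for "all pointers in the network are globally correct", and the `\Box` and `\Diamond` symbols assume the `amssymb` package.

```latex
\[
\Box\Bigl(\,\Box\,\neg(\mathit{Join}\lor\mathit{Fail})
  \;\Longrightarrow\; \Diamond\,\Box\,\mathit{Ideal}\Bigr)
\]
```

Read from the outside in: in every state ($\Box$), if no Join or Fail ever happens again ($\Box\neg$), then eventually ($\Diamond$) the network becomes ideal and stays ideal ($\Box$).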

“EVENTUAL REACHABILITY” IS NOT THE ONLY ISSUE

OrderedMerges . . .

. . . means that appendages merge in the correct places, as they do here

[Figure: snapshots of members 6, 10, 12, and 16, with the transitions labeled 6 STABILIZES and 12 GETS NOTIFIED.]

OrderedMerges is easily violated

VIOLATIONS OF OrderedMerges

invalidate some assumptions used in performance analysis

are not incorrect

can be demonstrated in Chord networks with 3 nodes

how could they go unknown for ten years?

this is why formal methods are so important

BASIC CORRECTNESS STRATEGY 1

[Figure: a network with members 3, 6, 7, 13, 16, 19, 26, 29, 41, and 55, with labels for the ring, the appendages, a dead node, and the best successor (first live successor). The extended successor list (ESL) of 29 (with L = 2) is drawn with the entries 29, 55, and 41 and a label marking the member itself.]

Original operating assumption:

No failure leaves a member without a live successor.

But if an ESL with L = 2 is [ESL drawn in the figure, with the entries 29, 29, and 32] . . .

. . . then 32 cannot fail! Clearly, with L = 2, Chord is intended to tolerate one failure in a neighborhood, but not two.

BASIC CORRECTNESS STRATEGY 2

[Figure: a network with members 3, 6, 7, 13, 16, 19, 26, 29, 41, and 55, with labels for the ring, the appendages, a dead node, and the best successor (first live successor), and the extended successor list (ESL) of 29 (with L = 2), including the member itself.]

Definition of FullSuccessorLists: The extended successor list of each member has L+1 distinct entries.

New operating assumption:

If a Chord network has the property FullSuccessorLists, then no failure leaves a member without a live successor.

if not satisfied for the real failure rate,

. . . increase rate of stabilization,

. . . or increase redundancy
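The FullSuccessorLists property is simple to check mechanically. The sketch below assumes each ESL is stored as a Python list that starts with the member itself; `full_successor_lists` and both example networks are hypothetical.

```python
def full_successor_lists(esl_by_member: dict[int, list[int]], L: int) -> bool:
    """Every ESL (the member itself plus L successors) has L+1 distinct entries."""
    return all(
        len(esl) == L + 1 and len(set(esl)) == L + 1
        for esl in esl_by_member.values()
    )


# Hypothetical three-member network with L = 2: every list is full.
full = {29: [29, 41, 55], 41: [41, 55, 29], 55: [55, 29, 41]}

# Hypothetical two-member network with L = 2: each list repeats an entry,
# so a single failure would leave the survivor without a live successor.
not_full = {29: [29, 32, 29], 32: [32, 29, 32]}

print(full_successor_lists(full, L=2))      # True
print(full_successor_lists(not_full, L=2))  # False
```

The second example shows why the property needs at least L+1 members: with fewer, some entry must repeat.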

BASIC CORRECTNESS STRATEGY 3

[Figure: a network with members 3, 6, 7, 13, 16, 19, 26, 29, 41, and 55, with labels for the ring and the appendages.]

TO MAKE ORIGINAL CHORD CORRECT:

alter the initialization to satisfy FullSuccessorLists with all members live

requires L+1 members

alter the operations to populate successor lists more eagerly, preserve FullSuccessorLists at all times

now it is roughly correct (in hindsight)

but how do we prove it without an invariant?

WHY IS FINDING AN INVARIANT SO DIFFICULT?

THE KNOWN, NECESSARY PROPERTIES ARE STATED IN TERMS OF THE RING . . .

about the ring of best successors:

there is a ring of best successors

there is no more than one ring

on the unique ring, the members are in identifier order

about the appendages:

from each appendage member, the ring is reachable through best successors

. . . BUT “RING VERSUS APPENDAGE” IS CONTEXT-DEPENDENT AND FLUID:

[Figure: a network of members w, x, y, u, and z with its ring marked; after x fails, the network of w, y, u, and z with its ring marked again.]

AN INTERMEDIATE RESULT

THE INDUCTIVE INVARIANT:

OneOrderedRing

and ConnectedAppendages

and BaseNotSkipped (no successor list skips over a member of the stable base)

ANOTHER OPERATING ASSUMPTION:

A Chord network has a stable base of L+1 nodes that are always members.

expensive to implement these high-availability nodes!

THE PROOF OF CORRECTNESS:

by exhaustive enumeration, in Alloy, for all model instances up to N = 9, L = 3

a stable base would have 3-6 members, while a Chord network can have millions of members. What is the base doing?

I believe it is just preventing anomalies in small networks, but how can we know for sure?

THE FINAL RESULT

THE INDUCTIVE INVARIANT:

OneLiveSuccessor (this is just a formalization of the original operating assumption)

and SufficientPrincipals

Definition of a principal member: A member that is not skipped by any member’s successor list.

Definition of SufficientPrincipals: There are at least L+1 principal nodes.

ANOTHER OPERATING ASSUMPTION:

None

the “stable base” has become something we can prove, rather than an assumption!

THE PROOF OF CORRECTNESS:

informal and intuitive, but . . .

. . . a real proof (no size limits)

. . . backed up by an Alloy model checked up to N = 9, L = 3 (as a protection against human error)
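The two definitions above can be turned into a small check. This sketch makes one assumption explicit: a member p counts as "skipped" by a list when p lies strictly between two adjacent entries of that list in identifier order. The function names and the example network are hypothetical.

```python
def between(a: int, x: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise around the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b


def skipped_by(esl: list[int], p: int) -> bool:
    """Assumed meaning of 'skipped': p falls between two adjacent ESL entries."""
    return any(between(a, p, b) for a, b in zip(esl, esl[1:]))


def principals(esl_by_member: dict[int, list[int]]) -> set[int]:
    """Members that no member's extended successor list skips."""
    return {
        p
        for p in esl_by_member
        if not any(skipped_by(esl, p) for esl in esl_by_member.values())
    }


def sufficient_principals(esl_by_member: dict[int, list[int]], L: int) -> bool:
    return len(principals(esl_by_member)) >= L + 1


# Hypothetical network with L = 2; member 25 has just joined,
# so the lists of 10 and 20 still skip over it.
network = {
    10: [10, 20, 30],
    20: [20, 30, 40],
    25: [25, 30, 40],
    30: [30, 40, 10],
    40: [40, 10, 20],
}

print(sorted(principals(network)))          # [10, 20, 30, 40]
print(sufficient_principals(network, L=2))  # True
```

Member 25 becomes principal only once its neighbors' lists include it, which matches the "life history" described later in the deck.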

CONCLUSIONS

SPECIFICATION OF A CORRECT VERSION OF CHORD

initialization is more difficult than original Chord, but a simple protocol will get networks off to a safe start

otherwise correct Chord is just as efficient as original Chord

these peer-to-peer protocols have a (justified) reputation for unreliability

it is an impressive pattern for fault-tolerance

a correct specification could pave the way for a new generation of reliable, more useful implementations

it also provides a firm foundation for work on better failure detection and security

FOR YOUR DISSERTATION, ONLY THE BEST WILL DO!

use an implementation based on “Reasoning about identifier spaces: How to make Chord correct”, Pamela Zave, IEEE Transactions on Software Engineering


APPENDIX:

AN OUTLINE OF THE PROOF

ORDERED SUCCESSOR LISTS . . .
. . . ARE IMPLIED BY THE INVARIANT

Definition of OrderedSuccessorLists: For all distinct identifiers x, y, z, and sublists [x, y, z] of an ESL (whether the sublist is contiguous or not) . . . between [x, y, z].

Proof of OrderedSuccessorLists

hypothesize a disordered extended successor list [. . . x, . . . y, . . . z, . . .]

[Figure: the identifier space drawn as a circle with x, y, and z marked.]

between [y, x, z] in identifier space

[x, . . . y, . . . z] must include L + 1 principal nodes (picture, principal nodes not skipped, SufficientPrincipals)

x is either not a principal node, or is duplicated in [y, . . . z]

same reasoning for z

so the length of [x, . . . y, . . . z] is at least L + 3 ((L + 1) plus one x and one z)

but the length of an ESL is always L + 1

CONTRADICTION!
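The OrderedSuccessorLists definition can be checked directly on a single list. In this sketch, `itertools.combinations` enumerates every sublist of three entries in list order, contiguous or not; `between` and `ordered_successor_list` are hypothetical names.

```python
from itertools import combinations


def between(a: int, x: int, b: int) -> bool:
    """True if x lies strictly between a and b going clockwise around the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b


def ordered_successor_list(esl: list[int]) -> bool:
    """For every sublist [x, y, z] of distinct identifiers, y is between x and z."""
    for x, y, z in combinations(esl, 3):
        if len({x, y, z}) == 3 and not between(x, y, z):
            return False
    return True


print(ordered_successor_list([29, 41, 55]))  # True
print(ordered_successor_list([55, 3, 7]))    # True (wraps past 0)
print(ordered_successor_list([29, 55, 41]))  # False (disordered)
```

The third list is the kind of disordered ESL that the proof above hypothesizes and then rules out.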

THE PRINCIPAL NODES MAKE THE SHAPE OF THE RING

every member has a best successor (first live successor)

there are sufficient principal nodes

here is a graph of best successors:

[Figure: graphs of best successors over members 0, 7, 9, 12, 15, 23, 24, 33, 35, 37, 46, and 51, with some nodes marked “s” and “d”.]

these paths . . .

. . . do not skip principal nodes

. . . are acyclic

. . . are ordered by identifiers

each tree has exactly one p , which is unique to it

so the re-arranged graph must look like this

automatically satisfying OneOrderedRing and ConnectedAppendages

CONCURRENCY AND COMMUNICATION

AN OPERATION IS A SEQUENCE OF ATOMIC STEPS

EACH ATOMIC STEP IS AN INTERNAL STATE CHANGE OR THIS:

[Figure: timelines for node X and node Y, with a query message and a reply message between them, labeled: atomic step at X; X changes state; step can be assumed to occur at this instant.]

formal model has a shared-state abstraction

while waiting for a reply (or timeout), X cannot answer queries about its state

because of the structure of operations, queries cannot form circular waits

HOW STABILIZE PRESERVES THE INVARIANT

JOIN AND RECTIFY ARE SIMILAR

some Chord operations need multiple atomic steps

in the new, provably correct, specification, every intermediate operation state is also constructed in this safe way

[Figure: the extended successor list of the stabilizing node before Stabilize, the extended successor list of its new successor, and the list constructed from them; entries are labeled N and S with numeric subscripts (subscripts lost).]

because invariant holds, no current principal nodes are skipped here or here

precondition guarantees that between [S , N , S ], so no current principal nodes are skipped from S to N

therefore, no former principal nodes are skipped by this new successor list, and the number of principal nodes has not decreased

HOW FAIL PRESERVES THE INVARIANT

PRESERVATION OF OneLiveSuccessor

The operating assumption is that no failure leaves a member with no live successor, . . .

. . . so the invariant is assumed to be preserved.

PRESERVATION OF SufficientPrincipals

Why can’t failure of a principal node leave the network with fewer than L+1 principals?

Lemma: The only operation that can cause a node to change from principal to non-principal is its own failure.

The life history of a long-lived member:

1 Join

2 become principal because all neighbors know you

3 enjoy life as a principal node

4 Fail

Therefore the number of principal nodes is proportional to the number of nodes.

Once the network has grown (especially to millions of members!) it is overwhelmingly improbable that it will have fewer than 3-6 principal nodes.

PROVING PROGRESS

IF THERE ARE NO MORE JOIN OR FAIL EVENTS . . .

. . . WHILE MEMBERS CONTINUE TO STABILIZE . . .

1 dead successors are removed, so that every member’s first successor is live

2 every member’s first successor and predecessor become globally correct

3 tails of all successor lists become correct

as with construction of intermediate successor lists, operations must be specified precisely to ensure correctness

here preconditions must ensure that no operation reverses the progress of a past or current phase

CONCLUSIONS

THE PRODUCT

initialization is more difficult than original Chord, but a simple protocol will get networks off to a safe start

otherwise correct Chord is just as efficient as original Chord

these peer-to-peer protocols have a (justified) reputation for unreliability

it is an impressive pattern for fault-tolerance

a correct specification could pave the way for a new generation of reliable, more useful implementations

it also provides a firm foundation for work on better failure detection and security

THE PROCESS

Chord is a very interesting protocol; note that the invariant looks nothing like the properties we care about!

results would have been impossible to find without model-checking to explore bizarre cases and get ideas from them

that is where the idea of a stable base came from

the best result was impossible to find without the insights that came from the proof process

pamelazave.com > How to Make Chord Correct