Lecture 12 Page 1 CS 188,Winter 2015 Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Lecture 12 - Distributed Systems


  • Lecture 12 Page 1 CS 188,Winter 2015

    Agreement in Distributed Systems CS 188

    Distributed Systems February 19, 2015

  • Lecture 12 Page 2 CS 188,Winter 2015

    Introduction

    We frequently want to get a set of nodes in a distributed system to agree

    Commitment protocols and mutual exclusion are particular cases

    The approaches we discussed for those work in limited situations

    In general, when can we reach agreement in a distributed system?

  • Lecture 12 Page 3 CS 188,Winter 2015

    Basics of Agreement Protocols

    What is agreement?

    What are the necessary conditions for agreement?

  • Lecture 12 Page 4 CS 188,Winter 2015

    What Do We Mean By Agreement?

    In the simplest case, can n processors agree that a variable takes on value 0 or 1?

      Only non-faulty processors need agree

    More complex agreements can be built from this simple agreement

  • Lecture 12 Page 5 CS 188,Winter 2015

    Conditions for Agreement Protocols

    Consistency

      All participants agree on the same value, and decisions are final

    Validity

      Participants agree on a value at least one of them wanted

    Termination/Progress

      All participants choose a value in a finite number of steps
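
    The three conditions above can be checked mechanically on the outcome of a protocol run. The sketch below is only an illustration: the function name and the dictionaries mapping node names to values are assumptions, and "termination" here just means that every non-faulty node has decided by the time we look.

```python
def check_agreement(proposed, decided):
    """Check the three agreement conditions on one finished run.

    proposed: dict of node -> value that node wanted (hypothetical input)
    decided:  dict of non-faulty node -> decided value, or None if undecided
    """
    chosen = {v for v in decided.values() if v is not None}
    return {
        # Consistency: no two non-faulty nodes decided different values
        "consistency": len(chosen) <= 1,
        # Validity: every decided value was wanted by at least one node
        "validity": chosen <= set(proposed.values()),
        # Termination: every non-faulty node has chosen a value
        "termination": all(v is not None for v in decided.values()),
    }
```

    For example, if every node decides 1 and at least one node wanted 1, all three checks pass; if the nodes all decide a value nobody proposed, validity fails.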

  • Lecture 12 Page 6 CS 188,Winter 2015

    Challenges to Agreement

    Delays

      In message delivery

      In nodes responding to messages

    Failures

      And recovery from failures

    Lies by participants

      Or innocent errors that have similar effects

  • Lecture 12 Page 7 CS 188,Winter 2015

    Failures and Agreement

    Failures make agreement difficult

      Failed nodes don't participate

      Failed nodes sometimes recover at inconvenient times

      At worst, failed nodes participate in harmful ways

    Real failures are worse than fail-stop

  • Lecture 12 Page 8 CS 188,Winter 2015

    Types of Failures

    Fail-stop

      A nice, clean failure

      Processor stops executing anything

    Realistic failures

      Partitionings

      Arbitrary delays

    Adversarial failures

      Arbitrary bad things happen

  • Lecture 12 Page 9 CS 188,Winter 2015

    Election Algorithms

    If you get everyone to agree a particular node is in charge, future consensus is easy, since he makes the decisions

    How do you determine who's in charge?

      Statically

      Dynamically

  • Lecture 12 Page 10 CS 188,Winter 2015

    Static Leader Selection Methods

    Predefine one process/node as the leader

    Simple

    Everyone always knows who's the leader

    Not very resilient

      If the leader fails, then what?

  • Lecture 12 Page 11 CS 188,Winter 2015

    Dynamic Leader Selection Methods

    Choose a new leader dynamically whenever necessary

    More complicated

    But failure of a leader is easy to handle

      Just elect a new one

    Election doesn't imply voting

      Not necessarily majority-based

  • Lecture 12 Page 12 CS 188,Winter 2015

    Election Algorithms vs. Mutual Exclusion Algorithms

    Most mutual exclusion algorithms don't care much about failures

    Election algorithms are designed to handle failures

    Also, mutual exclusion algorithms only need a winner

    Election algorithms need everyone to know who won

  • Lecture 12 Page 13 CS 188,Winter 2015

    A Typical Use of Election Algorithms

    A group of processes wants to periodically take a distributed snapshot

    They don't want multiple simultaneous snapshots

    So they want one leader to order them to take the snapshot

  • Lecture 12 Page 14 CS 188,Winter 2015

    Problems in Election Algorithms

    Some of the nodes may have failed before the algorithm starts

    Some of the nodes may fail during the algorithm

    Some nodes may recover from failure

      Possibly at inconvenient times

    What about partitions?

  • Lecture 12 Page 15 CS 188,Winter 2015

    Election Algorithms and the Real Work

    The election algorithm is usually overhead

      There's a real computation you want to perform

      The election algorithm chooses someone to lead it

    Having two leaders while real computation is going on is bad

  • Lecture 12 Page 16 CS 188,Winter 2015

    The Bully Algorithm

    The biggest kid on the block gets to be the leader

    But what if the biggest kid on the block is taking his piano lesson?

    The next biggest kid gets to be leader

      Until the piano lesson is over . . .

  • Lecture 12 Page 17 CS 188,Winter 2015

    Electing a Bully

    The kids come out to play

    Hey, Spike!

    Spike's Mom hasn't let him out yet

    Hey, Butch!

    I'm here, who else is?

    Peewee! Cuthbert!

    I'm the leader, let's play tag!

    The piano lesson ends

    Cuthbert! Peewee! Butch! I'm the leader, and we're playing baseball!

    Hey, Spike! Hey, Spike!

    I'm here, where are you sissies?

  • Lecture 12 Page 18 CS 188,Winter 2015

    Assumptions of the Bully Algorithm

    A static set of possible participants

      With an agreed-upon order

    All messages are delivered within Tm seconds

    All responses are sent within Tp seconds of delivery

    These last two imply synchronous behavior

  • Lecture 12 Page 19 CS 188,Winter 2015

    The Basic Idea Behind the Bully Algorithm

    Possible leaders try to take over

    If they detect a better leader, they agree to its leadership

    Keep track of state information about whether you are electing a leader

    Only do real work when you agree on a leader

  • Lecture 12 Page 20 CS 188,Winter 2015

    The Bully Algorithm and Timeouts

    Call out the biggest kid's name

    If he doesn't answer soon enough, call out the next biggest kid's name

      Until you hear an answer

      Or the caller is the biggest kid

    Then take over, by telling everyone else you're the leader
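
    A minimal sketch of this calling-out loop, under stated assumptions: `priority` is the agreed-upon order (biggest first), and `responds` is a hypothetical stand-in for "sent a message and got a reply before the timeout". If requests and replies each take at most Tm and a node replies within Tp of delivery, a reply must arrive within 2*Tm + Tp, which is what `reply_timeout` computes. Neither function name comes from the lecture.

```python
def reply_timeout(t_m, t_p):
    """Longest wait for a reply: request (Tm) + processing (Tp) + reply (Tm)."""
    return 2 * t_m + t_p


def bully_election(caller, priority, responds):
    """Return the leader as seen by `caller`.

    priority: node names in the agreed-upon order, biggest first
    responds: callable(node) -> bool, True if the node answered in time
              (hypothetical; stands in for real messaging with timeouts)
    """
    for candidate in priority:
        if candidate == caller:
            # Nobody bigger answered, so the caller takes over
            return caller
        if responds(candidate):
            # A bigger node is up; defer to it
            return candidate
    raise ValueError("caller is not in the priority list")
```

    With the kids from the earlier slide, Peewee calling while Spike is at his piano lesson ends up with Butch as leader; once Spike responds again, any new election picks Spike.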

  • Lecture 12 Page 21 CS 188,Winter 2015

    The Bully Algorithm At Work

    One node is currently the coordinator

    It expects a certain set of nodes to be up and participating

    The coordinator asks all other nodes

    If an expected node doesn't answer, start an election

      Also if it answers in the negative

    If an unexpected node answers, start an election

  • Lecture 12 Page 22 CS 188,Winter 2015

    The Practicality of the Bully Algorithm

    The bully algorithm works reasonably well if the timeouts are effective

      A timeout occurring really means the site in question is down

    And there are no partitions at all

      If there are, what happens?

  • Lecture 12 Page 23 CS 188,Winter 2015

    The Invitation Algorithm

    More practical than the bully algorithm

    Doesn't depend on timeouts

    But its results are not as definitive

    An asynchronous algorithm

  • Lecture 12 Page 24 CS 188,Winter 2015

    The Basic Idea Behind the Invitation Algorithm

    A current coordinator tries to get all other nodes to agree to his leadership

    If more than one coordinator is around, get together and merge groups

    Use timeouts only to allow progress, not to make definitive decisions

    No set priorities for who will be coordinator

  • Lecture 12 Page 25 CS 188,Winter 2015

    The Invitation Algorithm and Group Numbers

    The invitation algorithm recruits a group of nodes to work together

    More than one group can exist simultaneously

    Group numbers identify the group

    Why not identify with coordinator ID?

      Because one node can serially coordinate many groups
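
    One way to get group numbers that differ even when the coordinator stays the same is to pair the coordinator's ID with a local counter. This is an assumed encoding for illustration only; the slides do not say how group numbers are built, and the class and method names are hypothetical.

```python
import itertools


class Coordinator:
    """Hands out a fresh group number each time it forms a group."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._sequence = itertools.count(1)  # local counter, never reused

    def new_group_number(self):
        # (coordinator ID, sequence) is distinct for each group this node
        # forms, and distinct from any other coordinator's group numbers
        return (self.node_id, next(self._sequence))
```

    Two groups formed in a row by node 1 then carry different numbers, such as (1, 1) and (1, 2), so a message from the first group can be told apart from one belonging to the second.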

  • Lecture 12 Page 26 CS 188,Winter 2015

    The Basic Operation of the Invitation Algorithm

    Coordinators in a normal state periodically check all other nodes

    If any other node is a coordinator, try to merge the groups

    If timeouts occur, don't worry about it

    Also don't worry if a response to a check comes from this or an earlier request

  • Lecture 12 Page 27 CS 188,Winter 2015

    Merging in the Invitation Algorithm

    Merging always requires forming a new group

      May have same coordinator, but different group number

    Coordinator who initiates merge asks all other known coordinators to merge

      They ask their group members

      Original group members also asked

  • Lecture 12 Page 28 CS 188,Winter 2015

    A Simplified Example

    [Figure: four nodes, numbered 1 through 4, exchanging the messages listed below]

    Node 1 checks for other coordinators

      AreYouCoordinator? (sent to two nodes; one answers Yes, the other No)

    So node 1 finds another coordinator

    Node 1 asks the other coordinator and his old node to join his group

      Invite

      Invite on behalf of node 1

      Accept

    UP = {1,2,3,4}

      Ready

    If all members of UP{} respond, we're fine

    Node 1 forms a new group

  • Lecture 12 Page 29 CS 188,Winter 2015

    The Reorganization State

    Nodes enter the reorganization state after getting their answer

    What's the point of this state?

      Why not just start up the group?

      After all, we all know who's going to be a member

      Or do we?

  • Lecture 12 Page 30 CS 188,Winter 2015

    Why We Need Another Round of Messages

    [Figure: nodes 1, 2, 3, and 4 exchanging Invitation messages]

    Who does 1 think will join the group, at this point? 2 and 3

    Assuming no timeouts, 4 will also join

      And 2 needs to know that

    And what if someone crashes?

      Presumably not accepting the invitation?

  • Lecture 12 Page 31 CS 188,Winter 2015

    Timeouts in the Merge

    Don't worry too much about them

    Some nodes respond before the timeout

      Some don't

    If you don't catch them this time, you might the next

  • Lecture 12 Page 32 CS 188,Winter 2015

    Straggler Messages

    This algorithm is asynchronous

      So messages may come in late

    What do we do when messages arrive late?

      Mostly, reject them

    How do we tell?

      Messages contain group number
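
    The group-number check can be sketched as follows. The `Message` and `Member` types are hypothetical, and the sketch covers only the common case of rejecting a message whose group number does not match the receiver's current group; messages that legitimately start a new group (such as invitations) would need separate handling.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    group: int   # group number the sender believed it was in
    kind: str    # e.g. "Ready" (message kinds here are illustrative)


@dataclass
class Member:
    group: int                        # this node's current group number
    handled: list = field(default_factory=list)

    def receive(self, msg):
        """Process msg only if it belongs to the current group."""
        if msg.group != self.group:
            return False              # straggler from another group: reject
        self.handled.append(msg.kind)
        return True
```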

  • Lecture 12 Page 33 CS 188,Winter 2015

    Multiple Simultaneous Groups

    The invitation algorithm allows multiple simultaneous groups to exist

      Each with a proper coordinator

    Is this a good thing?

      No, but what are the alternatives?

    No node ever belongs to more than one group, at least

  • Lecture 12 Page 34 CS 188,Winter 2015

    Paxos

    A family of algorithms that allow a distributed system to reach agreement

      In the face of delays and failures

    Can't perfectly guarantee progress

      But makes progress in realistic conditions

    Does guarantee consistency

    Usually defined to reach consensus on some value v

  • Lecture 12 Page 35 CS 188,Winter 2015

    Paxos Assumptions

    Processors are of variable speed and may fail

      Might recover after failure

      But they don't lie

    Any processor can send a message to any other processor

    Messages can be lost, arbitrarily delayed, reordered, or duplicated

      But never corrupted

  • Lecture 12 Page 36 CS 188,Winter 2015

    Paxos Processor Roles

    Client

      Issues a request, waits for a response

    Acceptor/voter

      Remembers things for the protocol

    Proposer (simpler if there's only one)

      Assists client in getting a response

    Learner

      Actually executes a request

    Leader

      One of the proposers that leads the process

    One processor can play several roles

      Usually, all processes are acceptors, proposers, and learners

  • Lecture 12 Page 37 CS 188,Winter 2015

    Paxos Quorums

    Collections of acceptors that make decisions

      Several different quorums in system

    Messages are sent to quorums, not single acceptors

      A message is only effective if all quorum members receive it

      Similarly, all acceptors in a quorum must send a message for it to be effective

    If any member of the quorum survives, its decisions survive

  • Lecture 12 Page 38 CS 188,Winter 2015

    Quorum Membership

    All quorums must contain a majority of all acceptors in the system

    Any two quorums must share at least one acceptor

    E.g., if there are four acceptors {1,2,3,4}, quorums might be: {1,2,3}, {1,2,4}, {2,3,4}, {1,3,4}
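
    The two membership rules are linked: any two majorities of the same set must overlap. The sketch below enumerates the smallest majority quorums and checks the overlap property; the function names are hypothetical, and larger supersets of these quorums would qualify as well.

```python
from itertools import combinations


def majority_quorums(acceptors):
    """All smallest subsets that contain a majority of the acceptors."""
    size = len(acceptors) // 2 + 1
    return [frozenset(q) for q in combinations(sorted(acceptors), size)]


def all_pairs_intersect(quorums):
    """True if every two quorums share at least one acceptor."""
    return all(a & b for a, b in combinations(quorums, 2))
```

    For the four acceptors {1,2,3,4}, `majority_quorums` yields exactly the four three-member quorums listed on the slide, and every pair of them shares at least two acceptors.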

  • Lecture 12 Page 39 CS 188,Winter 2015

    Paxos Rounds

    Paxos proceeds in rounds

      In response to a client request

    If the round reaches agreement, the client gets a response

    If not, you start another round

      Continue till a round reaches agreement

  • Lecture 12 Page 40 CS 188,Winter 2015

    A Simple Paxos Round

    [Figure: client C, proposer P, acceptors A1, A2, and A3, learners L1 and L2]

    1. request

    2. prepare(N)

      N is a bigger number than P has ever used or seen before

    3. promise(N, null), one from each of the three acceptors

      If an acceptor ever promised on this item before, it returns the generation and value from that run of Paxos, not null

    4. accept(N, Vres)

      Vres is a result chosen by P, if no promise had a value

    5. accepted(N, Vmax)

    6. response
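
    The prepare/promise and accept/accepted steps can be sketched in a few lines. This is a simplified, single-process illustration rather than a full Paxos implementation: there is no networking, no client or learner, messages go to every acceptor with a majority counted as the quorum, and the names `Acceptor` and `run_round` are hypothetical. An acceptor that already accepted a value reports it in its promise, and the proposer must then reuse the value from the highest-numbered such report instead of its own Vres.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    promised_n: int = -1                          # highest N promised so far
    accepted: Optional[Tuple[int, str]] = None    # (N, value) last accepted

    def prepare(self, n):
        """Step 2 -> 3: promise not to take part in rounds below n."""
        if n <= self.promised_n:
            return None                           # stale round: no promise
        self.promised_n = n
        return ("promise", n, self.accepted)      # accepted is None if new

    def accept(self, n, value):
        """Step 4 -> 5: accept unless a higher round was promised."""
        if n < self.promised_n:
            return None
        self.promised_n = n
        self.accepted = (n, value)
        return ("accepted", n, value)


def run_round(acceptors, n, v_res):
    """One round driven by a proposer; returns the chosen value or None."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None                               # round failed; try a larger n
    earlier = [p[2] for p in promises if p[2] is not None]
    # Reuse the value from the highest-numbered earlier acceptance, if any
    value = max(earlier, key=lambda nv: nv[0])[1] if earlier else v_res
    acks = [r for r in (a.accept(n, value) for a in acceptors) if r is not None]
    return value if len(acks) >= quorum else None
```

    Running round 1 with Vres "x" on three fresh acceptors chooses "x". A later round 2 that proposes "y" still returns "x", which is the consistency guarantee at work, and a proposer that retries with the stale number 1 gets no promises at all.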

  • Lecture 12 Page 41 CS 188,Winter 2015

    The Point of Different Paxos Roles

    [Figure: client C, proposer P, acceptors A1, A2, and A3, learners L1 and L2]

    The client wants to get something done

    The proposer coordinates protocol activities

    The acceptors ensure proper concurrent behavior and handle proposer failures

    The learners ensure redundant memory of the result of a decision

    Remember! One machine can play multiple roles

  • Lecture 12 Page 42 CS 188,Winter 2015

    Paxos Error Handling

    Some cases simple, some complex

    A simple case: one of the acceptors fails

      If there's still a quorum, no problem

      Go ahead without him

    Another simple case: one of the learners failed

      If any learners are left, they'll provide the right response to the client

  • Lecture 12 Page 43 CS 188,Winter 2015

    More Complex Error Cases

    Things like failure of proposer in middle of a round

    Paxos chooses a new leader and uses him from this point

    What if the old leader comes back?

      Even more complex, but it works out

  • Lecture 12 Page 44 CS 188,Winter 2015

    Paxos and Overheads

    Generally quite expensive

      In messages and thus delays

    Many optimizations possible

      Some don't alter the protocol characteristics

      Some trade off handling some error conditions for better performance

  • Lecture 12 Page 45 CS 188,Winter 2015

    Byzantine Agreement

    Life can be a lot worse than merely being unable to rely on timeouts

    What if one of the nodes we're working with is lying?

    How can we reach agreement if we can't trust all the participants?

  • Lecture 12 Page 46 CS 188,Winter 2015

    The Purpose of Byzantine Agreement

    Well, why would one of our distributed system components lie?

    It probably wouldn't

    But it might contain a bug

    If it contains the worst possible bug, what can it do?

      Essentially, inadvertently lie

  • Lecture 12 Page 47 CS 188,Winter 2015

    The Realism of Byzantine Agreement

    It isn't realistic

      It doesn't really happen

      No one really uses it

    But it demonstrates a limit on how badly things can go while still allowing agreement

  • Lecture 12 Page 48 CS 188,Winter 2015

    Why Is It Called Byzantine?

    After the fall of Rome itself, the empire lived on in the east

      Called Byzantium

    Byzantium survived for around 1000 years

    The Byzantines were famous for their treachery and double-dealing

  • Lecture 12 Page 49 CS 188,Winter 2015

    The Byzantine General Problem

    Several Byzantine generals each command their own army

    They are far apart and communicate with messengers

    The emperor wants to attack the Turks

      If all generals attack, they'll win

      Even if a majority attack, they'll win

      Retreating is OK, if everyone does it

    But the Turks may have bribed some generals

  • Lecture 12 Page 50 CS 188,Winter 2015

    The Complete Problem Statement

    Messages are point-to-point

    Messages are reliably delivered, with a predictable timeout

      Failure to receive a message in time means the sender is a traitor

    Traitors can send any messages they please

      But cannot forge their identities

  • Lecture 12 Page 51 CS 188,Winter 2015

    How Many Traitors Is Too Many?

    Can all the loyal generals reach agreement on whether to attack or retreat?

    Or can the traitors prevent them from reaching any agreement?

    How many generals must the Turks bribe before no agreement is possible?

  • Lecture 12 Page 52 CS 188,Winter 2015

    The Answer

    If the Turks bribe 1/3 of the generals, the remaining 2/3 cannot reach agreement

    How can that be? Why not just a majority?

    Easiest to consider in the case of a commander
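
    Stated as arithmetic, the standard bound for unsigned messages is that n generals can tolerate m traitors only if n >= 3m + 1. The helper names below are hypothetical; the sketch simply evaluates that inequality.

```python
def agreement_possible(n_generals, n_traitors):
    """Unsigned messages: agreement needs n >= 3m + 1."""
    return n_generals >= 3 * n_traitors + 1


def max_traitors_tolerated(n_generals):
    """Largest m that satisfies n >= 3m + 1."""
    return max(0, (n_generals - 1) // 3)
```

    So three generals cannot survive even one traitor, while four can, which is the case the following slides work through.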

  • Lecture 12 Page 53 CS 188,Winter 2015

    The 3-General Byzantine Problem

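    [Figure: a commander and two other generals]

    What if they're all loyal?

      The commander sends Attack to both generals

      Everyone attacks and the Turk is vanquished

    But what if the commander is a traitor?

      The commander sends Attack to one general and Retreat to the other

      One general attacks, one retreats, the traitor pockets the bribe, and the Turks win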

  • Lecture 12 Page 54 CS 188,Winter 2015

    Can't the Loyal Generals Check Their Orders?

    [Figure: commander 1 sends Attack to one of generals 2 and 3 and Retreat to the other]

    Generals 2 and 3 check their orders

      One reports Retreat, the other reports Attack

    They figure out 1 is a traitor and come to their own agreement

  • Lecture 12 Page 55 CS 188,Winter 2015

    But What if the Commander Wasn't the Traitor?

    [Figure: commander 1 sends Attack to both generals 2 and 3]

    3 is the traitor, this time

    Generals 2 and 3 check their orders

      One reports Retreat, the other reports Attack

    They figure out 1 is a traitor and come to their own agreement

    But 1 isn't the traitor, 3 is the traitor

    He convinces 2 to retreat, 1 is slaughtered attacking, and 3 pockets the bribe

  • Lecture 12 Page 56 CS 188,Winter 2015

    Can General 2 Tell Which Scenario Is Occurring?

    When 1 was the traitor, 2 saw: Attack from 1, Retreat reported by 3

    When 3 was the traitor, 2 saw: Attack from 1, Retreat reported by 3

    2 can't tell the difference, so he can't decide whether to attack or retreat

  • Lecture 12 Page 57 CS 188,Winter 2015

    What If There Were 4 Generals?

    [Figure: commander 1 and generals 2, 3, and 4; the commander sends Attack, Attack, Retreat]

    What if the commander (1) is the traitor?

    If he doesn't send some messages, he'll be seen as the traitor

    But what can he send?

  • Lecture 12 Page 58 CS 188,Winter 2015

    Can the Three Loyal Generals Reach Agreement?

    [Figure: commander 1 and generals 2, 3, and 4; the commander sends Attack, Attack, Retreat]

    They can exchange all the messages and let the majority rule

    Since there are only two possible messages, the commander must have sent the same message to at least two nodes

    If the commander is loyal and someone else is lying, the majority represents the loyal commander's will
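
    A minimal sketch of "exchange all the messages and let the majority rule", under simplifying assumptions: one round of exchange, loyal generals relay the order they received truthfully, and a traitorous general relays the same fixed lie to everyone. The function names and the retreat-on-ties default are choices made for this illustration.

```python
from collections import Counter


def majority(orders, tie_breaker="retreat"):
    """Most common order; fall back to tie_breaker when the top two tie."""
    ranked = Counter(orders).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_breaker
    return ranked[0][0]


def exchange_and_decide(commander_orders, traitors=frozenset(), lie="retreat"):
    """commander_orders: dict of general -> order received from commander.

    Returns dict of loyal general -> decision after one exchange round.
    """
    decisions = {}
    for me in commander_orders:
        if me in traitors:
            continue                      # we only care what loyal nodes do
        seen = [commander_orders[me]]     # my own order from the commander
        for other in commander_orders:
            if other != me:
                # Loyal peers relay what they got; traitors relay the lie
                seen.append(lie if other in traitors
                            else commander_orders[other])
        decisions[me] = majority(seen)
    return decisions
```

    With a traitorous commander sending Attack, Attack, Retreat, all three loyal generals see the same three orders and all decide to attack. With a loyal commander and one lying general, the two loyal generals still follow the commander.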

  • Lecture 12 Page 59 CS 188,Winter 2015

    But What if There Were Five Generals?

    [Figure: commander 1 and generals 2, 3, 4, and 5; the commander sends Attack, Attack, Retreat, Retreat]

    Pre-arrange a tie-breaker

      E.g., always retreat on ties

    All the loyal generals then retreat

    And the traitor must explain his failure to the Turks

  • Lecture 12 Page 60 CS 188,Winter 2015

    What If You Don't Want a Commander?

    What if you want everyone to vote? And accept the majority?

    With the guarantee that all loyal nodes abide by the majority?

    Serially treat each node as the commander

      Reach agreement on his vote

      Then move on to the next node

  • Lecture 12 Page 61 CS 188,Winter 2015

    The Trick Behind Byzantine Agreement

    Everyone must know what everyone else thinks about everything else

    Not just what I think the commander said, but what everyone else claims the commander said

    Resulting algorithms are tricky and expensive

    But it could be (and will be) worse

  • Lecture 12 Page 62 CS 188,Winter 2015

    Authenticated Byzantine Agreement

    What if the messages are signed in an unforgeable way?

    Then dishonest generals can't lie about what honest generals told them

    In this case, honest generals reach agreement regardless of how many are dishonest