SD T 05 ToleranciaFalhas.pdf


Distributed Systems: Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science
Room R4.20, [email protected]

Chapter 08: Fault Tolerance

Version: December 2, 2010


    Contents

01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems


    Fault Tolerance

    Introduction

- Basic concepts
- Process resilience
- Reliable client-server communication
- Reliable group communication
- Distributed commit
- Recovery


    Fault Tolerance 8.1 Introduction

    Dependability

Basics
A component provides services to clients. To provide those services, the component may in turn require services from other components: a component may depend on some other component.

Specifically
A component C depends on C* if the correctness of C's behavior depends on the correctness of C*'s behavior. Note: components are processes or channels.

- Availability: readiness for usage
- Reliability: continuity of service delivery
- Safety: very low probability of catastrophes
- Maintainability: how easily a failed system can be repaired


    Terminology

Subtle differences
- Failure: a failure occurs when a component does not live up to its specifications
- Error: the part of a component's state that can lead to a failure
- Fault: the cause of an error

    What to do about faults

- Fault prevention: prevent the occurrence of a fault
- Fault tolerance: build a component such that it can mask the presence of faults
- Fault removal: reduce the presence, number, and seriousness of faults
- Fault forecasting: estimate the present number, future incidence, and consequences of faults


    Failure models

    Failure semantics

- Crash failures: the component halts, but behaves correctly before halting
- Omission failures: the component fails to respond
- Timing failures: output is correct, but lies outside a specified real-time interval (performance failures: too slow)
- Response failures: output is incorrect (but can at least not be attributed to another component)
  - Value failure: a wrong value is produced
  - State-transition failure: execution brings the component into a wrong state
- Arbitrary failures: the component produces arbitrary output and may be subject to arbitrary timing failures


    Crash failures

    Problem

Clients cannot distinguish between a crashed component and one that is just a bit slow.

Consider a server from which a client is expecting output:

- Is the server perhaps exhibiting timing or omission failures?
- Is the channel between client and server faulty?

    Assumptions we can make

- Fail-silent: the component exhibits omission or crash failures; clients cannot tell what went wrong
- Fail-stop: the component exhibits crash failures, but its failure can be detected (either through announcement or timeouts)
- Fail-safe: the component exhibits arbitrary, but benign failures (they can't do any harm)


    Fault Tolerance 8.2 Process Resilience

    Process resilience

Basic issue
Protect yourself against faulty processes by replicating and distributing computations in a group.

- Flat groups: good for fault tolerance, as information exchange immediately occurs with all group members; however, they may impose more overhead, as control is completely distributed (hard to implement).
- Hierarchical groups: all communication goes through a single coordinator; not really fault tolerant or scalable, but relatively easy to implement.


Process resilience

[Figure: (a) a flat group, in which all members communicate directly with each other; (b) a hierarchical group, in which workers communicate through a single coordinator]


    Groups and failure masking

k-fault tolerant group
A group is k-fault tolerant when it can mask any k concurrent member failures (k is called the degree of fault tolerance).

How large does a k-fault tolerant group need to be?

- Assuming crash/performance failure semantics, a total of k + 1 members is needed to survive k member failures.
- Assuming arbitrary failure semantics, with the group output defined by voting, a total of 2k + 1 members is needed to survive k member failures.

Assumption
All members are identical and process all input in the same order; only then can we be sure that they do exactly the same thing.
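The two group sizes above can be captured in a tiny helper. This is a minimal sketch added for illustration (the function name and interface are assumptions, not from the slides):

```python
def required_group_size(k, semantics="crash"):
    """Minimum group size needed to mask k concurrent member failures.

    'crash'     : crash/performance failure semantics -> k + 1 members
    'arbitrary' : arbitrary failure semantics with output decided by
                  majority vote -> 2k + 1 members
    """
    if semantics == "crash":
        return k + 1
    if semantics == "arbitrary":
        return 2 * k + 1
    raise ValueError("unknown failure semantics: " + semantics)

print(required_group_size(2, "crash"))      # 3
print(required_group_size(2, "arbitrary"))  # 5
```

With voting, 2k + 1 members guarantee that the k faulty members can never outvote the k + 1 correct ones.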


    Groups and failure masking

Scenario
Group members are not identical, i.e., we have a distributed computation. Nonfaulty group members should reach agreement on the same value.

[Figure: the agreement problem for three processes, one of which is faulty: (a) process 2 tells different things to processes 1 and 3; (b) process 3 passes on a different value than the one it received]


    Groups and failure masking

Scenario
Assuming arbitrary failure semantics, we need 3k + 1 group members to survive the attacks of k faulty members. These are also known as Byzantine failures.

Essence
We are trying to reach a majority vote among the group of loyalists, in the presence of k traitors: we need 2k + 1 loyalists.


    Groups and failure masking

[Figure: the Byzantine agreement problem for four processes (n = 4, k = 1), where process 3 is faulty: (a) each process sends its value to the others, the faulty process reporting x, y, z to the three loyal processes; (b) the vectors Got(1, 2, x, 4), Got(1, 2, y, 4), Got(1, 2, 3, 4), Got(1, 2, z, 4) that each process assembles from the first round; (c) the vectors each loyal process receives from the others in the second round, the faulty process forwarding garbage (a..l). Taking a per-position majority, the loyal processes agree on values 1, 2, and 4, and mark position 3 as unknown]
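The two-round exchange just described can be sketched as a small simulation. This is an illustrative assumption of this text, not the slides' code; the function names and the way the traitor's garbage is modeled are invented for the example:

```python
from collections import Counter

def majority(xs):
    """Return the strict majority value of xs, or None if there is none."""
    val, cnt = Counter(xs).most_common(1)[0]
    return val if cnt > len(xs) // 2 else None

def byzantine_agreement(values, faulty, round1_lies, round2_garbage):
    """Two-round exchange for n = 3k + 1 processes with k = 1 traitor.

    values         : true value of each process, indexed 0..n-1
    faulty         : index of the faulty process
    round1_lies    : value the traitor reports to each loyal receiver
    round2_garbage : vector the traitor forwards to each loyal receiver
    """
    n = len(values)
    # Round 1: every process announces its value; the traitor lies,
    # possibly differently, to each receiver.
    got = {p: [round1_lies[p] if q == faulty else values[q]
               for q in range(n)]
           for p in range(n) if p != faulty}
    # Round 2: every process forwards the vector it assembled; the
    # traitor forwards garbage. Each loyal process then takes an
    # elementwise majority over the vectors it received from the others.
    decided = {}
    for p in got:
        received = [round2_garbage[p] if q == faulty else got[q]
                    for q in range(n) if q != p]
        decided[p] = [majority([v[i] for v in received])
                      for i in range(n)]
    return decided

decided = byzantine_agreement(
    values=[1, 2, 99, 4], faulty=2,
    round1_lies={0: "x", 1: "y", 3: "z"},
    round2_garbage={0: ["a", "b", "c", "d"],
                    1: ["e", "f", "g", "h"],
                    3: ["i", "j", "k", "l"]})
# Every loyal process decides [1, 2, None, 4]: agreement on the loyal
# values, with "no majority" (None) for the traitor's slot.
```

With n = 4 and k = 1, each loyal process sees two correct copies and one garbage vector, so the majority is always decided by the loyal copies.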


    Groups and failure masking

[Figure: the same exchange with only three processes (n = 3, k = 1), where process 3 is faulty: in the second round each loyal process receives three different values for the contested positions and no majority exists, so agreement cannot be reached. With 2k + 1 = 3 processes, k arbitrary failures cannot be masked]


    Groups and failure masking

Issue
What are the necessary conditions for reaching agreement?

[Figure: circumstances under which distributed agreement can be reached, as a function of process behavior (synchronous vs. asynchronous), communication delay (bounded vs. unbounded), message ordering (ordered vs. unordered), and message transmission (unicast vs. multicast); an X marks each combination for which agreement is possible]

- Process: do processes operate synchronously, in lockstep?
- Delays: are communication delays bounded?
- Ordering: are messages delivered in the order they were sent?
- Transmission: are messages sent one-by-one, or multicast?


    Failure detection

Essence
We detect failures through timeout mechanisms.

- Setting timeouts properly is very difficult and application dependent
- You cannot distinguish process failures from network failures
- We need to consider failure notification throughout the system:
  - Gossiping (i.e., proactively disseminate a failure detection)
  - On failure detection, pretend you failed as well
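A timeout-based detector can be sketched in a few lines. This is a minimal, assumed illustration (class and method names are invented); note how it embodies the caveat above: a "suspected" process may merely be slow or unreachable, not crashed:

```python
import time

class TimeoutFailureDetector:
    """Heartbeat-based failure detector sketch: a process is suspected
    once no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable for deterministic tests
        self.last_seen = {}

    def heartbeat(self, process):
        """Record that a heartbeat from `process` just arrived."""
        self.last_seen[process] = self.clock()

    def suspected(self, process):
        """True if `process` has been silent longer than the timeout."""
        last = self.last_seen.get(process)
        return last is None or self.clock() - last > self.timeout

# Demo with a fake clock so the example is deterministic.
now = [0.0]
fd = TimeoutFailureDetector(timeout=2.0, clock=lambda: now[0])
fd.heartbeat("p1")
now[0] = 1.0
print(fd.suspected("p1"))   # False: a heartbeat was seen recently
now[0] = 5.0
print(fd.suspected("p1"))   # True: silent too long (crashed, or just slow?)
```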


    Fault Tolerance 8.3 Reliable Communication


    Reliable communication

So far
We have concentrated on process resilience (by means of process groups). What about reliable communication channels?

Error detection
- Framing of packets to allow for bit-error detection
- Use of frame numbering to detect packet loss

Error correction
- Add enough redundancy that corrupted packets can be automatically corrected
- Request retransmission of lost packets, or of the last N packets
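Loss detection by frame numbering, mentioned above, amounts to scanning the received sequence numbers for gaps. A minimal sketch (the function name is an assumption for illustration):

```python
def detect_lost_frames(received_seqnos):
    """Given the sequence numbers of the frames that actually arrived,
    in order, report which frame numbers are missing and should be
    requested for retransmission."""
    lost, expected = [], 0
    for seq in received_seqnos:
        lost.extend(range(expected, seq))  # every skipped number is a loss
        expected = seq + 1
    return lost

print(detect_lost_frames([0, 1, 3, 4, 7]))  # [2, 5, 6]
```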


    Fault Tolerance 8.4 Reliable Group Communication


    Reliable multicasting

Observation
If we can stick to a local-area network, reliable multicasting is easy.

Principle
Let the sender log the messages submitted to channel c:

- If P sends message m, m is stored in a history buffer
- Each receiver acknowledges receipt of m, or requests retransmission at P when noticing a message was lost
- Sender P removes m from the history buffer when everyone has acknowledged receipt

Question
Why doesn't this scale?
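The sender's side of the history-buffer scheme above can be sketched as follows (an assumed illustration; class and method names are invented, and the actual network send is elided):

```python
class ReliableMulticastSender:
    """History-buffer sender: messages stay buffered until every known
    receiver has acknowledged them; a receiver that noticed a loss can
    ask for a retransmission of any still-buffered message."""

    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.history = {}        # seqno -> message (the history buffer)
        self.acked = {}          # seqno -> set of receivers that acked
        self.next_seq = 0

    def send(self, message):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        self.acked[seq] = set()
        return seq               # in reality: multicast (seq, message)

    def ack(self, seq, receiver):
        self.acked[seq].add(receiver)
        if self.acked[seq] == self.receivers:
            # Everyone has the message: drop it from the buffer.
            del self.history[seq]
            del self.acked[seq]

    def retransmit(self, seq):
        return self.history[seq]  # resend on a receiver's request

sender = ReliableMulticastSender({"r1", "r2"})
s = sender.send("m0")
sender.ack(s, "r1")
print(sender.retransmit(s))  # 'm0' is still buffered: r2 has not acked
sender.ack(s, "r2")
print(s in sender.history)   # False: buffer entry removed
```

The per-message acknowledgment from every receiver is exactly what limits scalability: with many receivers, the sender is flooded with feedback.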


    Scalable reliable multicasting: Feedback suppression

Basic idea
Let a process P suppress its own feedback when it notices that another process Q is already asking for a retransmission.

Assumptions
- All receivers listen to a common feedback channel to which feedback messages are submitted
- Process P schedules its own feedback message randomly, and suppresses it when observing another feedback message
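The suppression mechanism can be sketched as follows, assuming (as an illustration, not from the slides) uniformly random timers and an idealized feedback channel on which the earliest NACK is heard by everyone before their own timers expire:

```python
import random

def nack_feedback(num_receivers, max_delay=5.0, rng=random):
    """Every receiver that missed a message schedules a NACK after a
    random delay; the receiver whose timer expires first multicasts its
    NACK on the shared feedback channel, and all receivers with later
    timers suppress theirs."""
    delays = {r: rng.uniform(0, max_delay) for r in range(num_receivers)}
    first = min(delays, key=delays.get)
    sent = [first]                                   # only this NACK goes out
    suppressed = [r for r in delays if r != first]   # everyone else stays quiet
    return sent, suppressed

sent, suppressed = nack_feedback(4)
print(len(sent))  # 1: the sender sees a single NACK
```

The random schedule is what spreads the timers out; if all receivers used the same delay, their NACKs would collide and none would be suppressed.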


    Scalable reliable multicasting: Feedback suppression

[Figure: several receivers that missed a message schedule a retransmission request (NACK) with random timers T = 1 … T = 4; the receiver whose timer expires first multicasts its NACK over the network, the other receivers suppress their feedback, and the sender receives only one NACK]

Question
Why is the random schedule so important?


    Scalable reliable multicasting: Hierarchical solutions

[Figure: a hierarchical setup for scalable reliable multicasting: receivers R are grouped into local-area networks, each with a coordinator C; the coordinators form a tree rooted near the sender S, linked by (long-haul) connections]

Question
What's the main problem with this solution?


    Atomic multicast

[Figure: virtually synchronous reliable multicast across group view changes. Starting from G = {P1, P2, P3, P4} after P1 joins the group, P3 crashes and its partial multicast is discarded; the group continues as G = {P1, P2, P4} until P3 rejoins, restoring G = {P1, P2, P3, P4}. Reliable multicasting is implemented by multiple point-to-point messages]

Idea
Formulate reliable multicasting in the presence of process failures in terms of process groups and changes to group membership.



Guarantee
A message is delivered only to the nonfaulty members of the current group, and all members agree on the current group membership: this is virtually synchronous multicast.


    Fault Tolerance 8.5 Distributed Commit


    Distributed commit

- Two-phase commit
- Three-phase commit

Essential issue
Given a computation distributed across a process group, how can we ensure that either all processes commit to the final result, or none of them do (atomicity)?


    Two-phase commit

Model
The client that initiated the computation acts as the coordinator; the processes required to commit are the participants.

- Phase 1a: The coordinator sends a vote-request to the participants (also called a pre-write)
- Phase 1b: When a participant receives vote-request, it returns either vote-commit or vote-abort to the coordinator. If it sends vote-abort, it aborts its local computation
- Phase 2a: The coordinator collects all votes; if all are vote-commit, it sends global-commit to all participants, otherwise it sends global-abort
- Phase 2b: Each participant waits for global-commit or global-abort and handles it accordingly
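The failure-free message flow of the four phases can be sketched in a few lines. This is an assumed illustration only: it omits the logging, timeouts, and ACKs a real 2PC implementation needs, and each participant is modeled simply as a voting function:

```python
def two_phase_commit(participants):
    """Failure-free sketch of the 2PC flow above. Each participant is
    a function vote(request) -> 'vote-commit' or 'vote-abort'."""
    # Phase 1a/1b: send vote-request and collect the votes.
    votes = [p("vote-request") for p in participants]
    # Phase 2a: the coordinator decides; a single abort vote aborts all.
    decision = ("global-commit" if all(v == "vote-commit" for v in votes)
                else "global-abort")
    # Phase 2b: every participant would now receive this decision.
    return decision

always_yes = lambda req: "vote-commit"
always_no = lambda req: "vote-abort"
print(two_phase_commit([always_yes, always_yes]))             # global-commit
print(two_phase_commit([always_yes, always_no, always_yes]))  # global-abort
```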


    Two-phase commit

[Figure: the finite state machines of (a) the coordinator and (b) a participant in 2PC. Coordinator: INIT -(Commit / send Vote-request)-> WAIT; WAIT -(Vote-abort / send Global-abort)-> ABORT, or WAIT -(Vote-commit / send Global-commit)-> COMMIT. Participant: INIT -(Vote-request / send Vote-commit)-> READY, or INIT -(Vote-request / send Vote-abort)-> ABORT; READY -(Global-abort / send ACK)-> ABORT, or READY -(Global-commit / send ACK)-> COMMIT]


    2PC Failing participant

Scenario
A participant crashes in state S, and recovers to S:

- Initial state: no problem: the participant was unaware of the protocol
- Ready state: the participant is waiting to either commit or abort. After recovery, the participant needs to know which state transition to make: log the coordinator's decision
- Abort state: merely make the entry into the abort state idempotent, e.g., by removing the workspace of results
- Commit state: also make the entry into the commit state idempotent, e.g., by copying the workspace to storage

Observation
When distributed commit is required, having participants use temporary workspaces to keep their results allows for simple recovery in the presence of failures.
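The per-state recovery rules can be summarized in a small decision function. This is a sketch added for illustration (function and state names are assumptions matching the 2PC states, not the slides' code):

```python
def recover_participant(crashed_state, logged_decision):
    """Decide what a 2PC participant should do after crashing in
    `crashed_state` and recovering. `logged_decision` is the logged
    coordinator decision ('global-commit'/'global-abort'), or None if
    no decision was logged before the crash."""
    if crashed_state == "INIT":
        # Unaware of the protocol: can safely abort locally.
        return "abort"
    if crashed_state == "READY":
        # Needs the coordinator's decision; if it was never logged,
        # the participant must ask the coordinator or its peers.
        if logged_decision is None:
            return "ask"
        return "commit" if logged_decision == "global-commit" else "abort"
    if crashed_state == "ABORT":
        return "abort"    # re-entering ABORT is idempotent
    if crashed_state == "COMMIT":
        return "commit"   # re-entering COMMIT is idempotent
    raise ValueError("unknown state: " + crashed_state)
```

Because entering ABORT and COMMIT is idempotent, simply redoing the transition after recovery is always safe.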



    3PC Failing participant

Basic issue
Can P find out what it should do after crashing in the ready or pre-commit state, even if other participants or the coordinator failed?

Reasoning

- Essence: the coordinator and participants, on their way to commit, never differ by more than one state transition
- Consequence: if a participant times out in the ready state, it can find out at the coordinator or other participants whether it should abort or enter the pre-commit state
- Observation: if a participant already made it to the pre-commit state, it can always safely commit (but is not allowed to do so yet, for the sake of other, possibly failed, processes)
- Observation: we may need to elect another coordinator to send off the final COMMIT


    Fault Tolerance 8.6 Recovery

    Coordinated checkpointing


Simple solution
Use a two-phase blocking protocol:

- A coordinator multicasts a checkpoint-request message
- When a participant receives such a message, it takes a checkpoint, stops sending (application) messages, and reports back that it has taken a checkpoint
- When all checkpoints have been confirmed at the coordinator, the latter broadcasts a checkpoint-done message to allow all processes to continue

Observation
It is possible to consider only those processes that depend on the recovery of the coordinator, and ignore the rest.
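The two-phase blocking protocol above can be sketched as follows (an assumed illustration; the participant interface and all names are invented, and real checkpoints would of course be written to stable storage):

```python
class Participant:
    """Stub participant: records whether it has checkpointed and
    whether it is allowed to send application messages."""
    def __init__(self):
        self.running = True
        self.checkpointed = False

    def take_checkpoint(self):
        self.checkpointed = True
        self.running = False   # stop sending application messages
        return True            # confirmation back to the coordinator

    def resume(self):
        self.running = True

def coordinated_checkpoint(participants):
    """Two-phase blocking checkpoint: request, wait for all
    confirmations, then release everyone with checkpoint-done."""
    # Phase 1: multicast checkpoint-request; each participant
    # checkpoints, blocks, and confirms.
    confirmations = [p.take_checkpoint() for p in participants]
    # Phase 2: all confirmed -> broadcast checkpoint-done.
    if all(confirmations):
        for p in participants:
            p.resume()
        return "checkpoint-done"
    return "checkpoint-failed"

group = [Participant() for _ in range(3)]
print(coordinated_checkpoint(group))  # checkpoint-done
```

Blocking application messages between the two phases is what makes the resulting set of checkpoints globally consistent: no message can be sent after one checkpoint and received before another.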
