SD T 05 ToleranciaFalhas.pdf


Distributed Systems: Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science
Room R4.20, [email protected]

Chapter 08: Fault Tolerance

Version: December 2, 2010


    Contents

01: Introduction
02: Architectures
03: Processes
04: Communication
05: Naming
06: Synchronization
07: Consistency & Replication
08: Fault Tolerance
09: Security
10: Distributed Object-Based Systems
11: Distributed File Systems
12: Distributed Web-Based Systems
13: Distributed Coordination-Based Systems


    Fault Tolerance

    Introduction

- Basic concepts
- Process resilience
- Reliable client-server communication
- Reliable group communication
- Distributed commit
- Recovery


    Fault Tolerance 8.1 Introduction

    Dependability

Basics
A component provides services to clients. To provide those services, the component may in turn require services from other components: a component may depend on some other component.

Specifically
A component C depends on C* if the correctness of C's behavior depends on the correctness of C*'s behavior. Note: components are processes or channels.

- Availability: readiness for usage
- Reliability: continuity of service delivery
- Safety: very low probability of catastrophes
- Maintainability: how easily a failed system can be repaired


    Terminology

Subtle differences
- Failure: a failure occurs when a component does not live up to its specifications
- Error: the part of a component's state that can lead to a failure
- Fault: the cause of an error

    What to do about faults

- Fault prevention: prevent the occurrence of a fault
- Fault tolerance: build a component such that it can mask the presence of faults
- Fault removal: reduce the presence, number, and seriousness of faults
- Fault forecasting: estimate the present number, future incidence, and consequences of faults


    Failure models

    Failure semantics

- Crash failures: the component halts, but behaves correctly before halting
- Omission failures: the component fails to respond
- Timing failures: output is correct, but lies outside a specified real-time interval (performance failures: too slow)
- Response failures: output is incorrect (but can at least not be attributed to another component)
  - Value failure: a wrong value is produced
  - State-transition failure: execution brings the component into a wrong state
- Arbitrary failures: the component produces arbitrary output and may be subject to arbitrary timing failures


    Crash failures

    Problem

Clients cannot distinguish between a crashed component and one that is just a bit slow.

Consider a server from which a client is expecting output:

- Is the server perhaps exhibiting timing or omission failures?
- Is the channel between client and server faulty?

    Assumptions we can make

- Fail-silent: the component exhibits omission or crash failures; clients cannot tell what went wrong
- Fail-stop: the component exhibits crash failures, but its failure can be detected (either through announcement or timeouts)
- Fail-safe: the component exhibits arbitrary, but benign failures (they can't do any harm)


    Fault Tolerance 8.2 Process Resilience

    Process resilience

Basic issue
Protect yourself against faulty processes by replicating and distributing computations in a group.

- Flat groups: good for fault tolerance, as information exchange immediately occurs with all group members; however, they may impose more overhead, as control is completely distributed (hard to implement).
- Hierarchical groups: all communication goes through a single coordinator; not really fault tolerant or scalable, but relatively easy to implement.


Process resilience

[Figure: (a) a flat group, in which all members communicate directly with each other; (b) a hierarchical group, in which workers communicate through a single coordinator]


    Groups and failure masking

k-fault tolerant group
A group is k-fault tolerant when it can mask any k concurrent member failures (k is called the degree of fault tolerance).

How large does a k-fault tolerant group need to be?

- Assuming crash/performance failure semantics, a total of k + 1 members is needed to survive k member failures.
- Assuming arbitrary failure semantics, with the group output defined by voting, a total of 2k + 1 members is needed to survive k member failures.

Assumption
All members are identical and process all input in the same order; only then can we be sure that they do exactly the same thing.
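The two group sizes above can be captured in a tiny helper. This is a minimal sketch added for illustration (the function name and interface are assumptions, not from the slides):

```python
def required_group_size(k, semantics="crash"):
    """Minimum group size needed to mask k concurrent member failures.

    'crash'     : crash/performance failure semantics -> k + 1 members
    'arbitrary' : arbitrary failure semantics with output decided by
                  majority vote -> 2k + 1 members
    """
    if semantics == "crash":
        return k + 1
    if semantics == "arbitrary":
        return 2 * k + 1
    raise ValueError("unknown failure semantics: " + semantics)

print(required_group_size(2, "crash"))      # 3
print(required_group_size(2, "arbitrary"))  # 5
```

With voting, 2k + 1 members guarantee that the k faulty members can never outvote the k + 1 correct ones.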


    Groups and failure masking

Scenario
Group members are not identical, i.e., we have a distributed computation. Nonfaulty group members should reach agreement on the same value.

[Figure: the agreement problem for three processes, one of which is faulty: (a) process 2 tells different things to processes 1 and 3; (b) process 3 passes on a different value than the one it received]


    Groups and failure masking

Scenario
Assuming arbitrary failure semantics, we need 3k + 1 group members to survive the attacks of k faulty members. These are also known as Byzantine failures.

Essence
We are trying to reach a majority vote among the group of loyalists, in the presence of k traitors: we need 2k + 1 loyalists.


    Groups and failure masking

[Figure: the Byzantine agreement problem for four processes (n = 4, k = 1), where process 3 is faulty: (a) each process sends its value to the others, the faulty process reporting x, y, z to the three loyal processes; (b) the vectors Got(1, 2, x, 4), Got(1, 2, y, 4), Got(1, 2, 3, 4), Got(1, 2, z, 4) that each process assembles from the first round; (c) the vectors each loyal process receives from the others in the second round, the faulty process forwarding garbage (a..l). Taking a per-position majority, the loyal processes agree on values 1, 2, and 4, and mark position 3 as unknown]
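The two-round exchange just described can be sketched as a small simulation. This is an illustrative assumption of this text, not the slides' code; the function names and the way the traitor's garbage is modeled are invented for the example:

```python
from collections import Counter

def majority(xs):
    """Return the strict majority value of xs, or None if there is none."""
    val, cnt = Counter(xs).most_common(1)[0]
    return val if cnt > len(xs) // 2 else None

def byzantine_agreement(values, faulty, round1_lies, round2_garbage):
    """Two-round exchange for n = 3k + 1 processes with k = 1 traitor.

    values         : true value of each process, indexed 0..n-1
    faulty         : index of the faulty process
    round1_lies    : value the traitor reports to each loyal receiver
    round2_garbage : vector the traitor forwards to each loyal receiver
    """
    n = len(values)
    # Round 1: every process announces its value; the traitor lies,
    # possibly differently, to each receiver.
    got = {p: [round1_lies[p] if q == faulty else values[q]
               for q in range(n)]
           for p in range(n) if p != faulty}
    # Round 2: every process forwards the vector it assembled; the
    # traitor forwards garbage. Each loyal process then takes an
    # elementwise majority over the vectors it received from the others.
    decided = {}
    for p in got:
        received = [round2_garbage[p] if q == faulty else got[q]
                    for q in range(n) if q != p]
        decided[p] = [majority([v[i] for v in received])
                      for i in range(n)]
    return decided

decided = byzantine_agreement(
    values=[1, 2, 99, 4], faulty=2,
    round1_lies={0: "x", 1: "y", 3: "z"},
    round2_garbage={0: ["a", "b", "c", "d"],
                    1: ["e", "f", "g", "h"],
                    3: ["i", "j", "k", "l"]})
# Every loyal process decides [1, 2, None, 4]: agreement on the loyal
# values, with "no majority" (None) for the traitor's slot.
```

With n = 4 and k = 1, each loyal process sees two correct copies and one garbage vector, so the majority is always decided by the loyal copies.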


    Groups and failure masking

[Figure: the same exchange with only three processes (n = 3, k = 1), where process 3 is faulty: in the second round each loyal process receives three different values for the contested positions and no majority exists, so agreement cannot be reached. With 2k + 1 = 3 processes, k arbitrary failures cannot be masked]


    Groups and failure masking

Issue
What are the necessary conditions for reaching agreement?

[Figure: circumstances under which distributed agreement can be reached, as a function of process behavior (synchronous vs. asynchronous), communication delay (bounded vs. unbounded), message ordering (ordered vs. unordered), and message transmission (unicast vs. multicast); an X marks each combination for which agreement is possible]

- Process: do processes operate synchronously, in lockstep?
- Delays: are communication delays bounded?
- Ordering: are messages delivered in the order they were sent?
- Transmission: are messages sent one-by-one, or multicast?


    Failure detection

Essence
We detect failures through timeout mechanisms.

- Setting timeouts properly is very difficult and application dependent
- You cannot distinguish process failures from network failures
- We need to consider failure notification throughout the system:
  - Gossiping (i.e., proactively disseminate a failure detection)
  - On failure detection, pretend you failed as well
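A timeout-based detector can be sketched in a few lines. This is a minimal, assumed illustration (class and method names are invented); note how it embodies the caveat above: a "suspected" process may merely be slow or unreachable, not crashed:

```python
import time

class TimeoutFailureDetector:
    """Heartbeat-based failure detector sketch: a process is suspected
    once no heartbeat has arrived within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable for deterministic tests
        self.last_seen = {}

    def heartbeat(self, process):
        """Record that a heartbeat from `process` just arrived."""
        self.last_seen[process] = self.clock()

    def suspected(self, process):
        """True if `process` has been silent longer than the timeout."""
        last = self.last_seen.get(process)
        return last is None or self.clock() - last > self.timeout

# Demo with a fake clock so the example is deterministic.
now = [0.0]
fd = TimeoutFailureDetector(timeout=2.0, clock=lambda: now[0])
fd.heartbeat("p1")
now[0] = 1.0
print(fd.suspected("p1"))   # False: a heartbeat was seen recently
now[0] = 5.0
print(fd.suspected("p1"))   # True: silent too long (crashed, or just slow?)
```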


    Fault Tolerance 8.3 Reliable Communication


    Reliable communication

So far
We have concentrated on process resilience (by means of process groups). What about reliable communication channels?

Error detection
- Framing of packets to allow for bit-error detection
- Use of frame numbering to detect packet loss

Error correction
- Add enough redundancy that corrupted packets can be automatically corrected
- Request retransmission of lost packets, or of the last N packets
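Loss detection by frame numbering, mentioned above, amounts to scanning the received sequence numbers for gaps. A minimal sketch (the function name is an assumption for illustration):

```python
def detect_lost_frames(received_seqnos):
    """Given the sequence numbers of the frames that actually arrived,
    in order, report which frame numbers are missing and should be
    requested for retransmission."""
    lost, expected = [], 0
    for seq in received_seqnos:
        lost.extend(range(expected, seq))  # every skipped number is a loss
        expected = seq + 1
    return lost

print(detect_lost_frames([0, 1, 3, 4, 7]))  # [2, 5, 6]
```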


    Fault Tolerance 8.4 Reliable Group Communication


    Reliable multicasting

Observation
If we can stick to a local-area network, reliable multicasting is easy.

Principle
Let the sender log the messages submitted to channel c:

- If P sends message m, m is stored in a history buffer
- Each receiver acknowledges receipt of m, or requests retransmission at P when noticing a message was lost
- Sender P removes m from the history buffer when everyone has acknowledged receipt

Question
Why doesn't this scale?
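The sender's side of the history-buffer scheme above can be sketched as follows (an assumed illustration; class and method names are invented, and the actual network send is elided):

```python
class ReliableMulticastSender:
    """History-buffer sender: messages stay buffered until every known
    receiver has acknowledged them; a receiver that noticed a loss can
    ask for a retransmission of any still-buffered message."""

    def __init__(self, receivers):
        self.receivers = set(receivers)
        self.history = {}        # seqno -> message (the history buffer)
        self.acked = {}          # seqno -> set of receivers that acked
        self.next_seq = 0

    def send(self, message):
        seq = self.next_seq
        self.next_seq += 1
        self.history[seq] = message
        self.acked[seq] = set()
        return seq               # in reality: multicast (seq, message)

    def ack(self, seq, receiver):
        self.acked[seq].add(receiver)
        if self.acked[seq] == self.receivers:
            # Everyone has the message: drop it from the buffer.
            del self.history[seq]
            del self.acked[seq]

    def retransmit(self, seq):
        return self.history[seq]  # resend on a receiver's request

sender = ReliableMulticastSender({"r1", "r2"})
s = sender.send("m0")
sender.ack(s, "r1")
print(sender.retransmit(s))  # 'm0' is still buffered: r2 has not acked
sender.ack(s, "r2")
print(s in sender.history)   # False: buffer entry removed
```

The per-message acknowledgment from every receiver is exactly what limits scalability: with many receivers, the sender is flooded with feedback.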


    Scalable reliable multicasting: Feedback suppression

Basic idea
Let a process P suppress its own feedback when it notices that another process Q is already asking for a retransmission.

Assumptions
- All receivers listen to a common feedback channel to which feedback messages are submitted
- Process P schedules its own feedback message randomly, and suppresses it when observing another feedback message
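The suppression mechanism can be sketched as follows, assuming (as an illustration, not from the slides) uniformly random timers and an idealized feedback channel on which the earliest NACK is heard by everyone before their own timers expire:

```python
import random

def nack_feedback(num_receivers, max_delay=5.0, rng=random):
    """Every receiver that missed a message schedules a NACK after a
    random delay; the receiver whose timer expires first multicasts its
    NACK on the shared feedback channel, and all receivers with later
    timers suppress theirs."""
    delays = {r: rng.uniform(0, max_delay) for r in range(num_receivers)}
    first = min(delays, key=delays.get)
    sent = [first]                                   # only this NACK goes out
    suppressed = [r for r in delays if r != first]   # everyone else stays quiet
    return sent, suppressed

sent, suppressed = nack_feedback(4)
print(len(sent))  # 1: the sender sees a single NACK
```

The random schedule is what spreads the timers out; if all receivers used the same delay, their NACKs would collide and none would be suppressed.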


    Scalable reliable multicasting: Feedback suppression

[Figure: several receivers that missed a message schedule a retransmission request (NACK) with random timers T = 1 … T = 4; the receiver whose timer expires first multicasts its NACK over the network, the other receivers suppress their feedback, and the sender receives only one NACK]

Question
Why is the random schedule so important?


    Scalable reliable multicasting: Hierarchical solutions

[Figure: a hierarchical setup for scalable reliable multicasting: receivers R are grouped into local-area networks, each with a coordinator C; the coordinators form a tree rooted near the sender S, linked by (long-haul) connections]

Question
What's the main problem with this solution?


    Atomic multicast

[Figure: virtually synchronous reliable multicast across group view changes. Starting from G = {P1, P2, P3, P4} after P1 joins the group, P3 crashes and its partial multicast is discarded; the group continues as G = {P1, P2, P4} until P3 rejoins, restoring G = {P1, P2, P3, P4}. Reliable multicasting is implemented by multiple point-to-point messages]

Idea
Formulate reliable multicasting in the presence of process failures in terms of process groups and changes to group membership.



Guarantee
A message is delivered only to the nonfaulty members of the current group, and all members agree on the current group membership: this is virtually synchronous multicast.


    Fault Tolerance 8.5 Distributed Commit


    Distributed commit

- Two-phase commit
- Three-phase commit

Essential issue
Given a computation distributed across a process group, how can we ensure that either all processes commit to the final result, or none of them do (atomicity)?


    Two-phase commit

Model
The client that initiated the computation acts as the coordinator; the processes required to commit are the participants.

- Phase 1a: The coordinator sends a vote-request to the participants (also called a pre-write)
- Phase 1b: When a participant receives vote-request, it returns either vote-commit or vote-abort to the coordinator. If it sends vote-abort, it aborts its local computation
- Phase 2a: The coordinator collects all votes; if all are vote-commit, it sends global-commit to all participants, otherwise it sends global-abort
- Phase 2b: Each participant waits for global-commit or global-abort and handles it accordingly
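The failure-free message flow of the four phases can be sketched in a few lines. This is an assumed illustration only: it omits the logging, timeouts, and ACKs a real 2PC implementation needs, and each participant is modeled simply as a voting function:

```python
def two_phase_commit(participants):
    """Failure-free sketch of the 2PC flow above. Each participant is
    a function vote(request) -> 'vote-commit' or 'vote-abort'."""
    # Phase 1a/1b: send vote-request and collect the votes.
    votes = [p("vote-request") for p in participants]
    # Phase 2a: the coordinator decides; a single abort vote aborts all.
    decision = ("global-commit" if all(v == "vote-commit" for v in votes)
                else "global-abort")
    # Phase 2b: every participant would now receive this decision.
    return decision

always_yes = lambda req: "vote-commit"
always_no = lambda req: "vote-abort"
print(two_phase_commit([always_yes, always_yes]))             # global-commit
print(two_phase_commit([always_yes, always_no, always_yes]))  # global-abort
```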


    Two-phase commit

[Figure: the finite state machines of (a) the coordinator and (b) a participant in 2PC. Coordinator: INIT -(Commit / send Vote-request)-> WAIT; WAIT -(Vote-abort / send Global-abort)-> ABORT, or WAIT -(Vote-commit / send Global-commit)-> COMMIT. Participant: INIT -(Vote-request / send Vote-commit)-> READY, or INIT -(Vote-request / send Vote-abort)-> ABORT; READY -(Global-abort / send ACK)-> ABORT, or READY -(Global-commit / send ACK)-> COMMIT]


    2PC Failing participant

Scenario
A participant crashes in state S, and recovers to S:

- Initial state: no problem: the participant was unaware of the protocol
- Ready state: the participant is waiting to either commit or abort. After recovery, the participant needs to know which state transition to make: log the coordinator's decision
- Abort state: merely make the entry into the abort state idempotent, e.g., by removing the workspace of results
- Commit state: also make the entry into the commit state idempotent, e.g., by copying the workspace to storage

Observation
When distributed commit is required, having participants use temporary workspaces to keep their results allows for simple recovery in the presence of failures.
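The per-state recovery rules can be summarized in a small decision function. This is a sketch added for illustration (function and state names are assumptions matching the 2PC states, not the slides' code):

```python
def recover_participant(crashed_state, logged_decision):
    """Decide what a 2PC participant should do after crashing in
    `crashed_state` and recovering. `logged_decision` is the logged
    coordinator decision ('global-commit'/'global-abort'), or None if
    no decision was logged before the crash."""
    if crashed_state == "INIT":
        # Unaware of the protocol: can safely abort locally.
        return "abort"
    if crashed_state == "READY":
        # Needs the coordinator's decision; if it was never logged,
        # the participant must ask the coordinator or its peers.
        if logged_decision is None:
            return "ask"
        return "commit" if logged_decision == "global-commit" else "abort"
    if crashed_state == "ABORT":
        return "abort"    # re-entering ABORT is idempotent
    if crashed_state == "COMMIT":
        return "commit"   # re-entering COMMIT is idempotent
    raise ValueError("unknown state: " + crashed_state)
```

Because entering ABORT and COMMIT is idempotent, simply redoing the transition after recovery is always safe.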



    3PC Failing participant

Basic issue
Can P find out what it should do after crashing in the ready or pre-commit state, even if other participants or the coordinator failed?

Reasoning

- Essence: the coordinator and participants, on their way to commit, never differ by more than one state transition
- Consequence: if a participant times out in the ready state, it can find out at the coordinator or other participants whether it should abort or enter the pre-commit state
- Observation: if a participant already made it to the pre-commit state, it can always safely commit (but is not allowed to do so yet, for the sake of other, possibly failed, processes)
- Observation: we may need to elect another coordinator to send off the final COMMIT


    Fault Tolerance 8.6 Recovery

    Coordinated checkpointing


Simple solution
Use a two-phase blocking protocol:

- A coordinator multicasts a checkpoint-request message
- When a participant receives such a message, it takes a checkpoint, stops sending (application) messages, and reports back that it has taken a checkpoint
- When all checkpoints have been confirmed at the coordinator, the latter broadcasts a checkpoint-done message to allow all processes to continue

Observation
It is possible to consider only those processes that depend on the recovery of the coordinator, and ignore the rest.
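The two-phase blocking protocol above can be sketched as follows (an assumed illustration; the participant interface and all names are invented, and real checkpoints would of course be written to stable storage):

```python
class Participant:
    """Stub participant: records whether it has checkpointed and
    whether it is allowed to send application messages."""
    def __init__(self):
        self.running = True
        self.checkpointed = False

    def take_checkpoint(self):
        self.checkpointed = True
        self.running = False   # stop sending application messages
        return True            # confirmation back to the coordinator

    def resume(self):
        self.running = True

def coordinated_checkpoint(participants):
    """Two-phase blocking checkpoint: request, wait for all
    confirmations, then release everyone with checkpoint-done."""
    # Phase 1: multicast checkpoint-request; each participant
    # checkpoints, blocks, and confirms.
    confirmations = [p.take_checkpoint() for p in participants]
    # Phase 2: all confirmed -> broadcast checkpoint-done.
    if all(confirmations):
        for p in participants:
            p.resume()
        return "checkpoint-done"
    return "checkpoint-failed"

group = [Participant() for _ in range(3)]
print(coordinated_checkpoint(group))  # checkpoint-done
```

Blocking application messages between the two phases is what makes the resulting set of checkpoints globally consistent: no message can be sent after one checkpoint and received before another.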
