Reliable Communication in the Presence of Failures
Based on the paper by: Kenneth Birman and Thomas A. Joseph
Cesar Talledo
COEN 317
Fall 05

Agenda
- Introduction
- Challenges in Fault-Tolerant Distributed Systems
- Consistent Event Ordering in a Distributed System
- Key Aspects of Proposed Approach
- Logical vs. Physical Failures
- Proposed Broadcast Primitives
  - GBCAST
  - ABCAST
  - CBCAST
- Advantages of the Proposed Approach
- Sample Application: Updating Replicated Data
- Final Thoughts

Introduction
- White paper written in 1987, funded by DoD
- Purpose:
  - Present a set of communication primitives that facilitate distributed processing in the presence of failures
- System Assumptions:
  - One computation, executed by multiple processes in a distributed system (DS)
  - Each process has a local state
  - Processes communicate via broadcasts with other processes
  - Processes may “halt” at any time
  - The paper does not address Byzantine failures

Challenges in Fault-Tolerant DS
- Challenges in fault-tolerant distributed systems
  - How does the system handle exit/re-entry of processes?
    - A process may leave the computation (due to failure)
    - A process may re-enter the computation (recovery)
  - How do processes communicate with each other when
    - Messages may be lost by the communication subsystem
    - Messages may be re-ordered while in transit
    - Some receiver processes may halt
    - The sender process may halt
  - How to handle failures when the system is asynchronous
- Goal: continue the computation in the presence of failures

Consistent Event Ordering in DS
- Key aspects of distributed processing
  - Processes must have a consistent view of the ordering of events (i.e., messages) during the computation
  - A system must provide ordering, but also allow concurrency
- Failures can affect the consistent view of event ordering
- Example:
  - Process ‘A’ sends a broadcast message, then fails
  - Process ‘B’ receives the message, then notices the failure
  - Process ‘C’ notices the failure, then receives the message
  - Process ‘D’ never receives the message, but notices the failure
  - Process ‘F’ receives the message, but never notices the failure
  - Process ‘G’ neither receives the message nor notices the failure
- Key: All processes must agree on the events that occurred and on the order of those events
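
The example above can be written down as data to make the inconsistency concrete. This is only an illustrative sketch: the event names `recv(m)` and `fail(A)` and the helper functions are hypothetical and do not come from the paper.

```python
# Hypothetical encoding of the slide's example: what each surviving process
# observed after process A broadcast a message m and then failed.
MSG, FAIL = "recv(m)", "fail(A)"

observations = {
    "B": [MSG, FAIL],  # receives the message, then notices the failure
    "C": [FAIL, MSG],  # notices the failure, then receives the message
    "D": [FAIL],       # never receives the message
    "F": [MSG],        # never notices the failure
    "G": [],           # sees neither event
}


def is_consistent(views):
    """True when every process saw the same events in the same order."""
    sequences = list(views.values())
    return all(seq == sequences[0] for seq in sequences)


def distinct_views(views):
    """Return the set of different event sequences observed in the group."""
    return {tuple(seq) for seq in views.values()}
```

In this encoding the five survivors hold five different views of what happened, which is exactly the situation the proposed primitives are meant to rule out.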

... Consistent Event Ordering in DS
- Approaches to keep consistent event ordering in the presence of failures
  - 1) Run an agreement protocol after a failure is detected
    - Problems: slow and requires synchronous communication
  - 2) Use this rule: A process should discard messages received from a process that is known to have failed
    - Problem: Processes learn of failures at different times, so the system may still be inconsistent
- Proposed Idea:
  - “Construct a broadcast protocol that orders messages relative to failure and recovery events”

Key Aspects of Proposed Approach
- Failure and recovery are treated as system events, just like local processing and messages
- Thus, failure and recovery have an ordering with respect to messages & local processing
- The paper proposes communication primitives that maintain consistent ordering among processes
  - All processes experience the same sequence of events, including failures
- Advantages:
  - When a process notices a failure, it can assume that the rest of the system observes that failure at the same point in the event order
  - Therefore, the process can immediately react to the failure (no agreement protocol required)

Logical vs. Physical Failures
- Failures (i.e., lost messages, process halts) are physical events, occurring in real-time
  - Processes cannot control when a failure occurs
- Recall that processes use logical clocks to track the order of events in a distributed computation
- In order to treat failures as ordered events, physical failures must be mapped to logical failures
- How?
  - Introduce “Process-Group View”: Logical snapshot of processes involved in the distributed computation
  - Changes in the properties of the group (i.e., failures, recovery) are ordered with respect to other events
  - These changes are communicated among processes by using the proposed broadcast primitives
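
A minimal sketch of the process-group view idea, assuming a view is just a numbered membership set and that failures and recoveries arrive in the same delivery stream as ordinary messages. The class names, the `view_id` counter, and the event tuples are all hypothetical; the slides do not say how the paper represents a view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupView:
    """Logical snapshot of the group: a version number plus the members."""
    view_id: int
    members: frozenset


class Process:
    def __init__(self, name, members):
        self.name = name
        self.view = GroupView(0, frozenset(members))
        self.log = []  # every delivered event, in delivery order

    def deliver(self, event):
        """Deliver one event; failures and recoveries install a new view."""
        kind = event[0]
        if kind in ("fail", "recover"):
            who = event[1]
            members = set(self.view.members)
            if kind == "fail":
                members.discard(who)
            else:
                members.add(who)
            self.view = GroupView(self.view.view_id + 1, frozenset(members))
        elif kind != "msg":
            raise ValueError(f"unknown event kind: {kind}")
        self.log.append(event)
```

If two processes are delivered the same event sequence, they end with identical logs and identical views, so each one can act on its local copy of the view without asking the others.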

Proposed Broadcast Primitives
- 3 Broadcast Communication Primitives
  - Group-Broadcast (GBCAST)
  - Atomic-Broadcast (ABCAST)
  - Causal-Broadcast (CBCAST)
- All 3 are atomic: either all processes receive the message or none of them do
- Emphasis on lightweight primitives: quick processing is desired to improve performance

GBCAST
- GBCAST: Group Broadcast
  - Used to keep a consistent “process group view”
  - Call: GBCAST(action, G)
    - action: type of event that has occurred
    - G: process group view
- GBCAST satisfies the following ordering constraints
  - Delivered in the same order with respect to all other broadcasts at each destination
  - Delivered after any messages sent by the failed process

… GBCAST
- GBCAST is used to inform group member processes that the process group view has changed
  - Each process keeps a local copy of the “process group view”
  - Reception of a GBCAST updates the local copy
  - A process can assume that its local copy is consistent with the rest of the group
- Upon failure or recovery, a GBCAST is sent by either
  - The supervisory process executing on the same machine where the process failure or recovery occurred (if that machine is alive)
  - Failure detection software executing on another machine
- The usage of GBCAST avoids execution of an agreement protocol
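
The constraint that a failure GBCAST is delivered after any messages sent by the failed process can be illustrated at a single receiver. This is a toy sketch under stated assumptions, not the paper's protocol: the `Receiver` class and its hold-back list are hypothetical, and the real protocol also has to coordinate across sites so that every destination sees the same set of messages.

```python
class Receiver:
    """One destination that buffers messages and applies view changes."""

    def __init__(self, members):
        self.members = set(members)  # local copy of the process group view
        self.pending = []            # received but not yet delivered
        self.delivered = []          # events in delivery order

    def receive(self, sender, payload):
        # Assumption: once a failure has been delivered, anything that
        # still arrives from the failed process is discarded.
        if sender in self.members:
            self.pending.append((sender, payload))

    def gbcast_fail(self, failed):
        # Deliver every buffered message from the failed process first,
        # so the failure event is ordered after all of its messages.
        still_pending = []
        for sender, payload in self.pending:
            if sender == failed:
                self.delivered.append(("msg", sender, payload))
            else:
                still_pending.append((sender, payload))
        self.pending = still_pending
        self.members.discard(failed)
        self.delivered.append(("fail", failed))
```

With this rule a receiver never sees a message from a process it already considers failed, which removes the B-versus-C disagreement from the earlier example.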

ABCAST
- ABCAST: Atomic Broadcast
  - Provides sequential consistency on replicated data
  - Applications use ABCAST to enforce order in the way data is updated in the distributed system (i.e., shared data structure)
- Call: ABCAST(msg, label, dests)
  - msg: message to be broadcast
  - label: identifies ABCASTs that are related to each other
  - dests: set of processes to which the broadcast is sent
- ABCASTs with the same label that have destinations in common are delivered in the same order (some order) to all such destinations
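
The ABCAST guarantee can be sketched with a central sequencer that stamps each broadcast with a per-label sequence number. This is a simplified stand-in chosen for brevity and is not claimed to be the mechanism the paper uses; `Sequencer`, `Destination`, and the lowercase `abcast` helper are hypothetical names.

```python
import itertools
from collections import defaultdict


class Sequencer:
    """Hands out increasing sequence numbers, one counter per label."""

    def __init__(self):
        self._counters = defaultdict(itertools.count)

    def next_seq(self, label):
        return next(self._counters[label])


class Destination:
    def __init__(self):
        self.inbox = []  # (label, seq, msg) in arrival order

    def receive(self, label, seq, msg):
        self.inbox.append((label, seq, msg))

    def delivery_order(self, label):
        """Messages for one label, sorted by sequence number."""
        matching = [(seq, msg) for (lbl, seq, msg) in self.inbox if lbl == label]
        return [msg for seq, msg in sorted(matching, key=lambda pair: pair[0])]


def abcast(sequencer, msg, label, dests):
    """Stamp the message once, then hand it to every destination."""
    seq = sequencer.next_seq(label)
    for dest in dests:
        dest.receive(label, seq, msg)
```

Even if messages arrive at two destinations in different orders, sorting by the shared sequence number gives both the same delivery order for that label. Broadcasts with different labels are left unordered relative to each other.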

CBCAST
- CBCAST: Causal Broadcast
  - Provides causal consistency on replicated data
  - Applications use CBCAST to enforce causal order in the way data is updated in the distributed system
- Call: CBCAST(msg, clabel, dests)
  - msg: message to be broadcast
  - clabel: identifies related CBCASTs and type of ordering
  - dests: set of processes to which the broadcast is sent
- CBCASTs with the same ‘clabel’ that have destinations in common are delivered in a predetermined order to all such destinations

… CBCAST
- Broadcast ‘A’ causally precedes broadcast ‘B’ if
  - A and B are sent by the same process, and A is sent before B
  - A and B are sent by different processes, and A was received by the process that sent B before B was sent
- Causal ordering is determined by the value of ‘clabels’
  - If broadcast A causally precedes broadcast B, then clabel(A) < clabel(B)
- Usage of ‘clabels’ gives applications the power to decide which events are causally related
  - Not all CBCASTs are causally related; ordering them would limit system concurrency
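
One way to obtain labels with the property clabel(A) < clabel(B) whenever A causally precedes B is a Lamport-style counter. The sketch below assumes integer labels purely for illustration; the slides do not describe the actual structure of a clabel, and the `Sender` class is hypothetical.

```python
class Sender:
    """A process that stamps its broadcasts with a Lamport-style counter."""

    def __init__(self, name):
        self.name = name
        self.clock = 0

    def send(self, payload):
        # Same-process rule: each new broadcast gets a larger label
        # than anything this process has sent or received so far.
        self.clock += 1
        return {"sender": self.name, "clabel": self.clock, "payload": payload}

    def receive(self, broadcast):
        # Cross-process rule: remember the largest label seen, so that any
        # later broadcast from this process is labelled above it.
        self.clock = max(self.clock, broadcast["clabel"])
```

The implication only runs one way: a smaller label does not prove causal precedence, so two unrelated broadcasts may carry comparable labels without needing to be ordered.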

Advantages of the Proposed Approach
- Simplify applications
  - Eliminate the need for ‘ordering protocols’ at the application level to prevent inconsistencies due to potential failures
  - These protocols would be needed if communication were done via simple atomic broadcasts
- Improve system performance
  - Application ordering protocols restrict concurrency by imposing synchronization rules
- Note: Assumption is that GBCAST, ABCAST, and CBCAST are implemented at a level below the application (i.e., Kernel)

Sample Application: Updating Replicated Data
- All copies of the replicated data must be updated in the same order
- Without the proposed broadcast primitives, a process would need to do explicit synchronization
  - Send a basic atomic broadcast to the remote copies
  - Wait for the remote copies to reply with confirmation of the update
  - Update the local copy, and perform the next update
  - Note: similar to 2-Phase-Commit
- Using CBCAST, a process can assume that all copies have been updated once the CBCAST returns and the local copy is updated
  - CBCAST guarantees that all copies receive the update in the required order with respect to previous CBCASTs that update the same data
  - CBCASTs are ordered with respect to failures (notified via GBCASTs)
- Note: Usage of CBCASTs improves performance
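
A short sketch of why every copy must apply updates in the same order, and how an ordering guarantee lets replicas converge without waiting for replies. The sequence numbers here stand in for whatever ordering the broadcast primitive provides; the `Replica` class, `apply_op` helper, and the `add`/`mul` operations are hypothetical.

```python
def apply_op(value, op, operand):
    """Apply one update; add and mul do not commute, so order matters."""
    if op == "add":
        return value + operand
    if op == "mul":
        return value * operand
    raise ValueError(f"unknown operation: {op}")


class Replica:
    """One copy of the data; applies updates strictly in sequence order."""

    def __init__(self, value):
        self.value = value
        self.next_seq = 0
        self.held = {}  # updates that arrived ahead of their turn

    def receive(self, seq, op, operand):
        self.held[seq] = (op, operand)
        # Apply every update whose turn has come, in order.
        while self.next_seq in self.held:
            held_op, held_operand = self.held.pop(self.next_seq)
            self.value = apply_op(self.value, held_op, held_operand)
            self.next_seq += 1
```

Starting from 10, applying add 5 and then mul 2 gives 30, while the reverse order gives 25. Two replicas that receive the same numbered updates in different arrival orders still end at the same value because each holds back an update until its predecessors have been applied.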

Final Thoughts
- The proposed broadcast primitives provide
  - Implicit ordering of messages
    - Applications need not do explicit synchronization to prevent ordering problems when failures are possible
  - Message ordering with respect to faults/recoveries
    - Faults and recoveries are treated as logical events, subject to ordering with respect to messages
    - This provides consistency among the processes in the distributed system (all processes experience the same set of events)
  - Improved performance
    - Elimination of explicit application ordering protocols allows higher concurrency in computation