Distributed Systems 2
Introduction
Alberto Montresor
University of Trento, Italy
2017/03/07
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Contents
1 Getting Started
  Two generals
  Common Knowledge
  Byzantine generals
  Summary
2 Themes of the course
  Impossible vs practical
  Classical vs extreme distributed systems
  Syllabus
3 Modeling Distributed Systems
  Computation
  Interaction
  Failures
  Time
Getting Started Two generals
Two generals
A thought experiment:
Alberto Montresor (UniTN) DS - Introduction 2017/03/07 1 / 43
A potential solution
General A: attack at dawn!
General B: ack, attack at dawn!
General A: ack ack, attack at dawn!
General B: ack ack ack, attack at dawn!
. . .
Theorem
Under this scenario, there is no solution for the Two Generals Problem
Proof.
By contradiction on the number of messages exchanged.
• Assume, by contradiction, that at least one protocol solves this problem under this scenario.
• If one solution exists, there may be many.
• Among all solutions, pick one that uses the minimum number of messages.
• Consider the last message of this protocol: it may be received or it may be lost.
• The protocol must work in both cases, so the last message is unnecessary and we could avoid sending it at all!
• The resulting protocol uses fewer messages than the minimum, which is a contradiction.
Reality Check
Atomic Commit
An atomic commit is an operation in which a set of distinct changes is applied as a single operation.
Example: ATM withdrawal
You withdraw 100 euro from an ATM in Trento
Your balance should be decreased by 100 euro
A pragmatic solution
Probabilistic protocol
Assuming messengers are caught independently of each other with probability p
• General A:
  – Sends n messengers
  – Attacks no matter what
• General B:
  – Attacks if he receives at least one messenger
p^n is the probability that the attack will be uncoordinated
Trade-off:
We can decrease the probability of failure by increasing n...
But at the additional cost of sending more messengers!
And without ever being certain that the attack will be coordinated!
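The trade-off can be checked numerically. The following is a minimal sketch, assuming each messenger is caught independently with probability p as stated above; the function names and the seed are illustrative and not part of the original slides.

```python
import random


def uncoordinated_probability(p: float, n: int) -> float:
    """Exact probability that all n messengers are caught, i.e. p**n."""
    return p ** n


def simulate(p: float, n: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by sampling independent messengers."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # General B attacks only if at least one messenger gets through.
        delivered = any(rng.random() >= p for _ in range(n))
        if not delivered:
            failures += 1  # General A attacks alone
    return failures / trials


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, uncoordinated_probability(0.5, n), simulate(0.5, n, 20000))
```

The failure probability shrinks geometrically with n but never reaches zero, which matches the last point of the slide.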
Getting Started Common Knowledge
Muddy children
n children go playing
Children are truthful, perceptive, intelligent
Mom says: “Don’t get muddy!”
A bunch (say, k) get mud on their forehead
Daddy comes, looks around, and says “Some of you got a muddy forehead”
Daddy repeatedly asks: “Do you know whether you have a muddy forehead?”
What happens?
Slides from Lorenzo Alvisi
Muddy children
Theorem
The first k − 1 times Daddy asks, the children say “No”
The k-th time, the k children say “Yes”.
Proof.
By induction on k
Let k = 1.
• The first time Daddy asks, the child with mud on his forehead says yes.
• Because all the others have no mud, and someone has mud on his forehead, it must be him.
Let k > 1.
• Every child with mud sees k − 1 children with mud on their foreheads.
• If there were only k − 1 children with mud, they would have said yes the (k − 1)-th time Daddy asked (by the induction hypothesis), but they didn’t.
• So there are actually k children with mud, and they all say yes the k-th time Daddy asks.
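The induction can also be replayed mechanically. The sketch below is one possible encoding, not taken from the slides: every child keeps the set of "possible worlds" compatible with what has been said so far, and each collective "No" removes the worlds in which some child would already have known. The function name and the tuple encoding (1 = muddy) are assumptions made for illustration.

```python
from itertools import product


def muddy_children(actual):
    """Return (round, answers): the first round in which some child says "Yes"."""
    n = len(actual)
    assert any(actual), "Daddy's announcement requires at least one muddy child"
    # After Daddy's announcement it is common knowledge that k >= 1.
    worlds = {w for w in product((0, 1), repeat=n) if any(w)}

    def knows(i, world, possible):
        # Child i sees every forehead but its own: it knows its state iff all
        # still-possible worlds matching what it sees agree on position i.
        own = {w[i] for w in possible
               if all(w[j] == world[j] for j in range(n) if j != i)}
        return len(own) == 1

    round_no = 0
    while True:
        round_no += 1
        answers = [knows(i, actual, worlds) for i in range(n)]
        if any(answers):
            return round_no, answers
        # Everybody said "No": discard worlds in which some child would have known.
        worlds = {w for w in worlds
                  if not any(knows(i, w, worlds) for i in range(n))}


if __name__ == "__main__":
    print(muddy_children((1, 1, 0)))
```

With two muddy children out of three, the first "Yes" comes in round 2 and only from the muddy children, as the theorem states.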
Muddy children
Variation 1
Suppose k > 1
Everyone knows that someone has a dirty forehead before Dad announces it
Does Dad still need to speak up?
Let p = “Someone’s forehead is dirty”
Everyone knows p
But, unless the father speaks, if k = 2 not everyone knows that everyone knows p!
Suppose A and B are dirty. Before the father speaks, A does not know whether B knows p
If k = 3, not everyone knows that everyone knows that everyone knows p ...
Muddy children
Variation 2
... the father took every child aside and told them individually(without others noticing) that someone’s forehead is muddy?
Variation 3
... every child had (unknown to the other children) put a miniature microphone on every other child so they can hear what the father says in private to them?
Two generals, reloaded
There is an entire logic that formalizes what knowledge participants acquire while running a protocol
J. Halpern and Y. Moses. Knowledge and Common Knowledge in a Distributed Environment. E.W. Dijkstra Prize 2009.
Solving the Two Generals Problem requires common knowledge
• “everyone knows that everyone knows that everyone knows...”
But:
Common knowledge cannot be achieved by communicating through unreliable channels
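The nested "everyone knows that everyone knows..." can be written compactly. The notation below is the usual one from epistemic logic and is an assumption here, since the slides do not fix any symbols: K_i p reads "process i knows p", E p reads "everyone knows p", and C p reads "p is common knowledge" among the processes in Π.

```latex
E\,p \;\equiv\; \bigwedge_{i \in \Pi} K_i\,p
\qquad
E^{1}p \equiv E\,p, \quad E^{k+1}p \equiv E\,(E^{k}p)
\qquad
C\,p \;\equiv\; \bigwedge_{k \ge 1} E^{k}p
```

In the muddy children puzzle with k = 2, E p holds before the father speaks but E^2 p does not; the public announcement is what establishes C p.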
A common knowledge puzzle
Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl gives them a list of 10 dates:
May 15, 16, 19
June 17, 18
July 14, 16
August 14, 15, 17
Cheryl then tells Albert and Bernard separately the month and the day of her birthday, respectively.
Albert: I don’t know when Cheryl’s birthday is, but I know that Bernard does not know either.
Bernard: At first I didn’t know when Cheryl’s birthday is, but I know now.
Albert: Then I also know when Cheryl’s birthday is.
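Each statement in the dialogue eliminates candidate dates, so the puzzle can be solved by filtering the list three times. The following sketch is one way to encode the three statements; the function and variable names are illustrative.

```python
from collections import Counter

DATES = [("May", 15), ("May", 16), ("May", 19),
         ("June", 17), ("June", 18),
         ("July", 14), ("July", 16),
         ("August", 14), ("August", 15), ("August", 17)]


def solve(dates):
    days = Counter(d for _, d in dates)
    months = Counter(m for m, _ in dates)
    # Statement 1: Albert (who knows the month) cannot tell the date, and is
    # sure Bernard cannot either: his month holds no day unique in the list.
    risky_months = {m for m, d in dates if days[d] == 1}
    step1 = [(m, d) for m, d in dates
             if months[m] > 1 and m not in risky_months]
    # Statement 2: Bernard (who knows the day) did not know at first, but the
    # day is unique among the dates that survive statement 1.
    days1 = Counter(d for _, d in step1)
    step2 = [(m, d) for m, d in step1 if days[d] > 1 and days1[d] == 1]
    # Statement 3: Albert now knows, so the month is unique among survivors.
    months2 = Counter(m for m, _ in step2)
    return [(m, d) for m, d in step2 if months2[m] == 1]


if __name__ == "__main__":
    print(solve(DATES))
```

Note that every filter relies on what the other person is known to know, which is exactly the kind of reasoning formalized by the logic of knowledge.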
Getting Started Byzantine generals
Byzantine generals
Scenario
n Byzantine generals encircling a city
They must decide whether to attack or retreat!
Messengers are reliable and synchronous
Generals may be traitors
Nobody knows which generals are traitors
Problem specification
The generals require an algorithm to reach an agreement such that (i) all loyal generals decide on the same plan of action and (ii) a small number of traitorous generals cannot cause the loyal generals to adopt different plans.
A potential solution
Wait for a majority of generals to agree
WRONG! Possible scenario:
3 generals
1 vote “attack”, 1 vote “retreat”
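1 traitorous general:
• sends a vote “attack” to the “attack” general
• sends a vote “retreat” to the “retreat” general
Each loyal general then sees a 2-to-1 majority for its own vote, so the two loyal generals decide differently.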
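The counterexample is small enough to execute. In this sketch the general names A and B and the helper functions are hypothetical; each loyal general simply takes the majority of the three votes it sees.

```python
from collections import Counter


def majority(votes):
    """Return the most frequent vote (three binary votes never tie)."""
    return Counter(votes).most_common(1)[0][0]


def three_generals():
    own = {"A": "attack", "B": "retreat"}           # the two loyal generals
    traitor_says = {"A": "attack", "B": "retreat"}  # a different story to each
    decisions = {}
    for general, other in (("A", "B"), ("B", "A")):
        seen = [own[general], own[other], traitor_says[general]]
        decisions[general] = majority(seen)
    return decisions


if __name__ == "__main__":
    print(three_generals())
```

Both loyal generals follow the same rule and still end up with different plans, which violates requirement (i) of the problem specification.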
The problem is solvable
Byzantine Fault Tolerance (1982)
L. Lamport, R. Shostak, M. Pease. The Byzantine Generals Problem. ACM Trans. on Programming Languages and Systems, 4(3):382–401, 1982.
A protocol that, given n generals (processes), can tolerate up to t traitorous generals provided that n ≥ 3t + 1
Example: 4 generals can tolerate up to 1 “byzantine” general
Practical Byzantine Fault Tolerance (2002)
M. Castro and B. Liskov. Practical Byzantine Fault Tolerance and Proactive Recovery. ACM Trans. on Computer Systems, 20(4):398–461, 2002.
PBFT triggered a renaissance in BFT replication research
Still going on...
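The bound n ≥ 3t + 1 can be turned into two small helpers. This is only arithmetic on the inequality above; the function names are illustrative.

```python
def max_traitors(n: int) -> int:
    """Largest t such that n >= 3t + 1."""
    if n < 1:
        raise ValueError("need at least one general")
    return (n - 1) // 3


def min_generals(t: int) -> int:
    """Smallest n that tolerates t traitors."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 3 * t + 1


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(n, "generals tolerate", max_traitors(n), "traitor(s)")
```

With 3 generals no traitor can be tolerated, which is consistent with the failed majority scenario shown earlier.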
Reality Check
BFT sponsors in 1982
• NASA
• The Ballistic Missile Defense System Command
• Army Research Office
Nancy Lynch’s book on Distributed Systems:
• The agreement problem is a simplified version of a problem that originally arose in the development of on-board aircraft control systems.
Bitcoin, a peer-to-peer digital currency system, addresses a Byzantine agreement problem (through proof-of-work rather than a classical BFT protocol).
The 8-hour downtime of Amazon S3 in July 2008 is a well-known example of what happens when you don’t use BFT.
Getting Started Summary
Take-home lessons
We need to properly model our distributed systems
• Reliable / unreliable communication
• Benign / malicious processes
Solutions depend on the underlying model
• “Approximate” or “probabilistic” solutions
• “Bounded” solution
• No solution at all!
Coordinating multiple processes is difficult
• Unexpected events: failures, malicious behavior
• Lack of common knowledge
Themes of the course Impossible vs practical
Theory vs practice
Yogi Berra says:
In theory, theory and practice are the same. In practice, they are not.
Yogi Berra
“Always go to other people’s funerals, otherwise they won’t go to yours”
“I really didn’t say everything I said”
“Nobody goes there anymore; it’s too crowded”
First theme of the course
Impossible vs practical
Several papers about impossibility results:
M. Fischer, N. Lynch, M. Paterson. Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 32(2):374–382, 1985.
S. Gilbert, N. Lynch. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. ACM SIGACT News, 33(2):51–59, 2002.
Yet, many of these problems have practical solutions:
T. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2):225–267, 1996.
L. Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, 2001.
A general tension in Computer Science:
M. Vardi. Solving the Unsolvable. Comm. of the ACM, 54(7):5, 2011.
Themes of the course Classical vs extreme distributed systems
Beyond technology
The Spanish Flu, 1918-1920
Pandemic: killed 20M people in a relatively short time, more than World War I
Virus goal: spread itself as quickly as possible
Unreliable environment:
• viruses may be killed
• transmission may fail
Transmission network is a complex graph
Beyond technology
Flocks of birds
Flying in a flock is good:
• probability of being killed by a predator is reduced
Flying in a flock is bad:
• probability of finding (enough) food is reduced
Birds self-organize into a flock
No central authority
Second theme of the course
Classical vs extreme distributed systems
Classical distributed system problems include agreement, total order broadcast, atomic commit, replication, etc.
Extreme distributed system problems include self-* properties, scalability, full decentralization, etc.
Special issue on Springer Computing (Sept. 2012)
Alberto Montresor, Gusz Eiben, Maarten van Steen, editors
“Modern distributed systems may nowadays consist of hundreds of thousands of computers, ranging from high-end powerful machines to low-end resource-constrained wireless devices. We label them as extreme distributed systems, as they push scalability and complexity well beyond traditional scenarios.”
Themes of the course Syllabus
Topics
• Introduction
• Reliable broadcast
• Epidemic protocols
• Impossibility of consensus
• Consensus and failure detectors
• Complex networks
• P2P
• Epidemics: Beyond dissemination
• Rollback and Recovery
• Paxos
• Practical Byzantine Fault Tolerance
• Byzantine, altruistic, rational model
• Blockchains
What’s next in the course?
Distributed system modeling
• Which kinds of failures exist?
• Which kinds of failures do we tolerate?
Problem specification
• Formal description of the problem
Algorithms, algorithms, algorithms
• Pseudo-code descriptions of algorithms
Proofs
• Just writing the code is not enough!
• Sometimes, impossibility proofs
Reality checks
• Learn about real systems where these protocols are applied
Modeling Distributed Systems
Contents
Modeling distributed systems
Computation: Processes, deterministic vs probabilistic behavior
Interaction: Processes interact through messages, which result in:
• Communication, i.e. information flow
• Coordination, i.e. synchronization and ordering of activities
Failures: Which kinds of failures can occur?
• Benign vs malicious (Byzantine)
• Process vs communication
Time: Determining whether we can make any assumption about time bounds on communication and computation speeds.
Modeling Distributed Systems Computation
Computation
Process: the unit of computation in a distributed system. Sometimes we may call it node, host, etc.
Process set: denoted by Π, it is composed of a collection of n uniquely identified processes, like p1, p2, . . . , pn.
Typical assumptions:
• The set is static (n is well-defined)
• Processes know each other
• All processes run a copy of the same algorithm; the sum of all these copies constitutes the distributed algorithm
But in extreme distributed systems:
• Dynamic set
• Too many, too dynamic to know them all
• Multiple algorithms
Deterministic vs probabilistic
Deterministic process: the local computation and the messages sent by a process are determined by the current state and the messages previously received.
Probabilistic process: processes may make use of random oracles to choose the local computation to be performed or the next message to be sent.
Modeling Distributed Systems Interaction
Interaction
Processes communicate through messages
• send(m, p): sends a message m to p
• receive(m): receives a message m
In some cases, messages may be uniquely identified by
• Sender of the message
• A sequence number local to the sender
General assumption: every pair of processes is connected by a bi-directional communication channel
• Through routing
• Not true for P2P systems
• In the receive operation, we do not specify the original sender; can be
• Fully connected topology may be obtained through routing. For example, consider the following architectures:
– Fully connected mesh
– Broadcast medium (Ethernet, wireless)
– Ring
– “Internet” with routers
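The identification scheme mentioned on the slide (sender plus a sequence number local to the sender) can be sketched as follows. The Message and Process classes are hypothetical and only illustrate how such identifiers stay unique without any coordination between processes.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Message:
    sender: str
    seq: int       # sequence number local to the sender
    payload: str

    @property
    def uid(self):
        """Globally unique as long as process names are unique."""
        return (self.sender, self.seq)


class Process:
    def __init__(self, name: str):
        self.name = name
        self._next_seq = count()  # 0, 1, 2, ... private to this process

    def make_message(self, payload: str) -> Message:
        return Message(self.name, next(self._next_seq), payload)


if __name__ == "__main__":
    p1, p2 = Process("p1"), Process("p2")
    a, b = p1.make_message("hello"), p1.make_message("hello")
    c = p2.make_message("hello")
    print(a.uid, b.uid, c.uid)
```

Two messages with the same payload still get different identifiers, which is what duplicate detection at the receiver relies on.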
Modeling Distributed Systems Failures
Process failures
In a distributed system, both processes and communication channels may fail, i.e. depart from what is considered their correct behavior. Hadzilacos and Toueg provide a taxonomy.
Benign process failures
Fail-stop: A process stops executing events, and other processes may detect this fact.
Crash: A process stops executing events
Malicious process failures
Arbitrary failure, or Byzantine: any type of error may occur. This may be caused by:
• A software bug
• Malicious behavior driven by an intelligent adversary
Process failures
A process that never fails is correct
A process that eventually fails is faulty
Several protocols are designed to work correctly if the number of failures f is bounded (for example, f < n/3).
In some models, processes may perform a recovery action:
• After some time, a process may resume functioning
• It suffers amnesia: the local state maintained in volatile memory is lost
• To limit the effects of amnesia, a log can be maintained
To avoid the problem of amnesia completely, every read/write would have to pass through permanent memory, which is too expensive.
Communication failures
Benign communication failures
A message m travels from p to q through the following steps; the name in parentheses is the failure that occurs if that step does not take place:
Process p performs send of a message m to process q
Message m is inserted in a local outgoing buffer of p (Send-omission)
Message m is transmitted from p to q (Omission)
Message m is inserted in a local incoming buffer of q (Receive-omission)
Process q performs receive of m
Malign communication failures
Messages created out of nothing, duplicated messages, etc.
These problems can largely be addressed through cryptographic techniques (e.g., authenticating messages).
Communication failures
Possible causes of message failures:
Buffer overflow in the operating system
Congestion, routing errors in routers
Partitioning:
• Processes are subdivided in disjoint sets called partitions
• Communication inside a partition is possible
• Communication between partitions is not possible
When a partition disappears, we say that partitions merge
Modeling (faulty) communication channels
The idea: the channels cannot systematically drop a specific message. This is the minimum abstraction needed to create reliable channels.
Fair-Loss Channels
Validity – Fair Loss: If a message m is sent infinitely often by a process p to a process q and neither p nor q crashes, then q will receive m infinitely often
Integrity – Finite Duplication: If a message m is sent a finite number of times by a process p to a process q, then m cannot be received by q an infinite number of times
Integrity – No creation: If a message m is delivered by some process p, then m was previously sent by some process q to p
Modeling (correct) communication channels
The idea: channels are reliable, messages are never lost. It can be implemented, but there is a price to be paid: asynchrony.
Perfect Channels
Validity – Reliable delivery: If p sends a message m to q, and neither p nor q crashes, then q will eventually receive m
Integrity – No duplication: No message is delivered to a process more than once
Integrity – No creation: If a message m is delivered by some process p, then m was previously sent by some process q to p
An Example Algorithm
Fair-loss Channel → Perfect Channel
upon init do
    Set sent ← ∅
    Set delivered ← ∅
    startTimer(timeout)

upon timeout do
    foreach (m, q) ∈ sent do
        fairLossSend(m, q)
    startTimer(timeout)

upon perfectSend(m, q) do
    fairLossSend(m, q)
    sent ← sent ∪ {(m, q)}

upon fairLossReceive(m, q) do
    if m ∉ delivered then
        delivered ← delivered ∪ {m}
        perfectReceive(m, q)
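The algorithm can be exercised with a small simulation. This is a sketch under stated assumptions, not the course's implementation: the lossy link drops each transmission independently with a fixed probability (a hypothetical stand-in for a fair-loss channel), timeouts are modeled as rounds, and messages are assumed to be distinct so that the delivered set can filter duplicates.

```python
import random


class FairLossChannel:
    """Hypothetical lossy link: each transmission is dropped with probability loss."""

    def __init__(self, loss: float, seed: int = 0):
        self.loss = loss
        self.rng = random.Random(seed)
        self.in_flight = []  # (message, sender, receiver) triples that survived

    def send(self, m, src, dst):
        if self.rng.random() >= self.loss:
            self.in_flight.append((m, src, dst))


class PerfectEndpoint:
    def __init__(self, name, channel):
        self.name = name
        self.channel = channel
        self.sent = set()       # (m, q) pairs, retransmitted at every timeout
        self.delivered = set()  # messages already handed to the application
        self.inbox = []         # what perfectReceive hands over, in order

    def perfect_send(self, m, q):
        self.channel.send(m, self.name, q)
        self.sent.add((m, q))

    def on_timeout(self):
        for m, q in self.sent:
            self.channel.send(m, self.name, q)

    def on_fair_loss_receive(self, m, src):
        if m not in self.delivered:   # filter retransmitted copies
            self.delivered.add(m)
            self.inbox.append((m, src))


def run(loss: float = 0.7, rounds: int = 50):
    channel = FairLossChannel(loss)
    endpoints = {name: PerfectEndpoint(name, channel) for name in ("p", "q")}
    for m in ("m1", "m2", "m3"):
        endpoints["p"].perfect_send(m, "q")
    for _ in range(rounds):
        for m, src, dst in channel.in_flight:
            endpoints[dst].on_fair_loss_receive(m, src)
        channel.in_flight.clear()
        for endpoint in endpoints.values():
            endpoint.on_timeout()
    return endpoints["q"].inbox


if __name__ == "__main__":
    print(run())
```

Retransmission provides reliable delivery and the delivered set provides no duplication, but the number of rounds needed before a message arrives is unbounded, which is the asynchrony price mentioned above.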
Safety and liveness
Safety
“Something bad will never happen”
In other words, a distributed program should never enter an unacceptable state.
Example: No message is delivered to a process more than once.
Liveness
“Something good eventually does happen”
In other words, a distributed program eventually enters a desirable state.
Example: If p sends a message m to q, and neither p nor q crashes, then eventually q will receive m.
Modeling Distributed Systems Time
Time
Global clock
• For presentation simplicity, it may be convenient to assume the presence of a global real-time clock, outside the control of processes.
• This can be used to provide a global ordering of steps in a distributed system
In reality:
• Each process is associated with a local clock
• Local clocks may not report the perfect time
• Clock drift rate: refers to the relative amount that a computer clock differs from a perfect reference clock.
Synchronization is possible, but expensive:
• Atomic clocks
• GPS
• See: Google TrueTime API:
• GPS does not work inside buildings
• Atomic clocks: cost not justified
Time measures associated to communication
Latency: The delay between the start of message sending from one process and the beginning of its receipt by another. Possible causes:
• the actual time for bit transmission (e.g., satellite link)
• the delay for accessing the network, especially in case of congestion
• the time taken by the operating system to handle the message both at sender and receiver
Bandwidth: Total amount of information that can be transmitted over a communication channel in a given time.
Jitter: Variation in the time taken to deliver a series of messages. Mostly relevant for multimedia data.
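The three measures can be combined in a back-of-the-envelope calculation. The sketch below rests on assumptions not made in the slides: delivery time is approximated as latency plus size divided by bandwidth, and jitter is measured as the population standard deviation of delivery times (other definitions are in use).

```python
from statistics import pstdev


def transfer_time(size_bits: float, latency_s: float, bandwidth_bps: float) -> float:
    """Approximate time to deliver one message: latency + transmission time."""
    return latency_s + size_bits / bandwidth_bps


def jitter(delivery_times_s) -> float:
    """One possible jitter measure: standard deviation of delivery times."""
    return pstdev(delivery_times_s)


if __name__ == "__main__":
    # 1 MB over a 100 Mbit/s link with 50 ms latency
    print(transfer_time(8_000_000, 0.05, 100_000_000))
    print(jitter([0.10, 0.12, 0.30, 0.11]))
```

For small messages the latency term dominates, while for large ones bandwidth does, which is why the two are reported separately.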
Asynchronous vs synchronous
Distributed Systems vs Time
Distributed systems make it difficult to reason about time, and not only for lack of clock synchronization. It is also difficult to place time bounds on events and communication.
We may think about several different models:
Asynchronous distributed systems
Synchronous distributed systems
Partially synchronous distributed systems
• Asynchronous distributed systems
– No assumptions can be made.
– Most of the problems cannot be solved
• Synchronous distributed systems
– Precise assumptions are possible on computation, communication time and clocks.
– Not really realistic / difficult to implement
• Partially synchronous distributed systems
– Some assumptions can be made, others not, OR
– Assumptions can be made statistically, OR
– Assumptions hold for arbitrarily long periods of time
Asynchronous vs synchronous
Asynchronous distributed system
There are no bounds on the relative speed of process execution.
There are no bounds on message transmission delays.
There are no bounds on clock drift.
• OR, since we cannot count on their precision at all, there are no clocks.
Asynchronous vs synchronous
Comments
These are not assumptions! These are “lack of assumptions”!
The worst possible model: services as simple as
• failure detection
• time-based coordination
are not possible
Advantages:
• simple semantics
• easier to port to more “powerful” models
• more realistic: several sources of asynchrony are present in a large-scale network (like the Internet)
Asynchronous vs synchronous
Synchronous Distributed Systems
Synchronous computation: There is a known upper bound on the relative speed of process execution.
Synchronous communication: There is a known upper bound on message transmission delays.
Synchronous clocks: Processes are equipped with local clocks. There is a known upper bound on the drift rates of local clocks with respect to a global real-time clock.
Asynchronous vs synchronous
Comments
The best possible model. Can be built, but not with standard hardware/software.
• Synchronous Ethernet vs CSMA/CD Ethernet
• Real-time OS vs normal OS
Many interesting properties:
• Timed failure detection (e.g., ping)
• Coordination based on time (e.g., lease)
• Worst-case performance analysis
• Synchronized clocks
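Timed failure detection follows directly from the known bounds. The sketch below assumes crash failures, reliable channels, and negligible clock drift; the parameter names and the helper functions are illustrative.

```python
from typing import Optional


def ping_timeout(max_delay_s: float, max_processing_s: float) -> float:
    """Worst-case round trip: request delay + processing + reply delay."""
    return 2 * max_delay_s + max_processing_s


def is_crashed(sent_at_s: float, reply_at_s: Optional[float], now_s: float,
               max_delay_s: float, max_processing_s: float) -> bool:
    """In a synchronous system, silence beyond the timeout implies a crash."""
    if reply_at_s is not None:
        return False  # the process answered, so it was alive
    return now_s - sent_at_s > ping_timeout(max_delay_s, max_processing_s)


if __name__ == "__main__":
    print(ping_timeout(0.1, 0.05))
    print(is_crashed(0.0, None, 0.3, 0.1, 0.05))
```

The verdict is only sound because the bounds are known and always respected; in an asynchronous system a slow process and a crashed one cannot be told apart this way.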
Asynchronous vs synchronous
Partial synchrony
For most systems we know of, it is relatively easy to define physical time bounds that are respected most of the time. There are, however, periods where the timing assumptions do not hold.
Delays on processes:
• Machines may run out of memory, slowing down processes
• A typical case of “no bound on relative speeds of processes”
Delays on messages:
• The network may be congested, and messages may be dropped.
• Re-transmission protocols can ensure reliability, but at the price of asynchrony: messages may be re-transmitted an arbitrary number of times.
In this sense, practical systems are partially synchronous
Asynchronous vs synchronous
How to express partial synchrony? A possibility is the following:
Timing assumptions only hold eventually.
Theoretically, it means:
There is a time after which the system is synchronous forever
The system is initially asynchronous and only after a long time becomes synchronous
How to read it:
The system is not always synchronous
There is no known bound to the period in which it is asynchronous
We expect that there are periods during which the system is synchronous
Some of these periods are long enough to terminate protocol execution
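One common way to exploit "timing assumptions only hold eventually" is a timeout that grows after every wrong suspicion. The sketch below is an illustration under assumptions that are not in the slides: heartbeats arrive with a gap bounded by an unknown value, and the class name, the initial timeout, and the doubling policy are all hypothetical choices.

```python
class AdaptiveTimeout:
    """Suspect a process after a silence longer than the current timeout."""

    def __init__(self, initial_s: float = 0.1):
        self.timeout_s = initial_s
        self.suspected = False
        self.false_suspicions = 0

    def on_silence(self, elapsed_s: float) -> bool:
        """Called with the time elapsed since the last heartbeat."""
        if elapsed_s > self.timeout_s:
            self.suspected = True
        return self.suspected

    def on_heartbeat(self) -> None:
        if self.suspected:
            # The process was slow, not crashed: be more patient next time.
            self.suspected = False
            self.false_suspicions += 1
            self.timeout_s *= 2


def simulate(gap_s: float = 1.0, heartbeats: int = 20) -> AdaptiveTimeout:
    detector = AdaptiveTimeout()
    for _ in range(heartbeats):
        detector.on_silence(gap_s)  # silence just before the next heartbeat
        detector.on_heartbeat()
    return detector


if __name__ == "__main__":
    d = simulate()
    print(d.false_suspicions, d.timeout_s)
```

The detector makes a finite number of mistakes and then stops once the timeout exceeds the unknown bound, even though that bound is never known in advance.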
Reading Material
A. Panconesi. The coordinated attack and the jealous amazons.
http://www.dsi.uniroma1.it/~asd3/dispense/attack+amazons.pdf
A. Panconesi. Coordination and the fall of Eastern Roman Empire.
http://www.dsi.uniroma1.it/~asd3/dispense/fall.pdf
V. Hadzilacos and S. Toueg. A modular approach to fault-tolerant broadcastsand related problems.
In S. Mullender, editor, Distributed Systems (2nd ed.). Addison-Wesley, 1993.
http://www.disi.unitn.it/~montreso/ds/papers/FTBroadcast.pdf
Reality Check: Interesting links
• The S3 incident
• Solving the unsolvable
• The rise and fall of CORBA
• Notes on Distributed Systems for Young Bloods