Distributed Systems 2006
Virtual Synchrony*
*With material adapted from Ken Birman
Plan
We skip Section 18.2
Tracking group membership: We’ll base it on 2PC and 3PC
Fault-tolerant multicast: We’ll use membership
Ordered multicast: We’ll base it on fault-tolerant multicast
Tools for solving practical replication and availability problems: we’ll base them on ordered multicast
Robust Web Services: We’ll build them with these tools
2PC and 3PC: Our first “tools” (lowest layer)
Using group communication as a building block
Communication to the group
Using communication inside the group
How to communicate to a group?
We have primarily looked at communication within groups
Often process groups provide a service that processes outside the group need to use
– How do we multicast to the group from outside the group?
A number of solutions
– Join the group...
– Use a well-known member of the group
– Cache a group view
Using a well-known group member
The non-member process issues an RPC to a member process
– Assigns an ID to the request; reissues it on failure
– The member process multicasts on behalf of the non-member
What about ordering?
– Can requests submitted by different non-members be considered concurrent?
– Complex in the event of failure
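The "assign an ID; reissue on failure" step can be sketched as below. This is a minimal illustration, not a real RPC layer: `Member`, `Client`, `rpc_multicast`, and the `fail_next` hook are hypothetical names, and the group is modelled as plain lists standing in for the members' delivery queues.

```python
import itertools


class Member:
    """Hypothetical well-known member that multicasts on behalf of outsiders."""

    def __init__(self, group):
        self.group = group        # lists standing in for members' delivery queues
        self.seen = set()         # request IDs that were already multicast
        self.fail_next = False    # test hook: lose the next reply

    def rpc_multicast(self, req_id, payload):
        if req_id not in self.seen:          # suppress duplicates of a reissued request
            self.seen.add(req_id)
            for inbox in self.group:
                inbox.append((req_id, payload))
        if self.fail_next:                   # multicast done, but the reply is lost
            self.fail_next = False
            raise TimeoutError("reply lost")
        return "ok"


class Client:
    """Non-member process."""

    def __init__(self, name):
        self.name = name
        self.counter = itertools.count()

    def multicast(self, member, payload, retries=3):
        req_id = (self.name, next(self.counter))   # assigned once, reused on reissue
        for _ in range(retries):
            try:
                return member.rpc_multicast(req_id, payload)
            except TimeoutError:
                continue                            # reissue with the same ID
        raise RuntimeError("member unreachable")


inboxes = [[], [], []]
member = Member(inboxes)
member.fail_next = True            # first reply is lost, so the client reissues
result = Client("c").multicast(member, "hello")
```

Because the ID is reused on reissue, each group member sees the message exactly once even though the RPC was sent twice.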
Caching a group view
Non-member gets a view of the group, possibly outdated
– Client communicates directly with processes in cached view
View obtained, e.g., through
– RPC to group member
– By joining (larger) group to get view
Three cases of multicast arrival
– 1) entirely in correct view – correct
– 2) entirely late – ignore/report, client retries the multicast
– 3) partially correct, partially late – flush at view installation will deliver message correctly
Often cheaper than RPC-based solution
– Still complex with ordering
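The three arrival cases can be sketched as a small classification. The names `classify_arrival` and `client_multicast` and the two callables are hypothetical, and view IDs are only compared for equality (the sketch assumes no receiver is behind the client's cached view).

```python
def classify_arrival(msg_view, receiver_views):
    """msg_view: view ID the client had cached when it sent the multicast.
    receiver_views: view ID installed at each receiver when the message arrives."""
    on_time = [v == msg_view for v in receiver_views]
    if all(on_time):
        return "deliver"          # case 1: entirely in the correct view
    if not any(on_time):
        return "ignore/report"    # case 2: entirely late
    return "flush"                # case 3: the flush at view installation sorts it out


def client_multicast(cached_view, arrivals, refresh_view):
    """Retry until the multicast is no longer rejected as late.
    arrivals(view) and refresh_view() are placeholders for real messaging."""
    attempts = 0
    while True:
        attempts += 1
        outcome = classify_arrival(cached_view, arrivals(cached_view))
        if outcome != "ignore/report":
            return outcome, attempts
        cached_view = refresh_view()      # fetch a newer view, then resend


# The group is already in view 4, but the client still has view 3 cached.
outcome, attempts = client_multicast(3, lambda view: [4, 4], lambda: 4)
```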
Using communication within groups
What is a “natural” abstraction for working with group communication?
Reconsider the transactional model – “virtual serial execution”
– Transactions appear to run in isolation (cf. ACID)
– Database free to optimize by interleaving execution of transactions
“Virtual Synchrony”
– All group members see the same events in the same order
– Group communication system free to optimize
Insight
– Enables us to run the same algorithm on all processes, which will then stay in the same state...
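The insight can be illustrated with a toy deterministic replica. The `Replica` class and the event names are made up for the example: replicas fed the same events in the same order end in the same state, while a different order may not.

```python
class Replica:
    """Deterministic state machine: the state depends only on the event sequence."""

    def __init__(self):
        self.balance = 0

    def apply(self, event):
        op, amount = event
        if op == "deposit":
            self.balance += amount
        elif op == "double":
            self.balance *= 2


events = [("deposit", 10), ("double", 0), ("deposit", 5)]

# Same algorithm, same events, same order at every group member.
replicas = [Replica() for _ in range(3)]
for replica in replicas:
    for event in events:
        replica.apply(event)
states = [replica.balance for replica in replicas]

# A replica that saw the events in another order diverges.
out_of_order = Replica()
for event in reversed(events):
    out_of_order.apply(event)
```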
A synchronous execution
With true synchrony executions run in genuine lock-step – “state machine execution” possible
[Figure: timelines of processes p, q, r, s, t, u]
Virtual Synchrony at a glance
With virtual synchrony executions only look “lock step” to the application
[Figure: timelines of processes p, q, r, s, t, u]
Virtual Synchrony at a glance
We use the weakest (hence fastest) form of ordered communication possible
[Figure: timelines of processes p, q, r, s, t, u]
Elements of Virtual Synchrony
Support for process groups
– Processes may join and leave dynamically
– Excluded if (thought to) fail
Various reliable, ordered multicast protocols
– Strive to replace synchronous, totally-ordered, dynamically uniform protocols
View-synchronous delivery
– Two processes that are members of the same view receive the same set of multicasts during that view
Identical process group views and rankings
– Identical sequences of group membership lists
Gap-freedom guarantees
– If, after a failure, m1 has been delivered to a destination, then any message m0 that should be delivered prior to m1 is also delivered
State transfer for joining processes
– A new process may obtain group state from an existing member
– (Will develop this shortly)
Why “Virtual” Synchrony?
The user sees what looks like a synchronous execution
– Simplifies the developer’s task
But the actual execution is rather concurrent and asynchronous
– Maximizes performance
– Reduces risk that lock-step execution will trigger correlated failures
Correlated failures
Why do we claim that virtual synchrony makes these less likely?
Recall that many programs are buggy
– Often these are Heisenbugs (order-sensitive)
With lock-step execution each group member sees group events in identical order
– So all die in unison
With virtual synchrony orders differ
– So an order-sensitive bug might only kill one group member!
Why Virtual Synchrony again?
So what can we build using Virtual Synchrony?
Distributed Algorithms
– Election
– Consensus
– Consistent snapshot
• Saw this last time
– Replicated data and synchronization
– State transfer
– Load-balancing
– Fault tolerance
• Primary-backup
• Coordinator-cohort
Election
We need to choose (in a distributed way) some process to take a special role in an algorithm?
Processes that might participate join an appropriate group
Now the group view gives a simple leader election rule
– Everyone sees the same members, in the same order, ranked by when they joined
– Leader can be, e.g., the “oldest” process
We have already used this in a number of places
– E.g., abcast
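The election rule can be sketched as a pure function of the view. The `View` type and `next_view` helper are hypothetical; the sketch assumes the membership service delivers identical views to every member.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    view_id: int
    members: tuple    # ranked by join time, oldest first


def leader(view):
    """Every member evaluates this locally and gets the same answer."""
    return view.members[0]


def next_view(view, failed=(), joined=()):
    """Survivors keep their relative rank; newcomers are appended at the end."""
    survivors = tuple(m for m in view.members if m not in failed)
    return View(view.view_id + 1, survivors + tuple(joined))


v1 = View(1, ("p", "q", "r"))
v2 = next_view(v1, failed=("p",), joined=("s",))
```

No election messages are exchanged: when the oldest member `p` fails, installing the new view is enough for everyone to agree that `q` now leads.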
Consensus
A set of processes need to agree on some decision/value?
A group can easily solve consensus
– Leader multicasts: “What’s your input?”
– All reply: “Mine is 0. Mine is 1”
– Leader picks the most common value and multicasts that: the “decision value”
– If the leader fails, the new leader just restarts the algorithm
Famous theoretical result (“FLP”):
– In the asynchronous model with even one process failure, deterministic distributed consensus is impossible
Does this apply here?
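The group-based algorithm above can be sketched as follows. The function name, the `crashed` argument, and the tie-break (smallest value wins) are assumptions made for the example. The set of crashed members is taken as given by the membership service, and that is the part a purely asynchronous system cannot decide reliably.

```python
from collections import Counter


def run_consensus(view, inputs, crashed=frozenset()):
    """view: members ranked oldest first. inputs: member -> proposed value.
    crashed: members the membership service has excluded (assumed given)."""
    live = [m for m in view if m not in crashed]
    leader = live[0]                           # oldest surviving member leads
    replies = [inputs[m] for m in live]        # answers to "What's your input?"
    counts = Counter(replies)
    top = max(counts.values())
    # Assumed tie-break: the smallest of the most common values.
    decision = min(v for v, c in counts.items() if c == top)
    return leader, decision


view = ["p", "q", "r"]
inputs = {"p": 0, "q": 1, "r": 1}
first_try = run_consensus(view, inputs)
after_failure = run_consensus(view, inputs, crashed={"p"})   # new leader restarts
```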
Replicated Data and Synchronization
We want to replicate data and synchronize on updates among a set of processes?
Want to support READ, UPDATE, LOCK operations on replicated data
First look at synchronous, non-uniform case
Synchronous, Replicated Data
Use groups...
– Joining processes
• Transfer data from existing processes (more later)
– Maintain replica of data at each process
READ
– Read local copy
UPDATE
– Asynchronously multicast update using abcast – i.e., no waiting
LOCK
– Nonexclusive read locks can be implemented locally
– Acquire exclusive lock by abcasting to group members for them to grant the lock
• Recipient just grants requests for lock in order received – provided no non-exclusive read is pending locally
• Sender waits until all recipients have acquired the lock
Fault-tolerant, consistent
– Replicas start in identical state
– Subsequent update and lock operations behave identically at all copies – same events, same order
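READ and UPDATE can be sketched with a toy total-order multicast. The `Abcast` class here is a hypothetical in-process stand-in (a single sequencer that delivers immediately), not a real abcast protocol. It only shows that local reads agree once every replica applies updates in the same order.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        """READ: consult the local copy only."""
        return self.data.get(key)

    def deliver(self, update):
        """Called once per update, in abcast (total) order."""
        key, value = update
        self.data[key] = value


class Abcast:
    """Toy total-order multicast: one in-process sequencer (an assumption)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.log = []                 # the agreed delivery order

    def send(self, update):
        self.log.append(update)
        for replica in self.replicas:
            replica.deliver(update)


replicas = [Replica() for _ in range(3)]
group = Abcast(replicas)
group.send(("x", 1))      # UPDATE
group.send(("x", 2))      # conflicting UPDATE, ordered after the first everywhere
group.send(("y", 7))
```

In a real system the sender would not block on delivery at the other replicas, which is the "no waiting" point above.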
Can we do better?
How many abcasts can be replaced with asynchronous cbcasts?
All, if we modify the protocol a bit...
All UPDATEs guarded by appropriate locks
– Want mutual exclusion on data that a concurrent UPDATE could change
LOCKs established by passing write rights
cbcast and locking
For possibly conflicting updates, there will be only one writer at a time
– The initial writer is the first process to join the group
To make a LOCK request, a process sends a cbcast and waits
– Receiving processes queue requests
When writer is done, it will send a cbcast that grants the lock
– It takes the oldest request on its LOCK queue and grants the lock to that process
– Other processes dequeue the request and grant (when local read locks are done)
Does this work?
– Do group members perform the same sequence of updates?
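The queueing and granting steps can be sketched as below. `LockState` and `release` are hypothetical names, and message transport is replaced by direct method calls. With cbcast, concurrent requests may be queued in different orders at different members, so the grant names the new writer and every member removes that specific request.

```python
from collections import deque


class LockState:
    """One member's local picture of a single lock."""

    def __init__(self, first_member):
        self.writer = first_member    # initial writer: first process to join
        self.queue = deque()          # pending LOCK requests, in local delivery order

    def on_request(self, requester):
        """A cbcast LOCK request was delivered."""
        self.queue.append(requester)

    def on_grant(self, new_writer):
        """A cbcast grant was delivered; the request causally precedes it."""
        self.queue.remove(new_writer)
        self.writer = new_writer


def release(writer_state):
    """Run at the current writer: pick the oldest request on its own queue."""
    return writer_state.queue[0] if writer_state.queue else None


members = {name: LockState("p") for name in ("p", "q", "r")}

# q and r request concurrently, so the delivery order differs at member r.
delivery_order = {"p": ("q", "r"), "q": ("q", "r"), "r": ("r", "q")}
for name, order in delivery_order.items():
    for requester in order:
        members[name].on_request(requester)

granted = release(members["p"])       # the writer's queue decides
for state in members.values():
    state.on_grant(granted)
```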
cbcast and locking
Sequence of conflicting updates identical at all replicas
– There is only one writer at a time for a given lock
– Updates and lock grants follow the same causal path
– (The causal order is a total order along a causal path)
Non-conflicting updates may be applied in any order
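The causal-path argument can be checked with a small sketch of causal delivery using vector timestamps. This is one common way to implement causal ordering, not necessarily how cbcast is implemented in the system the course uses. The `Process` class and the message payloads are made up for the example.

```python
class Process:
    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n        # vt[j]: number of messages from j delivered here
        self.pending = []        # received but not yet deliverable
        self.delivered = []

    def send(self, payload):
        self.vt[self.pid] += 1
        self.delivered.append(payload)            # the sender delivers to itself at once
        return (self.pid, list(self.vt), payload)

    def _deliverable(self, sender, ts):
        # Next message from the sender, and nothing it depends on is missing.
        return ts[sender] == self.vt[sender] + 1 and all(
            ts[k] <= self.vt[k] for k in range(len(ts)) if k != sender)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:                            # delivering one may unblock others
            progress = False
            for m in list(self.pending):
                sender, ts, payload = m
                if self._deliverable(sender, ts):
                    self.vt[sender] = ts[sender]
                    self.delivered.append(payload)
                    self.pending.remove(m)
                    progress = True


p0, p1, p2 = (Process(i, 3) for i in range(3))
m1 = p0.send("update x=1")            # p0 is the writer
m2 = p0.send("grant lock to p1")
p1.receive(m1)
p1.receive(m2)
m3 = p1.send("update x=2")            # p1 writes after receiving the grant

# p2 receives everything in the worst possible order.
for msg in (m3, m2, m1):
    p2.receive(msg)
```

Even with reversed arrival, `p2` applies the two conflicting updates in the writer order, because the grant links them into one causal path.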
Performance
Mostly asynchronous
– UPDATEs may be applied locally when sending the UPDATE cbcast
• Causally prior updates will already have been delivered to sender
– Waiting for LOCKs necessary unless we already have the lock...
“85,000 updates per second”
– Cf. quorum read and update
• Forces into lockstep for reads and updates
• “100 reads and updates (combined) per second”
– Not dynamically uniform, though
• Would need, e.g., safe cbcast or cbcast followed by flush
Summary
We introduced “Virtual Synchrony” as a powerful distributed programming abstraction
– Processes appear to run in lock-step – middleware may optimize
– Based on tools developed so far: membership, group communication
Gave examples of what can be built on top of Virtual Synchrony
– More next time...