23
Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006

Virtual Synchrony*

*With material adapted from Ken Birman

Page 2: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 2

Plan

  We skip Section 18.2

Tracking group membership: We’ll base it on 2PC and 3PC

Fault-tolerant multicast: We’ll use membership

Ordered multicast: We’ll base it on fault-tolerant multicast

Tools for solving practical replication and availability problems: we’ll base them on ordered multicast

Robust Web Services: We’ll build them with these tools

2PC and 3PC: Our first “tools” (lowest layer)

Page 3: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 3

Using group communication as a building block

  Communication to the group  Using communication inside the group

Page 4: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 4

How to communicate to a group?

  We have primarily looked at communication within groups

  Often process groups provide a service that processes outside the groups need to use– How do we multicast to the group from outside the

group?

  A number of solutions– Join the group...– Use a well-known member of the group– Cache a group view

Page 5: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 5

Using a well-known group member

  The non-member process issues an RPC to a member process– Assigns an ID; reissue if

failure– The member processes

multicasts on behalf of the non-member

  What about ordering?– Can requests submitted by

different non-members be considered concurrent?

– Complex in the event of failure

c

Page 6: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 6

Caching a group view

  Non-member gets a view of the group, possibly outdated– Client communicate directly with

processes in cached view  View obtained, e.g., through

– RPC to group member– By joining (larger) group to get view

  Three cases of multicast arrival– 1) entirely in correct view – correct – 2) entirely late – ignore/report, client

iterates multicast– 3) partially correct, partially late – flush

at view installation will deliver message correctly

  Often cheaper than RPC-based solution– Still complex with ordering

c

ignore /report

flush

Page 7: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 7

Using communication within groups

  What is a ”natural” abstraction for working with group communication?

  Reconsider the transactional model – ”virtual serial execution”– Transactions appear to run in isolation (cf ACID)– Database free to optimize by interleaving execution of

transactions

  ”Virtual Synchrony”– All group members see the same events in the same order– Group communication system free to optimize

  Insight– Enables us to run the same algorithm on all processes that will

stay in same state...

Page 8: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 8

A synchronous execution

  With true synchrony executions run in genuine lock-step – “state machine execution” possible

p

q

r

s

t

u

Page 9: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 9

Virtual Synchrony at a glance

  With virtual synchrony executions only look “lock step” to the application

p

q

r

s

t

u

Page 10: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 10

Virtual Synchrony at a glance

  We use the weakest (hence fastest) form of ordered communication possible

p

q

r

s

t

u

Page 11: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 11

Elements of Virtual Synchrony

  Support for process groups– Processes may join and leave dynamically – Excluded if (thought to) fail

  Various reliable, ordered multicast protocols– Strive to replace synchronous, totally-ordered, dynamically uniform

protocols  View-synchronous delivery

– Two processes that are member of the same view receive same set of multicasts during that view

  Identical process groups views and rankings– Identical sequences of group membership lists

  Gap-freedom guarantees– If, after a failure, m1 has been delivered to a destination, then any

message m0 that should be delivered prior to m1 is also delivered  State transfer for joining processes

– A new process may obtain group state from existing member– (Will develop this shortly)

Page 12: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 12

Why ”Virtual” Synchrony?

  The user sees what looks like a synchronous execution– Simplifies the developer’s task

  But the actual execution is rather concurrent and asynchronous– Maximizes performance– Reduces risk that lock-step execution will trigger

correlated failures

Page 13: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 13

Correlated failures

  Why do we claim that virtual synchrony makes these less likely?

  Recall that many programs are buggy– Often these are Heisenbugs (order-sensitive)

  With lock-step execution each group member sees group events in identical order– So all die in unison

  With virtual synchrony orders differ– So an order-sensitive bug might only kill one group

member!

Page 14: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 14

Why Virtual Synchrony again?

  So what can we build using Virtual Synchrony?  Distributed Algorithms

– Election– Consensus– Consistent snapshot

• Saw this last time

– Replicated data and synchronization– State transfer– Load-balancing– Fault tolerance

• Primary-backup• Coordinator-cohort

Page 15: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 15

Election

  We need to choose (in a distributed way) some process to take a special role in an algorithm?

  Processes that might participate join an appropriate group

  Now the group view gives a simple leader election rule– Everyone sees the same members, in the same order, ranked

by when they joined– Leader can be, e.g., the “oldest” process

  We have already used this in a number of places– E.g., abcast

Page 16: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 16

Consensus

  A set of processes need to agree on some decision/value?

  A group can easily solve consensus– Leader multicasts: “what’s your input”?– All reply: “Mine is 0. Mine is 1”– Initiator picks the most common value and multicasts that: the

“decision value”– If the leader fails, the new leader just restarts the algorithm

  Famous, theoretical result (”FLP”):– In the asynchronous model with process failure, distributed

consensus is impossible

  Does this apply here?

Page 17: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 17

Replicated Data and Synchronization

  We want to to replicate data and synchronize on updates among a set of processes?

  Want to support READ, UPDATE, LOCK operations on replicated data

  First look at synchronous, non-uniform case

Page 18: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 18

Synchronous, Replicated Data

  Use groups...– Joining processes

• Transfer data from existing processes (more later)– Maintain replica of data at each process

  READ– Read local copy

  UPDATE– Asynchronously multicast update using abcast – i.e., no waiting

  LOCK– Nonexclusive read locks can be implemented locally– Acquire exclusive lock by abcasting to group members for them to grant the lock

• Recipient just grants requests for lock in order received – provided no non-exclusive read is pending locally

• Sender waits until all recepients have acquired lock

  Fault-tolerant, consistent– Replicas start in identical state– Subsequent update and lock operations behave identically at all copies – same

events, same order

Page 19: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 19

Can we do better?

  How many abcasts can be replaced with asynchronous cbcasts?

  All, if we modify the protocol a bit...

  All UPDATEs guarded by appropriate locks– Want mutual exclusion on data that concurrent

UPDATE could change

  LOCKs established by passing write rights

Page 20: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 20

cbcast and locking

  For possibly conflicting updates, there will be only one writer at a time

– The initial writer is the first process to join the group

  To make a LOCK request, a process sends a cbcast and waits

– Receiving processes queue requests

  When writer is done, it will send a cbcast that grants the lock

– It takes oldest process on its LOCK queue and grant to that

– Other processes dequeues request and grants (when local read locks done)

  Does this work?– Do group members perform same

sequence of updates?

Page 21: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 21

cbcast and locking

  Sequence of conflicting updates identical at all replicas– There is only one writer at the

time for a given lock– Updates and lock grants follow

the same causal path– (The causal order is a total order

along a causal path)

  Non-conflicting updates may be applied in any order

Page 22: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 22

Performance

  Mostly asynchronous– UPDATEs may be done when sending the UPDATE

cbcast• Causally prior updates will already have been delivered to

sender

– Waiting for LOCKs necessary unless we already have the lock...

  ”85.000 updates per second”– Cf. quorum read and update

• Forces into lockstep for reads and updates• ”100 reads and updates (combined) per second”

– Not dynamically uniform, though• E.g., safe cbcast or cbcast followed by flush

Page 23: Distributed Systems 2006 Virtual Synchrony* *With material adapted from Ken Birman

Distributed Systems 2006 23

Summary

  We introduced ”Virtual Synchrony” as a powerful distributed programming abstraction– Processes appear to run in lock-step – middleware

may optimize– Based on tools developed so far: membership, group

communication

  Gave examples of what can be built on top of Virtual Synchrony– More next time...