Chapter 10 Consistency And Replication

Page 1: Chapter 10

Chapter 10

Consistency And Replication

Page 2: Chapter 10

Topics
- Motivation
- Data-centric consistency models
- Client-centric consistency models
- Distribution protocols
- Consistency protocols

Page 3: Chapter 10

Motivation
Make copies of services on multiple sites to improve ...
- Reliability (by redundancy): if the primary FS crashes, a standby FS still works
- Performance: increase processing power, reduce communication delays
- Scalability: prevent overloading a single server (size scalability), avoid communication latencies (geographic scale)

However, updates become more complex: when, who, where, and how to propagate the updates?

Page 4: Chapter 10

Concurrency Control on Remote Object

a) A remote object capable of handling concurrent invocations on its own.
b) A remote object for which an object adapter is required to handle concurrent invocations.

Page 5: Chapter 10

Object Replication

a) A distributed system for replication-aware distributed objects.
b) A distributed system responsible for replica management.

Page 6: Chapter 10

Distributed Data Store

Client's point of view: the data store appears as a single store capable of holding a certain amount of data.

Page 7: Chapter 10

Distributed Data Store

Data store's point of view:

General organization of a logical data store, physically distributed and replicated across multiple processes.

Page 8: Chapter 10

Operations on a Data Store
- Read: ri(x)b means client i (process Pi) performs a read on data item x and the read returns the value b
- Write: wi(x)a means client i (process Pi) performs a write on data item x, setting it to the new value a

Operations are not instantaneous; we distinguish:
- Time of issue (when the request is sent by the client)
- Time of execution (when the request is executed at a replica)
- Time of completion (when the reply is received by the client)

Page 9: Chapter 10

Example

Page 10: Chapter 10

Consistency Models
- A consistency model defines which interleavings of operations are valid (admissible)
- There are different levels of consistency: strong (strict, tight) and weak (loose)
- A consistency model is concerned with the consistency of a data store and specifies the characteristics of valid orderings of operations
- A data store that implements a particular consistency model provides a total ordering of operations that is valid according to this model

Page 11: Chapter 10

Consistency Models
- Data-centric models: describe the consistency experienced by all clients; clients P1, P2, P3, ... see the same kind of ordering
- Client-centric models: describe the consistency seen only by clients who request it; clients P1, P2, P3 may see different kinds of orderings

Page 12: Chapter 10

Data-Centric Consistency Models
Strong ordering:
- Strict consistency
- Linear consistency
- Sequential consistency
- Causal consistency
- FIFO consistency

Weak ordering:
- Weak consistency
- Release consistency
- Entry consistency

Page 13: Chapter 10

Strict Consistency
Definition: A DDS (distributed data store) is strictly consistent if any read on a data item x of the DDS returns the value corresponding to the result of the most recent write on x, regardless of the location of the processes doing the read or write.

Analysis:
1. In a single-processor system strict consistency comes for free; it is exactly the behavior of local shared memory with atomic reads/writes.
2. However, in a distributed system it is hard to establish a global time to determine which write is "the most recent".
3. Due to message transfer delays this model is not achievable in practice.

Page 14: Chapter 10

Example

Behavior of two processes operating on the same data item:
a) A strictly consistent store.
b) A store that is not strictly consistent.

Page 15: Chapter 10

Strict Consistency Problems

Assumption: y = 0 is stored on node 2; P1 and P2 are processes on node 1 and node 2, respectively.

Due to message delays, r(y) issued at t = t2 may return 0 or 1, and r(y) issued at t = t4 may return 0, 1, or 2.

Furthermore: if y migrates to node 1 between t2 and t3, then r(y) issued at time t2 may even return the value 2 (i.e., "back to the future").

Page 16: Chapter 10

Sequential Consistency (1)
Definition: A DDS offers sequential consistency if all processes see the same order of accesses to the DDS, whereby the reads/writes of each individual process occur in program order, and the reads/writes of different processes are performed in some sequential order.

Analysis:
1. Sequential consistency is weaker than strict consistency.
2. Every valid permutation of accesses is allowed, provided all processes see the same permutation; consequently, two runs of the same distributed application may produce different results.
3. No global time ordering is required.

Page 17: Chapter 10

Example

Each process sees all writes in the same order, even though the store is not strictly consistent.

Page 18: Chapter 10

Non-Sequential Consistency

Page 19: Chapter 10

Linear Consistency
Definition: A DDS is said to be linearly consistent (linearizable) when each operation is time-stamped and the following holds: the result of each execution is the same as if the (read and write) operations by all processes on the DDS were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program. In addition, if TS(OP1(x)) < TS(OP2(y)), then operation OP1(x) should precede OP2(y) in this sequence.

Page 20: Chapter 10

Assumption
Each operation is assumed to receive a timestamp from a globally available clock of only finite precision, e.g., a set of loosely synchronized local clocks.

Linear consistency is stricter than sequential consistency, i.e., a linearly consistent DDS is also sequentially consistent.

With linear consistency, not every valid interleaving of reads and writes is allowed any more; the ordering must also obey the order implied by the timestamps of the operations.

Page 21: Chapter 10

Causal Consistency (1)
Definition: A DDS provides causal consistency if the following condition holds: writes that are potentially causally related* must be seen by all processes in the same order; concurrent writes may be seen in a different order on different machines.

* If event B is caused or influenced by an earlier event A, causality requires that everyone else also sees first A, and then B.

Page 22: Chapter 10

Causal Consistency (2)
Definition: write2 is potentially dependent on write1 when there is a read between these two writes that may have influenced write2.

Corollary: if write2 is potentially dependent on write1, the only correct sequence is write1 before write2.

Page 23: Chapter 10

Causal Consistency: Example

This sequence is allowed with a causally-consistent store, but not with a sequentially or strictly consistent store.

Page 24: Chapter 10

Causal Consistency: Example

a) A violation of a causally-consistent store.
b) A correct sequence of events in a causally-consistent store.

Page 25: Chapter 10

Implementation
Implementing causal consistency requires keeping track of which processes have seen which writes: construction and maintenance of a dependency graph expressing which operations are causally related, typically using vector timestamps.
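
As a concrete illustration (added here; not part of the original slides), a minimal vector-timestamp sketch in C. The process count NPROCS and all names are assumptions:

    #include <stdbool.h>

    #define NPROCS 4  /* assumed, fixed number of processes */

    typedef struct { int clock[NPROCS]; } VectorTS;

    /* Before process pid issues a write: advance its own entry. */
    void vts_tick(VectorTS *ts, int pid) { ts->clock[pid]++; }

    /* Merge a received timestamp into the local one (element-wise max),
       done when a process reads/receives another process's write. */
    void vts_merge(VectorTS *local, const VectorTS *recv) {
        for (int i = 0; i < NPROCS; i++)
            if (recv->clock[i] > local->clock[i])
                local->clock[i] = recv->clock[i];
    }

    /* a "happened before" b: a <= b element-wise and a != b. Writes ordered
       by this relation are potentially causally related and must be applied
       in that order at every replica. */
    bool vts_before(const VectorTS *a, const VectorTS *b) {
        bool strictly_less = false;
        for (int i = 0; i < NPROCS; i++) {
            if (a->clock[i] > b->clock[i]) return false;
            if (a->clock[i] < b->clock[i]) strictly_less = true;
        }
        return strictly_less;
    }

If neither vts_before(a, b) nor vts_before(b, a) holds, the two writes are concurrent and may be applied in different orders at different replicas.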

Page 26: Chapter 10

FIFO or PRAM Consistency
Definition: A DDS implements FIFO consistency when all writes of one process are seen in the same order by all other processes, i.e., they are received by all other processes in the order they were issued. However, writes from different processes may be seen in a different order by different processes.

Corollary: writes of different processes are treated as concurrent.

Implementation: tag each write operation of every process with (PID, sequence number).
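
A hedged sketch (added; not in the slides) of the (PID, sequence number) tagging in C; all names and sizes are assumptions:

    #include <stdbool.h>

    #define NPROCS 4

    typedef struct {
        int pid;   /* issuing process */
        int seq;   /* per-process sequence number, starting at 0 */
        int item;  /* data item written */
        int value; /* value written */
    } Write;

    /* Per-replica bookkeeping: next expected sequence number per process. */
    static int next_seq[NPROCS];

    /* FIFO consistency only requires per-sender ordering: a write from
       process w->pid may be applied iff all of that process's earlier
       writes have been applied. Writes of different senders are not
       ordered with respect to each other at all. */
    bool fifo_deliverable(const Write *w) {
        return w->seq == next_seq[w->pid];
    }

    void fifo_apply(const Write *w) {
        /* ...apply w->value to the local copy of w->item here... */
        next_seq[w->pid]++;  /* now expect that process's next write */
    }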

Page 27: Chapter 10

Example

The two writes are seen by processes P3 and P4 in different orders. This still obeys FIFO consistency, but not causal consistency, because write2 is dependent on write1.

Page 28: Chapter 10

Example (2)

Two concurrent processes; shared variables x = y = 0 initially:

Process P1:               Process P2:
x = 1;                    y = 1;
if (y == 0) print("A");   if (x == 0) print("B");

Possible results: A, B, nil, even AB?

Under FIFO consistency all four outcomes are possible: each process may see its own write before the other's write has arrived, so both conditions can be true and both A and B get printed, an outcome that sequential consistency would forbid.

Page 29: Chapter 10

Synchronization Variable
Background: it is not necessary to propagate intermediate writes.

A synchronization variable S:
- Is associated with a single operation, synchronize(S)
- synchronize(S) brings all local copies of the data store up to date

Page 30: Chapter 10

Compilation Optimization

int a, b, c, d, e, x, y;       /* variables */
int *p, *q;                    /* pointers */
int f(int *p, int *q);         /* function prototype */

a = x * x;                     /* a stored in register */
b = y * y;                     /* b as well */
c = a*a*a + b*b + a*b;         /* used later */
d = a * a * c;                 /* used later */
p = &a;                        /* p gets address of a */
q = &b;                        /* q gets address of b */
e = f(p, q);                   /* function call */

A program fragment in which some variables may be kept in registers.

Page 31: Chapter 10

Weak Consistency
Definition: A DDS implements weak consistency if the following hold:
1. Accesses to synchronization variables obey sequential consistency.
2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
3. No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed.

Page 32: Chapter 10

Interpretation
A synchronization variable S supports just one operation, synchronize(S), which is responsible for all local replicas of the data store.

Whenever a process calls synchronize(S), its local updates are propagated to all replicas of the DDS, and all updates of the other processes are brought into its local replica of the DDS.

All processes see all accesses to synchronization variables in the same order.

Page 33: Chapter 10

Interpretation (2)
No data access is allowed until all previous accesses to synchronization variables have been done. By doing a synchronize before reading shared data, a process can be sure of getting the most up-to-date values.

Unlike the previous consistency models, weak consistency forces the programmer to group critical operations together.
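
To make these rules concrete, here is a toy, single-machine model in C (my own illustration, not from the slides; replica[], write_local(), and synchronize() are invented names, and network propagation is elided):

    #include <stdio.h>

    #define NREPLICAS 2

    /* Each replica holds a local copy of one data item. */
    static int replica[NREPLICAS];

    static void write_local(int r, int v) { replica[r] = v; }  /* not propagated */
    static int  read_local(int r)         { return replica[r]; }

    /* synchronize(S), simplified: push the caller's updates to all replicas
       (a full implementation would also pull in the others' updates). */
    static void synchronize(int caller) {
        for (int r = 0; r < NREPLICAS; r++)
            replica[r] = replica[caller];
    }

    int main(void) {
        write_local(0, 10);   /* intermediate write: need not be propagated */
        write_local(0, 20);
        printf("before sync, replica 1 reads %d\n", read_local(1));  /* stale: 0 */
        synchronize(0);       /* only now are the updates pushed everywhere */
        printf("after sync, replica 1 reads %d\n", read_local(1));   /* 20 */
        return 0;
    }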

Page 34: Chapter 10

Example

Via synchronization you can enforce that you get up-to-date values. Each process must synchronize if its writes are to be seen by the others.

A process issuing a read without any synchronization measures may get out-of-date values.

Page 35: Chapter 10

Non-weak Consistency

Page 36: Chapter 10

Release Consistency
Problem with weak consistency: when a synchronization variable is accessed, the DDS does not know whether this is done because the process has finished writing the shared variables or because it is about to start reading them. It must therefore take the actions required in both cases: making sure that all locally initiated writes have been completed (i.e., propagated to all other machines), as well as gathering in all writes from other machines.

Solution: provide two separate operations, acquire and release.

Page 37: Chapter 10

Details
Idea: distinguish between the memory accesses at the entry of a critical section (acquire) and those at the exit of a critical section (release).

Implementation: when a release is done, all the protected data that have been updated within the critical section are propagated to all replicas.

Page 38: Chapter 10

Definition
Definition: A DDS offers release consistency if the following three conditions hold:
1. Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
2. Before a release is allowed to be performed, all previous reads and writes by the process must have been completed.
3. Accesses to synchronization variables are FIFO consistent.
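
This acquire/release discipline is exactly what POSIX mutexes provide in shared-memory C programs, so a runnable pthread analogy may help (my illustration, not from the slides): lock plays the role of acquire, unlock the role of release.

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static int shared_counter = 0;          /* data protected by m */

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&m);         /* acquire: others' prior writes visible */
            shared_counter++;               /* access protected data in the CS */
            pthread_mutex_unlock(&m);       /* release: own writes made visible */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%d\n", shared_counter);     /* deterministically 200000 */
        return 0;
    }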

Page 39: Chapter 10

Example

A valid event sequence for release consistency, even though P3 neglected to use acquire and release.

Remark: acquire is more than a lock or enter_critical_section; it waits until all updates on the protected data from other nodes have been propagated to the local replica before it enters the critical section.

Page 40: Chapter 10

Lazy Release Consistency
Problem with "eager" release consistency: when a release is done, the releasing process pushes all the modified data out to every process that already holds a copy and thus might potentially read the data in the future.

However, there is no way to tell whether those target machines will ever actually use the updated values, so this solution is somewhat inefficient: too much overhead.

Page 41: Chapter 10

Details
With "lazy" release consistency nothing is done at a release. Instead, at the next acquire the processor determines whether it already has all the data it needs. Only when it needs updated data does it send messages to the places where the data were changed in the past.

Timestamps help to decide whether a datum is outdated.

Page 42: Chapter 10

Entry Consistency
Unlike release consistency, entry consistency requires each ordinary shared variable to be protected by a synchronization variable.

When an acquire is done on a synchronization variable, only those ordinary shared variables guarded by that synchronization variable are made consistent.

A list of shared variables may be assigned to a single synchronization variable (to reduce overhead).

Page 43: Chapter 10

How to Synchronize?
- Every synchronization variable has a current owner
- The owner may enter and leave critical sections protected by this synchronization variable as often as needed, without sending any coordination messages to the others
- A process wanting to acquire a synchronization variable has to send a message to the current owner
- The current owner hands over the synchronization variable together with all updated values of its previous writes
- Several processes may simultaneously hold the synchronization variable in non-exclusive (read-only) mode
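
A hedged data-structure sketch (added; not in the slides) of this ownership protocol in C; SyncVar, fetch_latest(), and the fixed list size are assumptions:

    /* A synchronization variable guards an explicit list of data items. */
    typedef struct {
        int owner;        /* process id of the current owner */
        int guarded[8];   /* data items protected by this variable */
        int n;            /* number of guarded items */
    } SyncVar;

    /* Stub: fetch the latest value of one item from its current owner. */
    static void fetch_latest(int owner, int item) {
        (void)owner; (void)item;   /* network transfer elided */
    }

    /* On acquire by process p: only the guarded items are made consistent
       (entry consistency), then ownership is handed over. While p stays
       the owner, repeated acquires cost no messages at all. */
    void acquire(SyncVar *s, int p) {
        if (s->owner != p) {
            for (int i = 0; i < s->n; i++)
                fetch_latest(s->owner, s->guarded[i]);
            s->owner = p;
        }
    }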

Page 44: Chapter 10

Example

A valid event sequence for entry consistency

Page 45: Chapter 10

Summary of Consistency Models

(a) Consistency models not using synchronization operations:

Consistency      Description
Strict           Absolute time ordering of all shared accesses matters.
Linearizability  All processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (non-unique) global timestamp.
Sequential       All processes see all shared accesses in the same order; accesses are not ordered in time.
Causal           All processes see causally-related shared accesses in the same order.
FIFO             All processes see writes from each other in the order they were issued; writes from different processes may not always be seen in that order.

(b) Models with synchronization operations:

Consistency      Description
Weak             Shared data can be counted on to be consistent only after a synchronization is done.
Release          Shared data are made consistent when a critical region is exited.
Entry            Shared data pertaining to a critical region are made consistent when a critical region is entered.

Page 46: Chapter 10

Up to Now
- System-wide consistent view on the DDS
- Independent of the number of involved processes
- Mutually exclusive atomic operations on the DDS
- Processes access only local copies
- Propagation of updates has to take place whenever it is necessary to fulfill the requirements of the consistency model

Are there still weaker consistency models?

Page 47: Chapter 10

Client-Centric Consistency
Provides guarantees about the ordering of operations only for a single client, i.e.:
- The effects of an operation depend on the client performing it
- The effects also depend on the history of the client's operations
- The guarantees are applied only when requested by the client
- There are no guarantees concerning concurrent accesses by different clients

Assumption: clients can access different replicas, e.g., mobile users.

Page 48: Chapter 10

Mobile Users

The principle of a mobile user accessing different replicas of a distributed database.

Page 49: Chapter 10

Eventual Consistency
If no updates occur for a long period of time, all replicas will gradually become consistent.

Requirements:
- Few read/write conflicts
- No write/write conflicts
- Clients can accept temporary inconsistency

Examples:
- DNS: no write/write conflicts; updates propagate slowly (1-2 days) to all caches
- WWW: few write/write conflicts; mirrors are eventually updated, and cached copies (browser or proxy) are eventually replaced

Page 50: Chapter 10

Client-Centric Consistency Models
- Monotonic Reads
- Monotonic Writes
- Read Your Writes
- Writes Follow Reads

Page 51: Chapter 10

Monotonic Reads
Definition: A DDS provides monotonic-read consistency if the following holds: if a process P reads the value of data item x, any successive read operation on x by that process will always return the same value or a more recent one (independently of the replica at location L where the new read is done).

Page 52: Chapter 10

Example System
A distributed e-mail database with distributed and replicated user mailboxes. E-mails can be inserted at any location. However, updates are propagated in a lazy (i.e., on-demand) fashion.

Page 53: Chapter 10

Example

The read operations performed by a single process P at two different local copies of the same data store.

a) A monotonic-read consistent data store.
b) A data store that does not provide monotonic reads.

Page 54: Chapter 10

Monotonic Writes
Definition: A DDS provides monotonic-write consistency if the following holds: a write operation by process P on data item x is completed before any successive write operation on x by the same process P can take place.

Remarks:
- Monotonic writes resemble FIFO consistency
- The guarantee only applies to writes from one client process P
- Different clients not requiring monotonic writes may see the writes of process P in any order

Page 55: Chapter 10

Example

The write operations performed by a single process P at two different local copies of the same data store

a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.

Page 56: Chapter 10

Read Your Writes
Definition: A DDS provides read-your-writes consistency if the following holds: the effect of a write operation by a process P on a data item x at a location L will always be seen by a successive read operation on x by the same process.

Examples of missing read-your-writes consistency:
- Updating a website with an editor: to view the updated page you have to refresh it, otherwise the browser shows the old cached content
- Updating passwords

Page 57: Chapter 10

Example

a) A data store that provides read-your-writes consistency.

b) A data store that does not.

Page 58: Chapter 10

Writes Follow Reads
Definition: A DDS provides writes-follow-reads consistency if the following holds: a write operation by a process P on a data item x, following a previous read on x by the same process, is guaranteed to take place on the same or a more recent value of x than the one that was read before.

Page 59: Chapter 10

Example

a) A writes-follow-reads consistent data store.
b) A data store that does not provide writes-follow-reads consistency.

Page 60: Chapter 10

Implementing Client-Centric Consistency
Naive implementation (ignoring performance):
- Each write gets a globally unique identifier (WID)
- The identifier is assigned by the server that accepts the write operation for the first time
- For each client, two sets of write identifiers are maintained:
  - Read set RS(C): the WIDs relevant for the reads of client C
  - Write set WS(C): the WIDs of the writes performed by client C

Page 61: Chapter 10

Implementing Monotonic Reads

When a client C performs a read at server S, the server is handed the client's read set RS(C) to check whether all identified writes have already taken place locally at S. If not, server S has to be updated before the read is carried out!
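
A hedged sketch (added; not in the slides) of this check in C; WriteId, MAX_SET, and server_has() are assumptions:

    #include <stdbool.h>

    typedef struct { int server; long seq; } WriteId;  /* globally unique WID */

    #define MAX_SET 128

    typedef struct { WriteId w[MAX_SET]; int n; } WriteSet;

    /* Stub: has server s already applied write id? A real server would
       consult its local write log here. */
    static bool server_has(int s, WriteId id) {
        (void)s; (void)id;
        return true;
    }

    /* Before serving a read for client C at server s: every write in the
       client's read set RS(C) must have reached s. Otherwise s first has
       to fetch the missing writes from the servers that accepted them. */
    bool can_serve_read(int s, const WriteSet *rs) {
        for (int i = 0; i < rs->n; i++)
            if (!server_has(s, rs->w[i]))
                return false;   /* update server s before reading */
        return true;
    }

The write-set case of the next slide works the same way, with WS(C) in place of RS(C) and the additional step of appending the new write's WID to WS(C) afterwards.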

Page 62: Chapter 10

Implementing Monotonic Writes
If a client initiates a write at server S, the server is handed the client's write set WS(C) in order to bring S up to date first; the writes in the set are performed in the order of their timestamped WIDs. Once the new write has been carried out, its WID is added to the client's write set.

The response time of a client might thus grow with an ever-increasing write set. However, what to do if the read and write sets of a client keep growing larger and larger?

Page 63: Chapter 10

Improving Efficiency with RS and WS
Major drawback: the potential sizes of the read and write sets. Therefore:
- Group all write and read operations of a client into a so-called session (usually associated with an application)
- Every time a client closes its current session, all updates are propagated and the sets are deleted afterwards

Page 64: Chapter 10

Summary on Consistency Models
Choosing the right consistency model requires an analysis of the following trade-offs:
- Consistency and redundancy:
  - All replicas must be consistent
  - All replicas must contain the full state
  - Reduced consistency means reduced reliability
- Consistency and performance:
  - Consistency requires extra work
  - Consistency requires extra communication
  - This may result in a loss of overall performance

Page 65: Chapter 10

Distribution Protocols
- Replica placement: permanent replicas, server-initiated replicas, client-initiated replicas
- Update propagation: state versus operations, pull versus push protocols, unicasting versus multicasting
- Epidemic protocols: update propagation models, removing data

Page 66: Chapter 10

Replica Placement

The logical organization of different kinds of copies of a data store into three concentric rings.

Page 67: Chapter 10

Replica Placement
Permanent replicas:
- The initial set of replicas, created and maintained by the DDS owner(s)
- Writes are allowed
- E.g., web mirrors

Server-initiated replicas:
- Enhance performance
- Not maintained by the owner of the DDS
- Placed close to groups of clients, either manually or dynamically

Client-initiated replicas:
- Client caches: temporary, and the owner is not aware of the replica
- Placed closest to a client
- Maintained by the hosting machine (often the client itself)

Page 68: Chapter 10

Update Propagation

Page 69: Chapter 10

What Is to Be Propagated?
- Propagate only a notification of an update ("invalidation"):
  - Typical for invalidation protocols
  - May include information about which part of the DDS has been updated
  - Works best when the read-to-write ratio is low
- Propagate the updated data from one replica to another:
  - Works best when the read-to-write ratio is high
  - Several updates may be aggregated before being sent across the network
- Propagate the update operation to the other replicas ("active replication"):
  - Works well if the size of the parameters associated with each operation is small compared to the updated data
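
As a small illustration (added; not in the slides), the three alternatives can be seen as three kinds of update messages; the C names below are assumptions:

    /* What a replica sends to its peers, per propagation strategy. */
    typedef enum {
        INVALIDATE,   /* notification only: "your copy of item is stale" */
        STATE,        /* the updated data themselves */
        OPERATION     /* the operation plus its (small) parameters */
    } UpdateKind;

    typedef struct {
        UpdateKind kind;
        int  item;            /* which data item is affected */
        int  new_value;       /* used when kind == STATE */
        char op_name[16];     /* used when kind == OPERATION */
        int  op_args[4];      /* small parameters of the operation */
    } UpdateMsg;

An INVALIDATE message is the cheapest to send but forces a later fetch on the next read; a STATE message is the largest; an OPERATION message stays small but costs CPU time at every replica that replays it.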

Page 70: Chapter 10

Pull versus Push Protocols
Push protocols: updates are propagated to other replicas without those replicas having asked for them.
- Used between permanent and server-initiated replicas, i.e., to achieve a relatively high degree of consistency

Pull protocols: a server (or a client) asks another server to provide its updates.
- Used by client caches: e.g., when a client requests a web page that has not been refreshed for a longer period of time, the cache may check with the original site whether updates have been made
- Efficient when the read-to-write ratio is relatively low

Page 71: Chapter 10

Pull versus Push Protocols

Issue                     Push-based                                  Pull-based
State of server           List of client replicas and caches          None
Messages sent             Update (and possibly fetch update later)    Poll and update
Response time at client   Immediate (or fetch-update time)            Fetch-update time

A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.

Page 72: Chapter 10

Unicasting

Potential overhead with unicasting in a LAN. Good for the pull-based approach.

Page 73: Chapter 10

Multicasting

With multicasting, an update message can be propagated more efficiently across a LAN.

Good for the push-based approach.

Page 74: Chapter 10

Epidemic Protocols
When implementing eventual consistency you may rely on epidemic protocols. They give no guarantees for absolute consistency, but after some time an epidemic protocol will have spread the updates to all replicas.

Notions:
- An infective server holds an update that it is willing to spread to other servers
- A susceptible server has not yet been infected, i.e., updated
- A removed server is one that does not want to propagate any further information

Page 75: Chapter 10

Anti-Entropy Protocol
Server P picks another server Q at random and subsequently exchanges updates with Q. There are three approaches to exchanging updates:
- P only pushes its own updates to Q
- P only pulls in new updates from Q
- P and Q exchange their updates with each other, i.e., a push-pull approach

Page 76: Chapter 10

Gossip Protocols
Rumor spreading or gossiping works as follows: if server P has been updated for data item x, it contacts another arbitrary server Q and tries to push its new update of x to Q.

However, if Q has already received this update from some other server, P is so disappointed that it stops gossiping with probability 1/k.

Page 77: Chapter 10

Gossip Protocols (2)
Although gossiping works quite well on average, you cannot guarantee that every server will be updated. In a DDS with a "large" number of replicas, the fraction s of servers remaining ignorant of an update, i.e., still susceptible, satisfies:

s = e^(-(k+1)(1-s))
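
This equation has no closed-form solution, but a short fixed-point iteration (a sketch I added; not in the slides) shows how quickly the susceptible fraction drops as k grows:

    #include <math.h>
    #include <stdio.h>

    /* Solve s = exp(-(k+1)(1-s)) by fixed-point iteration, where s is the
       fraction of servers that remain susceptible (never get the update). */
    int main(void) {
        for (int k = 1; k <= 5; k++) {
            double s = 0.5;                        /* initial guess */
            for (int i = 0; i < 1000; i++)
                s = exp(-(k + 1) * (1.0 - s));     /* iterate the equation */
            printf("k = %d: s = %.6f\n", k, s);
        }
        return 0;
    }

For k = 1 this yields s ≈ 0.20, i.e., about 20% of the servers would miss the update; by k = 5 the fraction is down to roughly 0.25%.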

Page 78: Chapter 10

Analysis of Epidemic Protocols
Advantage:
- Scalability, due to the limited number of update messages

Disadvantage:
- Spreading the deletion of a data item is quite cumbersome, due to an unwanted side effect: suppose you have deleted data item x on server S; you may nevertheless receive an old copy of x again from some other server because of still-ongoing gossiping

Page 79: Chapter 10

Consistency Protocols
- Primary-based protocols: remote-write protocols, local-write protocols
- Replicated-write protocols: active replication, quorum-based protocols

Page 80: Chapter 10

Primary-Based Protocols
Each data item x of the DDS has an associated primary, responsible for coordinating write operations on x.

The primary server may be:
- Fixed, i.e., a specific remote server (remote-write protocols)
- Dynamic, i.e., the primary migrates to the place of the next write (local-write protocols)

Page 81: Chapter 10

Remote-Write Protocols (1)

Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.

Page 82: Chapter 10

Remote-Write Protocols (2)

The principle of the primary-backup protocol.

Page 83: Chapter 10

Local-Write Protocols (1)

Primary-based local-write protocol in which a single copy is migrated between processes.

Page 84: Chapter 10

Local-Write Protocols (2)

Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

Page 85: Chapter 10

Replicated-Write Protocols
Writes can take place at multiple replicas, instead of only at a specific primary server.

Active replication:
- The operation is forwarded to all replicas
- Problem: all operations must be carried out in the same order everywhere
- Further issues: scalability, replicated invocation

Majority voting:
- Before reading or writing, ask a subset of all replicas for permission (quorum-based protocols)

Page 86: Chapter 10

Replicated Invocation for Active Replication

Page 87: Chapter 10

Solutions

a) Forwarding an invocation request from a replicated object.
b) Returning a reply to a replicated object.

Page 88: Chapter 10

Quorum-Based Protocols
Preliminaries: if a client wants to read or write, it first must request and acquire the permission of multiple servers.

Example: a DFS with file F replicated on N servers. If an update is to be made, we demand that the client first contact at least half of the servers plus one and get them to agree to the update. Once they have agreed, file F gets a new version number.

To read file F, a client must likewise contact at least half of the servers and ask them to hand out the current version number of F.

Page 89: Chapter 10

Gifford's Quorum-Based Protocol
To read a file F, a client must assemble a read quorum, an arbitrary collection of NR servers. To write a file F, a write quorum of at least NW servers is required. The following must hold:

A) NR + NW > N
B) NW > N/2

Constraint A prevents read-write conflicts; constraint B prevents write-write conflicts.
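
The two constraints are easy to check mechanically; a small sketch in C (added here; the N = 12 test values are illustrative choices matching the three cases on the next slide):

    #include <stdbool.h>
    #include <stdio.h>

    /* Gifford's constraints for N replicas, read quorum NR, write quorum NW. */
    bool valid_quorum(int n, int nr, int nw) {
        return nr + nw > n    /* A: every read quorum overlaps every write
                                 quorum, so a read always sees the newest
                                 version (no read-write conflicts) */
            && nw > n / 2;    /* B: any two write quorums overlap, so two
                                 concurrent writes cannot both succeed
                                 (no write-write conflicts) */
    }

    int main(void) {
        printf("%d\n", valid_quorum(12, 3, 10));  /* 1: a correct choice */
        printf("%d\n", valid_quorum(12, 7, 6));   /* 0: write-write conflicts */
        printf("%d\n", valid_quorum(12, 1, 12));  /* 1: ROWA (read one, write all) */
        return 0;
    }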

Page 90: Chapter 10

Examples

Three examples of the voting algorithm:
a) A correct choice of read and write set.
b) A choice that may lead to write-write conflicts.
c) A correct choice, known as ROWA (read one, write all).