101
CSC 536 Lecture 5

CSC 536 Lecture 5

  • Upload
    flann

  • View
    51

  • Download
    0

Embed Size (px)

DESCRIPTION

CSC 536 Lecture 5. Outline. Replication and Consistency Defining consistency models Data centric Client centric Implementing consistency Replica management Consistency Protocols Examples Distributed file systems Web hosting systems p2p systems. Replication and consistency. - PowerPoint PPT Presentation

Citation preview

Page 1: CSC  536 Lecture  5

CSC 536 Lecture 5

Page 2: CSC  536 Lecture  5

Outline

Replication and ConsistencyDefining consistency models

Data centricClient centric

Implementing consistencyReplica managementConsistency Protocols

ExamplesDistributed file systemsWeb hosting systemsp2p systems

Page 3: CSC  536 Lecture  5

Replication and consistency

ReplicationReliabilityPerformance

Multiple replicas leads to consistency problems.

Keeping the copies the same can be expensive.

Page 4: CSC  536 Lecture  5

Replication and scalingWe focus for now on replication as a scaling technique

Placing copies close to processes using them reduces the load on the network

But, replication can be expensiveKeeping copies up to date puts more load on the network

Example: distributed totally-ordered multicastingExample: central coordinator

The only real solution is to loosen the consistency requirements

The price is that replicas may not be the sameWe will develop several consistency models

Data-centricClient-centric

Page 5: CSC  536 Lecture  5

Data-centric consistency models

Page 6: CSC  536 Lecture  5

Data-Centric Consistency Models

The general organization of a logical data store, physically distributed and replicated across multiple processes.

Page 7: CSC  536 Lecture  5

Consistency models

A consistency model is a contract between processes and the data store.

If processes obey certain rules, i.e. behave according to the contract, then ...The data store will work “correctly”, i.e. according to the contract.

Page 8: CSC  536 Lecture  5

Strict consistency

Any read on a data item x returns a value corresponding to the most recent write on x

Definition implicitly assumes existence of absolute global timeDefinition makes sense in uniprocessor systems:a=1;a=2;print(a);

Definition makes little sense in distributed systemMachine B writes xA moment later, machine A reads xDoes A read old or new value?

Page 9: CSC  536 Lecture  5

Strict Consistency

Behavior of two processes, operating on the same data item.a) A strictly consistent store.b) A store that is not strictly consistent.

Page 10: CSC  536 Lecture  5

Weakening of the model

Strict consistency is ideal, but un-achievable

Must use weaker models, and that's OK!Writing programs that assume/require the strict consistency model is unwiseIf order of events is essential, use critical sections or transactions

Page 11: CSC  536 Lecture  5

Sequential consistency

The result of any execution is the same as ifthe (read and write) operations by all processes on the data store were executed in some sequential order, andthe operations of each individual process appear in this sequence in the order specified by its program

No reference to time

All processes see the same interleaving of operations.

Page 12: CSC  536 Lecture  5

Sequential Consistency

a) A sequentially consistent data store.b) A data store that is not sequentially consistent.

Page 13: CSC  536 Lecture  5

Sequential Consistency

Three concurrently executing processes.

z = 1;print (x, y);

y = 1;print (x, z);

x = 1;print ( y, z);

Process P3Process P2Process P1

Page 14: CSC  536 Lecture  5

Sequential Consistency

Four valid execution sequences for the processes of the previous slide. The vertical axis is time.

y = 1;x = 1;z = 1;print (x, z);print (y, z);print (x, y);

Prints: 111111

(d)

y = 1;z = 1;print (x, y);print (x, z);x = 1;print (y, z);

Prints: 010111

(c)

x = 1;y = 1;print (x,z);print(y, z);z = 1;print (x, y);

Prints: 101011

(b)

x = 1;print ((y, z);y = 1;print (x, z);z = 1;print (x, y);

Prints: 001011

(a)

Page 15: CSC  536 Lecture  5

Sequential consistency more precisely

An execution E of processor P is a sequence of R/W operations by P on a data store.

This sequence must agree with the P's program order .

A history H is an ordering of all R/W operations that is consistent with the execution of each processor.

In a sequentially consistent model, the history H must obey the following rules:

Program order (of each process) must be maintained.Data coherence must be maintained.

Page 16: CSC  536 Lecture  5

Sequential consistency is expensive

Suppose A is any algorithm that implements a sequentially consistent data store

Let r be the time it takes A to read a data item xLet w be the time it takes A to write to a data item xLet t be the message time delay between any two nodes

Then r + w > ??

Page 17: CSC  536 Lecture  5

Sequential consistency is expensive

Suppose A is any algorithm that implements a sequentially consistent data store

Let r be the time it takes A to read a data item xLet w be the time it takes A to write to a data item xLet t be the message time delay between any two nodes

Then r + w > t

If lots of reads and lots of writes, we need weaker consistency models.

Page 18: CSC  536 Lecture  5

Causal consistency

Sequential consistency may be too strict: it enforces that every pair of events is seen by everyone in the same order, even if the two events are not causally related

Causally related: P1 writes to x; then P2 reads x and writes to yNot causally related: P1 writes to x and, concurrently, P2 writes to x

Reminder: operations that are not causally related are said to be concurrent.

Page 19: CSC  536 Lecture  5

Causal consistency

Writes that are potentially causally related must be seen by all processes in the same order.Concurrent writes may be seen in a different order on different machines.

Page 20: CSC  536 Lecture  5

Causal consistency

This sequence is allowed with a causally-consistent store, but not with sequentially or strictly consistent store.

Page 21: CSC  536 Lecture  5

Causal consistency

a) A violation of a causally-consistent store.b) A correct sequence of events in a causally-consistent store.

Page 22: CSC  536 Lecture  5

FIFO consistency

Writes done by a single process are seen by all other processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes.

Page 23: CSC  536 Lecture  5

FIFO consistency

A valid sequence of events of FIFO consistency

Page 24: CSC  536 Lecture  5

Weak consistencySequential and causal consistency defined at the level of read and write operations

The granularity is fine

Concurrency is typically controlled through synchronization mechanisms such as mutual exclusion or transactions

A sequence of reads/writes is done in an atomic block and their order is not important to other processesThe consistency contract should be at the level of atomic blocks

granularity should be coarser

Associate synchronization of data store with a synchronization variable

when the data store is synchronized, all local writes by process P are propagated to all other copies, whereas writes by other processes are brought in to P's copy

Page 25: CSC  536 Lecture  5

Weak consistency

a) A valid sequence of events for weak consistency.b) An invalid sequence for weak consistency.

Page 26: CSC  536 Lecture  5

Release consistencyDivide access to a synchronization variable into two parts

Acquire phase: forces a requester to wait until the lock on a synchronization variable is obtained and the shared data can be accessed

Release phase: sends requestor’s local value to other servers in the data store

Page 27: CSC  536 Lecture  5

Release consistency

When a lock on a shared variable is being released, the updated variable value is sent to shared memory

To view the value of a shared variable as guaranteed by the contract, an acquire (i.e. a lock) is required

Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).

Page 28: CSC  536 Lecture  5

Release consistency example: JVM

The Java Virtual Machine uses the SMP model of parallel computation.

Each thread will create local copies of shared variables Calling a synchronized method is equivalent to an acquire exiting a synchronized method is equivalent to a release

A release has the effect of flushing the cache to main memory

writes made by this thread can be visible to other threads

An acquire has the effect of invalidating the local cache so that variables will be reloaded from main memory

Writes made by the previous release are made visible

Page 29: CSC  536 Lecture  5

Client-centric consistency models

Page 30: CSC  536 Lecture  5

Data-centric consistency is too much

Data-centric consistency models aim at providing a system-wide consistent view of a data store.

The goal is to provide consistency in the face of concurrent updates

In some data stores, there may not be concurrent updates, or if there are, they may be easy to handle:

DNS: each domain has unique naming authorityWWW: each web page has an ownerSome database systems often have just a few processes that do updates

Page 31: CSC  536 Lecture  5

Eventual consistency

The previous examples have in common:They tolerate some inconsistency.If no updates take place for a long time, then all replicas will eventually be consistent.

Eventual consistency is a consistency models that guarantees the propagation of updates to all replicas

Nothing elseCheap to implementWorks fine if a client always accesses the same replica

But what if different replicas are accessed?

Page 32: CSC  536 Lecture  5

Eventual Consistency

A mobile user accessing different replicas of a distributed database.

Page 33: CSC  536 Lecture  5

Client-centric consistency

It provides consistency guarantees for a single client.

We will define several versions of client-centric consistency

Assumptions:Data store is distributed across multiple machines.A process connects to the nearest copy and all R/W operations are performed on that copy.Updates are eventually propagated to other copies.Each data item has an owner process, which is the only one that can update that item.

Page 34: CSC  536 Lecture  5

Monotonic-read consistency

If a process reads the value of a data item x, any successive read operations on x by that process will always return that same value or a more recent value.

Example: replicated mail server

Page 35: CSC  536 Lecture  5

Monotonic-write consistency

A write operation by a process on a data item x is completed before any successive write operation on x by the same process.

Example: a software library

Page 36: CSC  536 Lecture  5

Read-your writes consistency

The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

Example: updating passwords for a web resource

Page 37: CSC  536 Lecture  5

Write-follow-reads consistency

A write operation by a process on a data item x following a previous read operation by the same process, is guaranteed to take place on the same or a more recent value of x that was read.

Example: newsgroups.

Page 38: CSC  536 Lecture  5

Implementing replica consistency

Page 39: CSC  536 Lecture  5

Distribution protocols

There are different ways of propagating (distributing) updates to replicas

independent of the consistency model that is supported

Before we discuss them, we need to discuss the following:When, where, and by whom copies of the data store are to be placed?

Page 40: CSC  536 Lecture  5

Replica placement

The logical organization of different kinds of copies of a data store into three concentric rings.

Page 41: CSC  536 Lecture  5

Permanent replicas

Initial set of replicasExample: mirroring web sites to a few serversExample: web site whose files are replicated across a few servers on a LAN; requests are forwarded to copies in a round-robin fashion

Page 42: CSC  536 Lecture  5

Server-Initiated Replicas

Counting access requests from different clients.

Page 43: CSC  536 Lecture  5

Client-initiated replicas

Caches

Goal is only to improve performance, not enhance reliability

Page 44: CSC  536 Lecture  5

Update propagation

Updates are initiated by a client and subsequently forwarded to one of the copies.

From the copies, updates should be propagated to other copies.

What to propagate?Propagate only notifications of an updateTransfer updated data from one copy to anotherPropagate the update operation to other copies

Page 45: CSC  536 Lecture  5

Push or pull?In a push-based approach, updates are propagated to replicas without those replicas even asking for the updates

Useful when ???Used between permanent and server-initiated replicas, and even client caches

In a pull-based approach, a server or client requests another server to send it any updates it has at that moment

Useful when ???Example: web caches

Page 46: CSC  536 Lecture  5

Push or pull?In a push-based approach, updates are propagated to replicas without those replicas even asking for the updates

Useful when ratio of reads vs. writes is highUsed between permanent and server-initiated replicas, and even client caches

In a pull-based approach, a server or client requests another server to send it any updates it has at that moment

Useful when ratio of reads vs. writes is lowExample: web caches

Page 47: CSC  536 Lecture  5

Pull versus Push Protocols

A comparison between push-based and pull-based protocols in the case of multiple client, single server systems.

Issue Push-based Pull-based

State of server List of client replicas and caches None

Messages sent Update Poll and update

Response time at client Immediate Fetch-update time

Page 48: CSC  536 Lecture  5

Consistency protocols

A consistency protocol describes an implementation of a specific consistency model

Protocols for client-centric consistency models

Protocols for data-centric consistency modelsFocus on consistency models in which operations are globally serialized (Sequential consistency, Atomic transactions, Weak consistency with synchronization variables)

Page 49: CSC  536 Lecture  5

Implementing monotonic-read consistency

Each write operation is assigned a unique global ID.

Every client keeps track of a:Write set of IDs of writes performed by that client.Read set of IDs of writes the client knows about (i.e. it has read them).

When a client performs a read operation at a server, it hands its read set to the server.

The server checks that all writes in the read set have taken place

If not, server must contact other servers for updates

Page 50: CSC  536 Lecture  5

Implementing other client-centric consistency models

Monotonic-writes

Read-your-writes

Writes-follow-reads

Page 51: CSC  536 Lecture  5

Implementing data-centric consistency models

Consistency protocols can be classified as:Primary based protocolsReplicated-write protocols

Page 52: CSC  536 Lecture  5

Primary based protocols

Each data item x in the data store has an associated primary, which is responsible for coordinating all updates of x

Two approaches:primary fixed at a remote server (remote write)Write operations are carried out locally after moving the primary to the process where the write is initiated (local write)

Page 53: CSC  536 Lecture  5

Remote-Write Protocols (1)

Primary-based remote-write protocol with a fixed server to which all read and write operations are forwarded.

Page 54: CSC  536 Lecture  5

Remote-Write Protocols (2)

The principle of primary-backup protocol.

Page 55: CSC  536 Lecture  5

Issues with remote-write protocols

Original approachProblem: update blocks process that initiated the updateAdvantage: fault tolerant

Alternative approach: use non-blocking approach in which the primary updates x and immediately returns an acknowledgement, before sending the update to other replicas

Advantage: better performanceProblem: not fault tolerant

Either way, remote write protocols can be used to implement sequential consistency

Page 56: CSC  536 Lecture  5

Local-Write Protocols (1)

Primary-based local-write protocol in which a single copy is migrated between processes.

Page 57: CSC  536 Lecture  5

Local-Write Protocols (2)

Primary-backup protocol in which the primary migrates to the process wanting to perform an update.

Page 58: CSC  536 Lecture  5

Issues with local-write protocols

How to keep track of where each data item is located?LAN: broadcastingLarge-scale, widely distributed systems: harder

Sequential consistency is easy to implement since the primary can coordinate updates of other copies

Page 59: CSC  536 Lecture  5

Replicated-write protocols

A write operation can be carried out at any replica

Two approaches:Active replicationMajority voting

Page 60: CSC  536 Lecture  5

Active replication

A client sends update request to a replica

Replica should forward the update to other replicas

How to make sure all updates are done in the same order at each replica?

Using Lamport timestampsSeen that, done that

Using a central coordinator (sequencer)Just like primary based consistency protocols!

Page 61: CSC  536 Lecture  5

Quorum-based protocols

Idea: require clients to request and acquire the permission of multiple servers before either reading or writing a replicated data item.

Page 62: CSC  536 Lecture  5

Example

Distributed file system with N replicated serversTo update a file, a client must send a request and wait for at least Nw servers to agree. Once Nw servers agree, the file is updated and receives a new version numberTo read a file, a client must send a request and wait for at least Nr replies. It picks the file whose version is most recent

How do we make sure no read/write conflict occurs? (i.e. we read the most recent file)

How do we make sure no write/write conflict occurs? (i.e. two writes get same version number)

Page 63: CSC  536 Lecture  5

Quorum-Based Protocols

Three examples of the voting algorithm:a) A correct choice of read and write setb) A choice that may lead to write-write conflictsc) A correct choice, known as ROWA (read one, write all)

Page 64: CSC  536 Lecture  5

Solution

Distributed file system with N replicated servers

How do we make sure no read/write conflict occurs? (i.e. we read the most recent file)

We need ???

How do we make sure no write/write conflict occurs? (i.e. two writes get same version number)

We need ???

Page 65: CSC  536 Lecture  5

Solution

Distributed file system with N replicated servers

How do we make sure no read/write conflict occurs? (i.e. we read the most recent file)

We need Nw + Nr > N

How do we make sure no write/write conflict occurs? (i.e. two writes get same version number)

We need Nw > N/2

Page 66: CSC  536 Lecture  5

Replication and consistency implementations

Focus on replication whose goal is improved performance

ExamplesDistributed file systems p2p systemsWeb hosting systems and consistent hashing

Page 67: CSC  536 Lecture  5

(Traditional) distributed file systems

Page 68: CSC  536 Lecture  5

Distributed file systems

A hierarchical (i.e. traditional) file system whose files and directories are distributed across a network

Goal is to provide the illusion of a local file systemSupports usual file system operations: open/close, read/write...Under the hood, file system operations are handled through Remote Procedure Calls (RPCs)

Design goals:Access/location/concurrency/replication/failure/migration transparencyHeterogeneityScalability

Page 69: CSC  536 Lecture  5

Distributed file systems

Examples:Sun Microsystem’s Network File System (NFS)Carnegie Mellon’s (then Transac’s, then IBM’s) Andrew File System (AFS), now OpenAFS and CodaWindows Distributed File SystemLustre Linux cluster parallel distributed file system

Page 70: CSC  536 Lecture  5

NFS client-server architecture

The basic NFS architecture for UNIX systems.

Page 71: CSC  536 Lecture  5

Mounting a directory in NFS

Mounting (part of) a remote file system in NFS

Page 72: CSC  536 Lecture  5

NFS file system model

An incomplete list of file system operations supported by NFS

Page 73: CSC  536 Lecture  5

NFS file system model

An incomplete list of file system operations supported by NFS

Page 74: CSC  536 Lecture  5

Remote versus local access

(a) The remote access model. (b) The upload/download model.

Page 75: CSC  536 Lecture  5

Semantics of file sharing

(a) On a single processor, when a read follows a write, the value returned by the read is the value just written (UNIX semantics)

Page 76: CSC  536 Lecture  5

Semantics of file sharing

(b) In a distributed system with caching, obsolete values may be returned (session semantics)

Page 77: CSC  536 Lecture  5

Semantics of file sharing

Four ways of dealing with the shared files in a distributed system

Page 78: CSC  536 Lecture  5

Overview of Coda

The overall organization of CODA.

Page 79: CSC  536 Lecture  5

Overview of Coda

Replication of files happens in two ways:Client cachingServer replication

Page 80: CSC  536 Lecture  5

Client Caching

Server keeps track of clients that fetched a file: it records a callback promise for a client

When a client writes to the file, it notifies the server, who, in turn, sends a callback break to the other clients

Page 81: CSC  536 Lecture  5

Sharing Files in Coda

The transactional behavior of sharing files in Coda

Page 82: CSC  536 Lecture  5

Sharing Files in Coda

The transactional behavior of sharing files in Coda

Page 83: CSC  536 Lecture  5

Server replication

Replicated-write protocol used to maintain consistencyusing reliable multicasting implemented at the RPC level

What happens if the servers get disconnected?

Page 84: CSC  536 Lecture  5

Server Replication

Two clients with different AVSG (Accesible Volume Storage Group) for the same replicated file.

Page 85: CSC  536 Lecture  5

p2p and “gossip” propagation

Page 86: CSC  536 Lecture  5

P2P and gossiping

P2P systems can propagate replicas through “gossip”

Now and then, each node picks some “peer” (at random, more or less)

Sends it a snapshot of its own dataCalled “push gossip”

Or asks for a snapshot of the peer’s data“Pull” gossip

Or both: a push-pull interaction

Page 87: CSC  536 Lecture  5

Gossip “epidemics”

[t=0] Suppose that I know something[t=1] I pick you… Now two of us know it.[t=2] We each pick … now 4 know it…

Information spread: exponential rate.

Due to re-infection (gossip to an infected node) spreads as 1.8k after k roundsBut in O(log(N)) time, N nodes are infected

Page 88: CSC  536 Lecture  5

Gossip epidemics

An unlucky node may just “miss” the gossip for a long time

Page 89: CSC  536 Lecture  5

Facts about gossip epidemics

Extremely robustData travels on exponentially many paths!Hard to even slow it down…

Suppose 50% of our packets are simply lost…… we’ll need 1 additional round: a trivial delay!

Push-pull works best. For push-only/pull-only a few nodes can remain uninfected for a long time

Page 90: CSC  536 Lecture  5

Uses of gossip epidemics

To robustly multicast dataSlow, but very sure of getting through

To implement eventual consistency between replicasTo support “all to all” monitoring and distributed managementFor distributed data mining and discovery

Page 91: CSC  536 Lecture  5

p2p and large file propagation

Page 92: CSC  536 Lecture  5

BitTorrent

A p2p file sharing system with:Load sharing through file splittingUses bandwidth of peers instead of a single serverA concept of “fairness”among peers with the goal of improving overall performance

Page 93: CSC  536 Lecture  5

The idea (setup)

A “seed”node has the fileFile is split into fixed-size segments (256KB)Hash calculated for each segmentA “.torrent”meta-file is built for the file – identifies the address of a tracker node

The .torrent file is hosted on a web server and passed around the web: it’s a link to the file

Tracker is a server which tracks currently active clientsTracker does not participate in actual distribution of file

Page 94: CSC  536 Lecture  5

The idea (setup)

Terminology:Seed: Client with a complete copy of the fileLeecher: Client still downloading the file and having some but not all fragments of the file

Client contacts tracker and gets a list of (50 or so) seeds and leechers

This set of seeds and leechers is called peer set

Page 95: CSC  536 Lecture  5

The idea (download)

A client wants to download the file described by a .torrent fileClient contacts the tracker identified in the .torrent file (using HTTP)Tracker sends client a (random) list of peers who have/are downloading the fileClient contacts peers on list to see which segments of the file they haveClient requests segments from peers (via TCP)Client uses hash from .torrent to confirm that segment is legitimateClient reports to other peers on the list that it has the segment and becomes a leecherOther peers start to contact client to get the segment (while client is getting other segments)Client eventually becomes a seed

Page 96: CSC  536 Lecture  5

The flow

Page 97: CSC  536 Lecture  5

Load sharing

As the file segments are downloaded by peers, they become the sources for further downloadsTo spread the load, clients have to have an incentive to allow peers to download from themBitTorrent does this with an economic systemA peer serves peers that serve too

Improves overall performanceEncourages cooperation, discourages free-riding

Peers use rarest first policy when downloading chunksHaving a rare chunk makes peer attractive to othersOther wants to download it, peer can then download the chunks it wants

For first chunk, just randomly pick something, so that peer has something to share

Page 98: CSC  536 Lecture  5

Encouraging peers to cooperate

Peer X serves 4 peers in peer set simultaneouslySeeks best (fastest) downloaders if X is a seedSeeks best uploaders if X a leecher

(Leecher specific) Choke is a temporary refusal to upload to a peerLeecher serves 4 best uploaders, chokes all othersEvery 10 seconds, it evaluates the transfer speedIf there is a better peer, choke the worst of the current 4

Every 30 seconds peer makes an optimistic unchokeRandomly unchoke a peer from peer setIdea: Maybe it offers better service

Seeds behave exactly the same way, except they look at download speed instead of upload speed

Page 99: CSC  536 Lecture  5

Summary

Works quite wellDownload a bit slow in the beginning, but speeds up considerably as peer gets more and more chunks

Those who want the file, must contributeImproves overall performanceAttempts to minimize free-riding

Efficient mechanism for distributing large files to a large number of clients

Popular software, updates, …

Page 100: CSC  536 Lecture  5

Replication for Web Hosting Systems

Page 101: CSC  536 Lecture  5

Replication for Web Hosting Systems

Akamai.ppt