6.4 Data And File Replication By Shruti poundarik

Data objects and files are replicated to increase system performance and availability.

Performance increases because replicas can be accessed concurrently.

Availability is high because the data objects are redundant.

Parallelism and failure transparency are desirable in distributed systems.

Replication is not useful unless replication and concurrency transparency are provided.

In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is a series of database operations that either all occur or none occur [1].

All or nothing.

In a DFS (Distributed File System), replicated objects (data or files) should follow atomicity rules, i.e., either all copies are updated (synchronously or asynchronously) or none are.
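The all-or-nothing rule for replicas can be illustrated with a small sketch. It assumes a prepare-then-commit pattern, which the slides do not name; the `Replica` class and `atomic_update` function are hypothetical, and failures between the two phases are ignored.

```python
class Replica:
    """One copy of a replicated object (hypothetical in-memory stand-in)."""

    def __init__(self, value, healthy=True):
        self.value = value
        self.healthy = healthy
        self._staged = None

    def prepare(self, new_value):
        # Stage the new value; a faulty replica refuses.
        if not self.healthy:
            return False
        self._staged = new_value
        return True

    def commit(self):
        # Make the staged value visible.
        self.value = self._staged
        self._staged = None

    def abort(self):
        # Discard anything that was staged.
        self._staged = None


def atomic_update(replicas, new_value):
    """Apply new_value to every replica, or to none of them."""
    if all(r.prepare(new_value) for r in replicas):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False
```

If any replica refuses to prepare, every copy keeps its old value, so clients never observe a partially applied update.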


One-copy serializability: The effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects.[2]


Architecture for Replica Management

File service agent (FSA): the client interface.

Replica manager (RM): provides the replication functions [3].

The client chooses one or more FSAs to access a data object.

The FSA acts as a front end to the replica managers (RMs) to provide replication transparency.

The FSA contacts one or more RMs to actually read and update data objects.
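A minimal sketch of this architecture, assuming in-memory stores. The class names `ReplicaManager` and `FileServiceAgent` and their methods are hypothetical; the slides describe the roles but not an interface.

```python
class ReplicaManager:
    """Holds one replica of each data object (hypothetical RM)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def read(self, key):
        return self.store.get(key)

    def update(self, key, value):
        self.store[key] = value


class FileServiceAgent:
    """Front end that hides the RMs from the client (hypothetical FSA)."""

    def __init__(self, rms):
        self.rms = list(rms)

    def read(self, key):
        # Any single RM will do in this sketch; the first one is picked.
        return self.rms[0].read(key)

    def update(self, key, value):
        # The FSA, not the client, fans the update out to the RMs.
        for rm in self.rms:
            rm.update(key, value)
```

The client only ever calls `read` and `update` on the FSA, so the number and location of replicas stay hidden from it.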

Read operations [3]

Three options for read operations:

Read-one-primary: the FSA reads only from the primary RM, to enforce consistency.

Read-one: the FSA may read from any RM, for concurrency.

Read-quorum: the FSA must read from a quorum of RMs to determine the currency of the data.

Object access operations may be reads or updates.

In this architecture, a read operation needs to be addressed to one of the replicas.

The replicas are transparent to the client.

The RM protocol may require the file service invoked by the client to ensure that the data read is the most recent.

Write operations [3]

From the system's viewpoint, write operations should be addressed to all replicas automatically.

Scenarios for write:

Write-one-primary: write only to the primary RM; the primary RM updates all other RMs.

Write-all: update all RMs.

Write-all-available: write to all functioning RMs. A faulty RM needs to be synchronized before it is brought back online.

Write-quorum: update a predefined quorum of RMs.

Write-gossip: update any RM; the update is propagated lazily to the other RMs.

Read One Primary/Write One Primary

◦ Both read and write operations must be directed to the primary replica manager.

◦ No replication issue arises.

◦ All operations are serialized by the primary RM.

◦ Secondary RMs supply redundancy in case the primary fails.

◦ Consistency is easy to achieve, but concurrency is not.

Read one, Write all [3]

To provide concurrency, read operations may be performed at any RM site.

This leads to a coherency problem, because propagating an update from one RM to the other, secondary RMs incurs a communication delay.

Therefore, the propagation of updates must be made atomic.

Updates can be initiated at any RM, preferably the one closest to the requesting client.

Provides both concurrency and coherency.

Achieves one-copy serializability: the execution of transactions on replicated objects is equivalent to the execution of the same transactions on non-replicated objects.

In this scheme, updates are addressed to faulty and non-faulty replicas alike.

This contradicts the purpose of replication: a write that must reach a faulty replica cannot complete, so atomic updates should be applied only to the non-faulty replicas.

Read one, Write all available

A variation of read-one/write-all.

Atomic updates are applied only to the non-faulty replicas.

As a result, one-copy serializability becomes slightly more complicated.
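The write-all-available idea, including resynchronizing a recovered replica before it serves requests again, might look like the sketch below. The `RM` class and both function names are hypothetical, and copying the full state from a healthy RM is an assumed (and simplistic) recovery strategy.

```python
class RM:
    """Hypothetical replica manager with an up/down flag."""

    def __init__(self):
        self.up = True
        self.data = {}


def write_all_available(rms, key, value):
    """Write to every functioning RM; return how many were updated."""
    live = [rm for rm in rms if rm.up]
    for rm in live:
        rm.data[key] = value
    return len(live)


def bring_online(rm, source):
    """Resynchronize a recovered RM from an up-to-date one, then mark it up."""
    rm.data = dict(source.data)
    rm.up = True
```

Because a recovered RM is synchronized before it is marked up, a read-one client cannot pick up the stale copy it held while it was down.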

Read quorum, Write quorum

Each read operation on a replicated data object d must obtain a read quorum R(d) before it can read.

Each write operation must obtain a write quorum W(d) to complete the write.

A version number attached to each replica of the object lets the client identify the most recently completed update.

A read operation queries all R(d) replicas and returns the value of the replica with the highest version number.

A write operation advances the version number by 1.
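The version-number mechanics can be sketched as follows. The slides do not state how large the quorums must be; the check in `quorums_valid` uses the standard intersection conditions (R + W > N and 2W > N for N replicas), which is an assumption added here. All names are hypothetical.

```python
class VersionedReplica:
    """Hypothetical replica carrying a version number and a value."""

    def __init__(self):
        self.version = 0
        self.value = None


def quorums_valid(n, r, w):
    # Assumed standard conditions: every read quorum overlaps every
    # write quorum, and any two write quorums overlap.
    return r + w > n and 2 * w > n


def quorum_read(read_quorum):
    """Return (version, value) from the replica with the highest version."""
    newest = max(read_quorum, key=lambda rep: rep.version)
    return newest.version, newest.value


def quorum_write(write_quorum, value):
    """Install value on every replica in the quorum with version + 1."""
    new_version = max(rep.version for rep in write_quorum) + 1
    for rep in write_quorum:
        rep.version = new_version
        rep.value = value
    return new_version
```

With five replicas, R = 2 and W = 4, any two replicas chosen for a read include at least one that took part in the latest write, so the highest version seen is the current one.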

Gossip Update [3]

When updates are less frequent than reads, they can be propagated lazily to the replicas.

This read-one/write-gossip approach is called the gossip update propagation protocol.

Both read and update operations are directed by the FSA to any RM.

The FSA shields the replication details from clients.
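Lazy propagation can be sketched with a pairwise exchange in which each side keeps the newer entry. This is a deliberate simplification: it assumes a single shared logical clock (`_clock`), whereas practical gossip protocols typically track causality with per-replica timestamps. `GossipRM` and its methods are hypothetical names.

```python
import itertools

_clock = itertools.count(1)  # hypothetical global logical clock for the sketch


class GossipRM:
    """Hypothetical RM that accepts updates locally and gossips later."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def update(self, key, value):
        # Accept the update locally; nothing is sent to other RMs yet.
        self.store[key] = (next(_clock), value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def gossip_with(self, peer):
        """Exchange state with a peer; both keep the newer entry per key."""
        for key in set(self.store) | set(peer.store):
            mine = self.store.get(key, (0, None))
            theirs = peer.store.get(key, (0, None))
            newest = max(mine, theirs, key=lambda entry: entry[0])
            self.store[key] = peer.store[key] = newest
```

Until two RMs have gossiped, a read at one of them may return an older value, which is the price paid for accepting updates at any RM.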

The main purpose of the gossip approach is to support high availability in an environment where replica failures are likely.

Disadvantages of file replication:

◦ The contents of the file need to be known before the replication operation takes place.

◦ Existing systems cannot work over limited-bandwidth networks.

◦ DFS replication will not work well when there are a large number of changes to replicate [4].

Current Advancements

File replication is optimized over limited-bandwidth networks using Remote Differential Compression (RDC) [5].

The RDC protocol heuristically negotiates a set of differences between a recipient and a sender that hold two sufficiently similar versions of the same file.

RDC optimizes communication between sender and recipient by having both sides subdivide their files into chunks and compute a strong checksum, or signature, for each chunk.
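The chunk-and-signature idea can be illustrated with a short sketch. It is not the RDC protocol itself: it assumes fixed-size chunks and SHA-256 as the strong checksum, whereas the real protocol derives chunk boundaries from the file content. The function names are hypothetical.

```python
import hashlib


def chunks(data, size=4):
    """Split data into fixed-size chunks (a simplification for the sketch)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def signature(chunk):
    """Strong checksum of one chunk; SHA-256 is an assumed choice."""
    return hashlib.sha256(chunk).hexdigest()


def missing_chunks(sender_data, recipient_data, size=4):
    """Return the sender's chunks whose signatures the recipient lacks."""
    have = {signature(c) for c in chunks(recipient_data, size)}
    return [c for c in chunks(sender_data, size) if signature(c) not in have]
```

For example, `missing_chunks(b"aaaaXXXXcccc", b"aaaabbbbcccc")` yields only the one changed chunk, so only that chunk would need to cross the network rather than the whole file.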

RDC needs to be applied to compressed chunk files.

Windows Server uses Remote Differential Compression to propagate only the changes, which saves bandwidth [4].

[1] Wikipedia, "Atomicity", http://en.wikipedia.org/wiki/Atomicity

[2] M. T. Harandi, J. Hou (modified: I. Gupta), "Transactions with Replication", http://www.crhc.uiuc.edu/~nhv/428/slides/repl-trans.ppt

[3] Randy Chow, Theodore Johnson, "Distributed Operating Systems & Algorithms", 1998

[4] "Overview of the Distributed File System Solution in Microsoft Windows Server 2003", http://technet2.microsoft.com/WindowsServer/en/library/d3afe6ee-3083-4950-a093-8ab748651b761033.mspx?mfr=true

[5] "Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression", IEEE Infocom Conference, 2006.
