Advanced Database Topics
Copyright © Ellis Cohen 2002-2005
Synchronous Data Replication
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License.
For more information on how you may use them, please see http://www.openlineconsult.com/db
Topics
Models for Replication
ROWA Approaches
Eager Approaches
Reconciliation
Models for Replication
Data Replication
Put copies of the same data at multiple sites
Achieves
– Decreased network latency by placing replicas near multiple high-demand sites (access data at the nearest replica)
– High availability & reliability in the face of failure
– Parallel processing
– Scalability (a single copy is no longer a bottleneck)
– Disconnected operation
Query Processing
– Is even more complex, since the coordinator's query optimizer additionally needs to decide which replica to use
Replication Groups
Replication Group: Group of related data items which are replicated together at different sites
A database may have multiple replication groups
Two different replication groups may be replicated
– at the same set of sites
– at disjoint sets of sites
– at overlapping sites
A replication group typically represents all the data used in an application
A transaction generally accesses data in a single replication group. Much more complicated if a transaction accesses multiple replication groups
[Figure: replication groups Ra and Rb replicated across sites S1, S2 and S3]
Consistency Timeframes
Immediate Consistency: All replicas in a group are consistent after each and every update is made.
Transactional Consistency: At the end of each transaction, the replica group has a consistent model of the committed data. Updates need only be made at a single replica, and then are consistently propagated to others.
Eventual Consistency: Updates made at one replica will eventually be propagated to other replicas, but at any point in time, different replicas may have inconsistent committed data. The replica group will only have a consistent model of the committed data when the system quiesces (all updates are propagated).
1SR - 1 Copy Serializability
Replicated form of serializability:
Interleaved execution of transactions on a replicated database is equivalent to serial execution of those transactions on a database where there is only one copy of each data item
Eventual approaches
– may not be able to ensure 1SR, but
– may be able to satisfy other weak consistency guarantees
Basic Replication Topologies
Master/Snapshot
The replication group has a primary copy held at the master. Its committed state represents the current committed state of the entire replication group.
[Figure: a Master site with its Snapshots]
Group
The replication group has no single primary copy. It has a consistent (committed) state only if the replicas are consistent with one another
Master vs Snapshot Sites
Master (Primary Copy) Site
– Holds all the data in the replication group
– Holds the most consistent / up-to-date version of the data
Snapshot Site
– May hold a partial (instead of a complete) subset of the data in a replication group, i.e. a subset of its tables, or even horizontal and/or vertical fragments of tables
– Often not as up-to-date as Master sites
– May be completely or partly read-only, in particular if it contains materialized views (involving multiple tables) of the data in the replication group
Partial Snapshots
Partial snapshots further complicate query processing
– Coordinator may need to get part of the queried data from one snapshot, and part from another
– Especially complicated if an operation needs to update different partial snapshots (we'll ignore this case!)
Static vs Dynamic Partial Snapshots
Static Partial Snapshots
– Specified declaratively, made known to the coordinator, which uses it for query processing
Dynamic Partial Snapshots
– Replica appears complete, but processes queries by obtaining missing data from the primary copy
– Leads to complex query processing at the snapshot site, since its request to the primary copy must take into account data already replicated (which might have already been modified during the transaction!)
Snapshots with Materialized Views
A materialized view is a view where the results of the view are actually maintained persistently
When the underlying data is modified, the materialized view must generally be modified as well (e.g. by triggers, sometimes set up automatically by the replication manager) or deleted.
If a query optimizer knows about materialized views, it can rewrite queries on the underlying data to efficiently use the materialized view instead ("query rewriting")
Different snapshots may contain different materialized views of the data. If the coordinator's query optimizer knows the location of materialized views, this can affect the replica it asks to process a query.
Materialized views can be static or dynamic. Dynamic materialized views often result from remembering the result sets of executed queries.
Complex Replication Topologies
MultiMaster (combines Group & Master)
Hierarchical Master
Consistency Models
Total Consistency: All replicas (unless they are crashed or disconnected) are consistent with one another.
Master Consistency: There is a master replica (the primary copy). Transactions committed at the master site reflect the intended state of the data.
MultiMaster Consistency: Like master consistency, but there are multiple master replicas, all consistent with one another.
Update Propagation Models
Synchronous
– Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
– Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
Asynchronous
– As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
– Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
What is the consistency timeframe & model for each one?
Update & Consistency Models
Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
– Immediate Total Consistency
Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
– Transactional Total Consistency
As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
– Transactional (Multi-)Master Consistency
Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
– Eventual Consistency
Update Models & Topologies
[Figure: grid of update models (ROWA, Eager, Lazy, Eventual) against topologies (Master/Snapshot, Group)]
ROWA and Eager: sometimes, these are both called EAGER
Lazy and Eventual: sometimes, these are both called LAZY
ROWA: Read One, Write All
ROWA Update Model
Synchronous
– Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
– Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
Asynchronous
– As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
– Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
ROWA Consistency Model
Immediate Consistency
– Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
Transactional Consistency
– Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
– As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
Eventual Consistency
– Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
ROWA Overview
Read One, Write All
– All replicas are updated immediately (without waiting until the transaction doing the update commits)
– Data can be read from any replica
Immediate Total Consistency: All replicas (unless crashed or disconnected) are always consistent with one another
Topology: Group-based. No need for a special primary copy.
Concurrency: Usually lock-based. Other concurrency models can be used as well.
When might the ROWA model be used?
ROWA Uses
Hot Standby: Upon failure, another replica can be switched in immediately
Mobile Use: Suppose every cell has a nearby replica. A mobile coordinator can switch from replica to replica during a transaction, using whichever one is nearest
Reliability: Read from multiple replicas simultaneously to
– avoid waiting in case of site/link failure
– ensure that data is correct
Updates can be very expensive. Either they're done infrequently, or they must be worth the cost
ROWA
On write: Coordinator must acquire X locks for all replicas, and writes to all of them
On read: Coordinator acquires an S lock for the one replica it will actually read
Ensures 1SR. Can use non-locking also
ROWA Advantage: Can read from any replica
ROWA Disadvantage: Every write requires a communication round trip involving the farthest & slowest replicas
Serious ROWA Problem: If a replica site crashes, the coordinator and all competing transactions must wait until it recovers
Solutions
• All Available Writes
• Quorum Consensus (Read Some, Write Some)
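The ROWA locking rule (X locks on every replica for a write, an S lock on one replica for a read) can be sketched as follows. This is a minimal illustration, not a real lock manager: the `Replica` class and the `rowa_write` / `rowa_read` functions are hypothetical names, there is one lock per replica rather than per data item, and a blocked request raises an error instead of waiting.

```python
class Replica:
    """Hypothetical in-memory replica with one S/X lock for the whole copy."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.readers = set()   # transactions holding an S lock
        self.writer = None     # transaction holding the X lock

    def lock_s(self, txn):
        if self.writer not in (None, txn):
            raise RuntimeError(f"{self.name}: S lock blocked by {self.writer}")
        self.readers.add(txn)

    def lock_x(self, txn):
        if self.writer not in (None, txn) or self.readers - {txn}:
            raise RuntimeError(f"{self.name}: X lock blocked")
        self.writer = txn

    def unlock(self, txn):
        # Called at end of transaction (assumed strict two-phase locking).
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None


def rowa_write(txn, replicas, key, value):
    # Write All: X-lock every replica first, then update every copy.
    for r in replicas:
        r.lock_x(txn)
    for r in replicas:
        r.data[key] = value


def rowa_read(txn, replica, key):
    # Read One: an S lock on the single chosen replica is enough.
    replica.lock_s(txn)
    return replica.data.get(key)
```

Because a writer holds X locks everywhere, a reader at any one replica conflicts with it, which is why reading a single copy suffices.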
All Available Writes (AAW)
At transaction start, assume all replicas are available
Coordinator writes by
– Writing to all known available replicas
– Those which do not ACK within the timeout period are marked as unavailable but otherwise ignored
Coordinator reads by
– Reading from a chosen (e.g. nearest) replica
– If it times out, marking it as unavailable, and reading from a different replica
Coordinator augments 2PC with
– Missing Writes Validation: Make sure that all replicas that were not written to are still unavailable
– Access Validation: Make sure that all replicas read or written are still available
This is necessary for 1SR
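A rough sketch of the All Available Writes bookkeeping described above, under simplifying assumptions: `Site` and `AAWCoordinator` are hypothetical classes, a timeout is simulated by an `up` flag, and `validate()` stands in for the two extra checks run during 2PC.

```python
class Site:
    """Hypothetical replica site; `up` simulates answering before the timeout."""
    def __init__(self, name):
        self.name, self.up, self.data = name, True, {}

    def write(self, key, value):
        if not self.up:
            raise TimeoutError(self.name)
        self.data[key] = value

    def read(self, key):
        if not self.up:
            raise TimeoutError(self.name)
        return self.data.get(key)


class AAWCoordinator:
    def __init__(self, sites):
        self.available = list(sites)   # at transaction start, assume all available
        self.accessed = set()          # sites actually read or written
        self.missed = set()            # sites marked unavailable

    def _mark_unavailable(self, site):
        self.available.remove(site)
        self.missed.add(site)

    def write(self, key, value):
        for site in list(self.available):
            try:
                site.write(key, value)
                self.accessed.add(site)
            except TimeoutError:
                self._mark_unavailable(site)

    def read(self, key):
        for site in list(self.available):
            try:
                value = site.read(key)
                self.accessed.add(site)
                return value
            except TimeoutError:
                self._mark_unavailable(site)
        raise RuntimeError("no replica available")

    def validate(self):
        # Missing Writes Validation: skipped sites must still be unavailable.
        missing_ok = all(not s.up for s in self.missed)
        # Access Validation: sites that were read or written must still be available.
        access_ok = all(s.up for s in self.accessed)
        return missing_ok and access_ok
```

If `validate()` returns False the transaction would have to abort, since some replica may have missed a write or lost one.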
Partitioning
Assume a set of replicas is partitioned.
[Figure: two partitions, with coordinators C1 and C2]
Can each partition continue executing read-write transactions that update its set of replicas?
Majority Partition Approach: Only if the partition contains a (weighted) majority of the replicas.
Disconnected Operation: Each can continue. Requires reconciliation when the network recovers (discussed later).
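The weighted-majority test behind the Majority Partition Approach is simple arithmetic. A minimal sketch, assuming each site has a hypothetical integer weight and that "majority" means strictly more than half of the total weight:

```python
def has_majority(partition, weights):
    """True if the sites in `partition` hold a strict majority of the total weight."""
    total = sum(weights.values())
    held = sum(weights[site] for site in partition)
    return 2 * held > total
```

Requiring a strict majority means at most one partition can pass the test, so at most one partition keeps accepting updates.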
Site Recovery
On restart
– Site contacts a sibling replica
– Obtains & processes [the relevant portion of the] log of all (sub)transactions committed while the site was down, carefully in case [a] new transaction completes while processing the log
– Makes itself available again (i.e. responds to reads and writes)
Many variations of this protocol, esp. to accommodate
– Dynamic creation, removal and relocation of replica sites
Multiple Reads
Read from n replicas in parallel
• Allows the fastest one to respond
• Avoids taking time for reading another replica if the first one is unavailable
• Use Voting: Detect/correct errors/sabotage by comparing results of multiple reads
• Guarantees getting the latest value even if not all replicas were updated (Quorum Consensus Protocol: Requires that the write set contains a weighted majority of replicas)
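Voting and quorum reads can be illustrated with a small sketch. It assumes unweighted replicas stored as plain dictionaries mapping a key to a `(version, value)` pair, and read and write sets chosen by the caller so that every read set overlaps every write set; the function names are hypothetical.

```python
from collections import Counter


def vote(values):
    """Voting: return the value reported by the most replicas."""
    return Counter(values).most_common(1)[0][0]


def quorum_write(replicas, targets, key, value):
    # Because write sets are majorities, they overlap, so the highest
    # version in this write set is the latest version anywhere.
    version = 1 + max(replicas[i].get(key, (0, None))[0] for i in targets)
    for i in targets:
        replicas[i][key] = (version, value)


def quorum_read(replicas, targets, key):
    # The copy with the highest version among those read is the latest value.
    copies = [replicas[i].get(key, (0, None)) for i in targets]
    return max(copies, key=lambda copy: copy[0])[1]
```

With 5 replicas and read and write sets of size 3, any read set shares at least one replica with the most recent write set, so the read sees the newest version even though two replicas are stale.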
ROWA Summary
ROWA Advantages
– Global Consistency & 1SR
– Can read from any replica
ROWA Disadvantages
– Every write requires writing all (available) replicas
– High overhead for every write
– (Can trade off write all for quorum read, though it is generally more expensive)
Eager Approaches
Eager Update Model
Synchronous
– Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
– Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
Asynchronous
– As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
– Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
Eager Consistency Model
Immediate Consistency
– Write-Synchronous (ROWA): Coordinator's write operation is not completed until every replica is updated
Transactional Consistency
– Commit-Synchronous (EAGER): All replicas commit (using 2PC) as part of coordinator's transaction
– As-Needed-Propagation (LAZY): After transaction ends, updates are propagated to other replicas as needed
Eventual Consistency
– Eventual-Propagation (EVENTUAL): After transaction ends, updates are eventually propagated to other replicas
Eager Overview
Read & Write
– Coordinator uses a single replica for all reads & writes of replication group data. [If replicas hold partial snapshots, may need to read/write from multiple ones]
– There are variants that just write to the master.
Transactional Total Consistency
– At the end of each transaction, all replicas in the group have a consistent model of the committed data.
– Updates made at a single replica are consistently propagated to others at/by commit-time.
Topology: Either Group-based or Master/Snapshot
Concurrency: All concurrency mechanisms can be used
When might the Eager model be used?
Eager Uses
Hot Standby: Upon failure, another replica can be switched in immediately, although transactions which updated the failed replica will need to be aborted
Disconnected Operation: If the network is partitioned and a partition contains a replica, operations can continue
• Read-only transactions will have access to an up-to-date version of the data
• Read/write operations can continue if reconciliation is supported
Serializability: Ensuring transactional consistency ensures that concurrent transactions which use different replicas are serializable.
Commits can be expensive, since they require 2PC involving every replica
Eager Master/Snapshot
Read & write from a single replica
Coordinator interacts with a single replica (e.g. nearest one) chosen from the replica group
During 2PC
– Coordinator requests PREPARE from that replica
– (Unless the chosen replica is the master), the chosen replica requests PREPARE from the master, propagating all updates along with the request
– The master requests PREPARE from all the other snapshot replicas, propagating all updates along with the request
What happens in a hierarchical master topology?
If the transaction uses data from two replication groups, which have replicas on the same machine, how does that affect 2PC?
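The PREPARE chain for the basic (non-hierarchical) master/snapshot case can be sketched as below. This is a sequential simulation under stated assumptions, not a real 2PC implementation: `Node` and `eager_commit` are hypothetical names, there is no logging or failure recovery, and the nested requests (chosen replica to master to other snapshots) are flattened into a single ordered list.

```python
class Node:
    """Hypothetical replica that stages updates on PREPARE and applies them on COMMIT."""
    def __init__(self, name, can_prepare=True):
        self.name, self.can_prepare = name, can_prepare
        self.data, self.staged = {}, None

    def prepare(self, updates):
        if self.can_prepare:
            self.staged = dict(updates)   # updates travel with the PREPARE request
        return self.can_prepare

    def commit(self):
        self.data.update(self.staged or {})
        self.staged = None

    def abort(self):
        self.staged = None


def eager_commit(chosen, master, snapshots, updates):
    """PREPARE flows chosen replica -> master -> other snapshots."""
    order = [chosen]
    if chosen is not master:
        order.append(master)
    order += [s for s in snapshots if s is not chosen]
    prepared = []
    for node in order:
        if not node.prepare(updates):
            for p in prepared:            # any NO vote aborts everywhere
                p.abort()
            return False
        prepared.append(node)
    for node in order:
        node.commit()
    return True
```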
Eager Master/Snapshot Concurrency
Can use either locking or non-locking concurrency
Lock-Based: Data locked at primary copy. Either the coordinator or the chosen replica requests those locks.
Non-Lock-Based: Validation/checking is done at the master, which acts as a commit gateway
Eager Propagation Models
When and how are updates propagated
– from chosen replica to master
– from master to other snapshot replicas
• Transactional Batch: Send batched information about writes to replicas along with the PREPARE message
• Continuous on Write: Propagate each update when it occurs (don't wait for the end of the transaction)
• Immediate Confirmation: Propagate each update when it occurs, and wait for an ACK. Similar to ROWA, but propagation is managed by the replication group, not by the coordinator.
Propagation Capture & Apply
In what format are updates "captured" where they are made, and how are they applied by the other replicas?
Log-Based
– Operations (logical log format; operation may need to be modified for partial replicas)
– Deltas (physiological log format: "before" & "after" values of rows)
Procedural
– Suppose each transaction is implemented by a stored DB procedure. Just propagate the identity of the procedure and the parameters to it
– May require that replicas be complete
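To contrast the delta and procedural formats, here is a small sketch that applies the same change to two replicas in both ways. Everything in it is illustrative: tables are dictionaries keyed by primary key, `PROCEDURES` is a hypothetical registry standing in for stored procedures installed identically at every replica, and `give_raise` is an invented example procedure.

```python
def apply_delta(table, delta):
    """Delta format: (primary key, before row, after row); None means 'row absent'."""
    key, before, after = delta
    if table.get(key) != before:
        raise ValueError(f"replica state differs from 'before' value for {key!r}")
    if after is None:
        table.pop(key, None)
    else:
        table[key] = after


PROCEDURES = {}  # hypothetical registry of stored procedures


def procedure(fn):
    PROCEDURES[fn.__name__] = fn
    return fn


@procedure
def give_raise(table, emp_id, amount):
    row = dict(table[emp_id])
    row["salary"] += amount
    table[emp_id] = row


def apply_call(table, call):
    """Procedural format: (procedure name, parameters), re-executed at the replica."""
    name, params = call
    PROCEDURES[name](table, *params)
```

The delta carries row values and so works on any replica holding that row; the procedural form re-executes the logic and therefore needs whatever data the procedure reads, which is why it may require complete replicas.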
Eager Group
Read & write from a single replica
Coordinator interacts with a single replica (e.g. nearest one) chosen from the replica group
During 2PC
– Coordinator requests PREPARE from that replica
– That replica requests PREPARE from all the other snapshot replicas, propagating all updates along with the request
No primary copy, so
– Must use a non-locking protocol
– Validation/checking must be done at every replica
Eager Variants
All reads and writes are to master only
– Other replicas used for hot standbys, or to support disconnected operation
– Used to implement 2-safe backup
All writes are to master only
– Queries of data unchanged by the current transaction can be directed to any replica
– How about querying data affected by the transaction's updates?
• Must either be directed to master
• Or the coordinator maintains a client-side cache with all changes, and queries use cache + any replica
Eager Summary
EAGER Advantages
– Global Consistency & 1SR
– Need not immediately propagate each write
– Can read/write from any single replica (except for variants)
EAGER Disadvantages
– Every commit requires propagating to all (available) replicas
– High overhead for every commit
MultiMaster Model
MultiMaster (combines Group & Master)
What kind of update model should be used among the master sites?
What kind of update model should be used among a master and its snapshots?
Reconciliation
Failure and Partitioning
When eager replication is used, the replicas all need to be able to communicate with one another.
Failure prevents communication.
– Site failure -- a site crashes
– Network failure -- a link or links fail, partitioning the network.
A live replica can't tell which of these is responsible for its inability to communicate.
It can generally assume that it is in a partition with just the replicas it can communicate with.
Partitioning
Assume a set of replicas is partitioned.
[Figure: two partitions, with coordinators C1 and C2]
Can each partition continue executing read-write transactions that update its set of replicas?
Majority Partition Approach: Only if the partition contains a (weighted) majority of the replicas.
Disconnected Operation: Each can continue. Requires merging (a.k.a. reconciliation) when the network recovers.
Primary Copy Election
In a master/snapshot topology, each partition needs a primary copy. What if a partition doesn’t have one?
• Majority Partition Approach: Use weights to ensure that the majority partition contains the primary copy. [But what if the primary copy itself crashed?]
• Elect a Primary Copy: Elect a primary copy using an election protocol [similar to 3PC protocol to elect a new coordinator]
What should be done in a multimaster environment?
Discovering Transaction Conflicts
As part of healing (i.e. recovery from) a network partition, conflicts may be discovered between committed transactions that were in disconnected partitions.
Modification (W/W) Conflicts: Transactions in different partitions modified the same data item (inconsistently). Can lead to lost updates.
R/W Conflicts: A transaction in one partition read data that was modified in the other partition. Can lead to non-serializable results; however, because the results in each partition are consistent (w.r.t. the partition), it is sometimes acceptable to ignore pure R/W conflicts.
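Given the read and write sets of the transactions committed in each partition, both kinds of conflict can be found by set intersection. A minimal sketch with a hypothetical `find_conflicts` function; it compares only item names, so it flags every common write as W/W even if both partitions happened to write the same value.

```python
def find_conflicts(txns_a, txns_b):
    """Each transaction is (name, read_set, write_set), one list per partition.
    Returns a list of (kind, txn_in_a, txn_in_b, items) tuples."""
    conflicts = []
    for name_a, reads_a, writes_a in txns_a:
        for name_b, reads_b, writes_b in txns_b:
            ww = writes_a & writes_b
            rw = (reads_a & writes_b) | (writes_a & reads_b)
            if ww:
                conflicts.append(("W/W", name_a, name_b, ww))
            if rw - ww:  # "pure" R/W conflicts: items not also in a W/W conflict
                conflicts.append(("R/W", name_a, name_b, rw - ww))
    return conflicts
```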
Eager Reconciliation Approaches
Compensation: "Undo" conflicting committed transactions by executing compensating transactions.
Tentative Commit: When disconnected, transactions only commit tentatively. During reconciliation, these are either fully committed or aborted.
Conflict Resolution: Conflicting modifications are resolved by "merging" the changes.
Primary vs Group Reconciliation
Eager Primary Reconciliation
– During healing, the elected primary provides a description (typically the log) of transactions committed during partition to the original primary
– The original primary identifies and reconciles all conflicts; the results are (in the normal course of things) propagated to all the replicas
Eager Group Reconciliation
– During healing, a replica provides its changes to some or all replicas it was partitioned from.
– Each replica identifies, reconciles and propagates changes independently. To maintain consistency, this implies
• Symmetric reconciliation: The results of reconciliation must be identical at each replica, independent of the order in which changes and propagated updates are received
• A replica must be able to ignore changes and propagated updates it has already processed
Compensation
Every transaction that might need to be "undone" has a compensating transaction associated with it.
A committed transaction that has a conflict is "undone" by executing its compensating transaction (often followed by re-executing the original transaction)
This can lead to cascading compensation. Any committed transaction which read data written by the original transaction may need to have its compensating transaction run as well.
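Cascading compensation can be sketched as a walk over the commit history. The sketch below makes several assumptions: `compensate` is a hypothetical function, each history entry is a dictionary holding the transaction's name, read set, write set and an `undo` callback (its compensating transaction), and any later transaction that read an item written by an already-compensated one is compensated too.

```python
def compensate(history, target):
    """Run compensating transactions for `target` and for every later
    transaction that (transitively) read data written by a compensated one.
    `history` is in commit order; returns the names that were compensated."""
    tainted, to_undo = set(), []
    for txn in history:
        if txn["name"] == target or txn["reads"] & tainted:
            to_undo.append(txn)
            tainted |= txn["writes"]
    for txn in reversed(to_undo):   # compensate in reverse commit order
        txn["undo"]()
    return [txn["name"] for txn in to_undo]
```

For example, if T2 read an item that T1 wrote, compensating T1 also compensates T2, while an unrelated T3 is left alone. A real compensating transaction is a semantic inverse (for instance a deposit that offsets a withdrawal) rather than a simple restore of old values.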
Motivating Tentative Commit
If we can delay all commitments during partition, we can simply abort conflicting transactions during healing.
However, commitment is necessary for reducing resource conflicts
Long-running transactions that don't commit
– If lock-based: Can block other transactions for long periods
– If validation-based: Are more likely to fail validation
Tentative Commitment
During network partition, commits are tentative
– A tentatively committed transaction is not yet durable and may subsequently be aborted.
– However, other transactions may see its updates. This can lead to cascaded aborts, so the commits of those transactions must be tentative as well.
Reconciliation resolves tentative commits
– Transactions without conflicts will be fully committed
– Transactions with conflicts will be aborted
Usually uses primary reconciliation
– All resolution is done at the primary copy
– If group reconciliation is used, it must be symmetric, otherwise transactions will be committed at some sites and aborted at others
A system might also allow a transaction to explicitly commit tentatively (even without using replicas), and then be either committed or aborted at a later time (forcing cascaded aborts)
Clients & Tentative Commitment
Explicit Abort: A client may be able to explicitly abort a transaction that is still only tentatively committed
Triggering & Notification: A client may be able to arrange to
– execute a procedure when a tentatively committed transaction is about to be committed (and which could actually abort the transaction)
– notify the user or (more generally) execute a procedure after a tentatively committed transaction is committed or aborted.
Identifying Modification Conflicts
A site may receive an unprocessed update (transaction log entry) which conflicts with its current state
Update Conflict: Old value of log entry <> current record state
Insert Conflict: Primary key of record to be inserted is already in the table
Delete Conflict: Primary key of record to be updated or deleted is not present in the table
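The three checks above can be written as one small function. A sketch only: the table is a dictionary keyed by primary key, and the log-entry layout (`op`, `key`, `old`, `new`) is a hypothetical format chosen for the example.

```python
def find_conflict(table, entry):
    """Return the conflict kind for a log entry, or None if it applies cleanly.
    entry: dict with 'op' ('insert' | 'update' | 'delete'), 'key', and old/new values."""
    op, key = entry["op"], entry["key"]
    if op == "insert":
        return "insert conflict" if key in table else None
    if key not in table:
        # Row to be updated or deleted is missing at this site.
        return "delete conflict"
    if op == "update" and table[key] != entry["old"]:
        return "update conflict"
    return None
```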
Resolution Techniques for Modification Conflicts
Latest Timestamp: If update timestamp > data timestamp, do update, else discard. Example: Address change
Max: If new value > current data value, do update, else discard. Example: Max daily temperature
Additive: Data value := current data value + update's new value - update's old value. Example: Bank account balance
These are built-in to Oracle; others may be defined by DBA.
These conflict resolution techniques can be used as a prelude to either compensation or tentative commit resolution.
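The three resolution rules listed above, written as plain functions. This is only an illustration of the arithmetic; the function names and the `(value, timestamp)` row layout are assumptions for the example, not Oracle's API.

```python
def latest_timestamp(current, update):
    """Rows are (value, timestamp) pairs; keep whichever is newer."""
    return update if update[1] > current[1] else current


def maximum(current_value, new_value):
    """Keep the larger of the current and incoming values."""
    return new_value if new_value > current_value else current_value


def additive(current_value, old_value, new_value):
    """Apply the update's delta (new - old) on top of the current value."""
    return current_value + new_value - old_value
```

For the bank balance example: both partitions start at 100, one deposits 20 (now 120), and the other withdraws 30 (old 100, new 70). Applying the second update additively gives 120 + 70 - 100 = 90, which preserves both changes.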