A Multiversion Update-Serializable Protocol for Genuine Partial Data Replication

Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues

Euro-TM Workshop on Transactional Memory (WTM 2012), Bern, Switzerland


Distributed STMs

STMs are being employed in new scenarios:
- Database caches in three-tier web apps (FénixEDU)
- HPC programming languages (X10)
- In-memory cloud data grids (Coherence, Infinispan)

New challenges:
- Scalability
- Fault-tolerance


REPLICATION


Partial Replication

Each site stores a partial copy of the data.

Genuine partial replication schemes maximize scalability by ensuring that only the sites that replicate the data items read or written by a transaction T exchange messages for executing/committing T.
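The genuineness condition can be illustrated with a minimal sketch. It assumes a static `placement` map from each data item to the set of node ids that replicate it; the map and the function name `participants` are hypothetical and not part of the talk.

```python
from typing import Dict, Iterable, Set


def participants(read_set: Iterable[str],
                 write_set: Iterable[str],
                 placement: Dict[str, Set[int]]) -> Set[int]:
    """Return the nodes allowed to exchange messages for one transaction."""
    nodes: Set[int] = set()
    for key in set(read_set) | set(write_set):
        # Only replicas of an accessed item take part; all other nodes stay idle.
        nodes |= placement[key]
    return nodes


# Hypothetical layout: X lives on node 1, Y on node 2, Z on nodes 0 and 2.
placement = {"X": {1}, "Y": {2}, "Z": {0, 2}}
print(sorted(participants({"X"}, {"Y"}, placement)))
```

A transaction that reads X and writes Y involves only nodes 1 and 2 here, so the cost of executing or committing it does not grow with the total number of nodes.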

Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10], which causes considerable overhead in typical workloads.


Issues with Partial Replication

Extending existing local multiversion (MV) STMs is not enough.

Local MV STMs rely on a single global counter to track version advancement.

Problem: with a single global counter, the commit of a transaction would have to involve ALL NODES.

NO GENUINENESS = POOR SCALABILITY


GMU: Genuine Multiversion Update-Serializable Replication [ICDCS12]

In the execution/commit phase of a transaction T, ONLY nodes which store data items accessed by T are involved.

It uses multiple versions for each data item.

It builds visible snapshots, i.e. the freshest consistent snapshots, taking into account:
1. causal dependencies with respect to the transactions already committed at the time the transaction began,
2. previous reads executed by the same transaction.

Vector clocks are used to establish visible snapshots.


High Level Overview (i)

Transactions commit using a vector clock.

Each node stores a log of committed vector clocks.

Initial view of the visible snapshot
- When a transaction T begins on a node N, it acquires the most recent vector clock in N’s commit log.

View extension of the visible snapshot
- When T reads on a node N:
  - T’s vector clock can be modified according to N’s commit log.
  - Three reading rules are applied using T’s vector clock.
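The initial view can be sketched as follows. This is a minimal illustration, assuming the commit log is an in-memory list ordered oldest first; the class names `CommitLog` and `Transaction` are hypothetical and the clock values are only examples.

```python
from typing import List, Tuple

VectorClock = Tuple[int, ...]


class CommitLog:
    """Per-node log of committed vector clocks, oldest first."""

    def __init__(self, initial: VectorClock):
        self.entries: List[VectorClock] = [tuple(initial)]

    def append(self, vc: VectorClock) -> None:
        self.entries.append(tuple(vc))

    def most_recent(self) -> VectorClock:
        return self.entries[-1]


class Transaction:
    """Holds the vector clock that defines the transaction's visible snapshot."""

    def __init__(self, local_log: CommitLog):
        # Initial view: copy the newest committed clock of the node where T begins.
        self.vc: VectorClock = local_log.most_recent()


log = CommitLog((1, 1, 1))
t1 = Transaction(log)
log.append((1, 2, 2))  # a later commit does not change t1's initial view
t2 = Transaction(log)
print(t1.vc, t2.vc)
```

Because the clock is copied at begin time, commits that arrive later affect the transaction only through the view extension performed on reads.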


High Level Overview (ii)

Write operation
- When a transaction T writes V on data item O, it inserts <O,V> in T’s write-set.

Commit operation
- Read-only transactions always commit.
- Update transactions run a genuine 2-Phase Commit:
  - Upon prepare message reception (participant side): acquire read/write locks, validate the read-set, and send back a tentative commit vector clock.
  - If all replies are positive (coordinator side): multicast the write-set and the final commit vector clock.
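The commit flow can be sketched in a single process as below. This is a simplified illustration under several assumptions that are not in the slides: messages are replaced by direct method calls, all locks are exclusive, validation only checks that no newer version of a read item exists, and the final clock is the entrywise maximum of the tentative clocks (a placeholder; the talk covers the actual construction under "Building the commit Vector Clock"). All class and function names are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

VectorClock = Tuple[int, ...]


class Participant:
    """One replica holding the keys in `owned` (hypothetical structure)."""

    def __init__(self, node_id: int, num_nodes: int, owned: Set[str]):
        self.node_id = node_id
        self.owned = set(owned)
        self.clock: VectorClock = (0,) * num_nodes  # last committed vector clock
        self.latest: Dict[str, int] = {}            # key -> newest version number
        self.values: Dict[str, object] = {}         # key -> newest value
        self.locks: Dict[str, str] = {}             # key -> id of the lock holder

    def prepare(self, tx_id, read_versions, writes) -> Optional[VectorClock]:
        """Lock and validate; return a tentative commit clock, or None to vote no."""
        reads = {k: v for k, v in read_versions.items() if k in self.owned}
        keys = set(reads) | {k for k in writes if k in self.owned}
        if any(k in self.locks for k in keys):
            return None  # lock conflict
        if any(self.latest.get(k, 0) != v for k, v in reads.items()):
            return None  # a value read by the transaction was overwritten
        for k in keys:
            self.locks[k] = tx_id
        proposal = list(self.clock)
        proposal[self.node_id] += 1
        return tuple(proposal)

    def finish(self, tx_id, commit_vc, writes) -> None:
        """Apply the write-set if commit_vc is given, then release tx_id's locks."""
        if commit_vc is not None:
            for key, value in writes.items():
                if key in self.owned:
                    self.latest[key] = commit_vc[self.node_id]
                    self.values[key] = value
            self.clock = commit_vc
        self.locks = {k: t for k, t in self.locks.items() if t != tx_id}


def commit(tx_id, read_versions, writes, nodes):
    """Coordinator side. Returns (committed, final commit clock or None)."""
    if not writes:
        return True, None  # read-only transactions always commit
    keys = set(read_versions) | set(writes)
    involved = [n for n in nodes if n.owned & keys]  # genuine: owners only
    proposals = [n.prepare(tx_id, read_versions, writes) for n in involved]
    ok = all(p is not None for p in proposals)
    # Placeholder merge: entrywise maximum of the tentative clocks.
    final_vc = tuple(max(col) for col in zip(*proposals)) if ok else None
    for n in involved:
        n.finish(tx_id, final_vc, writes)
    return ok, final_vc
```

Here `read_versions` maps each key read by the transaction to the version number it observed, and `writes` is the buffered write-set. A node that owns none of the accessed keys is never contacted.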


Rule 1: Reading Lower Bound

[Figure: timeline over Node 0, Node 1 (it stores X) and Node 2 (it stores Y). T0 executes W(X,v) and W(Y,w) and commits, creating versions X(2) and Y(2); the most recent VC in the VCLog moves from (1,1,1) to (1,2,2). T1 executes R(X) and R(Y), which return X(2) and Y(2); T1.VC moves from (1,1,1) to (1,2,2).]


Rule 2: Reading Upper Bound

[Figure: timeline over Node 0, Node 1 (it stores X) and Node 2 (it stores Y). T1 has T1.VC = (1,1,1) and its R(X) returns X(1). T0 executes W(X,v) and W(Y,w) and commits with vector clock (1,3,3), creating X(3) and Y(3). T1's R(Y) returns Y(2), T1.VC becomes (1,1,2), and T1 commits.]
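The extracted slide names Rule 2 but does not spell it out. The sketch below is one plausible reading inferred from the values in the Rule 2 figure and should be checked against [ICDCS12]: on its first read at a node, T may advance its clock to the newest entry of that node's commit log that does not exceed T's clock on any node where T has already read. The function name, the `has_read` flags and the log layout (oldest first) are assumptions.

```python
from typing import List, Sequence, Tuple

VectorClock = Tuple[int, ...]


def extend_snapshot(tx_vc: VectorClock,
                    has_read: Sequence[bool],
                    commit_log: List[VectorClock]) -> VectorClock:
    """Return T's clock after the upper-bound check (assumed reading of Rule 2)."""
    visible = [
        vc for vc in commit_log
        # Entries newer than what T already observed elsewhere are excluded.
        if all(vc[j] <= tx_vc[j] for j, read in enumerate(has_read) if read)
    ]
    if not visible:
        return tx_vc
    newest = visible[-1]  # the log is assumed to be ordered oldest first
    return tuple(max(a, b) for a, b in zip(tx_vc, newest))


# Values taken from the Rule 2 figure: T1 already read X on node 1.
node2_log = [(1, 1, 1), (1, 1, 2), (1, 3, 3)]
print(extend_snapshot((1, 1, 1), [False, True, False], node2_log))
```

With the figure's values, the entry (1,3,3) is excluded because its component for node 1 exceeds what T1 saw there, so T1's clock stops at (1,1,2), which agrees with the figure returning Y(2) rather than Y(3).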


Rule 3: Selection of Data Versions

Informally: observe the most recent consistent version of data item id on node i based on T’s history (previous reads).

Formally: iterate over the versions of id and return the most recent one s.t.

id.version.VN <= T.VC[i]
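The formal condition translates almost directly into code. This is a minimal sketch, assuming each node keeps the versions of a data item as `(VN, value)` pairs sorted by ascending `VN`; the function name `select_version` and that layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def select_version(versions: List[Tuple[int, object]],
                   tx_vc: Sequence[int],
                   i: int) -> Optional[Tuple[int, object]]:
    """Return the most recent (VN, value) pair with VN <= T.VC[i], if any."""
    for vn, value in reversed(versions):  # newest first
        if vn <= tx_vc[i]:
            return vn, value
    return None  # no version is old enough for this snapshot


# Three versions of Y stored on node 2 (illustrative values).
y_versions = [(1, "y1"), (2, "y2"), (3, "y3")]
print(select_version(y_versions, (1, 1, 2), 2))
```

Scanning from the newest version keeps the common case cheap, since most transactions read at or near the latest version.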


Building the commit Vector Clock

Based on a variant of Skeen’s total order multicast algorithm [SKEEN85].

Intuition: serialize all and only the conflicting transactions, tracking:
- direct and transitive conflict dependencies,
- causal relationships.
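In Skeen's algorithm every destination proposes a timestamp and the largest proposal becomes the final one. A hedged sketch of how that idea might carry over to vector clocks is shown below; the exact merge is an assumption that happens to reproduce the clock values in the talk's figures, not something the slides state. The function name and the `proposals` layout are hypothetical.

```python
from typing import Dict, Tuple

VectorClock = Tuple[int, ...]


def final_commit_vc(proposals: Dict[int, VectorClock]) -> VectorClock:
    """Merge tentative clocks, keyed by participant node id, into one commit clock."""
    # Entrywise maximum keeps every dependency seen by any participant.
    merged = [max(column) for column in zip(*proposals.values())]
    # Skeen-style agreement: all participants adopt the largest of their entries.
    agreed = max(merged[node] for node in proposals)
    for node in proposals:
        merged[node] = agreed
    return tuple(merged)


# Nodes 1 and 2 each propose a clock with their own entry incremented.
print(final_commit_vc({1: (1, 2, 1), 2: (1, 1, 3)}))
```

Entries of nodes that did not participate are left untouched, which is what keeps the scheme genuine: a commit advances only the clock components of the nodes it involves.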


Consistency Criterion

GMU ensures Extended Update Serializability.

Update Serializability (US) [ICDT86] ensures:
- 1-Copy-Serializability (1CS) on the history restricted to committed update transactions;
- 1CS on the history restricted to committed update transactions and any single read-only transaction.

But it can admit non-1CS histories containing at least 2 read-only transactions.

Extended Update Serializability [Adya99]:
- ensures the US property also for executing transactions;
- is analogous to opacity in STMs.


Experiments on private cluster

8-core physical nodes

TPC-C:
- 90% read-only transactions
- 10% update transactions
- 4 threads per node
- moderate contention (15% abort rate at 20 nodes)


Thanks for the attention


References

[Adya99] A. Adya. “Weak Consistency: A Generalized Theory and Optimistic Implementations for Distributed Transactions”. PhD Thesis, Massachusetts Institute of Technology, 1999.

[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. “When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication”. The IEEE 32nd International Conference on Distributed Computing Systems, June 2012.

[ICDT86] R. C. Hansdah, L. M. Patnaik. “Update Serializability in Locking”. International Conference on Database Theory, vol. 243 of Lecture Notes in Computer Science, pp. 171–185, Springer Berlin / Heidelberg, 1986.

[SKEEN85] D. Skeen. “Unpublished communication”, 1985. Referenced in K. Birman, T. Joseph. “Reliable Communication in the Presence of Failures”. ACM Trans. on Computer Systems, pp. 47-76, 1987.

[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. “P-Store: Genuine Partial Replication in Wide Area Networks”. Proc. of the 29th Symposium on Reliable Distributed Systems, 2010.