Causal Consistency Without Dependency Check Messages Willy Zwaenepoel

Causal Consistency WithoutDependency Check Messages

Willy Zwaenepoel

INTRODUCTION

Geo-replicated data stores• Geo-replicated data centers • Full replication between data centers• Data in each data center partitioned

Data center

Data center

Data center

Data center

Data center

The Consistency Dilemma

Strong Consistency

• synchronous replication

• all replicas share same consistent view

• sacrifice availability

Causal Consistency

• asynchronous replication

• all replicas eventually converge

• sacrifice consistency, but …

• … replication respects causality

Eventual Consistency

• asynchronous replication

• all replicas eventually converge

• sacrifice consistency

The cost of causal consistency

1 2 3 4 5 60

100

200

300

400

500

600

700

Throughput

EventualCausal

Partitions

Kops

/sec

Can we close the throughput gap?

• The answer is: yes, but there is a price

STATE OF THE ART: WHY THE GAP?

How is causality enforced?

• Each update has associated dependencies

• What are dependencies?– metadata to establish causality relations between operations– used only for data replication

• Internal dependencies– previous updates of the same client session

• External dependencies– read updates of other client sessions

Internal dependencies

W(y = 2)

AliceW(x = 1)

Bob

US Datacenter

Europe Datacenter

R(y) y = 0 R(y)

y = 2

Example of 2 users performing operations at different datacenters at the same partition

External dependencies

R(x)

Alice

W(x = 1)

Bob

US Datacenter

Europe Datacenter

x = 1

R(y) y = 0 R(y)

y = 2

W(y = x + 1)

Charlie

Example of 3 users performing operations at datacenters at the same partition

How dependencies are tracked & checked• In current implementations

– COPS [SOSP ’11], ChainReaction [Eurosys ‘13], Eiger [NSDI ’13], Orbe [SOCC ’13]

• DepCheck(A) – “Do you have A installed yet?”

Partition 0 Partition 1 Partition N…

Partition 0 Partition 1 …US Datacenter

Europe Datacenter

Client

Read(A)

Read

(B) Write(C, A+B)

DepCheck(B)DepCheck(A)

Partition N

Encoding of dependencies

• COPS [SOSP ’11], ChainR. [Eurosys ‘13], Eiger [NSDI ’13]– “direct” dependencies– Worst case: O( reads before a write )

• Orbe [SOCC ‘13]– Dependency matrix– Worst case: O( partitions )

The main issues

• Metadata size is considerable– for both storage and communucation

• Remote dependency checks are expensive– multiple partitions are queried for each update


1 2 3 4 5 60

100

200

300

400

500

600

700

Throughput

EventualCausal

Partitions

Kops

/sec

The cost of dependency check messages

1 2 3 4 5 60

100

200

300

400

500

600

700

Throughput

EventualCausalCausal - Dep check msgs

Partitions

Kops

/sec

CAUSAL CONSISTENCY WITH0/1 DEPENDENCY CHECK MESSAGES

Getting rid of external dependencies

• Partitions serve only fully replicated updates– Replication Confirmation messages broadcast periodically

• External dependencies are removed– replication information implies dependency installation

• Internal dependencies are minimized– we only track the previous write– requires at most one remote check

• zero if write is local, one if it is remote

The new replication workflow

Example of 2 users performing operations at different datacenters at the same partition

AliceW(x = 1)

Bob

R(x)

US Datacenter

Europe Datacenter

Asia Datacenter

Replication Confirmation(periodically)

R(x)

x = 0

x = 1

Reading your own writes• Clients need not wait for the replication confirmation

– they can see their own updates immediately– other clients’ updates are visible once they are fully replicated

• Multiple logical update spaces

Global update space (fully visible)

Replication update space (not yet visible)

Alice’s update space

(visible to Alice)

Bob’s update space

(visible to Bob)…


1 2 3 4 5 60

100

200

300

400

500

600

700

Throughput

Eventual01-msg CausalConventional Causal

Partitions

Kops

/sec

The price paid: update visibility increased

• With new implementation~ max ( network latency from origin to furthest replica +

network latency from furthest replica to destination +interval of replication information broadcast )

• With conventional implementation:~ network latency from origin to destination

CAUSAL CONSISTENCY WITHOUTDEPENDENCY CHECK MESSAGES ?!

Is it possible?

• Only make update visible when one can locally determine no causally preceding update will become visible later at another partition

How to do that?

• Encode causality by means of Lamport clock

• Each partition maintains its Lamport clock• Each update is timestamped with Lamport clock

• Update visible – update.timestamp ≤ min( Lamport clocks )

• Periodically compute minimum

0-msg causal consistency - throughput

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 160

100200300400500600700800900

Throughput

Eventual0-msg CausalConventional Causal

Partitions

Kops

/sec

The price paid: update visibility increased

• With new implementation~ max ( network latency from origin to furthest replica +

network latency from furthest replica to destination +interval of minimum computation )

• With conventional implementation:~ network latency from origin to destination

How to deal with “stagnant” Lamport clock?

• Lamport clock stagnates if no update in a partition

• Combine – Lamport clock– Loosely synchronized physical clock– (easy to do)

More on loosely synchronized physical clocks

• Periodically broadcast clock

• Reduces update visibility latency to– Network latency from furthest replica to destination +

maximum clock skew +clock broadcast interval

Can we close the throughput gap?

• The answer is: yes, but there is a price• The price is increased update visibility

Method Throughput # of dep.check messages

Update visibility

Conventional < Evt. consistency O( Rs since W )O( partitions )

DD

01-msg ~ Evt. consistency O or 1 ~ 2 Dmax

0-msg ~ Evt. consistency 0 ~ 2 Dmax

0-msg + physical clock ~ Evt. consistency 0 ~ Dmax

Conclusion: Throughput, messages, latency

Documents

Causal Consistency Without Dependency Check Messages Willy Zwaenepoel