Upload
calvin-cooper
View
224
Download
4
Tags:
Embed Size (px)
Citation preview
Causal Consistency WithoutDependency Check Messages
Willy Zwaenepoel
INTRODUCTION
Geo-replicated data stores• Geo-replicated data centers • Full replication between data centers• Data in each data center partitioned
Data center
Data center
Data center
Data center
Data center
The Consistency Dilemma
Strong Consistency
• synchronous replication
• all replicas share same consistent view
• sacrifice availability
Causal Consistency
• asynchronous replication
• all replicas eventually converge
• sacrifice consistency, but …
• … replication respects causality
Eventual Consistency
• asynchronous replication
• all replicas eventually converge
• sacrifice consistency
The cost of causal consistency
1 2 3 4 5 60
100
200
300
400
500
600
700
Throughput
EventualCausal
Partitions
Kops
/sec
Can we close the throughput gap?
• The answer is: yes, but there is a price
STATE OF THE ART: WHY THE GAP?
How is causality enforced?
• Each update has associated dependencies
• What are dependencies?– metadata to establish causality relations between operations– used only for data replication
• Internal dependencies– previous updates of the same client session
• External dependencies– read updates of other client sessions
Internal dependencies
W(y = 2)
AliceW(x = 1)
Bob
US Datacenter
Europe Datacenter
R(y) y = 0 R(y)
y = 2
Example of 2 users performing operations at different datacenters at the same partition
External dependencies
R(x)
Alice
W(x = 1)
Bob
US Datacenter
Europe Datacenter
x = 1
R(y) y = 0 R(y)
y = 2
W(y = x + 1)
Charlie
Example of 3 users performing operations at datacenters at the same partition
How dependencies are tracked & checked• In current implementations
– COPS [SOSP ’11], ChainReaction [Eurosys ‘13], Eiger [NSDI ’13], Orbe [SOCC ’13]
• DepCheck(A) – “Do you have A installed yet?”
Partition 0 Partition 1 Partition N…
Partition 0 Partition 1 …US Datacenter
Europe Datacenter
Client
Read(A)
Read
(B) Write(C, A+B)
DepCheck(B)DepCheck(A)
Partition N
Encoding of dependencies
• COPS [SOSP ’11], ChainR. [Eurosys ‘13], Eiger [NSDI ’13]– “direct” dependencies– Worst case: O( reads before a write )
• Orbe [SOCC ‘13]– Dependency matrix– Worst case: O( partitions )
The main issues
• Metadata size is considerable– for both storage and communucation
• Remote dependency checks are expensive– multiple partitions are queried for each update
The cost of causal consistency
1 2 3 4 5 60
100
200
300
400
500
600
700
Throughput
EventualCausal
Partitions
Kops
/sec
The cost of dependency check messages
1 2 3 4 5 60
100
200
300
400
500
600
700
Throughput
EventualCausalCausal - Dep check msgs
Partitions
Kops
/sec
CAUSAL CONSISTENCY WITH0/1 DEPENDENCY CHECK MESSAGES
Getting rid of external dependencies
• Partitions serve only fully replicated updates– Replication Confirmation messages broadcast periodically
• External dependencies are removed– replication information implies dependency installation
• Internal dependencies are minimized– we only track the previous write– requires at most one remote check
• zero if write is local, one if it is remote
The new replication workflow
Example of 2 users performing operations at different datacenters at the same partition
AliceW(x = 1)
Bob
R(x)
US Datacenter
Europe Datacenter
Asia Datacenter
Replication Confirmation(periodically)
R(x)
x = 0
x = 1
Reading your own writes• Clients need not wait for the replication confirmation
– they can see their own updates immediately– other clients’ updates are visible once they are fully replicated
• Multiple logical update spaces
Global update space (fully visible)
Replication update space (not yet visible)
Alice’s update space
(visible to Alice)
Bob’s update space
(visible to Bob)…
The cost of causal consistency
1 2 3 4 5 60
100
200
300
400
500
600
700
Throughput
Eventual01-msg CausalConventional Causal
Partitions
Kops
/sec
The price paid: update visibility increased
• With new implementation~ max ( network latency from origin to furthest replica +
network latency from furthest replica to destination +interval of replication information broadcast )
• With conventional implementation:~ network latency from origin to destination
CAUSAL CONSISTENCY WITHOUTDEPENDENCY CHECK MESSAGES ?!
Is it possible?
• Only make update visible when one can locally determine no causally preceding update will become visible later at another partition
How to do that?
• Encode causality by means of Lamport clock
• Each partition maintains its Lamport clock• Each update is timestamped with Lamport clock
• Update visible – update.timestamp ≤ min( Lamport clocks )
• Periodically compute minimum
0-msg causal consistency - throughput
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 160
100200300400500600700800900
Throughput
Eventual0-msg CausalConventional Causal
Partitions
Kops
/sec
The price paid: update visibility increased
• With new implementation~ max ( network latency from origin to furthest replica +
network latency from furthest replica to destination +interval of minimum computation )
• With conventional implementation:~ network latency from origin to destination
How to deal with “stagnant” Lamport clock?
• Lamport clock stagnates if no update in a partition
• Combine – Lamport clock– Loosely synchronized physical clock– (easy to do)
More on loosely synchronized physical clocks
• Periodically broadcast clock
• Reduces update visibility latency to– Network latency from furthest replica to destination +
maximum clock skew +clock broadcast interval
Can we close the throughput gap?
• The answer is: yes, but there is a price• The price is increased update visibility
Method Throughput # of dep.check messages
Update visibility
Conventional < Evt. consistency O( Rs since W )O( partitions )
DD
01-msg ~ Evt. consistency O or 1 ~ 2 Dmax
0-msg ~ Evt. consistency 0 ~ 2 Dmax
0-msg + physical clock ~ Evt. consistency 0 ~ Dmax
Conclusion: Throughput, messages, latency