KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
AIFB, eOrganization Research Group
www.kit.edu
Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services
David Bermbach (KIT)
Liang Zhao, Sherif Sakr (NICTA & UNSW)
© David Bermbach
Motivation
Developing applications on top of only eventually consistent cloud storage services is difficult and requires a change in mindset
Most applications can tolerate a certain degree of uncertainty as long as they know its extent
In some cases eventual consistency is just not acceptable, but changing the behavior of storage systems requires knowledge of their guarantee internals
Consistency guarantees are affected by additional influence factors (e.g., failures, workload, geo-distribution, …); this should be analyzed in a comprehensive consistency benchmark
What is consistency?
Two different definitions
  The "C" from ACID is not what distributed systems see as consistency
Basic idea: If all replicas are identical and all ordering guarantees of the consistency model are observed, then we call the datastore consistent
Eventual Consistency: In the absence of updates all replicas eventually converge towards a consistent state
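The definition of eventual consistency can be illustrated with a toy model. The sketch below is an assumption made for illustration only, not how Cassandra or MongoDB replicate data: each replica holds a hypothetical (version, value) pair, no new updates arrive, and random pairs of replicas repeatedly exchange state and keep the newer pair.

```python
import random

def gossip_until_converged(replicas, rng, max_rounds=1000):
    """Toy anti-entropy loop (illustrative assumption, not a real protocol).

    replicas: list of (version, value) tuples, one per replica.
    In each round two random replicas sync and both keep the newer pair.
    Returns the number of rounds needed until all replicas are identical.
    """
    rounds = 0
    while len(set(replicas)) > 1 and rounds < max_rounds:
        a, b = rng.sample(range(len(replicas)), 2)
        newest = max(replicas[a], replicas[b])  # tuples compare by version first
        replicas[a] = replicas[b] = newest
        rounds += 1
    return rounds

# No updates arrive while the loop runs, so the replicas can only converge.
replicas = [(3, "c"), (1, "a"), (2, "b")]
rounds = gossip_until_converged(replicas, random.Random(0))
```

Once the loop ends, every replica holds the highest version that existed at the start, which is the "consistent state" in the sense of the definition above.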
What is consistency? (cont.)
Two perspectives: data-centric vs. client-centric
Two dimensions:
  Staleness: How far are replicas lagging behind
  Ordering: How many requests are executed out of order
David Bermbach, 02.09.2013
Metrics
Staleness
  In terms of time (t-Visibility)
  In terms of versions (k-Staleness)
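Both staleness metrics can be estimated from a log of writes and reads. The following is a minimal sketch under stated assumptions: the log format, the function name `staleness`, and the choice to report the maximum over all reads are hypothetical and not taken from the benchmark itself. Versions are assumed to increase with commit time, and clocks are assumed to be comparable across clients.

```python
from bisect import bisect_right

def staleness(writes, reads):
    """Estimate staleness from a hypothetical measurement log.

    writes: list of (commit_time, version), sorted by time, versions increasing.
    reads:  list of (read_time, version_returned).
    Returns (t, k): the largest observed staleness in time units and in
    number of versions, taken over all reads.
    """
    write_times = [t for t, _ in writes]
    versions = [v for _, v in writes]
    max_t, max_k = 0.0, 0
    for read_time, seen in reads:
        # Latest write committed at or before this read.
        idx = bisect_right(write_times, read_time) - 1
        if idx < 0 or seen >= versions[idx]:
            continue  # nothing written yet, or the read was fresh
        # Oldest write this read failed to observe.
        missed = bisect_right(versions, seen)
        max_t = max(max_t, read_time - write_times[missed])
        max_k = max(max_k, idx - missed + 1)
    return max_t, max_k

writes = [(0.0, 1), (10.0, 2), (20.0, 3)]
reads = [(10.5, 1), (12.0, 2), (20.2, 2), (25.0, 1)]
t_vis, k_stale = staleness(writes, reads)
```

In this example the last read returns version 1 at time 25.0 although version 2 was committed at time 10.0 and version 3 at 20.0, so the sketch reports a time staleness of 15.0 and a version staleness of 2.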
Ordering
  Probability of violations of Monotonic Read Consistency
  Probability of violations of Monotonic Write Consistency
  Probability of violations of Read Your Writes Consistency
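Two of the ordering metrics can be estimated from a client-side log alone. The sketch below is illustrative and rests on assumptions: the log format and the function name are hypothetical, versions are positive integers, and the probability is approximated as the fraction of reads that violate the guarantee. Monotonic Write violations are left out because detecting them needs the order in which the store applied the writes, which a client-side log does not contain.

```python
def ordering_violation_rates(ops):
    """Estimate violation rates from a hypothetical client-side log.

    ops: list of (client, kind, version) in the order each client issued
         them; kind is "r" (read) or "w" (write).
    Returns (monotonic_read_rate, read_your_writes_rate) as fractions of
    all reads.
    """
    last_read, last_write = {}, {}
    reads = mr = ryw = 0
    for client, kind, version in ops:
        if kind == "w":
            last_write[client] = version
            continue
        reads += 1
        # Monotonic reads: never return something older than a previous read.
        if version < last_read.get(client, 0):
            mr += 1
        # Read your writes: never return something older than the own last write.
        if version < last_write.get(client, 0):
            ryw += 1
        last_read[client] = max(last_read.get(client, 0), version)
    if reads == 0:
        return 0.0, 0.0
    return mr / reads, ryw / reads

ops = [
    ("a", "w", 1), ("a", "r", 1), ("b", "r", 1),
    ("a", "w", 2), ("a", "r", 1),  # client a misses its own write
    ("b", "r", 2), ("b", "r", 1),  # client b goes back in time
    ("a", "r", 2),
]
mr_rate, ryw_rate = ordering_violation_rates(ops)
```

With six reads in the log and one violation of each kind, both rates come out as 1/6.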
Challenges
Accuracy and Meaningfulness
  Choose appropriate metrics
  Consider accuracy of the measurement approach
Workloads
  System load
  Actual measurement workload (highly volatile reaction, so keep it simple)
Geo-replication
  Single site vs. multi site deployments vs. anything in between
Failures
  Cloud service vs. self-hosted system
  Crash-stop, crash-recover, byzantine
Multi-tenancy
  Cross-effects between tenants
Architecture
[Figure: benchmark architecture, based on Bermbach & Tai, MW4SOC 2011. Recoverable labels: "YCSB"; "Not yet implemented"; "e.g., Simian Army"]
Experiments
Study effects of
  different workloads
  geo-distribution
on consistency guarantees of
  Cassandra: (3,1,1) quorum
  MongoDB: Master-Slave
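Why a (3,1,1) quorum only gives eventual consistency can be checked with a few lines. This sketch assumes the triple means (N, R, W), that is, replication factor, read quorum size, and write quorum size; the slides do not spell out the order. It uses the well-known rule that read and write quorums are guaranteed to intersect only if R + W > N, and confirms the rule by enumerating all quorums.

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """True if every read quorum must share a replica with every write quorum."""
    return r + w > n

def exists_disjoint_quorums(n, r, w):
    """Brute-force check: is there a read quorum that misses some write quorum?"""
    replicas = range(n)
    return any(
        set(read_q).isdisjoint(write_q)
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )

# Assumed reading of "(3,1,1)": N=3, R=1, W=1.
overlap_311 = quorums_overlap(3, 1, 1)
disjoint_311 = exists_disjoint_quorums(3, 1, 1)
# For comparison: majority quorums on three replicas.
overlap_322 = quorums_overlap(3, 2, 2)
disjoint_322 = exists_disjoint_quorums(3, 2, 2)
```

Under this reading, a write acknowledged by one replica can be followed by a read served from a different replica that has not yet received it, so stale reads are possible and measuring staleness is meaningful.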
Setup
Storage system deployed on 3 medium EC2 instances running in one AZ (eu-west-1a), multiple AZs (eu-west) or multiple regions (eu-west, us-west, asia-pacific)
YCSB running on an xlarge instance
Consistency measurement running on 5 small instances per test
Load balancer routes requests to the closest replica
Measure consistency
  Without additional workload
  With 80% reads and 20% writes in parallel (YCSB)
  With 20% reads and 80% writes in parallel (YCSB)
Staleness Results for Cassandra
[Figure: staleness results for Cassandra. Panels: Single-AZ, Multi-AZ, Multi-region. Annotation: <1 ms for 98% of requests]
Staleness Results for MongoDB
[Figure: staleness results for MongoDB. Panels: Single-AZ, Multi-AZ, Multi-region]
Effects of parallel workload
No effect except for multi-region Cassandra:
[Figure: multi-region Cassandra. Panels: Workload 1, Workload 2]
Conclusion
A comprehensive consistency benchmark should
  use meaningful and accurate (fine-grained & continuous) metrics
  consider effects of workloads, geo-distribution, multi-tenancy and failures
  preferably reuse and integrate popular, proven solutions
Increased geo-distribution results in larger staleness values for both MongoDB and Cassandra
Increased workloads seem to have little effect on consistency as long as CPU utilization is below 100%
Thank you for your attention
Questions?
This paper was joint work with Liang Zhao and Sherif Sakr at NICTA and UNSW