Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments
Jörg Domaschka and Daniel Seybold
Institute of Information Resource Management
Ulm University | Ulm | Germany
Page 2 06.11.2019 | Symposium on Software Performance | Towards Understanding the Performance of Distributed DBMS in Volatile Environments
Current Trends of Data-intensive Applications
application domains & requirements: Web 2.0, Big Data, IoT
application architectures
infrastructures
performance, scalability, elasticity, availability
https://www.gartner.com/en/documents/3941821/the-future-of-the-dbms-market-is-cloud
Contribution
How to operate distributed DBMS in the cloud?
Insights into operating DBMS on cloud resources:
distributed DBMS impact factors
cloud resource impact factors
selected DBMS and cloud resource-centric evaluation results
the results summarize the insights of a series of DBMS evaluation publications
Distributed DBMS Impact Factors
data models¹, sharding & scale¹˒², replication & consistency²
¹ Mazumdar, S., Seybold, D., Kritikos, K., & Verginadis, Y. (2019). A survey on data storage and placement methodologies for cloud-big data ecosystem. Journal of Big Data, 6(1), 15.
² Domaschka, J., Hauser, C. B., & Erb, B. (2014, September). Reliability and availability properties of distributed database systems. In 2014 IEEE 18th International Enterprise Distributed Object Computing Conference (pp. 226-233). IEEE.
data models: RDBMS, NewSQL, NoSQL
sharding & scale: cluster size, architecture, sharding mechanism
replication & consistency: consistency model, replication mechanism, replication factor
> 220 NoSQL & 20 NewSQL DBMS on the market¹
¹ http://nosql-database.org/
[Figure: DBMS impact factors, each marked as configurable or predefined, and their relation to scalability, elasticity, availability, and performance]
data model: RDBMS, NewSQL, NoSQL, …
architecture: single, master-slave, multi-master
cluster size
sharding: range, hash
replication: factor, scope
consistency model: ACID, BASE
client-side consistency
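The two sharding options named in the overview (range and hash) can be illustrated with a minimal Python sketch. It is not taken from any of the evaluated DBMS; the function names, the split points, and the choice of MD5 as hash function are assumptions made only for illustration.

```python
import bisect
import hashlib
from typing import Sequence


def hash_shard(key: str, num_shards: int) -> int:
    """Place a key by hashing it: spreads keys evenly but loses key order."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, upper_bounds: Sequence[str]) -> int:
    """Place a key in the first shard whose exclusive upper bound exceeds it.

    upper_bounds must be sorted; keys at or beyond the last bound
    land in one additional, final shard.
    """
    return bisect.bisect_right(upper_bounds, key)


# Hypothetical split points giving three shards: [..h), [h..p), [p..]
bounds = ["h", "p"]
keys = ["alice", "harry", "zoe"]

range_placement = {k: range_shard(k, bounds) for k in keys}
hash_placement = {k: hash_shard(k, 3) for k in keys}
```

Range sharding keeps neighbouring keys together, which favours range scans but can create hot shards; hash sharding trades key locality for a more even distribution.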
Cloud Resource Impact Factors
provider¹, resource type², resource characteristics³
¹ Baur, D., Seybold, D., Griesinger, F., Masata, H., & Domaschka, J. (2018, May). A provider-agnostic approach to multi-cloud orchestration using a constraint language. In Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (pp. 173-182). IEEE Press.
² Seybold, D., Hauser, C. B., Eisenhart, G., Volpert, S., & Domaschka, J. (2018, August). The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS. In European Conference on Parallel Processing (pp. 93-105). Springer, Cham.
³ Seybold, D., Hauser, C. B., Volpert, S., & Domaschka, J. (2017, October). Gibbon: An availability evaluation framework for distributed databases. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems" (pp. 31-49). Springer.
provider: public, private (resource offerings)
resource type: bare metal, VM, container, storage, sizing
resource characteristics: interferences, failures
> 20,000 public cloud resource offerings¹
¹ https://cloudharmony.com/
[Figure: cloud resource impact factors, each marked as configurable or predefined, and their relation to scalability, elasticity, availability, and performance]
provider: AWS, …, OpenStack
resource type: VM, bare metal, container
storage: HDD, SSD, remote
sizing
characteristics: interferences, failures
How to operate distributed DBMS in the Cloud?
[Figure: combined overview of the cloud resource impact factors (provider, resource type, storage, characteristics) and the DBMS impact factors (data model, architecture, cluster size, sharding, replication, consistency model, client-side consistency), each marked as configurable or predefined and related to scalability, elasticity, availability, and performance]
Evaluation Scenario: Client-Consistency & Cluster Size
[Figure: combined overview of cloud resource and DBMS impact factors, marked as dynamic or static, for the scenario covering client-side consistency and cluster size]
evaluation environment
provider: OpenStack Ulm
resource: VM
sizing: 2 cores – 4GB memory – SSD storage
DBMS: Apache Cassandra
version: 3.11
Workload: YCSB
Type: write-heavy
complete evaluation details¹
¹ Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.
[Plot: consistency – performance impact]
[Plot: scalability]
evaluation environment
provider: OpenStack Ulm
resource: VM
sizing: 2 cores – 4GB memory – SSD storage
DBMS: Couchbase
version: 5.0.1
Workload: YCSB
Type: write-heavy
complete evaluation details¹
¹ Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.
[Plot: consistency – performance impact]
[Plot: scalability]
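The scalability plots relate throughput to cluster size. One common way to summarise such a series is the efficiency relative to ideal linear scaling, sketched below in Python. The function name and the throughput numbers are hypothetical and are not measurements from the evaluation.

```python
from typing import Dict


def scaling_efficiency(throughput_by_nodes: Dict[int, float]) -> Dict[int, float]:
    """Return measured speed-up divided by ideal (linear) speed-up per cluster size.

    The smallest cluster in the input is used as the baseline, so its
    efficiency is 1.0 by definition; values below 1.0 mean sub-linear scaling.
    """
    base_nodes = min(throughput_by_nodes)
    base_throughput = throughput_by_nodes[base_nodes]
    return {
        nodes: (throughput / base_throughput) / (nodes / base_nodes)
        for nodes, throughput in sorted(throughput_by_nodes.items())
    }


# Hypothetical throughput (operations per second) for three cluster sizes.
example = {1: 1000.0, 2: 1800.0, 4: 3000.0}
efficiency = scaling_efficiency(example)
```

In this made-up series, doubling the cluster keeps 90% of the ideal gain, while quadrupling it keeps 75%.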
Lessons Learned: Client-Consistency & Cluster Size
minor changes to the DBMS configuration may have a significant performance impact
scalability depends on (resources¹), DBMS runtime configuration, and workload properties
¹ Seybold, D., Keppler, M., Gründler, D., & Domaschka, J. (2019, April). Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (pp. 321-332). ACM.
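Why a small configuration change such as the client-side consistency level can matter is visible in how many replicas must acknowledge each operation. The Python sketch below uses the Cassandra-style levels ONE, QUORUM, and ALL; the function names are hypothetical, and the slides do not state which levels or replication factor were evaluated, so the value 3 is an assumption.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at a given level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def replica_sets_overlap(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True if every read is guaranteed to contact a replica holding the latest write.

    This is the usual W + R > N condition.
    """
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return w + r > replication_factor


rf = 3  # assumed replication factor, for illustration only
acks = {level: required_acks(level, rf) for level in ("ONE", "QUORUM", "ALL")}
```

With three replicas, switching writes from ONE to ALL triples the number of acknowledgements each write waits for, which is one plausible source of the performance differences summarised above.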
Evaluation Scenario: Resource Types
[Figure: combined overview of cloud resource and DBMS impact factors, marked as configurable or predefined, for the scenario covering resource types]
evaluation environment
provider: OpenStack Ulm
resource: VM
sizing: 4 cores – 4GB memory – SSD storage
DBMS: MongoDB
version: 3.6.3
Workload: YCSB
Type: write-heavy
complete evaluation details¹
¹ Seybold, D., Hauser, C. B., Eisenhart, G., Volpert, S., & Domaschka, J. (2018, August). The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS. In European Conference on Parallel Processing (pp. 93-105). Springer, Cham.
[Figure: evaluated deployment stacks: physical → container → DBMS (two variants); physical → VM → DBMS; physical → VM → container → DBMS]
Lessons Learned: Resource Types
virtualization reduces DBMS performance
storage location is an important and challenging decision for operating DBMS in the cloud
DBMS in containers on VMs introduce negligible overhead compared to the operational benefits
Conclusion & Outlook
Conclusion
DBMS evaluations need to consider DBMS, cloud resource, and workload characteristics
comprehensive DBMS evaluations are technically challenging, time-consuming, and error-prone
Tool support is required!
Mowgli Framework: fully automates DBMS evaluations and enables reproducible and portable evaluations!
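To give a feel for why manual evaluation does not scale, the Python sketch below enumerates the combinations of a small subset of the impact factors named on the earlier slides. It is only an illustration of the combinatorics and not part of the Mowgli framework; the dictionary layout is an assumption, and not every combination is meaningful in practice.

```python
from itertools import product

# Subset of impact factors and options as named on the overview slides.
factors = {
    "data_model": ["RDBMS", "NewSQL", "NoSQL"],
    "architecture": ["single", "master-slave", "multi-master"],
    "sharding": ["range", "hash"],
    "consistency_model": ["ACID", "BASE"],
    "provider": ["AWS", "OpenStack"],
    "resource_type": ["VM", "bare metal", "container"],
    "storage": ["HDD", "SSD", "remote"],
}

# Every combination of one option per factor is one candidate evaluation setup.
configurations = [
    dict(zip(factors, combination)) for combination in product(*factors.values())
]
num_configurations = len(configurations)
```

Even this reduced set yields several hundred candidate setups before cluster size, replication factor, sizing, or workload variants are considered.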
Outlook
Advanced distributed DBMS evaluations: complex DBMS workloads – DBMS elasticity and availability – self-hosted DBMS vs. DBaaS
Automated DBMS operation in the cloud
Thank you!
The research leading to these results has received funding from the EC's Framework Programme HORIZON 2020 under grant agreement numbers 731664 (MELODIC) and 732667 (RECAP).
Mowgli Software: https://omi-gitlab.e-technik.uni-ulm.de/mowgli
Release 0.1: https://zenodo.org/record/3341512#.XcFnRehKiUk
DBMS Evaluation Data Sets:
Performance & Scalability: https://zenodo.org/record/3518786#.XcFnf-hKiUk
Elasticity: https://zenodo.org/record/3362279#.XcFnmehKiUk
Containerized DBMS: https://github.com/omi-uulm/Containerized-DBMS-Evaluation