This is a presentation that was held at the 38th Conference on Very Large Data Bases (VLDB), 2012. Full paper and additional information available at: http://msrg.org/papers/vldb12-bigdata

Abstract: As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and the high data rates predominant in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in online advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience relating to the setup and configuration complexity of these data stores in an industry setting.
MIDDLEWARE SYSTEMS
RESEARCH GROUP
MSRG.ORG
Solving Big Data Challenges for Enterprise Application Performance Management
Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen
University of Toronto
Sergio Gómez-Villamor
Universitat Politècnica de Catalunya
Victor Muntés-Mulero*, Serge Mankovskii
CA Labs (Europe*)
Agenda
Application Performance Management
APM Benchmark
Benchmarked Systems
Test Results
8/26/2012 Tilmann Rabl - VLDB 2012
Motivation
Enterprise System Architecture
[Diagram: a typical enterprise system — clients, web servers, application servers, databases, a web service, a mainframe, third-party services, an identity manager, SAP, message brokers, and message queues — with an agent attached to each component taking measurements]
Application Performance Management
[Same architecture diagram: the agents on every component collect measurements and report them to the APM system]
APM in Numbers
Nodes in an enterprise system: 100 – 10K
Metrics per node: up to 50K (avg. 10K)
Reporting period: 10 sec
Data size: 100 B / event
Raw data rate: 100 MB / sec, 355 GB / h, 2.8 PB / y
Event rate at storage: > 1M / sec
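The data-rate figures above follow from simple arithmetic. A quick sanity check in Python, using decimal units (the slide's hourly and yearly figures appear to use slightly different rounding):

```python
# Back-of-the-envelope check of the APM data rates on the slide.
EVENT_SIZE_B = 100          # bytes per monitoring event
EVENT_RATE = 1_000_000      # events per second arriving at storage

bytes_per_sec = EVENT_SIZE_B * EVENT_RATE                 # 100 MB/s
gb_per_hour = bytes_per_sec * 3600 / 1e9                  # ~360 GB/h
pb_per_year = bytes_per_sec * 3600 * 24 * 365 / 1e15      # ~3.15 PB/y

print(f"{bytes_per_sec / 1e6:.0f} MB/s, "
      f"{gb_per_hour:.0f} GB/h, {pb_per_year:.2f} PB/y")
```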
APM Benchmark
Based on the Yahoo! Cloud Serving Benchmark (YCSB)
CRUD operations on a single table
Records: 25-byte key, 5 values (10 bytes each), 75 bytes per record
Five workloads
Scan length: 50 rows
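A minimal sketch of the record shape described above. The generator below is illustrative only (the real YCSB client constructs records differently), but it reproduces the 25-byte key and five 10-byte values for 75 bytes per record:

```python
import random
import string

KEY_LEN = 25     # 25-byte key
FIELDS = 5       # five values per record
FIELD_LEN = 10   # 10 bytes each -> 75 bytes per record in total

def make_record(i):
    """Hypothetical generator mirroring the benchmark's record shape."""
    # Pad a sequential user id out to the fixed key length.
    key = f"user{i}".ljust(KEY_LEN, "0")[:KEY_LEN]
    values = ["".join(random.choices(string.ascii_lowercase, k=FIELD_LEN))
              for _ in range(FIELDS)]
    return key, values

key, values = make_record(42)
print(len(key) + sum(len(v) for v in values))  # 75
```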
Benchmarked Systems
Six systems: Cassandra, HBase, Project Voldemort, Redis, VoltDB, MySQL
Categories: key-value stores, extensible record stores, scalable relational stores; sharded and distributed systems; main-memory and disk-based systems
Chosen by: previous results, popularity, maturity, availability
Classification of Benchmarked Systems
[Classification matrix: each system is placed along three dimensions — key-value store / extensible record store / scalable relational store, in-memory / disk-based, and sharded / distributed]
Experimental Testbed
Two clusters
Cluster M (memory-bound):
16 nodes (plus a master node)
2x quad-core CPU, 16 GB RAM, 2x 74 GB HDD (RAID 0)
10 million records per node (~700 MB raw)
128 connections per node (8 per core)
Cluster D (disk-bound):
24 nodes
2x dual-core CPU, 4 GB RAM, 74 GB HDD
150 million records on 12 nodes (~10.5 GB raw)
8 connections per node
Evaluation
Minimum of 3 runs per workload and system
Fresh install for each run
10 minutes runtime per test
Up to 5 YCSB clients for 12 nodes, to ensure YCSB is not the bottleneck
Measured maximum achievable throughput
Workload W - Throughput
Cassandra dominates
Higher throughput for Cassandra and HBase
Lower throughput for other systems
Not all web data stores scale equally well
VoltDB best single node throughput
Workload W – Latency Writes
Same latencies for
Cassandra, Voldemort, VoltDB, Redis, MySQL
HBase latency increased
Workload W – Latency Reads
Latencies same as in Workload R for
Cassandra, Voldemort, Redis, VoltDB, HBase
HBase latency in second range
Workload R - Throughput
95% reads, 5% writes
On a single node, main-memory systems perform best
Linear scalability for web data stores
Sublinear scalability for sharded systems
Slow-down for VoltDB
Workload RW - Throughput
50% reads, 50% writes
VoltDB highest single node throughput
HBase and Cassandra throughput increase
MySQL and Voldemort throughput reduction
Workload RS – Latency Scans
HBase latency equal to reads in Workload R
MySQL latency very high due to full table scans
Similar but increased latency for
Cassandra, Redis, VoltDB
Workload RWS - Throughput
25% reads, 25% scans, 50% writes
Cassandra, HBase, Redis, VoltDB gain performance
MySQL performance 20 – 4 ops / sec
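The operation mixes quoted on these workload slides can be sketched as a simple probabilistic sampler. The structure below is illustrative, not the actual YCSB configuration; only the mixes fully stated on the slides (R, RW, RWS) are included:

```python
import random

# Operation mixes as quoted on the slides; names are ours.
WORKLOADS = {
    "R":   {"read": 0.95, "write": 0.05},
    "RW":  {"read": 0.50, "write": 0.50},
    "RWS": {"read": 0.25, "scan": 0.25, "write": 0.50},
}

def next_op(workload, rng=random):
    """Draw the next operation type according to the workload's mix."""
    ops, weights = zip(*WORKLOADS[workload].items())
    return rng.choices(ops, weights=weights)[0]

rng = random.Random(0)
sample = [next_op("RWS", rng) for _ in range(10)]
print(sample)
```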
Bounded Throughput - Write Latency
Workload R, normalized
Maximum throughput in previous tests 100%
Steadily decreasing for most systems
HBase not as stable (but below 0.1 ms)
Bounded Throughput - Read Latency
Redis and Voldemort slightly decrease
HBase two states
Cassandra decreases linearly
MySQL decreases and then stays constant
Disk Usage
Raw data: 75 bytes per record, 0.7 GB / 10M records
Cassandra: 2.5 GB / 10M records
HBase: 7.5 GB / 10M records
No compression
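The storage overhead relative to raw data follows directly from the figures on this slide; the overhead factors below are derived from those numbers, not stated on the slide itself:

```python
# Storage overhead relative to raw data size (no compression enabled).
RAW_GB = 0.7                                 # 10M records x 75 B, as reported
per_10m = {"Cassandra": 2.5, "HBase": 7.5}   # GB per 10M records, from the slide

overhead = {name: gb / RAW_GB for name, gb in per_10m.items()}
for name, factor in overhead.items():
    print(f"{name}: {factor:.1f}x raw size")
```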
Cluster D Results – Throughput
Disk bound, 150M records on 8 nodes
More writes – more throughput
Cassandra: 26x
HBase: 15x
Voldemort 3x
Cluster D Results – Latency Writes
Low latency for writes for all systems
Relatively stable latency for writes
Lowest for HBase
Equal for Voldemort and Cassandra
Cluster D Results – Latency Reads
Cassandra latency decreases with more writes
Voldemort latency low (5 – 20 ms)
HBase latency high (up to 200 ms)
Cassandra in between
Lessons Learned
YCSB: client-to-host ratio of 1:3; 1:2 is better
Cassandra: optimal token assignment necessary for even data distribution
HBase: difficult setup, special JVM settings necessary
Redis: Jedis distributes data unevenly across shards
Voldemort: a higher number of connections might lead to better results
MySQL: eager scan evaluation
VoltDB: synchronous communication slow on multiple nodes
Conclusions I
Cassandra: winner in terms of scalability and throughput
HBase: low write latency, high read latency, low throughput
Sharded MySQL: scales well, latency decreases at higher scales
Voldemort: read and write latency stable at a low level
Redis: high standalone throughput, sharded version does not scale well
VoltDB: high single-node throughput, does not scale with synchronous access
Conclusions II
Cassandra’s performance close to APM requirements
Additional improvements needed to reliably sustain the APM workload
Future work
Benchmark the impact of replication and compression
Monitor the benchmark runs using an APM tool