28
MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Solving Big Data Challenges for Enterprise Application Performance Management Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen University of Toronto Sergio Gomez-Villamor Universitat Politecnica de Catalunya Victor Muntes-Mulero*, Serge Mankowskii CA Labs (Europe*)

Solving Big Data Challenges for Enterprise Application Performance Management

Embed Size (px)

DESCRIPTION

This is a presentation that was held at the 38th Conference on Very Large Databases (VLDB), 2012. Full paper and additional information available at: http://msrg.org/papers/vldb12-bigdata Abstract: As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high-level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for ulterior analysis. This new wave of big data analytics imposes new challenges especially for the application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and at the same time enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and high data rates as predominant in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring as part of CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as, on-line advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience relating to the setup and configuration complexity of these data stores in an industry setting.

Citation preview

Page 1: Solving Big Data Challenges for Enterprise Application Performance Management

MIDDLEWARE SYSTEMS

RESEARCH GROUP

MSRG.ORG

Solving Big Data Challenges for

Enterprise Application

Performance Management

Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen

University of Toronto

Sergio Gomez-Villamor

Universitat Politecnica de Catalunya

Victor Muntes-Mulero*, Serge Mankowskii

CA Labs (Europe*)

Page 2: Solving Big Data Challenges for Enterprise Application Performance Management

Agenda

Application Performance Management

APM Benchmark

Benchmarked Systems

Test Results

8/26/2012 Tilmann Rabl - VLDB 2012

Page 3: Solving Big Data Challenges for Enterprise Application Performance Management

Motivation

8/26/2012 Tilmann Rabl - VLDB 2012

Page 4: Solving Big Data Challenges for Enterprise Application Performance Management

Enterprise System Architecture

Measurements

Client

Client

Client Web Server

Application

Server

Application

Server

Application

Server

Database

Database

Client

Web

Service

Main Frame

3rd Party

Identity

Manager

SAP

Message

Broker

Message

Queue

Agent

Agent Agent

Agent

Agent

Agent

Agent Agent

Agent

Agent

Agent Agent

Agent

8/26/2012 Tilmann Rabl - VLDB 2012

Page 5: Solving Big Data Challenges for Enterprise Application Performance Management

Application

Server Agent

Application Performance

Management

Client

Client

Client Web Server

Application

Server

Application

Server

Database

Database

Client

Web

Service

Main Frame

3rd Party

Identity

Manager

SAP

Message

Broker

Message

Queue

Measurements

Agent

Agent Agent

Agent

Agent

Agent

Agent Agent

Agent

Agent

Agent Agent

8/26/2012 Tilmann Rabl - VLDB 2012

Page 6: Solving Big Data Challenges for Enterprise Application Performance Management

APM in Numbers

Nodes in an enterprise

system

100 – 10K

Metrics per node

Up to 50K

Avg 10K

Reporting period

10 sec

Data size

100B / event

Raw data

100MB / sec

355GB / h

2.8 PB / y

Event rate at storage

> 1M / sec

8/24/2012 Tilmann Rabl - VLDB 2012

Page 7: Solving Big Data Challenges for Enterprise Application Performance Management

APM Benchmark

Based on Yahoo! Cloud Serving Benchmark (YCSB)

CRUD operations

Single table

25 byte key

5 values (10 byte each)

75 byte / record

Five workloads

50 rows scan length

8/24/2012 Tilmann Rabl - VLDB 2012

Page 8: Solving Big Data Challenges for Enterprise Application Performance Management

Benchmarked Systems 6 systems

Cassandra

HBase

Project Voldemort

Redis

VoltDB

MySQL

Categories Key-value stores

Extensible record stores

Scalable relational stores

Sharded systems

Main memory systems

Disk based systems

Chosen by Previous results, popularity, maturity, availability

8/26/2012 Tilmann Rabl - VLDB 2012

Page 9: Solving Big Data Challenges for Enterprise Application Performance Management

Classification of Benchmarked

Systems

8/26/2012 Tilmann Rabl - VLDB 2012

Key-value stores Extensible Row

Stores

Scalable Relational

Stores

In-Memory

Stores

Disk-Based

Stores

Sharded

Distributed

Page 10: Solving Big Data Challenges for Enterprise Application Performance Management

Experimental Testbed

Two clusters

Cluster M (memory-bound)

16 nodes (plus master node)

2x quad core CPU,16 GB RAM, 2x 74GB HDD (RAID 0)

10 million records per node (~700 MB raw)

128 connections per node (8 per core)

Cluster D (disk-bound)

24 nodes

2x dual core CPU, 4 GB RAM, 74 GB HDD

150 million records on 12 nodes (~10.5 GB raw)

8 connections per node

8/24/2012 Tilmann Rabl - VLDB 2012

Page 11: Solving Big Data Challenges for Enterprise Application Performance Management

Evaluation

Minimum 3 runs per workload and system

Fresh install for each run

10 minutes runtime

Up to 5 clients for 12 nodes

To make sure YCSB is no bottleneck

Maximum achievable throughput

8/24/2012 Tilmann Rabl - VLDB 2012

Page 12: Solving Big Data Challenges for Enterprise Application Performance Management

Workload W - Throughput

Cassandra dominates

Higher throughput for Cassandra and HBase

Lower throughput for other systems

Scalability not as good for all web stores

VoltDB best single node throughput

8/24/2012 Tilmann Rabl - VLDB 2012

Page 13: Solving Big Data Challenges for Enterprise Application Performance Management

Workload W – Latency Writes

Same latencies for

Cassandra, Voldemort, VoltDB,Redis, MySQL

HBase latency increased

8/24/2012 Tilmann Rabl - VLDB 2012

Page 14: Solving Big Data Challenges for Enterprise Application Performance Management

Workload W – Latency Reads

Same latency as for R for

Cassandra, Voldemort, Redis, VoltDB, HBase

HBase latency in second range

8/24/2012 Tilmann Rabl - VLDB 2012

Page 15: Solving Big Data Challenges for Enterprise Application Performance Management

95% reads, 5% writes

On a single node, main memory systems have best performance

Linear scalability for web data stores

Sublinear scalability for sharded systems

Slow-down for VoltDB

Workload R - Throughput

8/24/2012 Tilmann Rabl - VLDB 2012

Page 16: Solving Big Data Challenges for Enterprise Application Performance Management

Workload RW - Throughput

50% reads, 50% writes

VoltDB highest single node throughput

HBase and Cassandra throughput increase

MySQL and Voldemort throughput reduction

8/24/2012 Tilmann Rabl - VLDB 2012

Page 17: Solving Big Data Challenges for Enterprise Application Performance Management

Workload RS – Latency Scans

HBase latency equal to reads in Workload R

MySQL latency very high due to full table scans

Similar but increased latency for

Cassandra, Redis, VoltDB

8/24/2012 Tilmann Rabl - VLDB 2012

Page 18: Solving Big Data Challenges for Enterprise Application Performance Management

Workload RWS - Throughput

25% reads, 25% scans, 50% writes

Cassandra, HBase, Redis, VoltDB gain performance

MySQL performance 20 – 4 ops / sec

8/24/2012 Tilmann Rabl - VLDB 2012

Page 19: Solving Big Data Challenges for Enterprise Application Performance Management

Bounded Throughput –

Write Latency

Workload R, normalized

Maximum throughput in previous tests 100%

Steadily decreasing for most systems

HBase not as stable (but below 0.1 ms)

8/24/2012 Tilmann Rabl - VLDB 2012

Page 20: Solving Big Data Challenges for Enterprise Application Performance Management

Bounded Throughput –

Read Latency

Redis and Voldemort slightly decrease

HBase two states

Cassandra decreases linearly

MySQL decreases and then stays constant

8/24/2012 Tilmann Rabl - VLDB 2012

Page 21: Solving Big Data Challenges for Enterprise Application Performance Management

Disk Usage

Raw data: 75 byte per record, 0.7GB/10M

Cassandra: 2.5GB / 10M

HBase: 7.5GB / 10M

No compression

8/24/2012 Tilmann Rabl - VLDB 2012

Page 22: Solving Big Data Challenges for Enterprise Application Performance Management

Cluster D Results – Throughput

Disk bound, 150M records on 8 nodes

More writes – more throughput

Cassandra: 26x

Hbase: 15x

Voldemort 3x

8/24/2012 Tilmann Rabl - VLDB 2012

Page 23: Solving Big Data Challenges for Enterprise Application Performance Management

Cluster D Results – Latency Writes

Low latency for writes for all systems

Relatively stable latency for writes

Lowest for HBase

Equal for Voldemort and Cassandra

8/24/2012 Tilmann Rabl - VLDB 2012

Page 24: Solving Big Data Challenges for Enterprise Application Performance Management

Cluster D Results – Latency Reads

Cassandra latency decreases with more writes

Voldemort latency low 5-20ms

HBase latency high (up to 200 ms)

Cassandra in between

8/24/2012 Tilmann Rabl - VLDB 2012

Page 25: Solving Big Data Challenges for Enterprise Application Performance Management

Lessons Learned

YCSB Client to host ratio 1:3 better 1:2

Cassandra Optimal tokens for data distribution necessary

HBase Difficult setup, special JVM settings necessary

Redis Jedis distribution uneven

Voldemort Higher number of connections might lead to better results

MySQL Eager scan evaluation

VoltDB Synchronous communication slow on multiple nodes

8/24/2012 Tilmann Rabl - VLDB 2012

Page 26: Solving Big Data Challenges for Enterprise Application Performance Management

Conclusions I

Cassandra

Winner in terms of scalability and throughput

HBase

Low write latency, high read latency, low throughput

Sharded MySQL

Scales well, latency decreases with higher scales

Voldemort

Read and write latency stable at low level

Redis

Standalone has high throughput, sharded version does not scale well

VoltDB

High single node throughput, does not scale for synchronous access

8/24/2012 Tilmann Rabl - VLDB 2012

Page 27: Solving Big Data Challenges for Enterprise Application Performance Management

Conclusions II

8/26/2012 Tilmann Rabl - VLDB 2012

Cassandra’s performance close to APM requirements

Additional improvements needed for reliably sustaining APM

workload

Future work

Benchmark impact of replication, compression

Monitor the benchmark runs using APM tool

Page 28: Solving Big Data Challenges for Enterprise Application Performance Management

Thanks!

Questions?

Contact: Tilmann Rabl

[email protected]

8/24/2012 Tilmann Rabl - VLDB 2012