Solving Big Data Challenges for Enterprise Application Performance Management

This presentation was held at the 38th International Conference on Very Large Data Bases (VLDB), 2012. Full paper and additional information available at:

Abstract: As the complexity of enterprise systems increases, so does the need to monitor and analyze them. A number of companies have built sophisticated monitoring tools that go far beyond simple resource-utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This level of detail enables more precise forms of analysis and prediction, but comes at the price of high data rates (i.e., big data). To maximize the benefit of monitoring, the data has to be stored for an extended period of time for later analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems: the monitoring data has to be stored in a system that can sustain the high data rates while providing an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on scalability and the high data rates predominant in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in online advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and experience with the setup and configuration complexity of these data stores in an industrial setting.


1. Middleware Systems Research Group (MSRG.ORG)
   Solving Big Data Challenges for Enterprise Application Performance Management
   Tilmann Rabl, Mohammad Sadoghi, Hans-Arno Jacobsen (University of Toronto); Sergio Gomez-Villamor (Universitat Politecnica de Catalunya); Victor Muntes-Mulero (CA Labs Europe), Serge Mankowskii (CA Labs)

2. Agenda
   • Application Performance Management
   • APM Benchmark
   • Benchmarked Systems
   • Test Results
   Tilmann Rabl - VLDB 2012, 8/26/2012

3. Motivation

4. Enterprise System Architecture
   [Figure: an enterprise landscape of clients, web servers, application servers, message queues and brokers, web services, databases, SAP, an identity manager, a mainframe, and 3rd-party systems, each instrumented with an agent]

5. Application Performance Management
   [Figure: the same architecture, with the agents reporting measurements to the APM system]

6. APM in Numbers
   • Nodes in an enterprise system: up to 50K, avg. 10K
   • Metrics per node: 100 - 10K
   • Reporting period: 10 sec
   • Raw data: 100 B / event
   • Event rate at storage: > 1M / sec
   • Data size: 100 MB / sec, 355 GB / h, 2.8 PB / y

7. APM Benchmark
   • Based on the Yahoo! Cloud Serving Benchmark (YCSB)
   • Single table, CRUD operations
   • 25-byte key, 5 values (10 bytes each): 75 bytes / record
   • Five workloads
   • Scan length: 50 rows

8. Benchmarked Systems
   • 6 systems: Cassandra, HBase, Project Voldemort, Redis, VoltDB, MySQL
   • Categories: key-value stores, extensible record stores, scalable relational stores; sharded systems; main-memory and disk-based systems
   • Chosen by previous results, popularity, maturity, availability

9. Classification of Benchmarked Systems
   [Figure: matrix placing the six systems by storage type (disk-based, distributed in-memory, sharded) and data model (key-value stores, extensible record stores, scalable relational stores)]
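The data rates quoted on the "APM in Numbers" slide can be sanity-checked with a few lines of arithmetic. A minimal sketch; the unit bases (decimal MB for the per-second rate, binary GiB/PiB for the cumulative figures) are my assumption, chosen because they bring the yearly figure out near the slide's 2.8 PB:

```python
# Back-of-the-envelope check of the APM data rates: more than 1M
# events/sec arriving at the store, at roughly 100 bytes per event.
event_size_b = 100           # ~100 B of raw data per event
event_rate = 1_000_000       # > 1M events / sec at the storage layer

bytes_per_sec = event_size_b * event_rate
mb_per_sec = bytes_per_sec / 10**6                       # 100 MB/s
gib_per_hour = bytes_per_sec * 3600 / 1024**3            # ~335 GiB/h
pib_per_year = bytes_per_sec * 86400 * 365 / 1024**5     # ~2.8 PiB/y

print(f"{mb_per_sec:.0f} MB/s, {gib_per_hour:.0f} GiB/h, "
      f"{pib_per_year:.1f} PiB/y")
```

A year of uncompressed monitoring data at this rate lands in the petabyte range, which is exactly the "big data" challenge the talk addresses; the slide's 355 GB/h presumably uses a slightly different unit convention.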
10. Experimental Testbed
   • Two clusters
   • Cluster M (memory-bound): 16 nodes (plus a master node); 2x quad-core CPU, 16 GB RAM, 2x 74 GB HDD (RAID 0); 10 million records per node (~700 MB raw); 128 connections per node (8 per core)
   • Cluster D (disk-bound): 24 nodes; 2x dual-core CPU, 4 GB RAM, 74 GB HDD; 150 million records on 12 nodes (~10.5 GB raw); 8 connections per node

11. Evaluation
   • Minimum of 3 runs per workload and system
   • Fresh install for each run
   • 10 minutes runtime
   • Up to 5 clients for 12 nodes, to make sure YCSB is not the bottleneck
   • Maximum achievable throughput measured

12. Workload W - Throughput
   • Cassandra dominates: higher throughput for Cassandra and HBase, lower for the other systems
   • Scalability not equally good for all web stores
   • VoltDB: best single-node throughput

13. Workload W - Latency (Writes)
   • Same latencies for Cassandra, Voldemort, VoltDB, Redis, MySQL
   • HBase latency increased

14. Workload W - Latency (Reads)
   • Same latency as in Workload R for Cassandra, Voldemort, Redis, VoltDB
   • HBase latency in the second range

15. Workload R - Throughput
   • 95% reads, 5% writes
   • On a single node, the main-memory systems perform best
   • Linear scalability for the web data stores, sublinear scalability for the sharded systems
   • Slow-down for VoltDB

16. Workload RW - Throughput
   • 50% reads, 50% writes
   • VoltDB: highest single-node throughput
   • HBase and Cassandra: throughput increase
   • MySQL and Voldemort: throughput reduction

17. Workload RS - Latency (Scans)
   • HBase latency equal to reads in Workload R
   • MySQL latency very high due to full table scans
   • Similar but increased latency for Cassandra, Redis, VoltDB

18. Workload RWS - Throughput
   • 25% reads, 25% scans, 50% writes
   • Cassandra, HBase, Redis, VoltDB gain performance
   • MySQL performance 20 4 ops / sec
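The benchmark's record layout (25-byte key plus five 10-byte values, 75 bytes per record) and the per-node data volume loaded on Cluster M can be sketched as follows. The helper name and padding scheme are my own illustration, not YCSB's API:

```python
import os

KEY_LEN = 25      # 25-byte key
FIELD_LEN = 10    # five values of 10 bytes each
NUM_FIELDS = 5

def make_record(key: bytes) -> dict:
    """Build one APM-benchmark-style record: a zero-padded key and
    five 10-byte values (random bytes stand in for metric data)."""
    assert len(key) <= KEY_LEN
    return {
        "key": key.ljust(KEY_LEN, b"0"),
        "values": [os.urandom(FIELD_LEN) for _ in range(NUM_FIELDS)],
    }

rec = make_record(b"user1000")
record_size = len(rec["key"]) + sum(len(v) for v in rec["values"])
print(record_size)  # 75 bytes per record, as on the benchmark slide

# Cluster M loads 10 million such records per node:
raw_gb = 10_000_000 * record_size / 1024**3
print(f"~{raw_gb:.1f} GB raw per node")  # matches the ~700 MB on the slide
```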
19. Bounded Throughput - Write Latency
   • Workload R, normalized; maximum throughput from the previous tests = 100%
   • Steadily decreasing for most systems
   • HBase not as stable (but below 0.1 ms)

20. Bounded Throughput - Read Latency
   • Redis and Voldemort decrease slightly
   • HBase shows two states
   • Cassandra decreases linearly
   • MySQL decreases, then stays constant

21. Disk Usage
   • Raw data: 75 bytes per record, 0.7 GB / 10M records
   • Cassandra: 2.5 GB / 10M records
   • HBase: 7.5 GB / 10M records
   • No compression

22. Cluster D Results - Throughput
   • Disk-bound, 150M records on 8 nodes
   • More writes, more throughput: Cassandra 26x, HBase 15x, Voldemort 3x

23. Cluster D Results - Latency (Writes)
   • Low, relatively stable write latency for all systems
   • Lowest for HBase; equal for Voldemort and Cassandra

24. Cluster D Results - Latency (Reads)
   • Voldemort latency low (5-20 ms)
   • HBase latency high (up to 200 ms)
   • Cassandra in between; its latency decreases with more writes

25. Lessons Learned
   • YCSB: a higher number of connections might lead to better results
   • Cassandra: optimal tokens necessary for even data distribution
   • Redis: Jedis key distribution uneven
   • Voldemort: difficult setup; special JVM settings necessary
   • HBase: client-to-host ratio of 1:3 better than 1:2; eager scan evaluation
   • VoltDB: synchronous communication slow across multiple nodes
   • MySQL

26. Conclusions I
   • Cassandra: winner in terms of scalability and throughput
   • HBase: low write latency, high read latency, low throughput
   • Redis: scales well; latency decreases at higher scales
   • Voldemort: read and write latency stable at a low level
   • Sharded MySQL: standalone has high throughput, but the sharded version does not scale well
   • VoltDB: high single-node throughput; does not scale for synchronous access
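The disk-usage slide implies sizable space amplification over the raw record data. A quick calculation of the overhead factors, using only the figures given on the slide (no compression in either system):

```python
# Space amplification per 10M records, relative to raw record data.
raw_gb = 0.7         # 75 B/record * 10M records
cassandra_gb = 2.5   # Cassandra on-disk footprint for the same data
hbase_gb = 7.5       # HBase on-disk footprint for the same data

for name, used in [("Cassandra", cassandra_gb), ("HBase", hbase_gb)]:
    print(f"{name}: {used / raw_gb:.1f}x the raw data size")
```

HBase ends up storing roughly ten times the raw volume; a large part of that is per-cell metadata, since HBase's storage format repeats the row key, column qualifier, and timestamp alongside every stored value.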
27. Conclusions II
   • Cassandra's performance comes close to the APM requirements
   • Additional improvements are needed to reliably sustain the APM workload
   • Future work: benchmark the impact of replication and compression; monitor the benchmark runs using an APM tool

28. Thanks! Questions?
   Contact: Tilmann Rabl, tilmann.rabl@utoronto.ca