And other Tips & Tricks to make you a “Performance Expert”
More on http://blog.dynatrace.com
Andreas Grabner - @grabnerandi
Java One 2015 – Deep Dive Top Performance Mistakes
Safe Harbor
15 Years: That’s why I ended up talking about performance
Where do your Stories come from?
#1: Real Life & Real User Stories
#2: http://bit.ly/onlineperfclinic
#3: http://bit.ly/sharepurepath
20% / 80%
Frontend Performance: We are getting FATer!
Example of a “Bad” Web Deployment
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main domain
Very long Connect time (1.8s) to your CDN
Mobile landing page of Super Bowl ad
434 Resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …
Total size of ~ 20MB
Fifa.com during Worldcup
Source: http://apmblog.compuware.com/2014/05/21/is-the-fifa-world-cup-website-ready-for-the-tournament/
8MB of background image for STPCon (WordPress)
Make F12 or Browser Agent your friend!
Compare yourself Online!
Key Metrics
# of Resources
Size of Resources
Total Size of Content
• Browser Built-In Developer Tools
• Extensions such as YSlow, PageSpeed
• Online Tools
  • WebPageTest
  • Google PageSpeed Insights
  • Dynatrace Performance Center
  • ...
• Automate!! With Selenium, WebDriver, Cucumber, ...
Tooling
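The three key frontend metrics (# of resources, size of resources, total size of content) are easy to compute once you have the list of requests a page makes, for example exported from the browser's developer tools or captured during an automated WebDriver run. A minimal sketch, assuming a hypothetical `Resource` record filled from such an export; the budget values in `main` are illustrative only, not recommendations from the talk.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record for one request of a page load (e.g. taken from a browser export).
record Resource(String url, String contentType, long bytes) {}

final class PageWeight {
    // Number of resources per content type, sorted by type name.
    static Map<String, Integer> countByType(List<Resource> resources) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Resource r : resources) {
            counts.merge(r.contentType(), 1, Integer::sum);
        }
        return counts;
    }

    // Total size of all content transferred for the page.
    static long totalBytes(List<Resource> resources) {
        long total = 0;
        for (Resource r : resources) {
            total += r.bytes();
        }
        return total;
    }

    // Simple budget check an automated test could fail on.
    static boolean withinBudget(List<Resource> resources, int maxResources, long maxBytes) {
        return resources.size() <= maxResources && totalBytes(resources) <= maxBytes;
    }

    public static void main(String[] args) {
        List<Resource> page = List.of(
                new Resource("https://example.com/", "text/html", 40_000),
                new Resource("https://example.com/hero.jpg", "image/jpeg", 2_500_000),
                new Resource("https://example.com/logo.png", "image/png", 30_000));
        System.out.println(countByType(page));                    // {image/jpeg=1, image/png=1, text/html=1}
        System.out.println(totalBytes(page));                     // 2570000
        System.out.println(withinBudget(page, 100, 1_000_000));   // false
    }
}
```

Running a check like this on every build turns the “We are getting FATer!” trend into a failing test instead of a surprise in production.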
Frontend Availability: Back to Basics Please!
Online Services for you: Is it down right now?
Online Services for you: Outage Analyzer
Tip for handling Spike Load: GO LEAN!!
Response time improved 4x
1h before SuperBowl KickOff
1h after Game ended
Key Metrics
HTTP 3xx, 4xx, 5xx
# of Domains
• Dynatrace Synthetic
• Ruxit Synthetic
• NewRelic Synthetic
• AppDynamics
• Pingdom
• ... Just Google for “Synthetic Monitoring”
Online Services
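A synthetic monitoring service essentially requests your pages on a schedule and records a status code per request. Given such results, the two availability metrics above can be computed as sketched below. The `CheckResult` record is a hypothetical stand-in for whatever your monitoring tool exports; counting distinct host names approximates “# of Domains”, each of which is a potential point of failure for the page.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical result of one request made by a synthetic check.
record CheckResult(String url, int status) {}

final class AvailabilityMetrics {
    // Counts responses in a status class, e.g. statusClass = 4 counts 400..499.
    static long countStatusClass(List<CheckResult> results, int statusClass) {
        return results.stream().filter(r -> r.status() / 100 == statusClass).count();
    }

    // Number of distinct host names the page depends on.
    static int distinctDomains(List<CheckResult> results) {
        Set<String> hosts = new HashSet<>();
        for (CheckResult r : results) {
            String host = URI.create(r.url()).getHost();
            if (host != null) {
                hosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
        return hosts.size();
    }

    public static void main(String[] args) {
        List<CheckResult> run = List.of(
                new CheckResult("https://www.example.com/", 200),
                new CheckResult("https://cdn.example.net/app.js", 404),
                new CheckResult("https://ads.example.org/tag.js", 503),
                new CheckResult("https://www.example.com/old", 301));
        System.out.println("3xx=" + countStatusClass(run, 3)
                + " 4xx=" + countStatusClass(run, 4)
                + " 5xx=" + countStatusClass(run, 5)
                + " domains=" + distinctDomains(run));
        // prints: 3xx=1 4xx=1 5xx=1 domains=3
    }
}
```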
Backend Performance: The Usual Suspects
• Symptoms
  • HTML takes between 60 and 120s to render
  • High GC Time
• Developer Assumptions
  • Bad GC Tuning
  • Probably bad Database Performance as rendering was simple
• Result: 2 Years of Finger pointing between Dev and DBA
Project: Online Room Reservation System
Developers built their own monitoring:

```java
void roomreservationReport(int officeId) {
    long startTime = System.currentTimeMillis();
    Object data = loadDataForOffice(officeId);
    // Only the total wall-clock time of the load; nothing about what happens inside it
    long dataLoadTime = System.currentTimeMillis() - startTime;
    generateReport(data, officeId);
}
```
Result: Avg. Data Load Time: 45s!
DB Tool says: Avg. SQL Query: <1ms!
#1: Loading too much data: 24889! Calls to the Database API!
High CPU and High Memory Usage to keep all data in Memory
#2: On individual connections: 12444! individual connections
Classical N+1 Query Problem
Individual SQL really <1ms
#3: Putting all data in temp Hashtable
Lots of time spent in Hashtable.get
Called from their Entity Objects
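The N+1 pattern found here is easy to reproduce in isolation. The sketch below uses a hypothetical in-memory `CountingDb` instead of real JDBC so it runs anywhere; the SQL in the comments only indicates what each method would stand for. Loading N rooms one by one costs N+1 round trips, while loading them with one `IN (...)` query costs 2 regardless of N. That is how every individual statement can be <1ms while the whole data load still takes 45s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the rooms of one office in a database; counts every query it receives.
final class CountingDb {
    private final Map<Integer, String> rooms; // roomId -> room name
    int queries = 0;

    CountingDb(Map<Integer, String> rooms) {
        this.rooms = rooms;
    }

    // Stands for: SELECT id FROM room WHERE office_id = ?
    List<Integer> roomIdsForOffice() {
        queries++;
        return new ArrayList<>(rooms.keySet());
    }

    // Stands for: SELECT name FROM room WHERE id = ?   (one row per query)
    String roomName(int id) {
        queries++;
        return rooms.get(id);
    }

    // Stands for: SELECT id, name FROM room WHERE id IN (...)   (all rows in one query)
    Map<Integer, String> roomNames(List<Integer> ids) {
        queries++;
        return ids.stream().collect(Collectors.toMap(id -> id, rooms::get));
    }
}

final class NPlusOneDemo {
    // N+1: one query for the ids, then one query per id.
    static List<String> loadNPlusOne(CountingDb db) {
        List<String> names = new ArrayList<>();
        for (int id : db.roomIdsForOffice()) {
            names.add(db.roomName(id));
        }
        return names;
    }

    // Batched: one query for the ids, one query for all rows.
    static List<String> loadBatched(CountingDb db) {
        List<Integer> ids = db.roomIdsForOffice();
        Map<Integer, String> byId = db.roomNames(ids);
        List<String> names = new ArrayList<>();
        for (int id : ids) {
            names.add(byId.get(id));
        }
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> rooms = Map.of(1, "A", 2, "B", 3, "C");
        CountingDb slow = new CountingDb(rooms);
        loadNPlusOne(slow);
        CountingDb fast = new CountingDb(rooms);
        loadBatched(fast);
        System.out.println(slow.queries + " vs " + fast.queries); // 4 vs 2
    }
}
```

With an ORM the same effect is usually achieved through fetch joins or batch fetching rather than hand-written `IN` lists; the sketch only shows why the round-trip count matters.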
• … you know what the code you inherited is doing!!
• … you are not making mistakes like this
• Explore the Right Tools
• Built-In Database Analysis Tools
• “Logging” options of Frameworks such as Hibernate, …
• JMX, Perf Counters, … of your Application Servers
• Performance Tracing Tools: Dynatrace, Ruxit, NewRelic, AppDynamics, Your Profiler of Choice …
Lessons Learned – Don’t Assume …
Key Metrics
# of SQL Calls
# of same SQL Execs (1+N)
# of Connections
Rows/Data Transferred
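Most of these metrics can be derived from the list of SQL statements executed per request, whether that list comes from a tracing tool or from framework logging. A minimal sketch for “# of same SQL Execs”, assuming the statements were captured as parameterized SQL text (with `?` placeholders) so that repeated executions of the same query compare equal; the threshold is an arbitrary illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SqlMetrics {
    // How often each distinct SQL text was executed within one request/transaction.
    static Map<String, Integer> executionsPerStatement(List<String> executedSql) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : executedSql) {
            counts.merge(sql.trim(), 1, Integer::sum);
        }
        return counts;
    }

    // Statements executed at least 'threshold' times: a typical hint for the 1+N pattern.
    static List<String> suspectedNPlusOne(List<String> executedSql, int threshold) {
        return executionsPerStatement(executedSql).entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "SELECT id FROM room WHERE office_id = ?",
                "SELECT name FROM room WHERE id = ?",
                "SELECT name FROM room WHERE id = ?",
                "SELECT name FROM room WHERE id = ?");
        System.out.println(executionsPerStatement(trace));
        System.out.println(suspectedNPlusOne(trace, 3)); // [SELECT name FROM room WHERE id = ?]
    }
}
```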
Backend Performance: Architectural Mistakes with “Migrating” to (Micro)Services
26.7s Execution Time
33! Calls to the same Web Service
171! SQL Queries through LINQ by this Web Service – request similar data for each call
Architecture Violation: Direct access to the DB from frontend logic instead of going through the service
21671! Calls to Oracle
3136! Calls to H2 mostly executed on async background threads
33! Different connections used
DB Exceptions on both Databases
40! internal Web Service Calls that do all these DB Updates
Key Metrics
# of Service Calls
Payload of Service Calls
# of Involved Threads
1+N Service Call Pattern!
• Dynatrace
• Ruxit
• NewRelic
• AppDynamics
• Any Profiler that can trace across tiers
• Google for Tracing or APM (Application Performance Management)
Tooling
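One common mitigation for the 1+N service call pattern (not necessarily the fix applied in this project) is to make sure identical lookups within one request reach the remote service only once. A minimal sketch of a request-scoped cache; the service function and key type are placeholders, and the class is not thread-safe, so it assumes one instance per request handled on one thread. Batching the ids into a single service call, as with the N+1 database pattern, is the other usual option.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Wraps a (hypothetical) remote service lookup so that identical keys
// within one user request are sent over the wire only once.
final class RequestScopedCache<K, V> {
    private final Function<K, V> remoteCall;
    private final Map<K, V> cache = new HashMap<>();
    private int remoteCalls = 0;

    RequestScopedCache(Function<K, V> remoteCall) {
        this.remoteCall = remoteCall;
    }

    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            remoteCalls++;               // only counted on a cache miss
            return remoteCall.apply(k);
        });
    }

    int remoteCalls() {
        return remoteCalls;
    }

    public static void main(String[] args) {
        // Hypothetical service: returns a price for a product id.
        RequestScopedCache<Integer, String> prices =
                new RequestScopedCache<>(id -> "price-of-" + id);
        for (int i = 0; i < 33; i++) {
            prices.get(i % 3);           // 33 lookups, but only 3 distinct ids
        }
        System.out.println(prices.remoteCalls()); // 3
    }
}
```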
Logging: WE CAN LOG THIS!!
Log Hotspots in Frameworks!
callAppenders: a clear CPU and I/O Hotspot
Excessive logging through Spring Framework
Debug Log and outdated log4j library
#1: Top Problem: log4j.callAppenders -> 71% Sync Time
#2: Most of logging done from fillDetail method
#3: Doing “DEBUG” log output: Is this necessary?
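Two cheap fixes apply to the DEBUG output above: keep it disabled in production, and make disabled log statements cost (almost) nothing. The sketch uses `java.util.logging` from the JDK so it is self-contained; log4j and SLF4J offer the same idioms (`isDebugEnabled()` guards, parameterized or lazy messages). `fillDetail` and `expensiveDetail` are hypothetical stand-ins named after the slide, not the project's actual code. Note that this only avoids building messages: if DEBUG stays enabled, every entry still goes through the appenders, which in log4j 1.x means the synchronized `callAppenders` path shown above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyDebugLogging {
    static final Logger LOG = Logger.getLogger(LazyDebugLogging.class.getName());
    static int detailBuilds = 0;

    // Stand-in for an expensive toString()/serialization done only for log output.
    static String expensiveDetail(int id) {
        detailBuilds++;
        return "detail for " + id;
    }

    static void fillDetail(int id) {
        // Eager (avoid): the message is built even when FINE is disabled.
        // LOG.fine("filling " + expensiveDetail(id));

        // Lazy: the Supplier is only invoked when FINE is enabled for this logger.
        LOG.fine(() -> "filling " + expensiveDetail(id));

        // Equivalent explicit guard (the style used with log4j's isDebugEnabled()).
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("filling (guarded) " + expensiveDetail(id));
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);         // production setting: no FINE/debug output
        for (int i = 0; i < 1000; i++) {
            fillDetail(i);
        }
        System.out.println(detailBuilds); // 0
    }
}
```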
Key Metrics
# of Log Entries
Size of Logs per Use Case
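Tracking these two metrics requires counts per use case rather than one big log file. One low-tech option, sketched with `java.util.logging` and assuming each use case logs through its own logger (the logger name `usecase.search` is made up), is a handler that only counts entries and the characters they would produce; characters are used as an approximation of bytes.

```java
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Counts log entries and the size of their formatted output instead of writing them anywhere.
final class LogVolumeHandler extends Handler {
    long entries = 0;
    long chars = 0;

    LogVolumeHandler() {
        setFormatter(new SimpleFormatter());
    }

    @Override
    public synchronized void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;                      // respects this handler's level and filter
        }
        entries++;
        chars += getFormatter().format(record).length();
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    public static void main(String[] args) {
        Logger log = Logger.getLogger("usecase.search");
        log.setUseParentHandlers(false); // count only, no console output
        LogVolumeHandler volume = new LogVolumeHandler();
        log.addHandler(volume);
        for (int i = 0; i < 5; i++) {
            log.info("searching " + i);
        }
        System.out.println(volume.entries + " entries, ~" + volume.chars + " chars");
    }
}
```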
Response Time is not the only Performance Indicator: Look at Resources as well
Is this a successful new Build?
Look at Resource Usage: CPU, Memory, …
Memory? Look at Heap Generations
Root Cause: Dependency Injection
Prevent: Monitor Memory Metrics for every Build
#3: Growing “Old Gen” is a good indicator for a Mem Leak
#4: Heavy GC kicks in when Old Generation is full!
#5: Throughput of Application goes to 0 due to no memory available
#1: Eden Space stays constant. Objects are being propagated to Survivor Space
#2: GC Activity in Young Generation ultimately moves objects into Old Generation
Key Metrics
# of Objects per Generation
# of GC Runs
Total Impact of GC
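The generation usage and GC runs shown in these charts are available inside the JVM through the standard `java.lang.management` API, the same data monitoring tools read over JMX. A minimal sketch that takes a snapshot; recording it at the end of each build's test run and comparing it with the previous build is one possible way to follow the “monitor memory metrics for every build” advice. Memory pool names depend on the garbage collector in use, so the code prints all pools instead of assuming specific names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class GcSnapshot {
    // Total collections across all collectors; a value of -1 means "not available" and is skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated GC time in milliseconds (-1 values skipped as above).
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d runs, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        // Pool names vary per collector, e.g. "G1 Eden Space", "G1 Old Gen", "PS Old Gen".
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) {
                System.out.printf("%s: used=%d KB%n", pool.getName(), usage.getUsed() / 1024);
            }
        }
        System.out.println("total GC runs=" + totalGcCount() + ", total GC time=" + totalGcTimeMillis() + " ms");
    }
}
```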
Tips & Tricks: And more Metrics of course
Tip: Layer Breakdown over Time
With increasing load: Which LAYER doesn’t SCALE?
Tip: Exceptions and Log Messages
How are # of EXCEPTIONS evolving over time?
How many SEVERE LOG messages do we write in relation to Exceptions?
Tip: Failed Transactions
Are more TRANSACTIONS FAILING (HTTP 5xx, 4xx, …) under heavier load?
Tip: Database Activity
Do we see an increase in the AVG # of SQL Executions over Time?
Does the TOTAL # of SQL Executions increase with load? Shouldn’t it flatten due to CACHES?
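The question above assumes a cache in front of the database: once the working set is cached, more load should mean more cache hits, not more SQL. A minimal sketch of that effect with a bounded LRU cache built on `LinkedHashMap` (access order plus `removeEldestEntry`); the query is simulated, and `CachedLookup` is a hypothetical name. Real applications would more likely rely on their framework's second-level cache or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small LRU cache on top of LinkedHashMap in access order.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);         // accessOrder = true: get() moves an entry to the end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;       // evict the least recently used entry once over capacity
    }
}

final class CachedLookup {
    private final LruCache<Integer, String> cache;
    int sqlExecutions = 0;

    CachedLookup(int capacity) {
        this.cache = new LruCache<>(capacity);
    }

    String find(int id) {
        String row = cache.get(id);
        if (row == null) {
            sqlExecutions++;            // cache miss: this is where the SQL would run
            row = "row-" + id;          // simulated query result
            cache.put(id, row);
        }
        return row;
    }

    public static void main(String[] args) {
        CachedLookup lookup = new CachedLookup(100);
        for (int request = 0; request < 1000; request++) {
            lookup.find(request % 10);  // 1000 requests, 10 distinct rows
        }
        System.out.println(lookup.sqlExecutions); // 10
    }
}
```

If the total number of SQL executions keeps growing linearly with load in such a setup, the cache is either too small for the working set or being bypassed.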
Tip: Database History Dashboard
How many SQL Statements are PREPARED?
What’s the overall Execution Time of different SQL Types (SELECT, INSERT, DELETE, …)
Tip: DB Connection Pool Utilization
Do we have enough DB CONNECTIONS per pool?
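Pool utilization is worth tracking as peak usage and as the number of times a caller had to wait, because averages hide exhaustion. Connection pools usually expose such numbers themselves (often via JMX); the sketch below only illustrates what to compute, using a hypothetical `PoolUtilization` wrapper around a `Semaphore` that hands out permits rather than real connections.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical utilization tracker; permits stand in for pooled connections.
final class PoolUtilization {
    private final int size;
    private final Semaphore permits;
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger peakInUse = new AtomicInteger();
    private final AtomicInteger waits = new AtomicInteger();

    PoolUtilization(int size) {
        this.size = size;
        this.permits = new Semaphore(size, true);
    }

    void acquire() throws InterruptedException {
        if (!permits.tryAcquire()) {
            waits.incrementAndGet();    // pool exhausted: the caller has to wait
            permits.acquire();
        }
        int now = inUse.incrementAndGet();
        peakInUse.accumulateAndGet(now, Math::max);
    }

    void release() {
        inUse.decrementAndGet();
        permits.release();
    }

    double peakUtilization() {
        return (double) peakInUse.get() / size;
    }

    int waits() {
        return waits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PoolUtilization pool = new PoolUtilization(4);
        pool.acquire();
        pool.acquire();
        pool.acquire();
        pool.release();
        pool.release();
        pool.release();
        System.out.println(pool.peakUtilization() + " peak, " + pool.waits() + " waits"); // 0.75 peak, 0 waits
    }
}
```

A peak close to 1.0 together with a growing wait count under load is the signal that the pool is too small (or connections are held too long).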
For more Key Metrics: http://blog.dynatrace.com
http://blog.ruxit.com
We want to get from here …
To here!
Use these application metrics as additional Quality Gates
What you currently measure
What you should measure
Quality Metrics in your CI
# Test Failures
Overall Duration
Execution Time per test
# calls to API
# executed SQL statements
# Web Service Calls
# JMS Messages
# Objects Allocated
# Exceptions
# Log Messages
# HTTP 4xx/5xx
Request/Response Size
Page Load/Rendering Time
…
Connecting your Tests with Quality
Build # | Test Case | Status | # SQL | # Excep | CPU
Build 17 | testPurchase | OK | 12 | 0 | 120ms
Build 17 | testSearch | OK | 3 | 1 | 68ms
Build 18 | testPurchase | FAILED | 12 | 5 | 60ms
Build 18 | testSearch | OK | 3 | 1 | 68ms
Build 19 | testPurchase | OK | 75 | 0 | 230ms
Build 19 | testSearch | OK | 3 | 1 | 68ms
Build 20 | testPurchase | OK | 12 | 0 | 120ms
Build 20 | testSearch | OK | 3 | 1 | 68ms
(Build #, Test Case, Status: Test Framework Results. # SQL, # Excep, CPU: Architectural Data)
We identified a regression
Problem solved
Exceptions probably reason for failed tests
Problem fixed but now we have an architectural regression
Now we have the functional and architectural confidence
Let’s look behind the scenes
#1: Analyzing each Test
#2: Metrics for each Test
#3: Detecting Regression based on Measure
Quality-Metrics based Build Status
Pull data into Jenkins, Bamboo ...
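Step #3 boils down to comparing each test's metrics with those of a baseline build. A minimal sketch, assuming the metrics have already been captured per test (how they are collected depends on your tracing tool) and using a simple relative tolerance, which is an illustrative choice rather than the rule used by any particular product. The numbers in `main` are the testPurchase/testSearch values of builds 17 and 19 from the table in this section.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Per-test architectural metrics: # SQL, # exceptions, CPU time.
record TestMetrics(int sqlCount, int exceptions, long cpuMillis) {}

final class MetricRegressionCheck {
    // Findings where the current build is worse than the baseline by more than 'tolerance'
    // (e.g. 0.2 = up to 20% above baseline is still accepted).
    static List<String> findRegressions(Map<String, TestMetrics> baseline,
                                        Map<String, TestMetrics> current,
                                        double tolerance) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, TestMetrics> entry : current.entrySet()) {
            TestMetrics base = baseline.get(entry.getKey());
            if (base == null) {
                continue;               // new test: no baseline to compare against yet
            }
            String test = entry.getKey();
            TestMetrics now = entry.getValue();
            if (now.sqlCount() > base.sqlCount() * (1 + tolerance)) {
                findings.add(test + ": # SQL " + base.sqlCount() + " -> " + now.sqlCount());
            }
            if (now.exceptions() > base.exceptions() * (1 + tolerance)) {
                findings.add(test + ": # exceptions " + base.exceptions() + " -> " + now.exceptions());
            }
            if (now.cpuMillis() > base.cpuMillis() * (1 + tolerance)) {
                findings.add(test + ": CPU " + base.cpuMillis() + "ms -> " + now.cpuMillis() + "ms");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        Map<String, TestMetrics> build17 = Map.of(
                "testPurchase", new TestMetrics(12, 0, 120),
                "testSearch", new TestMetrics(3, 1, 68));
        Map<String, TestMetrics> build19 = Map.of(
                "testPurchase", new TestMetrics(75, 0, 230),
                "testSearch", new TestMetrics(3, 1, 68));
        List<String> findings = findRegressions(build17, build19, 0.2);
        System.out.println(findings); // [testPurchase: # SQL 12 -> 75, testPurchase: CPU 120ms -> 230ms]
        if (!findings.isEmpty()) {
            System.exit(1);             // non-zero exit code lets the CI job mark the build as failed
        }
    }
}
```

The functional test for Build 19 passes, yet the check flags it, which is exactly the architectural regression the table shows.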
Making Quality a first-class citizen
“Too hard”
“we’ll get round to this later”
“not cool enough”
Questions and/or Demo
Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dttrial
YouTube Tutorials: bit.ly/dttutorials
Contact Me: [email protected]
Follow Me: @grabnerandi
Read More: blog.dynatrace.com
Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com