Institute of Software Technology
Reliable Software Systems
Application Performance Management
State of the Art and Challenges for the Future
André van Hoorn, Mario Mann, Dušan Okanović, Christoph Heger
Tutorial @ 8th ACM/SPEC International Conference on Performance Engineering,
April 22, 2017, L'Aquila, Italy
André van Hoorn: University of Stuttgart, Inst. of Softw. Techn., Reliable SW Systems
Mario Mann: NovaTec Consulting, Application Performance Management
Dušan Okanović: University of Stuttgart, Inst. of Softw. Techn., Reliable SW Systems
Christoph Heger: NovaTec Consulting, Application Performance Management
Who Are These Guys?
Performance Problems are Omnipresent
• An unexpected error occurred
• Temporarily not available
• … more capacity is on the way
• Try again later
• Please visit us again later
• We are experiencing heavy demand
4/22/2017Application Performance Management: State of the Art and Challenges for the Future
Influence of Poor Performance on the Success of Businesses
“Application performance management (APM), as a core IT operations discipline, aims to achieve an adequate level of performance during operations. To achieve this, APM comprises methods, techniques, and tools for
• continuously monitoring the state of an application system and its usage, as well as for
• detecting, diagnosing, and resolving performance-related problems using the monitored data.”
C. Heger, A. van Hoorn, D. Okanović, M. Mann:
Application performance management: State of the art and challenges for the future.
In: Proc. 8th ACM/SPEC ICPE, ACM (2017)
Agenda
• Introduction to APM (45 min)
• APM Tools (45+15 min)
• APM Advanced (60 min)
• Discussion (15 min)
André van Hoorn and Stefan Siegl.
Application Performance Management (APM). Continuous Monitoring of Application Performance.
(Poster in ObjektSpektrum magazine; in German)
Order for free: http://www.sigs-datacom.de/wissen/fachposter.html
• Agents collect data from all system levels
• On application level the agents are often technology-dependent
1. Collecting Data from All System Levels
Where? What? How?
Trace-based Metrics (Selection)
What?
Metric
Response Time
CPU Time
Method Name
Return Type
Logging Level
SQL Statement
Error Message
…
Okanović, D., van Hoorn, A., Heger, C., Wert, A., Siegl, S.:
Towards performance tooling interoperability: An open format for representing execution traces.
In: Proc. EPEW ’16. LNCS, Springer (2016)
Monitoring (Measurement-based Performance Evaluation)
[Figure: a client browser sends a request to the application server, which queries the database server; query result and response travel back. A data recorder collects the timing of each step for data analysis and visualization. Fictional timings: 0.2 s, 0.3 s, 0.1 s, 3.0 s, 0.1 s, 0.4 s, 0.5 s, 0.3 s; total 4.9 s, of which a single step accounts for 3.0 s.]
• Event-driven measurement
• Event: a change in system state
• e.g., incoming requests, access to HDD, operation execution, throwing exceptions
• Calculate performance metrics whenever an event occurs
• Simplest metric: counter
• Tracing (also event-driven)
• Recording a certain part of the system state when an event occurs
• Example: Code adjustments
• printf – easy, but with questionable maintainability
• Bytecode engineering
• Aspect-oriented Programming (AOP)
• Example: Sampling
• Measurement (state or counter) is performed in specified time intervals
• Through overhead and increased resource consumption, measurements
influence the system!
Measurement Approaches and Techniques (Examples)
How?
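The difference between event-driven measurement and sampling can be sketched in a few lines of Java. This is a minimal illustration, not code from any APM tool; the class `MeasurementSketch` and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: event-driven counting vs. sampling of a state value. */
class MeasurementSketch {
    // Event-driven: the simplest metric is a counter updated on every event.
    private final AtomicLong requestCount = new AtomicLong();
    // State that is only observed when sample() is called.
    private final AtomicLong activeRequests = new AtomicLong();
    private final List<Long> samples = new ArrayList<>();

    /** Called by instrumented code whenever a request arrives (the event). */
    void onRequestStart() {
        requestCount.incrementAndGet();
        activeRequests.incrementAndGet();
    }

    /** Called by instrumented code whenever a request completes. */
    void onRequestEnd() {
        activeRequests.decrementAndGet();
    }

    /** Called at fixed intervals, e.g. by a scheduler; records the current state. */
    void sample() {
        samples.add(activeRequests.get());
    }

    long totalRequests() {
        return requestCount.get();
    }

    List<Long> samples() {
        return samples;
    }
}
```

A request that starts and ends between two calls to `sample()` is counted by the event-driven counter but never shows up in the sampled series, which is the usual trade-off between accuracy and overhead.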
• Data is collected from the system…
• represented as time series…
• … and as detailed execution traces, and used to support problem analysis
2. Reconstructing Information from Data
Traces in an APM Tool
• High quantity of information has to be pre-processed
• It has proven useful to use different views to show the data
• Views are navigable and can be categorized by both scope and detail level
3. Visualization Through Navigable Views
Example: Application Topology Discovery and Visualization
© AppDynamics
Manual or automated conclusions and actions can be derived from the
information, e.g.,
• Problem detection and alerting
• E.g., increased response times and resource utilization
• Detection, for instance, based on thresholds and baselines
• Problem diagnosis and root cause isolation
• E.g., N+1 problem, too many remote calls, poor DB queries
• Detection based on monitoring information
• System refactoring and adaptation
• E.g., auto-scaling in cloud-based architectures
4. Interpreting and Using the Information
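Detection based on thresholds and baselines can be sketched as follows. This is a minimal example under stated assumptions: the class `ResponseTimeAlert` is hypothetical, and the baseline rule (mean plus k standard deviations of historical values) is one common choice, not necessarily what a given APM tool implements.

```java
import java.util.Arrays;

/** Hypothetical sketch of threshold- and baseline-based problem detection. */
class ResponseTimeAlert {
    /** Static threshold: alert if the observed value exceeds a fixed limit. */
    static boolean exceedsThreshold(double observedMs, double limitMs) {
        return observedMs > limitMs;
    }

    /**
     * Baseline: alert if the observed value lies more than k standard
     * deviations above the mean of historical measurements.
     */
    static boolean exceedsBaseline(double observedMs, double[] historyMs, double k) {
        double mean = Arrays.stream(historyMs).average().orElse(0.0);
        double variance = Arrays.stream(historyMs)
                .map(v -> (v - mean) * (v - mean))
                .average().orElse(0.0);
        return observedMs > mean + k * Math.sqrt(variance);
    }
}
```

A fixed threshold is easy to explain but has to be tuned per service; a baseline adapts to what is normal for the monitored system, at the cost of needing enough history.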
• http://kieker-monitoring.net
Dynamic Software Analysis and Application Performance Management
Application Performance Management
Part 2 – APM Tools
• Commercial APM tools
• Magic quadrant
• Timeline of APM tools
• Goals of APM tools
• Architecture
• EUM, Server, Database Monitoring
• How APM tools can help you to identify problems
• Open Source APM tools
Contents
Commercial APM Tools
Magic Quadrant
Source: Gartner (2016)
Timeline of Tools
[Figure: timeline of commercial tools (Wily – CA APM, DynaTrace, AppDynamics, Cisco) and open source tools (inspectIT, Kieker), with the year markers 1998, 2005, 2005, 2006, 2008, and 2017]
Goals of APM Tools
• End-User Perspective
• End-to-End Monitoring
• Smart Analytics
• Lifecycle by Design
@Dynatrace
Architecture – Dynatrace
[Figure: Dynatrace architecture. Labels: agents for Java, .NET, Mainframe, Native, …, Database, and Web Server / Node.js / NGINX / PHP; Agent/PurePath Collector Group; Monitoring Collector; Dynatrace Backend Server; Dynatrace Frontend Server; Performance Warehouse; Dynatrace Sessions; Dynatrace Client; Web UI Dashboards]
@Dynatrace
Sensors – How Do They Work?
• Every sensor (custom and OOTB) instruments Java/.NET methods
• Code gets added to each method that matches the sensor rule to
• measure execution time
• capture method arguments and return values
• count method invocations
• capture exceptions
Uninstrumented Application Instrumented Application
Execution Time
of Diagnosis Code
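What a sensor adds around a method can be illustrated with a JDK dynamic proxy. This is only a stand-in for the idea: the slide describes code being added to the instrumented methods themselves, whereas the sketch below wraps calls made through an interface. The class `TimingSensor` and its fields are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: a "sensor" that wraps calls made through an interface. */
class TimingSensor {
    final AtomicLong invocations = new AtomicLong();
    final AtomicLong exceptions = new AtomicLong();
    final AtomicLong totalNanos = new AtomicLong();

    /** Returns a proxy that measures every call made through the given interface. */
    @SuppressWarnings("unchecked")
    <T> T instrument(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(
                TimingSensor.class.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    invocations.incrementAndGet();           // count method invocations
                    long start = System.nanoTime();
                    try {
                        return method.invoke(target, args);  // run the original code
                    } catch (InvocationTargetException e) {
                        exceptions.incrementAndGet();        // capture exceptions
                        throw e.getCause();
                    } finally {
                        // measure execution time, including the sensor's own overhead
                        totalNanos.addAndGet(System.nanoTime() - start);
                    }
                });
    }
}
```

The time spent in the wrapper itself is part of every measurement, which corresponds to the "execution time of diagnosis code" shown for the instrumented application.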
@Dynatrace
Architecture – AppDynamics
Source: https://docs.appdynamics.com/download/attachments/34271888/dbmondplussingleapp_appd_architecture.png?version=4&modificationDate=1487053783658&api=v2
10 ms 10 ms 10 ms 10 ms
Sensors – How Do They Work?
@Dynatrace
Applications and Infrastructure Overview
Analysis dashboards
UEM Visits grouped by Channel and Application
Show the total Tier time with Process, Host, and Transaction
Deeper analysis
Application Overview – Dynatrace
Application Overview – AppDynamics
Tiers with technology
Databases
Remote Services
@AppDynamics
Real User Monitoring – AppDynamics
Infrastructure
Visibility
End User Monitoring Application Performance Management
NoSQL
@AppDynamics
Real User Monitoring – Dynatrace
Source: https://qph.ec.quoracdn.net/main-qimg-cd5a4b6e8cc29465d2cea98e055f606e.webp
End User Monitoring – AppDynamics
Geolocation
Crashes
Requests
Time
Server Monitoring – AppDynamics
Availability
CPU Usage
Memory Usage
Network
Volumes
Processes
Database Monitoring – AppDynamics
Alerting – AppDynamics
@AppDynamics
Dashboarding
@Dynatrace
How can APM tools help to identify problems?
Business Transaction
Potential Issues
User Experience
@AppDynamics
Top findings to optimize performance
Filter
How can APM tools help to identify problems?
@Dynatrace
• Share configurations between stages
• Identify business transactions and map them to the functional level
Challenges
Open Source APM tools
Licensing costs
Vendor Lock-In
• Typical model: x $ per agent; how does this scale?
Weaknesses of Commercial APM Tools
Microservices, Internet of Things
[Figure: http://blog.wso2.com] [Figure: Christian Hinkelmann, http://nahverkehrhamburg.de]
Flexibility, Interoperability, Sustainability
Mobile Revolution
[Figure: https://uxmag.com]
Adaptation to own needs
Bug fixing
Sources
Tools for analysis
Other APM tools
Stop of development
Change of strategy of APM vendor
Open Source APM tools
Load Testing
Performance Modeling
Web Performance Analysis
System & Resources Monitoring
Real User Monitoring
Low-Level Performance Profiling (JRat, JMemProf)
Monitoring & Application Deep Dive
What is inspectIT?
Open Source
80 | 20
Platform
Pareto-principle
• Focus on main functionality
• Experience of APM projects
Platform-principle
• Integration with tools
• Extensibility
Open-Source APM solution
• Development since 2005
• Open Source since 2015
Gathering and Visualizing Timeseries Data
Data Collectors, Custom Code, …
Time Series Databases: persist data
Graphing Tools: query & visualize
Dashboards
Dashboards
The Flaw of Averages
The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty
Sam L. Savage, with illustrations by Jeff Danziger – http://flawofaverages.com
Gil Tene – https://www.youtube.com/watch?v=lJ8ydIuPFeU
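For response times, the flaw of averages shows up when a few very slow requests hide behind a moderate mean. A minimal sketch, with the hypothetical class `LatencySummary` and made-up numbers:

```java
import java.util.Arrays;

/** Hypothetical sketch: why a mean alone can hide slow requests. */
class LatencySummary {
    static double mean(double[] valuesMs) {
        return Arrays.stream(valuesMs).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    static double percentile(double[] valuesMs, double p) {
        double[] sorted = valuesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With nine requests at 100 ms and one at 3000 ms, the mean is 390 ms, a value that no request actually experienced; the median is 100 ms and the 99th percentile is 3000 ms. A dashboard that shows percentiles next to the mean makes such outliers visible.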
Dashboards
Alerting
Anomaly Detection & Alerting
End User Experience?
[Image | http://cdn2.hubspot.net/hub/2224117/file-4141070752-jpg/blog-files/three-e1409846588737.jpg]
Satisfied Tolerating Frustrated
Performance User Actions Users
Performance issues
Functional Errors
Network (Local ISPs, Mobile network carriers)
Third party content providers
First user action
Last user action
Did the visit convert?
Did the visit bounce?
Device
Resolution
Browser Versions
Geolocation
What is Web Performance Analysis about?
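The slide does not name a metric for the categories Satisfied, Tolerating, and Frustrated. They match the zones of the Apdex index, so the sketch below assumes Apdex: a request is satisfied up to a target time T, tolerating up to 4T, and frustrated beyond that. The class name `ApdexSketch` is hypothetical.

```java
/** Hypothetical sketch of an Apdex-style score over response times. */
class ApdexSketch {
    /**
     * @param responseTimesMs measured response times
     * @param targetMs        target time T: satisfied if <= T, tolerating if <= 4T,
     *                        frustrated otherwise
     * @return score between 0 (all frustrated) and 1 (all satisfied)
     */
    static double score(double[] responseTimesMs, double targetMs) {
        int satisfied = 0;
        int tolerating = 0;
        for (double t : responseTimesMs) {
            if (t <= targetMs) {
                satisfied++;
            } else if (t <= 4 * targetMs) {
                tolerating++;
            }
        }
        // Tolerating requests count half, frustrated requests count zero.
        return (satisfied + tolerating / 2.0) / responseTimesMs.length;
    }
}
```

For response times of 200, 400, 1000, and 3000 ms and T = 500 ms, two requests are satisfied, one is tolerating, and one is frustrated, giving a score of (2 + 0.5) / 4 = 0.625.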
Measuring End-User Experience
Real User Testing
Synthetic Monitoring
• Continuous external testing
• Known testing nodes
• Different locations (ISP, networks)
• No baseline traffic required
• Competitor benchmark
• Availability testing (incl. 3rd party)
• e.g., every 30 mins from different nodes
Real User Monitoring
• Real end users
• Real interaction
• Real traffic
External testing with all major browsers, operating systems, mobile devices, and real world data
QoS Validation
QoS Expectation
QoS Optimization
Anomaly
Testing and Monitoring Web Performance
[http://www.webpagetest.org/performance_optimization.php?test=160623_DF_MV7&run=1&cached=0]
webpagetest.org
Server Monitoring
Source: https://www.icinga.com/wp-content/uploads/2016/08/Screen-Shot-2016-08-29-at-16.11.23.png
• No licensing costs
• Growing community
• Pick the tools which work for your needs and combine them
Benefits of Open Source Tools
• From perspectives of industry and research…
Challenges
Application Performance Management
Part 3 – APM Advanced
Problem Diagnosis
Component Deep-Dive
Interpretation of measurements
Detect problem
Problem: Too Many Traces to Analyze
Lots of data to analyze
http://diagnoseit.github.io/
diagnoseIT Overview
[Figure: diagnoseIT overview. Labels: SUT with Node 1 … Node n, JVM, App 1, App 2; Monitoring Tool; Dynamic Instrumentation; Traces; Instrumentation Refinement (Request, Result); API Labeling; Location Identification; Instrumentation Quality Manager; Result Query]
C. Heger, A. van Hoorn, D. Okanović, S. Siegl, and A. Wert.
Expert-guided automatic diagnosis of performance problems in enterprise applications.
In Proc. EDCC ’16. IEEE, 2016.
Agenda
• APM interoperability
• Anti-patterns and trace-based detection
• Mobile
• Overhead control
• Results aggregation
• Business integration
APM Interoperability
[Figure: direct conversion between tool-specific formats (.map, .dat, .itds) is marked with an X]
APM Interoperability: Solution
[Figure: tool-specific formats (.dat, .itds, .txt) are each converted to and from OPEN.xtrace]
Data Export in APM Tools
[Table: one row per tool; the tool names were shown as logos]
Open Source | Export possible
✔ | ✔
✔ | ✔
✘ | ✔
✘ | ✔
✘ | ✔
✘ | ✘
✘ | ✘
✘ | ✘
Metric OPEN.xtrace Kieker inspectIT AppDynamics Dynatrace New Relic
Response Time ✔ ✔ ✔ ✔ ✔ ✔
CPU Time ✔ ✔ ✔ ✔
Method Name ✔ ✔ ✔ ✔ ✔ ✔
Return Type ✔ ✔ ✔
Logging Level ✔ ✔ ✔
SQL Statement ✔ ✔ ✔ ✔ ✔
Error Message ✔ ✔ ✔ ✔ ✔
Comparison table (excerpt)
D. Okanovic, A. van Hoorn, C. Heger, A. Wert, and S. Siegl.
Towards performance tooling interoperability: An open format for representing execution traces.
In Proc. EPEW ’16, pages 94–108, 2016.
• An execution trace is a data structure that captures the control flow of method executions for a request served by the system (Ammons et al., 1997)
OPEN.xtrace Traces
Trace
SubTrace
SubTrace
Callable
Callable
Callable
Location
…
Callables
• Method executions
• DB calls
• Remote calls
• Logging
• Errors
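The nesting of traces, subtraces, and callables can be sketched as a small data model. This is a simplified illustration inspired by the structure above and is not the OPEN.xtrace API; all class and field names in `TraceModelSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified trace model; not the OPEN.xtrace API. */
class TraceModelSketch {
    /** One recorded operation, e.g. a method execution, DB call, or remote call. */
    static class Callable {
        final String kind;              // e.g. "method", "db", "remote"
        final String name;              // e.g. method name or SQL statement
        final long responseTimeNanos;
        final List<Callable> children = new ArrayList<>();

        Callable(String kind, String name, long responseTimeNanos) {
            this.kind = kind;
            this.name = name;
            this.responseTimeNanos = responseTimeNanos;
        }

        /** Adds a nested callable and returns this callable for chaining. */
        Callable add(Callable child) {
            children.add(child);
            return this;
        }

        /** Number of callables in this subtree, including this one. */
        int size() {
            int n = 1;
            for (Callable c : children) {
                n += c.size();
            }
            return n;
        }
    }

    /** The part of a trace executed at one location, e.g. one JVM on one node. */
    static class SubTrace {
        final String location;
        final Callable root;

        SubTrace(String location, Callable root) {
            this.location = location;
            this.root = root;
        }
    }

    /** A trace groups the subtraces that served one request. */
    static class Trace {
        final List<SubTrace> subTraces = new ArrayList<>();
    }
}
```

Keeping the callable kind as data rather than hard-coding tool-specific record types is one way to let adapters from different tools map their records onto the same structure.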
• What is available:
• Trace model based on the available data in these tools
• Adapters to convert the data between OPEN.xtrace and tools
• Serialization
• Planned: OPEN.timeseries
• More on OPEN.xtrace
• D. Okanovic, A. van Hoorn, C. Heger, A. Wert, and S. Siegl. Towards performance
tooling interoperability: An open format for representing execution traces. In Proc.
EPEW ’16, pages 94–108, 2016.
• https://research.spec.org/apm-interoperability/
• https://github.com/spec-rgdevops
OPEN.xtrace
• Track what is going on in complex, heterogeneous systems
• Capture important events during the processing of a client request
• Avoid vendor lock-in
• Wide industry support (http://opentracing.io)
OpenTracing
Anti-patterns are conceptually similar to patterns, but they describe recurrent solutions to design problems that may have a negative effect on different software quality attributes.
Koenig, A. (1998). “Patterns and Antipatterns”. In: The Patterns Handbooks. Ed. by L. Rising. New York, NY, USA: Cambridge University Press
• Performance Antipatterns
(Performance) Antipatterns
• Problem
“[…] a point in the execution where one, or only a few, processes may continue to execute concurrently. All other processes must wait. […]”
• Cause
• “It frequently occurs in applications that access a database. Here, a lock ensures that only one process may update the associated portion of the database at a time.
• It may also occur when a set of processes make a synchronous call to another process that is not multi-threaded; all of the processes making synchronous calls must take turns ‘crossing the bridge.’”
• Solution
“Shared Resources principle[:] responsiveness improves when we minimize the
scheduling time plus the holding time. Holding time is reduced by reducing the
service time for the One-Lane Bridge, and by rerouting the work.”
[Smith and Williams, 2001]
Example 1: One-Lane Bridge
Example 1: One-Lane Bridge – Causes
• Synchronization in source code:
    public synchronized void buyItems(Collection<Item> items) {
        …
    }
• Thread and connection pools in application servers:
    <Connector ... maxThreads="300" acceptCount="150" ... />
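One way to reduce the holding time of such a bridge is to synchronize only the update of shared state instead of the whole method. The sketch below is a hedged illustration, not the authors' code: the class `Shop` is hypothetical, items are plain strings instead of `Item`, and the preparation step stands in for any work that touches no shared state.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch: shrinking the critical section to reduce holding time. */
class Shop {
    private final Object lock = new Object();
    private final List<String> orders = new ArrayList<>();

    /** Only the update of shared state is synchronized, not the whole method. */
    void buyItems(Collection<String> items) {
        // Preparation touches no shared state, so threads can run it concurrently.
        List<String> prepared = new ArrayList<>();
        for (String item : items) {
            prepared.add(item.trim().toUpperCase());
        }
        synchronized (lock) {
            orders.addAll(prepared); // short critical section
        }
    }

    int orderCount() {
        synchronized (lock) {
            return orders.size();
        }
    }
}
```

Whether this helps depends on how much of the method really needs the lock; if nearly all the work is the shared update, rerouting the work (for example, partitioning the data) is the other option named in the solution.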
Example 1: One-Lane Bridge – Symptoms
[Figures: response time [ms] (axis 0 to 40) and CPU utilization [%] (axis 0 to 100), each plotted over the number of users (1 to 50)]
Example 2: N+1-Problem
Wert, A. (2015). Performance Problem
Diagnostics by Systematic
Experimentation, PhD thesis.
Parsons, T. (2007): “Automatic Detection of
Performance Design and Deployment
Antipatterns in Component Based
Enterprise Systems”. PhD thesis.
Performance Antipatterns – Classifications
Analysis automation
List of Currently Detectable Antipatterns
• N+1 Query Problem
• Circuitous Treasure Hunt
• The Stifle
• The Ramp
• Traffic Jam
• More is Less
• Application Hiccups
• Garbage Collection Hiccups
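As an illustration of trace-based detection, the N+1 Query Problem can be approximated by counting SQL statements within one trace that differ only in their parameters. This is a minimal sketch under stated assumptions and not the diagnoseIT rule; the class `NPlusOneDetector`, the normalization, and the repetition limit are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flag SQL statements repeated many times within one trace. */
class NPlusOneDetector {
    /** Replaces quoted and numeric literals so queries differing only in parameters match. */
    static String normalize(String sql) {
        return sql.replaceAll("'[^']*'", "?").replaceAll("\\b\\d+\\b", "?").trim();
    }

    /** Returns normalized statements that occur at least minRepetitions times in the trace. */
    static Map<String, Integer> suspects(List<String> sqlStatementsOfTrace, int minRepetitions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sql : sqlStatementsOfTrace) {
            counts.merge(normalize(sql), 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < minRepetitions);
        return counts;
    }
}
```

A trace with one query for the orders followed by one query per order for its items would report the per-order query as a suspect; a real rule would also check that the repeated calls share the same caller.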
Detecting Antipatterns in Mobile Environment
[Figure: components Java Agent, iOS Agent, Buffer, and GUI; data is exchanged as OpenTracing spans and as OPEN.xtrace]
Antipatterns in Mobile Environment
• Anti-patterns
• Too many remote calls
• Too many remote calls to the same URL
• Too many remote calls to the same server
• Hard-disk utilization too high
• Hard-disk usage increases too fast (ramp)
• RAM utilization too high
• RAM usage increases too fast (ramp)
• Response is available after timeout (timeout occurred too early)
• Glitches
• No connection available during a request
• Latency between back end and mobile device is too high
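The two "increases too fast (ramp)" entries can be illustrated by estimating the trend of a utilization series. The sketch below fits a least-squares line to equally spaced samples and compares its slope with a limit; the class `RampDetector` and the limit are assumptions for this example, not the detection rule used in the tutorial.

```java
/** Hypothetical sketch: detect a "ramp" as a steep upward trend in a utilization series. */
class RampDetector {
    /** Least-squares slope of equally spaced samples, in value units per sample. */
    static double slope(double[] samples) {
        int n = samples.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double y : samples) {
            meanY += y / n;
        }
        double numerator = 0;
        double denominator = 0;
        for (int i = 0; i < n; i++) {
            numerator += (i - meanX) * (samples[i] - meanY);
            denominator += (i - meanX) * (i - meanX);
        }
        return denominator == 0 ? 0 : numerator / denominator;
    }

    /** True if the fitted trend rises faster than the given limit per sample. */
    static boolean isRamp(double[] samples, double maxSlopePerSample) {
        return slope(samples) > maxSlopePerSample;
    }
}
```

For RAM samples of 10, 20, 30, and 40 percent the slope is 10 percentage points per sample, so a limit of 5 flags a ramp, while a flat series does not.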
Overhead Control
• diagnoseIT can request more data from the monitoring tool
• Tool has to support changing of monitoring at runtime
• Instrumentation Quality Manager can deny this request
• How to assess?
• Modeling?
• Machine learning?
diagnoseIT – Diagnosis Results
Problems Overview
Natural language description
Problem instance details
Aggregating the Results
[Figure: a large number of individual problem instances (PI)]
T. Angerstein, D. Okanović, C. Heger, A. van Hoorn, A. Kovačević, T. Kluge:
Many flies in one swat: Automated categorization of performance problem diagnosis results.
In: Proc. ACM/SPEC ICPE ’17
Categorization Process
Diagnosis results → Categorization → Optimization → Categorization result
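The basic idea of aggregating many problem instances into a few categories can be sketched as a grouping step. This is a simplified illustration and not the algorithm from the cited paper; the class `ProblemCategorizer`, its fields, and the choice of grouping key (problem type plus root cause) are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group problem instances (PI) that share type and root cause. */
class ProblemCategorizer {
    static class ProblemInstance {
        final String problemType; // e.g. "N+1 Query Problem"
        final String rootCause;   // e.g. a method or SQL statement
        final String entryPoint;  // e.g. the affected business transaction

        ProblemInstance(String problemType, String rootCause, String entryPoint) {
            this.problemType = problemType;
            this.rootCause = rootCause;
            this.entryPoint = entryPoint;
        }
    }

    /** Maps "type @ rootCause" to the number of instances in that category. */
    static Map<String, Integer> categorize(List<ProblemInstance> instances) {
        Map<String, Integer> categories = new TreeMap<>();
        for (ProblemInstance pi : instances) {
            categories.merge(pi.problemType + " @ " + pi.rootCause, 1, Integer::sum);
        }
        return categories;
    }
}
```

Instances that reach the same root cause through different entry points end up in one category, so a single fix can be matched to all of them.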
Aggregation in diagnoseIT
Missing Business Context
[Figure: CPU utilization (0 % to 100 %) over time, from 11pm to 5am]
How critical is this?
Asking the right questions:
• How are the users affected?
• Which business transactions are affected?
• Are any time-critical batch jobs affected?
Defining Business Context
Conclusion
[Figure: diagnoseIT overview. Labels: SUT with Node 1 … Node n, JVM, App 1, App 2; Monitoring Tool; Dynamic Instrumentation; Traces; Instrumentation Refinement (Request, Result); API Labeling; Location Identification; Instrumentation Quality Manager; Result Query]