Rodney Morrison, SL’s Vice President of Products, delivered a talk entitled “How to Achieve Success With Application Performance Monitoring Initiatives” at TradeTech Architecture 2011.
How to Achieve Success With Application Performance Monitoring Initiatives
Rodney Morrison
VP, Products
SL Corporation
Agenda
Definition and Objectives of APM
Analyst Breakdown of the APM Solution Space
Challenges to Success
Coming Advancements in APM
An Investment Bank Use Case
What is APM?
APM refers to the discipline within systems management that focuses on monitoring and managing the performance and service availability of software applications.
The two main sources of information for proper application management are resources and user experience.
Factors Driving APM Initiatives
Complexity of applications increasing dramatically
Cost estimations of application downtimes are more frequently calculated and on the rise
High severity incidents on the rise
MTTR is often 30 minutes to 3 hours in even the most critical application areas
The Perfect Storm
Best practices for change control have created operational silos
Events geared more for operational management are passed to application support teams from all silos
Each technology stack requires expertise and training in administrative and monitoring tools
Result
High level of noisy, uncorrelated events makes proactive application management impossible
Lack of centralized access to data and performance history leads to lengthy triage for resolution
APM Breakdown – The Analyst’s Vision
End-user experience monitoring
Runtime architecture discovery
Transaction monitoring
Component deep-dive monitoring
Performance analytics
Usage of Management Tools Evolves With Complexity
The Missing Gap
Centralized console for all performance data
Summary views relevant to applications and services
Application-centric event filtering and correlation
Historical views of performance metrics for baselining and event analysis
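As a rough illustration of application-centric event filtering and correlation, the sketch below keeps only the events that belong to one application and groups those that occur close together in time. The `Event` fields, the `correlate` function, the 60-second window, and the sample events are all assumptions made for this example; they are not taken from the talk or from any particular APM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float   # seconds since an arbitrary start (hypothetical field)
    source: str        # silo that raised the event, e.g. "jms" or "jvm"
    application: str   # application the monitored component belongs to
    message: str


def correlate(events, application, window_seconds=60.0):
    """Keep one application's events and group those that are close in time."""
    relevant = sorted(
        (e for e in events if e.application == application),
        key=lambda e: e.timestamp,
    )
    groups = []
    for event in relevant:
        # Join the current group if the gap to its last event fits the window,
        # otherwise start a new group.
        if groups and event.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Invented sample events from three silos and two applications.
events = [
    Event(0.0, "web", "orders", "response time high"),
    Event(10.0, "jms", "orders", "queue depth rising"),
    Event(15.0, "jvm", "billing", "GC pause"),
    Event(500.0, "jvm", "orders", "heap usage high"),
]
groups = correlate(events, "orders")
for group in groups:
    print([f"{e.source}: {e.message}" for e in group])
```

Grouping by time gap is only one simple correlation rule; a real deployment would more likely also correlate on topology or transaction identifiers.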
Advanced APM
Analytics engines for discovering performance patterns
Automation – Command and Control
Reduce incidents and time to repair with very specific application visibility requirements
Emphasis on root cause analysis
Tools detailing application performance for preventative care and capacity planning
To support line-of-business visibility into application availability, performance, the risk associated with that performance, and SLA monitoring
To standardize on a common delivery platform for application support to reduce operation costs
IT Challenges for APM Initiatives
Best Practices for APM Delivery and Maintenance
Create a team
Define
Measure
Analyze
Improve
Control
Create a team:
Specifically tasked with APM
Skills: basic knowledge of applications, underlying infrastructure, software components and architectures; able to liaise with business and development
Define:
Choose initial critical applications
Gather requirements
Gather all relevant and accessible performance metrics
Choose tool sets
Measure:
Aggregate metrics and create baseline
Determine initial rule sets and thresholds
Configure summary views and drill down per defined role
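One simple way to turn aggregated metrics into a baseline and an initial threshold is sketched below. It assumes a mean-plus-three-standard-deviations rule, which is a common starting point but not something the talk prescribes; the function names and the response-time samples are invented for illustration.

```python
from statistics import mean, stdev


def build_baseline(samples, sigmas=3.0):
    """Summarize historical samples and derive an upper alert threshold."""
    avg = mean(samples)
    spread = stdev(samples)  # sample standard deviation
    return {"mean": avg, "stdev": spread, "threshold": avg + sigmas * spread}


def breaches(samples, baseline):
    """Return the samples that exceed the baseline threshold."""
    return [s for s in samples if s > baseline["threshold"]]


# Hypothetical response times in milliseconds.
history_ms = [100, 102, 98, 101, 99, 100, 103, 97]
baseline = build_baseline(history_ms)

new_ms = [101, 99, 140, 104]
alerts = breaches(new_ms, baseline)
print(baseline["threshold"], alerts)
```

The `sigmas` multiplier is the knob that would later be adjusted when thresholds are tuned to reduce noisy alerts.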
Analyze:
Iterate with key stakeholders
Verify that the information is correct and pertinent to role activities
Practice scenarios to verify that discovery and drill down to analysis or root cause is optimal per role
Improve:
Track MTTR improvements
Determine repair activities performed outside of the APM solution; can they be automated?
Are there other metrics or correlations that need to be included to capture outlier events?
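Tracking MTTR improvements comes down to comparing the mean time from detection to resolution across two periods. The sketch below shows that arithmetic; the `mttr` helper and every incident timestamp in it are invented sample values, not data from the talk.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to repair over (detected, resolved) datetime pairs."""
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incidents before the APM rollout.
before = [
    (datetime(2011, 1, 3, 9, 0), datetime(2011, 1, 3, 11, 0)),
    (datetime(2011, 1, 10, 14, 0), datetime(2011, 1, 10, 15, 0)),
]
# Hypothetical incidents after the APM rollout.
after = [
    (datetime(2011, 3, 2, 9, 0), datetime(2011, 3, 2, 9, 30)),
    (datetime(2011, 3, 9, 16, 0), datetime(2011, 3, 9, 16, 50)),
]

mttr_before = mttr(before)
mttr_after = mttr(after)
# Dividing one timedelta by another yields a plain float ratio.
improvement = 1 - mttr_after / mttr_before
print(mttr_before, mttr_after, f"{improvement:.0%}")
```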
Control:
Tweak thresholds to optimize alerts
Add any new important metrics or correlations
Add automated responses
Add new important applications
Case Study: One of the World’s Largest Financial Institutions
Real World Best Practices
Designate main support team and determine roll-out plan
Choose application group or initial critical applications for implementation
Determine user roles: exec/application support/tech support
Determine application architectures
Determine available monitoring systems and sources of performance data
Work with technology support teams and determine agents/no agent technology
Determine dashboard templates to create consistent company-wide standards
Move to next app group or let app support teams build out
Alert Summary Composite Objects of Underlying Data
Alerting in Context of Process Flow
Technology Summaries - Web Server Farms
Detail JMS Server
WebLogic Server Metrics
Data Grid Utilization
Detail – JMS Destinations
JVM Metrics
ROI
Total benefit
– Reduction in outages = reduction in…
• Idle labor
• Loss of business
• Penalties for unmet SLAs/JIT agreements
• Loss of discounts for early payments
• Penalties for late payments
• Cost of idle equipment/telecommunications
• Cost of facilities
• Cost of recovery
• Cost of damage to perceived customer service
• Cost of damage to brand perception
– Improved productivity
– Reduced complexity – faster training of new employees
– Capacity planning
$35.7MM in Year 1
Conclusion
The enterprise IT environment is becoming exponentially more complex
Multiple disparate, unconnected monitoring solutions are only adding to the complexity and management costs
End-to-end APM solutions are not only viable but have already been proven to deliver great benefits
Questions?