The Northwestern Mutual Life Insurance Company – Milwaukee, WI Application Monitoring Jeremy Kalsow


Page 1: Application Monitoring

The Northwestern Mutual Life Insurance Company – Milwaukee, WI

Application Monitoring

Jeremy Kalsow

Page 2: Application Monitoring

Why Application Monitoring

• Majority of all corporations

• Northwestern Mutual

• Total 1,000+ servers

• Team is 6 people

• Team uses 16 servers

• Average 50 applications per server

• Need a way to know status fast

Page 3: Application Monitoring

What is it?

• The ability to monitor performance and availability

• Gather metrics

• Show trends

• Pretty pictures for management

Page 4: Application Monitoring

Why?

• Trends predict future problems

• Solve application issues faster

• Uptime relates directly to profit for many companies

• View all applications, servers, databases and other items being monitored with a single dashboard.

Page 5: Application Monitoring

Types of Monitoring

• Fault

• Performance

• Configuration

• Security

• Accounting

Page 6: Application Monitoring

Fault

• Detects major errors

• Easy to implement

• Examples

  – Network loss

  – Database connectivity

• Very Important

Page 7: Application Monitoring

Fault

Type of Monitoring        What to Monitor           When to Monitor
------------------------  ------------------------  --------------------------
Hardware
  CPU utilization         CPU load                  Load > 99% for x minutes
  Memory utilization      Memory load               Load > 99% for x minutes
  Storage                 System available space    System out of space
Applications
  Application available   Application working       Working or error
  Application logs        Error log monitoring      If error occurred
Databases
  Database online         Database is online        Database is up/down
Network
  Latency                 Latency                   Latency > acceptable range
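The "Load > 99% for x minutes" rule can be sketched as a sustained-threshold check. This is only an illustration: the function name, the 99.0 limit, and the 300-second window are assumptions standing in for the slide's unspecified "x minutes", and samples are assumed to arrive as (timestamp in seconds, value) pairs.

```python
from typing import Iterable, Optional, Tuple


def first_sustained_breach(
    samples: Iterable[Tuple[float, float]],
    limit: float = 99.0,      # assumed threshold, from "Load > 99%"
    window_s: float = 300.0,  # assumed value for "x minutes"
) -> Optional[float]:
    """Return the timestamp at which the metric has stayed above
    `limit` for at least `window_s` seconds, or None if it never does."""
    breach_start: Optional[float] = None
    for ts, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = ts  # start of a new run above the limit
            if ts - breach_start >= window_s:
                return ts
        else:
            breach_start = None  # any dip below the limit resets the run
    return None
```

Requiring the breach to last for a window, rather than alerting on a single sample, keeps short spikes from raising a fault.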

Page 8: Application Monitoring

Performance

• Slow Performance

• Service Level Agreements

• Metrics

• Old and New Metrics

• Visual Display

Page 9: Application Monitoring

Performance

http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html

Page 10: Application Monitoring

Configuration

• Configuration variables

• Connectivity

• Speed

• Performance

• Proactive

• Servers and Applications

Page 11: Application Monitoring

Configuration

• Why would the configuration change?

• Hardware

• Storage

• Service packs

• Hot fixes

• Windows Updates
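One way to notice such changes is to compare a stored configuration snapshot with the current one. The sketch below assumes each snapshot is a flat dictionary of setting names to values; the function name, the `ABSENT` marker, and the keys in the usage note are hypothetical.

```python
from typing import Dict, Tuple

ABSENT = "<absent>"  # marker for a key missing from one snapshot


def diff_config(
    baseline: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {key: (old, new)} for every setting that was added,
    removed, or changed between the two snapshots."""
    changes: Dict[str, Tuple[str, str]] = {}
    for key in sorted(set(baseline) | set(current)):
        old = baseline.get(key, ABSENT)
        new = current.get(key, ABSENT)
        if old != new:
            changes[key] = (old, new)
    return changes
```

For example, a baseline of `{"service_pack": "SP1"}` compared with a current snapshot of `{"service_pack": "SP2"}` reports the service pack change, which could then be raised as a configuration alert.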

Page 12: Application Monitoring

Security

• Attempts to access the system

• Open ports

• Inventories

• Firewall

• Packets

• System events

• Blocked Exploits

Page 13: Application Monitoring

Accounting

• Monitors Usage

• Generally used for fees

• Profit/Loss

• Examples

  – Electric company

  – Northwestern Mutual

Page 14: Application Monitoring

Types of Monitoring Recap

• Fault

• Performance

• Configuration

• Security

• Accounting

Page 15: Application Monitoring

Types of Monitoring Recap

• Historical data

• Baseline test

• Current test

• Performance disagreements
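A baseline-versus-current comparison can be sketched as follows. The function names and the 10% tolerance are assumptions for illustration; the measurements are assumed to be response times in milliseconds collected under comparable load.

```python
from statistics import mean
from typing import Sequence


def change_pct(baseline_ms: Sequence[float], current_ms: Sequence[float]) -> float:
    """Percentage change of the current mean response time relative
    to the baseline mean (positive means slower)."""
    base = mean(baseline_ms)
    cur = mean(current_ms)
    return (cur - base) * 100.0 / base


def exceeds_tolerance(
    baseline_ms: Sequence[float],
    current_ms: Sequence[float],
    tolerance_pct: float = 10.0,  # assumed acceptable slowdown
) -> bool:
    """True when the current test is slower than the baseline by more
    than the agreed tolerance."""
    return change_pct(baseline_ms, current_ms) > tolerance_pct
```

Keeping the historical baseline gives both sides of a performance disagreement the same numbers to look at.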

Page 16: Application Monitoring

Types of Monitoring Recap

• Allows for trends to be seen

• Modifications can be made

• Trends over multiple releases

Page 17: Application Monitoring

Types of Monitoring Recap

• Monitoring is important

• Not enough time is given

• Implemented after discovery of an issue

• Monitoring only in areas of known problems

• Adding monitoring requires time and money

Page 18: Application Monitoring

Challenges of Application Monitoring

• Various types of systems

• Shared

• Clustered

• Virtualized

• Production logging

Page 19: Application Monitoring

Shared Systems

• 1 server / Multiple applications

• System resources are shared

• Tracking individual usage is difficult

• Many applications may be impacted

• Server without access (production)

Page 20: Application Monitoring

Clustered Systems

• Applications on more than one server

• Avoid single point of failure

• May be hard to target the issue

Page 21: Application Monitoring

Production Logging

• Generally Limited

• Most errors repeated in test

• Application downtime

• Use of company resources

Page 22: Application Monitoring

Implement Application Monitoring

• Plan Early

• Monitor Proactively

• Create a Recovery Plan

• Create and use SLAs

Page 23: Application Monitoring

Plan Early

• Planning stage

• Add monitoring during development

• Late additions cover known issues

Page 24: Application Monitoring

Monitor Proactively

• Harder to implement

• Issues are dealt with before end user knows

Page 25: Application Monitoring

Monitor Proactively

• Tool-based approach

• Easy and relatively fast setup

• No code

• Multiple applications

Page 26: Application Monitoring

Monitor Proactively

• Logging is directly in the code

• Less efficient

• More specific

• Developers have less time
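As a minimal sketch of logging placed directly in the code, the example below uses Python's standard `logging` module. The logger name `orders` and the function `parse_quantity` are hypothetical; the point is only that the application itself records both normal operation and errors so a log monitor can pick them up.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical application name


def parse_quantity(raw: str) -> int:
    """Parse an order quantity, logging the outcome either way."""
    try:
        qty = int(raw)
    except ValueError:
        # The error is logged with the offending input, then re-raised
        # so the caller still sees the failure.
        logger.error("invalid quantity %r", raw)
        raise
    logger.info("parsed quantity %d", qty)
    return qty
```

Calling `logging.basicConfig(level=logging.INFO)` once at startup would send these records to the console; a production setup would more likely route them to a file that the monitoring tool watches.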

Page 27: Application Monitoring

Create a Recovery Plan

• Fast resolution

• Knowledge management

Page 28: Application Monitoring

Recovery Plan Template

Page 29: Application Monitoring

Service Level Agreements

• What percentage of the time the services will be up (uptime)

• How many people can use the application at once without performance issues

• Performance metrics and benchmarks to be used with performance monitoring alerts

• The rules for notification announcements

• What statistics will be monitored and when and where they will be available

• Acceptable response time
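The uptime percentage in an SLA translates directly into a downtime budget. The sketch below shows the arithmetic; the function names and the 30-day period are assumptions for illustration, not terms from any particular agreement.

```python
def allowed_downtime_minutes(uptime_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period by an uptime target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_pct) / 100.0


def measured_uptime_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Uptime percentage actually achieved, given observed downtime."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes
```

A 99.9% target over 30 days (43,200 minutes) leaves a budget of roughly 43 minutes of downtime, so a single one-hour outage would miss it.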

Page 30: Application Monitoring

Service Level Agreements

Page 31: Application Monitoring

Using the Statistics

• Visual display

• Alerts

• Tickets

Page 32: Application Monitoring

Visual (Dashboard)

• Easily view statistics

• Comparison results

• Trend comparison

• Cross Platform

• Auto-generated management reports

Page 33: Application Monitoring

Dashboard

Page 34: Application Monitoring

Alerts and Tickets

• Auto-generated alerts

• Tickets for queue system

• Vital information in each

Page 35: Application Monitoring

Alerts and Tickets

• Most common: Email

• Text, popup, printout, recording and more

• Tickets: auto-generated

• Knowledge databases

• Common fixes and resolutions
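An auto-generated email alert and ticket could be built from the same record of vital information. The sketch below is one possible shape: the `Alert` fields, the ticket keys, and the example addresses in the usage note are assumptions, and the message is only constructed, not sent.

```python
from dataclasses import dataclass
from datetime import datetime
from email.message import EmailMessage
from typing import Dict


@dataclass
class Alert:
    """Vital information carried by every alert and ticket (assumed fields)."""
    server: str
    application: str
    check: str
    detail: str
    detected_at: datetime


def to_email(alert: Alert, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) an email describing the alert."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {alert.application} on {alert.server}: {alert.check}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Detected: {alert.detected_at.isoformat()}\n"
        f"Server: {alert.server}\n"
        f"Application: {alert.application}\n"
        f"Check: {alert.check}\n"
        f"Detail: {alert.detail}\n"
    )
    return msg


def to_ticket(alert: Alert) -> Dict[str, str]:
    """Build a ticket record for a queue system (hypothetical schema)."""
    return {
        "summary": f"{alert.check} failed for {alert.application}",
        "server": alert.server,
        "detail": alert.detail,
        "opened": alert.detected_at.isoformat(),
        "status": "open",
    }
```

For instance, `to_email(alert, "monitor@example.com", "oncall@example.com")` yields a message that could be handed to `smtplib`, while `to_ticket(alert)` gives the queue system the same facts in structured form.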

Page 36: Application Monitoring

Application Monitoring

• Maximize application uptime

• Higher end user satisfaction

• Higher Profit

Page 37: Application Monitoring

References

• Polozoff, A. (2003, April 9). Proactive Application Monitoring. IBM - United States. Retrieved October 20, 2011, from http://www.ibm.com/developerworks/websphere/library/techarticles/0304_polozoff/polozoff.html 

• Choice. (2009, December 20). Application Monitoring. Adminschoice - Unix Made Easy. Retrieved October 31, 2011, from http://adminschoice.com/application-monitoring

• Application Monitoring Software - uptime software. (n.d.). Server Monitoring Software - IT Systems Management, Capacity Planning, Application and Server Monitoring Tool by uptime software. Retrieved October 31, 2011, from http://www.uptimesoftware.com/application-monitoring.php 

• Marko, K. (2005, December 30). Proactive Application Monitoring. Processor.com: Data Center IT Equipment at Processor, Routers, Storage, Rackmount Servers, Computer Room Cabling and Flooring. Retrieved October 29, 2011, from http://www.processor.com/editorial/article.asp?article=articles%2Fp2752%2F43p52%2F43p52.asp

• IT Service Level Agreement Templates | ContinuityPlanTemplates. (n.d.). ContinuityPlanTemplates | Free Business Continuity Plan (BCP) Templates. Retrieved October 30, 2011, from http://www.continuityplantemplates.com/it-service-level-agreement-templates

Page 38: Application Monitoring

XML

Page 39: Application Monitoring
Page 40: Application Monitoring
Page 41: Application Monitoring
Page 42: Application Monitoring
Page 43: Application Monitoring

Upcoming events with Dashboard

• Ability to display visualized graphs and other pertinent information

• Ability to click a failed component and have the system auto-generate a ticket

• Ability to alert others of the issue found

• Performance monitoring as well as fault