Performance is good, Understanding performance is better Peter HJ van Eijk Chairman NLCMG A non-profit community of professionals Feb 11, 2012

CMG 101 - Understanding performance


DESCRIPTION

Web performance is good, understanding performance is better: what you need to understand to have IT systems that perform well at a reasonable cost.


Page 1: CMG 101 - Understanding performance

Performance is good, Understanding performance is better

Peter HJ van Eijk, Chairman NLCMG

A non-profit community of professionals

Feb 11, 2012

Page 2: CMG 101 - Understanding performance

CMG 101: Computer Cloud Measurement Group

Understand:

• Definitions of availability and response time

• Psychological and business effect of delay/response time. User interfaces, cost of downtime

• Transactions, and their structure

• Waterfall diagrams for transactions and web page downloads

• Performance measures (seconds, bytes, bits per second, IOPS, etc.)

• Reporting measures / metrics

• Visualization of quantitative data, how to

• Resources (CPU, memory, disk, network, software)

• Elementary queuing theory

• Phases in development and how to incorporate performance and capacity (analysis, design, etc.), performance engineering

• Typical free and commercial tools, or at least their functionality

– monitoring, reporting, alerting, analysis, modelling

Page 3: CMG 101 - Understanding performance

Availability and Response Time

• Availability: Ability of a Configuration Item or IT Service to perform its agreed Function when required. […] Availability is usually calculated as a percentage.

• Response Time: A measure of the time taken to complete an Operation or Transaction
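The two definitions above can be turned into numbers with very little code. The following is a minimal sketch, not part of the original slides: it assumes the common convention that availability is the share of agreed service time during which the service was up, and it measures response time as wall-clock time around a single operation. The function names and the example month are hypothetical.

```python
import time


def availability_pct(agreed_seconds: float, downtime_seconds: float) -> float:
    """Availability as a percentage of the agreed service time (assumed convention)."""
    if agreed_seconds <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_seconds <= agreed_seconds:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return 100.0 * (agreed_seconds - downtime_seconds) / agreed_seconds


def timed(operation):
    """Run a callable and return (result, response time in seconds)."""
    start = time.perf_counter()
    result = operation()
    return result, time.perf_counter() - start


# Hypothetical example: a 30-day month with 43.2 minutes of downtime.
month_seconds = 30 * 24 * 3600
month_availability = availability_pct(month_seconds, 43.2 * 60)
print(f"availability: {month_availability:.2f}%")

# Hypothetical operation: summing a range stands in for a real transaction.
_, elapsed = timed(lambda: sum(range(100_000)))
print(f"response time: {elapsed * 1000:.2f} ms")
```

In practice the agreed service time may exclude planned maintenance windows, which changes the denominator; the sketch leaves that choice to the caller.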

Page 4: CMG 101 - Understanding performance

Graphs of availability and response time

Page 5: CMG 101 - Understanding performance

Psychological and business cost of downtime

€ + $ + £

Page 6: CMG 101 - Understanding performance

Sudden surges can kill you

[Chart: pageviews over time, 1 Jan 2008 to 19 May 2009; y-axis: pageviews, 0 to 700,000; annotation: IceSave failure. Source: SiteStat]

Page 7: CMG 101 - Understanding performance

KNMI.nl: Pageviews per hour

[Chart: pageviews per hour by hour of day (1 to 23); y-axis: 0 to 180,000; series: 30-dec and 31-dec; annotations: ordinary day, weather alarm day]

Page 8: CMG 101 - Understanding performance

Transactions and their structure, waterfall diagrams

[Diagram: exchange between client and server (Query, Ack, Reply, Ack), annotated with network latency and server turnaround time. Inset: YSlow detail]

A single user-level transaction decomposes into multiple transactions on components.

Page 9: CMG 101 - Understanding performance

© Digital Infrastructures


Transactions: from visits to bandwidth

Visits → Pageviews → GET requests → Bandwidth

• Visits: 1.7 visits/sec (6,380 per hour)

• Pageviews: 13 pageviews/sec (47,338 per hour); 7.42 pageviews per visit (according to SiteStat), though lower during the crisis

• GET requests: 140 requests/sec; 79 GETs per visit according to log file and SiteStat; effectively 10.6 (= 79/7.42) GETs per pageview; 32 GETs for the homepage (according to the browser)

• Bandwidth: 0.95 Mbyte/sec (7.6 Megabit/sec); about 6,800 bytes per request on average

Data sources as labelled on the slide: SiteStat measurement; SiteStat measurement, server logs, page composition via FireBug; HTTP server logs
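The chain from visits to bandwidth on this slide is plain multiplication, and it can be checked with a few lines. The sketch below uses only the slide's own figures (6,380 visits per hour, 7.42 pageviews per visit, 79 GETs per visit, about 6,800 bytes per request). The function and key names are hypothetical, and it assumes 1 Mbyte = 1,000,000 bytes.

```python
def traffic_funnel(visits_per_hour: float,
                   pageviews_per_visit: float,
                   gets_per_visit: float,
                   bytes_per_request: float) -> dict:
    """Convert an hourly visit rate into per-second request and bandwidth figures."""
    visits_per_sec = visits_per_hour / 3600
    pageviews_per_sec = visits_per_sec * pageviews_per_visit
    requests_per_sec = visits_per_sec * gets_per_visit
    bytes_per_sec = requests_per_sec * bytes_per_request
    return {
        "visits_per_sec": visits_per_sec,
        "pageviews_per_sec": pageviews_per_sec,
        "gets_per_pageview": gets_per_visit / pageviews_per_visit,
        "requests_per_sec": requests_per_sec,
        "mbyte_per_sec": bytes_per_sec / 1e6,      # assumes 1 Mbyte = 10^6 bytes
        "mbit_per_sec": bytes_per_sec * 8 / 1e6,   # 8 bits per byte
    }


# Figures taken from the slide.
funnel = traffic_funnel(visits_per_hour=6380,
                        pageviews_per_visit=7.42,
                        gets_per_visit=79,
                        bytes_per_request=6800)

for name, value in funnel.items():
    print(f"{name:>18}: {value:8.2f}")
```

Rerunning the function with a surge-level visit rate shows how quickly a spike in visits turns into a bandwidth requirement.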

Page 10: CMG 101 - Understanding performance

How to diagnose a problem, where to look? Resource = capacity

[Diagram: end-to-end view of the resources involved. Components: (Test) client; Router/Switch (CPE); WAN link; Firewall, Proxy; LAN switches; Load Balancer; HTTP front end; MySQL DB; NAS; SAN. Layers: Users, Application, Network, Network lines, Server. Caption: Example breakdowns]

Page 11: CMG 101 - Understanding performance

After the employee numbers have been queried (there are 373 Janssens), the employment details are queried one at a time (612 in total). On the GBO LAN this leads to a turnaround time of 30 sec (measured).

Based on a 50 ms round trip on the WAN

Resource contribution to response time, modeling different resource allocations

Modelling the effect of different network bandwidths on response time

[Chart: stacked bars of response time, x-axis 0 to 500 sec, for GBO, ICTRO 2Mb, 256K and 64K; series: server time (sec), client time (sec), network time due to delay (sec), network time due to bandwidth (sec)]

Excessive client/server chatter leads to a user interaction time of more than 7 minutes!

How much faster will this be with:

• a very fast network?

• a very fast client?

• a very fast server?
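The "how much faster" question can be explored with a simple additive model of the four components in the chart: server time, client time, network delay and network transfer time. The sketch below is an illustration under stated assumptions, not the model used in the original project. It assumes one WAN round trip per employment-detail query (612 queries, 50 ms each, from the slide); the server time, client time and payload size are hypothetical placeholders, and the bandwidth labels are read as bits per second.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Interaction:
    server_s: float       # total server processing time (hypothetical value below)
    client_s: float       # total client processing time (hypothetical value below)
    round_trips: int      # number of sequential request/reply exchanges
    payload_bytes: int    # total bytes transferred (hypothetical value below)

    def latency_s(self, rtt_s: float) -> float:
        """Time spent waiting on round trips, independent of bandwidth."""
        return self.round_trips * rtt_s

    def transfer_s(self, bandwidth_bps: float) -> float:
        """Time spent pushing the payload through the link."""
        return self.payload_bytes * 8 / bandwidth_bps

    def response_time_s(self, rtt_s: float, bandwidth_bps: float) -> float:
        return (self.server_s + self.client_s
                + self.latency_s(rtt_s) + self.transfer_s(bandwidth_bps))


WAN_RTT_S = 0.050  # 50 ms round trip, from the slide

# 612 round trips comes from the slide; the other three values are made up.
chatty = Interaction(server_s=10.0, client_s=10.0,
                     round_trips=612, payload_bytes=2_000_000)

for label, bps in [("2 Mbit/s", 2_000_000), ("256 kbit/s", 256_000), ("64 kbit/s", 64_000)]:
    print(f"{label:>10}: {chatty.response_time_s(WAN_RTT_S, bps):6.1f} s")

# "Very fast server": set server time to zero and compare on the slowest link.
fast_server = replace(chatty, server_s=0.0)
print(f"fast server on 64 kbit/s: {fast_server.response_time_s(WAN_RTT_S, 64_000):6.1f} s")
```

Setting one component to zero at a time shows which upgrade actually matters: when latency and transfer time dominate, a faster server or client barely changes the total.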

Page 12: CMG 101 - Understanding performance

Queuing theory

[Chart: delay factor (y-axis, 0 to 12) against utilisation (x-axis, 10% to 90%)]

• Response depends on capacity

• At higher loads, congestion can set in

[Chart: actual throughput against traffic load, with curves labelled "Perfect" and "Congestion" and a marked "Sweet spot"]
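The shape of the delay-factor curve can be reproduced with the simplest queueing model. The slide does not say which model it uses; the sketch below assumes a single-server M/M/1 queue, where the mean response time equals the service time multiplied by 1 / (1 - utilisation). The function name is hypothetical.

```python
def delay_factor(utilisation: float) -> float:
    """Mean response time divided by service time, assuming an M/M/1 queue."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return 1.0 / (1.0 - utilisation)


# Tabulate the same utilisation range as the chart's x-axis.
for pct in range(10, 100, 10):
    print(f"{pct:3d}% utilisation -> delay factor {delay_factor(pct / 100):5.2f}")
```

Under this assumption the factor doubles at 50% utilisation and climbs steeply beyond roughly 80%, which is why headroom matters more as load grows.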

Page 13: CMG 101 - Understanding performance

So what was the bottleneck?

• KNMI: static page served from database 1000/sec

• Ministry: very chatty client/server interaction

• DNB: JSP application server serves static content

• Anne Frank: many, large digital assets, no use of CDN

• Hospital information system: client (front-end) code

Page 14: CMG 101 - Understanding performance

How to incorporate performance in development and operations

Page 15: CMG 101 - Understanding performance

Typical free and commercial tools and their functionality

Functionality

• Monitoring

• Reporting

• Alerting

• Analysis

• Modelling

• Etc …

Example tools

• Nagios

• Cacti

• WatchMouse

• PDQ

• R

• YSlow

• …

Page 16: CMG 101 - Understanding performance

CMG 101

• We want to develop a ‘standard’ body of knowledge

– To educate our people

– Speak more of the same language

– Enable tool vendors to more easily express their offerings

• Note: defining what is in the course is not the same as developing a course

Page 17: CMG 101 - Understanding performance

Call for Action

• Want to know more?

• Want to collaborate, contribute?

• Want to get a course?

• Want to sponsor?

• Talk to me: Peter HJ van Eijk, @petersgriddle

[email protected] +31 2268 4939

www.nlcmg.nl

NLCMG is a chapter of CMG.org

Page 18: CMG 101 - Understanding performance

Some of my performance projects

• KNMI (Weather service): website meltdown after weather emergency (“weeralarm”)

• DNB (Dutch Banks Authority): website meltdown during 2008 financial crisis

• Unnamed Ministry: information system with multi-minute response times

• Crisis.nl: ….

• Anne Frank website: … anticipated surge after major redesign

• Hospital information system: storage sizing

Page 19: CMG 101 - Understanding performance

http://zoom.nl/foto/1713577/portret/cloudwatch.html

Achtung alles Lookenspeepers! Nur watchen das Cloud.

Page 20: CMG 101 - Understanding performance

What does a financial IT crisis look like?

Page 21: CMG 101 - Understanding performance

Fernando’s office (bank’s capacity planner)