3/4/98 \\pcbackup\users\cottrell\icfa\icfa-mar98.ppt

Internet Monitoring - Results

Les Cottrell SLAC

<cottrell@slac.stanford.edu>

Presented at the ICFA Meeting, CERN, Mar 1998

Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM)

Outline of Talk

• What, why & how are we (ESnet/HENP community) measuring?

• What PingER measurement reports are available and what do they show

– (short), intermediate & long term

– grouping and multi-site visualization

• Traffic volume & Traceroute measurements

• Summary

– Deployment/development, Internet Performance, Next Steps

– Collaborations

– NIMI/IPWT

Why go to the effort?

• Apparent quality of Internet getting worse as size and demands increase

• Internet woefully under-measured & under-instrumented

• Internet very diverse - no single path typical

• Users need:

– realistic expectations, planning information

– guidelines for setting and validating SLAs

– information to help in identifying problems

– help to decide where to apply resources

Importance of Response Time

• Time is scarcest and most valuable commodity

– Studies in late 70's and early 80's showed the economic value of Rapid Response Time

• 0-0.4s High productivity interactive response

• 0.4-2s Fully interactive regime

• 2-12s Sporadically interactive regime

• 12s-600s Break in contact regime

• >600s Batch regime

– Complaints increase rapidly above a threshold of around 4-5s.

– Voice has threshold around 100ms
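The regime table above can be sketched as a small lookup. This is only an illustration: the function name `response_regime` is hypothetical, and treating each upper bound as inclusive is an assumption, since the slide does not say which regime a boundary value belongs to.

```python
# Regime boundaries (seconds) taken from the slide. Treating each upper
# bound as inclusive is an assumption; the slide does not specify it.
REGIMES = [
    (0.4, "High productivity interactive response"),
    (2.0, "Fully interactive regime"),
    (12.0, "Sporadically interactive regime"),
    (600.0, "Break in contact regime"),
]


def response_regime(seconds: float) -> str:
    """Return the slide's regime name for a response time in seconds."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    for upper_bound, name in REGIMES:
        if seconds <= upper_bound:
            return name
    return "Batch regime"


if __name__ == "__main__":
    for t in (0.1, 1.5, 5.0, 60.0, 900.0):
        print(f"{t:>6.1f} s -> {response_regime(t)}")
```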

Perception of Poor Packet Loss

• Above 4-6% packet loss video conferencing becomes irritating, and non-native language speakers become unable to communicate.

• The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows.

• Above 10-12% packet loss there is an unacceptable level of back to back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable.

Our Main Metric is Ping

• “Universally available”, easy to understand

– no software for clients to install

• Low network impact

• Provides useful real world measures of loss, response time, reachability, unpredictability
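As a rough illustration of how loss, response time and reachability fall out of one batch of pings, here is a minimal sketch. It is not the PingER code: the function name `summarize_pings`, the input layout (one round-trip time per ping, `None` for a lost packet) and the sample values are all assumptions. The slide does not define "unpredictability", so the sketch leaves it out.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one batch of pings; None marks a lost packet."""
    if not rtts_ms:
        raise ValueError("need at least one ping sample")
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    return {
        "sent": sent,
        "loss_pct": 100.0 * (sent - len(replies)) / sent,
        "reachable": bool(replies),  # at least one reply came back
        "min_rtt_ms": min(replies) if replies else None,
        "median_rtt_ms": median(replies) if replies else None,
    }


if __name__ == "__main__":
    # Hypothetical sample: ten pings, two of them lost.
    sample = [182.0, 190.5, None, 185.2, 240.1, None, 183.9, 181.7, 199.0, 186.3]
    print(summarize_pings(sample))
```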

Ping Response vs Web Response 1/2

[Figure: HTTP GET Response (ms) versus Minimum Ping Response (ms), with lines y = 2x, y = 1.7135x + 719.83, and y = 2.5726x]

Ping Response vs Web Response 2/2

[Figure: Residuals for Min HTTP-GET = 2 * Min Ping. Histogram of frequency versus residual (ms, -400 to 1000), with cumulative % overlaid]
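The residual plot can be reproduced in outline as follows. This is a sketch under stated assumptions: the function names, the 200 ms bin width and the sample pairs are hypothetical, and only the model "Min HTTP-GET = 2 * Min Ping" comes from the slide.

```python
from collections import Counter


def residuals_ms(pairs, slope=2.0):
    """Residual of each (min_ping_ms, min_http_get_ms) pair versus slope * ping."""
    return [http_get - slope * ping for ping, http_get in pairs]


def histogram(values, bin_width_ms=200):
    """Return (bin_start, count, cumulative_pct) rows, ordered by bin start."""
    if not values:
        return []
    # Floor division keeps negative residuals in the correct (lower) bin.
    counts = Counter(int(v // bin_width_ms) * bin_width_ms for v in values)
    rows, running = [], 0
    for start in sorted(counts):
        running += counts[start]
        rows.append((start, counts[start], 100.0 * running / len(values)))
    return rows


if __name__ == "__main__":
    # Hypothetical (min ping, min HTTP GET) measurements in ms.
    pairs = [(50, 120), (100, 180), (150, 420), (200, 390), (300, 900)]
    for row in histogram(residuals_ms(pairs)):
        print(row)
```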

Ranked packet loss for 3 months

[Figure: ranked packet loss curves labeled Rome, UK, Stanford, and Cincinnati]

Sawtooth Effect

[Figure: Packet Loss for UK sites Jan-95 thru Jan-98 seen from SLAC. % prime time packet loss (0 to 40) by month for gla.ac.uk and rl.ac.uk. Annotations: Added 45 Mbps (quadrupled capacity); 2 * capacity (+ 2 Mbps); 3 * capacity + 9 Mbps; Holidays]

RAL Last 180 Days plot

Lines are simply cubic spline fits to aid the eye

Upper green and black points are response time in ms

Red & blue are weekday loss

Cyan are weekend loss

Note weekend/weekday differences (cyan vs blue)

Note Xmas/New Year lull

Also note quick onset of saturation at end August & September

Italian sites look similar to each other

[Figure: Packet Loss for Italian sites Jan-95 thru Jan-98 seen from SLAC. % prime time packet loss (0 to 40) by month for ge.infn.it, lnf.infn.it, na.infn.it, pd.infn.it, roma1.infn.it, ts.infn.it]

[Legend also lists: cern.ch, desy.de, ethz.ch, fzu.cz, gla.ac.uk, ihep.ac.cn, inp.nsk.su, kek.jp, phy.tu-dresden.de, rl.ac.uk, rmki.kfki.hu]

Representative International HENP Site Loss Jan-95 thru Nov-97

• Note RL (UK) saw-tooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97)

Aggregation

• Group measurements, for example:

– by area (e.g. N. America W, N. America E, W. Europe/Japan, others, by country)

– trans-oceanic links, intercontinental links

– separation e.g. number of hops, time zones crossed, IXPs crossed

– ISP (ESnet, vBNS/I2, ...)

– by monitoring site

– one site seen from multiple sites

– common interest/affiliation (XIWT, HENP …)

– user selectable
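Grouping of this kind amounts to mapping each remote host to a group key and summarizing per group. The sketch below groups by top level domain and takes the median loss; the function names and the loss values are hypothetical, and any other key function (area, ISP, monitoring site) could be passed in instead.

```python
from collections import defaultdict
from statistics import median


def tld(host: str) -> str:
    """Top level domain of a host name, e.g. 'rl.ac.uk' -> 'uk'."""
    return host.rsplit(".", 1)[-1].lower()


def median_loss_by_group(loss_by_host, group_of=tld):
    """Median % loss per group; group_of maps a host name to its group key."""
    groups = defaultdict(list)
    for host, loss_pct in loss_by_host.items():
        groups[group_of(host)].append(loss_pct)
    return {key: median(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical monthly loss figures (%), not measured data.
    losses = {
        "rl.ac.uk": 12.0,
        "gla.ac.uk": 18.0,
        "cern.ch": 3.0,
        "ethz.ch": 5.0,
        "kek.jp": 0.5,
    }
    print(median_loss_by_group(losses))
```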

Group Selection (all sites monitoring CERN)

[Figure labels: CMU, CNAF, RL, FNAL, SLAC, DESY, Carleton, RMKI, CERN, KEK]

Select one of these groups

Group Response Time Jan-95 thru Nov-97

• Improved between 1 and 2.5% / month

• Response & Loss similar improvements

– care with new sites

Network Quiescence

• Frequency of zero packet loss (for all time - not cut on prime time)

Ping Loss Quality

• Want a quick-to-grasp indicator of link quality

• Loss is the most sensitive indicator

– loss of packet requires ~ 4 sec TCP retry timeout

– Studies on economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase.

– 0-1% = Good, 1-2.5% = Acceptable

– 2.5%-5% = Poor, 5%-12% = Very Poor

– > 12% = Bad
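The quality bands above translate directly into a lookup. In this sketch the function name `loss_quality` is hypothetical, and assigning a boundary value (for example exactly 1%) to the better band is an assumption, since the slide's ranges share their end points.

```python
# Upper bounds (% packet loss) and labels from the slide. A value exactly
# on a boundary goes to the better band; that choice is an assumption.
QUALITY_BANDS = [
    (1.0, "Good"),
    (2.5, "Acceptable"),
    (5.0, "Poor"),
    (12.0, "Very Poor"),
]


def loss_quality(loss_pct: float) -> str:
    """Return the slide's quality label for a packet loss percentage."""
    if not 0.0 <= loss_pct <= 100.0:
        raise ValueError("loss must be between 0 and 100 percent")
    for upper_bound, label in QUALITY_BANDS:
        if loss_pct <= upper_bound:
            return label
    return "Bad"


if __name__ == "__main__":
    for loss in (0.3, 2.0, 4.0, 8.0, 20.0):
        print(f"{loss:5.1f}% -> {loss_quality(loss)}")
```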

Quality Distributions

• ESnet median good quality

• All other groups poor or very poor

• Critical to have good peering

Multi Collection Site Visualization

[Figure: Remote Sites versus Collection Sites]

Intercontinental Grouping (Loss)

• Move mouse over ? to see # links

Looks pretty bad for intercontinental use

Top Level Domain Grouping (Loss)

Mouseover red dots gives more information on TLD (e.g. ch=Switzerland)

Diagonals are within TLD

TLD (Response Time)

Grouping Details

Select group

Select metric

Sort

Color for quality

Also provides Excel for DIY at bottom

Recent Transoceanic trends

[Figure: Median Transoceanic Packet Loss for 172 Links seen from 14 Sites. % packet loss (0 to 16) by month, Sep-97 thru Feb-98, showing 25%, 75% and Median series with an exponential fit to the median]

9.5% improvement/month
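A monthly improvement rate like the one quoted can be derived from an exponential trend line. The sketch below fits y = a * exp(b * month) by least squares on log(y) and reports 1 - exp(b) as the fractional improvement per month. Both the fitting method and that definition of "improvement" are assumptions (the slide only names an exponential fit), and the input series is synthetic, not the measured medians.

```python
import math


def monthly_improvement(medians):
    """Fit y = a * exp(b * month) by least squares on log(y).

    Returns the fractional decrease per month, 1 - exp(b).
    """
    if len(medians) < 2 or any(m <= 0 for m in medians):
        raise ValueError("need at least two positive values")
    xs = range(len(medians))
    logs = [math.log(m) for m in medians]
    x_mean = sum(xs) / len(medians)
    y_mean = sum(logs) / len(logs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, logs)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return 1.0 - math.exp(slope)


if __name__ == "__main__":
    # Synthetic series that falls by 9.5% each month, for illustration only.
    series = [10.0 * 0.905 ** month for month in range(6)]
    print(f"{100 * monthly_improvement(series):.1f}% improvement/month")
```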

By Monitoring Site

[Figure: HENP Median Monthly Packet Loss for 163 links seen from 7 Monitoring Sites. Median monthly packet loss (0 to 16) by month, Sep-97 thru Feb-98, for INFN/CNAF, HEPNRC/FNAL, SLAC, DESY, RMKI/KFKI, CERN and KEK medians]

Site    #     %/month
INFN    14    26%
FNAL    46    1.5%
SLAC    18    15%
DESY    13    -12%
RMKI    5
CERN    61    17%
KEK     6

CERN Monitoring TLDs

[Figure: HENP Packet loss seen from CERN for 4 main groups of TLDs. % packet loss (0.0 to 10.0) by month, Sep-97 thru Feb-98, for JP, ESnet, EDU and CA]

#     % Improvement
4     16%
7     43%
39    8.5%
5     4.2%

ESnet bytes accepted by site for Jan ‘98

Exchanges

LBL/ESnet

US HENP Traffic Growth

Exponential growth of 3-6% per month

[Figure: Monthly ESnet Bytes Accepted from Major US HENP Labs Jun-90 thru Jan-98. Log of bytes accepted/month (1E+9 to 1E+13) for anl, bnl, cebaf, fnal, lbl and slac, each with an exponential fit]
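To put a monthly growth rate into more familiar terms, compound growth gives a doubling time of ln 2 / ln(1 + r) months and an annual factor of (1 + r)^12. The sketch below applies that standard arithmetic to the 3% and 6% ends of the range quoted in the summary; the function names are hypothetical.

```python
import math


def doubling_time_months(monthly_growth: float) -> float:
    """Months for traffic to double at a compound monthly growth rate."""
    if monthly_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2.0) / math.log(1.0 + monthly_growth)


def annual_growth_factor(monthly_growth: float) -> float:
    """Factor by which traffic grows over 12 months of compounding."""
    return (1.0 + monthly_growth) ** 12


if __name__ == "__main__":
    for rate in (0.03, 0.06):
        print(
            f"{rate:.0%}/month: doubles in {doubling_time_months(rate):.1f} months, "
            f"x{annual_growth_factor(rate):.2f} per year"
        )
```

At 3% per month traffic doubles in roughly two years, and at 6% per month in roughly one year.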

CERN-US E1(2Mbps) link

Added 2nd 2Mbps link

Multi Router Traffic Grapher (MRTG)

Traffic Volume for Germany (DFN)

[Figure: DFN T1 Utilization for 15 Jan ‘98 (5 min averages). Green = to US, Blue = from US]

[Figure: ESnet Serial Traffic Peaks -- December 1997 (created Jan 15, 1998), Line Speed = 15360000 bits/sec. Number of Peaks vs Percent of Utilization: # samples (0 to 25,000) against utilization (0% to 100%) for cebaf1-dfn2 and dfn2-cebaf1 at 2 min, 10 min and 60 min intervals. Annotation: # of 2 min periods in Dec-96 with peak utilization > y %. Series: From US, To US]
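The peaks chart counts, for each utilization level y, how many sampling periods had a peak above y %. A minimal sketch of that count follows; the function name `periods_above`, the 10% threshold steps and the sample peaks are hypothetical.

```python
def periods_above(peaks_pct, thresholds=range(0, 101, 10)):
    """Map each threshold y to the number of periods with peak utilization > y %."""
    return {y: sum(1 for p in peaks_pct if p > y) for y in thresholds}


if __name__ == "__main__":
    # Hypothetical peak utilization (%) of six sampling periods.
    peaks = [5, 15, 15, 42, 88, 97]
    for y, count in periods_above(peaks).items():
        print(f"> {y:3d}%: {count}")
```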

Capacity/Load Ratios

• Looking at the link capacity/average load

– Most ESnet links show ratios of a few to several tens

– The international links (CERN-Perryman (~4), DFN (~5), Italy (~4), KEK (~10), Canada (15)) show ratios of 4-15

– The worst link appears to be the MAE-W-ESnet link at about 1.5 ratio

• However this may not be the bottleneck link
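The ratio itself is simple arithmetic: link capacity divided by the average load over some interval. The sketch below derives the average load from a byte count; the function name and the traffic figure are hypothetical, chosen so that a 2 Mbps link comes out at a ratio of 4.

```python
def capacity_load_ratio(capacity_bps: float, bytes_transferred: float, interval_s: float) -> float:
    """Link capacity divided by the average load over the interval."""
    if bytes_transferred <= 0 or interval_s <= 0:
        raise ValueError("bytes and interval must be positive")
    average_load_bps = 8 * bytes_transferred / interval_s
    return capacity_bps / average_load_bps


if __name__ == "__main__":
    # Hypothetical: a 2 Mbps link carrying 5.4 GB in one day (86,400 s),
    # i.e. an average load of 500 kbps.
    print(capacity_load_ratio(2_000_000, 5_400_000_000, 86_400))
```

A ratio near 1 means the link is close to saturated on average, which is why the ~1.5 figure stands out.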

Bottlenecks

• Identification

– Traceroute

• from/to multiple sites can identify common path segments in the maps

• Can see onset of losses with traceping

– Pathchar can identify bottlenecks

• Then need to work on:

– avoiding bottlenecks (new peering)

– getting bottleneck owners to improve

– this is difficult, lots of potential bottlenecks, bottlenecks move, not under our control
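Finding common path segments from several traceroutes can be sketched as counting hop-to-hop pairs across paths. This is an illustration only, not the tooling described in the talk: the function name `shared_segments`, the path labels and the router names are all hypothetical, and the traceroute output is assumed to be already parsed into ordered hop lists.

```python
from collections import Counter


def shared_segments(paths, min_paths=2):
    """Count hop-to-hop segments that appear in at least min_paths traceroutes.

    paths maps a label such as 'siteA->siteC' to an ordered list of hop names.
    """
    seen = Counter()
    for hops in paths.values():
        # A set, so a segment repeated within one path is counted once.
        seen.update(set(zip(hops, hops[1:])))
    return {segment: n for segment, n in seen.items() if n >= min_paths}


if __name__ == "__main__":
    # Hypothetical parsed traceroutes; the router names are made up.
    paths = {
        "siteA->siteC": ["a-gw", "exchange-1", "core-x", "c-gw"],
        "siteB->siteC": ["b-gw", "exchange-1", "core-x", "c-gw"],
        "siteA->siteD": ["a-gw", "exchange-1", "core-y", "d-gw"],
    }
    for segment, count in sorted(shared_segments(paths).items()):
        print(segment, count)
```

Segments shared by many lossy paths are candidates for the bottleneck, which can then be checked hop by hop.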

TracePing (Oxford)

Multiple routes seen

Traceroute

• Reverse traceroute servers

• Traceping

• TopologyMap

– Ellipses show node on route

– Open ellipse is measurement node

– Blue ellipse not reachable

• Keeping history

From TRIUMF

GUI Traceroute (e.g. VisualRoute)

Summary

• Deployment Development

– ESnet/HENP has 14 Collection sites in 8 countries collecting data on > 500 links involving 22 countries

– XIWT/IPWT deployed ~ 10 collection sites using PingER tools

– 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis

– HEPNRC gathering, archiving

– Long term reports being ported to HEPNRC from SLAC

– Long term analysis today usually requires tool like SAS

Summary

• Deployment Development

• Internet Performance

– Performance within ESnet is good

– Performance between ESnet & other sites is poor to very poor on average

• one of main causes is congestion points, so peering is critical

– Intercontinental performance is very poor to bad

– ESnet traffic accepted from major HENP labs growing by 3-6% per month

– Response time improving by 1-2% / month

– Packet loss improving between SLAC & other sites by 3% / month

Summary

• Deployment Development

• Internet Performance (continued):

– Links to sites outside N. America vary from good (KEK) to bad

– Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic, some surprises such as UK

– CERN, France, Germany acceptable to poor

Summary

• Deployment Development

• Internet Performance

• Next Steps

– Improve tools:

– Make long term reports at Analysis site available & understandable

– Look into prediction (extrapolations, develop models, configure and validate with data)

– Pursue IETF Surveyor & NIMI deployment

National Internet Measurement Infrastructure (NIMI)

• Secure, scalable infrastructure for scheduling monitoring, gathering data

• Minimal amount of human intervention

• Inexpensive probe built on PC FreeBSD platform

• Dynamic - can add/modify measurement suites, initially includes:

– Traceroute

– TReno - measures bulk transfer thruput

– Poip - one way ping

Asymmetric One-way Delays

[Figure: Loss (0% to 20%) and Delay (0 ms to 300 ms), x-axis 0 to 24, in two panels: Advanced to U Chicago and U Chicago to Advanced]

NIMI

• Deployed at PSC, LBL, FNAL, platforms being configured at SLAC & CERN

• As NIMI becomes more real will start to use as infrastructure for IPPM Surveyors

• Security

– allows full policy control over any box you own or delegation of all or subsets

– uses ACLs with authentication for requests, and encryption to prevent sniffing

Summary

• Deployment Development

• Internet Performance

• Next Steps

• Lots of collaboration:

– SLAC & HEPNRC

– 14 collection sites, ~ 400 remote sites

– Collection site tools CERN & CNAF/ICFA

– Oxford/TracePing

– MapPing/MAPNet/NLANR

– TRIUMF Traceroute topology Map

– NIMI/LBNL & Surveyor/IETF

– XIWT/IPWT

– Talks at IETF, XIWT, ICFA, ESCC ...

More Information

• ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code)

– http://www.slac.stanford.edu/xorg/icfa/ntf/home.html

• WAN Monitoring at SLAC has lots of links

– http://www.slac.stanford.edu/comp/net/wan-mon.html

• Tutorial on WAN Monitoring

– http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

• MapPing Tool:

– http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html

• NIMI

– http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html
