http://gridmon.dl.ac.uk/nfnn/ NFNN2, 20th-21st June 2005 National e-Science Centre, Edinburgh Diagnostic Steps Les Cottrell – SLAC Presented at the Networks for Non Networkers 2 nd International Workshop, 21-22 June 2005, Edinburgh, Scotland http://www.slac.stanford.edu/grp/scs/net/talk05/nfnn2-jun05.ppt Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP


http://gridmon.dl.ac.uk/nfnn/

NFNN2, 20th-21st June 2005
National e-Science Centre, Edinburgh

Diagnostic Steps

Les Cottrell – SLAC
Presented at the Networks for Non Networkers 2nd International Workshop, 21-22 June 2005, Edinburgh, Scotland
http://www.slac.stanford.edu/grp/scs/net/talk05/nfnn2-jun05.ppt

Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP


Overview

Goal: provide a practical guide to debugging common problems (Brian covered high performance problems)

Why is diagnosis difficult yet important?
Local host
Ping, Traceroute, PingRoute
Looking at time series
Locating bottlenecks
Correlation of problems with routes
More tools and problems
Where is a node
Who do you tell, what do you say?
Case studies and More Information


Why is diagnosis difficult?

Internet's evolution as a composition of independently developed and deployed protocols, technologies, and core applications
Diversity, highly unpredictable, hard to find “invariants”
Rapid evolution & change, no equilibrium so far
  Findings may be out of date
Measurement/diagnosis not high on vendors' list of priorities
  Resources/skill focus on more interesting and profitable issues
  Tools lacking or inadequate
  Implementations are flaky & not fully tested with new releases


Add to that …

Distributed systems are very hard
  "A distributed system is one in which I can't get my work done because a computer I've never heard of has failed." Butler Lampson
Network is deliberately transparent
The bottlenecks can be in any of the following components:
  the applications
  the OS
  the disks, NICs, bus, memory, etc. on sender or receiver
  the network switches and routers, and so on
Problems may not be logical
  Most problems are operator errors, configurations, bugs
When building distributed systems, we often observe unexpectedly low performance
  the reasons for which are usually not obvious
Just when you think you’ve cracked it, in steps security
  Firewall, NAT boxes etc.
  Block pings, traceroute looks like port scan, diagnostic tool ports are blocked …
  ISPs worried about providing access to core, making results public, & privacy issues


Sources of problems

Host “errors”
  TCP buffers, heavy utilization …
Duplex mismatch (Ethernet)
Misconfigured router/switches
  Including routing errors, especially for backup paths
Bad equipment, wiring/fiber problem
Congestion


Local Host (also see NDT later)

Usual Unix tools (uname -a, top, vmstat, iostat ..)
Is the host overloaded, do you have a gateway (route), name server (nslookup), which interface are you using (mii-tool (needs root), gives duplex & speed = common error source)
Net: ifconfig -a (look at errors), netstat -a
Is server running (if you know port)?

  >telnet localhost 2811
  Trying 127.0.0.1
  220 aftpexp04.bnl.gov GridFTP Server 1.12 GSSAPI type Globus/GSI wu-2.6.2 (gcc32dbg, 1069715860-42) ready.
  ^]
  telnet> quit
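The telnet check above can also be scripted. The following is a minimal sketch, not part of the original talk: the function name `probe_port` and its timeout default are illustrative assumptions. It opens a TCP connection and reads whatever greeting the server sends, much as telnet shows the GridFTP banner.

```python
import socket

def probe_port(host, port, timeout=3.0):
    """Return (is_open, banner); banner is the server's greeting or ''."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                data = sock.recv(1024)
            except socket.timeout:
                data = b""  # port is open, but no greeting arrived in time
            return True, data.decode("ascii", errors="replace").strip()
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return False, ""
```

For the case on the slide one would call `probe_port("localhost", 2811)`. A result of `(True, "")` still means something is listening; many servers say nothing until spoken to.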


Local Host - LISA

Localhost Information Service Agent
LISA is a Java Web Start application which provides:
  Integration with MonALISA
  Complete monitoring of the system (Load, CPU, Memory, Disk, Disk IO, Paging, Processes, Network Traffic and Connectivity...). History and instantaneous
  Filters to trigger actions when predefined conditions are detected.
  A user friendly GUI to present the monitoring information.
  Optimization modules for distributed applications.
  It is a lightweight application that can be easily deployed on any system.
  Modules for End to End network measurements (e.g. IPERF).
See monalisa.caltech.edu/dev_lisa.html


Ping

Ping to localhost, ping to gateway, ping to well known host & to relevant remote host
Use IP address to avoid nameserver problems
Look for connectivity, loss, RTT, jitter, dups
May need to run for a long time to see some pathologies (e.g. bursty loss due to DSL loss of sync)
Try flood pings if suspect rate limited
Use synack or sting if ICMP blocked
  www-iepm.slac.stanford.edu/tools/synack/
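When ICMP is blocked, timing a TCP handshake gives a rough RTT. The sketch below illustrates that idea only; it is not the synack or sting tool, and the function names are assumptions. Unlike those tools it treats a refused connection as a lost probe. Jitter is computed here as the mean absolute difference between consecutive RTTs, which is one of several definitions in use.

```python
import socket
import time

def tcp_connect_rtt_ms(host, port, timeout=2.0):
    """Time one TCP handshake to host:port; return ms, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None

def summarize(samples):
    """Summarize RTT samples in ms; None entries count as lost probes."""
    ok = [s for s in samples if s is not None]
    sent = len(samples)
    summary = {
        "sent": sent,
        "received": len(ok),
        "loss_pct": 100.0 * (sent - len(ok)) / sent if sent else 0.0,
    }
    if ok:
        summary.update(min=min(ok), avg=sum(ok) / len(ok), max=max(ok))
        diffs = [abs(b - a) for a, b in zip(ok, ok[1:])]
        summary["jitter"] = sum(diffs) / len(diffs) if diffs else 0.0
    return summary
```

Pass an IP address rather than a name, as the slide advises, so that name lookup time does not inflate the measurement.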


Ping example

  syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com
  PING thumper.bellcore.com (128.96.41.1): 64 data bytes
  72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
  72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
  72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
  72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
  72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
  --- thumper.bellcore.com ping statistics ---
  6 packets transmitted, 5 packets received, 16% packet loss
  round-trip min/avg/max = 482.1/880.5/1447.4 ms

Callouts: repeat count (-c 6); packet size (-s 64); remote host; RTT (time=); missing seq # (icmp_seq=1); summary (last two lines)
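The same reading of the ping output can be done by a script. This is a sketch under assumptions: it expects the BSD-style output format shown on this slide, assumes sequence numbers start at 0 (some ping implementations start at 1, hence the `first_seq` parameter), and the function name `parse_ping` is made up for illustration.

```python
import re

SAMPLE = """\
PING thumper.bellcore.com (128.96.41.1): 64 data bytes
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
--- thumper.bellcore.com ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
"""

def parse_ping(output, first_seq=0):
    """Return RTTs by sequence number, missing sequence numbers, and loss %."""
    replies = {int(seq): float(ms) for seq, ms in
               re.findall(r"icmp_seq=(\d+).*?time=([\d.]+) ms", output)}
    sent_match = re.search(r"(\d+) packets transmitted", output)
    sent = int(sent_match.group(1)) if sent_match else len(replies)
    missing = [seq for seq in range(first_seq, first_seq + sent)
               if seq not in replies]
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    return {"rtts": replies, "missing": missing, "loss_pct": loss_pct}
```

On `SAMPLE` this reports sequence number 1 as missing and a loss of one packet in six, which ping itself prints truncated as 16%.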


3rd party ping (via Looking Glass)

Find servers: www.caida.org/analysis/routing/reversetrace/
Example: http://stats.geant.net/cgi-bin/lg/lg.cgi
Ok for checking connectivity and RTT but not for losses (unless huge)

  Looking Glass Results - ch1.ch.geant.net
  Date: Mon May 30 21:28:39 2005 GMT
  Query: Ping <IP_Addr | FQDN>
  Real Query: ping rapid count 5
  Argument(s): www.slac.stanford.edu
  PING www8.slac.stanford.edu (134.79.18.163): 56 data bytes
  !!!!!
  --- www8.slac.stanford.edu ping statistics ---
  5 packets transmitted, 5 packets received, 0% packet loss
  round-trip min/avg/max/stddev=167.316/172.212/191.222/9.506 ms


Traceroute

Traceroute to remote host
  Is the route direct, over commercial congested nets
Reverse traceroute from remote host to you or 3rd party
  www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
  www.tracert.com/
  www.caida.org/analysis/routing/reversetrace/
[Figure: CAIDA mouse-sensitive map]


Traceroute

UDP/ICMP tool to show route packets take from local to remote host

  17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk
  traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets
   1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms
   2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms
   3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms
   4 snv-slac.es.net (134.55.208.30) 1.377 ms
   5 nyc-snv.es.net (134.55.205.22) 75.536 ms
   6 nynap-nyc.es.net (134.55.208.146) 80.629 ms
   7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms
   8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms
   9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms
  10 207.45.205.18 (207.45.205.18) 128.648 ms
  11 210.56.31.94 (210.56.31.94) 762.150 ms
  12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms
  13 *
  14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms

Callouts: probes/hop (-q 1); max hops (-m 20); remote host; no response (*): lost packet or router ignores; long delay: satellite; location
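Spotting the hop where the delay jumps (hop 10 to hop 11 above) can be automated. The sketch below is an assumption-laden illustration: it handles only the single-probe (`-q 1`) output format shown here, and the names `parse_hops` and `largest_rtt_jump` are invented. Per-hop RTTs are noisy because routers may answer probes slowly, so a large jump is a hint, not proof.

```python
import re

SAMPLE = """\
 9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms
10 207.45.205.18 (207.45.205.18) 128.648 ms
11 210.56.31.94 (210.56.31.94) 762.150 ms
12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms
13 *
14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+([\d.]+) ms",
                    re.MULTILINE)

def parse_hops(output):
    """Return [(hop, name, ip, rtt_ms)] for the hops that answered."""
    return [(int(h), name, ip, float(rtt))
            for h, name, ip, rtt in HOP_RE.findall(output)]

def largest_rtt_jump(hops):
    """Return (from_hop, to_hop, delta_ms) for the biggest RTT increase."""
    best = None
    for prev, cur in zip(hops, hops[1:]):
        delta = cur[3] - prev[3]
        if best is None or delta > best[2]:
            best = (prev[0], cur[0], delta)
    return best
```

Hops that print only `*` are skipped, so the jump is measured between the nearest hops that did respond.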


RTT from California to world

[Figure: RTT (ms) versus longitude (degrees), and frequency distribution of RTT (ms). Source = Palo Alto CA, W. Coast. Labelled groups: W. Coast US, E. Coast US, Europe & S. America, Europe, Brazil. Reference markers: 300ms, 0.3*0.6c. Data from CAIDA Skitter project]
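A lower bound on RTT from distance alone helps judge whether a measured RTT is plausible. The sketch below is an assumption, not taken from the talk: it treats 0.6c as the signal speed in the medium and reads the figure's 0.3*0.6c label as a further efficiency factor for indirect paths. The extracted slide text does not explain that label, so both factors are left as parameters.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum

def min_rtt_ms(distance_km, speed_fraction=0.6, path_efficiency=1.0):
    """Propagation-only round-trip time in ms for a given one-way distance.

    speed_fraction: signal speed as a fraction of c (assumed 0.6 here).
    path_efficiency: 1.0 for a straight path; smaller for indirect routes.
    """
    effective_speed = C_KM_PER_S * speed_fraction * path_efficiency
    return 2.0 * distance_km / effective_speed * 1000.0
```

For example, 9000 km at 0.6c gives roughly 100 ms, and the same distance with `path_efficiency=0.3` gives a bit over 330 ms. Queueing and routing detours add to these figures, and a satellite hop adds far more.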


Traceroute server results

Example: www.slac.stanford.edu/cgi-bin/nph-traceroute.pl
[Screenshot callouts: security warning; traceroute; related info; enter IP address or name]


Pingroute

Ping routers along route, e.g. a tool to install that helps:
  www.slac.stanford.edu/comp/net/fpingroute.pl or www.slac.stanford.edu/comp/net/fpingroute.pl if fping N/A

  15cottrell@noric04:~>fpingroute.pl
  fpingroute.pl does a traceroute to the selected host. For each of the hops along the route it then uses fping to ping each node (in parallel) 'count' times. Output includes traceroute information, RTTs, losses for 100 and 'size' byte pings.
  Version=0.21, 8/24/04
  Usage: fpingroute.pl [Opts] host
    where host is the remote host's IP address or name e.g. www.slac.stanford.edu
  Opts: [-c count default=10] [-s size default=1400] [-i initial default=1]
  Example: fpingroute.pl -i 3 -c 10 -s 1400 www.triumf.ca


Pingroute example

May help tell where losses start
Will need many pings if losses small
[Screenshot callouts: routers may not respond; start of losses?; but?; start of sustained losses]
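The distinction between "start of losses?" and "start of sustained losses" can be expressed as a rule: loss at one router that disappears at later hops is probably that router deprioritizing pings, whereas loss that persists to the end of the path points at a real problem. The sketch below encodes that heuristic. It is an illustration, not fpingroute.pl's logic, and the input format is an assumption.

```python
def first_sustained_loss(hops, threshold_pct=0.0):
    """Find where loss begins and then persists to the end of the path.

    hops: list of (name, loss_pct) in path order; loss_pct is None when
    the router did not respond at all.
    Returns the index of the first hop of the final lossy run, or None.
    """
    start = None
    for i, (_, loss) in enumerate(hops):
        if loss is None:
            continue  # silent router: no evidence either way
        if loss > threshold_pct:
            if start is None:
                start = i
        else:
            start = None  # a clean later hop clears the earlier suspicion
    return start
```

With small loss rates the per-hop figures are only meaningful after many pings, as the slide notes.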


Look at time series

Look at history plots (PingER, AMP, IEPM-BW, ISPs, own border router etc.), when did problem start, how big an effect is it?
  Assumes you know “proximity” of paths for which there are archived active measurements to the path that you are interested in
  Also that relevant measurements exist
    www-iepm.slac.stanford.edu/pinger/
    amp.nlanr.net/
    ISPs plots:
      – Abilene: http://stryper.uits.iu.edu/abilene/
      – GEANT: http://stats.geant.net/usagemap/usagemap
      – RIPE: http://www.ripe.net/projects/ttm/Plots/
      – ESnet: http://measurement.es.net/ (OWAMP)
  Collaboration between Internet2/ESnet/Geant to provide access to router measurements holds promise
Look at traceroute histories (see later)


Example time series

Look for change in measured value
Note time
Correlate
[Figure annotation: Italy disconnected]
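Looking for a change in a measured value can be approximated by comparing the average just before and just after each point. The sketch below is a simple illustration with made-up defaults (`window=5`, `min_ratio=1.5`), not the method used by any tool named in the talk. Because it uses means, a single outlier can trigger it; real data usually needs more robust statistics.

```python
def find_step_change(values, window=5, min_ratio=1.5):
    """Return the index where the series level changes most, or None.

    Compares the mean of `window` samples before index i with the mean of
    `window` samples from i onward, and reports the index with the largest
    ratio, provided that ratio is at least min_ratio.
    """
    best_index, best_ratio = None, 0.0
    for i in range(window, len(values) - window + 1):
        before = sum(values[i - window:i]) / window
        after = sum(values[i:i + window]) / window
        low, high = sorted((before, after))
        if low <= 0:
            continue  # ratio undefined; skip
        ratio = high / low
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index if best_ratio >= min_ratio else None
```

The returned index gives the time to note and to correlate with route changes or outage reports.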


Find location of a bottleneck

Look at hops along the path
  Pingroute (see earlier)
  If possible look at utilizations or active probes launched from there
Pipechar (son of pathchar, pchar)
  Send packets of varying sizes to each router along path
  Look at RTT as a function of packet size
  From slope deduce “bandwidth”
  Differentiate to find capacity at each hop
  However pchar is no longer supported, pathchar is very slow, pipechar has uncertain support (ask Brian)
  Packet size variation limited to 1-MTU (~1500) bytes, so on fast links timing is difficult, with the result that estimates may not be reliable
    – Find pipechar at: http://www.dsd.lbl.gov/OldProjects/NCS/
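The slope-and-difference idea can be sketched as follows. This is an illustration of the general pathchar-style technique under a simplifying assumption, not pipechar's actual algorithm: the minimum RTT to hop i is modelled as a constant plus packet size times the sum of 1/bandwidth over the links up to hop i, with the reply taken as small and fixed in size. The slope to hop i minus the slope to hop i-1 is then the per-byte time of link i.

```python
def fit_slope(sizes, rtts):
    """Least-squares slope of RTT (seconds) against packet size (bytes)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(rtts) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, rtts))
    return sxy / sxx

def link_capacities_bps(sizes, min_rtts_per_hop):
    """Estimate each link's capacity in bits/s.

    min_rtts_per_hop[i][k] is the minimum RTT in seconds observed to hop i
    with probe size sizes[k]. Returns None for a link whose slope
    difference is not positive (too noisy to estimate).
    """
    capacities = []
    prev_slope = 0.0
    for rtts in min_rtts_per_hop:
        slope = fit_slope(sizes, rtts)
        delta = slope - prev_slope  # seconds per byte added by this link
        capacities.append(8.0 / delta if delta > 0 else None)
        prev_slope = slope
    return capacities
```

On a fast link the slope difference is tiny compared with timing noise, which is the unreliability the slide warns about.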


Divide & Conquer

Abilene has hosts at major PoPs running bwctl
So make measurements from end to middle to ID loss of performance
http://e2epi.internet2.edu/pipes/ami/bwctl/


Correlate with routes (traceanal)


Visualizing traceroutes

One compact page per day
One row per host, one column per hour
One character per traceroute to indicate pathology or change (usually period (.) = no change)
Identify unique routes with a number
  Be able to inspect the route associated with a route number
Provide for analysis of long term route evolutions
[Screenshot callouts: route # at start of day, gives idea of route stability; multiple route changes (due to GEANT), later restored to original route; period (.) means no change]
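The encoding described above (a number for each unique route, a period when nothing changed) can be sketched in a few lines. This is an illustration of the idea only: the function name and data layout are assumptions, and the real display also encodes pathologies, which this sketch omits.

```python
def encode_routes(traceroutes, known_routes=None):
    """Encode a time-ordered list of routes as one symbol per traceroute.

    Each route is a sequence of hop addresses. The first traceroute, and
    every traceroute whose route differs from the previous one, is shown
    as its route number; an unchanged route is shown as '.'.
    Returns (symbols, known_routes) so that numbering can carry over
    from one day to the next.
    """
    known = list(known_routes or [])
    symbols = []
    previous = None
    for route in traceroutes:
        route = tuple(route)
        if route not in known:
            known.append(route)  # a new unique route gets the next number
        if previous is not None and route == previous:
            symbols.append(".")
        else:
            symbols.append(str(known.index(route)))
        previous = route
    return symbols, known
```

Keeping `known` between days is what lets a route number at the start of the day indicate stability: a low, unchanged number means the path has been the same for a long time.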


Changes in network topology (BGP) can result in dramatic changes in performance

Snapshot of traceroute summary table
Samples of traceroute trees generated from the table
ABwE measurement one/minute for 24 hours Thurs Oct 9 9:00am to Fri Oct 10 9:01am

[Figure: bandwidth (Mbits/s) versus hour for each remote host. Series: Dynamic BW capacity (DBC); Cross-traffic (XT); Available BW = (DBC-XT). Annotations: drop in performance (from original path SLAC-CENIC-Caltech to SLAC-ESnet-LosNettos (100Mbps)-Caltech); back to original path; changes detected by IEPM-Iperf and ABwE; ESnet-LosNettos segment in the path (100 Mbits/s)]

Notes:
1. Caltech misrouted via Los-Nettos 100Mbps commercial net 14:00-17:00
2. ESnet/GEANT working on routes from 2:00 to 14:00
3. A previous occurrence went un-noticed for 2 months
4. Next step is to auto detect and notify


Moving towards application

See Brian’s talk
Try user application (mem to mem & disk to disk)
  GridFTP, bbcp, bbftp …
Iperf or thrulay (also provides RTT) to test TCP or UDP throughput
  dast.nlanr.net/Projects/Iperf/
  www.internet2.edu/~shalunov/thrulay/
NDT
  What are the interface speeds?
  What is the bottleneck?
  Is there a duplex mismatch?
  Are buffers set right (both ends)?


NDT example (Rich Carlson)


Other tools

Ntop
  Summarizes libpcap (sniffer) info
Internet2 Detective: Tests connectivity to I2, bandwidth, multicast, IPv6
  Can run as Java applet
  http://detective.internet2.edu/
NLANR Internet Advisor
Ethereal, tcpdump, snoop for masochists
Passive tools:
  Netflow for characterizing network, spotting abnormalities, e.g.
    www.itec.oar.net/abilene-netflow
    www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html
  SNMP based tools


And then …

Wireless
  Avoid peer-to-peer/ad-hoc connections
    Disable connecting to ad-hoc (set infrastructure only)
    Disable bridging
    How to do it varies by OS (XP, OSX, Linux)
  Ad hoc can still interfere if on same channel
  Tools to locate an access point (e.g. Yellow-Jacket)
  See www2.slac.stanford.edu/comp/net/wireless/Wireless-Meeting-Handout.mht
NAT boxes may block or not support application
Private addresses:
  10.0.0.0 - 10.255.255.255 a single class A net
  172.16.0.0 - 172.31.255.255 16 contiguous class Bs
  192.168.0.0 - 192.168.255.255 256 contiguous class Cs
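A quick way to tell whether an address seen in a traceroute or log sits behind a NAT is to test it against the three private ranges listed above. The sketch below writes them in prefix notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), which is equivalent to those ranges; the function name is an assumption.

```python
import ipaddress

# The three private ranges from the slide, in prefix notation.
PRIVATE_NETS = [ipaddress.ip_network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_private_address(addr):
    """Return True if addr lies in one of the three private IPv4 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)
```

The boundary of the middle range is the one most often misremembered: 172.31.255.255 is private, 172.32.0.0 is not.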


“Where is” a host?

Beware some of information following is ephemeral, in general use heuristics with Google
Google “Internet country codes” for TLDs
  Host may not be in TLD country, especially developing regions often use proxies elsewhere
Location may be encoded in router name
  ipls=Indianapolis, snv=Sunnyvale …
Name server lookup to find hostname given IP address

  47cottrell@netflow:~>nslookup 210.56.16.10
  Server: localhost
  Address: 127.0.0.1
  Name: lhr.comsats.net.pk
  Address: 210.56.16.10

Use a whois server, e.g.
  www.networksolutions.com/cgi-bin/whois/whois (Americas & Africa)
  www.ripe.net/cgi-bin/whois (Europe)
  www.apnic.net/ (Asia)
  May identify site name, address, contact, etc, not all domains are in databases (e.g. will not find comsats.net.pk)
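The router-name heuristic can be sketched as a table lookup on the pieces of a hostname. The table below holds only the two codes given on the slide; anything else would have to be added by hand, and the function name is an assumption. Names such as nyc-snv.es.net mention both ends of a link, so a match says only that the router is associated with that place.

```python
import re

# Only the codes mentioned on the slide; extend with care.
POP_CODES = {"ipls": "Indianapolis", "snv": "Sunnyvale"}

def guess_location(hostname, codes=POP_CODES):
    """Return the first known location code found in a router name, or None."""
    for label in re.split(r"[.\-]", hostname.lower()):
        token = label.rstrip("0123456789")  # e.g. a trailing unit number
        if token in codes:
            return codes[token]
    return None
```

For instance `guess_location("snv-slac.es.net")` returns "Sunnyvale", while a name with no known code returns None.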


“Where is” a host – cont.

Find the Autonomous System (AS) administering
  Form giving AS for domain name
    http://www.fixedorbit.com/search.htm
    Gives AS number, name, adjacent AS’s, web page for AS
  Given an AS find out more about it:
    Use http://bgp.potaroo.net/cidr/ go to bottom and enter AS into form:
      – Gives ISP name, web page, phone number, email, hours etc.
  Review list of AS's ordered by Upstream AS Adjacency
    www.telstra.net/ops/bgp/bgp-as-upsstm.txt
    Tells what AS is upstream of an ISP


“Where is” a host - cont.

May be able to get latitude & longitude:
  http://www.hostip.info/index.html
  http://www.ip2location.com/
    But it is a subscriber service ($$$, but …), however it is probably best for developing regions
Triangulate pings from landmarks (in development)
  planetlab-01.ipv6.lip6.fr:10000/cbg.php


Who you gonna tell?

Local network support people
Internet Service Provider (ISP), usually done by local networker
  Usually will know immediate one, e.g. [email protected]
  Use puck.nether.net/netops/nocs.cgi to find ISP
  Use www.telstra.net/ops/bgp/bgp-as-upsstm.txt to find upstream ISPs
Well managed sites and ISPs maintain a list of email addresses such as abuse@ or postmaster@, that one can send email to, for example to complain about spam etc. This follows an Internet recommendation (RFC 2142).
Some less helpful sites do not provide such services, for more on these, see RFC-ignorant.org


What ya gonna tell ‘em?

Describe problem with details
  What is affected?
    Application, host OS (uname -a), NIC (ifconfig, route)
  How is it affected?
    Non responsiveness, unable to contact remote host
    Slow performance (see Brian’s talk), packet loss
  When did it start?
Send ping output between hosts
Send traceroute forward & reverse, if possible
  Maybe use -I (ICMP option)
NDT
Identify when it started
If complex think about creating web page with details
  Top, vmstat, pingroute, pipechar, application output (GridFTP, iperf)…


Web page examples: Case studies

http://www.slac.stanford.edu/grp/scs/net/case/html/
http://e2epi.internet2.edu/case-studies/


More Information

Tutorial on monitoring
  www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
RFC 2151 on Internet tools
  www.freesoft.org/CIE/RFC/Orig/rfc2151.txt
Network monitoring tools
  www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  www.caida.org/tools/taxonomy/
Network Performance Tools: an I2 Cookbook
  e2epi.internet2.edu/network-perf-wk/tools-cookbook.pdf
Network Monitoring sites
  www.slac.stanford.edu/comp/net/wan-mon/netmon.html


Pathology Encodings

Stutter

Probe type

End host not pingable

ICMP checksum

Change in only 4th octet

Hop does not respond

No change

Multihomed

! Annotation (!X)

Change but same AS


Navigation

  traceroute to CCSVSN04.IN2P3.FR (134.158.104.199), 30 hops max, 38 byte packets
   1 rtr-gsr-test (134.79.243.1) 0.102 ms
  …
  13 in2p3-lyon.cssi.renater.fr (193.51.181.6) 154.063 ms !X

  #rt# firstseen lastseen route
  0 1086844945 1089705757 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
  1 1087467754 1089702792 ...,192.68.191.83,171.64.1.132,137,...,131.215.xxx.xxx
  2 1087472550 1087473162 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
  3 1087529551 1087954977 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
  4 1087875771 1087955566 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,(n/a),131.215.xxx.xxx
  5 1087957378 1087957378 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
  6 1088221368 1088221368 ...,192.68.191.146,134.55.209.1,134.55.209.6,...,131.215.xxx.xxx
  7 1089217384 1089615761 ...,192.68.191.83,137.164.23.41,(n/a),...,131.215.xxx.xxx
  8 1089294790 1089432163 ...,192.68.191.83,137.164.23.41,137.164.22.37,(n/a),...,131.215.xxx.xxx


History Channel


AS’ information