Network Performance Monitoring in EGEE
Jeremy Nowell, EPCC
5th TERENA NRENs and Grids Workshop, Paris, 11-12 June 2007
[email protected]
www.egee-npm.org

EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks



Overview

• EGEE Overview
• Motivation and Requirements for NPM in EGEE
• Strategy
• Architecture
• Tools and data available
• Diagnostic Tool walkthrough
• Issues and Observations
• Conclusions


EGEE Overview

• EGEE
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II
  – 1 April 2006 – 31 March 2008
  – > 90 partners in 32 countries
• Objectives
  – Large-scale, production-quality infrastructure for e-Science
  – Improving and maintaining “gLite” Grid middleware
  – Attracting new resources and users from industry as well as science


Why NPM for Grids?

• For Grid operations
  – Help diagnose performance problems between sites
      This transfer is slow, what’s broken? – the network, the server, the middleware…
      I can’t see site X, has the network gone down or just the cluster head-node?
      My application’s performance varies with time of day – is there a network bottleneck?
• For Grid middleware
  – I want to increase the performance of file transfers between sites
  – I want to know which compute site is “closest” to my data to submit a job to it
• What’s different about NPM for the Grid?
  – Large amounts of application data, often continuous
  – Multiple streams
  – End-to-end performance crucial


NPM User Requirements

Middleware
• Programmatic interface
  – Web service
  – Database
• Info for 100 paths returned in 0.2s
• Relate Compute/Storage Element with NPM
• Raw, historical data for 24 hrs
• Mainly end-to-end data

Operation Centres
• NOCs and GOCs
  – Web-based GUI
  – Interface to define alarms
  – On-demand & historical data
  – Backbone & end-to-end data
• NOCs
  – Display which tool gathered the results and how
  – Per hop data/ability to zoom in
• GOCs
  – High-level statistics


NPM Metric Requirements

Groups: Middleware, NOC, GOC. Each row gives the "Yes" marks for that metric. Where fewer than three marks appear, the extracted text does not record which groups they belong to.

Metric / Info                     Relevant to group
TCP Achievable Bandwidth          Yes Yes
Packet-loss                       Yes Yes Yes
Round-trip time                   Yes Yes Yes
Round-trip IPDV                   Yes Yes
One-way delay                     Yes Yes
One-way delay variation           Yes
Available bandwidth (path)        Yes Yes Yes
Available bandwidth (hop)         Yes
Packet reordering                 Yes Yes
Hop/list network topology         Yes Yes
Availability                      Yes Yes
Path MTU                          Yes Yes
QoS Class                         Yes Yes
Service Level Agreement           Yes Yes
On-demand test on all metrics     Yes Yes


NPM General Requirements

• Scale and heterogeneity of EGEE fabric poses a requirement to support diversity of all kinds
  – Multitude of ways of collecting monitoring data
      Different measurement types
        • End-to-end
            o Appropriate to experience of user and application, e.g. TCP achievable bandwidth
        • Backbone
            o Lower level measurements, used to pin-point source of problems
      Different measurement tools
      Different data formats
  – Many administrative domains
  – Different user groups


Strategy

• Aim to standardise access to NPM data across different domains and frameworks
  – Note – we are not building measurement tools, but rather facilitating access to data collected by them
• Interoperability pursued through use of OGF NM-WG
  – EGEE should not and cannot aim to enforce the uptake of a specific NPM framework across the diverse EGEE fabric or the associated networks
  – Use NM-WG interfaces where they have been adopted; facilitate their use elsewhere.

[Diagram, three layers. End Users of Network Data: Resource-brokering Middleware, NOC/GOC User. NPM Clients and Services. Monitoring Frameworks: NREN using PerfSONAR, Backbone using PerfSONAR, End-sites using e2emonit, Home-grown Framework.]


NPM Architecture

[Diagram: "Some Client" connected to several monitoring frameworks, each exposing an NM-WG interface: Backbone Perfmonit, Backbone PiPEs, Backbone PerfSONAR, End-site Home-grown, End-site e2emonit. Detail: E2emonit Monitoring Framework, E2emonit Service, CapDiscovery, NM-WG v1.]

• Single point of contact
• Standard interface
• Insulation from framework interface changes
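The three properties listed on this slide (single point of contact, standard interface, insulation from framework interface changes) match a conventional adapter layout. The following is a minimal Python sketch of that idea only. Every class name, method name and sample value is hypothetical, and none of it is the NM-WG schema or the e2emonit or PerfSONAR API.

```python
from abc import ABC, abstractmethod


class FrameworkAdapter(ABC):
    """Hypothetical common interface that each framework is wrapped behind."""

    @abstractmethod
    def metrics(self):
        """Return the metric names this framework can report."""

    @abstractmethod
    def fetch(self, metric, src, dst):
        """Return a list of (timestamp, value) samples for one path."""


class E2emonitAdapter(FrameworkAdapter):
    # Canned data stands in for a real call to the framework.
    def metrics(self):
        return ["round_trip_time", "achievable_bandwidth"]

    def fetch(self, metric, src, dst):
        return [(0, 1.0), (60, 2.0)]


class PerfsonarAdapter(FrameworkAdapter):
    # Canned data again; only the adapter would change if the framework did.
    def metrics(self):
        return ["utilisation"]

    def fetch(self, metric, src, dst):
        return [(0, 0.25)]


class Mediator:
    """Single point of contact: clients never call a framework directly."""

    def __init__(self, adapters):
        self._adapters = list(adapters)

    def query(self, metric, src, dst):
        # Use the first adapter that advertises the requested metric.
        for adapter in self._adapters:
            if metric in adapter.metrics():
                return adapter.fetch(metric, src, dst)
        raise LookupError(f"no framework reports {metric!r}")


mediator = Mediator([E2emonitAdapter(), PerfsonarAdapter()])
samples = mediator.query("utilisation", "site-a", "site-b")
```

In this layout a change to one framework's native interface is absorbed by its adapter, so client code that calls `Mediator.query` stays the same.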


What’s available - Software

• Clients
  – The Diagnostic Tool (DT)
      For use by people
  – The Publisher
      For use by middleware
• Middleware
  – Mediator/Discoverer
• Monitoring Frameworks
  – e2emonit
      Formerly EDG::WP7
      Provided and maintained by NPM team
  – PerfSONAR
  – LHC-OPN
      Soon?

[Diagram: an NM-WG v2 Client above two stacks. PerfSONAR Monitoring Framework, PerfSONAR Translation Service, CapDiscovery, NM-WG v1 (data from GÉANT2). E2emonit Monitoring Framework, E2emonit Service, CapDiscovery, NM-WG v1 (data from EGEE PPS).]


What’s available - Metrics

• Data depends on which tools you use!
  – We will allow access to any relevant data, provided it is available using an OGF NM-WG compliant interface
• e2emonit
  – ping
      Connectivity
        • Round trip time, packet loss
  – iperf
      Real life application performance
        • TCP achievable bandwidth
  – udpmon
      Network health, congestion etc
        • UDP achievable bandwidth, one-way delay, UDP packet loss
• PerfSONAR
  – Developed by GÉANT, Internet2 and ESNet
  – Currently accessing utilisation data
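As an illustration of the ping-derived metrics named above (round trip time and packet loss), the sketch below summarises one batch of probes. It is not e2emonit code: the function name, the input format (a list of round-trip times in milliseconds, with `None` for a probe that got no reply) and the sample values are assumptions made for this example.

```python
def summarise_probes(rtts_ms):
    """Summarise one batch of probes; None marks a probe with no reply."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    # Packet loss is the fraction of probes that were never answered.
    loss = (sent - len(replies)) / sent if sent else 0.0
    # Mean round trip time is only defined if at least one reply arrived.
    mean = sum(replies) / len(replies) if replies else None
    return {"sent": sent, "loss_fraction": loss, "mean_rtt_ms": mean}


summary = summarise_probes([20.0, 22.0, None, 24.0])
# One of four probes lost, mean of the three replies is 22.0 ms.
```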


Data Federation

• Use of NM-WG schema facilitates federation
  – e2emonit from EGEE sites
  – e2emonit from related projects – BalticGrid
  – PerfSONAR Measurement Archives
      Currently via translation layer
• Currently adopting version 2 of the NM-WG schema
  – Will allow access to more data sources
      Gridmon (UK GridPP)
      Other PerfSONAR components
        • E2E layer 2 link status (relevant for LHC-OPN)
        • Measurement Archives through native interface
        • BWCTL, OWAMP Measurement Points
      Others – RRD based, flow etc?


DT Usage (1)

• Step 1: Access the NPM Diagnostic Tool.
  – The Diagnostic Tool can be accessed using a standard web browser; users are individually authorised to use it.
      • In the future, we plan to use VOMS for authorisation.
      • Please mail us for access!
  – The intended user is a NOC/GOC/ROC operator


DT Usage (2)

• Step 2: Select a Time.
  – The end-user does not have a specific time, but wants to see the performance for the past four weeks.
  – The user enters the appropriate time range, specifying a Start date/time of 2007-05-01 00:00:00 and a period of 4 weeks.
  – The user presses the Set button to confirm and the alternate time range representations update.
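The slide does not say which alternate representations the tool shows. One plausible pair is start plus period versus start plus end, and the arithmetic for the values in this step can be sketched as follows. The function name is hypothetical and this is not the Diagnostic Tool's code.

```python
from datetime import datetime, timedelta


def time_range(start, period):
    """Convert a (start, period) pair into the equivalent (start, end) pair."""
    return start, start + period


start, end = time_range(datetime(2007, 5, 1, 0, 0, 0), timedelta(weeks=4))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
# prints "2007-05-01 00:00:00 to 2007-05-29 00:00:00"
```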


DT Usage (3)

• Step 3: Select a Path.
  – The end-user wants to see the performance for the path between Cyfronet in Krakow and CERN.
  – The user selects e2emonit sites at Cyfronet and CERN, adds the path and then selects “Find Data For This Query”


DT Usage (4)

• Step 4: Select a Metric.
  – The end-user experienced throughput problems.
  – Although there are several possibly relevant metrics to choose from (and only those measured are available to select from), the user decides to look at the Achievable Bandwidth on the path.
  – Achievable Bandwidth is selected from the Metrics box and the Set button pressed to confirm.


DT Usage (5)

• Step 5: Select a Statistic.
  – Several types of statistical data are available, such as Minimum, Maximum, Mean.
  – A particular interval can be applied to each, to provide, for example, an hourly mean over the past two days.
  – The user just wants a general overview of measurements and elects to retrieve raw data (Statistic check-box not checked).
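To make the "statistic over an interval" idea concrete, the sketch below groups raw samples into fixed-length intervals and computes the minimum, maximum and mean of each. It assumes samples are `(timestamp_in_seconds, value)` pairs with integer timestamps. The function name and the sample data are invented for illustration, and the way the Diagnostic Tool actually computes its statistics is not described in the slides.

```python
from collections import defaultdict


def interval_stats(samples, interval_s):
    """Group (timestamp_s, value) samples into fixed-length intervals."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Key each sample by the start of the interval it falls in.
        buckets[ts - ts % interval_s].append(value)
    return {
        start: {"min": min(v), "max": max(v), "mean": sum(v) / len(v)}
        for start, v in sorted(buckets.items())
    }


# Four made-up bandwidth readings taken every 30 minutes.
raw = [(0, 90.0), (1800, 110.0), (3600, 80.0), (5400, 60.0)]
hourly = interval_stats(raw, 3600)
```

With these inputs the first hour has a mean of 100.0 and the second a mean of 70.0. Leaving the Statistic check-box unchecked corresponds to using `raw` directly.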


DT Usage (6)

• Step 6: Select a View.
  – Currently Data Table and Time Plot views are available.
  – The user wants an overview of how the Achievable Bandwidth has changed over time, so selects the Time Plot.
  – The Query entry is complete, and the user selects Submit Query.
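At this point the query has five parts, collected in steps 2 to 6: time range, path, metric, statistic and view. A small record type is one way to picture the completed query. The class and field names below are hypothetical and do not reflect the Diagnostic Tool's internal representation or the NM-WG request format. The values are the ones used in the walkthrough.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

VIEWS = ("Data Table", "Time Plot")  # the two views named on the slide


@dataclass(frozen=True)
class DTQuery:
    start: datetime
    period: timedelta
    source: str
    destination: str
    metric: str
    statistic: Optional[str]  # None means raw data (check-box not checked)
    view: str

    def validate(self):
        """Reject queries that could not have been entered in the form."""
        if self.period <= timedelta(0):
            raise ValueError("period must be positive")
        if self.view not in VIEWS:
            raise ValueError(f"unknown view {self.view!r}")
        return self


query = DTQuery(
    start=datetime(2007, 5, 1),
    period=timedelta(weeks=4),
    source="Cyfronet",
    destination="CERN",
    metric="Achievable Bandwidth",
    statistic=None,
    view="Time Plot",
).validate()
```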


DT Usage (7)

• Step 7: Examine results.
  – The results are plotted, with Time on the x-axis and Achievable Bandwidth on the y-axis.
  – The parameters used to gather measurements are shown - here, showing that the iperf tool was used to gather the achievable bandwidth information.
  – These parameters can be useful in interpreting the results.


DT Usage (8)

  – Information from multiple paths may be plotted at the same time.
  – Here utilisation data for the GÉANT2 to JANET router is plotted for both inbound and outbound traffic over the course of one week, obtained from the GÉANT2 PerfSONAR Measurement Archive.


Issues and Observations

• Providing data federation tools is usually not enough by itself
  – Sites will not necessarily have any monitoring data available, so they still need guidance to install monitoring tools
      Those that do have monitoring may not know about it
• Deployment of monitoring tools is not easy
  – There has to be a clear benefit to the site before they install tools
  – This benefit is not obvious until after an incident has occurred, by which time it is too late…
  – Firewall changes may be difficult (e.g. ICMP blocked by default)
  – Tools need to be trivial to install and robust when running
  – Need to carefully consider scheduling for end-to-end tests
• Different user groups may have widely different requirements for displaying data
  – e.g. site or service admins may just want an alarm that tells them “your network is broken”, and never look at the DT
  – Network people would not contemplate investigating problems without clear historical data to look at
• The network is still assumed by many to “just work”


Conclusions

• Providing federated access to network measurement data is an interesting technical challenge, but achievable
  – Facilitated by standards such as the OGF NM-WG schema
• Getting access to the data itself is much harder
  – Deployment challenge
  – Need to “sell” to sites the value of having data available