Mark Leese (Daresbury Laboratory) Paul Mealor (University College London) 1st EGEE Conference Cork, April 2004 Network Monitoring: The GGF Perspective Mark Leese Paul Mealor


Page 1:

Network Monitoring: The GGF Perspective

Mark Leese (Daresbury Laboratory), Paul Mealor (University College London)

1st EGEE Conference, Cork, April 2004

Page 2:

Contents

Simple really:

Use cases - why this is important

What GGF is doing

Page 3:

The Grid?

Basic Grid principle:

User applications (Grid apps) submit their work to the middleware, which selects the “best” resources available to run the job.

Network performance information is essential...because…

[Diagram: Grid Apps, Middleware, Resources (SE and CE) and the Network, with sites labelled Uzbekistan and CERN]

Page 4:

Use Case 1: Resource Selection

Resource Brokers (RBs) are responsible for finding the best resource (Computing Element, CE) to be used for a job, e.g.:

Run job at B, using copy of data from A, then store results at C

All other things being equal, take into account the data access requirements of the job

Out of the list of CEs capable of running the job, use network cost function to identify the CE with the “best” data access:

Consider “best” combination of data sources and sinks, e.g. IF source data = 10 GB AND resulting data will = 100 GB THEN pick CE based on performance to result storing SE (Storage Element).

European Data Grid does something along these lines (Please, no one tell me that this is wrong)

[Diagram: Network Cost Function, labelled with file source & destination, file size, and estimated transfer time]
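The resource selection idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Job`, `transfer_seconds`, `network_cost` and `pick_ce`, the site labels, and the throughput figures are all hypothetical, and the cost is taken to be size divided by predicted throughput, which the slides do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    input_gb: float   # size of the source data to fetch
    output_gb: float  # size of the results to store


def transfer_seconds(size_gb, throughput_mbps):
    """Estimated transfer time, taking 1 GB as 8000 megabits (assumption)."""
    return size_gb * 8000.0 / throughput_mbps


def network_cost(job, ce, source_se, sink_se, mbps):
    """Estimated time to fetch input to `ce` and store results from it."""
    fetch = transfer_seconds(job.input_gb, mbps[(source_se, ce)])
    store = transfer_seconds(job.output_gb, mbps[(ce, sink_se)])
    return fetch + store


def pick_ce(job, candidates, source_se, sink_se, mbps):
    """Return the candidate CE with the lowest estimated network cost."""
    return min(
        candidates,
        key=lambda ce: network_cost(job, ce, source_se, sink_se, mbps),
    )


# Hypothetical predicted throughputs in Mbit/s, keyed by (from, to).
MBPS = {
    ("A", "B1"): 1000.0, ("B1", "C"): 100.0,
    ("A", "B2"): 100.0, ("B2", "C"): 1000.0,
}
job = Job(input_gb=10, output_gb=100)
best = pick_ce(job, ["B1", "B2"], "A", "C", MBPS)
```

With 10 GB in and 100 GB out, the link to the result-storing SE dominates the cost, so the sketch picks B2 even though B1 has the faster path to the source data.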

Page 5:

Use Case 2: Replica Selection

File replication = proven technique for improving data access

Spread multiple copies of same file across the Grid

Do you really want to get everything from CERN, every time?

Do you really want to get everything from your geographically nearest site every time?

A file has a Logical File Name (LFN) which maps to 1 or more Physical File Names (PFNs)

Replica Manager should include Replica Selection Service which uses network performance data (from somewhere) to find “best” replica.

[Diagram: Grid App, Replica Catalogue, Replica Selection and Net Mon Service, with numbered steps:]

1. LFN

2. Multiple locations (PFNs)

3. Get performance data/predictions

4. Selected replica (PFN)

5. GridFTP commands

GGF looking at formally defining these (and other) use cases
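Steps 1 to 4 of the replica selection flow might look roughly like the following Python sketch. Everything named here is an assumption for illustration: the catalogue is a plain dictionary, the `.example` hostnames and throughput predictions are made up, and "best" is taken to mean the highest predicted throughput.

```python
from urllib.parse import urlparse


def select_replica(lfn, replica_catalogue, predicted_mbps):
    """Pick the PFN whose host has the highest predicted throughput."""
    pfns = replica_catalogue[lfn]  # steps 1-2: LFN -> candidate PFNs

    def score(pfn):
        host = urlparse(pfn).hostname
        # step 3: look up performance data; unknown hosts score lowest
        return predicted_mbps.get(host, 0.0)

    return max(pfns, key=score)  # step 4: the selected replica


# Hypothetical catalogue and predictions (Mbit/s towards the client).
CATALOGUE = {
    "lfn:run42.dat": [
        "gsiftp://se.cern.example/data/run42.dat",
        "gsiftp://se.ral.example/data/run42.dat",
    ]
}
PREDICTIONS = {"se.cern.example": 80.0, "se.ral.example": 400.0}

chosen = select_replica("lfn:run42.dat", CATALOGUE, PREDICTIONS)
```

Step 5, the actual GridFTP transfer of the chosen PFN, is left out of the sketch.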

Page 6:

How are GGF addressing problem?

Patience ;-) First we must look at web services.

Essentially, an online application accessed using XML...

…which makes it easier for other apps to use yours…

…which allows the Grid middleware to access our data

[Diagram: UDDI registry, WSP and Client, with numbered steps:]

1. WSP registers service with registry

2. Client locates suitable service using registry

3. Client requests WSDL doc

4. WSDL tells client how to interact

5. Service and client interact using XML messages, sent via SOAP
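To make step 5 a little more concrete, here is a minimal Python sketch that wraps an XML payload in a SOAP 1.1 envelope using only the standard library. The payload element `getMeasurements` and its `metric` attribute are hypothetical placeholders, not part of any GGF schema; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_envelope(payload):
    """Wrap an XML element in a SOAP 1.1 Envelope/Body and serialise it."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical request payload, for illustration only.
request = ET.Element("getMeasurements", {"metric": "path.delay.oneWay"})
message = soap_envelope(request)
```

In practice a client would generate such messages from the service's WSDL rather than build them by hand; the sketch only shows what travels on the wire.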

Page 7:

How are GGF addressing problem?

By producing standards relating to network monitoring services.

First with the Network Measurements Working Group (NM-WG):

Defining XML schemas for requesting tests and historic data, and publishing network measurements

Aims: to standardise communication, and…

…use XML, for web services and OGSI model

Simple use case…

[Diagram: a test request (request schema) goes to the Network Monitoring Service, and test results (publication schema) come back]

All request & result messages can be formatted using standardised schemas = truly powerful combination

DANTE, Internet2, SLAC etc. already using NM-WG work.

Page 8:

Standard measurements?

Schemas based on NM-WG proposed measurement classification system:

describes a set of network characteristics and their classification hierarchy

used for creating common schemata for describing network monitoring data

using a standard classification maximises data portability

[Diagram: description + hierarchy]

Page 9:

So what can you ask for 1?

Initial schema requirements set. Four sections: what, where, when, how

What:

Use GGF metric names, e.g. path.delay.oneWay

Can request statistical data, with a specified sample interval, e.g. daily averages for one-way delay over the last month

After some “discussion”, multiple statistics in same request

Can limit number of returned results to avoid overload

Where:

Source and destination

Flexible: IPv4|6, hostnames, or textual names such as “core router” and “edge router” (e.g. for security)
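As an illustration of the "statistical data with a specified sample interval" idea, the following Python sketch reduces raw one-way delay samples to daily averages. The function name, the sample layout (timestamp, delay in milliseconds) and the values are assumptions; the slides do not say how a monitoring service computes its statistics.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_averages(samples):
    """Group (timestamp, delay_ms) samples by calendar day and average them."""
    by_day = defaultdict(list)
    for timestamp, delay_ms in samples:
        by_day[timestamp.date()].append(delay_ms)
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Hypothetical one-way delay samples for path.delay.oneWay.
SAMPLES = [
    (datetime(2004, 4, 1, 9, 0), 10.0),
    (datetime(2004, 4, 1, 15, 0), 20.0),
    (datetime(2004, 4, 2, 9, 0), 30.0),
]
averages = daily_averages(SAMPLES)
```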

Page 10:

So what can you ask for 2?

When:

The primary means of specifying the time period we are interested in (for tests or data retrieval) is:

target time (an absolute time or “now”)

relative +ve and -ve time tolerances…

-ve time tolerance = 600 secs, target_time = 14:00, +ve time tolerance = 600 secs

= 13:50-14:10
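The arithmetic behind the target time and its tolerances can be written out as a short Python sketch. The function name `time_window` and the example date are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def time_window(target, neg_tolerance_s, pos_tolerance_s):
    """Return (start, end) of the acceptable period around a target time."""
    start = target - timedelta(seconds=neg_tolerance_s)
    end = target + timedelta(seconds=pos_tolerance_s)
    return start, end


# Target 14:00 with 600 second tolerances either side (date is arbitrary).
start, end = time_window(datetime(2004, 4, 19, 14, 0), 600, 600)
```

Here `start` is 13:50 and `end` is 14:10 on the chosen day, matching the example on the slide.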

Page 11:

So what can you ask for 3?

Setting limit on number of results controls possibilities:

when number of results = “all”: supply all matching measurements in given time period

when number of results = 1: time data defines the period for which a measurement is considered to be acceptable, e.g. 14:00 +/- 10 minutes

Can also give start & end time if you wish, but values are mapped to target_time & number of results will = all

“testing interval” controls how often tests are run
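One way to read the interaction between the time period and the result limit is sketched below in Python. The function name and data layout are hypothetical, and returning the measurement closest to the target time when the limit is 1 is an assumption: the slide only says the period defines which measurements are acceptable.

```python
from datetime import datetime, timedelta


def select_results(measurements, target, neg_tol_s, pos_tol_s, limit="all"):
    """Filter (timestamp, value) pairs to the window, then apply the limit."""
    start = target - timedelta(seconds=neg_tol_s)
    end = target + timedelta(seconds=pos_tol_s)
    in_window = sorted(m for m in measurements if start <= m[0] <= end)
    if limit == "all":
        return in_window
    # Assumption: prefer the measurements nearest the target time.
    nearest_first = sorted(in_window, key=lambda m: abs(m[0] - target))
    return nearest_first[:limit]


DAY = (2004, 4, 19)  # arbitrary example date
DATA = [
    (datetime(*DAY, 13, 40), 1.0),
    (datetime(*DAY, 13, 55), 2.0),
    (datetime(*DAY, 14, 2), 3.0),
    (datetime(*DAY, 14, 30), 4.0),
]
target = datetime(*DAY, 14, 0)
everything = select_results(DATA, target, 600, 600, "all")
single = select_results(DATA, target, 600, 600, 1)
```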

Page 12:

So what can you ask for 4096?

How:

Can supply values to act as parameters for tests, or filters for querying past data, including tool name.

Uses param specific tags or list of parameters:

<remoteParamList>-a -b 10 -c</remoteParamList>

Possible to set ranges for parameters…

<tcpBufferSize range="max">4194304</tcpBufferSize>

<tcpBufferSize range="min">1048576</tcpBufferSize>

…and orders of preference.

<tcpBufferSize>4194304</tcpBufferSize>

<tcpBufferSize>1048576</tcpBufferSize>

Unspecified params use receiving system’s defaults

Can request reporting of actual param values used

Can control whether a test is ever run
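A receiving system might interpret the parameter range roughly as in this Python sketch. The `tcpBufferSize` elements and `range` attribute come from the slide; the enclosing `parameters` element, the function names, and the clamping behaviour are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The <parameters> wrapper is hypothetical; the slide shows only the children.
REQUEST = """<parameters>
  <tcpBufferSize range="max">4194304</tcpBufferSize>
  <tcpBufferSize range="min">1048576</tcpBufferSize>
</parameters>"""


def buffer_range(xml_text):
    """Read the min and max TCP buffer sizes (bytes) from the request."""
    root = ET.fromstring(xml_text)
    bounds = {
        element.get("range"): int(element.text)
        for element in root.findall("tcpBufferSize")
    }
    return bounds["min"], bounds["max"]


def choose_buffer(local_default, xml_text):
    """Clamp the receiving system's default into the requested range."""
    low, high = buffer_range(xml_text)
    return max(low, min(local_default, high))


chosen = choose_buffer(65536, REQUEST)  # default is below the minimum
```

Reporting `chosen` back to the client would correspond to the "actual param values used" option mentioned above.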

Page 13:

Is that all GGF is doing?

No, GGF Grid High Performance Networking Research Group also hard at work, modelling the network as a Grid resource so they can perform “advance reservation” etc.

Computing, storage and interconnecting network are all resources:

Easier to manage

All can be reserved

Capability discovery

Exploit commonalities

Forms integrated stack

[Diagram: stack of Grid applications, middleware, and computing, network and storage resources, annotated “advance reservation”]

Page 14:

The network as a resource

To be achieved with set of network sub-services forming holistic network service.

Can't say more, as this is probably going to change quite a lot.

Want to know more? Then get involved!

Page 15:

Network monitoring service

Historic measurement data

Predictions

Allow clients to run scheduled tests

On-demand (real-time) tests

Provide less-frequently monitored information (network route, topology…)

Event notifications, for all of the above

Across multiple administrative domains for all of the above

[Diagram: a Network Monitoring Service in each of network domains X, Y and Z, alongside Other Network Services, with clients Grid Middleware, Grid Applications, Automated Test Systems, GOC/NOC Admin Software and Grid/Net Operations]

Diagram shows potential clients: numerous and varied

Page 16:

Will this be easy?

Probably not, but like all good car salespeople, I won’t tell you about the problems.

But the potential benefits are worth the effort!

Page 17:

Conclusion

Grid network monitoring crucial to the Grid

But you all know that already!

GHPN: looking at network services, inc. monitoring service

NM-WG: looking at how to interface to network monitoring services

Ambitious, but potential benefits justify efforts!

JRA4 SHOULD be involved!

Page 18:

? ? ? ? [email protected] [email protected]

GET INVOLVED!

http://www-didc.lbl.gov/NMWG/

http://forge.gridforum.org/projects/ghpn-rg

The End