
CG-3.3-SRS-0012 PUBLIC 1 / 44

TASK 3.3 GRID MONITORING

SOFTWARE REQUIREMENTS

SPECIFICATION

Task 3.3 Grid Monitoring

Document Filename: CG-3.3-DOC-0001-SRS

Work package: WP3 New Grid Services and Tools

Partner(s): TCD, CYFRONET, ICM

Lead Partner: TCD

Config ID: CG-3.3-SRS-0012

Document classification: PUBLIC

Abstract: This document specifies the software requirements for CrossGrid Task 3.3 ‘Grid Monitoring’.


Delivery Slip

Name Partner Date Signature

From

Verified by

Approved by

Document Log

Version | Date | Summary of changes | Author
0-1-1 | 9 May 2002 | First Issue | Brian Coghlan, Stuart Kenny, Bartosz Balis, Grzegorz Majka, Bartosz Ławniczek, Joanna Kosińska, Krzysztof Zieliński, Sławomir Zieliński
0-1-2 | 31 May 2002 | Addition of use cases, addition of requirements table, removal of part of appendix, rewording | Brian Coghlan, Stuart Kenny, Bartosz Balis, Grzegorz Majka, Bartosz Ławniczek, Joanna Kosińska, Krzysztof Zieliński, Sławomir Zieliński


CONTENTS

1. INTRODUCTION
1.1. PURPOSE
1.2. SCOPE
1.3. DEFINITIONS, ACRONYMS, AND ABBREVIATIONS
1.4. REFERENCES
1.5. OVERVIEW

2. OVERALL DESCRIPTION
2.1. PRODUCT PERSPECTIVE
2.1.1. System interfaces
2.1.2. User interfaces
2.1.3. Hardware interfaces
2.1.4. Software interfaces
2.1.5. Communications interfaces
2.1.6. Memory constraints
2.1.7. Operations
2.1.8. Site adaptation requirements
2.2. PRODUCT FUNCTIONS
2.2.1. APPLICATION MONITORING (OCM-G)
2.2.2. NON-INVASIVE MONITORING (SANTA-G)
2.2.3. Jiro-BASED MONITORING
2.3. USER CHARACTERISTICS
2.4. CONSTRAINTS
2.5. ASSUMPTIONS AND DEPENDENCIES
2.6. APPORTIONING OF REQUIREMENTS

3. SPECIFIC REQUIREMENTS
3.1. EXTERNAL INTERFACES
3.2. FUNCTIONS
3.2.1. OCM-G FUNCTIONS
3.2.2. SANTA-G FUNCTIONS
3.2.3. Jiro-BASED MONITORING FUNCTIONS
3.3. PERFORMANCE REQUIREMENTS
3.4. LOGICAL DATABASE REQUIREMENTS
3.5. DESIGN CONSTRAINTS
3.6. STANDARDS COMPLIANCE
3.7. SOFTWARE SYSTEM ATTRIBUTES

4. APPENDIXES
4.1. API
4.2. R-GMA
4.2.1. R-GMA ARCHITECTURE
4.2.2. R-GMA PROTOCOLS
4.2.3. R-GMA IMPLEMENTATION
4.3. JIRO
4.3.1. Jini BINDING MECHANISM
4.3.2. RULE ENGINE
4.3.3. INTERFACES

5. INDEX


1. INTRODUCTION

1.1. PURPOSE

This document specifies the software requirements for CrossGrid Task 3.3 ‘Grid Monitoring’, including those for the OCM-G monitoring system for Grid applications, the additional services for non-invasive monitoring, and the Jiro-based services for Grid-infrastructure monitoring. The intended audience is both the Task itself and dependent tasks.

1.2. SCOPE

A state-of-the-art review of Grid monitoring has already been carried out by MTA-SZTAKI, Budapest. It can be obtained from:

http://web.datagrid.cnr.it/pls/portal30/docs/1445.pdf

This task will extend the Grid information system content to include three of the major sources of performance data: applications, instruments and infrastructure.

The products of Task 3.3 are:

(a) an OMIS-based application monitoring system, OCM-G,

(b) additional services, SANTA-G, for ad-hoc non-invasive monitoring, and

(c) Jiro-based services for Grid-infrastructure monitoring.

OCM-G is a distributed monitoring system for obtaining information on and manipulating parallel distributed applications. The purpose of this system is to provide a basis for building tools supporting parallel application development. The benefit of using it is that it constitutes an autonomous monitoring infrastructure accessible via a standardised interface, on top of which various tools can be based. This approach provides abstraction and increases modularity – the tools themselves can be developed independently from the monitoring system.

SANTA-G services are a specialized non-invasive complement to other more intrusive monitoring services. The application of these services will be in validation and calibration of both intrusive monitoring systems and systemic models, and also for performance analysis. The objectives are to allow information captured by external monitoring instruments to be introduced into the Grid information system, and to support analysis of performance using this information. The benefits are that other users of the Grid information system can then access this detailed performance data.

The Jiro-based services for Grid-infrastructure monitoring are intelligent components for obtaining information from and manipulating Grid hardware devices. The application of the software is to gather information from hardware devices, make autonomous decisions based on this information, and take necessary actions. The objectives are to allow the user to specify desirable logic for managing hardware. The benefits are that management effort is transferred (partly) from the user to the system.


The addition of these three components will greatly expand the quality and quantity of the Grid information system content.

1.3. DEFINITIONS, ACRONYMS, AND ABBREVIATIONS

CommonRules IBM rule engine

CrossGrid The EU CrossGrid Project IST-2001-32243

DataGrid The EU DataGrid Project IST-2000-25182

EJB Enterprise Java Beans

FMA Federated Management Architecture

G-PM Grid Enabled Performance Measurement Tool

GUI Graphical User Interface

HTTP Hypertext Transfer Protocol

HTTPS Hypertext Transfer Protocol Secure

J2EE Java 2 Platform, Enterprise Edition

Java™ Rule Engine API Java runtime API for rule engines (JSR 94)

JDMK Java Dynamic Management Kit: the implementation of the JMX specification plus additional features.

Jiro Sun Jiro, an implementation of the FMA specification

JMS Java Message Service

JMX Java Management Extensions

MIB Management Information Base

OCM-G Grid-enabled OMIS-Compliant Monitor

OGSA Open Grid Services Architecture

OMIS On-line Monitoring Interface Specification

RDBMS Relational Database Management System

R-GMA DataGrid relational Grid monitoring architecture

RMI Remote Method Invocation

SANTA-G Grid-enabled System Area Network Trace Analysis

SNMP Simple Network Management Protocol

SOAP Simple Object Access Protocol

SQL Structured Query Language

WBEM Web-Based Enterprise Management

XML Extensible Markup Language


1.4. REFERENCES

CommonRules available at http://alphaworks.ibm.com/

CrossGrid CrossGrid Project Technical Annex CROSSGRIDANNEX1_V0.1.DOC

DataGrid DataGrid Project Technical Annex DataGridPart_B_V2_51.doc

Ethernet IEEE 802.3

HTTP IETF RFC 1945

Jiro http://www.jiro.com/

OGSA The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. I. Foster, C. Kesselman, J. Nick, S. Tuecke, January 2002. http://www.globus.org/research/papers/ogsa.pdf

OMIS OMIS – On-line Monitoring Interface Specification. Version 2.0. Lehrstuhl für Rechnertechnik und Rechnerorganisation, Institut für Informatik (LRR-TUM), Technische Universität München. http://wwwbode.informatik.tu-muenchen.de/~omis/

R-GMA DataGrid Project Deliverable 3.2 DataGrid-03-D3.2-0101-1-0. http://hepunx.rl.ac/edg/wp3/documentation/

SQL ANSI SQL 99 Standard

Task2.4 SRS Task2.4 Interactive and semiautomatic performance evaluation tools. CG-2.4-DOC-0001-1-0-DRAFT-A

TCPDump http://www.tcpdump.org/

XML Fallside, D.C. XML Schema Part 0: Primer. W3C Recommendation, 2001. http://www.w3.org/TR/xmlschema-0/

1.5. OVERVIEW

This document provides the software requirements for the OCM-G, SANTA-G and Jiro-based monitoring services.

Section 2 provides the overall description of the components of Task 3.3. Section 2.1.1 contains a preliminary requirement analysis table for the main interfaces to the Grid Monitoring services.

Section 3 provides a more detailed description. Simple use cases are presented in Section 3.2.

Section 4 provides appendices of relevant information.

Where necessary some sections have been divided into three separate parts, each providing the requirements for one of the Task 3.3 components, i.e. OCM-G, SANTA-G and Jiro.

We actively encourage those unfamiliar with Grid information systems to read Appendices 4.2 and 4.3 and browse [OMIS] before continuing further.


2. OVERALL DESCRIPTION

2.1. PRODUCT PERSPECTIVE

The Grid monitoring services and tools provide information to all dependent subsystems. These subsystems use this information to establish the current and past state of the Grid. This may be to adapt their behaviour accordingly, or to predict future state, or any other similar functionality. The monitoring services and tools gather the information from its sources, and as such are dependent on whatever subsystems host those sources.

THE MONITORING SYSTEM

The monitoring system to be designed within Task 3.3 will provide information from the three major sources of performance data: applications, instruments, and infrastructure.

Application information will be obtained by OCM-G. Specialised application monitors embedded in the application address space will provide dynamic application data, such as lists of running processes, CPU loads, etc. OCM-G will also allow for the manipulation of applications, such as starting and stopping processes. The primary user of this data will be Task 2.4 within Workpackage 2, which is developing ‘Interactive and semiautomatic performance evaluation tools’. These tools require access to dynamic application data in order to accurately predict future performance and to provide performance measurements. This data will also be used by Task 3.2, responsible for ‘Grid resource management’, in order to predict near-future requirements and to guide scheduling actions.

Monitors embedded in the application address space will consume some of the host system resources. This can affect the monitoring data obtained. A monitoring system that does not affect the system under study is referred to as a non-invasive monitoring system. SANTA-G is such a system. For example, communication between nodes in a system is done by sending data encoded as small blocks, referred to as packets. By tracing the network with instruments (copying the data contained in the packets), monitoring data can be obtained without causing any perturbation to the data. This packet data can then be persistently stored in a database and associated with a time stamp that reflects the time it was traced from the network. The addition of the time stamp means the trace data accurately reflects the true temporal behaviour of the system under study. By using this approach SANTA-G can be used for ad-hoc experiments, which monitor the behaviour of manufacturers’ equipment from the outside. The primary use of this data will be internally in Task 3.3, where it will be used to both validate and calibrate the OCM-G system. The data can also be used by Task 2.4 to determine the level of perturbation within the invasively monitored data.
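The trace-and-time-stamp approach described above can be sketched as follows. This is only an illustration under stated assumptions, not the SANTA-G implementation: an in-memory SQLite database stands in for the persistent store, and the table name `trace`, its columns, the microsecond unit and the helper names `store_packet` and `packets_between` are all hypothetical.

```python
import sqlite3


def open_trace_db():
    """Create an in-memory database holding one row per traced packet."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trace ("
        " ts_us INTEGER NOT NULL,"   # capture time stamp, microseconds (assumed unit)
        " src TEXT NOT NULL,"        # sending node (hypothetical column)
        " dst TEXT NOT NULL,"        # receiving node (hypothetical column)
        " payload BLOB NOT NULL)"    # copied packet contents
    )
    return db


def store_packet(db, ts_us, src, dst, payload):
    """Persist one traced packet together with its capture time stamp."""
    db.execute("INSERT INTO trace VALUES (?, ?, ?, ?)", (ts_us, src, dst, payload))


def packets_between(db, start_us, end_us):
    """Return (time stamp, src, dst, payload length) for packets in a time window."""
    cur = db.execute(
        "SELECT ts_us, src, dst, length(payload) FROM trace"
        " WHERE ts_us BETWEEN ? AND ? ORDER BY ts_us",
        (start_us, end_us),
    )
    return cur.fetchall()


db = open_trace_db()
store_packet(db, 1000, "node-a", "node-b", b"\x01\x02\x03")
store_packet(db, 1750, "node-b", "node-a", b"\x04")
store_packet(db, 9000, "node-a", "node-b", b"\x05\x06")
window = packets_between(db, 0, 2000)
```

Because each row carries its own time stamp, a later query can reconstruct the temporal order of traffic without touching the monitored nodes.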

The management of extremely large networks is difficult, and becomes increasingly so as the network expands. As new devices are added (disk arrays, servers, databases, or routers), management applications must be altered to recognise them. Jiro is a new technology that aims to simplify this by providing intelligent management services for networked devices. These intelligent services enable distributed resources to monitor themselves and, should it become necessary, carry out corrective actions. Within Task 3.3, Jiro components, such as the logging service, will be used to provide information on Grid infrastructure components, such as routers, switches (via SNMP), and computers (via kernel interfaces). In this way, as opposed to SANTA-G, Jiro will allow the monitoring of manufacturers’ equipment from the inside, i.e. providing access to ‘built-in’ information, such as that supplied by SNMP.
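The idea of resources monitoring themselves and triggering corrective actions can be illustrated with a small rule-evaluation sketch. It does not use the Jiro or Java Rule Engine APIs. The `Rule` class, the metric names `if_utilisation` and `disk_used`, the thresholds and the action labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A user-specified piece of management logic (hypothetical structure)."""
    name: str
    condition: Callable[[Dict[str, float]], bool]  # evaluated against one reading
    action: str                                    # label of the corrective action


def evaluate(rules: List[Rule], reading: Dict[str, float]) -> List[str]:
    """Return the actions of all rules whose condition holds for the reading."""
    return [rule.action for rule in rules if rule.condition(reading)]


rules = [
    Rule("link-saturated", lambda m: m["if_utilisation"] > 0.9, "raise-alarm"),
    Rule("disk-nearly-full", lambda m: m["disk_used"] > 0.95, "notify-operator"),
]

# One reading as it might be gathered from a device, e.g. via SNMP (values invented).
reading = {"if_utilisation": 0.97, "disk_used": 0.40}
actions = evaluate(rules, reading)
```

Keeping the rules as data, separate from the code that gathers readings, mirrors the stated objective of letting the user specify the management logic.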

OCM-G

The OCM-G is able to gather information from user applications and provide it to higher level software components, typically tools such as performance analysis ones. Parts of OCM-G are stand-alone components, while other parts reside directly in the application address space. Some parts of OCM-G have rather specialised performance requirements, and may have to be completely independent of other parts of the information system.

Figure 2.1.1 OCM-G (diagram labels: Performance Tools, OMIS Interface, Service Manager, Local Monitor, Application Monitor, Application, Performance Data Storage, OMIS Control/Data, Performance Data)

Figure 2.1.1 shows the monitoring environment. The OCM-G parts are the Service Manager, Local Monitor, and Application Monitor. A service manager is the part of OCM-G to which tools can submit requests and from which they receive replies. Local monitors reside on each node of the target system. A local monitor receives OMIS requests from a service manager and its task is to execute the request and pass the reply back to the service manager. An application monitor is embedded in each process of the monitored application. Its role is to execute performance-critical requests directly in the application context and buffer performance data before passing it to higher layers.
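The division of labour between the three OCM-G parts can be sketched as below. This is a simplified in-process model, not the OCM-G design. The class and method names are hypothetical, and the request strings `list-processes` and `collect-events` are placeholders rather than OMIS syntax.

```python
class ApplicationMonitor:
    """Embedded in one application process; buffers performance data."""

    def __init__(self, pid):
        self.pid = pid
        self.buffer = []

    def record(self, event):
        self.buffer.append(event)

    def flush(self):
        """Hand the buffered data to the higher layer and empty the buffer."""
        data, self.buffer = self.buffer, []
        return data


class LocalMonitor:
    """One per node; executes requests for the processes on that node."""

    def __init__(self, node):
        self.node = node
        self.app_monitors = {}

    def attach(self, app_monitor):
        self.app_monitors[app_monitor.pid] = app_monitor

    def execute(self, request):
        if request == "list-processes":
            return sorted(self.app_monitors)
        if request == "collect-events":
            return {pid: am.flush() for pid, am in sorted(self.app_monitors.items())}
        raise ValueError(f"unsupported request: {request}")


class ServiceManager:
    """Entry point for tools; forwards a request to every local monitor."""

    def __init__(self, local_monitors):
        self.local_monitors = {lm.node: lm for lm in local_monitors}

    def submit(self, request):
        return {node: lm.execute(request) for node, lm in self.local_monitors.items()}


am = ApplicationMonitor(101)
lm_a = LocalMonitor("node-a")
lm_a.attach(am)
lm_b = LocalMonitor("node-b")
lm_b.attach(ApplicationMonitor(202))
am.record("send")
am.record("recv")

sm = ServiceManager([lm_a, lm_b])
processes = sm.submit("list-processes")
events = sm.submit("collect-events")
```

A tool only ever talks to the service manager, while buffering inside the application monitor keeps the performance-critical path local to the process.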

OCM-G shall interoperate with the Grid information system. At this point it is not clear what kind of interoperability this should be. OCM-G will probably use the information system to manage the start-up of the monitoring infrastructure. OCM-G might also be enabled to put the monitoring data into the information system if a client demands it. This can be useful for tools that perform statistical observations and need historic data (e.g. ‘what files have been most frequently accessed by an application during its many executions’).

SANTA-G

SANTA-G is specifically intended to introduce information captured by external monitoring instruments into the Grid information system, and to avail of the information system for subsequent performance analysis. All of the SANTA-G services are only dependent on the information system.

Invasive monitoring tools (such as OCM-G) have certain drawbacks, in particular the impact that they have on the system they are measuring, known as perturbation: invasive tools consume some of the resources they are meant to be measuring. These perturbations are caused by factors such as the execution of additional instructions and the additional operating system overhead. It is necessary to be able to both measure the level of these perturbations in the monitored data, and to minimise them. In order to do this a method is required to obtain monitoring data which is free from perturbations. Non-invasive monitoring techniques do not impact the system under study, and therefore the data obtained using these techniques can be used to validate the monitoring data obtained by the invasive tool. In this way non-invasive monitoring and invasive monitoring tools complement each other and are frequently used together. The non-invasive data will also provide a means of measuring the level of perturbation in the invasive data and this can then be used to calibrate the extent of perturbation by the invasive tool and to minimise the perturbation. This yields a more scientifically sound monitoring system.

Figure 2.1.2 SANTA-G and OCM-G Interaction (diagram labels: OCM-G Invasive Data, SANTA-G Non-invasive Data, Validation, Calibration)

The non-invasive monitoring information provided by SANTA-G is specifically intended for the validation and calibration of the invasive monitoring tools (such as OCM-G). This interaction between SANTA-G and OCM-G is illustrated in Figure 2.1.2. The data can also be used in the validation of systemic models as well as in the generation of realistic workloads to facilitate modelling and simulation activities.
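As a worked illustration of validation and calibration, the sketch below compares durations of the same events as seen by an invasive monitor and by a non-invasive trace. The model is an assumption made for this example only: perturbation is taken to be a uniform relative overhead, and the function names and sample values are hypothetical.

```python
def relative_perturbation(invasive, non_invasive):
    """Mean relative overhead of invasive measurements over the non-invasive reference."""
    if len(invasive) != len(non_invasive) or not invasive:
        raise ValueError("need two equally long, non-empty series")
    overheads = [(i - n) / n for i, n in zip(invasive, non_invasive)]
    return sum(overheads) / len(overheads)


def calibrate(measurement, perturbation):
    """Correct a later invasive measurement, assuming the same relative overhead."""
    return measurement / (1.0 + perturbation)


# Durations in milliseconds for the same three events (invented values).
invasive_ms = [11.0, 22.0, 33.0]
non_invasive_ms = [10.0, 20.0, 30.0]

perturbation = relative_perturbation(invasive_ms, non_invasive_ms)
corrected_ms = calibrate(44.0, perturbation)
```

With these invented values the estimated overhead is about ten percent. A real calibration would need to check whether a single uniform factor is adequate.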

Jiro-BASED MONITORING

Jiro, which is a Java language-based implementation of the Federated Management Architecture, is designed for enterprise-sized applications, and supports network management protocols such as SNMP. Thus it seems to be an appropriate approach to the network-related issues within Grid monitoring. Jiro components, such as the Logging Service, will be used for infrastructure monitoring, i.e. collecting information about user and node activity, network and node loads and network configuration. Therefore this approach will complement the other Grid monitoring services. An important feature of the approach is that the management components can communicate with managed objects using SNMP, WBEM or some proprietary protocols. This enables one to create a solution that can leverage the existing ‘legacy’ systems without a need to convert them to Java, so the proposed solution is flexible and extensible.

This monitoring system will focus on open standards and Java component technology and will exploit advanced concepts such as Jiro, Jini, EJB, JMX, JMS and Java Rule Engine. It will exploit XML for component configuration only. The main concern of the system is interoperability with well established industry monitoring systems based on SNMP. This interoperability will be achieved via low level interfaces to preserve performance and to provide low granularity of monitoring operations. The enterprise level interoperability between the OMIS-based application monitoring system and other systems will be provided by exporting the constructed system functionality in the form of Web Services. Such an approach should be consistent with OGSA-compliant architectures.

The Jiro-based Grid monitoring system will be built above existing open industry standards related to Java and will not exploit any Globus 2.0 components. It will be designed with the already defined grid monitoring system requirements in mind, but it will be implemented as a software framework, which will be both easy to further enhance and customise (for example, different output formats or different communication protocols). Initially the Jiro-based monitoring will be totally independent of the OCM-G/SANTA-G/R-GMA Grid information system. In the future it may be interconnected with the Grid information system, perhaps in an OGSA-compliant fashion, or even reflected into it in much the same way as MDS currently is reflected into R-GMA; this has yet to be determined. The Jiro system architecture is summarized in Appendix 4.3.

2.1.1. System interfaces

The main system interfaces are to the:
• Grid resource management component, Task 3.2
• Optimisation of Data Access component, Task 3.4
• Performance evaluation tools, Task 2.4

From the CrossGrid Technical Annex, the interfaces involving Task 3.3 and other Workpackages are confined to interactions with Workpackage 2 as in Figure 2.1.3.

Figure 2.1.3 Interfaces between Task 3.3 and other WPs as per the Technical Annex (diagram labels: MPI verification (2.2); Benchmarks (2.3); Applications (WP1) executing on Grid testbed (WP4); Performance analysis (2.4), comprising Automatic analysis, Performance measurement, Analytical model and Visualization; Application Source Code; Grid Monitoring (3.3); Not now needed)

Again from the CrossGrid Technical Annex, the interfaces between Task 3.3 and the remainder of Workpackage 3 are as in Figure 2.1.4.


Figure 2.1.4 Internal WP3 interfaces as per the Technical Annex (diagram labels: WP3; Portals (3.1); Roaming Access (3.1); Grid Resource Management (3.2); Grid Monitoring (3.3); Optimisation of Data Access (3.4); Tests and Integration (3.5); Applications WP1; End Users; WP1, WP2, WP5; Testbed WP4; Performance evaluation tools (2.4))

Figure 2.1.5 summarises the known internal interfaces of Task 3.3.

Figure 2.1.5 Internal interfaces of Task 3.3 as per the Technical Annex (diagram labels: WP3; Grid Resource Management (3.2); Grid Monitoring (3.3); Optimisation of Data Access (3.4); WP2; Performance evaluation tools (2.4); arrows labelled input, control and result)

The following table shows a preliminary requirement analysis for Task 3.3’s main interfaces. It shows the software requirement’s ID, the requirement’s description, and the source from which the requirement will be satisfied. The requirement description is generally a specific type of information required, or in some cases a description of specific functionality required. The source column names the source which will satisfy the requirement, i.e. the source from which the information will be most easily obtained: either the Globus MDS, the DataGrid R-GMA, unique Task 3.3 sources, or, in some cases, the task which will provide the necessary functionality or information.

ID | Requirement | Source
SR1.0 | Task 2.4 Performance evaluation tools |
SR1.1 | Data transfers | R-GMA
SR1.2 | Synchronisation delay | T3.3
SR1.3 | I/O delay | R-GMA
SR1.4 | CPU utilization | R-GMA
SR1.5 | Network utilisation | R-GMA
SR1.6 | Storage utilisation | MDS/R-GMA
SR1.7 | Raw data [for automatic performance analysis] | T3.3(OCM-G/SANTA-G)
SR2 | Task 3.2 Grid Resource Management |
SR2.1 | Heterogeneity | MDS
SR2.2 | Load Average | R-GMA
SR2.3 | Availability | MDS
SR2.4 | Location of data files | RC/MDS
SR2.5 | Availability of replicas | RC/MDS
SR2.6 | Network delays | MDS/R-GMA
SR2.7 | Static application characteristics | JDL
SR2.8 | Dynamic application characteristics | T3.3(OCM-G)
SR2.9 | Architecture | MDS
SR2.10 | OS | MDS
SR2.11 | Static memory config | MDS
SR2.12 | Dynamic memory config | R-GMA/T3.3
SR2.13 | Resident libraries | MDS
SR2.14 | Required libraries | JDL
SR2.15 | Invoked shared libraries | T3.3
SR2.16 | Other infrastructure | T3.3
SR2.17 | Configuration | T3.3
SR2.18 | Extensibility: It must be easy to extend the monitoring system to include new types of information as required. | T3.3
SR2.19 | “Single Entry Point”: A single entry point to query information from the monitoring system. | T3.3
SR2.20 | Quick update of information: Updates of information should be quick (e.g. if a machine changes its state and is no longer available). | T3.3


SR3.0 Task 3.4 Optimisation of data access

SR3.1 Current user state T3.3

SR3.2 I/O information R-GMA/T3.3

SR3.3 Network information R-GMA

SR3.4 MSMS load T3.3

SR3.5 Queue length T3.3

SR3.6 Number of drives MDS

SR3.7 Drive throughput T3.3

SR3.8 File size T3.3

SR4.0 Task 3.3 Grid Monitoring

SR4.1 Initial location of processes (to locate processes) T3.2

SR4.2 Migration decisions (to locate monitored processes) T3.2

SR4.3 Non-invasive trace data: Required for calibration and validation of invasive toolset (OCM-G). T3.3(SANTA-G)

SR5 Task 4.2 Integration with DataGrid

SR5.1 Linux Red Hat 7.2: The recommended future platform for CrossGrid is Linux Red Hat 7.2.

SR5.2 RPM v.4: The recommended software distribution tool is RPM v4.

SR5.3 GNUMake: GNUMake as the make tool.

SR5.4 CVS: CVS as the code versioning system.

SR5.5 UML: The UML as the modelling language.

SR5.6 CMT, SCRAM: CMT and SCRAM as configuration management tools.

SR5.7 Tools for managing platform dependencies, packaging, exporting: autoconf, CMT, SCRAM, DAR, rpm, GRID install and export tool.

SR6 Task 5 Architecture Team

SR6.1 Heterogeneous computer and storage systems: We are working with heterogeneous computer and storage systems.

SR6.2 Web Services: The architecture team expects that all new services will be compatible with the Web Services concept.

SR6.3 OGSA: The whole architecture must be compatible with the OGSA specification.

2.1.2. User interfaces

OCM-G

N/A

SANTA-G

SANTA-G data analysis tools will include a viewer GUI for Ethernet traces. This viewer will have a number of controls for direct or query-based navigation through a trace and selection of a particular trace or packet.

Jiro-BASED MONITORING

As we want to use the Enterprise JavaBeans technology for collecting information, the natural way of presenting the information will be to construct JavaServer Pages or servlets that interact with the system database.

Interfaces can be divided into two types:

• passive interfaces – interfaces that work in a query-response manner (between the web browser and the J2EE server; between the web browser and the MBean server)

• active interfaces – interfaces that provide notification to the user by sending events (between MRuleService and an application interested in the Grid state)

2.1.3. Hardware interfaces

Yet to be determined.

2.1.4. Software interfaces

In the context of the larger CrossGrid system, one primary fact is known: CrossGrid will use the Globus software, and therefore the initial testbed will use the Globus MDS information system, which is due to become obsolete at the end of 2002. CrossGrid will also use the DataGrid software, and hence the DataGrid R-GMA information system.

From the requirements table in Section 2.1.1 it can be seen that a significant proportion of the information required by the information users is already available via the Globus MDS or DataGrid R-GMA. For example, the DataGrid MDS schema definition for a computing element includes:

CEId :: single-valued, cis,{The identifier of the CE}

GlobusResourceContactString :: single-valued, cis,{The Globus resource contact string}

GRAMVersion :: single-valued, cis,{The GRAM version}

Architecture :: single-valued, cis,{The architecture of the hosts composing the CE}

OpSys :: single-valued, cis,{The operating system of the hosts composing the CE}

MinPhysicalMemory :: single-valued, integer,{The minimum value of the physical memory among the hosts associated to the CE}

MinLocalDiskSpace :: single-valued, integer,{The minimum local disk footprint}

TotalCPUs :: single-valued, integer,{The number of total processors associated to the CE}

FreeCPUs :: single-valued, integer,{The number of free processors}

NumSMPs :: single-valued, integer,{The number of SMP hosts}

MinSPUProcessors :: single-valued, integer,{The minimum number of SPU processors (for SMP hosts)}

MaxSPUProcessors :: single-valued, integer,{The maximum number of SPU processors (for SMP hosts)}

TotalJobs :: single-valued, integer,{The number of jobs submitted to the CE}

RunningJobs :: single-valued, integer,{The number of currently running jobs submitted to the CE}

.. and so on ..
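As an illustration only, a consumer of this schema could hold a few of the attributes quoted above in a simple record and select computing elements by their free processors. The Python sketch below is not part of MDS or R-GMA: the class name ComputingElement, the helper ces_with_free_cpus, and all sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    """Subset of the CE attributes quoted above (illustrative only)."""
    CEId: str
    Architecture: str
    OpSys: str
    MinPhysicalMemory: int
    TotalCPUs: int
    FreeCPUs: int
    RunningJobs: int


def ces_with_free_cpus(ces, needed):
    """Return the identifiers of CEs reporting at least `needed` free CPUs."""
    return [ce.CEId for ce in ces if ce.FreeCPUs >= needed]


# Hypothetical sample values, not taken from any real testbed.
sample = [
    ComputingElement("ce-a", "i686", "Linux", 512, 16, 4, 12),
    ComputingElement("ce-b", "i686", "Linux", 256, 8, 0, 8),
]
print(ces_with_free_cpus(sample, 2))
```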

DataGrid have expended a lot of effort in formulating their MDS schema and populating it for their Testbed 1 (DG-TB1), and in creating a compatible R-GMA (that can also access the base MDS information) that will become the primary information system for their Testbed 2 (DG-TB2). Replicating this work would be a waste of time and effort.

Task 3.3 instead proposes three ways to create more information about the current state of the Grid.

The first (Task 3.3.1) will monitor Grid applications at runtime in an OMIS-compliant manner, principally for use at runtime by OMIS-compliant tools from Workpackage 2, but also for archival of a selected subset for subsequent post-processing for trend analysis and pattern recognition (thereby creating new prediction result information). The second (Task 3.3.2) will non-invasively monitor Grid components and create a relational trace database of this information, again OMIS-compliant and again for post-processing. The third (Task 3.3.3) will monitor infrastructure components using Jiro technology.

These flows will supplement the existing information system content rather than involve the design of a new information system, i.e. the effort will concentrate on adding value. For the MDS, an appropriate extension to the DataGrid MDS schema will need to be formulated and populated with details of the CrossGrid Testbed 1 (CG-TB1). This should also be done for the R-GMA, again for CG-TB1.

The consequence of this is that, at least for CG-TB1, a significant proportion of the information required by the other tasks and workpackages can then be satisfied by the existing information systems. A further consequence is that these information systems become available for use by the new information sources. Where useful, OMIS, result and trace information will be routed through them, and result and trace information can use the persistent database provisions of the R-GMA. The Jiro information might be reflected into them, in much the same way as MDS is currently reflected into R-GMA. The resulting information flows will be as shown in Figure 2.1.6:

Figure 2.1.6 Information system to be used within Task 3.3.

The interface to R-GMA is well documented in the DataGrid Information and Monitoring (WP3) Architecture Report. See Appendix 4.3 for a summary.

Task 3.3, in general, will make use of Globus, and SANTA-G, in particular, will also make use of TCPDump/Libpcap. TCPDump captures packets from the network that match a given expression. It will be used to create the raw trace files for Ethernet traces.

2.1.5. Communications interfaces

OCM-G

OCM-G components will communicate via secure sockets using either the globus_io interface or the system socket interface with GSS-API for authentication (and possibly encryption). Interoperability with R-GMA is also planned, which involves communication with R-GMA servlets.

The R-GMA makes use of servlet technology, which communicates using HTTP. Refer to the DataGrid R-GMA Architecture document for more information.

SANTA-G

SANTA-G will communicate with R-GMA using the R-GMA protocols (i.e. using HTTP).

Jiro-BASED MONITORING

Other applications will be able to submit queries by contacting a dedicated EJB (this interface is to be developed). Jiro provides a standardised interface (JDMK/JMX) to the managed resources. The communication scheme, which may use a variety of protocols, is depicted in Figure 2.1.7.

[Figure 2.1.6 labels: WP3; Information System; MDS (Globus); Jiro Services (3.3.3-ICM); OMIS Service Manager + Perf Tools (3.3.1-CYFRO); OMIS Application Monitor + Local Monitor (3.3.1-CYFRO); Non-invasive Monitoring (3.3.2-TCD); Performance Information Post-processing (3.3.1-ICM); monitored entities: Applications, Instruments, Infrastructure; flows: input, result, ctrl/data, Jiro info, OMIS info; result dB]

Figure 2.1.7 Standard JMX communication scheme

2.1.6. Memory constraints

The only memory constraints known at this time are applicable to SANTA-G and relate to secondary storage. There must be sufficient secondary storage available to store the binary trace files generated by TCPDump.

2.1.7. Operations

OCM-G

There are three primary operations that can be performed in interaction with OCM-G:

• Init: open a connection to the monitoring system

• Request: send an OMIS request

• Finalize: close the connection to the monitoring system

These operations will be available to clients by means of an appropriate API.
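The shape such an API might take can be sketched as follows. This is a hypothetical Python wrapper, not the OCM-G API (which this document does not define): the class name MonitoringConnection, its method names, and the echo_transport stand-in for the monitoring system are all assumptions.

```python
class MonitoringConnection:
    """Hypothetical client-side wrapper around the three OCM-G operations."""

    def __init__(self, transport):
        # `transport` is any callable taking a request string and returning a reply.
        self._transport = transport
        self._open = False

    def init(self):
        """Init: open the connection to the monitoring system."""
        self._open = True

    def request(self, omis_request):
        """Request: send one OMIS request and return the reply."""
        if not self._open:
            raise RuntimeError("connection not initialised")
        return self._transport(omis_request)

    def finalize(self):
        """Finalize: close the connection to the monitoring system."""
        self._open = False


def echo_transport(req):
    # Stand-in for the real monitoring system: echoes the request back.
    return {"status": "ok", "request": req}


conn = MonitoringConnection(echo_transport)
conn.init()
reply = conn.request("return process list for node n_1")
conn.finalize()
print(reply["status"])
```

In this sketch a Request issued before Init or after Finalize is rejected, which is one plausible reading of the three-step protocol rather than a stated requirement.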

SANTA-G

The operations that can be performed in interaction with SANTA-G are yet to be determined.

Jiro-BASED MONITORING

The operations that can be performed in interaction with Jiro-based monitoring are yet to be determined.

2.1.8. Site adaptation requirements

For OCM-G and SANTA-G, a nearby host must have the DataGrid R-GMA software installed and configured. For Jiro-based monitoring, a nearby host must have the Jiro services installed.

2.2. PRODUCT FUNCTIONS

This section assumes the primary existing information system is the DataGrid R-GMA (see Section 2.5). This assumption should be re-evaluated over time as new Grid information systems become available.

2.2.1. APPLICATION MONITORING (OCM-G)

OCM-G (Task 3.3.1) will monitor Grid applications at runtime, principally for use at runtime by the tools from Workpackage 2, which assumes the application-level monitoring environment to be developed will comprise an autonomous monitoring system. Efficiency of application monitoring will be ensured by specialised application monitors, embedded in the application address space, and by efficient performance-data storage. The issue of scalability will be addressed by distributing the monitoring system into service managers (an intermediate layer between tools and local monitors), local monitors (for direct control of local application processes), and application monitors (for local handling of performance-critical actions in co-operation with local monitors). The OMIS specification will be used to build the communication layers from tools to service managers and from service managers to local monitors. The system will be extensible to provide new, additional functionality. The control and information flows will be separated; control will be exercised over channels built via Globus sockets (with GSI security), whilst data will flow via the R-GMA. This will be implemented as shown in Figure 2.2.1.

[Figure 2.2.1 labels: Invasive Monitoring; Application; Application Monitor; Local Monitor; Service Manager; Performance Tools; OMIS Interface (two instances); Performance Data Storage; RGMA Producer API; Producer Servlet; RGMA Consumer API; Consumer Servlet(s); RGMA Archiver API; Archiver Servlet; Consumer Servlet; dB; Performance Information Post-processing (3.3.1-ICM); flows: Synchronous Control/Data via Globus sockets, Asynchronous Information via R-GMA]

Figure 2.2.1 OCM-G implementation

Note that there may be many Local Monitor instances interacting with a single Service Manager, and there may be many Service Managers. Note also that information from the Application Monitor can be stored in logfiles in the Performance Data Storage, and subsequently retrieved via the Local Monitor by issuing appropriate SQL queries to the Producer. Synchronisation of control and data will be guaranteed by ensuring that control and data flow through the one executing process, which has both Globus socket and R-GMA interfaces.

Another activity in this subtask will archive a selected subset of this information and post-process it for trend analysis and pattern recognition, and thereby create new prediction result information. Here one or more Consumer servlets will be used by an Archiver servlet to gather selected information that the Application Monitor has stored in the Performance Data Storage, where again the selection can be specified via SQL queries (again see Figure 2.2.1). An Archiver has a built-in DBProducer servlet to store the information persistently using an RDBMS, such as MySQL. The post-processing application could then take its input from the DBProducer and return its results to new tables in the same (or another) DBProducer.

The OCM-G will accept all requests compliant with OMIS 2.0. This includes two types of requests: unconditional and conditional.

Unconditional requests have a single effect on the target system and yield an immediate response. They are composed of one or more actions which specify either information to be returned or manipulations to be performed. For example, the request stop process p_1 will attempt to stop the process identified as p_1, while the request return process list for node n_1 will return a list of attached (see Section 3.2) processes residing on the node identified as n_1. All requests return status information (whether the request succeeded or not).

Conditional requests specify an event and a list of actions, which shall be executed whenever the event occurs. Thus, responses to conditional requests can be produced multiple times, each time the specified event takes place. Example: when function X is invoked, return the time stamp of this event.

For a detailed description of OMIS services, the syntax and semantics of requests, and the specification of the data format returned from the monitoring system, refer to [OMIS].
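The difference between the two request types can be illustrated with a toy model. The Python sketch below does not implement OMIS or its request syntax; the class RequestEngine, its method names, and the event name function_X_invoked are assumptions chosen only to show that an unconditional request is answered once, while a conditional request produces one response per occurrence of its event.

```python
class RequestEngine:
    """Toy model of the two request types (not an OMIS implementation)."""

    def __init__(self):
        self._handlers = {}  # event name -> list of registered actions

    def unconditional(self, actions):
        """Run the actions once and return (status, results) immediately."""
        results = [action() for action in actions]
        return ("ok", results)

    def conditional(self, event, actions):
        """Register actions to be run every time `event` occurs."""
        self._handlers.setdefault(event, []).extend(actions)
        return "ok"

    def raise_event(self, event, **details):
        """Called when an event is detected; returns one response per action."""
        return [action(**details) for action in self._handlers.get(event, [])]


engine = RequestEngine()

# Unconditional: "return process list", answered exactly once.
status, results = engine.unconditional([lambda: ["p_1", "p_2"]])

# Conditional: "when function X is invoked, return the time stamp".
engine.conditional("function_X_invoked", [lambda timestamp: timestamp])
first = engine.raise_event("function_X_invoked", timestamp=10.0)
second = engine.raise_event("function_X_invoked", timestamp=12.5)
print(status, results, first, second)
```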

2.2.2. NON-INVASIVE MONITORING (SANTA-G)

SANTA-G will non-invasively monitor Grid components using software, and create a relational trace database of this information. There are three main functions associated with this activity (see Figure 2.2.2):

1. Allow a user to initiate non-invasive tracing of Grid resources by using open source software, specifically TCPDump. Collect the trace data, and provide access to the data through the Grid information system.

2. Allow a user to select the required subset of trace data by way of the Grid information system, persistently store this subset of data in a relational trace database, and provide access to this stored data for further analysis.

3. Provide the information required by dependent subsystems within the Grid Services and Tools System, i.e. Task 3.2 and Task 3.4, and to external subsystems, i.e. Task 2.4.

[Figure 2.2.2 labels: SANTA-G Trace Tools; Grid Information System; Task 3.2, Task 3.4, Task 2.4; SANTA-G Viewer; Trace Database; Function 1: Collect and publish trace data; Function 2: Provide information to dependent tasks; Function 3: Allow user to collect and view trace data]

Figure 2.2.2 SANTA-G Functions

Further functions will provide performance analysis post-processing. Interaction with OCM-G will be via an OMIS-compliant interface, supporting a subset of the functionality described in Section 2.2.1.

2.2.3. Jiro-BASED MONITORING

The Jiro-based monitoring system will be built as a three-tier architecture consisting of an instrumentation layer, a monitoring service layer, and a management application. As the dominating implementation software technology in the CrossGrid project is Java, a natural choice for structuring the Jiro instrumentation and monitoring service layers is JMX (Java Management Extensions).

The instrumentation layer will be built of MBeans (management beans) registered with an MBeanServer. Communication with management systems will be performed by connectors using the RMI or HTTP protocols, or JMS. It is assumed that the MBeanServer will provide both synchronous and asynchronous communication modes.

MBean functionality will be defined in an XML-encoded file interpreted by the Java Rule Engine. Such an approach will allow MBean functionality to be customised easily and dynamically to existing requirements. The monitoring function of an MBean will be changed by loading a new XML configuration file. The XML configuration files will be stored in a dedicated configuration repository. The dynamic MBean model will be used to obtain greater configurability.

The information collected by the MBeanServer could be stored in a SQL database. This database will be served by an application server running EJB components that provide access to the database over JDBC. EJB components could act as MBeanServer observers or actively poll the MBeanServer to get parameter values. This server will also be accessible by the monitoring data analysis tools, providing a user interface from the web browser. Such an approach should lead to a very flexible and scalable system.

The system will exploit Jini and Jiro technology to support:

• dynamic binding,

• dynamic deployment, and

• a powerful notification system.

The binding function will be supported by the Jini Lookup Service. This mechanism is illustrated in Appendix 4.3.1.

A key feature of the proposed monitoring system architecture is interoperability with existing monitoring systems such as SNMP or CNMP. Dedicated interface modules will be constructed for this purpose. By using these modules, MBeans will be able to get data, for instance, from SNMP agents.

The system could optionally be accessed as a web service. A dedicated gateway will be constructed to provide access over SOAP to the monitoring system functionality.

The Jiro-based monitoring system will be constructed as a software framework; hence new functionality should be easy to add. Building the system over Java component technology should result in not only static, but also dynamic reconfigurability of the system. This feature will be supported by dynamic component discovery, dynamic deployment procedures, dynamic binding, and introspection mechanisms.

2.3. USER CHARACTERISTICS

The users will be those applications and users who need access to the CrossGrid information system.

2.4. CONSTRAINTS

Each user or application (entity) must have a valid Grid certificate, or a derived Grid proxy certificate, that authenticates it as an approved Grid entity of an approved Grid Virtual Organisation.

Constraints include:

a) Regulatory policies …

b) Hardware limitations (e.g., signal timing requirements) …

c) Interfaces to other applications …

d) Parallel operation …

e) Audit functions …

f) Control functions …

g) Higher-order language requirements …

h) Signal handshake protocols (e.g., XON-XOFF, ACK-NACK) …

i) Reliability requirements …

j) Criticality of the application …

k) Safety and security considerations …

2.5. ASSUMPTIONS AND DEPENDENCIES

The main assumption in this SRS is the use of R-GMA by the Grid Monitoring components (see Section 2.1.4, Section 2.2, and Appendix 4.2).

The risks are as follows. Firstly, the MDS will be obsolete by the end of 2002, therefore we can only consider the R-GMA in relation to the new information sources. Secondly, it is likely there will be difficulties of some form, but this applies whatever is proposed, and to be optimistic: ‘where there is a will there is a way’. Thirdly, it places great reliance on the DataGrid R-GMA, but this has at least been released in its first revision, DataGrid themselves are keen to assist Task 3.3, and the creation of a larger body of expert R-GMA developers is reassuring. Fourthly, the initial R-GMA release is missing many essential functions, in particular the all-important security, a substantive way of finding the best source of an item of information (i.e. a Mediator), a full consideration of scaling and performance, and any measure to guarantee robustness (e.g. replication); all these functions are, however, in the DataGrid development plan. Finally, it renders CrossGrid hostage to DataGrid’s timetables, but at least those are 12-15 months further advanced.

At this time, the net risk might be considered acceptable relative to the benefits. The position should be re-evaluated over time as new Grid information systems become available, e.g. OGSA-compliant systems (note that DataGrid plan to modify R-GMA to be OGSA-compliant). For the present one should assume exploitation of the DataGrid R-GMA information system.

2.6. APPORTIONING OF REQUIREMENTS

Yet to be determined.

3. SPECIFIC REQUIREMENTS

3.1. EXTERNAL INTERFACES

OCM-G

From the user viewpoint, inputs to OCM-G are OMIS requests; outputs are data whose semantics are specific to a particular OMIS request. For details, refer to [OMIS].

SANTA-G

There are three main inputs to SANTA-G. The first is the user stimulus to the TCPDump software for Ethernet tracing. This is the input required from the user to configure and initiate a trace of the system under study. These inputs will be in the form specified for TCPDump. The raw trace data, obtained by TCPDump, is stored in binary trace files. The data from these files forms the second input into the Grid information system. The final input is in the form of SQL statements. A sensor application will introduce the description of the data available by using SQL ‘CREATE TABLE’ statements. For Ethernet tracing this will be in the form:

CREATE TABLE PKT (

pkt_id …

file_id …

ts_s …

ts_us …

cap_len …

act_len …

);

Both the SANTA-G tools and custom clients, in the form of R-GMA Consumers, will access the data by using SQL queries. For network tracing the SQL query [SELECT] will be of the form:

SELECT PKT.pkt_id,

PKT.file_id,

PKT.ts_s,

PKT.ts_us,

PKT.cap_len,

PKT.act_len

FROM PKT;

SANTA-G outputs will include the raw trace data from TCPDump, which will be output to binary trace files. Another output will be in the form of ResultSets: the response to a SQL query will be a ResultSet. The final output will be the relational trace data. Trace data retrieved by the SANTA-G tools or a custom client may be persistently stored in a relational database (see Section 3.4).
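The path from a binary trace file to the PKT table and the SELECT above can be sketched in Python. This is an illustration under stated assumptions, not the SANTA-G implementation: it reads the classic libpcap file layout (little-endian, microsecond time stamps), uses SQLite in place of the Grid information system, declares every column as INTEGER because the column types are elided above, and takes pkt_id to be the packet's index within its file. The function names read_packet_headers and load_trace are hypothetical.

```python
import sqlite3
import struct

PCAP_GLOBAL_HEADER = struct.Struct("<IHHiIII")  # 24-byte libpcap file header
PCAP_RECORD_HEADER = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len


def read_packet_headers(data):
    """Yield (ts_s, ts_us, cap_len, act_len) for each record in a pcap byte string."""
    magic = PCAP_GLOBAL_HEADER.unpack_from(data, 0)[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("not a little-endian microsecond pcap file")
    offset = PCAP_GLOBAL_HEADER.size
    while offset + PCAP_RECORD_HEADER.size <= len(data):
        ts_s, ts_us, cap_len, act_len = PCAP_RECORD_HEADER.unpack_from(data, offset)
        yield ts_s, ts_us, cap_len, act_len
        offset += PCAP_RECORD_HEADER.size + cap_len  # skip the captured bytes


def load_trace(conn, file_id, data):
    """Insert one PKT row per packet (column types are assumed, see lead-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PKT (pkt_id INTEGER, file_id INTEGER, "
        "ts_s INTEGER, ts_us INTEGER, cap_len INTEGER, act_len INTEGER)"
    )
    rows = [(i, file_id) + hdr for i, hdr in enumerate(read_packet_headers(data))]
    conn.executemany("INSERT INTO PKT VALUES (?, ?, ?, ?, ?, ?)", rows)


# Synthetic two-packet trace with made-up time stamps and zeroed payloads.
trace = PCAP_GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
trace += PCAP_RECORD_HEADER.pack(1000, 5, 4, 60) + b"\x00" * 4
trace += PCAP_RECORD_HEADER.pack(1001, 9, 2, 2) + b"\x00" * 2

db = sqlite3.connect(":memory:")
load_trace(db, 1, trace)
result = db.execute(
    "SELECT PKT.pkt_id, PKT.file_id, PKT.ts_s, PKT.ts_us, PKT.cap_len, PKT.act_len FROM PKT"
).fetchall()
print(result)
```

The four per-packet fields of the libpcap record header map directly onto ts_s, ts_us, cap_len and act_len, which is why no packet payload needs to be decoded to populate this table.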

Jiro-BASED MONITORING

The Jiro agents will be able to collect (input) information about network connectivity (reachability, round-trip times) as well as about the availability of particular hosts at each site, such as gatekeepers, resource brokers, computing elements and so on. Moreover, we want them to collect host-specific information, such as CPU loads and available storage space, which will be acquired from the operating systems’ kernels.

It is also intended to monitor not only computing, storage and networking hardware (i.e. workstations, PCs, network routers, switches and other appliances with SNMP agents), but also another important infrastructure element: the activity of typical services. These services include (but are not limited to) certification authorities, resource brokers, and gatekeepers.

The information collected by the system entities will be filtered in order to reduce its volume. The filtering activity will be defined in RuleXML and performed by the Java Rule Engine controlling the agents. The filtered information shall then be published (output) by an asynchronous notification mechanism, such as the Jiro Event Service or Java Messaging Service. We also want to use the Jiro Logging Service for keeping track of past events. The filtering activity will enable the site manager to control the amount of traffic between nodes where the agents operate and nodes where the information observers operate. Web browser interfaces will also publish information about the past and current state of the Grid, and the current state of particular resources. The following list indicates the type of information which Jiro will collect (this list is not final):

For computers:

(Solaris and Linux; some of the parameters are only available under Linux)

Memory usage (free, used, total, buffers, shared, cached memory)

Swap usage (free, used, total)

CPU statistics (user, system, idle time, CPU load average)

Disk space usage (on every mounted partition) (free, used, total)

Users logged on a system

Information on running (sleeping, stopped, zombie) processes

Network statistics (number of packets sent/received etc.)

Number of interrupts per IRQ

IO memory map

Registered port regions

Modules loaded by the system

Devices configured to work with a system

Machine CPU information (type, speed etc.)

File systems type

List of PCI devices on a system

System information (uptime, idle time, operating system name, host name)

For routers and switches:

All (needed) information is available via SNMP (e.g. interfaces load)
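The volume-reducing filter described above could, for example, forward a sample only when it differs sufficiently from the last published value. The Python sketch below is a stand-in for that idea only: the actual rules are to be written in RuleXML and executed by the Java Rule Engine, and the class ChangeFilter, the function publish, the metric name cpu_load and the threshold are all assumptions.

```python
class ChangeFilter:
    """Pass a sample only if it differs from the last published value
    of the same metric by more than `threshold`."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._last = {}  # metric name -> last published value

    def accept(self, metric, value):
        last = self._last.get(metric)
        if last is None or abs(value - last) > self.threshold:
            self._last[metric] = value
            return True
        return False


def publish(samples, flt):
    """Return the samples that pass the filter, in their original order."""
    return [(m, v) for m, v in samples if flt.accept(m, v)]


# Hypothetical CPU load samples collected by an agent.
samples = [("cpu_load", 0.50), ("cpu_load", 0.52), ("cpu_load", 0.90), ("cpu_load", 0.91)]
published = publish(samples, ChangeFilter(threshold=0.1))
print(published)
```

Raising the threshold reduces the traffic between agent nodes and observer nodes at the cost of coarser information, which is the trade-off the site manager is meant to control.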

3.2. FUNCTIONS

3.2.1. OCM-G FUNCTIONS

There are two actors associated with OCM-G: monitoring tools and application processes. Figure 3.2.1 shows these actors with their corresponding use cases.

All of the use cases relating to objects (e.g. a process), namely detach an object, obtain information from an object (e.g. get process status), and manipulate an object (e.g. stop a process), can only be performed on objects attached by the ‘Attach an object’ action (i.e. a tool only sees objects that were explicitly attached). This applies to nodes and processes. Threads are automatically visible once their ‘container’ processes are attached. The actions of these object-related use cases are realised by the OMIS information services.

The ‘Detect an event’ use case detects an event, and then carries out corresponding programmed actions. These ‘actions’ are OMIS information and/or manipulation services (e.g. whenever a new process is created, increment a counter). A request for detecting an event is always combined with corresponding actions.

[Figure 3.2.1 labels: actors: Application process, Monitoring tool; use cases: Enrol at the monitoring system, Connect to monitoring system, Disconnect from monitoring system, Attach an object, Detach an object, Obtain object info, Manipulate object, Detect an event]

Figure 3.2.1 OCM-G Use Case Diagram

The target system shall be viewed by the OCM-G as a hierarchical set of objects. These shall include the whole system, nodes, processes, threads, etc. For Grid monitoring, OMIS is planned to be extended with another type of object – sites, which will comprise multiple nodes. The objects will be identified by unique tokens, for example p_1, n_1, etc.

The OCM-G shall provide a set of monitoring services for requesting information from, and performing manipulations on, an application. A typical monitoring session between a tool and the monitoring system shall consist of the following stages (from the tool’s viewpoint):

• Connect to the monitoring system

• Attach to objects (typically application processes) you want to monitor

• Send monitoring requests (and process returned information)

• Disconnect from the monitoring system

Thus, each tool has its associated scope of the system being observed, i.e. the set of objects it wants to monitor. Monitoring services can be requested only on explicitly attached objects; otherwise the request is erroneous.

For each monitoring request, a list of applicable objects shall always be specified. For example: Stop processes [ p_1, p_2, p_3 ].

The monitoring system shall ensure that:

• The request is distributed to each object involved

• The returned data is assembled and returned to a tool as a single reply

For example, the request Return CPU load on nodes n_1, n_2, n_3 will return a list of three elements containing information on CPU load on nodes n_1, n_2, and n_3.
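The attach scope and the distribute-and-assemble behaviour described above can be sketched as a toy model. This Python example is not OCM-G code: the class ToolSession, the probe callback standing in for a per-object monitoring service, the service name cpu_load, and the load values are all assumptions.

```python
class ToolSession:
    """Toy model of a tool's session: attach objects, then issue requests on them."""

    def __init__(self, probe):
        # `probe(service, token)` stands in for the per-object monitoring service.
        self._probe = probe
        self._attached = set()

    def attach(self, token):
        """Add an object to this tool's scope."""
        self._attached.add(token)

    def request(self, service, tokens):
        """Distribute `service` to every listed object and assemble a single reply."""
        unattached = [t for t in tokens if t not in self._attached]
        if unattached:
            # A request on an object outside the tool's scope is erroneous.
            raise ValueError(f"objects not attached: {unattached}")
        return [(t, self._probe(service, t)) for t in tokens]


# Hypothetical CPU load values for three nodes.
loads = {"n_1": 0.2, "n_2": 0.7, "n_3": 1.4}
session = ToolSession(lambda service, token: loads[token])
for node in ("n_1", "n_2", "n_3"):
    session.attach(node)
reply = session.request("cpu_load", ["n_1", "n_2", "n_3"])
print(reply)
```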

To support performance analysis of applications, OCM-G shall detect and associate actions with the following two events:

1. Invocation of function X has started

2. Invocation of function X has ended

It shall be possible to combine these two events with appropriate actions to obtain performance measurements such as the volume of data transferred between processes or the delay of data transmission. Example of a complex request: When invocation of function X has started in process [ p_1, p_2, p_3 ], return the token of the process in which the event took place and the time stamp of the occurrence. This request is applied to the processes identified as p_1, p_2 and p_3, whereas the result of each execution will be the particular process identifier (i.e. one of p_1, p_2, p_3) and a time stamp.
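How the start and end events might be combined with actions to yield a measurement can be shown with a small model. In the Python sketch below the event names X_started and X_ended, the class EventMonitor, and the event stream are hypothetical; the actions record the time stamp at the start event and compute the elapsed time per process at the end event.

```python
class EventMonitor:
    """Toy conditional-request table: event name -> actions run on each occurrence."""

    def __init__(self):
        self._actions = {}

    def when(self, event, action):
        """Associate an action with an event."""
        self._actions.setdefault(event, []).append(action)

    def occur(self, event, process, timestamp):
        """Report one occurrence of `event` in `process` at `timestamp`."""
        for action in self._actions.get(event, []):
            action(process, timestamp)


started = {}    # process token -> time stamp of the pending start event
durations = []  # (process token, elapsed time) per completed invocation

monitor = EventMonitor()
monitor.when("X_started", lambda p, ts: started.__setitem__(p, ts))
monitor.when("X_ended", lambda p, ts: durations.append((p, ts - started.pop(p))))

# Hypothetical event stream for processes p_1 and p_2.
monitor.occur("X_started", "p_1", 10.0)
monitor.occur("X_started", "p_2", 10.5)
monitor.occur("X_ended", "p_1", 12.0)
monitor.occur("X_ended", "p_2", 11.0)
print(durations)
```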

To support efficient performance analysis, the OCM-G shall provide:

1. The ability to buffer gathered data in the application’s address space and request it on demand.

2. Efficient data structures to store preprocessed data.

• Counters for storing information representable as integer values.

• Integrators for storing information that needs floating-point representation.

Counters and integrators shall be objects which can be created in both local (application process) and global (monitoring session) contexts, and will be identifiable by appropriate tokens (e.g. c_g_1, c_l_1). Various operations on these objects will be defined (e.g. Increment counter c_g_1 by n).
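A minimal sketch of counters and integrators follows. It assumes a simple accumulator model and a token scheme kind_scope_number that extends the c_g_1 and c_l_1 examples; the class TokenTable and the i_l_1 token form for integrators are hypothetical and are not taken from OMIS.

```python
class Counter:
    """Integer-valued accumulator."""

    def __init__(self):
        self.value = 0

    def increment(self, n=1):
        self.value += int(n)


class Integrator:
    """Floating-point accumulator."""

    def __init__(self):
        self.value = 0.0

    def add(self, x):
        self.value += float(x)


class TokenTable:
    """Creates objects and hands out tokens such as c_g_1 or c_l_1."""

    def __init__(self):
        self.objects = {}
        self.next_id = {}

    def create(self, kind, scope):
        # kind: "c" (counter) or "i" (integrator); scope: "g" (global) or "l" (local)
        prefix = f"{kind}_{scope}"
        number = self.next_id.get(prefix, 0) + 1
        self.next_id[prefix] = number
        token = f"{prefix}_{number}"
        self.objects[token] = Counter() if kind == "c" else Integrator()
        return token


table = TokenTable()
c_global = table.create("c", "g")
c_local = table.create("c", "l")
i_local = table.create("i", "l")
table.objects[c_global].increment(3)    # "Increment counter c_g_1 by 3"
table.objects[i_local].add(0.25)
table.objects[i_local].add(0.5)
```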

3.2.2. SANTA-G FUNCTIONS

Figure 3.2.2 shows the use cases, with their associated actors, for the SANTA-G tools:

[Use case diagram. Actors: User, Invasive Tool, Validation Software, Calibration Software, Dependent Subtasks. Use cases: Create Trace Data, View Trace Data, Publish Trace Data, Retrieve Trace Data.]

Figure 3.2.2 SANTA-G Use Cases

The User actor is any general user who requires access to the non-invasive trace data generated by SANTA-G. The Invasive Tool actor, however, represents a user of an invasive toolset. The Validation Software and Calibration Software actors are generalisations of the Invasive Tool actor. These represent a user who requires the non-invasive trace data in order to validate and calibrate the invasive tools. The Dependent Subtasks actor represents those tasks within CrossGrid that require data from the Grid Monitoring tools. The Create Trace Data and Publish Trace Data use cases are expanded in the sequence diagram in Figure 3.2.4. A sequence diagram showing the Retrieve Trace Data use case can be seen in Figure 3.2.5. The remaining use case, View Trace Data, is fairly trivial and is illustrated in the sequence diagram below (Figure 3.2.3):

[Sequence diagram. Participants: User, Viewer, Consumer, Grid Information System (R-GMA). Messages: Start Viewer, View(FileId, PacketId), collect packet data, DisplayPacket(), Close Viewer.]

Figure 3.2.3 The View Trace Data Use Case

SANTA-G services shall be based on a relational model of the acquired information, supported by the DataGrid R-GMA. See Appendix 4.2 for a brief description of the R-GMA functionality and terminology.

A trace shall be initiated by using an unmodified version of the open source software TCPDump. The resulting trace files may require pre-processing, which shall be carried out by software running on the node that hosts these files. The validity of the trace data is defined by TCPDump.

These trace files shall be accessed by a sensor application, also running on that host, which shall instantiate the necessary Canonical Producers (CPs, one for each table) to represent the database structure. A CP shall be instantiated by specifying a SQL CREATE TABLE statement (see Section 3.1), which shall define the table the producer provides. Each CP shall register itself with the CanonicalProducer Servlet. This process is shown in Figure 3.2.4.

[Sequence diagram. Participants: SANTA-G Trace Tools, Sensor App, CanonicalProducer, CanonicalProducer Servlet. Messages: Create Trace File, CreateProducers(), Register, publishData(traceType). Notes: a CanonicalProducer shall be created for each table required; the CanonicalProducer Servlet registers the CanonicalProducers with the R-GMA registry.]

Figure 3.2.4 Function 1, Creating the trace data.

The R-GMA shall enter each CP in an R-GMA registry, which shall allow consumers to locate it. This action is hidden within the R-GMA.

A SANTA tool running on another host shall then be able to use an R-GMA Archiver to acquire data (see Figure 3.2.5) by calling the Archiver API, which contacts an Archiver servlet. In the spirit of R-GMA, a SQL query statement, of the form shown in Section 3.1, shall be used to define what data is to be acquired.

The Archiver uses one or more internal consumers, each of which contacts an R-GMA registry to locate the CPs which hold the relevant tables. The consumers then query these CPs. Each CP, in response to this query, shall contact a CanonicalProducer Query Engine, which shall run on the node that hosts the trace files. All these shall propagate the SQL SELECT statement, or a derivative thereof. The Query Engine shall perform seek operations on the trace files and return the data in the form of a result set to the CP, and subsequently to the Archiver’s consumer. The data shall then be inserted into the Archiver’s DBProducer and persistently stored in an RDBMS, where it can be accessed by further SANTA tools.
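The chain from registry lookup to persistent storage can be sketched as follows. This is a simplification under stated assumptions, not the R-GMA Archiver API: the trace file is a Python list, the query is reduced to a single min_length parameter instead of a propagated SQL SELECT, the length column is invented for the example, and SQLite stands in for the RDBMS.

```python
import sqlite3

# Stand-in for a trace file on the sensor host: (FileId, PacketId, length) records.
TRACE_FILE = [(1, 1, 60), (1, 2, 1514), (1, 3, 342)]


def query_engine(min_length):
    """Scan the trace file and return the matching records as a result set."""
    return [rec for rec in TRACE_FILE if rec[2] >= min_length]


class CanonicalProducer:
    table = "ethernet"

    def query(self, min_length):
        # A CP holds no data itself; it forwards the query to the query engine.
        return query_engine(min_length)


registry = {"ethernet": [CanonicalProducer()]}   # table name -> producers


def archive(table, min_length, db):
    """Locate producers via the registry, query them, and store the rows."""
    for producer in registry[table]:
        rows = producer.query(min_length)
        db.executemany("INSERT INTO ethernet VALUES (?, ?, ?)", rows)
    db.commit()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ethernet (FileId INTEGER, PacketId INTEGER, length INTEGER)")
archive("ethernet", 300, db)
stored = db.execute("SELECT PacketId FROM ethernet ORDER BY PacketId").fetchall()
```

Once the rows are in the database, later tools can query them without touching the trace files again.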

In cases where entire logfiles are transferred, unfiltered, to the Archiver, it may be sensible to reduce this generic process to simply using a DBProducer.

Other SANTA tools shall then be able to access the trace database using their own consumer interface. Again, a SQL query (a SELECT statement) shall define what information is to be accessed when calling the Consumer API, which contacts a Consumer Servlet. The servlet contacts an R-GMA registry to locate the Archiver which holds the trace database. The consumer shall then be able to query the Archiver, which in turn queries the RDBMS and returns the result set via the R-GMA to the SANTA tool.

[Sequence diagram. Participants: Trace DB App, Archiver, CanonicalProducer Servlet, CanonicalProducer, Canonical Query Engine. Messages: new Archiver(), getData (passed along the chain), queryTraceFile, trace data (returned along the chain), insert(). Notes: the Archiver stores data persistently in the trace database; the query engine queries the trace file for the required data; some internal R-GMA mechanisms have been elided for clarity.]

Figure 3.2.5 Function 2/3, Collecting the trace data

Thus the non-invasive trace data shall become an integral part of the Grid information system content, where it can be used to validate and calibrate the invasive monitoring mechanisms.

3.2.3. Jiro-BASED MONITORING FUNCTIONS

The Jiro-based monitoring systems consist of two layers (see Figure 3.2.6):

• Management Facade components which are Jiro services representing hardware devices

• Management Policies components which are Jiro services holding management logic

[Layer diagram. Labels: Hardware, Management Facade, Management Policies, JiroAgent (repeated in each layer), Jiro Services.]

Figure 3.2.6 Jiro monitoring services

Management Facade services will be developed for all the types of hardware devices which exist on the Grid. Services can interact with hardware directly or through SNMP, WBEM, JDMK or other APIs, such as monitoring interfaces implemented in systems’ kernels. Users can get information directly from these services. Wherever possible, the system will interact with existing software. It will be necessary, however, to develop some low-level facilities, such as modules gathering information from system kernels.

Management Policies services provide general configurable components for embedding management logic. Users are able to define logic using sensors, which provide information, and actions. Sensors are Management Facade services. Actions are defined in the system. Users are able to write the conditions under which actions take place.

Below is an example of a typical scenario:

• A user creates a Management Policies component with the logic “computer1.freeSpace < 100 then sendEvent(‘warnings.freespace’)”, where computer1 is the name of a Management Facade service, freeSpace is the name of a parameter, and sendEvent is the name of an action which accepts one parameter named topic. The sendEvent action sends an event using the Jiro Event Service.

• The Management Policies component runs on one of the Jiro Management Stations.

If the free space on the disk falls below the threshold value, an event is sent to the user.
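The scenario above can be sketched in a few lines. The classes ManagementFacade and ManagementPolicy and the function send_event are hypothetical stand-ins written for this example; they are not Jiro classes, and the real sendEvent action would go through the Jiro Event Service rather than a Python list.

```python
class ManagementFacade:
    """Stand-in for a service that exposes the parameters of one device."""

    def __init__(self, **parameters):
        self.parameters = parameters

    def get(self, name):
        return self.parameters[name]


sent_events = []


def send_event(topic):
    # Stand-in for the sendEvent action; here the topic is only recorded.
    sent_events.append(topic)


class ManagementPolicy:
    def __init__(self, condition, action):
        self.condition = condition   # callable returning True when the action should run
        self.action = action

    def evaluate(self):
        if self.condition():
            self.action()
            return True
        return False


computer1 = ManagementFacade(freeSpace=250)
policy = ManagementPolicy(
    condition=lambda: computer1.get("freeSpace") < 100,
    action=lambda: send_event("warnings.freespace"),
)
first = policy.evaluate()                 # 250 is not below 100: nothing is sent
computer1.parameters["freeSpace"] = 80
second = policy.evaluate()                # 80 is below 100: the event is sent
```

Keeping the condition and the action as separate callables mirrors the split between sensors, which supply values, and actions, which are defined by the system.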

To clarify the system architecture, two use cases are presented (some simplifications are made in order to focus only on the system architecture).

The first use case shows the interaction between the J2EE/SQL database and an MBean (see the sequence diagram in Figure 3.2.7). It shows how the J2EE/SQL database could obtain information directly from an MBean. An EBean is created in the J2EE Server and registers with the MBean broadcaster to receive events. The EBean converts events into appropriate INSERT statements.

[Sequence diagram. Participants: Configurator, SQL Database, EBean, MBean. Messages: new, subscribe, event, SQL INSERT, event, SQL INSERT.]

Figure 3.2.7 Sequence diagram for J2EE/SQL database - MBean interaction
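The interaction in this first use case can be approximated by the sketch below, in which a listener converts each received event into an INSERT. Everything here is an assumption made for illustration: the MBean and EBean classes are plain Python objects rather than JMX or EJB components, the events table and its columns are invented, and SQLite stands in for the SQL database.

```python
import sqlite3


class MBean:
    """Minimal event broadcaster."""

    def __init__(self, name):
        self.name = name
        self.listeners = []

    def subscribe(self, listener):
        self.listeners.append(listener)

    def emit(self, parameter, value):
        for listener in self.listeners:
            listener(self.name, parameter, value)


class EBean:
    """Listener that turns each received event into an INSERT statement."""

    def __init__(self, db):
        self.db = db
        db.execute("CREATE TABLE events (source TEXT, parameter TEXT, value REAL)")

    def on_event(self, source, parameter, value):
        # Parameterised INSERT: values are bound, not spliced into the SQL text.
        self.db.execute("INSERT INTO events VALUES (?, ?, ?)", (source, parameter, value))


db = sqlite3.connect(":memory:")
ebean = EBean(db)
mbean = MBean("computer1")
mbean.subscribe(ebean.on_event)
mbean.emit("freeSpace", 80.0)
rows = db.execute("SELECT source, parameter, value FROM events").fetchall()
```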

The second use case shows the interaction between an Application and MRuleService (some notes on the MRuleService are included in Appendix 4.3). A sequence diagram for that use case (depicted in Figure 3.2.8) shows how applications interested in the grid state could obtain information using MRuleService. The Application creates an MRuleService, passing it RuleXML with the appropriate management logic. MRuleAgent creates a RuleEngine, passing it the RuleXML. MRuleAgent subscribes to all MBeans specified during creation. Every received event is added to the RuleEngine as a new fact. The RuleEngine is run on receiving each event. During RuleEngine execution, information is gathered from sensors. Conclusions trigger appropriate actions.

[Sequence diagram. Participants: Application, MRuleService, MBean, MBean, RuleEngine. Messages: create agent (RuleXML), new (RuleXML), subscribe for events, event, add fact (event), run, get parameter, action, event.]

Figure 3.2.8 Sequence diagram for Application - MRuleService interaction

3.3. PERFORMANCE REQUIREMENTS

The performance requirements for the Monitoring System will be determined by the users. The following performance requirements are stated in the Task 2.4 SRS and as such will provide preliminary requirements:

• The dynamic data on applications shall be accessible within a time that is not significantly higher than the communication latency between the host where the monitored piece of the application is located and the host where the G-PM tool is executed.

• The non-critical data on Grid infrastructure shall be accessible in less than 30 seconds.

As stated, these are preliminary requirements and will most likely be subject to change.

3.4. LOGICAL DATABASE REQUIREMENTS

OCM-G

To be determined.

SANTA-G

When a user retrieves trace data, it is stored persistently in a relational database. For an Ethernet trace database the primary key is composed of FileId and PacketId. This is the case for every table.
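A composite primary key of this kind can be sketched with SQLite. Only FileId and PacketId come from the requirement above; the table name ethernet and the length column are assumptions made for the example, and the actual schema is the one defined in Section 3.1.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE ethernet (
           FileId   INTEGER NOT NULL,
           PacketId INTEGER NOT NULL,
           length   INTEGER,            -- hypothetical payload column
           PRIMARY KEY (FileId, PacketId)
       )"""
)
db.execute("INSERT INTO ethernet VALUES (1, 1, 60)")
db.execute("INSERT INTO ethernet VALUES (1, 2, 1514)")   # same file, next packet
db.execute("INSERT INTO ethernet VALUES (2, 1, 342)")    # same packet number, other file

try:
    db.execute("INSERT INTO ethernet VALUES (1, 1, 99)")  # repeats an existing key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = db.execute("SELECT COUNT(*) FROM ethernet").fetchone()[0]
```

A packet is therefore identified by the pair of values, so the same PacketId may recur in different trace files.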

Jiro-BASED MONITORING

To be determined.

3.5. DESIGN CONSTRAINTS

Design constraints include:

• All software must be compatible with DataGrid and Globus.

• All software must be compatible with Red Hat 6.2+.

• All software should run on PCs with Intel processors, in order to be compatible with the DataGrid testbed.

• Software should use a specific version of Globus as agreed by the CrossGrid project, currently Globus 2.x.

3.6. STANDARDS COMPLIANCE

Grid monitoring will comply with the following main external specifications:

OMIS: The Online-Monitoring Interface Specification

SQL: R-GMA makes use of SQL and therefore complies with the current SQL standard. It currently implements only a subset of SQL.

Ethernet IEEE 802.3: For SANTA-G the Ethernet trace formats must be compatible with the Ethernet IEEE 802.3 standard.

3.7. SOFTWARE SYSTEM ATTRIBUTES

Not yet defined.

4. APPENDIXES

4.1. API

OCM-G

To be defined.

SANTA-G

SANTA-G makes use of R-GMA components, and as such does not provide an API of its own. The API is specified by the R-GMA, in Appendix A of the R-GMA User Guide, which can be seen at: http://marianne.in2p3.fr.datagrid/documentation/

Jiro-BASED MONITORING

To be defined.

4.2. R-GMA

The R-GMA is not a general distributed RDBMS, but a way to use the relational model in a distributed environment where ACID (Atomicity, Consistency, Isolation and Durability) properties are not considered essential. However, the R-GMA can be viewed as one huge logical database, partitioned according to certain criteria (specified by a WHERE clause as a predicate).

4.2.1. R-GMA ARCHITECTURE

The architecture consists of three objects, Producers, Consumers, and a Registry, as shown in Figure 4.2.1. Producers announce that they can publish specific information into the R-GMA by registering with the Registry. This they do by issuing a SQL ‘CREATE TABLE’ specifying the structure of the table that they can publish. Consumers retrieve this information via SQL ‘SELECT’, which triggers discovery of the location of the information, i.e. a look-up of the Registry (currently this is done rather crudely; DataGrid are developing a Mediator to optimize the discovery). A canonical producer doesn’t publish anything until queried; it just announces where it can be accessed and how the information is structured and then waits for a query. More elaborate producers may do more; they may even begin publishing once they are instantiated. Actual publishing is via a SQL ‘INSERT’, which may take place only in response to a query (and then as either a single item or a stream of items), or may take place to a buffer or database in expectation of subsequent queries. Thus:

Producers announce: SQL ‘CREATE TABLE’

Consumers collect: SQL ‘SELECT’

Producers publish: SQL ‘INSERT’
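The three roles can be illustrated with a small in-process sketch. It is not the R-GMA API: the classes Registry, Producer and Consumer below are hypothetical, each producer buffers its tuples in a private SQLite database, and the consumer names the table explicitly instead of having it extracted from the SELECT statement.

```python
import re
import sqlite3


class Registry:
    def __init__(self):
        self.producers = {}                     # table name -> producer

    def register(self, table, producer):
        self.producers[table] = producer

    def lookup(self, table):
        return self.producers[table]


class Producer:
    def __init__(self, registry, create_table_sql):
        # Announce: the CREATE TABLE statement defines what this producer offers.
        self.db = sqlite3.connect(":memory:")   # local buffer for published tuples
        self.db.execute(create_table_sql)
        table = re.match(r"\s*CREATE\s+TABLE\s+(\w+)", create_table_sql, re.I).group(1)
        registry.register(table, self)

    def insert(self, insert_sql, values):
        # Publish: an INSERT places a tuple in the buffer.
        self.db.execute(insert_sql, values)

    def answer(self, select_sql):
        return self.db.execute(select_sql).fetchall()


class Consumer:
    def __init__(self, registry):
        self.registry = registry

    def select(self, table, select_sql):
        # Collect: look up the producer of the table, then query it.
        return self.registry.lookup(table).answer(select_sql)


registry = Registry()
producer = Producer(registry, "CREATE TABLE cpuLoad (loadavg REAL, timestamp INTEGER)")
producer.insert("INSERT INTO cpuLoad VALUES (?, ?)", (0.15, 1000))
producer.insert("INSERT INTO cpuLoad VALUES (?, ?)", (0.90, 1060))
rows = Consumer(registry).select("cpuLoad", "SELECT loadavg FROM cpuLoad WHERE loadavg > 0.5")
```

The WHERE clause in the consumer's query plays the role of the predicate that selects part of the logical database.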

[Diagram. Components: Consumer, Producer, Registry. Interactions: register, lookup.]

Figure 4.2.1 R-GMA Architecture

The R-GMA is written in Java, and is based on servlet technology, as shown in Figure 4.2.2.

A Producer is a class that communicates with a ProducerServlet object (nearby but not necessarily on the same host) via sockets. The class introduces insignificant overhead. The servlet, on the other hand, requires the Tomcat server and a lot of memory, and introduces significant overhead, but it can reside on another host. When the class is instantiated, the SQL ‘CREATE TABLE’ is sent to the ProducerServlet and thence to the RegistryServlet object, so that it then knows what the information is and where it is. Then the class can publish the information via an ‘insert’ method that issues an SQL ‘INSERT’ to the ProducerServlet.

[Component diagram. Labels: Registry Interface, Registry Servlet, Schema Interface, Schema Servlet, Invocation, Information, Client side, Producer Servlet, Consumer Servlet, Sensor, Producer, Consumer, Application, Archiver Servlet, Archiver, Hidden components, DBProducer Servlet, DBProducer.]

Figure 4.2.2 R-GMA Servlets

4.2.2. R-GMA PROTOCOLS

For R-GMA, SQL queries are issued using the HTTP/HTTPS protocol:

http://localhost:8080/ProducerServlet/constructProducer?tableName=cpuLoad&flags=1

Result sets are returned as XML:

<GMA-Response … >
  <ResultSet>
    <rowMetaData>
      <colMetaData>loadavg</colMetaData>
      <colMetaData>timestamp</colMetaData>
    </rowMetaData>
    <row>
      <col>0.15</col>
      <col> … </col>
    </row>
    <row>
      <col>0.15</col>
      <col> … </col>
    </row>
  </ResultSet>
</GMA-Response>

At the Producer servlet a ResponseWriter converts a result-set object to XML; at the Consumer servlet an XMLConverter restores it to Java objects.
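The two conversions can be sketched with Python's standard XML library. The functions write_result_set and read_result_set are hypothetical stand-ins for the ResponseWriter and XMLConverter, the GMA-Response element is written without the attributes elided in the listing above, and the timestamp values are invented for the example.

```python
import xml.etree.ElementTree as ET


def write_result_set(columns, rows):
    """Serialise column names and rows in the element layout shown above."""
    response = ET.Element("GMA-Response")
    result_set = ET.SubElement(response, "ResultSet")
    meta = ET.SubElement(result_set, "rowMetaData")
    for name in columns:
        ET.SubElement(meta, "colMetaData").text = name
    for row in rows:
        row_el = ET.SubElement(result_set, "row")
        for value in row:
            ET.SubElement(row_el, "col").text = str(value)
    return ET.tostring(response, encoding="unicode")


def read_result_set(xml_text):
    """Parse the XML back into (column names, list of row dictionaries)."""
    result_set = ET.fromstring(xml_text).find("ResultSet")
    columns = [c.text for c in result_set.find("rowMetaData")]
    rows = [dict(zip(columns, (col.text for col in row)))
            for row in result_set.findall("row")]
    return columns, rows


xml_text = write_result_set(["loadavg", "timestamp"], [(0.15, 1000), (0.2, 1060)])
columns, rows = read_result_set(xml_text)
```

Values come back as strings, so a real converter would also need the column types from the table definition to restore typed objects.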

4.2.3. R-GMA IMPLEMENTATION

In the implementation, each application that uses a servlet does so via an API. There are four kinds of servlets that interact directly with the client side: a Producer (a basic buffering or streaming producer), a DBProducer (which renders information persistent by storing it in an RDBMS such as MySQL), a Consumer, and an Archiver. The Archiver is a special kind of servlet that can have multiple Consumer interfaces inserting selected information into a DBProducer, which can then itself be queried; this is expected to be used intensively within Task 3.3 for post-processing of monitored information. A fifth servlet, so far called a FileProducer (actually the canonical producer), is in development, and is also expected to be used intensively by Task 3.3 for access to independently generated logfiles. Thus the implementation is as shown in Figure 4.2.3.

[Implementation diagram. Labels: Sensor Code, ProducerAPI, ProducerServlet, Application Code, ConsumerAPI, Consumer Servlet (repeated), Consumer Servlet and Mediator, RegistryAPI (repeated), Registry Servlet, SchemaAPI, Schema Servlet, Event Dictionary, ArchiverAPI, Archiver Servlet, DBProducer, DBProducerServlet. Interactions: lookup, subscribe.]

Figure 4.2.3 R-GMA Implementation

4.3. JIRO

The system architecture is depicted in Figure 4.3.1.

The Jiro-based monitoring system uses a five-level architecture (see Figure 4.3.2):

• Instrumentation level

• Agent level

• Management logic level

• Database level

• User interface level

The instrumentation level provides JMX manageable resources. The instrumentation of a given resource is provided by one or more Managed Beans, or MBeans, which are either standard or dynamic. Standard MBeans are Java objects that conform to certain design patterns derived from the JavaBeans™ component model. Dynamic MBeans conform to a specific interface which offers more flexibility at run-time.

The agent level provides a specification for implementing agents. Management agents directly control the resources and make them available to remote management applications. Agents are usually located on the same machine as the resources they control, although this is not a requirement. The agents will communicate with higher-level entities via so-called ‘connectors’, which will serve different protocols. SOAP and Java RMI will be the primary protocols.

[Architecture diagram. Labels: SQL Database, Database Interface Server, EJB Application Server, EBean, WWW Server, JSP, WWW Client (three), Jiro Station, Java Virtual Machine, Agent Application, MBeanServer, Registered MBean, MBeanProxy, ServerConnector, ClientConnector, Proxy, Notification buffer, Client Notification Interface, Observer Registration, Notification Propagation, Resource, O1, O2.]

Figure 4.3.1 Jiro-based monitoring system architecture

Features provided by MBean Server:

• Listing and filtering MBeans by their symbolic name

• Discovering and publicising the management interface of MBeans

• Accessing MBean attributes for reading and writing

• Invoking operations defined in the MBean interface

• Registering and deregistering listeners for MBean notifications

Since SNMP is widely used to monitor network devices, JDMK provides tools for generating MBean interfaces from SNMP MIB descriptors. This way an SNMP agent can be accessed through an MBean interface.

Figure 4.3.2 The five-layer system architecture

The management logic level provides the mechanism for embedding management logic. Its purpose is to provide both filtering and notification mechanisms. It is based on the RuleEngine technology. We hope to use the Java™ Rule Engine API (JSR 94), which is defined by the Java Community Process. Because it is not available yet (it is currently in the Community Review stage of the JCP), for now we will use CommonRules 3.0 from IBM.

The database level provides a layer for storing data.

A dynamic Jini lookup service and discovery mechanism (Jiro/FMA) is used to link components from the agent and management logic levels together.

4.3.1. Jini BINDING MECHANISM

All MBeanServers will be registered with the Lookup Service as shown in Fig. 4.3.4. The second binding mechanism, based on a discovery protocol, will also be provided by the service. The concept of this binding procedure is depicted in Fig. 4.3.3.

[Diagram. Labels: Agent 1, Agent 2, Agent 3, Manager, Discovery Client, Discovery Request (Multicast), Reply, Responding Object.]

Figure 4.3.3 Active agent discovery procedure

[Diagram. Labels: Agent 1, Agent 2, Agent 3, Jini Lookup Server, Lookup Client, Register Request, Lookup Request/Reply, MBean, Binding Operation.]

Figure 4.3.4 Jini Lookup Service binding procedure

4.3.2. RULE ENGINE

The rule engine may be viewed as a sophisticated if/then statement interpreter. The if/then statements that are interpreted are called rules. The "if" portions of rules contain conditions such as "shoppingCart.totalAmount > $100". The "then" portions of rules contain actions such as "recommendDiscount(5%)". Inputs to the rule engine are a rule set and some data objects. The outputs from the rule engine are determined by the inputs and may include the originally input data objects with possible modifications, new data objects and side effects such as "sendMail(’Thank you for shopping’)".

Rule-based systems differ from standard procedural or object-oriented programs in that there is no clear order in which code executes. Instead, the knowledge of the expert is captured in a set of rules, each of which encodes a small piece of the expert’s knowledge. Each rule has a left hand side and a right hand side. The left hand side contains information about certain facts and objects which must be true in order for the rule to potentially fire (that is, execute). Any rules whose left hand sides match in this manner at a given time are placed on an agenda. One of the rules on the agenda is picked (there is no way of predicting which one), its right hand side is executed, and it is then removed from the agenda. The agenda is then updated (generally using a special algorithm called the Rete algorithm), and a new rule is picked to execute. This continues until there are no more rules on the agenda.
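The agenda cycle described above can be sketched as a small forward-chaining loop. This is a deliberately naive illustration and not CommonRules or the JSR 94 API: the Rule class and run function are hypothetical, the agenda is rebuilt by re-testing every condition rather than updated incrementally as the Rete algorithm would do, and each rule is allowed to fire only once.

```python
import random


class Rule:
    def __init__(self, name, condition, action):
        self.name = name
        self.condition = condition   # left hand side: facts -> bool
        self.action = action         # right hand side: may add or change facts


def run(rules, facts, rng=random):
    """Fire rules until the agenda is empty; return the names in firing order."""
    fired = []
    done = set()                     # each rule fires at most once in this sketch
    while True:
        # Rebuild the agenda: every not-yet-fired rule whose left hand side matches.
        agenda = [r for r in rules if r.name not in done and r.condition(facts)]
        if not agenda:
            return fired
        rule = rng.choice(agenda)    # no fixed order among the eligible rules
        rule.action(facts)
        done.add(rule.name)
        fired.append(rule.name)


rules = [
    Rule("low-space",
         lambda f: "freeSpace" in f and f["freeSpace"] < 100,
         lambda f: f.update(warning=True)),
    Rule("notify",
         lambda f: f.get("warning", False),
         lambda f: f.update(notified=True)),
]
facts = {"freeSpace": 80}
order = run(rules, facts)
```

In this example the second rule becomes eligible only after the first has added a fact, which is why the agenda must be re-evaluated after every firing.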

The typical rule-based system has a relatively static set of rules, while the collection of facts (knowledge base) changes continuously with time. New facts arrive and old ones are removed continuously.

The CommonRules package includes a rule engine that provides inferencing capability, an approach which emphasizes the clear and clean separation of data (represented by business objects) and logic (represented by rules). The connection between business rules and external objects is achieved via sensors, effectors, dynamic object mapping and variable bindings. Any condition in a rule can be linked to an external object via a sensor to provide dynamic, real time data mapping. Likewise, the conclusion of a rule can trigger an action with dynamic data binding via an effector.

Events received by MRuleAgent will be treated as new facts. Sensors will be used to interact with agents operating in the pulling model. Effectors will be used to perform actions defined by the user (e.g. notification).

4.3.3. INTERFACES

An external Web Browser interface to a ‘J2EE Service’ provides information about the past and current state. This could be information about separate resources or summarized data.

An external Web Browser interface to an ‘MBean Server’ provides information about a particular resource. This information provides only the current state of the resource. Connection is made directly to the MBean Server through JDMK Protocol Adaptors.

An externally accessible MRuleService provides a notification mechanism for applications interested in immediate notifications about certain states of the grid. It is based on the SOAP protocol. RuleXML is used to specify the conditions that will result in a notification.

An internal MBeanServer - MRuleService interface is standardised by the Jiro JMX specification (Connector mechanism). Information from an MBean will interact with the RuleEngine in two ways:

• by subscribing MRuleService itself as an MBean listener. MRuleService receives events and inserts them into the RuleEngine environment as new facts.

• by using the sensor mechanism of RuleEngine. Sensors will be used to interact with agents operating in the pulling mode. Information will be pulled from an MBean when RuleEngine executes a rule which involves that MBean.
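The two paths can be contrasted in a short sketch. It assumes plain Python objects in place of JMX MBeans and the real MRuleService; the event names link-up and link-down, the freeSpace attribute and the rule itself are invented for the example.

```python
class MBean:
    def __init__(self, **attributes):
        self.attributes = attributes
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def notify(self, event):
        for listener in self.listeners:
            listener(event)              # push: the MBean calls the listener

    def get(self, name):
        return self.attributes[name]     # pull: the caller asks for the value


class MRuleService:
    def __init__(self, rule):
        self.facts = []
        self.rule = rule                 # callable(facts) -> list of conclusions
        self.conclusions = []

    def on_event(self, event):
        self.facts.append(event)         # each received event becomes a new fact
        self.conclusions.extend(self.rule(self.facts))


disk = MBean(freeSpace=80)
network = MBean()


def rule(facts):
    # The sensor pulls freeSpace only when the rule is evaluated.
    if facts[-1] == "link-down" and disk.get("freeSpace") < 100:
        return ["alert"]
    return []


service = MRuleService(rule)
network.add_listener(service.on_event)
network.notify("link-up")
network.notify("link-down")
```

The pushed events decide when the rule runs, while the pulled attribute is read at that moment, so it always reflects the current state of the resource.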

An internal MBeanServer - J2EE Server interface is provided. EJB subscribes itself as an MBean listener. EJB receives events, which are converted into appropriate SQL commands and inserted into a SQL Database.

An internal MRuleService - J2EE Server interface is also provided. MRuleService works as a filtering mechanism. Information captured from MBeans is filtered by RuleEngine and inserted into a SQL Database.
