Grid Discovery and Monitoring Systems
Laura Pearlman, USC/Information Sciences Institute
With materials from Ben Clifford and others from the Globus Project Team
Outline
Overview of information systems
Some real implementations: Globus MDS2 / BDII, Globus MDS4, Inca, GMA / R-GMA
Discovery and Monitoring
Discovery: finding resources that exist at any moment, possibly meeting some criteria. E.g., “find Linux boxes with Java 1.5 installed”.
Monitoring: determining the state of one or more resources. E.g., “how much memory is free on machine X?”
“Monitoring” and “Discovery” information sometimes overlap: “find me machines with 2G memory” vs. “how much memory does machine X have?”
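To make the distinction concrete, here is a minimal Python sketch over an in-memory list of resource records. The `Resource` fields, host names, and values are hypothetical and exist only for illustration: discovery filters the whole set by criteria, while monitoring reads the current state of one named resource.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    # Hypothetical resource record; field names and values are illustrative only.
    host: str
    os: str
    java_version: Optional[str]
    total_memory_gb: float
    free_memory_gb: float


RESOURCES = [
    Resource("node1", "linux", "1.5", 2.0, 0.5),
    Resource("node2", "linux", None, 4.0, 3.1),
    Resource("node3", "solaris", "1.5", 2.0, 1.2),
]


def discover(resources, predicate):
    """Discovery: names of the resources that meet some criteria."""
    return [r.host for r in resources if predicate(r)]


def monitor(resources, host, attribute):
    """Monitoring: one aspect of the current state of a known resource."""
    for r in resources:
        if r.host == host:
            return getattr(r, attribute)
    raise KeyError(host)


# “find linux boxes with Java 1.5 installed”
java_boxes = discover(RESOURCES, lambda r: r.os == "linux" and r.java_version == "1.5")
# “how much memory is free on machine X?”
free_on_node2 = monitor(RESOURCES, "node2", "free_memory_gb")
# The overlap: a discovery query over data that is also monitoring data.
big_boxes = discover(RESOURCES, lambda r: r.total_memory_gb >= 2.0)
print(java_boxes, free_on_node2, big_boxes)
```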
Examples of Useful Information
Characteristics of a compute resource: software available, networks connected to, load, type of CPU, disk space
Characteristics of a network: bandwidth and latency, protocols
Information about a service: contact info, version number, etc.
Who uses this information?
Individual users, trying to pick the ‘best’ resource
Brokers or workflow systems, trying to find suitable resources
VO administrators, who want to know the state of every resource
System administrators may use this information, but probably also have local site monitoring systems in place.
What Interfaces are Needed?
Graphic and command-line interfaces for individual users and administrators
Programmatic interfaces for brokers, workflow systems, etc.
Asynchronous notifications for administrators, e.g., “send me mail when we’re almost out of disk space”
Monitoring/Discovery Problems in Grids
Dynamic in nature: VOs come and go, resources join and leave VOs, and resources change status and fail.
Geographically distributed users, geographically distributed resources, and heterogeneous implementations.
Grid Information: Facts of Life
Information is always old. Distributed state is hard to obtain. Components will fail, and we must deal with this gracefully.
Scalability and overhead are concerns. There are many different usage scenarios.
Resource Discovery/Monitoring
Distributed users and resources, variable resource status, variable grouping
[Figure: resources (R) grouped into VO-A and VO-B, with dispersed users reaching them over a network]
Resource Discovery/Monitoring
Some resources have failed and a network partition has occurred. Still, some work can get done…
[Figure: the same VO-A/VO-B diagram, with some resources failed and the network partitioned]
Scalability
Large numbers: many resources, many users
Independence: resources shouldn’t affect one another; VOs shouldn’t affect one another
Graceful degradation of service: “as much function as possible”; tolerate partitions, prune failures
Failure Scenarios
User is disconnected
Resource fails or is disconnected
Discovery service fails or is disconnected
Network partition
When a user is disconnected
This should not adversely affect other users. Some state (such as the user’s subscriptions) may need to be cleaned up.
Some systems use soft state to deal with this issue: subscriptions are valid for a limited time and must be periodically refreshed. If the user does not come back in time to refresh the subscription, it will be removed automatically.
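The soft-state idea can be sketched in a few lines of Python. This is a toy, not any particular system’s implementation: the class name, the fixed `lifetime`, and the explicit `sweep` step are assumptions, and the clock is injectable so expiry can be shown without sleeping.

```python
import time


class SoftStateRegistry:
    """Hypothetical soft-state table: entries lapse unless refreshed within `lifetime`."""

    def __init__(self, lifetime, clock=time.monotonic):
        self.lifetime = lifetime
        self.clock = clock        # injectable so the example needs no real waiting
        self._expires = {}        # subscription id -> absolute expiry time

    def subscribe(self, sub_id):
        self._expires[sub_id] = self.clock() + self.lifetime

    def refresh(self, sub_id):
        now = self.clock()
        expires = self._expires.get(sub_id)
        if expires is None or expires <= now:
            self._expires.pop(sub_id, None)  # too late: the entry has lapsed
            return False
        self._expires[sub_id] = now + self.lifetime
        return True

    def sweep(self):
        """Remove every subscription whose lifetime has passed; return the removed ids."""
        now = self.clock()
        expired = [s for s, t in self._expires.items() if t <= now]
        for s in expired:
            del self._expires[s]
        return expired

    def active(self):
        return sorted(self._expires)


now = [0.0]
reg = SoftStateRegistry(lifetime=60, clock=lambda: now[0])
reg.subscribe("alice")
reg.subscribe("bob")
now[0] = 50
reg.refresh("alice")        # alice refreshes in time
now[0] = 70
removed = reg.sweep()       # bob never came back, so his subscription is dropped
print(removed, reg.active())
```

No server-side cleanup logic is needed for a client that simply disappears; its state ages out on its own.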
When a resource disappears
Monitoring services should indicate that the resource is no longer there
Discovery services should stop advertising the resource
Neither of these can be guaranteed to happen instantaneously.
When a discovery service dies
Users cannot discover new resources. They may have old information cached; this data is still useful, although it degrades in quality and usefulness over time.
Users can contact the resources directly and determine their status.
Some implementations allow for mirroring of discovery services.
When the network is partitioned
This could be seen as a generalization of the previous scenarios: all of them can be modelled as appropriate network partitions.
If there is a discovery service in a user’s partition, the user should be able to discover resources in that partition.
Information Systems
We sometimes refer to Discovery and Monitoring as “Information Systems”. This is misleading, as we’re not including general-purpose database systems.
Discovery and Monitoring information is often stale as soon as it’s reported, sometimes inconsistent, and often updated by running probes, either on demand or periodically.
Discovery Services
Used to locate monitoring services with information about resources.
May cache some resource data, and may even cache enough resource data to act as a monitoring system.
Generally involve a database-like query interface, using languages like LDAP, XPath, or SQL.
Usually a relatively small number (maybe even just one, or one with a mirror) are deployed in a VO.
Two Models for Discovery Services
[Figure: two models, one with a separate Discovery Service alongside Monitoring Services, the other with a combined Monitoring & Discovery Service collecting from several Monitoring Services]
Monitoring Services
Used to monitor the state of a resource.
The service interface usually involves database-like queries, with languages like LDAP, XPath, or SQL, and often also provides for asynchronous notification.
Typically also includes a back-end provider interface, which allows locally written scripts, programs, etc. to collect information for the monitoring service.
Typically deployed on each host that houses a resource.
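A back-end provider interface can be pictured as a registry of plug-ins that the service calls when asked. The sketch below is hypothetical throughout: Python callables stand in for the locally written scripts or programs, and `MonitoringService`, `register_provider`, and the provider names are invented for illustration.

```python
import platform
import shutil
from typing import Callable, Dict

# Hypothetical provider contract: a zero-argument callable returning attribute/value pairs.
Provider = Callable[[], Dict[str, object]]


class MonitoringService:
    def __init__(self):
        self._providers: Dict[str, Provider] = {}

    def register_provider(self, name: str, provider: Provider) -> None:
        # Sites plug in their own collectors without changing the service itself.
        self._providers[name] = provider

    def query(self, name: str) -> Dict[str, object]:
        # Run the locally registered provider and return whatever it reports.
        return self._providers[name]()


def os_provider():
    return {"system": platform.system(), "machine": platform.machine()}


def disk_provider(path="/"):
    usage = shutil.disk_usage(path)
    return {"total_bytes": usage.total, "free_bytes": usage.free}


svc = MonitoringService()
svc.register_provider("os", os_provider)
svc.register_provider("disk", disk_provider)
print(svc.query("os"))
print(svc.query("disk"))
```

A real deployment would more likely launch external programs and parse their output; callables keep the example self-contained.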
How Different Implementations Differ
Overall architecture: are monitoring and discovery separate?
Wire protocol: LDAP, Web Services, custom
Query language: LDAP, XPath, SQL
Caching strategies
Schemas: really more a deployment issue
MDS2 / BDII history
MDS2 was developed as part of the Globus Toolkit. It’s now superseded by MDS4, which has a different architecture.
BDII is a reimplementation of MDS2 by EGEE, and is still in use.
MDS2 Architecture Overview
The Grid Resource Information Service (GRIS) collects information about a local resource and responds to requests for that information. It uses pluggable information providers.
The Grid Index Information Service (GIIS) aggregates information from various GRIS servers
Users may query the GIIS for aggregated information or query the GRIS servers directly.
GIIS servers may be arranged hierarchically.
MDS2 Architecture
[Figure: three GRIS servers, each with two information providers (IP), feeding two GIIS servers, which in turn feed a top-level GIIS]
MDS2 GIIS
Grid Index Information Service (GIIS) servers aggregate information from GRIS servers and other GIIS servers.
These other servers register themselves with the GIIS server. Registrations must be periodically refreshed.
GIIS servers cache information (results from previous queries).
If a GIIS server receives a query for which there is no fresh cached information, it forwards the query to its registered servers.
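The GIIS behaviour described above (expiring registrations, a result cache, and forwarding on a cache miss) can be sketched as a toy in Python. None of this is the MDS2 code: the class names, the `(attribute, value)` query form, and the fixed cache `ttl` are assumptions made for illustration, and `LocalServer` merely stands in for a GRIS.

```python
import time


class IndexServer:
    """Toy GIIS-like index (hypothetical): caches answers, forwards misses to children."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl          # seconds a cached answer stays fresh
        self.clock = clock      # injectable so the example needs no real waiting
        self.children = {}      # registered server -> registration expiry time
        self.cache = {}         # query -> (time cached, results)

    def register(self, child, lifetime):
        """Called by a GRIS or lower-level index; must be repeated before `lifetime` ends."""
        self.children[child] = self.clock() + lifetime

    def search(self, query):
        now = self.clock()
        # Forget registrations that were not refreshed in time.
        self.children = {c: t for c, t in self.children.items() if t > now}
        hit = self.cache.get(query)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        results = []
        for child in self.children:  # no fresh cache entry: forward the query
            results.extend(child.search(query))
        self.cache[query] = (now, results)
        return results


class LocalServer:
    """Stands in for a GRIS answering from its own records."""

    def __init__(self, records):
        self.records = records
        self.calls = 0

    def search(self, query):
        self.calls += 1
        attr, value = query
        return [r for r in self.records if r.get(attr) == value]


now = [0.0]
a = LocalServer([{"host": "a1", "os": "linux"}])
b = LocalServer([{"host": "b1", "os": "linux"}, {"host": "b2", "os": "aix"}])
giis = IndexServer(ttl=30, clock=lambda: now[0])
giis.register(a, lifetime=100)
giis.register(b, lifetime=100)

first = giis.search(("os", "linux"))   # forwarded to both registered servers
second = giis.search(("os", "linux"))  # answered from the cache
print(first, a.calls, b.calls)
now[0] = 200                           # nobody refreshed their registration
print(giis.search(("os", "linux")))    # cache is stale and no servers remain
```

Because `IndexServer` also has a `search` method, one index can register with another, which gives the hierarchical arrangement.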
MDS2 GRIS
A Grid Resource Information Service (GRIS):
Runs on each host that has resources to be monitored.
Accepts requests for information about local resources; these may come from users or GIIS servers.
Runs a local “information provider” to collect and format the information, unless the requested information is cached and relatively fresh.
Caches the information and replies to the request.
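The GRIS request path (answer from the cache if it is fresh enough, otherwise run the provider and cache its output) is sketched below. This is a hypothetical illustration, not GRIS code: `ResourceInfoServer`, `max_age`, and the single cached value are simplifications, and the provider is a Python callable returning made-up load readings.

```python
import time


class ResourceInfoServer:
    """Toy GRIS (hypothetical): serve cached data while fresh, else rerun the provider."""

    def __init__(self, provider, max_age, clock=time.monotonic):
        self.provider = provider    # callable that collects and formats local data
        self.max_age = max_age      # seconds a cached answer counts as "relatively fresh"
        self.clock = clock
        self._cached = None
        self._cached_at = None
        self.provider_runs = 0

    def request(self):
        now = self.clock()
        fresh = self._cached_at is not None and now - self._cached_at < self.max_age
        if not fresh:
            self._cached = self.provider()  # run the information provider
            self._cached_at = now
            self.provider_runs += 1
        return self._cached                 # reply from the (possibly new) cache


now = [0.0]
readings = iter([0.5, 1.7])
gris = ResourceInfoServer(lambda: {"load": next(readings)}, max_age=30, clock=lambda: now[0])
r1 = gris.request()   # provider runs
now[0] = 10
r2 = gris.request()   # still fresh: cached answer
now[0] = 45
r3 = gris.request()   # stale: provider runs again
print(r1, r2, r3, gris.provider_runs)
```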
MDS2 Query Language
Both the GIIS and GRIS servers use LDAP as the service protocol and query language.
LDAP Basics
Hierarchical data model: each entry has a distinguished name (DN) and a set of attribute/value pairs.
Distinguished name: a collection of name-value pairs. It must be unique and determines the entry’s place in the hierarchy; each entry’s DN must include its parent’s DN.
Queries: can search on attributes or DNs. Results can include children (or not), or include only certain attributes.
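The LDAP ideas above (DNs that embed the parent DN, search scopes, and attribute selection) can be modelled with a small in-memory directory. This is a toy, not an LDAP client: DN parsing ignores escaping, the filter is a Python predicate rather than an LDAP filter string, and the DNs and attribute names (`hn`, `vo`, `os`, `memMB`) are hypothetical.

```python
def parse_dn(dn):
    """Split "a=1,b=2" into RDNs, most specific first (ignores LDAP escaping rules)."""
    return tuple(part.strip() for part in dn.split(","))


class ToyDirectory:
    def __init__(self):
        self.entries = {}  # parsed DN -> {attribute: value}

    def add(self, dn, attrs):
        rdns = parse_dn(dn)
        parent = rdns[1:]
        if parent and parent not in self.entries:
            raise ValueError("parent entry must exist before its children")
        self.entries[rdns] = attrs

    def search(self, base, scope="sub", match=lambda attrs: True, attrlist=None):
        """scope: "base" (the entry only), "one" (its children), or "sub" (whole subtree)."""
        base_rdns = parse_dn(base)
        results = []
        for dn, attrs in self.entries.items():
            depth = len(dn) - len(base_rdns)
            # An entry is under `base` if its DN ends with the base DN.
            if depth < 0 or dn[depth:] != base_rdns:
                continue
            if (scope == "base" and depth != 0) or (scope == "one" and depth != 1):
                continue
            if not match(attrs):
                continue
            if attrlist is not None:  # return only the requested attributes
                attrs = {k: v for k, v in attrs.items() if k in attrlist}
            results.append((",".join(dn), attrs))
        return results


d = ToyDirectory()
d.add("o=grid", {"objectClass": "organization"})
d.add("vo=local,o=grid", {"objectClass": "vo"})
d.add("hn=node1,vo=local,o=grid", {"os": "Linux", "memMB": "2048"})
d.add("hn=node2,vo=local,o=grid", {"os": "AIX", "memMB": "4096"})

# Children of the VO entry running Linux, returning only the "os" attribute.
linux = d.search("vo=local,o=grid", scope="one",
                 match=lambda a: a.get("os") == "Linux", attrlist=["os"])
print(linux)
```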
MDS4 Overview
MDS4 is a redesign of MDS.
The MDS4 Index Service acts as both a monitoring and a discovery service. It uses WSRF standard resource property queries as its query interface.
A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met, e.g., “send email when a remote system appears to be down”.
MDS4 uses WSRF standards for its query and registration interfaces.
WS-Resource Review
A WS-Resource is a Web Service that exposes internal state as Resource Properties; each Resource Property is an XML element of arbitrary complexity.
Each WS-Resource has a Resource Property Document: an XML document that includes all its Resource Properties.
Example: the WS-GRAM service advertises information about its associated queues and clusters as a resource property.
Retrieving Resource Properties
GetResourceProperty: gets a single named resource property
GetMultipleResourceProperties: gets a set of named resource properties
QueryResourceProperties: returns the results of a query against a resource’s resource property set
Subscription/notification: clients subscribe and get periodic or occasional notifications
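The three retrieval styles differ mainly in how much of the Resource Property Document comes back. The sketch below mimics them locally with Python’s `xml.etree.ElementTree`, assuming a made-up, namespace-free document (real resource properties are namespace-qualified, and the element names here are not a real WS-GRAM schema). ElementTree supports only a subset of XPath, so `query_rps` is a stand-in for a real XPath dialect, not an implementation of it.

```python
import xml.etree.ElementTree as ET

# Hypothetical Resource Property Document; element names are illustrative only.
RP_DOC = """
<rpDoc>
  <queue><name>short</name><state>enabled</state></queue>
  <queue><name>long</name><state>disabled</state></queue>
  <cluster><name>c1</name><os>Linux</os></cluster>
</rpDoc>
"""

doc = ET.fromstring(RP_DOC)


def get_rp(doc, name):
    """Like GetResourceProperty: all values of one named property."""
    return doc.findall(name)


def get_multiple_rps(doc, names):
    """Like GetMultipleResourceProperties: several named properties at once."""
    return [el for n in names for el in doc.findall(n)]


def query_rps(doc, path):
    """Like QueryResourceProperties, using ElementTree's limited XPath subset."""
    return doc.findall(path)


queue_names = [q.findtext("name") for q in get_rp(doc, "queue")]
tags = [el.tag for el in get_multiple_rps(doc, ["queue", "cluster"])]
enabled = [q.findtext("name") for q in query_rps(doc, "queue[state='enabled']")]
print(queue_names, tags, enabled)
```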
What this means…
Standard requests can be used to get state information from any WS-Resource. This means that every WS-Resource is also a monitoring service! But it is not necessarily monitoring anything (i.e., providing any interesting state).
We sometimes want information from sources other than WS-Resources: non-WSRF services, general system information, catalogues of installed software.
Service Groups Review
A service group is a service that represents a group of other services or resources.
Service groups contain Service Group Entries (SGEs), which consist of: the address of the SGE itself, the address of the Service Group that the SGE belongs to, and a Content element consisting of arbitrarily formatted data.
SGEs are created via the Service Group Add request.
The MDS4 Index Service
Acts as a Discovery Service: gathers information from other WS-Resources, including other Index Servers.
Acts as a Monitoring Service: caches all the information it gathers, and also has a pluggable interface for Information Providers (programs or Java classes that gather information).
An MDS4 Index Deployment
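[Figure: an MDS4 deployment in which several Index Services collect information from GRAM and RFT services, information providers (IP), and other Index Services]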
The MDS4 Index Data Model
The Index Service keeps its data as a Service Group. Registering a new resource to be monitored is accomplished by adding a service group entry to the service group.
The data in each SGE contains both configuration information (e.g., “query the X resource property from server Y”) and the actual collected data.
Index Data Model (simplified)
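[Figure: the Index Service Group contains SGEs; each SGE holds an SG EPR, an SGE EPR, and Content. The Content has a Config part (GetRP of an RP from an EPR) and a Data part (a GLUECE element with Queue (Name, State) and Cluster (Name, OS) children)]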
Data Model continued
In the Index Service data model, data is grouped with its configuration information.
The “same” data can appear in two different places in the tree if it was acquired from two different information sources, e.g., information about a host’s load average from two different GRAM servers running on that host.
It is relatively easy to find where each piece of data came from.
How the Index Updates its Data
Periodically, the Index Service examines each SGE in its Service Group.
If the SGE’s registration has expired and not been renewed, the SGE is destroyed.
Otherwise, the Index looks at the Config part of the SGE content, gathers data as specified by that config information, and updates the data in the Data part of the SGE content.
Data is updated periodically, not on demand.
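One pass of that update loop can be sketched as follows. This is a hypothetical model, not MDS4 code: each `SGE` pairs a “config” (here a Python callable standing in for “query the X resource property from server Y”) with its collected data and a registration expiry time, and the member names are invented.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class SGE:
    member: str                  # hypothetical name of the monitored resource
    fetch: Callable[[], Any]     # "Config" part: how to gather the data (stand-in)
    expires_at: float            # registration expiry; must be renewed to survive
    data: Optional[Any] = None   # "Data" part: the last collected value


class ToyIndex:
    def __init__(self):
        self.entries: Dict[str, SGE] = {}

    def add(self, sge: SGE):
        self.entries[sge.member] = sge

    def update(self, now: float):
        """One periodic pass: destroy lapsed SGEs, refresh the data of the rest."""
        for key in list(self.entries):
            sge = self.entries[key]
            if sge.expires_at <= now:
                del self.entries[key]      # registration expired and was not renewed
            else:
                sge.data = sge.fetch()     # gather data as the config specifies


counter = iter(range(100))
idx = ToyIndex()
idx.add(SGE("gram.example.org", fetch=lambda: {"freeCPUs": 4}, expires_at=100))
idx.add(SGE("rft.example.org", fetch=lambda: {"activeTransfers": next(counter)}, expires_at=50))
idx.update(now=10)
idx.update(now=60)
print({k: v.data for k, v in idx.entries.items()})
```

Queries against such an index would read `data` as it stands; they never trigger a fetch, which matches the periodic (not on-demand) behaviour.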
Querying the Index Service
The Index Service advertises its service group as a resource property. You can fetch the whole thing with GetRP or GetMultipleRPs, but most people use QueryRP to query it.
QueryRP allows you to specify a dialect and a query. Currently, only XPath is supported as a dialect.
XPath Queries
Search an XML document and return some subset of the XML entities.
If an entity is included in the results, it’s included in its entirety. Unlike LDAP, there is no way to leave out attributes or children.
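The “all or nothing” behaviour is easy to see with Python’s `xml.etree.ElementTree`, which implements a subset of XPath. The document below is a hypothetical stand-in for aggregated Index data: selecting a `queue` element returns it together with every attribute and child, and any trimming has to happen on the client afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for aggregated Index data.
INDEX = ET.fromstring("""
<entries>
  <entry host="node1">
    <queue name="short" state="enabled"><maxJobs>10</maxJobs></queue>
  </entry>
  <entry host="node2">
    <queue name="long" state="disabled"><maxJobs>2</maxJobs></queue>
  </entry>
</entries>
""")

matches = INDEX.findall(".//queue[@state='enabled']")
for q in matches:
    # The matched element comes back whole: all attributes and all children.
    print(ET.tostring(q, encoding="unicode").strip())
```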
MDS4 Trigger Service
A second monitoring service in MDS4. The Index is geared more towards queries intended for resource location and selection.
The Trigger Service is intended to alert people to problems. It can be configured to take action (e.g., send mail to an administrator) when issues arise.
MDS4 Trigger Service
Maintains information in a service group, like the Index Service.
SGE config information also includes an XPath query and an action. The action is the name of a program to run.
Periodically, the Trigger Service looks at each SGE in its service group: it evaluates the SGE’s XPath query against the SGE’s data. If the query returns true, it runs the program specified by the action.
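The trigger check can be sketched as below, under stated assumptions. The condition is an ElementTree path expression rather than full XPath, and a non-empty result is treated as “true” (as XPath does when converting a node-set to a boolean). The action is a Python callable standing in for the program name; a real deployment might launch that program with `subprocess.run`. `TriggerEntry` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class TriggerEntry:
    data: ET.Element                        # collected data for one SGE
    condition: str                          # ElementTree path; any match counts as true
    action: Callable[[ET.Element], None]    # stands in for "the name of a program to run"


def run_triggers(entries):
    """One periodic pass: evaluate each condition and run the action when it holds."""
    fired = 0
    for e in entries:
        if e.data.findall(e.condition):     # non-empty node-set ~ XPath boolean true
            e.action(e.data)
            fired += 1
    return fired


alerts = []
down = ET.fromstring("<host name='n1'><status>down</status></host>")
up = ET.fromstring("<host name='n2'><status>up</status></host>")


def notify(d):
    alerts.append(f"mail admin: {d.get('name')} appears to be down")


entries = [TriggerEntry(down, "status[.='down']", notify),
           TriggerEntry(up, "status[.='down']", notify)]
fired = run_triggers(entries)
print(fired, alerts)
```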
MDS4 WebMDS
Provides a simple HTTP interface to query an MDS Index Service (really, to query the resource properties of any WS-Resource).
Optionally applies XSLT transforms to the query results.
Designed as a user interface, to be used with a web browser, but some people are using it to provide a REST-like interface to MDS4.
Inca
Monitoring system developed at SDSC.
Users define tests for Inca to run. Inca runs them and stores the results in a database.
Users can view the results on a web page.
Can be configured to send mail if tests fail, etc.
Can run tests using the user’s credentials.
From the Inca 2.1 User’s Guide, http://inca.sdsc.edu/releases/2.1/guide/userguide.html
Inca Query Interface
Uses an SQL database internally.
End users can query using a web page or receive notifications via email. A web-services interface is also available.
Uses a custom query language.
Overall a nice monitoring/testing framework, but not designed as a discovery service.
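The “run tests, store results in SQL, notify on failure” pattern can be sketched with Python’s built-in `sqlite3`. The table layout, test names, and timestamps are hypothetical and say nothing about Inca’s actual schema or its custom query language; the point is only that failures become a simple query over stored results.

```python
import sqlite3

# Hypothetical schema; not Inca's real storage layout.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (resource TEXT, test TEXT, passed INTEGER, run_at TEXT)")


def record(resource, test, passed, run_at):
    db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
               (resource, test, int(passed), run_at))


record("node1", "java_version", True, "t0")
record("node2", "java_version", False, "t0")
record("node2", "gridftp_ping", True, "t1")

# Failures that a notification step could turn into e-mail.
failures = db.execute(
    "SELECT resource, test FROM results WHERE passed = 0 ORDER BY resource, test"
).fetchall()
print(failures)
```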
GMA (Grid Monitoring Architecture)
Proposed architecture with three components: producers produce information; consumers consume information; directories keep track of what information is available and which producers can be queried, not the actual data.
[Figure: GMA architecture diagram showing producers, consumers, and the directory service]
Diagram from “A Grid Monitoring Architecture”, B. Tierney et al., http://www-didc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-2.pdf
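The division of labour among the three GMA components can be sketched as follows. The class names, the “kinds” of information, and the pull-only interaction are assumptions for illustration; GMA also allows producers to push data to subscribed consumers, which this sketch omits. Note that the directory records only who offers what, while the data itself stays with the producers.

```python
from typing import Dict, List


class Producer:
    def __init__(self, name, data):
        self.name = name
        self._data = data          # the actual monitoring data lives here

    def get(self, kind):
        return self._data.get(kind)


class Directory:
    """Records which producers offer which kinds of information, never the data itself."""

    def __init__(self):
        self._index: Dict[str, List[Producer]] = {}

    def register(self, producer, kinds):
        for kind in kinds:
            self._index.setdefault(kind, []).append(producer)

    def lookup(self, kind):
        return list(self._index.get(kind, []))


class Consumer:
    def __init__(self, directory):
        self.directory = directory

    def collect(self, kind):
        # 1. ask the directory who can answer, 2. contact those producers directly
        return {p.name: p.get(kind) for p in self.directory.lookup(kind)}


directory = Directory()
directory.register(Producer("siteA", {"cpu_load": 0.7}), ["cpu_load"])
directory.register(Producer("siteB", {"cpu_load": 2.3, "disk_free": 120}),
                   ["cpu_load", "disk_free"])
loads = Consumer(directory).collect("cpu_load")
print(loads)
```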
R-GMA
Relational Grid Monitoring Architecture. Implements the GMA model, except that users never interact with the directory service (called a “registry” in R-GMA); a consumer service does that instead, and users query the consumer service.
Uses SQL as its query language.
An R-GMA Query
Diagram from “R-GMA: Architectural Design” at http://www.r-gma.org/arch-consumers.html
• Client sends SQL query to Consumer Service
• Consumer Service contacts registry for list of producers to contact
• Consumer Service queries producers and buffers results
• Client retrieves results from Consumer Service
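The four steps can be mimicked with Python’s `sqlite3`, giving each hypothetical producer its own in-memory database holding rows of one virtual table. This is a sketch, not R-GMA: the registry is a plain dict, the table name is passed explicitly instead of being parsed from the SQL, and results are simply concatenated, so cross-producer `ORDER BY` or aggregates would not be handled correctly.

```python
import sqlite3


def make_producer(rows):
    """Hypothetical producer publishing rows of one virtual table from its own store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cpuload (host TEXT, load_avg REAL)")
    conn.executemany("INSERT INTO cpuload VALUES (?, ?)", rows)
    return conn


# Registry: table name -> producers publishing that table (no data stored here).
registry = {
    "cpuload": [make_producer([("a1", 0.4), ("a2", 3.2)]),
                make_producer([("b1", 1.9)])],
}


class ConsumerService:
    def __init__(self, registry):
        self.registry = registry
        self.buffer = []

    def start_query(self, table, sql):
        # Ask the registry which producers to contact, run the SQL on each, buffer the rows.
        for producer in self.registry.get(table, []):
            self.buffer.extend(producer.execute(sql).fetchall())

    def pop_results(self):
        # The client retrieves (and drains) the buffered results.
        rows, self.buffer = self.buffer, []
        return rows


consumer = ConsumerService(registry)
consumer.start_query("cpuload", "SELECT host, load_avg FROM cpuload WHERE load_avg > 1.0")
rows = sorted(consumer.pop_results())
print(rows)
```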
For More Information
Globus: http://www.globus.org
Inca: http://inca.sdsc.edu
R-GMA: http://www.r-gma.org
XML / XPath / XSLT: http://www.w3c.org