Performance and Exception Monitoring Project
Tim Smith CERN/IT
2000/11/02 Tim Smith: HEPiX @ JLab 2
Overview
Motivation
Objectives
Analysis and Design
Prototyping
Perspective and Future
Motivation
[Diagram: Monitoring System (Local / Remote) and related functions: Alarm, Recovery action, Process killer, Console, Resource planning, Accounting, Security, Inventory]
Independent systems
No single overview
Duplicated collection
Host based: Want Service
Perceived problems not real
Scalability
Motivation
[Diagram: Monitoring System (Local / Remote) and related functions: Alarm, Recovery action, Console, Resource planning, Accounting, Security, Inventory]
Configuration, Collection, Transport, Repository mgmt, Display
Objectives
To provide tools in which the alarms and displays are orientated to the overall service provided:
  User end-to-end views, quality of service views
  Managerial views of resource usage / evolution / failure rates
  Service provider views, and detailed machine views
  Link the alarms to both the monitoring and corrective actions
To provide service level metrics
To provide a uniform monitoring infrastructure
  Coordinated central repositories + common logging format
  Averaging and archiving of logged information
  Correlations between logged information
  Multiple input routes; extensible monitoring clients
  Modular tools; demonstrated scalability
Process
Analysis
  User Requirements Document
  Current Tools survey
    Enterprise/Cluster mgmt, Pub domain, other labs, building blocks, DAQ, Run Control, Slow Control
  Goal / Question / Metric formalism
  System Requirements Document
Design
  Interfaces Document
Prototyping
Goal / Question / Metric
Goal: Ensure quality of Interactive Service
Questions: Sufficient nodes? Low enough load? Slow to respond to commands? Contactable via network?
Metrics: Network daemons alive; No nologin; Free ptys; Connection test from remote node
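The Goal / Question / Metric breakdown above can be pictured as a small data structure. The following is only a sketch, assuming that the four listed metrics all belong to the "Contactable via network" question and that each metric is a simple pass/fail check; the class and method names (`GqmSketch`, `Question`, `answer`) are hypothetical and do not come from the PEM project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Goal / Question / Metric breakdown; all names are illustrative.
class GqmSketch {
    /** A question is answered from a set of named pass/fail metrics. */
    static final class Question {
        final String text;
        final List<String> metricNames = new ArrayList<>();

        Question(String text, String... metrics) {
            this.text = text;
            for (String m : metrics) {
                metricNames.add(m);
            }
        }

        /** Returns true only if every metric of this question was measured and passed. */
        boolean answer(Map<String, Boolean> measured) {
            for (String m : metricNames) {
                if (!Boolean.TRUE.equals(measured.get(m))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Assumed mapping of the slide's metrics onto one question. */
    static Question contactableViaNetwork() {
        return new Question("Contactable via network?",
                "network daemons alive", "no nologin", "free ptys", "remote connection test");
    }

    public static void main(String[] args) {
        Question q = contactableViaNetwork();
        Map<String, Boolean> measured = new LinkedHashMap<>();
        for (String m : q.metricNames) {
            measured.put(m, true);
        }
        measured.put("free ptys", false); // one failing metric makes the answer negative
        System.out.println(q.text + " " + q.answer(measured));
    }
}
```

Treating a missing measurement as a failure is a design choice of this sketch: an unmeasured metric should not silently count as a healthy service.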
PEM Architecture
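[Diagram: class diagram of the PEM components UserInterface, MonitoringAgent, MonitoringBroker, MeasurementRepository, ConfigurationRepository, CorrelationEngine and AccessServer, linked by associations with multiplicities 1 and 1..n; a box labelled OutsidePEM marks what lies outside PEM]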
Configuration Repository
Loading the DB
[Diagram: XML documents (<TAG> ... </TAG>), Parser, XML-DBMS, jdbc, RDBMS, Viewers]
Xerces from Apache
XML-DBMS freeware (tried XSU from Oracle)
XMLSchema
Host, Host type, Metrics, Services
Configuration Repository
Querying the DB
[Diagram: RDBMS, jdbc, XML-DBMS, XML DB, XML documents (<TAG> ... </TAG>), Parser, jdbc, ConfigurationItems as Java Objects]
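The step from XML configuration documents to Java objects can be sketched with the standard JAXP DOM parser. This is an illustration only: it skips the XML-DBMS and JDBC stages entirely and parses an in-memory string, and the element and attribute names (`config`, `host`, `metric`, `name`, `type`) as well as the `HostConfig` class are assumptions, not the project's actual schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: turn an XML configuration document into plain Java objects.
class ConfigQuerySketch {
    /** One configuration item: a host, its type, and the metrics to collect on it. */
    static final class HostConfig {
        final String name;
        final String type;
        final List<String> metrics;

        HostConfig(String name, String type, List<String> metrics) {
            this.name = name;
            this.type = type;
            this.metrics = metrics;
        }
    }

    /** Parses the (assumed) schema <config><host name type><metric name/></host></config>. */
    static List<HostConfig> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<HostConfig> hosts = new ArrayList<>();
            NodeList hostNodes = doc.getElementsByTagName("host");
            for (int i = 0; i < hostNodes.getLength(); i++) {
                Element host = (Element) hostNodes.item(i);
                List<String> metrics = new ArrayList<>();
                NodeList metricNodes = host.getElementsByTagName("metric");
                for (int j = 0; j < metricNodes.getLength(); j++) {
                    metrics.add(((Element) metricNodes.item(j)).getAttribute("name"));
                }
                hosts.add(new HostConfig(
                        host.getAttribute("name"), host.getAttribute("type"), metrics));
            }
            return hosts;
        } catch (Exception e) {
            // Malformed XML or parser setup problems are reported as one unchecked error.
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config><host name=\"node01\" type=\"interactive\">"
                + "<metric name=\"load\"/><metric name=\"free ptys\"/></host></config>";
        for (HostConfig h : parse(xml)) {
            System.out.println(h.name + " (" + h.type + "): " + h.metrics);
        }
    }
}
```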
Correlation Engine
To correlate metrics from the MRS according to configuration in the CRS
  Metric collections: trends + multiple machines
  Samplings: Union for read efficiency from MRS
Example Java Classes:
  Correlation coordinator
  Sampling cache
  Evaluators
  Timers
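To make the roles of a sampling cache and evaluators more concrete, here is a minimal sketch. It assumes samples are added in time order and that an evaluator reduces cached samples to a single number; the two evaluators shown (a mean across machines and a simple first-to-last trend) are illustrative choices, and none of the class names are taken from the PEM code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a sampling cache and two evaluators; names are illustrative.
class CorrelationSketch {
    static final class Sample {
        final String host;
        final long time;
        final double value;

        Sample(String host, long time, double value) {
            this.host = host;
            this.time = time;
            this.value = value;
        }
    }

    /** Holds samples read once from the repository so several evaluators can share them. */
    static final class SamplingCache {
        private final Map<String, List<Sample>> byHost = new LinkedHashMap<>();

        void add(Sample s) {
            byHost.computeIfAbsent(s.host, h -> new ArrayList<>()).add(s);
        }

        List<Sample> samples(String host) {
            List<Sample> list = byHost.get(host);
            return list == null ? Collections.<Sample>emptyList() : list;
        }

        Iterable<String> hosts() {
            return byHost.keySet();
        }
    }

    interface Evaluator {
        double evaluate(SamplingCache cache);
    }

    /** Multiple machines: mean of the most recent value of every host, NaN if none. */
    static Evaluator meanOfLatest() {
        return cache -> {
            double sum = 0;
            int n = 0;
            for (String host : cache.hosts()) {
                List<Sample> s = cache.samples(host);
                sum += s.get(s.size() - 1).value;
                n++;
            }
            return n == 0 ? Double.NaN : sum / n;
        };
    }

    /** Trend: change per time unit between first and last sample of one host. */
    static Evaluator trend(String host) {
        return cache -> {
            List<Sample> s = cache.samples(host);
            if (s.size() < 2) {
                return Double.NaN;
            }
            Sample first = s.get(0);
            Sample last = s.get(s.size() - 1);
            if (last.time == first.time) {
                return Double.NaN;
            }
            return (last.value - first.value) / (last.time - first.time);
        };
    }

    public static void main(String[] args) {
        SamplingCache cache = new SamplingCache();
        cache.add(new Sample("a", 0, 1.0));
        cache.add(new Sample("a", 10, 3.0));
        cache.add(new Sample("b", 10, 5.0));
        System.out.println("mean of latest: " + meanOfLatest().evaluate(cache));
        System.out.println("trend on a: " + trend("a").evaluate(cache));
    }
}
```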
Publish / Subscribe: Java RMI
Interfaces Document
[Diagram: events exchanged between UserInterface, MonitoringAgent, MonitoringBroker, MeasurementRepository, ConfigurationRepository, CorrelationEngine and AccessServer: metric stream, metric value, exception, configuration]
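A publish / subscribe interface in the Java RMI style might look like the sketch below. It is an assumption-laden illustration: the `MetricListener` interface, the `Broker` class and the event signature are invented here, and delivery happens in-process. In a real RMI deployment the listener interface would be public and its implementations exported (for example via `UnicastRemoteObject`) and looked up through a registry, which this sketch leaves out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of publish / subscribe with RMI-style interfaces, run in-process.
class PubSubSketch {
    /** Subscriber callback; extends Remote so it could be exported over RMI. */
    interface MetricListener extends Remote {
        void onMetric(String host, String metric, double value) throws RemoteException;
    }

    /** Keeps subscriptions per metric name and fans published values out to them. */
    static final class Broker {
        private final Map<String, List<MetricListener>> subscribers = new HashMap<>();

        synchronized void subscribe(String metric, MetricListener listener) {
            subscribers.computeIfAbsent(metric, m -> new ArrayList<>()).add(listener);
        }

        /** Returns the number of subscribers that received the value. */
        synchronized int publish(String host, String metric, double value) {
            List<MetricListener> targets = subscribers.get(metric);
            if (targets == null) {
                targets = Collections.emptyList();
            }
            int delivered = 0;
            for (MetricListener listener : targets) {
                try {
                    listener.onMetric(host, metric, value);
                    delivered++;
                } catch (RemoteException e) {
                    // An unreachable subscriber must not block delivery to the others.
                }
            }
            return delivered;
        }
    }

    /** Example subscriber that simply records what it receives. */
    static final class Recorder implements MetricListener {
        final List<String> received = new ArrayList<>();

        @Override
        public void onMetric(String host, String metric, double value) {
            received.add(host + " " + metric + "=" + value);
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        Recorder recorder = new Recorder();
        broker.subscribe("load", recorder);
        broker.publish("node01", "load", 0.75);
        System.out.println(recorder.received);
    }
}
```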
Monitoring Agent/Broker I
SNMP
  Extended existing infrastructure
  Multithreaded broker loading DB
JMX / JDMK
  JMX public specification: managed resources
  Pluggable agents
  Reported several important bugs
  Demo at JavaOne conference
    Linux/NT remote reset
    Netlogger instrumentation
  Opened up license negotiations
Monitoring Agent/Broker II
C: low overhead
[Diagram: data sources (SNMP, /proc, netlogger, Script), Monitoring Process, Spool, Spool Manager, Monitoring Broker]
Not yet … DMTF DMI, CIM
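As an illustration of what a monitoring process does with a /proc source, the sketch below reads the one-minute load average from a line in the Linux /proc/loadavg format and turns it into a spool record. The slide describes a C agent; Java is used here only for consistency with the Java components mentioned elsewhere in the talk. The spool record layout (timestamp, host, metric, value) and all names are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch: sample /proc/loadavg and format a spool record.
class ProcAgentSketch {
    /**
     * Extracts the first field (1-minute load average) from a line such as
     * "0.42 0.35 0.30 1/123 4567", the layout used by Linux /proc/loadavg.
     */
    static double parseLoad1(String line) {
        String[] fields = line.trim().split("\\s+");
        return Double.parseDouble(fields[0]);
    }

    /** Assumed spool record layout: "timestamp host metric value". */
    static String spoolRecord(long timestamp, String host, String metric, double value) {
        return timestamp + " " + host + " " + metric + " " + value;
    }

    public static void main(String[] args) throws Exception {
        Path source = Paths.get("/proc/loadavg");
        String line = "0.42 0.35 0.30 1/123 4567"; // fallback sample for non-Linux systems
        if (Files.isReadable(source)) {
            List<String> lines = Files.readAllLines(source);
            if (!lines.isEmpty()) {
                line = lines.get(0);
            }
        }
        long now = System.currentTimeMillis() / 1000;
        System.out.println(spoolRecord(now, "node01", "load1", parseLoad1(line)));
    }
}
```

Writing records to a spool before forwarding them means a slow or unreachable broker delays delivery rather than sampling; how the actual Spool Manager handled this is not stated in the slides.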
PEM Futures
Today: CERN CC needs it
  Prototype for ALICE MDC III in January
Tomorrow: Tier-0 RC / GRID node need it
  More complete management solutions
  Integrate into the Fabric Management WP
  ‘GRIDification’
Rapidly evolving technologies
  Lots of middleware
  Lots of companies wanting collaboration
  Still need framework
PEM in Perspective
[Diagram: Monitoring shown alongside the other fabric management functions: PC Hardware, Console Mgmt, Power Mgmt/Remote Reset, OS Installation/Update, OS Configuration/Update, Application Inst/Update, Configuration Management, Alarm, Recovery Actions, Inventory, Resource Planning, Security]