Cluster Monitoring with EPICS and SNMP
Marcelo Alcocer, KIP / ICL
CBM Conference 2006
Motivation
• We wish to monitor the ALICE HLT analysis cluster (500 PCs)
• The analysis of data obtained from the ALICE experiment will take a long time, therefore a stable analysis cluster is needed
• To ensure stability, this cluster must be constantly monitored
• Using the EPICS architecture with SNMP support it is possible to monitor such a PC cluster
Contents
• Cluster Management
– SNMP
• MIB Trees
• SNMP Operations
• Using data from SNMP
– EPICS
• Overview
• Channel Access
• Record Display
• Device Support
– devSNMP
• Management Possibilities
• Test Implementation
– Overview
– Software
– Monitored Resources
– Example Implementation
– Extended Implementation
• Extension Possibilities
• Current State
• Summary
Cluster Management
• Nowadays PC clusters are widely used for data analysis in many settings, such as in physics experiments or commercial organisations
• These clusters often consist of hundreds to thousands of individual PCs (nodes)
• In order to maintain a healthy, efficient cluster, key resources of the nodes must be monitored, eg:
– Hard disk usage
– Processor usage
– Running processes, etc...
• What is the best way of obtaining this information from the nodes?
– Self monitoring?
– Operating system logging?
– SNMP?
Simple Network Management Protocol
• Simple Network Management Protocol (SNMP) is a management protocol for gathering statistical data about network/host traffic and the behaviour of network components
• It is a telecom industry standard protocol, so most standards organizations and major vendors support it
• It maintains an extensive Management Information Base (MIB) on the host system: a database of information useful for network management
• MIB objects are organised in a tree structure that includes public (standard) and private branches
• The MIB contains key system resource information which can be used for monitoring purposes
MIB Tree - Graphical View
iso = 1
  org = 3
    dod = 6
      internet = 1
        mgmt = 2
          MIB-2 = 1
            system = 1
              sysDescr = 1
              sysUpTime = 3
        private = 4
          enterprises = 1
            ucdavis = 2021
              dskTable = 9
                dskEntry = 1
                  dskTotal = 6
                  dskAvail = 7
• Objects in the MIB tree can be referred to symbolically or numerically
– E.g.: iso.org.dod.internet.mgmt.mib-2.system.sysUpTime = 1.3.6.1.2.1.1.3
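The symbolic-to-numeric mapping above can be sketched in a few lines of Python. This is only an illustration: the arc numbers are copied from the tree on this slide, and the flat lookup table and the function name `to_numeric` are assumptions made for the example (a real MIB needs a proper tree, since arc names are not globally unique).

```python
# Arc numbers taken from the MIB tree fragment shown on the slide.
# A flat table is enough here because every name in this fragment is unique.
ARCS = {
    "iso": 1, "org": 3, "dod": 6, "internet": 1,
    "mgmt": 2, "mib-2": 1, "system": 1,
    "sysDescr": 1, "sysUpTime": 3,
    "private": 4, "enterprises": 1, "ucdavis": 2021,
    "dskTable": 9, "dskEntry": 1, "dskTotal": 6, "dskAvail": 7,
}


def to_numeric(symbolic_oid):
    """Translate a dotted symbolic OID into its dotted numeric form."""
    return ".".join(str(ARCS[name]) for name in symbolic_oid.split("."))


if __name__ == "__main__":
    print(to_numeric("iso.org.dod.internet.mgmt.mib-2.system.sysUpTime"))
```

The same lookup gives the numeric path to the disk table entries in the private branch, for example `iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskAvail`.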
MIB Tree - Output View
+--iso(1)
   |
   +--org(3)
      |
      +--dod(6)
         |
         +--internet(1)
            |
            +--directory(1)
            |
            +--mgmt(2)
               |
               +--mib-2(1)
                  |
                  +--system(1)
                  |  |
                  |  +-- -R-- String    sysDescr(1)
                  |  |        Textual Convention: DisplayString
                  |  |        Size: 0..255
                  |  +-- -R-- ObjID     sysObjectID(2)
                  |  +-- -R-- TimeTicks sysUpTime(3)
                  |  +-- -RW- String    sysContact(4)
                  |  |        Textual Convention: DisplayString
                  |  |        Size: 0..255
                  |  +-- -RW- String    sysName(5)
                  |  |        Textual Convention: DisplayString
                  |  |        Size: 0..255
                  |  +-- -RW- String    sysLocation(6)
                  |  |        Textual Convention: DisplayString
                  |  |        Size: 0..255
                  |  +-- -R-- INTEGER   sysServices(7)
                  |  |        Range: 0..127
                  |  +-- -R-- TimeTicks sysORLastChange(8)
                  |  |        Textual Convention: TimeStamp
                  |  |
                  |  +--sysORTable(9)
                  |     |
                  |     +--sysOREntry(1)
                  |        |  Index: sysORIndex
                  |        |
                  |        +-- ---- INTEGER   sysORIndex(1)
                  |        |        Range: 1..2147483647
                  |        +-- -R-- ObjID     sysORID(2)
                  |        +-- -R-- String    sysORDescr(3)
                  |        |        Textual Convention: DisplayString
                  |        |        Size: 0..255
                  |        +-- -R-- TimeTicks sysORUpTime(4)
                  |                 Textual Convention: TimeStamp
                  |
                  +--interfaces(2)
                     |
                     +-- -R-- Integer32 ifNumber(1)
                     |
                     +--ifTable(2)
                        |
                        +--ifEntry(1)
                           |  Index: ifIndex
                           |
                           +-- -R-- Integer32 ifIndex(1)
                           |        Textual Convention: InterfaceIndex
                           |        Range: 1..2147483647
                           +-- -R-- String    ifDescr(2)
                           |        Textual Convention: DisplayString
                           |        Size: 0..255
                           +-- -R-- EnumVal   ifType(3)
SNMP Operations - Overview
• SNMP has simple client-server interactions with few operations to access information held in the MIB tree:
– {Get} {Set} {GetNext} {Walk} {Table} {Trap} {Translate}
• These operations can query local MIB trees, or those of networked machines
[Figure: managed devices, each running an SNMP agent with its MIB, connected over the network; SNMP operations pass between them]
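How {GetNext} and {Walk} relate can be illustrated with a toy model. The sketch below is not SNMP itself: it stands in for an agent with a plain Python dictionary (the OIDs follow the tree shown earlier, the values are invented for the example) and shows a walk built from repeated get-next steps in OID order, stopping when the result leaves the requested subtree.

```python
def oid_key(oid):
    """Turn '1.3.6.1' into (1, 3, 6, 1) so OIDs compare arc by arc."""
    return tuple(int(part) for part in oid.split("."))


def get_next(mib, oid):
    """Return the (oid, value) pair following `oid` in OID order, or None."""
    later = sorted((k for k in mib if oid_key(k) > oid_key(oid)), key=oid_key)
    return (later[0], mib[later[0]]) if later else None


def walk(mib, root):
    """Collect every entry below `root` by chaining get_next calls."""
    root_key = oid_key(root)
    results, current = [], root
    while True:
        step = get_next(mib, current)
        # Stop at the end of the MIB or when we leave the subtree.
        if step is None or oid_key(step[0])[:len(root_key)] != root_key:
            return results
        results.append(step)
        current = step[0]


# Toy MIB: illustrative values only, not output from a real agent.
TOY_MIB = {
    "1.3.6.1.2.1.1.1.0": "Linux node01",    # sysDescr.0
    "1.3.6.1.2.1.1.3.0": 123456,            # sysUpTime.0
    "1.3.6.1.2.1.1.5.0": "node01",          # sysName.0
    "1.3.6.1.4.1.2021.9.1.6.1": 1000000,    # dskTotal.1
    "1.3.6.1.4.1.2021.9.1.7.1": 250000,     # dskAvail.1
}

if __name__ == "__main__":
    for oid, value in walk(TOY_MIB, "1.3.6.1.2.1.1"):
        print(oid, "=", value)
```

Walking the `system` subtree returns the three system entries and stops before the disk table, because the next OID no longer starts with the requested prefix.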
SNMP Operations - Command Struct.
• Typical SNMP {get} command structure:
– Operation, Community, PC to query, MIB object to query
• Output:
– MIB object queried, Object type, Object value
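As a rough sketch of this structure with the NET-SNMP tools, the Python below assembles an `snmpget` argument list and splits one output line into object, type, and value. The protocol version (`-v 2c`) is an assumption, since the slide does not state one, and the sample output line is illustrative rather than captured from a real host; the helper names are made up for the example.

```python
import re


def build_snmpget(community, host, mib_object, version="2c"):
    """Argument list for NET-SNMP's snmpget: operation, community, host, object.

    The version flag is an assumption; the slide does not say which was used.
    """
    return ["snmpget", "-v", version, "-c", community, host, mib_object]


# NET-SNMP prints results as "<object> = <TYPE>: <value>".
OUTPUT_RE = re.compile(r"^(?P<object>\S+) = (?P<type>[A-Za-z0-9-]+): (?P<value>.*)$")


def parse_line(line):
    """Split one snmpget output line into (object, type, value)."""
    match = OUTPUT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised snmpget output: {line!r}")
    return match.group("object"), match.group("type"), match.group("value")


if __name__ == "__main__":
    print(" ".join(build_snmpget("public", "localhost", "system.sysUpTime.0")))
    # Illustrative output line, not taken from a real agent.
    print(parse_line("SNMPv2-MIB::sysDescr.0 = STRING: Linux node01 2.6.5"))
```

The argument list can be passed to `subprocess.run` on a machine where the NET-SNMP tools are installed and an agent is reachable.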
Using Data from SNMP
• Once the information has been obtained from the MIB trees it must be fed into a control system for it to be useful in a management context
• This might process the information, store it for later analysis, or simply display it using a Graphical User Interface (GUI)
• Many systems currently exist:
– EPICS
– Ganglia
– Lemon
EPICS - Overview
• One such system is the Experimental Physics and Industrial Control System (EPICS)
– www.aps.anl.gov/epics
• It is currently in use in over 12 organizations to control devices in major projects such as Particle Accelerators, Telescopes, and Large Experiments
– GSI, SLAC, ANL, DESY, LANL, ...
• Therefore, huge support and knowledge base
• It is based on a client/server network model, with servers holding information in Records which can be accessed by the clients
EPICS - Architecture
[Figure: EPICS clients and EPICS servers connected over the network; each server holds records with fields (Field 1: x, Field 2: y, Field 3: z)]
EPICS - Channel Access
• Remote access to EPICS records is achieved through the Channel Access (CA) protocol
• This requires a CA server to be running on the EPICS server, and a CA client to be running on the EPICS client
• These are usually already integrated into EPICS clients/servers when they are created
EPICS - Architecture
[Figure: as before, with a CA server on each EPICS server and a CA client on each EPICS client, communicating over the network]
EPICS - Record Display
• The information from EPICS records can be displayed by a GUI:
– MEDM
EPICS - Record Display
– GumTree
EPICS - Device Support
• Records can be interfaced to numerous devices
• These devices can be hardware or software
• Interfacing allows information from a device to be input into EPICS records
• This interfacing is known as device support
EPICS - Architecture
[Figure: as before, with device support added beneath the records on each EPICS server]
Device Support for SNMP - devSNMP
• devSNMP is the device support for SNMP
• Allows the input of data from SNMP into EPICS records
– Sets input field of a record to an SNMP {get} operation
• It is configured for the open source product, NET-SNMP
– This is simply one particular implementation of SNMP
– www.net-snmp.org
Device Support for SNMP - devSNMP
• SNMP {get} command:
• Record definition file:
record(stringin, "System_Description") {
    field(DTYP, "Snmp")
    field(INP, "@localhost public system.sysUpTime.0 STRING:100")
    field(SCAN, "5 second")
}
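With many nodes and many monitored objects, record definitions like the one above are tedious to write by hand. The sketch below generates them from Python, reusing only the fields shown on the slide (DTYP, INP, SCAN). The function name `snmp_record`, the node names, and the `node:` prefix on record names are hypothetical; the prefix is there because record names must be unique across all servers reachable through Channel Access.

```python
def snmp_record(name, mib_object, host="localhost", community="public",
                rec_type="stringin", fmt="STRING:100", scan="5 second"):
    """Return the text of one devSNMP record definition.

    The INP layout (@host community object format) follows the slide's example.
    """
    lines = [
        f'record({rec_type}, "{name}") {{',
        '    field(DTYP, "Snmp")',
        f'    field(INP, "@{host} {community} {mib_object} {fmt}")',
        f'    field(SCAN, "{scan}")',
        '}',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical node names; each node's IOC queries its own local agent.
    for node in ["node01", "node02"]:
        print(snmp_record(f"{node}:System_UpTime", "system.sysUpTime.0"))
        print()
```

The printed text can be saved to a database file and loaded by the soft IOC on each node.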
Management Possibilities
• EPICS records are capable of carrying out simple calculations and conditional logic, but nothing very complicated
• The data from SNMP can therefore be used to control other devices interfaced with EPICS records
• One possible reaction is an SNMP {set} operation, which writes values to a MIB
• However, the current release of devSNMP supports only the {get} operation
• Support for other SNMP commands is planned for the future
Test Implementation - Overview
• Carried out at the Linux PC Cluster at the Kirchhoff Institute for Physics, University of Heidelberg
• 32 PCs running SuSE 9 Linux OS
Test Implementation - Software
• EPICS Servers:
– 30 cluster nodes (2.4 and 2.6 kernels) running EPICS soft IOCs with devSNMP
– NET-SNMP tool set and libraries installed on each node
• EPICS Clients:
– Two cluster nodes (2.6 kernel) running an installation of Motif Editor and Display Manager (MEDM) on an EPICS base
Test Implementation - Architecture
[Figure: each cluster node runs an SNMP agent with its MIB, read through devSNMP into records (Inp: SNMP) behind a CA server; MEDM displays with CA clients reach the records over the network]
Test Implementation - Info. Flow
[Figure: information flow between the records (Inp: SNMP) with their CA servers and the MEDM displays with their CA clients]
Test Implementation - Mon. Resources
• Some resources monitored:
– Hard disk partition usage (total, available, used, percentage used, alarm limit)
– Average CPU usage over 1 min
– System up time (from SNMP daemon start)
– Inbound packet errors
– Unicast outbound packets
– SNMP daemon process check
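The disk partition figures in the list above are related by simple arithmetic, sketched here in Python. The function name and the alarm rule (alarm when the percentage used reaches the limit) are assumptions for illustration; the inputs correspond to the `dskTotal` and `dskAvail` objects in the MIB tree shown earlier, which report sizes in kBytes. Treating used space as total minus available is an approximation, since filesystems may reserve blocks.

```python
def disk_usage(total_kb, avail_kb, alarm_limit_pct):
    """Derive used space, percentage used, and an alarm flag for one partition.

    Assumption: used = total - available, and the alarm is raised once the
    percentage used reaches the limit.
    """
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    used_kb = total_kb - avail_kb
    percent_used = 100.0 * used_kb / total_kb
    return {
        "used_kb": used_kb,
        "percent_used": percent_used,
        "alarm": percent_used >= alarm_limit_pct,
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    print(disk_usage(total_kb=1_000_000, avail_kb=250_000, alarm_limit_pct=90))
```

In the test implementation this kind of calculation would sit in the EPICS records or the display, with the alarm limit chosen per partition.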
Example Implementation - DESY
• Currently EPICS with devSNMP is being used at DESY to monitor key switches and routers
– Network Traffic
– Status
• Solaris and Linux PC clusters to be monitored in the future
• In total around 25 managed devices, but this is increasing all the time
• More information on EPICS/devSNMP at DESY:
– http://www-mks2.desy.de/content/e4/e40/e41/e12212/index_ger.html
Extension Possibilities
• EPICS has limitations as a management system:
– EPICS is a static system.
– Records have limited analysis and reaction capabilities, in particular, no rule based events
• For dynamic management we can forward information from EPICS records to an expert management system – SysMES (Camilo Lara, et al.)
• Allows complex analysis and reaction to the data obtained from SNMP
• Management system must have CA Client to communicate with EPICS records
Current State
• The interface between the CA Client and SysMES has been written
• Interfaces to the cluster monitoring systems LEMON and Ganglia have been defined and are being implemented
Current State - Architecture
[Figure: the test implementation architecture (SNMP agents with MIBs, devSNMP, records with Inp: SNMP, CA servers, MEDM displays with CA clients, network), extended by a SysMES client interface with its own CA client]
Summary
• SNMP:
– Is the standard for network management in almost all modern networked devices (e.g. PCs, workstations, bridges, switches, routers, ...)
– Widely implemented protocol with a large knowledge base
– Very low system resource usage
– A lot of system information is stored in node MIB Trees (which SNMP can access)
• EPICS:
– Widely implemented control system with a huge support base
– Allows input and output to a vast array of devices
• Through device support for SNMP, these can be combined to create a monitoring system
• This can be extended by forwarding the monitoring data to an expert management system (such as SysMES)
Thanks
• Many thanks to all who have helped, but especially:
– Camilo Lara (Coordinator, KIP)
– Albert Kagarmanov (devSNMP at DESY)
The End
Thank you for your attention
Any questions?