Cluster Monitoring with EPICS and SNMP

Marcelo Alcocer, KIP / ICL

CBM Conference 2006

Motivation

• We wish to monitor the ALICE HLT analysis cluster (500 PCs)

• The analysis of data obtained from the ALICE experiment will take a long time, so a stable analysis cluster is needed

• To ensure stability, this cluster must be constantly monitored

• Using the EPICS architecture with SNMP support, it is possible to monitor such a PC cluster

Contents

• Cluster Management

– SNMP

• MIB Trees

• SNMP Operations

• Using data from SNMP

– EPICS

• Overview

• Channel Access

• Record Display

• Device Support

– devSNMP

• Management Possibilities

• Test Implementation

– Overview

– Software

– Monitored Resources

– Example Implementation

– Extended Implementation

• Extension Possibilities

• Current State

• Summary

Cluster Management

• PC clusters are now widely used for data analysis in many settings, such as physics experiments or commercial organisations

• These clusters often consist of hundreds to thousands of individual PCs (nodes)

• In order to maintain a healthy, efficient cluster, key resources of the nodes must be monitored, e.g.:

– Hard disk usage

– Processor usage

– Running processes, etc...

• What is the best way of obtaining this information from the nodes?

– Self monitoring?

– Operating system logging?

– SNMP?

Simple Network Management Protocol

• Simple Network Management Protocol (SNMP) is a management protocol for gathering statistical data about network/host traffic and the behaviour of network components

• It is a telecom industry standard protocol, so most standards organizations and major vendors support it

• An SNMP agent on the host system maintains an extensive Management Information Base (MIB), a database of information useful for network management

• MIB objects are organised in a tree structure that includes public (standard) and private branches

• These MIBs contain key system resource information which can be used for monitoring purposes

MIB Tree - Graphical View

iso = 1
  org = 3
    dod = 6
      internet = 1
        mgmt = 2
          MIB-2 = 1
            system = 1
              sysDescr = 1
              sysUpTime = 3
        private = 4
          enterprises = 1
            ucdavis = 2021
              dskTable = 9
                dskEntry = 1
                  dskTotal = 6
                  dskAvail = 7

• Objects in the MIB tree can be referred to symbolically or numerically

– E.g.: iso.org.dod.internet.mgmt.mib-2.system.sysUpTime = 1.3.6.1.2.1.1.3
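The symbolic/numeric equivalence can be sketched as a walk down a nested dictionary. This is only an illustration: the tree below contains just the nodes named on this slide, and the function names `to_numeric` and `to_symbolic` are made up for the example (a real installation would use the NET-SNMP tools for translation).

```python
# Toy MIB tree holding only the nodes named on the slide.
# Each entry maps a symbolic name to (number, children).
MIB_TREE = {
    "iso": (1, {
        "org": (3, {
            "dod": (6, {
                "internet": (1, {
                    "mgmt": (2, {
                        "mib-2": (1, {
                            "system": (1, {
                                "sysDescr": (1, {}),
                                "sysUpTime": (3, {}),
                            }),
                        }),
                    }),
                    "private": (4, {
                        "enterprises": (1, {
                            "ucdavis": (2021, {
                                "dskTable": (9, {
                                    "dskEntry": (1, {
                                        "dskTotal": (6, {}),
                                        "dskAvail": (7, {}),
                                    }),
                                }),
                            }),
                        }),
                    }),
                }),
            }),
        }),
    }),
}


def to_numeric(symbolic: str) -> str:
    """Translate a dotted symbolic path into its dotted numeric form."""
    numbers = []
    level = MIB_TREE
    for name in symbolic.split("."):
        if name not in level:
            raise KeyError(f"unknown MIB node: {name!r}")
        number, level = level[name]
        numbers.append(str(number))
    return ".".join(numbers)


def to_symbolic(numeric: str) -> str:
    """Translate a dotted numeric path back into symbolic names."""
    names = []
    level = MIB_TREE
    for part in numeric.split("."):
        wanted = int(part)
        for name, (number, children) in level.items():
            if number == wanted:
                names.append(name)
                level = children
                break
        else:
            raise KeyError(f"no MIB node numbered {wanted} at this level")
    return ".".join(names)
```

For instance, `to_numeric("iso.org.dod.internet.mgmt.mib-2.system.sysUpTime")` reproduces the numeric form quoted on the slide.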

MIB Tree - Output View

+--iso(1)
   |
   +--org(3)
      |
      +--dod(6)
         |
         +--internet(1)
            |
            +--directory(1)
            |
            +--mgmt(2)
            |  |
            |  +--mib-2(1)
            |     |
            |     +--system(1)
            |     |  |
            |     |  +-- -R-- String    sysDescr(1)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -R-- ObjID     sysObjectID(2)
            |     |  +-- -R-- TimeTicks sysUpTime(3)
            |     |  +-- -RW- String    sysContact(4)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -RW- String    sysName(5)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -RW- String    sysLocation(6)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -R-- INTEGER   sysServices(7)
            |     |  |        Range: 0..127
            |     |  +-- -R-- TimeTicks sysORLastChange(8)
            |     |  |        Textual Convention: TimeStamp
            |     |  |
            |     |  +--sysORTable(9)
            |     |     |
            |     |     +--sysOREntry(1)
            |     |        |  Index: sysORIndex
            |     |        |
            |     |        +-- ---- INTEGER   sysORIndex(1)
            |     |        |        Range: 1..2147483647
            |     |        +-- -R-- ObjID     sysORID(2)
            |     |        +-- -R-- String    sysORDescr(3)
            |     |        |        Textual Convention: DisplayString
            |     |        |        Size: 0..255
            |     |        +-- -R-- TimeTicks sysORUpTime(4)
            |     |                 Textual Convention: TimeStamp
            |     |
            |     +--interfaces(2)
            |     |  |
            |     |  +-- -R-- Integer32 ifNumber(1)
            |     |  |
            |     |  +--ifTable(2)
            |     |     |
            |     |     +--ifEntry(1)
            |     |        |  Index: ifIndex
            |     |        |
            |     |        +-- -R-- Integer32 ifIndex(1)
            |     |        |        Textual Convention: InterfaceIndex
            |     |        |        Range: 1..2147483647
            |     |        +-- -R-- String    ifDescr(2)
            |     |        |        Textual Convention: DisplayString
            |     |        |        Size: 0..255
            |     |        +-- -R-- EnumVal   ifType(3)

SNMP Operations - Overview

• SNMP has simple client-server interactions with few operations to access information held in the MIB tree:

– {Get} {Set} {GetNext} {Walk} {Table} {Trap} {Translate}

• These operations can query local MIB trees, or those of networked machines

[Figure: managed devices on the network, each running an SNMP agent with its own MIB; an SNMP operation is sent across the network to an agent]

SNMP Operations - Command Structure

• Typical SNMP {get} command structure:

Operation | Community | PC to query | MIB object to query

• Output:

MIB object queried | Object type | Object value
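The command and output layout described above can be sketched with the NET-SNMP `snmpget` tool. The exact command shown on the original slide is not preserved, so the `-v 2c` protocol version, the `public` community and the sample output line used in the usage note are assumptions for illustration; the helper names are also made up.

```python
import re

# "object = Type: value", the general shape of one snmpget output line.
_OUTPUT_RE = re.compile(r"^(?P<obj>\S+)\s*=\s*(?P<type>[^:]+):\s*(?P<value>.*)$")


def build_snmpget(host, community, mib_object, version="2c"):
    """Return the argument list: operation, community, PC to query, MIB object."""
    return ["snmpget", "-v", version, "-c", community, host, mib_object]


def parse_snmpget_line(line):
    """Split one output line into (MIB object queried, object type, object value)."""
    match = _OUTPUT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised snmpget output: {line!r}")
    return match["obj"], match["type"].strip(), match["value"].strip()


def timeticks_to_seconds(value):
    """Convert a Timeticks value such as '(12345) 0:02:03.45' to seconds.

    SNMP TimeTicks count hundredths of a second.
    """
    match = re.match(r"\((\d+)\)", value)
    if match is None:
        raise ValueError(f"not a Timeticks value: {value!r}")
    return int(match.group(1)) / 100.0
```

The list returned by `build_snmpget("localhost", "public", "system.sysUpTime.0")` could be passed to `subprocess.run` on a machine with NET-SNMP installed. A line of the form `SNMPv2-MIB::sysUpTime.0 = Timeticks: (12345) 0:02:03.45` then parses into the three parts named on the slide.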

Using Data from SNMP

• Once the information has been obtained from the MIB trees it must be fed into a control system for it to be useful in a management context

• This might process the information, store it for later analysis, or simply display it using a Graphical User Interface (GUI)

• Many systems currently exist:

– EPICS

– Ganglia

– Lemon

EPICS - Overview

• One such system is the Experimental Physics and Industrial Control System (EPICS)

– www.aps.anl.gov/epics

• It is currently in use in over 12 organizations to control devices in major projects such as particle accelerators, telescopes, and large experiments

– GSI, SLAC, ANL, DESY, LANL, ...

• It therefore has a huge support and knowledge base

• It is based on a client/server network model, with servers holding information in Records which can be accessed by the clients
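The record model can be pictured with a small in-memory sketch. This is a toy, not the EPICS implementation: the `Record` and `ToyServer` classes are invented for illustration and no network protocol is involved. It only mirrors the idea that a server holds named records made of fields, and that a client addresses a value as `record.FIELD` (with `VAL` as the default field, as in EPICS).

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A named record holding a set of fields (hypothetical toy type)."""
    name: str
    fields: dict = field(default_factory=dict)


class ToyServer:
    """Holds records and answers read requests, standing in for an EPICS server."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.name] = record

    def read(self, channel):
        """Return the value addressed as 'record' or 'record.FIELD'."""
        record_name, _, field_name = channel.partition(".")
        record = self._records[record_name]
        return record.fields[field_name or "VAL"]


# A client-side view: ask the server for values by channel name.
server = ToyServer()
server.add(Record("System_Description", {"VAL": "Linux node01", "SCAN": "5 second"}))
description = server.read("System_Description")
scan_rate = server.read("System_Description.SCAN")
```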

EPICS - Architecture

[Figure: EPICS clients and EPICS servers connected over the network; each server holds records made up of fields (Field 1: x, Field 2: y, Field 3: z)]

EPICS - Channel Access

• Remote access to EPICS records is achieved through the Channel Access (CA) protocol

• This requires a CA server to be running on the EPICS server, and a CA client to be running on the EPICS client

• These are usually already integrated into EPICS clients/servers when they are created

EPICS - Architecture

[Figure: as before, with a CA client on each EPICS client and a CA server on each EPICS server, communicating over the network]

EPICS - Record Display

• The information from EPICS records can be displayed by a GUI:

[Screenshot: MEDM]

EPICS - Record Display

[Screenshot: GumTree]

EPICS - Device Support

• Records can be interfaced to numerous devices

• These devices can be hardware or software

• Interfacing allows information from a device to be input into EPICS records

• This interfacing is known as device support

EPICS - Architecture

[Figure: as before, with a device support layer attached to the records on each EPICS server]

Device Support for SNMP - devSNMP

• devSNMP is the device support for SNMP

• Allows the input of data from SNMP into EPICS records

– Sets input field of a record to an SNMP {get} operation

• It is configured for the open source product, NET-SNMP

– This is simply one particular implementation of SNMP

– www.net-snmp.org

Device Support for SNMP - devSNMP

• SNMP {get} command:

• Record definition file:

record(stringin, "System_Description") {
    field(DTYP, "Snmp")
    field(INP, "@localhost public system.sysUpTime.0 STRING:100")
    field(SCAN, "5 second")
}
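The INP string in the record above appears to carry the same pieces as the SNMP {get} command: host, community and MIB object, followed by a type specifier. The sketch below splits such a string and generates record text for a given host. Reading `STRING:100` as "value type and maximum length" is an assumption based only on this one example, and the helper names are hypothetical.

```python
from typing import NamedTuple


class SnmpInput(NamedTuple):
    host: str
    community: str
    mib_object: str
    value_type: str   # assumed meaning of the part before ':'
    max_length: int   # assumed meaning of the part after ':'


def parse_inp(inp: str) -> SnmpInput:
    """Split an INP string of the form '@host community object TYPE:N'."""
    if not inp.startswith("@"):
        raise ValueError("expected the INP string to start with '@'")
    host, community, mib_object, type_spec = inp[1:].split()
    value_type, _, length = type_spec.partition(":")
    return SnmpInput(host, community, mib_object, value_type, int(length))


def make_record(name, host, community, mib_object, scan="5 second"):
    """Return record definition text in the same layout as the slide's example."""
    return (
        f'record(stringin, "{name}") {{\n'
        f'    field(DTYP, "Snmp")\n'
        f'    field(INP, "@{host} {community} {mib_object} STRING:100")\n'
        f'    field(SCAN, "{scan}")\n'
        f'}}\n'
    )
```

Calling `make_record` once per node name is one way a definition file for a whole cluster could be produced; the node names themselves would come from the cluster configuration.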

Management Possibilities

• EPICS records are capable of carrying out simple calculations and conditional logic, but nothing very complicated

• The data from SNMP can therefore be used to control other devices interfaced with EPICS records

• One possible reaction is an SNMP {set} operation, which writes values to a MIB

• However, the current release of devSNMP supports only the {get} operation

• Support for other SNMP commands is planned for the future

Test Implementation - Overview

• Carried out at the Linux PC Cluster at the Kirchhoff Institute for Physics, University of Heidelberg

• 32 PCs running SuSE 9 Linux OS

Test Implementation - Software

• EPICS Servers:

– 30 cluster nodes (2.4 and 2.6 kernels) running EPICS soft IOCs with devSNMP

– NET-SNMP tool set and libraries installed on each node

• EPICS Clients:

– Two cluster nodes (2.6 kernel) running an installation of Motif Editor and Display Manager (MEDM) on an EPICS base

Test Implementation - Architecture

[Figure: two MEDM clients, each with a CA client, connected over the network to the cluster nodes; each node runs a CA server with records whose input is SNMP, and devSNMP links the records to the node's SNMP agent and MIB]

Test Implementation - Info. Flow

[Figure: information flow between the MEDM clients (CA clients) and the records (Inp: SNMP) behind each CA server]

Test Implementation - Monitored Resources

• Some resources monitored:

– Hard disk partition usage (total, available, used, percentage used, alarm limit)

– Average CPU usage over 1 minute

– System up time (from SNMP daemon start)

– Inbound packet errors

– Unicast outbound packets

– SNMP daemon process check
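The disk-partition quantities listed above are related by simple arithmetic, which is the kind of calculation a record can carry out. A minimal sketch, assuming the total and available sizes come from the dskTotal and dskAvail objects shown earlier (reported in kilobytes) and assuming a 90 % alarm limit, which is a placeholder rather than the value used in the test implementation:

```python
def disk_usage(total_kb, avail_kb, alarm_percent=90.0):
    """Derive used space, percentage used and alarm state for one partition.

    total_kb / avail_kb: partition size and free space in kilobytes.
    alarm_percent: hypothetical alarm limit on the percentage used.
    """
    if total_kb <= 0:
        raise ValueError("total size must be positive")
    used_kb = total_kb - avail_kb
    percent_used = 100.0 * used_kb / total_kb
    return {
        "total_kb": total_kb,
        "avail_kb": avail_kb,
        "used_kb": used_kb,
        "percent_used": percent_used,
        "alarm": percent_used >= alarm_percent,
    }
```

With 1000 kB total and 250 kB available, the partition is 75 % used and below the assumed limit; with only 50 kB available it is 95 % used and the alarm flag is set.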

Example Implementation - DESY

• Currently EPICS with devSNMP is being used at DESY to monitor key switches and routers

– Network Traffic

– Status

• Solaris and Linux PC clusters to be monitored in the future

• In total around 25 managed devices, but this is increasing all the time

• More information on EPICS/devSNMP at DESY:

– http://www-mks2.desy.de/content/e4/e40/e41/e12212/index_ger.html

Extension Possibilities

• EPICS has limitations as a management system:

– EPICS is a static system

– Records have limited analysis and reaction capabilities; in particular, there are no rule-based events

• For dynamic management we can forward information from EPICS records to an expert management system: SysMES (Camilo Lara, et al.)

• This allows complex analysis of, and reaction to, the data obtained from SNMP

• The management system must have a CA client to communicate with EPICS records
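To make "rule-based events" concrete, here is a generic sketch of evaluating rules against values that have been read from records. It does not represent SysMES or its interface, neither of which is described in these slides; the rule names, value keys and actions are all hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    condition: Callable[[dict], bool]  # evaluated on the latest monitored values
    action: str                        # label for the reaction to trigger


def evaluate(rules, values):
    """Return (rule name, action) for every rule whose condition holds."""
    return [(rule.name, rule.action) for rule in rules if rule.condition(values)]


# Hypothetical rules over hypothetical value names.
RULES = [
    Rule("disk_nearly_full", lambda v: v["disk_percent_used"] >= 90.0, "notify_operator"),
    Rule("snmpd_down", lambda v: not v["snmpd_running"], "restart_snmpd"),
    # A rule combining two values, which a single simple record cannot easily express.
    Rule("busy_and_erroring", lambda v: v["cpu_load_1min"] > 4.0 and v["in_errors"] > 0,
         "flag_node"),
]

events = evaluate(RULES, {
    "disk_percent_used": 95.0,
    "snmpd_running": True,
    "cpu_load_1min": 0.5,
    "in_errors": 0,
})
```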

Current State

• The interface between the CA client and SysMES has been written

• Interfaces to the cluster monitoring systems LEMON and Ganglia have been defined, and implementation is in progress

Current State - Architecture

[Figure: the test implementation architecture (MEDM clients, CA servers with SNMP-input records, devSNMP, SNMP agents and MIBs) extended with a SysMES client interface that connects to the network through its own CA client]

Summary

• SNMP:

– Is the standard for network management in almost all modern networked devices (e.g. PCs, workstations, bridges, switches, routers, ...)

– Widely implemented protocol with a large knowledge base

– Very low system resource usage

– A lot of system information is stored in node MIB Trees (which SNMP can access)

• EPICS:

– Widely implemented control system with a huge support base

– Allows input and output to a vast array of devices

• Through device support for SNMP, these can be combined to create a monitoring system

• This can be extended by forwarding the monitoring data to an expert management system (such as SysMES)

Thanks

• Many thanks to all who have helped, but especially:

– Camilo Lara (Coordinator, KIP)

– Albert Kagarmanov (devSNMP at DESY)

The End

Thank you for your attention

Any questions?