LHCb PC Farm Monitoring and Control System Domenico Galli, Bologna JCOP Project Team Meeting Genève, 28 April 2005


LHCb PC Farm Monitoring and Control System

Domenico Galli, Bologna

JCOP Project Team Meeting

Genève, 28 April 2005

Outline

Overview: aim of the system, software framework, software components and their deployment, releases.

Guidelines followed in software development.

Main components: Task Manager, Monitoring System, IPMI Power Manager.

Utilities: Logger, Process Controller.

Overview

The LHCb PC Farm Monitoring and Control System has been designed mainly for the LHCb L1/HLT event filter farm.

The system has been developed for Linux PCs but will be ported to MS Windows so that it can also be used for the PCs involved in LHCb detector monitoring and control.

It uses DIM (Distributed Information Management System) as its communication layer and is accessible through both a command-line interface and a PVSS graphical interface.

Aim of the System

Monitoring:

Display of relevant parameters concerning the status of the farm.

Induce the transition of a state machine to an alarm state when the monitored parameters indicate error/warning conditions.

Control:

Action execution (system reboot, process start/stop, etc.) triggered by a manual command or by a state machine transition.

PVSS (Prozessvisualisierungs- und Steuerungs-System)

To build an interface for the farm monitoring system that is coherent with the monitoring of the detector hardware, we use the PVSS SCADA (Supervisory Control and Data Acquisition) tool.

PVSS provides: a runtime DB and automatic archiving of data to permanent storage; alarm generation; easy realization of graphical panels; various protocols to communicate via network.

DIM

Sensors and Actuators

PVSS needs to be interfaced with the farm nodes: to receive monitor data; to issue commands to the nodes.

On each node a few light-weight server processes run: monitor sensors; command actuators.

The PVSS-to-nodes interface is implemented using the DIM light-weight network communication layer.

DIM (Distributed Information Management System)

[Diagram labels: Farm node, sensor, actuator, PVSS.]

The DIM network communication layer is already integrated with PVSS. It is light-weight and efficient. It allows bi-directional communication. It uses a name server for service/command publication and subscription.

Monitoring and Control System Components

Light-weight servers (to be installed on each farm node): Task Manager Server; 9 Monitor Servers; Logger Server.

Control software (to be installed on each control PC): Power Manager (IPMI) Server; Process Controller.

Command-line clients (can run on any node on the network): 3 Task Manager clients; 2 Power Manager clients; 1 Logger client.

PVSS clients (can run on any node on the network): Task Manager and Logger panel; 9 Monitor panels; Power Manager panel.

Monitoring and Control System Component Deployment

[Deployment diagram. Sub-Farm 1: sub-farm nodes, each with a DIM server (sensor, actuator) and an IPMI BMC (firmware); a Control PC with a DIM server (sensor, actuator), the IPMI DIM Server, DIM-DNS, and the PVSS managers EVM, DBM, ARCH, PVSS-DIM, PVSS-FSM, CTRL and Dist. Sub-Farm 2: Control PC with PVSS Dist. Global Control PC: PVSS Dist. Monitor Console PC: PVSS Remote UI.]

Releases

0.1 (22 November 2004).

0.2 (13 January 2005): new Task Manager features; Process Controller.

1.0 (before June 2005): IPMI Power Manager; new PVSS panels; PVSS panels to set the thresholds for the Finite State Machine; PVSS trend plots and archiving.

Sensor Access to System Data

In Linux kernel 2.4, procfs is the only file-system-like interface to internal data structures in the kernel (the other interface is ioctl()/sysctl()).

In Linux kernel 2.6, sysfs has been added to show all of the devices (virtual and real) and their inter-connectedness within the system.

In forthcoming kernel versions: procfs will contain only process statistics; sysfs will contain device statistics (network interfaces, TCP/IP stack, temperatures and fan speeds, SCSI interfaces, etc.).

Sensor Access to System Data (II)

At present, in kernel 2.6: network interfaces are on both procfs and sysfs; temperatures and fan speeds are on sysfs; TCP/IP stack, CPU states and memory are on procfs.

Monitor sensors at present use procfs (except the temperature sensor, which uses sysfs in the version for kernel 2.6).

In the future, most of the sensors (all but the process sensor) will probably have to be modified to access sysfs.

Sensor Access to System Data (III)

In MS Windows the interface to internal data structures in the kernel is different.

A procfs interface is provided by Cygwin, but it is rather poor.

We are investigating the use of the WMI API (Windows Management Instrumentation, .NET platform) to access internal data structures in the kernel for monitoring.

Guidelines for Light-Weight Server Development

Guidelines followed in sensor and actuator development:

Functions are written in plain C with particular control of memory allocation (e.g., if possible, memory is allocated once and for all during sensor initialization).

Low-level access (not stream access) to procfs and sysfs, with one-shot data reads. Each time a read operation (even a partial one) is performed on procfs/sysfs, the kernel takes time to produce the entire set of data provided in the file.

When possible, for complex tasks, use maintained libraries (like libprocps) to cope with changes in kernel version.
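The one-shot, low-level read guideline can be illustrated with a minimal sketch. The sensors themselves are written in plain C; Python is used here only for brevity, and the names read_once and BUF_SIZE are hypothetical. The buffer is allocated once, and each sample is taken with a single read call on a raw file descriptor rather than through a buffered stream.

```python
import os

BUF_SIZE = 64 * 1024          # hypothetical size, assumed large enough for the sampled files
_buf = bytearray(BUF_SIZE)    # allocated once, reused for every sample


def read_once(path):
    """Return the content of a procfs/sysfs-like file using one read call."""
    fd = os.open(path, os.O_RDONLY)      # raw descriptor, no buffered stream
    try:
        n = os.readv(fd, [_buf])         # single read into the preallocated buffer
    finally:
        os.close(fd)
    return bytes(_buf[:n])
```

On a Linux node this could be called as read_once("/proc/stat"). A file larger than the buffer, or one that the kernel returns in several chunks, would need a read loop instead.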

Task Manager

It is a tool to start, list and stop processes on every farm node from a central console.

It uses TCP as transport protocol, through the DIM network communication layer.

To track processes (e.g. to list or to stop them), it sets an additional environment variable (a descriptive string named UTGID, User Assigned Thread Group Identifier) in every started process.

No more than one process can be started with the same UTGID.

This way, the stamp attached to processes survives an accidental Task Manager crash.

The UTGID can be defined by the user or can be automatically generated as <binary executable image>_<instance #>.
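As an illustration of the automatic naming rule, the following sketch picks the first free identifier of the form <image>_<n>. It is not the Task Manager code: the function names are hypothetical, and starting the instance number at 0 and using the file name of the executable as the image name are assumptions.

```python
import os


def make_utgid(executable_path, in_use):
    """Return the first free '<image>_<n>' identifier (n assumed to start at 0)."""
    image = os.path.basename(executable_path)
    n = 0
    while "%s_%d" % (image, n) in in_use:
        n += 1
    return "%s_%d" % (image, n)


def child_environment(utgid, extra):
    """Build a clean environment carrying only UTGID plus the extra variables."""
    env = dict(extra)
    env["UTGID"] = utgid
    return env
```

Because the identifier lives in the environment of the started process, it can be recovered from the process itself even if the program that started it is no longer running.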

Task Manager Components

tmSrv: Task Manager Server (to be executed on each farm node).

tmStart, tmLs, tmKill, tmStop: Task Manager command-line clients (can be executed on any PC on the network).

tableViewUTGID.pnl, startPanel.pnl, killPanel.pnl, stopPanel.pnl: PVSS GUI clients (can be executed on any PC on the network with a PVSS remote UI).

Task Manager Features

The started process can have a clean environment (except for UTGID) or can inherit the Task Manager environment.

An arbitrary number of new environment variables can be added to started processes.

The stdout and stderr of started processes can be sent to /dev/null or redirected to the logger.

It can start processes as daemons (process group leader, i.e. new session ID, umask reset, no controlling tty, ignored SIGCLD).

It can set the scheduler of the started processes (time sharing, FIFO or round-robin).

Task Manager Features (II)

It can set the nice level (used in the evaluation of the dynamic priority) of processes started with the time-sharing scheduler.

It can set the static (real-time) priority of processes started with the FIFO or round-robin scheduler.

It can set the username of the started processes.

It is immediately notified of the termination of a started process (through the SIGCHLD signal) and:

It logs either the exit code or the number of the signal which caused the process to stop;

It immediately refreshes the list of started processes (published using DIM).

Task Manager Features (III)

The kill DIM CMD can send a chosen signal to a single process or to all the processes whose UTGID matches a given POSIX.2 wildcard pattern.

The stop DIM CMD sends a chosen signal to a process and returns immediately, but it also schedules the deferred sending of a SIGKILL signal to the same process (in a different thread).

This way the process is given the chance to exit gracefully on receiving a SIGTERM; if it fails to exit, it is stopped abruptly by a SIGKILL after a chosen delay.
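The stop behaviour described above (signal now, SIGKILL later from another thread, caller not blocked) can be sketched as follows. This is an illustrative Python sketch, not the Task Manager implementation; the function name stop and the default delay are hypothetical.

```python
import os
import signal
import threading


def stop(pid, sig=signal.SIGTERM, delay=4.0):
    """Send `sig` to `pid` now and schedule a SIGKILL after `delay` seconds."""

    def force_kill():
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited and was reaped

    os.kill(pid, sig)
    timer = threading.Timer(delay, force_kill)  # fires in a separate thread
    timer.daemon = True
    timer.start()
    return timer  # the caller is not blocked while the delay runs
```

A real implementation would also have to guard against the PID being reused by an unrelated process during the delay; the sketch ignores that case.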

Task Manager Command Line Interface

tmStart [-m hostname_pattern] [-c] [-D NAME=value...] [-d] [-s scheduler] [-p nice_level] [-r rt_priority] [-n user_name] [-u utgid] [-o] [-e] [-w wd] path [arg...]

Starts a new process on one or more farm PCs.

tmLs [-m hostname_pattern] [utgid_pattern]

Lists processes which have UTGID set.

tmKill [-m hostname_pattern] [-s sig] utgid_pattern

Sends a signal to process(es).

tmStop [-m hostname_pattern] [-s sig] [-d delay] utgid_pattern

Sends a signal to process(es). If the process(es) are not dead after delay seconds, sends a SIGKILL signal, without blocking the client.

POSIX.2 wildcard patterns (*, ?, character classes [027], ranges [3-7], complementation [!027] or [!3-7]) are recognized in hostname and UTGID.
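The wildcard syntax listed above can be tried out with Python's fnmatch module, which implements the same shell-style constructs (*, ?, [set], [a-b], [!set]). The helper select and the node names in the usage note are hypothetical; the sketch only illustrates the pattern semantics, not how the clients resolve names through DIM.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, with the made-up names farm0101, farm0102, farm0103 and farm0207, the pattern farm01* selects the first three, farm010[!27] selects farm0101 and farm0103, and farm0?0[3-7] selects farm0103 and farm0207.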

Task Manager PVSS Interface (I-V)

[Screenshots of the Task Manager PVSS panels.]

Monitor Sensors

9 light-weight monitor sensors for nodes have been developed:

Temperatures and fan speeds;

CPU info (CPU number, brand and model, cache size, clock);

CPU states (user, system, nice, idle, iowait, irq, softirq);

Hardware interrupt rates (separately per CPU and per irq source);

Memory usage;

Process status (including scheduling class and real-time priority);

Network Interface Card counters’ rates and error fractions;

Network Interface Interrupt Coalescence;

TCP/IP stack rates and error fractions.

1 - Temperature and Fan Speed Sensor

It collects the temperatures of the sensors integrated on the motherboard and the fan speeds.

Uses the lm_sensors software. Needs bus drivers (for ISA or I2C/SMBus) and sensor chip drivers. Integrated in the Linux kernel tree since kernel 2.6.

This server has been developed in 2 versions: one for Linux kernel 2.4, which gets its data from procfs, and the other for Linux kernel 2.6, which gets its data from sysfs.

E.g., on sysfs:

/sys/devices/platform/i2c-0/0-0290/temp1_input

/sys/devices/platform/i2c-0/0-0290/fan1_input

1 - Temperature and Fan Speed Sensor (II)

Hardware compatibility issues: most bus drivers have been ported to kernel 2.6; however, only 43% of the chip drivers have been ported.

The IPMI alternative:

IPMI (Intelligent Platform Management Interface) v1.5 can get the same information in a more portable way (it is OS-independent).

The configuration of the IPMI environment monitoring system is the responsibility of the hardware vendor (more uniformity and reliability of data is expected).

In new software releases, temperatures and fan speeds will probably be collected using the IPMI LAN interface.

1 - Temperature and Fan Speed Sensor: PVSS Interface (I-II)

[Screenshots of the PVSS panels; annotation: node hostname.]

2 - CPU Information Sensor

This server provides static information about the CPU(s), e.g.:

The CPU brand and model identifier string;

The CPU family, model and sub-version (revision) identifier;

The clock frequency of the CPU and the CPU cache size;

The number of hyper-threading cores in that physical CPU;

The CPU computational power in bogomips.

2 - CPU Information Sensor: PVSS Interface

[Screenshot of the PVSS panel; annotation: right-click.]

3 - CPU States Sensor

It collects, and evaluates as percentages, both the aggregate values and the per-CPU values of the fraction of time spent by the CPUs performing different kinds of work:

user: normal processes executing in user mode.

nice: niced processes executing in user mode.

system: processes executing in kernel mode.

idle: not working.

iowait: waiting for I/O to complete.

irq: servicing interrupts (only in kernel ≥ 2.6).

softirq: servicing softirqs (only in kernel ≥ 2.6).

Moreover, it collects the global context switch rate (useful to check the operation of process scheduling).
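The percentage evaluation can be sketched from two successive samples of the cpu lines of /proc/stat, whose first seven counters are user, nice, system, idle, iowait, irq and softirq. This is an illustrative Python sketch with hypothetical function names, not the sensor's C code; it assumes the percentages are taken over the interval between the two samples.

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_lines(stat_text):
    """Map 'cpu', 'cpu0', ... to their tick counters from /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            counters[parts[0]] = dict(zip(FIELDS, values))
    return counters


def percentages(before, after):
    """Percentage of time per state between two samples of one CPU entry."""
    deltas = {k: after[k] - before[k] for k in before}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}
```

The "cpu" entry gives the aggregate values and the "cpuN" entries the per-CPU ones; on a kernel that reports fewer counters, the missing states are simply absent from the result.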

3 - CPU States Sensor: PVSS Interface

[Two screenshots of the PVSS panels.]

4 - Hardware Interrupt Sensor

It collects the rates of the interrupts issued by hardware device drivers. Data are partitioned per CPU and per driver (timer/PIC/local APIC, rtc, eth0, eth1, etc.).

Average values and maximum values (since DIM server startup) are evaluated.

Useful to check the IRQ-to-CPU affinity of the network interfaces.

4 - Hardware Interrupt Sensor: PVSS Interface

[Screenshot of the PVSS panel; annotation: kernel 2.6.]

5 - Memory Usage Sensors

It collects memory usage statistics. The available quantities depend on the kernel version.

Main collected quantities (more details): Total/Low/High memory occupation; disk cache; virtual memory management; swapping and paging; vmalloc.

5 - Memory Usage Sensors: PVSS Interface (I-II)

[Screenshots of the PVSS panels; annotations: over-committing; more recently used; still not copied to disk; right-click.]

6 - Process Status Sensor

It collects the process status of each task, like “top” or “ps”.

It must cope with deep changes in the Linux threading model.

LinuxThreads, kernel ≤ 2.4.19: each thread has a unique process ID (PID). The getpid() function therefore returns different values (PIDs) for the different threads of the same process.

NPTL (Native POSIX Thread Library), kernel ≥ 2.4.20: each thread has a unique identifier called TID (Thread Identifier), while the PID has been replaced by the TGID (Thread Group Identifier). The getpid() function therefore returns the same value (TGID) for all threads in a process.
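The NPTL behaviour can be observed directly: on a current Linux system, threads of one process share the value returned by getpid() while each has its own kernel thread identifier. The sketch below is illustrative only; sample_ids is a hypothetical helper, and threading.get_native_id() requires Python 3.8 or later.

```python
import os
import threading


def sample_ids(n_threads=3):
    """Return the (pid, tid) pairs seen by n_threads concurrently running threads."""
    barrier = threading.Barrier(n_threads)
    seen = []
    lock = threading.Lock()

    def worker():
        ids = (os.getpid(), threading.get_native_id())
        with lock:
            seen.append(ids)
        barrier.wait()  # keep every thread alive until all have sampled

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling sample_ids(3) is expected to return three pairs with one common first element (the TGID) and three different second elements (the TIDs).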

6 - Process Status Sensor (II)

Following the changes in the threading model, the process data format in procfs depends on the kernel version: /proc/<pid>/… for kernel ≤ 2.4.19; /proc/<tgid>/task/<tid>/… for kernel ≥ 2.4.20.

To cope with changes in kernel version, the process status sensor accesses kernel data in procfs by means of the maintained (but undocumented) library libproc-3.2.3.so, from http://procps.sourceforge.net/

(more details)

6 - Process Status Sensor: PVSS Interface (basic panel)

[Screenshot of the PVSS panel; annotations: 3-thread process: same TGID, same UTGID, different TIDs; no UTGID (not started by the Task Manager); TS: time sharing (Linux default).]

6 - Process Status Sensor: PVSS Interface (advanced panels I-IV)

[Screenshots of the PVSS panels; annotations: SIZE: size of the core image of the task (code+data+stack); VSIZE: virtual memory usage (lib+exe+data+stack); RSS: non-swapped physical memory; signals; processor; S: interruptible sleep; s: session leader; l: multi-threaded.]

7 - Network Interface Sensor

It collects Network Interface Card counters and evaluates transmission rates and error fractions.

It also reads the interface name (eth0, eth1, etc.), the IP address and the MAC address of the system boards.

It copes with the wrap-around of the 32-bit rx_bytes/tx_bytes kernel counters when a Gigabit Ethernet interface is transmitting/receiving at full speed (but the sensor must be called at least once every 34 s).

Average values and maximum values (since DIM server startup) are evaluated and can be reset.

(more details)
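The 34 s figure follows from the counter width: a 32-bit byte counter wraps after 2^32 bytes, and a Gigabit Ethernet link at full speed moves 125 million bytes per second. The sketch below shows that arithmetic and one common way to tolerate a single wrap between two readings (modular subtraction); the function names are hypothetical and the sensor's actual method is not shown in the slides.

```python
WRAP = 2 ** 32  # the byte counters are assumed to be 32-bit unsigned


def counter_delta(previous, current):
    """Bytes counted between two readings, tolerating a single wrap-around."""
    return (current - previous) % WRAP


def byte_rate(previous, current, seconds):
    """Average rate in bytes/s over the sampling interval."""
    return counter_delta(previous, current) / seconds


# Time for a 1 Gbit/s link (125e6 bytes/s) to wrap a 32-bit byte counter.
MAX_SAMPLING_PERIOD = WRAP / 125e6  # about 34.4 s
```

If two readings are more than one wrap apart, the difference is ambiguous, which is why the sampling period has to stay below this limit.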

7 - Network Interface Sensor: PVSS Interface (I-II)

[Screenshots of the PVSS panels; annotations: if ≠ 0, cable problem or loose connector; frame/s; bit/s; bytes/frame.]

8 - Network Interface Interrupt Coalescence Sensor

It evaluates the interrupt coalescence ratio of the network interface, i.e., the ratio between the number of frames received/transmitted by a network interface and the number of interrupts raised by the network interface card.

Gigabit Ethernet NICs usually store the received frames in a buffer and raise an interrupt after an appropriate delay, so as to deliver more than one frame per interrupt handler execution (thus reducing CPU utilization by up to 30% in frame receiving and up to 11% in frame sending).

This mechanism can be tuned by setting appropriate parameters on the NIC (e.g., for Intel e1000, InterruptThrottleRate, RxIntDelay, RxAbsIntDelay, TxIntDelay, TxAbsIntDelay).

8 - Network Interface Interrupt Coalescence Sensor: PVSS Interface

[Screenshot of the PVSS panel; annotation: 1.8 Ethernet frames/interrupt.]

9 - TCP/IP Stack Sensor

It collects TCP/IP stack counters and evaluates rates and error fractions (essentially routing errors and fragmentation/reassembly errors).

Average values and maximum values (since DIM server startup or since the last reset) are evaluated and can be reset.

Collected quantities (more details): IP, TCP, UDP I/O rates; IP forwarding and fragmentation/reassembly rates; IP, TCP, UDP error fractions; IP forwarding and fragmentation/reassembly fractions; IP forwarding and fragmentation/reassembly error fractions.

9 - TCP/IP Stack Sensor: PVSS Interface (I-VI)

[Screenshots of the PVSS panels.]

Finite State Machine Alarm Generation

A PVSS script periodically compares the monitored value with its threshold and, if the threshold is exceeded, a state machine transition is triggered.

The monitor panels have a button to configure the thresholds for the warning and error states of the state machine.
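The periodic threshold check can be pictured with a small sketch. The real check is a PVSS script; this Python version is only an illustration, with hypothetical state names (OK, WARNING, ERROR) and the assumption that a value is worse when it is higher.

```python
def classify(value, warning_threshold, error_threshold):
    """Return the state for a monitored value, assuming higher is worse."""
    if value > error_threshold:
        return "ERROR"
    if value > warning_threshold:
        return "WARNING"
    return "OK"


def transition(current_state, value, warning_threshold, error_threshold):
    """Return (new_state, changed) for one periodic check."""
    new_state = classify(value, warning_threshold, error_threshold)
    return new_state, new_state != current_state
```

A caller would run transition once per polling period and act only when changed is true, so that a value staying above its threshold does not retrigger the same transition.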

Finite State Machine Alarm Generation (II)

If the button is pressed, a new panel opens, in which an “expert user” can set up the alarm thresholds.

Power Manager

It is a tool to switch on, switch off, power-cycle, shut down and show the power status of every farm node from a central console.

It runs on the Control PCs.

To communicate with clients (which send commands and check status), it uses the DIM network communication layer, acting as a server.

To operate on nodes (which are switched on and off), it uses IPMI (Intelligent Platform Management Interface), acting as a client.

It makes use of IPMItool’s libintf_lan.so library, modified to make it thread-safe (no more global variables, no more signals and longjmps for time-outs).

[Diagram labels: power manager DIM client, DIM, Control PC, Power Manager Server, IPMI, Farm Node SFN-001-03, BMC.]

Power Manager - IPMI Interfaces

IPMI has two kinds of interfaces:

KCS (Keyboard Controller Style) interface (AKA open interface): local interface (interface to the host OS), unauthenticated. Can be accessed through the OpenIPMI Linux software. Can’t be used to switch on a PC or to power-cycle a hung PC.

LAN interface: network interface, session-based, authenticated. Designed to be always available (even when the system is powered down or when the OS is hung or inactive). Hardware implementation. OS-independent.

Power Manager - IPMI LAN Interface

Server side (farm node): hardware implementation. The NIC hardware also redirects to the BMC the Ethernet frames containing datagrams destined to UDP port 623. Configured by means of the PC startup configuration utility. May use DHCP to set up network parameters. No need for additional software.

Client side (control PC): client software, e.g. IPMItool, freeIPMI, IPMIsh, LHCb Power Manager Server.

[Diagram labels: Control PC (IPMI client), LAN, Farm node, Network Controller, Baseboard Management Controller (BMC), UDP port 623, other Ethernet frames.]

Power Manager Deployment

Each Control PC runs a DIM server interfaced to IPMI and publishes, for each node, a command and a service.

DIM Services:

/SFN-001-01/power_status

/SFN-001-02/power_status

/SFN-001-03/power_status

DIM Commands:

/SFN-001-01/power_switch on|off|soft_off|cycle

/SFN-001-02/power_switch on|off|soft_off|cycle

/SFN-001-03/power_switch on|off|soft_off|cycle

[Diagram labels: Control PC, Power Manager Server, IPMI, Farm Nodes SFN-001-01 to SFN-001-05 (each with a BMC), DIM, PVSS-DIM client, PVSS GUI, CMD-line client.]

Power Manager Commands

The DIM CMD /<HOSTNAME>/power_switch takes an argument, which can be (from the IPMI specifications):

on: power up the chassis.

off: power down the chassis (without a clean shut-down of the OS).

cycle: power down, wait 1 second, and power up again.

soft_off: initiate a soft shutdown of the OS via ACPI by emulating a fatal over-temperature condition.

hard_reset: pulse the system reset signal.

pulse_diag: pulse a version of a diagnostic interrupt that goes directly to the processor(s). This is typically used to cause the operating system to do a diagnostic dump (OS dependent).

Power Manager Features

It copes with long IPMI response times (> 0.7 s) and with very long time-outs (~ 16 s) in the case of a disconnected node, by using one thread for each node to be contacted, in order to parallelize IPMI connections.

It copes with the fact that IPMI can receive only one command at a time: if the NIC BMC is processing a command, it is not able to receive or queue other commands, and the second command fails.

E.g.: while the Power Manager is doing a periodic update of the power status of a certain node, it is not able to switch the same node on/off.

The Power Manager arbitrates between commands sent to the same node and is able to defer overlapping commands.
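One way to picture the per-node arbitration is a worker thread per node that takes commands from a queue, so that commands for the same node never overlap while different nodes proceed in parallel. The sketch below is an assumption about how such arbitration could look, not the Power Manager code; NodeWorker and the injected send callable (standing in for the IPMI exchange) are hypothetical.

```python
import queue
import threading


class NodeWorker:
    """One thread per node: commands for that node run strictly one at a time."""

    def __init__(self, node, send):
        self.node = node
        self.send = send                  # hypothetical callable doing the IPMI exchange
        self.pending = queue.Queue()      # overlapping commands wait here
        self.results = []
        thread = threading.Thread(target=self._run, daemon=True)
        thread.start()

    def submit(self, command):
        self.pending.put(command)         # returns at once; the command is deferred

    def _run(self):
        while True:
            command = self.pending.get()
            try:
                self.results.append(self.send(self.node, command))
            except Exception as error:    # e.g. a time-out on a disconnected node
                self.results.append(error)
            finally:
                self.pending.task_done()

    def wait_idle(self):
        self.pending.join()               # block until every queued command has run
```

With one NodeWorker per farm node, a slow or disconnected node delays only its own queue, and a status update in progress simply postpones a switch command sent to the same node instead of making it fail.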

Power Manager Features (II)

It copes with IPMI configurations in which the OS and the BMC have the same IP address. Two answers are sent back from the node: one from the OS (ECONNREFUSED); one from the BMC.

ECONNREFUSED takes priority over any other received datagram: the Connection Refused error shows up before the response packet, regardless of the order in which they were sent out (unless the response is read before the Connection Refused is returned).

Power Manager Development Status

A Power Manager DIM server and 2 command-line DIM clients are ready and working:

ipmiSrv: Power Manager Server (to be executed on each Control PC). At present it recognizes only the on and off command arguments.

pwSwitch: Power Manager command-line client (can be executed on any PC on the network).

pwStatus: Power Manager tty-oriented client (can be executed on any PC on the network).

Tested on a Dell PowerEdge SC 1425 without OS.

A PVSS DIM client is under development. Basically one PVSS panel showing: a list of the controlled nodes with their power status (on, off); buttons for power on / off / soft_off / cycle / power_reset / pulse_diag.

Power Manager Command-Line Clients

pwSwitch [-m hostname] on|off|cycle|soft_off

Issues a switch command to the host hostname; hostname can be a POSIX.2 wildcard pattern; if hostname is not specified, the command is issued to all nodes found on the DIM DNS.

pwStatus [-m hostname]

Returns the power status of the host hostname (on, off, not reachable); hostname can be a POSIX.2 wildcard pattern; if hostname is not specified, the status of all nodes found on the DIM DNS is returned.

Power Manager Command-Line Clients

[Screenshot of a terminal session; annotations: N.B.: node lhcbcn2 is disconnected!; service time out; command time out; command time out.]

DIM Logger

It is a tool which centrally collects messages (debug/info/warning/error/fatal) from applications running on the farm.

It uses TCP as transport layer through the DIM network communication layer.

It uses a POSIX.1 FIFO (aka named pipe) as local buffer.

[Screenshot annotation: sent to stderr by ld.so.]

Page 73: LHCb PC Farm Monitoring and Control System

LHCb PC Farm Monitoring and Control System. 73Domenico Galli

DIM Logger Features

It can collect messages sent to stderr/stdout by system software (e.g. the dynamic linker, see screen shot) by redirecting stderr to the FIFO.

It is congestion-proof. In a quasi-congested network, all applications trying to send log messages via TCP could hang.

The DIM logger can optionally use a Linux extension of the POSIX.1 FIFO (non-blocking read-write open of the FIFO, O_RDWR|O_NONBLOCK|O_APPEND), which allows:

to make non-blocking writes to the FIFO;

to automatically drop messages if the FIFO fills up completely due to network congestion.
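A minimal sketch of the non-blocking FIFO write with message dropping, assuming a Linux host. The helper names (open_log_fifo, log_or_drop, demo) and the FIFO path are hypothetical; the logger itself is not shown in the slides and may work differently. No reader is attached in the demo, so the FIFO fills up and later messages are dropped instead of blocking the writer.

```python
import os
import tempfile


def open_log_fifo(path):
    """Create the FIFO if needed and open it read-write and non-blocking.

    Opening a FIFO with O_RDWR is left undefined by POSIX; on Linux it
    succeeds even when no reader is attached yet.
    """
    if not os.path.exists(path):
        os.mkfifo(path, 0o600)
    return os.open(path, os.O_RDWR | os.O_NONBLOCK | os.O_APPEND)


def log_or_drop(fd, message):
    """Write one message (bytes, at most PIPE_BUF long so the write is atomic).

    Returns True if the message was queued, False if the FIFO was full
    and the message was dropped.
    """
    try:
        os.write(fd, message)
        return True
    except BlockingIOError:  # EAGAIN: no room left in the FIFO
        return False


def demo(n_messages=4096, size=1024):
    """Fill a FIFO that nobody reads and count written vs. dropped messages."""
    with tempfile.TemporaryDirectory() as directory:
        fd = open_log_fifo(os.path.join(directory, "logger.fifo"))
        try:
            written = sum(log_or_drop(fd, b"x" * size) for _ in range(n_messages))
        finally:
            os.close(fd)
    return written, n_messages - written


written, dropped = demo()
print("written:", written, "dropped:", dropped)
```

The writer never blocks: once the kernel buffer of the FIFO is full, every further write fails immediately and the message is discarded.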

DIM Logger Components

logSrv: Logger Server (to be executed on each farm node).

logViewer: Logger tty-oriented client (can be executed on any PC on the network).

loggerPanel.pnl: Logger PVSS GUI client (can be executed on any PC on the network with a PVSS remote UI).

Process Controller

It is a process to be executed on the control PCs, which controls the processes running on all the farm nodes, immediately restarting them in case of a crash.

Reads from an XML file (in future through DIM) the list of the processes which must be executed on each farm node and their execution environment (command-line arguments, environment variables, user, scheduler type, priority, respawn parameters, etc.).

Works by contacting the Task Manager on each farm node.

A process crash triggers the process respawn within a few tenths of a second (through a mechanism based on the SIGCHLD asynchronous signal).

Moreover, a respawn control mechanism is implemented:

If a process is respawned more than N times in T1 seconds, respawn is disabled for T2 seconds.
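The respawn control rule can be sketched as a small rate limiter. This is one possible reading of the rule, not the Process Controller's code: the class name RespawnLimiter and its method are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class RespawnLimiter:
    """Allow at most n respawns in any t1-second window.

    When one more respawn is requested within the window, respawning is
    disabled for t2 seconds.
    """

    def __init__(self, n, t1, t2):
        self.n, self.t1, self.t2 = n, t1, t2
        self.history = deque()      # times of the recent allowed respawns
        self.disabled_until = None  # end of the current disable period, if any

    def allow(self, now):
        """Return True if a respawn requested at time `now` may proceed."""
        if self.disabled_until is not None and now < self.disabled_until:
            return False
        self.disabled_until = None
        # Forget respawns that fell out of the t1-second window.
        while self.history and self.history[0] <= now - self.t1:
            self.history.popleft()
        if len(self.history) >= self.n:
            # This would be respawn number n+1 in the window: disable for t2.
            self.disabled_until = now + self.t2
            self.history.clear()
            return False
        self.history.append(now)
        return True


limiter = RespawnLimiter(n=3, t1=10, t2=60)
for t in (0, 1, 2, 3, 30, 63):
    print(t, limiter.allow(t))
```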

More Details

5 - Memory Usage Sensors (I)

It collects memory usage statistics.

Available quantities depend on the kernel version.

Main collected quantities:

Total/Low/High Memory occupation. High memory (above ~896 MiB of physical memory) can’t be used for kernel data structures and is slower to access than low memory (the kernel must use tricks to access this memory). The system crashes if it runs out of low memory.

Disk cache: Buffers (relatively temporary storage for raw disk blocks), Cached (in-memory cache for files read from the disk), Mapped (files which have been mmapped, such as libraries).

Slab: in-kernel data structures cache.

5 - Memory Usage Sensors (II)

Main collected quantities (cont’d):

Virtual memory management: Active (used more recently and usually not reclaimed unless absolutely necessary), Inactive (less recently used and more eligible to be reclaimed for other purposes), Dirty (waiting to get written back to the disk), Committed_AS (related to the Linux memory over-committing policy: an estimate of how much RAM is needed to make a 99.99% guarantee that the OOM killer, Out Of Memory killer, is not invoked).

5 - Memory Usage Sensors (III)

Main collected quantities (cont’d):

Swapping and paging: SwapTotal, SwapFree and SwapUsed (memory which has been evicted from RAM and is temporarily on the disk); SwapCached (memory that was once swapped out and has been swapped back in, but is still also in the swapfile; if memory is needed, it doesn't need to be swapped out again because it is already in the swapfile, which saves I/O).

PageTables: amount of memory dedicated to the lowest level of page tables.

Vmalloc: total/used/chunk. vmalloc is a mechanism to map physically non-contiguous memory areas to a contiguous area in virtual memory. It is used for storing the swap map information and for loading kernel modules into memory.
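Several of the quantities above carry the field names of the Linux /proc/meminfo file. The sketch below assumes text in that "Name: value kB" layout; the slides do not state where the sensor reads its data from. The function names and all the numbers in the sample are hypothetical. SwapUsed is computed here as SwapTotal minus SwapFree.

```python
# Hypothetical excerpt in /proc/meminfo layout (values in kB are made up).
SAMPLE_MEMINFO = """\
MemTotal:      2074932 kB
MemFree:        102400 kB
Buffers:         51200 kB
Cached:         819200 kB
SwapCached:       1024 kB
SwapTotal:     2097144 kB
SwapFree:      2095096 kB
Committed_AS:   655360 kB
"""


def parse_meminfo(text):
    """Turn 'Name: value kB' lines into a {name: value_in_kB} dictionary."""
    values = {}
    for line in text.splitlines():
        name, separator, rest = line.partition(":")
        fields = rest.split()
        if separator and fields:
            values[name.strip()] = int(fields[0])
    return values


def swap_used(values):
    """Derived quantity: swap space currently in use, in kB."""
    return values["SwapTotal"] - values["SwapFree"]


info = parse_meminfo(SAMPLE_MEMINFO)
print("SwapUsed (kB):", swap_used(info))
print("Committed_AS (kB):", info["Committed_AS"])
```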

6 - Process Status Sensor (I)

Collected quantities for each task:

CMD: command name (only the executable name).

CMDLINE: command with all its arguments as a string.

USER (alias EUSER): effective user name.

GROUP (alias EGROUP): effective group ID of the process.

TGID (was PID): thread group ID number of the process.

UTGID: user-assigned unique Thread Group Identifier.

TID (alias LWP, SPID): thread ID, aka light-weight process ID.

PPID: parent process ID.

NLWP (alias THCNT): number of light-weight processes (threads) in the process.

6 - Process Status Sensor (II)

Collected quantities for each task (cont’d):

SIZE (alias SZ): size (in kB) of the core image of the task (code+data+stack).

RSS (alias RSZ, RES): resident set size (in kB): the non-swapped physical memory that a task has used.

SHARE (alias SHR): the amount of shared memory (in kB) used by the task.

VSIZE (alias VSZ, VIRT): virtual memory usage (in kB) of entire process (lib+exe+data+stack).

PSR (alias P): processor that process is currently assigned to (useful to check the operation of process CPU affinity).

6 - Process Status Sensor (III)

Collected quantities for each task (cont’d):

STAT (alias S): multi-character process state.

First character:

D Uninterruptible sleep (usually I/O);

R Running or runnable (on run queue);

S Interruptible sleep (waiting for an event to complete);

T Stopped, either by a job control signal or because it is being traced;

X Dead (should never be seen);

Z Defunct ("zombie") process, terminated but not reaped by its parent.

Following characters:

< high-priority (not nice to other users);

N low-priority (nice to other users);

L has pages locked into memory (for real-time and custom IO);

s is a session leader;

l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do);

+ is in the foreground process group.
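A short sketch of how a STAT string can be decoded with the two tables above. The function name decode_stat and the shortened descriptions are illustrative choices, not part of the sensor.

```python
# First character: process state.
STATES = {
    "D": "uninterruptible sleep",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped",
    "X": "dead",
    "Z": "zombie",
}

# Following characters: additional flags.
FLAGS = {
    "<": "high-priority",
    "N": "low-priority",
    "L": "pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def decode_stat(stat):
    """Split a STAT string into (state description, list of flag descriptions)."""
    if not stat:
        raise ValueError("empty STAT field")
    state = STATES.get(stat[0], "unknown state")
    flags = [FLAGS.get(character, "unknown flag") for character in stat[1:]]
    return state, flags


print(decode_stat("Ss+"))
print(decode_stat("R<"))
```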

6 - Process Status Sensor (IV)

Collected quantities for each task (cont’d):

%CPU: the task's share of the CPU time since the last update (like top, not like ps), expressed as a percentage of total CPU time per processor.

%MEM: ratio of the process's resident set size to the physical memory on the machine, expressed as a percentage.

CLS: scheduling class of the process:

- not reported;

TS SCHED_OTHER (time-sharing, dynamic priority, Linux default);

FF SCHED_FIFO (real-time, static priority, first in first out);

RR SCHED_RR (real-time, static priority, round robin);

? unknown value.
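The %CPU and %MEM definitions above amount to two ratios. A minimal sketch, assuming the sensor keeps the CPU time and the wall-clock time of the previous update; the function and parameter names are hypothetical.

```python
def cpu_percent(cpu_seconds_prev, cpu_seconds_now, wall_prev, wall_now):
    """CPU time consumed since the last update, as a percentage of the
    wall-clock time elapsed (i.e. of one processor's capacity)."""
    elapsed = wall_now - wall_prev
    if elapsed <= 0:
        raise ValueError("the update interval must be positive")
    return 100.0 * (cpu_seconds_now - cpu_seconds_prev) / elapsed


def mem_percent(rss_kb, physical_memory_kb):
    """Resident set size as a percentage of the physical memory."""
    return 100.0 * rss_kb / physical_memory_kb


# A task that used 2.5 s of CPU time during a 5 s update interval.
print(cpu_percent(10.0, 12.5, 100.0, 105.0))
# A task with 512 kB resident on a (hypothetical) 2048 kB machine.
print(mem_percent(512, 2048))
```

Because the percentage refers to a single processor, a multi-threaded task running on several processors can exceed 100 with this definition.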

6 - Process Status Sensor (V)

Collected quantities for each task (cont’d):

RTPRIO: real-time (static) priority. Defined only for real-time tasks (scheduled with SCHED_FIFO or SCHED_RR). It is set to “N/A” for time-sharing tasks (SCHED_OTHER).

NI: nice value. This ranges from 19 (nicest) to -20. Defined only for time-sharing tasks (SCHED_OTHER). It is set to “N/A” for real-time tasks (SCHED_FIFO or SCHED_RR).

PRI (alias PR): the (dynamic) priority of the task. Defined only for time-sharing tasks. It is set to “RT” for real-time tasks.

STARTED (alias START): time the command started.

ELAPSED: elapsed time since the process was started.

TIME: cumulative CPU time.

6 - Process Status Sensor (VI)

Collected quantities for each task (cont’d):

TTY (alias TT): controlling tty (terminal).

PENDING: mask of the pending signals.

CATCHED (alias CAUGHT): mask of the caught signals.

IGNORED: mask of the ignored signals.

BLOCKED: mask of the blocked signals.

7 - Network Interface Sensor (I)

Collected quantities:

rx_bitRate, tx_bitRate: the total number of bits received and transmitted in a second.

rx_packetsRate, tx_packetsRate: the total number of Ethernet frames received and transmitted in a second.

rx_multicastRate: The total number of multicast Ethernet frames received in a second.

rx_bytes4packet, tx_bytes4packet: the average number of bytes contained in a received or transmitted Ethernet frame.

rx_errorsFrac: fraction of bad Ethernet frames received.

tx_errorsFrac: fraction of transmitted Ethernet frames with packet transmit problems.

7 - Network Interface Sensor (II)

Collected quantities (cont’d):

rx_droppedFrac, tx_droppedFrac: fraction of received and transmitted Ethernet frames dropped by the operating system due to buffer overflows or throttling policy.

rx_fifo_errorsFrac, tx_fifo_errorsFrac: fraction of received and transmitted Ethernet frames which encountered a receiver/transmitter FIFO overrun.

rx_frame_errorsFrac: fraction of received Ethernet frames with frame alignment errors.

collisionsFrac: fraction of transmitted frames which generate Ethernet collisions in a half-duplex network.

tx_carrier_errorsFrac: fraction of transmitted Ethernet frames which encountered a condition of transmission errors due to loss of carrier. If this ratio is greater than zero there is probably a cable/connector problem or a bad duplex setting.
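The rates and fractions listed for the network interface sensor can be derived from two snapshots of the interface counters. The sketch below is an assumption about how such values could be computed, not the sensor's code: the counter names (rx_bytes, rx_packets, rx_errors), the choice of received frames as denominator of rx_errorsFrac, and the function name are all hypothetical, and counter wrap-around is ignored.

```python
def interface_rates(previous, current, interval_seconds):
    """Derive receive-side rates and fractions from two counter snapshots."""
    delta = {name: current[name] - previous[name] for name in previous}

    def fraction(numerator, denominator):
        # Avoid a division by zero when no frame was seen in the interval.
        return numerator / denominator if denominator else 0.0

    return {
        "rx_bitRate": 8 * delta["rx_bytes"] / interval_seconds,
        "rx_packetsRate": delta["rx_packets"] / interval_seconds,
        "rx_bytes4packet": fraction(delta["rx_bytes"], delta["rx_packets"]),
        "rx_errorsFrac": fraction(delta["rx_errors"], delta["rx_packets"]),
    }


# Hypothetical counter values sampled 2 seconds apart.
snapshot_t0 = {"rx_bytes": 1000, "rx_packets": 10, "rx_errors": 0}
snapshot_t1 = {"rx_bytes": 151000, "rx_packets": 110, "rx_errors": 1}
print(interface_rates(snapshot_t0, snapshot_t1, 2.0))
```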

9 - TCP/IP Stack Sensor (I)

Collected quantities: IP rates:

InReceivesRate, InDeliversRate, ForwDatagramsRate: the rate of IP datagrams received, delivered (to IP user-protocols) and forwarded (to their final destination) from all the network interfaces.

OutRequestsRate: the rate of IP datagrams which local IP user-protocols supplied to IP in requests for transmission.

ReasmReqdsRate, FragReqdsRate: the rate of received IP fragments which needed to be reassembled at this entity, and the rate of datagrams which needed to be fragmented.

9 - TCP/IP Stack Sensor (II)

Collected quantities: TCP rates:

InSegsRate, OutSegsRate: the total number of TCP segments received or sent in a second.

Collected quantities: UDP rates:

InDatagramsRate, OutDatagramsRate: the rate of UDP datagrams delivered to UDP users or sent.

9 - TCP/IP Stack Sensor (III)

Collected quantities: IP error fractions:

InHdrErrorsFrac: fraction of input IP datagrams discarded due to errors in their IP headers.

InAddrErrorsFrac: fraction of input IP datagrams discarded because the IP address was not a valid address to be received at this entity.

InUnknownProtosFrac: fraction of input IP datagrams discarded because of an unknown or unsupported protocol.

InDiscardsFrac: fraction of valid input IP datagrams which were discarded (e.g., for lack of buffer space).

OutNoRoutesFrac: fraction of output IP datagrams discarded because no route could be found to transmit them to their destination.

OutDiscardsFrac: fraction of valid output IP datagrams discarded (e.g., for lack of buffer space).

9 - TCP/IP Stack Sensor (IV)

Collected quantities: IP forwarding/fragmentation/reassembling fractions:

ForwDatagramsFrac: fraction of input datagrams which are forwarded.

InDeliversFrac: fraction of input datagrams which are successfully delivered to IP user-protocols.

ReasmReqdsFrac: average number of fragments received for each datagram received.

ReasmOKsFrac: fraction of the received IP datagrams needing reassembly which were successfully reassembled.

FragReqdsFrac: fraction of output datagrams that needed to be fragmented.

FragCreatesFrac: average number of fragments created for each datagram to be sent.

9 - TCP/IP Stack Sensor (V)

Collected quantities: IP fragmentation/reassembling error fractions (cont’d):

ReasmTimeoutFrac: fraction of failures detected by the IP re-assembly algorithm due to reassembling time-out.

ReasmFailsFrac: fraction of failures detected by the IP re-assembly algorithm for whatever reason.

FragFailsFrac: fraction of output datagrams that needed to be fragmented whose fragmentation failed.

9 - TCP/IP Stack Sensor (VI)

Collected quantities: TCP error fractions:

RetransSegsFrac: fraction of output segments which are retransmitted.

OutRstsFrac: fraction of output segments containing the RST flag.

InErrsFrac: fraction of input segments received in error.

9 - TCP/IP Stack Sensor (VII)

Collected quantities: UDP error fractions:

NoPortsFrac: fraction of received UDP datagrams for which there was no application at the destination port.

InErrorsFrac: fraction of received UDP datagrams that could not be delivered for reasons other than the lack of an application at the destination port.
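The counter names used by the TCP/IP stack sensor match those of the Linux /proc/net/snmp file, where each protocol has a line of names followed by a line of values. The sketch below assumes that layout (the slides do not state the data source) and derives RetransSegsFrac from two snapshots, taking the segments sent in the interval as denominator. The samples are abbreviated and their values hypothetical; the function names are illustrative.

```python
# Hypothetical, abbreviated snapshots in /proc/net/snmp layout.
SNAPSHOT_T0 = """\
Tcp: InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1000 2000 10 0 1
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 100 5 0 90
"""

SNAPSHOT_T1 = """\
Tcp: InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 6000 6000 50 5 9
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 1000 95 10 890
"""


def parse_snmp(text):
    """Parse name-line / value-line pairs into {protocol: {counter: value}}."""
    table = {}
    lines = [line for line in text.splitlines() if line.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        protocol, _, names = header.partition(":")
        _, _, numbers = values.partition(":")
        table[protocol] = dict(zip(names.split(), map(int, numbers.split())))
    return table


def delta_fraction(previous, current, protocol, numerator, denominator):
    """Ratio of two counter increments between two snapshots."""
    top = current[protocol][numerator] - previous[protocol][numerator]
    bottom = current[protocol][denominator] - previous[protocol][denominator]
    return top / bottom if bottom else 0.0


t0, t1 = parse_snmp(SNAPSHOT_T0), parse_snmp(SNAPSHOT_T1)
print("RetransSegsFrac:", delta_fraction(t0, t1, "Tcp", "RetransSegs", "OutSegs"))
```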
