Monitoring system of the JINR Tier-1 and Tier-2

Kashunin I., Mitsyn V., Dolbilov A., Trofimov V. ROLCG 2015 Conference, Cluj-Napoca, 28-30 October 2015

The monitoring system: conceptual phases

1. Analysis of existing systems
2. Definition of primary criteria for the monitoring system development
3. Building a model of the monitoring system
4. Implementation of a monitoring system that meets the requirements
5. Use of the monitoring system

Tier-1 hardware: Control and monitoring facilities

Photos: the tape library, the cooling system, the UPS module, the control, computing and disk servers, and a general view of the complex

● Control, computing and disk servers: ssh, ipmi
● Tape library: http, snmp
● Cooling system: http, snmp
● Uninterruptible power supply: http, snmp

Tier-2 hardware

● Control, computing and disk servers: ssh, ipmi
● Uninterruptible power supply: http, snmp
● Cooling system: http, snmp

Photos: the UPSs, the control, computing and disk servers, and a general view of the Tier-2 complex

The monitoring system: Suitability

Tier-1 and Tier-2 hardware have similar control and monitoring facilities

Problems to be solved:
• Implementing a unified monitoring system
• Implementing a unified storage system for hardware sensor data
• Implementing a prompt response to hardware failures
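Because both tiers are reached through the same small set of protocols, the similarity stated above can be checked mechanically. The sketch below transcribes the protocol lists from the Tier-1 and Tier-2 hardware slides into Python dictionaries (the key names are illustrative, not identifiers from the real system) and reports the hardware classes the two tiers share with identical protocols, plus the protocols a unified collector would have to speak.

```python
# Access protocols per hardware class, transcribed from the Tier-1 and
# Tier-2 hardware slides. The key names are illustrative only.
TIER1 = {
    "servers": {"ssh", "ipmi"},
    "tape_library": {"http", "snmp"},
    "cooling": {"http", "snmp"},
    "ups": {"http", "snmp"},
}
TIER2 = {
    "servers": {"ssh", "ipmi"},
    "ups": {"http", "snmp"},
    "cooling": {"http", "snmp"},
}


def shared_facilities(a, b):
    """Hardware classes present in both tiers and reached with identical protocols."""
    return sorted(k for k in a.keys() & b.keys() if a[k] == b[k])


def all_protocols(*tiers):
    """Union of every protocol a unified collector would need."""
    protocols = set()
    for tier in tiers:
        for proto in tier.values():
            protocols |= proto
    return sorted(protocols)


print(shared_facilities(TIER1, TIER2))
print(all_protocols(TIER1, TIER2))
```

Every Tier-2 hardware class appears in Tier-1 with the same protocols; only the tape library is specific to Tier-1, which is consistent with covering both tiers with one system.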

The monitoring system: Selection Criteria

Versatility: a comprehensive and convenient interface
• The chart system and history storage
• The notification system
• The data visualization system

Inclusion of new hardware in the monitoring system
• Home-made plugins for gathering sensor data

Authentication system
• Kerberos support

Modular structure
• Addon installations

Overview of existing monitoring systems

System     Expandable  Kerberos  Modularity  Versatility
Nagios     Yes         Yes       Yes         Yes
Ganglia    Yes         No        No          Cluster monitoring system
Zabbix     Yes         Yes       No          Yes
Icinga     Yes         Yes       Yes         Yes
Icinga2    Yes         Yes       Yes         Yes
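The selection can be read off the table as a simple filter: keep only the systems that satisfy every criterion. A sketch with the table transcribed into Python; treating Ganglia's "Cluster monitoring system" entry as a failed versatility criterion is an interpretation, not something the slide states explicitly.

```python
# Criteria table from the overview slide, one tuple per system in the
# column order of CRITERIA. Ganglia's "Cluster monitoring system" cell is
# interpreted here as not meeting the versatility criterion.
CRITERIA = ("expandable", "kerberos", "modularity", "versatility")
SYSTEMS = {
    "Nagios":  (True, True, True, True),
    "Ganglia": (True, False, False, False),
    "Zabbix":  (True, True, False, True),
    "Icinga":  (True, True, True, True),
    "Icinga2": (True, True, True, True),
}


def missing_criteria(name):
    """Names of the criteria a system does not satisfy."""
    return [c for c, ok in zip(CRITERIA, SYSTEMS[name]) if not ok]


def candidates():
    """Systems that satisfy every criterion, in table order."""
    return [name for name in SYSTEMS if not missing_criteria(name)]


print(candidates())
print(missing_criteria("Zabbix"))
```

The remaining candidates, Nagios, Icinga and Icinga2, all belong to the Nagios family.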

Nagios monitoring system family

Members of the family: Nagios 3.5, Nagios 4.1, Icinga, Icinga2, Shinken, op5

The monitoring system: data processing algorithm

Hardware → gathering data plugins → sensor-collected data → processing of sensor-collected data → data visualization (WEB interface, state tables, informational displays)

The monitoring system: principle of operation

Components: Nagios, plugins, hardware, notification, visualization, data storage system

Structure diagram of the monitoring system

Gathering data from hardware

Data gathering is carried out by special plugins. Libraries used for gathering data:
• Netsnmp
• Subprocess (Popen, PIPE)

Monitored hardware: computing servers, storage servers, the tape library, the cooling system, UPS

Gathering data plugins: Check_temperature, Check_airflow, Check_smart, Check_cpu, Check_tape_status, Check_capacity, Check_load, Check_raid_status, Check_dir, …

    import netsnmp
    # SNMP v2c GET of one vendor-specific sensor OID on the host argHost
    cmd1 = netsnmp.snmpget(netsnmp.Varbind('.1.3.6.1.4.1.318.1.1.14.3.3.1.5.1'),
                           Version=2, DestHost=argHost, Community="public")
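Nagios plugins report their result through exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a single output line in which the text after `|` is performance data. A minimal sketch of how a raw reading, such as a value returned by the `snmpget` call above, might be turned into that output. The function name `evaluate`, the label and the thresholds are hypothetical, simple upper thresholds are assumed, and the real Check_* plugins may use different logic.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def evaluate(label, raw, warn, crit, uom=""):
    """Map one raw sensor reading to (exit_code, output_line).

    raw is a string as returned by an SNMP or shell query, or None if the
    query failed. warn and crit are hypothetical upper thresholds.
    """
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return UNKNOWN, "UNKNOWN - no valid reading for %s" % label
    if value > crit:
        code = CRITICAL
    elif value > warn:
        code = WARNING
    else:
        code = OK
    # Performance data part: 'label'=value[UOM];warn;crit
    perfdata = "'%s'=%g%s;%g;%g" % (label, value, uom, warn, crit)
    return code, "%s - %s is %g%s | %s" % (
        STATUS_NAMES[code], label, value, uom, perfdata)


code, line = evaluate("temperature", "27.5", warn=30, crit=35)
print(line)
```

A complete plugin would print the line and finish with `sys.exit(code)` so that Nagios picks up the status; the text after `|` is what Pnp4nagios later stores for the charts.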

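    from subprocess import Popen, PIPE

    def make_command(command):
        # Run a shell command and return its stdout as stripped text
        return Popen(command, shell=True, stdout=PIPE, stderr=PIPE,
                     universal_newlines=True).communicate()[0].strip()

    failed_status = make_command("/root/sbin/twstatshort | awk '{print $3}' | grep u0 | awk '{print $1}'")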
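The shell pipeline passed to `make_command` keeps the third whitespace-separated field of each line of `twstatshort` output when that field contains `u0`; the final `awk '{print $1}'` returns that same single token. The same filtering can be done in Python without `shell=True`, which avoids shell quoting problems. A sketch only: the helper names are hypothetical, and the sample text is invented because the output format of the site's `twstatshort` script is not shown.

```python
from subprocess import Popen, PIPE


def third_fields_matching(text, needle="u0"):
    """Python equivalent of: awk '{print $3}' | grep u0 | awk '{print $1}'."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # awk prints an empty line when $3 is missing, which grep then drops
        if len(fields) >= 3 and needle in fields[2]:
            result.append(fields[2])
    return "\n".join(result)


def run(argv):
    """Run a command given as an argument list (no shell) and return stdout."""
    return Popen(argv, stdout=PIPE, stderr=PIPE,
                 universal_newlines=True).communicate()[0]


# Hypothetical use on the monitored host:
# failed_status = third_fields_matching(run(["/root/sbin/twstatshort"]))
sample = "ctl c0 u0 OK\nctl c0 p1 OK\nctl c1 u0 DEGRADED"
print(third_fields_matching(sample))
```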

Organization of the SMS notification system

The monitoring system runs the SMS notification script Notify-service-by-sms.

The notification system uses configuration files to define which notification method is used.

Failure (defined by the gathering data plugin) → analysis → notification → SMS notification plugins → SMS sending service (sms.jinr.ru)

Excerpt from the SMS notification plugin:

    # cursor: DB-API cursor on the SMS database (MySQLdb-style %s placeholders assumed);
    # values are passed as parameters instead of being concatenated into the SQL text
    cursor.execute(
        "INSERT INTO outbox (DestinationNumber, TextDecoded, Coding) "
        "VALUES (%s, %s, 'Default_No_Compression')",
        (str(argNumber), str(argOption) + "_ " + str(argHost) + "_" + str(argInterface)))
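Building the INSERT statement by string concatenation breaks as soon as a value contains a quote and is open to SQL injection; passing the values as query parameters avoids both. The self-contained sketch below uses an in-memory SQLite database as a stand-in for the real SMS database. Only the `outbox` column names, the `Default_No_Compression` value and the message layout come from the slide; they resemble the outbox table of the Gammu SMS daemon, but the slide does not name the backend, so the schema here is a minimal assumption, and `queue_sms` and the phone number are hypothetical.

```python
import sqlite3

# Minimal stand-in for the SMS gateway's outbox table.
SCHEMA = """CREATE TABLE outbox (
    DestinationNumber TEXT,
    TextDecoded TEXT,
    Coding TEXT)"""


def queue_sms(conn, number, option, host, interface):
    """Queue one SMS using query parameters instead of string concatenation."""
    text = str(option) + "_ " + str(host) + "_" + str(interface)  # layout from the slide
    conn.execute(
        "INSERT INTO outbox (DestinationNumber, TextDecoded, Coding) "
        "VALUES (?, ?, 'Default_No_Compression')",
        (str(number), text),
    )
    conn.commit()
    return text


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
queue_sms(conn, "+70000000000", "CRITICAL", "host01", "eth0")
print(conn.execute("SELECT * FROM outbox").fetchall())
```

With a MySQL driver such as MySQLdb the placeholder is `%s` instead of `?`; in both cases the values travel separately from the SQL text.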

Pnp4nagios: Template creation

Pnp4nagios allows flexible chart tuning by means of custom templates.

By default, Pnp4nagios uses the “Default Template”.

Management

The monitoring system allows issuing notifications. If servers are down, it also allows putting them into the “downtime” or “acknowledgement” state.
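Nagios accepts such state changes as lines written to its external command file, the same mechanism the classic Nagios CGIs use. A minimal Python sketch that builds the two relevant commands; the command names and argument order follow the Nagios external command reference, while the helper names, the command-file path and the sample host are assumptions.

```python
import time

# Nagios reads external commands, one per line, from its command file.
# This path is a common default for source installations (assumption).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_command(name, args, now=None):
    """Format one external command line: [timestamp] NAME;arg1;arg2;..."""
    now = int(time.time()) if now is None else int(now)
    return "[%d] %s;%s\n" % (now, name, ";".join(str(a) for a in args))


def schedule_host_downtime(host, start, end, author, comment, now=None):
    # SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment
    # fixed=1: the downtime runs exactly from start to end; trigger_id=0: none
    return format_command("SCHEDULE_HOST_DOWNTIME",
                          [host, start, end, 1, 0, end - start, author, comment], now)


def acknowledge_host_problem(host, author, comment, now=None):
    # ACKNOWLEDGE_HOST_PROBLEM;host;sticky;notify;persistent;author;comment
    # sticky=2 keeps the acknowledgement until the host is UP again
    return format_command("ACKNOWLEDGE_HOST_PROBLEM",
                          [host, 2, 1, 1, author, comment], now)


def submit(line, path=COMMAND_FILE):
    """Write a command line to the Nagios command file (a named pipe)."""
    with open(path, "w") as pipe:
        pipe.write(line)


print(acknowledge_host_problem("wn001", "admin", "disk replaced", now=1446000000), end="")
```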

Data visualization system

Information panels, network maps, unified state tables

Informational displays

Computing and storage servers, network, UPS, computing cluster load, cooling system

Monitoring system: implementation and usage

The NagVis web interface allows the monitoring system to be displayed on a TV screen without any additional device.

Monitoring system performance

The maximum load of the monitoring server is about 5 of its 24 cores, i.e. about 20-25%.
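Reading “about 5 cores” as a peak of five to six busy cores (an interpretation, not stated on the slide), the quoted range follows directly from the core count:

```latex
\frac{5}{24} \approx 0.21 = 21\,\%, \qquad \frac{6}{24} = 0.25 = 25\,\%
```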

Monitoring system usage

Access to the monitoring system is based on Kerberos authentication; connections are protected by the HTTPS protocol.

User groups of the monitoring system (labelled A to E in the diagram): cooling hardware operators, supervisors, MICC operators, system administrators, Tier-1 operators

Conclusions

• A real-time operational reporting system for Tier-1 and Tier-2 has been organized
• Chart templates for data visualization have been designed
• A plugin enabling SMS notification has been written
• Configuration files that integrate data gathering from the hardware into a unified system have been written
• Plugins for gathering and processing hardware data have been written

As a result, the monitoring system of the JINR Tier-1 and Tier-2 has been developed and put into operation.

Nagios web interface

The monitoring system currently includes about 700 hosts and about 3.5k services in stable operation.

Pnp4nagios chart system

Pnp4nagios allows drawing several lines per chart.

Pnp4nagios allows tuning charts.

Pnp4nagios stores data in RRD, which makes it possible to use many addons for chart customization.

Chart system

The chart system is organized using pnp4nagios templates plus the nagios_hightchart addon.

Tier-1 Informational display

Chart system workflow

1. Execute plugin
2. Store perfdata into spool files
3. Move spool file into spool directory
4. Scan spool directory
5. Execute perfdata command
6. Update RRD database
7. Write XML meta data
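The step names match Pnp4nagios's bulk mode, in which Nagios writes performance data to spool files that are later picked up and processed by process_perfdata.pl (an inference from the step names). A sketch of the parsing part of steps 4 to 6: reading one spool line, assuming the tab-separated `KEY::VALUE` layout of the Pnp4nagios sample perfdata templates, and splitting its perfdata into metrics that would then be written to RRD. The function names and the sample line are hypothetical.

```python
import re

# One perfdata item: label=value[UOM];warn;crit;min;max
# Labels containing spaces are quoted with single quotes.
_ITEM = re.compile(r"(?:'([^']+)'|([^\s=']+))=(\S+)")
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_spool_line(line):
    """Split a tab-separated KEY::VALUE spool line into a dict."""
    fields = {}
    for part in line.rstrip("\n").split("\t"):
        key, sep, value = part.partition("::")
        if sep:
            fields[key] = value
    return fields


def parse_perfdata(perfdata):
    """Return a list of metric dicts from a Nagios perfdata string."""
    metrics = []
    for quoted, plain, rest in _ITEM.findall(perfdata):
        parts = rest.split(";")
        match = _VALUE.match(parts[0])
        if match is None:
            continue  # skip values that are not plain numbers
        extra = (parts[1:] + [""] * 4)[:4]  # warn, crit, min, max may be absent
        metrics.append({
            "label": quoted or plain,
            "value": float(match.group(1)),
            "uom": match.group(2),
            "warn": extra[0], "crit": extra[1], "min": extra[2], "max": extra[3],
        })
    return metrics


line = ("DATATYPE::SERVICEPERFDATA\tTIMET::1446000000\tHOSTNAME::ups01\t"
        "SERVICEDESC::Check_load\tSERVICEPERFDATA::load=42%;80;90;0;100 'battery temp'=27")
record = parse_spool_line(line)
for metric in parse_perfdata(record["SERVICEPERFDATA"]):
    print(record["HOSTNAME"], metric)
```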

NagVis visualization system

Various sensors and images, network maps, and gadgets for displaying various parameters

Gathering data from Nagios

Broker module: check_mk livestatus → NagVis backend

Check_mk livestatus: 1) does not require a database; 2) allows exporting configs to different servers.

Nagios → Unix socket → NagVis
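Livestatus answers requests written in its line-based query language over the Unix socket created by the broker module. A sketch of a small Python client; the socket path is installation-specific and assumed here, the helper names are hypothetical, and only the default `;`-separated output is parsed (values containing `;` would need another output format).

```python
import socket

# Livestatus socket path, set on the broker_module line of nagios.cfg;
# it differs between installations (assumption).
LIVE_SOCKET = "/usr/local/nagios/var/rw/live"


def build_query(table, columns, filters=()):
    """Build a Livestatus query terminated by an empty line."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    lines += ["Filter: %s" % f for f in filters]
    return "\n".join(lines) + "\n\n"


def parse_response(text, columns):
    """Parse the default output: one row per line, columns separated by ';'."""
    return [dict(zip(columns, line.split(";")))
            for line in text.splitlines() if line]


def query(table, columns, filters=(), path=LIVE_SOCKET):
    """Send one query over the Unix socket and return the parsed rows."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        sock.sendall(build_query(table, columns, filters).encode())
        sock.shutdown(socket.SHUT_WR)  # signal that the query is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return parse_response(b"".join(chunks).decode(), columns)


# Request for hosts that are currently DOWN (host state 1):
print(build_query("hosts", ["name", "state"], ["state = 1"]), end="")
```

On a host with livestatus loaded, `query("services", ["host_name", "description", "state"], ["state = 2"])` would return the services currently in CRITICAL state.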