Development of the distributed monitoring system for the NICA cluster Ivan Slepov (LHEP, JINR) Mathematical Modeling and Computational Physics Dubna, Russia, July 8, 2013


Development of the distributed monitoring system for the NICA cluster

Ivan Slepov (LHEP, JINR)

Mathematical Modeling and Computational Physics Dubna, Russia, July 8, 2013


The MultiPurpose Detector (MPD) to study heavy ion collisions at NICA


Software for the MultiPurpose Detector

ROOT + FairRoot (FairBase + FairSoft software packages) = MpdRoot Framework

MpdRoot Framework components:

- Detector simulation

- Data reconstruction

- Event analysis


Computing resources for MPD data processing

- CPU: 128 Xeon cores (in future ~10 000 Xeon cores)

- GPU: ~1500 Tesla cores


Motivation for developing the monitoring system

- Computing resources information (free disk space, memory, CPU, etc.)

- System load (load average, processes)

- MPD software information (FairSoft version)

- Cluster software information (SGE, xrootd, PROOF)

- User task monitoring (batch processing and interactive jobs)

MPD users need more information about all of their own cluster nodes and public computers!


Monitoring system schemes

Scheme 1: collecting general information

- Cron job

- DSH software

- Bash scripts

- MySQL DB

- PHP scripts

- Web interface

Scheme 2: collecting information about user tasks and providing data management

- Web interface

- PHP scripts

- DSH software

- Bash scripts

- MySQL DB
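The collection step of Scheme 1 can be illustrated with a minimal sketch. It is not the system's actual code: the slides name Bash scripts and a MySQL DB, while this sketch uses Python and the standard-library `sqlite3` module as a stand-in so that it runs without a database server. The function names, the `node_status` table, and its columns are hypothetical.

```python
import os
import shutil
import socket
import sqlite3
import time


def collect_metrics(path="/"):
    """Gather a few node metrics: free disk space and load average."""
    usage = shutil.disk_usage(path)
    try:
        load1, load5, load15 = os.getloadavg()
    except (AttributeError, OSError):
        # Load average is not available on every platform.
        load1 = load5 = load15 = None
    return {
        "host": socket.gethostname(),
        "ts": int(time.time()),
        "disk_free_bytes": usage.free,
        "load1": load1,
        "load5": load5,
        "load15": load15,
    }


def store_metrics(conn, metrics):
    """Append one sample to a (hypothetical) node_status table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_status ("
        "host TEXT, ts INTEGER, disk_free_bytes INTEGER, "
        "load1 REAL, load5 REAL, load15 REAL)"
    )
    conn.execute(
        "INSERT INTO node_status VALUES "
        "(:host, :ts, :disk_free_bytes, :load1, :load5, :load15)",
        metrics,
    )
    conn.commit()


if __name__ == "__main__":
    # sqlite3 in memory stands in for the MySQL DB named on the slide.
    conn = sqlite3.connect(":memory:")
    store_metrics(conn, collect_metrics())
    print(conn.execute("SELECT * FROM node_status").fetchall())
```

In a setup like the one on the slide, a script of this kind would presumably be started on every node at a fixed interval (the cron job), and the web interface would only read the stored rows.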


Web interface for the monitoring system

1. MPD software information

2. Computing resources information

3. System load

4. User tasks monitoring


Monitoring system web interface: user tasks
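A per-user task summary of the kind such a page might display can be sketched as follows. This is an assumption about one possible approach, not the system's actual code: it parses the output of the standard `ps` utility, and the function names `read_ps` and `summarize_tasks` are hypothetical.

```python
import subprocess
from collections import defaultdict


def read_ps():
    """Return 'user cpu% mem% command' lines for all processes.

    A trailing '=' after each field name suppresses the header line.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "user=", "-o", "pcpu=",
         "-o", "pmem=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_tasks(ps_text):
    """Aggregate process count, CPU % and memory % per user."""
    totals = defaultdict(lambda: {"processes": 0, "cpu": 0.0, "mem": 0.0})
    for line in ps_text.splitlines():
        # The command name may contain spaces, so split at most 3 times.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # skip malformed or empty lines
        user, cpu, mem, _command = parts
        try:
            cpu_val, mem_val = float(cpu), float(mem)
        except ValueError:
            continue  # skip lines whose numeric fields do not parse
        entry = totals[user]
        entry["processes"] += 1
        entry["cpu"] += cpu_val
        entry["mem"] += mem_val
    return dict(totals)


if __name__ == "__main__":
    for user, stats in sorted(summarize_tasks(read_ps()).items()):
        print(user, stats)
```

Keeping the parsing separate from the `ps` call means the same summary function could be applied to text gathered from remote nodes.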


Monitoring system web interface: interactive nodes


Access to the monitoring system on the website mpd.jinr.ru


Thank you for your attention!


Motivation for developing the monitoring system

MPD users need more information about all of their own cluster nodes and public computers!

Why, when the grid concept, for example, places a layer of abstraction over the resources?

Because the MPD software is still under development and needs testing and debugging.