Monitoring of HPC and Embedded Systems
Dennis Hoppe
EXCESS Workshop
26/08/2016
Agenda
• Rationale for Monitoring
• ATOM Monitoring Framework
– Monitoring in HPC
– Monitoring of Embedded Systems
• Usage Examples
– EXCESS
– DreamCloud
– PHANTOM
• Summary
WHY MONITORING?
Rationale for Monitoring
• Maintenance
• Accounting
• Storage Monitoring
• Hardware Performance
• Power and Energy Monitoring
• Application Profiling
Demand for Energy Efficiency Mirrored in EU Projects
• CoolEmAll [CoolEmAll, 2011]
• DreamCloud [DreamCloud, 2013]
• ECO2Clouds [ECO2Clouds, 2012]
• ExaSolvers [ExaSolvers, 2013]
• EXCESS [EXCESS, 2013]
• JUNIPER [JUNIPER, 2012]
• PHANTOM [PHANTOM, 2015]
It’s all about saving energy
• Energy consumption is the major challenge in HPC (Exascale Challenge) [Ashby et al., 2010]
– Energy consumption must be a design goal in future algorithm design
– Standardization of interfaces and APIs to collect energy consumption data is needed
– Use of fine-grained measurement tools to evaluate energy saving effects on performance and vice versa
• Greening of the HPC domain will become as important as the greening movement in the automotive domain
Requirements on Current Monitoring Tools [Hoppe et al., 2015]
Key properties compared for Zabbix, Nagios, and OpenNMS:
• Architecture
• Non-Intrusiveness
• Scalability
• Timeliness
• Granularity
• Extensibility
• Data Storage
• Visualization
• Adaptability
• Predictability
Key properties defined in [Aceto et al., 2013], [Katsaros et al., 2011], [Telesca et al., 2014]
None of the existing monitoring solutions fully satisfies the requirements imposed by current and future projects!
Towards a novel monitoring framework
ATOM MONITORING FRAMEWORK
Key Features of ATOM
• Analyzing the system's run-time context
• Low-intrusive, highly scalable architecture
• Flexible, language-independent plug-in system
• RESTful Web service to push and retrieve data
• Light-weight and easy-to-grasp user library
• Integration with PBS resource manager (HPC) for on-demand monitoring of applications and infrastructure
• Web-based front-end for data exploration and analysis
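The RESTful push interface named in the feature list can be sketched as follows. This is a minimal illustration, not the ATOM API: the route `/hypothetical/metrics` and the JSON field names are assumptions made for this example, and the base URL is the one this deck gives for the web service. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Base URL as given for the RESTful web service in this deck.
BASE_URL = "http://mf.excess-project.eu:3030"
# Assumption: placeholder route, NOT a documented ATOM endpoint.
HYPOTHETICAL_PATH = "/hypothetical/metrics"


def build_push_request(metric, value, timestamp, base_url=BASE_URL):
    """Build (but do not send) an HTTP POST carrying one metric sample as JSON."""
    # Assumption: field names are illustrative only.
    payload = {"metric": metric, "value": value, "timestamp": timestamp}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url + HYPOTHETICAL_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_push_request("power_watts", 512.5, "2016-08-26T10:00:00Z")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending would be: urllib.request.urlopen(req, timeout=5)
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running monitoring server.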
ATOM Architecture
– MONITOR: ATOM monitoring server
– ACTOR: ATOM metric collector
– Rickshaw (D3.js)
– NodeJS
– Elasticsearch
MONITORING IN HPC
ATOM Setup on the HLRS/EXCESS Cluster
• The cluster is used for software development, testing, profiling, and evaluations within HLRS and for external project partners:
– highly configurable and extensible; current power consumption is roughly between 0.5 and 2.0 kW
– power measurement framework integrated with the PBS system; no further performance overhead is induced while profiling applications
HLRS Power and Performance Measurement System
Metric Gathering
• Processor: PAPI-C, RAPL
• Memory: /proc/meminfo, /proc/vmstat, iostat
• Network: PAPI-C
• Graphics cards: Nvidia SMI
• System: external measurement system
• Software: ATOM monitoring API (HTTP, C, Java, Python, ...)
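As an illustration of one of these sources, the sketch below parses text in the `/proc/meminfo` format into a dictionary and derives a memory-usage fraction. It is not taken from the ATOM plug-ins. The function names are made up for this example, the sample text is invented, and the usage calculation assumes a kernel that reports `MemAvailable` (Linux 3.14 or later).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; 'kB' values become bytes."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024  # the kernel's "kB" means units of 1024 bytes
        values[name.strip()] = amount
    return values


def used_memory_fraction(info):
    """Fraction of RAM in use, based on MemTotal and MemAvailable."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


# Invented sample in the /proc/meminfo layout (not measured data).
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
HugePages_Total:       0
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(used_memory_fraction(info))
```

On a Linux node the same function can be fed the live file, for example `parse_meminfo(open("/proc/meminfo").read())`.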
Sampling Rate of Metrics [Hoppe et al., 2016]
• User-defined rates for each plug-in via configuration file
– external power measurements at up to 50 kHz
– stable support for sampling at up to 50 Hz (= 20 ms interval)
– ATOM allows for a 50 times higher resolution than standard monitoring
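The relation between a configured rate and the resulting sampling interval (50 Hz corresponds to 20 ms) can be made concrete with a short sketch. The configuration layout below, one section per plug-in with a `rate_hz` key, is a hypothetical format chosen for illustration, and the section names are placeholders. The actual ATOM configuration file may look different.

```python
import configparser

# Hypothetical configuration layout: one section per plug-in, rate in Hz.
EXAMPLE_CONFIG = """
[papi]
rate_hz = 50

[meminfo]
rate_hz = 1
"""


def sampling_intervals_ms(config_text):
    """Return {plug-in name: sampling interval in milliseconds}."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    intervals = {}
    for plugin in parser.sections():
        rate_hz = parser.getfloat(plugin, "rate_hz")
        if rate_hz <= 0:
            raise ValueError(f"{plugin}: sampling rate must be positive")
        intervals[plugin] = 1000.0 / rate_hz  # period = 1 / rate
    return intervals


def samples_in(duration_s, rate_hz):
    """Number of samples collected over a run of the given length."""
    return int(duration_s * rate_hz)


if __name__ == "__main__":
    print(sampling_intervals_ms(EXAMPLE_CONFIG))
    print(samples_in(10, 50))
```

A 10-second run sampled at 50 Hz yields 500 data points per metric, which is worth keeping in mind when sizing storage for high-rate plug-ins.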
MONITORING OF EMBEDDED SYSTEMS
Support for Movidius Myriad2
• MV0182 development board is integrated into the EXCESS cluster
– connection via Ethernet (USB is also supported)
– integrated with PBS resource manager (well-established in HPC)
– shunts and A/D converters are integrated via daughter card MV0198
• Arduino MEGA 2560 microcontroller
– connects to MV0182 through the I2C bus
– connects to node of EXCESS cluster via USB serial interface
• Monitoring plug-in
– collects data from the Arduino and pushes it into the database
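The plug-in's first step, turning a line received over the serial interface into a power reading, might look like the sketch below. The slides do not specify the serial line format or the shunt resistance, so both are assumptions here: the line layout `<ms>,<bus_V>,<shunt_mV>` and the value of `SHUNT_OHMS` are hypothetical. The current follows from Ohm's law across the shunt, and power is bus voltage times current.

```python
from dataclasses import dataclass

# Assumption: placeholder shunt resistance, not the value used on the MV0198.
SHUNT_OHMS = 0.01


@dataclass
class PowerSample:
    timestamp_ms: int
    bus_volts: float
    current_amps: float

    @property
    def power_watts(self):
        return self.bus_volts * self.current_amps


def parse_serial_line(line, shunt_ohms=SHUNT_OHMS):
    """Parse '<ms>,<bus_V>,<shunt_mV>' (hypothetical format); None if malformed."""
    fields = line.strip().split(",")
    if len(fields) != 3:
        return None
    try:
        timestamp_ms = int(fields[0])
        bus_volts = float(fields[1])
        shunt_volts = float(fields[2]) / 1000.0  # mV -> V
    except ValueError:
        return None
    # Ohm's law across the shunt gives the rail current.
    return PowerSample(timestamp_ms, bus_volts, shunt_volts / shunt_ohms)


if __name__ == "__main__":
    sample = parse_serial_line("1200,5.0,2.0\n")
    print(sample, sample.power_watts)
```

Returning `None` for malformed lines lets the collector drop partial reads, which are common on serial links, without interrupting the measurement run.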
Movidius Myriad2 Experiment Visualization
http://mf.excess-project.eu
Integration of Myriad2 into HPC Workflow
USAGE EXAMPLES
EXCESS (FP7 Project) [EXCESS, 2013]
• Energy-aware scheduling with StarPU
Components: StarPU, ATOM Monitoring, Broker
DreamCloud (FP7 Project) [DreamCloud, 2013]
• Exploit monitoring data to improve task allocation
PHANTOM (H2020 Project) [PHANTOM, 2015]
• Extend ATOM’s support for embedded systems
SUMMARY
Take Away Messages
• ATOM monitoring framework
– is a light-weight and easy-to-use monitoring framework focusing on HPC and embedded system support
– has fundamental performance and energy metric support
– is easily extendable through a convenient plug-in system
– offers users various interfaces to explore profiling data (i.e., front-end, RESTful service; clients in Java, C, and Python)
• Increasing demand across multiple projects including EXCESS, DreamCloud, JUNIPER, and PHANTOM
Open Source
• Apache License v2.0
• GitHub (https://github.com/excess-project)
– monitoring-frontend
– monitoring-server
– monitoring-agent
– monitoring-api
– monitoring-setup-ansible
• API documentation
– https://excess-project.github.io/monitoring-server
References
• [Ashby et al., 2010] The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee at the US Department of Energy Office of Science, 2010.
• [CoolEmAll, 2011] http://tricoryne.man.poznan.pl
• [DreamCloud, 2013] http://www.dreamcloud-project.eu
• [ECO2Clouds, 2012] http://eco2clouds.eu
• [ExaSolvers, 2013] http://www.parallelintime.org/projects/sppexa.html
• [EXCESS, 2013] http://www.excess-project.eu
• [Hoppe et al., 2015] First Prototype of Monitoring Framework for the Conventional HPC and Movidius Platforms, Technical Report FP7-611183 D3.3, EU FP7 Project EXCESS, February 2015.
• [Hoppe et al., 2016] Lessons Learned and Final Remarks, Technical Report FP7-611183 D3.5, EU FP7 Project EXCESS, February 2016.
• [JUNIPER, 2012] http://www.juniper-project.eu
• [Katsaros et al., 2011] Monitoring: A Fundamental Process to Provide QoS Guarantees in Cloud-based Platforms, Cloud Computing: Methodology, System, and Applications, 2011.
• [Khabi et al., 2016] Report on the Final Evaluation Results and Discussion, Technical Report FP7-611183 D5.8, EU FP7 Project EXCESS, August 2016.
• [PHANTOM, 2015] http://www.phantom-project.org
• [Telesca et al., 2014] System Performance Monitoring of the ALICE Data Acquisition System with Zabbix, Journal of Physics, 2014.
APPENDIX
RESTful Web Service
http://mf.excess-project.eu:3030
ATOM Client in C
Web Front-End: List of Experiments
Visualization of Metric Data