Spring HEPiX 2017 - Budapest - April 27th
The musings of a data junkie
Data, data and more data
Cary Whitney
Data
Realization
Data collection vs.
Monitoring
HEPiX Twiki
Monitoring (Myths/Issues/New)
• Students can do anything.
• Without an understanding of the data, any dashboard or monitoring is pretty basic at best.
• He who collects the data, knows the data.
• Many stakeholders actually think that the ones who collect the data can build meaningful graphs and monitoring. The collectors may know the tools better but not the data.
• Building a comprehensive dashboard and monitoring is easy. Just throw it together.
• Hey, now that you have all this data, can I copy it?
• There is a strong desire to copy some or all the data into other places.
• RabbitMQ monitoring plugin
• You can have speed or monitoring but not both.
• Cray SEDC data plugins. Direct streaming from the Cray system.
• Elastic on Docker running on the Mac
• Elastic upgrade to v5
• Kibana complained if there were still v2 components in the mix.
• Kopf was broken; Cerebro is the new replacement for Kopf for Elasticsearch management.
• Basic Logstash, Kibana and Elastic stats now in Kibana
• Logstash config reload coming along
• rsyslog rework of the center’s syslog infrastructure, based on RELP
• ElastAlerts, netdata, GPFS and Lustre work this summer.
netdata
netdata Monitoring
OpenDCIM
OpenDCIM visual
Cori, a Cray XC40 based system
12
Haswell: 16,128 cores, 203 TB memory, 2004 nodes
52
KNL: 632,672 cores, 1 PB memory, 9304 nodes
Cray Dragonfly topology, 45 TB/s bisection bandwidth
Burst Buffers: 1.8 PB SSD dynamic storage
Lustre scratch disk space: 30 PB, 700 GB/s
For only 7 MW of peak power
Building Power (HPL run)
Grafana Max CPU Overview
Overview of Node
Grafana Monitoring
Elastic Docker
• Main doc page: https://twiki.cern.ch/twiki/bin/view/HEPIX/Monitoring
• Instructions: https://twiki.cern.ch/twiki/bin/view/HEPIX/Instructions
1. Install Docker on your system.
2. Create a Docker yml file to load in Elastic, Kibana and Grafana.
3. Set up some local directories to store the data.
4. Get your Elastic index(es).
5. Start the Elastic Docker.
6. Load the index(es).
7. Configure Kibana and Grafana.
8. Away you go with your local index.
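The steps above do not say how the index(es) are loaded in step 6. One possibility, sketched here as an assumption rather than the talk's actual method, is Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document itself. The host `http://localhost:9200`, the index name, and the helper names `build_bulk_payload` and `post_bulk` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Optional


def build_bulk_payload(index: str, docs: Iterable[dict],
                       doc_type: Optional[str] = None) -> str:
    """Return a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if doc_type is not None:
            # Elasticsearch 5.x expects a mapping type; recent versions no longer accept one.
            action["index"]["_type"] = doc_type
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(doc))     # document source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def post_bulk(payload: str, host: str = "http://localhost:9200") -> dict:
    """POST the payload to a local Elasticsearch (host and port are assumptions)."""
    request = urllib.request.Request(
        host + "/_bulk",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `post_bulk(build_bulk_payload("weather", docs, doc_type="doc"))` would only work once the container from step 5 is up. If authentication is enabled in the container, the request would also need credentials.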
Elastic Size
Data Volumes (Single Day)
Source     Size (GB)  Doc count (M)  Description
modbus     15.4       99.7           Serial based industrial devices: 2500 PDU strips, 849 PDU panels, and substation
collectd   108.75     807.8          Linux system stats
SEDC       27.6       261.4          Cray power, environmental and job
Syslog     4.25       21.95          Logs from all systems/devices of the center
weather    0.017      0.044          Davis weather station outside
onewire    0.940      5              Computer room temperature network, over 1800 sensors
upmu       0.46       0.164          High resolution power monitoring
ION        0.206      1.9            Building substation power monitoring
Total      160        1.2B
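As a quick check on the table, the per-source figures can be summed and turned into an average document size. This is a small sketch, assuming decimal gigabytes (1 GB = 10**9 bytes); the names `DAILY_VOLUMES`, `totals` and `avg_bytes_per_doc` are illustrative only.

```python
# Per-source daily volumes copied from the table: (size in GB, documents in millions).
DAILY_VOLUMES = {
    "modbus": (15.4, 99.7),
    "collectd": (108.75, 807.8),
    "SEDC": (27.6, 261.4),
    "Syslog": (4.25, 21.95),
    "weather": (0.017, 0.044),
    "onewire": (0.940, 5.0),
    "upmu": (0.46, 0.164),
    "ION": (0.206, 1.9),
}


def totals(volumes):
    """Sum the size (GB) and document count (millions) over all sources."""
    size_gb = sum(size for size, _ in volumes.values())
    docs_m = sum(docs for _, docs in volumes.values())
    return size_gb, docs_m


def avg_bytes_per_doc(size_gb, docs_m):
    """Average stored bytes per document, assuming 1 GB = 10**9 bytes."""
    return size_gb * 1e9 / (docs_m * 1e6)


size_gb, docs_m = totals(DAILY_VOLUMES)
print(f"{size_gb:.1f} GB/day, {docs_m:.1f} M docs/day")
print(f"{avg_bytes_per_doc(size_gb, docs_m):.0f} bytes per document on average")
```

The sums come to roughly 157.6 GB and 1,198 million documents, which matches the rounded totals of 160 GB and 1.2B on the slide.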
Thank You