
Performance Monitoring on NERSC’s POWER 5 System
Richard Gerber
NERSC User Services
RAGerber@lbl.gov

ScicomP 14 Poughkeepsie, N.Y.

May 22, 2008


Prequel

•  Although this talk centers on identifying system problems, it also highlights the stability and excellent design of the p575 POWER 5 system.

•  Virtually all the problems we’ve had with Bassi can be attributed to software complexity.

•  Take-away point: Application testing and monitoring are necessary to ensure proper system function.


Outline

•  Why monitor performance?
•  What are we monitoring?
•  Procedure
•  Data and results
•  Discussion
•  Summary


Why monitor performance?

•  To provide stable, consistent high performance scientific computing resources.

•  To ensure that system performance and reliability never decrease over the machine’s lifetime.

•  To recognize when software, hardware, and configuration changes impact performance.


What are we monitoring?

•  Parallel application performance – looks at the system from a user perspective.

•  High-level “component” tests: IO, memory bandwidth, MPI latency and bandwidth.

•  Serial application performance on a single packed node.

•  The goal is to monitor, maintain, and improve the user experience.


Three parallel applications

•  GTC (Gyrokinetic Toroidal Code)
   ▪  Studies energy transfer via plasma microturbulence in fusion device plasmas. PIC code.

•  PARATEC (Parallel Total Energy Code)
   ▪  Ab-initio quantum total energy calculations via pseudo-potentials and a plane wave basis set. Self-consistent field conjugate gradient.

•  CAM (Community Atmospheric Model)
   ▪  Complicated multi-physics and chemistry.


Three standard parallel benchmarks

•  NPB 2.4 Class D: SP
•  NPB 2.4 Class D: MG
•  NPB 2.4 Class D: FT

•  These were chosen as models of memory-intensive kernels that benefit from good memory bandwidth.


Three “Component” Tests

•  MEMRATE ▪  Single-node memory bandwidth (TRIAD; see the sketch below).

•  MPITEST ▪  MPI latency ▪  MPI bandwidth

•  PIORAW ▪  Parallel IO performance
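For concreteness, below is a minimal sketch of the kind of TRIAD bandwidth measurement that MEMRATE reports. This is an illustrative Python/NumPy version, not NERSC’s MEMRATE code; the production test is a compiled, tuned kernel (as in STREAM) and will report higher rates than an interpreted sketch.

    import time
    import numpy as np

    # Array size chosen to be much larger than the L3 cache, so the loop is
    # limited by main-memory bandwidth rather than cache reuse.
    N = 20_000_000
    scalar = 3.0
    b = np.random.rand(N)
    c = np.random.rand(N)
    a = np.empty_like(b)

    best = 0.0
    for _ in range(5):                     # best of several trials, as STREAM does
        t0 = time.perf_counter()
        np.multiply(c, scalar, out=a)      # a = scalar * c
        np.add(a, b, out=a)                # a = b + scalar * c  (TRIAD)
        dt = time.perf_counter() - t0
        bytes_counted = 3 * 8 * N          # STREAM convention: read b, read c, write a
        best = max(best, bytes_counted / dt / 1e6)

    # The two NumPy passes actually move more data than a fused compiled loop,
    # so this understates the rate a tuned kernel would report.
    print(f"TRIAD bandwidth: {best:.0f} MB/s")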


Procedure

•  Goals
   ▪  Monitor the system in production mode
   ▪  Small impact on users
   ▪  Fast and flexible way to test SW and HW changes
   ▪  Automatically run the suite, gather data, and post results

•  Implementation
   ▪  Run the suite of codes approximately weekly
   ▪  Each code typically runs for a few minutes on 8 (of 111) nodes
   ▪  End-to-end workflow is handled by scripts that run the codes, parse the output, import results into a database, and display them on the web (see the sketch below)
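A minimal sketch of such a driver is below. The run commands, output patterns, and database layout are hypothetical placeholders (the actual NERSC scripts submit jobs through the batch system and feed the public web page noted below); it only illustrates the run / parse / import flow.

    #!/usr/bin/env python
    """Hypothetical monitoring driver: run each benchmark, parse its
    performance figure, and store one data point per run in a small database."""
    import datetime
    import re
    import sqlite3
    import subprocess

    # Placeholder commands and output patterns, not the real NERSC scripts.
    SUITE = {
        "GTC":     ("./run_gtc.sh",     r"MFlops/sec/task\s*=\s*([\d.]+)"),
        "PARATEC": ("./run_paratec.sh", r"MFlops/sec/task\s*=\s*([\d.]+)"),
        "MEMRATE": ("./run_memrate.sh", r"TRIAD\s+([\d.]+)\s*MB/s"),
    }

    db = sqlite3.connect("monitor.db")
    db.execute("CREATE TABLE IF NOT EXISTS results (date TEXT, code TEXT, rate REAL)")

    today = datetime.date.today().isoformat()
    for code, (cmd, pattern) in SUITE.items():
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
        match = re.search(pattern, out)
        if match:                          # store one data point per benchmark per run
            db.execute("INSERT INTO results VALUES (?, ?, ?)",
                       (today, code, float(match.group(1))))
    db.commit()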


Results Web Interface

•  The monitoring results are publicly available on the web
   ▪  http://www.nersc.gov/nusers/systems/bassi/monitor.php
•  This is useful for visually scanning for anomalies.
•  Data is quickly available to IBM management and technical staff.


Examples of Problems Revealed

•  PE upgrade problem
•  HPS firmware upgrade problem
•  AIX 5.3 upgrade problem
•  Password file indexing problem
•  System daemon problem
•  Compiler upgrade issues
•  Random hardware problems (e.g., L3 cache)


CAM 3.0 Results

[Chart: CAM 16x1 (no OpenMP), MFlops/sec/task vs. date, Sep-05 through Jun-08. Annotated points: Healthy, PE upgrade problem, Firmware upgrade problem, AIX 5.3 testing, Indexing, System Daemon, XLF 11.]


PARATEC Results

[Chart: PARATEC, 64 tasks, MFlops/sec/task vs. date, Sep-05 through Jun-08. Annotated points: Healthy, PE upgrade problem, Firmware upgrade problem, AIX 5.3 testing, Indexing, System Daemon, Bad Node.]


NPB MG Results

[Chart: NPB MG 2.4, 64 tasks, MFlops/sec/task vs. date, Sep-05 through Sep-08. Annotated points: Healthy, PE upgrade problem, Firmware upgrade problem, AIX 5.3 test, Bad Node, System Daemon, Indexing, XLF 11.]


MEMRATE Packed Node

[Chart: MEMRATE packed-node TRIAD, MB/sec vs. date, Sep-05 through Jun-08. Annotated points: Healthy, PE upgrade problem, Firmware upgrade problems, Bad Node, Mem Segments, Indexing, System Daemon, LL Upgrade/Config.]


GTC Results

[Chart: GTC, 64 tasks, MFlops/sec/task vs. date, Sep-05 through Jun-08. Annotated points: Healthy, PE upgrade, Firmware upgrade, AIX 5.3 testing, Bad Node, System daemon, Indexing, LL upgrade/config, XLF 11.]


Discussion

•  Most of the problems we uncovered involved software upgrades or configuration changes

•  None of these issues were known before monitoring revealed them

•  Only the bad L3 caches and the compiler changes were identified by “system” tests or had simple resolutions


Healthy Results Summary

Benchmark           Avg. MFlop/s   Std. Dev.   COV
CAM 16                   502           1.4     0.28%
PARATEC 64              4719          80       1.69%
GTC                      747           8.9     1.20%*
NPB MG D 64             1331          26       1.95%
MEMRATE (MB/sec)        7070          42       0.60%
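(COV is the coefficient of variation, the standard deviation divided by the mean; for CAM 16, for example, 1.4 / 502 ≈ 0.28%.)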


Discussion

•  When you run the suite and examine the results over time, you get a “feel” for when something is wrong

•  The measured variation on a healthy system lets you quickly evaluate the significance of an outlying result (see the sketch below)

•  Even minor variations in run time can indicate that a system is sick

•  The historical results provide quantitative evidence that a problem exists
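As an illustration of how the healthy-system statistics can be applied (a hypothetical check, not the actual monitoring code), a new result can be flagged when it falls more than a few standard deviations from the healthy mean:

    def flag_outlier(rate, healthy_mean, healthy_std, nsigma=3.0):
        """Flag a benchmark result that lies outside the band expected from
        the healthy-system history (mean +/- nsigma standard deviations)."""
        return abs(rate - healthy_mean) > nsigma * healthy_std

    # Example with the healthy CAM 16 statistics from the summary table:
    # 470 MFlops/sec/task is far below 502 - 3 * 1.4, so it is flagged.
    print(flag_outlier(470.0, healthy_mean=502.0, healthy_std=1.4))   # True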


Discussion

•  A healthy Bassi has remarkably consistent run time performance characteristics.

•  This is especially notable because the system is heavily loaded with a diverse workload.

•  We don’t have comparable long-term data for other systems at NERSC (yet!)

•  Aggregate measures such as the SSP are also useful, but they do not diagnose as many specific problems


Sustained System Performance (SSP)

The SSP is an aggregate metric derived from a (geometric) average of application benchmarks and standard parallel benchmarks.

[Chart: SSP over time, with annotated events including Indexing and System Daemon.]
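A sketch of the aggregation idea is below, using the healthy per-benchmark averages from the summary table purely for illustration; the full SSP definition includes additional factors (such as scaling by system size) that are omitted here.

    import math

    def geometric_mean(rates):
        """Geometric mean of per-benchmark rates, the core of an SSP-style
        aggregate metric (illustrative; not the full NERSC SSP formula)."""
        return math.exp(sum(math.log(r) for r in rates) / len(rates))

    # Healthy averages from the summary table, MFlops/sec/task.
    print(geometric_mean([502, 4719, 747, 1331]))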


Summary

•  Application testing and monitoring are necessary to ensure proper system function.

•  When configured properly, application performance on Bassi is remarkably consistent.

•  Most problems we’ve had with Bassi can be attributed to software complexity.
