For each job in the workflow we perform the following operations:
• Producing one event log for each job step using Score-P
• Generating one JSON profile for each event log in the same allocation,
which reduces processing time in the analysis phase
• Querying job scheduling information and creating the job log (JSON) using
SLURM tools
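The job-log step could be sketched roughly as follows. This is a minimal illustration, not the authors' tool: it assumes `sacct` is the SLURM tool being queried and that the fields listed in `FIELDS` are the ones of interest; the function names and the JSON layout are hypothetical.

```python
import json
import subprocess

# Assumed selection of standard sacct format fields (not taken from the poster).
FIELDS = ["JobID", "JobName", "Start", "End", "Elapsed", "State"]


def parse_sacct(text, fields=FIELDS):
    """Turn pipe-separated, header-less sacct output into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        records.append(dict(zip(fields, line.split("|"))))
    return records


def query_job_log(job_id):
    """Query one job and its steps; needs a SLURM system with accounting enabled."""
    cmd = [
        "sacct",
        "--jobs", str(job_id),
        "--noheader",
        "--parsable2",  # pipe-separated output without a trailing delimiter
        "--format=" + ",".join(FIELDS),
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_sacct(out)


def write_job_log(records, path):
    """Store the job log as JSON (hypothetical layout: a list of records)."""
    with open(path, "w") as fh:
        json.dump(records, fh, indent=2)
```

Keeping the parsing separate from the `sacct` call means the parser can be checked on captured output without access to a cluster.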
The JSON profiles contain the time spent in:
• Computation
• Inter-process communication (MPI)
• Inter-thread communication (OpenMP, pthreads)
• I/O activities
• GPU Kernels
This data allows users to better understand their workflow and how they can
optimize its end-to-end execution time. We have developed a tool to translate
OTF2 traces generated by Score-P into these JSON profiles.
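To illustrate how such a profile might be used, the sketch below computes the share of each category in a step's runtime. The JSON schema, the key names, and the numbers are invented for illustration; the actual profile format is not specified here.

```python
import json

# Hypothetical profile for one job step; times in seconds, values invented.
EXAMPLE_PROFILE = json.loads("""
{
  "job_step": 1,
  "time": {"computation": 10.0, "mpi": 25.0, "openmp": 2.0, "io": 3.0, "gpu": 0.0}
}
""")


def category_shares(profile):
    """Return each category's fraction of the summed time in the profile."""
    times = profile["time"]
    total = sum(times.values())
    if total == 0:
        return {name: 0.0 for name in times}
    return {name: t / total for name, t in times.items()}


def dominant_category(profile):
    """Return the category in which the step spends the most time."""
    times = profile["time"]
    return max(times, key=times.get)
```

A step whose dominant category is MPI rather than computation is the kind of candidate the evaluation singles out for a closer look.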
Center for Information Services and High Performance Computing, Department of Interdisciplinary Application Development and Coordination
Christian Herold and Bill Williams ([email protected], [email protected])
Tel. +49 351 - 463 - 38000
Falkenbrunnen Room 014
Chemnitzer Straße 50, 01187 Dresden, Germany
Top-Down Performance Analysis of HPC Workflows
An HPC workflow is a coordinated sequence of interdependent applications.
Workflows can be modeled using jobs composed of steps, where each job
represents a single submission to the scheduling system, and each step executes
a single application (see Figure 1). Jobs may depend on each other. Therefore,
an inefficiency in one step can delay all work that depends on the associated
job and increase the runtime of the whole workflow. Determining the bottleneck
of a complex workflow is challenging without tool support. To optimize the
step or application responsible for the bottleneck, details of its runtime
behavior are required. A top-down approach is therefore needed to narrow the
performance data from a global view (the whole workflow) to a detailed view
(application level).
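The effect of dependencies on end-to-end time can be made concrete with a small sketch. It assumes, as a simplification, that each job starts as soon as all of its prerequisites have finished (queue waiting time is ignored); the job names and runtimes are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def finish_times(runtime, depends_on):
    """Earliest finish time per job.

    runtime:    job name -> runtime in seconds
    depends_on: job name -> iterable of prerequisite job names
    """
    graph = {job: set(depends_on.get(job, ())) for job in runtime}
    finish = {}
    # static_order() yields every job after all of its prerequisites.
    for job in TopologicalSorter(graph).static_order():
        start = max((finish[dep] for dep in graph[job]), default=0.0)
        finish[job] = start + runtime[job]
    return finish


# Hypothetical three-job chain: B waits for A, C waits for B.
deps = {"B": ["A"], "C": ["B"]}
baseline = finish_times({"A": 10.0, "B": 5.0, "C": 3.0}, deps)
slowed = finish_times({"A": 14.0, "B": 5.0, "C": 3.0}, deps)
```

In this chain, the four extra seconds in job A propagate unchanged to the finish time of job C, which is why a single inefficient step can lengthen the whole workflow.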
GROMACS is an open-source package for chemical simulation, mostly used for
molecular dynamics simulations of biomolecules. We instrumented the example
"Lysozyme in Water" from the GROMACS tutorial page with Score-P. We built six
jobs in one pipeline, skipping the final analysis step, and profiled the entire
workflow, including job information from the scheduler.
A top-down approach provides performance summaries for each level of a
single workflow:
1. Present an overview of each job inside a workflow to identify inefficient
jobs
2. Present an overview of each step inside a job to identify the job step
causing the inefficiency
3. Analyse a job step in detail with Vampir to find the inefficiency in the
program
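The first two levels of this drill-down could be sketched as below. The nested dictionary layout, the job and step names, and all timings are hypothetical, and "longest summed time" is only one possible selection criterion; the third level is a manual inspection of the selected step's trace in Vampir.

```python
def total_time(times):
    """Sum the per-category times (seconds) of one job or job step."""
    return sum(times.values())


def slowest(entries):
    """Return the name of the entry with the largest summed time."""
    return max(entries, key=lambda name: total_time(entries[name]))


def drill_down(workflow):
    """Level 1: pick the slowest job. Level 2: pick its slowest step."""
    job = slowest({name: data["summary"] for name, data in workflow.items()})
    step = slowest(workflow[job]["steps"])
    return job, step


# Hypothetical workflow summary; all numbers are invented.
WORKFLOW = {
    "minimize": {
        "summary": {"computation": 20.0, "mpi": 5.0},
        "steps": {"0": {"computation": 20.0, "mpi": 5.0}},
    },
    "equilibration": {
        "summary": {"computation": 60.0, "mpi": 50.0},
        "steps": {
            "0": {"computation": 5.0, "mpi": 2.0},
            "1": {"computation": 10.0, "mpi": 35.0},
            "2": {"computation": 45.0, "mpi": 13.0},
        },
    },
}
```

Calling `drill_down(WORKFLOW)` on this invented data selects the job `"equilibration"` and its step `"2"`; a criterion based on MPI share instead of total time would select step `"1"`.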
Our next step is the implementation of a visualizer that can read our profiles and
job summaries and produce the charts we show here automatically. We also
intend to refine our data collection infrastructure as needed.
“Dresdner Elbflorenz bei Dämmerung” by MalteF,
used under CC-BY-SA-3.0-DE / Cropped from original
This research was undertaken as part of the NEXTGenIO project, which is funded
through the European Union’s Horizon 2020 Research and Innovation
programme under Grant Agreement no. 671951.
To record the runtime behavior at the job step level, we used Score-P to
instrument the application being executed. Figure 2 depicts the measurement of
the workflow for one job, with the required components on the left side and
the output on the right side.
Figure 3 depicts the topology of the
GROMACS workflow:
• First, GROMACS runs three mostly
serial jobs during its setup phase
• Then three parallel jobs perform the
bulk of the simulation work
In Figure 4, we show that the
equilibration job in this workflow
has the longest runtime and the
largest MPI overhead of the six
jobs. This makes it a promising
candidate for optimization, so we
investigate its component steps.
Figure 5 shows that steps 1 and 3
in this job have the longest
runtimes. Step 3 is compute
bound and could possibly benefit
from more cores; step 1 is MPI-
bound and may not be well
configured. We look at step 1 in
more detail below to identify
possible problems.
In Figure 6, we see that step 1 has
significant MPI startup overhead
relative to its amount of
computation. It is likely that this
job would benefit from a lower
degree of parallelism.
Figure 1: An example of a workflow and the
provided performance summaries.
Figure 2: Required components for workflow measurement
Motivation
Top-Down Approach
Methodology
Evaluation
Future Work
Acknowledgement
Figure 4: Workflow Overview by Job (x-axis: Job Name; y-axis: Time (sec), 0 to 150; legend: Computation, MPI, OpenMP, ISO C I/O, POSIX I/O)
Figure 5: Details of Equilibration Job Steps (x-axis: Job Step Number, 0 to 4; y-axis: Time (sec), 0 to 50; legend: Computation, MPI, OpenMP, ISO C I/O, POSIX I/O)
Figure 6: Analysis of Equilibration Step 1 with Vampir
Figure 3: Job topology of the GROMACS example Lysozyme in Water.
[Figure 2 diagram labels: Job, Workflow, Event log, Job step, Profile, Query job log, Job log, SLURM, Score-P, Output, Job Summary]
[Figure 1 diagram labels: Workflow; Job A (Job step A.1, Job step A.2); Job B (Job step B.1, Job step B.2); numbered summary levels 1 to 3, including Workflow Summary and Detailed Performance Analysis]
[Figure 3 job labels: Generate protein; Select + Solvate; Add Ions; Minimize energy; Equilibration; MD production]