Initial Design of a Test Suite for Automatic Performance Analysis Tools Bernd Mohr Forschungszentrum...

Preview:

Citation preview

Initial Design of a Test Suite

for Automatic Performance Analysis

Tools Bernd Mohr

Forschungszentrum JülichJohn von Neumann - Institut für

ComputingGermany

b.mohr@fz-juelich.de

Jesper Larsson Träff

NEC Europe Ltd.C&C Research Labs

Germanytraff@ccrl-nece.de

Initial Design of a Test Suitefor (Automatic)

Performance Analysis Tools

© 2003 Forschungszentrum Jülich, NIC-ZAM [2]

IST Working Group APART (since 1999)

AAutomatic PPerformance AAnalysis: RResources and TTools

• Forum for scientists and vendors• About 20 partners in Europe and the U.S.• http://www.fz-juelich.de/apart/

• Current Automatic Performance Tools Projects• Askalon http://www.par.univie.ac.at/project/askalon/• Kappa-Pi http://www.caos.uab.es/kpi.html• KOJAK http://www.fz-juelich.de/zam/kojak• Paradyn http://www.cs.wisc.edu/~paradyn/• Peridot http://wwwbode.cs.tum.edu/~gerndt/peridot/

© 2003 Forschungszentrum Jülich, NIC-ZAM [3]

(Full, Associated, and Former) Members

• European Research Centers and Universities

• U.S. Research Centers and Universities

• Vendors

© 2003 Forschungszentrum Jülich, NIC-ZAM [4]

APART Terminologie

• Performance Property• Aspect of performance behavior of an application

– E.g., communication dominated by waiting time• Specified as condition referring to performance data• Quantified and normalized in terms of

behavior-independent metric (severity)

• Performance Problem• Performance property with “negative” implications

• Performance Bottleneck• Performance Problem with highest severity

© 2003 Forschungszentrum Jülich, NIC-ZAM [5]

Example: Performance Property “Message in Wrong Order”

Locati

on

RECVA

Time

wait

SEND B SEND

C

RECV

SEND

© 2003 Forschungszentrum Jülich, NIC-ZAM [6]

The APART Test Suite (ATS)

• Users rely on correct working of tools Tools need to be especially well tested Systematic approach needed

• APART Test Suite• Common project inside APART group

– Every member needs this minimize resources– Ensures re-usability– Will also allow evaluation / comparison of

the different member projects• Main focus: automatic performance analysis tools• But also useful for “regular” performance tools

– http://www.fz-juelich.de/apart/ats/

© 2003 Forschungszentrum Jülich, NIC-ZAM [7]

Desired Functionality

• Tests to determine whether the semanticsof the original program were not altered

• Tests to see whether the recordedperformance data is correct

• Synthetic positive test cases for each known and definedperformance property and combinations of them

• Negative test cases which have no known performance problem

• “Real world” size parallel applications and benchmarks

Can be partially based on existing validation suites WWW

Probably needs to be tool specific

Collect available benchmarks and applications WWW

Design and Implementation of a ATS Framework

© 2003 Forschungszentrum Jülich, NIC-ZAM [8]

Validation Suites and Kernel Benchmarks (I)

ValidationMPI test / validation suites from Intel, IBM, ANL•http://www-unix.mcs.anl.gov/mpi/mpi-test/tsuite.html

MPI BenchmarksPARKBENCH (PARallel Kernels and BENCHmarks)•http://www.netlib.org/parkbench/

PMB - Pallas MPI Benchmarks•http://www.pallas.com/e/products/pmb/

SKaMPI (Special Karlsruher MPI – Benchmark)•http://liinwww.ira.uka.de/~skampi/

© 2003 Forschungszentrum Jülich, NIC-ZAM [9]

Kernel Benchmarks (II)

OpenMP BenchmarksEPCC OpenMP Microbenchmarks •http://www.epcc.ed.ac.uk/… research/openmpbench/openmp_index.html

Hybrid BenchmarksThe Los Alamos MicroBenchmarks Suite (LAMB) • MPI and multi threading ( Pthreads and OpenMP)

programming models based on SKaMPI and EPCC

© 2003 Forschungszentrum Jülich, NIC-ZAM [10]

“Real World” Applications and Benchmarks

The NAS Parallel Benchmarks (NPB)•http://www.nas.nasa.gov/Software/NPB/

The ASCI Purple and Blue Benchmark Codes•http://www.llnl.gov/… asci/purple/benchmarks/limited/code_list.html… asci_benchmarks/asci/asci_code_list.html

NCAR Benchmarks•http://www.scd.ucar.edu/css/software/bench/

© 2003 Forschungszentrum Jülich, NIC-ZAM [11]

Current Design of ATS Framework

df_same()df_cyclic2()df_block2()df_linear()df_peak()df_cyclic3()df_block3()

DISTRIBUTION

do_work()

WORK

© 2003 Forschungszentrum Jülich, NIC-ZAM [12]

The Distribution Module

• Distribution specified by• Distribution function• Distribution parameters

• All distribution function have the same signature• double distr_func (int me, int size, double sf, distr_t* dd)

– me, size: member me of group of size size– sf: scaling factor– dd: distribution parameter descriptor

• returns value for me calculated based on me, size, and ddscaled by sf

• ATS provides set of predefined distribution functions• Can easily extended if needed

© 2003 Forschungszentrum Jülich, NIC-ZAM [13]

Predefined Distribution Functions

low

high

block2

low

high

cyclic2

val

same

low

high

linear

low

high

peak

low

med

high

block3

low

med

high

cyclic3

n

© 2003 Forschungszentrum Jülich, NIC-ZAM [14]

Current Design of ATS Framework

df_same()df_cyclic2()df_block2()df_linear()df_peak()df_cyclic3()df_block3()

DISTRIBUTION

do_work()

WORK

MPI PROPERTIES OpenMP PROPERTIES

par_do_omp_work()

OpenMP UTILS

par_do_mpi_work()alloc_mpi_buf()free_mpi_buf()alloc_mpi_vbuf()free_mpi_vbuf()mpi_commpattern_sendrecv()mpi_commpattern_shift()

MPI UTILS

© 2003 Forschungszentrum Jülich, NIC-ZAM [15]

Example: MPI Property Function late_sender

void par_do_mpi_work(distr_func_t df, distr_t* dd, MPI_Comm c) { int me, sz; MPI_Comm_rank(c, &me); MPI_Comm_size(c, &sz); do_work(df(me, sz, 1.0, dd));}

void late_sender(double bwork, double ework, int r, MPI_Comm c) { val2_distr_t dd; int i; mpi_buf_t* buf = alloc_mpi_buf(base_type, base_cnt); dd.low = bwork+ework; dd.high = bwork;

for (i = 0; i<r; ++i) { par_do_mpi_work(df_cyclic2, &dd, c); mpi_commpattern_sendrecv(buf, DIR_UP, 0, 0, c); } free_mpi_buf(buf);}

© 2003 Forschungszentrum Jülich, NIC-ZAM [16]

Currently Implemented Performance Property Functions

• MPI Point-to-PoCommunication Performance Properties• late_sender(basework, extrawork, rf, MPI_Comm);• late_receiver(basework, extrawork, rf, MPI_Comm);

• MPI Collective Communication Performance Properties• imbalance_at_mpi_barrier(distr_func, distr_param, rf, MPI_Comm);• imbalance_at_mpi_alltoall(distr_func, distr_param, rf, MPI_Comm);• late_broadcast(basework, rootextrawork, root, rf, MPI_Comm);• late_scatter(basework, rootextrawork, root, rf, MPI_Comm);• late_scatterv(basework, rootextrawork, root, rf, MPI_Comm);• early_reduce(rootwork, baseextrawork, root, rf, MPI_Comm);• early_gather(rootwork, baseextrawork, root, rf, MPI_Comm);• early_gatherv(rootwork, baseextrawork, root, rf, MPI_Comm);

• OpenMP Performance Properties• imbalance_in_parallel_region(distr_func, distr_param, rf);• imbalance_at_barrier(distr_func, distr_param, rf);• imbalance_in_loop(distr_func, distr_param, rf);

© 2003 Forschungszentrum Jülich, NIC-ZAM [17]

Current Design of ATS Framework

df_same()df_cyclic2()df_block2()df_linear()df_peak()df_cyclic3()df_block3()

DISTRIBUTION

do_work()

WORK

MPI PROPERTIES OpenMP PROPERTIES

par_do_omp_work()

OpenMP UTILS

par_do_mpi_work()alloc_mpi_buf()free_mpi_buf()alloc_mpi_vbuf()free_mpi_vbuf()mpi_commpattern_sendrecv()mpi_commpattern_shift()

MPI UTILS

TEST PROGRAMS

© 2003 Forschungszentrum Jülich, NIC-ZAM [18]

Performance Property Test Programs

• Single performance property testing• Programs can be generated automatically from

performance property function signature– Generator based on Program Database Toolkit (PDT)– http://www.cs.uoregon.edu/research/paracomp/pdtoolkit/

• Property parameters become test program arguments• More extensive tests through scripting languages

or experiment management system (e.g., Zenturio)– http://www.par.univie.ac.at/project/zenturio/

• Composite performance property testing• Program containing multiple performance property functions• Complexity only limited by imagination• Currently: manually implemented

© 2003 Forschungszentrum Jülich, NIC-ZAM [19]

Example: Single Performance Property Test Program

#include "mpi_pattern.h"

int main(int argc, char *argv[]) { distr_func_t df = atodf("b2:0.5:1.0"); distr_t *dd = atodd("b2:0.5:1.0"); int r = 1;

MPI_Init(&argc, &argv);

switch ( argc ) { case 3: r = atoi(argv[2]); case 2: df = atodf(argv[1]); dd = atodd(argv[1]); case 1: break; default: fprintf(stderr, "usage: %s <distf> <rfac>\n", argv[0]); break; }

imbalance_at_mpi_barrier(df, dd, r, MPI_COMM_WORLD); MPI_Finalize();}

© 2003 Forschungszentrum Jülich, NIC-ZAM [20]

Example: Single Performance Property Test Program

• imbalance_at_mpi_barrier <distribution-spec> <repition-factor>

b2:0.5:1.0 2 b2:0.1 :2.0 5

• Problem: additional property “MPI Setup/Termination Overhead” also holds!

© 2003 Forschungszentrum Jülich, NIC-ZAM [21]

Example: Collection of MPI Performance Properties

© 2003 Forschungszentrum Jülich, NIC-ZAM [22]

Examples: Detail MPI Properties

© 2003 Forschungszentrum Jülich, NIC-ZAM [23]

Example: MPI Properties in 2 Communicators

© 2003 Forschungszentrum Jülich, NIC-ZAM [24]

EXPERT Analysis of MPI 2 Communicator Example

© 2003 Forschungszentrum Jülich, NIC-ZAM [25]

Example: OpenMP Performance Property

© 2003 Forschungszentrum Jülich, NIC-ZAM [26]

ATS: Status and Future Work

• Initial prototype available from APART website• List of MPI, OpenMP, and hybrid

validation and benchmark suites• 1st version of ATS framework including

– C version of code– Single property test program generator

• Future Work• More complete collection of validation and benchmark suites• Real “real world” applications• ATS Framework

– Fortran version – More complete list of property functions for

MPI, OpenMP, hybrid, and sequential performance properties– Documentation

Recommended