Parallel Performance Wizard: a Performance Analysis Tool for UPC (and other PGAS Models)

Max Billingsley III1, Adam Leko1, Hung-Hsun Su1, Dan Bonachea2, Alan D. George1

1 Electrical and Computer Engineering Dept., University of Florida
2 Computer Science Div., UC Berkeley


Outline of Talk

Review of PGAS talk

The goal of PPW

Current status of PPW

Using PPW

Continuing Work

How can we make PPW as useful as possible?


Review of PGAS talk

Motivation for performance tools supporting PGAS models

printf() doesn’t cut it for optimizing programs written using PGAS models such as UPC

Good tools can really enhance productivity

Currently poor support for UPC from existing tools

Overview of the GASP tool interface

Event-based interface between performance tool and GAS model compiler / runtime system

Overview and demonstration of PPW

New performance tool designed for PGAS models


The goal of PPW

Help UPC users achieve maximum productivity in optimizing the performance of their applications, by providing detailed experimental performance data and helping them make sense of this data.


Parallel Performance Wizard – current status

Beta version of PPW available now: http://www.hcs.ufl.edu/ppw/

We even have a Java WebStart version you can test-drive quickly from any computer

PPW currently includes many features that should make it useful for UPC developers

UPC-specific array layout visualization

PPW has complete instrumentation support on one UPC implementation: Berkeley UPC 2.3.16 beta includes complete support for PPW by implementing GASP


Using PPW

The UPC developer takes the following steps:

Build the application using PPW’s compiler wrapper scripts:
ppwupcc --inst-functions -o upc_app upc_app.c

Execute the instrumented application, using the ppwrun script to set up the environment:
ppwrun --profile --output=upc_app.par upcrun -N 32 ./upc_app

Open the resulting file using the PPW GUI: transfer the file to a workstation and start the GUI
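The steps above can be collected into a small driver script. The ppwupcc, ppwrun, and upcrun commands come from the slides; the DRY_RUN switch is my addition so the sketch can be inspected on a machine without PPW installed (set DRY_RUN=0 to really build and run):

```shell
# Sketch of the PPW workflow from the slides, wrapped in a script.
set -e
: "${DRY_RUN:=1}"          # default: only print commands; 0 = execute them
APP=upc_app
NPROCS=32

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"          # dry run: show the command instead of running it
    else
        "$@"
    fi
}

# 1. Build with PPW's compiler wrapper, instrumenting user functions
run ppwupcc --inst-functions -o "$APP" "$APP.c"
# 2. Execute under ppwrun, collecting performance data into upc_app.par
run ppwrun --profile --output="$APP.par" upcrun -N "$NPROCS" "./$APP"
# 3. Transfer upc_app.par to a workstation and open it in the PPW GUI
```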


Continuing work on PPW and GASP

PPW

Add additional PPW visualization features

Scalability charts

More interesting analysis functionality

GASP

Add support for additional PGAS models

Help other tools take advantage of GASP

Nano Case Study, NPB2.4 IS


Nano Case Study Intro

PPW looks pretty, but how useful is it for real apps?

Examined the GWU NPB2.4 IS benchmark and looked for interesting things

Point of study:

See if the tool tells us anything interesting

NOT to pick apart a particular implementation

Example yesterday illustrated my bad UPC code


NPB 2.4 on Marvel (8 dual-core processor SMP)


NPB2.4 on Mu Cluster (Quadrics & Opteron)


Close-up of SMP Comm. Pattern


Close-up of Cluster Comm. Pattern


The Culprit

    /*
     * Equivalent to the mpi_alltoall + mpi_alltoallv in the C + MPI version
     * of the NAS Parallel Benchmark.
     */
    for (i = 0; i < THREADS; i++) {
        upc_memget(&infos[i], &send_infos_shd[MYTHREAD][i], sizeof(send_info));
    }

    for (i = 0; i < THREADS; i++) {
        …
        upc_memget(key_buff2 + total_displ,
                   key_buff1_shd + i + infos[i].displ * THREADS,
                   infos[i].count * sizeof(INT_TYPE));
        …
    }

* Collectives! *
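The "Collectives!" note is the punchline: each loop issues THREADS separate upc_memget calls per thread (THREADS² small transfers in total), hand-rolling an all-to-all that the UPC collectives library provides directly. A minimal UPC sketch of the idea for the first, fixed-size exchange; this is illustrative only, since the exch_src/exch_dst arrays are stand-ins (not from the benchmark) declared with the block-per-thread layout upc_all_exchange requires, which the real benchmark arrays may not match:

```
#include <upc.h>
#include <upc_collective.h>

/* Illustrative stand-ins: blocked so thread i owns one THREADS-element
 * chunk, as upc_all_exchange expects. */
shared [THREADS] send_info exch_src[THREADS * THREADS];
shared [THREADS] send_info exch_dst[THREADS * THREADS];

/* One collective call replaces the THREADS upc_memget calls per thread. */
upc_all_exchange(exch_dst, exch_src, sizeof(send_info),
                 UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);
```

Beyond brevity, the collective gives the runtime the whole communication pattern at once, so it can schedule and aggregate the transfers instead of servicing THREADS² independent gets.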


Other Interesting Things

Sum reduction

Broadcast


Interesting Reduction Find

How many remote references?

    upc_forall (thr_cnt = 1; thr_cnt < THREADS; thr_cnt <<= 1; continue) {
        …
        upc_memget(local_array, ptrs[MYTHREAD + thr_cnt],
                   size * sizeof(elem_t));
        …
    }

What about now?

    shared elem_t *shared *ptrs;


Comm. Leak, Visually


How can we make PPW as useful as possible?

We would like feedback on the tool

Try the PPW beta and provide feedback! www.hcs.ufl.edu/ppw

Help us improve GASP: what can we do to help language implementers add GASP support?

Other ideas regarding UPC performance analysis?


Interoperability

Some key issues:

Usefulness of interoperating with other similar PGAS models?

“Dusty deck” MPI code
