Parallel Performance Wizard: a Performance Analysis Tool for UPC (and other PGAS Models)

Max Billingsley III 1, Adam Leko 1, Hung-Hsun Su 1, Dan Bonachea 2, Alan D. George 1

1 Electrical and Computer Engineering Dept., University of Florida
2 Computer Science Div., UC Berkeley

Outline of Talk

Review of PGAS talk

The goal of PPW

Current status of PPW

Using PPW

Continuing Work

How can we make PPW as useful as possible?


Review of PGAS talk

Motivation for performance tools supporting PGAS models
    printf() doesn't cut it for optimizing programs written using PGAS models such as UPC
    Good tools can really enhance productivity
    Currently poor support for UPC from existing tools
Overview of the GASP tool interface
    Event-based interface between performance tool and GAS model compiler / runtime system
Overview and demonstration of PPW
    New performance tool designed for PGAS models


The goal of PPW

Help UPC users achieve maximum productivity in optimizing the performance of their applications by providing detailed experimental performance data and helping them make sense of this data.


Parallel Performance Wizard – current status

Beta version of PPW available now: http://www.hcs.ufl.edu/ppw/
    We even have a Java WebStart version you can test-drive quickly from any computer
PPW currently includes many features that should make it useful for UPC developers
    UPC-specific array layout visualization
PPW has complete instrumentation support on one UPC implementation
    Berkeley UPC 2.3.16 beta includes complete support for PPW by implementing GASP


Using PPW

The UPC developer takes the following steps:
    Build the application using PPW's compiler wrapper scripts:
        ppwupcc --inst-functions -o upc_app upc_app.c
    Execute the instrumented application, using the ppwrun script to set up the environment:
        ppwrun --pofile --output=upc_app.par upcrun -N 32 ./upc_app
    Open the resulting file using the PPW GUI
        Transfer file to workstation and start GUI


Continuing work on PPW and GASP

PPW
    Add additional PPW visualization features
        Scalability charts
    More interesting analysis functionality
GASP
    Add support for additional PGAS models
    Help other tools take advantage of GASP


Nano Case Study, NPB2.4 IS


Nano Case Study Intro

PPW looks pretty, but how useful is it for real apps?

Examined the GWU NPB2.4 IS benchmark and looked for interesting things

Point of study
    See if the tool tells us anything interesting
    NOT to pick apart a particular implementation
        Example yesterday illustrated my bad UPC code


NPB 2.4 on Marvel (8 dual-core pr. SMP)


NPB2.4 on Mu Cluster (Quadrics & Opteron)


Close-up of SMP Comm. Pattern


Close-up of Cluster Comm. Pattern


The Culprit

    /*
     * Equivalent to the mpi_alltoall + mpialltoallv in the c + mpi version
     * of the NAS Parallel benchmark.
     */
    for( i=0; i<THREADS; i++ ){
        upc_memget( &infos[i], &send_infos_shd[MYTHREAD][i], sizeof( send_info ));
    }

    for(i = 0; i < THREADS; i++){
        …
        upc_memget( key_buff2 + total_displ,
                    key_buff1_shd + i + infos[i].displ * THREADS,
                    infos[i].count * sizeof(INT_TYPE)) ;
        …
    }

* Collectives! *
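
The source pointer in the second loop, key_buff1_shd + i + infos[i].displ * THREADS, is easier to read with the layout arithmetic written out. The plain C sketch below is a model, not UPC code. It assumes key_buff1_shd uses UPC's default cyclic layout (block size 1), which the slide does not state, and it assumes every loop iteration issues its get (part of the second loop body is elided on the slide). The helper names cyclic_affinity, cyclic_local_offset and culprit_get_count are hypothetical, not part of UPC or PPW.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: default cyclic layout (block size 1), so element `index`
 * of a shared array has affinity to thread index % threads. */
size_t cyclic_affinity(size_t index, size_t threads) {
    assert(threads > 0);
    return index % threads;
}

/* Position of that element within the owning thread's local portion. */
size_t cyclic_local_offset(size_t index, size_t threads) {
    assert(threads > 0);
    return index / threads;
}

/* Number of upc_memget calls issued across all threads by the two loops
 * on the slide: every thread runs THREADS iterations of each loop.
 * This counts calls, including the one where i == MYTHREAD. */
size_t culprit_get_count(size_t threads) {
    return 2 * threads * threads;
}
```

Under these assumptions, index i + displ * THREADS resolves to thread i at local offset displ, so each iteration is a separate point-to-point get from thread i, and the number of gets grows with the square of the thread count. That is consistent with the slide's suggestion to look at collectives instead.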


Other Interesting Things

Sum reduction
Broadcast


Interesting Reduction Find

How many remote references?

    upc_forall(thr_cnt=1; thr_cnt<THREADS; thr_cnt <<= 1; continue)
        …
        upc_memget(local_array, ptrs[MYTHREAD + thr_cnt],
                   size * sizeof(elem_t)) ;
        …

What about now?

    shared elem_t *shared *ptrs ;
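
One way to read the two questions on this slide is to count remote accesses per thread. The C sketch below is a model, not UPC code. It follows a thread that fetches at every step of the loop (the elided code may skip some steps), wraps MYTHREAD + thr_cnt modulo THREADS, and assumes that when ptrs is itself shared it is laid out cyclically, so ptrs[k] has affinity to thread k % THREADS. The function name remote_refs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count remote accesses made by one thread in the slide's loop.
 * If ptrs_is_shared is nonzero, ptrs is assumed to be a cyclically
 * distributed shared array, so reading ptrs[k] touches thread
 * k % threads. Otherwise ptrs is treated as a private array. */
size_t remote_refs(size_t mythread, size_t threads, int ptrs_is_shared) {
    assert(threads > 0 && mythread < threads);
    size_t remote = 0;
    for (size_t thr_cnt = 1; thr_cnt < threads; thr_cnt <<= 1) {
        size_t k = mythread + thr_cnt;   /* index used on the slide */
        /* Reading the pointer itself is remote only if ptrs is shared
         * and element k lives on another thread. */
        if (ptrs_is_shared && (k % threads) != mythread) {
            remote++;
        }
        /* The upc_memget of the partner's data is assumed to target the
         * partner thread, which differs from mythread because
         * 0 < thr_cnt < threads. */
        remote++;
    }
    return remote;
}
```

In this model, a thread on an 8-thread run makes 3 remote accesses when ptrs is private and 6 when ptrs is declared as on the slide, because each lookup of ptrs[MYTHREAD + thr_cnt] becomes a remote read of its own.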


Comm. Leak, Visually


How can we make PPW as useful as possible?

We would like feedback on the tool
    Try the PPW beta and provide feedback! www.hcs.ufl.edu/ppw
Help us improve GASP
    What can we do to help language implementers add GASP support?
Other ideas regarding UPC performance analysis?


Interoperability

Some key issues
    Usefulness of interoperating with other similar PGAS models?
    "Dusty deck" MPI code