CEPBA Tools (DiP) Evaluation Report

Adam Leko, Hans Sherburne,

UPC Group

HCS Research Laboratory, University of Florida

Color encoding key:

Blue: Information

Red: Negative note

Green: Positive note

Basic Information

Name: Dimemas, MPITrace, Paraver
Developer: European Center for Parallelism of Barcelona (CEPBA)
Current versions: MPITrace 1.1, Paraver 3.3, Dimemas 2.3

Website: http://www.cepba.upc.es/tools_i.htm

Contact: Judit Gimenez ([email protected])

DiP Overview

DiP = Dimemas, Paraver
Toolset used for improving performance of parallel programs
Created by CEPBA ca. 1992/93, still in development
Has three main components:
  Trace collection
    MPITrace for MPI programs
    OMPTrace for OpenMP programs (not evaluated)
    OMPITrace for hybrid OpenMP/MPI programs (not evaluated)
  Trace visualization: Paraver
  Trace simulation: Dimemas
    Uses MPIDTrace for instrumentation
Workflow encouraged by DiP: "measure-modify" approach (pictured right as a cycle):
  Write code -> Instrument (MPITrace) -> Examine tracefile (Paraver) ->
  Hypothesize about bottlenecks -> Verify via simulation (Dimemas) ->
  Fix bottlenecks -> Test new hypothesis -> (repeat)

MPITrace Overview

Automatically profiles all MPI calls using the MPI profiling interface
Compilation command:
  mpicc -L/path/to/mpitrace/libs \
        -L/path/to/papi/libs -lmpitrace -lpapi \
        <rest of compilation cmds>
Can record other information too:
  Hardware counters via PAPI (MPItrace_counters)
  Custom events (MPItrace_event)
  (a usage sketch appears at the end of this slide)
Requires a special runtime wrapper script to produce a tracefile
  Command: mpitrace mpirun <rest of regular cmds>
mpitrace requires a license to run
  mpitrace must be started from a machine listed in the license file
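
The two user-level calls named above, MPItrace_counters and MPItrace_event, come from the MPITrace docs [1]; a minimal sketch of how they might be used follows. The prototypes, event type, and values shown here are assumptions, not copied from the MPITrace headers, so check [1] before relying on them:

  /* Sketch: marking a program phase with MPITrace's user-level API.
   * MPItrace_event/MPItrace_counters are named in [1]; the prototypes
   * below and the event type/value encoding are assumptions. */
  #include <mpi.h>

  void MPItrace_event(unsigned int type, unsigned int value); /* assumed prototype */
  void MPItrace_counters(void);                               /* assumed prototype */

  #define PHASE_EVENT 1000  /* hypothetical user event type */

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      MPItrace_event(PHASE_EVENT, 1); /* value 1 = phase entry */
      /* ... computation of interest ... */
      MPItrace_counters();            /* emit current PAPI counter values */
      MPItrace_event(PHASE_EVENT, 0); /* value 0 = phase exit */

      MPI_Finalize();
      return 0;
  }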

MPITrace Overview (2)

After running mpitrace, several .mpit files are created (one per MPI process)
Collect them into a single tracefile with the command:
  mpi2prv -syn *.mpit
  -syn flag necessary to line up events correctly (not mentioned in docs [1])
This command creates a single logfile (.prv) and a Paraver config file (.pcf)
  .pcf file also contains the names and colors of custom events
Tracefile format:
  ASCII (plain text), well-documented (see [1])
  Can get to be quite large
  .prv files can be converted to a faster-loading, platform-dependent, undocumented binary format via the prv2log command
Was never able to get hardware counters working
  Took several tries to get any tracefile to be created
  PAPI 3.0.7 installed with no problems on Kappas 1-8
  No errors, but no hardware counter events in the tracefile!
  Rest of review assumes that this can be fixed given enough time

MPITrace Overhead

All programs executed correctly when instrumented
Benchmarks marked with a star had high variability in execution time
  Readings with stars are probably not accurate
Based on the LU benchmark, expect ~30% tracing overhead
  More communication == more overhead
Wasn't able to test the overhead of hardware counter instrumentation

MPITrace overhead (instrumented/uninstrumented), per benchmark:

  CAMEL                    8%
  NAS LU (8p, W)          31%
  PP: Big message*         0%
  PP: Diffuse procedure    0%
  PP: Hot procedure        0%
  PP: Intensive server     1%
  PP: Ping pong*           7%
  PP: Random barrier*     25%
  PP: Small messages*     21%
  PP: System time          1%
  PP: Wrong way            0%
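
The label "Overhead (instrumented/uninstrumented)" presumably means the relative slowdown

  \[ \text{overhead} = \frac{T_{\text{instrumented}}}{T_{\text{uninstrumented}}} - 1 \]

expressed as a percentage; under that reading, CAMEL's 8% means instrumented runs took roughly 1.08x as long.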

Paraver Overview

Four main pieces of Paraver (see right):
  Filtering
  Semantic module
  Visualization: graphical timeline, text
  Analysis (1D/2D)
Complex piece of software!
  Had to review several documents to get a feel for how to use it [2, 3, 4, 5]
  Tutorial is short but not too clear
  Reference manual is the best documentation, but lengthy

Image courtesy [2]

Paraver: Process/Resource Models

Process model (courtesy [3])

Resource model (courtesy [3])

Paraver: Graphical Timeline

Graphic display uses a standard timeline view
  Event view similar to Jumpshot, Upshot, etc. (right, top)
  Can also display time-varying data like global CPU utilization (right, bottom)
Tool can display more than one trace file at a time
Uses a "tape" metaphor instead of scrolling
  Play, pause, rewind to beginning, fast forward to end
  Cumbersome and nonintuitive
    Breaks intuition of what scroll bars do (scroll bars do not scroll the window)
    Moving the window creates animations, which slows things down compared to regular scrolling
  Interface is workable, but takes some getting used to
Zooming always brings up another window
  Quickly results in many open windows
  This complexity is handled by a save/restore open windows function
    Save/restore windows is a nice feature
Interface is generally snappy
Uses an ugly widget set by today's standards

Paraver: Text Views

Provide very detailed information about trace files
  Textual listing of events: which events happen when
  Accessed by clicking on the graphical timeline

Paraver: 1D/2D Analysis

1D Analysis (right, top)
  Shows statistics about various types of events
  Shown per thread as text or histogram
2D Analysis (right, bottom)
  Shows statistics for one event type between pairs of threads
  Item chosen by the semantic module
  Uses color to encode information (high variance, max/min)
Analysis mode takes into account the filter and semantic modules (described next)
Very complex and user-unfriendly, but allows complicated analyses to be performed; can easily reconstruct most "normal" profiling information

Paraver: Filter Module

Filter module allows filtering of events before they are:
  Shown in the timeline
  Processed by the semantic module
  Analyzed by the 1D/2D analyzers
Can filter events by communication parameters:
  Who sends/receives the message
  Message tag (MPI tag)
  Logical times (when send/receive functions are called) or physical times (when the send/receive actually takes place)
  Combinations of ANDs/ORs of the above
Also by user events:
  Type and/or value
Interface for filtering events is straightforward

Paraver: Semantic Module

Interface between the raw tracefile data and what the user sees
  Sits above the filter module, below the visualization modules
  Makes heavy use of the runtime/process model
Uses 3 different methods for computing values:
  Work with the process model (next slide): application, task, thread, and workload levels
  Work with the available system resources (next slide): node, CPU, and system levels
  Combine different existing views
    E.g., combine TLB misses with loads for average TLB miss ratios (sketched below)
In a few words: controls how trace file information is displayed
  Flexible way of displaying disparate types of information (communication vs. hardware counters)
Can take a lot of work to get Paraver to show the information you're looking for
  Saved window configurations can help greatly here (perform the steps only once, use for all traces later on)
Easily the most confusing aspect of Paraver
  Documentation doesn't necessarily help with this
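
To make the "combined view" idea concrete, here is a small conceptual sketch in C. It is not Paraver code (Paraver is closed-source); it only mimics what the semantic module does when combining views, deriving a TLB miss ratio from two aligned per-thread counter series. The data layout is invented for illustration:

  /* Conceptual sketch: derive a TLB miss ratio the way a combined
   * Paraver view would, from two aligned counter series.
   * Not Paraver's API; struct layout invented for illustration. */
  #include <stdio.h>

  struct sample { long long time; long long value; };

  static void combined_view(const struct sample *tlb_misses,
                            const struct sample *loads, int n)
  {
      for (int i = 0; i < n; i++) {
          double ratio = loads[i].value
              ? (double)tlb_misses[i].value / (double)loads[i].value
              : 0.0;  /* avoid division by zero in empty intervals */
          printf("t=%lld: TLB miss ratio %.4f\n", tlb_misses[i].time, ratio);
      }
  }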

Dimemas Overview

Uses a generic "network of SMPs" model to perform trace-driven simulation
Outputs trace files that can be directly visualized by Paraver
Uses a different tracefile format for input than Paraver
Was never able to get this to work:
  "dimemas" GUI crashed
  Java version works, but other problems exist...
  "Dimemas" complained about a missing license even though one was in $DIMEMAS_HOME/etc/license.dat
  Need MPIDTrace?
Rest of evaluation based on available documentation [4, 5, 6]

Dimemas: Architectural/Process Model

Simulated architecture: network of SMPs
Parameters for the interconnection network:
  Number of buses (models resource contention)
  Bisection bandwidth of network
  Full-duplex/half-duplex links (from node to bus)
Parameters for nodes:
  Bandwidth and latency for intra-node communication
  Latency for inter-node communication
  Processor speed (uses linear speedup model)
Parameters for existing systems are collected (manually) via microbenchmarks
Uses the same process model as Paraver
  Application (Ptask), task, thread levels
  Can model MPI, OpenMP, and hybrid programs with this model

Image courtesy [5]

Dimemas: Communication Model

Figures to the right illustrate the timing information that is simulated
Point-to-point communication model (right, top)
  Straightforward model based on latencies, bandwidth, and contention (bus model)
Collective communication model (right, bottom)
  Implicit barrier before all collective operations
  Two phases: fan-in, then fan-out
  Collective communication time represented 3 ways (selected by user): constant, linear, logarithmic
User specifies parameters
  Located in special Dimemas "database" text files
  Existing set covers IBM SP, SGI Origin 2000, and a few others

Images courtesy [5]
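
Read back as formulas, the slide's description suggests the following (our reconstruction; the exact equations are in [5], not quoted here). A point-to-point transfer of $S$ bytes costs roughly

  \[ T_{\text{p2p}} = T_{\text{bus}} + L + \frac{S}{BW} \]

where $L$ is the configured latency for the source/destination pair, $BW$ the link bandwidth, and $T_{\text{bus}}$ the time spent waiting for a free bus (the contention model). A collective over $P$ tasks is then a fan-in phase followed by a fan-out phase, each phase's base transfer time scaled by the user-selected factor:

  \[ T_{\text{coll}} = f(P)\,\bigl(T_{\text{fan-in}} + T_{\text{fan-out}}\bigr), \qquad f(P) \in \{1,\; P,\; \log_2 P\} \]

matching the constant/linear/logarithmic options above.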

Dimemas: Accuracy, Other Features

Accuracy
  On trivial applications (ping-pong), expected error with correct parameters is less than 12% [4]
  Collective communication model for MPI verified in [6] on the NAS benchmark suite
    Most applications within 30% accuracy (IS.A.8 jumped to over 150% error)
Other features
  Critical path selection
    Starts at the end, shows the dependency path back to the beginning of the critical path
  Sensitivity analysis (factorial analysis, vary parameters within 10%)
  "What-if" analysis
    Can adjust the time taken for each function call to see what would happen if you could write a faster version
    Can also answer questions like "what would happen if we doubled our bandwidth?"
Simulation time: unknown (not reported in any documentation)
  Only communication events are simulated, so assume simulation time is proportional to the amount of communication
  Also uses a simple (coarse bus-based) contention model, so simulation times should be reasonable

Bottleneck Identification Test Suite

Testing metric: what did trace visualization tell us (automatic instrumentation)?
Assumed a fully functional installation of Paraver and Dimemas

CAMEL: PASSED
  Identified a large number of small messages at the beginning of program execution
  Assuming hardware counters worked, could also identify sequential parts of the algorithm (sort on node 0, etc.)
NAS LU ("W" workload): PASSED
  Showed communication bottlenecks very clearly
    Large(!) number of small messages
    Illustrated time taken for repartitioning data
    Showed sensitivity to latency for processors waiting on data from other processors
  Could use Dimemas to pinpoint the latency problem by testing on an ideal network with no/little latency
  Moderately sized trace file (62MB) loaded slowly (> 60 seconds) in Paraver

Bottleneck Identification Test Suite (2)

Big message: PASSED
  Traces illustrated a large amount of time spent in send and receive
Diffuse procedure: PASSED
  Traces illustrated a lot of synchronization, with one process doing more work
  Since there is no source code correlation, hard to tell why the problem existed
Hot procedure: TOSS-UP
  Assuming hardware counters work, would be easy to see the extra CPU utilization
  No source code correlation would make it difficult to pinpoint the problem
Intensive server: PASSED
  Traces showed that the other nodes were waiting on node 0
Ping pong: PASSED
  Traces illustrated that the application was very latency-sensitive
  Much time being spent waiting for messages to arrive
Random barrier: PASSED
  Traces showed that one node was doing more work than the others
Small messages: PASSED
  Traces illustrated a large number of messages being sent to node 0
  Also illustrated the overhead of instrumentation for writing tracefile information
System time: FAILED
  No way to tell system time vs. user time
Wrong way: PASSED
  Trace showed that the first receive took a long time for its message to arrive

General Comments

Very large learning curve
  Complex software with lots of concepts
  Concepts must be totally understood, or:
    The software doesn't make sense
    The software seems like it has no functionality
  Some "common" actions (e.g., viewing TLB cache misses) can be very difficult to do at first in Paraver
    Stored window configurations help with this
Older tools
  Seem to have grown and gained features as the need for them arose
  Lots of "cruft" and strange ways of presenting things
  User interface clunky by today's standards
  User interface complicated by anyone's standards!

General Comments (2)

Trace-driven simulation: useful?
  Can be useful for performing "what-if" studies and sensitivity analyses
  But still limited in what you can explore without modifying the application
    Can see what happens when a function runs twice as fast
    Can't see the effect of different algorithms without rerunning the application
Tools provide little guidance on what the user should do next
  Heavily reliant on the skill of the user to make efficient use of the tools

Adding UPC/SHMEM Support

Commercial tool!
  No way to explicitly add support into Dimemas or Paraver for UPC or SHMEM
However, the tools are written using a modular design
  Existing process and resource models can be used to model UPC and SHMEM applications
  Paraver and Dimemas do not need to explicitly support UPC and SHMEM, just their trace files
Assuming we have methods for instrumenting UPC and SHMEM code, all that is required is writing the .prv file format (a sketch follows below)
  Documented!
  Not sure about Dimemas' trace file format...
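
Since the .prv format is documented plain ASCII [1], a hypothetical UPC/SHMEM tracer could emit records directly. The field layouts below are our reading of the documented Paraver record formats and should be verified against [1] before use:

  /* Sketch: emitting Paraver-style ASCII trace records from a hypothetical
   * UPC/SHMEM tracer. Record layouts assumed from the documented format:
   *   state record: 1:cpu:appl:task:thread:begin_time:end_time:state
   *   event record: 2:cpu:appl:task:thread:time:type:value            */
  #include <stdio.h>

  static void emit_state(FILE *prv, int cpu, int appl, int task, int thread,
                         long long t0, long long t1, int state)
  {
      fprintf(prv, "1:%d:%d:%d:%d:%lld:%lld:%d\n",
              cpu, appl, task, thread, t0, t1, state);
  }

  static void emit_event(FILE *prv, int cpu, int appl, int task, int thread,
                         long long t, int type, long long value)
  {
      fprintf(prv, "2:%d:%d:%d:%d:%lld:%d:%lld\n",
              cpu, appl, task, thread, t, type, value);
  }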

Evaluation (1)

Available metrics: 5/5
  Can use PAPI and existing hardware counters
  Paraver can combine trace information and give you just about any metric you can think of
Cost: 1/5
  For Paraver, Dimemas, and MPITrace, 1 seat: 2000 Euros (~$2,600)
Documentation quality: 1/5
  MPITrace: inadequate documentation for Linux
  Dimemas: only a tutorial available, unless you want to read through conference papers and PhD theses
  Paraver: user manual very thorough but technical and unclear
  Many grammar errors impair reading!
    "temporal files" -> temporary files
    Many more...

*Note: evaluated Linux version

Evaluation (2)

Extensibility: 0/5
  Commercial (no source), but:
    Can add new functions to the semantic module for Paraver
    Flexible design lets you support a wide variety of programming paradigms by using the documented trace file format
Filtering and aggregation: 5/5
  Paraver has powerful filtering & aggregation capability
  Filtering & aggregation only post-mortem, however
Hardware support: 3/5
  AlphaServer (Tru64), 64-bit Linux (Opteron, Itanium), IBM SP (AIX), IRIX, HP-UX
  Most everything supported: Linux, AIX, IRIX, HP-UX
  No Cray support
Heterogeneity support: 0/5 (not supported)

Evaluation (3)

Installation: 1/5
  Linux installation riddled with errors and problems
  PAPI dependency for hardware counters complicates things (needs kernel patch)
  Have had the software over 2 months, still not working correctly
  According to our contact this is not normal; still, the other tools were nowhere near as hard to install
Interoperability: 1/5
  No export interoperability with other tools
  Apparently tools exist to import SDDF and other formats (but I couldn't find them)
  Can import UTE traces
Learning curve: 1/5
  All of the graphical tools have unintuitive interfaces
  Software is complex, and the tutorials do not lessen the learning curve very much
Manual overhead: 1/5
  MPITrace only records MPI events
  Linux needs extra instructions in source code to get hardware counter information
  Need to relink or recode to turn tracing on or off
Measurement accuracy: 4/5
  CAMEL overhead: ~8%
  Tracing overhead not negligible, but within acceptable limits
  Dimemas accuracy only decent, but good enough for what Dimemas is intended to do

Evaluation (4)

Multiple executions: 1/5
  Paraver supports displaying multiple tracefiles at the same time
  This lets you relate different runs (with different parameters) to each other relatively easily
Multiple analyses & views: 4/5
  Semantic modules provide a convenient (if awkward) way of displaying different types of data
  Semantic modules also allow displaying the same type of data in different ways
  Analysis modules show statistical summary information over time ranges
Performance bottleneck identification: 4.5/5
  No automatic bottleneck identification
  All the information you need to identify a bottleneck should be available between Paraver and Dimemas
  However, much manual effort is needed to determine where bottlenecks are
  Also, no information is related back to the source code level
Profiling/tracing support: 2/5
  Only supports tracing
  Trace files can be quite large and can take some time to open
Response time: 3/5
  No data at all until after the run has completed and the tracefile has been opened
  Dimemas requires the simulation to fully finish, and Paraver to open the generated tracefile, before information is shown to the user

Evaluation (5)

Searching: 3/5
  Search features provided by Dimemas
Software support: 3.5/5
  MPI profiling library allows linking against any existing libraries
  OpenMP and OpenMP+MPI programs also supported via add-on instrumentation libraries
Source code correlation: 0/5
  Not supported directly; can use user events to identify program phases
System stability: 3/5
  MPITrace stable (had no problems other than installation)
  Paraver crashed relatively often (>= 1 time per hour)
  Dimemas stability not tested
Technical support: 3/5
  Responses from contact within 24-48 hours
  Some problems not resolved quickly, though

References

[1] "MPITrace tool version 1.1: User's guide," November 2000. http://www.cepba.upc.es/paraver/docs/MPItrace.pdf

[2] "Paraver version 2.1: Tutorial," November 2000. http://www.cepba.upc.es/paraver/docs/Paraver_TUTORIAL.pdf

[3] "Paraver version 3.1: Reference manual (DRAFT)," October 2001. http://www.cepba.upc.es/paraver/docs/Paraver_MANUAL.pdf

[4] Jesús Labarta et al., "DiP: A Parallel Program Development Environment," in Proc. of the 2nd International EuroPar Conference (EuroPar 96), Lyon, France, August 1996.

References (2)

[5] Sergi Turell, "Performance Prediction and Evaluation Tools," PhD thesis, Universitat Politecnica de Catalunya, March 2003.

[6] S. Girona et al., "Validation of Dimemas communication model for collective MPI communications," in Proc. of EuroPVM/MPI 2000, Balatonfüred, Lake Balaton, Hungary, September 2000.

[7] "Introduction to Dimemas" (tutorial). http://www.cepba.upc.edu/dimemas/docs/Dimemas_MANUAL.pdf