Simplifying the Usage of Performance Evaluation Tools: Experiences with TAU and DyninstAPI
Paradyn/Condor Week 2010
Rm 221, Fluno Center, U. of Wisconsin, Madison, 10:45 am – 11:30 am, Tuesday, 14th April, 2010
Sameer Shende, Allen D. Malony, Alan Morris
Performance Research Laboratory
University of Oregon, Eugene, OR
{sameer, malony, amorris}@cs.uoregon.edu
http://tau.uoregon.edu
Acknowledgements: University of Oregon
Dr. Allen D. Malony, Professor, CIS Dept, and Director, NeuroInformatics Center
Alan Morris, Senior software engineer
Dr. Chee Wai Lee, Research faculty
Wyatt Spear, Software engineer
Scott Biersdorff, Software engineer
Dr. Robert Yelle, Research faculty
Suzanne Millstein, Ph.D. student
And Matt Legendre and Dan McNulty, University of Wisconsin at Madison
Motivation
We have made great advances in instrumentation, measurement and analysis techniques
Tools are rich in features and have complex dependencies on other tools
Tools are getting more complex to use and to install
We need to simplify the usage of our performance evaluation tools!
TAU Performance System®
Integrated toolkit for performance problem solving
  Instrumentation, measurement, analysis, visualization
  Portable performance profiling and tracing facility
  Performance data management and data mining
Based on direct performance measurement approach
Open source
Available on all HPC platforms
Partners
  LLNL, ANL, ORNL, LANL, PNNL, LBL
  Research Centre Jülich, TU Dresden
TAU Architecture
TAU Parallel Performance System Goals
Portable (open source) parallel performance system
  Computer system architectures and operating systems
  Different programming languages and compilers
Multi-level, multi-language performance instrumentation
Flexible and configurable performance measurement
Support for multiple parallel programming paradigms
  Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
Support for performance mapping
Integration of leading performance technology
Scalable (very large) parallel performance analysis
TAU Performance System Components
[Figure: TAU Architecture. Labels: Program Analysis; Parallel Profile Analysis; PDT; PerfDMF; ParaProf; Performance Data Mining; Performance Monitoring; TAUoverMRNet (ToM); PerfExplorer]
TAU Performance System Architecture
Parallel Profile Visualization: ParaProf
Scalable Visualization: ParaProf (128k cores)
Scatter Plot: ParaProf (128k cores)
ParaProf: Communication Matrix Display
Comparing Effects of Multi-Core Processors
AORSA2D magnetized plasma simulation
Automatic loop level instrumentation
Blue is single node
Red is dual core
Cray XT3 (4K cores)
ParaProf: Mflops Sorted by Exclusive Time
low mflops?
Performance Regression Testing
Usage Scenarios: Evaluate Scalability
Scaling NAMD with CUDA (Jumpshot with TAU)
Data transfer
Measuring Performance of PGI Accelerated Code
TAU and Eclipse
Provide an interface for configuring TAU’s automatic instrumentation within Eclipse’s build system
Manage runtime configuration settings and environment variables for execution of TAU instrumented programs
[Figure: workflow. Labels: C/C++/Fortran project in Eclipse; add or modify an Eclipse build configuration w/ TAU; temporary copy of instrumented code; compilation/linking with TAU libraries; TAU instrumented libraries; program execution; performance data; program output]
TAU and Eclipse
PerfDMF
Choosing PAPI Counters with TAU in Eclipse
TAU Instrumentation Approach
Support for standard program events
  Routines, classes and templates
  Statement-level blocks
  Begin/End events (Interval events)
Support for user-defined events
  Begin/End events specified by user
  Atomic events (e.g., size of memory allocated/freed)
  Selection of event statistics
Support definition of “semantic” entities for mapping
Support for event groups (aggregation, selection)
Instrumentation optimization
  Eliminate instrumentation in lightweight routines
TAU Instrumentation Mechanisms
Source code
  Manual (TAU API, TAU component API)
  Automatic (robust)
    C, C++, F77/90/95 (Program Database Toolkit (PDT))
    OpenMP (directive rewriting (Opari), POMP2 spec)
Object code
  Compiler-based instrumentation (-optCompInst)
  Pre-instrumented libraries (e.g., MPI using PMPI)
  Statically-linked and dynamically-linked (tau_wrap)
Executable code
  Binary re-writing and dynamic instrumentation (DyninstAPI, U. Wisconsin, U. Maryland)
  Virtual machine instrumentation (e.g., Java using JVMPI)
  Interpreter based instrumentation (Python)
  Kernel based instrumentation (KTAU)
Program Database Toolkit (PDT)
[Figure: PDT data flow. Labels: Application/Library; C/C++ parser; Fortran parser (F77/90/95); IL; C/C++ IL analyzer; Fortran IL analyzer; Program Database Files; DUCTAPE; PDBhtml (program documentation); SILOON (application component glue); CHASM (C++ / F90/95 interoperability); TAU_instr (automatic source instrumentation)]
Automatic Source-Level Instrumentation in TAU
[Figure: source instrumentation flow. Labels: application source; TAU source analyzer; parsed program; instrumentation specification file; tau_instrumentor; instrumented source]
TAU v2.19.1+: If source based instrumentation fails, compiler-based instrumentation is used automatically
Using TAU with Source Code Instrumentation
TAU supports several measurement options (profiling, tracing, profiling with hardware counters, etc.)
Each measurement configuration of TAU corresponds to a unique stub makefile that is generated when you configure it
To instrument source code using PDT
  Choose an appropriate TAU stub makefile in <arch>/lib:
    % export TAU_MAKEFILE=/usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
    % export TAU_OPTIONS='-optVerbose …' (see tau_compiler.sh -help)
  And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
    % mpif90 foo.f90
  changes to
    % tau_f90.sh foo.f90
  Execute application and analyze performance data:
    % pprof (for text based profile display)
    % paraprof (for GUI)
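The steps on this slide can be collected into a small build script. This is a minimal sketch rather than an official TAU recipe: the install prefix is the one shown on the slide and will differ per site, foo.f90 is a placeholder source file, and the guard around tau_f90.sh is an addition so that the script does nothing harmful on a machine where TAU is not on the PATH.

```shell
#!/bin/sh
# Select the measurement configuration. The stub makefile path is the one
# from the slide; adjust the prefix to match the local TAU installation.
export TAU_MAKEFILE=/usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
# Ask the TAU compiler wrapper for verbose output while instrumenting.
export TAU_OPTIONS='-optVerbose'

# foo.f90 is a placeholder source file name.
SRC=foo.f90

if command -v tau_f90.sh >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Used in place of "mpif90 foo.f90".
    if tau_f90.sh "$SRC"; then
        BUILD_STATUS=built
    else
        BUILD_STATUS=failed
    fi
else
    echo "tau_f90.sh or $SRC not available; skipping instrumented build" >&2
    BUILD_STATUS=skipped
fi
echo "build: $BUILD_STATUS"
```

After a successful build, the application is run as usual and the resulting profiles are inspected with pprof or paraprof.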
TAU Measurement Configuration – Examples
% cd /usr/local/packages/tau/x86_64/lib; ls Makefile.*
Makefile.tau-pdt
Makefile.tau-mpi-pdt
Makefile.tau-papi-mpi-pdt
Makefile.tau-pthread-pdt
Makefile.tau-pthread-mpi-pdt
Makefile.tau-openmp-opari-pdt
Makefile.tau-openmp-opari-mpi-pdt
Makefile.tau-papi-openmp-opari-mpi-pdt
…
For an MPI+F90 application, you may want to start with:
Makefile.tau-mpi-pdt
  Supports MPI instrumentation & PDT for automatic source instrumentation
% setenv TAU_MAKEFILE /usr/local/packages/tau/x86_64/lib/Makefile.tau-mpi-pdt
% tau_f90.sh application.f90; mpirun -np 256 ./a.out
Compile-Time Environment Variables
Optional parameters for TAU_OPTIONS: [tau_compiler.sh -help]
-optVerbose Turn on verbose debugging messages
-optCompInst Use compiler based instrumentation
-optNoCompInst Do not revert to compiler instrumentation if source instrumentation fails.
-optDetectMemoryLeaks Turn on debugging memory allocations/de-allocations to track leaks
-optKeepFiles Does not remove intermediate .pdb and .inst.* files
-optPreProcess Preprocess Fortran sources before instrumentation
-optTauSelectFile="" Specify selective instrumentation file for tau_instrumentor
-optLinking="" Options passed to the linker. Typically $(TAU_MPI_FLIBS) $(TAU_LIBS) $(TAU_CXXLIBS)
-optCompile="" Options passed to the compiler. Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
-optPdtF95Opts="" Add options for Fortran parser in PDT (f95parse/gfparse)
-optPdtF95Reset="" Reset options for Fortran parser in PDT (f95parse/gfparse)
-optPdtCOpts="" Options for C parser in PDT (cparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
-optPdtCxxOpts="" Options for C++ parser in PDT (cxxparse). Typically $(TAU_MPI_INCLUDE) $(TAU_INCLUDE) $(TAU_DEFS)
...
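Several of these options are usually combined in one TAU_OPTIONS string. The sketch below assumes the BEGIN_EXCLUDE_LIST / END_EXCLUDE_LIST syntax of TAU's selective instrumentation files; the file name select.tau and the routine tiny_helper are hypothetical and only illustrate the shape of such a file.

```shell
#!/bin/sh
# Write a selective instrumentation file. select.tau is a hypothetical name,
# and tiny_helper stands in for a small, frequently called routine that
# should not be instrumented.
cat > select.tau <<'EOF'
BEGIN_EXCLUDE_LIST
void tiny_helper(int)
END_EXCLUDE_LIST
EOF

# Combine options from the list above: verbose messages, keep the
# intermediate .pdb and .inst.* files, and pass the selection file
# to tau_instrumentor.
export TAU_OPTIONS="-optVerbose -optKeepFiles -optTauSelectFile=$PWD/select.tau"
echo "$TAU_OPTIONS"
```

Keeping the intermediate files makes it possible to check which routines were actually instrumented.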
Runtime Environment Variables in TAU
Environment Variable Default Description
TAU_TRACE 0 Setting to 1 turns on tracing
TAU_CALLPATH 0 Setting to 1 turns on callpath profiling
TAU_TRACK_HEAP or TAU_TRACK_HEADROOM 0 Setting to 1 turns on tracking heap memory/headroom at routine entry & exit using context events (e.g., Heap at Entry: main=>foo=>bar)
TAU_CALLPATH_DEPTH 2 Specifies depth of callpath. Setting to 0 generates no callpath or routine information, setting to 1 generates flat profile and context events have just parent information (e.g., Heap Entry: foo)
TAU_SYNCHRONIZE_CLOCKS 1 Synchronize clocks across nodes to correct timestamps in traces
TAU_COMM_MATRIX 0 Setting to 1 generates communication matrix display using context events
TAU_THROTTLE 1 Setting to 0 turns off throttling. Enabled by default to remove instrumentation in lightweight routines that are called frequently
TAU_THROTTLE_NUMCALLS 100000 Specifies the number of calls before testing for throttling
TAU_THROTTLE_PERCALL 10 Specifies value in microseconds. Throttle a routine if it is called over 100000 times and takes less than 10 usec of inclusive time per call
TAU_COMPENSATE 0 Setting to 1 enables runtime compensation of instrumentation overhead
TAU_PROFILE_FORMAT Profile Setting to “merged” generates a single file. “snapshot” generates xml format
TAU_METRICS TIME Setting to a comma separated list generates other metrics. (e.g., TIME:linuxtimers:PAPI_FP_OPS:PAPI_NATIVE_<event>)
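The throttling rule described by TAU_THROTTLE_NUMCALLS and TAU_THROTTLE_PERCALL can be restated as a small predicate. This is an illustrative sketch of the rule as worded in the table, not TAU's implementation; the function name should_throttle and its two arguments are hypothetical.

```shell
#!/bin/sh
# should_throttle CALLS INCLUSIVE_USEC
# Succeeds (exit status 0) when a routine meets the rule from the table:
# called more than TAU_THROTTLE_NUMCALLS times and averaging less than
# TAU_THROTTLE_PERCALL microseconds of inclusive time per call.
should_throttle() {
    calls=$1
    inclusive_usec=$2
    numcalls=${TAU_THROTTLE_NUMCALLS:-100000}   # default from the table
    percall=${TAU_THROTTLE_PERCALL:-10}         # default from the table
    # Compare inclusive < percall * calls to avoid integer-division rounding.
    [ "$calls" -gt "$numcalls" ] && [ "$inclusive_usec" -lt $((percall * calls)) ]
}

# 200000 calls with 1 s of inclusive time in total: 5 usec per call.
if should_throttle 200000 1000000; then echo "throttle"; else echo "keep"; fi
# 200000 calls with 4 s of inclusive time in total: 20 usec per call.
if should_throttle 200000 4000000; then echo "throttle"; else echo "keep"; fi
```

With the default thresholds the first routine is throttled and the second keeps its instrumentation.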
Simplifying Instrumentation using DyninstAPI
TAU uses DyninstAPI to create a binary re-writer (tau_run)
TAU’s measurement library (DSO) is loaded by tau_run
Both runtime instrumentation and binary re-writing are supported
Selection of files and routines based on exclude/include lists
Simplifies tool usage greatly!
Available on POINT LiveDVD [http://tau.uoregon.edu/point.iso]
Usage:
  % tau_run a.out -o a.inst.out
  % mpirun -np 4 a.inst.out
  % paraprof
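The usage lines can be combined with the runtime environment variables into one script. This is a sketch under two assumptions: that the runtime variables apply to a rewritten binary in the same way as to a source-instrumented one, and that a.out is an existing dynamic executable. The guard and the REWRITE_STATUS variable are additions for illustration.

```shell
#!/bin/sh
# Runtime measurement settings taken from the environment-variable table.
export TAU_CALLPATH=1          # turn on callpath profiling
export TAU_CALLPATH_DEPTH=2    # the documented default depth, set explicitly
export TAU_COMM_MATRIX=1       # collect data for the communication matrix display

# a.out is the uninstrumented executable from the slide.
if command -v tau_run >/dev/null 2>&1 && [ -x ./a.out ]; then
    # Rewrite the binary, then run the instrumented copy on 4 ranks.
    tau_run a.out -o a.inst.out && mpirun -np 4 ./a.inst.out
    REWRITE_STATUS=$?
else
    echo "tau_run or ./a.out not available; skipping binary rewriting" >&2
    REWRITE_STATUS=skipped
fi
echo "rewrite: $REWRITE_STATUS"
```

No recompilation is involved, which is the simplification the slide refers to; paraprof is then started in the directory holding the profiles.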
Issues
Re-writing static executables limited to gcc, limited platforms in beta
  Currently, we support dynamic executables (v6.1)
  We are working on supporting both static and dynamic executables
  We hope to support more platforms, compilers and runtime systems in the future
Rewriting shared libraries used by the application
  LD_PRELOAD’able wrapper libraries can be created using tau_wrap
  requires interface information in header file
Binary Rewriting in TAU using DyninstAPI
Wish List for tau_run
Support for more platforms
  Apple Mac OS X, Windows, IBM BG/P, AIX, …
Support for more compilers
Support for rewriting shared objects
Support for static binary rewriting with validation for compilers other than gcc
  XLC, PathScale, Cray CCE, Intel, PGI, …
Other Tools…
Other TAU tools that use technologies from the ParaDyn/DyninstAPI group
  TAU over MRNet (ToM) for runtime monitoring
  StackWalkerAPI for accessing the callstack
StackWalkerAPI in TAU
Requirements overview:
  Minimal information required (PC is enough)
  Threaded support necessary
  Low overhead (for high sample rates)
  Stack unwinding from a signal handler
    Malloc could be interrupted
    Need to walk through signal handler frame
Issues encountered with StackWalkerAPI
StackWalkerAPI:
  Isn’t thread safe (and locking to use it can cause significant overhead)
  Uses malloc/new (and so do dependent libraries such as libdwarf)
  C++ (we would prefer C)
  Issues walking certain kinds of stack frames
Matt Legendre was able to help us out a lot though!
Alternatives:
  TAU is currently using stack walking constructs from HPCToolkit
Online Monitoring using TAU over MRNet (ToM)
Back-End (BE)
  TAU adapter offloads performance data
Filters
  reduction
  distributed analysis
  upstream / downstream
Front-End (FE)
  unpacks, interprets, stores
Paths
  reverse data reduction path
  multicast control path
Push-Pull model
  source pushes, sink pulls
Conclusions
TAU and DyninstAPI represent mature technology for performance instrumentation, measurement and analysis
Using DyninstAPI’s binary re-writing capabilities, we have produced a tool that simplifies code instrumentation
We hope to collaborate on other projects and include support for an enhanced stack walker API
Questions?
Support Acknowledgements
Department of Energy (DOE)
  Office of Science
    MICS, Argonne National Lab
  ASC/NNSA
    University of Utah ASC/NNSA Level 1
    ASC/NNSA, LLNL
Department of Defense (DoD)
NSF SDCI
Partners:
  Research Centre Juelich
  LBL, ORNL, ANL, LANL, PNNL, LLNL
  TU Dresden
  ParaTools, Inc.