
11 July 2005

Tool Evaluation Scoring Criteria

Professor Alan D. George, Principal Investigator
Mr. Hung-Hsun Su, Sr. Research Assistant
Mr. Adam Leko, Sr. Research Assistant
Mr. Bryan Golden, Research Assistant
Mr. Hans Sherburne, Research Assistant

HCS Research Laboratory
University of Florida

PAT


Usability/Portability Characteristics


Available Metrics

Description: Depth of metrics provided by the tool. Examples: communication statistics or events, hardware counters (a counter-reading sketch follows below).

Importance rating: Critical; users must be able to obtain representative performance data to debug performance problems.

Rating strategy: Scored using relative ratings (subjective characteristic); compare the tool's available metrics with those provided by other tools.
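To make the hardware-counter example concrete, the sketch below reads two counters around a stand-in work loop. It assumes PAPI's classic high-level counter API (PAPI 3.x era; removed from later PAPI releases) and is not drawn from any evaluated tool.

```c
/* Sketch: reading hardware counters with PAPI's classic high-level API
 * (assumed available, PAPI 3.x era).  The work loop is a stand-in for
 * real application code. */
#include <stdio.h>
#include <papi.h>

int main(void)
{
    int       events[2] = { PAPI_TOT_CYC, PAPI_L1_DCM };  /* cycles, L1 data-cache misses */
    long_long values[2];

    if (PAPI_start_counters(events, 2) != PAPI_OK)
        return 1;

    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; i++)        /* stand-in work */
        sum += i * 0.5;

    if (PAPI_stop_counters(values, 2) != PAPI_OK)
        return 1;

    printf("total cycles: %lld, L1 data-cache misses: %lld\n",
           (long long)values[0], (long long)values[1]);
    return 0;
}
```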


Documentation Quality

Description: Quality of the documentation provided, including user's manuals, READMEs, and "quick start" guides.

Importance rating: Important; can have a large effect on overall usability.

Rating strategy: Scored using relative ratings (subjective characteristic); correlated with how long it takes to decipher the documentation well enough to use the tool. Tools with quick-start guides or clear, concise high-level documentation receive higher scores.


Installation

Description: Measure of the time needed for installation; also incorporates the level of expertise necessary to perform the installation.

Importance rating: Minor; installation only needs to be done once and may not even be done by the end user.

Rating strategy: Scored using relative ratings based on the mean installation time across all tools. All tools were installed by a single person with significant system administration experience.


Learning Curve

Description: Difficulty level associated with learning to use the tool effectively.

Importance rating: Critical; tools that users perceive as too difficult to operate will be avoided.

Rating strategy: Scored using relative ratings (subjective characteristic); based on the time necessary to become acquainted with all features needed for day-to-day operation of the tool.


Manual Overhead

Description: Amount of user effort needed to instrument application code.

Importance rating: Important; the tool must not create more work for the user in the end (instead it should reduce time!).

Rating strategy: Use a hypothetical test case, an MPI program of ~2.5 kloc in 20 .c files with 50 user functions. Score one point for each of the following actions that can be completed on a fresh copy of the source code in an estimated 10 minutes (a hand-instrumentation sketch follows this list):
- Instrument all MPI calls
- Instrument all functions
- Instrument five arbitrary functions
- Instrument all loops, or a subset of loops
- Instrument all function call sites, or a subset of call sites (about 35)
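As a rough illustration of what "instrument all MPI calls" involves when done by hand, the sketch below wraps MPI_Send through the standard MPI profiling interface (PMPI); the timing accumulator is hypothetical, not part of any evaluated tool.

```c
/* Hand-instrumentation sketch: intercept MPI_Send via the standard MPI
 * profiling interface (PMPI_) and accumulate the time spent in it.
 * MPI-2 era signature; newer MPI versions declare buf as const void *.
 * The global accumulator is hypothetical, for illustration only. */
#include <mpi.h>

static double total_send_time = 0.0;   /* seconds spent in MPI_Send */

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    double start = MPI_Wtime();
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
    total_send_time += MPI_Wtime() - start;
    return rc;
}
```

Repeating such wrappers for every MPI routine across 20 files is what the 10-minute budget measures; tools that generate or pre-link these wrappers automatically earn the point easily.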


Measurement Accuracy

Description: How much runtime instrumentation overhead the tool imposes.

Importance rating: Important; inaccurate data may lead to an incorrect diagnosis, which creates more work for the user with no benefit.

Rating strategy: Use a standard application (the CAMEL MPI program) and score based on the runtime overhead of the instrumented executable, measured in wall-clock time (a scoring sketch follows this list):
- 0-4%: five points
- 5-9%: four points
- 10-14%: three points
- 15-19%: two points
- 20% or greater: one point
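A minimal sketch of how the thresholds above might be applied, assuming overhead is the percentage increase in wall-clock time over an uninstrumented baseline; treating fractional values as falling into the lower bracket is our assumption, not stated on the slide.

```c
/* Sketch of the overhead-to-points mapping described above.
 * overhead = 100 * (instrumented - baseline) / baseline, in percent,
 * from wall-clock runs of the CAMEL MPI benchmark.
 * Mapping fractional values (e.g. 4.6%) into the lower bracket is an
 * assumption; the slide only lists whole-percent ranges. */
int accuracy_points(double baseline_sec, double instrumented_sec)
{
    double overhead = 100.0 * (instrumented_sec - baseline_sec) / baseline_sec;

    if (overhead < 5.0)  return 5;   /* 0-4%        */
    if (overhead < 10.0) return 4;   /* 5-9%        */
    if (overhead < 15.0) return 3;   /* 10-14%      */
    if (overhead < 20.0) return 2;   /* 15-19%      */
    return 1;                        /* 20% or more */
}
```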


Multiple Analyses/Views

Description: The different ways the tool presents data to the user, and the different analyses available from within the tool.

Importance rating: Critical; tools must provide enough ways of looking at the data that users can track down performance problems.

Rating strategy: Score based on the relative number of views and analyses provided by each tool, approximately one point for each distinct view or analysis.


Profiling/Tracing Support

Description: Whether the tool offers a low-overhead profile mode and a comprehensive event-trace mode.

Importance rating: Critical; profile mode is useful for quick analysis, and trace mode is necessary for examining what really happens during execution.

Rating strategy:
- Two points if a profiling mode is available
- Two points if a tracing mode is available
- One extra point if the trace file size is within a few percent of the best trace file size across all tools


Response Time

Description: How much time is needed to get data from the tool.

Importance rating: Average; the user should not have to wait an extremely long time for data, but high-quality information should always be the first goal of a tool.

Rating strategy: Score based on the relative time taken to get performance data from the tool. Tools that perform complicated post-mortem analyses or bottleneck detection receive lower scores; tools that provide data while the program is running receive five points.


Source Code Correlation

Description: How well the tool relates performance data back to the original source code.

Importance rating: Critical; necessary to see which statements and regions of code are causing performance problems.

Rating strategy:
- Four to five points if the tool supports source correlation at the function or line level
- One to three points if the tool supports an indirect method of attributing data to functions or source lines
- Zero points if the tool does not provide enough data to map performance metrics back to source code


Stability

Description: How likely the tool is to crash while in use.

Importance rating: Important; unstable tools will frustrate users and decrease productivity.

Rating strategy: Scored using relative ratings (subjective characteristic). The score takes into account the number of crashes experienced during the evaluation, the severity of those crashes, and the number of bugs encountered.


Technical Support

Description: How quickly responses are received from tool developers or support departments, and the quality and helpfulness of those responses.

Importance rating: Average; important for users during installation and initial use of the tool, but becomes less important as time goes on.

Rating strategy: Relative rating based on personal communication with our contacts for each tool (subjective characteristic). Timely, informative responses result in four or more points.


Portability Characteristics


Extensibility

Description: How easily the tool can be extended to support UPC and SHMEM.

Importance rating: Critical; tools that cannot be extended for UPC and SHMEM are almost useless for us.

Rating strategy:
- Commercial tools receive zero points, regardless of whether export or import functionality is available (interoperability is covered by another characteristic)
- Otherwise, a subjective score based on the functionality provided by the tool, which also incorporates the quality of the code (after a quick review)


Hardware Support

Description: Number and depth of hardware platforms supported.

Importance rating: Critical; essential for portability.

Rating strategy: Based on our estimate of the important architectures for UPC and SHMEM; award one point for support of each of the following:
- IBM SP (AIX)
- IBM BlueGene/L
- AlphaServer (Tru64)
- Cray X1/X1E (UNICOS)
- Cray XD1 (Linux with Cray proprietary interconnect)
- SGI Altix (Linux with NUMAlink)
- Generic 64-bit Opteron/Itanium Linux clusters


Heterogeneity

Description: Tool support for running programs across different architectures within a single run.

Importance rating: Minor; not very useful on shared-memory machines.

Rating strategy: Five points if heterogeneity is supported, zero points if it is not.


Software Support

Description: Number of languages, libraries, and compilers supported.

Importance rating: Important; the tool should support many compilers and not hinder library support, but hardware support and extensibility are more important.

Rating strategy: Score based on the relative number of languages, libraries, and compilers supported compared with other tools. Tools that instrument or record data for existing closed-source libraries receive an extra point (up to a maximum of five points).


Scalability Characteristics


Filtering and Aggregation

Description: How well the tool provides users with facilities to simplify and summarize the data being displayed.

Importance rating: Critical; necessary for users to work effectively with the large data sets generated by performance tools.

Rating strategy: Scored using relative ratings (slightly subjective characteristic). Tools that provide many different ways of filtering and aggregating data receive higher scores.


Multiple Executions

Description: Support for relating and comparing performance information from different runs. Examples: automated display of speedup charts, or differences between the time taken by methods using different algorithms or variants of a single algorithm (a speedup sketch follows below).

Importance rating: Critical; important for doing scalability analysis.

Rating strategy: Five points if the tool supports relating data from different runs, zero points if not.
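To make the speedup-chart example concrete, this sketch computes the quantity such a view automates, speedup(p) = T(1) / T(p), from wall-clock times collected in separate runs; the times below are hypothetical placeholders, not measured data.

```c
/* Sketch of the calculation behind an automated speedup chart:
 * speedup(p) = T(1) / T(p), using wall-clock times from separate runs.
 * The run times below are hypothetical placeholders, not measured data. */
#include <stdio.h>

int main(void)
{
    int    procs[] = { 1, 2, 4, 8, 16 };
    double times[] = { 120.0, 62.0, 33.0, 18.5, 11.0 };  /* seconds per run */
    int    n = (int)(sizeof procs / sizeof procs[0]);

    for (int i = 0; i < n; i++)
        printf("%2d processes: speedup = %.2f\n", procs[i], times[0] / times[i]);
    return 0;
}
```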


Performance Bottleneck Detection

Description: How well the tool identifies each known (and unknown) bottleneck in our test suite.

Importance rating: Critical; bottleneck detection is the most important function of a performance tool.

Rating strategy: Score proportional to the number of PASS ratings given for the test-suite programs. Slightly subjective characteristic: we have to judge whether the user would be able to determine the bottleneck based on the data provided by the tool.


Searching

Description: Ability of the tool to search for particular information or events.

Importance rating: Minor; searching can be useful, but it is difficult to provide users with a powerful search that is also user-friendly.

Rating strategy: Five points if searching is supported, with points deducted if only a simple search is available; zero points if there is no search functionality.


Miscellaneous Characteristics


Cost

Description: How much the tool costs to use (per seat).

Importance rating: Important; tools that are prohibitively expensive reduce the tool's overall availability.

Rating strategy: Scale based on per-seat cost:
- Free: five points
- $1.00 to $499.99: four points
- $500.00 to $999.99: three points
- $1,000.00 to $1,999.99: two points
- $2,000.00 or more: one point


Interoperability

Description: How well the tool works and integrates with other performance tools.

Importance rating: Important; tools lacking in areas like trace visualization can make up for it by exporting data that other tools can understand (also helpful for getting data from 3rd-party sources).

Rating strategy:
- Zero points if data cannot be imported into or exported from the tool
- One point for export of data in a simple ASCII format (a hypothetical ASCII export sketch follows below)
- Additional points (up to five) for each format the tool can export to and import from
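As an illustration of the kind of "simple ASCII format" export that earns a point, the sketch below dumps per-function profile records one per line; the format and field names are hypothetical, not taken from any evaluated tool.

```c
/* Hypothetical "simple ASCII format" export: one line per instrumented
 * function with its call count and inclusive wall-clock time.  The
 * format and field names are ours, for illustration only. */
#include <stdio.h>

struct func_profile {
    const char *name;
    long        calls;
    double      seconds;    /* inclusive wall-clock time */
};

int export_ascii(const char *path, const struct func_profile *prof, int n)
{
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;

    fprintf(out, "# function  calls  inclusive_seconds\n");
    for (int i = 0; i < n; i++)
        fprintf(out, "%s %ld %.6f\n", prof[i].name, prof[i].calls, prof[i].seconds);

    return fclose(out);
}
```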