January 19, 2005
SGI® Altix™
Using the Intel VTune Performance Analyzer
Reiner Vogelsang
SGI GmbH
reiner@sgi.com
Module Objectives
After completing the module you will be able
• to profile an application using VTune
• to run an experiment with multiple performance counters
• to generate a callgraph of your application
VTune – Purpose
Helps to identify and characterize performance issues by
• Collecting performance data
  – CPU cycles (time)
  – Micro-architectural events of the processor
  – Platform resource utilization
• Organizing and displaying the data
• Identifying performance 'hotspots'
• Suggesting improvements (currently Windows only!)
VTune: Status
• Native: VTune for Linux 3.0
  – Any IA-32 or Itanium® system running a recent Linux version
  – Some kernel and GLIBC dependencies
  – Full Eclipse-based GUI only for IA-32 today
    – Due to Eclipse issues with 64-bit
  – Simple GUIs for IA-64 available
  – For Itanium® & EM64T: command-line version
    – But graphical viewers for results
    – Eclipse-based release for 64-bit systems later in 2005
• Remote Data Collection
  – Allows the full Windows GUI to be used for Linux too
VTune: Features
• Sampling of Execution Addresses
  – Profiling based on processor event counters
• Call Graph Profiling - Instrumented analysis
  – Call tree, number of calls, timing information
  – Executing instrumented code
• Tracking of System Performance Counters
  – Performance Monitor (perfmon) style counters
  – Extended Performance DLL APIs – SDK available!
• Intel® Tuning Assistant: interprets the results (Windows or Remote Data Collection only)
VTune: Example 1
General status commands
    vtl query -lc
  – Lists all collectors (sampling and callgraph for 2.0)
    vtl -help -c sampling
  – Lists all events available for EBS (event-based sampling)
Compile the code with -g
Create/run a Sampling activity
    setenv OMP_NUM_THREADS 4
    vtl activity -c sampling -app dplace,"-c 4-7 ./untrim_elbe" run
  – Creates and runs a single Sampling collector activity with the application 'dplace -x 2 -c 4-7 ./untrim_elbe' and default settings (Instruction Retired and Cycles)
Invoke the viewer to display the collected data
    vtl view -gui
  – Displays the last activity by default
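The Example 1 workflow can be wrapped in a small dry-run script. This is only a sketch: sampling_cmd is a hypothetical helper (not part of VTune), it uses bash rather than the csh syntax of the slide, and it prints the commands instead of executing them, so it can be tried on a machine without VTune.

```shell
#!/bin/bash
# Dry-run sketch of the Example 1 workflow: prints the commands only.
# sampling_cmd is a hypothetical helper, not a VTune command.
sampling_cmd() {
    local cpus="$1" app="$2"
    # Same shape as on the slide: the launcher (dplace) and its
    # arguments are passed to -app, separated by a comma.
    printf 'vtl activity -c sampling -app dplace,"-c %s %s" run\n' "$cpus" "$app"
}

export OMP_NUM_THREADS=4          # bash equivalent of the slide's csh setenv
sampling_cmd 4-7 ./untrim_elbe    # step 1: create and run the Sampling activity
echo 'vtl view -gui'              # step 2: display the last activity
```

To actually run the experiment, the printed lines would be pasted into a shell on the Altix system (or piped to one) after checking them.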
VTune: Process Display
VTune: Module Display
VTune: Hotspots
VTune: Source Code Display
VTune: Multiple Performance Counter Events
• VTune lets you sample customized subsets of performance counter events.
• Important for stall cycle analysis and DEAR analysis (DEAR = Data Event Address Registers)
    vtl activity -d 600 -c sampling \
    -o "-cpu_mask 8-15 -ec en='L3_READS-ALL-MISS', \
    en='LOADS_RETIRED',en='STORES_RETIRED', \
    en='FP_OPS_RETIRED'" \
    -app dplace,"-x2 -c8-15 ./untrim_elbe" run
  – The example collects all loads, stores, floating point operations, and L3 misses due to reads.
  – The application will be executed as many times as there are events.
  – The viewer lets you sort hot spots by each individual event.
VTune: Module Display, Multiple Events
VTune: Hotspots, Multiple Events
VTune: Hotspots As Charts, Multiple Events
VTune: Source of L3_MISS Hotspot
VTune: Example 3, Callgraph
• Shows graphically the caller-callee relationship
• Highlights the hot path of an application
    vtl activity -d 600 -c callgraph -app ../src/adi \
    -moi ../src/adi run
  – It is important to specify the path to the application and the module of interest (-moi) in a unique manner.
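One possible reading of "unique" is that -app and -moi should name the same file by the same, unambiguous path. The sketch below follows that assumption: it resolves the binary to an absolute path once and uses it for both options. abs_path and callgraph_cmd are hypothetical helpers, the script is written in bash, and it only prints the command.

```shell
#!/bin/bash
# abs_path: hypothetical helper that turns a relative path into an
# absolute one; the containing directory must exist.
abs_path() {
    local dir base
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    base=$(basename "$1")
    printf '%s/%s\n' "$dir" "$base"
}

# callgraph_cmd: hypothetical helper that prints a callgraph command in
# which -app and -moi receive exactly the same absolute path.
callgraph_cmd() {
    local bin
    bin=$(abs_path "$1") || return 1
    printf 'vtl activity -d 600 -c callgraph -app %s -moi %s run\n' "$bin" "$bin"
}

# Dry run; pass ../src/adi as the first argument to mirror the slide.
callgraph_cmd "${1:-./adi}"
```

Whether VTune itself requires absolute paths is not stated on the slide; the point of the sketch is only that both options are derived from a single value.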
VTune: Callgraph + Hot Path
VTune: Summary
• VTune has its benefits for
  – Hot path detection within a caller-callee relationship
  – Collecting and displaying multiple performance counter events for stall cycle analysis or DEAR analysis