Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Leo Borges, Intel Software Conference 2014, Brazil, May 2014

Page 1: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Leo Borges

Intel Software Conference 2014, Brazil

May 2014

Page 2: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Legal Disclaimer & Optimization Notice


Page 3: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Agenda

• Intel® VTune™ Amplifier XE Intro

• Microarchitecture Review

• The Top-Down Characterization details

• Intel® VTune™ Amplifier XE Implementation

• Demo*

*Sources for current presentation:

◦ http://software.intel.com/en-us/articles/advanced-profiling-with-intel-vtune-amplifier-xe-part-1-find-the-bottleneck

Page 4: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Two Ways to Collect Data - Intel® VTune™ Amplifier XE

Software Collector | Hardware Collector
Hotspots, Concurrency, Locks & Waits | Lightweight Hotspots, Advanced Analysis
Uses OS interrupts | Uses the on-chip Performance Monitoring Unit (PMU)
Collects from a single process tree | Collects system-wide or from a single process tree
~10 ms default resolution | ~1 ms default resolution (finer granularity, finds small functions)
Collects on both Intel® and compatible processors | Requires a genuine Intel® processor for collection
Call stacks show calling sequence | New! Optionally collects call stacks
Works in virtual environments | Works in virtual environments only when supported by the VM (e.g., vSphere* 5.1)
No driver required | Requires a driver

No special recompiles: C, C++, C#, Fortran, Java, Assembly


Page 6: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture basics

Fetch → Decode → Execute → Retire

• Classic 4-stage pipeline depicted here.

• Memory not shown.

• Pipeline on current processors is capable of speculative and out-of-order execution.

Page 7: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Intuitive approach to EBS (event-based sampling)

• Use a small list of metrics to monitor level of optimization

• Example 1: Cycles per instruction (CPI)

• Example 2: Instruction retirement ratio

◦ m instructions issued, n retired

◦ Retirement ratio = n/m

◦ % executed but not retired = (1 - n/m)*100
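The two example metrics above can be written as small helper functions. This is a minimal sketch, not VTune's implementation: the function names and the sample counts in the usage note are hypothetical, and the inputs are assumed to be raw counts already collected by some sampling tool.

```python
def cycles_per_instruction(cycles: int, instructions_retired: int) -> float:
    """Example 1: CPI = cycles / retired instructions."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return cycles / instructions_retired


def retirement_ratio(issued: int, retired: int) -> float:
    """Example 2: n/m, where m instructions were issued and n retired."""
    if issued <= 0:
        raise ValueError("issued must be positive")
    return retired / issued


def percent_not_retired(issued: int, retired: int) -> float:
    """(1 - n/m) * 100: share of issued work that never retired."""
    return (1.0 - retirement_ratio(issued, retired)) * 100.0
```

For instance, with made-up counts of 1000 instructions issued and 800 retired, the retirement ratio is 0.8 and roughly 20% of the issued work was executed but not retired.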

Page 8: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture Review

Fetch → Decode → Execute → Memory → Commit

The traditional 5-stage pipeline. The pipeline on current processors is capable of out-of-order execution.


Page 10: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture Review

Fetch → Decode → Execute → Memory → Commit

Front-End

The front-end fetches instructions IN ORDER, decodes them into u-ops (micro-operations), and sends the u-ops to the back-end.

Page 11: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture Review

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

The back-end receives u-ops, executes them OUT OF ORDER, accesses memory as needed, and commits results to memory IN ORDER.

Page 12: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture Review

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

Allocation

Allocation is the point where u-ops transfer from the front-end to the back-end. The front-end can allocate 4 u-ops per cycle.

Page 13: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Microarchitecture Review

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

Allocation | Retirement

Retirement is the point where u-ops leave the back-end. The back-end can retire 4 u-ops per cycle.

Page 14: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


And a New Term: the Pipeline Slot

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

4 Potential Allocations per Cycle | 4 Potential Retirements per Cycle

In reality, there are many queues, buffers, and pieces of logic throughout the pipeline to allow up to 4 allocations and 4 retirements per cycle.

Page 15: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


And a New Term: the Pipeline Slot

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

4 Potential Allocations per Cycle | 4 Potential Retirements per Cycle

The “Pipeline Slot” is an abstraction representing all the resources needed to move one u-op through the pipeline.

Page 16: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


And a New Term: the Pipeline Slot

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

There are 4 Pipeline Slots available every cycle.

Slots: S1, S2, S3, S4

Page 17: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


And a New Term: the Pipeline Slot

Fetch → Decode → Execute → Memory → Commit

Front-End | Back-End

Pipeline slots are filled with u-ops that travel from allocation to retirement over multiple cycles.

[Figure: slots S1 to S4 repeated across the pipeline stages]

Page 18: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Cycles Per Instruction (CPI), a standard measure, has some special kinks

For multi-core processors, CPI can get as low as 0.25 cycles per instruction with current Intel processors.

Normally, a CPI below ~1.0 is targeted for better performance.

Some would suggest that CPI should be targeted at around ~0.75 to 0.50.

But is this correct for every architecture?

Page 19: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Cycles Per Instruction (CPI), a standard measure, has some special kinks

• Threads on each Intel® Xeon Phi™ core share a clock

◦ If all 4 HW threads are active, each gets ¼ of the total cycles

• Multi-stage instruction decode requires two threads to utilize the whole core; one thread only gets half

• With two ops per cycle (U-V-pipe dual issue):

Threads per Core | Best CPI per Core
1 | 1.0
2 | 0.5
3 | 0.5
4 | 0.5

• To get thread CPI, multiply by the active threads

Threads per Core | Best CPI per Core | Best CPI per Thread
1 | 1.0 | 1 x 1.0 = 1.0
2 | 0.5 | 2 x 0.5 = 1.0
3 | 0.5 | 3 x 0.5 = 1.5
4 | 0.5 | 4 x 0.5 = 2.0
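The per-thread arithmetic from the tables can be reproduced with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and the best-case per-core values (1.0 with one thread, 0.5 with two or more) are taken directly from the slide rather than derived.

```python
def best_cpi_per_core(active_threads: int) -> float:
    """Best-case CPI per core from the slide: 1.0 for one thread, 0.5 otherwise."""
    if not 1 <= active_threads <= 4:
        raise ValueError("a core runs 1 to 4 hardware threads")
    return 1.0 if active_threads == 1 else 0.5


def best_cpi_per_thread(active_threads: int) -> float:
    """Thread CPI = core CPI multiplied by the number of active threads."""
    return active_threads * best_cpi_per_core(active_threads)


def thread_cpi(measured_core_cpi: float, active_threads: int) -> float:
    """Convert a measured per-core CPI to a per-thread CPI the same way."""
    return measured_core_cpi * active_threads
```

The practical consequence is that a per-thread CPI of 2.0 with four active threads is already the best case, so CPI targets from other architectures do not carry over directly.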

Page 20: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


The Top-Down Characterization

What is it?

The Top-Down Characterization is:

• A new way to organize and use processor events to identify the real hardware bottlenecks in systems/applications

• Based on PMU events specifically designed for this task

• Integrated into Intel® VTune™ Amplifier XE for Core

• Available on Intel® Microarchitecture code named Sandy Bridge and newer


Page 21: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


The Top-Down Characterization

Each pipeline slot on each cycle is classified into 1 of 4 categories.

For each slot on each cycle:


Page 22: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


The Top-Down Characterization

• The four categories sum to 1.0

• Unit is “Percentage of total Pipeline Slots”

• This is the core of the new Top-Down characterization

• Each category is further broken down depending on available events
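A slot-accounting sketch may help make the classification concrete. The formulas below are not taken from these slides: they follow the commonly published form of the top-down method, and every parameter name is a hypothetical placeholder for a raw counter value rather than a real PMU event name. The four category names used are Front-End Bound, Bad Speculation, Retiring, and Back-End Bound.

```python
from dataclasses import dataclass

SLOTS_PER_CYCLE = 4  # the slides state 4 pipeline slots per cycle


@dataclass(frozen=True)
class TopDownLevel1:
    frontend_bound: float
    bad_speculation: float
    retiring: float
    backend_bound: float


def classify_slots(cycles: int, slots_not_delivered: int, uops_issued: int,
                   uops_retired: int, recovery_cycles: int) -> TopDownLevel1:
    """Split all pipeline slots into four fractions that sum to 1.0.

    All inputs are assumed raw counts (hypothetical names):
      slots_not_delivered: slots the front-end left empty
      uops_issued / uops_retired: u-ops allocated / retired
      recovery_cycles: cycles spent recovering from bad speculation
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    total_slots = SLOTS_PER_CYCLE * cycles
    frontend = slots_not_delivered / total_slots
    retiring = uops_retired / total_slots
    bad_spec = (uops_issued - uops_retired
                + SLOTS_PER_CYCLE * recovery_cycles) / total_slots
    # Whatever is left over is attributed to the back-end.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return TopDownLevel1(frontend, bad_spec, retiring, backend)
```

Computing the back-end share as the remainder is what guarantees that the four fractions sum to 1.0, which is the property the slide emphasizes.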

Page 23: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Issues breakdown

• Front-End

◦ Latency: ITLB Overhead, ICache Misses, DSB Switches, Branch Resteers

◦ Bandwidth: DSB, MITE

• Back-End

◦ Memory Bound: L1, L2, L3, DRAM (local or remote), Store Bound

◦ Core Bound: DIV Active, Port Utilization (0 .. 3 ports)

• Retiring: General, Microcode Sequencer

• Bad Speculation: Branch Mispredict, Machine Clears

Page 24: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Examples of Metrics (Xeon Phi™)

Page 25: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Problem Area: L1 Cache Usage

• Significantly affects data access latency and therefore application performance

• Tuning Suggestions:

◦ Software prefetching

◦ Tile/block data access for cache size

◦ Use streaming stores

◦ If using 4K access stride, may be experiencing conflict misses

◦ Examine Compiler prefetching (Compiler-generated L1 prefetches should not miss)

Metric | Formula | Investigate if
L1 Misses | DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1 |
L1 Hit Rate | (DATA_READ_OR_WRITE - L1 Misses) / DATA_READ_OR_WRITE | < 95%
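The two formulas in the table translate directly into code. In this sketch the parameters are named after the events in the table and are assumed to hold raw event counts; the function names and the sample values in the usage note are hypothetical.

```python
def l1_misses(data_read_miss_or_write_miss: int,
              l1_data_hit_inflight_pf1: int) -> int:
    """L1 Misses = DATA_READ_MISS_OR_WRITE_MISS + L1_DATA_HIT_INFLIGHT_PF1."""
    return data_read_miss_or_write_miss + l1_data_hit_inflight_pf1


def l1_hit_rate(data_read_or_write: int, misses: int) -> float:
    """L1 Hit Rate = (DATA_READ_OR_WRITE - L1 Misses) / DATA_READ_OR_WRITE."""
    if data_read_or_write <= 0:
        raise ValueError("data_read_or_write must be positive")
    return (data_read_or_write - misses) / data_read_or_write


def needs_l1_investigation(hit_rate: float, threshold: float = 0.95) -> bool:
    """The table suggests investigating when the hit rate is below 95%."""
    return hit_rate < threshold
```

With made-up counts of 1000 accesses and 100 misses, the hit rate is 0.9, which falls below the 95% threshold and would be worth a closer look.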

Page 26: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Problem Area: Data Access Latency

• Significantly affects application performance

• Tuning Suggestions:

◦ Software prefetching

◦ Tile/block data access for cache size

◦ Use streaming stores

◦ Check cache locality: turn off prefetching and use CACHE_FILL events; reduce sharing if needed/possible

◦ If using 64K access stride, may be experiencing conflict misses

Metric | Formula | Investigate if
Estimated Latency Impact | (CPU_CLK_UNHALTED - EXEC_STAGE_CYCLES - DATA_READ_OR_WRITE) / DATA_READ_OR_WRITE_MISS | > 145
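As a rough sketch, the Estimated Latency Impact formula can be evaluated as follows. Parameters mirror the event names in the table and are assumed to be raw counts; the function names are hypothetical.

```python
def estimated_latency_impact(cpu_clk_unhalted: int, exec_stage_cycles: int,
                             data_read_or_write: int,
                             data_read_or_write_miss: int) -> float:
    """(CPU_CLK_UNHALTED - EXEC_STAGE_CYCLES - DATA_READ_OR_WRITE)
    / DATA_READ_OR_WRITE_MISS, as given in the table."""
    if data_read_or_write_miss <= 0:
        raise ValueError("data_read_or_write_miss must be positive")
    stalled = cpu_clk_unhalted - exec_stage_cycles - data_read_or_write
    return stalled / data_read_or_write_miss


def needs_latency_investigation(impact: float, threshold: float = 145.0) -> bool:
    """The table suggests investigating when the value exceeds 145."""
    return impact > threshold
```

The result can be read as an approximate number of cycles attributed to each miss, which is why a large value points at data access latency.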

Page 27: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Problem Area: TLB Usage

• Also affects data access latency and therefore application performance

• Tuning Suggestions:

◦ Improve cache usage & data access latency

◦ If L1 TLB miss/L2 TLB miss is high, try using large pages

◦ For loops with multiple streams, try splitting into multiple loops

◦ If data access stride is a large power of 2, consider padding between arrays by one 4 KB page

Metric | Formula | Investigate if
L1 TLB miss ratio | DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 1%
L2 TLB miss ratio | LONG_DATA_PAGE_WALK / DATA_READ_OR_WRITE | > 0.1%
L1 TLB misses per L2 TLB miss | DATA_PAGE_WALK / LONG_DATA_PAGE_WALK | > 100x
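The three TLB metrics and their thresholds can be checked together. This is a minimal sketch under the assumption that the inputs are raw counts of the events named in the table; the dictionary keys and function names are hypothetical.

```python
# Thresholds from the table: > 1%, > 0.1%, > 100x.
THRESHOLDS = {
    "l1_tlb_miss_ratio": 0.01,
    "l2_tlb_miss_ratio": 0.001,
    "l1_misses_per_l2_miss": 100.0,
}


def tlb_metrics(data_read_or_write: int, data_page_walk: int,
                long_data_page_walk: int) -> dict:
    """Compute the TLB ratios from raw event counts."""
    if data_read_or_write <= 0:
        raise ValueError("data_read_or_write must be positive")
    metrics = {
        "l1_tlb_miss_ratio": data_page_walk / data_read_or_write,
        "l2_tlb_miss_ratio": long_data_page_walk / data_read_or_write,
    }
    # The third ratio is undefined when there were no L2 TLB misses.
    if long_data_page_walk > 0:
        metrics["l1_misses_per_l2_miss"] = data_page_walk / long_data_page_walk
    return metrics


def flagged(metrics: dict) -> list:
    """Names of the metrics that exceed their 'investigate if' threshold."""
    return sorted(name for name, value in metrics.items()
                  if value > THRESHOLDS[name])
```

Skipping the third ratio when no long page walks were counted avoids a division by zero and avoids flagging a workload that has no L2 TLB misses at all.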

Page 28: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE


Problem Area: VPU Usage

• Indicates whether an application is vectorized successfully and efficiently

• Tuning Suggestions:

◦ Use the Compiler vectorization report!

◦ For data dependencies preventing vectorization, try using Intel® Cilk™ Plus #pragma simd (if safe!)

◦ Align data and tell the Compiler!

◦ Re-structure code if possible: Array notations, AOS->SOA

Metric | Formula | Investigate if
Vectorization Intensity | VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED | < 8 (DP), < 16 (SP)
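A short sketch of the Vectorization Intensity check, assuming raw counts for the two events in the table. The function names and the `precision` argument are hypothetical conveniences, and the thresholds are the ones listed in the table.

```python
# Thresholds from the table: < 8 for double precision, < 16 for single.
INTENSITY_THRESHOLD = {"DP": 8.0, "SP": 16.0}


def vectorization_intensity(vpu_elements_active: int,
                            vpu_instructions_executed: int) -> float:
    """VPU_ELEMENTS_ACTIVE / VPU_INSTRUCTIONS_EXECUTED."""
    if vpu_instructions_executed <= 0:
        raise ValueError("vpu_instructions_executed must be positive")
    return vpu_elements_active / vpu_instructions_executed


def needs_vpu_investigation(intensity: float, precision: str) -> bool:
    """Flag code whose average active elements per VPU instruction is low."""
    return intensity < INTENSITY_THRESHOLD[precision]
```

The same measured intensity can be acceptable for double precision and poor for single precision, which is why the precision has to be supplied explicitly.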

Page 29: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Problem Area: Memory Bandwidth

• Can increase data latency in the system or become a performance bottleneck

• Tuning Suggestions:

  - Improve locality in caches

  - Use streaming stores

  - Improve software prefetching

Metric | Formula | Investigate if
Memory Bandwidth | (UNC_F_CH0_NORMAL_READ + UNC_F_CH0_NORMAL_WRITE + UNC_F_CH1_NORMAL_READ + UNC_F_CH1_NORMAL_WRITE) * 64 / time | < 80 GB/sec (practical peak 140 GB/sec with 8 memory controllers)
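The bandwidth formula multiplies the total number of read and write events by 64, the cache-line size in bytes, and divides by the elapsed time. A minimal sketch of that arithmetic follows. It assumes the four uncore counts are already summed over all memory controllers, that time is in seconds, and that 1 GB means 10^9 bytes; the function name and sample values are hypothetical.

```python
CACHE_LINE_BYTES = 64        # bytes transferred per counted read or write
INVESTIGATE_BELOW_GB_S = 80  # threshold from the slide


def memory_bandwidth_gb_per_sec(ch0_read, ch0_write, ch1_read, ch1_write,
                                elapsed_seconds):
    """Apply the slide's formula and convert bytes/sec to GB/sec (10^9 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    transfers = ch0_read + ch0_write + ch1_read + ch1_write
    return transfers * CACHE_LINE_BYTES / elapsed_seconds / 1e9


# Hypothetical counts over a 4-second collection.
bandwidth = memory_bandwidth_gb_per_sec(
    ch0_read=1_000_000_000,
    ch0_write=500_000_000,
    ch1_read=1_000_000_000,
    ch1_write=500_000_000,
    elapsed_seconds=4.0,
)
print(f"{bandwidth:.1f} GB/sec, investigate={bandwidth < INVESTIGATE_BELOW_GB_S}")
```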

Page 30: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE

Page 31: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

DEMO

Page 32: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Running the General Exploration Collector

1. Click the “New Analysis” button

2. Select “General Exploration” for your CPU architecture

3. Click “Start” to begin profiling

Page 33: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

General Exploration Summary

Page 34: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Page 35: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Toolbar: Instructions, Navigator, New, Open, Properties (Project); New, Open, Compare (Result)

Page 36: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Project Navigator

Page 37: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Result Display Tabs

Page 38: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Result Analysis Type

Page 39: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Result Viewpoint

Page 40: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Viewpoint Alternates

Page 41: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Result Components

Page 42: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Grid Pane

Page 43: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Grid Pane

Grouping pull-down

Page 44: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Stack Pane

Page 45: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Timeline

Page 46: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Filter/Options Bar

Page 47: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

5/30/2014

Source View / Per-line localization

Page 48: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Source View / Hot spot navigation controls

Page 49: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Assembly View / Hot spot navigation controls

Page 50: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

VTune™ Amplifier XE visualizes performance

Assembly View / Assembly groupings

Page 51: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

For event collection, the coprocessor is treated as a special HW architecture

Page 52: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Project properties provide the means to invoke data collection by target type

Page 53: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Launch Application serves many uses, from host/offload to native execution

Page 54: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Search directories have been reorganized to speed symbol resolution during finalization

Notable coprocessor library paths:

/opt/mpss/3.2/sysroots/k1om-mpss-Linux/boot

/opt/mpss/3.2/sysroots/k1om-mpss-Linux/lib64

/opt/intel/composerxe/lib/mic

/opt/intel/composerxe/tbb/lib/mic

/opt/intel/composerxe/mkl/lib/mic

/opt/intel/mpi-rt/4.1.3/mic

Page 55: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

General Exploration runs a set of events to drive top-down analysis

Page 56: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

For more information on Intel® Xeon Phi™ and VTune™ Amplifier XE

Optimization on the coprocessor: http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization

http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding

Coprocessor Performance Monitoring Unit: http://software.intel.com/sites/default/files/forum/278102/intelr-xeon-phitm-pmu-rev1.01.pdf

For general information: http://software.intel.com/mic-developer

Page 57: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Grid is Based on Top-Down

Page 58: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Use the Hover Text to Understand Metrics*

*Suggestions welcome: Submit issues if the text isn’t helpful

Page 59: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Event collections on the coprocessor can generate volumes of data

dgemm: on 60+ cores

Tip: Use cpu-mask to reduce the data set while maintaining the same accuracy.

Page 60: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE

Resources

Top-Down Characterization White Paper

http://software.intel.com/en-us/articles/how-to-tune-applications-using-a-top-down-characterization-of-microarchitectural-issues

Tuning Guides

http://software.intel.com/en-us/articles/processor-specific-performance-analysis-papers

Page 61: Methods and practices to analyze the performance of your application with Intel® VTune™ Amplifier XE