Code Modernization Workshop
Intel Software Technologies for Developers: What’s New?
Ralph de Wargny, Intel Software & Services Group
May 2015
Intel Software & Services Group
Intel Paragon 1993
What are the chances?
What are the chances that code written on this CPU (probably in FORTRAN or C) will work well on these?
Multi-core & SIMD
Many-core & SIMD
Why Code Modernization?
[In HPC] the “hardware first” ethic is changing. Hardware retains the glamour, but there is now the stark realization that the newest parallel supercomputers will not realize their full potential without reengineering the software code to efficiently divide computational problems among the thousands of processors that comprise next-generation many-core computing platforms.
This process is referred to as parallelization, code optimization or code modernization.
As systems move toward exascale levels of performance, the problem of outdated code will only grow in urgency.
From: http://www.scientificcomputing.com/articles/2014/12/hpc-community-experts-weigh-code-modernization
Executing to Moore’s Law
Predictable Silicon Track Record: well and alive at Intel.
Enabling new devices with higher performance and functionality while controlling power, cost, and size
Transforming the Economics of HPC
[Figure: process technology timeline. Labels: Hi-K Metal Gate, 3D Tri-Gate, 2nd Gen Tri-Gate; 14nm (2013), 10nm (R&D**), 7nm (R&D**)]
**Future options are forecasts and subject to change without notice.
Parallel is the Path Forward
Intel® Xeon® and Intel® Xeon Phi™ Product Families are both going parallel

Product                                                        Core(s)  Threads  SIMD Width
Intel® Xeon® processor 64-bit                                  1        2        128
Intel® Xeon® processor 5100 series                             2        2        128
Intel® Xeon® processor 5500 series                             4        8        128
Intel® Xeon® processor 5600 series                             6        12       128
Intel® Xeon® processor code-named Sandy Bridge EP              8        16       256
Intel® Xeon® processor code-named Ivy Bridge EP                12       24       256
Intel® Xeon® processor code-named Haswell EP                   18       36       256
Intel® Xeon Phi™ coprocessor Knights Corner                    61       244      512
Intel® Xeon Phi™ processor & coprocessor Knights Landing (1)   60+      240+     512

More cores, more threads, wider vectors
*Product specification for launched and shipped products available on ark.intel.com.
1. Not launched or in planning.
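To make "wider vectors" concrete, the core counts and SIMD widths listed above can be turned into lane counts. This is a minimal sketch of the arithmetic only, assuming 32-bit single-precision elements; the function names `simd_lanes` and `total_lanes` are illustrative and do not come from the slides.

```cpp
// Number of elements of `element_bits` width that fit in one SIMD register.
constexpr int simd_lanes(int simd_width_bits, int element_bits) {
    return simd_width_bits / element_bits;
}

// Elements that all cores together can process with one vector instruction
// each. This is a rough upper bound: it ignores clock speed, threads per
// core, memory bandwidth and instruction mix.
constexpr int total_lanes(int cores, int simd_width_bits, int element_bits) {
    return cores * simd_lanes(simd_width_bits, element_bits);
}

// Core counts and SIMD widths from the list above, 32-bit elements assumed.
static_assert(total_lanes(1, 128, 32) == 4, "64-bit Xeon: 1 core x 4 lanes");
static_assert(total_lanes(18, 256, 32) == 144, "Haswell EP: 18 cores x 8 lanes");
static_assert(total_lanes(61, 512, 32) == 976, "Knights Corner: 61 cores x 16 lanes");
```

Under these assumptions, code that stays scalar and single-threaded uses one lane out of 144 on the Haswell EP part, which is the gap that code modernization tries to close.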
How much potential lies untapped today?
[Chart: Binomial Options SP, options per second (higher is better; axis from 0 to 160,000), on successive Intel® Xeon® processors: 2007 X5472 (formerly codenamed Harpertown), 2009 X5570 (Nehalem), 2010 X5680 (Westmere), 2012 E5-2600 family (Sandy Bridge), 2013 E5-2600 v2 family (Ivy Bridge), 2014 E5-2600 v3 family (Haswell). Annotated gap: 179x.]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
[Chart legend: Vectorized and Scalar code, Parallelized and Single Thread]
Parallel + Vectorized is much faster than either one alone
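As a minimal sketch of what "parallel + vectorized" can look like in source code, the loop below uses the OpenMP 4.0 combined `parallel for simd` construct. The function name `dot` and the dot-product workload are illustrative assumptions; the chart itself measures a binomial options kernel, which is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Dot product written so that it can be both threaded and vectorized.
// With OpenMP enabled (for example -fopenmp or -qopenmp) the pragma splits
// the iterations across threads and requests SIMD code within each thread.
// Without OpenMP the pragma is ignored and the loop simply runs serially.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const float* pa = a.data();
    const float* pb = b.data();
    const long n = static_cast<long>(a.size());
    float sum = 0.0f;
    #pragma omp parallel for simd reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += pa[i] * pb[i];
    }
    return sum;
}
```

The `reduction` clause gives every thread and SIMD lane its own partial sum, so floating-point results may differ slightly from the serial order of additions.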
Performance Technologies - Parallelism on all Levels
SIMD: Vectorization (Core)
CORES: Multi-Threading (Multi-Core CPU, Many-Core CPU)
NODES: Messaging (Nodes + Fabric, CLUSTER)
Intel® Xeon Phi™ Product Family
Industry and User Momentum
Knights Corner: 1 TFLOPS (peak F.P.-DP)
Knights Landing: 3+ TFLOPS (peak F.P.-DP)
Knights Hill: 3rd Generation Intel® Xeon Phi™ Product Family, 2nd Generation Intel® Omni-Path Architecture, 10nm Process Technology
• Bootable processor
• On-package high BW memory
• Integrated Omni-Path fabric
• H2’15: First Commercial Systems
• >50 system providers expected (1), plus many more card based systems
• >100 PFLOPS customer system compute commits to-date (1)
• Intel® Xeon Phi™ Coprocessor Applications and Solutions Catalog
1. Intel internal estimate
Knights Landing
Next Generation Intel® Xeon Phi™
Server Processor
• 60+ cores
• 3+ TFLOPS DP peak, up to 3x single thread
• 2-D core mesh, SMP cache coherency
• Integrated PCIe 3.0
• Binary compatible with Intel® Xeon® processors
• Intel® Omni-Path Fabric
Memory: DDR4 and high performance on-package memory (high capacity, high bandwidth)
• up to 16GB
• ~5x STREAM performance over DDR4
• NUMA support, on-package
• in partnership with
Cores based on Intel® Atom™ (Silvermont) microarchitecture with HPC enhancements:
• 14nm
• AVX-512: 512-bit SIMD (VPU)
• 4 threads/core
• deep out-of-order buffers
• gather/scatter
• better branch prediction
• higher cache bandwidth
• . . .
Potential future options, subject to change without notice. Codenames. All timeframes, features, products and dates are preliminary forecasts and subject to change without further notification.
Intel HPC Midrange Roadmap
Forecast and Estimations, in Planning & Targets
Timeline: 2015, 2016, FUTURE
XEON® E5: Haswell-EP (22nm, up to 18c, AVX-2, DDR4, PCIe3); Broadwell-EP (14nm, AVX-2, DDR4, PCIe3); Future Xeon-EP (≤14nm)
XEON PHI™: Knights Corner (22nm Coprocessor, KNI, up to 61c, GDDR5, PCIe Card); Knights Landing (14nm, AVX-512, MCDRAM, DDR4, PCIe3, Omni-Path 1 on-package option); Knights Hill (10nm, Omni-Path 2); Future Knights
FABRIC: True Scale (40Gb, 80Gb Dual-Rail, PSM SW-Stack); Omni-Path Gen1 (100Gb/s, PSM SW-Stack, up to 48-port switches, Silicon Photonics); Omni-Path Gen2; Future Omni-Path
Not drawn to scale, for illustration only. Potential future options, subject to change without notice. Codenames, for illustration only.
All timeframes, features, products and dates are targets and preliminary forecasts and subject to change without further notification.
Standard-based Intel parallel programming models
Intel® Parallel Studio XE 2015: C/C++/Fortran - OpenMP - MPI
Single software architecture valid for all Intel hardware targets
Portable & Scalable Parallel Programming on a Higher Abstracted Level
Cluster: Process-Parallelism. Message Passing: MPI, IP-based
Node (Multicore, Many-Core): Thread/Task-Parallelism. Multi-Threading: OpenMP*, TBB, Cilk™ Plus, OpenCL, pthreads
SIMD (128b, 256b, 512b): Data-Parallelism. Vectorization: Automatic, Directives/Pragmas, Libraries
Tool suites shown: 2015 Professional Edition, 2015 Cluster Edition
(c) 2013 Jim Jeffers and James Reinders.
What are you developing software for?
Technical Computing & Performance
Sciences / HPC, Enterprise apps, Big Data, Servers / Clusters
Performance through Parallel Processing
• Vectorization
• Threading
• Message Passing
Video
Encoding / Decoding, Streaming
Performance through hardware acceleration
• MPEG4, etc.
• HEVC
Responsiveness
Cross-Platform Multimedia performance
• Android
• Windows
• OS X
Embedded System
Internet of Things
Hardware-based embedded programming
• BIOS/UEFI/FW
• Kernel/OS
• Drivers
• Embedded Applications
Web Multi-Platform
Cross Device – Multiple APP stores
• Mobile Apps
• HTML5 technology
• Write-once
Intel® Software Development Products
Technical Computing & Performance: application performance, scalability & reliability
Responsiveness: Immersive interactivity for multimedia apps
Embedded System: Fast, efficient embedded & mobile devices/systems
Web Multi-Platform: Deploy apps on multiple platforms using one codebase
Video: Video streaming performance
Intel® Parallel Studio XE 2016: What’s New
Launching Aug 25th 2015
Use One Software Architecture Today. Scale Forward Tomorrow.
Code: Compiler, Libraries, Parallel Models
Targets: Multicore (Multicore CPU), Many-core (Intel® MIC Architecture Co-processor), Cluster
Code Reusability
Faster Code Faster
Intel® Parallel Studio XE 2015
• Simplifies building, debugging and tuning parallel code
• Integrated C++ and Fortran tool suite
• Drops into development environment, e.g., Visual Studio*
• Windows*, Linux* & OS X*
Faster Code
• Performance without compromise through optimizations for current and future processors: Compilers, Libraries
• Profilers simplify tuning parallel code for best performance
Code Faster
• Compilers with high level parallelism features including OpenMP* 4.0
• Parallelism prototyping assistant
• Advanced parallel models and libraries, simple update with relink
• Graphical profilers visualize bottlenecks
• Memory, thread and MPI error checkers help remove errors
Intel® Parallel Studio XE 2016 Suites
Vectorization – Boost Performance By Utilizing Vector Instructions / Units
Intel® Advisor XE - Vectorization Advisor identifies new vectorization opportunities as well as improvements to existing vectorization and highlights them in your code. It makes actionable coding recommendations to boost performance and estimates the speedup.
Scalable MPI Analysis – Fast & Lightweight Analysis for 32K+ Ranks
Intel® Trace Analyzer and Collector adds an MPI Performance Snapshot feature for easy-to-use, scalable MPI statistics collection and analysis of large MPI jobs to identify areas for improvement.
Big Data Analytics – Easily Build IA Optimized Data Analytics Applications
Intel® Data Analytics Acceleration Library (DAAL) will help data scientists speed through big data challenges with optimized IA functions.
Standards – Scaling Development Efforts Forward
Intel® Compilers & performance libraries support the evolving industry standards: OpenMP, MPI, TBB, Fortran and C++.
Launching Aug 25th, 2015
Intel® Advisor XE - Vectorization Advisor
Data Driven Vectorization Design
Have you:
• Recompiled with AVX2, but seen little benefit?
• Wondered where to start adding vectorization?
• Recoded intrinsics for each new architecture?
• Struggled with cryptic compiler vectorization messages?
Breakthrough for vectorization design:
• What vectorization will pay off the most?
• What is blocking vectorization and why?
• Are my loops vector friendly?
• Will reorganizing data increase performance?
• Is it safe to just use pragma simd?
More Performance, Fewer Machine Dependencies
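Whether it is safe to add a SIMD pragma depends on whether loop iterations are independent. The sketch below contrasts the two cases using the OpenMP 4.0 `#pragma omp simd` directive; the function names `scale_add` and `prefix_sum` are illustrative assumptions and are not taken from the slides or from Intel® Advisor XE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: y[i] depends only on x[i] and y[i], so requesting
// SIMD execution cannot change the result.
void scale_add(std::vector<float>& y, const std::vector<float>& x, float a) {
    assert(x.size() == y.size());
    float* py = y.data();
    const float* px = x.data();
    const long n = static_cast<long>(y.size());
    #pragma omp simd
    for (long i = 0; i < n; ++i) {
        py[i] += a * px[i];
    }
}

// Loop-carried dependency: v[i] needs the value of v[i - 1] computed in the
// previous iteration. Forcing SIMD here with a pragma would be unsafe, so
// the loop is deliberately left without one.
void prefix_sum(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}
```

A dependency analysis answers exactly this question for loops where the access pattern is less obvious than in these two examples.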
Intel® Advisor XE – Vectorization Advisor
Provides the data you need for high impact vectorization
Compiler diagnostics + Performance Data = All the data you need in one place
• Find “hot” un-vectorized or “under vectorized” loops.
• Trip counts
Recommendations – How do I fix it?
Correctness via dependency analysis
• Is it safe to vectorize?
Memory Access Patterns analysis
• Unit stride vs Non-unit stride access, Unaligned memory access, etc.
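The difference between unit-stride and non-unit-stride access often comes from the data layout. The following sketch compares an array-of-structures layout with a structure-of-arrays layout for the same loop; the particle types and function names are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Array of structures: consecutive x values are sizeof(ParticleAoS) bytes
// apart, so a loop over x alone has a non-unit stride.
struct ParticleAoS {
    float x, y, z, mass;
};

float sum_x_aos(const std::vector<ParticleAoS>& p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.size(); ++i) {
        sum += p[i].x;  // stride of 4 floats between accesses
    }
    return sum;
}

// Structure of arrays: each field is contiguous, so the same loop reads
// memory with unit stride, the pattern that vector loads handle best.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

float sum_x_soa(const ParticlesSoA& p) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        sum += p.x[i];  // stride of 1 float between accesses
    }
    return sum;
}
```

Both functions return the same value; only the memory access pattern differs, which is the kind of information a memory access patterns analysis reports.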
Data Driven Threading Design
Intel® Advisor XE – Thread Prototyping
Have you:
• Tried threading an app, but seen little performance benefit?
• Hit a “scalability barrier”? Performance gains level off as you add cores?
• Delayed a release that adds threading because of synchronization errors?
Breakthrough for threading design:
• Quickly prototype multiple options
• Project scaling on larger systems
• Find synchronization errors before implementing threading
• Separate design and implementation - Design without disrupting development
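One common way to reason about gains that level off as cores are added is Amdahl's law. The slides do not name it, so treating it as the model here is an assumption: if a fraction p of the single-core runtime parallelizes perfectly, the speedup on n cores is 1 / ((1 - p) + p / n). The function name `amdahl_speedup` is illustrative.

```cpp
#include <cassert>

// Amdahl's law: speedup on `cores` cores when a fraction `parallel_fraction`
// of the single-core runtime can be parallelized perfectly and the rest
// stays serial.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

With 90% of the runtime parallelized, this model gives a speedup of about 4.7 on 8 cores and about 8.8 on 64 cores, and it can never exceed 10, which is one way a "scalability barrier" shows up.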
Add Parallelism with Less Effort, Less Risk and More Impact
http://intel.ly/advisor-xe
Part of Intel® Parallel Studio
For Windows* and Linux*
From $1,599
“Intel® Advisor XE has allowed us to quickly prototype ideas for parallelism, saving developer time and effort”
Simon Hammond, Senior Technical Staff, Sandia National Laboratories
Compiler diagnostics + Performance Data
Find “hot” un-vectorized or “under vectorized” loops
All of the information you require to vectorize is available on one screen!
Gives estimated expected gain!
Gain estimates – Gives recommendations and the gain you can expect by using a different vector instruction or rewriting the control flow of your program.
Convince the compiler to vectorize
Unvectorized loops / “under vectorized” loops
• Assumed dependencies
• Control structures preventing vectorization.
• Rewrite loops to vectorize – remove conditions, breaks and returns and many other techniques.
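As an illustration of removing a `break`, the sketch below rewrites an early-exit search as a reduction. It is one possible rewrite under the assumption that scanning the whole array is acceptable; the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Original form: the early `break` makes the trip count depend on the data,
// which commonly prevents the compiler from vectorizing the loop.
bool has_negative_break(const std::vector<float>& v) {
    bool found = false;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] < 0.0f) {
            found = true;
            break;
        }
    }
    return found;
}

// Rewritten form: every iteration runs, and the condition becomes a 0/1
// value that is accumulated with a reduction, so there is no early exit.
bool has_negative_reduction(const std::vector<float>& v) {
    const float* p = v.data();
    const long n = static_cast<long>(v.size());
    int count = 0;
    #pragma omp simd reduction(+ : count)
    for (long i = 0; i < n; ++i) {
        count += (p[i] < 0.0f) ? 1 : 0;
    }
    return count > 0;
}
```

The rewritten loop always touches every element, so it only pays off when the vectorized full scan is cheaper than the scalar scan that stops early.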
Summary: Vector Advisor
4 Analysis Features for Efficient Vectorization
1. Compiler diagnostics with Performance Data
2. Recommendations on how to improve vectorization
3. Correctness Dependency Analysis
4. Memory Access Patterns Analysis
Intel® Advisor XE is part of Intel® Parallel Studio XE
Intel® Parallel Studio XE 2015 Composer Edition:
• Intel® C++ Compiler
• Intel® Fortran Compiler
• Intel® Threading Building Blocks
• Intel® Integrated Performance Primitives
• Intel® Math Kernel Library
• Intel® Cilk™ Plus
• Intel® OpenMP*
Intel® Parallel Studio XE 2015 Professional Edition: all Composer Edition components, plus
• Intel® Advisor XE
• Intel® Inspector XE
• Intel® VTune™ Amplifier XE
Intel® Parallel Studio XE 2015 Cluster Edition: all Professional Edition components, plus
• Intel® MPI Library
• Intel® Trace Analyzer and Collector
For more information: http://intel.ly/perf-tools
MPI Performance Snapshot
• Lightweight – Low overhead profiling for 32K+ Ranks
• Scalability – Performance variation at scale can be detected sooner
• Identifying Key Metrics – Shows PAPI counters and MPI/OpenMP imbalances
Intel® Cluster Checker 3.0 – What’s New
[Diagram labels: Data Collectors, Cluster Database, Expert System (Diagnostic Data Analysis, Checking for Issues, Suggesting Remedies), Results]
Provides Assistance
• Cluster Health Checks (on-demand, background)
• Diagnoses and remedies for common issues
• Compliance with Intel® Cluster Ready spec
Simplifies Cluster Computing Platforms
• Reduces need for specialized expertise
• Enables cluster health checks by applications
• Extensible and customizable, API
What is Intel DAAL?
New library targeting data analytics market
• Customers: analytics solution providers, system integrators, and application developers (FSI, Telco, Retail, Grid, etc.)
• Key benefits: improved time-to-value, forward-scaling performance and parallelism on IA, advanced analytics building blocks
Key features
• Building blocks highly optimized for IA to support all data analysis stages.
• Supports batch, streaming, and distributed processing with easy connectors to popular platforms (Hadoop, Spark) and tools (R, Python, Matlab).
• Flexible interfaces for handling different data sources (CSV, MySQL, HDFS, RDD (Spark)).
• Rich set of operations to handle sparse and noisy data.
• C++ and Java APIs.
6 releases of Tech Preview in 2014.
First Beta in Feb’15. First gold release in Aug’15.
Analysis
• PCA
• Variance-Covariance Matrix
• Distances
• Matrix decompositions (SVD, QR, Cholesky)
• EM for GMM
• Uni-/multi-variate outlier detection
• Statistical moments
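To show what computing statistical moments in a streaming fashion involves, here is a minimal sketch in plain C++ using Welford's algorithm. It is not the Intel DAAL API, and the class name `RunningMoments` and its methods are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running mean and variance (Welford's algorithm). Data can arrive in
// chunks, so the full data set never has to be in memory at once.
class RunningMoments {
public:
    // Fold one chunk of observations into the running statistics.
    void update(const std::vector<double>& chunk) {
        for (double x : chunk) {
            ++count_;
            const double delta = x - mean_;
            mean_ += delta / static_cast<double>(count_);
            m2_ += delta * (x - mean_);
        }
    }

    std::size_t count() const { return count_; }
    double mean() const { return mean_; }

    // Sample variance (divides by n - 1); requires at least two values.
    double variance() const {
        assert(count_ >= 2);
        return m2_ / static_cast<double>(count_ - 1);
    }

private:
    std::size_t count_ = 0;
    double mean_ = 0.0;
    double m2_ = 0.0;  // sum of squared deviations from the current mean
};
```

Calling `update` once per incoming block gives the same mean and variance as a single pass over all the data, which is the property a streaming processing mode relies on.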
Machine learning
• Linear regression
• Apriori
• K-Means clustering
• Naïve Bayes
• LogitBoost, BrownBoost, AdaBoost
• SVM
Intel® Data Analytics Acceleration Library – a C++ and Java API library of optimized analytics building blocks for all data analysis stages, from data acquisition to data mining and machine learning. Essential for engineering high performance Big Data applications.
Important features offered in the initial Beta
• Data layouts: AOS, SOA, homogeneous, CSR
• Data sources: csv, MySQL, HDFS/RDD
• Compression/decompression: ZLIB, LZO, RLE, BZIP2
• Serialization/deserialization
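Of the data layouts listed, CSR (compressed sparse row) is the least self-explanatory. The sketch below shows the general CSR idea in plain C++ together with a matrix-vector product; it does not use Intel DAAL or Intel MKL types, and the names `CsrMatrix` and `multiply` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse row (CSR) matrix: only non-zero values are stored.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<double> values;        // non-zero values, row by row
    std::vector<std::size_t> col_idx;  // column of each stored value
    std::vector<std::size_t> row_ptr;  // row i occupies [row_ptr[i], row_ptr[i + 1])
};

// y = A * x for a CSR matrix A.
std::vector<double> multiply(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.values.size() == a.col_idx.size());
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            y[i] += a.values[k] * x[a.col_idx[k]];
        }
    }
    return y;
}
```

For the 3x3 matrix with rows (1, 0, 2), (0, 0, 3) and (4, 5, 0), the stored arrays are `values = {1, 2, 3, 4, 5}`, `col_idx = {0, 2, 2, 0, 1}` and `row_ptr = {0, 2, 3, 5}`.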
Data Processing
Optimized analytics building blocks for all data analysis stages, from data acquisition to data mining and machine learning.
Data Modeling
Data structures for model representation, and operations to derive model-based predictions and conclusions.
Data Management
Interfaces for data representation and access. Connectors to a variety of data sources and data formats, such as HDFS, SQL, CSV, ARFF, and user-defined data source/format.
Components shown: Data Sources, Numeric Tables, Outliers Detection, Compression / Decompression, Serialization / Deserialization
Data Analytics in the Age of Big Data
Problem: Big data needs high performance computing. Many big data applications leave performance on the table because they are not optimized for the underlying hardware.
Solution: A performance library provides building blocks that are easily integrated into big data analytics workflows.
[Figure labels: Volume, Velocity, Variety, Value]
Intel® Data Analytics Acceleration Library
An industry leading end-to-end IA-based data analytics acceleration library of fundamental algorithms covering all data analysis stages.
Data analysis stages: Pre-processing, Transformation, Analysis, Modeling, Validation, Decision Making
Algorithms shown:
• (De-)Compression, Outlier detection
• PCA, Statistical moments, Var-Covar matrix, Matrix decompositions, Apriori, K-Means Clustering, EM for GMM
• Linear regression, Decision trees, Naïve Bayes, Multi-Class SVM, Boosting
Data domains: Scientific/Engineering, Web/Social, Business
Who Should use Intel DAAL?
• Software developers who need optimized implementations of fundamental numerical algorithms in their analytics applications, but do not have the resources or expertise to do the optimizations themselves.
• Data scientists who build and execute math models for domain-specific knowledge discovery, and need to speed up the performance-critical parts of their models.
• Data analytics ISVs who want to gain competitive advantages by making their solutions run faster on Intel architectures.
• Big Data system integrators who want to beef up their product portfolio by providing performance-enhanced alternatives to popular open-source analytics tools.
What Are We Releasing?
Intel DAAL 2016 Beta
• Available to selected partners in Feb 2015.
• Public beta starting in April 2015.
Intel DAAL 2016 product release
• Available in Q3 2015.
• Supports IA-32 and Intel64 architectures.
• C++, Java APIs.
• Static and dynamic linking.
• A standalone library, and also bundled in Intel PSXE Cluster Edition 2016.
Note: Bundled version is not available on OS X*.
Intel® C/C++ and Fortran Compilers 16.0
What’s New
• More of C++14: generic lambdas, member initializers and aggregates
• More of C11: _Static_assert, _Generic, _Noreturn, and more
• OpenMP 4.0: C++ User Defined Reductions, Fortran Array Reductions
• OpenMP 4.1: asynchronous offloading, simdlen, simd ordered
• F2008: Submodules, Impure Elemental Functions
• F2015: TYPE(*), DIMENSION(..), RANK intrinsic, attributes for args with BIND
• Significant improvement in alignment analysis, vectorization robustness
• Much improved Neighboring Gather optimization
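Two of the listed language features can be combined in one short sketch: a C++14 aggregate with default member initializers, and an OpenMP 4.0 user-defined reduction over that type. The type `MinMax` and the functions `merge` and `find_min_max` are hypothetical names, and the code is a generic illustration rather than anything specific to the Intel compilers.

```cpp
#include <limits>
#include <vector>

// Aggregate with default member initializers. Brace-initializing such a
// type, e.g. MinMax{1.0, 2.0}, is allowed from C++14 onward.
struct MinMax {
    double lo = std::numeric_limits<double>::infinity();
    double hi = -std::numeric_limits<double>::infinity();
};

inline MinMax merge(const MinMax& a, const MinMax& b) {
    return MinMax{a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

// OpenMP 4.0 user-defined reduction: tells the runtime how to combine the
// per-thread MinMax copies. No initializer clause is given, so each private
// copy is default-initialized, which applies the member initializers above.
// The pragma is ignored when OpenMP is disabled.
#pragma omp declare reduction(minmax : MinMax : omp_out = merge(omp_out, omp_in))

MinMax find_min_max(const std::vector<double>& v) {
    const double* p = v.data();
    const long n = static_cast<long>(v.size());
    MinMax result;
    #pragma omp parallel for reduction(minmax : result)
    for (long i = 0; i < n; ++i) {
        result = merge(result, MinMax{p[i], p[i]});
    }
    return result;
}
```

Without the `declare reduction` line, a threaded min/max over a struct would need a critical section or manual per-thread storage; the user-defined reduction keeps the loop body identical to the serial version.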
Intel® MKL 11.3
What’s New
Additional Sparse Matrix Vector Multiplication API
• New two-stage API for Sparse BLAS level 2 and 3 routines
MKL MPI wrappers
• All MPI implementations are API-compatible, but MPI implementations are not ABI-compatible
• The MKL MPI wrapper solves this problem by providing an MPI-independent ABI to MKL
Support for Small Matrix multiplication
• A single call executes independent ?GEMM operations simultaneously
Support for Philox4x32-10 and ARS5 RNG
• Two new pseudorandom number generators with a period of 2^128, highly optimized for multithreaded environments
Sparse Solver SMP improvements
• Significantly improved overall scalability for Intel Xeon Phi coprocessors and scalability of the solving step for Intel Xeon processors
Intel® VTune™ Amplifier XE 2016 Beta
What’s New
New OS and IDE support: Visual Studio* 2015 & Windows* 10 Threshold
GPU profiling
• GPU Architecture Annotation Diagram
• GPU profiling on Linux (OpenCL, Media SDK)
Microarchitecture tuning
• General Exploration analysis with confidence indication
• Driverless ‘perf’ EBS with stacks
Intel® VTune™ Amplifier XE 2016 Beta
Improved Hybrid Support
Intel OpenMP analysis enhancements
• Precise trace-based imbalance calculation that is especially useful for profiling of small region instances
• Classification and issue highlighting of potential gains, e.g., imbalance, lock contention, creation overhead, etc.
• Detailed analysis of barrier-to-barrier region segments
MPI+OpenMP: multi-rank analysis on a compute node
• Per-rank OpenMP potential gain and serial time metrics
• Per-rank Intel MPI communication busy wait time detection
Recommended books
• High performance parallelism pearls: multi-core and many-core approaches, by James Reinders and Jim Jeffers, Morgan Kaufmann, 2014
• Introduction to high-performance scientific computing (2nd edition), by Victor Eijkhout, Lulu, 2015
• Introduction to high performance computing for scientists and engineers, by Georg Hager and Gerhard Wellein, CRC Press, 2011
• Parallel programming with Intel® Parallel Studio XE, by Stephen Blair-Chappell and Andrew Stokes, Wrox Press, 2012
Thank you!