High Performance Computing | Systems and Technology Group
Scalability issues: HPC Applications & Performance Tools
Chiranjib Sur, HPC @ India Systems and Technology Lab
chiranjib.sur@in.ibm.com
Top 500: Some statistics
Top 500 – Domains

[Charts: Top500 systems and Top500 performance, broken down by application domain. Source: www.top500.org]
Laboratory astrophysics – a computational snapshot

Laboratory astrophysics: multi-phased, multi-level, massive computation. A computational challenge!
Massive parallelism required.
Scalability challenges – different aspects

[Diagram: the aspects of scalable HPC, with performance analysis tools at the centre]
- Parallel algorithm
- Parallel language
- Hardware architecture, threading, I/O
- Interconnects
- OS & parallel environment
- Compilers, optimization & debuggers
- Scalable parallel file system
- Performance analysis tools – a single place to go!
Scalable High Performance Computing
- High throughput
- Sustained performance
High PERFORMANCE or High THROUGHPUT

Amdahl's law
- If the serial component remains proportionately equal, there is no inherent speedup!
- Example: with a 30% serial / 70% parallel split, accelerating the parallel component 50x gives a maximum speedup of 3.25x.
- http://en.wikipedia.org/wiki/Amdahl's_law

Gustafson's law
- If the serial component shrinks in size as the problem scales, there is opportunity for speedup!
- Example: with a 5% serial / 95% parallel split, accelerating the parallel component 50x gives a maximum speedup of 18.26x.
- http://en.wikipedia.org/wiki/Gustafson's_Law
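For reference, compact forms of the two laws in the usual notation (the notation is an assumption here, not taken from the slides): s is the serial fraction, k the factor by which the parallel part is accelerated, and N the scaled number of processors.

S_{\text{Amdahl}}(k) = \frac{1}{s + (1 - s)/k}, \qquad S_{\text{Gustafson}}(N) = s + (1 - s)\,N

Amdahl's denominator is bounded below by s, so speedup saturates at 1/s; Gustafson's form keeps growing with N because the serial share of the (growing) total work shrinks.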
Parametrization of Scalability

T_p = T_s / p + T_{Oh}(p)

where T_p = parallel execution time, T_s = serial execution time, T_{Oh}(p) = overhead on p processors.
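Speedup and efficiency follow directly from this parametrization (standard definitions, assumed here):

S(p) = \frac{T_s}{T_p} = \frac{p}{1 + p\,T_{Oh}(p)/T_s}, \qquad E(p) = \frac{S(p)}{p}

so scalability is governed by how quickly the overhead T_{Oh}(p) grows relative to the shrinking compute time T_s/p.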
Scalability – algorithm / programming languages

Parallel algorithm
- Most legacy codes are not designed to run in parallel
- Most are not designed to exploit modern-day HPC architectures

Parallel languages
- Legacy codes contain language-version-specific syntax (e.g. no dynamic memory in FORTRAN 77)
- Old codes need major revision to use modern features, e.g. handling of large arrays (see the sketch below)
- It is not so easy to re-write old codes in new languages like X10, UPC etc.
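A minimal C sketch of the large-array point, with hypothetical names and sizes: legacy codes typically bake maximum problem sizes in at compile time (as FORTRAN 77 forces one to), whereas a revised code sizes its arrays at run time.

#include <stdio.h>
#include <stdlib.h>

#define NMAX 1024                  /* legacy style: limit fixed at compile time */
static double legacy_grid[NMAX];   /* either wastes memory or is too small */

int main(void) {
    int n = 1 << 20;               /* real problem size, known only at run time */
    double *grid = malloc((size_t)n * sizeof *grid);  /* modern: sized on demand */
    if (grid == NULL) { perror("malloc"); return 1; }
    for (int i = 0; i < n; ++i)
        grid[i] = 0.0;
    legacy_grid[0] = grid[0];      /* the legacy array tops out at NMAX cells */
    printf("allocated %d cells; legacy limit was %d\n", n, NMAX);
    free(grid);
    return 0;
}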
Legacy code – Algorithm – a case study

[Three slides of case-study material, shown as figures in the original deck]
Scalability – computing platform

Hardware – scaling OUT or scaling UP?
Courtesy: Thomas Dunning, http://www.ncsa.illinois.edu/BlueWaters
Scalability – computing platform

Hardware – what to look for? How to look for it?
Hardware Thread Management
- Use of multiple lightweight concurrent threads (see the OpenMP sketch below)
- Less switching overhead
- Addresses the issue of instruction and memory latency

Threading – Random Access to Global Memory
- Any thread can read/write any location(s)
- Synchronization with the system software
- Monolithic threads vs. blocks (smaller in size) of threads

On-Chip Shared Memory
- Efficient management of data in cache
- Efficient thread communication / cooperation within blocks
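A minimal OpenMP sketch in C of the lightweight-thread idea (the array size and chunk size are arbitrary choices here): many threads share one loop, and a dynamic schedule in small chunks lets runnable threads proceed while others stall on memory.

#include <omp.h>
#include <stdio.h>

#define N (1 << 24)
static double a[N];

int main(void) {
    /* Many lightweight threads cooperate on one loop; small dynamic
       chunks help hide per-thread memory latency behind other threads. */
    #pragma omp parallel for schedule(dynamic, 4096)
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * (double)i;

    printf("max threads: %d, a[N-1] = %.1f\n",
           omp_get_max_threads(), a[N - 1]);
    return 0;
}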
Scalability – system software

[Diagram: the IBM Parallel Environment software stack, spanning user space and kernel space]
- APPLICATION: MPI; C, C++; Fortran (77, 95); OpenMP; ESSL / Parallel ESSL; MASS; UPC; CAF; SHMEM; GSM
- Tools: Eclipse PTP framework, POE runtime, parallel debugger, HPCS Toolkit, Eclipse tools
- LAPI – reliable FIFO, RDMA, striping, failover/recovery, checkpoint/restart, pre-emption, user-space statistics, multi-protocol, scalability
- PNSD / NRT debug/comm infrastructure; GSM infrastructure
- Sockets: TCP, UDP, IP; multi-link, super-packet support
- HAL – AIX & Linux; AIX & Linux Verbs
- GPFS – NSD, VDISK (scalable parallel file system)
- LL / Resource Manager – pre-emption, C/R, xCAT
- Operating systems: AIX / Linux
- Network(s), network adapter(s) – HFI, IB
- Hardware platforms: pSeries / xSeries
Scalability – System Software stack

OS and Parallel Environment

Compilers (www.ibm.com/software/awdtools/fortran/xlfortran/library)
- Five distinct optimization levels plus many additional options
- Code generation and tuning for specific hardware chipsets
- Interprocedural optimization and inlining using IPA
- Profile-directed feedback (PDF) optimization
- User-directed optimization with directives and source-level intrinsic functions (see the sketch below)
- Optimization of OpenMP programs and auto-parallelization capabilities to exploit SMP systems
- Automatic parallelization of calculations using vector machine instructions and high-performance mathematical libraries
- ... and more
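A small C sketch (hypothetical kernel, not from the deck) of the user-directed-optimization point: restrict-qualified pointers tell the compiler the arrays do not alias, which enables vector instructions, and the OpenMP directive exposes the loop for SMP parallelization.

#include <stdio.h>
#include <stdlib.h>

/* y <- a*x + y: restrict promises no aliasing, so the compiler may vectorize. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y) {
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1000000 };
    double *x = malloc(N * sizeof *x), *y = malloc(N * sizeof *y);
    if (x == NULL || y == NULL) return 1;
    for (size_t i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }
    daxpy(N, 3.0, x, y);
    printf("y[0] = %.1f\n", y[0]);   /* expect 5.0 */
    free(x); free(y);
    return 0;
}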
[Chart: performance (Mflops/sec) vs. compiler optimization level O1–O4]
Scalability – System Software stack

Parallel Environment – what next?
- Memory – using Remote Direct Memory Access (RDMA)
- Interconnects – RDMA with a proper interconnect (see the MPI sketch below)
- Parallel tuned library – customized
  http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.pe432.opuse1.doc%2Fam102_scalaperf.html
- Data-intensive / task-intensive computing – combining massive data parallelism and instruction-level parallelism – a heterogeneous model?
- Next generation – MPI 3 ..?
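A minimal MPI one-sided sketch in C of what RDMA-style communication looks like at the programming level (ranks and values here are arbitrary): each rank writes its value directly into rank 0's exposed window with MPI_Put, which an RDMA-capable interconnect can satisfy without involving the target CPU. Whether a given run actually uses RDMA depends on the interconnect and MPI library, not on this code.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Every rank exposes a small table; rank 0's copy collects the data. */
    double *table = calloc((size_t)nprocs, sizeof *table);
    MPI_Win win;
    MPI_Win_create(table, (MPI_Aint)(nprocs * sizeof *table),
                   (int)sizeof *table, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    double val = 100.0 + rank;
    /* One-sided write into slot 'rank' of rank 0's window. */
    MPI_Put(&val, 1, MPI_DOUBLE, 0, (MPI_Aint)rank, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        for (int i = 0; i < nprocs; ++i)
            printf("slot %d = %.1f\n", i, table[i]);

    MPI_Win_free(&win);
    free(table);
    MPI_Finalize();
    return 0;
}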
The Computing cycle
The Performance Pie

Performance Dimensions:
- CPU performance
- MPI performance
- Threading performance
- I/O performance
Scalability – Performance Tools
- What is this tool all about? – more in the next few sessions
- What can we do with a tool like this?
- Which programming languages? – FORTRAN, C, C++ ...
- Which platforms can we use? – the entire range of the IBM HPC hardware portfolio
- Which operating systems? – AIX & Linux
- What do we mean by Scalable Tools?
Performance analysis in a nutshell – IBM HPC Toolkit
[Diagram: IBM HPC Toolkit components]
- HPM – hardware performance monitoring
- MPI – profiling MPI calls
- OpenMP – profiling OpenMP directives
- MIO – I/O analysis and optimization
- Visualization – Eclipse plug-in, PeekPerf, Xprof
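A minimal sketch of the mechanism MPI profilers generally rely on, not the toolkit's actual implementation: the standard PMPI name-shift interface lets a wrapper library intercept MPI_Send, record a statistic, and forward to the real entry point.

#include <mpi.h>
#include <stdio.h>

static long send_count = 0;            /* statistic gathered by the wrapper */

/* Our definition shadows the library's; the real entry point is PMPI_Send.
   (MPI-3 prototype shown; older MPIs declare buf as plain void *.) */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm) {
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    double t1 = MPI_Wtime();
    fprintf(stderr, "MPI_Send #%ld: %d item(s) to rank %d in %.6f s\n",
            ++send_count, count, dest, t1 - t0);
    return rc;
}

Linked ahead of the MPI library, every MPI_Send in the application then passes through this wrapper with no source changes.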
Scalability – Performance tools

[Chart: NPB 3.3 – Fourier Transform – Class A: execution time vs. number of procs (2–32), non-instrumented vs. instrumented]
Scalability – case studies: Timing and overhead

[Chart: Timing – ft.A: execution time, initialization time and overhead vs. number of procs (2–512)]
Scalability – case studies: MPI communication

[Chart: MPI All-to-All communication – ft.A: data transferred (bytes) vs. number of procs (2–128)]
Scalability – case studies: Hardware & I/O

[Chart: Average communication time (MPI) – ft.A: time (s) vs. number of procs (2–32)]
[Chart: Number of page faults without I/O – ft.A: page faults vs. number of procs (2–64)]
[Chart: Context switches – ft.A: context switches vs. number of procs (2–64)]
Summary: Performance analysis and next ...
- What can we do now?
- What do we need?
- What are we planning to do?
Next few talks ...
- Today
- Tomorrow
The team working on performance tools @ IBM
Pidad, Aditya, Praful, Servesh, Dave, John, Chiranjib