From the latency to the throughput age
Prof. Jesús Labarta
Director, Computer Science Dept (BSC)
UPC
ETP4HPC Post-H2020 HPC Vision
Frankfurt, June 24th 2018
To exascale ... and beyond
Vision
The multicore and memory revolution
– ISA leak …
– Plethora of architectures
  • Heterogeneity
  • Memory hierarchies
Complexity + variability = Divergence
– Between our mental models and actual system behavior
Applications
ISA / API
The power wall made us go multicore and made the ISA interface leak: our world is shaking
What do programmers need ? HOPE !!!
Vision
• … similar effect at system level/coarse grain
• Plethora of architectures
  • Heterogeneity
  • Memory hierarchies
• New usage practices
  • Online simulation, analytics and visualization
  • Interactive supercomputing, response time
  • Value based computing
  • Urgent computing
• Important
  • Integration of concurrency and data
  • Dynamic resource sharing
[Figure labels: data1, Simulation1, Simul2, data2]
BSC vision. BDEC. Fukuoka. Feb 2014
Evolution vs. revolution
• Revolutions
  • Change of mindset (before / after)
• Do we think outside the box ?
Do we think outside the box ?
• Very strong walls in the HPC box !!!
• Sometimes we try to blow them up
• But the walls are in our mind !!!
Do we think outside the box ?
• We do (I may be exaggerating … or maybe not that much)
  • Proudly show the performance we achieve and not the code we write
  • Use variables about resources (cores, GPUs)
    • omp_get_num_threads(), …
  • Run sequences of jobs with 5K cores because each of them takes 20% less time than with 2K cores
  • Believe that overlap == changing sends to isends or using one-sided calls
  • Burn millions of hours to estimate a good configuration
  • Integrate simulation, analytics, visualization in a single MPI binary
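The 5K-versus-2K-cores habit listed above can be checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: the jobs are independent, the allocation is a fixed pool of 10,000 cores, and one job takes 1 hour on 2K cores. None of these figures come from the slides; only the "20% less time" ratio does. The function name `jobs_per_hour` is hypothetical.

```python
def jobs_per_hour(total_cores, cores_per_job, hours_per_job):
    """Throughput when as many jobs as fit run side by side on a fixed pool of cores."""
    concurrent_jobs = total_cores // cores_per_job
    return concurrent_jobs / hours_per_job


# Assumptions for illustration only (not taken from the slides).
TOTAL_CORES = 10_000   # size of the fixed allocation
BASE_HOURS = 1.0       # time of one job on 2K cores

# From the slide: a job on 5K cores takes 20% less time than on 2K cores.
throughput_2k = jobs_per_hour(TOTAL_CORES, 2_000, BASE_HOURS)
throughput_5k = jobs_per_hour(TOTAL_CORES, 5_000, 0.8 * BASE_HOURS)

# Cost of a single job in core-hours for each configuration.
core_hours_2k = 2_000 * BASE_HOURS
core_hours_5k = 5_000 * 0.8 * BASE_HOURS

print(f"2K-core jobs: {throughput_2k:.2f} jobs/hour, {core_hours_2k:.0f} core-hours each")
print(f"5K-core jobs: {throughput_5k:.2f} jobs/hour, {core_hours_5k:.0f} core-hours each")
```

Under these assumptions each job finishes 20% sooner on 5K cores, but it costs twice the core-hours and the pool completes half as many jobs per hour. That is the latency-versus-throughput trade-off in miniature.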
Do we think outside the box ?
• Do we ?
  • Interleave processes ?
  • Think of using MPI + OpenMP with just 1 OpenMP thread ?
  • Share nodes among jobs ?
  • Serialize (and overlap) reductions ?
  • Taskify MPI calls to allow their out-of-order execution ?
  • Spawn packing and unpacking tasks to allow for fast draining of incoming messages by the main process ?
  • Parallelize packing and unpacking of messages ? Depending on message size ?
Do we think outside the box ?
• Why?
  • Follow “recommended best practices”
  • Never thought of it ?
  • Some bad experience: never again
  • I can do it better !!!!!
  • Dazzled by performance !!
All about the mindset
• The real parallel programming revolution
  • … is in the mindset of programmers
    • From the latency to the throughput age !!!
  • … and can/should be achieved productively
    • Incrementally
    • On a standard programming model/language (MPI+OpenMP, Python, …)
• Real revolution, real effort
  • Issue everywhere. At home first.
  • Shape minds vs. reshape minds
Key aspects
• Actual behavior/Performance analysis
  • Avoid flying blind !!
  • Towards insight and understanding of fundamental issues
  • For application & system developers
• Programming practices and models
  • Decouple programmer from machine
    • Programs to convey ideas to humans … that happen to be executable by machines
  • Enable productive/evolutionary/composable approaches
  • Can we avoid/contain the complexity explosion ?
  • Dynamic resource sharing
Behavior awareness
• A common language about fundamental issues
• Evolution of bottlenecks
• Methodology
• 195 studies:
  • ~25% industry
  • Awareness
  • Opportunity to improve
    • And examples how
  • Co-design input
Behavior awareness
Tracking scaling behavior of computation regions (strong scaling MPI+OpenMP example)
Behavior awareness
• Coupled codes
  • Multiple physics, domains
  • Compute & I/O
[Figure: EC-EARTH trace (26.7 MB), 1600 cores, 2.5 s; Atmosphere and Ocean components. Eff: 0.43; LB: 0.52; Comm: 0.81]
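The three numbers reported for the EC-EARTH trace are consistent with a multiplicative efficiency model, in which parallel efficiency is roughly load balance times communication efficiency (0.52 × 0.81 ≈ 0.42, against the reported 0.43, presumably a rounding difference). The slides do not say how the metrics were computed, so the following is a minimal sketch under that assumption. It takes per-process "useful computation" times and the total runtime as inputs. The function name and the sample data are hypothetical.

```python
def efficiency_metrics(useful_times, runtime):
    """Return (parallel efficiency, load balance, communication efficiency).

    useful_times: useful computation time of each process (same unit as runtime)
    runtime:      elapsed time of the whole run
    Assumed model: efficiency = load balance * communication efficiency.
    """
    avg_useful = sum(useful_times) / len(useful_times)
    max_useful = max(useful_times)
    load_balance = avg_useful / max_useful      # 1.0 means perfectly balanced work
    comm_efficiency = max_useful / runtime      # 1.0 means no time lost outside computation
    return load_balance * comm_efficiency, load_balance, comm_efficiency


# Made-up data for four processes; not taken from the EC-EARTH trace.
useful = [1.0, 2.0, 3.0, 2.0]
eff, lb, comm = efficiency_metrics(useful, runtime=4.0)
print(f"Eff: {eff:.2f}; LB: {lb:.2f}; Comm: {comm:.2f}")
```

Splitting a single efficiency number into factors like these is one way to get the "common language about fundamental issues" mentioned earlier: a low LB points at work distribution, while a low Comm points at time lost in communication and synchronization.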
Vision in the programming revolution
ISA / API
Applications
Power to the runtime
PM: High-level, clean, abstract interface
General purpose
Decouple
Forget about resources
Minimal & sufficient permeability?
Intelligence & resource management
“Reuse & expand” old architectural ideas under new constraints
Vision in the programming revolution
ISA / API
Special purpose
Must be easy to develop/maintain
Fast prototyping
Applications
Power to the runtime
PM: High-level, clean, abstract interface
DSL1 DSL2 DSL3
Integrate concurrency and data
• Single mechanism
• Concurrency
  • Dependences built from data accesses
  • Lookahead: about instantiating work
• Locality & data management
  • From data accesses
Task based parallel programming
Task based parallel programming
• Some important features
  • Dependences, Lookahead
  • Taskloops
  • Nesting
  • Array sections / Regions
  • Exploiting malleability:
    • Dynamic Load Balance (DLB)
    • Within App, across apps
  • MPI+OpenMP interoperability
• Think global, specify local
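"Dependences built from data accesses" can be illustrated with a toy runtime. This is a sketch in the spirit of OpenMP `depend(in/out)` clauses, not the BSC runtime or any real library API: the class `TaskRuntime`, its `submit` method and the task names are all hypothetical. Each task declares what it reads (`ins`) and writes (`outs`); the runtime derives the ordering edges from those declarations and lets everything else run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskRuntime:
    """Toy runtime (hypothetical): ordering comes only from declared data accesses."""

    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.last_writer = {}   # datum -> future of the last task that writes it
        self.readers = {}       # datum -> futures of tasks reading it since that write
        self.names = {}         # future -> task name (for inspection only)
        self.edges = []         # (predecessor, successor) pairs actually created

    def submit(self, name, fn, ins=(), outs=()):
        deps = []
        for d in ins:                                # read-after-write
            if d in self.last_writer:
                deps.append(self.last_writer[d])
        for d in outs:                               # write-after-write, write-after-read
            if d in self.last_writer:
                deps.append(self.last_writer[d])
            deps.extend(self.readers.get(d, []))
        deps = list(dict.fromkeys(deps))             # drop duplicates, keep order

        def run():
            for dep in deps:                         # wait for predecessors only
                dep.result()
            return fn()

        # Tasks are instantiated in program order, ahead of their execution.
        fut = self.pool.submit(run)
        self.names[fut] = name
        self.edges += [(self.names[dep], name) for dep in deps]
        for d in ins:
            self.readers.setdefault(d, []).append(fut)
        for d in outs:
            self.last_writer[d] = fut
            self.readers[d] = []
        return fut


data = {}
rt = TaskRuntime()
rt.submit("init_a", lambda: data.update(a=1), outs=["a"])
rt.submit("init_b", lambda: data.update(b=2), outs=["b"])
rt.submit("sum", lambda: data.update(c=data["a"] + data["b"]), ins=["a", "b"], outs=["c"])
rt.submit("scale_a", lambda: data.update(a=data["a"] * 10), ins=["a"], outs=["a"])
rt.pool.shutdown(wait=True)
print(data, rt.edges)
```

The two `init` tasks have no edge between them and may run in parallel, while `scale_a` waits for `sum` because it overwrites data that `sum` reads. The program never mentions threads or cores: it specifies accesses locally and leaves the global schedule to the runtime.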
Towards the throughput age
• By
  • Express potential concurrency
  • Malleability
  • Dynamic resource sharing/management
• Configuration independence
  • Amount of resources is what really matters
• Side effects
  • Nx1 can be better than pure MPI !!!
  • Hope for lazy programmers
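Configuration independence can be sketched as follows: the program expresses potential concurrency as blocks of work whose size depends on the data, not on the number of workers, and the same code then runs unchanged on whatever resources it is given. This is a minimal illustration only. The function `block_sums` and the block size are hypothetical, and Python threads are used to show the structure, not to demonstrate speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def block_sums(values, block_size, workers):
    """Sum `values` as independent blocks; the decomposition ignores `workers`."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each block is a unit of potential concurrency; the pool decides
        # how many of them actually run at the same time.
        return sum(pool.map(sum, blocks))


values = list(range(1_000))

# Same program, different amounts of resources, same answer.
results = {w: block_sums(values, block_size=64, workers=w) for w in (1, 3, 8)}
print(results)
```

Contrast this with splitting the loop into exactly `omp_get_num_threads()` chunks: there the decomposition is tied to one configuration, which leaves a runtime no room to rebalance or to share resources dynamically.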
Infrastructures for new usage modes
• Persistent KVS
  • Alternative for parallel programs I/O?
  • Flexible querying: 3D indexing, data thinning
• Need/opportunity of clean integration of concurrency and data
  • Within one app
  • Shared communication space between multiple apps
• Malleable/elastic/opportunistic resource management/sharing
Impact on architecture ?
• High throughput devices
  • Long vectors
    • Decouple front end - back end engines, reduce front end pressure, optimize memory throughput, explicit locality management
  • Specialized compute and data motion engines
  • Tuned numerical precision
• ISA is important
  • Decouple/hide again hardware details, reuse SW technologies (compilers, OS, …)
  • Specific instructions?
  • “limited” number of control flows
• Hierarchical acceleration
  • Nesting
  • Homogenize heterogeneity
• Runtime aware architectures (RAA)
Age before beauty
• Behavior (insight/models) before syntax
• Detailed performance analytics before aggregated profiles
• Work instantiation and order before overhead
• Malleability before fitted rigid structure
• Possibilities before how-tos
• Elegance before one-day shine
The challenge
• Think of fundamentals, think out of the box
  • Revolution: change everything so that nothing changes
  • Should we: change as little as possible so that everything is different ?
• Programmers !!!!
  • Develop a culture of
    • Efficiency awareness
    • Latency to throughput mindset
    • Dynamic sharing of resources
• To exascale … and before