[Title slide: title and remaining names not recoverable from the extraction]
Mikito Furuichi, Daisuke Nishiura
The University of Electro-Communications
Heterogeneous Many Core Project
Ryutaro Himeno (leader, Fluid dynamics), Toshikazu Ebisuzaki (sub-leader, Astronomy)
Kobe Univ. (Junichiro Makino): Astrophysics, middleware
JAMSTEC (Hide Sakaguchi): Tsunami, Earthquake
UEC Tokyo (Tadashi Yamazaki): Neuroscience
KEK (Tadashi Ishikawa): Lattice QCD, multiple-precision computation
NIG (Ken Kurokawa): Genome analysis
RIKEN (Toshikazu Ebisuzaki): Neuroscience, MD, Plasma physics, Fluid dynamics
¥40 million/year for 5 years, funded by MEXT, Japan
Background
• RIKEN and PEZY Computing/ExaScaler started joint research in May 2015, and RIKEN installed “Shoubu”, which uses PEZY-SC.
• “Shoubu” ranked No. 1 on the Green 500 in June and November 2015.
• “Shoubu” and “Satsuki”, installed at Ebisuzaki Lab., RIKEN, ranked No. 1 and No. 2 on the Green 500 in June 2016.
• In March 2017, PEZY announced its 2nd-generation system, PEZY-SC2, with substantially improved performance and memory bandwidth; it uses heterogeneous many cores and magnetic-field-coupling memory.
• JAMSTEC-RIKEN-PEZY-ExaScaler joint research started in May 2017.
• “Satsuki” and “Shoubu” will be retrofitted in 2018.
• Heterogeneous many-core processors are a worldwide trend.
– Anton2, MD-GRAPE4, NVIDIA Tegra, Sunway TaihuLight, etc.
Shoubu@RIKEN
Gyoko@JAMSTEC
Satsuki@RIKEN
[Table: comparison of PEZY-SC, PEZY-SC2, and PEZY-SC3; cell values not recoverable from the extraction]
Rows: developed, architecture, no. of cores, frequency, memory bandwidth, I/O bandwidth, FLOPS, energy consumption, efficiency
[Color-coded list; text not recoverable from the extraction]
In Black: modify existing codes to fit heterogeneous many cores
In Red: new application
In Green: partly new application
PEZY-SC: conventional general-purpose processor + many-core processor
• Uses OpenCL: an accelerator-type programming model.
• The largest issue is data transfer between memory and the many-core processor.
PEZY-SC2: heterogeneous many-core processor
• The general-purpose processor is included in the LSI, and the memory system is directly connected to it.
• Data is stored in local memory.
• The largest issue is optimizing data transfer between processors.
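The contrast between the two designs can be illustrated with a toy timing model: the time to run a kernel is taken as the time to move its data plus the time to compute on it. This is only a sketch for intuition. The function names and every number in the usage note are hypothetical and are not measurements of PEZY-SC or PEZY-SC2.

```python
def offload_time(bytes_moved, flops, link_bandwidth, device_flops):
    """Toy estimate (seconds) of a kernel's run time: transfer + compute.

    bytes_moved    -- data moved between memory and the cores (bytes)
    flops          -- floating-point operations in the kernel
    link_bandwidth -- bandwidth of the path the data travels (bytes/s)
    device_flops   -- sustained compute rate of the cores (flop/s)
    Transfer and compute are assumed not to overlap (a simplification).
    """
    transfer = bytes_moved / link_bandwidth
    compute = flops / device_flops
    return transfer + compute


def transfer_fraction(bytes_moved, flops, link_bandwidth, device_flops):
    """Fraction of the toy run time spent moving data."""
    total = offload_time(bytes_moved, flops, link_bandwidth, device_flops)
    return (bytes_moved / link_bandwidth) / total
```

With hypothetical values of 8e9 bytes, 1e12 flop, a 16e9 B/s link, and 1e12 flop/s, the model gives 1.5 s, one third of it spent on transfer. Raising only the link bandwidth shrinks that share, which is the motivation for connecting the memory system directly to the processor.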
[Figure: block diagram of heterogeneous many-core processors, each consisting of memory, a general-purpose processor, an interconnect, and many cores]
Existing codes cannot reach the full performance of the hardware.
Plan: tune a small number of codes first, develop powerful programming models from that experience, and then develop middleware.
Figure panels: system with homogeneous many-core processors; system with heterogeneous many-core processors
Application developers use Middleware
[Several slides follow here; their text is not recoverable from the extraction]
Giant impact
Mantle convection
Droplet formation
Molecular motor
Crack propagation
3.1. Particle Method (2): Disaster simulation of Tsunami and Earthquake
• Massively parallel computing with Smoothed Particle Hydrodynamics (SPH) and the Discrete Element Method (DEM)
– Suitable for tsunami, powder, granular, and civil engineering problems
– Over one billion particles with efficient parallelization (see Poster P21)
• Practical applications:
– Direct comparison between numerical simulation and laboratory experiments on a centrifuge system
– Analysis of landslides, breakwaters, tsunami sedimentation, and liquefaction
– Efficient design of railway track structures
• Toward prediction of critical locations in a disaster event
Laboratory experiment
Numerical simulation
Replacing laboratory experiments with numerical simulation requires a low energy cost => PEZY-SC2 could be the best solution
Quantitative agreement can be obtained with over one billion particles.
SPH-DEM coupled simulation of tsunami run-up with building structure
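As a rough illustration of the SPH method named on this slide, the sketch below computes particle densities by direct summation over all pairs, using the standard cubic spline kernel in 3D. It is not the project's code. The function names are hypothetical, and the O(N²) loop stands in for the neighbor search and domain decomposition that a billion-particle run would need.

```python
import math


def cubic_spline_kernel(r, h):
    """Standard 3D cubic spline SPH kernel W(r, h) with support radius 2h."""
    q = r / h
    sigma = 1.0 / (math.pi * h ** 3)  # 3D normalization constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def sph_density(positions, masses, h):
    """Density at each particle: rho_i = sum_j m_j * W(|r_i - r_j|, h).

    Brute-force sum over all pairs (including j == i), for clarity only.
    """
    densities = []
    for xi in positions:
        rho = 0.0
        for xj, mj in zip(positions, masses):
            rho += mj * cubic_spline_kernel(math.dist(xi, xj), h)
        densities.append(rho)
    return densities
```

For example, `sph_density([(0, 0, 0), (1, 0, 0)], [1.0, 1.0], 1.0)` sums each particle's own contribution and that of its single neighbor. Because every particle only interacts with neighbors inside 2h, the work is local, which is what makes the method suitable for massive parallelization.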
3.2 Grid Method: Incompressible Flow (New)
• Popularly used in industrial design
• Recently utilized in medical fields such as blood
Challenging subjects
• Performance limited by memory bandwidth
• Strong scaling
• Multi-physics phenomena
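The memory-bandwidth limit listed above is commonly reasoned about with the roofline model: attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the kernel's arithmetic intensity (flop per byte moved). The sketch below is a minimal version of that model. The function names are hypothetical, and the machine numbers in the usage note are illustrative rather than PEZY-SC2 specifications.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline estimate of attainable performance (flop/s).

    peak_flops           -- peak compute rate (flop/s)
    mem_bandwidth        -- sustained memory bandwidth (bytes/s)
    arithmetic_intensity -- flop performed per byte moved (flop/byte)
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity above which a kernel becomes compute bound."""
    return peak_flops / mem_bandwidth
```

On a hypothetical machine with 1e12 flop/s peak and 1e11 B/s bandwidth, a grid kernel at 0.5 flop/byte is capped at 5e10 flop/s, or 5% of peak. Low-intensity stencil updates of this kind are why grid methods tend to be bandwidth bound.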
Courtesy of Nissan Motor Co., Ltd.
[Slide text not recoverable from the extraction]
Metagenome analysis pipeline
Microbial GPS by hierarchical Bayesian model
*: Actual calculation time with 150 core NIG supercomputer or TSUBAME2.0
3.4. Other applications: Human-scale neural network simulation (the cerebral cortex model is a new project)
[Bullet text not recoverable from the extraction]
Figure panels (labels: Cerebellum, Cerebral Cortex):
A. Cerebellar corticonuclear microcomplex model
B. Cerebro-cerebellar-basal ganglial loop model
C. Layered and sheeted structure of the cerebral cortex
3.4 Others: Large-scale simulations for precise experiments in elementary particle physics
• Search for new physics beyond the Standard Model
– Standard Model: three generations of matter (quarks and leptons), force carriers (gauge bosons), origin of masses (Higgs) => not the “Theory of Everything”
– Any inconsistency between theory and experiment signals new physics
– Precise experiments are underway at LHC (CERN), RHIC (BNL), and SuperKEKB (KEK), and will be held at the future ILC (International Linear Collider)
– Precise theoretical predictions require large-scale simulations
• Lattice QCD (Quantum Chromodynamics)
– QCD describes the strong interaction among “quarks” through “gluons”
– Analytical calculation of QCD is impossible at low energy
– Lattice QCD, which is QCD on a 4-dimensional space-time lattice, enables quantitative numerical calculations by the Monte Carlo method
• Precise calculation of perturbation theory
– QED (Quantum Electrodynamics) in muon g-2 experiments (Fermilab), Electroweak Theory in the electron-positron collider (ILC)
– Many Feynman diagrams must be evaluated for high-order corrections => automated computation
– Integration of functions with singular behavior requires “multi-precision”
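One reason singular integrands call for multi-precision arithmetic is catastrophic cancellation: nearly equal large terms are subtracted and the digits that survive are lost to rounding. The toy example below is not taken from the project and does not use its library. It relies only on Python's standard `decimal` module, and the function names are hypothetical. It evaluates sqrt(x² + 1) − x in double precision and with 50 significant digits.

```python
import math
from decimal import Decimal, localcontext


def diff_double(x):
    """sqrt(x^2 + 1) - x in IEEE double precision."""
    return math.sqrt(x * x + 1.0) - x


def diff_multiprecision(x, digits=50):
    """Same expression evaluated with `digits` significant decimal digits.

    x should be an int (or a Decimal) so that it is represented exactly.
    """
    with localcontext() as ctx:
        ctx.prec = digits  # working precision for this block only
        xd = Decimal(x)
        return (xd * xd + 1).sqrt() - xd
```

At x = 10^8 the double-precision result loses every significant digit, because x² + 1 rounds back to x², while the 50-digit result stays close to the analytic estimate 1/(2x) = 5e-9. Numerical integration near a singularity can hit the same kind of loss many times over, which is where higher working precision pays off.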
Summary
• New processor architecture: heterogeneous many-core processor
– To answer how to write program code
– Develop middleware so that anybody can increase performance
– 10 high-performance applications
• Gravity multibody, Molecular Dynamics: largest computations of galaxies and molecules
• SPH & DEM method: disaster simulation
• Neuroscience: human-brain-scale simulation
• Lattice QCD, fluid dynamics, magnetic fluid dynamics: enhance middleware for grid method
• Genomic analysis: 1st application in data science