THE GLOTZER GROUP
Breaking Through Serial Barriers: Scalable Hard Particle Monte Carlo Simulations with HOOMD-Blue
Joshua A. Anderson, M. Eric Irrgang, Sharon C. Glotzer
Anderson, J. A. et al., JCP 254, 27-38 (2013)
Thursday, April 17, 14
HPMC - Massively parallel MC on the GPU
• Hard Particle Monte Carlo plugin for HOOMD-blue
• 2D Shapes
  • Disk
  • Convex (Sphero)polygon
  • Concave polygon
  • Ellipse
• 3D Shapes
  • Sphere
  • Ellipsoid
  • Convex (Sphero)polyhedron
• NVT and NPT ensembles
• Frenkel-Ladd free energy
• Parallel execution on a single GPU
• Domain decomposition across multiple nodes (CPUs or GPUs)
[Figure panels omitted; surviving labels: H, β-Mn cP20 (A13), #P04, [100]]
Damasceno et al., Science (2012)
Engel M. et al., PRE 87, 042134 (2013)
Damasceno, P. F. et al., ACS Nano 6, 609 (2012)
GPU parallel Monte Carlo
• Store lists of active cells
• Compute cell list
• Compute extended cell list
• For i in [0...nselect)
  • Loop through checkerboards in a shuffled order
    • For each active cell c in parallel
      • RNG rng(c, i, step)
      • Choose one particle p
      • Choose a random trial move p’
      • Check all particles in the extended cell list for overlaps with p’
      • If p’ remains in cell and no overlaps
        • p’ -> p
• Translate all particles by a random vector
• Challenges: expensive overlap checks, precision, highly divergent execution, auto-tuning
Overlap checks
• Disk/sphere - trivial
• Convex polygons - separating axis
• Concave polygons - brute force
• Spheropolygons - XenoCollide/GJK
• Convex polyhedra - XenoCollide/GJK
• Ellipsoid / Ellipse: Matrix method
• Compute delta in double, convert to single for expensive overlap check
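As an illustration of the separating axis entry in the list above, here is a minimal sketch of the test for two convex polygons in 2D. The helper names (`place`, `edge_normals`, `project`, `convex_polygons_overlap`) are hypothetical and this is not the HPMC code. Two convex polygons are disjoint exactly when their projections onto some edge normal of either polygon do not intersect; in this sketch, polygons that only touch are counted as overlapping.

```python
import math

def place(vertices, position, angle):
    """Rotate body-frame vertices by angle (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = position
    return [(px + c * x - s * y, py + s * x + c * y) for x, y in vertices]

def edge_normals(poly):
    """Yield one (unnormalized) normal per edge of the polygon."""
    n = len(poly)
    for k in range(n):
        (x0, y0), (x1, y1) = poly[k], poly[(k + 1) % n]
        yield (y1 - y0, x0 - x1)

def project(poly, axis):
    """Interval covered by the polygon when projected onto axis."""
    dots = [x * axis[0] + y * axis[1] for x, y in poly]
    return min(dots), max(dots)

def convex_polygons_overlap(a, b):
    """Separating axis test: False as soon as one axis separates a and b."""
    for axis in list(edge_normals(a)) + list(edge_normals(b)):
        a_min, a_max = project(a, axis)
        b_min, b_max = project(b, axis)
        if a_max < b_min or b_max < a_min:
            return False  # found a separating axis
    return True  # no edge normal separates the two convex polygons

if __name__ == "__main__":
    # Unit square, the same vertex list used in the example job script.
    square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]
    a = place(square, (0.0, 0.0), 0.0)
    b = place(square, (1.1, 1.1), math.pi / 4)
    print(convex_polygons_overlap(a, b))
```

The cost grows with the number of edges of both polygons, which is one reason the overlap check dominates the run time for shapes with many vertices.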
[Figure omitted; surviving labels: ⊖, Separating planes, XenoCollide, Δr]
1001.842 - 1000.967 = 0.875
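The precision point (compute the delta in double, then convert to single) can be illustrated with a small sketch. It assumes single precision can be emulated by round-tripping a Python float through a 4-byte IEEE float with `struct`; the helper names are hypothetical. The first pair of coordinates is the one on the slide, the second pair is an illustrative assumption. Near 1000, adjacent single-precision values are roughly 6e-5 apart, so rounding the coordinates before subtracting can leave an error of that size in a delta of order 1, while subtracting in double and rounding the small result keeps the error near 1e-7.

```python
import struct

def to_single(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def delta_mixed(a, b):
    """Subtract in double, then convert the small delta to single."""
    return to_single(a - b)

def delta_single(a, b):
    """Round both coordinates to single first, then subtract in single."""
    return to_single(to_single(a) - to_single(b))

if __name__ == "__main__":
    for a, b in [(1001.842, 1000.967), (1001.842, 1000.5)]:
        exact = a - b  # double-precision reference
        print(a, b,
              "mixed error:", abs(delta_mixed(a, b) - exact),
              "single error:", abs(delta_single(a, b) - exact))
```

The expensive overlap check can then run in single precision on a delta that is already accurate relative to its own magnitude.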
Example job script
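from hoomd_script import *
from hoomd_plugins import hpmc

init.read_xml(filename='init.xml')

# Convex polygon integrator; d and a are presumably the trial translation and rotation sizes
mc = hpmc.integrate.convex_polygon(seed=10, d=0.25, a=0.3)
# Unit square for particle type 'A'
mc.shape_param.set('A', vertices=[(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])

run(10e3)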
Single GPU performance
CPU: 8-core Intel Xeon E5-2670 (Sandy Bridge)
System: 65k particle dense fluid
[Bar chart omitted. Vertical axis: Speedup (ticks 0 to 80). Shapes: Disk, Square, Hexagon, Rounded Square, Sphere, Ellipsoid, Tetrahedron, Cube, Trunc. Octahedron, Rounded Tetrahedron, Rounded Trunc. Octahedron, Dart. Series: 1 CPU core, 1 CPU (8 cores), Tesla M2070, Tesla K20X, Tesla K40. Annotations: 38-64x, 18-28x]
Multi-node scaling - squares (2D)
GPU: Tesla K20X, CPU: Xeon E5-2680 (XSEDE Stampede)
[Plot omitted. Vertical axis: Performance (ticks 10^6 to 10^9). Horizontal axis: P - GPUs/CPU cores (1 to 4096, powers of two). Series: GPU and CPU for each of N=4,194,304, N=65,536, N=4,096. Annotation: 41x]
Multi-GPU scaling bottlenecks - squares (2D)
[Plot omitted. Vertical axis: Time/ms (ticks 0.20 to 5.00). Horizontal axis: P (1 to 64, powers of two). Series: Compute, Communicate, Ideal compute]
Multi-node scaling - truncated octahedra (3D)
GPU: Tesla K20X, CPU: Xeon E5-2680 (XSEDE Stampede)
[Plot omitted. Vertical axis: Performance (ticks 10^5 to 10^9). Horizontal axis: P - GPUs/CPU cores (1 to 4096, powers of two). Series: GPU and CPU for each of N=4,096,000, N=64,000, N=4,096. Annotation: 20x]
Questions?
Funding / Resources
• DOD NSSEFF grant: N00244-09-1-0062
• This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575.
email: [email protected]
• Code not yet publicly available; it will eventually be released as part of HOOMD-blue: http://codeblue.umich.edu/hoomd-blue
• Paper on disks: Anderson, J. A. et al., JCP 254, 27-38 (2013)
• Paper on 3D & shapes: coming soon