A Communication-Optimal N-Body Algorithm for Direct Interactions Michael Driscoll, Evangelos Georganas, Penporn Koanantakool, Edgar Solomonik, Katherine Yelick * UC Berkeley *Lawrence Berkeley National Laboratory


Page 1: A Communication-Optimal N-Body Algorithm for Direct Interactions

A Communication-Optimal N-Body Algorithm for Direct Interactions

Michael Driscoll, Evangelos Georganas, Penporn Koanantakool, Edgar Solomonik, Katherine Yelick*

UC Berkeley
*Lawrence Berkeley National Laboratory

Page 2: A Communication-Optimal N-Body Algorithm for Direct Interactions

Overview

• Intro to N-Body problem.
• Communication bounds.
• Communication-optimal algorithm.
• Performance results.
• Conclusion.

Page 3: A Communication-Optimal N-Body Algorithm for Direct Interactions

Direct N-Body

n particles
- molecules, galaxies, database tuples, etc.
- O(n^2) interactions

for i = 1 to n:
    for j = 1 to n:
        force[i] += interact( particles[i], particles[j] )

p processors

Page 4: A Communication-Optimal N-Body Algorithm for Direct Interactions

Communication Model

• Communication cost along critical path.
• Alpha-beta model: T = α·S + β·W, where S = # messages, α = latency, W = # words, β = 1/bandwidth.
• Can we find lower bounds on S or W?
• Do current algorithms meet those bounds?
• If not, can we find ones that do? Or better bounds?
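The alpha-beta model charges one latency per message and one inverse-bandwidth unit per word. A minimal sketch of that cost function follows; the function name and the numeric values for latency and bandwidth are hypothetical and chosen only for illustration.

```python
def comm_time(num_messages, num_words, alpha, beta):
    """Alpha-beta cost: alpha (latency) per message plus beta (1/bandwidth) per word."""
    return alpha * num_messages + beta * num_words


if __name__ == "__main__":
    # Hypothetical machine parameters, not measurements from the talk.
    alpha = 2.0e-6   # seconds per message
    beta = 1.0e-9    # seconds per word
    # Same number of words, sent as many small messages or a few large ones.
    many_small = comm_time(num_messages=10_000, num_words=1_000_000, alpha=alpha, beta=beta)
    few_large = comm_time(num_messages=100, num_words=1_000_000, alpha=alpha, beta=beta)
    print(many_small, few_large)
```

Because S and W enter the cost separately, an algorithm can be bandwidth-optimal and still lose time to latency, which is why the deck bounds both quantities.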

Page 5: A Communication-Optimal N-Body Algorithm for Direct Interactions

Communication Lower Bounds

From Minimizing Communication in Numerical Linear Algebra [Ballard et al. 2011]:

F  # flops
M  size of fast memory
H  max flops per M words
S  # messages
W  # words

Generalized in: Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays [Christ et al. 2013].

Page 6: A Communication-Optimal N-Body Algorithm for Direct Interactions

Lower Bounds for N-Body

Flops:
Memory:
Max flops per M words:

Plug into latency and bandwidth lower bounds:

Do current algorithms meet these bounds?
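The formulas on this slide did not survive extraction. The block below is a reconstruction, not a copy of the slide. It assumes the general bounds take the form S = Ω(F/H) and W = Ω(F·M/H), that each processor performs F = n^2/p interactions, and that M particles in fast memory allow at most H = O(M^2) interactions. These assumptions are consistent with the later slides, where M appears in the denominator and replication by c reduces messages by c^2 and words by c.

```latex
% Assumed general form of the bounds (reconstruction):
S = \Omega\!\left(\frac{F}{H}\right), \qquad
W = \Omega\!\left(\frac{F \cdot M}{H}\right)

% Direct n-body on p processors (assumed values):
F = \frac{n^2}{p}, \qquad H = O(M^2)

% Plugging in:
S = \Omega\!\left(\frac{n^2}{p\,M^2}\right), \qquad
W = \Omega\!\left(\frac{n^2}{p\,M}\right)

% With no replication, M = \Theta(n/p):
S = \Omega(p), \qquad W = \Omega(n)

% With c copies of each particle, M = \Theta(cn/p):
S = \Omega\!\left(\frac{p}{c^2}\right), \qquad
W = \Omega\!\left(\frac{n}{c}\right)
```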

Page 7: A Communication-Optimal N-Body Algorithm for Direct Interactions

A Naïve N-Body Algorithm

• For p steps, send n/p particles.
  # messages: O(p)
  # words: O(n)

• Recall the lower bounds on S and W: the naïve algorithm meets both. ✔

[Figure: processors Proc. 0, Proc. 1, ..., Proc. P, each holding its own particles and a buffer of replicas.]
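The shift pattern described above can be emulated serially to check the counts. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a toy 1D repulsive force, assumes n is divisible by p, and counts one message of n/p particles per processor per step. All function names are hypothetical.

```python
import math
import random


def interact(xi, xj):
    """Toy 1D repulsive force on xi from xj, falling off as 1/d^2."""
    d = xi - xj
    return 0.0 if d == 0 else d / abs(d) ** 3


def direct_forces(xs):
    """Reference O(n^2) all-pairs computation."""
    return [sum(interact(xi, xj) for xj in xs) for xi in xs]


def ring_forces(xs, p):
    """Emulate p processors passing replica buffers around a ring for p steps."""
    n = len(xs)
    assert n % p == 0, "assumes equal-size blocks"
    b = n // p
    home = [xs[r * b:(r + 1) * b] for r in range(p)]   # particles owned by rank r
    visiting = [list(blk) for blk in home]             # replica buffer held by rank r
    forces = [[0.0] * b for _ in range(p)]
    messages = words = 0                               # per processor
    for _ in range(p):
        for r in range(p):
            for i, xi in enumerate(home[r]):
                forces[r][i] += sum(interact(xi, xj) for xj in visiting[r])
        # Shift: rank r receives the buffer previously held by rank r - 1.
        visiting = [visiting[(r - 1) % p] for r in range(p)]
        messages += 1
        words += b
    return [f for blk in forces for f in blk], messages, words


if __name__ == "__main__":
    random.seed(0)
    xs = [float(x) for x in random.sample(range(1000), 24)]
    forces, messages, words = ring_forces(xs, p=6)
    print(messages, words)
```

Each processor sends p messages of n/p particles, so the per-processor word count is n regardless of p.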

Page 8: A Communication-Optimal N-Body Algorithm for Direct Interactions

The naïve algorithm is optimal…

• Recall the lower bounds:

• Notice M in denominator.
• Increase M => decrease communication.
• Realize a “lower” lower bound.

Page 9: A Communication-Optimal N-Body Algorithm for Direct Interactions

Communication-Optimal N-Body

• Replication factor: c copies of each particle

• Communication cost (messages and words) by phase:
– Broadcast
– Shifts
– Reduction
– Total

[Figure: processors arranged as p/c teams (Team 0, Team 1, ..., Team p/c) by c layers; each team holds one block of particles.]

• Reduce # messages by a factor of c^2; reduce # words by a factor of c.
• c = p^(1/2) => force decomposition [Plimpton 1995]
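The broadcast, shift, and reduction phases can be emulated serially to see where the c^2 and c factors come from. This is a sketch under stated assumptions, not the authors' implementation: it assumes c^2 divides p, that n is divisible by p/c, and one particular shift schedule in which layer k handles team offsets k, k+c, k+2c, and so on. The toy force and all function names are hypothetical.

```python
import math
import random


def interact(xi, xj):
    """Toy 1D repulsive force on xi from xj, falling off as 1/d^2."""
    d = xi - xj
    return 0.0 if d == 0 else d / abs(d) ** 3


def direct_forces(xs):
    """Reference O(n^2) all-pairs computation."""
    return [sum(interact(xi, xj) for xj in xs) for xi in xs]


def replicated_forces(xs, p, c):
    """Emulate p/c teams x c layers; returns (forces, shifts, words) per processor."""
    n = len(xs)
    assert p % (c * c) == 0, "assumes c^2 divides p"
    teams = p // c
    assert n % teams == 0, "assumes equal-size blocks"
    block = n // teams                      # particles per team = n*c/p
    # Broadcast: every layer of team t starts with a copy of blocks[t].
    blocks = [xs[t * block:(t + 1) * block] for t in range(teams)]
    shifts = teams // c                     # p / c^2 shift steps per layer

    # partial[k][t][i]: force on particle i of team t accumulated by layer k
    partial = [[[0.0] * block for _ in range(teams)] for _ in range(c)]
    for k in range(c):                      # layers work independently
        for s in range(shifts):
            offset = k + s * c              # assumed schedule for layer k
            for t in range(teams):
                visiting = blocks[(t + offset) % teams]
                for i, xi in enumerate(blocks[t]):
                    partial[k][t][i] += sum(interact(xi, xj) for xj in visiting)

    # Reduction: sum the c layers' partial forces within each team.
    forces = [sum(partial[k][t][i] for k in range(c))
              for t in range(teams) for i in range(block)]
    words = shifts * block                  # particles shifted per processor
    return forces, shifts, words


if __name__ == "__main__":
    random.seed(0)
    xs = [float(x) for x in random.sample(range(1000), 32)]
    for c in (1, 2, 4):
        _, shifts, words = replicated_forces(xs, p=16, c=c)
        print(c, shifts, words)
```

With c = 1 the emulation reduces to the naïve ring. The shift count scales as p/c^2 and the shifted words as n/c; the broadcast and reduction costs are not counted here.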

Page 10: A Communication-Optimal N-Body Algorithm for Direct Interactions

Experiments

• Developed particle code
– Flat MPI
– 52-byte particles
– Repulsive force drops off with square of distance
– Reflective boundary conditions

• Platforms
– Hopper: Cray XE-6 at NERSC, 24 cores/node
– Intrepid: IBM BlueGene/P at ALCF, 4 cores/node
– Both have 3D torus interconnect.

Page 11: A Communication-Optimal N-Body Algorithm for Direct Interactions

Performance on Hopper
24K particles, 6K cores

[Figure: performance plot, down is good; annotated "95.6% reduction".]

Page 12: A Communication-Optimal N-Body Algorithm for Direct Interactions

Performance on Intrepid
262K particles, 32K cores

[Figure: performance plot, down is good; annotated "99.3% reduction".]

Page 13: A Communication-Optimal N-Body Algorithm for Direct Interactions

Strong Scaling on Intrepid
262K particles

[Figure: strong-scaling plot, up is good; annotated "Perfect Strong Scaling" and "4.5x speedup".]

Page 14: A Communication-Optimal N-Body Algorithm for Direct Interactions

CA N-Body with Cutoff Distance

• No interactions beyond cutoff radius r

• Assuming:
– uniform particle distribution
– spatial processor decomposition

• Simple extension to support a cutoff:
– still communication-optimal
– works in space of any dimensions
– speedups from 1D and 2D experiments

Page 15: A Communication-Optimal N-Body Algorithm for Direct Interactions

N-Body with Cutoff

• Shifts occur modulo the cutoff distance.
• Optimality holds
– same counting argument
– see paper for details

[Figure: processors arranged as p/c teams (Team 0, Team 1, ..., Team p/c) by c layers, with the cutoff diameter marked across the teams' particles.]

Page 16: A Communication-Optimal N-Body Algorithm for Direct Interactions

1D Simulation on Intrepid
262K particles, 32K cores

[Figure: performance plot, down is good; annotated "84.6% reduction".]

Page 17: A Communication-Optimal N-Body Algorithm for Direct Interactions

2D Simulation on Hopper
196K particles, 24K cores

[Figure: performance plot, down is good; annotated "74.8% reduction".]

Page 18: A Communication-Optimal N-Body Algorithm for Direct Interactions

Strong Scaling on Hopper
2D space, 24K cores, 196K particles

[Figure: strong-scaling plot, up is good; annotated "Good Strong Scaling".]

Page 19: A Communication-Optimal N-Body Algorithm for Direct Interactions

Conclusions

• By using c times more memory, we reduce:
– Words sent along critical path: by a factor of c.
– Messages sent along critical path: by a factor of c^2.

• Theory: maximize c.
• Practice: tune for best c.
– Saw 99.5% reduction in communication (11.8x speedup).

• Applications beyond direct n-body:
– collision detection algorithms
– database joins
– bottom solvers in hierarchical n-body codes
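One way to explore the "tune for best c" advice is a small cost model swept over the feasible replication factors. The sketch below is hypothetical throughout: it assumes c^2 must divide p, that each processor can hold at most `mem_particles` particles, that the shift phase costs p/c^2 messages of nc/p particles each, and that the broadcast and the reduction each cost about log2(c) messages and one block of nc/p particles. The machine parameters are placeholders, not measurements from the talk.

```python
import math


def feasible_factors(n, p, mem_particles):
    """Replication factors c with c^2 dividing p and c*n/p particles fitting in memory."""
    return [c for c in range(1, math.isqrt(p) + 1)
            if p % (c * c) == 0 and c * n <= mem_particles * p]


def modeled_cost(n, p, c, alpha, beta):
    """Alpha-beta cost of shifts plus (assumed) broadcast and reduction collectives."""
    shifts = p // (c * c)
    block = n * c / p                       # particles held per processor
    collectives = 2 if c > 1 else 0         # broadcast + reduction, skipped when c == 1
    messages = shifts + collectives * math.log2(c)
    words = shifts * block + collectives * block
    return alpha * messages + beta * words


def best_factor(n, p, mem_particles, alpha, beta):
    """Feasible c with the lowest modeled cost (smallest c wins ties)."""
    return min(feasible_factors(n, p, mem_particles),
               key=lambda c: modeled_cost(n, p, c, alpha, beta))


if __name__ == "__main__":
    n, p = 1024, 64
    for c in feasible_factors(n, p, mem_particles=10**9):
        print(c, modeled_cost(n, p, c, alpha=1.0, beta=1.0))
```

Under these assumptions the collective terms grow with c while the shift terms shrink, so the modeled optimum can sit below the largest feasible c. On a real machine the best c would be found by measurement.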