NAMD and BG/L. Chee Wai Lee, [email protected], Parallel Programming Laboratory, Computer Science Department, University of Illinois at Urbana-Champaign, http://charm.cs.uiuc.edu

Page 1: NAMD and BG/L

NAMD and BG/L

Chee Wai Lee
[email protected]

Parallel Programming Laboratory
Computer Science Department
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu

Page 2: NAMD and BG/L

Outline

● BG/L Platform overview
● Optimization Efforts: Context
● Optimization Efforts: Approaches
  – Topology Awareness
  – Load Balancing
  – Parallelism
  – Computation/Communication Overlap
● Results

Page 3: NAMD and BG/L

Blue Gene/L Platform Review

● Hardware characteristics:
  – PowerPC 440 700 MHz 32-bit processors
  – 2 processors per node, no cache coherence
  – 4 MB L3 cache
  – 512 MB memory per node
  – 6 outgoing FIFO links per node
  – 3D torus interconnect

Page 4: NAMD and BG/L

Blue Gene/L Platform Review (2)

● Other characteristics:
  – Microkernel on compute nodes, minimal OS interference.

Page 5: NAMD and BG/L

Outline

● BG/L Platform overview
● Optimization Efforts: Context
● Optimization Efforts: Approaches
  – Topology Awareness
  – Load Balancing
  – Parallelism
  – Computation/Communication Overlap
● Results

Page 6: NAMD and BG/L

Objectives

● Scale the 92,000-atom apoa1 benchmark as far as possible.

● Understand the scaling issues involved on the BG/L machine.

Page 7: NAMD and BG/L

Outline

● BG/L Platform overview
● Optimization Efforts: Context
● Optimization Efforts: Approaches
  – Topology Awareness
  – Load Balancing
  – Parallelism
  – Computation/Communication Overlap
● Results

Page 8: NAMD and BG/L

Topology Awareness

● Distribute Patches according to the topology.
  – Logically align the NAMD 3D patch grid to BG/L's processor grid.
  – Patch Grid is divided by an Orthogonal Recursive Bisection (ORB) scheme.
  – Processor Grid is divided in similar proportions and assigned to corresponding Patch subgrids.

● Topology-aware spanning tree for multicasts.
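
The patch-to-processor mapping described on this slide can be sketched roughly as follows. This is a minimal Python illustration, not NAMD's implementation: it assumes every patch carries equal load, bisects the patch box at the midpoint of its longest axis, and cuts the processor box on the same axis when it can. The names `orb_map`, `split`, and `cells` are hypothetical.

```python
from itertools import product


def volume(box):
    """Number of grid cells in a half-open box (lo, hi)."""
    lo, hi = box
    return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])


def longest_axis(box):
    lo, hi = box
    return max(range(3), key=lambda a: hi[a] - lo[a])


def split(box, axis, cut):
    """Cut a box into two boxes at coordinate `cut` along `axis`."""
    lo, hi = box
    left_hi = list(hi)
    left_hi[axis] = cut
    right_lo = list(lo)
    right_lo[axis] = cut
    return (lo, tuple(left_hi)), (tuple(right_lo), hi)


def cells(box):
    lo, hi = box
    return product(*(range(lo[a], hi[a]) for a in range(3)))


def orb_map(patch_box, proc_box, mapping):
    """Recursively assign patch coordinates to processor coordinates."""
    if volume(proc_box) == 1 or volume(patch_box) == 1:
        # Base case: everything left goes to the first processor of the box.
        for patch in cells(patch_box):
            mapping[patch] = proc_box[0]
        return

    # Bisect the patch box along its longest axis (uniform load assumed).
    pa = longest_axis(patch_box)
    plo, phi = patch_box
    pcut = plo[pa] + (phi[pa] - plo[pa]) // 2
    left_patches, right_patches = split(patch_box, pa, pcut)

    # Cut the processor box in a similar proportion, on the same axis if possible.
    qlo, qhi = proc_box
    qa = pa if qhi[pa] - qlo[pa] > 1 else longest_axis(proc_box)
    qlen = qhi[qa] - qlo[qa]
    share = volume(left_patches) / volume(patch_box)
    qcut = qlo[qa] + min(qlen - 1, max(1, round(qlen * share)))
    left_procs, right_procs = split(proc_box, qa, qcut)

    orb_map(left_patches, left_procs, mapping)
    orb_map(right_patches, right_procs, mapping)


mapping = {}
orb_map(((0, 0, 0), (4, 4, 4)), ((0, 0, 0), (2, 2, 2)), mapping)
```

In this toy case a 4x4x4 patch grid lands on a 2x2x2 processor grid with neighboring patches on the same or adjacent processors, which is the locality the slide is after. A production scheme would weight the cuts by measured load rather than patch count.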

Page 9: NAMD and BG/L

Load Balancing

● Framework optimizations
  – Memory footprint had to be reduced to accommodate the desired number of processors.
  – Spanning tree implemented to handle large numbers of incoming messages to PE 0.

● Spread non-migratable work better
  – Bonded computations (e.g. dihedrals) allocated off processors with Patch work where possible.
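
The effect of a spanning tree on message traffic at PE 0 can be illustrated with a small sketch. It assumes a k-ary tree over PE ranks rooted at PE 0 and a sum as the combining operation; the branching factor, the rank layout, and the function names are assumptions made for illustration, not the framework's actual scheme.

```python
def parent(pe, k):
    """Parent rank in a k-ary tree rooted at PE 0 (None for the root)."""
    return None if pe == 0 else (pe - 1) // k


def children(pe, k, n):
    """Child ranks of `pe` in a k-ary tree over ranks 0..n-1."""
    first = k * pe + 1
    return list(range(first, min(first + k, n)))


def reduce_to_root(values, k):
    """Combine one value per PE up the tree; returns the total seen at PE 0."""
    partial = list(values)
    # Parents always have a lower rank than their children, so walking ranks
    # from high to low forwards each subtree total only after it is complete.
    for pe in range(len(partial) - 1, 0, -1):
        partial[parent(pe, k)] += partial[pe]
    return partial[0]


# With 2048 PEs and k = 4, PE 0 receives 4 messages instead of 2047.
root_fan_in = len(children(0, 4, 2048))
```

The tradeoff is latency: data now takes several hops to reach the root, in exchange for a bounded number of incoming messages at every PE.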

Page 10: NAMD and BG/L

More Parallelism

● 2-away computation. Patches interact with neighbors of neighbors.
  – User-tunable configuration option.

● Break up compute objects.
  – Another user-tunable configuration option.
  – Balance tradeoffs in grain size vs. overheads.

● PME pencil decomposition efforts.
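
To see how much extra parallelism 2-away computation exposes, the sketch below counts the patches each patch interacts with. It assumes the 2-away rule applies in all three dimensions and that the patch grid is large enough for every neighbor to be distinct; the function names are hypothetical.

```python
from itertools import product


def neighbor_offsets(away):
    """Grid offsets of all patches within `away` steps in each dimension."""
    r = range(-away, away + 1)
    return [o for o in product(r, r, r) if o != (0, 0, 0)]


def pair_computes_per_patch(away):
    """Pairwise compute objects per patch: each pair is shared by two
    patches, so count half of the neighbors plus one self-interaction."""
    return len(neighbor_offsets(away)) // 2 + 1


one_away = len(neighbor_offsets(1))
two_away = len(neighbor_offsets(2))
```

Under these assumptions each patch goes from 26 neighbors to 124, so there are many more, smaller compute objects to distribute, at the cost of more messages per patch.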

Page 11: NAMD and BG/L

Overlap of Computation and Communication

● Hurt by the lack of cache coherence.

● One processor can serve as a communication co-processor if the L1 caches are flushed for large messages. The flushing cost hurts too much.

● Make use of the FIFO link buffers instead. Every so often in NAMD's outer loop, we make AdvanceCommunication() calls.
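
The polling pattern in the last bullet can be sketched with a toy model. Everything here is a stand-in: `ToyLink` is a hypothetical bounded FIFO, `advance()` plays the role of the AdvanceCommunication() call, and the model assumes the hardware has fully drained the FIFO between two consecutive calls.

```python
from collections import deque


class ToyLink:
    """Hypothetical model of one outgoing link with a small hardware FIFO."""

    def __init__(self, fifo_slots):
        self.fifo_slots = fifo_slots
        self.pending = deque()  # messages waiting for a free FIFO slot
        self.fifo = deque()     # messages currently held by the hardware
        self.delivered = []

    def send(self, msg):
        self.pending.append(msg)

    def advance(self):
        # Assumption: whatever was in the FIFO has left since the last call.
        self.delivered.extend(self.fifo)
        self.fifo.clear()
        # Refill the FIFO so the hardware has work during the next computation.
        while self.pending and len(self.fifo) < self.fifo_slots:
            self.fifo.append(self.pending.popleft())


def outer_loop(link, work_items, poll_every):
    """Interleave computation with periodic communication progress calls."""
    calls = 0
    for i, item in enumerate(work_items, start=1):
        link.send(item * item)  # stand-in for a compute producing a message
        if i % poll_every == 0:
            link.advance()
            calls += 1
    return calls


link = ToyLink(fifo_slots=2)
calls = outer_loop(link, range(1, 9), poll_every=2)
```

The polling interval is the tuning knob: calling too rarely leaves the FIFOs empty while messages queue up, and calling too often spends compute time on progress checks.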

Page 12: NAMD and BG/L

Outline

● BG/L Platform overview
● Optimization Efforts: Context
● Optimization Efforts: Approaches
● Results

Page 13: NAMD and BG/L

Results

Nodes  Processors  Mode  Time (watson)
   32          32  co    347 ms
  128         128  co    97.2 ms
  512         512  co    23.7 ms
 1024        1024  co    13.8 ms
 2048        2048  co    8.6 ms
 4096        4096  co    6.2 ms

8192-processor scaling was achieved at 5.2 ms per step.
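
Speedup and parallel efficiency can be derived from the reported timings. The sketch below uses the 32-processor run as the baseline and treats the 8192-processor figure as part of the same series; both choices are assumptions, since the slide does not state a baseline or the mode of the 8192 run.

```python
# Time per step in milliseconds, keyed by processor count (from the slide).
timings_ms = {
    32: 347.0,
    128: 97.2,
    512: 23.7,
    1024: 13.8,
    2048: 8.6,
    4096: 6.2,
    8192: 5.2,  # assumed to belong to the same series as the rows above
}


def scaling(timings, base=None):
    """Return (processors, speedup, efficiency) rows relative to `base`."""
    base = min(timings) if base is None else base
    rows = []
    for procs in sorted(timings):
        speedup = timings[base] / timings[procs]
        efficiency = speedup / (procs / base)
        rows.append((procs, speedup, efficiency))
    return rows


for procs, speedup, efficiency in scaling(timings_ms):
    print(f"{procs:5d}  speedup {speedup:6.2f}  efficiency {efficiency:6.1%}")
```

Efficiency here is speedup divided by the increase in processor count, so a value of 100% means perfect scaling relative to the 32-processor run.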