
Page 1:

Sung Jong Lee ([email protected])

Dept. of Physics, University of Suwon

Challenges in Parallel Supercomputing

2011 1st KIAS Parallel Computation Workshop

February 22-23, 2011

Page 2:

Contents

• Brief History of Supercomputing
• www.top500.org
• Grand Challenge Problems
• Present Machines' Characteristics
• Challenges of Exascale Supercomputing
• Summary
• References

Page 3:

Automobile Crash Simulations at Audi

• A virtual car undergoes about 100,000 crash simulations (over 48 months) before the first prototype is built; only then are real crash tests conducted.

• Audi's supercomputer ranks 260th among the Top 500 supercomputers (Nov. 2010).

Page 4:

Performance Measure

Flops = Floating-Point Operations per Second

• Megaflops (MF/s) = 10^6 flops
• Gigaflops (GF/s) = 10^9 flops
• Teraflops (TF/s) = 10^12 flops
• Petaflops (PF/s) = 10^15 flops
• Exaflops (EF/s) = 10^18 flops
• Zettaflops (ZF/s) = 10^21 flops
• Yottaflops (YF/s) = 10^24 flops
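As a rough illustration of where such numbers come from, a machine's theoretical peak is usually estimated as nodes × sockets × cores × clock rate × floating-point operations per cycle. The sketch below uses purely hypothetical values; the formula, not the numbers, is the point.

```python
# Rough estimate of theoretical peak performance for a hypothetical cluster.
nodes = 128               # compute nodes (assumed)
sockets_per_node = 2      # CPU sockets per node (assumed)
cores_per_socket = 8      # cores per socket (assumed)
clock_hz = 2.6e9          # clock rate in Hz (assumed)
flops_per_cycle = 8       # e.g. wide SIMD FMA units (assumed)

peak_flops = nodes * sockets_per_node * cores_per_socket * clock_hz * flops_per_cycle
print(f"Theoretical peak: {peak_flops / 1e12:.1f} TF/s")   # ~42.6 TF/s for these assumed values
```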

Page 5:

Milestones in Supercomputing

• GigaFlops: M13, Scientific Research Institute of Computer Complexes, Moscow (1984)

• TeraFlops: ASCI Red, Sandia National Lab. (1996)

• PetaFlops: Roadrunner, Los Alamos National Lab. (2008)

Page 6:

History

• Alan Turing (1912-1954)

  * Turing-Welchman Bombe (1939-40)

  * Used for breaking the German Enigma cipher, among others

Page 7:

History

• Seymour Cray (1925-1996)
  – Developed the CDC 1604, the first fully transistorized supercomputer (1958)
  – CDC 6600 (1965), 9 MFlops
  – Founded Cray Research in 1972

• CRAY-1 (1976), 160 MFlops
• CRAY-2 (1985)
• CRAY-3 (1989)

Page 8:

Supercomputers in the USSR (Ukraine)

• M13, Scientific Research Institute of Computer Complexes, Moscow (1984)
• 2.4 Gigaflops
• Led by Mikhail A. Kartsev, developer of supercomputers for space observation

Mikhail A. Kartsev (1923-1983)

Page 9:

Architectures: Shared vs. Distributed Memory

Shared memory:
• Easy programming: one global memory
• Memory becomes the bottleneck

Distributed memory:
• Message passing: Send/Recv (a minimal sketch follows below)
• Scalability
• Programming: not easy
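The Send/Recv style on a distributed-memory machine can be illustrated with a minimal sketch. It assumes the mpi4py Python bindings to MPI are available; the payload and tag values are arbitrary examples, not something from the talk.

```python
# Minimal point-to-point message passing sketch (assumes mpi4py is installed).
# Run with two processes, e.g.:  mpirun -np 2 python send_recv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"step": 1, "energy": -1234.5}   # hypothetical payload
    comm.send(data, dest=1, tag=11)         # explicit send to rank 1
elif rank == 1:
    data = comm.recv(source=0, tag=11)      # explicit receive from rank 0
    print("rank 1 received:", data)
```

Every data exchange is an explicit send/receive pair, which is what gives distributed memory its scalability but also what makes it harder to program than a single shared address space.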

Page 10:

Architectural Transitions

• Vector processors (70s ~ 90s): Cray-1, Cray-2, Cray X-MP, Cray Y-MP, SX-2, VP-200, etc.
• Massively parallel processors (90s ~ 2000): Cray T3E, CM-5, VPP-500, nCUBE, SP2, Paragon
• Clusters (2000~)
• Multicore processors (2003?~)

Cray-1 (1976)
• Installed at Los Alamos National Lab.
• Price: $8.8 million
• Performance: 160 MFlops
• Main memory: 8 MB

Page 11:

Present Architectural Trends

* Transition to simplicity and parallelism, driven by three trends:

1) Single-processor performance is no longer improving significantly.
   - Explicit parallelism is the only way to increase performance.

2) Constant field scaling has come to an end.
   - Threshold voltage cannot be reduced further (due to leakage current).
   - New processors are simpler (better performance per unit power).

3) Main memory latency continues to increase, and main memory bandwidth continues to decrease, relative to processor cycle time and execution rate. Memory bandwidth and latency are becoming the performance-limiting factors!

Page 12:

Multi-Core Processors

• Three classes of multi-core die microarchitectures

Page 13:

Recent Multi-core CPUs

Tilera TILE-Gx CPU:
• 100 cores
• Performance: 750 × 10^9 32-bit ops/s
• Power consumption: 10~55 W
• Memory bandwidth: 500 Gb/s

Intel 48-core CPU:
• Performance:
• Power consumption: 25 W ~ 125 W
• On-die power management
• Clock speed: 1.66~1.83 GHz
• Memory bandwidth:

Page 14:

GPU

Nvidia Tesla M2050/70 GPU:
• 448 CUDA cores
• 3 GB / 6 GB GDDR5 memory
• Power consumption: 225 W
• Memory bandwidth: 148 GB/s
• Performance: 515 Gflops (double precision)

Page 15:

State of the Art Summary

* Fifty years of reliance on the von Neumann model:

1) Split between memory and CPUs; a single sequential thread; a model of sequential execution.

2) Memory wall: the performance of memory has not kept up with the improvement in CPU clock rates, leading to multi-level caches and deep memory hierarchies.

   Complexity increases further when multiple CPUs attempt to share the same memory.

Page 16:

CPU and Memory Cycle Time Trend

* DARPA report, 2008, p103

Page 17:

State of the Art Summary (2)

3) Power wall: the rise of power as a first-class constraint, with a concomitant flattening of CPU clock rates and a shift to multi-cores.

   There are already several hundred cores on a die, and thousands of cores per die are expected.

   More cores demand more memory bandwidth for memory access, but this is not possible due to the same power concerns.

4) Attempts to modify the von Neumann model blur the boundary between memory and processing logic.

Page 18:

TOP 500 Supercomputers (Nov 2010) (http://www.top500.org)

Page 19:

TOP Supercomputers (Nov 2010)

• 7 systems exceed 1 PFlop/s
• Top 10: 0.8 PFlop/s
• Top 100: 76 TFlop/s
• Top 500: 31.1 TFlop/s

Rmax: maximal LINPACK performance achieved
Rpeak: theoretical peak performance

• Top 1: Tianhe-1A (NSC, China)
  - Rmax = 2.57 Petaflops, 186,368 cores
  - Main memory = 229.4 TB
  - 14,336 Xeon X5670 processors, 7,168 Nvidia Tesla M2050 GPGPUs, and 2,048 NUDT FT1000 heterogeneous processors

• Top 2: Jaguar (Oak Ridge National Lab., USA)
  - Rmax = 1.76 Petaflops, 224,162 cores (Cray XT5-HE, 6-core Opteron, 2.6 GHz)
  - Main memory = not available

Page 20:
Page 21:

Clock Rate in the Top 10 Supercomputers

Processor Parallelism in the Top 10 Supercomputers


Page 22:

How About Korea?

• Haedam (19th) & Haeon (20th), Korea Meteorological Administration: 316.40 TFlops (45,120 cores)
• Tachyon II (24th), KISTI: 274.80 TFlops (26,232 cores), main memory = 157.392 TB

Page 23:

TachyonII and IBM p6


Page 24:

System Performance by Countries

Page 25:

Countries Share Over Time (1993~2010)

Page 26:

Architecture Share Over time (1993~2010)

Page 27:

Interconnect Family Share over Time (1993~2010)

Page 28:
Page 29:

Special-Purpose Supercomputer

• Anton (D. E. Shaw Research, 2008)
• 512 processing nodes with a 3D-torus topology
• Each node includes a special MD engine as a single ASIC
• Theoretical maximum performance = Flops
• Net power consumption = kW

Page 30:

KIAS Cluster Case

• 418 nodes (44,826 cores)
• Theoretical maximum performance = 67 TeraFlops
• Net power consumption = 154.8 kW
• This includes a GPU cluster with 24 nodes (43,008 cores), 49 TFlops

Page 31:

Grand Challenge Problems

• Astrophysics problems
• High-energy physics, nuclear physics
• Materials science: design of novel materials, quantum structure calculations
• Atmospheric science: weather forecasting, etc.
• Fusion research: magnetohydrodynamics of plasmas, etc.
• Macromolecular structure modeling and dynamics: protein structures and folding dynamics

Page 32:

Example: Protein Folding

How does a protein physically fold from a denatured state into its native conformation?

Page 33:

Computational Load of a Folding Simulation by Molecular Dynamics

• Suppose a protein + 1,000 water molecules: approximately 3,000 atoms
• Integration time step = 10^-15 s
• Number of long-range force calculations at each time step ~ 1000 × 1000 = 10^6
• Then a one-millisecond (10^-3 s) simulation corresponds to 10^12 time steps × 10^6 = 10^18 calculations!!

This is an exascale problem!
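The arithmetic behind those bullets, written out as a minimal sketch (all parameter values are the ones quoted above):

```python
# Back-of-the-envelope cost of a 1 ms protein-folding MD simulation.
dt = 1e-15                           # integration time step in seconds
t_total = 1e-3                       # target simulated time: one millisecond
force_calcs_per_step = 1000 * 1000   # long-range pair interactions per step (from the slide)

n_steps = t_total / dt                        # 1e12 time steps
total_calcs = n_steps * force_calcs_per_step  # 1e18 force calculations
print(f"steps: {n_steps:.0e}, force calculations: {total_calcs:.0e}")
# -> steps: 1e+12, force calculations: 1e+18  (an exascale workload)
```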

Page 34:

Example 2a: WRF (Weather Research and Forecasting) Model: Full-Scale Nature Run

At present:

(1) 5 km × 5 km horizontal resolution, 101 vertical levels on the hemisphere: 2 × 10^9 cells

(2) time step = ? milliseconds

    --- 10 Teraflops on 10,000 5-Gflops nodes (2007)

If the resolution is ~1 km, then 5 × 10^10 cells.

If sustained at exascale, it would require 10 PB of main memory, with I/O requirements up by 1000 times.
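A rough sanity check of the quoted cell counts. This is only a sketch: it assumes the grid covers essentially the whole globe (which is what reproduces the 2 × 10^9 figure) and uses a spherical Earth.

```python
import math

R_KM = 6371.0     # mean Earth radius in km
LEVELS = 101      # vertical levels, from the slide

def n_cells(dx_km):
    """Approximate cell count for a near-global grid with horizontal spacing dx_km."""
    surface_km2 = 4.0 * math.pi * R_KM ** 2     # full-sphere surface area
    columns = surface_km2 / (dx_km * dx_km)     # horizontal grid columns
    return columns * LEVELS

print(f"5 km grid: {n_cells(5.0):.1e} cells")   # ~2e9, as quoted
print(f"1 km grid: {n_cells(1.0):.1e} cells")   # ~5e10, as quoted
```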

Page 35:

Types of Challenge Problems

* Parallelism:

  (a) Embarrassingly parallel problems
  (b) Coarse-grained problems
  (c) Fine-grained problems

* Computation vs. memory:

  (a) CPU-intensive problems: molecular dynamics
  (b) Memory-intensive problems: bioinformatics, data analysis in large-data experiments (high-energy experiments)

Page 36:

DARPA Report on Exascale Computing Challenges (2008)

ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems
Peter Kogge, Editor & Study Lead

* Objective: understand the course of mainstream technology and determine the primary challenges to reaching a 1000x increase in computing capability by 2015.

Page 37:

Four Main Challenges for Exascale Supercomputing

(1) The Energy and Power Challenge
(2) The Memory and Storage Challenge
(3) The Concurrency and Locality Challenge
(4) The Resiliency Challenge

* DARPA Report 2008

Page 38:

Energy and Power Challenge

• Power / performance (average over the Top 10 sites)
  = 2.67 kW / Teraflops = 2.67 nJ/flop = 2.67 × 10^-9 J/flop

• Simple extrapolation to exascale: around 1~2 Gigawatts needed for 1 Exaflops!
  ~ the capacity of a whole nuclear power plant!!
  (e.g., the 21 nuclear power plants in Korea produced ~19 GW of electricity in 2009)

Power consumption vs. performance of the top 10 supercomputers (Nov. 2010)
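The extrapolation is just the measured energy per flop scaled up to 10^18 flop/s. A minimal sketch of that arithmetic (the helper name is ours, not the report's):

```python
def exaflop_power_mw(watts, flops):
    """Scale a measured power/performance ratio up to a sustained 1 Exaflops (1e18 flop/s)."""
    joules_per_flop = watts / flops
    return joules_per_flop * 1e18 / 1e6   # megawatts at exascale

# Average over the Top 10 sites: 2.67 kW per Teraflops, as quoted above.
print(f"{exaflop_power_mw(2.67e3, 1e12):.0f} MW")
# -> ~2670 MW, i.e. gigawatt scale, on the order of a whole nuclear power plant's output.
```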

Page 39:

Energy and Power Challenge (continued)

Consider the case of a recent GPU, the Tesla M2050/70:

• Power / performance of a Tesla GPU ≈ 229 W / 515 Gigaflops

• This means around 400 MW for 1 Exaflops, for the processing units alone!!!!
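The same ratio-scaling arithmetic as in the previous sketch, applied to the Tesla figures quoted here:

```python
# Tesla M2050/70: 229 W per 515 Gflops (double precision), as quoted above.
watts, flops = 229.0, 515e9
print(f"{watts / flops * 1e18 / 1e6:.0f} MW at 1 Exaflops")
# -> ~445 MW, the ~400 MW scale quoted above, and that is for the GPUs alone,
#    before memory, interconnect, and cooling are counted.
```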

Page 40:

http://www.green500.org

The Most Energy-Efficient Supercomputers (Nov. 2010)

Page 41:

The Memory and Storage Challenge

• (1) Main memory: assume 1 GB per chip; then 1 PB = a million chips.

  Realistic main memory sizes are 10 PB ~ 100 PB ---> 10M ~ 100M chips!!

  --> (a) Multiple power and resiliency issues (plus cost!)
      (b) Bandwidth challenge: how chips are organized and interfaced with other components

  * Memory densities and bandwidths need to increase by orders of magnitude.

• (2) Secondary storage: needs ~100 times the main memory size

  (a) Bandwidth challenge
  (b) Challenge of managing metadata (file descriptors, i-nodes, file control blocks, etc.)

• DARPA Report 2008, p213~214
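The chip-count arithmetic above, written out (a trivial sketch; 1 GB per DRAM chip is the assumption quoted on the slide):

```python
GB, PB = 1e9, 1e15
chip_capacity = 1 * GB                     # assumed DRAM capacity per chip

for main_memory in (1 * PB, 10 * PB, 100 * PB):
    chips = main_memory / chip_capacity
    print(f"{main_memory / PB:5.0f} PB of main memory -> {chips:.0e} chips")
# 1 PB -> 1e+06 chips, 10 PB -> 1e+07 chips, 100 PB -> 1e+08 chips
```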

Page 42:

The Concurrency Challenge

• Total concurrency ≡ the total number of operations (flops) that must be initiated on each and every cycle: billion-way concurrency is needed!!

• DARPA Report 2008, p214~215
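Why billion-way: at a clock rate of order 1 GHz, sustaining 10^18 flop/s means roughly 10^9 operations must be in flight every cycle. A one-line check (the 1.5 GHz clock is only an illustrative assumption):

```python
# Operations that must be issued each cycle to sustain 1 Exaflops.
exaflops = 1e18      # target sustained rate, flop/s
clock_hz = 1.5e9     # illustrative processor clock (assumed)

ops_per_cycle = exaflops / clock_hz
print(f"~{ops_per_cycle:.1e} operations in flight per cycle")   # ~6.7e+08, i.e. billion-way concurrency
```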

Page 43:

Processor Parallelism

• Parallelism ≡ the number of distinct threads that make up the execution of a program
• Present maximum: of order 100,000
• Need to go to 10^8: on the order of 100~1000 times the present value!

• DARPA Report 2008, p216

Page 44:

The Resiliency Challenge

• Resiliency ≡ the property of a system to continue effective operation even in the presence of faults, in either hardware or software.

• Exascale systems will see more, and different, forms of faults and disruptions than today's systems:

  * Huge number of components: 10^6 to 10^8 memory chips and ~10^6 disk drives
  * High clock rates increase bit error rates (BER) on data transmission
  * Aging effects change the fault characteristics of devices
  * Smaller feature sizes increase the sensitivity of devices to SEUs (single event upsets), e.g., from cosmic rays and other radiation
  * Low operating voltages (for low power) increase the effect of noise sources, such as the power supply
  * Increased levels of concurrency increase the potential for races, metastable states, and difficult timing problems

• DARPA Report 2008, p217

Page 45:

Aggressive Strawman Architecture

* DARPA Report 2008, p177

To achieve 1 Exaflops:

* 1 core = 4 FPUs + L1 cache memory
* 1 node = 742 cores on a 4.5 Tflops, 150 W (active power) processor chip
* 1 group = 12 nodes + routing
* 1 rack = 32 groups
* System = 583 racks
* Total number of nodes ~ 223,000
* Total number of cores ~ 223,000 × 742 ≈ 1.66 × 10^8
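A quick check that this hierarchy actually adds up to about an exaflop, using only the numbers on the slide:

```python
# Totals implied by the strawman hierarchy (all values from the slide above).
cores_per_node = 742
node_tflops = 4.5        # peak Tflops per processor chip
nodes_per_group = 12
groups_per_rack = 32
racks = 583

nodes = nodes_per_group * groups_per_rack * racks     # 223,872 nodes
cores = nodes * cores_per_node                        # ~1.66e8 cores
peak_ef = nodes * node_tflops * 1e12 / 1e18           # ~1.0 Exaflops

print(f"nodes = {nodes:,}, cores = {cores:.2e}, peak = {peak_ef:.2f} EF/s")
```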

Page 46:

Aggressive Strawman Architecture

* DARPA Report 2008, p177

System Interconnect

Page 47:

• DARPA Report 2008, p 128

Interconnect bandwidth requirements for an Exascale system

Page 48:

Characteristics of the Aggressive Strawman Architecture

* DARPA Report 2008, p176

Performance = 1 Exaflops
Total memory = 3.6 PB
Total power consumption = 67.7 MW!

Page 49:

Power Distribution in the Aggressive Strawman System

* Main memory and interconnect account for significant shares of the power consumption.

Page 50:

Characteristics of the Aggressive Strawman Architecture

* DARPA Report 2008, p188 (projected to the year 2015)

(1) Performance = 1 Exaflops
    Total DRAM memory = 3.6 PB
    Disk storage = 3,600 PB = 3.6 EB
    Performance per Watt = 14.7 Gflops/Watt
    Total number of cores = 1.66 × 10^8
    Number of microprocessor chips = 223,872
    Total power consumption = 67.7 MW!

(2) If scaled down to 20 MW of power:
    Performance = 0.303 Exaflops = 303 Petaflops
    Total DRAM memory = 1.0 PB
    Disk storage = 1,080 PB = 1.08 EB
    Total number of cores = 5.04 × 10^7
    Number of microprocessor chips = 67,968

Page 51:

Possibilities for Exascale Hardware

* DARPA Report 2008

(a) Energy-efficient circuits and architectures in silicon
    * Communication circuits & memory circuits

(b) Alternative low-energy devices and circuits for logic and memory
    e.g., * Superconducting RSFQ (Rapid Single Flux Quantum) devices
          * Crossbar architectures with novel bi-state devices

(c) Alternative low-energy systems for memory and storage
    * New levels in the memory hierarchy
    * Re-architecting conventional DRAMs

(d) 3D interconnect, packaging, and cooling

(e) Photonic interconnect

Page 52:

Logic Devices and Memory Devices

* DARPA Report 2008, Ch. 6

(a) Alternative low-energy devices and circuits for logic and memory
    e.g., * Superconducting RSFQ (Rapid Single Flux Quantum) devices: extremely low power consumption

(b) Alternative memory types (non-volatile RAMs)
    * Phase-change memory (PCRAM): two resistance states (crystalline vs. amorphous)
    * SONOS memory
    * Magnetic random access memory (MRAM): fast non-volatile memory technology
    * FeRAM, resistive RAM (RRAM)

Page 53:

* Potential Direction for 3D Packaging (A)

3D Packaging (A)

* DARPA Report 2008, p 160

Page 54:

3D Packaging (B), (C)

* DARPA Report 2008, p161

• Potential Direction for 3D Packaging (B)

• Potential Direction for 3D Packaging (C)

Page 55:

Possible Aggressive Packaging of a Single Node

* DARPA Report 2008

Page 56:

A Strawman Design with Optical Interconnects

* DARPA Report 2008, p191-198

Chip super-core organization and photonic interconnect:
Each chip consists of 36 super-cores (6 × 6), each containing 21 cores (742 cores per chip).

* On-chip optical interconnect
* Off-chip optical interconnect
* Rack-to-rack optical interconnect
* Optically connected memory and storage system

Page 57:

Rack-to-Rack Optical System Interconnect

* DARPA Report 2008, p195

Page 58:

A Possible Optically Connected Memory Stack

* DARPA Report 2008, p197

Total memory power = 8.5 MW ~ 12 MW

Page 59:

Exascale Architectures and Programming Models

* DARPA Report 2008

(a) System architectures and programming models that reduce communication
    * Design in self-awareness of the status of energy usage at all levels, and the ability to maintain a specific power level
    * More explicit and dynamic program control over the contents of memory structures (so that minimal communication energy is expended)
    * Alternative execution and programming models

(b) Locality-aware architectures
    * Optimize data placement and movement

Page 60:

Exascale Algorithm and Application Development

* DARPA Report 2008

Presently O(10^5) processors; at exascale we need O(10^8) processors and possibly O(10^10) threads.

(a) Power and resiliency models within application models
(b) Understanding and adapting old algorithms
(c) Inventing new algorithms
(d) Inventing new applications
(e) Making applications resiliency-aware

Page 61:

Resilient Exascale Systems

* DARPA Report 2008

(a) Energy-efficient error detection and correction architectures
(b) Fail-in-place and self-healing systems
(c) Checkpoint rollback and recovery
(d) Algorithm-level fault checking and fault resiliency
(e) Vertically integrated resilient systems

Page 62:

Summary

• Exascale Supercomputing Requires New Technology

• Possibly expected around ~2020

• Power Wall and Memory Wall should be overcome

• 3D Packaging and Optical Interconnect should be pursued

• Alternative Materials for Memory and Logic Devices

- e.g., Superconducting Devices, Spintronics-based Devices

• Different Programming Model

• Reversible Logic and Computing

Page 63:

References:

• TOP 500 Supercomputers (http://www.top500.org)
• ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems (DARPA Report, 2008)

Page 64:

Thank You