Page 1: Parallel Research at Illinois

Parallel Research at Illinois

Parallel Everywhere

www.parallel.illinois.edu

Page 2: Parallel Research at Illinois


Parallel @ Illinois

Illiac IV

UPCRC

Cloud Computing Testbed

OpenSPARC Center of Excellence

CUDA Center of Excellence

Extreme Scale Computing

Page 3: Parallel Research at Illinois


Taxonomy

• Cloud: runs many loosely coupled queries or applications that are often I/O-intensive; the focus is on dynamic provisioning, virtualization, cost reduction, etc. Costs are linear and fault tolerance is relatively easy.

• Supercomputer: runs a few tightly coupled, compute-intensive applications; the focus is on algorithms, performance, communication, and synchronization. Fault tolerance is hard and cost is nonlinear.

• Both can be humongous systems.

Page 4: Parallel Research at Illinois


FROM PETASCALE TO EXASCALE

This talk focuses on supercomputers.

Page 5: Parallel Research at Illinois


Current Leader: Jaguar, a Cray XT5 at Oak Ridge

Number of cores: 153,000
Peak performance: 1.645 petaflops
System memory: 362 terabytes
Disk space: 10.7 petabytes
Disk bandwidth: 200+ gigabytes/second
Power: megawatts
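Two derived figures put the machine's balance in perspective (my arithmetic from the numbers above, not stated on the slide): memory per core is $362\,\mathrm{TB} / 153{,}000 \approx 2.4\,\mathrm{GB}$, and the memory-to-compute balance is $362\times10^{12}\,\mathrm{B} \,/\, 1.645\times10^{15}\,\mathrm{flop/s} \approx 0.22$ bytes per flop.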

Page 6: Parallel Research at Illinois


Supercomputing in Two Years

Blue Waters Computing System at Illinois

System Attribute: Blue Waters*

Vendor: IBM
Processor: Power 7
Peak Performance (PF):
Sustained Performance (PF): >1.0
Number of Processor Cores: >200,000
Amount of Memory (PB): >0.8
Amount of Disk Storage (PB): >10
Amount of Archival Storage (PB): >500
External Bandwidth (Gbps): 100-400

* Reference petascale computing system (no accelerators).

Cost: $200M

Machine room designed for 20 MW

Page 7: Parallel Research at Illinois


Molecular Science
Weather & Climate Forecasting
Earth Science
Astronomy
Health

Think of a Supercomputer as a Large Scientific Instrument to Observe the Unobservable

Page 8: Parallel Research at Illinois


Performance grows 1,000-fold every 11 years
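Spelled out (the rate is implied by the slide's figure; the arithmetic is mine): $1000^{1/11} = 10^{3/11} \approx 1.87$, i.e., roughly 87% growth per year, or a doubling about every 13 months.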

Page 9: Parallel Research at Illinois


Exascale in 2020

• Extrapolation of current technology:
  – 100M-1B threads: the compute power of each thread does not increase!
  – 100-500 megawatts
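The thread count follows directly from the flat per-thread speed (round-number arithmetic, mine): $10^{18}\,\mathrm{flop/s} \div 10^{9}\,\mathrm{threads} = 10^{9}\,\mathrm{flop/s}$ per thread, roughly what a single core already delivers today, so an exaflop must come from more threads, not faster ones.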

Page 10: Parallel Research at Illinois


Main Issues

• Can we reduce power consumption?
• Can we use so many threads?
• Can we program them?
• Will the machine stay up?
• How do we deal with variability & jitter?
• How do we control & manage such a system?

Page 11: Parallel Research at Illinois


Low Power Design

• Energy consumption might be reduced by one order of magnitude with aggressive technology and architecture changes

• But the changes have a price (see the power relation below):
  – Low-power cores -- more cores
  – Aggressive voltage scaling -- more errors
  – Aggressive DRAM redesign -- less bandwidth
  – Hybrid architecture -- hard to program

• This problem also affects large clouds, but to a lesser extent
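Why voltage is the big lever, and why it bites back (the standard first-order CMOS model, not a formula from the slides): dynamic power is approximately

$P_{\mathrm{dyn}} \approx a\, C\, V^{2} f$

where $a$ is the activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Lowering $V$ cuts power quadratically but shrinks noise margins (hence "more errors"), and the lower $f$ that must accompany it has to be made up with more cores.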

Page 12: Parallel Research at Illinois


Scaling Applications to 1B Threads

• Weak scaling (formalized below): use a more powerful machine to solve a larger problem
  – Increase the application size and keep the running time constant; e.g., refine the grid
  – The larger problem may not be of interest
  – May want to scale time, not space (molecular dynamics) -- not always easy to work in 4D
  – Cannot scale space without scaling time (iterative methods): granularity decreases and communication increases
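Weak scaling is the regime formalized by Gustafson's law (standard background, not stated on the slide): if a fraction $\alpha$ of the scaled run is serial, the scaled speedup on $P$ processors is

$S(P) = \alpha + (1-\alpha)\,P$

which is why growing the problem with the machine looks so much friendlier than strong scaling -- until, as the bullets above note, the larger problem is no longer the one you care about.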

Page 13: Parallel Research at Illinois


Scaling Iterative Methods

• Assume the number of cores (and compute power) is increased by a factor of k^4
• Space and time scales are refined by a factor of k
• Mesh size increases by a factor of k×k×k
• Local cell dimension decreases by a factor of k^(1/4)

• Cell volume decreases by a factor of k^(3/4) while area decreases by a factor of k^(2/4); the area-to-volume ratio (communication-to-computation ratio) therefore increases by a factor of k^(1/4) (spelled out below)
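The last step spelled out (assuming cubic per-core subdomains; the k^(1/4) shrink factor is the slide's): for a local block of linear size $L$, computation scales with the volume $L^3$ and communication with the surface $L^2$, so

$\frac{\text{communication}}{\text{computation}} \propto \frac{L^2}{L^3} = \frac{1}{L}$

and if $L$ shrinks by $k^{1/4}$ the ratio grows by $k^{1/4}$: refining time as well as space leaves each core a smaller, chattier subdomain even though the core count grew as fast as the total work.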

Page 14: Parallel Research at Illinois


Debugging and Tuning: Observing 1B Threads

• Scalable infrastructure to control and instrument 1B threads

• Parallel information compression to identify “anomalies”

• Need the ability to express "normality" (global correctness and performance assertions), as sketched below
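One way such a global correctness assertion could look in today's terms (a minimal MPI sketch; the invariant checked here is a hypothetical stand-in, not something from the talk):

```c
/* global_assert.c -- every rank checks a local invariant and a
 * single MPI_Allreduce decides whether it holds everywhere.
 * Compile: mpicc global_assert.c -o global_assert */
#include <math.h>
#include <mpi.h>
#include <stdio.h>

/* Returns 1 if this rank's data satisfies the invariant, else 0.
 * "Values stay finite and bounded" is an illustrative invariant. */
static int local_invariant_holds(const double *u, int n, double bound) {
    for (int i = 0; i < n; i++)
        if (!isfinite(u[i]) || fabs(u[i]) > bound) return 0;
    return 1;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double u[4] = {0.0, 1.0, 2.0, 3.0};   /* stand-in local state */
    int ok = local_invariant_holds(u, 4, 1e6);

    /* Logical AND across all ranks: the assertion holds globally
     * only if it holds on every rank. */
    int all_ok = 0;
    MPI_Allreduce(&ok, &all_ok, 1, MPI_INT, MPI_LAND, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global assertion %s\n", all_ok ? "holds" : "violated");
    MPI_Finalize();
    return all_ok ? 0 : 1;
}
```

The same reduction pattern works for performance assertions, e.g., reducing per-rank iteration times with MPI_MAX and flagging outliers.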

Page 15: Parallel Research at Illinois


Decreasing Mean Time to Failure

• Problem:
  – More transistors
  – Smaller transistors
  – Lower voltage
  – More manufacturing variance

• Current technology: global, synchronized checkpoint (sketched below); HW error detection

• Future technology:
  – More HW fault detection
  – More efficient checkpoints (OS? compiler? application?)
  – Fault-resilient algorithms?
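A minimal sketch of what a global, synchronized checkpoint amounts to (illustrative MPI code; the interval, state size, and file naming are my assumptions):

```c
/* checkpoint.c -- all ranks stop at the same iteration boundary
 * and write their state, forming one consistent recovery point. */
#include <mpi.h>
#include <stdio.h>

#define N 1024          /* local state size (illustrative) */
#define CKPT_EVERY 100  /* checkpoint interval in iterations */

static void write_checkpoint(int rank, int step, const double *u, int n) {
    char name[64];
    snprintf(name, sizeof name, "ckpt_%05d_rank%04d.bin", step, rank);
    FILE *f = fopen(name, "wb");
    if (f) { fwrite(u, sizeof(double), (size_t)n, f); fclose(f); }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double u[N] = {0};  /* stand-in for this rank's simulation state */
    for (int step = 1; step <= 1000; step++) {
        /* ... one solver iteration would go here ... */
        if (step % CKPT_EVERY == 0) {
            /* Synchronize so every rank checkpoints the same step;
             * together the per-rank files form a consistent cut. */
            MPI_Barrier(MPI_COMM_WORLD);
            write_checkpoint(rank, step, u, N);
        }
    }
    MPI_Finalize();
    return 0;
}
```

The cost of this scheme is exactly what the slide hints at: every rank stalls at the barrier, and the whole machine hits the file system at once, which is why more efficient OS-, compiler-, or application-level checkpoints are on the future-technology list.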

Page 16: Parallel Research at Illinois


Handling Variability

• Current parallel programming models implicitly assume that all tasks progress at the same rate

• Future systems will have increased variability in the rate of progress:
  – HW variability: power management, manufacturing variability, error correction
  – System variability: jitter
  – Application variability: dynamic, irregular computations

• Dynamic load balancing overcomes variability, but increases communication and reduces locality (see the sketch below)
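One concrete instance of dynamic load balancing (an OpenMP sketch with synthetic task costs; the slides do not prescribe any particular mechanism):

```c
/* load_balance.c -- with irregular per-task costs, a dynamic
 * schedule lets fast threads pick up remaining work instead of
 * idling at the end of a statically partitioned loop.
 * Compile: cc -fopenmp load_balance.c -o load_balance */
#include <omp.h>
#include <stdio.h>

/* Simulate an irregular task whose cost varies wildly with i. */
static double work(int i) {
    double x = 0.0;
    int n = 1000 * (1 + (i * i) % 97);  /* synthetic imbalance */
    for (int j = 0; j < n; j++) x += (double)j * 1e-9;
    return x;
}

int main(void) {
    enum { NTASKS = 10000 };
    double sum = 0.0, t0 = omp_get_wtime();

    /* schedule(dynamic, 16): threads grab 16 iterations at a time
     * as they finish, absorbing hardware and application
     * variability at the cost of extra scheduling traffic and
     * lost locality -- the trade-off named on the slide. */
    #pragma omp parallel for schedule(dynamic, 16) reduction(+ : sum)
    for (int i = 0; i < NTASKS; i++)
        sum += work(i);

    printf("sum = %f, elapsed = %.3f s\n", sum, omp_get_wtime() - t0);
    return 0;
}
```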

Page 17: Parallel Research at Illinois


Controlling the System

• Observability: Can you probe the state of any HW or SW component?

• Situational awareness: When cascading errors occur, how long before you find the root cause?

• System analytics: Can you mine logs to identify anomalies?

• This problem also affects large clouds, but to a lesser extent

Page 18: Parallel Research at Illinois


Summary

• Supercomputing is fun (again): business as usual will not go far

• Supercomputing is essential for progress on key scientific and societal issues

• You might live through the end of Moore's law: this will be a fundamental shift in our discipline and our industry