Trends in Computing Architecture

CMSC828E

Ramani Duraiswami

Several slides taken from a Microway/NVIDIA webinar
Some figures adapted from web sources

Problem sizes in simulation and data processing are increasing

• Change in paradigm in science

– Simulate then test

– Fidelity demands larger simulations

– Problems being simulated are also much more complex

• Sensors are getting varied and cheaper; and storage is getting cheaper

– Cameras, microphones

• Other large data

– Text (all the newspapers, books, technical papers)

– Genome data

– Medical/biological data (X-Ray, PET, MRI, Ultrasound, Electron microscopy …)

– Climate (Temperature, Salinity, Pressure, Wind, Oxygen content, …)

Ways to attack problem size growth

• Faster algorithms with better asymptotic complexity

• Faster processors

– “Moore’s law will take care of it”

• Go parallel!

– Clusters of computers

– New data parallel chips (multicore processors, GPUs)

“Moore’s Law will take care of it”

• Not a law but an observation made by Gordon Moore in the 1960s

• Number of transistors on a chip doubles every 18 months

• Commonly taken to mean that the performance of the “standard computer” improves exponentially, with a doubling time of 18 months
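The doubling claim above can be turned into a number. The following is a minimal sketch, assuming performance doubles exactly once per 18 months; the function name `moore_speedup` and its parameters are illustrative and not from the slides.

```python
def moore_speedup(years, doubling_months=18.0):
    """Performance multiplier after `years`, assuming one doubling per `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Print the assumed improvement factor for a few waiting times.
for years in (1.5, 3, 6, 15):
    print(f"{years:>4} years -> {moore_speedup(years):,.0f}x")
```

Under this assumption, waiting 15 years buys ten doublings, or about a factor of a thousand.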

Refuting the Moore’s law argument

• Argument:

– Moore’s law: Processor speed doubles every 18 months

– If we wait long enough the computer will get fast enough and let my inefficient algorithm tackle the problem

• Is this true?

– Yes for algorithms with linear asymptotic complexity

– No!! For algorithms with different asymptotic complexity

– Most scientific algorithms are O(N^2) or O(N^3)

– For a million variables, we would need about 20 generations of Moore’s law (2^20 ≈ 10^6, roughly 30 years at 18 months each) before an O(N^2) algorithm was comparable with an O(N) algorithm

• Did no one tell you that Moore’s law is dead?
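The waiting argument above can be checked with a few lines of arithmetic. This is a rough sketch, assuming both algorithms have the same constant factor and that each generation doubles speed exactly; different constants would shift the count. The helper name `generations_to_catch_up` is hypothetical.

```python
import math


def generations_to_catch_up(n, slow_exponent=2, fast_exponent=1):
    """Doublings of hardware speed needed to absorb n**slow_exponent / n**fast_exponent."""
    work_ratio = float(n) ** (slow_exponent - fast_exponent)
    return math.log2(work_ratio)


n = 1_000_000  # a million variables, as in the argument above
for p in (2, 3):
    g = generations_to_catch_up(n, slow_exponent=p)
    print(f"O(N^{p}) vs O(N): {g:.1f} doublings, about {g * 1.5:.0f} years at 18 months each")
```

The required number of doublings grows with log2 of the problem size, so every increase in N pushes the break-even point further into the future.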

Moore’s Law is dead:

“Issues at small scales”

- Lithography not possible

- 2D electrostatics harder to control,

- “parasitic resistance” degrades performance,

- device to device variations will be larger,

- ultra-thin bodies and hyper-abrupt junctions make manufacturing difficult

Moore’s Law is dead!

• Feature sizes and clock speeds on commodity chips have been stagnant over the past 4 years

– ~3 GHz and 45 nm

• All manufacturers are going with multicore to maintain performance

– Core-2, core-2-duo, quad-core, …

• Shared memory multiprocessing

– Intel has demonstrated several many-core systems

• Graphics processors and gaming consoles have already been on the multicore path for a decade!

Sony Playstation 3

2.18 teraflops <$400

Difficult to program

Microsoft Xbox 360

1.04 teraflops <$300

Difficult to program

Gamer Power

GeForce 8800 GTX

Multicore Intel box with 3 GPUs in slots

~1 teraflop for < $3000 (shown with 1 GPU)

Programming on the GPU

• GPU is organized as groups of multiprocessors (8 relatively slow processors each) with a small amount of their own memory and access to common shared memory

• Factor of 100s difference in speed as one goes up the memory hierarchy

• To achieve gains, problems must fit the GPU programming paradigm and manage memory carefully

• Fortunately, many practically important tasks do map well, and work is ongoing on converting others

– Image and audio processing

– Some types of linear algebra cores

– Many machine learning algorithms

• Research issues:

– Identifying important tasks and mapping them to the architecture

– Making it convenient for programmers to call GPU code from host code

Local memory: ~50 kB

GPU shared memory: ~1 GB

Host memory: ~2-32 GB
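The point about managing memory can be illustrated with the classic tiling pattern: copy a small block of data into fast memory once, then reuse it many times. The sketch below is plain Python running on the CPU, not GPU code; it only emulates the access pattern. The function `tiled_matmul` and the lists `a_tile` and `b_tile` (standing in for local memory) are hypothetical names introduced for illustration.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply square matrices a and b (lists of lists) tile by tile.

    Each (i0, j0) output tile plays the role of one group of GPU threads.
    a_tile and b_tile stand in for small, fast local memory: each value
    copied into them is reused up to `tile` times before being discarded.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Stage one tile of each input (the slow, large-memory reads).
                a_tile = [row[k0:k0 + tile] for row in a[i0:i0 + tile]]
                b_tile = [row[j0:j0 + tile] for row in b[k0:k0 + tile]]
                # Compute using only the staged tiles (the fast reads).
                for i, a_row in enumerate(a_tile):
                    for j in range(len(b_tile[0])):
                        acc = 0.0
                        for k, a_val in enumerate(a_row):
                            acc += a_val * b_tile[k][j]
                        c[i0 + i][j0 + j] += acc
    return c


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(tiled_matmul(a, identity, tile=2) == a)  # multiplying by the identity returns a
```

With a tile width of t, each staged value is reused about t times, so traffic to the large, slow memory drops by roughly that factor. This is one reason a problem has to "fit" the hierarchy to see large gains.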

What is GPU Computing?

Computing with CPU + GPU

Heterogeneous Computing

[Figure label: 4 cores]

Not 2x or 3x: Speedups are 20x to 150x

• 146X: Medical Imaging, U of Utah

• 36X: Molecular Dynamics, U of Illinois, Urbana

• 18X: Video Transcoding, Elemental Tech

• 50X: Matlab Computing, AccelerEyes

• 100X: Astrophysics, RIKEN

• 149X: Financial simulation, Oxford

• 47X: Linear Algebra, Universidad Jaime

• 20X: 3D Ultrasound, Techniscan

• 130X: Quantum Chemistry, U of Illinois, Urbana

• 30X: Gene Sequencing, U of Maryland

Accelerating Time to Discovery

[Figure: run times compared, CPU only vs. with GPU. Values shown in the chart: 4.6 days, 27 minutes, 2.7 days, 30 minutes, 8 hours, 13 minutes, 16 minutes, 3 hours]

Source: Stone, Phillips, Hardy, Schulten

Molecular Dynamics

Available MD software

NAMD / VMD (alpha release)

HOOMD

ACE-MD

MD-GPU

Ongoing work

LAMMPS

CHARMM

GROMACS

AMBER

Source: Anderson, Lorenz, Travesset

Quantum Chemistry

Source: Ufimtsev, Martinez

Source: Yasuda

Available MD software

NAMD / VMD (alpha release)

HOOMD

ACE-MD

Ongoing work

LAMMPS

CHARMM

Q-Chem

Gaussian

GAMESS

Computational Fluid Dynamics (CFD)

Source: Thibault, Senocak

Source: Tolke, Krafczyk

Ongoing work

Navier-Stokes

Lattice Boltzmann

3D Euler Solver

Weather and ocean modeling

Electromagnetics / Electrodynamics

FDTD Acceleration using GPUs

Source: Acceleware

FDTD Solvers

Acceleware

EM Photonics

CUDA Tutorial

Ongoing work

Maxwell equation solver

Ring Oscillator (FDTD)

Particle beam dynamics simulator

Weather, Atmospheric, & Ocean Modeling

Source: Michalakes, Vachharajani

CUDA-accelerated WRF available

Other kernels in WRF being ported

Ongoing work

Tsunami modeling

Ocean modeling

Several CFD codes

Source: Matsuoka, Akiyama, et al

Computational Finance

Source: CUDA SDK

Financial Computing Software vendors

SciComp : Derivatives pricing modeling

Hanweck: Options pricing & risk analysis

Aqumin: 3D visualization of market data

Exegy: High-volume Tickers & Risk Analysis

QuantCatalyst: Pricing & Hedging Engine

Oneye: Algorithmic Trading

Arbitragis Trading: Trinomial Options Pricing

Ongoing work

LIBOR Monte Carlo market model

Callable Swaps and Continuous Time

Source: SciComp