Kristopher Windsor CS 147, Fall 2008

Multitasking and Parallelism


Page 2: Table of contents

Parallel processing on one core

Multicore usage, difficulties, and next steps

Alternatives to multicore CPUs

Multicore benchmarks

Page 3: Optimizing each clock cycle

                               Single data stream    Multiple data streams
Single instruction stream      SISD (Pentium 4)      SIMD (x86 MMX)
Multiple instruction streams   MISD (not used)       MIMD (Xeon / Clovertown)

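In SIMD and MIMD, multiple instructions and/or data items can be processed each cycle, which makes batch processing more efficient

For example, an MMX instruction makes several ALUs operate simultaneously, each on a different data element (e.g., eight 8-bit values packed into one 64-bit register)

Vector architecture is similar to SIMD, but instead of many ALUs working side by side, it streams data elements through pipelined functional units; its speed comes from efficient, pipelined data movement rather than parallel data processing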

Page 4

Sharing a core among threads is required whenever there are more threads than cores

There are multiple ways for a core to switch between threads:

Fine-grained multithreading: switch threads every cycle

Coarse-grained multithreading: switch only when the current thread is stalled (i.e., it is waiting for data to come back from RAM)

Simultaneous multithreading (SMT): instructions from multiple threads are issued in the same cycle

Page 5

Clock speed limits for each core due to heat

Heat produced grows much faster than clock speed (higher clock speeds need higher voltages, and dynamic power scales roughly with voltage squared times frequency), and cooling methods are limited

This limit has already been reached, and one core is not enough

Power efficiency

Smaller CPU designs can be optimized better

Individual cores or processors can be turned off when not needed
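A rough worked example of the power argument, using the standard proportionality for dynamic power in CMOS chips (C = capacitive load, V = supply voltage, f = clock frequency) and assuming, for illustration only, that voltage can be lowered or raised in proportion to frequency:

```latex
\begin{aligned}
P_{\text{dynamic}} &\propto C\,V^{2} f \\[4pt]
\frac{P_{\text{slow core}}}{P_{\text{fast core}}}
  &= \frac{C\,(0.85V)^{2}\,(0.85f)}{C\,V^{2} f} = 0.85^{3} \approx 0.61 \\[4pt]
P_{\text{2 slow cores}} &\approx 2 \times 0.61\,P_{\text{fast core}} \approx 1.23\,P_{\text{fast core}},
  \qquad \text{throughput} \le 2 \times 0.85 = 1.7\times \\[4pt]
\frac{P_{\text{one core at }1.7f}}{P_{\text{fast core}}} &= 1.7^{3} \approx 4.9
\end{aligned}
```

Under these assumptions, two slower cores offer up to 1.7× the throughput (if the work splits perfectly) for about 1.23× the power, whereas pushing a single core to 1.7× the clock speed would cost roughly 4.9× the power.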

Page 6

Job-level parallelism:

Each process can only use one core

Easier to code; most programs are written like this

Inefficient when you have multiple cores but only one main program

Parallel processing program:

Each process can have multiple threads, which can run on different cores

Harder to code; used in operating systems, which have many independent tasks, and in web servers, where each request can be handled separately

Best use of multiple cores

Page 7

Software-rendered display represents most of a game's CPU usage (i.e., more than the physics calculations), and graphics output cannot naturally be split into multiple threads

3D hardware-accelerated graphics output is typically the performance bottleneck, and since rendering on a video card's GPU is 50x+ faster than on a CPU, multicore CPUs will not help

In games where every object can collide with every other object, physics cannot be parallelized easily, because any two collisions may need to access the same memory

Every event has to happen in order, but parallel threads do not naturally preserve that order

Page 8

Sequential:

```freebasic
Dim Shared As Integer total

Sub program ()
    'this part can be done several times at once
    'because it does not depend on other parts of the program
    Dim As Integer addme = 0
    For i As Integer = 1 To 10000
        addme += 1
    Next i
    'accesses a global variable
    total += addme
End Sub

For i As Integer = 1 To 100
    program()
Next i
Print total   '100 * 10000 = 1000000
```

Concurrent:

```freebasic
Dim Shared As Integer total
Dim Shared As Any Ptr mutex

'ThreadCreate expects a Sub that takes one Any Ptr argument
Sub program (ByVal userdata As Any Ptr)
    Dim As Integer addme = 0
    For i As Integer = 1 To 10000
        addme += 1
    Next i
    'only one thread at a time may update the shared total
    Mutexlock(mutex)
    total += addme
    Mutexunlock(mutex)
End Sub

mutex = Mutexcreate()
Dim As Any Ptr threads(1 To 100)
For i As Integer = 1 To 100
    threads(i) = Threadcreate(@program)
Next i
'wait for every thread to finish before reading total
For i As Integer = 1 To 100
    Threadwait(threads(i))
Next i
Mutexdestroy(mutex)
Print total   '100 * 10000 = 1000000
```

Page 9

Each processor has its own cache

If one processor changes the memory, the other processors may have the wrong data cached

Snooping protocol: each cache monitors (snoops) the shared bus, and when one processor writes to a block, every other cache holding a copy must remove (invalidate) it

AMD’s MOESI protocol: every cache block is in one of five states: modified, owned, exclusive, shared, or invalid

Page 10

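Adding several cores to a machine will provide limited speed improvements, because other components, such as the memory system, have not been upgraded along with them

For example, adding cores raises the peak rate of floating-point operations (FLOPs per second), but not the rate at which data can be transferred from memory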

Page 11

Intel is developing 6- and 8-core processors (Westmere and Nehalem)

Tilera produces 64-core chips (TILE64) with an architecture designed for many cores:

Removes the bus data-transfer bottleneck (the cores are connected by an on-chip mesh network instead of a shared bus)

Saves power by powering off individual cores

Comes with developer tools for making parallel processing programs

Page 12

CPU:

Slowly adopting multiple cores

Caches exploit locality

Needs low-latency RAM

GPU:

Naturally better suited to parallelism, and relies heavily on multithreading to achieve performance

The GeForce 8800 GTX has 16 multiprocessors, each with 8 multithreaded floating-point processors (16 × 8 = 128 in total)

Does not rely on locality; uses coarse-grained hardware multithreading to hide memory latency

Needs high-bandwidth RAM

Page 13

Costs:

Maintenance and storage costs for each machine

Each machine's operating system takes up some of that machine's RAM

Resources such as RAM cannot be shared well among machines

Benefits:

Can be built with mass-produced computers and standard LAN hardware

Can reach sizes beyond the limits of current multicore chips

Can be spread over multiple physical locations, which gives your company more bandwidth than any one ISP offers and provides redundancy in case of fire or power outage

Can be upgraded without replacing the current hardware

Page 15

The sparse matrix-vector multiplication (SpMV) and Lattice-Boltzmann magnetohydrodynamics (LBMHD) benchmarks give different results

Fewer FLOPs per core when there are many cores

Upgrading from 2 cores to 4 may have little effect

Certain processors are better suited to certain applications (e.g., Xeon)

Multicore chips demand new methods of software optimization

Page 16

Patterson, David A., and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, 4th ed.

AMD.com

PCLaunches.com (New Intel Processors)

Tilera.com
