
•Moore’s Law - Department of Computer Science | IIT ...iraicu/teaching/CS595-F11/lecture08.pdf · •Moore’s Law –The number of transistors that can be placed ... •Task


• Moore’s Law

– The number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years (often quoted as every 18 months).

– Self-fulfilling prophecy

• Computer architect goal

• Software developer assumption


• Impediments to Moore’s Law

– Theoretical Limit

– What to do with all that die space?

– Design complexity

– How do you meet the expected performance

increase?


• von Neumann model

– Execute a stream of instructions (machine code)

– Instructions can specify

• Arithmetic operations

• Data addresses

• Next instruction to execute

– Complexity

• Track billions of data locations and millions of instructions

• Manage with:

– Modular design

– High-level programming languages


• Parallelism

– Continue to increase performance via

parallelism.

[Figure: number of cores and manufacturing process plotted against year, 2004 to 2018]

• From a software point-of-view, need to

solve demanding problems

– Engineering Simulations

– Scientific Applications

– Commercial Applications

• Need the performance, resource gains

afforded by parallelism


• Engineering Simulations

– Aerodynamics

– Engine efficiency


• Scientific Applications

– Bioinformatics

– Thermonuclear processes

– Weather modeling


• Commercial Applications

– Financial transaction processing

– Data mining

– Web Indexing


• Unfortunately, parallelism greatly increases coding complexity

– Coordinating concurrent tasks

– Parallelizing algorithms

– Lack of standard environments and support


• The challenge

– Provide the abstractions, programming

paradigms, and algorithms needed to

effectively design, implement, and maintain

applications that exploit the parallelism

provided by the underlying hardware in order

to solve modern problems.


• Standard sequential architecture

[Diagram: CPU and RAM connected by a BUS, with the bottlenecks marked]

• Use multiple

– Datapaths

– Memory units

– Processing units


• SIMD

– Single instruction stream, multiple data stream

[Diagram: one Control Unit and several Processing Units joined by an Interconnect]

• SIMD

– Advantages

• Performs vector/matrix operations well

– EX: Intel’s MMX instruction set extension

– Disadvantages

• Too dependent on type of computation

– EX: Graphics

• Performance/resource utilization suffers if computations aren’t “embarrassingly parallel”.


• MIMD

– Multiple instruction stream, multiple data stream

[Diagram: several combined Processing/Control Units joined by an Interconnect]

• MIMD

– Advantages

• Can be built with off-the-shelf components

• Better suited to irregular data access patterns

– Disadvantages

• Requires more hardware (no shared control unit)

• Must store the program/OS at each processor

• Ex: Typical commodity SMP machines we see today.


• Task Communication

– Shared address space

• Use common memory to exchange data

• Communication and replication are implicit

– Message passing

• Use send()/receive() primitives to exchange data

• Communication and replication are explicit


• Shared address space

– Uniform memory access (UMA)

• Access time to a memory location is independent of which processing unit makes the request.

– Non-uniform memory access (NUMA)

• Access time to a memory location depends on the location of the processing unit relative to the memory accessed.


• Message passing

– Each processing unit has its own private

memory

– Exchange of messages used to pass data

– APIs

• Message Passing Interface (MPI)

• Parallel Virtual Machine (PVM)


• Algorithm

– A finite sequence of instructions, often used for calculation and data processing.

• Parallel Algorithm

– An algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.
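A minimal sketch of that definition, assuming the problem is summing a list: the input is cut into pieces, each piece is summed by a separate worker, and the partial sums are put back together at the end. The names `split` and `parallel_sum` are illustrative, not from the lecture, and in CPython the threads mainly show the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, pieces):
    """Cut items into at most `pieces` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # "Put back together again at the end".
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
print(total)
```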


• Challenges

– Identifying work that can be done

concurrently.

– Mapping work to processing units.

– Distributing the work

– Managing access to shared data

– Synchronizing various stages of execution.


• Models

– A way to structure a parallel algorithm by selecting decomposition and mapping techniques that minimize interactions.


• Models

– Data-parallel

– Task graph

– Work pool

– Master-slave

– Pipeline

– Hybrid


• Data-parallel

– Mapping of Work

• Static

• Tasks -> Processes

– Mapping of Data

• Independent data items assigned to processes

(Data Parallelism)


• Data-parallel

– Computation

• Tasks process data, synchronize to get new data or exchange results, and continue until all data is processed

– Load Balancing

• Uniform partitioning of data

– Synchronization

• Minimal, or a barrier needed at the end of a phase

– Examples

• Ray Tracing
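As a rough illustration of uniform partitioning with a barrier at the end of a phase, the sketch below has each worker sum its own slice, wait at a barrier, and then use the global total to normalize that slice. The `normalize` function and the two-phase example are assumptions chosen for brevity, not something the slides prescribe, and the input is assumed to have a non-zero total.

```python
import threading


def normalize(data, workers=4):
    """Divide every element by the global sum, using two phases."""
    n = len(data)
    partial = [0] * workers      # one slot per worker, written in phase 1
    result = [0.0] * n
    barrier = threading.Barrier(workers)

    def work(rank):
        # Uniform partitioning: worker `rank` owns data[lo:hi].
        lo = rank * n // workers
        hi = (rank + 1) * n // workers
        # Phase 1: purely local computation on the worker's own slice.
        partial[rank] = sum(data[lo:hi])
        # Barrier: nobody starts phase 2 until every partial sum is written.
        barrier.wait()
        total = sum(partial)
        # Phase 2: needs the result of every worker's phase 1.
        for i in range(lo, hi):
            result[i] = data[i] / total

    threads = [threading.Thread(target=work, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


shares = normalize([1, 2, 3, 4, 10])
print(shares)
```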


• Data-parallel

[Diagram: each data item (D) is handled by its own process (P), which produces an output (O)]

• Task graph

– Mapping of Work

• Static

• Tasks are mapped to nodes in a data dependency/task dependency graph (Task parallelism)

– Mapping of Data

• Data moves through graph (Source to Sink)


• Task graph

– Computation

• Each node processes input from the previous node(s) and sends output to the next node(s) in the graph

– Load Balancing

• Assign more processes to a given task

• Eliminate graph bottlenecks

– Synchronization

• Node data exchange

– Examples

• Parallel Quicksort, Divide and Conquer approaches

• Scientific Applications that can be expressed in workflows (e.g. DAGs)
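One possible way to run such a workflow, sketched with the standard-library `graphlib.TopologicalSorter` (Python 3.9+): a node is submitted as soon as all of its predecessors have finished, and it receives their outputs as input. The function `run_task_graph` and the four-node example graph are made up for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_task_graph(tasks, deps, workers=4):
    """tasks: name -> function(inputs); deps: name -> set of predecessor names."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                 # raises CycleError if the graph is not a DAG
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every node whose predecessors are all done.
            for name in sorter.get_ready():
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            # Node data exchange: wait for at least one node to produce output.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()
                sorter.done(name)    # unlocks the successors of this node
    return results


tasks = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "total": lambda inputs: sum(inputs["load"]),
    "report": lambda inputs: (inputs["sort"], inputs["total"]),
}
deps = {"load": set(), "sort": {"load"}, "total": {"load"},
        "report": {"sort", "total"}}
results = run_task_graph(tasks, deps)
print(results["report"])
```

Here "sort" and "total" have no dependency on each other, so they may run concurrently once "load" has finished.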


• Task graph

[Diagram: data items (D) flow through a graph of processes (P) to outputs (O)]

• Work pool

– Mapping of Work/Data

• No desired pre-mapping

• Any task performed by any process

• Pull-model oriented

– Computation

• Processes work as data becomes available (or

requests arrive)


• Work pool

– Load Balancing

• Dynamic mapping of tasks to processes

– Synchronization

• Adding/removing work from input queue

– Examples

• Web Server

• Bag-of-tasks
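A small bag-of-tasks sketch under the assumption that threads stand in for processes: workers pull jobs from a shared input queue whenever they are free, so the mapping of tasks to workers is decided at run time. `run_work_pool` and the `STOP` sentinel are illustrative names.

```python
import queue
import threading


def run_work_pool(jobs, handler, workers=3):
    """Apply handler to every job using a pool of pulling workers."""
    inbox = queue.Queue()    # input queue shared by all workers
    outbox = queue.Queue()   # output queue for (job, result) pairs
    STOP = object()          # sentinel telling a worker to exit

    def worker():
        while True:
            job = inbox.get()            # pull model: take the next free job
            if job is STOP:
                break
            outbox.put((job, handler(job)))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        inbox.put(job)
    for _ in threads:
        inbox.put(STOP)                  # one sentinel per worker
    for t in threads:
        t.join()

    results = {}
    while not outbox.empty():            # safe: all workers have finished
        job, value = outbox.get()
        results[job] = value
    return results


squares = run_work_pool(range(6), lambda n: n * n)
print(squares)
```

The only synchronization is adding to and removing from the queues, which `queue.Queue` handles internally.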


• Work pool

[Diagram: processes (P) in a Work Pool take work from an input queue and place results on an output queue]

• Master-slave

– Modification to Work Pool Model

• One or more Master processes generate and assign work to worker processes

• Push-model oriented

– Load Balancing

• A Master process can better distribute load to

worker processes
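To contrast the push model with the pull-based work pool, the sketch below gives every worker a private inbox and lets the master decide who gets each job. Round-robin assignment is only an assumed policy here; a real master could use load information instead. `run_master_worker` is a hypothetical name, and threads again stand in for processes.

```python
import queue
import threading


def run_master_worker(jobs, handler, workers=3):
    """Master pushes jobs to per-worker inboxes; returns job -> (rank, result)."""
    inboxes = [queue.Queue() for _ in range(workers)]
    results = queue.Queue()
    STOP = object()

    def worker(rank):
        while True:
            job = inboxes[rank].get()    # a worker only sees its own inbox
            if job is STOP:
                break
            results.put((rank, job, handler(job)))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()

    # Master: generate and assign work (assumed policy: round-robin).
    for i, job in enumerate(jobs):
        inboxes[i % workers].put(job)
    for box in inboxes:
        box.put(STOP)
    for t in threads:
        t.join()

    assigned = {}
    while not results.empty():
        rank, job, value = results.get()
        assigned[job] = (rank, value)
    return assigned


assignment = run_master_worker(["a", "bb", "ccc", "dddd"], len, workers=2)
print(assignment)
```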


• Pipeline

– Mapping of work

• Processes are assigned tasks that correspond to

stages in the pipeline

• Static

– Mapping of Data

• Data processed in FIFO order

– Stream parallelism


• Pipeline

– Computation

• Data is passed through a succession of processes, each of which performs some task on it

– Load Balancing

• Ensure all stages of the pipeline are balanced (contain the same amount of work)

– Synchronization

• Producer/Consumer buffers between stages

– Ex: Processor pipeline, graphics pipeline
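A minimal pipeline sketch, assuming one thread per stage and a bounded `queue.Queue` as the producer/consumer buffer between consecutive stages. The helper `run_pipeline`, the buffer size, and the three example stages are illustrative choices.

```python
import queue
import threading


def run_pipeline(items, stages, buffer_size=2):
    """Push items through the stage functions in FIFO order."""
    STOP = object()
    # One buffer in front of the first stage and one after every stage.
    buffers = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]

    def stage_worker(func, inbox, outbox):
        while True:
            item = inbox.get()           # blocks while the buffer is empty
            if item is STOP:
                outbox.put(STOP)         # pass the shutdown signal downstream
                break
            outbox.put(func(item))       # blocks while the next buffer is full

    def feed():
        for item in items:
            buffers[0].put(item)
        buffers[0].put(STOP)

    threads = [threading.Thread(target=stage_worker,
                                args=(func, buffers[i], buffers[i + 1]))
               for i, func in enumerate(stages)]
    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    output = []
    while True:                          # drain the final buffer
        item = buffers[-1].get()
        if item is STOP:
            break
        output.append(item)
    for t in threads:
        t.join()
    return output


processed = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2, str])
print(processed)
```

Because each stage is a single thread reading from a FIFO buffer, items leave the pipeline in the order they entered. Throughput is limited by the slowest stage, which is why the stages should carry similar amounts of work.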


• Pipeline

[Diagram: an input queue feeds three processes (P) in sequence, with a buffer between consecutive stages, ending in an output queue]

• Message-Passing

• Shared Address Space


• Message-Passing

– Most widely used for programming parallel

computers (clusters of workstations)

– Key attributes:

• Partitioned address space

• Explicit parallelization

– Process interactions

• Send and receive data


• Message-Passing

– Communications

• Sending and receiving messages

• Primitives

– send(buff, size, destination)

– receive(buff, size, source)

– Blocking vs non-blocking

– Buffered vs non-buffered

• Message Passing Interface (MPI)

– Popular message passing library

– ~125 functions
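The primitives above can be imitated in a few lines. This is a toy simulation, not MPI: an in-memory queue stands in for the network, the extra `sender` and `receiver` arguments are an assumption needed because threads have no built-in process identity, and the names `p1`, `p3`, `buff1`, and `buff3` are placeholders. It shows a buffered send (the message is copied, so the sender may reuse its buffer) and both blocking and non-blocking receives.

```python
import queue
import threading

# One FIFO channel per (source, destination) pair; a stand-in for the network.
channels = {("p1", "p3"): queue.Queue()}


def send(sender, buff, size, destination):
    """Buffered send: copy the first `size` bytes into the channel and return."""
    channels[(sender, destination)].put(bytes(buff[:size]))


def receive(receiver, buff, size, source, blocking=True):
    """Copy up to `size` bytes into buff; return the number of bytes received.

    A non-blocking receive returns 0 when no message is waiting.
    """
    try:
        message = channels[(source, receiver)].get(block=blocking)
    except queue.Empty:
        return 0
    n = min(size, len(message))
    buff[:n] = message[:n]
    return n


buff1 = bytearray(b"hello")
buff3 = bytearray(5)

# Non-blocking receive before anything has been sent: returns immediately.
early = receive("p3", buff3, 5, "p1", blocking=False)

sender_thread = threading.Thread(target=send, args=("p1", buff1, 5, "p3"))
sender_thread.start()
# Blocking receive: waits until the message from p1 arrives.
got = receive("p3", buff3, 5, "p1")
sender_thread.join()
print(early, got, bytes(buff3))
```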


• Message-Passing

[Diagram: four Workstations hosting processes P1 to P4; Data is transferred with send(buff1, 1024, p3) and receive(buff3, 1024, p1)]

• Shared Address Space

– Mostly used for programming SMP machines (multicore chips)

– Key attributes

• Shared address space

– Threads

– shmget/shmat UNIX operations

• Implicit parallelization

– Process/Thread communication

• Memory reads/stores


• Shared Address Space

– Communication

• Read/write memory

– EX: x++;

– POSIX Thread API

• Popular thread API

• Operations

– Creation/deletion of threads

– Synchronization (mutexes, semaphores)

– Thread management
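A sketch of the `x++` example using Python's `threading` module rather than the POSIX Thread API itself: several threads increment one shared variable, and a mutex makes each read-modify-write step atomic so that no update is lost. `SharedCounter` and `run_counter` are illustrative names.

```python
import threading


class SharedCounter:
    """A value in the shared address space, guarded by a mutex."""

    def __init__(self):
        self.x = 0
        self._lock = threading.Lock()

    def increment(self):
        # x += 1 is a read, an add, and a write; the lock keeps another
        # thread from interleaving between those steps.
        with self._lock:
            self.x += 1


def run_counter(threads=4, per_thread=10_000):
    counter = SharedCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=work) for _ in range(threads)]   # creation
    for t in pool:
        t.start()
    for t in pool:
        t.join()                                                     # wait for exit
    return counter.x


final = run_counter()
print(final)
```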


• Shared Address Space

[Diagram: one SMP Workstation in which threads T1 to T4 share Data held in RAM]

• Synchronization

– Deadlock

– Livelock

– Fairness

• Efficiency

– Maximize parallelism

• Reliability

– Correctness

– Debugging
