
Copyright 1999, © Amy Apon, Ph.D.

Shared Memory Parallel Programming

Cluster Computing Short Course

Presentation One


Introduction

The commodity computers available today have capabilities that were previously only available on large mainframes and expensive workstations.

However, new programming techniques may be required in order to take advantage of these capabilities.


Introduction, continued

A series of presentations will cover

• shared memory programming using the pthread library

• distributed memory programming using TCP sockets and MPI

• performance and architectural issues


Shared Memory Programming Outline

• Process state, process creation (30 min)

• Basics of threads (30 min)

• Thread synchronization (60 min)

• Symmetric multiprocessors (45 min)


The Process State Diagram

• A process is a program in execution!

[State diagram: a process enters the ready state; ready → running when scheduled; running → ready on a context switch; running → waiting on an I/O request; waiting → ready on I/O completion; running → exit.]

Process state, process creation


fork() Creates Unix Processes

main()
{ int pid;
  if ((pid = fork()) != 0) { /* parent is here */
  }
  else { /* child is here */
  }
}

Parent executes first. The child is an exact copy except for the process ID! The child gets control at fork().

Process state, process creation


Process Creation in Unix

main()
{ int pid;
  if ((pid = fork()) != 0) { /* parent is here */
  }
  else { /* child is here */
  }
}

In the parent, the returned value of the fork is the (new) child's process ID.

In the child, the returned value is 0.

execve() is used to overlay the child with a new program.

Process state, process creation


A typical Unix process tree

$ pstree
init-+-crond
     |-httpd---5*[httpd]
     |-inetd---in.telnetd---login---tcsh-+-pstree
     |-login---bash
     |-lpd
     |-qmail-send-+-qmail-clean
     |            |-qmail-lspawn
     |            |-qmail-rspawn

Process state, process creation


Process status

$ ps

PID TTY TIME CMD

24897 pts/0 00:00:00 tcsh

24937 pts/0 00:00:00 ps

$ ps aux | more (to see all processes)

$ top (to see busy processes)

$ man command (to get more information)

Process state, process creation


Processes

• Don't share memory by default
  – can do this with shmat(), but high overhead!

• Have an entry in the process table

• Generally have high overhead for creation and for a context switch

• For more info, see Operating Systems, by Silberschatz and Galvin

Process state, process creation


Basics of Threads

• Threads are a fundamental tool for shared memory programming!

Basics of threads


pthread Library

• POSIX standard thread library

• Include in C, C++ programs

• Portable across all Unix platforms

• Some similarities and some differences with Windows threads

• Fully compatible with Java, Solaris threads

Basics of threads


Thread Creation

#include <pthread.h>
#include <stdio.h>

void * hello (void * parm) {
  printf("Hello World! My parameter is %ld\n", (long) parm);
  pthread_exit(0);
}

main() {
  pthread_t tid;
  pthread_create( &tid, NULL, hello, (void *) 1 );
  pthread_join(tid, NULL);
}

pthread_create arguments: address of thread ID, thread attribute, function to execute, single (address) parameter only!

Basics of threads


To Execute a Thread Program

• Compile in Unix

$ gcc -o simple simple.c -lpthread

• Execute

$ simple (or ./simple if “.” not on path)

Basics of threads


A Thread is a “Lightweight Process”

• Executes and context switches like a process

• Has its own ID and program counter

• Shares code, global variables, open file pointers with the creating process and threads

• Has its own local variables, stack space

Basics of threads


A Thread is a “Lightweight Process”

• In most implementations, when a thread blocks, other threads do not block.

• This allows one thread to do I/O (wait) while another thread computes.

• Multiple threads can execute concurrently on a computer with more than one processor

Basics of threads


Shared Global Variables

int globalvar = 0;
pthread_t tid[3];

void * ChangeVar (void * parm) {
  globalvar++;
  printf("I changed globalvar to %d\n", globalvar);
  pthread_exit(0);
}

main() {
  int i;
  for( i=0; i<3; i++)
    pthread_create( &tid[i], NULL, ChangeVar, NULL );
  for( i=0; i<3; i++)
    pthread_join( tid[i], NULL);
}

The value printed by the printf might be different than the one computed by globalvar++.

THIS CODE HAS AN ERROR!

Basics of threads


A Context Switch Can Occur Anytime!

globalvar++;
printf("I changed globalvar to %d\n", globalvar);

Thread1 and Thread2 both execute this code. A context switch between the increment and the printf lets the other thread change globalvar, so this access gets the wrong value.

Basics of threads


Race Condition

• A race condition occurs whenever the outcome of the program depends on which thread modifies a shared memory location first (i.e., “wins the race”)

• A piece of code that accesses a shared memory location is called a critical section.

• Synchronization is required so that the shared access happens in mutual exclusion

Synchronization


Synchronization is Needed

Example 1: Producer/Consumer Problem

• producers place items into a shared buffer, consumers remove items from the buffer

• producers must not write into a full buffer

• consumers must not remove the same item

This occurs with network printer queues.

Synchronization


Synchronization is Needed

Example 2: Reader/Writer Problem

• Writers update a shared data item, readers read the item

• Writers must write in mutual exclusion; any number of readers can read at a time

This occurs with distributed database systems.

Synchronization


Synchronization is Needed

Example 3: Barrier Synchronization

• All threads must come to a common stopping place before any can proceed

Synchronization


Thread Synchronization Tools

• mutex variables

• semaphores

• condition variables

Only access to shared variables must be controlled. Local variables in a thread are in private memory.

Synchronization


Mutex Variables

• Works like the service station key!

• One thread has the “key” at a time.

• Don't forget to give the key back when finished!

pthread_mutex_t mutex;

Synchronization


Mutex Example

pthread_mutex_t mut;
int globalvar = 0;

void * ChangeGlobalVar (void * parm) {
  int localvar;
  pthread_mutex_lock(&mut);      /* Lock mutex before access */
  localvar = ++globalvar;
  pthread_mutex_unlock(&mut);    /* Unlock mutex after access */
  printf("I changed globalvar to %d\n", localvar);
}

main() {
  pthread_mutex_init(&mut,NULL);
  /* create threads here */
}

Synchronization


Semaphores

#include <semaphore.h>
sem_t semA, semB;

Two primary operations on semaphores:

• sem_post(&semA);

• sem_wait(&semB);

Synchronization


Semaphores

• Count open service positions, like at a bank

sem_wait:
  /* if a position is not open, wait */
  while (sem <= 0) wait;
  sem--;   /* I take the open position */

sem_post:
  sem++;   /* when I leave, a position is open */

Synchronization


Using Semaphores for Resource Allocation

main() {
  sem_init(&res_sem, 0, 5);   /* pshared = 0; value = 5 (quantity of this resource) */

  sem_wait(&res_sem);         /* request resource */
  /* use resource here */
  sem_post(&res_sem);         /* release resource */
}

Synchronization


Using Semaphores for Barrier Synchronization

main() {
  sem_init(&semA, 0, 0);      /* initial value is 0 */
  sem_init(&semB, 0, 0);

Thread A:            Thread B:
sem_post(&semB);     sem_post(&semA);
sem_wait(&semA);     sem_wait(&semB);

Synchronization


Condition Variables

• Based on “Monitors”, by C. A. R. Hoare

• Allow threads to wait for a resource to become available

• Always used with a mutex

pthread_mutex_t mutex;
pthread_cond_t notempty, notfull;

Synchronization


Condition Variables

• Give a thread waiting for a resource the first opportunity to use the mutex when the resource becomes available

[Sequence: Thread A locks the mutex and enters the critical section; the resource is unavailable, so A waits on the condition variable (releasing the mutex). Thread B locks the mutex, releases the resource, and unlocks the mutex. A then reacquires the mutex, obtains the resource inside the critical section, unlocks the mutex, and uses the resource outside of a critical section.]

Synchronization


Using Condition Variables for Producer/Consumer

Producer:

/* produce item in local buffer */
pthread_mutex_lock(&mut);
while (/* buffer is full */)
  pthread_cond_wait(&notfull, &mut);
/* put an item in buffer */
pthread_cond_signal(&notempty);
pthread_mutex_unlock(&mut);

Consumer:

pthread_mutex_lock(&mut);
while (/* buffer is empty */)
  pthread_cond_wait(&notempty, &mut);
/* get an item from buffer */
pthread_cond_signal(&notfull);
pthread_mutex_unlock(&mut);
/* consume item */

Synchronization


Shared Memory Programming

We have covered:

• Process state, process creation

• Basics of pthreads

• Thread synchronization using mutex, semaphores, and condition variables

Summary of Shared Memory Programming


Useful pthread Information

• Getting Started With POSIX Threads, by Tom Wagner and Don Towsley
http://centaurus.cs.umass.edu/~wagner/threads_html/tutorial.html

• On-line thread tutorial from Sun
http://www.sun.com/sunworldonline/swol-02-1996/swol-02-threads.html

• Programming With Posix Threads, by David R. Butenhof (Addison-Wesley Professional Computing Series)


What is an SMP? (Symmetric Multiprocessor)

• A computer with more than one CPU

• Disk subsystem, network, main memory, I/O devices, … are all equally accessible to all processors

• However, each processor has its own private cache

Symmetric Multiprocessor


Cache Memory

• Very fast memory, close to the CPU

[Diagram: the processor and its cache sit together, connected over the system bus to main memory.]

Symmetric Multiprocessor


Cache Memory

• Makes main memory appear faster (on the average)

• If an item is in the cache, get it

• Otherwise, there is a cache miss and the item must be retrieved from memory

• Cache is 20 times (or more!) faster than memory

Symmetric Multiprocessor


Cache Memory

• Works because of
  – spatial locality (the next item to be accessed is likely to be close by)
  – temporal locality (this item is likely to be accessed again soon)

Think of the way we program, using loops, array access . . .

Symmetric Multiprocessor


Cache Memory

• Is organized in cache lines

• A miss loads a cache line from memory

Any of these memory “lines” can be loaded into this cache location (in a direct-mapped cache).

Symmetric Multiprocessor


Cache Memory

• If a line in the cache needs to be replaced, then it must be copied back to memory

The location in memory is wrong until the cache line is copied back!

Symmetric Multiprocessor


SMPs and Cache Memory

• SMP cache and memory can be wrong! -- the cache coherence problem

[Example: Proc 0's cache holds A=7, the new (correct) value; Proc 1's cache and main memory still hold A=5, the old (incorrect) values! All are connected by the system bus.]

Symmetric Multiprocessor


Snoopy Bus

• Most common solution to the cache coherence problem on SMPs

• The bus watches all reads and writes

• A cache miss causes the bus to broadcast a request for the newest value

• A write sends an invalidate message

Symmetric Multiprocessor


Snoopy Bus

• Is a very busy bus!

• Works well for two (four?) processors, but is a classic “Von Neumann bottleneck”

• Does not perform well as the number of processors in the SMP gets larger!

Symmetric Multiprocessor


Symmetric Multiprocessor

• Composed of two or more (symmetric) processors, and one of every other subsystem

• Each processor has a private cache

• Snoopy bus is the most common solution to the cache coherence problem

Symmetric Multiprocessor


Symmetric Multiprocessors

• Can degrade in performance as the number of processors increases

• Performance depends on the amount of data that is shared in the application!

Symmetric Multiprocessor


Symmetric Multiprocessors

For more information, see:

• In Search of Clusters: The ongoing battle in lowly parallel computing, Second Edition, by Gregory F. Pfister, Prentice Hall Publishing Company, 1998

• Or a book on computer architecture

Symmetric Multiprocessor


Presentation Two
Distributed Memory Programming

• Distributed memory processing

• TCP client/server examples

• How MPI works over TCP

• Programming in MPI

• MPI set up, further information