Keep hardware in mind
• When considering ‘parallel’ algorithms, we have to have an understanding of the hardware they will run on
– With sequential algorithms, we do this implicitly
Creative use of processing power
• Lots of data = need for speed
• ~20 years: parallel processing
– Studying how to use multiple processors together
– Really large and complex computations
– Parallel processing was an active sub-field of CS
• Since 2005: the era of multicore is here
– All computers will have >1 processing unit
Traditional Computing Machine
• Von Neumann model:
– The stored program computer
• What is this?
– Abstractly, what does it look like?
New twist: multiple control units
• It’s difficult to make the CPU any faster
– To increase potential speed, add more CPUs
– These CPUs are called cores
• Abstractly, what might this look like in these new machines?
Shared memory model
• Multiple processors can access memory locations
• May not scale over time
– As we increase the number of ‘cores’
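The shared memory model above can be sketched with Python threads (a minimal, illustrative example; the variable names are my own): several threads share one list in memory, and each writes only to its own disjoint slice, so no two threads touch the same location.

```python
import threading

# One list shared by all threads (the "shared memory").
shared = [1, 2, 3, 4, 5, 6, 7, 8]

def double_slice(start, end):
    # Each thread writes only to indices [start, end): writes never overlap.
    for i in range(start, end):
        shared[i] *= 2

# Four "processors", each owning a slice of two elements.
threads = [threading.Thread(target=double_slice, args=(i * 2, i * 2 + 2))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # [2, 4, 6, 8, 10, 12, 14, 16]
```

Because the slices are disjoint, the result is deterministic even though the threads run concurrently.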
Algorithms
• We will use the term processor for the processing unit that executes instructions
• When considering how to design algorithms for these architectures:
– Useful to start with a base theoretical model
– Revise when implementing on different hardware with software packages
• Parallel computing course
– Also consider:
• Memory location access by ‘competing’/’cooperating’ processors
• Theoretical arrangement of the processors
PRAM model
• Parallel Random Access Machine
• Theoretical
• Abstractly, what does it look like?
• How do processors access memory in this PRAM model?
PRAM model
• Processors working in parallel
– Each trying to access memory values
– Memory value: what do we mean by this?
• When designing algorithms, we need to consider what type of memory access that algorithm requires
• How might our theoretical computer work when many reads and writes are happening at the same time?
Designing algorithms
• With many algorithms, we’re moving data around
– Sorting, for example. Others?
• Concurrent reads (CR) by multiple processors
– Memory is not changed, so no ‘conflicts’
• Exclusive writes (EW)
– Design pseudocode so that any processor is exclusively writing a data value into a memory location
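A small thread-based sketch of the CREW pattern described above (hypothetical example, names my own): every "processor" concurrently reads the same shared value, but each writes exclusively to its own slot of an output array.

```python
import threading

x = 10          # read concurrently by all threads: no conflict (CR)
out = [0] * 4   # slot i is written only by thread i: exclusive writes (EW)

def worker(i):
    # Concurrent read of x, exclusive write to out[i].
    out[i] = x * i

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(out)  # [0, 10, 20, 30]
```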
Designing Algorithms
• Arranging the processors
– Helpful for the design of the algorithm
• We can envision how it works
• We can envision the data access pattern needed
– EREW, CREW (CRCW)
– Not how processors are necessarily arranged in practice
• Although some machines have been
– What are some possible arrangements?
– Why might these arrangements prove useful for design?
Sequential merge sort
• Recursive
– Can envision a recursion tree
function mergesort(m)
    var list left, right
    if length(m) ≤ 1
        return m
    else
        middle = length(m) / 2
        for each x in m up to middle
            add x to left
        for each x in m after middle
            add x to right
        left = mergesort(left)
        right = mergesort(right)
        result = merge(left, right)
        return result
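The pseudocode above can be rendered as runnable Python (a minimal sketch; the merge helper is filled in since the slides assume it exists):

```python
def merge(left, right):
    # Merge two sorted lists into one sorted list.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One of these is empty; the other holds the remaining sorted tail.
    return result + left[i:] + right[j:]

def mergesort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    return merge(mergesort(m[:middle]), mergesort(m[middle:]))

print(mergesort([5, 2, 4, 7, 1, 3, 2, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```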
Parallel merge sort
• Shared data: 2 lists in memory
• Sort pairs once in parallel
• The processes merge concurrently
How might we write the pseudocode?
Numbering of processors starts with 0
s = 2
while s <= N
    do in parallel (N/s steps) for processor i
        merge values from i*s to (i*s) + s - 1
    s = s * 2
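The rounds above can be simulated sequentially (a sketch, assuming N is a power of two): in each round, "processor" i merges the two sorted halves of the segment from i*s to i*s + s - 1. In the PRAM model all N/s merges in a round happen at the same time; here we simply loop over them.

```python
def merge(left, right):
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]

def parallel_mergesort_sim(a):
    a = list(a)
    N = len(a)                     # assume N is a power of 2
    s = 2
    while s <= N:
        # These N/s iterations are independent: one per "processor".
        for i in range(N // s):
            lo = i * s
            a[lo:lo + s] = merge(a[lo:lo + s // 2], a[lo + s // 2:lo + s])
        s *= 2
    return a

print(parallel_mergesort_sim([5, 2, 7, 1, 8, 3, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note the first round (s = 2) merges single elements, which is exactly the "sort pairs once in parallel" step from the slide.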
Parallel Merge Sort
• Work through pseudocode with larger N
• Processor arrangement: binary tree
• Memory access: EREW
• What was the more practical implementation?
Activity: Sum N integers
• Suppose we have an array of N integers in memory
• We wish to sum them
– Variant: create a running sum in a new array
• Devise a parallel algorithm for this
– Assume PRAM to start
– What processor arrangement did you use?
– What memory access is required?
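One possible answer, sketched as a sequential simulation of a binary-tree reduction (assuming N is a power of two): in the round with stride s, the processor owning index i adds a[i + s] into a[i], and after log N rounds the total sits in a[0]. Each inner-loop iteration below would be one processor's work in the PRAM model, and the access pattern is EREW.

```python
def parallel_sum_sim(a):
    a = list(a)
    N = len(a)          # assume N is a power of 2
    s = 1
    while s < N:
        # Independent additions: one "processor" per iteration,
        # each reading and writing distinct locations (EREW).
        for i in range(0, N, 2 * s):
            a[i] += a[i + s]
        s *= 2
    return a[0]

print(parallel_sum_sim([3, 1, 4, 1, 5, 9, 2, 6]))  # 31
```

This is the same binary-tree processor arrangement as parallel merge sort, with O(log N) rounds instead of O(N) sequential additions.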
Next Activity
• Now suppose you need an algorithm for multiplying a matrix by a vector
[Diagram: Matrix A x Vector X = Result Vector]
Devise a parallel algorithm for this
– Assume PRAM to start
– Think about what each process will compute (there are options)
– What processor arrangement did you use?
– What memory access is required?
Matrix-Vector Multiplication
• The matrix is assumed to be M x N. In other words:
– The matrix has M rows.
– The matrix has N columns.
– For example, a 3 x 2 matrix has 3 rows and 2 columns.
• In matrix-vector multiplication, if the matrix is M x N, then the vector must have dimension N.
– In other words, the vector will have N entries.
– If the matrix is 3 x 2, then the vector must be 2 dimensional.
– This is usually stated as saying the matrix and vector must be conformable.
• Then, if the matrix and vector are conformable, the product of the matrix and the vector is a resultant vector that has dimension M.
(So, the result could be a different size than the original vector!) For example, if the matrix is 3 x 2 and the vector is 2 dimensional, the result of the multiplication would be a vector of 3 dimensions.
Matrix-Vector Multiplication
• Ways to do a parallel algorithm:
– One row of matrix per processor
– One element of matrix per processor
• There is additional overhead involved. Why?
• What if number of rows M is larger than number of processors?
• Emerging theme: how to partition the data
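The one-row-per-processor scheme can be sketched with threads (an illustrative example; the 3 x 2 data is my own): thread i computes the dot product of row i of A with x and writes only result[i] (exclusive write), while x is read concurrently by all threads (concurrent read), so the access pattern is CREW.

```python
import threading

A = [[1, 2],
     [3, 4],
     [5, 6]]            # M = 3 rows, N = 2 columns
x = [10, 1]             # vector of dimension N = 2
result = [0] * len(A)   # result has dimension M = 3

def row_worker(i):
    # Dot product of row i with x: CR on x and A, EW on result[i].
    result[i] = sum(A[i][j] * x[j] for j in range(len(x)))

threads = [threading.Thread(target=row_worker, args=(i,)) for i in range(len(A))]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # [12, 34, 56]
```

If M exceeds the number of processors, each thread would instead take a block of rows, which is exactly the data-partitioning question the slide raises.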