CS3383 Unit 4: Dynamic Multithreaded Algorithms
David Bremner
March 25, 2018


Outline

Dynamic Multithreaded Algorithms
▶ Fork-Join Model
▶ Span, Work, And Parallelism
▶ Parallel Loops
▶ Scheduling
▶ Race Conditions


Introduction to Parallel Algorithms

Dynamic Multithreading
▶ Also known as the fork-join model
▶ Shared memory, multicore
▶ Cormen et al., 3rd edition, Chapter 27


Introduction to Parallel Algorithms

Nested Parallelism
▶ Spawn a subroutine, carry on with other work.
▶ Similar to fork in POSIX.

Parallel Loop
▶ Iterations of a for loop can execute in parallel.
▶ Like OpenMP.


Cilk+

▶ The multithreaded model is based on Cilk+, available in the latest versions of gcc.
▶ Programmer specifies possible parallelism.
▶ Runtime system takes care of mapping to OS threads.
▶ Cilk+ contains several more features than our model, e.g. parallel vector and array operations.
▶ Similar primitives are available in java.util.concurrent.


Writing parallel (pseudo)-code

Keywords
parallel  Run the loop iterations (potentially) concurrently.
spawn     Run the procedure (potentially) concurrently.
sync      Wait for all spawned children to complete.

Serialization
▶ Removing the keywords from parallel code yields correct serial code.
▶ Adding parallel keywords to correct serial code might break it:
  ▶ missing sync
  ▶ loop iterations not independent


Fibonacci Example

function Fib(𝑛)
    if 𝑛 ≤ 1 then
        return 𝑛
    else
        𝑥 = Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)
        return 𝑥 + 𝑦
    end if
end function

▶ Code in C, Java, Clojure and Racket available from http://www.cs.unb.ca/~bremner/teaching/cs3383/examples


Fibonacci Example (parallel version)

function Fib(𝑛)
    if 𝑛 ≤ 1 then
        return 𝑛
    else
        𝑥 = spawn Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)
        sync
        return 𝑥 + 𝑦
    end if
end function
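The spawn/sync pattern maps directly onto threads: spawn starts the first recursive call in a new thread while the parent strand computes the second call, and sync joins before combining. A minimal sketch in Python (illustrative only; the course examples linked above are in C, Java, Clojure and Racket):

```python
import threading

def fib(n):
    # Serial base case, as in the pseudocode.
    if n <= 1:
        return n
    result = {}
    def child():
        result["x"] = fib(n - 1)
    t = threading.Thread(target=child)
    t.start()          # spawn Fib(n-1)
    y = fib(n - 2)     # parent strand continues with Fib(n-2)
    t.join()           # sync: wait for the spawned child
    return result["x"] + y

print(fib(10))  # 55
```

Note that real Python threads are far heavier than Cilk-style strands; this only illustrates the control flow.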


Computation DAG

Strands: maximal sequences of instructions containing no parallel, spawn, return from spawn, or sync.

function Fib(𝑛)
    if 𝑛 ≤ 1 then                ▷ strand
        return 𝑛
    else
        𝑥 = spawn Fib(𝑛 − 1)
        𝑦 = Fib(𝑛 − 2)           ▷ strand
        sync
        return 𝑥 + 𝑦             ▷ strand
    end if
end function

[Figure: computation DAG for P-FIB(4), showing the nested calls P-FIB(3), P-FIB(2), P-FIB(1), P-FIB(0)]


Computation DAG

nodes             strands
down edges        spawn
up edges          return
horizontal edges  sequential execution
critical path     longest path in the DAG
span              weighted length of the critical path ≡ lower bound on time

[Figure: computation DAG for P-FIB(4)]


Work and Speedup

𝑇1  Work: the sequential running time.
𝑇𝑝  Running time on 𝑝 processors.

Work Law

    𝑇𝑝 ≥ 𝑇1/𝑝
    speedup ∶= 𝑇1/𝑇𝑝 ≤ 𝑝


Parallelism

𝑇𝑝  Running time on 𝑝 processors.
𝑇∞  Span: running time given unlimited processors.

A machine with unlimited processors can always idle all but 𝑝 of them and mimic the 𝑝-processor schedule, so:

    𝑇𝑝 ≥ 𝑇∞    (1)

Best possible speedup:

    parallelism = 𝑇1/𝑇∞ ≥ 𝑇1/𝑇𝑝 = speedup


Span and Parallelism Example

Assume strands have unit cost. For the P-FIB(4) computation DAG:
▶ 𝑇1 = 17
▶ 𝑇∞ = 8
▶ Parallelism = 17/8 = 2.125 for this input size.


Composing span and work

[Figure: subcomputations A and B composed in series (A + B) and in parallel (A ‖ B)]

series             𝑇∞(𝐴 + 𝐵) = 𝑇∞(𝐴) + 𝑇∞(𝐵)
parallel           𝑇∞(𝐴‖𝐵) = max(𝑇∞(𝐴), 𝑇∞(𝐵))
series or parallel 𝑇1 = 𝑇1(𝐴) + 𝑇1(𝐵)
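These composition rules make work/span analysis mechanical: carry a (work, span) pair for each subcomputation and combine pairs bottom-up. A small sketch (the names are illustrative, not from the slides):

```python
# (work, span) pairs for unit-cost strands and composed computations
UNIT = (1, 1)

def series(a, b):
    # Work and span both add when B runs after A.
    return (a[0] + b[0], a[1] + b[1])

def parallel(a, b):
    # Work adds; span is the longer of the two branches.
    return (a[0] + b[0], max(a[1], b[1]))

# Two unit strands in parallel, followed by a unit combining strand.
w, s = series(parallel(UNIT, UNIT), UNIT)
print(w, s)  # 3 2
```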


Work of Parallel Fibonacci

Write 𝑇(𝑛) for 𝑇1 on input 𝑛.

    𝑇(𝑛) = 𝑇(𝑛 − 1) + 𝑇(𝑛 − 2) + Θ(1)

Let 𝜙 ≈ 1.62 be the positive solution to 𝜙² = 𝜙 + 1.

We can show by induction (twice) that 𝑇(𝑛) ∈ Θ(𝜙ⁿ). For the upper bound, take the inductive hypothesis

    𝑇(𝑛) ≤ 𝑎𝜙ⁿ − 𝑏    (I.H.)

Substituting the I.H. into the recurrence:

    𝑇(𝑛) ≤ 𝑎(𝜙ⁿ⁻¹ + 𝜙ⁿ⁻²) − 2𝑏 + Θ(1)
         = 𝑎 ((𝜙 + 1)/𝜙²) 𝜙ⁿ − 𝑏 + (Θ(1) − 𝑏)
         ≤ 𝑎 ((𝜙 + 1)/𝜙²) 𝜙ⁿ − 𝑏    for 𝑏 large enough
         = 𝑎𝜙ⁿ − 𝑏    since 𝜙 + 1 = 𝜙²
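The recurrence can also be checked numerically: successive ratios 𝑇(𝑛)/𝑇(𝑛 − 1) converge to 𝜙. A quick sketch (taking the Θ(1) term as exactly 1 is an assumption for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def work(n):
    # T(n) = T(n-1) + T(n-2) + 1, with the Theta(1) term taken as 1
    if n <= 1:
        return 1
    return work(n - 1) + work(n - 2) + 1

phi = (1 + 5 ** 0.5) / 2
print(work(30) / work(29))  # converges to phi ~ 1.618
```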


Span and Parallelism of Fibonacci

    𝑇∞(𝑛) = max(𝑇∞(𝑛 − 1), 𝑇∞(𝑛 − 2)) + Θ(1)
          = 𝑇∞(𝑛 − 1) + Θ(1)

Unrolling the recurrence into a sum, we get

    𝑇∞(𝑛) ∈ Θ(𝑛)

    parallelism = 𝑇1(𝑛)/𝑇∞(𝑛) = Θ(𝜙ⁿ/𝑛)

▶ So an inefficient way to compute Fibonacci, but very parallel.


Parallel Loops

parallel for 𝑖 = 1 to 𝑛 do
    statement...
    statement...
end for

▶ Run 𝑛 copies in parallel, each with a local setting of 𝑖.
▶ Effectively an 𝑛-way spawn.
▶ Can be implemented with spawn and sync.
▶ Span:

    𝑇∞(𝑛) = Θ(log 𝑛) + max𝑖 𝑇∞(iteration 𝑖)
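One way to see the Θ(log 𝑛) term is to implement parallel for by recursive halving: spawn one half of the range, recurse on the other half, then sync. A sketch using raw threads (illustrative; a real runtime multiplexes strands onto a fixed worker pool):

```python
import threading

def parallel_for(lo, hi, body):
    # Divide and conquer over [lo, hi]: the spawn/sync tree has
    # depth Theta(log n), matching the span bound on the slide.
    if lo > hi:
        return
    if lo == hi:
        body(lo)
        return
    mid = (lo + hi) // 2
    t = threading.Thread(target=parallel_for, args=(lo, mid, body))
    t.start()                        # spawn the left half
    parallel_for(mid + 1, hi, body)  # recurse on the right half
    t.join()                         # sync

out = [0] * 8
parallel_for(0, 7, lambda i: out.__setitem__(i, i * i))
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Each iteration writes a distinct `out[i]`, so the iterations are independent and the loop is safe to parallelize.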


Parallel Matrix-Vector product

To compute 𝑦 = 𝐴𝑥 in parallel:

    𝑦𝑖 = ∑_{𝑗=1}^{𝑛} 𝑎𝑖𝑗𝑥𝑗

function RowMult(𝐴, 𝑥, 𝑦, 𝑖)
    𝑦𝑖 = 0
    for 𝑗 = 1 to 𝑛 do
        𝑦𝑖 = 𝑦𝑖 + 𝑎𝑖𝑗𝑥𝑗
    end for
end function

function Mat-Vec(𝐴, 𝑥, 𝑦)
    Let 𝑛 = rows(𝐴)
    parallel for 𝑖 = 1 to 𝑛 do
        RowMult(𝐴, 𝑥, 𝑦, 𝑖)
    end for
end function

𝑇1(𝑛) ∈ Θ(𝑛²)   (serialization)
𝑇∞(𝑛) = Θ(log 𝑛) [parallel for] + Θ(𝑛) [RowMult]
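Mat-Vec is safe to parallelize over rows because each RowMult writes only its own entry 𝑦𝑖. A sketch with one thread per row (illustrative only):

```python
import threading

def mat_vec(A, x):
    # y = A x: the parallel for over rows becomes one thread per row.
    # Each row_mult(i) writes only y[i], so the rows are independent.
    n = len(A)
    y = [0] * n
    def row_mult(i):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    threads = [threading.Thread(target=row_mult, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()   # spawn all rows
    for t in threads:
        t.join()    # sync
    return y

print(mat_vec([[1, 2], [3, 4]], [1, 1]))  # [3, 7]
```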


Parallel Matrix-Vector product

▶ Why is RowMult not using parallel for? (Hint: every iteration of its inner loop updates the same accumulator 𝑦𝑖, so the iterations are not independent.)


Divide and Conquer Matrix-Vector product

function MVDC(𝐴, 𝑥, 𝑦, 𝑓, 𝑡)
    if 𝑓 == 𝑡 then
        RowMult(𝐴, 𝑥, 𝑦, 𝑓)
    else
        𝑚 = ⌊(𝑓 + 𝑡)/2⌋
        spawn MVDC(𝐴, 𝑥, 𝑦, 𝑓, 𝑚)
        MVDC(𝐴, 𝑥, 𝑦, 𝑚 + 1, 𝑡)
        sync
    end if
end function

[Figure: recursion tree for 𝑛 = 8, splitting (1,8) into (1,4),(5,8), down to the leaves (1,1),…,(8,8)]


Divide and Conquer Matrix-Vector product (analysis)

▶ 𝑇∞(𝑛) = Θ(log 𝑛) (depth of the binary recursion tree)
▶ Θ(𝑛) leaves (one per row), each doing Θ(𝑛) work in RowMult
▶ Θ(𝑛) interior nodes (binary tree), each doing Θ(1) work
▶ 𝑇1(𝑛) = Θ(𝑛²)


Scheduling

Scheduling Problem
Abstractly     Mapping threads to processors.
Pragmatically  Mapping logical threads to a thread pool.

Ideal Scheduler
On-Line      No advance knowledge of when threads will spawn or complete.
Distributed  No central controller.

▶ To simplify analysis, we relax the second condition.


A greedy centralized scheduler

Maintain a ready queue of strands ready to run.

Scheduling Step
Complete Step   If ≥ 𝑝 (# processors) strands are ready, assign 𝑝 strands to processors.
Incomplete Step Otherwise, assign all waiting strands to processors.

▶ To simplify analysis, split any non-unit strands into a chain of unit strands
▶ Therefore, after one time step, we schedule again.
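A small Python simulation of the greedy scheduler on a DAG of unit strands (a sketch; the DAG representation and names are illustrative):

```python
from collections import deque

def greedy_schedule(succ, p):
    """Simulate greedy scheduling of unit-time strands on p processors.

    succ maps each strand to the list of strands that depend on it.
    Returns T_p, the number of time steps used.
    """
    indeg = {u: 0 for u in succ}            # unfinished predecessors
    for u in succ:
        for v in succ[u]:
            indeg[v] += 1
    ready = deque(u for u in succ if indeg[u] == 0)
    steps = 0
    while ready:
        steps += 1
        # complete step runs p strands; incomplete step runs them all
        batch = [ready.popleft() for _ in range(min(p, len(ready)))]
        for u in batch:
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    return steps

# a fork-join "diamond": a spawns b and c, both join at d
diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
# T_1 = 4 (serial), T_inf = 3 (longest path a-b-d)
# with p = 2: step 1 runs a, step 2 runs b and c, step 3 runs d
```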


Optimal and Approximate Scheduling

Recall
    𝑇𝑝 ≥ 𝑇1/𝑝  (work law)
    𝑇𝑝 ≥ 𝑇∞   (span law)

Therefore
    𝑇𝑝 ≥ max(𝑇1/𝑝, 𝑇∞) = opt

With the greedy algorithm we can achieve
    𝑇𝑝 ≤ 𝑇1/𝑝 + 𝑇∞ ≤ 2 max(𝑇1/𝑝, 𝑇∞) = 2 × opt
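Numerically, the two bounds are easy to compare (a sketch with made-up values of 𝑇1, 𝑇∞, and 𝑝):

```python
def greedy_bounds(T1, Tinf, p):
    # opt is the lower bound from the work and span laws;
    # greedy scheduling achieves at most T1/p + Tinf
    opt = max(T1 / p, Tinf)
    greedy = T1 / p + Tinf
    return opt, greedy

opt, greedy = greedy_bounds(T1=1000, Tinf=10, p=8)
# opt = 125.0, greedy = 135.0; greedy is within a factor 2 of opt
```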


Counting Complete Steps

▶ Let 𝑘 be the number of complete steps.
▶ At each complete step we do 𝑝 units of work.
▶ Every unit of work corresponds to one step of the serialization, so 𝑘𝑝 ≤ 𝑇1.
▶ Therefore 𝑘 ≤ 𝑇1/𝑝.


Counting Incomplete Steps

▶ Let 𝐺 be the DAG of remaining strands.
▶ The ready queue of strands is exactly the set of sources in 𝐺.
▶ An incomplete step runs all sources in 𝐺.
▶ Every longest path starts at a source (otherwise, extend it).
▶ After an incomplete step, the length of the longest path shrinks by 1.
▶ There can be at most 𝑇∞ incomplete steps.


Parallel Slackness

    parallel slackness = parallelism / 𝑝 = 𝑇1 / (𝑝 𝑇∞)

    speedup = 𝑇1 / 𝑇𝑝 ≤ 𝑇1 / 𝑇∞ = 𝑝 × slackness

▶ If slackness < 1, speedup < 𝑝
▶ If slackness ≥ 1, linear speedup achievable for given number of processors
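For concreteness (illustrative numbers, not from the slides):

```python
def slackness(T1, Tinf, p):
    # parallelism = T1 / Tinf; slackness is parallelism per processor
    return (T1 / Tinf) / p

# T1 = 1024, Tinf = 16, so parallelism = 64
s8 = slackness(1024, 16, 8)      # 8.0 >= 1: linear speedup achievable
s128 = slackness(1024, 16, 128)  # 0.5 <  1: speedup must fall short of p
```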


Slackness and Scheduling

    slackness ∶= 𝑇1 / (𝑝 × 𝑇∞)

Theorem
For sufficiently large slackness, the greedy scheduler approaches time 𝑇1/𝑝.

Suppose
    𝑇1 / (𝑝 × 𝑇∞) ≥ 𝑐

Then
    𝑇∞ ≤ 𝑇1 / (𝑐𝑝)    (2)

Recall that with the greedy scheduler,
    𝑇𝑝 ≤ 𝑇1/𝑝 + 𝑇∞

Substituting (2), we have
    𝑇𝑝 ≤ (𝑇1/𝑝)(1 + 1/𝑐)
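Plugging in sample numbers at the boundary of the assumption (illustrative values):

```python
# slackness exactly c = 4, so Tinf = T1 / (c * p)
T1, p, c = 10_000, 8, 4
Tinf = T1 / (c * p)            # 312.5
greedy_bound = T1 / p + Tinf   # 1562.5
# equals (T1/p) * (1 + 1/c): within 25% of perfect speedup T1/p = 1250
```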


Contents

Dynamic Multithreaded Algorithms
    Fork-Join Model
    Span, Work, And Parallelism
    Parallel Loops
    Scheduling
    Race Conditions


Race Conditions

Non-Determinism
▶ result varies from run to run
▶ sometimes OK (in certain randomized algorithms)
▶ mostly a bug

Example

x ← 0
parallel for i ← 1 to 2 do
    x ← x + 1

▶ This is nondeterministic unless incrementing x is atomic
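The lost update can be reproduced deterministically by writing out one bad interleaving by hand (plain Python, no threads; each line is one strand's step):

```python
x = 0
r1 = x      # strand 1 loads x (sees 0)
r2 = x      # strand 2 loads x (also sees 0)
r1 += 1     # strand 1 increments its private copy
r2 += 1     # strand 2 increments its private copy
x = r1      # strand 1 stores 1
x = r2      # strand 2 stores 1, clobbering strand 1's update
# x == 1, not 2: one increment is lost
```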


Racy execution

𝑥 ← 0
    strand 1:    strand 2:
    𝑟1 ← 𝑥      𝑟2 ← 𝑥
    incr 𝑟1      incr 𝑟2
    𝑥 ← 𝑟1      𝑥 ← 𝑟2
print 𝑥

▶ all possible topological sorts are valid execution orders
▶ In particular it's not hard for both loads to complete before either store
▶ In practice there are various synchronization strategies (locks, etc.)
▶ Here we will insist that parallel strands are independent


We can write bad code with spawn too

sum(i, j)
    if (i > j)
        return;
    if (i == j)
        x++;
    else
        m = (i+j)/2;
        spawn sum(i, m);
        sum(m+1, j);
        sync;

▶ here we have the same non-deterministic interleaving of reading and writing 𝑥
▶ the style is a bit unnatural; in particular, we are not using the return value of spawn at all


Being more functional helps

sum(i, j)
    if (i > j) return 0;
    if (i == j) return 1;
    m ← (i+j)/2;
    left ← spawn sum(i, m);
    right ← sum(m+1, j);
    sync;
    return left + right;

▶ each strand writes into different variables
▶ sync is used as a barrier to serialize
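A runnable Python analogue of this pattern, using one thread per spawn and join as the sync (a sketch; real fork-join runtimes use work-stealing thread pools rather than an OS thread per spawn):

```python
import threading

def sum_range(i, j):
    # counts the integers in [i, j] by fork-join recursion;
    # each call writes only into its own local variables
    if i > j:
        return 0
    if i == j:
        return 1
    m = (i + j) // 2
    result = {}
    t = threading.Thread(
        target=lambda: result.update(left=sum_range(i, m)))
    t.start()                    # spawn
    right = sum_range(m + 1, j)  # continue in the parent strand
    t.join()                     # sync: barrier before combining
    return result["left"] + right

# sum_range(1, 100) returns 100
```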


Single Writer races

x ← spawn foo(x)
y ← foo(x)
sync

▶ arguments to spawned routines are evaluated in the parent context
▶ but this isn't enough to be race free
▶ which value of 𝑥 is passed to the second call of foo depends on how long the first one takes
