Data Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy &...

Preview:

Citation preview

Data Parallelism

Companion slides forThe Art of Multiprocessor Programming

by Maurice Herlihy & Nir Shavit

Ever Wonder …

Art of Multiprocessor Programming 2

When did the term multicore become popular?

A multi-core processor is a single computing component with two or more independent actual central processing units, which are the units that read and execute program instructions.)

wikipedia

Let’s Ask Google Ngram!

Art of Multiprocessor Programming 3

usage of multicore in books by publication year

Let’s Ask Google Ngram!

Art of Multiprocessor Programming 4

This part since 2000 is obvious …

Let’s Ask Google Ngram!

Art of Multiprocessor Programming 5

???

Let’s Ask Google Ngram!

Art of Multiprocessor Programming 6

multicore cablemulticore fiber

but we digress …

WordCount

Art of Multiprocessor Programming 7

alpha 8

easy to do sequentially …what about in parallel?

bravo 3charlie 9

zulu 1

MapReduce

Art of Multiprocessor Programming 8

chapter 1 chapter 2 chapter k

split text among mapping threads

Map Phase

Art of Multiprocessor Programming 9

chapter 1 chapter 2 chapter k

must count words!

must count words!

must count words!

a mapping thread per chapter

Map Phase

Art of Multiprocessor Programming 10

chapter 1

alpha 9juliet 2,alpha 1tango 4

…each mapper thread produces a stream …of key-value pairs …

key: wordvalue: local count

Art of Multiprocessor Programming 11

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

Art of Multiprocessor Programming 12

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

input: document fragment

Art of Multiprocessor Programming 13

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

key: individual word

Art of Multiprocessor Programming 14

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

value: local count

Art of Multiprocessor Programming 15

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

a task that runs in parallel with other tasks

Art of Multiprocessor Programming 16

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

produces a map: word count

Art of Multiprocessor Programming 17

Mapper Class

abstract class Mapper<IN, K, V> extends RecursiveTask<Map<K, V>> { IN input; public void setInput(IN anInput) { input = anInput; }}

initialize input: which document fragment?

Art of Multiprocessor Programming 18

WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }

Art of Multiprocessor Programming 19

WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words

Art of Multiprocessor Programming 20

WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words

map each word …

Art of Multiprocessor Programming 21

WordCount Mapperclass WordCountMapper extends mapreduce.Mapper< List<String>, String, Long > { … }document fragment is list of words

map each word …to its count in the fragment

Art of Multiprocessor Programming 22

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }

Art of Multiprocessor Programming 23

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }

the compute() method constructs the local word count

Art of Multiprocessor Programming 24

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }

create a map to hold the output

Art of Multiprocessor Programming 25

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }

examine each word in the document fragment

Art of Multiprocessor Programming 26

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } } increment that word’s

count in the map

Art of Multiprocessor Programming 27

WordCount MapperMap<String,Long> compute() { Map<String,Long> map = new HashMap<>(); for (String word : input) { map.merge(word, 1L, (x, y) -> x + y); } return map; } }

when the local count is complete, return the map

Reduce Phase

Art of Multiprocessor Programming 28

alpha 4bravo 2

…zulu 1

a reducer thread merges mapper outputs

alpha 2juliet 1tango 1

alpha 1foxtrot 1papa 1tango 1

alpha 1oscar 1,bravo

2…

alpha 3bravo 2

…zulu 1

Reduce Phase

Art of Multiprocessor Programming 29

the reducer task produces a stream …of key-value pairs …

key: word value: word count

Art of Multiprocessor Programming 30

Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}

Art of Multiprocessor Programming 31

Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}

each reducer is givena single key (word)

Art of Multiprocessor Programming 32

Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}

and a list of associated values (word count per fragment)

Art of Multiprocessor Programming 33

Reducer Classabstract class Reducer<K, V, OUT> extends RecursiveTask<OUT> { K key; List<V> valueList; public void setInput( K aKey, List<V> aList) { key = aKey; valueList = aList; }}

It produces a single summary value(the total count for that word)

WordCount

Art of Multiprocessor Programming 34

0.037

normalizing document wordcountgives a fingerprint vector

0.002

0.045

0.000

Document Fingerprint

Art of Multiprocessor Programming 35

a fingerprint is a point in a high-dimensional space

Clustering

Art of Multiprocessor Programming 36

romance novels

tango lyrics

Usenix Procedings

k-means

Art of Multiprocessor Programming 37

Find k clusters from raw data

k-means

Art of Multiprocessor Programming 38

Find k clusters from raw data

each vector closer to those in same cluster …

than in different clusters.

MapReduce

Art of Multiprocessor Programming 39

thread 1 thread 2 thread k

split points among mapping threads

k-means

Art of Multiprocessor Programming 40

Reducer picks k “centers” at random

Reduce Phase

Art of Multiprocessor Programming 41

0 c0

1 c1

2 c2

thread 1 thread 2 thread k

reducer sends key-value pair to mappers

key: cluster number

value: center point

Mappers

Art of Multiprocessor Programming 42

Each mapper uses centers to assign each vector to a cluster

0 c0

1 c1

2 c2

Mappers

Art of Multiprocessor Programming 43

Each mapper uses centers to assign each vector to a cluster

0 c0

1 c1

2 c2

Mappers

Art of Multiprocessor Programming 44

p0 2p1 1p2 1p3 0

mapper sends key-value stream to reducerkey: point

value: cluster ID

C0 = {…}C1 = {…}C2 = {…}

Back at the Reducer

Art of Multiprocessor Programming 45

The reducer merges the streams ….and assembles clusters …

Back at the Reducer

Art of Multiprocessor Programming 46

The reducer computes new centersbased on new clusters …

0 c0’1 c1’2 c2’

Once is Not Enough

Art of Multiprocessor Programming 47

thread 1 thread 2 thread k

reducer sends new centers to mappers

process ends when centers become stable

0 c0’1 c1’2 c2’

To Recaptulate

Art of Multiprocessor Programming 48

We saw two problems …

wordcount & k-means …

with similar parallel solutions

Map part is parallel …

Reduce part is sequential.

Art of Multiprocessor Programming 49

abstraction

Map Function

Art of Multiprocessor Programming 50

(k1, v1) list(k2, v2)

doc, contents word, count

cluster ID, center point, cluster ID

Reduce Function

Art of Multiprocessor Programming 51

(k2, list(v2)) list(v2)

word, list of counts count

cluster ID, list of points new cluster center

Example

Art of Multiprocessor Programming 52

Distributed Grep

Map:

line of document

Reduce:

copy line to display

Example

Art of Multiprocessor Programming 53

URL Access Frequency

Map:

(URL, local count)

Reduce:

(URL, total count)

Example

Art of Multiprocessor Programming 54

Reverse web link graph

Map:

(target link, source page)

Reduce:

(target link, list of source pages)

Other Examples

Art of Multiprocessor Programming 55

histogram

matrix multiplication

PageRank

Betweenness centrality

Art of Multiprocessor Programming 56

Distributed MapReduce

Google, Hadoop, etc…

Communication by message

Fault-tolerance important

Art of Multiprocessor Programming 57

Multicore MapReduce

Phoenix, Phoenix++, Metis …

Communication by shared memory objects

Fault-tolerance unimportant

Costs

Art of Multiprocessor Programming 58

key-value layout

memory allocation

mechanism overhead

cache pressure

static vs dynamic

Part Two

Data Streams59

Streams

Art of Multiprocessor Programming 60

sequence of transformations

transformation

transformation

data

source

data

data

consumer

no relation toI/O streams

sometimes in parallel

Streams

Art of Multiprocessor Programming 61

transformation

transformation

data

source

data

data

consumer

transformations given bymathematical functions

Streams

Art of Multiprocessor Programming 62

transformation

transformation

data

source

data

data

consumer

transformation createsnew stream

no modifications or side-effects

correctness easier?

Functional Programming

Art of Multiprocessor Programming 63

functions map old state to new state

old state never changed

no complex side-effects

elegant, easier proofs of correctness

Oh, Really?

Art of Multiprocessor Programming 64

“Functional languages are unnatural to use; but so are knives and forks, diplomatic protocols, double-entry bookkeeping, and a host of other things modern civilization has found useful.”

Jim Morris, 1982

Haiku

Art of Multiprocessor Programming 65

esthetically pleasing

only works on those who understand Haiku

Karate

Art of Multiprocessor Programming 66

esthetically pleasing

works even on those who do not understand Karate

Jim Morris’s Question

Art of Multiprocessor Programming 67

Is functional programming more like Haiku or Karate?

1981: Haiku Today: Karate

Laziness

Art of Multiprocessor Programming 68

x x+1

1,2,3,…

x 2x

sum

No computation until absolutely necessary

add 1 to each element: no computation

double each element: no computation

2(xi+1) is terminal

Laziness

Art of Multiprocessor Programming 69

x x+1

1,2,3,…

x 2x

collect in Listmove to container is terminal

Laziness

Art of Multiprocessor Programming 70

x x+1

1,2,3,…

x 2x

collect in ListLaziness permits optimizations

x 2(x+1)

Laziness

Art of Multiprocessor Programming 71

1 1 2 3 5 8 12 20 32 …

Laziness permits infinite streams …

Stream<Integer> fib = new FibStream();

Art of Multiprocessor Programming 72

Unbounded Random Stream

Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}

Art of Multiprocessor Programming 73

Unbounded Random Stream

Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}Unbounded stream of double-precision random numbers

Art of Multiprocessor Programming 74

Random Stream

Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}

Stream that generates new elements on the fly

Art of Multiprocessor Programming 75

Random Stream

Stream<Double> randomDoubleStream() { return Stream.generate( () -> random.nextDouble() );}

function to call when generating new element

example of Java lambda expression (anon method)

Art of Multiprocessor Programming 76

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

Art of Multiprocessor Programming 77

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

puts each word from the document into a List

Art of Multiprocessor Programming 78

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

open the file, create a FileReader

Art of Multiprocessor Programming 79

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

how a stream program looks

each line creates a new stream

Art of Multiprocessor Programming 80

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

turn the FileReader into a stream of lines, each line a string

Art of Multiprocessor Programming 81

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

map creates a new stream by applying a function to each stream element

here, converts to lower case

Art of Multiprocessor Programming 82

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

flatMap replaces one stream element with multiple stream elements

here, splits line into words

Art of Multiprocessor Programming 83

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

collect puts stream elements in a container

here, in a List

terminal operation

Art of Multiprocessor Programming 84

WordCountList<String> readFile(String fileName) { … return reader .lines() .map(String::toLowerCase) .flatMap(s -> pattern.splitAsStream(s)) .collect(Collectors.toList());}

No loops

No conditionals

No mutable objects

Art of Multiprocessor Programming 85

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

now let’s count the words

Art of Multiprocessor Programming 86

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

start with list of words

Art of Multiprocessor Programming 87

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

turn List into a stream

Art of Multiprocessor Programming 88

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

put stream into a container (Map)word count

Art of Multiprocessor Programming 89

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

each element’s key is that element

Art of Multiprocessor Programming 90

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

each element’s value is the number of times it appears

Art of Multiprocessor Programming 91

WordCountMap<String,Long> map = text .stream() .collect( Collectors.groupingBy( Function.identity(), Collectors.counting()));

No loops

No conditionals

No mutable objects

k-Means

Art of Multiprocessor Programming 92

class Point { Point(double x, double y) {…} Point plus(Point other) {…} Point scale(double x) {…} static Point barycenter( List<Point> cluster ) {…}}

k-Means

Art of Multiprocessor Programming 93

class Point { Point(double x, double y) {…} Point plus(Point other) {…} Point scale(double x) {…} static Point barycenter( List<Point> cluster ) {…}}

stream-based!

BaryCenter

Art of Multiprocessor Programming 94

Stream Barycenter

Art of Multiprocessor Programming 95

Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }

Stream Barycenter

Art of Multiprocessor Programming 96

Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }

cluster size

Stream Barycenter

Art of Multiprocessor Programming 97

Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }

turn List into Stream

Reduce

Art of Multiprocessor Programming 98

stream.reduce(+):

() ;

(a) a

(a,b) a+b

(a,b,c) (a+b)+c

etc. …terminal operation

k-Means

Art of Multiprocessor Programming 99

Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); } sum points in cluster

Optional because sum might be empty!

k-Means

Art of Multiprocessor Programming 100

Point barycenter(List<Point> cluster){ double numPoints = cluster.size(); Optional<Point> sum = cluster .stream() .reduce(Point::plus); return sum.get() .scale(1 / numPoints); }

extract sum and divide by # points

k-Means

Art of Multiprocessor Programming 101

List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }

k-Means

Art of Multiprocessor Programming 102

List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }

read points from file

k-Means

Art of Multiprocessor Programming 103

List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }

pick random centers

k-Means

Art of Multiprocessor Programming 104

List<Point> points = readFile("cluster.dat");centers = randomDistinctCenters(points);double convergence = 1.0; while (convergence > EPSILON) { … }

keep going until centers are stable

Compute New Clusters

Art of Multiprocessor Programming 105

while (convergence > EPSILON) { Map<Integer,List<Point>> clusters = points .stream() .collect( Collectors.groupingBy( p -> closestCenter(centers, p) ) );}

turn list of points into a Stream

Compute New Clusters

Art of Multiprocessor Programming 106

while (convergence > EPSILON) { Map<Integer,List<Point>> clusters = points .stream() .collect( Collectors.groupingBy( p -> closestCenter(centers, p) ) );}

put each point in a mapkey is closest centervalue is list of points with that center

k-Means Con’t

Art of Multiprocessor Programming 107

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

compute new centers map: ID center

k-Means Con’t

Art of Multiprocessor Programming 108

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

turn map into a Stream of (ID, point) pairs

k-Means Con’t

Art of Multiprocessor Programming 109

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

turn stream into a mapID barycenter

k-Means Con’t

Art of Multiprocessor Programming 110

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

new Key is still the cluster ID

k-Means Con’t

Art of Multiprocessor Programming 111

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

new value is the barycenter computed earlier

k-Means Con’t

Art of Multiprocessor Programming 112

while (convergence > EPSILON) { ... Map<Integer, Point> newCenters = clusters .entrySet() .stream() .collect( Collectors.toMap( e -> e.getKey(), e -> Point.barycenter(e.getValue()) ) ); convergence = distance(centers, newCenters); centers = newCenters;}

If centers have moved, start again with the new centers

Functional k-Means

Art of Multiprocessor Programming 113

many fewer lines of code

easier to read (really!)

easier to reason

easier to optimize

Art of Multiprocessor Programming 114

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));

So far, Streams are sequential

Art of Multiprocessor Programming 115

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));

make List of strings and turn them into a stream

Art of Multiprocessor Programming 116

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));

forEach applies a method to each element (not functional)

Art of Multiprocessor Programming 117

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .forEach(s -> printf("%s\n", s));

ArlingtonBerkeleyClarendonDartmouthExeter

prints

Art of Multiprocessor Programming 118

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .parallelStream() .forEach(s -> printf("%s\n", s));

turn List into a parallel stream

Art of Multiprocessor Programming 119

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .parallelStream() .forEach(s -> printf("%s\n", s));

ArlingtonBerkeleyClarendonDartmouthExeter

prints

Dartmouth ArlingtonClarendonBerkeleyExeter

Dartmouth BerkeleyExeterArlingtonClarendon

Art of Multiprocessor Programming 120

Parallelism?Arrays.asList("Arlington", "Berkeley", "Clarendon", "Dartmouth", "Exeter") .stream() .parallel() .forEach(s -> printf("%s\n", s));

can turn stream into a parallel stream

Art of Multiprocessor Programming 121

Pitfallslist.stream().forEach( s -> list.add(0));

Art of Multiprocessor Programming 122

Pitfallslist.stream().forEach( s -> list.add(0));

lambda (function) must not modify source!

Art of Multiprocessor Programming 123

Pitfallssource.parallelStream() .forEach( s -> target.add(s));

exception if target not thread-safeorder added is non-deterministic

Conclusions

Art of Multiprocessor Programming 124

Streams support

functional programming

data parallelism

compiler optimizations

Recommended