Source: days2011.scala-lang.org/sites/days2011/files/29. Parallel Collections.pdf

Scala Parallel Collections

Aleksandar Prokopec

EPFL

Scala collections

for {
  s <- surnames
  n <- names
  if s endsWith n
} yield (n, s)

McDonald

1040 ms

Scala parallel collections

for {
  s <- surnames.par
  n <- names.par
  if s endsWith n
} yield (n, s)

2 cores: 575 ms
4 cores: 305 ms

for comprehensions: nested parallelized bulk operations

surnames.par.flatMap { s =>
  names.par
    .filter(n => s endsWith n)
    .map(n => (n, s))
}
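The equivalence between the for comprehension and the bulk operations can be checked on ordinary sequential collections. The sketch below is a minimal illustration; the sample `surnames` and `names` lists are made up here, since the slides do not show the real data. (Strictly speaking, the compiler translates the `if` guard to `withFilter` rather than `filter`, which produces the same elements.)

```scala
object ForDesugar {
  // Hypothetical sample data; the slides do not show the real lists.
  val surnames = Seq("McDonald", "Johnson", "Smith")
  val names = Seq("Donald", "John", "son")

  // The for comprehension from the slide, on sequential collections.
  val viaFor: Seq[(String, String)] =
    for {
      s <- surnames
      n <- names
      if s endsWith n
    } yield (n, s)

  // The same computation written directly with bulk operations.
  val viaBulk: Seq[(String, String)] =
    surnames.flatMap { s =>
      names
        .filter(n => s endsWith n)
        .map(n => (n, s))
    }
}
```

Both values hold the same pairs in the same order, which is why adding `.par` to the generators parallelizes the nested `flatMap`, `filter` and `map` calls.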

Nested parallelism: parallel within parallel, composition

surnames.par.flatMap { s =>
  surnameToCollection(s) // may invoke parallel ops
}

Nested parallelism: going recursive (recursive algorithms)

def vowel(c: Char): Boolean = ...

def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

gen(5, Array(""))

1545 ms
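For readers who want to run the sequential version, here is a minimal self-contained sketch. The body of `vowel` is elided on the slide, so the definition below (the five letters a, e, i, o, u) is only an assumption.

```scala
object GenWords {
  // Assumed definition; the slides elide the body of vowel.
  def vowel(c: Char): Boolean = "aeiou".indexOf(c) >= 0

  // Extends every string by one letter when that keeps vowels and
  // consonants alternating; otherwise the string is yielded unchanged.
  def gen(n: Int, acc: Seq[String]): Seq[String] =
    if (n == 0) acc
    else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
      if (s.length == 0) s + c
      else if (vowel(s.last) && !vowel(c)) s + c
      else if (!vowel(s.last) && vowel(c)) s + c
      else s
}
```

Each recursion level multiplies the number of results by 26, so `gen(5, Seq(""))` produces 26^5 strings; that volume of independent work is what makes the parallel variant worthwhile.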


def vowel(c: Char): Boolean = ...

def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s

gen(5, ParArray(""))

1 core: 1575 ms
2 cores: 809 ms
4 cores: 530 ms

So, I just use par and I’m home free?

How to think parallel

Character count: use case for foldLeft

val txt: String = ...
txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}

[Diagram: the accumulator runs 0 1 2 3 4 5 6 over the characters A B C D E F, applying _ + 1 at each step]

Going left to right: not parallelizable!

Going left to right: not really necessary

[Diagram: A B C and D E F are each counted 0 1 2 3 with _ + 1, then the two partial results are added with _ + _ to give 6]

Character count in parallel

txt.fold(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}

[Diagram: counting the partition A B C with _ + 1 uses an operator of type (Int, Char) => Int]

Character count: fold not applicable

[Diagram: merging the partial results 3 and 3 with _ + _ needs an operator of type (Int, Int) => Int]

Character count: use case for aggregate

txt.aggregate(0)({
  case (a, ' ') => a
  case (a, c) => a + 1
}, _ + _)

[Diagram: each partition A B C is counted with the element operator _ + 1; the partial results 3 and 3 are merged with the aggregation operator _ + _]
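The split between the element operator and the aggregation operator can be tried out without a parallel runtime. The following is a minimal sketch that imitates what `aggregate` does: the helper `chunkedCount` and its fixed-size chunking are assumptions made for illustration, since a real parallel collection chooses its own partitions.

```scala
object CharCount {
  // Element operator: folds one character into a partial count.
  val seqop: (Int, Char) => Int = {
    case (a, ' ') => a
    case (a, _)   => a + 1
  }

  // Aggregation operator: merges two partial counts.
  val combop: (Int, Int) => Int = _ + _

  // Hypothetical helper: cut txt into chunks of chunkSize characters,
  // fold every chunk with seqop, then merge the partial counts with combop.
  def chunkedCount(txt: String, chunkSize: Int): Int =
    txt.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(0)(seqop))
      .foldLeft(0)(combop)
}
```

Whatever chunk size is chosen, the result should equal the plain `foldLeft`, because counting non-space characters does not depend on where the text is cut.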

Word count: another use case for foldLeft

txt.foldLeft((0, true)) {
  case ((wc, _), ' ') => (wc, true)
  case ((wc, true), x) => (wc + 1, false)
  case ((wc, false), x) => (wc, false)
}

Example input: "Folding me softly."

Initial accumulation (0, true): 0 words so far, last character was a space
case ((wc, _), ' '): a space; the last seen character is a space
case ((wc, true), x): a non-space; the last seen character was a space, so a new word
case ((wc, false), x): a non-space; the last seen character wasn't a space, so no new word
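As a quick check of the sequential version, the fold can be wrapped in a small function. This is only a sketch: the name `wordCount` and the `._1` projection that extracts the count from the final pair are additions made here.

```scala
object SeqWordCount {
  // State: (words counted so far, was the last character a space?)
  def wordCount(txt: String): Int =
    txt.foldLeft((0, true)) {
      case ((wc, _), ' ')   => (wc, true)      // a space
      case ((wc, true), _)  => (wc + 1, false) // first letter of a new word
      case ((wc, false), _) => (wc, false)     // inside a word
    }._1
}
```

For the example sentence, `SeqWordCount.wordCount("Folding me softly.")` evaluates to 3.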

Word count in parallel

P1: "Folding me "    wc = 2; rs = 1
P2: "softly."        wc = 1; ls = 0
merged: wc = 3

Word count: must assume arbitrary partitions

P1: "Foldin"         wc = 1; rs = 0
P2: "g me softly."   wc = 3; ls = 0
merged: wc = 3

Word count: initial aggregation

txt.par.aggregate((0, 0, 0))

(0, 0, 0) = (# spaces on the left, # words, # spaces on the right), the value for the empty string ""

Word count: aggregation operator

... }, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)

Examples for the cases above:
"" and "Folding me", or "softly." and "": one side is empty, so keep the other result
"Folding m" and "e softly.": no space on either side of the boundary, so the split word is counted once
"Folding me" and " softly.": a space at the boundary, so the word counts simply add up

Word count: element operator

txt.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})

Element cases, in order, with the example shown for each:
"_": 0 words and a space - add one more space each side
" m": 0 words and a non-space - one word, no spaces on the right side
" me_": nonzero words and a space - one more space on the right side
" me sof": nonzero words, last non-space and current non-space - no change
" me s": nonzero words, last space and current non-space - one more word
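To see that the two operators really tolerate arbitrary partitions, they can be exercised sequentially by cutting the text at every possible position. This is a minimal sketch: the helper `countSplitAt` is hypothetical and only imitates a two-way partition, whereas a parallel collection may split into many parts.

```scala
object ParWordCount {
  // (spaces on the left, words, spaces on the right)
  type Acc = (Int, Int, Int)

  // Element operator from the slides.
  val seqop: (Acc, Char) => Acc = {
    case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
    case ((ls, 0, _), _)     => (ls, 1, 0)
    case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
    case ((ls, wc, 0), _)    => (ls, wc, 0)
    case ((ls, wc, _), _)    => (ls, wc + 1, 0)
  }

  // Aggregation operator from the slides.
  val combop: (Acc, Acc) => Acc = {
    case ((0, 0, 0), res) => res
    case (res, (0, 0, 0)) => res
    case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
    case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
  }

  // Hypothetical helper: cut txt at index `at`, fold both halves
  // independently, merge them, and return the word count.
  def countSplitAt(txt: String, at: Int): Int = {
    val (left, right) = txt.splitAt(at)
    val l = left.foldLeft((0, 0, 0))(seqop)
    val r = right.foldLeft((0, 0, 0))(seqop)
    combop(l, r)._2
  }
}
```

Cutting "Folding me softly." after "Foldin" gives (0, 1, 0) and (0, 3, 0), which merge to 1 + 3 - 1 = 3 words, matching the partition example on the slides.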

Word count using parallel strings?


Word count: string not really parallelizable

scala> (txt: String).par
collection.parallel.ParSeq[Char] = ParArray(…)

Different internal representation! The ParArray is built by copying the string contents into an array.

Conversions: going parallel

// par is efficient: no copying
mutable.{Array, ArrayBuffer, ArraySeq}
mutable.{HashMap, HashSet}
immutable.{Vector, Range}
immutable.{HashMap, HashSet}

Most other collections construct a new parallel collection!

sequential                      parallel
Array, ArrayBuffer, ArraySeq    mutable.ParArray
mutable.HashMap                 mutable.ParHashMap
mutable.HashSet                 mutable.ParHashSet
immutable.Vector                immutable.ParVector
immutable.Range                 immutable.ParRange
immutable.HashMap               immutable.ParHashMap
immutable.HashSet               immutable.ParHashSet

Custom collections

class ParString(val str: String)
extends parallel.immutable.ParSeq[Char] {
  def apply(i: Int) = str.charAt(i)
  def length = str.length
  def seq = new WrappedString(str)
  def splitter = new ParStringSplitter(0, str.length)
  ...
}

Custom collection: splitter definition

// presumably nested inside ParString, since next reads str
class ParStringSplitter(var i: Int, len: Int)
extends Splitter[Char] {

  // splitters are iterators
  def hasNext = i < len
  def next = {
    val r = str.charAt(i)
    i += 1
    r
  }

  // splitters must be duplicated
  def dup = new ParStringSplitter(i, len)

  // splitters know how many elements remain
  def remaining = len - i

  // splitters can be split
  def psplit(sizes: Int*): Seq[ParStringSplitter] = {
    val splitted = new ArrayBuffer[ParStringSplitter]
    for (sz <- sizes) {
      val next = (i + sz) min len
      splitted += new ParStringSplitter(i, next)
      i = next
    }
    splitted
  }
}
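The splitter contract can be explored in isolation with a plain iterator. The class below is a hypothetical stand-in, not the library's `Splitter` trait: it extends only `Iterator[Char]`, takes the string as an explicit constructor argument, and copies the four operations named on the slides.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a splitter over the range [i, len) of str.
class StringSplitter(str: String, private var i: Int, len: Int)
    extends Iterator[Char] {

  // A splitter is an iterator.
  def hasNext: Boolean = i < len
  def next(): Char = { val r = str.charAt(i); i += 1; r }

  // It can be duplicated at its current position.
  def dup: StringSplitter = new StringSplitter(str, i, len)

  // It knows how many elements remain.
  def remaining: Int = len - i

  // It can be split into consecutive parts of the requested sizes;
  // afterwards this splitter itself is exhausted up to the last part.
  def psplit(sizes: Int*): Seq[StringSplitter] = {
    val parts = new ArrayBuffer[StringSplitter]
    for (sz <- sizes) {
      val end = (i + sz) min len
      parts += new StringSplitter(str, i, end)
      i = end
    }
    parts.toSeq
  }
}
```

Because every part is only an index range over the same string, splitting copies no characters, which is the property that makes a custom `ParString` cheaper than converting the string to a `ParArray`.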

Word count: now with parallel strings

new ParString(txt).aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})

Word count: performance

100 ms

cores:   1        2       4
time:    137 ms   70 ms   35 ms

Hierarchy

[Diagram: GenTraversable, GenIterable and GenSeq are the general types; Traversable, Iterable and Seq are the sequential types; ParIterable and ParSeq are the parallel types]

def nonEmpty(sq: Seq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}

def nonEmpty(sq: ParSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}

Side-effects! ArrayBuffer is not synchronized! A method written for Seq cannot be handed a ParSeq unchanged.

def nonEmpty(sq: GenSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res.synchronized {
      res += s
    }
  }
  res
}
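The effect of `res.synchronized` can be reproduced with plain threads. In this minimal sketch, explicit `Thread` objects stand in for the parallel `foreach`; that substitution, and the pre-chunked input, are assumptions made to keep the example free of library dependencies.

```scala
import scala.collection.mutable

object SyncBuffer {
  // Each chunk is processed on its own thread; every append to the shared
  // buffer happens while holding the buffer's monitor.
  def nonEmpty(chunks: Seq[Seq[String]]): mutable.ArrayBuffer[String] = {
    val res = new mutable.ArrayBuffer[String]()
    val threads = chunks.map { chunk =>
      new Thread(() => {
        for (s <- chunk) {
          if (s.nonEmpty) res.synchronized { res += s }
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    res
  }
}
```

The buffer ends up with every non-empty string exactly once, but in an order that depends on thread scheduling, so callers should not rely on element order.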

Thank you!

Examples at: git://github.com/axel22/sd.git

Accessors vs. transformers: some methods need more than just splitters

Accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, …

Transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …

These return collections!

Sequential collections: builders
Parallel collections: combiners

Builders: building a sequential collection

[Diagram: a ListBuilder receives the elements 2, 4 and 6 from the list 1 2 3 4 5 6 7 through three += calls; result returns the new list 2 4 6 Nil]
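The builder protocol from the diagram can be written out directly with the standard library's `List.newBuilder`. This is a small sketch of the idea, not the code that `filter` itself uses internally.

```scala
object BuilderDemo {
  // Keep the even elements, adding them to a builder one at a time,
  // then ask the builder for the finished list.
  def evens(xs: List[Int]): List[Int] = {
    val b = List.newBuilder[Int]
    for (x <- xs) if (x % 2 == 0) b += x
    b.result()
  }
}
```

A builder like this assumes a single thread appending in order, which is why parallel collections need something that can also merge partial results.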

Combiners: building parallel collections

trait Combiner[-Elem, +To] extends Builder[Elem, To] {
  def combine[N <: Elem, NewTo >: To]
    (other: Combiner[N, NewTo]): Combiner[N, NewTo]
}

[Diagram: two Combiners are combined into one Combiner]

Either use an efficient merge operation or do lazy evaluation.
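One way to picture the lazy-evaluation option is a combiner that only collects chunks and defers copying until `result`. The class below is a hypothetical toy restricted to `Int` elements; it does not implement the library's `Combiner` trait and is not how `ParArray` is actually built.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy combiner: elements live in a sequence of chunks.
class ChunkCombiner(
    private val chunks: Vector[ArrayBuffer[Int]] = Vector(new ArrayBuffer[Int])) {

  // Adding an element appends to the last chunk.
  def +=(x: Int): this.type = { chunks.last += x; this }

  // Combining concatenates the chunk sequences; no element is copied.
  // In this toy the chunks are shared, so the operands should not be
  // reused after combine.
  def combine(other: ChunkCombiner): ChunkCombiner =
    new ChunkCombiner(chunks ++ other.chunks)

  // The real work is deferred to result: allocate once, then copy chunks.
  def result(): Array[Int] = {
    val out = new Array[Int](chunks.map(_.size).sum)
    var pos = 0
    for (c <- chunks) { c.copyToArray(out, pos); pos += c.size }
    out
  }
}
```

Since the target positions of all chunks are known before copying starts, the final copy step is itself a candidate for parallel execution.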

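Parallel arrays

[Diagram: the input is processed in four chunks (1, 2, 3, 4 / 5, 6, 7, 8 / 3, 1, 8, 0 / 2, 2, 1, 9), giving the partial results 2, 4 / 6, 8 / 8, 0 / 2, 2; the partial results are merged pairwise, then the result array is allocated and the chunks are copied into it: 2 4 6 8 8 0 2 2]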

Parallel hash tables

[Diagram: a ParHashMap holding the keys 0 1 2 4 5 7 8 9; e.g. calling filter, two ParHashCombiners collect the retained keys, 0 1 4 in one and 5 7 9 in the other]

How to merge?

[Diagram: buckets! Each ParHashCombiner keeps its elements in buckets (binary codes shown: 0 = 0000₂, 1 = 0001₂, 4 = 0100₂); combine joins the buckets of the two combiners into one ParHashCombiner with no copying, and the resulting ParHashMap holds 9 7 5 0 1 4]
