Scala Parallel Collections
Aleksandar Prokopec
EPFL
Scala collections
for {
  s <- surnames
  n <- names
  if s endsWith n
} yield (n, s)
McDonald
1040 ms
Scala parallel collections
for {
  s <- surnames.par
  n <- names.par
  if s endsWith n
} yield (n, s)
2 cores: 575 ms
4 cores: 305 ms
for comprehensions: nested parallelized bulk operations
surnames.par.flatMap { s =>
  names.par
    .filter(n => s endsWith n)
    .map(n => (n, s))
}
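The equivalence between the for comprehension and the explicit bulk operations can be checked on small sequential collections. This is a minimal sketch, assuming hypothetical sample data for `surnames` and `names`; `.par` is left out so the snippet runs without the parallel collections module, whose availability depends on the Scala version.

```scala
object ForDesugar {
  // Hypothetical sample data, not taken from the talk.
  val surnames = Seq("McDonald", "Johnson", "Smith")
  val names = Seq("Donald", "John", "son")

  // The for comprehension from the slides (sequential here).
  val viaFor: Seq[(String, String)] =
    for { s <- surnames; n <- names if s.endsWith(n) } yield (n, s)

  // The same computation written as explicit bulk operations.
  val viaBulk: Seq[(String, String)] =
    surnames.flatMap { s =>
      names.filter(n => s.endsWith(n)).map(n => (n, s))
    }
}
```

Both values hold the same pairs, which is why adding `.par` to the generators of the for comprehension parallelizes the underlying `flatMap`, `filter` and `map` calls.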
Nested parallelism: parallel within parallel
composition
surnames.par.flatMap { s =>
  surnameToCollection(s) // may invoke parallel ops
}
Nested parallelism: going recursive
recursive algorithms
def vowel(c: Char): Boolean = ...
def gen(n: Int, acc: Seq[String]): Seq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s
gen(5, Array(""))
1545 ms
Nested parallelism: going recursive
def vowel(c: Char): Boolean = ...
def gen(n: Int, acc: ParSeq[String]): ParSeq[String] =
  if (n == 0) acc
  else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield
    if (s.length == 0) s + c
    else if (vowel(s.last) && !vowel(c)) s + c
    else if (!vowel(s.last) && vowel(c)) s + c
    else s
gen(5, ParArray(""))
1 core: 1575 ms
2 cores: 809 ms
4 cores: 530 ms
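To experiment with the recursive generator, a runnable sequential version is sketched below. The body of `vowel` is elided in the talk, so the definition here is an assumption made only for illustration.

```scala
object GenWords {
  // Hypothetical definition: the talk elides the body of vowel.
  def vowel(c: Char): Boolean = "aeiou".indexOf(c) >= 0

  // Sequential version from the slides: extend each string by one letter
  // when that keeps vowels and consonants alternating, else keep it as is.
  def gen(n: Int, acc: Seq[String]): Seq[String] =
    if (n == 0) acc
    else for (s <- gen(n - 1, acc); c <- 'a' to 'z') yield {
      if (s.length == 0) s + c
      else if (vowel(s.last) && !vowel(c)) s + c
      else if (!vowel(s.last) && vowel(c)) s + c
      else s
    }
}
```

Each recursion level multiplies the number of elements by 26, so `gen(5, ...)` produces enough work for the parallel version to pay off.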
So, I just use par and I’m home free?
How to think parallel
Character count: use case for foldLeft
val txt: String = ...
txt.foldLeft(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
going left to right: not parallelizable!
[Diagram: the count is accumulated with _ + 1 over A B C D E F, one character at a time, from 0 up to 6]
going left to right: not really necessary
[Diagram: A B C and D E F are each counted with _ + 1 from 0 up to 3; the two partial counts are combined with _ + _ into 6]
Character count: in parallel
txt.fold(0) {
  case (a, ' ') => a
  case (a, c) => a + 1
}
[Diagram: A B C counted with _ + 1]
The counting function has type (Int, Char) => Int.
Character count: fold not applicable
[Diagram: two partial counts, 3 and 3, combined with _ + _]
Combining partial counts needs a function of type (Int, Int) => Int, which the counting function is not.
Character count: use case for aggregate
txt.aggregate(0)({
  case (a, ' ') => a
  case (a, c) => a + 1
}, _ + _)
aggregation element: the first function (_ + 1) adds one character to a partial count
aggregation aggregation: the second function (_ + _) combines two partial counts
[Diagram: A B C counted with _ + 1 within each part; the partial counts 3 and 3 combined with _ + _]
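The two-function shape of `aggregate` can be emulated without a parallel collection. The sketch below is an illustration under stated assumptions: `count` and its `chunkSize` parameter are hypothetical, and the chunks are folded one after another rather than on separate processors.

```scala
object CharCount {
  // Element operator ("aggregation element"): (Int, Char) => Int.
  def seqop(a: Int, c: Char): Int = if (c == ' ') a else a + 1

  // Combining operator ("aggregation aggregation"): (Int, Int) => Int.
  def combop(l: Int, r: Int): Int = l + r

  // Hypothetical helper: emulate aggregate by folding fixed-size chunks
  // independently and then combining the partial results.
  // chunkSize must be positive.
  def count(txt: String, chunkSize: Int): Int =
    txt.grouped(chunkSize).map(_.foldLeft(0)(seqop)).foldLeft(0)(combop)
}
```

Because `combop` is associative and 0 is its neutral element, the result does not depend on how the text is chunked.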
Word count: another use case for foldLeft
txt.foldLeft((0, true)) {
  case ((wc, _), ' ') => (wc, true)
  case ((wc, true), x) => (wc + 1, false)
  case ((wc, false), x) => (wc, false)
}
“Folding me softly.”
initial accumulation (0, true): 0 words so far, last character was a space
a space: last seen character is a space
a non space, last seen character was a space: a new word
a non space, last seen character wasn’t a space: no new word
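The sequential word count can be wrapped in a function and tried on the example sentence. A minimal sketch; the object and method names are chosen here for illustration.

```scala
object SeqWordCount {
  // The accumulator is (words so far, was the last character a space?).
  def wordCount(txt: String): Int =
    txt.foldLeft((0, true)) {
      case ((wc, _), ' ')   => (wc, true)       // a space
      case ((wc, true), _)  => (wc + 1, false)  // first character of a new word
      case ((wc, false), _) => (wc, false)      // inside a word
    }._1
}
```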
Word count: in parallel
P1: “Folding me “   wc = 2; rs = 1
P2: “softly.“       wc = 1; ls = 0
combined: wc = 3
Word count: must assume arbitrary partitions
P1: “Foldin“        wc = 1; rs = 0
P2: “g me softly.“  wc = 3; ls = 0
combined: wc = 3
Word count: initial aggregation
txt.par.aggregate((0, 0, 0))
(# spaces on the left, # words, # spaces on the right)
(0, 0, 0) is the value for the empty string ””
Word count: aggregation aggregation
... }, {
  // one side is empty, e.g. ““ with “Folding me“, or “softly.“ with ““
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  // a word is split across the boundary, e.g. “Folding m“ with “e softly.“
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  // a space at the boundary, e.g. “Folding me” with “ softly.“
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})
Word count: aggregation element
txt.par.aggregate((0, 0, 0))({
  // ”_”: 0 words and a space, add one more space on each side
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  // ” m”: 0 words and a non-space, one word, no spaces on the right side
  case ((ls, 0, _), c) => (ls, 1, 0)
  // ” me_”: nonzero words and a space, one more space on the right side
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  // ” me sof”: nonzero words, last non-space and current non-space, no change
  case ((ls, wc, 0), c) => (ls, wc, 0)
  // ” me s”: nonzero words, last space and current non-space, one more word
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, ...)
Word count: in parallel
txt.par.aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})
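To see that the two operators give the same count for any two-way partition, the sketch below splits the text at a chosen position, folds both halves independently and combines the results. It emulates partitioning sequentially and does not use `.par`; the helper `wordCountSplitAt` is hypothetical and exists only for this illustration.

```scala
object ParWordCountSketch {
  // (spaces on the left, words, spaces on the right)
  type Acc = (Int, Int, Int)

  // Element operator from the slides.
  val seqop: (Acc, Char) => Acc = {
    case ((ls, 0, _), ' ')   => (ls + 1, 0, ls + 1)
    case ((ls, 0, _), _)     => (ls, 1, 0)
    case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
    case ((ls, wc, 0), _)    => (ls, wc, 0)
    case ((ls, wc, _), _)    => (ls, wc + 1, 0)
  }

  // Combining operator from the slides.
  val combop: (Acc, Acc) => Acc = {
    case ((0, 0, 0), res) => res
    case (res, (0, 0, 0)) => res
    case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
    case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
  }

  // Hypothetical helper: split txt at position i, fold both halves
  // independently, then combine, as a two-way partition would.
  def wordCountSplitAt(txt: String, i: Int): Int = {
    val (left, right) = txt.splitAt(i)
    val zero: Acc = (0, 0, 0)
    combop(left.foldLeft(zero)(seqop), right.foldLeft(zero)(seqop))._2
  }
}
```

For “Folding me softly.” every split position yields 3 words, including splits that cut a word in half.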
Word count using parallel strings?
Word count: string not really parallelizable
scala> (txt: String).par
collection.parallel.ParSeq[Char] = ParArray(…)
different internal representation!
ParArray: copy string contents into an array
Conversions: going parallel
// par is efficient: no copying
mutable.{Array, ArrayBuffer, ArraySeq}
mutable.{HashMap, HashSet}
immutable.{Vector, Range}
immutable.{HashMap, HashSet}
most other collections construct a new parallel collection!

sequential                      parallel
Array, ArrayBuffer, ArraySeq    mutable.ParArray
mutable.HashMap                 mutable.ParHashMap
mutable.HashSet                 mutable.ParHashSet
immutable.Vector                immutable.ParVector
immutable.Range                 immutable.ParRange
immutable.HashMap               immutable.ParHashMap
immutable.HashSet               immutable.ParHashSet
Custom collections
Custom collection
class ParString(val str: String)
extends parallel.immutable.ParSeq[Char] {
  def apply(i: Int) = str.charAt(i)
  def length = str.length
  def seq = new WrappedString(str)
  def splitter: Splitter[Char] = new ParStringSplitter(0, str.length)
}
Custom collection: splitter definition
// assumes str, the wrapped string, is in scope
class ParStringSplitter(var i: Int, len: Int)
extends Splitter[Char] {
  // splitters are iterators
  def hasNext = i < len
  def next = {
    val r = str.charAt(i)
    i += 1
    r
  }
  // splitters must be duplicated
  def dup = new ParStringSplitter(i, len)
  // splitters know how many elements remain
  def remaining = len - i
  // splitters can be split
  def psplit(sizes: Int*): Seq[ParStringSplitter] = {
    val splitted = new ArrayBuffer[ParStringSplitter]
    for (sz <- sizes) {
      val next = (i + sz) min len
      splitted += new ParStringSplitter(i, next)
      i = next
    }
    splitted
  }
}
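The splitter operations can be tried in isolation with a simplified stand-in. The class below is a sketch under stated assumptions: `StringSplitterSketch` and its `limit` parameter are hypothetical names, and it extends the standard `Iterator` rather than the library's `Splitter` trait, so it only illustrates `dup`, `remaining` and `psplit`.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for a splitter over the index range [i, limit) of str.
// It does not extend the parallel collections Splitter trait.
class StringSplitterSketch(str: String, private var i: Int, limit: Int)
    extends Iterator[Char] {
  def hasNext: Boolean = i < limit
  def next(): Char = { val r = str.charAt(i); i += 1; r }

  // An independent copy positioned at the same element.
  def dup: StringSplitterSketch = new StringSplitterSketch(str, i, limit)

  // Number of elements not yet traversed.
  def remaining: Int = limit - i

  // Split into consecutive splitters of the requested sizes;
  // this splitter is exhausted up to the last requested element.
  def psplit(sizes: Int*): Seq[StringSplitterSketch] = {
    val parts = new ArrayBuffer[StringSplitterSketch]
    for (sz <- sizes) {
      val end = (i + sz) min limit
      parts += new StringSplitterSketch(str, i, end)
      i = end
    }
    parts.toList
  }
}
```

Splitting only creates new index ranges over the same string, so no characters are copied, which is the point of wrapping the string instead of converting it to a `ParArray`.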
Word count: now with parallel strings
new ParString(txt).aggregate((0, 0, 0))({
  case ((ls, 0, _), ' ') => (ls + 1, 0, ls + 1)
  case ((ls, 0, _), c) => (ls, 1, 0)
  case ((ls, wc, rs), ' ') => (ls, wc, rs + 1)
  case ((ls, wc, 0), c) => (ls, wc, 0)
  case ((ls, wc, rs), c) => (ls, wc + 1, 0)
}, {
  case ((0, 0, 0), res) => res
  case (res, (0, 0, 0)) => res
  case ((lls, lwc, 0), (0, rwc, rrs)) => (lls, lwc + rwc - 1, rrs)
  case ((lls, lwc, _), (_, rwc, rrs)) => (lls, lwc + rwc, rrs)
})
Word count: performance
txt.foldLeft version: 100 ms
new ParString(txt).aggregate version:
cores:  1       2      4
time:   137 ms  70 ms  35 ms
Hierarchy
[Class hierarchy diagram: GenTraversable, GenIterable, GenSeq; Traversable, Iterable, Seq; ParIterable, ParSeq]
Hierarchy
def nonEmpty(sq: Seq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}
def nonEmpty(sq: ParSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res += s
  }
  res
}
side-effects! ArrayBuffer is not synchronized!
[Diagram: ParSeq, Seq]
def nonEmpty(sq: GenSeq[String]) = {
  val res = new mutable.ArrayBuffer[String]()
  for (s <- sq) {
    if (s.nonEmpty) res.synchronized { res += s }
  }
  res
}
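The effect of the `synchronized` guard can be reproduced with plain threads. This is only a sketch: two explicitly started threads stand in for a parallel `foreach`, and the object name and the two-way split are assumptions made for illustration.

```scala
import scala.collection.mutable

object SharedBufferSketch {
  // Hypothetical helper: two threads stand in for a parallel foreach
  // that appends to one shared, unsynchronized buffer.
  def nonEmpty(sq: Seq[String]): mutable.ArrayBuffer[String] = {
    val res = new mutable.ArrayBuffer[String]()
    val (left, right) = sq.splitAt(sq.length / 2)
    val workers = Seq(left, right).map { part =>
      new Thread(new Runnable {
        def run(): Unit =
          for (s <- part) {
            // Only one thread at a time may modify the buffer.
            if (s.nonEmpty) res.synchronized { res += s }
          }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    res
  }
}
```

The order of the elements in the result depends on thread scheduling, so callers should not rely on it. A side-effect-free alternative is `sq.filter(_.nonEmpty)`, which needs no locking.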
Thank you!
Examples at: git://github.com/axel22/sd.git
Accessors vs. transformers: some methods need more than just splitters
Accessors: foreach, reduce, find, sameElements, indexOf, corresponds, forall, exists, max, min, sum, count, …
Transformers: map, flatMap, filter, partition, ++, take, drop, span, zip, patch, padTo, …
These return collections!
Sequential collections: builders
Parallel collections: combiners
Builders: building a sequential collection
[Diagram: from the list 1 2 3 4 5 6 7 Nil, the elements 2, 4 and 6 are added to a ListBuilder with +=, and result yields the list 2 4 6 Nil]
Combiners: building parallel collections
trait Combiner[-Elem, +To] extends Builder[Elem, To] {
  def combine[N <: Elem, NewTo >: To]
    (other: Combiner[N, NewTo]): Combiner[N, NewTo]
}
[Diagram: two Combiners are combined into one Combiner]
either use an efficient merge operation or do lazy evaluation
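Parallel arrays
[Diagram: each part of the array is filtered into its own chunk (1, 2, 3, 4 gives 2, 4; 5, 6, 7, 8 gives 6, 8; 3, 1, 8, 0 gives 8, 0; 2, 2, 1, 9 gives 2, 2); the chunks are merged pairwise, then the result array is allocated and the chunks are copied into it: 2 4 6 8 8 0 2 2]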
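The merge, allocate and copy steps for parallel arrays can be sketched with a combiner that keeps its elements in chunks. `ChunkedIntCombiner` is a hypothetical, simplified class written for this illustration; it is not the library's `Combiner` and handles only `Int` elements.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified combiner for Int arrays.
class ChunkedIntCombiner {
  // Each chunk is filled by one worker; combine only links chunk lists.
  private var chunks = Vector(new ArrayBuffer[Int])

  def +=(x: Int): this.type = { chunks.last += x; this }

  // Lazy merge: concatenate the chunk lists, elements are not copied.
  // The two operands should not be used after being combined.
  def combine(that: ChunkedIntCombiner): ChunkedIntCombiner = {
    val merged = new ChunkedIntCombiner
    merged.chunks = this.chunks ++ that.chunks
    merged
  }

  // Allocate the final array once, then copy every chunk into place.
  def result(): Array[Int] = {
    val out = new Array[Int](chunks.map(_.length).sum)
    var pos = 0
    for (c <- chunks) { c.copyToArray(out, pos); pos += c.length }
    out
  }
}
```

In this sketch the copy in `result` runs sequentially. Since the total size and every chunk's offset are known once the chunks are merged, each chunk could be copied independently.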
Parallel hash tables
ParHashMap with elements 0 1 2 4 5 7 8 9, e.g. calling filter
Two ParHashCombiners collect the filtered elements: 0 1 4 and 5 7 9
How to merge?
buckets! Each ParHashCombiner groups its elements into buckets by hash code (0 = 0000₂, 1 = 0001₂, 4 = 0100₂)
combine: the buckets of the two ParHashCombiners are combined into one ParHashCombiner, no copying!
The resulting ParHashMap holds 9 7 5 0 1 4
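The bucket idea can be sketched as follows. This is an illustration under several assumptions: `BucketCombinerSketch` is a hypothetical class for `Int` sets, the element itself serves as its hash code, and four buckets are selected by the top two bits of a 4-bit code. The slides note that the real combine involves no copying; this sketch concatenates immutable lists for brevity.

```scala
// Hypothetical, simplified bucket combiner for Int sets.
class BucketCombinerSketch private (private val buckets: Vector[List[Int]]) {
  // Put x into the bucket selected by its hash code prefix.
  def add(x: Int): BucketCombinerSketch = {
    val b = BucketCombinerSketch.bucketOf(x)
    new BucketCombinerSketch(buckets.updated(b, x :: buckets(b)))
  }

  // Bucket-wise concatenation: elements in different buckets never interact,
  // so each bucket pair could be handled independently.
  def combine(that: BucketCombinerSketch): BucketCombinerSketch =
    new BucketCombinerSketch(
      buckets.zip(that.buckets).map { case (l, r) => l ::: r })

  // Build the final set from all buckets.
  def result: Set[Int] = buckets.flatten.toSet
}

object BucketCombinerSketch {
  val empty = new BucketCombinerSketch(Vector.fill(4)(List.empty[Int]))

  // Assumed bucket function: bits 2 and 3 of the value, e.g. 4 = 0100 -> 1.
  def bucketOf(x: Int): Int = (x >>> 2) & 3
}
```

With this bucket function, 0 and 1 land in bucket 0, the elements 4, 5 and 7 in bucket 1, and 9 in bucket 2.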