Java 8 Parallel Stream Internals (Part 3)
Douglas C. Schmidt
www.dre.vanderbilt.edu/~schmidt
Professor of Computer Science
Institute for Software Integrated Systems
Vanderbilt University
Nashville, Tennessee, USA
Learning Objectives in this Part of the Lesson
• Understand parallel stream internals, e.g.
• Know what can change & what can’t
• Partition a data source into “chunks”
• Process chunks in parallel via the common fork-join pool
[Figure: trySplit() recursively partitions an InputString into InputString1 & InputString2, then into InputString1.1, 1.2, 2.1 & 2.2; each chunk is processed sequentially & the partial results are joined]
See www.ibm.com/developerworks/library/j-java-streams-3-brian-goetz
• Recognize how the common fork-join pool is implemented
See gee.cs.oswego.edu/dl/papers/fj.pdf
Processing Chunks in Parallel via the Common ForkJoinPool
• Chunks created by a spliterator are processed in the common fork-join pool
See gee.cs.oswego.edu/dl/papers/fj.pdf
• A fork-join pool provides a high-performance, fine-grained task execution framework for Java data parallelism
See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
• The fork-join pool also provides a parallel computing engine for many higher-level frameworks
See www.infoq.com/interviews/doug-lea-fork-join
[Figure: a parallel streams pipeline (filter(not(this::urlCached)), map(this::downloadImage), flatMap(this::applyFilters), collect(toList()), …) & a completable futures pipeline (filter(not(this::urlCached)), map(this::downloadImageAsync), flatMap(this::applyFiltersAsync), collect(toFuture()), …) both run atop the ForkJoinPool]
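As a minimal sketch of this layering, the class below runs the same function once via a parallel stream & once via CompletableFuture.supplyAsync(). The download() method is hypothetical: it merely upper-cases its argument & stands in for the slide's downloadImage(), so this is an illustration rather than the lesson's code. Per the CompletableFuture Javadoc, async methods without an explicit Executor use ForkJoinPool.commonPool() unless it cannot support a parallelism level of at least two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class TwoFrameworksOnePool {
    // Hypothetical stand-in for downloadImage(): it just upper-cases a "URL".
    static String download(String url) {
        return url.toUpperCase();
    }

    // Parallel streams: chunks of the list are processed in the common fork-join pool.
    static List<String> viaParallelStream(List<String> urls) {
        return urls.parallelStream()
                   .map(TwoFrameworksOnePool::download)
                   .collect(Collectors.toList());
    }

    // Completable futures: supplyAsync() without an Executor uses the default async pool,
    // which is normally ForkJoinPool.commonPool().
    static List<String> viaCompletableFutures(List<String> urls) {
        List<CompletableFuture<String>> futures = urls.stream()
            .map(url -> CompletableFuture.supplyAsync(() -> download(url)))
            .collect(Collectors.toList());
        return futures.stream()
                      .map(CompletableFuture::join) // wait for each result, in list order
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("a.png", "b.png", "c.png");
        System.out.println(viaParallelStream(urls));     // [A.PNG, B.PNG, C.PNG]
        System.out.println(viaCompletableFutures(urls)); // [A.PNG, B.PNG, C.PNG]
    }
}
```

Both helpers produce the same list; they differ only in which higher-level framework hands the work to the pool.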
• ForkJoinPool implements the ExecutorService interface
See docs.oracle.com/javase/tutorial/essential/concurrency/executors.html
• A ForkJoinPool executes ForkJoinTasks
See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html
• A ForkJoinTask associates a chunk of data with a computation on that data, enabling fine-grained parallelism
See www.dre.vanderbilt.edu/~schmidt/PDF/DataParallelismInJava.pdf
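A minimal sketch of these points, assuming a hypothetical SumTask & an arbitrary 1,000-element cutoff: SumTask is a RecursiveTask (a ForkJoinTask subclass) that pairs a slice of an array with the code that sums it. The common pool is used both through its own invoke() method & purely through the ExecutorService interface.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.RecursiveTask;

public class SumWithForkJoin {
    // A ForkJoinTask pairing a chunk of data (data[lo, hi)) with a computation on it.
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // hypothetical cutoff for sequential work
        private final long[] data;
        private final int lo, hi;

        SumTask(long[] data, int lo, int hi) {
            this.data = data;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= THRESHOLD) {              // small chunk: sum it sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++)
                    sum += data[i];
                return sum;
            }
            int mid = (lo + hi) >>> 1;
            SumTask left = new SumTask(data, lo, mid);
            left.fork();                             // make the left half available to other workers
            long right = new SumTask(data, mid, hi).compute(); // do the right half in this thread
            return right + left.join();              // wait for (or help run) the left half
        }
    }

    // Runs the task via the ForkJoinPool-specific invoke() method.
    static long sumViaInvoke(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    // Uses the common pool only through the ExecutorService interface.
    static long sumViaExecutorService(long[] data) {
        ExecutorService exec = ForkJoinPool.commonPool();
        Future<Long> future = exec.submit(() -> {
            long sum = 0;
            for (long x : data)
                sum += x;
            return sum;
        });
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++)
            data[i] = i + 1;                         // 1, 2, ..., 100000
        System.out.println(sumViaInvoke(data));          // 5000050000
        System.out.println(sumViaExecutorService(data)); // 5000050000
    }
}
```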
• A ForkJoinTask is lighter weight than a Java Thread, e.g., it omits its own run-time stack, registers, thread-local storage, etc.
• A large # of ForkJoinTasks can thus run in a small # of Java threads in a ForkJoinPool
See www.infoq.com/interviews/doug-lea-fork-join
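To illustrate, the sketch below (class & method names are hypothetical) creates about 20,000 RecursiveAction tasks, 10,000 of which are leaves, & records the names of the threads that run the leaves. The exact thread count depends on the machine & the common pool's parallelism, so no particular number is claimed here, though it is typically a handful rather than thousands.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

public class ManyTasksFewThreads {
    // Splits [lo, hi) until each leaf covers one index, so n indices => n leaf tasks.
    static final class LeafCounter extends RecursiveAction {
        private final int lo, hi;
        private final AtomicInteger leaves;
        private final Set<String> threads;

        LeafCounter(int lo, int hi, AtomicInteger leaves, Set<String> threads) {
            this.lo = lo;
            this.hi = hi;
            this.leaves = leaves;
            this.threads = threads;
        }

        @Override
        protected void compute() {
            if (hi - lo <= 0)
                return;                                       // empty range: nothing to do
            if (hi - lo == 1) {
                leaves.incrementAndGet();                     // one leaf task ran
                threads.add(Thread.currentThread().getName()); // record which thread ran it
                return;
            }
            int mid = (lo + hi) >>> 1;
            invokeAll(new LeafCounter(lo, mid, leaves, threads),   // fork/run both halves
                      new LeafCounter(mid, hi, leaves, threads));
        }
    }

    // Returns {number of leaf tasks run, number of distinct thread names that ran them}.
    static int[] run(int n) {
        AtomicInteger leaves = new AtomicInteger();
        Set<String> threads = ConcurrentHashMap.newKeySet();
        ForkJoinPool.commonPool().invoke(new LeafCounter(0, n, leaves, threads));
        return new int[] { leaves.get(), threads.size() };
    }

    public static void main(String[] args) {
        int[] r = run(10_000);
        System.out.println(r[0] + " leaf tasks ran on " + r[1] + " thread(s); common-pool parallelism = "
                           + ForkJoinPool.getCommonPoolParallelism());
    }
}
```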
• Parallel streams are a “user friendly” ForkJoinPool façade
See espressoprogrammer.com/fork-join-vs-parallel-stream-java-8
[Figure: a parallel stream applies map(phrase -> searchForPhrase(…)), filter(not(SearchResults::isEmpty)) & collect(toList()) to 45,000+ search phrases; its sub-tasks (e.g., Sub-Task1.1 through Sub-Task1.4) are queued on worker threads’ deques]
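A minimal sketch of the façade, assuming hypothetical SearchResults & searchForPhrase() stand-ins (the lesson's real classes are not shown here). The slide's not(...) is presumably a static helper, since Predicate.not() only appeared in Java 11, so this Java 8 sketch negates with a lambda instead. No ForkJoinTask code is written by hand: the stream framework does the splitting & forking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PhraseSearch {
    // Hypothetical stand-in for the lesson's SearchResults: a phrase & where it occurs.
    static final class SearchResults {
        final String phrase;
        final List<Integer> indices;

        SearchResults(String phrase, List<Integer> indices) {
            this.phrase = phrase;
            this.indices = indices;
        }

        boolean isEmpty() {
            return indices.isEmpty();
        }

        @Override
        public String toString() {
            return "\"" + phrase + "\" at " + indices;
        }
    }

    // Hypothetical searchForPhrase(): records every index at which phrase occurs in input.
    static SearchResults searchForPhrase(String phrase, String input) {
        List<Integer> hits = new ArrayList<>();
        if (!phrase.isEmpty())               // an empty phrase would match everywhere
            for (int i = input.indexOf(phrase); i >= 0; i = input.indexOf(phrase, i + 1))
                hits.add(i);
        return new SearchResults(phrase, hits);
    }

    // The stream framework splits the phrase list into chunks & runs them in the common pool.
    static List<SearchResults> search(String input, List<String> phrases) {
        return phrases.parallelStream()
                      .map(phrase -> searchForPhrase(phrase, input))
                      .filter(results -> !results.isEmpty()) // i.e., filter(not(SearchResults::isEmpty))
                      .collect(Collectors.toList());          // keeps the phrases' encounter order
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("be", "to", "xyz");
        System.out.println(search("to be or not to be", phrases));
        // prints ["be" at [3, 16], "to" at [0, 13]]
    }
}
```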
• You can program directly to the ForkJoinPool API, though it can be somewhat painful!
List<List<SearchResults>> listOfListOfSearchResults =
    ForkJoinPool.commonPool()
                .invoke(new SearchWithForkJoinTask(inputList,
                                                   mPhrasesToFind, ...));
[Cartoon caption: “I gave you the chance of programming Java 8 streams, but you have elected the way of pain!”]
See SearchStreamGang/src/main/java/livelessons/streamgangs/SearchWithForkJoin.java
Use the common fork-join pool to search input strings for phrases that match
[Figure: the input strings to search & 45,000+ search phrases are handed to SearchWithForkJoinTask via ForkJoinPool.commonPool().invoke()]
• Programming directly to the ForkJoinPool API is best suited to algorithms that don’t match Java 8’s parallel streams programming model
See www.oracle.com/technetwork/articles/java/fork-join-422606.html
@Override
protected Long compute() {
    long count = 0L;
    List<RecursiveTask<Long>> forks = new LinkedList<>();
    // Fork a sub-task for each sub-folder
    for (Folder sub : mFolder.getSubs()) {
        FolderSearchTask task = new FolderSearchTask(sub, mWord);
        forks.add(task);
        task.fork();
    }
    // Fork a sub-task for each document in this folder
    for (Doc doc : mFolder.getDocs()) {
        DocSearchTask task = new DocSearchTask(doc, mWord);
        forks.add(task);
        task.fork();
    }
    // Join all the forked sub-tasks & sum their counts
    for (RecursiveTask<Long> task : forks)
        count = count + task.join();
    return count;
}
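To make the compute() fragment executable, here is a self-contained sketch. The Folder & Doc classes, the whitespace-based word matching & the countOccurrences() helper are hypothetical simplifications (plain fields replace getSubs()/getDocs()); only the fork-all-then-join-all shape of compute() follows the article's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class FolderWordCount {
    // Hypothetical minimal document & folder types.
    static final class Doc {
        final List<String> lines;
        Doc(String... lines) { this.lines = Arrays.asList(lines); }
    }

    static final class Folder {
        final List<Folder> subs = new ArrayList<>();
        final List<Doc> docs = new ArrayList<>();
    }

    // Leaf task: counts whitespace-separated occurrences of a word in one document.
    static final class DocSearchTask extends RecursiveTask<Long> {
        private final Doc mDoc;
        private final String mWord;
        DocSearchTask(Doc doc, String word) { mDoc = doc; mWord = word; }

        @Override
        protected Long compute() {
            long count = 0;
            for (String line : mDoc.lines)
                for (String w : line.trim().split("\\s+"))
                    if (w.equals(mWord))
                        count++;
            return count;
        }
    }

    // Interior task: forks one sub-task per sub-folder & per document, then joins them all.
    static final class FolderSearchTask extends RecursiveTask<Long> {
        private final Folder mFolder;
        private final String mWord;
        FolderSearchTask(Folder folder, String word) { mFolder = folder; mWord = word; }

        @Override
        protected Long compute() {
            long count = 0L;
            List<RecursiveTask<Long>> forks = new LinkedList<>();
            for (Folder sub : mFolder.subs) {
                FolderSearchTask task = new FolderSearchTask(sub, mWord);
                forks.add(task);
                task.fork();
            }
            for (Doc doc : mFolder.docs) {
                DocSearchTask task = new DocSearchTask(doc, mWord);
                forks.add(task);
                task.fork();
            }
            for (RecursiveTask<Long> task : forks)
                count += task.join();
            return count;
        }
    }

    static long countOccurrences(Folder root, String word) {
        return ForkJoinPool.commonPool().invoke(new FolderSearchTask(root, word));
    }

    public static void main(String[] args) {
        Folder root = new Folder();
        root.docs.add(new Doc("a b a", "c a"));
        Folder sub = new Folder();
        sub.docs.add(new Doc("a"));
        root.subs.add(sub);
        System.out.println(countOccurrences(root, "a")); // 4
    }
}
```

The tree shape (folders of arbitrary fan-out) is what makes this a poor fit for a single stream pipeline & a natural fit for explicit fork()/join().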
• All parallel streams in a process share the common fork-join pool
See dzone.com/articles/common-fork-join-pool-and-streams
• Sharing one pool helps optimize resource utilization, since the pool knows which cores are being used globally within a process
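One way to observe this sharing, as a sketch: run two separate parallel streams & record the pool that each ForkJoinWorkerThread belongs to. The class name is hypothetical, and the calling thread (which is not a pool worker) may also process some chunks, so only worker threads are checked.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.IntStream;

public class SharedCommonPool {
    // Records the pool of every fork-join worker thread that runs part of a parallel stream.
    static Set<ForkJoinPool> poolsSeenBy(int n) {
        Set<ForkJoinPool> pools = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n).parallel().forEach(i -> {
            Thread t = Thread.currentThread();
            if (t instanceof ForkJoinWorkerThread)    // skip the (non-worker) calling thread
                pools.add(((ForkJoinWorkerThread) t).getPool());
        });
        return pools;
    }

    public static void main(String[] args) {
        Set<ForkJoinPool> first = poolsSeenBy(100_000);  // one parallel stream
        Set<ForkJoinPool> second = poolsSeenBy(100_000); // another, unrelated parallel stream
        // Every worker seen by either stream belongs to the single common pool.
        System.out.println(first.stream().allMatch(p -> p == ForkJoinPool.commonPool()));  // true
        System.out.println(second.stream().allMatch(p -> p == ForkJoinPool.commonPool())); // true
    }
}
```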
• There are (intentionally) few “knobs” to control this (or any) fork-join pool
See www.youtube.com/watch?v=sq0MX3fHkro
• Contrast this with the ThreadPoolExecutor framework, which exposes many knobs
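A minimal sketch of the contrast; the sizes & policies below are arbitrary illustration values, not recommendations, and the class name is hypothetical. A ThreadPoolExecutor makes the programmer pick core & maximum sizes, a keep-alive time, a work queue & a saturation policy, whereas a ForkJoinPool is mostly configured by its parallelism.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolKnobs {
    // ThreadPoolExecutor: many knobs, all chosen explicitly by the programmer.
    static ThreadPoolExecutor makeThreadPool() {
        return new ThreadPoolExecutor(
            2,                                          // core pool size
            4,                                          // maximum pool size
            30, TimeUnit.SECONDS,                       // keep-alive for idle non-core threads
            new ArrayBlockingQueue<Runnable>(100),      // bounded work queue
            new ThreadPoolExecutor.CallerRunsPolicy()); // what to do when saturated
    }

    // ForkJoinPool: the main knob is the target parallelism (its longer constructor also
    // accepts a thread factory, an uncaught-exception handler & an async-mode flag).
    static ForkJoinPool makeForkJoinPool() {
        return new ForkJoinPool(4);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor tpe = makeThreadPool();
        ForkJoinPool fjp = makeForkJoinPool();
        System.out.println("TPE core/max = " + tpe.getCorePoolSize() + "/" + tpe.getMaximumPoolSize()); // 2/4
        System.out.println("FJP parallelism = " + fjp.getParallelism());                                 // 4
        // The common pool is configured via system properties such as
        // java.util.concurrent.ForkJoinPool.common.parallelism.
        System.out.println("common pool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        tpe.shutdown();
        fjp.shutdown();
    }
}
```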
• You can configure the pool size
See Part 4 of this lesson for details
// Must run before the common pool is first used, e.g., at the start of main()
System.setProperty("java.util.concurrent"
                   + ".ForkJoinPool.common"
                   + ".parallelism",
                   "10"); // Desired # threads (the value must be a String)
Mapping Parallel Streams onto the Java Fork-Join Pool
• The Java 8 parallel streams framework automatically creates tasks that are run by worker threads in the common fork-join pool
abstract class AbstractTask ... { ...
  public void compute() {
    Spliterator<P_IN> rs = spliterator, ls;
    boolean forkRight = false; ...
    while (... (ls = rs.trySplit()) != null) {
      K taskToFork;
      if (forkRight)
        { forkRight = false; ... taskToFork = ...makeChild(rs); }
      else
        { forkRight = true; ... taskToFork = ...makeChild(ls); }
      taskToFork.fork();
    }
  } ...
See openjdk/8-b132/java/util/stream/AbstractTask.java
• AbstractTask is the abstract base class for most fork-join tasks used to implement stream operations
• compute() decides whether to split a task further or to compute it directly
See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#trySplit
• The while loop tries to partition the input source until trySplit() returns null (or the remaining chunk falls below a size threshold)
• The loop alternates which child is forked, to avoid problems with biased (unevenly splitting) spliterators
• Each iteration forks off a new sub-task & continues processing the other half in the loop
See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html#fork
abstract class AbstractTask ... { ...
  public void compute() {
    Spliterator<P_IN> rs = spliterator, ls;
    boolean forkRight = false; ...
    while (... (ls = rs.trySplit()) != null) {
      ...
    }
    task.setLocalResult(task.doLeaf());
  } ...
• doLeaf() typically calls forEachRemaining() to process the chunk’s remaining elements sequentially
See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#forEachRemaining
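The split-fork-leaf loop above can be imitated outside the JDK. The sketch below is a simplified analogue, not the JDK's AbstractTask: the hypothetical CountTask extends RecursiveTask & joins its children explicitly, whereas AbstractTask extends CountedCompleter & combines results via completion callbacks. The 1,000-element leaf threshold is also an assumption, since the JDK computes its own. It counts the elements of an int[] that satisfy a predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.IntPredicate;

public class SplitForkLoop {
    // Simplified analogue of AbstractTask.compute(): split, alternately fork, then process a leaf.
    static final class CountTask extends RecursiveTask<Long> {
        private final Spliterator.OfInt spliterator;
        private final IntPredicate predicate;
        private final long sizeThreshold;      // hypothetical leaf-size cutoff

        CountTask(Spliterator.OfInt s, IntPredicate p, long sizeThreshold) {
            this.spliterator = s;
            this.predicate = p;
            this.sizeThreshold = sizeThreshold;
        }

        @Override
        protected Long compute() {
            Spliterator.OfInt rs = spliterator, ls;
            boolean forkRight = false;
            List<CountTask> forked = new ArrayList<>();
            // Keep partitioning until the chunk is small or trySplit() returns null.
            while (rs.estimateSize() > sizeThreshold && (ls = rs.trySplit()) != null) {
                CountTask taskToFork;
                if (forkRight) {               // fork the right half, keep the left half
                    forkRight = false;
                    taskToFork = new CountTask(rs, predicate, sizeThreshold);
                    rs = ls;
                } else {                       // fork the left half, keep the right half
                    forkRight = true;
                    taskToFork = new CountTask(ls, predicate, sizeThreshold);
                }
                forked.add(taskToFork);
                taskToFork.fork();             // push onto this worker's deque
            }
            // Leaf: process the remaining chunk sequentially.
            long[] count = {0};
            rs.forEachRemaining((int x) -> { if (predicate.test(x)) count[0]++; });
            long total = count[0];
            // Join children, most recently forked first.
            for (int i = forked.size() - 1; i >= 0; i--)
                total += forked.get(i).join();
            return total;
        }
    }

    static long countMatches(int[] data, IntPredicate p) {
        return ForkJoinPool.commonPool().invoke(new CountTask(Arrays.spliterator(data), p, 1_000));
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        for (int i = 0; i < data.length; i++)
            data[i] = i;
        System.out.println(countMatches(data, x -> x % 2 == 0)); // 50000
    }
}
```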
• Each worker thread in the common fork-join pool runs a loop scanning for parallel stream tasks to run
• Goal is to keep worker threads & cores as busy as possible!
• Each worker thread has a “double-ended queue” (aka “deque”) that serves as its main source of tasks
[Figure: three worker threads, each with its own WorkQueue deque of sub-tasks, e.g., Sub-Task1.1 through Sub-Task1.4 & Sub-Task3.3 through Sub-Task3.4]
See en.wikipedia.org/wiki/Double-ended_queue
• The deque is implemented by the WorkQueue class nested in ForkJoinPool
See java8/util/concurrent/ForkJoinPool.java
• When the AbstractTask.compute() method calls fork() on a task, this task is pushed onto the head of its worker thread’s deque
[Figure: step 1, fork(); step 2, push() of the new Sub-Task2.4 onto the head of its worker’s WorkQueue]
See gee.cs.oswego.edu/dl/papers/fj.pdf
• Each worker thread processes its deque in LIFO order
[Figure: step 1, join(); step 2, pop() of the newest sub-task from the head of the worker’s WorkQueue]
See en.wikipedia.org/wiki/Stack_(abstract_data_type)
• A task popped from the head of a deque is run to completion
See en.wikipedia.org/wiki/Run_to_completion_scheduling
• join() “pitches in” to pop & execute (sub-)tasks
“Collaborative Jiffy Lube” model of processing!
• LIFO order improves locality of reference & cache performance
See en.wikipedia.org/wiki/Locality_of_reference
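A toy illustration of the owner's LIFO discipline, using java.util.concurrent.ConcurrentLinkedDeque as a stand-in for WorkQueue (which is a specialized, array-based deque inside ForkJoinPool). The class & method names are hypothetical: fork() is modeled by push() at the head, & the owner takes tasks back from that same end, so the most recently forked (& most likely still cache-hot) task runs first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

public class OwnerLifo {
    // Toy worker deque: forked tasks are pushed at the head & the owner takes them from the head.
    static List<String> ownerOrder(List<String> forked) {
        ConcurrentLinkedDeque<String> deque = new ConcurrentLinkedDeque<>();
        for (String task : forked)
            deque.push(task);                  // push() == addFirst(): newest task at the head
        List<String> runOrder = new ArrayList<>();
        String task;
        // pollFirst() takes from the head like pop(), but returns null when empty.
        while ((task = deque.pollFirst()) != null)
            runOrder.add(task);                // newest first: LIFO
        return runOrder;
    }

    public static void main(String[] args) {
        List<String> forked = Arrays.asList("Sub-Task1.1", "Sub-Task1.2", "Sub-Task1.3", "Sub-Task1.4");
        System.out.println(ownerOrder(forked));
        // prints [Sub-Task1.4, Sub-Task1.3, Sub-Task1.2, Sub-Task1.1]
    }
}
```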
• Worker threads only block if no parallel stream tasks are available for them to run
See Doug Lea’s talk at www.youtube.com/watch?v=sq0MX3fHkro
• Blocking worker threads (& thus idling cores) is costly on modern processors
• Each worker thread therefore checks other deques in the pool to find other tasks to run
• To maximize core utilization, idle worker threads “steal” work from the tail of busy threads’ deques
See docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
[Figure: an idle worker poll()s Sub-Task1.1 from the tail of a busy worker’s WorkQueue, which still holds Sub-Task1.2 through Sub-Task1.4]
• The worker thread deque to steal from is selected randomly to lower contention
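A single-threaded toy model of stealing, again with ConcurrentLinkedDeque standing in for WorkQueue. The steal() helper is hypothetical & far simpler than ForkJoinPool's real scanning logic: the thief starts its scan at a randomly chosen deque & takes the oldest task from the tail, leaving the head to the owner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

public class StealSketch {
    // Toy thief: scan the other workers' deques starting at a random index &
    // poll the tail (oldest task) of the first non-empty one.
    static String steal(List<ConcurrentLinkedDeque<String>> deques, int thiefIndex) {
        int n = deques.size();
        int start = ThreadLocalRandom.current().nextInt(n); // random victim lowers contention
        for (int k = 0; k < n; k++) {
            int victim = (start + k) % n;
            if (victim == thiefIndex)
                continue;                                   // a thief doesn't steal from itself
            String task = deques.get(victim).pollLast();    // FIFO end: the oldest task
            if (task != null)
                return task;
        }
        return null;                                        // nothing to steal anywhere
    }

    public static void main(String[] args) {
        List<ConcurrentLinkedDeque<String>> deques = new ArrayList<>();
        for (int i = 0; i < 3; i++)
            deques.add(new ConcurrentLinkedDeque<>());
        for (String t : new String[] {"Sub-Task1.1", "Sub-Task1.2", "Sub-Task1.3", "Sub-Task1.4"})
            deques.get(0).push(t);                          // busy worker 0 forked four sub-tasks
        System.out.println(steal(deques, 1));               // prints Sub-Task1.1
        System.out.println(deques.get(0).pollFirst());      // owner still gets Sub-Task1.4
    }
}
```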
• Tasks are stolen in FIFO order
See en.wikipedia.org/wiki/FIFO_(computing_and_electronics)
• Stealing from the tail minimizes contention with the thread owning the deque, which pushes & pops at the head
See www.ibm.com/support/knowledgecenter/en/SS3KLZ/com.ibm.java.diagnostics.healthcenter.doc/topics/resolving.html
• An older stolen task may yield a larger unit of work, due to the way in which spliterators work
[Figure: trySplit() recursively divides a List<String> into List<String>1 & List<String>2, then into List<String>1.1, 1.2, 2.1 & 2.2]
• Stealing a larger, older task enables further recursive decompositions by the stealing thread
[Figure: a thief that stole Sub-Task1.1 splits it further into Sub-Task1.1.1 through Sub-Task1.1.4 on its own WorkQueue]
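The sketch below (hypothetical names) shows why older tasks tend to be larger: it repeatedly calls trySplit() on an array spliterator & pushes each split-off chunk's size onto the head of a deque, much as a worker pushes forked sub-tasks. The expected values assume OpenJDK's array spliterator, which splits at the midpoint.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Spliterator;

public class SplitSizes {
    // Splits a spliterator over n elements repeatedly, "forking" each split-off chunk by
    // pushing its size onto the head of a deque.
    static Deque<Long> forkedChunkSizes(int n) {
        Deque<Long> deque = new ArrayDeque<>();
        Spliterator.OfInt rs = Arrays.spliterator(new int[n]);
        Spliterator.OfInt ls;
        while ((ls = rs.trySplit()) != null)
            deque.push(ls.estimateSize()); // later splits are smaller & sit nearer the head
        return deque;
    }

    public static void main(String[] args) {
        Deque<Long> sizes = forkedChunkSizes(16);
        System.out.println(sizes);            // head to tail: [1, 2, 4, 8]
        System.out.println(sizes.peekLast()); // 8: a thief polling the tail gets the biggest chunk
    }
}
```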
• The WorkQueue deque that implements work-stealing minimizes locking contention
See www.dre.vanderbilt.edu/~schmidt/PDF/work-stealing-deque.pdf
• push() & pop() are only called by the owning worker thread
• These methods use wait-free “compare-and-swap” (CAS) operations
See en.wikipedia.org/wiki/Compare-and-swap
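A toy sketch of why CAS suits this setting: whoever atomically swaps a slot from task to null owns that task, so an owner & a thief can never both run it. This is not ForkJoinPool's actual WorkQueue code (which manages top & base indices over an array of tasks); CasClaim & its methods are hypothetical, built on AtomicReferenceArray.compareAndSet().

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class CasClaim {
    // Toy task slots: whoever CASes a slot from task -> null first owns that task.
    private final AtomicReferenceArray<Runnable> slots;

    CasClaim(int capacity) {
        slots = new AtomicReferenceArray<>(capacity);
    }

    void put(int i, Runnable task) {
        slots.set(i, task);
    }

    // Returns the task if this caller won the race for slot i, else null.
    Runnable tryClaim(int i) {
        Runnable task = slots.get(i);
        return (task != null && slots.compareAndSet(i, task, null)) ? task : null;
    }

    public static void main(String[] args) throws InterruptedException {
        CasClaim queue = new CasClaim(1);
        queue.put(0, () -> System.out.println("task ran"));
        AtomicInteger winners = new AtomicInteger();
        Runnable contender = () -> {          // "owner" & "thief" race for slot 0
            Runnable task = queue.tryClaim(0);
            if (task != null) {
                winners.incrementAndGet();
                task.run();
            }
        };
        Thread owner = new Thread(contender), thief = new Thread(contender);
        owner.start();
        thief.start();
        owner.join();
        thief.join();
        System.out.println("winners = " + winners.get()); // always 1: only one CAS can succeed
    }
}
```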
• poll() may be called from another worker thread to “steal” a task
• poll() may not always be wait-free
See gee.cs.oswego.edu/dl/papers/fj.pdf
• See the “Implementation Overview” comments in the ForkJoinPool source code for details
See java8/util/concurrent/ForkJoinPool.java
End of Java 8 Parallel Stream Internals (Part 3)