Upload
techtalks
View
385
Download
0
Embed Size (px)
Citation preview
Copyright 2015 Kirk Pepperdine. All rights reserved
Ripping Apart Java 8 Parallel Streams
Kirk Pepperdine
Copyright 2015 Kirk Pepperdine. All rights reserved
About MeSpecialize in performance tuning
speak frequently about performance
author of performance tuning workshop
Co-founder jClarity
performance diagnositic tooling
Java Champion (since 2006)
Copyright 2015 Kirk Pepperdine. All rights reserved
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressionsSyntaxtic sugar for a specific type of inner class
(parameters) -> body; () -> 42; (x,y) -> x * y;
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressions
Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = new Function<String,Matcher>() { @Override public Matcher apply(String logEntry) { return applicationTimePattern.matcher(logEntry); }}; String example = "2.869: Application time: 1.0001540 seconds"; Matcher matcher = match.apply(example);
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressions
Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); String example = "2.869: Application time: 1.0001540 seconds"; Matcher matcher = match.apply(example);
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressions
Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = new Predicate<Matcher>() { @Override public boolean test(Matcher matcher) { return matcher.find(); }};
String example = "2.869: Application time: 1.0001540 seconds";
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressions
Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();
String example = "2.869: Application time: 1.0001540 seconds”;
Boolean b = matches.test(match.apply(example));
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressionsPattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();
Function<Matcher,String> extract = matcher -> { if ( matches.test( matcher)) return matcher.group(2); else return ""; };
String example = "2.869: Application time: 1.0001540 seconds”;
String value = extract.apply(match.apply(example));
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressionsPattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();
Function<Matcher,String> extract = matcher -> { if ( matches.test( matcher)) return matcher.group(2); else return ""; };
Consumer<Matcher> toDouble = matcher -> { if ( matches.test(matcher)) pauseTimes.add(Double.parseDouble(extract.apply(matcher)));}; Matcher matcher = match.apply(example);toDouble.accept(matcher);
Copyright 2015 Kirk Pepperdine. All rights reserved
Lazy Execution
for ( int i = 0; i < 5000; i++) { LOGGER.fine (() -> "Trace value: " + getValue());}
for ( int i = 0; i < 5000; i++) { LOGGER.fine ("Trace value: " + getValue());} 3200ms
0ms
Copyright 2015 Kirk Pepperdine. All rights reserved
Java 8 Lambda expressions
public void processStrings( List<String> logEntries, Function<String,Matcher> mapper, Consumer collector) { logEntries.forEach( entry -> collector.accept( mapper.apply(entry)));
main.processStrings(logEntries, match, toDouble); main.processStrings(logEntries, anotherMatch, toKBytes);
Copyright 2015 Kirk Pepperdine. All rights reserved
Language changes in Java 8Method references
String::length aString::length ClassName::new
(aString) -> aString::length;
Copyright 2015 Kirk Pepperdine. All rights reserved
StreamAbstraction that allows you to apply a set of functions over the elements in a collection
functions are combined to form a stream pipeline
sequential or parallel
operations divided into intermediate and terminal operations
Copyright 2015 Kirk Pepperdine. All rights reserved
StreamDefined in interface Collection::stream()
many classes implement stream()
Arrays.stream(Object[])
Stream.of(Object[]), .iterate(Object,UnaryOperator)
File.lines(), BufferedReader.lines(), Random.ints(), JarFile.stream()
Copyright 2015 Kirk Pepperdine. All rights reserved
Old School Code
public void imperative() throws IOException { Population pop = new Population(); String logRecord; double value = 0.0d;
BufferedReader reader = new BufferedReader(new FileReader(gcLogFileName)); while ( ( logRecord = reader.readLine()) != null) { Matcher matcher = applicationStoppedTimePattern.matcher(logRecord); if ( matcher.find()) { pop.add(Double.parseDouble( matcher.group(2))); } } System.out.println(pop.toString()); }
Copyright 2015 Kirk Pepperdine. All rights reserved
gcLogEntries.stream() .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics();
data source start streaming
map to Matcher
filter out uninteresting bits
map to Double
extract group
aggregate results
Copyright 2015 Kirk Pepperdine. All rights reserved
Copyright 2015 Kirk Pepperdine. All rights reserved
Parallel StreamsIf collection supports Spliterator operations can be performed in parallel
Spliterator is like an internal iterator
split source into several spliterator
traverse source
Many of the same rules as Iterator
however may work (non-deterministically) with I/O
Copyright 2015 Kirk Pepperdine. All rights reserved
gcLogEntries.stream().parallel(). .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics();
parallelize stream
Copyright 2015 Kirk Pepperdine. All rights reserved
Copyright 2015 Kirk Pepperdine. All rights reserved
The first time we tried, the imperative single threaded code
Copyright 2015 Kirk Pepperdine. All rights reserved
ran faster (????)
Copyright 2015 Kirk Pepperdine. All rights reserved
Flat profile of 8.70 secs (638 total ticks): ForkJoinPool.commonPool-worker-3
Compiled + native Method 60.8% 118 + 0 java.util.concurrent.ForkJoinTask.doExec
Raw Profiler Output
Copyright 2015 Kirk Pepperdine. All rights reserved
Flat profile of 8.70 secs (638 total ticks): ForkJoinPool.commonPool-worker-3
Compiled + native Method 60.8% 118 + 0 java.util.concurrent.ForkJoinTask.doExec
Raw Profiler Output
Learned 2 things very quickly
Parallel streams use Fork-Join
Fork-Join comes with some significant over-head
Copyright 2015 Kirk Pepperdine. All rights reserved
Predicted that applcations using parallel streams would
For other reasons….
run very very slowly
Copyright 2015 Kirk Pepperdine. All rights reserved
Then, witnessing it!
Copyright 2015 Kirk Pepperdine. All rights reserved
When to use StreamsTragedy of the commons
Handoff costs
CNPQ performance model
Fork-Join
Copyright 2015 Kirk Pepperdine. All rights reserved
Tragidy of the Commons
Copyright 2015 Kirk Pepperdine. All rights reserved
You have a finite amount of hardware
it might be in your best interest to grab it all
if everyone behaves the same way….
Be a good neighbour
Copyright 2015 Kirk Pepperdine. All rights reserved
Handoff CostsIt costs about ~80,000 clocks to handoff data between threads
overhead = 80000*handoff rate/cycles
10%= 80000*handoff rate/2000000000~2500 handoffs/sec
Copyright 2015 Kirk Pepperdine. All rights reserved
CPNQ Performance ModelC - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
Copyright 2015 Kirk Pepperdine. All rights reserved
C/P/N/QNeed to amortize setup costs
NQ needs to be large
Q can often only be estimated
N often should be > 10,000 elements
P assumes CPU is the limiting resource
Copyright 2015 Kirk Pepperdine. All rights reserved
Fork-JoinSupport for Fork-Join added in Java 7
difficult coding idiom to master
Streams make Fork-Join more reachable
how fork-join works and performs is important to your latency picture
Copyright 2015 Kirk Pepperdine. All rights reserved
ImplementationUsed internally by a parallel stream
break the stream up into chunks and submit each chunk as a ForkJoinTask
apply filter().map().reduce() to each ForkJoinTask
Calls ForkJoinTask invoke() and join() to retrieve results
Copyright 2015 Kirk Pepperdine. All rights reserved
Common Thread PoolFork-Join by default uses a common thread pool
default number of worker threads == number of logical cores - 1
Always contains at least one thread
Copyright 2015 Kirk Pepperdine. All rights reserved
Little’s LawFork-Join uses a work queue
work queue behavior can be modeled using Little’s Law
Number of tasks in the system is the Arrival rate * average service time
Copyright 2015 Kirk Pepperdine. All rights reserved
Little’s Law ExampleSystem sees 400 TPS. It takes 100ms to process a request
Number of tasks in system = 0.100 * 400 = 40
On an 8 core machine
implies 39 Fork-Join tasks are sitting in queue accumulating dead time
40*100 = 4000 ms of which 3900 is dead time
97.5% of service time is in waiting
Copyright 2015 Kirk Pepperdine. All rights reserved
Normal Reaction
if there is sufficient capacity then make the pool bigger else add capacity or tune to reduce strength of the dependency
Copyright 2015 Kirk Pepperdine. All rights reserved
Configuring Common PoolSize of common ForkJoinPool is
Runtime.getRuntime().availableProcessors() - 1
Can configure things
-Djava.util.concurrent.ForkJoinPool.common.parallelism=N
-Djava.util.concurrent.ForkJoinPool.common.threadFactory
-Djava.util.concurrent.ForkJoinPool.common.exceptionHandler
Copyright 2015 Kirk Pepperdine. All rights reserved
All Hands on Deck!!!
Everyone is working together to get it done!
Copyright 2015 Kirk Pepperdine. All rights reserved
Resource DependencyIf task is CPU bound, you can’t increase the size of the thread pool
CPU may not be the limiting factor
throughput is limited by another dependent resource
I/O, shared variable (lock)
Adding threads == increased latency
predicted by Little’s Law
Copyright 2015 Kirk Pepperdine. All rights reserved
ForkJoinPool invokeForkJoinPool.invoke(ForkJoinTask) uses the submitting thread as a worker
If 100 threads all call invoke(), we would have 100+ForkJoinThreads exhausting the limiting resource, e.g. CPUs, IO, etc.
Copyright 2015 Kirk Pepperdine. All rights reserved
ForkJoinPool submit/getForkJoinPool.submit(Callable).get() suspends the submitting thread
If 100 jobs all call submit(), the work queue can become very long, thus adding latency
Copyright 2015 Kirk Pepperdine. All rights reserved
Our own ForkJoinPoolWhen used inside a ForkJoinPool, the ForkJoinTask.fork() method uses the current pool
ForkJoinPool ourOwnPool = new ForkJoinPool(10); ourOwnPool.invoke(() -> stream.parallel(). …
Copyright 2015 Kirk Pepperdine. All rights reserved
ForkJoinPool
public void parallel() throws IOException { ForkJoinPool forkJoinPool = new ForkJoinPool(10); Stream<String> stream = Files.lines( new File(gcLogFileName).toPath()); forkJoinPool.submit(() -> stream.parallel() .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics().toString()); }
Copyright 2015 Kirk Pepperdine. All rights reserved
ForkJoinPool ObservabilityForkJoinPool comes with no visibility
need to instrument ForkJoinTask.invoke()
gather needed from ForkJoinPool needed to feed into Little’s Law
Copyright 2015 Kirk Pepperdine. All rights reserved
Instrumenting ForkJoinPoolWe can get the statistics needed from ForkJoinPool needed for Little’s Law
need to instrument ForkJoinTask::invoke()
public final V invoke() { ForkJoinPool.common.getMonitor().submitTask(this); int s; if ((s = doInvoke() & DONE_MASK) != NORMAL) reportException(s); ForkJoinPool.common.getMonitor().retireTask(this); return getRawResult(); }
Copyright 2015 Kirk Pepperdine. All rights reserved
In an application where you have many parallel stream operations all running concurrently performance will be limited by the size of the common thread pool
Copyright 2015 Kirk Pepperdine. All rights reserved
Benchmark Alert!!!!!!
Copyright 2015 Kirk Pepperdine. All rights reserved
PerformanceSubmitt log parsing to our own ForkJoinPool
new ForkJoinPool(16).submit(() -> ……… ).get() new ForkJoinPool(8).submit(() -> ……… ).get() new ForkJoinPool(4).submit(() -> ……… ).get()
Copyright 2015 Kirk Pepperdine. All rights reserved
16 workerthreads
8 workerthreads
4 workerthreads
Copyright 2015 Kirk Pepperdine. All rights reserved
0
75000
150000
225000
300000
Stream Parallel Flood Stream Flood Parallel
Copyright 2015 Kirk Pepperdine. All rights reserved
Going parallel might not give you the expected gains
Copyright 2015 Kirk Pepperdine. All rights reserved
You may not know this until you hit production!
Copyright 2015 Kirk Pepperdine. All rights reserved
Monitoring internals of JDK is important to understanding where bottlenecks are
Copyright 2015 Kirk Pepperdine. All rights reserved
JDK is not all that well instrumented
Copyright 2015 Kirk Pepperdine. All rights reserved
APIs have changed so you need to re-read the javadocs
even for your old familiar classes
Copyright 2015 Kirk Pepperdine. All rights reserved
Performance Seminar
www.kodewerk.com
Java P
erform
ance
Tuning
,
May 26-
29, Chan
ia
Greece