59
Copyright 2015 Kirk Pepperdine. All rights reserved Ripping Apart Java 8 Parallel Streams Kirk Pepperdine

Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Embed Size (px)

Citation preview

Page 1: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Ripping Apart Java 8 Parallel Streams

Kirk Pepperdine

Page 2: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

About MeSpecialize in performance tuning

speak frequently about performance

author of performance tuning workshop

Co-founder jClarity

performance diagnositic tooling

Java Champion (since 2006)

Page 3: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Page 4: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressionsSyntaxtic sugar for a specific type of inner class

(parameters) -> body; () -> 42; (x,y) -> x * y;

Page 5: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressions

Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = new Function<String,Matcher>() { @Override public Matcher apply(String logEntry) { return applicationTimePattern.matcher(logEntry); }}; String example = "2.869: Application time: 1.0001540 seconds"; Matcher matcher = match.apply(example);

Page 6: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressions

Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); String example = "2.869: Application time: 1.0001540 seconds"; Matcher matcher = match.apply(example);

Page 7: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressions

Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = new Predicate<Matcher>() { @Override public boolean test(Matcher matcher) { return matcher.find(); }};

String example = "2.869: Application time: 1.0001540 seconds";

Page 8: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressions

Pattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();

String example = "2.869: Application time: 1.0001540 seconds”;

Boolean b = matches.test(match.apply(example));

Page 9: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressionsPattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();

Function<Matcher,String> extract = matcher -> { if ( matches.test( matcher)) return matcher.group(2); else return ""; };

String example = "2.869: Application time: 1.0001540 seconds”;

String value = extract.apply(match.apply(example));

Page 10: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressionsPattern applicationTimePattern = Pattern. compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

Function<String,Matcher> match = logEntry -> applicationTimePattern.matcher(logEntry); Predicate<Matcher> matches = matcher -> matcher.find();

Function<Matcher,String> extract = matcher -> { if ( matches.test( matcher)) return matcher.group(2); else return ""; };

Consumer<Matcher> toDouble = matcher -> { if ( matches.test(matcher)) pauseTimes.add(Double.parseDouble(extract.apply(matcher)));}; Matcher matcher = match.apply(example);toDouble.accept(matcher);

Page 11: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Lazy Execution

for ( int i = 0; i < 5000; i++) { LOGGER.fine (() -> "Trace value: " + getValue());}

for ( int i = 0; i < 5000; i++) { LOGGER.fine ("Trace value: " + getValue());} 3200ms

0ms

Page 12: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Java 8 Lambda expressions

public void processStrings( List<String> logEntries, Function<String,Matcher> mapper, Consumer collector) { logEntries.forEach( entry -> collector.accept( mapper.apply(entry)));

main.processStrings(logEntries, match, toDouble); main.processStrings(logEntries, anotherMatch, toKBytes);

Page 13: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Language changes in Java 8Method references

String::length aString::length ClassName::new

(aString) -> aString::length;

Page 14: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

StreamAbstraction that allows you to apply a set of functions over the elements in a collection

functions are combined to form a stream pipeline

sequential or parallel

operations divided into intermediate and terminal operations

Page 15: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

StreamDefined in interface Collection::stream()

many classes implement stream()

Arrays.stream(Object[])

Stream.of(Object[]), .iterate(Object,UnaryOperator)

File.lines(), BufferedReader.lines(), Random.ints(), JarFile.stream()

Page 16: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Old School Code

public void imperative() throws IOException { Population pop = new Population(); String logRecord; double value = 0.0d;

BufferedReader reader = new BufferedReader(new FileReader(gcLogFileName)); while ( ( logRecord = reader.readLine()) != null) { Matcher matcher = applicationStoppedTimePattern.matcher(logRecord); if ( matcher.find()) { pop.add(Double.parseDouble( matcher.group(2))); } } System.out.println(pop.toString()); }

Page 17: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

gcLogEntries.stream() .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics();

data source start streaming

map to Matcher

filter out uninteresting bits

map to Double

extract group

aggregate results

Page 18: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Page 19: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Parallel StreamsIf collection supports Spliterator operations can be performed in parallel

Spliterator is like an internal iterator

split source into several spliterator

traverse source

Many of the same rules as Iterator

however may work (non-deterministically) with I/O

Page 20: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

gcLogEntries.stream().parallel(). .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics();

parallelize stream

Page 21: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Page 22: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

The first time we tried, the imperative single threaded code

Page 23: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ran faster (????)

Page 24: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Flat profile of 8.70 secs (638 total ticks): ForkJoinPool.commonPool-worker-3

Compiled + native Method 60.8% 118 + 0 java.util.concurrent.ForkJoinTask.doExec

Raw Profiler Output

Page 25: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Flat profile of 8.70 secs (638 total ticks): ForkJoinPool.commonPool-worker-3

Compiled + native Method 60.8% 118 + 0 java.util.concurrent.ForkJoinTask.doExec

Raw Profiler Output

Learned 2 things very quickly

Parallel streams use Fork-Join

Fork-Join comes with some significant over-head

Page 26: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Predicted that applcations using parallel streams would

For other reasons….

run very very slowly

Page 27: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Then, witnessing it!

Page 28: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

When to use StreamsTragedy of the commons

Handoff costs

CNPQ performance model

Fork-Join

Page 29: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Tragidy of the Commons

Page 30: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

You have a finite amount of hardware

it might be in your best interest to grab it all

if everyone behaves the same way….

Be a good neighbour

Page 31: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Handoff CostsIt costs about ~80,000 clocks to handoff data between threads

overhead = 80000*handoff rate/cycles

10%= 80000*handoff rate/2000000000~2500 handoffs/sec

Page 32: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

CPNQ Performance ModelC - number of submitters

P - number of CPUs

N - number of elements

Q - cost of the operation

Page 33: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

C/P/N/QNeed to amortize setup costs

NQ needs to be large

Q can often only be estimated

N often should be > 10,000 elements

P assumes CPU is the limiting resource

Page 34: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Fork-JoinSupport for Fork-Join added in Java 7

difficult coding idiom to master

Streams make Fork-Join more reachable

how fork-join works and performs is important to your latency picture

Page 35: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ImplementationUsed internally by a parallel stream

break the stream up into chunks and submit each chunk as a ForkJoinTask

apply filter().map().reduce() to each ForkJoinTask

Calls ForkJoinTask invoke() and join() to retrieve results

Page 36: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Common Thread PoolFork-Join by default uses a common thread pool

default number of worker threads == number of logical cores - 1

Always contains at least one thread

Page 37: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Little’s LawFork-Join uses a work queue

work queue behavior can be modeled using Little’s Law

Number of tasks in the system is the Arrival rate * average service time

Page 38: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Little’s Law ExampleSystem sees 400 TPS. It takes 100ms to process a request

Number of tasks in system = 0.100 * 400 = 40

On an 8 core machine

implies 39 Fork-Join tasks are sitting in queue accumulating dead time

40*100 = 4000 ms of which 3900 is dead time

97.5% of service time is in waiting

Page 39: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Normal Reaction

if there is sufficient capacity then make the pool bigger else add capacity or tune to reduce strength of the dependency

Page 40: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Configuring Common PoolSize of common ForkJoinPool is

Runtime.getRuntime().availableProcessors() - 1

Can configure things

-Djava.util.concurrent.ForkJoinPool.common.parallelism=N

-Djava.util.concurrent.ForkJoinPool.common.threadFactory

-Djava.util.concurrent.ForkJoinPool.common.exceptionHandler

Page 41: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

All Hands on Deck!!!

Everyone is working together to get it done!

Page 42: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Resource DependencyIf task is CPU bound, you can’t increase the size of the thread pool

CPU may not be the limiting factor

throughput is limited by another dependent resource

I/O, shared variable (lock)

Adding threads == increased latency

predicted by Little’s Law

Page 43: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ForkJoinPool invokeForkJoinPool.invoke(ForkJoinTask) uses the submitting thread as a worker

If 100 threads all call invoke(), we would have 100+ForkJoinThreads exhausting the limiting resource, e.g. CPUs, IO, etc.

Page 44: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ForkJoinPool submit/getForkJoinPool.submit(Callable).get() suspends the submitting thread

If 100 jobs all call submit(), the work queue can become very long, thus adding latency

Page 45: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Our own ForkJoinPoolWhen used inside a ForkJoinPool, the ForkJoinTask.fork() method uses the current pool

ForkJoinPool ourOwnPool = new ForkJoinPool(10); ourOwnPool.invoke(() -> stream.parallel(). …

Page 46: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ForkJoinPool

public void parallel() throws IOException { ForkJoinPool forkJoinPool = new ForkJoinPool(10); Stream<String> stream = Files.lines( new File(gcLogFileName).toPath()); forkJoinPool.submit(() -> stream.parallel() .map(applicationStoppedTimePattern::matcher) .filter(Matcher::find) .map(matcher -> matcher.group(2)) .mapToDouble(Double::parseDouble) .summaryStatistics().toString()); }

Page 47: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

ForkJoinPool ObservabilityForkJoinPool comes with no visibility

need to instrument ForkJoinTask.invoke()

gather needed from ForkJoinPool needed to feed into Little’s Law

Page 48: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Instrumenting ForkJoinPoolWe can get the statistics needed from ForkJoinPool needed for Little’s Law

need to instrument ForkJoinTask::invoke()

public final V invoke() { ForkJoinPool.common.getMonitor().submitTask(this); int s; if ((s = doInvoke() & DONE_MASK) != NORMAL) reportException(s); ForkJoinPool.common.getMonitor().retireTask(this); return getRawResult(); }

Page 49: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

In an application where you have many parallel stream operations all running concurrently performance will be limited by the size of the common thread pool

Page 50: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Benchmark Alert!!!!!!

Page 51: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

PerformanceSubmitt log parsing to our own ForkJoinPool

new ForkJoinPool(16).submit(() -> ……… ).get() new ForkJoinPool(8).submit(() -> ……… ).get() new ForkJoinPool(4).submit(() -> ……… ).get()

Page 52: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

16 workerthreads

8 workerthreads

4 workerthreads

Page 53: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

0

75000

150000

225000

300000

Stream Parallel Flood Stream Flood Parallel

Page 54: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Going parallel might not give you the expected gains

Page 55: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

You may not know this until you hit production!

Page 56: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Monitoring internals of JDK is important to understanding where bottlenecks are

Page 57: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

JDK is not all that well instrumented

Page 58: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

APIs have changed so you need to re-read the javadocs

even for your old familiar classes

Page 59: Tech talks annual 2015 kirk pepperdine_ripping apart java 8 streams

Copyright 2015 Kirk Pepperdine. All rights reserved

Performance Seminar

www.kodewerk.com

Java P

erform

ance

Tuning

,

May 26-

29, Chan

ia

Greece