73
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | The Java HotSpot VM Under the Hood Tobias Hartmann Compiler Group Java HotSpot Virtual Machine Oracle Corporation May 2017

The Java HotSpot VM - ETH Zpeople.inf.ethz.ch/.../slides/w15_01-hotspot-jvm-jit-compilers.pdf · •Software engineer in the HotSpot JVM Compiler Team at Oracle –Based in Baden,

Embed Size (px)

Citation preview

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

The Java HotSpot VMUnder the Hood

Tobias Hartmann

Compiler Group – Java HotSpot Virtual MachineOracle Corporation

May 2017

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

About me

• Software engineer in the HotSpot JVM Compiler Team at Oracle

– Based in Baden, Switzerland

• Master’s degree in Computer Science from ETH Zurich

• Worked on various compiler-related projects

– Currently working on future Value Type support for Java

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

3

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-time Compilation

4

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

A typical computing platform

5

Hardware

Operating system

Java Virtual Machine

User Applications

Java EEJava SE

Application Software

System Software

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

A typical computing platform

6

Hardware

Operating system

Java Virtual Machine

User Applications

Java EEJava SE

Application Software

System Software

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

A typical computing platform

7

Hardware

Operating system

Java Virtual Machine

User Applications

Java EEJava SE

Application Software

System Software

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Programming language implementation

C

Windows

Intel x86

8

Operatingsystem

Languageimplementation

Hardware

Programminglanguage

CompilerStandard librariesDebuggerMemory management

Linux

Intel x86

CompilerStandard librariesDebuggerMemory management

Linux

ARM

CompilerStandard librariesDebuggerMemory management

Solaris

SPARC

CompilerStandard librariesDebuggerMemory management

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 9

(Language) virtual machine

Java

Windows

Intel x86

Operatingsystem

Virtual machine

Hardware

Programminglanguage

HotSpot VM

PPC ARM SPARC

Mac OS X SolarisLinux

JavaScript Scala Python

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-time Compilation

10

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

The JVM: An application developer’s view

Java source code

int i = 0;do {

i++;} while (i < f());

Bytecodes

0: iconst_01: istore_12: iinc5: iload_16: invokestatic f9: if_icmplt 212: return

compileHotSpotJava VMexecute

• Ahead-of-time• Using javac

• Instructions for an abstract machine• Stack-based machine (no registers)

11

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

The JVM: A VM engineer’s view

12

Bytecodes

0: iconst_01: istore_12: iinc5: iload_16: invokestatic f9: if_icmplt 212: return

HotSpot Java VM

Garbage collector

manage

Interpreter

executeHeap

Stack

access

access

Compilation system

compile

C1

C2

Compiled methodproduce

Machine code

Debug info

Object maps

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-time Compilation

13

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Interpretation vs. compilation in HotSpot• Template-based interpreter

– Generated at VM startup (before program execution)

– Maps a well-defined machine code sequence to every bytecode instruction

• Compilation system

– Speedup relative to interpretation: ~100X

– Two just-in-time compilers (C1, C2)

– Aggressive optimistic optimizations

14

Bytecodes0: iconst_01: istore_12: iinc5: iload_16: invokestatic f9: if_icmplt 212: return

Machine codemov -0x8(%r14), %eaxmovzbl 0x1(%r13), %ebxinc %r13mov $0xff40,%r10jpmq *(%r10, %rbx, 8)

Load local variable 1

Dispatch next instruction

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Ahead-of-time vs. just-in-time compilation

• AOT: Before program execution

• JIT: During program execution

– Tradeoff: Resource usage vs. performance of generated code

15

Performance

Amount of compilationInterpretation Compile everything

Bad performancedue to interpretation

Bad performancedue to compilation overhead

Good performancedue to good selection of compiled methods and applied optimizations

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

JIT compilation in HotSpot

• Resource usage vs. performance

– Getting to the “sweet spot”

1. Selecting methods to compile

2. Selecting compiler optimizations

16

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

1. Selecting methods to compile

• Hot methods (frequently executed methods)

• Profile method execution

– # of method invocations, # of backedges

• A method’s lifetime in the VM

17

Interpreter Compiler (C1 or C2) Code cache

Gather profiling information Compile bytecode to native code Store machine code

# method invocations > THRESHOLD1

# of backedges > THRESHOLD2

Deoptimization

Compiler’s optimistic assumptionsproven wrong

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

False

Control flow graph Generated code

Example optimization:

S1;S2;S3;if (x > 3)

S4; S5;S6;S7;

S8;S9;

10’000 0

guard(x > 3)S1;S2;S3;S4;S8;S9;

Deoptimize

18

True

Hot path compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Example optimization:

19

class A {void bar() {

S1;}

}

class B extends A {void bar() {

S2;}

}

void foo() {A a = create(); // return A or Ba.bar();

}

Class hierarchy Method to be compiled

loaded

not loaded

Compiler:Inline call?Yes.

Virtual call inlining

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Example optimization: Virtual call inlining

• Benefits of inlining

– Virtual call avoided

– Code locality

• Optimistic assumption: only A is loaded

– Note dependence on class hierarchy

– Deoptimize if hierarchy changes

20

class A {void bar() {

S1;}

}

class B extends A {void bar() {

S2;}

}

void foo() {A a = create(); // return A or BS1;

}

Class hierarchy Method to be compiled

loaded

not loaded

Compiler:Inline call?Yes.

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Example optimization: Virtual call inlining

21

class A {void bar() {

S1;}

}

class B extends A {void bar() {

S2;}

}

void foo() {A a = create(); // return A or Ba.bar();

}

Class hierarchy Method to be compiled

loaded

loaded

Compiler:Inline call?No.

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Deoptimization

• Compiler’s optimistic assumption proven wrong

– Assumptions about class hierarchy

– Profile information does not match method behavior

• Switch execution from compiled code to interpretation

– Reconstruct state of the interpreter at runtime

– Complex implementation

• Compiled code

– Possibly thrown away

– Possibly reprofiled and recompiled

22

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Performance effect of deoptimization

• Follow the variation of a method’s performance

23

Performance

Time

Interpreted Compiled Interpreted Compiled

Compilation DeoptimizationVM Startup Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

JIT compilation in HotSpot

• Resource usage vs. performance

– Getting to the “sweet spot”

1. Selecting methods to compile

2. Selecting compiler optimizations

24

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

2. Selecting compiler optimizations

• C1 compiler

– Limited set of optimizations

– Fast compilation

– Small footprint

• C2 compiler

– Aggressive optimistic optimizations

– High resource demands

– High-performance code

• Graal

– Experimental compiler

– Will be part of HotSpot for AOT in JDK 9

25

Client VM

Server VM

Tiered Compilation(enabled since JDK 8)

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code

– Compact Strings

– Ahead-of-Time Compilation

• Conclusion

26

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Tiered Compilation

• Introduced in JDK 7, enabled by default in JDK 8

• Combines the benefits of

– Interpreter: Fast startup

– C1: Fast compilation

– C2: High peak performance

• Within the sweet spot

– Faster startup

– More profile information

27

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Benefits of Tiered Compilation

28

Performance

Time

VM Startup

Interpreted C1-compiled

warm-up time

Client VM (C1 only)

Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Benefits of Tiered Compilation

29

PerformanceInterpreted C2-compiled

warm-up time

Server VM (C2 only)

Time

VM Startup Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Benefits of Tiered Compilation

30

PerformanceInterpreted C1-compiled

warm-up time

Tiered compilation

C2-compiled

Time

VM Startup Compilation Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Additional benefit: More accurate profiling

time

Interpreter C1 (profiled) C2 (non-profiled)

Interpreter

Profiling without tiered compilation

Profiling with tiered compilation

C2 (non-profiled)

300 samples

100 samples 1000 samples

100 samples 200 samples

w/o tiered compilation: 300 samples gatheredw/ tiered compilation: 1’100 samples gathered

31

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Tiered Compilation

• Combined benefits of interpreter, C1, and C2

• Additional benefits

– More accurate profiling information

• Drawbacks

– Complex implementation

– Careful tuning of compilation thresholds needed

– More pressure on code cache

32

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

A method’s lifetime (Tiered Compilation)

Interpreter C1 C2

Code cache

Collect profiling information Generate code quicklyContinue collectingprofiling information

Generate high-quality codeUse profiling information

Deoptimization

33

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Performance of a method (Tiered Compilation)

34

Performance

Time

VM Startup

Interpreted C1 compiled C2 compiled

Compilation Compilation

Interpreted C2 compiled

Deoptimization Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Compilation levels (detailed view)

Interpreter

C1: no profiling

C1: limited profiling

C1: full profiling

C2

0

1

2

3

4

Co

mp

ilati

on

leve

lTypical compilation sequence

Associated thresholds:Tier3InvokeNotifyFreqLogTier3BackedgeNotifyFreqLogTier3InvocationThresholdTier3MinInvocationThresholdTier3BackEdgeThresholdTier3CompileThreshold

Associated thresholds:Tier4InvocationThresholdTier4MinInvocationThresholdTier4CompileThresholdTier4BackEdgeThreshold

35

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-time Compilation

36

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

What is the code cache?

• Stores code generated by JIT compilers

• Continuous chunk of memory

– Managed (similar to the Java heap)

– Fixed size

• Essential for performance

37

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Code cache usage: JDK 6 and 7

free space

VM internals

compiled code

38

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Code cache usage: JDK 8 (Tiered Compilation)

39

free space

VM internals

C1 compiled (profiled)

C2 compiled (non-profiled)

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Code cache usage: JDK 9

40

free space

VM internals

C1 compiled (profiled)

C2 compiled (non-profiled)

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Challenges

• Tiered compilation increases amount of code by up to 4X

• All code is stored in a single code cache

• High fragmentation and bad locality

• But is this a problem in real life?

41

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 42

Code cache usage: Reality

profiled code

non-profiled code

free space

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 43

Code cache usage: Reality

hotness

profiled code

non-profiled code

free space

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Design: Types of compiled code

Optimization level Size Cost Lifetime

Non-method code optimized small cheap immortal

Profiled code (C1) instrumented medium cheap limited

Non-profiled code (C2) highly optimized large expensive long

44

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

• Without Segmented Code Cache • With Segmented Code Cache

45

Design

Code Cache

non-profiled methods

profiled methods

non-methods

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

non-profiled methodsprofiled methods

46

Segmented Code Cache: Reality

profiled code

non-profiled code

free space

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 47

Segmented Code Cache: Reality

non-profiled methodsprofiled methods hotness

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Code locality

48

targets[0].amount()

targets[0].amount()

targets[1].amount()

targets[1].amount()

targets[2].amount()

Code Cache

profiled code

non-profiled code

targets[2].amount()

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Code locality

49

profiled code

non-profiled code

targets[0].amount()

targets[0].amount()

targets[1].amount()

targets[1].amount()

targets[2].amount()

Code Cache

targets[2].amount()

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

0

1

2

3

4

5

6

7

8

9

10

128 256 512 1024 2048 4096

Spe

ed

up

in %

Number of call targets

Evaluation: Code locality

50

L1 ITLB L2 STLB

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Code locality

• Instruction Cache (ICache)

– 14% less ICache misses

• Instruction Translation Lookaside Buffer (ITLB1)

– 44% less ITLB misses

• Overall performance– 9% speedup with microbenchmark

51

1 caches virtual to physical address mappings to avoid slow page walks

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Responsiveness

• Sweeper (GC for compiled code)

52

0

5

10

15

20

25

30

35

40

# full sweeps Cleanup pause time Sweep time

Re

du

ctio

n in

%

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Performance

53

0

2

4

6

8

10

12

14

SPECjbb2005 SPECjbb2013 JMH-Javac Octane (Typescript) Octane (Gbemu)

Imp

rove

me

nt

in %

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

What we have learned

• Segmented Code Cache helps

– To reduce the sweeper overhead and improve responsiveness

– To reduce memory fragmentation

– To improve code locality

• And thus improves overall performance

• To be released with JDK 9

54

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: What's cool in Java 8

– Background: JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-Time Compilation

55

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 56

public class HelloWorld {public static void main(String[] args) {

String myString = "HELLO";System.out.println(myString);

}}

Java Strings

public final class String {private final char value[];...

}

char value[] =

H

0x0048 0x0045 0x004C 0x004C 0x004F

2 bytes

E L L O

UTF-16 encoded

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

“Perfection is achieved, not when there is nothing more to add, but when there is nothing more to take away.”

– Antoine de Saint Exupéry

57

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

There is a lot to take away here..

• UTF-16 encoded Strings always occupy two bytes per char

• Wasted memory if only Latin-1 (one-byte) characters used:

• But is this a problem in real life?

58

char value[] =

H

0x0048 0x0045 0x004C 0x004C 0x004F

2 bytes

E L L O

0x0048 0x0045 0x004C 0x004C 0x004F

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Real life analysis: char[] footprint

• 950 heap dumps from a variety of applications

– char[] footprint makes up 10% - 45% of live data

– Majority of characters are single byte

• Predicted footprint reduction of 5% - 10%

59

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Project Goals

• Memory footprint reduction by improving space efficiency of Strings

• Meet or beat performance of JDK 9

• Full compatibility with related Java and native interfaces

• Full platform support

– x86/x64, SPARC, ARM

– Linux, Solaris, Windows, Mac OS X

60

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Design

• String class now uses a byte[] instead of a char[]

• Additional 'coder' field indicates which encoding is used

61

public final class String {private final byte value[];private final byte coder;...

}

H E L L O

byte value[] = 0x00 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F

byte value[] = 0x48 0x45 0x4C 0x4C 0x4F

UTF-16 encoded

Latin-1 encoded

H E L L O

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Design

• If all characters have a zero upper byte

→ String is compressed to Latin-1 by stripping off high order bytes

• If a character has a non-zero upper byte

→ String cannot be compressed and is stored UTF-16 encoded

62

byte value[] = 0x00 0x48 0x00 0x45 0x00 0x4C 0x00 0x4C 0x00 0x4F

byte value[] = 0x48 0x45 0x4C 0x4C 0x4F

UTF-16 encoded

Latin-1 encoded

InflationCompression

0x47

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Design

• Compression / inflation needs to fast

• Requires HotSpot support in addition to Java class library changes

– JIT compilers: Intrinsics and String concatenation optimizations

– Runtime: String object constructors, JNI, JVMTI

– GC: String deduplication

• Kill switch to enforce UTF-16 encoding (-XX:-CompactStrings)

– For applications that extensively use UTF-16 characters

63

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 64

public class LogLineBench {int size;

String method = generateString(size);

public String work() throws Exceptions {return "[" + System.nanoTime() + "] " +

Thread.currentThread().getName() +"Calling an application method \"" + method +"\" without fear and prejudice.";

}

Microbenchmark: LogLineBench

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

LogLineBench results

Performance ns/op Allocated b/op

1 10 100 1 10 100

Baseline 149 153 231 888 904 1680

CS disabled 152 150 230 888 904 1680

CS enabled 142 139 169 504 512 904

65

• Kill switch works (no regression)

• 27% performance improvement and 46% footprint reduction

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Evaluation: Performance

• SPECjbb2005

– 21% footprint reduction

– 27% less GCs

– 5% throughput improvement

• SPECjbb2015

– 7% footprint reduction

– 11% critical-jOps improvement

• Weblogic (startup)

– 10% footprint reduction

– 5% startup time improvement

• To be released with JDK 9

66

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Outline

• Intro: Why virtual machines?

• Part 1: The Java HotSpot VM

– JIT compilation in HotSpot

– Tiered Compilation

• Part 2: What's new in Java 9– Segmented Code Cache

– Compact Strings

– Ahead-of-time Compilation

67

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Ahead-of-Time Compilation

• Compile Java classes to native code prior to launching the VM

• AOT compilation is done by new jaotc tool

– Uses Java based Graal compiler as backend

– Stores code and metadata in shared object file

• Improves start-up time– Limited impact on peak performance

• Sharing of compiled code between VM instances

68

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Revisit: Performance of a method (Tiered Compilation)

69

Performance

Time

VM Startup

Interpreted C1 compiled C2 compiled

Compilation Compilation

Interpreted C2 compiled

Deoptimization Compilation

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Performance of a method (Tiered AOT)

70

Performance

Time

VM Startup

C2 compiled

Compilation Compilation

Interpreted C2 compiled

Deoptimization Compilation

AOT compiled C1 compiled

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Ahead-of-Time Compilation

• Experimental feature

– Supported on Linux x64

– Limited to the java.base module

• Try with your own code - feedback is welcome!

• To be released with JDK 9

– More to come in future releases

71

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Summary

• Many cool features to come with Java 9 in July 2017

– Segmented Code Cache, Compact Strings, Ahead-of-Time compilation

• Java – A vibrant platform

– Early access releases are available: https://jdk9.java.net/download/

• The future of the Java platform"Our SaaS products are built on top of Java and the Oracle DB—that’s the platform.”

Larry Ellison, Oracle CTO

• Questions?

[email protected]

72