33
Metascala A tiny DIY JVM https://github.com/lihaoyi/Metascala Li Haoyi [email protected] Scala Exchange 2nd Dec 2013

Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

MetascalaA tiny DIY JVM

https://github.com/lihaoyi/Metascala

Li [email protected]

Scala Exchange2nd Dec 2013

Page 2: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Who am I?

Li Haoyi

Write Python during the day

Write Scala at night

Page 3: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

What is Metascala?

● A JVM

● in 3000 lines of Scala

● Which can load & interpret java programs

● And can interpret itself!

Page 4: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Size comparison

● Metascala: ~3,000 lines

● Avian JVM: ~80,000 lines

● OpenJDK: ~1,000,000 lines

Page 5: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Basic Usage

Closure’s class file is given to VM to load/parse/execute

Result is extracted from VM into host environment

Captured variables are serialized into VM’s environment

Create a new metascala VMPlain Old Java Object

Any other classes necessary to evaluate the closure are loaded from the current ClasspathNo global state

Page 6: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

It’s Metacircular! Need to give the outer VM more than the 1mb default heap

Simpler program avoids initializing the scala/java std libraries, which takes forever under double-interpretation.

Takes a while (~10s) to produce result

VM inside a VM!

Page 7: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Limitations

● Single-threaded

● Limited IO

● Slowww

Page 8: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Performance Comparison

● OpenJDK: 1.0x

● Metascala: ~100x

● Meta-Metascala: ~10000x

Page 9: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 10: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 11: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Quick TourImmutable Defs: ~380 loc

Native Bindings: ~650 loc

Bytecode SSA transform: ~650 loc

Runtime data structures: 820 loc

Binary heap & Copying GC: 132 loc

DIY scala-pickling: 132 loc

“This is a VM”: 243 loc

Page 12: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Quick Tour: Tests

Tests for basic Java features

GC Fuzz-tests

Test Metacircularity!

Scala std lib usage

Page 13: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

What’s a Heap?

Fig 1. A Heap

Page 14: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

What’s a Garbage Collector?Blit (copy) all roots to new heap

Scan the already copied things for more things and copy them too

Stop when you’ve scanned everything

Not pseudocode

Page 15: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 16: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Limited Instruction Count

Page 17: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

And Limited Memory!

Not an OOM Error!We throw this ourselves

Page 18: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Explicitly defined capabilities

Every external call has to be explicitly defined and enabled

Page 19: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Security Characteristics

● Finite instruction count● Finite memory● Well-defined interface to outside world● Doesn’t rely on Java Security Model at all!

● Still some holes…

Page 20: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Security Holes

● Classloader can read from anywhere● Time spent classloading not accounted● Memory spent on classes not accounted● GC time not accounted● “native” methods’ time/memory not

accounted

Page 21: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Basic Problem

User code resource consumption is bounded

VM’s runtime resource usage can be made to grow arbitrarily large

User Code

Classes

Runtime Data Structures

Native method calls

Garbage Collector

Outside World

Page 22: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Possible Solution

Put a VM Inside a VM!

Works,

... but 10000x slowdown

Outside World

User Code

Classes

Runtime Data Structures

Native method calls

Garbage Collector

Outside WorldFixed Unaccounted Costs

Page 23: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Another Possible Solution

Move more components intovirtual runtime

Difficult to bootstrap correctly

WIP

Outside World

User Code

Classes

Runtime Data Structures

Native method calls

Garbage Collector

Page 24: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Why Metascala?

● Fun to explore the innards of the JVM

● An almost-fully secure Java runtime!

● Small size makes fiddling fun

Page 25: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Live Demo

Page 26: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Ugliness

● Long compile times

● Nasty JVM Interface

● Impossible Debugging

Page 27: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Long compile times

● 100 lines/s

● Twice as slow (50 lines/s) on my older machine!

Page 28: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Real World

Nasty JVM Interface

Ideal World

User Code

Std Library

VM

Initialized

Std Library

User Code

VM

Initialized

Nasty Language VM Interface

Lazy-Initialization means repeated dives back into lib/native code

Clean Interfaces

Linear Initialization

Page 29: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Java’s dirty little secret

WTF! I’d never use these things!

The Verbosity of Java with the Safety of C

Page 30: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

You probably do

Almost every Java program ever uses these things.

What happens if you don’t have them

Page 31: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Next Steps

● Maximize correctness○ Implement Threads & IO○ Fix bugs (GC, native calls, etc.)

● Solidify security characteristics○ Still plenty of unaccounted-for memory/processing○ Some can be hosted “within” VM itself

● Simplify Std-Lib/VM interface○ Try using Android Std Lib?

Page 32: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Possible Experiments

● Native codegen instead of an interpreter?○ Generate/exec native code through JNI○ Heap is already a binary blob that can be easily

passed to native code● Bytecode transforms and optimizations?

○ Already in SSA form● Continuations, Isolates, Value Classes?● Port the whole thing to Scala.Js?

Page 33: Metascala - Haoyi's Programming Blog · in 3000 lines of Scala Which can load & interpret java programs And can interpret itself! Size ... Why Metascala? Fun to explore the innards

Metascala: a tiny DIY JVM

Ask me about:● Single Static Assignment form● Copying Garbage Collection● sun.misc.Unsafe● Warts of the .class file format● Capabilities-based security● Abandoned approaches