Mosh R6RS Scheme interpreter: from slow toy interpreter to fast practical interpreter.
Toy to practical
Taro Minowa (Higepon)
Shibuya.Lisp Tech Talk #2, February 28, 2009
Mosh internals
About me
Higepon
http://www.monaos.org
http://d.hatena.ne.jp/higepon/
http://twitter.com/higepon/
http://www.facebook.com/people/Taro-Minowa-Higepon/649065487
Mona: open source OS
Mosh: fast Scheme interpreter
Outputz: http://outputz.com/
Today’s presentation is about…
From toy to practical interpreter
Mosh
R6RS Scheme interpreter
As fast as Gauche and Ypsilon (I believe)
Many SRFIs, DBI (MySQL)
Regexp (Oniguruma)
Object system (Tiny CLOS)
Process Management
Foreign Function Interface
Development
Version 0.0.7
2 committers
Higepon
kokosabu
In the future
Use as the shell for Mona
Toy
The MITOH 2006
Scheme Shell for Mona
OS integrated R5RS Scheme
Implementation based on SICP
My first interpreter!
Basic tree-based interpreter
Written in “Pure C++”
Good points
It works
Almost covers R5RS
Bad points
Too slow
fib(31) takes a few minutes!
Not for practical use
Why was it slow and bad?
Problems
Slow GC
Scheme recursion uses native stack
Slow environment lookup
Incomplete tail call optimization
Slow arithmetic
Few optimizations
Slow arithmetic
Too many heap allocations
So fib(31) causes ...
With slow GC
(+ 1 1) => new Number(1 + 1)
Learned from toy
Slow interpreter is useless
Slow interpreter is not practical
Need more speed
Need better design
Tree-based → VM
Read “The 3imp”
Three Implementation Models for Scheme
R. Kent Dybvig (Chez Scheme)
Bytecode VM
With sample code
http://mono.kmc.gr.jp/~yhara/w/?Reading3imp.pdf#l13
Choose stack-based VM
Fast environment lookup
Use display closure
Use virtual stack
tail call optimization
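To make the "fast environment lookup" point concrete, here is a minimal C++ sketch contrasting a name-based walk over chained frames (the tree-interpreter style) with an indexed reference into a display closure. The type and function names (`Frame`, `DisplayClosure`, `lookupByName`, `referFree`) are hypothetical and values are simplified to `int`; this is not Mosh's actual code, only an illustration of the idea described in 3imp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Tree-interpreter style: walk a chain of frames and compare names.
struct Frame {
    std::vector<std::pair<std::string, int> > bindings;
    const Frame* parent;
};

int lookupByName(const Frame* env, const std::string& name) {
    for (const Frame* f = env; f != nullptr; f = f->parent) {
        for (std::size_t i = 0; i < f->bindings.size(); ++i) {
            if (f->bindings[i].first == name) return f->bindings[i].second;
        }
    }
    assert(false && "unbound variable");
    return 0;
}

// Display-closure style: the compiler has already turned each free
// variable into an index, and the values were copied into the closure
// when it was created. A reference is a single indexed load.
struct DisplayClosure {
    std::vector<int> freeVars;
};

int referFree(const DisplayClosure& c, std::size_t index) {
    assert(index < c.freeVars.size());
    return c.freeVars[index];
}
```

The cost of the first form grows with nesting depth and frame size; the second is constant, at the price of copying free variables when the closure is created.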
3imp doesn’t have
Multiple values | Values register
Global variables | Global hash-table
Subr | Hook on APPLY instruction
let | LET_FRAME instruction
Optimization on compilation | Borrowed from Gauche
Second implementation Mini Mosh
Write in Scheme instead of C++
Easy to make prototype
no need to parse, we have “read”
never SEGV (very important!)
We can use backend’s procedures
OP_CAR => car
Prototyping
Rewrote about 50 times
VM: 1400 lines, Compiler 2400 lines
Hardest part
Designing stack layout
Wrong stack position
change stack layout => crash
some code works, other code doesn’t
Bugs in compiler, VM or design?
A pen and a notebook are more than friends.
Instruction example
VM
(+ 1 1) => '(CONST 1 PUSH CONST 1 NUMBER_ADD)
[(NUMBER_ADD) (apply-native-2arg +)]
[(CONSTANT) (val1) (VM codes (skip 1) (next 1) fp c stack sp)]
half
Improvement in Mini Mosh
Problems of toy | Solutions
Slow GC | -
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment lookup | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | -
Few optimizations | Borrowed from Gauche
Port prototype to C++
VM
Easy to port
maps cond to switch/case
maps recursive call to loop
Compiler
Painful
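The VM half of the port (cond becomes switch/case, the recursive call becomes a loop) can be sketched in C++ as follows. The sketch assumes an accumulator-style machine in which `CONST n` loads `n` into the accumulator, `PUSH` pushes the accumulator onto the virtual stack, and `NUMBER_ADD` adds the popped stack top to the accumulator. The `OP_` names, the `OP_HALT` instruction and the `run` function are hypothetical, and values are plain `int`; the real Mosh VM is far richer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum VmOp { OP_CONST, OP_PUSH, OP_NUMBER_ADD, OP_HALT };

// Executes a flat instruction vector; operands follow their opcode.
int run(const std::vector<int>& code) {
    std::vector<int> stack;  // virtual stack, independent of the native C++ stack
    int acc = 0;             // accumulator register
    std::size_t pc = 0;
    for (;;) {               // the prototype's recursive VM call becomes a loop
        assert(pc < code.size());
        switch (code[pc++]) {  // the prototype's cond becomes switch/case
        case OP_CONST:
            assert(pc < code.size());
            acc = code[pc++];
            break;
        case OP_PUSH:
            stack.push_back(acc);
            break;
        case OP_NUMBER_ADD:
            assert(!stack.empty());
            acc = stack.back() + acc;
            stack.pop_back();
            break;
        case OP_HALT:
            return acc;
        default:
            assert(false && "unknown instruction");
            return acc;
        }
    }
}
```

Running the sequence `CONST 1 PUSH CONST 1 NUMBER_ADD` followed by the hypothetical `OP_HALT` leaves 2 in the accumulator, matching the (+ 1 1) example from the prototype.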
Compile compiler with Mini Mosh
Embed the instructions to C++ Mosh
Run on Gauche
[Diagram: Mini Mosh (Scheme) reads compiler (Scheme) and compiles it into a list of instructions (LREF 0 PUSH GREF 'compile APPLY ...); compiler.cpp is generated from that list as make(LREF, 0) make(PUSH) make(GREF, "compile") and embedded in Mosh (C++).]
[Diagram: Mini Mosh (Scheme) and Mosh (C++) share compiler (Scheme).]
Share the compiler written in Scheme
Easy to debug
Easy to process intermediate code
VM in C++
Use Boehm GC
Much faster than toy
fib(31) takes only a few seconds
(+ 1 1) doesn’t need heap allocation
Tag bit based Object system
Use immediate value for Number
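The reason (+ 1 1) no longer needs a heap allocation can be illustrated with a small C++ sketch of tagged immediate integers. The layout used here (lowest bit 1 marks a fixnum, lowest bit 0 would mark an aligned heap pointer) and the names `Object`, `makeFixnum`, `fixnumValue` and `addFixnum` are assumptions for illustration, not Mosh's actual representation; overflow handling is omitted.

```cpp
#include <cassert>
#include <cstdint>

// One machine word holds either a heap pointer or an immediate value.
typedef std::intptr_t Object;

// Assumed tag layout: lowest bit 1 = fixnum, lowest bit 0 = heap pointer.
inline bool isFixnum(Object o) { return (o & 1) == 1; }

// Encode n as 2n + 1. No allocation happens here.
inline Object makeFixnum(std::intptr_t n) { return n * 2 + 1; }

// Decode 2n + 1 back to n.
inline std::intptr_t fixnumValue(Object o) {
    assert(isFixnum(o));
    return (o - 1) / 2;
}

// (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1, so the sum is already tagged.
// A real implementation would detect overflow and fall back to bignums.
inline Object addFixnum(Object a, Object b) {
    assert(isFixnum(a) && isFixnum(b));
    return a + b - 1;
}
```

Compared with the toy's `new Number(1 + 1)`, the addition here is a couple of integer instructions and produces no garbage for the collector.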
Improvement in C++ Mosh
Problems of toy | Solutions
Slow GC | Boehm GC
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment lookup | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | Tag bit object system
Few optimizations | Borrowed from Gauche
Become a practical fast interpreter?
Not yet.
2 bearded gurus (the bearded guys)
Gauche & Ypsilon
Speed freak
http://osdevj.g.hatena.ne.jp/osdevj/20060807/1154962935
[Bar chart: execution time in msec (axis 0 to 300) for C, Java, Gauche, CINT, Perl, Python and Ruby 1.8]
We need performance tuning
Chart
Profiler
Tuning
Fast startup
Many optimization techniques
Chart
Make the goal clear
Know whether what I’ve done is good or bad
Run benchmarks
make bench
Draw charts every time
Profiler
A C++ profiler tells us little
All the time is spent inside the run loop
We need a Scheme profiler
mosh -p
SIGPROF
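One plausible shape for a signal-driven Scheme profiler is sketched below in C++ for a POSIX system: the VM records which Scheme procedure it is currently running, and a SIGPROF handler, fired by a CPU-time interval timer, counts a sample for that procedure. How `mosh -p` is actually implemented is not shown in the slides; the names `enterProcedure`, `installProfileHandler`, `startProfileTimer` and the fixed-size sample table are hypothetical.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/time.h>

const int kMaxProcs = 16;                       // hypothetical table size
volatile sig_atomic_t g_currentProc = 0;        // set by the VM on each call
volatile sig_atomic_t g_samples[kMaxProcs];     // zero-initialized sample counts

// Called by the VM whenever it enters a Scheme procedure.
void enterProcedure(int id) {
    assert(0 <= id && id < kMaxProcs);
    g_currentProc = id;
}

// Signal handler: attribute this tick to the procedure currently running.
extern "C" void onProfileTick(int) {
    g_samples[g_currentProc] = g_samples[g_currentProc] + 1;
}

bool installProfileHandler() {
    return signal(SIGPROF, onProfileTick) != SIG_ERR;
}

// Ask the kernel to deliver SIGPROF every intervalUsec of consumed CPU time.
// intervalUsec must be below one second in this sketch.
bool startProfileTimer(long intervalUsec) {
    itimerval timer = {};
    timer.it_interval.tv_usec = intervalUsec;
    timer.it_value.tv_usec = intervalUsec;
    return setitimer(ITIMER_PROF, &timer, nullptr) == 0;
}
```

Because the samples are keyed by Scheme procedure rather than by C++ function, the report stays meaningful even though the native profiler only ever sees the run loop.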
Fast startup is also important
Just running an empty script takes ...
[Bar chart: startup time in msec (axis 0 to 80) for Perl, Ruby, Gauche, Python, Ypsilon and Mosh]
Mosh startup 80 => 20 msec
Don’t read many files at startup
Don’t allocate large memory
Don’t use too many static initializers
Embed the compiler in binary format
FASL (Fast Loading)
Optimizations
Compiler
Beta reduction
Procedure inlining
Constant folding
(+ 1 2) => 3
Peephole
the destination of a jump is itself a jump, etc.
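The "destination of a jump is a jump" peephole case can be sketched in C++ as a pass that retargets each jump to its final destination. The instruction representation (`Insn` with an absolute `target` index) and the function name `threadJumps` are hypothetical; Mosh's compiler is written in Scheme and its instruction format is not shown in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum PeepOp { PEEP_JUMP, PEEP_OTHER };

struct Insn {
    PeepOp op;
    std::size_t target;  // absolute index; meaningful only for PEEP_JUMP
};

// Retarget every jump whose destination is itself a jump, so that the VM
// performs one jump at run time instead of a chain of them.
void threadJumps(std::vector<Insn>& code) {
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].op != PEEP_JUMP) continue;
        std::size_t t = code[i].target;
        // Bound the number of hops so a cycle of jumps cannot loop forever.
        for (std::size_t hops = 0; hops < code.size(); ++hops) {
            assert(t < code.size());
            if (code[t].op != PEEP_JUMP) break;
            t = code[t].target;
        }
        code[i].target = t;
    }
}
```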
VM
Instruction unification
Shorter instructions are faster
PUSH + APPLY => PUSH_APPLY
Direct threaded code
switch/case => goto
GCC only
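Instruction unification (PUSH + APPLY => PUSH_APPLY) can be sketched in C++ as a pass over the instruction list that fuses adjacent pairs, so the VM dispatches once instead of twice. The sketch assumes operand-less instructions and that no jump targets the APPLY in the middle of a fused pair; the `INSN_` names and the `unify` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Instruction { INSN_PUSH, INSN_APPLY, INSN_PUSH_APPLY, INSN_CAR };

// Replace each adjacent PUSH, APPLY pair with the single PUSH_APPLY
// instruction; every other instruction is copied through unchanged.
std::vector<Instruction> unify(const std::vector<Instruction>& in) {
    std::vector<Instruction> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == INSN_PUSH && i + 1 < in.size() && in[i + 1] == INSN_APPLY) {
            out.push_back(INSN_PUSH_APPLY);
            ++i;  // skip the APPLY that was folded into PUSH_APPLY
        } else {
            out.push_back(in[i]);
        }
    }
    assert(out.size() <= in.size());  // fusing never lengthens the code
    return out;
}
```

The shorter sequence saves one trip through the dispatch loop per fused pair, which is the sense in which "shorter instructions are faster".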
Compare at the instruction level
Gauche
(disasm ...)
Ypsilon
(debug-compile ...)
More improvement in C++ Mosh
Problems of toy | Solutions
Slow GC | Boehm GC
Scheme recursion uses native stack | Stack VM uses virtual stack
Slow environment lookup | Fast lookup using display closure
Incomplete tail call optimization | Tail call optimization
Slow arithmetic | Tag bit object system
Few optimizations | Many, many optimizations
Finally
Mosh becomes practical!
Practical speed
Conclusions
Going from toy to practical is not easy.
Wear a beard!
Please try Mosh
http://code.google.com/p/mosh-scheme/
trunk is better