RailsWayCon 2010 - Dynamic Language VMs


Dynamic Language VMs

Ruby 1.9

Lourens Naude, WildfireApp.com

Background

Independent Contractor

Ruby / C / integrations

Well versed full stack

Architecture

WildfireApp.com

Social Marketing platform

Large whitelabel clients

Bursty traffic: Lady Gaga, EA, Gatorade etc.

RUBY VM INTERNALS ?

A GOOD CRAFTSMAN KNOWS HIS TOOLS

A BAD CRAFTSMAN BLAMES HIS TOOLS

Typical public facing apps

Interaction patterns

Request / response

Time

Event driven

Overheads

Data transfer (I/O)

Serialization / coercion (CPU)

VM allocation, symbol tables etc. (CPU + mem)

Business requirements (CPU)

Ruby daemon - strace

Process 5856 detached
% time     calls  syscall
------  --------  -------------
 89.69      5092  recvfrom
  5.35      5093  sendto
  2.49     26300  stat
  2.05     11004  clock_gettime

Ruby daemon - ltrace

% time     calls  function
------  --------  --------
 95.78    635173  memcpy
  1.38     25862  malloc
  0.79     14984  free
  0.60     11403  strcmp

System Resources

Data latency

CPU cache

Memory - local

Disk - local

Memory + disk - remote

Record retrieval with ORM

Fetch results (local/remote memory + disk)

Serialization + conversion (CPU)

Object instantiation (CPU + memory)

Optional memcached (local or remote memory)

RUBY ?

Conversion: rows to arrays (select_rows)

require 'benchmark'
Benchmark.bm do |b|
  b.report do
    1000.times { ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
  end
end

      user     system      total        real
  0.300000   0.040000   0.340000 (  0.505095)

Conversion: rows to hashes (select_all)

require 'benchmark'
Benchmark.bm do |b|
  b.report do
    1000.times { ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
  end
end

      user     system      total        real
  0.510000   0.050000   0.560000 (  0.719201)

Instantiation

require 'benchmark'
Benchmark.bm do |b|
  b.report do
    100_000.times { 'string'.dup }
  end
end

      user     system      total        real
  0.040000   0.000000   0.040000 (  0.043791)

Serialization load + dump

require 'benchmark'
Benchmark.bm do |b|
  b.report do
    100_000.times { Marshal.load(Marshal.dump('ruby string')) }
  end
end

      user     system      total        real
  1.660000   0.010000   1.670000 (  1.699882)

Roadmap

VM Architecture

Symbol table

Opcodes / instructions

Dispatch

Optimizations

Ruby language

Object model

Garbage Collection

Contexts and control flow

Concurrency

VM ARCHITECTURE

Changes

Ruby 1.8 artifacts

Parser && AST nodes

Object model

Garbage Collection

No immediate performance gains for String manipulation etc.

Codegen phase

Better optimization hooks

Faster runtime

AST AND CODEGEN

Abstract Syntax Tree (AST)

Structure

Grammar representation

Annotations attach semantics to nodes

Possible to refactor the tree: more nodes, less complexity

Example nodes

Literals, values and assignments

Method calls, arguments and return values

Jumps: if, else, iterators

Unconditional jumps: exceptions, retry etc.
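To make the node categories concrete, here is a toy AST with literal, variable, assignment, method-call and conditional-jump nodes, executed by walking the tree the way a 1.8-style interpreter does. The node names (Lit, Lvar, Lasgn, Call, If, Seq) and the walk helper are hypothetical and only loosely echo CRuby's NODE_* kinds; this is a sketch, not the real parser's representation.

```ruby
# Hypothetical toy node types, loosely modelled on CRuby's NODE_* kinds
Lit   = Struct.new(:value)                 # literal value
Lvar  = Struct.new(:name)                  # local variable read
Lasgn = Struct.new(:name, :value)          # local variable assignment
Call  = Struct.new(:recv, :meth, :args)    # method call: recv.meth(*args)
If    = Struct.new(:cond, :then_, :else_)  # conditional jump
Seq   = Struct.new(:nodes)                 # statements evaluated in order

# Tree-walking evaluator: every evaluation recursively visits child nodes.
def walk(node, env)
  case node
  when Lit   then node.value
  when Lvar  then env.fetch(node.name)
  when Lasgn then env[node.name] = walk(node.value, env)
  when Call
    recv = walk(node.recv, env)
    args = node.args.map { |a| walk(a, env) }
    recv.public_send(node.meth, *args)
  when If
    walk(node.cond, env) ? walk(node.then_, env) : walk(node.else_, env)
  when Seq
    node.nodes.reduce(nil) { |_, n| walk(n, env) }  # value of the last statement
  else
    raise ArgumentError, "unknown node #{node.inspect}"
  end
end

# x = 1 + 2; if x > 2 then x * 10 else 0 end
program = Seq.new([
  Lasgn.new(:x, Call.new(Lit.new(1), :+, [Lit.new(2)])),
  If.new(Call.new(Lvar.new(:x), :>, [Lit.new(2)]),
         Call.new(Lvar.new(:x), :*, [Lit.new(10)]),
         Lit.new(0))
])

result = walk(program, {})  # => 30
```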

Code generation

How it works

Converts the AST to compiled code segments

Reduces a tree to a linear and ordered instruction set

Fast execution: no tree walking + native code

Workflow

Preprocessing: AST refactoring (not done in YARV)

Codegen: nodes to instruction sequences

Postprocessing: replace with optimal instruction sequences (peephole optimization)

Pre- and postprocessing phases may take multiple passes
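The workflow (tree in, linear instruction list out, then a peephole pass) can be sketched with a toy stack machine. The instruction names imitate YARV's style (putobject, getlocal, setlocal, send, dup, pop), but the compiler, the single peephole rule and the interpreter loop are hypothetical simplifications, not YARV's compile.c.

```ruby
# Hypothetical toy node types (same shape as a minimal AST)
Lit   = Struct.new(:value)
Lvar  = Struct.new(:name)
Lasgn = Struct.new(:name, :value)
Call  = Struct.new(:recv, :meth, :args)

# Codegen: flatten the tree into a linear, ordered instruction list
# (post-order: operands are pushed before the operation that consumes them).
def compile(node, out = [])
  case node
  when Lit   then out << [:putobject, node.value]
  when Lvar  then out << [:getlocal, node.name]
  when Lasgn
    compile(node.value, out)
    out << [:dup] << [:setlocal, node.name]   # assignment also yields its value
  when Call
    compile(node.recv, out)
    node.args.each { |a| compile(a, out) }
    out << [:send, node.meth, node.args.size]
  end
  out
end

# Compile statements in order, discarding every value but the last.
def compile_seq(nodes)
  out = []
  nodes.each_with_index do |n, i|
    compile(n, out)
    out << [:pop] unless i == nodes.size - 1
  end
  out
end

# Peephole pass: [dup, setlocal x, pop] does redundant work,
# so it is rewritten to the single instruction [setlocal x].
def peephole(insns)
  out = []
  i = 0
  while i < insns.size
    if insns[i] == [:dup] && insns[i + 1]&.first == :setlocal && insns[i + 2] == [:pop]
      out << insns[i + 1]
      i += 3
    else
      out << insns[i]
      i += 1
    end
  end
  out
end

# Execution is a flat loop over instructions: no tree walking.
def run(insns)
  stack, locals = [], {}
  insns.each do |op, a, b|
    case op
    when :putobject then stack.push(a)
    when :getlocal  then stack.push(locals.fetch(a))
    when :setlocal  then locals[a] = stack.pop
    when :dup       then stack.push(stack.last)
    when :pop       then stack.pop
    when :send
      args = stack.pop(b)
      recv = stack.pop
      stack.push(recv.public_send(a, *args))
    end
  end
  stack.last
end

# x = 1 + 2; x * 10
prog = [Lasgn.new(:x, Call.new(Lit.new(1), :+, [Lit.new(2)])),
        Call.new(Lvar.new(:x), :*, [Lit.new(10)])]
raw  = compile_seq(prog)   # 9 instructions
opt  = peephole(raw)       # 7 instructions, same result
```

The peephole rule only pattern-matches adjacent instructions, which is why such passes are cheap and can be repeated until nothing changes.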

LOOKUPS

Symbol / Hash tables

How it works

Constant time access to int/char indexed values

Table defaults: 11 bins, at most 5 entries per bin on average

Bins++ on rehash, sequential lookup inside a bin

Lookup of methods, variables, encodings etc.
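The bin-and-chain layout can be sketched as a chained hash table that starts with 11 bins and rehashes once the average chain length would exceed 5 entries. The class name, the use of Ruby's own #hash, and the "double the bins plus one" growth rule are assumptions for illustration; CRuby's st.c has its own sizing rule.

```ruby
# Hypothetical chained hash table in the spirit of CRuby's st_table.
class ChainedTable
  INITIAL_BINS = 11   # default bin count from the slide
  MAX_DENSITY  = 5    # average entries per bin before rehashing

  attr_reader :size

  def initialize
    @bins = Array.new(INITIAL_BINS) { [] }
    @size = 0
  end

  def bin_count
    @bins.size
  end

  def [](key)
    pair = bin_for(key).find { |k, _| k.eql?(key) }  # sequential scan inside one bin
    pair && pair[1]
  end

  def []=(key, value)
    pair = bin_for(key).find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value
    else
      rehash if @size + 1 > bin_count * MAX_DENSITY
      bin_for(key) << [key, value]   # recomputed: rehash may have moved the bin
      @size += 1
    end
    value
  end

  private

  def bin_for(key)
    @bins[key.hash % bin_count]
  end

  # Grow the bin array (assumed rule: 2n + 1) and redistribute every entry.
  def rehash
    entries = @bins.flatten(1)
    @bins = Array.new(bin_count * 2 + 1) { [] }
    entries.each { |k, v| bin_for(k) << [k, v] }
  end
end

t = ChainedTable.new
100.times { |i| t[i] = i * i }
t[7]         # => 49
t.bin_count  # => 23 (one rehash, triggered by the 56th entry)
```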

Symbol

Entity with both a String and Number representation

!(String || Symbol), points to a table entry

Developer identifies by name, VM by int

Immutable for performance; watch out for memory (1.9 never frees symbols)
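The name-versus-table-entry distinction is visible from Ruby: two equal Strings are separate objects, while every lookup of the same symbol name yields the same object. The integer the VM uses internally is not exposed directly, so object_id serves here only as a stand-in for identity.

```ruby
a = String.new("status")
b = String.new("status")
same_string = a.equal?(b)                # false: two String objects, equal contents

s1 = :status
s2 = "status".to_sym                     # looks the name up in the symbol table
same_symbol = s1.equal?(s2)              # true: both refer to the same entry
same_id = s1.object_id == s2.object_id   # true
```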

VM INSTRUCTIONS

VM instructions / opcodes

Stateless functions

80+ currently

Generated from definitions (insns.def) at interpreter compile time (hence the existing ruby requirement when building 1.9)

Instruction / opcode / operands notation
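CRuby (1.9 and later) exposes its compiler through RubyVM::InstructionSequence, so the instruction / operands notation can be inspected directly. The exact listing varies between Ruby versions (operand unification and specialized instructions change over time), so no listing is reproduced here; on current CRuby the addition of two locals typically compiles to the specialized opt_plus instruction.

```ruby
# Compile a snippet to a YARV instruction sequence (CRuby only)
iseq = RubyVM::InstructionSequence.compile("a = 1; b = 2; a + b")

# Human-readable listing: one instruction per line, opcode followed by operands
listing = iseq.disasm
puts listing

# Running the compiled sequence
value = iseq.eval   # => 3
```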

Categories and examples

variable: get or set local variable

class / module: definition

method / iterator: invoke method, call block

Optimization: redefines common +, ...

Heap growth mid to large app

>> 8 * 1.8
=> 14.4
>> 8 * 1.8 * 1.8
=> 25.92
>> 8 * 1.8 * 1.8 * 1.8
=> 46.656
>> 8 * 1.8 * 1.8 * 1.8 * 1.8
=> 83.9808
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 151.16544
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 272.097792
>> 8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8 * 1.8
=> 489.7760256
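The irb session multiplies by a growth factor of 1.8 per step. The hypothetical helper below counts how many such steps are needed before the heap reaches a target size; the starting value 8 and the target 400 are treated as unitless, since the slide does not say what unit the 8 represents.

```ruby
# Count growth steps for a heap multiplied by `factor` each time it grows
# (hypothetical helper; units are whatever `initial` is measured in).
def growth_steps(initial, factor, target)
  size = initial.to_f
  steps = 0
  while size < target
    size *= factor
    steps += 1
  end
  [steps, size]
end

steps, size = growth_steps(8, 1.8, 400)
# steps == 7, size is about 489.78, the last line of the irb session
```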

Slot structure

typedef struct RVALUE {
    union {
        struct {
            VALUE flags;            /* 0 when free */
            struct RVALUE *next;
        } free;
        struct RObject object;
        struct RFloat  flonum;
        ...
    } as;
} RVALUE;

Pointer layout

Self describing

Program data area and heap

RVALUE union can accommodate any ruby object

Frames, variable structures etc. well defined also

40 bytes (64 bit arch) represents a slot

Free list points to the next free slot
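The free list can be sketched as a fixed array of slots in which each free slot stores the index of the next free slot, much like the free.next member of RVALUE. SlotHeap and its methods are hypothetical names, and slots here are Ruby Structs rather than 40-byte C unions.

```ruby
# Hypothetical toy heap: fixed slots threaded onto a singly linked free list.
class SlotHeap
  Slot = Struct.new(:flags, :next_free, :payload)  # flags == 0 means free

  def initialize(count)
    @slots = Array.new(count) { |i| Slot.new(0, i + 1 < count ? i + 1 : nil, nil) }
    @freelist = count > 0 ? 0 : nil   # index of the first free slot
  end

  # Pop the head of the free list; nil means the heap must grow or GC must run.
  def allocate(payload)
    return nil if @freelist.nil?
    index = @freelist
    slot = @slots[index]
    @freelist = slot.next_free
    slot.flags, slot.next_free, slot.payload = 1, nil, payload
    index
  end

  # Push a slot back onto the head of the free list.
  def free(index)
    slot = @slots[index]
    slot.flags, slot.payload = 0, nil
    slot.next_free = @freelist
    @freelist = index
  end

  def free_count
    @slots.count { |s| s.flags.zero? }
  end
end

heap = SlotHeap.new(3)
a = heap.allocate("a")   # slot 0
heap.free(a)
c = heap.allocate("c")   # slot 0 again: freed slots are reused first
```

Allocation and release are both O(1) pointer swaps, which is why the free list suits a heap of uniformly sized slots.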

Ruby heap VS OS heap

Ruby heap

20 bytes (32 bit arch) represents a slot

A slot can point to data allocated on the OS / system heap

OS heap

Thus a 20 byte slot can reference a 2MB chunk on the system heap
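The split between a small slot and a large payload on the system heap can be observed with the objspace extension. ObjectSpace.memsize_of is documented as an approximation, and whether it counts the slot itself differs between Ruby versions, so this sketch only compares orders of magnitude.

```ruby
require 'objspace'

small = "x"                  # short string: contents can be embedded in the slot
big   = "x" * 2_000_000      # ~2MB of character data, malloc'd on the system heap

small_size = ObjectSpace.memsize_of(small)
big_size   = ObjectSpace.memsize_of(big)

puts "small: #{small_size} bytes, big: #{big_size} bytes"
```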

CRuby: Mark and Sweep

Conservative

Cannot determine with certainty whether a value references an object, so it assumes the object is in use

Two phase implementation

Mark phase: identifies and flags reachable objects from the current program context

Sweep phase: iterates through the object space and

free all objects not marked

unmark marked objects

Concerns

Performance

Runtime pauses

Work proportional to heap size

Prone to memory fragmentation (no compaction)

Recursive

Triggers

Every 8MB (8,000,000 bytes) allocated via malloc triggers GC

Not enough heap reserve
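The malloc-volume trigger can be modelled with a single counter that is reset after each collection. The 8,000,000-byte limit mirrors GC_MALLOC_LIMIT in the 1.9 gc.c; the MallocTrigger class is hypothetical, and it ignores the free-slot trigger and any adjustment of the limit after a GC.

```ruby
# Hypothetical model of the malloc-volume GC trigger.
class MallocTrigger
  LIMIT = 8_000_000   # bytes malloc'd between collections

  attr_reader :collections

  def initialize
    @increase = 0
    @collections = 0
  end

  # Called for every malloc done on behalf of Ruby objects.
  def malloc(bytes)
    @increase += bytes
    return unless @increase > LIMIT
    @collections += 1   # a GC runs here
    @increase = 0       # the counter restarts after each GC
  end
end

t = MallocTrigger.new
20_000.times { t.malloc(1_000) }   # 20MB allocated in 1KB chunks
t.collections                      # => 2
```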

GC in action

# 4 objs: 1 Array, 3 Strings
ary1 = %w(a b c)
ary2 = %w(d e f)
# both ary1 and ary2 are reachable
ary1 = nil
# ary1 and its contents are unreachable
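The two mark and sweep phases can be replayed on this example with a toy object graph. Unlike CRuby, the sketch is precise rather than conservative (it knows exactly which values are references) and marks with an explicit stack instead of recursion; ToyObject, mark, sweep and gc are hypothetical names.

```ruby
# Hypothetical toy heap objects: each may reference others through `refs`.
ToyObject = Struct.new(:name, :refs, :marked) do
  def initialize(name, refs = [])
    super(name, refs, false)
  end
end

# Mark phase: flag everything reachable from the roots.
def mark(roots)
  stack = roots.compact
  until stack.empty?
    obj = stack.pop
    next if obj.marked
    obj.marked = true
    stack.concat(obj.refs)
  end
end

# Sweep phase: free unmarked objects, unmark survivors for the next cycle.
def sweep(heap)
  freed, live = heap.partition { |obj| !obj.marked }
  live.each { |obj| obj.marked = false }
  [live, freed]
end

def gc(heap, roots)
  mark(roots)
  sweep(heap)
end

a, b, c = ToyObject.new("a"), ToyObject.new("b"), ToyObject.new("c")
d, e, f = ToyObject.new("d"), ToyObject.new("e"), ToyObject.new("f")
ary1 = ToyObject.new("ary1", [a, b, c])
ary2 = ToyObject.new("ary2", [d, e, f])
heap = [a, b, c, d, e, f, ary1, ary2]

ary1 = nil                           # drop the only reference to ary1
heap, freed = gc(heap, [ary1, ary2])
freed.map(&:name)                    # => ["a", "b", "c", "ary1"]
```

Note that sweep visits every object in the heap, live or dead, which is the "work proportional to heap size" concern.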

Generational GC

Observations

Vast majority of objects are short lived 80%+

Expensive to account for long lived objects

Partition by age and frequently collect short lived ones

How it works

Restrict GC to the most recently modified slots

These sub heaps are referred to as generations

Perform a full GC only when the youngest generation fails to meet memory requirements
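The generational idea can be sketched on top of a toy mark phase: a minor collection traces and sweeps only the young generation and promotes survivors, while a full collection covers both generations. This is a hypothetical simplification that assumes old objects never point at young ones, so it needs no write barrier or remembered set, which a real generational collector does need.

```ruby
# Hypothetical toy object: references, mark bit and generation flag.
Obj = Struct.new(:name, :refs, :marked, :old)

class GenerationalHeap
  attr_reader :young, :old

  def initialize
    @young, @old = [], []
  end

  def allocate(name, refs = [])
    obj = Obj.new(name, refs, false, false)
    @young << obj                 # every new object starts young
    obj
  end

  # Minor GC: old objects are neither traced nor swept (assumed live and
  # assumed never to reference young objects); young survivors are promoted.
  def minor_gc(roots)
    mark(roots, skip_old: true)
    survivors, dead = @young.partition(&:marked)
    survivors.each { |o| o.marked = false; o.old = true }
    @old.concat(survivors)
    @young = []
    dead
  end

  # Full GC: trace and sweep both generations.
  def full_gc(roots)
    mark(roots, skip_old: false)
    dead = (@young + @old).reject(&:marked)
    @young.select!(&:marked)
    @old.select!(&:marked)
    (@young + @old).each { |o| o.marked = false }
    dead
  end

  private

  def mark(roots, skip_old:)
    stack = roots.compact
    until stack.empty?
      obj = stack.pop
      next if obj.marked || (skip_old && obj.old)
      obj.marked = true
      stack.concat(obj.refs)
    end
  end
end

heap = GenerationalHeap.new
config = heap.allocate("config")   # long lived
heap.allocate("tmp1")              # short lived
heap.allocate("tmp2")
heap.minor_gc([config])            # frees tmp1 and tmp2, promotes config
```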

CONCURRENCY

Threading

Changes

Native OS Threads

Ruby Thread == pthread

Multiple cores ftw!

but

Syscalls to create, schedule and synchronize

Much more expensive to spawn and switch than green threads

Global VM Lock (GVL)

How it works

Thread that owns the GVL is allowed to execute

Blocking operations should release the GVL

Automatically released when scheduled

C extensions: authors need not be concerned with synchronization

Blocking VM operations

I/O

blocking reads and writes

DNS resolution or connects

Often has huge handshake overheads

Computations, processes and locks

Expensive Bignum ops blocked 1.8 interpreters

Process.waitpid

File locks
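The effect of releasing the GVL during blocking operations can be observed from Ruby: threads blocked in sleep (standing in here for a slow read or DNS lookup) overlap, so four 0.2 second waits finish in roughly 0.2 seconds rather than 0.8. The timed helper is hypothetical, and exact timings depend on the machine.

```ruby
# Measure wall-clock time of a block with a monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Four threads each block for 0.2s; a sleeping thread does not hold the GVL.
elapsed = timed do
  threads = 4.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts format("4 blocking waits took %.2fs", elapsed)
```

A CPU-bound loop in place of sleep would not overlap this way, since only the thread holding the GVL runs Ruby code.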

Releasing the GVL

Stable API

Blocking function: slow system call / computation

Unblock function: called on Thread interrupt

Pitfalls

Cannot access VALUEs (objects) in blocking functions

No integration with Ruby's exception / error handler
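From the Ruby side, the effect of an unblock function can be observed: a thread parked in a blocking call can still be interrupted by Thread#raise. The sketch uses Queue#pop on an empty queue as the blocking call; it does not show the C API itself.

```ruby
# A thread parks in a blocking call (Queue#pop on an empty queue).
queue = Queue.new
worker = Thread.new do
  begin
    queue.pop
    :popped
  rescue RuntimeError => e
    "interrupted: #{e.message}"
  end
end

# Wait until the worker is actually blocked, then interrupt it.
Thread.pass until worker.status == "sleep"
worker.raise("shutdown")
result = worker.value   # => "interrupted: shutdown"
```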

Lightweight Concurrency

Fibers

Coroutines 4k stack size

Very fast user space context switches

Cooperative scheduling required

Fiber.yield pauses the activation record, which keeps context across multiple calls

Use cases

Generators

Blocking I/O - Neverblock
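The generator use case can be sketched with the core Fiber API (Fiber.new, Fiber#resume, Fiber.yield): each resume runs the fiber until its next Fiber.yield, and its local variables survive between calls. The Fibonacci sequence is only an illustrative choice.

```ruby
# A generator built on a Fiber; a and b persist across resumes.
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a
    a, b = b, a + b
  end
end

first_ten = Array.new(10) { fib.resume }
# => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Scheduling is cooperative: nothing preempts the fiber, so control returns to the caller only at Fiber.yield.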

In the pipeline

MVM: Multiple Virtual Machines

Shared process state

Sandboxed per VM application state

Distribute VMs across available cores

Message passing for inter VM communication

Most Ruby deployments aren't thread safe

MVM is well suited for this

QUESTIONS ?