LEARNING RUBY BY READING THE SOURCE tw:@burkelibbey / gh:@burke

Learn Ruby by Reading the Source

DESCRIPTION

An updated version of a talk I've given a few times before; this one explains ruby's object model with a bit more lucidity and more C.

Page 1: Learn Ruby by Reading the Source

LEARNING RUBY BY READING THE SOURCE

tw:@burkelibbey / gh:@burke

Page 2: Learn Ruby by Reading the Source

THESIS: The best way to learn a piece of infrastructure is to learn about how it’s implemented.

Page 3: Learn Ruby by Reading the Source

So let’s dig in to ruby’s source!

Page 4: Learn Ruby by Reading the Source

TOPICS

• Basic Object Structure

• Class inheritance

• Singleton classes

• Module inheritance

• MRI Source spelunking

Page 5: Learn Ruby by Reading the Source

BASIC OBJECT STRUCTURE

Page 6: Learn Ruby by Reading the Source

Every object has an RBasic

struct RBasic {
    VALUE flags;
    VALUE klass;
};

Page 7: Learn Ruby by Reading the Source

flags stores information like whether the object is frozen, tainted, etc.

struct RBasic {
    VALUE flags;
    VALUE klass;
};

It’s mostly internal stuff that you don’t think about very often.

Page 8: Learn Ruby by Reading the Source

klass is a pointer to the class of the object

struct RBasic {
    VALUE flags;
    VALUE klass;
};

(or singleton class, which we’ll talk about later)

Page 9: Learn Ruby by Reading the Source

...but what’s a VALUE?

struct RBasic {
    VALUE flags;
    VALUE klass;
};

Page 10: Learn Ruby by Reading the Source

VALUE is basically used as a void pointer.

typedef uintptr_t VALUE;

It can point to any ruby value.

Page 11: Learn Ruby by Reading the Source

You should interpret “VALUE” as: “a (pointer to a) ruby object”

Page 12: Learn Ruby by Reading the Source

This is a Float.

struct RFloat {
    struct RBasic basic;
    double float_value;
};

Page 13: Learn Ruby by Reading the Source

Every type of object, including Float, has an RBasic.

struct RFloat {
    struct RBasic basic;
    double float_value;
};

Page 14: Learn Ruby by Reading the Source

And then, after the RBasic, type-specific info.

struct RFloat {
    struct RBasic basic;
    double float_value;
};

Page 15: Learn Ruby by Reading the Source

Ruby has quite a few types.

Each of them has their own type-specific data fields.

Page 16: Learn Ruby by Reading the Source

But given a ‘VALUE’, we don’t know which type we have.

How does ruby know?

Page 17: Learn Ruby by Reading the Source

Every object has an RBasic

struct RBasic {
    VALUE flags;
    VALUE klass;
};

And the object type is stored inside flags.

Page 18: Learn Ruby by Reading the Source

Given an object of unknown type...

struct αѕgєנqqωσ {
    struct RBasic basic;
    ιηт נѕƒкq; // ???
    ƒנє σтнנ¢є; // ???
};

We can extract the type from ‘basic’, which is guaranteed to be the first struct member.

VALUE a

Page 19: Learn Ruby by Reading the Source

e.g. if the type is T_STRING, then we know it’s a `struct RString`:

struct RString {
    struct RBasic basic;
    union {
        struct {
            long len;
            ...

Page 20: Learn Ruby by Reading the Source

Every* type corresponds to a struct type, which ALWAYS has an RBasic as the first struct member.

* exceptions for immediate values

Page 21: Learn Ruby by Reading the Source

There are custom types for primitives, mostly to make them faster.

Page 22: Learn Ruby by Reading the Source

The special-case primitive types aren’t particularly surprising or interesting.

Page 23: Learn Ruby by Reading the Source

T_STRING => RString: RBasic, string data, length.

T_ARRAY => RArray: RBasic, array data, length.

T_HASH => RHash: RBasic, hashtable.

...and so on.

Page 24: Learn Ruby by Reading the Source

T_OBJECT (struct RObject) is pretty interesting.

It’s what’s used for instances of any classes you define, or most of the standard library.

Page 25: Learn Ruby by Reading the Source

TL;DR: Instance Variables.

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

This makes sense; an instance of a class has its own data, and nothing else.

Page 26: Learn Ruby by Reading the Source

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

It stores the number of instance variables

Page 27: Learn Ruby by Reading the Source

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

And a pointer to an array holding the values of the instance variables

Page 28: Learn Ruby by Reading the Source

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

This is a hashtable mapping each instance variable name to its slot in ivptr. It’s a shortcut to the table kept by the object’s class.

You could get the same result by looking it up on basic.klass (coming up right away)

Page 29: Learn Ruby by Reading the Source

struct RObject {
    struct RBasic basic;
    long numiv;
    VALUE *ivptr;
    struct st_table *iv_index_tbl;
};

This definition is actually slightly simplified. I omitted another performance optimization for readability.

Go read the full one after this talk if you’re so inclined!
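
To make the ivptr / iv_index_tbl split concrete, here’s a toy model in Ruby. It’s a sketch, not MRI code: ToyClass and ToyObject are made-up names, and the layout (values in a per-object array, one name-to-slot table on the class) is my reading of MRI from around this era.

```ruby
# Toy model: the class owns one name => slot table, each object owns a
# plain array of values. ToyClass and ToyObject are hypothetical names.
class ToyClass
  attr_reader :iv_index_tbl

  def initialize
    @iv_index_tbl = {} # ivar name => index into every instance's ivptr
  end
end

class ToyObject
  def initialize(klass)
    @klass = klass # plays the role of basic.klass
    @ivptr = []    # ivar values only, no names
  end

  def ivar_set(name, value)
    tbl = @klass.iv_index_tbl
    tbl[name] = tbl.size unless tbl.key?(name) # first use claims the next slot
    @ivptr[tbl[name]] = value
  end

  def ivar_get(name)
    index = @klass.iv_index_tbl[name]
    index.nil? ? nil : @ivptr[index]
  end

  def numiv
    @ivptr.size
  end
end

klass = ToyClass.new
a = ToyObject.new(klass)
b = ToyObject.new(klass)
a.ivar_set(:@name, "ruby")
b.ivar_set(:@name, "c")
b.ivar_set(:@age, 40)

a.ivar_get(:@name) # "ruby"
a.numiv            # 1
b.numiv            # 2
```

The names are stored once, on the class; each object only pays for its values.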

Page 30: Learn Ruby by Reading the Source

Class and Module types

Page 31: Learn Ruby by Reading the Source

struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

Classes have instance variables (ivars), class variables (cvars), methods, and a superclass.

Page 32: Learn Ruby by Reading the Source

struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

This is where the methods live.

st_table is the hashtable implementation ruby uses internally.

Page 33: Learn Ruby by Reading the Source

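struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

iv_index_tbl maps instance variable names to slots in each instance’s ivptr. (Class variables actually live in ptr->iv_tbl, next to the class’s own ivars.)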

Page 34: Learn Ruby by Reading the Source

struct RClass {
    struct RBasic basic;
    rb_classext_t *ptr;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

struct rb_classext_struct {
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
};
typedef struct rb_classext_struct rb_classext_t;

Page 35: Learn Ruby by Reading the Source

struct rb_classext_struct {
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
};

The superclass, instance variables, and constants defined inside the class.

Page 36: Learn Ruby by Reading the Source

It ends up looking kinda like:

struct RClass {
    struct RBasic basic;
    VALUE super;
    struct st_table *iv_tbl;
    struct st_table *const_tbl;
    struct st_table *m_tbl;
    struct st_table *iv_index_tbl;
};

...though this isn’t really valid because rb_classext_t is referred to by a pointer.

Page 37: Learn Ruby by Reading the Source

So classes have:

* RBasic
* superclass
* instance vars. (class vars. are stored here too)
* constants
* methods
* ivar name => slot index, for instances

struct RClass {
    struct RBasic basic;
    VALUE super;
    (st) *iv_tbl;
    (st) *const_tbl;
    (st) *m_tbl;
    (st) *iv_index_tbl;
};

Page 38: Learn Ruby by Reading the Source

Modules

Page 39: Learn Ruby by Reading the Source

#define RCLASS(obj) (R_CAST(RClass)(obj))
#define RMODULE(obj) RCLASS(obj)

Same underlying type (struct RClass) as a class

...just has different handling in a few code paths.
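
You can see the shared underlying type from ruby-land too. A quick sketch; Walkable and Dog are made-up names for illustration.

```ruby
module Walkable; end # hypothetical example module
class Dog; end       # hypothetical example class

Class.superclass      # Module: every class is also a module
Dog.is_a?(Module)     # true
Walkable.is_a?(Class) # false
Walkable.class        # Module
```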

Page 40: Learn Ruby by Reading the Source

Immediate values

Page 41: Learn Ruby by Reading the Source

Sort of complicated.

Page 42: Learn Ruby by Reading the Source

For an integer N, the fixnum representation is:

2N + 1
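
The arithmetic is small enough to sketch in Ruby. FixnumSketch and its methods are hypothetical helpers that mirror the 2N + 1 rule; they are not MRI’s actual macros.

```ruby
# Hypothetical helpers mirroring the 2N + 1 encoding from the slide.
module FixnumSketch
  FIXNUM_FLAG = 0x01

  def self.encode(n)
    (n << 1) | FIXNUM_FLAG # 2N + 1
  end

  def self.decode(value)
    value >> 1 # shift the tag bit back out
  end

  def self.fixnum?(value)
    (value & FIXNUM_FLAG) == FIXNUM_FLAG
  end
end

FixnumSketch.encode(21)                      # 43
FixnumSketch.decode(FixnumSketch.encode(-7)) # -7
FixnumSketch.fixnum?(43)                     # true
```

On MRI, small integers have historically exposed this encoding through object_id (try 21.object_id), but treat that as an implementation detail and check it on your own build.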

Page 43: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

Page 44: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

A pointer is basically just a big integer, with a number referring to a memory address.

Page 45: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

Remember how a VALUE is mostly a pointer? These tiny addresses sit at the very bottom of a process image, in a region that is normally left unmapped, which means no real object can ever live there.

So ruby uses them to refer to special values.

Page 46: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

Any VALUE equal to 0 is false, 2 is true, 4 is nil, and 6 is a special value only used internally.

Page 47: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

Integers and Symbols work on the principle that memory is never allocated without 4-byte alignment.

Page 48: Learn Ruby by Reading the Source

enum ruby_special_consts {
    RUBY_Qfalse = 0,
    RUBY_Qtrue  = 2,
    RUBY_Qnil   = 4,
    RUBY_Qundef = 6,

    RUBY_IMMEDIATE_MASK = 0x03,
    RUBY_FIXNUM_FLAG    = 0x01,
    RUBY_SYMBOL_FLAG    = 0x0e,
    RUBY_SPECIAL_SHIFT  = 8
};

Any odd VALUE > 0 is a Fixnum.

An even VALUE not divisible by 4 might be a Symbol.
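
Putting the rules from the last few slides together, here’s a rough classifier for a raw VALUE. ValueSketch is a hypothetical module; the constants are copied from the enum above, and the symbol layout (ID shifted left by RUBY_SPECIAL_SHIFT, then OR’d with the flag) is how I remember MRI of this era doing it, so treat that part as an assumption.

```ruby
# Hypothetical classifier for a raw VALUE, using the constants from the enum.
module ValueSketch
  QFALSE = 0
  QTRUE  = 2
  QNIL   = 4
  QUNDEF = 6
  FIXNUM_FLAG   = 0x01
  SYMBOL_FLAG   = 0x0e
  SPECIAL_SHIFT = 8

  SPECIALS = { QFALSE => :false, QTRUE => :true, QNIL => :nil, QUNDEF => :undef }.freeze

  def self.classify(value)
    return :fixnum if (value & FIXNUM_FLAG) == FIXNUM_FLAG # any odd VALUE
    return SPECIALS[value] if SPECIALS.key?(value)         # 0, 2, 4, 6
    low_byte = value & ((1 << SPECIAL_SHIFT) - 1)
    return :symbol if low_byte == SYMBOL_FLAG              # low byte is 0x0e
    :heap_pointer                                          # aligned, so a real pointer
  end

  # Assumed layout: symbol ID in the high bits, flag in the low byte.
  def self.symbol_value(id)
    (id << SPECIAL_SHIFT) | SYMBOL_FLAG
  end
end

ValueSketch.classify(43)                          # :fixnum
ValueSketch.classify(4)                           # :nil
ValueSketch.classify(ValueSketch.symbol_value(7)) # :symbol
ValueSketch.classify(0x7f001000)                  # :heap_pointer
```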

Page 49: Learn Ruby by Reading the Source

Symbols are just integers.

Page 50: Learn Ruby by Reading the Source

There is a global table mapping Symbol IDs to the strings they represent.

Page 51: Learn Ruby by Reading the Source

Symbols are immediates because their IDs are stored in VALUE, and looked up in the symbol table for display.
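
A toy version of that table, in Ruby. ToySymbolTable is a hypothetical stand-in: it only shows the idea of interning (same string, same integer ID), not MRI’s actual data structure.

```ruby
# Hypothetical stand-in for the global symbol table: two maps kept in sync.
class ToySymbolTable
  def initialize
    @name_to_id = {}
    @id_to_name = []
  end

  # Return the existing ID for name, or hand out the next free one.
  def intern(name)
    @name_to_id.fetch(name) do
      id = @id_to_name.size
      @id_to_name << name
      @name_to_id[name] = id
    end
  end

  # Reverse lookup, used when a symbol has to be displayed.
  def name_for(id)
    @id_to_name[id]
  end
end

table = ToySymbolTable.new
table.intern("upcase") == table.intern("upcase") # true: same name, same ID
table.name_for(table.intern("cycle"))            # "cycle"
```

This is also why comparing two symbols is cheap: it’s an integer comparison, with no string contents involved.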

Page 52: Learn Ruby by Reading the Source

CLASS INHERITANCE

Page 53: Learn Ruby by Reading the Source

We have a pretty good picture of how values are represented; now we’re going to talk about how they interact.

Page 54: Learn Ruby by Reading the Source

class Language
  @@random_cvar = true
  attr_reader :name
  def initialize(name)
    @name = name
  end
end

basic.klass     Class
ptr->super      Object
iv_tbl          {@@random_cvar: true}
const_tbl       {}
m_tbl           {name: #<M>, initialize: #<M>}
iv_index_tbl    {@name: 0}  # filled in once an instance sets @name

Page 55: Learn Ruby by Reading the Source

class Ruby < Language
  CREATOR = :matz
  @origin = :japan
end

basic.klass     Class
ptr->super      Language
iv_tbl          {@origin: :japan}
const_tbl       {CREATOR: :matz}
m_tbl           {} # NB. Empty!
iv_index_tbl    {} # NB. Empty!

Page 56: Learn Ruby by Reading the Source

When you subclass, you create a new RClass with super=(parent) and klass=Class

Page 57: Learn Ruby by Reading the Source

When you instantiate a class, you create a new RObject with klass=(the class)
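
Most of these fields can be poked at from ruby-land. A sketch using the two classes from the slides; the SlideDemo namespace is my addition, just to keep the names out of the way of anything else.

```ruby
module SlideDemo # hypothetical namespace, not part of the slides
  class Language
    @@random_cvar = true
    attr_reader :name
    def initialize(name)
      @name = name
    end
  end

  class Ruby < Language
    CREATOR = :matz
    @origin = :japan
  end
end

ruby = SlideDemo::Ruby
ruby.class                                  # Class               (basic.klass)
ruby.superclass                             # SlideDemo::Language (ptr->super)
ruby.instance_variables                     # [:@origin]
ruby.constants(false)                       # [:CREATOR]
ruby.instance_methods(false)                # []  (nothing defined on Ruby itself)
SlideDemo::Language.instance_methods(false) # [:name]
SlideDemo::Language.class_variables(false)  # [:@@random_cvar]
ruby.new("ruby").class                      # SlideDemo::Ruby
```

initialize doesn’t show up in instance_methods because it’s private, but it lives in the same method table.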

Page 58: Learn Ruby by Reading the Source
Page 59: Learn Ruby by Reading the Source
Page 60: Learn Ruby by Reading the Source
Page 61: Learn Ruby by Reading the Source

Method lookup
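
The basic walk (check the object’s class, then follow super until something has the method) can be sketched in Ruby. LookupSketch.find_owner is a hypothetical helper that deliberately ignores modules and singleton classes, so it’s a simplification of what MRI really does.

```ruby
module LookupSketch # hypothetical namespace
  class Language
    attr_reader :name
    def initialize(name)
      @name = name
    end
  end

  class Ruby < Language; end

  # Start at the object's class and follow superclass links until some
  # class defines the method itself. Returns nil if nothing does.
  def self.find_owner(obj, method_name)
    klass = obj.class
    while klass
      return klass if klass.public_instance_methods(false).include?(method_name)
      klass = klass.superclass
    end
    nil
  end
end

obj = LookupSketch::Ruby.new("ruby")
LookupSketch.find_owner(obj, :name) # LookupSketch::Language
obj.method(:name).owner             # LookupSketch::Language
LookupSketch.find_owner(obj, :nope) # nil
```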

Page 62: Learn Ruby by Reading the Source
Page 63: Learn Ruby by Reading the Source

Class methods

class Foo
  def bar
    :baz
  end
end

Foo.new.bar

We know how this works now.

class Foo
  def self.bar
    :baz
  end
end

Foo.bar

But how does this work?

Page 64: Learn Ruby by Reading the Source

SINGLETON CLASSES

Page 65: Learn Ruby by Reading the Source

class Klass
  def foo; end
end
obj = Klass.new
def obj.bar; end

Image borrowed from Ruby Hacking Guide

ptr->super (superclass)

basic.klass (class)

Page 66: Learn Ruby by Reading the Source

Singleton classes are ordinary T_CLASS objects with the FL_SINGLETON flag set.

Like T_ICLASS objects (the hidden proxies used for included modules), they are never* returned by Object#class.

*for sufficiently loose definitions of “never”

Page 67: Learn Ruby by Reading the Source

class A
  def foo; end
end
class B < A
  def self.bar; end
end

Image borrowed from Ruby Hacking Guide

ptr->super (superclass)

basic.klass (class)
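
Both diagrams can be checked from ruby-land with singleton_class. The SingletonDemo namespace is my addition; the classes inside it are the ones from the slides.

```ruby
module SingletonDemo # hypothetical namespace, not part of the slides
  class Klass
    def foo; end
  end

  class A
    def foo; end
  end

  class B < A
    def self.bar; end
  end
end

obj = SingletonDemo::Klass.new
def obj.bar; end

obj.singleton_class.instance_methods(false)              # [:bar]
SingletonDemo::Klass.instance_methods(false)             # [:foo]
obj.class                                                # SingletonDemo::Klass
SingletonDemo::B.singleton_class.instance_methods(false) # [:bar]
SingletonDemo::B.singleton_class.superclass == SingletonDemo::A.singleton_class # true
```

So a “class method” is just an instance method on the class’s singleton class, and obj.class quietly skips over the singleton on its way to Klass.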

Page 68: Learn Ruby by Reading the Source


Page 69: Learn Ruby by Reading the Source

MODULE INHERITANCE

Page 70: Learn Ruby by Reading the Source
Page 71: Learn Ruby by Reading the Source
Page 72: Learn Ruby by Reading the Source
Page 73: Learn Ruby by Reading the Source

MRI SOURCE SPELUNKING

Page 74: Learn Ruby by Reading the Source

First, check out the source

Page 75: Learn Ruby by Reading the Source

github.com/ruby/ruby

Page 76: Learn Ruby by Reading the Source
Page 77: Learn Ruby by Reading the Source
Page 78: Learn Ruby by Reading the Source

google “<your editor> ctags”

Page 79: Learn Ruby by Reading the Source

CASE STUDY: How does Array#cycle work?

Page 80: Learn Ruby by Reading the Source

brb live demo

Page 81: Learn Ruby by Reading the Source

Builtin types have a <type>.c (string.c, array.c, proc.c, re.c, etc.)

Page 82: Learn Ruby by Reading the Source

Interesting methods tend to be in those files

Page 83: Learn Ruby by Reading the Source

Method names always appear there inside double quotes

(easy to search for)

Page 84: Learn Ruby by Reading the Source

The next parameter after the string is the C function name

Page 85: Learn Ruby by Reading the Source

e.g. Search for “upcase” (with the quotes) in string.c and follow the chain.
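
Before you go grepping, Ruby itself can hint at whether there’s any Ruby source to read. A sketch: Spelunk and Greeter are made-up names, and “source_location is nil” only tells you there is no Ruby source on record, which for core methods usually means the body is in C.

```ruby
module Spelunk # hypothetical helper module
  # True when Ruby knows a file and line for the method body.
  def self.ruby_source?(klass, method_name)
    !klass.instance_method(method_name).source_location.nil?
  end
end

class Greeter # hypothetical class, defined in plain Ruby
  def hello
    :hi
  end
end

Spelunk.ruby_source?(Greeter, :hello) # true
Spelunk.ruby_source?(String, :upcase) # false, so go look in string.c
```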

Page 86: Learn Ruby by Reading the Source

Most of the supporting VM internals are in vm_*.c

Page 87: Learn Ruby by Reading the Source

Garbage collection is in gc.c

Page 88: Learn Ruby by Reading the Source

Don’t look at parse.y. Trust me.

Page 89: Learn Ruby by Reading the Source

Almost all of the stuff we’ve looked at today is in object.c, class.c, or ruby.h

Page 90: Learn Ruby by Reading the Source

I mostly look up definitions of built-in methods

Page 91: Learn Ruby by Reading the Source

Further reading:

Ruby under a Microscope: http://patshaughnessy.net/ruby-under-a-microscope

Ruby Hacking Guide: http://ruby-hacking-guide.github.io/

Page 92: Learn Ruby by Reading the Source

Thanks, questions?