Garbage Collection and the Ruby Heap
Joe Damato and Aman Gupta
@joedamato @tmm1
About Joe Damato
CMU/VMWare alum
memprof, ltrace-libdl, performance improvements to REE
http://timetobleed.com
@joedamato
About Aman Gupta
San Francisco, CA
Ruby Hero 2009
EventMachine, amqp, REE, sinbook, perftools.rb, gdb.rb
github.com/tmm1
@tmm1
Why Garbage Collection?
We use Ruby because it’s simple and elegant
the GC is designed to make your life easier
how is it easier? no more:
memory management
memory leaks
not convinced? let’s look at some C code...
C code vs Ruby code
C:
    void func() {
      char *stack = "hello";
      char *heap = malloc(6);
      strncpy(heap, "world", 6);  /* copy all 6 bytes so the string is NUL-terminated */
      free(heap);
    }
memory explicitly allocated on either the stack or the heap
local variables usually live on the stack
heap allocated memory must be free()d or it will leak
Ruby:
    def func
      local = "hello"
      @instance = "world"
    end
no concept of stack allocated variables
even local variables live on the heap
no way to explicitly free memory
Recap: Stack vs Heap
    func1()
      void *data;
      func2();

    func2()
      char *string = func3();
      free(string);

    char *func3()
      char buffer[8];
      char *string = malloc(10);
      return string;

step                                 bytes on stack     bytes on heap
func1() is called                    4                  0
func2() is called                    4 + 4              0
func3() is called                    4 + 4 + 12         0
func3() calls malloc(10)             4 + 4 + 12         10
func3() returns                      4 + 4              10
func2() calls free() and returns     4                  0
Ruby Objects (in MRI)
always allocated on the heap (even local variables)
fixed size structure
sizeof(struct RVALUE) = 40 bytes
“allocated” in gc.c’s rb_newobj()
“freed” in gc.c’s add_freelist() via garbage_collect()
let’s look at some code...
rb_newobj creates a new ruby object
    VALUE
    rb_newobj()
    {
        VALUE obj;

        if (during_gc)
            rb_bug("allocation during GC");

        /* force GC if freelist is empty */
        if (!freelist)
            garbage_collect();

        /* pull object off the freelist */
        obj = (VALUE)freelist;
        freelist = freelist->as.free.next;
        MEMZERO((void*)obj, RVALUE, 1);

        /* return new object */
        return obj;
    }
add_freelist frees an existing object
    static inline void
    add_freelist(p)
        RVALUE *p;
    {
        /* add object to top of freelist */
        p->as.free.flags = 0;
        p->as.free.next = freelist;
        freelist = p;
    }
The Ruby heap
The Ruby heap sort of resembles a slab allocator
Ruby allocates a slab by calling malloc
This space is carved up into fixed size slots for holding Ruby objects
You can get an unused object from the Ruby heap by calling rb_newobj
If there are no objects available, GC is run
If there are still no objects available, another slab is created
Heaps on top of heaps
The Freelist
rb_newobj() tries to pull a free slot off the freelist
the freelist is a linked list across slots on the ruby heap
if the freelist is empty, GC is run
GC finds non-reachable objects and adds them to the freelist
if the freelist is still empty (all slots were in use)
another heap is allocated
all the slots on the new heap are added to the freelist
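The freelist steps above can be sketched as a toy model in plain Ruby. This is only an illustration, not MRI's implementation: ToyHeap, Slot, new_object and the rooted flag are hypothetical names, objects hold no references to each other, and "reachable" is reduced to a per-slot flag.

```ruby
# Toy model of slabs of fixed-size slots threaded onto a freelist.
# All names are hypothetical; the real logic lives in gc.c.
class ToyHeap
  Slot = Struct.new(:flags, :next_free, :rooted)

  attr_reader :gc_runs

  def initialize(slots_per_slab)
    @slots_per_slab = slots_per_slab
    @slabs = []
    @freelist = nil
    @gc_runs = 0
    add_slab
  end

  # Mirrors rb_newobj: run GC if the freelist is empty, grow if still empty.
  def new_object(rooted: false)
    garbage_collect if @freelist.nil?
    add_slab if @freelist.nil?
    slot = @freelist
    @freelist = slot.next_free
    slot.flags = 1            # non-zero flags means the slot is in use
    slot.next_free = nil
    slot.rooted = rooted
    slot
  end

  def slab_count
    @slabs.size
  end

  def free_slots
    count = 0
    slot = @freelist
    while slot
      count += 1
      slot = slot.next_free
    end
    count
  end

  private

  # Mirrors add_freelist: push the slot on top of the freelist.
  def add_freelist(slot)
    slot.flags = 0
    slot.rooted = false
    slot.next_free = @freelist
    @freelist = slot
  end

  def add_slab
    slab = Array.new(@slots_per_slab) { Slot.new(0, nil, false) }
    @slabs << slab
    slab.each { |slot| add_freelist(slot) }
  end

  # Simplification: a used slot survives only if it is flagged as rooted.
  def garbage_collect
    @gc_runs += 1
    @slabs.each do |slab|
      slab.each do |slot|
        add_freelist(slot) if slot.flags != 0 && !slot.rooted
      end
    end
  end
end

heap = ToyHeap.new(4)
4.times { heap.new_object(rooted: true) }  # fill the first slab with live objects
heap.new_object                            # GC frees nothing, so a second slab is added
puts "gc runs: #{heap.gc_runs}, slabs: #{heap.slab_count}, free: #{heap.free_slots}"
```

In this run the fifth allocation triggers one GC pass that reclaims nothing, which is the case where another slab has to be allocated.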
but what’s inside these slots...?
MRI Heap slots are RVALUEs
    typedef struct RVALUE {
        union {
            struct {
                unsigned long flags;
                struct RVALUE *next;
            } free;
            struct RBasic  basic;
            struct RObject object;
            struct RClass  klass;
            struct RFloat  flonum;
            struct RString string;
            struct RArray  array;
            struct RRegexp regexp;
            struct RHash   hash;
            struct RData   data;
            struct RStruct rstruct;
            struct RBignum bignum;
            struct RFile   file;
            struct RNode   node;
            struct RMatch  match;
            struct RVarmap varmap;
            struct SCOPE   scope;
        } as;
    } RVALUE;
can be one of many different types of ruby objects (uses a C union)
union is called as, so you can do
obj->as.string
union contains free section for unused slots
obj->as.free.next points to the next free slot for the freelist
RBasic is a basic ruby object
    struct RBasic {
        unsigned long flags;
        VALUE klass;
    };
all objects have flags
flags == 0 means unused slot
flags contains information about the type of object (T_STRING, T_FLOAT, etc)
    #define T_NONE   0x00

    #define T_NIL    0x01
    #define T_OBJECT 0x02
    #define T_CLASS  0x03
    #define T_ICLASS 0x04
    #define T_MODULE 0x05
    #define T_FLOAT  0x06
    #define T_STRING 0x07
    #define T_REGEXP 0x08
    #define T_ARRAY  0x09
    #define T_FIXNUM 0x0a
    #define T_HASH   0x0b
    #define T_STRUCT 0x0c
    #define T_BIGNUM 0x0d
    #define T_FILE   0x0e

    #define T_TRUE   0x20
    #define T_FALSE  0x21
    #define T_DATA   0x22
    #define T_MATCH  0x23
    #define T_SYMBOL 0x24

    #define T_BLKTAG 0x3b
    #define T_UNDEF  0x3c
    #define T_VARMAP 0x3d
    #define T_SCOPE  0x3e
    #define T_NODE   0x3f
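As a rough illustration of how a flags word maps to a type, the sketch below decodes a few of the tags listed above. It assumes the type lives in the low six bits of flags (a mask of 0x3f, chosen here because the largest tag, T_NODE, is 0x3f); the helper names slot_type and free_slot? are made up for this example.

```ruby
# Assumption: the type tag occupies the low six bits of flags.
T_MASK = 0x3f

# A subset of the tags from the table above.
TYPE_NAMES = {
  0x02 => :T_OBJECT,
  0x03 => :T_CLASS,
  0x06 => :T_FLOAT,
  0x07 => :T_STRING,
  0x09 => :T_ARRAY,
  0x0b => :T_HASH,
  0x3f => :T_NODE
}.freeze

# flags == 0 marks an unused slot on the freelist.
def free_slot?(flags)
  flags == 0
end

# Mask off the higher bits and look up the type tag.
def slot_type(flags)
  return :free if free_slot?(flags)
  TYPE_NAMES.fetch(flags & T_MASK, :unknown)
end

# 0x100 stands in for some unrelated higher flag bit (hypothetical).
puts slot_type(0x07 | 0x100)
puts slot_type(0)
```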
RString is for String
    struct RString {
        struct RBasic basic;
        long len;
        char *ptr;
        union {
            long capa;
            VALUE shared;
        } aux;
    };
if obj->as.basic.flags contains T_STRING, you can interpret the slot as a RString
RString “extends” RBasic by including it and adding additional fields
slot for ruby object is fixed width, but obj->as.string.ptr points to variable sized memory on the heap holding the actual string data
strings can also point to another obj->as.string.aux.shared object instead of making a copy of the string data
10.times{"abc"} will use up 10 slots on the ruby heap, but they’ll all point to the same string “abc” on the heap
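The slot side of this can be observed from Ruby itself. The sketch below assumes string literals are not frozen (no frozen_string_literal setting in effect); whether the underlying character data is shared is an MRI internal and is not visible at this level.

```ruby
# Evaluate the same literal ten times and keep every result alive.
strings = Array.new(10) { "abc" }

# Each evaluation produced a separate object (a separate heap slot)...
distinct_objects = strings.map(&:object_id).uniq.size

# ...even though all ten have identical contents.
distinct_contents = strings.uniq.size

puts "objects: #{distinct_objects}, contents: #{distinct_contents}"
```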
RClass is for Class/Module
modules are just classes as far as MRI is concerned
classes contain a m_tbl
contains pointers to method bodies
has a super class which is used in method lookup
also contain an iv_tbl
actually holds instance vars, class vars and constants
    struct RClass {
        struct RBasic basic;
        struct st_table *iv_tbl;
        struct st_table *m_tbl;
        VALUE super;
    };
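Method lookup through the super chain can be approximated from Ruby with ancestors, which lists a class followed by its included modules and superclasses in lookup order. This is a sketch of the idea, not MRI's lookup code; find_method_owner and the Base, Loud and Child classes are hypothetical.

```ruby
# Walk the lookup chain and return the first class or module whose own
# method table defines the given method name.
def find_method_owner(klass, name)
  klass.ancestors.find do |k|
    k.instance_methods(false).include?(name) ||
      k.private_instance_methods(false).include?(name)
  end
end

class Base
  def greet
    "hello"
  end
end

module Loud
  def shout
    "HELLO"
  end
end

class Child < Base
  include Loud
end

p find_method_owner(Child, :greet)
p find_method_owner(Child, :shout)
p find_method_owner(Child, :no_such_method)
```

The included module sits between Child and Base in the chain, which matches the point that modules are treated like classes during lookup.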
RNode is for your code
ruby code is stored on the heap like any other object
allows code to be dynamically added and removed at runtime
MRI has over 130 different types of nodes
including a NODE_NEWLINE for newline or semicolon in your codebase
nodes point to other objects created during code parse
a literal in your code creates a NODE_LIT that points to a RString/RFloat/RRegexp
strings are special: new slot with shared pointer used every time a string is evaluated
floats/regexp/etc are created only once upfront during parse and reused during evaluation
    enum node_type {
        NODE_METHOD, NODE_FBODY, NODE_CFUNC, NODE_SCOPE, NODE_BLOCK,
        NODE_IF, NODE_CASE, NODE_WHEN, NODE_WHILE, NODE_UNTIL,
        NODE_ITER, NODE_FOR, NODE_BREAK, NODE_NEXT, NODE_HASH,
        NODE_RETURN, NODE_STR, NODE_SPLAT, NODE_TO_ARY, NODE_CLASS,
        NODE_MODULE, NODE_SELF, NODE_NIL, NODE_TRUE, NODE_FALSE,
        NODE_DEFINED, NODE_NEWLINE, ...
    };
And many more...
For details about hashes, arrays, floats, blocks, fixnums, symbols and many other types of ruby objects in MRI, see:
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/
...so how are all these objects garbage collected?
Finding Garbage
MRI uses a conservative, stop the world, mark and sweep garbage collector.
conservative
MRI has a conservative GC
Raw pointers are handed to C extensions
When scanning the Ruby process stack it must assume that anything that looks like a pointer to a Ruby object is a pointer to a Ruby object
stop the world
MRI’s GC stops the world
MRI uses a “big hammer” to put the system into a quiescent state
No Ruby code can run while GC is running
mark and sweep
MRI’s GC is a naïve mark and sweep collector
The collection cycle is broken up into a mark phase and a sweep phase
All objects still in use are marked
Any unmarked objects are swept away
The mark phase
During the mark phase, the garbage collector walks the entire object graph
starts at the root objects:
global variables, top level constants, threads, etc
follows all references recursively
Since raw pointers are handed out, GC needs to examine everything:
CPU registers, program stack, and thread stacks
The sweep phase
The sweep phase is pretty simple
Walk the Ruby heap and add unmarked objects to the freelist
Reset the mark flag for the next GC run
Must iterate over every slot on each slab of the heap
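The two phases can be sketched on a small object graph. This is a simplified model, assuming objects are plain Ruby instances with an explicit list of references; Obj, mark and collect are hypothetical names, and the conservative scan of registers and stacks is left out.

```ruby
# A toy object with outgoing references and a mark flag.
class Obj
  attr_reader :name, :refs
  attr_accessor :marked

  def initialize(name)
    @name = name
    @refs = []
    @marked = false
  end
end

# Mark phase: follow references recursively, skipping anything already
# marked so that cycles terminate.
def mark(obj)
  return if obj.marked
  obj.marked = true
  obj.refs.each { |child| mark(child) }
end

# Sweep phase: walk every slot, collect the unmarked objects, and reset
# the mark flag on survivors for the next run.
def collect(roots, heap)
  roots.each { |root| mark(root) }
  live, garbage = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }
  [live, garbage]
end

a, b, c, d = %w[a b c d].map { |name| Obj.new(name) }
a.refs << b
b.refs << a   # cycle between a and b
c.refs << d   # c and d reference each other's subtree but nothing reaches them

live, garbage = collect([a], [a, b, c, d])
puts "live: #{live.map(&:name).join(',')}  garbage: #{garbage.map(&:name).join(',')}"
```

Note that the sweep touches all four objects even though only two are live, which is the cost described above: every slot on every slab is visited on each run.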
MRI’s GC tradeoffs
MRI’s GC is very simple
The implementation is relatively short and straightforward
however, the simple design of the system makes more advanced GC techniques difficult or impossible to implement without breaking compatibility with C extensions
Alternative Approaches
GC Algorithms
precise
incremental / concurrent
tri-color
generational
copying
Memory Management
explicit (malloc/free)
reference counting (python/perl/php)
Tri-Color GC
Tri-Color
Put objects into 3 different groups (colors)
Objects are moved from group to group as they are scanned by the GC
GC can free the recyclable group
Avoids walking the entire object tree and Ruby heap each GC cycle.
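A minimal sketch of the three groups, assuming the usual naming: white for objects not yet proven reachable, gray for reachable objects whose references have not been scanned, black for fully scanned objects. It runs to completion in one go, so it shows only the bookkeeping and not the incremental scheduling; tri_color_collect and the symbol-based graph are hypothetical.

```ruby
require "set"

# refs maps each object to the objects it references.
def tri_color_collect(roots, all_objects, refs)
  white = Set.new(all_objects) - roots   # candidates for recycling
  gray  = roots.dup                      # reachable, not yet scanned
  black = Set.new                        # reachable and fully scanned

  until gray.empty?
    obj = gray.pop
    black << obj
    refs.fetch(obj, []).each do |child|
      # Only objects that are still white move to gray.
      gray << child if white.delete?(child)
    end
  end

  { live: black, recyclable: white }
end

refs = { a: [:b], b: [:c, :a], d: [:e] }
result = tri_color_collect([:a], [:a, :b, :c, :d, :e], refs)
puts "live: #{result[:live].to_a.inspect}"
puts "recyclable: #{result[:recyclable].to_a.inspect}"
```

Whatever is still white when the gray group is empty is the recyclable group.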
Moving/Copying GC
Used in conjunction with tri-color
Moving GC relocates reachable objects
Once an entire memory region (a slab) has no reachable objects left, the entire region is freed
Makes other lower level optimizations possible
But MRI can’t move objects
To get the full benefit of tri-color, objects need to be moved
MRI can’t move objects because objects are raw C pointers
Can use write barriers, but not without either:
breaking binary compatibility
being really slow
Generational GC
Generational GC algorithms split objects into groups based on their age
The key axiom of this algorithm is:
Freshly hatched objects are more likely to become garbage than older objects that have been around for a while
Any of the younger objects that are referenced by an older object can get promoted to an older group
The younger group can then be destroyed
MRI can fake Generational
The “long life GC” patch attempts to do something similar
“long life GC” moves RNode objects onto a separate heap so they are not scanned in each mark and free cycle
makes a big difference since code is a large part of the Ruby heap
still can’t take full advantage because it can’t move objects between generations
...fixing the GC is hard, can we make mark/sweep faster?
Tuning the GC
Ruby Enterprise Edition contains a GC tuning patch
We use:
RUBY_GC_MALLOC_LIMIT=60000000
RUBY_HEAP_MIN_SLOTS=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_HEAP_SLOTS_INCREMENT=1
malloc_limit = 60MB
force garbage collection after every N bytes worth of calls to malloc or realloc
defaults to 8MB
high traffic ruby servers can easily allocate and free more than 8MB in a single request
gc.c’s ruby_xmalloc wrapper used by internal objects such as String, Array and Hash
    void *
    ruby_xmalloc(size)
        long size;
    {
        void *mem;

        if (malloced > malloc_limit)
            garbage_collect();

        mem = malloc(size);
        malloced += size;

        return mem;
    }
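To get a feel for what the limit means, the sketch below replays a stream of allocation sizes against the check in ruby_xmalloc and counts how often GC would be forced. It assumes the counter is reset to zero by each collection, and the workload (100 allocations of 1MB in one request) is invented for illustration; gc_runs_for is a hypothetical helper.

```ruby
# Count forced collections for a list of allocation sizes (in bytes).
def gc_runs_for(allocation_sizes, malloc_limit)
  malloced = 0
  runs = 0
  allocation_sizes.each do |size|
    if malloced > malloc_limit
      runs += 1
      malloced = 0   # assumption: the collector resets the counter
    end
    malloced += size
  end
  runs
end

MB = 1_000_000
request = Array.new(100) { 1 * MB }   # hypothetical request allocating 100MB

puts "8MB limit:  #{gc_runs_for(request, 8 * MB)} GC runs"
puts "60MB limit: #{gc_runs_for(request, 60 * MB)} GC runs"
```

Under these assumptions the default limit forces a collection roughly every 9MB, while the 60MB setting lets the same request finish with a single run.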
HEAP_MIN_SLOTS = 500k
defaults to 10k
number of slots in the first slab
a new rails app boots up with almost 500k objects on the heap (mostly nodes)
    (gdb) ruby objects nodes
        20996 NODE_CONST
        21620 NODE_SCOPE
        26329 NODE_LASGN
        26747 NODE_STR
        33178 NODE_METHOD
        40678 NODE_LIT
        79046 NODE_LVAR
        90646 NODE_NEWLINE
        95758 NODE_BLOCK
       107357 NODE_CALL
       150298 NODE_ARRAY
HEAP_SLOTS_GROWTH = 1
defaults to 1.8x
each new slab is almost twice as big as the last
normal growth:
10k
10k + 18k = 28k
10k + 18k + 32.4k = 60.4k
tuned growth:
500k
500k + 500k = 1M
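The growth arithmetic can be worked through directly. The sketch assumes each new slab is the previous slab's size multiplied by the growth factor, which is how the 1.8x default is described above; heap_growth is a hypothetical helper, not part of any Ruby API.

```ruby
# Return the size of each slab when every new slab is `factor` times
# the size of the previous one.
def heap_growth(first_slab, factor, slab_count)
  sizes = [first_slab]
  (slab_count - 1).times { sizes << (sizes.last * factor).round }
  sizes
end

default_slabs = heap_growth(10_000, 1.8, 3)
tuned_slabs   = heap_growth(500_000, 1, 2)

puts "default: #{default_slabs.inspect} total #{default_slabs.sum}"
puts "tuned:   #{tuned_slabs.inspect} total #{tuned_slabs.sum}"
```

With a factor of 1 every slab is the same size as the first, so a heap that starts at 500k slots grows in predictable 500k steps instead of ever larger jumps.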
...do I need to tune my GC?
Measuring GC performance
You can use ltrace to measure (among other things) GC performance
The system’s ltrace will work, but the output is noisy
git://github.com/ice799/ltrace.git
flags to quiet the output
libdl support
backtrace support
and more.
    ltrace -F ltrace.conf -ttTg -x garbage_collect ruby gc.rb
    15:39:22.637185 garbage_collect() = <void> <0.002420>
    15:39:22.650797 garbage_collect() = <void> <0.005480>
    15:39:22.677607 garbage_collect() = <void> <0.012134>
    15:39:22.729645 garbage_collect() = <void> <0.024849>
    15:39:22.828402 garbage_collect() = <void> <0.048067>
    15:39:23.007304 garbage_collect() = <void> <0.089344>
    15:39:23.339801 garbage_collect() = <void> <0.163595>
    15:39:23.929944 garbage_collect() = <void> <0.297686>
GC can get pretty slow, even after tuning...
...so let’s reduce the # of objects to mark and sweep
Ruby memory leaks
Not your classic memory leak
classic memory leak: call malloc, but never call free
These are reference leaks
object A holds references to objects B and C
the result is that objects B and C (and their data) are never freed
As long as anyone is holding a reference to an object, that object can not be freed
This dependency recurses
The leaked reference may be an object holding refs to other objects, which hold references to other objects, which hold ...
Ruby reference leaks
As long as someone, somewhere is holding a reference to this instance of classA, all the objects in this picture can not be freed
This could add up to a lot of memory very fast
How can we track down these reference leaks?
gdb.rb: gdb hooks for REE
http://github.com/tmm1/gdb.rb
attach to a running REE process and inspect the heap
number of nodes by type
number of objects by class
number of strings by content
number of arrays/hash by size
uses gdb7 + python scripting
linux only
    (gdb) ruby objects strings
          140 u 'lib'
          158 u '0'
          294 u '\n'
          619 u ''
        30503 unique strings
      3187435 bytes
    (gdb) ruby objects
      HEAPS 8
      SLOTS 1686252
      LIVE 893327 (52.98%)
      FREE 792925 (47.02%)

      scope 1641 (0.18%)
      regexp 2255 (0.25%)
      data 3539 (0.40%)
      class 3680 (0.41%)
      hash 6196 (0.69%)
      object 8785 (0.98%)
      array 13850 (1.55%)
      string 105350 (11.79%)
      node 742346 (83.10%)
fixing a leak in rails_warden
    (gdb) ruby objects classes
      1197 MIME::Type
      2657 NewRelic::MetricSpec
      2719 TZInfo::TimezoneTransitionInfo
      4124 Warden::Manager
      4124 MethodOverrideForAll
      4124 AccountMiddleware
      4124 Rack::Cookies
      4125 ActiveRecord::ConnectionAdapters::ConnectionManagement
      4125 ActionController::Session::CookieStore
      4125 ActionController::Failsafe
      4125 ActionController::ParamsParser
      4125 Rack::Lock
      4125 ActionController::Dispatcher
      4125 ActiveRecord::QueryCache
      4125 ActiveSupport::MessageVerifier
      4125 Rack::Head
middleware chain leaking per request
god memory leaks
    (gdb) ruby objects arrays
      elements instances
      94310 3
      94311 3
      94314 2
      94316 1
      5369 arrays
      2863364 member elements
arrays with 94k+ elements!
    (gdb) ruby objects classes
      43 God::Process
      43 God::Watch
      43 God::Driver
      43 God::DriverEventQueue
      43 God::Conditions::MemoryUsage
      43 God::Conditions::ProcessRunning
      43 God::Behaviors::CleanPidFile
      45 Process::Status
      86 God::Metric
      327 God::System::SlashProcPoller
      327 God::System::Process
      406 God::DriverEvent
useful, but you can’t tell where the objects came from...
bleak_house
http://github.com/fauna/bleak_house
installs a custom patched ruby
enables GC_DEBUG to track file/line in rb_newobj
increases size of RVALUE slots by 16 bytes
better than gdb.rb: you can see where the leaking object was allocated
but, can’t run it in production without overhead
    191691 total objects
    Final heap size 191691 filled, 220961 free
    Displaying top 20 most common line/class pairs
    89513 __null__:__null__:__node__
    41438 __null__:__null__:String
    2348 site_ruby/1.8/rubygems/specification.rb:557:Array
    1508 gems/1.8/specifications/gettext-1.9.gemspec:14:String
memprof
git://github.com/ice799/memprof.git
replacement for gdb.rb and bleak_house
requires no patches to the ruby VM
simply gem install and require ‘memprof’
well, not yet; still a work in progress
mostly works on x86_64 linux
almost works on ruby 1.9
kind of works on osx
memprof under the hood
rewrites your Ruby binary in memory (while it’s running)
injects short trampolines for all calls to rb_newobj() and add_freelist() to do tracking
uses libdwarf and libelf to access VM internals like the ruby heap slabs
uses libyajl to dump out ruby objects as json
http://timetobleed.com/string-together-global-offset-tables-to-build-a-ruby-memory-profiler/
http://timetobleed.com/memprof-a-ruby-level-memory-profiler/
http://timetobleed.com/what-is-a-ruby-object-introducing-memprof-dump/
http://timetobleed.com/hot-patching-inlined-functions-with-x86_64-asm-metaprogramming/
http://timetobleed.com/rewrite-your-ruby-vm-at-runtime-to-hot-patch-useful-features/
plugging a leak in rails3
in dev mode, rails3 is leaking 10MB per request
let’s use memprof to find it!
    # in environment.rb
    require 'memprof'
    Memprof.start
    trap('USR2'){
      pid = Process.pid
      fork{ # fork to prevent blocking the app
        Memprof.dump_all("#{pid}-#{Time.now.to_i}.json")
        exit!
      }
    }
send the app some requests so it leaks
    $ ab -c 1 -n 50 http://localhost:3000/
tell memprof to dump out the entire heap to json
    $ kill -USR2 3372
import the heap dump to mongodb
    $ mongoimport -h localhost -d memprof --drop -c rails --file 3372-1266658113.json
connect to mongo
    $ mongo localhost/memprof
count the number of objects
    > db.rails.count()
    809816
find files with the most objects
    > db.rails.group({
        key: {file: true},
        initial: {count: 0},
        reduce: function(d, o){ o.count++ }
      })
application_controller.rb is leaking... let’s find that class
    > db.rails.find({type:"class",name:"ApplicationController"}).count()
    50
aha! one ApplicationController leaked per request
is it just ApplicationController?
    > db.rails.find({type:"class",name:/Controller$/}).count()
    250
nope!
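The group-by-file step can also be approximated without mongo. The sketch below assumes the dump holds one JSON object per line with a "file" key, as the records shown in this deck suggest; the sample lines and the objects_per_file helper are invented for illustration and are not memprof output.

```ruby
require "json"

# Hypothetical sample records in the shape of the dump lines shown in this deck.
dump_lines = [
  '{"_id":"0x1","file":"app/controllers/application_controller.rb","line":1,"type":"class"}',
  '{"_id":"0x2","file":"app/controllers/application_controller.rb","line":1,"type":"class"}',
  '{"_id":"0x3","file":"lib/util.rb","line":9,"type":"string"}',
  '{"_id":"0x4","type":"hash","length":21}'
]

# Count objects per source file, most frequent first.
# Records without a "file" key are grouped under nil.
def objects_per_file(lines)
  counts = Hash.new(0)
  lines.each do |line|
    record = JSON.parse(line)
    counts[record["file"]] += 1
  end
  counts.sort_by { |_file, count| -count }
end

objects_per_file(dump_lines).each do |file, count|
  puts "#{count} #{file.inspect}"
end
```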
find one of the leaked controllers
    > db.rails.findOne({type:"class",name:"AccountsController"})._id
    0x3b56780
find out what’s referencing it
    $ grep 0x3b56780 3372-1266658113.json
    {"_id":"0x4a8e6d0","file":"actionpack-3.0.0.beta/lib/abstract_controller/localized_cache.rb","line":3,"type":"hash","length":21}
    {"_id":"0x4c78540","file":"actionpack-3.0.0.beta/lib/action_controller/metal.rb","line":74,"type":"hash","length":21}
    {"_id":"0x29be3b0","type":"hash","length":21}
figure out what the third leak is
    $ grep 0x29be3b0 3372-1266658113.json
    {"type":"class","name":"ActionView::Partials::PartialRenderer","ivars":{"PARTIAL_NAMES":"0x29be3b0"}}
first two are leaks!
    module ActionController
      class Metal < AbstractController::Base
        class ActionEndpoint
          @@endpoints = Hash.new {|h,k| h[k] = Hash.new {|sh,sk| sh[sk] = {} } }

    module AbstractController
      class HashKey
        @hash_keys = Hash.new {|h,k| h[k] = Hash.new {|sh,sk| sh[sk] = {} } }
dev mode enables source reload, but globals are holding refs to old controllers!
    module ActionView
      module Partials
        class PartialRenderer
          PARTIAL_NAMES = Hash.new {|h,k| h[k] = {} }
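The shape of this leak is easy to reproduce in isolation: a long-lived hash keyed by class keeps every old class alive when the class is redefined on each request. The sketch below is a stand-in, not Rails code; ToyFramework, endpoint_for and reload_controller are hypothetical names, and Class.new plays the role of reloading a controller's source.

```ruby
module ToyFramework
  # Long-lived cache keyed by controller class, like the hashes above.
  ENDPOINTS = Hash.new { |hash, klass| hash[klass] = {} }

  def self.endpoint_for(klass, action)
    ENDPOINTS[klass][action] ||= "#{action} endpoint"
  end
end

# Stands in for dev-mode reloading: every request sees a brand new class.
def reload_controller
  Class.new
end

5.times do
  controller = reload_controller
  ToyFramework.endpoint_for(controller, :index)
end

# Five requests, five cached classes: none of the old ones can be collected.
puts ToyFramework::ENDPOINTS.size
```

Because the constant is reachable from a root, every key it holds stays marked on each GC run, along with everything those classes reference.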
memprof
still a long and manual process, but memprof provides all the data to make debugging memory issues possible
coming soon: memprof.com
a web-based heap visualizer and leak analyzer
as a user, you simply:
gem install memprof
memprof MY_RAILS_APP_PID
visit http://memprof.com/c4e4d3eb0e18
see line numbers where your app is leaking
Questions?
Joe Damato
@joedamato
timetobleed.com
Aman Gupta
@tmm1
github.com/tmm1