Debugging Ruby with MongoDB Aman Gupta @tmm1

Page 1: Debugging Ruby with MongoDB

Debugging Ruby with MongoDB

Aman Gupta @tmm1

Page 2: Debugging Ruby with MongoDB

Ruby developers know...

Page 3: Debugging Ruby with MongoDB

Ruby is

fatboyke (flickr)

Page 4: Debugging Ruby with MongoDB

Ruby loves eating RAM

37prime (flickr)

Page 5: Debugging Ruby with MongoDB

ruby allocates memory from the OS

memory is broken up into slots

each slot holds one ruby object

Page 6: Debugging Ruby with MongoDB

a linked list called the ‘freelist’ points to all the empty slots on the ruby heap

when you need an object, it’s pulled off the freelist

Page 14: Debugging Ruby with MongoDB

a linked list called the ‘freelist’ points to all the empty slots on the ruby heap

when you need an object, it’s pulled off the freelist

if the freelist is empty, GC is run

GC finds non-reachable objects and adds them to the freelist

if the freelist is still empty (all slots were in use), another heap is allocated

all the slots on the new heap are added to the freelist
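
The allocation steps above can be sketched as a toy model in plain Ruby. This is only an illustration of the freelist idea, not MRI's actual C implementation: `ToyHeap`, `SLOTS_PER_HEAP`, the `keep:` flag and the simplified "GC" (anything not recorded as a root is treated as unreachable) are all assumptions made for the example.

```ruby
# Toy model of the allocation scheme described above. The names
# (ToyHeap, SLOTS_PER_HEAP, allocate) are illustrative assumptions,
# not MRI internals.
class ToyHeap
  SLOTS_PER_HEAP = 4

  attr_reader :heaps, :gc_runs

  def initialize
    @slots    = []   # every slot across all heaps; nil means empty
    @freelist = []   # indexes of empty slots
    @roots    = []   # slot indexes the program still references
    @heaps    = 0
    @gc_runs  = 0
    add_heap
  end

  # Pull a slot off the freelist; run GC, then grow, if it is empty.
  def allocate(value, keep: true)
    run_gc   if @freelist.empty?
    add_heap if @freelist.empty?   # still empty: all slots were in use
    index = @freelist.shift
    @slots[index] = value
    @roots << index if keep
    index
  end

  def free_slots
    @freelist.size
  end

  private

  # Allocate another heap and put all of its slots on the freelist.
  def add_heap
    first = @slots.size
    @slots.concat(Array.new(SLOTS_PER_HEAP))
    @freelist.concat((first...(first + SLOTS_PER_HEAP)).to_a)
    @heaps += 1
  end

  # Stand-in for GC: any occupied slot that is not a root is unreachable.
  def run_gc
    @gc_runs += 1
    @slots.each_index do |i|
      next if @slots[i].nil? || @roots.include?(i)
      @slots[i] = nil
      @freelist << i
    end
  end
end

heap = ToyHeap.new
4.times { |i| heap.allocate("garbage #{i}", keep: false) }
heap.allocate("live")   # freelist is empty, so GC runs and frees the 4 garbage slots
puts "heaps=#{heap.heaps} gc_runs=#{heap.gc_runs} free=#{heap.free_slots}"
```

In this sketch a new heap is only added when a GC run frees nothing, which mirrors the order of the steps on the slide.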

Page 15: Debugging Ruby with MongoDB

turns out, Ruby’s GC is also one of the reasons it can be so slow

antphotos (flickr)

Page 16: Debugging Ruby with MongoDB

Matz’ Ruby Interpreter (MRI 1.8) has a...

john_lam (flickr)

Page 17: Debugging Ruby with MongoDB

Conservative

lifeisaprayer (flickr)

Page 18: Debugging Ruby with MongoDB

Stop the World

benimoto (flickr)

Page 19: Debugging Ruby with MongoDB

Mark and Sweep

michaelgoodin (flickr)

Page 20: Debugging Ruby with MongoDB

Garbage Collector

kiksbalayon (flickr)

Page 21: Debugging Ruby with MongoDB

• conservative: the VM hands out raw pointers to ruby objects

• stop the world: no ruby code can execute during GC

• mark and sweep: mark all objects in use, sweep away unmarked objects
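
As a rough illustration of the mark and sweep idea, here is a toy version over a hand-built object graph. `Obj`, `mark` and `sweep` are names invented for this sketch; the real collector works in C over heap slots and, being conservative, also scans the stack for anything that looks like a pointer.

```ruby
# Toy mark-and-sweep over a hand-built object graph. Obj, mark and
# sweep are illustrative names, not MRI internals.
Obj = Struct.new(:name, :refs, :marked)

# Mark phase: follow references recursively from each root.
def mark(obj)
  return if obj.marked            # already visited (also stops cycles)
  obj.marked = true
  obj.refs.each { |child| mark(child) }
end

# Sweep phase: unmarked objects are garbage; survivors are unmarked
# again so the next cycle starts clean.
def sweep(heap)
  live, garbage = heap.partition(&:marked)
  live.each { |obj| obj.marked = false }
  [live, garbage]
end

a = Obj.new("a", [], false)
b = Obj.new("b", [a], false)
c = Obj.new("c", [], false)       # nothing points at c
d = Obj.new("d", [b], false)
b.refs << d                       # cycle between b and d

heap  = [a, b, c, d]
roots = [d]

# No ruby code runs between these two steps: that is the "stop the world" part.
roots.each { |root| mark(root) }
live, garbage = sweep(heap)

puts "live:    #{live.map(&:name).join(', ')}"
puts "garbage: #{garbage.map(&:name).join(', ')}"
```

Both phases touch every object on the heap, which is why the cost of a GC run grows with the number of objects.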

Page 22: Debugging Ruby with MongoDB

more objects = longer GC

mckaysavage (flickr)

Page 23: Debugging Ruby with MongoDB

longer GC = less time to run your ruby code

kgrocki (flickr)

Page 24: Debugging Ruby with MongoDB

fewer objects = better performance

januskohl (flickr)

Page 26: Debugging Ruby with MongoDB

improve performance

1. remove unnecessary object allocations

   object allocations are not free

2. avoid leaked references

   not really memory ‘leaks’: you’re holding a reference to an object you no longer need. GC sees the reference, so it keeps the object around
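
A minimal sketch of such a leaked reference, using only core Ruby. `Report`, `REPORT_CACHE` and `handle_request` are hypothetical names for illustration; the exact count printed after clearing the cache depends on the Ruby version and on the collector, so no particular number is claimed here.

```ruby
# Sketch of a "leaked" reference. Report and REPORT_CACHE are
# hypothetical names for illustration.
class Report
  def initialize(id)
    @rows = Array.new(100) { |i| "row #{id}-#{i}" }
  end
end

REPORT_CACHE = []   # long-lived container that is never pruned

def handle_request(id)
  report = Report.new(id)
  REPORT_CACHE << report   # the reference that keeps every report alive
  :ok
end

1_000.times { |id| handle_request(id) }

GC.start
held = ObjectSpace.each_object(Report).count
puts "Report objects still reachable after GC: #{held}"

REPORT_CACHE.clear         # drop the references...
GC.start                   # ...so the next GC is free to reclaim them
after = ObjectSpace.each_object(Report).count
puts "Report objects after clearing the cache: #{after}"
```

Each retained `Report` also keeps its 100 row strings alive, so one forgotten reference can hold on to many objects.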

Page 27: Debugging Ruby with MongoDB

the GC follows references recursively, so a reference to classA will ‘leak’ all these objects

Page 28: Debugging Ruby with MongoDB

let’s build a debugger

• step 1: collect data

  • list of all ruby objects in memory

• step 2: analyze data

  • group by type

  • group by file/line

Page 29: Debugging Ruby with MongoDB

version 1: collect data

• simple patch to ruby VM (300 lines of C)

• http://gist.github.com/73674

• simple text based output format

0x154750 @ -e:1 is OBJECT of type: T
0x15476c @ -e:1 is HASH which has data
0x154788 @ -e:1 is ARRAY of len: 0
0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
0x1547dc @ -e:1 is STRING len: 1 and val: T
0x154814 @ -e:1 is CLASS named: T inherits from Object
0x154a98 @ -e:1 is STRING len: 2 and val: hi
0x154b40 @ -e:1 is OBJECT of type: Range

Page 32: Debugging Ruby with MongoDB

version 1: analyze data

$ wc -l /tmp/ruby.heap
 1571529 /tmp/ruby.heap

$ cat /tmp/ruby.heap | awk '{ print $3 }' | sort | uniq -c | sort -g | tail -1
 236840 memcached/memcached.rb:316

$ grep "memcached.rb:316" /tmp/ruby.heap | awk '{ print $5 }' | sort | uniq -c | sort -g | tail -5
   10948 ARRAY
   20355 OBJECT
   30744 DATA
   64952 HASH
  123290 STRING
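
For comparison, the same kind of tally can be sketched in Ruby. To keep the example self-contained it runs on the eight sample lines from the output-format slide rather than on /tmp/ruby.heap; `SAMPLE_DUMP` and `tally_field` are names made up for this sketch.

```ruby
# Ruby equivalent of the awk/sort/uniq pipelines, run on the sample
# lines from the "collect data" slide instead of /tmp/ruby.heap.
SAMPLE_DUMP = <<~HEAP
  0x154750 @ -e:1 is OBJECT of type: T
  0x15476c @ -e:1 is HASH which has data
  0x154788 @ -e:1 is ARRAY of len: 0
  0x1547c0 @ -e:1 is STRING (SHARED) len: 2 and val: hi
  0x1547dc @ -e:1 is STRING len: 1 and val: T
  0x154814 @ -e:1 is CLASS named: T inherits from Object
  0x154a98 @ -e:1 is STRING len: 2 and val: hi
  0x154b40 @ -e:1 is OBJECT of type: Range
HEAP

# Count occurrences of one whitespace-separated field (0-based index),
# returning [value, count] pairs sorted by ascending count like `sort -g`.
def tally_field(lines, index)
  counts = Hash.new(0)
  lines.each { |line| counts[line.split[index]] += 1 }
  counts.sort_by { |_value, count| count }
end

lines = SAMPLE_DUMP.lines.map(&:chomp)

by_location = tally_field(lines, 2)   # awk '{ print $3 }'
by_type     = tally_field(lines, 4)   # awk '{ print $5 }'

puts "total objects: #{lines.size}"
puts "busiest location: #{by_location.last.inspect}"
by_type.each { |type, count| puts format("%8d %s", count, type) }
```

Anything beyond counting one field at a time (for example, following references between objects) is awkward with this line format, which is the limitation the next slides address.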

Page 33: Debugging Ruby with MongoDB

version 1

• it works!

• but...

• must patch and rebuild ruby binary

• no information about references between objects

• limited analysis via shell scripting

Page 34: Debugging Ruby with MongoDB

version 2 goals

• better data format

  • simple: one line of text per object

  • expressive: include all details about object contents and references

  • easy to use: easy to generate from C code & easy to consume from various scripting languages

Page 35: Debugging Ruby with MongoDB

equanimity (flickr)

Page 36: Debugging Ruby with MongoDB

version 2 is memprof

• no patches to ruby necessary

• gem install memprof

• require 'memprof'

• Memprof.dump_all("/tmp/app.json")

• C extension for MRI ruby VM: http://github.com/ice799/memprof

• uses libyajl to dump out all ruby objects as json

Page 37: Debugging Ruby with MongoDB

strings

Memprof.dump{ "hello" + "world" }

{ "_id": "0x19c610",
  "file": "file.rb", "line": 2,
  "type": "string", "class": "0x1ba7f0", "class_name": "String",
  "length": 10, "data": "helloworld" }

_id: memory address of object

file, line: file and line where string was created

class: address of the class “String”

length, data: length and contents of this string instance
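
Because each object is plain JSON, a dump record can be consumed with nothing but Ruby's standard `json` library. The sketch below parses the string record shown on this slide; it does not call memprof itself, and the variable names are only illustrative.

```ruby
require "json"

# One record copied from the slide above.
record = JSON.parse(<<~JSON)
  { "_id": "0x19c610",
    "file": "file.rb", "line": 2,
    "type": "string", "class": "0x1ba7f0", "class_name": "String",
    "length": 10, "data": "helloworld" }
JSON

location = "#{record['file']}:#{record['line']}"
puts "#{record['class_name']} at #{record['_id']} created at #{location}"
puts "length matches data: #{record['length'] == record['data'].length}"
```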

Page 38: Debugging Ruby with MongoDB

arrays

Memprof.dump{ [ 1, :b, 2.2, "d" ] }

{ "_id": "0x19c5c0",
  "class": "0x1b0d18", "class_name": "Array",
  "length": 4, "data": [ 1, ":b", "0x19c750", "0x19c598" ] }

integers and symbols are stored in the array itself

floats and strings are separate ruby objects

Page 39: Debugging Ruby with MongoDB

hashes

Memprof.dump{ { :a => 1, "b" => 2.2 } }

{ "_id": "0x19c598",
  "type": "hash", "class": "0x1af170", "class_name": "Hash",
  "default": null,
  "length": 2, "data": [ [ ":a", 1 ], [ "0xc728", "0xc750" ] ] }

default: no default proc

data: hash entries as key/value pairs

Page 40: Debugging Ruby with MongoDB

classes

Memprof.dump{
  class Hello
    @@var=1
    Const=2
    def world() end
  end
}

{ "_id": "0x19c408",
  "type": "class", "name": "Hello", "super": "0x1bfa48", "super_name": "Object",
  "ivars": { "@@var": 1, "Const": 2 },
  "methods": { "world": "0x19c318" } }

super: superclass object reference

ivars: class variables and constants are stored in the instance variable table

methods: references to method objects

Page 41: Debugging Ruby with MongoDB

version 2: memprof.com

a web-based heap visualizer and leak analyzer

Page 42: Debugging Ruby with MongoDB

built on...

$ mongoimport -d memprof -c rails --file /tmp/app.json
$ mongo memprof

let’s run some queries.

Page 43: Debugging Ruby with MongoDB

thaths (flickr)

how many objects?

Page 44: Debugging Ruby with MongoDB

how many objects?

> db.rails.count()
809816

• ruby scripts create a lot of objects

• usually not a problem, but...

• MRI has a naïve stop-the-world mark/sweep GC

• fewer objects = faster GC = better performance

Page 45: Debugging Ruby with MongoDB

brettlider (flickr)

what types of objects?

Page 46: Debugging Ruby with MongoDB

what types of objects?

> db.rails.distinct('type')
['array', 'bignum', 'class', 'float', 'hash', 'module', 'node', 'object', 'regexp', 'string', ...]

Page 47: Debugging Ruby with MongoDB

mongodb: distinct

• distinct('type')
  list of types of objects

• distinct('file')
  list of source files

• distinct('class_name')
  list of instance class names

• optionally filter first

  • distinct('name', {type:"class"})
    names of all defined classes

Page 48: Debugging Ruby with MongoDB

improve performance with indexes

> db.rails.ensureIndex({'type':1})

> db.rails.ensureIndex({'file':1}, {background:true})

Page 49: Debugging Ruby with MongoDB

mongodb: ensureIndex

• add an index on a field (if it doesn’t exist yet)

• improve performance of queries against common fields: type, class_name, super, file

• can index embedded field names

  • ensureIndex('methods.add')

  • find({'methods.add':{$exists:true}})
    find classes that define the method add

Page 50: Debugging Ruby with MongoDB

darrenhester (flickr)

how many objs per type?

Page 51: Debugging Ruby with MongoDB

how many objs per type?

> db.rails.group({
    initial: {count:0},
    key: {type:true},
    cond: {},
    reduce: function(obj, out) { out.count++ }
  }).sort(function(a,b) {
    return a.count - b.count
  })

key: group on type

reduce: increment count for each obj

sort: sort results
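
To make the roles of `key`, `cond`, `initial` and `reduce` concrete, here is a plain-Ruby sketch of the same aggregation over an in-memory array. It does not talk to MongoDB; `group_count` and the sample records are assumptions made up for this example.

```ruby
# Plain-Ruby sketch of the group() call: key = type, initial = {count: 0},
# reduce = count += 1, then sort ascending by count. The records are
# made-up stand-ins for parsed dump entries.
records = [
  { "type" => "string" }, { "type" => "node" }, { "type" => "array" },
  { "type" => "node" },   { "type" => "string" }, { "type" => "node" }
]

def group_count(records, key, cond = {})
  # initial: every new group starts with a count of 0
  groups = Hash.new { |hash, value| hash[value] = { key => value, "count" => 0 } }
  records.each do |record|
    # cond: skip records that do not match every field in the filter
    next unless cond.all? { |field, wanted| record[field] == wanted }
    # reduce: increment the count for this record's group
    groups[record[key]]["count"] += 1
  end
  # sort: ascending by count, like the comparator passed to sort()
  groups.values.sort_by { |group| group["count"] }
end

group_count(records, "type").each { |group| puts group.inspect }
```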

Page 52: Debugging Ruby with MongoDB

how many objs per type?

[ ...,
  {type: 'array', count: 7621},
  {type: 'string', count: 69139},
  {type: 'node', count: 365285}
]

lots of nodes

• nodes represent ruby code

• stored like any other ruby object

• makes ruby completely dynamic

Page 53: Debugging Ruby with MongoDB

mongodb: group

• cond: query to filter objects before grouping

• key: field(s) to group on

• initial: initial values for each group’s results

• reduce: aggregation function

Page 54: Debugging Ruby with MongoDB

mongodb: group

• by type or class

  • key: {type:1}

  • key: {class_name:1}

• by file & line

  • key: {file:1, line:1}

• by type in a specific file

  • cond: {file:"app.rb"}, key: {file:1, line:1}

• by length of strings in a specific file

  • cond: {file:"app.rb", type:'string'}, key: {length:1}

Page 55: Debugging Ruby with MongoDB

davestfu (flickr)

what subclasses String?

Page 56: Debugging Ruby with MongoDB

what subclasses String?

> db.rails.find({super_name:"String"}, {name:1})
{name: "ActiveSupport::SafeBuffer"}
{name: "ActiveSupport::StringInquirer"}
{name: "SQLite3::Blob"}
{name: "ActiveModel::Name"}
{name: "Arel::Attribute::Expressions"}
{name: "ActiveSupport::JSON::Variable"}

{name:1}: select only name field

Page 57: Debugging Ruby with MongoDB

mongodb: find

• find({type:'string'})
  all strings

• find({type:{$ne:'string'}})
  everything except strings

• find({type:'string'}, {data:1})
  only select string’s data field

Page 58: Debugging Ruby with MongoDB

http://body.builder.hu/imagebank/pictures/1088273777.jpg

the largest objects?

Page 59: Debugging Ruby with MongoDB

the largest objects?

> db.rails.find(
    {type: {$in:['string','array','hash']} },
    {type:1,length:1}
  ).sort({length:-1}).limit(3)
{type: "string", length: 2308}
{type: "string", length: 1454}
{type: "string", length: 1238}

Page 60: Debugging Ruby with MongoDB

mongodb: sort, limit/skip

• sort({length:-1,file:1})
  sort by length desc, file asc

• limit(10)
  first 10 results

• skip(10).limit(10)
  second 10 results

Page 61: Debugging Ruby with MongoDB

zoutedrop (flickr)

when were objs created?

Page 62: Debugging Ruby with MongoDB

when were objs created?

• useful to look at objects over time

• each obj has a timestamp of when it was created

• find minimum time, call it start_time

• create buckets for every minute of execution since start

• place objects into buckets

Page 63: Debugging Ruby with MongoDB

when were objs created?

> db.rails.mapReduce(function(){
    var secs = this.time - start_time;
    var mins_since_start = Math.floor(secs / 60);
    emit(mins_since_start, 1);
  }, function(key, vals){
    for(var i=0,sum=0; i<vals.length; sum += vals[i++]);
    return sum;
  }, {
    scope: { start_time: db.rails.find().sort({time:1}).limit(1)[0].time }
  })
{result:"tmp.mr_1272615772_3"}

start_time = min(time)
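
The bucketing logic can be sketched in plain Ruby as follows. This assumes `time` is a timestamp in seconds, so whole minutes since the start come from integer division by 60; the object list and its timestamps are made up for the example.

```ruby
# Plain-Ruby sketch of the map/reduce: map each object to its minute
# bucket, then sum the 1s per bucket. Timestamps are made-up seconds.
objects = [
  { "_id" => "0x1", "time" => 1000 },
  { "_id" => "0x2", "time" => 1030 },
  { "_id" => "0x3", "time" => 1075 },
  { "_id" => "0x4", "time" => 1490 },
  { "_id" => "0x5", "time" => 1499 },
  { "_id" => "0x6", "time" => 1510 }
]

start_time = objects.map { |obj| obj["time"] }.min   # scope: start_time

buckets = Hash.new(0)
objects.each do |obj|
  secs = obj["time"] - start_time
  mins_since_start = secs / 60       # integer division gives whole minutes
  buckets[mins_since_start] += 1     # emit(mins, 1) followed by reduce(sum)
end

busiest_minute, count = buckets.max_by { |_minute, n| n }
puts "buckets: #{buckets.sort.to_h}"
puts "busiest minute: #{busiest_minute} (#{count} objects)"
```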

Page 64: Debugging Ruby with MongoDB

mongodb: mapReduce

• arguments

  • map: function that emits one or more key/value pairs given each object this

  • reduce: function to return aggregate result, given key and list of values

  • scope: global variables to set for funcs

• results

  • stored in a temporary collection (tmp.mr_1272615772_3)

Page 65: Debugging Ruby with MongoDB

when were objs created?

> db.tmp.mr_1272615772_3.count()
12

script was running for 12 minutes

> db.tmp.mr_1272615772_3.find().sort({value:-1}).limit(1)
{_id: 8, value: 41231}

41k objects created 8 minutes after start

Page 66: Debugging Ruby with MongoDB

jeffsmallwood (flickr)

references to this object?

Page 67: Debugging Ruby with MongoDB

references to this object?

ary = ["a","b","c"]

ary references "a"

"b" referenced by ary

• ruby makes it easy to “leak” references

• an object will stay around until all references to it are gone

• more objects = longer GC = bad performance

• must find references to fix leaks

Page 68: Debugging Ruby with MongoDB

references to this object?

• db.rails_refs.insert({ _id:"0xary", refs:["0xa","0xb","0xc"] })
  create references lookup table

• db.rails_refs.ensureIndex({refs:1})
  add ‘multikey’ index to refs array

• db.rails_refs.find({refs:"0xa"})
  efficiently lookup all objs holding a ref to 0xa
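
The lookup table amounts to inverting "object → things it references" into "address → objects that hold it". A plain-Ruby sketch of that inversion is below; the addresses, `referenced_by` and `holders_chain` are all made up for illustration, and the hash stands in for the multikey index.

```ruby
# Sketch of the references lookup: invert "object -> refs" into
# "address -> holders". Addresses are made up, in the style of the slide.
refs_table = [
  { "_id" => "0xary",  "refs" => ["0xa", "0xb", "0xc"] },
  { "_id" => "0xhash", "refs" => ["0xa"] },
  { "_id" => "0xobj",  "refs" => ["0xary"] }
]

# One entry per array element, like a multikey index on refs.
referenced_by = Hash.new { |hash, addr| hash[addr] = [] }
refs_table.each do |doc|
  doc["refs"].each { |addr| referenced_by[addr] << doc["_id"] }
end

# Walk holders upward to see the chain that keeps an object alive.
def holders_chain(index, addr, seen = [])
  index.fetch(addr, []).flat_map do |holder|
    next [] if seen.include?(holder)   # guard against reference cycles
    [holder] + holders_chain(index, holder, seen + [holder])
  end
end

puts "holders of 0xa: #{referenced_by['0xa'].join(', ')}"
puts "chain for 0xa:  #{holders_chain(referenced_by, '0xa').join(', ')}"
```

Following the chain upward is how a leak gets traced back to the long-lived object that is ultimately holding the reference.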

Page 69: Debugging Ruby with MongoDB

mongodb: multikeys

• indexes on array values create a ‘multikey’ index

• classic example: nested array of tags

• find({tags: "ruby"})
  find objs where obj.tags includes "ruby"

Page 70: Debugging Ruby with MongoDB

version 2: memprof.com

a web-based heap visualizer and leak analyzer

Page 71: Debugging Ruby with MongoDB

memprof.com

a web-based heap visualizer and leak analyzer

Page 78: Debugging Ruby with MongoDB

plugging a leak in rails3

• in dev mode, rails3 is leaking 10mb per request

let’s use memprof to find it!

# in environment.rb
require `gem which memprof/signal`.strip

Page 79: Debugging Ruby with MongoDB

plugging a leak in rails3

send the app some requests so it leaks

$ ab -c 1 -n 30 http://localhost:3000/

tell memprof to dump out the entire heap to json

$ memprof --pid <pid> --name <dump name> --key <api key>

Page 81: Debugging Ruby with MongoDB

2519 classes

30 copies of TestController

mongo query for all TestController classes

details for one copy of TestController

Page 84: Debugging Ruby with MongoDB

find references to object

holding references to all controllers

“leak” is on line 178

Page 85: Debugging Ruby with MongoDB

• In development mode, Rails reloads all your application code on every request

• ActionView::Partials::PartialRenderer is caching partials used by each controller as an optimization

• But... it ends up holding a reference to every single reloaded version of those controllers

Page 87: Debugging Ruby with MongoDB

Questions?

Aman Gupta @tmm1