45
Scalable Event Analytics with Ruby on Rails & MongoDB Ruby Conf China 2010 Jared Rosoff (@forjared) [email protected]

Scalable Event Analytics with MongoDB & Ruby on Rails

Embed Size (px)

DESCRIPTION

Slides from my talk at RubyConfChina 2010.

Citation preview

Page 1: Scalable Event Analytics with MongoDB & Ruby on Rails

Scalable Event Analytics with Ruby on Rails & MongoDB

Ruby Conf China 2010Jared Rosoff (@forjared)

[email protected]

Page 2: Scalable Event Analytics with MongoDB & Ruby on Rails

Yottaa!!!! (www.yottaa.com)

Page 3: Scalable Event Analytics with MongoDB & Ruby on Rails

Overview

• Ruby at Scale

• What is Event Analytics?

• What are the different ways you could do it?

• How we did it

Page 4: Scalable Event Analytics with MongoDB & Ruby on Rails

Ruby At Scale?http://www.flickr.com/photos/laughingsquid

Page 5: Scalable Event Analytics with MongoDB & Ruby on Rails

Event Analytics

Event Analytics

Data Source

EventEvent

Event

Data Source

EventEvent

Event

Data Source

EventEvent

Event

Data Source

EventEvent

Event

UserQuery

Report

User

Query

Report

Page 6: Scalable Event Analytics with MongoDB & Ruby on Rails

High Write Volume•Each new data source adds X requests per second•Data never stops arriving

Continuous Data Growth•We only add more data•Historical data is valuable

Flexible Data Exploration•Ad hoc queries •Complex aggregations

Page 7: Scalable Event Analytics with MongoDB & Ruby on Rails

Oh and we are a startup

Page 8: Scalable Event Analytics with MongoDB & Ruby on Rails

Our requirements:On Launch Day

# of data sources 15# of events per minute 80# GBs data stored 20

3 months later (projected)# of data sources 45# of events per minute 5600# GBs data stored 100

Page 9: Scalable Event Analytics with MongoDB & Ruby on Rails

Rails default architecture

MySQL

Data Source Collection Server

User Reporting Server

Page 10: Scalable Event Analytics with MongoDB & Ruby on Rails

Rails default architecture

MySQL

Data Source Collection Server

User Reporting Server

“Just” a Rails App

Page 11: Scalable Event Analytics with MongoDB & Ruby on Rails

Rails default architecture

MySQL

Data Source Collection Server

User Reporting Server

“Just” a Rails App

Performance Bottleneck: Too much load

Page 12: Scalable Event Analytics with MongoDB & Ruby on Rails

Let’s add replication!

MySQLMasterMySQL

MasterMySQLMaster

MySQLMaster

Replication

Data Source Collection Server

User Reporting Server

Page 13: Scalable Event Analytics with MongoDB & Ruby on Rails

Let’s add replication!

MySQLMasterMySQL

MasterMySQLMaster

MySQLMaster

Replication

Data Source Collection Server

User Reporting Server

Off the shelf!Scalable Reads!

Page 14: Scalable Event Analytics with MongoDB & Ruby on Rails

Let’s add replication!

MySQLMasterMySQL

MasterMySQLMaster

MySQLMaster

Replication

Data Source Collection Server

User Reporting Server

Off the shelf!Scalable Reads!

Performance Bottleneck: Still can’t scale

writes

Page 15: Scalable Event Analytics with MongoDB & Ruby on Rails

What about sharding?

MySQLMasterMySQL

MasterMySQLMaster

Data Source Collection Server

User Reporting Server

Shar

ding

Shar

ding

Page 16: Scalable Event Analytics with MongoDB & Ruby on Rails

What about sharding?

MySQLMasterMySQL

MasterMySQLMaster

Data Source Collection Server

User Reporting Server

Shar

ding

Shar

ding

Scalable Writes!

Page 17: Scalable Event Analytics with MongoDB & Ruby on Rails

What about sharding?

MySQLMasterMySQL

MasterMySQLMaster

Data Source Collection Server

User Reporting Server

Shar

ding

Shar

ding

Scalable Writes!

Development Bottleneck:

Need to write custom code

Page 18: Scalable Event Analytics with MongoDB & Ruby on Rails

Key Value stores to the rescue?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Page 19: Scalable Event Analytics with MongoDB & Ruby on Rails

Key Value stores to the rescue?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Scalable Writes!

Page 20: Scalable Event Analytics with MongoDB & Ruby on Rails

Key Value stores to the rescue?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Scalable Writes!

Development Bottleneck:

Reporting is limited / hard

Page 21: Scalable Event Analytics with MongoDB & Ruby on Rails

Can I Hadoop my way out of this?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Hadoop

MySQLMasterMySQL

MasterMySQLSlave

MySQLMaster

Page 22: Scalable Event Analytics with MongoDB & Ruby on Rails

Can I Hadoop my way out of this?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Hadoop

MySQLMasterMySQL

MasterMySQLSlave

MySQLMaster

Scalable Writes!

Page 23: Scalable Event Analytics with MongoDB & Ruby on Rails

Can I Hadoop my way out of this?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Hadoop

MySQLMasterMySQL

MasterMySQLSlave

MySQLMaster

Scalable Writes!

Flexible Reports!

Page 24: Scalable Event Analytics with MongoDB & Ruby on Rails

Can I Hadoop my way out of this?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Hadoop

MySQLMasterMySQL

MasterMySQLSlave

MySQLMaster

Scalable Writes!

Flexible Reports!

“Just” a Rails App

Page 25: Scalable Event Analytics with MongoDB & Ruby on Rails

Can I Hadoop my way out of this?

MySQLMasterMySQL

MasterCassandra

orVoldemort

Data Source Collection Server

User Reporting Server

Hadoop

MySQLMasterMySQL

MasterMySQLSlave

MySQLMaster

Scalable Writes!

Flexible Reports!

“Just” a Rails App

Development Bottleneck:

Too many systems!

Page 26: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB!

MySQLMasterMySQL

MasterMongoDB

Data Source Collection Server

User Reporting Server

Page 27: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB!

MySQLMasterMySQL

MasterMongoDB

Data Source Collection Server

User Reporting Server

Scalable Writes!

Page 28: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB!

MySQLMasterMySQL

MasterMongoDB

Data Source Collection Server

User Reporting Server

Scalable Writes!

Flexible Reporting!

Page 29: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB!

MySQLMasterMySQL

MasterMongoDB

Data Source Collection Server

User Reporting Server

Scalable Writes!“Just” a rails app

Flexible Reporting!

Page 30: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoD

MongoD

MongoD

Data Source

App Server

CollectionN

ginx

Pass

enge

r

Mon

gos

ReportingUser

LoadBalancer

Page 31: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoD

MongoD

MongoD

Data Source

App Server

CollectionN

ginx

Pass

enge

r

Mon

gos

ReportingUser

Sharding!

LoadBalancer

Page 32: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoD

MongoD

MongoD

Data Source

App Server

CollectionN

ginx

Pass

enge

r

Mon

gos

ReportingUser

Sharding!

High Concurrency

LoadBalancer

Page 33: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoD

MongoD

MongoD

Data Source

App Server

CollectionN

ginx

Pass

enge

r

Mon

gos

ReportingUser

Sharding!

High ConcurrencyScale-Out

LoadBalancer

Page 34: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB Sharding

Page 35: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB Sharding

Replica Sets let us scale storage &

transaction capacity for each shard

Page 36: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB Sharding

Replica Sets let us scale storage &

transaction capacity for each shard

Mongos routes transactions to shards based on “shard key”

Page 37: Scalable Event Analytics with MongoDB & Ruby on Rails

MongoDB Sharding

Replica Sets let us scale storage &

transaction capacity for each shard

Mongos routes transactions to shards based on “shard key”

Config servers store information about which shards exist

Page 38: Scalable Event Analytics with MongoDB & Ruby on Rails

Inserting

1 insert { ‘name’ : bob }

2Shard key == namebob Shard 2

3 Insert { ‘name’ : bob }

Page 39: Scalable Event Analytics with MongoDB & Ruby on Rails

Querying

1 Query { ‘name’ : bob }

2Shard key == namebob Shard 2

3 Query { ‘name’ : bob }

Page 40: Scalable Event Analytics with MongoDB & Ruby on Rails

Map Reduce

1 Map-reduce( … )

2

Map-reduce(…)

2 2 2

Page 41: Scalable Event Analytics with MongoDB & Ruby on Rails

Working with Mongo

• MongoMapper makes it look like ActiveRecord

• Documents are more natural than rows in many cases

• Map-Reduce rocks (but needs better support in rails)

http://www.flickr.com/photos/elhamalawy/2526783078/

Page 42: Scalable Event Analytics with MongoDB & Ruby on Rails
Page 43: Scalable Event Analytics with MongoDB & Ruby on Rails

Ruby

Mongo

Page 44: Scalable Event Analytics with MongoDB & Ruby on Rails

Runs over all the objects in the views table, counting how many times a page was viewed

Adds up all the counts for a unique url / date combination

Run the map reduce job and return a collection containing the results

Page 45: Scalable Event Analytics with MongoDB & Ruby on Rails

Results• Version 1 of our analytics system took 2 weeks with 1

engineer – We have since added a lot more complexity, but we did it

incrementally

• We replaced MySQL entirely with MongoDB – No need for joins, transactions – Every table is now a document collection

• It’s fast! – 63ms – Average response time for sending data to server– 93ms – Average response time for displaying reports