MongoATL: How Sourceforge is Using MongoDB

Preview:

DESCRIPTION

How Sourceforge is Using MongoDB

Citation preview

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 1

How SourceForge is Using MongoDB

Rick Copeland@rick446

rick@geek.net

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 2

SF.net “BlackOps”: FossFor.us

User Editable!

Web 2.0!(ish)

Not Ugly!

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 3

Moving to NoSQL FossFor.us used CouchDB (NoSQL) “Just adding new fields was trivial, and was

happening all the time” – Mark Ramm Scaling up to the level of SF.net needs

research CouchDB MongoDB Tokyo Cabinet/Tyrant Cassandra... and others

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 4

Rewriting “Consume” Most traffic on SF.net hits 3 types of pages:

Project Summary File Browser Download

Pages are read-mostly, with infrequent updates from the “Develop” side of sf.net

Original goal is 1 MongoDB document per project Later split release data because some projects have lots of releases

Periodic updates via RSS and AMQP from “Develop”

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 5

Deployment ArchitectureLoad Balancer / Proxy

Master DB Server

MongoDBMaster

Apachemod_wsgi / TG 2.0

MongoDBSlave

Apachemod_wsgi / TG 2.0

MongoDBSlave

Apachemod_wsgi / TG 2.0

MongoDBSlave

Gobble Server

Develop

Apachemod_wsgi / TG 2.0

MongoDBSlave

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 6

Deployment Architecture (revised)

Load Balancer / Proxy

Master DB Server

MongoDBMaster

Apachemod_wsgi / TG 2.0

Gobble Server

DevelopApachemod_wsgi / TG 2.0

Apachemod_wsgi / TG 2.0

Apachemod_wsgi / TG 2.0

Scalability is good

Single-node performance is

good, too

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 7

SF.net Downloads Allow non-sf.net projects to use SourceForge mirror network

Stats calculated in Hadoop and stored/served from MongoDB

Same deployment architecture as Consume (4 web, 1 db)

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 8

Allura (SF.net “beta” devtools)

Rewrite developer tools with new architecture

Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come

Single MongoDB replica set manually sharded by project

Release early & often

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 9

What We Liked Performance, performance, performance – Easily handle

90% of SF.net traffic from 1 DB server, 4 web servers

Schemaless server allows fast schema evolution in development, making many migrations unnecessary

Replication is easy, making scalability and backups easy Keep a “backup slave” running

Kill backup slave, copy off database, bring back up the slave

Automatic re-sync with master

Query Language You mean I can have performance without map-reduce?

GridFS

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 10

Pitfalls Too-large documents

Store less per document Return only a few fields

Ignoring indexing Watch your server log; bad queries show up

there Ignoring your data’s schema Using many databases when one will do Using too many queries

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 11

Ming – an “Object-Document

Mapper?” Your data has a schema Your database can define and enforce it

It can live in your application (as with MongoDB)

Nice to have the schema defined in one place in the code

Sometimes you need a “migration” Changing the structure/meaning of fields

Adding indexes

Sometimes lazy, sometimes eager

Queuing up all your updates can be handy

Python dicts are nice; objects are nicer

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 12

Ming Concepts Inspired by SQLAlchemy

Group of classes to which you map your collections

Each class defines its schema, including indexes

Convenience methods for loading/saving objects and ensuring indexes are created

Migrations

Unit of Work – great for web applications

MIM – “Mongo in Memory” nice for unit tests

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 13

Ming Example

from ming import schemafrom ming.orm import MappedClassfrom ming.orm import (FieldProperty, ForeignIdProperty, RelationProperty)

class WikiPage(MappedClass):

class __mongometa__: session = session name = 'wiki_page'

_id = FieldProperty(schema.ObjectId) title = FieldProperty(str) text = FieldProperty(str) comments=RelationProperty('WikiComment')

MappedClass.compile_all() # Lets ming know about the mapping

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 14

Open Source

Minghttp://sf.net/projects/merciless/

MIT License

Allurahttp://sf.net/p/allura/

Apache License

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 15

Future Work

mongos New Allura Tools Migrating legacy SF.net projects to Allura Stats all in MongoDB rather than Hadoop? Better APIs to access your project data

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatConfidential Geeknet, page 16

Questions?

SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 17

Rick Copeland@rick446

rick@geek.net