Rapid, Scalable Web Development with MongoDB, Ming, and Python

Rick Copeland (@rick446), [email protected]

© 2011 Geeknet Inc


DESCRIPTION

In 2009, SourceForge embarked on a quest to modernize our websites, converting a PHP site written against a hodge-podge of relational databases into a MongoDB- and Python-powered site, with a small development team and a tight deadline. We have now completely rewritten both the consumer and producer parts of the site, with better usability, more functionality, and better performance. This talk focuses on how we're using MongoDB, the pymongo driver, and Ming, an ORM-like library implemented at SourceForge, to continually improve and expand our offerings, with a special focus on how anyone can quickly become productive with Ming and pymongo without having to apologize for poor performance.


Page 1: Rapid, Scalable Web Development with MongoDB, Ming, and Python

Rapid, Scalable Web Development with MongoDB, Ming, and Python

Rick Copeland (@rick446)

[email protected]

Page 2

?

Page 3

- NoSQL at SourceForge

- Rewriting Consume

- Introducing Ming

- Allura – Open-Sourcing Open Source

- Zarkov – MongoDB-based (near) real-time analytics

Page 4

SF.net “BlackOps”: FossFor.us

User Editable!

Web 2.0! (ish)

Not Ugly!

Page 5

Moving to NoSQL

- FossFor.us used CouchDB (NoSQL)
  - “Just adding new fields was trivial, and was happening all the time” – Mark Ramm
- Scaling up to the level of SF.net needs research
  - CouchDB
  - MongoDB
  - Tokyo Cabinet/Tyrant
  - Cassandra
  - ... and others

Page 6

What we were looking for

- Performance – how does a single node perform?
- Scalability – needs to support simple replication
- Ability to handle complex data and queries
- Ease of development

Page 7

- NoSQL at SourceForge

- Rewriting Consume

- Introducing Ming

- Allura – Open-Sourcing Open Source

- Zarkov – MongoDB-based (near) real-time analytics

Page 8

Rewriting “Consume”

Most traffic on SF.net hits three types of pages: Project Summary, File Browser, and Download.

Pages are read-mostly, with infrequent updates from the “Develop” side of SF.net.

The original goal was one MongoDB document per project.

Release data was later split out, because some projects have lots of releases.

Periodic updates arrive via RSS and AMQP from “Develop”.
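A minimal sketch of the “one document per project, with release data split out” layout, written with plain Python dicts so it runs without a database. The field names (shortname, summary, releases, project_id, files) and the split_releases helper are hypothetical, not SourceForge's actual schema.

```python
def split_releases(project):
    """Split an embedded 'releases' list out of a project document.

    Returns (project_doc, release_docs). Each release document carries a
    'project_id' back-reference so releases can be stored and queried
    separately from the (now small) project summary document.
    """
    project_doc = {k: v for k, v in project.items() if k != 'releases'}
    release_docs = [
        dict(release, project_id=project['_id'])  # copy plus back-reference
        for release in project.get('releases', [])
    ]
    return project_doc, release_docs


# Hypothetical project as originally stored: everything in one document.
project = {
    '_id': 'example-project',
    'shortname': 'example',
    'summary': 'An example project',
    'releases': [
        {'name': '1.0', 'files': ['example-1.0.tar.gz']},
        {'name': '1.1', 'files': ['example-1.1.tar.gz']},
    ],
}

project_doc, release_docs = split_releases(project)
```

With this layout, a Project Summary page reads one small document, while a File Browser page would query release documents by project_id (an index on that field is assumed).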

Page 9

Deployment Architecture

[Diagram: a load balancer/proxy in front of four web nodes, each running Apache mod_wsgi / TG 2.0 with a local MongoDB slave; a master DB server running the MongoDB master; a Gobble server; and the Develop site.]

Page 10

Deployment Architecture (revised)

[Diagram: the same components with the per-node MongoDB slaves removed: load balancer/proxy, four Apache mod_wsgi / TG 2.0 web nodes, Gobble server, Develop, and a master DB server running the MongoDB master.]

Scalability is good.

Single-node performance is good, too.

Page 11

- NoSQL at SourceForge

- Rewriting Consume

- Introducing Ming

- Allura – Open-Sourcing Open Source

- Zarkov – MongoDB-based (near) real-time analytics

Page 12

Ming – an “Object-Document Mapper?”

- Your data has a schema
  - Your database can define and enforce it
  - It can live in your application (as with MongoDB)
  - It is nice to have the schema defined in one place in the code
- Sometimes you need a “migration”
  - Changing the structure/meaning of fields
  - Adding indexes, particularly unique indexes
  - Sometimes lazy, sometimes eager
- “Unit of work”: queuing up all your updates can be handy
- Python dicts are nice; objects are nicer
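The “sometimes lazy, sometimes eager” point can be illustrated with a lazy migration: each document carries a schema version and is upgraded when it is read, instead of rewriting the whole collection at once. This is a plain-Python sketch of the idea, not Ming's migration API; the schema_version field and the name-splitting change are hypothetical.

```python
CURRENT_VERSION = 1


def _v0_to_v1(doc):
    """Hypothetical change: split a single 'name' field into two fields."""
    first, _, last = doc.pop('name', '').partition(' ')
    doc['first_name'] = first
    doc['last_name'] = last
    return doc


# One upgrade function per version step, applied in order.
MIGRATIONS = {0: _v0_to_v1}


def load(doc):
    """Lazily upgrade a raw document to CURRENT_VERSION as it is read."""
    doc = dict(doc)  # shallow copy: leave the caller's dict untouched
    version = doc.get('schema_version', 0)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version += 1
    doc['schema_version'] = version
    return doc


def migrate_all(docs):
    """Eager variant: upgrade every document up front."""
    return [load(d) for d in docs]
```

A lazily upgraded document would typically be written back the next time it is saved; an eager pass is the safer choice when, for example, a new unique index needs every document in its final shape.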

Page 13

Ming Concepts

- Inspired by SQLAlchemy
- Group of collection objects with schemas defined
- Group of classes to which you map your collections
- Use collection-level operations for performance
- Use class-level operations for abstraction
- Convenience methods for loading/saving objects and ensuring indexes are created
- Migrations
- Unit of Work – great for web applications
- MIM (“Mongo in Memory”) – nice for unit tests
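To make the “Unit of Work” idea concrete, here is a small plain-Python sketch: documents are queued as they change, and nothing is written until flush() is called, so one web request's changes go out together at the end. The UnitOfWork class and its save_fn callback are hypothetical; Ming's ORM session provides its own flush() with its own behavior.

```python
class UnitOfWork:
    """Queue inserts and updates; write them all in flush()."""

    def __init__(self, save_fn):
        # save_fn(doc) performs the actual write, for example a pymongo
        # collection.replace_one({'_id': doc['_id']}, doc, upsert=True).
        self._save_fn = save_fn
        self._pending = {}  # _id -> latest version of the document

    def add(self, doc):
        """Mark a new or modified document for saving (last write wins)."""
        self._pending[doc['_id']] = doc

    def flush(self):
        """Write every queued document once, then clear the queue."""
        written = 0
        for doc in self._pending.values():
            self._save_fn(doc)
            written += 1
        self._pending.clear()
        return written


saved = []
uow = UnitOfWork(saved.append)
page = {'_id': 'Home', 'text': 'draft'}
uow.add(page)
page['text'] = 'final'   # modified again before the flush
uow.add(page)            # same _id: still a single pending write
written = uow.flush()
```

Keying the queue by _id collapses repeated edits to the same document into one write per request.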

Page 14

Ming Example

from ming import Session, collection, schema, Field
from ming.orm import (ORMSession, mapper, Mapper, RelationProperty,
                      ForeignIdProperty)

session = Session()               # document session; bind a datastore before use
ormsession = ORMSession(session)  # ORM session layered on top of it

WikiDoc = collection('wiki_page', session,
    Field('_id', schema.ObjectId()),
    Field('title', str, index=True),
    Field('text', str))
CommentDoc = collection('comment', session,
    Field('_id', schema.ObjectId()),
    Field('page_id', schema.ObjectId(), index=True),
    Field('text', str))

class WikiPage(object): pass
class Comment(object): pass

# Relations refer to mapped classes by name
ormsession.mapper(WikiPage, WikiDoc, properties=dict(
    comments=RelationProperty('Comment')))
ormsession.mapper(Comment, CommentDoc, properties=dict(
    page_id=ForeignIdProperty('WikiPage'),
    page=RelationProperty('WikiPage')))

Mapper.compile_all()

Page 15

- NoSQL at SourceForge

- Rewriting Consume

- Introducing Ming

- Allura – Open-Sourcing Open Source

- Zarkov – MongoDB-based (near) real-time analytics

Page 16

Python / MongoDB Taking Over…

[Diagram: the revised deployment architecture from Page 10 (load balancer/proxy, four Apache mod_wsgi / TG 2.0 web nodes, Gobble server, Develop, and a master DB server running the MongoDB master), now with Allura included.]

Scalability is good.

Single-node performance is good, too.

Page 17

Allura Architecture

- Web-facing app server
- Task daemon
- SMTP server
- FUSE filesystem (repository hosting)

Page 18

Allura Threaded Discussions

MessageDoc = collection(
    'message', project_doc_session,
    Field('_id', str, if_missing=h.gen_message_id),
    Field('slug', str, if_missing=h.nonce),
    Field('full_slug', str),
    Field('parent_id', str),
    …)

- _id – use an email Message-ID-compatible key
- slug – threaded path of random 4-digit hex numbers, each prefixed by its parent's slug (e.g. dead, dead/beef, dead/beef/f00d)
- full_slug – slug interspersed with the ISO-formatted message datetime (20110627…dead/20110627…beef…)

Easy queries for hierarchical data:

- Find all descendants of a message – slug prefix search with an anchored regex, “^dead/”
- Sort messages by thread, then by date – sort on full_slug
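The slug scheme can be sketched in plain Python to show why the two queries are cheap. This is an illustration under assumptions, not Allura's code: make_slugs, descendants_query, and post are hypothetical helpers standing in for h.nonce and friends, and the “YYYYMMDDHHMMSS:” timestamp prefix is an assumed format for the elided part of full_slug.

```python
import re
import secrets
from datetime import datetime


def make_slugs(parent, posted, part=None):
    """Return (slug, full_slug) for a new message.

    parent: dict with 'slug' and 'full_slug' for a reply, or None for a new thread.
    posted: datetime the message was posted.
    part:   4-hex-digit slug component; random when omitted.
    """
    if part is None:
        part = secrets.token_hex(2)            # 2 random bytes -> 4 hex digits
    stamp = posted.strftime('%Y%m%d%H%M%S')    # assumed timestamp format
    full_part = stamp + ':' + part             # assumed ':' separator
    if parent is None:
        return part, full_part
    return parent['slug'] + '/' + part, parent['full_slug'] + '/' + full_part


def descendants_query(slug):
    """MongoDB query for every descendant of the message with this slug.

    Anchoring with '^' makes it a true prefix match, which an index on
    'slug' can serve; re.escape keeps the slug literal.
    """
    return {'slug': {'$regex': '^' + re.escape(slug) + '/'}}


def post(messages, text, parent, posted, part):
    slug, full_slug = make_slugs(parent, posted, part)
    msg = {'text': text, 'slug': slug, 'full_slug': full_slug}
    messages.append(msg)
    return msg


# A small discussion, with fixed slug parts so the result is predictable.
messages = []
a = post(messages, 'first post', None, datetime(2011, 6, 27, 9, 0), 'dead')
b = post(messages, 'second thread', None, datetime(2011, 6, 27, 9, 30), 'cafe')
c = post(messages, 'reply to first', a, datetime(2011, 6, 27, 10, 0), 'beef')
d = post(messages, 'reply to reply', c, datetime(2011, 6, 27, 11, 0), 'f00d')

# Sorting on full_slug gives thread order: threads by date, replies under parents.
thread_order = [m['text'] for m in sorted(messages, key=lambda m: m['full_slug'])]

# Local equivalent of running descendants_query('dead') against the collection.
pattern = descendants_query('dead')['slug']['$regex']
descendants = [m['text'] for m in messages if re.match(pattern, m['slug'])]
```

Because a parent's full_slug is a string prefix of each reply's full_slug, a plain ascending sort places replies directly under their parent, and siblings fall into date order.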

Page 19

MonQ: Async Queueing in MongoDB

from datetime import datetime
from ming import collection, Field, schema

states = ('ready', 'busy', 'error', 'complete')
result_types = ('keep', 'forget')

MonQTaskDoc = collection(
    'monq_task', main_doc_session,
    Field('_id', schema.ObjectId()),
    Field('state', schema.OneOf(*states)),
    Field('result_type', schema.OneOf(*result_types)),
    Field('time_queue', datetime),
    Field('time_start', datetime),
    Field('time_stop', datetime),
    # dotted path to function
    Field('task_name', str),
    Field('process', str),  # worker process name: “locks” the task
    Field('context', dict(
        project_id=schema.ObjectId(),
        app_config_id=schema.ObjectId(),
        user_id=schema.ObjectId())),
    Field('args', list),
    Field('kwargs', {None: None}),
    Field('result', None, if_missing=None))
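The comment on the process field (“locks” the task) suggests how workers avoid grabbing the same task: the claim must be a single atomic find-and-modify that matches state 'ready' and sets state 'busy' plus the worker's name. A hedged sketch using pymongo's find_one_and_update; the helper names are hypothetical, MonQ's actual worker code may differ, and result_type handling (e.g. discarding 'forget' results) is omitted.

```python
from datetime import datetime, timezone


def claim_spec(process_name, now=None):
    """Filter, update, and sort for claiming the oldest 'ready' task.

    Matching state='ready' and setting state='busy' in one atomic
    find-and-modify means two workers cannot claim the same task.
    """
    now = now or datetime.now(timezone.utc)
    query = {'state': 'ready'}
    update = {'$set': {'state': 'busy', 'process': process_name,
                       'time_start': now}}
    sort = [('time_queue', 1)]  # oldest queued task first
    return query, update, sort


def finish_spec(task_id, ok, result=None, now=None):
    """Filter and update recording that a task finished or failed."""
    now = now or datetime.now(timezone.utc)
    return ({'_id': task_id},
            {'$set': {'state': 'complete' if ok else 'error',
                      'time_stop': now, 'result': result}})


def claim_next_task(tasks, process_name):
    """Claim one task from a pymongo Collection; returns the doc or None."""
    from pymongo import ReturnDocument  # requires pymongo and a live server
    query, update, sort = claim_spec(process_name)
    return tasks.find_one_and_update(query, update, sort=sort,
                                     return_document=ReturnDocument.AFTER)
```

Keeping the query/update construction separate from the database call makes the claim logic easy to unit-test without a running MongoDB.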

Page 20

Repository Cache Objects

On commit to a repo (Hg, SVN, or Git):

- Build the commit graph in MongoDB for new commits
- Build auxiliary structures
  - tree structure, including all trees in a commit and the last commit to modify them
  - linear commit runs (useful for generating history)
  - commit difference summary (must be computed for Hg and Git)
- Note references to other artifacts and commits

The repo browser uses the cached structure to serve pages.

[Diagram: cache object types Commit, CommitRun, Tree, Trees, LastCommit, DiffInfo]
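One way to build the “linear commit runs” mentioned above: follow first-parent links from a head commit and cut a run at each merge or root commit, so generating history can read a few run documents instead of one document per commit. A pure-Python sketch; the run fields and linear_runs helper are hypothetical, and Allura's CommitRun may be structured differently.

```python
def linear_runs(parents, head):
    """Split the first-parent history of `head` into linear runs.

    parents: dict mapping commit id -> list of parent ids (first parent first).
    A run ends at a merge commit (2+ parents) or a root commit (no parents);
    'parent_commit_ids' records where history continues after the run.
    """
    runs, current = [], []
    commit = head
    while commit is not None:
        current.append(commit)
        ps = parents[commit]
        if len(ps) != 1:  # merge or root: close the current run here
            runs.append({'commit_ids': current, 'parent_commit_ids': list(ps)})
            current = []
        commit = ps[0] if ps else None
    return runs


# Hypothetical history: e -> d (merge of c and x) -> c -> b -> a (root).
parents = {'e': ['d'], 'd': ['c', 'x'], 'x': ['b'],
           'c': ['b'], 'b': ['a'], 'a': []}
runs = linear_runs(parents, 'e')
history = [cid for run in runs for cid in run['commit_ids']]
```

This only covers first-parent history; commits reachable solely through a merge's other parents (x here) would need runs of their own.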

Page 21

Repository Cache Lessons Learned

Using MongoDB to represent graph structures (commit graph, commit trees) requires careful query planning. Pointer-chasing is no fun!

Sometimes Ming validation and ORM overhead can be prohibitively expensive – time to drop down a layer.

Benchmarking and profiling are your friends, as are queries like {'_id': {'$in': […]}} for returning multiple objects.
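A sketch of replacing pointer-chasing with batched {'_id': {'$in': [...]}} queries: walk the commit graph breadth-first and fetch each level in one round trip, rather than one query per commit. The find argument can be a pymongo collection's find method; the parent_ids field name and the in-memory stand-in below are hypothetical.

```python
def load_ancestors(find, start_ids, max_depth):
    """Breadth-first walk of a commit graph with one $in query per level.

    find: callable taking a MongoDB query and returning documents with
    '_id' and 'parent_ids' keys (e.g. collection.find in pymongo).
    Returns (docs_by_id, number_of_queries).
    """
    seen = {}
    frontier = list(start_ids)
    queries = 0
    for _ in range(max_depth):
        # Drop already-loaded ids and duplicates before querying.
        frontier = list(dict.fromkeys(i for i in frontier if i not in seen))
        if not frontier:
            break
        queries += 1
        next_ids = []
        for doc in find({'_id': {'$in': frontier}}):
            seen[doc['_id']] = doc
            next_ids.extend(doc.get('parent_ids', []))
        frontier = next_ids
    return seen, queries


# In-memory stand-in for collection.find; supports only {'_id': {'$in': ...}}.
DOCS = {cid: {'_id': cid, 'parent_ids': ps} for cid, ps in {
    'e': ['d'], 'd': ['c', 'x'], 'x': ['b'],
    'c': ['b'], 'b': ['a'], 'a': []}.items()}


def fake_find(query):
    return [DOCS[i] for i in query['_id']['$in'] if i in DOCS]


docs, n_queries = load_ancestors(fake_find, ['e'], max_depth=10)
```

The number of round trips is bounded by the depth of the graph rather than the number of commits; for long linear histories that is still one query per commit, which is where structures like commit runs help.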

Page 22

- NoSQL at SourceForge

- Rewriting Consume

- Introducing Ming

- Allura – Open-Sourcing Open Source

- Zarkov – MongoDB-based (near) real-time analytics

Page 23

And now, for something completely different…

Business: we need more visibility into what users are doing

- Low overhead
- Near real-time
- Unified view of lots of systems
  - Python
  - PHP
  - Perl

Page 24

Introducing Zarkov

Asynchronous TCP server for event logging, built with gevent

Turn OFF “safe” writes and turn OFF Ming validation (or validate in the client)

Incrementally calculate aggregate stats from the event log using map-reduce in “reduce” output mode (out={'reduce': <collection>})
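Map-reduce's “reduce” output mode is what makes the statistics incremental: each run processes only events logged since the previous run, and where a key already exists in the output collection, MongoDB applies the reduce function to the stored value and the new one. The pure-Python sketch below mimics that merge. The event fields and per-day keys are hypothetical; in MongoDB the run would pass a query selecting only new events plus out={'reduce': <collection>}.

```python
from collections import defaultdict
from datetime import datetime


def map_event(event):
    """Emit (key, value): one count per event type per day."""
    return (event['type'], event['timestamp'].strftime('%Y-%m-%d')), 1


def reduce_counts(key, values):
    # Like a MongoDB reduce function, this must accept its own output as input.
    return sum(values)


def incremental_mapreduce(existing, new_events):
    """Fold only the new events into `existing`, like out={'reduce': ...}."""
    grouped = defaultdict(list)
    for event in new_events:
        key, value = map_event(event)
        grouped[key].append(value)
    for key, values in grouped.items():
        partial = reduce_counts(key, values)
        if key in existing:  # key already in the output: reduce old and new
            existing[key] = reduce_counts(key, [existing[key], partial])
        else:
            existing[key] = partial
    return existing


stats = {}
incremental_mapreduce(stats, [
    {'type': 'download', 'timestamp': datetime(2011, 6, 27, 9, 0)},
    {'type': 'download', 'timestamp': datetime(2011, 6, 27, 9, 5)},
])
incremental_mapreduce(stats, [
    {'type': 'download', 'timestamp': datetime(2011, 6, 27, 10, 0)},
    {'type': 'page_view', 'timestamp': datetime(2011, 6, 28, 8, 0)},
])
```

Newer MongoDB releases deprecate map-reduce; an aggregation pipeline ending in a $merge stage covers the same incremental pattern.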

Page 25

Lessons learned at

Page 26

What We Liked

- Performance, performance, performance – easily handle 90% of SF.net traffic from 1 DB server and 4 web servers
- Dynamic schema allows fast schema evolution in development, making many migrations unnecessary
- Replication is easy, making scalability and backups easy
- Query language – you mean I can have performance without map-reduce?
- GridFS

Page 27

Pitfalls

- Too-large documents
  - Store less per document
  - Return only a few fields
- Ignoring indexing
  - Watch your server log; bad queries show up there
- Too much denormalization
  - Try to use an index if all you need is a backref
  - Stale data is a tricky problem
- Using many databases when one will do
- Using too many queries

Page 28

Open Source

- Ming – http://sf.net/projects/merciless/ – MIT License
- Allura – http://sf.net/p/allura/ – Apache License
- Zarkov – http://sf.net/p/zarkov/ – Apache License

Page 29

Future Work

- mongos
- New Allura tools
- Migrating legacy SF.net projects to Allura
- Continue to optimize stats & analytics (Zarkov and others)
- Better APIs to access your project data

Page 30

Rick Copeland (@rick446)

[email protected]