ConFoo - Migrating to MongoDB

DESCRIPTION

While MySQL and PostgreSQL are the usual options for database-driven web applications, developers can now consider non-relational databases as serious alternatives. This session will present a case study of why and how we migrated from a backend built on a mix of MySQL and SQLite to MongoDB. The session will cover the following points:

- Key differences between an SQL RDBMS and Mongo
- What made it a better fit in our case
- Hands-on technical examples of using MongoDB from PHP5

Migrating to MongoDB

Why we moved from MySQL to Mongo

Getting to know Mongo

Demo app using Mongo with PHP

Reasons we looked for an alternative to our RDBMS setup

Issues with our RDBMS setup

Architecture was highly distributed, number of databases was becoming an issue

Storing similar objects with different structure

Options for scalability

Storing files

Many DBs

In a MySQL server (with MyISAM)...

1 database = 1 directory

1 table = more than 1 file in DB directory

The filesystem limits the number of entries per directory, and that limit is not very high

We had a mix of MySQL and SQLite databases spread across a directory hierarchy

Many DBs

In a Mongo server ...

No 1:1 relation between databases and files

Stores data in a set of files preallocated with increasing sizes

Number of files grows as needed

Using many collections within a single database allowed us to move everything into the DB server

A “collection”?

RDBMS model:

Database has tables which hold records

All records in a table have an identical structure

Document-oriented storage

Database has collections which hold documents

Objects with differing structure

For example, events where attributes vary based on type of event

Event A: from, att1

Event B: from, att1, att2

Event C: from, att3, att4

What’s your schema for this?

tbl_events_A

id  from  Att1
1   Jim   1237
2   Dave  362
3   Bob   9283

tbl_events_B

id  from  Att1   Att2
1   Bill  2938   23
2   Jim   632    9
3   Hugh  12832  14

tbl_events_C

id  from  Att3     Att4
1   Bob   hello    7249
2   Bill  goodbye  23091
3   Jim   testing  2334

tbl_events

id  type  from  Att1   Att2  Att3     Att4
1   A     Jim   1237   NULL  NULL     NULL
2   A     Dave  362    NULL  NULL     NULL
3   B     Bill  2938   23    NULL     NULL
4   C     Bob   NULL   NULL  hello    7249
5   A     Bob   9283   NULL  NULL     NULL
6   C     Bill  NULL   NULL  goodbye  23091
7   B     Jim   632    9     NULL     NULL
8   B     Hugh  12832  14    NULL     NULL
9   C     Jim   NULL   NULL  testing  2334

tbl_events

id  type  from  Attributes
1   A     Jim   "{'att1':1237}"
2   A     Dave  "{'att1':362}"
3   B     Bill  "{'att1':2938, 'att2':23}"
4   C     Bob   "{'att3':'hello', 'att4':7249}"
5   A     Bob   "{'att1':9283}"
6   C     Bill  "{'att3':'goodbye', 'att4':23091}"
7   B     Jim   "{'att1':632, 'att2':9}"
8   B     Hugh  "{'att1':12832, 'att2':14}"
9   C     Jim   "{'att3':'testing', 'att4':2334}"

tbl_events

id  type  from
1   A     Jim
2   A     Dave
3   B     Bill
4   C     Bob
5   A     Bob
6   C     Bill
7   B     Jim
8   B     Hugh
9   C     Jim

tbl_events_attributes

id  eventId  name  value
1   1        att1  1237
2   2        att1  362
3   3        att1  2938
4   3        att2  23
5   4        att3  hello
6   4        att4  7249
7   5        att1  9283
8   6        att3  goodbye
9   6        att4  23091
10  7        att1  632
11  7        att2  9
... ...      ...   ...

Objects with differing structure

Document-oriented storage like Mongo is schema-less

1 collection for all events

Each document has the structure applicable for its type

Can index common attributes for queries

events collection :

{id:1, type:'A', from:'Jim', att1:1237}
{id:2, type:'A', from:'Dave', att1:362}
{id:5, type:'A', from:'Bob', att1:9283}
{id:3, type:'B', from:'Bill', att1:2938, att2:23}
{id:7, type:'B', from:'Jim', att1:632, att2:9}
{id:8, type:'B', from:'Hugh', att1:12832, att2:14}
{id:4, type:'C', from:'Bob', att3:'hello', att4:7249}
{id:6, type:'C', from:'Bill', att3:'goodbye', att4:23091}
{id:9, type:'C', from:'Jim', att3:'testing', att4:2334}
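The single-collection idea can be tried in plain JavaScript without a server. This is only a sketch: the array stands in for the events collection, the sample values follow the tables above, and `findBy` is a hypothetical helper (not a MongoDB API) that mimics an equality query on a common attribute.

```javascript
// In-memory stand-in for the "events" collection (not a real MongoDB collection).
const events = [
  { id: 1, type: 'A', from: 'Jim', att1: 1237 },
  { id: 3, type: 'B', from: 'Bill', att1: 2938, att2: 23 },
  { id: 4, type: 'C', from: 'Bob', att3: 'hello', att4: 7249 },
  { id: 7, type: 'B', from: 'Jim', att1: 632, att2: 9 },
];

// Hypothetical helper: keep documents whose fields equal every key in `criteria`.
function findBy(collection, criteria) {
  return collection.filter((doc) =>
    Object.keys(criteria).every((key) => doc[key] === criteria[key])
  );
}

// Documents of different types can be queried through an attribute they share.
const fromJim = findBy(events, { from: 'Jim' });
console.log(fromJim.map((doc) => doc.id)); // [ 1, 7 ]
```

Each document only carries the attributes that apply to its type, so no NULL columns or side tables are needed.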

Options for scalability

MySQL - Master-slave replication

Mongo - Supports master-slave, replica pairs, master-master and ... auto-sharding

Storing files

In MySQL, you can use a table with a BLOB field and other fields for file metadata

Mongo has GridFS

Built for storage of large objects

Split into chunks, also stores metadata

> db.fs.files.findOne();
{
    "_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "filename" : "user.png",
    "length" : 43717,
    "chunkSize" : 262144,
    "uploadDate" : "Mon Mar 08 2010 11:25:45 GMT-0500 (EST)",
    "md5" : "3f6fcd4c0a51655d392fe95a99c29140",
    "mimeType" : "image/png"
}
> db.fs.chunks.findOne();
{
    "_id" : ObjectId("4b952509c568bb9fc8e3cddb"),
    "files_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "n" : 0,
    "data" : BinData type: 2 len: 43721
}
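The chunking idea can be sketched in plain JavaScript (Node.js). This is an illustration only, not the driver's implementation: `splitIntoChunks` and the string file id are hypothetical, while the `files_id`, `n` and `data` field names and the 262144-byte chunk size are taken from the shell output above.

```javascript
// Sketch only: mimics how a file could be cut into chunk documents.
// `splitIntoChunks` and the string file id are hypothetical, not driver APIs.
function splitIntoChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId, // points back to the fs.files document
      n: n,             // chunk sequence number, starting at 0
      data: data.subarray(offset, offset + chunkSize), // at most chunkSize bytes
    });
  }
  return chunks;
}

const file = Buffer.alloc(600000); // a 600,000-byte dummy file
const chunks = splitIntoChunks('file-1', file, 262144);
console.log(chunks.length);                    // 3
console.log(chunks.map((c) => c.data.length)); // [ 262144, 262144, 75712 ]
```

The 43717-byte user.png shown above is smaller than one chunk, which is why its only chunk has n equal to 0.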

Getting to know MongoDB

Basic concepts

A database has collections which hold documents

Documents in a collection can have any structure

Documents are JSON objects, stored as BSON

Data types:

all basic JSON types: string, integer, boolean, double, null, array, object

Special types: date, object id, binary, regexp, code

Important differences

Collections instead of tables

ObjectID instead of primary keys

References instead of foreign keys

JavaScript code execution instead of stored procedures

[NULL] instead of joins

Inserting data

> doc = {
    author: 'joe',
    created : new Date('03-28-2009'),
    title : 'Yet another blog post',
    text : 'Here is the text...',
    tags : [ 'example', 'joe' ],
    comments : [
        { author: 'jim', comment: 'I disagree' },
        { author: 'nancy', comment: 'Good post' }
    ]
}
> db.posts.insert(doc);

Querying data

> db.posts.find();
> db.posts.find({'author':'joe'});
> db.posts.find({'comments.author':'nancy'});
> db.posts.find({'comments.comment': /disagree/i });

> db.posts.findOne({'comments.author':'nancy'});
> db.posts.find({'comments.author':'nancy'}).limit(5);

> db.posts.find({},{'author':true, 'tags':true});

> db.posts.find({'author':'nancy'}).sort({'created':1});
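Dot notation such as 'comments.author' reaches into embedded documents and arrays. The plain JavaScript below approximates that matching rule for a simple equality test. Both helpers are hypothetical and only illustrate the idea; they are not how MongoDB implements it.

```javascript
// Hypothetical helper: collect every value found at a dotted path,
// descending into arrays of embedded documents along the way.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split('.')) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const candidate of candidates) {
        if (candidate !== null && typeof candidate === 'object' && key in candidate) {
          next.push(candidate[key]);
        }
      }
    }
    current = next;
  }
  return current;
}

// A document matches if any value at the path (or any array element) equals `expected`.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path).flat().includes(expected);
}

const post = {
  author: 'joe',
  tags: ['example', 'joe'],
  comments: [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' },
  ],
};

console.log(matches(post, 'comments.author', 'nancy')); // true
console.log(matches(post, 'author', 'nancy'));          // false
```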

Querying - advanced features

Support of OR conditions

$ modifiers to introduce conditions

> db.posts.find({timestamp: {$gte:1268149684}});

$where modifiers

> db.pictures.find({$where: function() { return (this.creationTimestamp >= 1268149684) }})

MapReduce

Server-side code execution

> function getUniques() {
...     var uniques = [];
...     db.pictures.find({},{tags:true}).forEach(function(pic) {
...         pic.tags.forEach(function(tag) {
...             if (uniques.indexOf(tag) == -1) uniques.push(tag);
...         });
...     });
...     return uniques;
... }
> db.eval(getUniques);
[
    "firstTag",
    "thirdTag",
    "toto",
    "test",
    "comic",
    "secondTag"
]
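MapReduce is listed above without an example. The sketch below shows the general map/reduce shape in plain JavaScript by counting how often each tag occurs. The sample pictures and the small `mapReduce` driver are assumptions made for illustration; this is not a call to MongoDB's own mapReduce command.

```javascript
// Made-up sample data standing in for the "pictures" collection.
const pictures = [
  { title: 'one', tags: ['comic', 'test'] },
  { title: 'two', tags: ['test', 'toto'] },
  { title: 'three', tags: ['test'] },
];

// Map step: emit one [tag, 1] pair per tag of a picture.
function map(picture) {
  return picture.tags.map((tag) => [tag, 1]);
}

// Reduce step: sum all values emitted for one key.
function reduce(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Tiny driver: group emitted values by key, then reduce each group.
function mapReduce(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const [key, value] of map(doc)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const tagCounts = mapReduce(pictures);
console.log(tagCounts); // { comic: 1, test: 3, toto: 1 }
```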

Updating data

update( criteria, objNew, upsert, multi )

> db.myColl.update( { name: "Joe" }, { name: "Joe", age: 20 }, true, false );

save(object) - insert or update if _id exists

Update modifier operators

$inc, $set, $unset, $push, $pushAll, $addToSet, $pop, $pull, $pullAll

> db.myColl.update({name:"Joe"}, { $set:{age:20}});

> db.posts.update({author:"Joe"},{$push:{tags:'hockey'}});

> db.posts.update({},{$addToSet:{tags:'hockey'}});
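The difference between $push and $addToSet is easiest to see side by side. The two functions below are hypothetical in-memory imitations written for this comparison, not the server's code, and they only handle primitive values such as strings.

```javascript
// Imitates $push: always appends, so duplicates are possible.
function push(doc, field, value) {
  if (!Array.isArray(doc[field])) doc[field] = [];
  doc[field].push(value);
}

// Imitates $addToSet for primitive values: appends only if the value is absent.
function addToSet(doc, field, value) {
  if (!Array.isArray(doc[field])) doc[field] = [];
  if (!doc[field].includes(value)) doc[field].push(value);
}

const pushed = { author: 'joe', tags: ['example', 'joe'] };
push(pushed, 'tags', 'hockey');
push(pushed, 'tags', 'hockey');

const asSet = { author: 'joe', tags: ['example', 'joe'] };
addToSet(asSet, 'tags', 'hockey');
addToSet(asSet, 'tags', 'hockey');

console.log(pushed.tags); // [ 'example', 'joe', 'hockey', 'hockey' ]
console.log(asSet.tags);  // [ 'example', 'joe', 'hockey' ]
```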

Removing data

> db.things.remove({}); // removes all
> db.things.remove({n:1}); // removes all where n == 1
> db.things.remove({_id: myobject._id});

References

> p = db.postings.findOne();
{
    "_id" : ObjectId("4b866f08234ae01d21d89604"),
    "author" : "jim",
    "title" : "Brewing Methods"
}
> // get more info on author
> db.users.findOne( { _id : p.author } )
{ "_id" : "jim", "email" : "jim@gmail.com" }

> x = { name : 'Biology' }
{ "name" : "Biology" }
> db.courses.save(x)
> x
{ "name" : "Biology", "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu = { name : 'Joe', classes : [ new DBRef('courses', x._id) ] }
> db.students.save(stu)
> stu
{
    "name" : "Joe",
    "classes" : [
        {
            "$ref" : "courses",
            "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1")
        }
    ],
    "_id" : ObjectId("4b0552e4f0da7d1eb6f126a2")
}
> stu.classes[0]
{ "$ref" : "courses", "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu.classes[0].fetch()
{ "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1"), "name" : "Biology" }
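A reference is only a pointer made of a collection name and an id, and following it costs a second lookup. The sketch below illustrates this with in-memory arrays. The `database` object, the string ids and `fetchRef` are all hypothetical stand-ins, not the shell's DBRef implementation.

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
const database = {
  courses: [{ _id: 'c1', name: 'Biology' }],
  students: [
    { _id: 's1', name: 'Joe', classes: [{ $ref: 'courses', $id: 'c1' }] },
  ],
};

// Hypothetical helper: follow a { $ref, $id } pointer with a second lookup.
function fetchRef(db, ref) {
  const collection = db[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const student = database.students[0];
const course = fetchRef(database, student.classes[0]);
console.log(course); // { _id: 'c1', name: 'Biology' }
```

Nothing enforces that the referenced document exists, so a caller has to handle a missing target (here `fetchRef` returns null).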

Limitations to keep in mind

Namespace limit (24 000 collections and indexes)

Database size is capped at about 2 GB on 32-bit systems ... use a 64-bit production system!

Licensing

MongoDB is GNU AGPL 3.0, supported drivers are Apache License v2.0

From www.mongodb.org/display/DOCS/Licensing :

If you are using a vanilla MongoDB server from either source or binary packages you have NO obligations. You can ignore the rest of this page.

Hands-on example

SQL schema

pictures
    pictureId          int
    title              varchar
    creationTimestamp  int
    content            blob

users
    userId  int
    name    varchar

comments
    pictureId          int
    userId             int
    txt                varchar
    creationTimestamp  int

tags
    pictureId  int
    tag        varchar
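One way these four tables could collapse into a single document per picture is sketched below. This is an assumption made for illustration, not necessarily the schema the demo app uses: the sample rows are made up, `toPictureDocument` is a hypothetical helper, and the blob content is left out.

```javascript
// Made-up sample rows, shaped like the four SQL tables above.
const pictureRow = { pictureId: 1, title: 'Sunset', creationTimestamp: 1268149684 };
const userRows = [{ userId: 10, name: 'jim' }];
const tagRows = [
  { pictureId: 1, tag: 'comic' },
  { pictureId: 1, tag: 'test' },
];
const commentRows = [
  { pictureId: 1, userId: 10, txt: 'Nice', creationTimestamp: 1268149700 },
];

// Hypothetical helper: embed tags and comments inside the picture document.
function toPictureDocument(picture, tags, comments, users) {
  const nameOf = (userId) => {
    const user = users.find((u) => u.userId === userId);
    return user ? user.name : null;
  };
  return {
    title: picture.title,
    creationTimestamp: picture.creationTimestamp,
    // The tags table becomes a plain array of strings.
    tags: tags.filter((r) => r.pictureId === picture.pictureId).map((r) => r.tag),
    // The comments table becomes an array of embedded documents.
    comments: comments
      .filter((r) => r.pictureId === picture.pictureId)
      .map((r) => ({
        author: nameOf(r.userId),
        txt: r.txt,
        creationTimestamp: r.creationTimestamp,
      })),
  };
}

const pictureDoc = toPictureDocument(pictureRow, tagRows, commentRows, userRows);
console.log(JSON.stringify(pictureDoc, null, 2));
```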

Let's see some code ...