Upload
ngotram
View
217
Download
1
Embed Size (px)
Citation preview
1
July 2012
Open source, high performance database
2
• Quick introduction to mongoDB
• Data modeling in mongoDB, queries, geospatial, updates and map reduce.
• Using a location-based app as an example
• Example works in mongoDB JS shell
3
4
MongoDB is a scalable, high-performance, open source, document-oriented database.
• Fast Querying • In-place updates• Full Index Support• Replication /High Availability• Auto-Sharding• Aggregation; Map/Reduce• GridFS
5
Better data locality
Relational MongoDB
In-Memory Caching
Auto-Sharding
Write scalingRea
d sc
alin
g
We just can't get any faster than the way MongoDB handles our data.
Tony TamCTO, Wordnik
6
MongoDB is Implemented in C++• Windows, Linux, Mac OS-X, Solaris
Drivers are available in many languages10gen supported• C, C# (.Net), C++, Erlang, Haskell, Java,
JavaScript, Perl, PHP, Python, Ruby, Scala, nodejs
• Multiple community supported drivers
7
RDBMS MongoDBTable CollectionRow(s) JSON DocumentIndex IndexPartition ShardJoin Embedding/Linking
Schema (implied Schema)
8
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Asya",date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...",tags : [ "tech", "databases" ],comments : [{
author : "Fred",date : ISODate("2012-02-03T17:22:21.124Z"), text : "Best
Post Ever!"}],comment_count : 1
}
9
•JSON has powerful, limited set of datatypes– Mongo extends datatypes with Date, Int types, Id, …
•MongoDB stores data in BSON
•BSON is a binary representation of JSON– Optimized for performance and navigational abilities– Also compression
See: bsonspec.org
10
MySQLSTART TRANSACTION;INSERT INTO contacts VALUES
(NULL, ‘joeblow’);INSERT INTO contact_emails VALUES
( NULL, ”[email protected]”,LAST_INSERT_ID() ),
( NULL, “[email protected]”, LAST_INSERT_ID() );
COMMIT;
MongoDBdb.contacts.save( {
userName: “joeblow”,emailAddresses: [
“[email protected]”,“[email protected]” ] } );
• Native drivers for dozens of languages• Data maps naturally to OO data
structures
11
•Want to build an app where users can check in to a location
•Leave notes or comments about that location
12
"As a user I want to be able to find other locations nearby"
• Need to store locations (Offices, Restaurants, etc)
– name, address, tags– coordinates– User generated content e.g. tips / notes
13
"As a user I want to be able to 'checkin' to a location"
Checkins– User should be able to 'check in' to a location– Want to be able to generate statistics:
• Recent checkins• Popular locations
14
users
user1, user2loc1, loc2, loc3
locations checkins
checkin1, checkin2
15
> location_1 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012
}
16
> location_1 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012
}
> db.locations.find({name: "Lotus Flower"})
17
> location_1 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012
}
> db.locations.ensureIndex({name: 1})> db.locations.find({name: "Lotus Flower"})
18
> location_2 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012,
tags: ["restaurant", "dumplings"]}
19
> location_2 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012,
tags: ["restaurant", "dumplings"]}
> db.locations.ensureIndex({tags: 1})
20
> location_2 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012,
tags: ["restaurant", "dumplings"]}
> db.locations.ensureIndex({tags: 1})> db.locations.find({tags: "dumplings"})
21
> location_3 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012, tags: ["restaurant", "dumplings"],
lat_long: [52.5184, 13.387]}
22
> location_3 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012, tags: ["restaurant", "dumplings"],
lat_long: [52.5184, 13.387]}
> db.locations.ensureIndex({lat_long: "2d"})
23
> location_3 = {name: "Lotus Flower",address: "123 University Ave",city: "Palo Alto",post_code: 94012, tags: ["restaurant", "dumplings"],
lat_long: [52.5184, 13.387]}
> db.locations.ensureIndex({lat_long: "2d"})> db.locations.find({lat_long: {$near:[52.53, 13.4]}})
24
// creating your indexes:> db.locations.ensureIndex({tags: 1})> db.locations.ensureIndex({name: 1})> db.locations.ensureIndex({lat_long: "2d"})
// finding places:> db.locations.find({lat_long: {$near:[52.53, 13.4]}})
// with regular expressions:> db.locations.find({name: /^Din/})
// by tag:> db.locations.find({tag: "dumplings"})
25
Atomic operators:
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
26
// initial data load:> db.locations.insert(location_3)
// adding a tip with update:> db.locations.update(
{name: "Lotus Flower"},{$push: {
tips: {user: "Asya",date: "28/03/2012",tip: "The hairy crab dumplings are awesome!"}
}})
27
> db.locations.findOne(){ name: "Lotus Flower",
address: "123 University Ave",city: "Palo Alto",post_code: 94012, tags: ["restaurant", "dumplings"],lat_long: [52.5184, 13.387],tips:[{
user: "Asya",date: "28/03/2012",tip: "The hairy crab dumplings are awesome!"
}]
}
28
"As a user I want to be able to 'checkin' to a location"
Checkins
– User should be able to 'check in' to a location– Want to be able to generate statistics:
• Recent checkins• Popular locations
29
> user_1 = {_id: "[email protected]",name: "Asya",twitter: "asya999",checkins: [{location: "Lotus Flower", ts: "28/03/2012"},{location: "Meridian Hotel", ts: "27/03/2012"}]
}
> db.users.ensureIndex({checkins.location: 1})> db.users.find({checkins.location: "Lotus Flower"})
30
// find all users who've checked in here:> db.users.find({"checkins.location":"Lotus Flower"})
31
// find all users who've checked in here:> db.users.find({"checkins.location":"Lotus Flower"})
// find the last 10 checkins here?> db.users.find({"checkins.location":"Lotus Flower"})
.sort({"checkins.ts": -1}).limit(10)
32
// find all users who've checked in here:> db.users.find({"checkins.location":"Lotus Flower"})
// find the last 10 checkins here: - Warning!> db.users.find({"checkins.location":"Lotus Flower"})
.sort({"checkins.ts": -1}).limit(10)
Hard to query for last 10
33
> user_2 = {_id: "[email protected]",name: "Asya",twitter: "asya999",
}
> checkin_1 = {location: location_id,user: user_id,ts: "20/03/2010"
}
> db.checkins.ensureIndex({user: 1})> db.checkins.find({user: user_id})
34
// find all users who've checked in here:> location_id = db.checkins.find({"name":"Lotus Flower"})> u_ids = db.checkins.find({location: location_id},
{_id: -1, user: 1})> users = db.users.find({_id: {$in: u_ids}})
// find the last 10 checkins here:> db.checkins.find({location: location_id})
.sort({ts: -1}).limit(10)
// count how many checked in today:> db.checkins.find({location: location_id,
ts: {$gt: midnight}}).count()
35
// Find most popular locations> agg = db.checkins.aggregate(
{$match: {ts: {$gt: now_minus_3_hrs}}},{$group: {_id: "$location",numEntries: {$sum: 1}}}
)
> agg.result[{"_id": "Lotus Flower", "numEntries" : 17}]
36
// Find most popular locations> map_func = function() {
emit(this.location, 1);}
> reduce_func = function(key, values) {return Array.sum(values);
}
> db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}},out: "result"})
> db.result.findOne(){"_id": "Lotus Flower", "value" : 17}
37
DeploymentDeployment
38
P• Single server- need a strong backup plan
39
• Single server- need a strong backup plan
• Replica sets- High availability- Automatic failover
P
P S S
40
• Single server- need a strong backup plan
• Replica sets- High availability- Automatic failover
• Sharded- Horizontally scale- Auto balancing
P S S
P S S
P
P S S
41
User Data Management High Volume Data Feeds
Content Management Operational Intelligence E‐Commerce
42
43
@mongodb
conferences, appearances, and meetupshttp://www.10gen.com/events
http://bit.ly/mongofb
Facebook | Twitter | LinkedInhttp://linkd.in/joinmongo
download at mongodb.org
support, training, and this talk brought to you by