MongoDB
Presented by
Jörg Reichert
Licensed under cc-by v3.0 (any jurisdiction)
Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
– documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server-side JavaScript execution
● Aggregation / MapReduce
● High performance, availability, scalability
MongoDB
Relational vs. document-based: concepts

SQL

Person:
  Id | Name    | Address
  ---+---------+--------
  1  | Mueller | 1
  2  | <null>  | 2

Address:
  Id | City    | Street
  ---+---------+-----------
  1  | Leipzig | Burgstr. 1
  2  | Dresden | <null>

MongoDB

Person:
  { _id: ObjectId("..."),
    Name: "Mueller",
    Address: { City: "Leipzig", Street: "Burgstr. 1" },
  },
  { _id: ObjectId("..."),
    Address: { City: "Dresden" },
  }

Concept mapping (SQL → MongoDB):
  DB → DB
  Table → Collection
  Column → Field (Key: Value)
  Row → Document
  Relation via PK/FK → Embedded document (or reference)

PK: primary key, FK: foreign key
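As a minimal sketch of how the documents above come into being (assuming MongoDB 3.2+, where insertMany is available), no schema definition is needed; the collection is created on first insert:

// insert the two example documents into the Person collection
db.Person.insertMany([
  { Name: "Mueller",
    Address: { City: "Leipzig", Street: "Burgstr. 1" } },
  { Address: { City: "Dresden" } }
]);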
MongoDB
Relational vs. document-based: syntax (1/3)

SQL: SELECT * FROM Person;
MongoDB: db.getCollection("Person").find();

SQL: SELECT * FROM Person WHERE name = "Mueller";
MongoDB: db.Person.find({ "name": "Mueller" });

SQL: SELECT * FROM Person WHERE name LIKE "M%";
MongoDB: db.Person.find({ "name": /^M/ });

SQL: SELECT name FROM Person;
MongoDB: db.Person.find({}, { name: 1, _id: 0 });

SQL: SELECT DISTINCT(name) FROM Person WHERE name = "Mueller";
MongoDB: db.Person.distinct("name", { "name": "Mueller" });
MongoDB
Relational vs. document-based: syntax (2/3)

SQL: SELECT * FROM Person WHERE id > 10 AND name <> "Mueller";
MongoDB: db.Person.find({ $and: [
  { _id: { $gt: ObjectId("...") } },
  { name: { $ne: "Mueller" } }
]});

SQL: SELECT p.name FROM Person p JOIN Address a ON p.address = a.id
     WHERE a.city = "Leipzig" ORDER BY p.name DESC;
MongoDB (with embedded Address): db.Person.find(
  { "Address.city": "Leipzig" },
  { name: 1, _id: 0 }
).sort({ name: -1 });

SQL: SELECT * FROM Person WHERE name IS NOT NULL;
MongoDB: db.Person.find({ name: { $not: { $type: 10 }, $exists: true } });

SQL: SELECT COUNT(*) FROM Person WHERE name = "Mueller";
MongoDB: db.Person.count({ name: "Mueller" });
         db.Person.find({ name: "Mueller" }).count();
MongoDB
Relational vs. document-based: syntax (3/3)

SQL: UPDATE Person SET name = "Müller" WHERE name = "Mueller";
MongoDB: db.Person.updateMany({ name: "Mueller" }, { $set: { name: "Müller" } });

SQL: DELETE FROM Person WHERE name = "Mueller";
MongoDB: db.Person.remove({ name: "Mueller" });

SQL: INSERT INTO Person (name, address) VALUES ("Mueller", 3);
MongoDB: db.Person.insert({ name: "Mueller", Address: { … } });

SQL: ALTER TABLE Person DROP COLUMN name;
MongoDB: db.Person.updateMany({}, { $unset: { name: 1 } });

SQL: DROP TABLE Person;
MongoDB: db.Person.drop();
MongoDB
Schema design principles
● Principle of least cardinality
● Store what you query for
MongoDB
Schema design: embedded document
● Applicable for 1:1 and 1:n relations when n cannot get too large
● Embedded document cannot get too large
● Embedded document is not very likely to change
● Arrays that grow without bound should never be embedded

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    { Name: "Mueller" },
    { Name: "Schneider" },
  ]
}
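A sketch of how the embedded structure is queried (collection and field names as in the example above); dot notation reaches into the embedded array:

// addresses containing an embedded person named "Mueller"
db.Address.find({ "Person.Name": "Mueller" });
// only the embedded persons of Leipzig addresses
db.Address.find({ City: "Leipzig" }, { Person: 1, _id: 0 });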
MongoDB
Schema design: referencing
● Applicable for 1:n relations when n cannot get too large
● Referenced document is likely to change often in the future
● Many referenced documents are expected, so storing only the reference is cheaper
● Large referenced documents are expected, so storing only the reference is cheaper
● Arrays that grow without bound should never be embedded
● Address should be accessible on its own

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
}
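Resolving the references happens at application level; a minimal sketch in the mongo shell, assuming the collections above:

// first query: load the address, which holds the ObjectId references
var address = db.Address.findOne({ City: "Leipzig" });
// second query: load all persons referenced by this address
var persons = db.Person.find({ _id: { $in: address.Person } }).toArray();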
MongoDB
Schema design: parent-referencing
● Applicable for 1:n relations when n can get very large (note: a MongoDB document must not exceed 16 MB)
● Joins are done at application level (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Dubai",
  Street: "1 Sheikh Mohammed bin Rashid Blvd",
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: ObjectId("..."),
}
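A sketch of the application-level join from the parent to its children (collection and field names as above; the index is an illustrative addition):

// load the parent, then all children pointing back at it
var address = db.Address.findOne({ City: "Dubai" });
var residents = db.Person.find({ Address: address._id }).toArray();
// an index on the referencing field keeps this lookup cheap
db.Person.createIndex({ Address: 1 });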
MongoDB
Schema design: two-way referencing
● Applicable for m:n relations when m and n cannot get too large and the application needs to navigate from both ends
● Disadvantage: changing a reference requires two update operations (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}
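Linking a person to an address therefore costs one write per side; a sketch assuming MongoDB 3.2+ (updateOne) with personId and addressId as placeholders:

// add the reference on both ends; if the second update fails,
// the data is temporarily inconsistent until it is repaired
db.Person.updateOne({ _id: personId }, { $addToSet: { Address: addressId } });
db.Address.updateOne({ _id: addressId }, { $addToSet: { Person: personId } });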
MongoDB
Schema design: denormalization
● Queries are expected to filter by certain fields of the referenced document; duplicating these fields in the host document saves an additional query at application level
● Disadvantage: two update operations for the duplicated field (see the sketch below)
● Disadvantage: additional memory consumption

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    { id: ObjectId("..."), city: "Leipzig" },
    ...
  ]
}
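A sketch of the trade-off (field names from the example, addressId is a placeholder): the read needs no second query, but a change of the city must be written in two places:

// read: filter persons directly by the duplicated city field
db.Person.find({ "Address.city": "Leipzig" });
// write: update the Address document and every duplicated copy;
// the positional $ updates the matched element of the Address array
db.Address.updateOne({ _id: addressId }, { $set: { City: "Leipzig-Mitte" } });
db.Person.updateMany({ "Address.id": addressId },
  { $set: { "Address.$.city": "Leipzig-Mitte" } });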
MongoDB
Schema design: bucketing
● Applicable for 1:n relations when n can get very large and the application is expected to use pagination anyway
● The DB schema already stores the chunks (buckets) that the application will later query for (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}

Person (bucket):
{ _id: ObjectId("..."),
  Address: ObjectId("..."),
  Page: 13,
  Count: 50,
  Persons: [
    { Name: "Mueller" },
    ...
  ]
}
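A sketch of reading and filling a bucket (field names from the example; addressId and the name "Neumann" are placeholders):

// read exactly one page: one bucket document per page of persons
db.Person.findOne({ Address: addressId, Page: 13 });
// append a person to that bucket and keep its counter in sync
db.Person.updateOne(
  { Address: addressId, Page: 13 },
  { $push: { Persons: { Name: "Neumann" } }, $inc: { Count: 1 } }
);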
MongoDB
Aggregation Framework
● Aggregation pipeline consisting of (processing) stages
– $match, $group, $project, $redact, $unwind, $lookup, $sort, ...
● Aggregation operators
– Boolean: $and, $or, $not
– Comparison: $eq, $lt, $lte, $gt, $gte, $ne, $cmp
– Arithmetic: $add, $subtract, $multiply, $divide, ...
– String: $concat, $substr, …
– Array: $size, $arrayElemAt, ...
– Aggregation variable: $map, $let
– Group accumulators: $sum, $avg, $addToSet, $push, $min, $max, $first, $last, …
– ...
MongoDB
Aggregation Framework
db.Person.aggregate([
  { $match: { name: { $ne: "Fischer" } } },
  { $group: { _id: "$name", city_occurs: { $addToSet: "$Address.city" } } },
  { $project: { _id: "$_id", city_count: { $size: "$city_occurs" } } },
  { $sort: { _id: 1 } },
  { $match: { city_count: { $gt: 1 } } },
  { $out: "PersonCityCount" }
]);
PersonCityCount:
{ _id: "Mueller", city_count: 2 },
{ _id: "Schmidt", city_count: 3 },
...
MongoDB
Map-Reduce
● More control than the aggregation framework, but slower
var map = function() {
  if (this.name != "Fischer")
    emit(this.name, this.Address.city);
};
var reduce = function(key, values) {
  // collect the distinct cities emitted for this name
  var distinct = [];
  values.forEach(function(value) {
    if (distinct.indexOf(value) == -1)
      distinct.push(value);
  });
  // note: this simple reduce assumes it runs once per key,
  // which only holds for small data sets (no re-reduce)
  return distinct.length;
};
db.Person.mapReduce(map, reduce, { out: "PersonCityCount2" });
MongoDB
Indexes
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex({ name: 1 });
● Compound index: db.Address.createIndex({ city: 1, street: -1 });
– index sorts first ascending by city, then descending by street
– the index is also used when a query filters only by a prefix of the indexed fields (here: city)
● Multikey index: db.Person.createIndex({ "Address.city": 1 })
– indexes content stored in arrays; an index entry is created for each array element
● Geospatial index
● Text index
● Hashed index
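Sketches for the remaining index types (the location field and collection choices are illustrative):

db.Address.createIndex({ location: "2dsphere" });  // geospatial index on GeoJSON data
db.Person.createIndex({ name: "text" });           // text index for $text search
db.Person.createIndex({ name: "hashed" });         // hashed index, e.g. for a hashed shard key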
MongoDB
Index properties
● Uniqueness: insertion of a duplicate field value will be rejected
● Partial index: indexes only documents matching certain filter criteria
● Sparse index: indexes only documents having the indexed field
● TTL index: automatically removes documents after a certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check whether a query is answered using an index and how many documents still had to be scanned
● Covered queries: if a query only contains indexed fields, the result is delivered directly from the index without scanning or materializing any documents
● Index intersection: different indexes can be combined to cover parts of a query
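Sketches for these index options and for explain() (field and collection names such as age, nickname, and Session are illustrative; partial indexes require MongoDB 3.2+):

db.Person.createIndex({ name: 1 }, { unique: true });                    // uniqueness
db.Person.createIndex({ name: 1 },
  { partialFilterExpression: { age: { $gt: 18 } } });                    // partial index
db.Person.createIndex({ nickname: 1 }, { sparse: true });                // sparse index
db.Session.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 }); // TTL index
db.Person.find({ name: "Mueller" }).explain("executionStats");          // index usage / docs scanned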
MongoDB
Storage engines
● WiredTiger was introduced in MongoDB 3.0 and is the default storage engine since MongoDB 3.2
– locking at document level enables concurrent writes within a collection
– durability is ensured via a write-ahead transaction log and checkpoints (journaling)
– supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage engine before that
– since MongoDB 3.0 it supports locking at collection level, previously only at database level
– useful for selective updates, as WiredTiger always replaces the whole document in an update operation
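A sketch of choosing the compression per collection under WiredTiger (assuming MongoDB 3.0+; the configString syntax is WiredTiger-specific and the collection name is illustrative):

// create a collection whose blocks are compressed with zlib instead of the snappy default
db.createCollection("PersonArchive", {
  storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
});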
MongoDB
Clustering, Sharding, Replication
[Diagram: a client app (driver) sends reads and writes to an app server (mongos), which routes them to Shard 1, a replica set of one primary and two secondaries (mongod); the primary replicates writes to the secondaries, the members monitor each other via heartbeats, and a config server (replica set) stores the cluster metadata.]
MongoDB
Shard key selection
[Diagram: a sharded collection is partitioned across Shard 1, Shard 2, and Shard 3 by shard key ranges (min <= key < 15, 15 <= key < 30, 30 <= key < max); documents such as { key: 12 }, { key: 21 }, { key: 35 } land on the shard owning their range, optionally after a hash function is applied to the key.]
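A sketch of setting up the sharded collection (the database name "mydb" is a placeholder); a range-based key produces the chunk ranges shown above, while a hashed key lets MongoDB distribute documents by hash:

sh.enableSharding("mydb");                      // allow sharding for the database
sh.shardCollection("mydb.Person", { key: 1 });  // range-based shard key on "key"
// alternative: hash-based distribution of the same field
// sh.shardCollection("mydb.Person", { key: "hashed" });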
MongoDB
Transactions
● ACID → MongoDB guarantees this only at document level
– Atomicity
– Consistency
– Isolation
– Durability
● CAP → MongoDB assures CP
– Consistency
– Availability
– Partition tolerance
● BASE: Basically Available, Soft state, Eventual consistency
● MongoDB doesn't support multi-document transactions
● Multi-document updates can be performed via a two-phase commit pattern (sketched below)
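A minimal sketch of the two-phase commit idea (collection and field names such as Transactions and Accounts are illustrative, error handling and rollback omitted): a transaction document tracks the state, each affected document is changed by a single-document (and therefore atomic) update, and the state only advances when every step has succeeded:

// 1. record the intent
db.Transactions.insertOne({ _id: 1, state: "initial", from: "A", to: "B", amount: 100 });
// 2. mark it pending and apply the single-document updates, each atomic on its own
db.Transactions.updateOne({ _id: 1, state: "initial" }, { $set: { state: "pending" } });
db.Accounts.updateOne({ _id: "A", pendingTransactions: { $ne: 1 } },
  { $inc: { balance: -100 }, $push: { pendingTransactions: 1 } });
db.Accounts.updateOne({ _id: "B", pendingTransactions: { $ne: 1 } },
  { $inc: { balance: 100 }, $push: { pendingTransactions: 1 } });
// 3. commit: remove the markers and close the transaction document
db.Accounts.updateMany({ pendingTransactions: 1 }, { $pull: { pendingTransactions: 1 } });
db.Transactions.updateOne({ _id: 1 }, { $set: { state: "done" } });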
MongoDB
Drivers
● JavaScript: MongoDB Node.js driver
● Java: MongoDB Java driver
● Python: PyMongo, Motor (async)
● Ruby: MongoDB Ruby driver
● C#: MongoDB C# driver
● ...

Object-document mappers
● JavaScript: mongoose, Camo, MEAN.JS
● Java: Morphia, Spring Data MongoDB
● Python: Django MongoDB Engine
● Ruby: MongoMapper, Mongoid
● C#: LINQ support in the C# driver
● ...
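A sketch using the Node.js driver listed above (assuming a recent driver version with promise support; the connection URL and database name are placeholders):

// connect, run the same Person query as in the shell examples, then close the connection
const { MongoClient } = require("mongodb");

async function run() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const persons = await client.db("test").collection("Person")
    .find({ name: "Mueller" }).toArray();
  console.log(persons);
  await client.close();
}
run();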
MongoDB
Extensions and connectors
● CKAN
● MongoDB-Hadoop connector
● MongoDB Spark connector
● MongoDB ElasticSearch/Solr connectors
● ...

Tool support
● Robomongo
● MongoExpress
● ...
MongoDB
Real world examples
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime

MongoDB in Code For Germany projects
● Politik bei uns (open council information system): scraped city council data is stored in MongoDB according to the OParl format; see also the data, the web API, and the OParl client
MongoDB
When to choose, when not to choose
● Choose for
– mass data processing, e.g. event data
– a dynamic schema
● Don't choose for
– a static schema with a lot of relations
– strict transaction requirements
MongoDB
Links
● MongoDB Schema Simulation
● 6 Rules of Thumb for MongoDB Schema Design
● MongoDB Aggregation
● MongoDB Indexes
● Sharding
● MongoDB University
● Why Relational Databases are not the Cure-All