MongoDB
Presented by
Jörg Reichert
Licensed under cc-by v3.0 (any jurisdiction)
Introduction
● Name derived from humongous (= gigantic)
● NoSQL (= not only SQL) database
● Document oriented database
– documents stored as binary JSON (BSON)
● Ad-hoc queries
● Server-side JavaScript execution
● Aggregation / MapReduce
● High performance, availability, scalability
MongoDB
Relational vs. document-based: concepts

SQL

Person:
  Id | Name    | Address
  ---+---------+--------
  1  | Mueller | 1
  2  | <null>  | 2

Address:
  Id | City    | Street
  ---+---------+-----------
  1  | Leipzig | Burgstr. 1
  2  | Dresden | <null>

MongoDB

Person:
  { _id: ObjectId("..."),
    Name: "Mueller",
    Address: { City: "Leipzig", Street: "Burgstr. 1" },
  },
  { _id: ObjectId("..."),
    Address: { City: "Dresden" },
  }

Concept mapping (SQL → MongoDB):
  DB → DB
  Table → Collection
  Column → Field (Key: Value)
  Row → Document
  Relation via PK/FK → Embedded document (or reference)

PK: primary key, FK: foreign key
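As a minimal sketch of how the documents above come into being (assuming MongoDB 3.2+, where insertMany is available), no schema definition is needed; the collection is created on first insert:

// insert the two example documents into the Person collection
db.Person.insertMany([
  { Name: "Mueller",
    Address: { City: "Leipzig", Street: "Burgstr. 1" } },
  { Address: { City: "Dresden" } }
]);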
MongoDB
Relational vs. document-based: syntax (1/3)

SQL: SELECT * FROM Person;
MongoDB: db.getCollection("Person").find();

SQL: SELECT * FROM Person WHERE name = "Mueller";
MongoDB: db.Person.find({ "name": "Mueller" });

SQL: SELECT * FROM Person WHERE name LIKE "M%";
MongoDB: db.Person.find({ "name": /^M/ });

SQL: SELECT name FROM Person;
MongoDB: db.Person.find({}, { name: 1, _id: 0 });

SQL: SELECT DISTINCT(name) FROM Person WHERE name = "Mueller";
MongoDB: db.Person.distinct("name", { "name": "Mueller" });
MongoDB
Relational vs. document-based: syntax (2/3)

SQL: SELECT * FROM Person WHERE id > 10 AND name <> "Mueller";
MongoDB: db.Person.find({ $and: [
  { _id: { $gt: ObjectId("...") } },
  { name: { $ne: "Mueller" } }
]});

SQL: SELECT p.name FROM Person p JOIN Address a ON p.address = a.id
     WHERE a.city = "Leipzig" ORDER BY p.name DESC;
MongoDB (with embedded Address): db.Person.find(
  { "Address.city": "Leipzig" },
  { name: 1, _id: 0 }
).sort({ name: -1 });

SQL: SELECT * FROM Person WHERE name IS NOT NULL;
MongoDB: db.Person.find({ name: { $not: { $type: 10 }, $exists: true } });

SQL: SELECT COUNT(*) FROM Person WHERE name = "Mueller";
MongoDB: db.Person.count({ name: "Mueller" });
         db.Person.find({ name: "Mueller" }).count();
MongoDB
Relational vs. document-based: syntax (3/3)

SQL: UPDATE Person SET name = "Müller" WHERE name = "Mueller";
MongoDB: db.Person.updateMany({ name: "Mueller" }, { $set: { name: "Müller" } });

SQL: DELETE FROM Person WHERE name = "Mueller";
MongoDB: db.Person.remove({ name: "Mueller" });

SQL: INSERT INTO Person (name, address) VALUES ("Mueller", 3);
MongoDB: db.Person.insert({ name: "Mueller", Address: { … } });

SQL: ALTER TABLE Person DROP COLUMN name;
MongoDB: db.Person.updateMany({}, { $unset: { name: 1 } });

SQL: DROP TABLE Person;
MongoDB: db.Person.drop();
MongoDB
Schema design principles
● Principle of least cardinality
● Store what you query for
MongoDB
Schema design: embedded document
● Applicable for 1:1 and 1:n relations when n cannot get too large
● Embedded document cannot get too large
● Embedded document is not very likely to change
● Arrays that grow without bound should never be embedded

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    { Name: "Mueller" },
    { Name: "Schneider" },
  ]
}
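A sketch of how the embedded structure is queried (collection and field names as in the example above); dot notation reaches into the embedded array:

// addresses containing an embedded person named "Mueller"
db.Address.find({ "Person.Name": "Mueller" });
// only the embedded persons of Leipzig addresses
db.Address.find({ City: "Leipzig" }, { Person: 1, _id: 0 });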
MongoDB
Schema design: referencing
● Applicable for 1:n relations when n cannot get too large
● Referenced document is likely to change often in the future
● Many referenced documents are expected, so storing only the reference is cheaper
● Large referenced documents are expected, so storing only the reference is cheaper
● Arrays that grow without bound should never be embedded
● Address should be accessible on its own

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
}
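Resolving the references happens at application level; a minimal sketch in the mongo shell, assuming the collections above:

// first query: load the address, which holds the ObjectId references
var address = db.Address.findOne({ City: "Leipzig" });
// second query: load all persons referenced by this address
var persons = db.Person.find({ _id: { $in: address.Person } }).toArray();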
MongoDB
Schema design: parent-referencing
● Applicable for 1:n relations when n can get very large (note: a MongoDB document must not exceed 16 MB)
● Joins are done at application level (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Dubai",
  Street: "1 Sheikh Mohammed bin Rashid Blvd",
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: ObjectId("..."),
}
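A sketch of the application-level join from the parent to its children (collection and field names as above; the index is an illustrative addition):

// load the parent, then all children pointing back at it
var address = db.Address.findOne({ City: "Dubai" });
var residents = db.Person.find({ Address: address._id }).toArray();
// an index on the referencing field keeps this lookup cheap
db.Person.createIndex({ Address: 1 });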
MongoDB
Schema design: two-way referencing
● Applicable for m:n relations when m and n cannot get too large and the application needs to navigate from both ends
● Disadvantage: changing a reference requires two update operations (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
  Person: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    ObjectId("..."),
    ObjectId("..."),
  ]
}
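Linking a person to an address therefore costs one write per side; a sketch assuming MongoDB 3.2+ (updateOne) with personId and addressId as placeholders:

// add the reference on both ends; if the second update fails,
// the data is temporarily inconsistent until it is repaired
db.Person.updateOne({ _id: personId }, { $addToSet: { Address: addressId } });
db.Address.updateOne({ _id: addressId }, { $addToSet: { Person: personId } });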
MongoDB
Schema design: denormalization
● Queries are expected to filter by certain fields of the referenced document; duplicating these fields in the host document saves an additional query at application level
● Disadvantage: two update operations for the duplicated field (see the sketch below)
● Disadvantage: additional memory consumption

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}

Person:
{ _id: ObjectId("..."),
  Name: "Mueller",
  Address: [
    { id: ObjectId("..."), city: "Leipzig" },
    ...
  ]
}
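A sketch of the trade-off (field names from the example, addressId is a placeholder): the read needs no second query, but a change of the city must be written in two places:

// read: filter persons directly by the duplicated city field
db.Person.find({ "Address.city": "Leipzig" });
// write: update the Address document and every duplicated copy;
// the positional $ updates the matched element of the Address array
db.Address.updateOne({ _id: addressId }, { $set: { City: "Leipzig-Mitte" } });
db.Person.updateMany({ "Address.id": addressId },
  { $set: { "Address.$.city": "Leipzig-Mitte" } });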
MongoDB
Schema design: bucketing
● Applicable for 1:n relations when n can get very large and the application is expected to use pagination anyway
● The DB schema already stores the chunks (buckets) that the application will later query for (see the sketch below)

Address:
{ _id: ObjectId("..."),
  City: "Leipzig",
  Street: "Burgstr. 1",
}

Person (bucket):
{ _id: ObjectId("..."),
  Address: ObjectId("..."),
  Page: 13,
  Count: 50,
  Persons: [
    { Name: "Mueller" },
    ...
  ]
}
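A sketch of reading and filling a bucket (field names from the example; addressId and the name "Neumann" are placeholders):

// read exactly one page: one bucket document per page of persons
db.Person.findOne({ Address: addressId, Page: 13 });
// append a person to that bucket and keep its counter in sync
db.Person.updateOne(
  { Address: addressId, Page: 13 },
  { $push: { Persons: { Name: "Neumann" } }, $inc: { Count: 1 } }
);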
MongoDB
Aggregation Framework
● Aggregation pipeline consisting of (processing) stages
– $match, $group, $project, $redact, $unwind, $lookup, $sort, ...
● Aggregation operators
– Boolean: $and, $or, $not
– Comparison: $eq, $lt, $lte, $gt, $gte, $ne, $cmp
– Arithmetic: $add, $subtract, $multiply, $divide, ...
– String: $concat, $substr, …
– Array: $size, $arrayElemAt, ...
– Aggregation variable: $map, $let
– Group accumulators: $sum, $avg, $addToSet, $push, $min, $max, $first, $last, …
– ...
MongoDB
Aggregation Framework
db.Person.aggregate([
  { $match: { name: { $ne: "Fischer" } } },
  { $group: { _id: "$name", city_occurs: { $addToSet: "$Address.city" } } },
  { $project: { _id: "$_id", city_count: { $size: "$city_occurs" } } },
  { $sort: { _id: 1 } },
  { $match: { city_count: { $gt: 1 } } },
  { $out: "PersonCityCount" }
]);
PersonCityCount:
{ _id: "Mueller", city_count: 2 },
{ _id: "Schmidt", city_count: 3 },
...
MongoDB
Map-Reduce
● More control than the aggregation framework, but slower
var map = function() {
  if (this.name != "Fischer")
    emit(this.name, this.Address.city);
};
var reduce = function(key, values) {
  // collect the distinct cities emitted for this name
  var distinct = [];
  values.forEach(function(value) {
    if (distinct.indexOf(value) == -1)
      distinct.push(value);
  });
  // note: this simple reduce assumes it runs once per key,
  // which only holds for small data sets (no re-reduce)
  return distinct.length;
};
db.Person.mapReduce(map, reduce, { out: "PersonCityCount2" });
MongoDB
Indexes
● Default _id index, assuring uniqueness
● Single field index: db.Person.createIndex({ name: 1 });
● Compound index: db.Address.createIndex({ city: 1, street: -1 });
– index sorts first ascending by city, then descending by street
– the index is also used when a query filters only by a prefix of the indexed fields (here: city)
● Multikey index: db.Person.createIndex({ "Address.city": 1 })
– indexes content stored in arrays; an index entry is created for each array element
● Geospatial index
● Text index
● Hashed index
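Sketches for the remaining index types (the location field and collection choices are illustrative):

db.Address.createIndex({ location: "2dsphere" });  // geospatial index on GeoJSON data
db.Person.createIndex({ name: "text" });           // text index for $text search
db.Person.createIndex({ name: "hashed" });         // hashed index, e.g. for a hashed shard key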
MongoDB
Index properties
● Uniqueness: insertion of a duplicate field value will be rejected
● Partial index: indexes only documents matching certain filter criteria
● Sparse index: indexes only documents having the indexed field
● TTL index: automatically removes documents after a certain time
● Query optimization: use db.MyCollection.find({ … }).explain() to check whether a query is answered using an index and how many documents still had to be scanned
● Covered queries: if a query only contains indexed fields, the result is delivered directly from the index without scanning or materializing any documents
● Index intersection: different indexes can be combined to cover parts of a query
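Sketches for these index options and for explain() (field and collection names such as age, nickname, and Session are illustrative; partial indexes require MongoDB 3.2+):

db.Person.createIndex({ name: 1 }, { unique: true });                    // uniqueness
db.Person.createIndex({ name: 1 },
  { partialFilterExpression: { age: { $gt: 18 } } });                    // partial index
db.Person.createIndex({ nickname: 1 }, { sparse: true });                // sparse index
db.Session.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 }); // TTL index
db.Person.find({ name: "Mueller" }).explain("executionStats");          // index usage / docs scanned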
MongoDB
Storage engines
● WiredTiger was introduced in MongoDB 3.0 and is the default storage engine since MongoDB 3.2
– locking at document level enables concurrent writes within a collection
– durability is ensured via a write-ahead transaction log and checkpoints (journaling)
– supports compression of collections and indexes (via snappy or zlib)
● MMAPv1 was the default storage engine before that
– since MongoDB 3.0 it supports locking at collection level, previously only at database level
– useful for selective updates, as WiredTiger always replaces the whole document in an update operation
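A sketch of choosing the compression per collection under WiredTiger (assuming MongoDB 3.0+; the configString syntax is WiredTiger-specific and the collection name is illustrative):

// create a collection whose blocks are compressed with zlib instead of the snappy default
db.createCollection("PersonArchive", {
  storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
});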
MongoDB
Clustering, Sharding, Replication
[Diagram: a client app (driver) sends reads and writes to an app server (mongos), which routes them to Shard 1, a replica set of one primary and two secondaries (mongod); the primary replicates writes to the secondaries, the members monitor each other via heartbeats, and a config server (replica set) stores the cluster metadata.]
MongoDB
Shard key selection
[Diagram: a sharded collection is partitioned across Shard 1, Shard 2, and Shard 3 by shard key ranges (min <= key < 15, 15 <= key < 30, 30 <= key < max); documents such as { key: 12 }, { key: 21 }, { key: 35 } land on the shard owning their range, optionally after a hash function is applied to the key.]
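A sketch of setting up the sharded collection (the database name "mydb" is a placeholder); a range-based key produces the chunk ranges shown above, while a hashed key lets MongoDB distribute documents by hash:

sh.enableSharding("mydb");                      // allow sharding for the database
sh.shardCollection("mydb.Person", { key: 1 });  // range-based shard key on "key"
// alternative: hash-based distribution of the same field
// sh.shardCollection("mydb.Person", { key: "hashed" });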
MongoDB
Transactions
● ACID → MongoDB guarantees this only at document level
– Atomicity
– Consistency
– Isolation
– Durability
● CAP → MongoDB assures CP
– Consistency
– Availability
– Partition tolerance
● BASE: Basically Available, Soft state, Eventual consistency
● MongoDB doesn't support multi-document transactions
● Multi-document updates can be performed via a two-phase commit pattern (sketched below)
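A minimal sketch of the two-phase commit idea (collection and field names such as Transactions and Accounts are illustrative, error handling and rollback omitted): a transaction document tracks the state, each affected document is changed by a single-document (and therefore atomic) update, and the state only advances when every step has succeeded:

// 1. record the intent
db.Transactions.insertOne({ _id: 1, state: "initial", from: "A", to: "B", amount: 100 });
// 2. mark it pending and apply the single-document updates, each atomic on its own
db.Transactions.updateOne({ _id: 1, state: "initial" }, { $set: { state: "pending" } });
db.Accounts.updateOne({ _id: "A", pendingTransactions: { $ne: 1 } },
  { $inc: { balance: -100 }, $push: { pendingTransactions: 1 } });
db.Accounts.updateOne({ _id: "B", pendingTransactions: { $ne: 1 } },
  { $inc: { balance: 100 }, $push: { pendingTransactions: 1 } });
// 3. commit: remove the markers and close the transaction document
db.Accounts.updateMany({ pendingTransactions: 1 }, { $pull: { pendingTransactions: 1 } });
db.Transactions.updateOne({ _id: 1 }, { $set: { state: "done" } });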
MongoDB
Drivers
● JavaScript: MongoDB Node.js driver
● Java: MongoDB Java driver
● Python: PyMongo, Motor (async)
● Ruby: MongoDB Ruby driver
● C#: MongoDB C# driver
● ...

Object-document mappers
● JavaScript: mongoose, Camo, MEAN.JS
● Java: Morphia, Spring Data MongoDB
● Python: Django MongoDB Engine
● Ruby: MongoMapper, Mongoid
● C#: LINQ support in the C# driver
● ...
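A sketch using the Node.js driver listed above (assuming a recent driver version with promise support; the connection URL and database name are placeholders):

// connect, run the same Person query as in the shell examples, then close the connection
const { MongoClient } = require("mongodb");

async function run() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const persons = await client.db("test").collection("Person")
    .find({ name: "Mueller" }).toArray();
  console.log(persons);
  await client.close();
}
run();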
MongoDB
Extensions and connectors
● CKAN
● MongoDB-Hadoop connector
● MongoDB Spark connector
● MongoDB ElasticSearch/Solr connectors
● ...

Tool support
● Robomongo
● MongoExpress
● ...
MongoDB
Real world examples
● Who uses MongoDB
● Case studies
● Arctic TimeSeries and Tick store
● uptime

MongoDB in Code For Germany projects
● Politik bei uns (open council information system): scraped city council data is stored in MongoDB according to the OParl format; see also the data, the web API, and the OParl client
MongoDB
When to choose, when not to choose
● Choose for
– mass data processing, e.g. event data
– a dynamic schema
● Don't choose for
– a static schema with a lot of relations
– strict transaction requirements
MongoDB
Links
● MongoDB Schema Simulation
● 6 Rules of Thumb for MongoDB Schema Design
● MongoDB Aggregation
● MongoDB Indexes
● Sharding
● MongoDB University
● Why Relational Databases are not the Cure-All