51
Technical Director, 10gen @jonnyeight [email protected] alvinonmongodb.com Alvin Richards #MongoDBdays Schema Design 4 Real World Use Cases

Schema Tricks & Tips

Embed Size (px)

DESCRIPTION

In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.

Citation preview

Page 1: Schema Tricks & Tips

Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Schema Design 4 Real World Use Cases

Page 2: Schema Tricks & Tips

One size fits all?

Page 3: Schema Tricks & Tips

Single Table En

Agenda

•  Why is schema design important

•  4 Real World Schemas –  Inbox –  History –  Indexed Attributes –  Multiple Identities

•  Conclusions

Page 4: Schema Tricks & Tips

Why is Schema Design important?

•  Largest factor for a performant system

•  Schema design with MongoDB is different •  RBMS – "What answers do I have?" •  MongoDB – "What question will I have?"

Page 5: Schema Tricks & Tips

#1 - Message Inbox

Page 6: Schema Tricks & Tips

Let’s get Social

Page 7: Schema Tricks & Tips

Sending Messages

?

Page 8: Schema Tricks & Tips

Design Goals

•  Efficiently send new messages to recipients

•  Efficiently read inbox

Page 9: Schema Tricks & Tips

Reading my Inbox

?

Page 10: Schema Tricks & Tips

3 Approaches (there are more)

•  Fan out on Read

•  Fan out on Write

•  Fan out on Write with Bucketing

Page 11: Schema Tricks & Tips

// Shard on "from" db.shardCollection( "mongodbdays.inbox", { from: 1 } ) // Make sure we have an index to handle inbox reads db.inbox.ensureIndex( { to: 1, sent: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ],

sent: new Date(), message: "Hi!",

} // Send a message db.inbox.save( msg ) // Read my inbox db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )

Fan out on read

Page 12: Schema Tricks & Tips

Fan out on read – Send Message

Shard 1 Shard 2 Shard 3

Send Message

Page 13: Schema Tricks & Tips

Fan out on read – Inbox Read

Shard 1 Shard 2 Shard 3

Read Inbox

Page 14: Schema Tricks & Tips

Considerations

•  1 document per message sent

•  Multiple recipients in an array key

•  Reading an inbox is finding all messages with my own name in the recipient field

•  Requires scatter-gather on sharded cluster

•  Then a lot of random IO on a shard to find everything

Page 15: Schema Tricks & Tips

// Shard on “recipient” and “sent” db.shardCollection( "mongodbdays.inbox", { ”recipient”: 1, ”sent”: 1 } ) msg = { from: "Joe”, to: [ "Bob", "Jane" ],

sent: new Date(), message: "Hi!",

} // Send a message for ( recipient in msg.to ) {

msg.recipient = recipient db.inbox.save( msg );

} // Read my inbox db.inbox.find( { recipient: "Joe" } ).sort( { sent: -1 } )

Fan out on write

Page 16: Schema Tricks & Tips

Fan out on write – Send Message

Shard 1 Shard 2 Shard 3

Send Message

Page 17: Schema Tricks & Tips

Fan out on write– Read Inbox

Shard 1 Shard 2 Shard 3

Read Inbox

Page 18: Schema Tricks & Tips

Considerations

•  1 document per recipient

•  Reading my inbox is just finding all of the messages with me as the recipient

•  Can shard on recipient, so inbox reads hit one shard

•  But still lots of random IO on the shard

Page 19: Schema Tricks & Tips

Fan out on write with buckets

•  Each “inbox” document is an array of messages

•  Append a message onto “inbox” of recipient

•  Bucket inbox documents so there’s not too many per document

•  Can shard on recipient, so inbox reads hit one shard

•  A few documents to read the whole inbox

Page 20: Schema Tricks & Tips

// Shard on “owner / sequence” db.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } ) db.shardCollection( "mongodbdays.users", { user_name: 1 } ) msg = { from: "Joe", to: [ "Bob", "Jane" ],

sent: new Date(), message: "Hi!",

} // Send a message for( recipient in msg.to) { count = db.users.findAndModify({ query: { user_name: msg.to[recipient] }, update: { "$inc": { "msg_count": 1 } }, upsert: true, new: true }).msg_count; sequence = Math.floor(count / 50);

db.inbox.update( { owner: msg.to[recipient], sequence: sequence }, { $push: { "messages": msg } }, { upsert: true } );

} // Read my inbox db.inbox.find( { owner: "Joe" } ).sort ( { sequence: -1 } ).limit( 2 )

Fan out on write – with buckets

Page 21: Schema Tricks & Tips

Bucketed fan out on write - Send

Shard 1 Shard 2 Shard 3

Send Message

Page 22: Schema Tricks & Tips

Bucketed fan out on write - Read

Shard 1 Shard 2 Shard 3

Read Inbox

Page 23: Schema Tricks & Tips

#2 – History

Page 24: Schema Tricks & Tips
Page 25: Schema Tricks & Tips

Design Goals

•  Need to retain a limited amount of history e.g. –  Hours, Days, Weeks –  May be legislative requirement (e.g. HIPPA, SOX, DPA)

•  Need to query efficiently by –  match –  ranges

Page 26: Schema Tricks & Tips

3 Approaches (there are more)

•  Bucket by Number of messages

•  Fixed size Array

•  Bucket by Date + TTL Collections

Page 27: Schema Tricks & Tips

db.inbox.find() { owner: "Joe", sequence: 25, messages: [ { from: "Joe", to: [ "Bob", "Jane" ], sent: ISODate("2013-03-01T09:59:42.689Z"), message: "Hi!" }, … ] } // Query with a date range db.inbox.find ( { owner: "friend1", messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") }}}}) // Remove elements based on a date db.inbox.update( { owner: "friend1" }, { $pull: { messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )

Inbox – Bucket by # messages

Page 28: Schema Tricks & Tips

Considerations

•  Shrinking documents, space can be reclaimed with –  db.runCommand ( { compact: '<collection>' } )

•  Removing the document after the last element in the array as been removed –  { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }

Page 29: Schema Tricks & Tips

msg = { from: "Your Boss", to: [ "Bob" ],

sent: new Date(), message: "CALL ME NOW!"

} // 2.4 Introduces $each, $sort and $slice for $push db.messages.update(

{ _id: 1 }, { $push: { messages: { $each: [ msg ],

$sort: { sent: 1 }, $slice: -50 } }

} )

Maintain the latest – Fixed Size Array

Page 30: Schema Tricks & Tips

Considerations

•  Need to compute the size of the array based on retention period

Page 31: Schema Tricks & Tips

// messages: one doc per user per day

db.inbox.findOne() {

_id: 1, to: "Joe", sequence: ISODate("2013-02-04T00:00:00.392Z"), messages: [ ] }

// Auto expires data after 31536000 seconds = 1 year db.messages.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )

TTL Collections

Page 32: Schema Tricks & Tips

#3 – Indexed Attributes

Page 33: Schema Tricks & Tips

Design Goal

•  Application needs to stored a variable number of attributes e.g. –  User defined Form –  Meta Data tags

•  Queries needed –  Equality –  Range based

•  Need to be efficient, regardless of the number of attributes

Page 34: Schema Tricks & Tips

2 Approaches (there are more)

•  Attributes

•  Attributes as Objects in an Array

Page 35: Schema Tricks & Tips

db.files.insert( { _id: "local.0", attr: { type: "text", size: 64, created: ISODate("2013-03-01T09:59:42.689Z" } } )

db.files.insert( { _id:"local.1", attr: { type: "text", size: 128} } )

db.files.insert( { _id:"mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document db.files.ensureIndex( { "attr.type": 1 } ) db.files.find( { "attr.type": "text"} )

// Can perform range queries db.files.ensureIndex( { "attr.size": 1 } ) db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )

Attributes as a Sub-Document

Page 36: Schema Tricks & Tips

Considerations

•  Each attribute needs an Index

•  Each time you extend, you add an index

•  Lots and lots of indexes

Page 37: Schema Tricks & Tips

db.files.insert( { _id: "local.0", attr: [ { type: "text" }, { size: 64 }, { created: ISODate("2013-03-01T09:59:42.689Z" } ] } )

db.files.insert( { _id: "local.1", attr: [ { type: "text" }, { size: 128 } ] } )

db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

db.files.ensureIndex( { attr: 1 } )

Attributes as Objects in Array

Page 38: Schema Tricks & Tips

// Range queries db.files.find( { attr: { $gt: { size:64 }, $lte: { size: 16384 } } } )

db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple condition – Only the first predicate on the query can use the Index // ensure that this is the most selective. // Index Intersection will allow multiple indexes, see SERVER-3071

db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } }, { attr: { $gt: { size:128 }, $lte: { size: 16384 } } } ] } )

// Each $or can use an index db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } }, { attr: { $gt: { size:128 }, $lte: { size: 16384 } } } ] } )

Queries

Page 39: Schema Tricks & Tips

#4 – Multiple Identities

Page 40: Schema Tricks & Tips

Design Goal

•  Ability to look up by a number of different identities e.g. •  Username •  Email address •  FB Handle •  LinkedIn URL

Page 41: Schema Tricks & Tips

2 Approaches (there are more)

•  Identifiers in a single document

•  Separate Identifiers from Content

Page 42: Schema Tricks & Tips

db.users.findOne() { _id: "joe", email: "[email protected], fb: "joe.smith", // facebook li: "joe.e.smith", // linkedin other: {…} }

// Shard collection by _id db.shardCollection("mongodbdays.users", { _id: 1 } )

// Create indexes on each key db.users.ensureIndex( { email: 1} ) db.users.ensureIndex( { fb: 1 } ) db.users.ensureIndex( { li: 1 } )

Single Document by User

Page 43: Schema Tricks & Tips

Read by _id (shard key)

Shard 1 Shard 2 Shard 3

find( { _id: "joe"} )

Page 44: Schema Tricks & Tips

Read by email (non-shard key)

Shard 1 Shard 2 Shard 3

find ( { email: [email protected] } )

Page 45: Schema Tricks & Tips

Considerations

•  Lookup by shard key is routed to 1 shard

•  Lookup by other identifier is scatter gathered across all shards

•  Secondary keys cannot have a unique index

Page 46: Schema Tricks & Tips

// Create unique index db.identities.ensureIndex( { identifier : 1} , { unique: true} ) // Create a document for each users document db.identities.save( { identifier : { hndl: "joe" }, user: "1200-42" } ) db.identities.save( { identifier : { email: "[email protected]" }, user: "1200-42" } ) db.identities.save( { identifier : { li: "joe.e.smith" }, user: "1200-42" } ) // Shard collection by _id db.shardCollection( "mongodbdays.identities", { identifier : 1 } )

// Create unique index db.users.ensureIndex( { _id: 1} , { unique: true} )

// Create a docuemnt that holds all the other user attributes db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id db.shardCollection( "mongodbdays.users", { _id: 1 } )

Document per Identity

Page 47: Schema Tricks & Tips

Read requires 2 reads

Shard 1 Shard 2 Shard 3

db.identities.find({"identifier" : { "hndl" : "joe" }})

db.users.find( { _id: "1200-42"} )

Page 48: Schema Tricks & Tips

Solution

•  Lookup to Identities is a routed query

•  Lookup to Users is a routed query

•  Unique indexes available

Page 49: Schema Tricks & Tips

Conclusion

Page 50: Schema Tricks & Tips

Summary

•  Multiple ways to model a domain problem

•  Understand the key uses cases of your app

•  Balance between ease of query vs. ease of write

•  Random IO should be avoided

Page 51: Schema Tricks & Tips

Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Thank You