Schema Design - Real world use case

Implementing social inbox or chronological feeds using MongoDB

Page 1: Schema Design - Real world use case

Consulting Engineer, MongoDB

Matias Cascallares

#MongoDBDays

Schema Design: Real World Use Case

[email protected]

Page 2: Schema Design - Real world use case

Agenda

• Why is schema design important

• A real world use case
    – Social Inbox
    – History

• Conclusions

Page 3: Schema Design - Real world use case

Why is Schema Design important?

• Largest factor for a performant system

• Schema design with MongoDB is different
    – RDBMS: "What answers do I have?"
    – MongoDB: "What question will I have?"

Page 4: Schema Design - Real world use case

#1 – Message Inbox

Page 5: Schema Design - Real world use case

Let's get social

Page 6: Schema Design - Real world use case

Sending Messages

?

Page 7: Schema Design - Real world use case

Reading my Inbox

?

Page 8: Schema Design - Real world use case

Design Goals

• Efficiently send new messages to recipients

• Efficiently read inbox

Page 9: Schema Design - Real world use case

3 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Fan out on Write with Bucketing

Page 10: Schema Design - Real world use case

// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Matias" } ).sort( { sent: -1 } )

Fan out on read

Schema Design, Matias Cascallares

Page 11: Schema Design - Real world use case

Fan out on read – IO

[Diagram: Send Message I/O shown against Shard 1, Shard 2 and Shard 3]

Page 12: Schema Design - Real world use case

Fan out on read – IO

[Diagram: Read Inbox I/O shown against Shard 1, Shard 2 and Shard 3]

Page 13: Schema Design - Real world use case

Considerations

• Write: one document per message sent

• Reading my inbox means finding all messages with my own name in the recipient field

• Read: requires scatter-gather on sharded cluster

• Then a lot of random IO on a shard to find everything
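
The trade-off in these bullets can be sketched without a cluster. The following is a minimal in-memory simulation, not MongoDB code: `shardFor`, `send` and `readInbox` are hypothetical helpers, the hash routing is a toy stand-in for shard-key routing, and three shards are assumed to match the diagrams.

```javascript
// In-memory sketch of fan out on read (hypothetical helpers, not MongoDB APIs).
const SHARD_COUNT = 3;
const shards = Array.from({ length: SHARD_COUNT }, () => []);

// Toy stand-in for shard-key routing: hash a key to a shard number.
function shardFor(key) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % SHARD_COUNT;
}

// Write path: one document per message, routed by the "from" field.
function send(msg) {
  shards[shardFor(msg.from)].push(msg);
  return 1; // documents written
}

// Read path: the recipient is not the shard key, so every shard is queried.
function readInbox(user) {
  let shardsQueried = 0;
  const inbox = [];
  for (const shard of shards) {
    shardsQueried += 1;
    for (const msg of shard) {
      if (msg.to.includes(user)) inbox.push(msg);
    }
  }
  inbox.sort((a, b) => b.sent - a.sent); // newest first
  return { inbox, shardsQueried };
}

const written = send({ from: "Matias", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2), message: "Lunch?" });
const result = readInbox("Bob");
console.log(written, result.shardsQueried, result.inbox.map((m) => m.message));
```

In this sketch a send always costs one write, while a read touches all three shards regardless of where the matching messages live.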

Page 14: Schema Design - Real world use case

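// Shard on "recipient" and "sent"
sh.shardCollection( "mongodbdays.inbox", { recipient: 1, sent: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

// Send a message: insert one independent copy per recipient
// (for...in would iterate array indexes, and re-saving the same
// object would reuse its _id and overwrite the previous copy)
msg.to.forEach( function( recipient ) {
    db.inbox.insert( {
        recipient: recipient,
        from: msg.from,
        to: msg.to,
        sent: msg.sent,
        message: msg.message
    } );
} );

// Read my inbox
db.inbox.find( { recipient: "Matias" } ).sort( { sent: -1 } )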

Fan out on write

Page 15: Schema Design - Real world use case

Fan out on write – IO

[Diagram: Send Message I/O shown against Shard 1, Shard 2 and Shard 3]

Page 16: Schema Design - Real world use case

Fan out on write – IO

[Diagram: Read Inbox I/O shown against Shard 1, Shard 2 and Shard 3]

Page 17: Schema Design - Real world use case

Considerations

• Write: one document per recipient

• Reading my inbox is just finding all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

• But still lots of random IO on the shard
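
As a rough illustration of these points, here is a minimal in-memory sketch of fan out on write. It is not MongoDB code: `shardFor`, `send` and `readInbox` are hypothetical helpers, the hash routing is a toy stand-in for sharding on the recipient, and three shards are assumed.

```javascript
// In-memory sketch of fan out on write (hypothetical helpers, not MongoDB APIs).
const SHARD_COUNT = 3;
const shards = Array.from({ length: SHARD_COUNT }, () => []);

// Toy stand-in for shard-key routing: hash a key to a shard number.
function shardFor(key) {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % SHARD_COUNT;
}

// Write path: one copy of the message per recipient, routed by recipient.
function send(msg) {
  for (const recipient of msg.to) {
    // Copy the message so each recipient owns an independent document.
    shards[shardFor(recipient)].push({ ...msg, recipient });
  }
  return msg.to.length; // documents written
}

// Read path: the recipient is the routing key, so a single shard is queried.
function readInbox(user) {
  const inbox = shards[shardFor(user)]
    .filter((doc) => doc.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { inbox, shardsQueried: 1 };
}

const written = send({ from: "Matias", to: ["Bob", "Jane"], sent: new Date(1), message: "Hi!" });
send({ from: "Jane", to: ["Bob"], sent: new Date(2), message: "Lunch?" });
const result = readInbox("Bob");
console.log(written, result.shardsQueried, result.inbox.map((m) => m.message));
```

The write cost now grows with the number of recipients, and the read is confined to one shard. The messages on that shard are still separate documents, which is where the random IO in the last bullet comes from.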

Page 18: Schema Design - Real world use case

// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = {
    from: "Matias",
    to: [ "Bob", "Jane" ],
    sent: new Date(),
    message: "Hi!"
}

Fan out on write with buckets

Page 19: Schema Design - Real world use case

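// Send a message
// (forEach iterates the recipient names; for...in would iterate array indexes)
msg.to.forEach( function( recipient ) {
    // Atomically increment the recipient's message counter
    var count = db.users.findAndModify({
        query: { user_name: recipient },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true
    }).msg_count;

    // 50 messages per bucket
    var sequence = Math.floor(count / 50);

    // Append the message to the recipient's current bucket
    db.inbox.update(
        { owner: recipient, sequence: sequence },
        { $push: { "messages": msg } },
        { upsert: true }
    );
} );

// Read my inbox
db.inbox.find( { owner: "Matias" } ).sort( { sequence: -1 } ).limit( 2 )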

Fan out on write with buckets

Page 20: Schema Design - Real world use case

Fan out on write with buckets

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inboxes so there’s not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox
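
The bucketing arithmetic can be checked with a small in-memory sketch. This is not MongoDB code: the `counters` and `buckets` maps stand in for the users and inbox collections, `send` and `readInbox` are hypothetical helpers, and the bucket size of 50 is taken from the send-message slide.

```javascript
// In-memory sketch of fan out on write with buckets (hypothetical helpers).
const BUCKET_SIZE = 50;
const counters = new Map(); // user_name -> msg_count
const buckets = new Map(); // "owner/sequence" -> { owner, sequence, messages }

function send(msg) {
  for (const recipient of msg.to) {
    // Same effect as $inc with upsert: the first message yields count 1.
    const count = (counters.get(recipient) || 0) + 1;
    counters.set(recipient, count);

    const sequence = Math.floor(count / BUCKET_SIZE);
    const key = `${recipient}/${sequence}`;

    // Same effect as $push with upsert: create the bucket on first use.
    if (!buckets.has(key)) {
      buckets.set(key, { owner: recipient, sequence, messages: [] });
    }
    buckets.get(key).messages.push(msg);
  }
}

// Read the newest buckets for an owner, highest sequence first.
function readInbox(owner, limit = 2) {
  return [...buckets.values()]
    .filter((b) => b.owner === owner)
    .sort((a, b) => b.sequence - a.sequence)
    .slice(0, limit);
}

for (let i = 1; i <= 120; i++) {
  send({ from: "Matias", to: ["Bob"], sent: new Date(i), message: `msg ${i}` });
}
const latest = readInbox("Bob");
console.log(latest.map((b) => [b.sequence, b.messages.length]));
```

After 120 messages Bob has three bucket documents, and reading two of them returns the 71 most recent messages. Because the counter is incremented before the division, the first bucket holds 49 messages rather than 50.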

Page 21: Schema Design - Real world use case

Fan out on write with buckets - IO

[Diagram: Send Message I/O shown against Shard 1, Shard 2 and Shard 3]

Page 22: Schema Design - Real world use case

Fan out on write with buckets - IO

[Diagram: Read Inbox I/O shown against Shard 1, Shard 2 and Shard 3]

Page 23: Schema Design - Real world use case

#2 – History

Page 24: Schema Design - Real world use case
Page 25: Schema Design - Real world use case

Design Goals

Need to retain a limited amount of history, e.g.
    – Number of items
    – Hours, days, weeks
    – May be a legislative requirement (e.g. HIPAA, SOX, DPA)

Need to query efficiently by
    – match
    – ranges

Page 26: Schema Design - Real world use case

3 Approaches (there are more)

• Bucket by number of messages

• Fixed size array

• Bucket by date + TTL Collections

Page 27: Schema Design - Real world use case

db.inbox.find()
{
    owner: "Matias",
    sequence: 25,
    messages: [
        {
            from: "Matias",
            to: [ "Bob", "Jane" ],
            sent: ISODate("2013-03-01T09:59:42.689Z"),
            message: "Hi!"
        },
        …
    ]
}

// Query with a date range
db.inbox.find({
    owner: "Matias",
    messages: { $elemMatch: { sent: { $gt: ISODate("…") } } }
})

// Remove elements based on a date
// (multi: true so every bucket of this owner is pruned, not only the first match)
db.inbox.update(
    { owner: "Matias" },
    { $pull: { messages: { sent: { $lt: ISODate("…") } } } },
    { multi: true }
)

Bucket by number of messages
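
One detail of the date-range query is easy to miss: `$elemMatch` in a query selects whole bucket documents that contain at least one matching element, so the application still has to filter the messages inside each bucket. A minimal in-memory sketch of that behaviour follows; the sample data and the helpers `findBucketsNewerThan` and `messagesNewerThan` are made up for illustration.

```javascript
// In-memory stand-in for bucket documents; the data is invented for illustration.
const buckets = [
  {
    owner: "Matias",
    sequence: 0,
    messages: [
      { from: "Bob", sent: new Date("2013-01-10T00:00:00Z"), message: "old" },
      { from: "Jane", sent: new Date("2013-03-05T00:00:00Z"), message: "new" },
    ],
  },
  {
    owner: "Matias",
    sequence: 1,
    messages: [{ from: "Bob", sent: new Date("2013-03-09T00:00:00Z"), message: "newest" }],
  },
  {
    owner: "Bob",
    sequence: 0,
    messages: [{ from: "Jane", sent: new Date("2013-03-08T00:00:00Z"), message: "other inbox" }],
  },
];

// Mimics find({ owner, messages: { $elemMatch: { sent: { $gt: cutoff } } } }):
// a bucket matches if at least one element matches, and it is returned whole.
function findBucketsNewerThan(owner, cutoff) {
  return buckets.filter(
    (b) => b.owner === owner && b.messages.some((m) => m.sent > cutoff)
  );
}

// Application-side step: keep only the messages that fall inside the range.
function messagesNewerThan(owner, cutoff) {
  return findBucketsNewerThan(owner, cutoff).flatMap((b) =>
    b.messages.filter((m) => m.sent > cutoff)
  );
}

const cutoff = new Date("2013-03-01T00:00:00Z");
const matchedBuckets = findBucketsNewerThan("Matias", cutoff);
const recent = messagesNewerThan("Matias", cutoff);
console.log(matchedBuckets.length, recent.map((m) => m.message));
```

Here the first matched bucket still carries the message from January, which only the second filtering step removes.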

Page 28: Schema Design - Real world use case

Considerations

• Shrinking documents: space can be reclaimed with
    – db.runCommand( { compact: '<collection>' } )

• Remove the document after the last element in the array has been removed
    – { "_id" : …, "messages" : [ ], "owner" : "Bob", "sequence" : 0 }
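
The two-step cleanup described here (pull old messages, then drop buckets left empty) can be sketched in memory. This is an illustration, not MongoDB code: `pullOlderThan` and `removeEmptyBuckets` are hypothetical helpers and the sample data is invented.

```javascript
// In-memory stand-in for Bob's bucket documents (invented sample data).
let buckets = [
  {
    owner: "Bob",
    sequence: 0,
    messages: [{ sent: new Date("2013-01-01T00:00:00Z"), message: "a" }],
  },
  {
    owner: "Bob",
    sequence: 1,
    messages: [
      { sent: new Date("2013-01-20T00:00:00Z"), message: "b" },
      { sent: new Date("2013-03-02T00:00:00Z"), message: "c" },
    ],
  },
];

// Mimics a multi-document update with
// { $pull: { messages: { sent: { $lt: cutoff } } } }.
function pullOlderThan(owner, cutoff) {
  for (const bucket of buckets) {
    if (bucket.owner === owner) {
      bucket.messages = bucket.messages.filter((m) => !(m.sent < cutoff));
    }
  }
}

// Mimics removing the owner's documents whose messages array is now empty.
function removeEmptyBuckets(owner) {
  const before = buckets.length;
  buckets = buckets.filter(
    (b) => !(b.owner === owner && b.messages.length === 0)
  );
  return before - buckets.length; // number of documents removed
}

pullOlderThan("Bob", new Date("2013-02-01T00:00:00Z"));
const removed = removeEmptyBuckets("Bob");
console.log(removed, buckets.map((b) => [b.sequence, b.messages.length]));
```

In the shell, the empty documents could presumably be selected with a filter such as `messages: { $size: 0 }`; that query is a suggestion and does not appear in the slides.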

Page 29: Schema Design - Real world use case

msg = {
    from: "Your Boss",
    to: [ "Bob" ],
    sent: new Date(),
    message: "CALL ME NOW!"
}

// 2.4 introduces $each, $sort and $slice modifiers for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],
                           $sort: { sent: 1 },
                           $slice: -50 } } }
)

Maintain the latest – Fixed size array
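
To make the combined effect of the three modifiers concrete, here is a plain JavaScript sketch of what the update does to the array: append, sort ascending by `sent`, then keep only the last 50 elements. `pushSortSlice` is a hypothetical helper, not a MongoDB API, and it assumes a positive maximum size.

```javascript
// Plain-array sketch of $push with $each, $sort and a negative $slice.
// Assumes maxSize > 0.
function pushSortSlice(doc, newMessages, maxSize) {
  doc.messages.push(...newMessages); // $each: append every new element
  doc.messages.sort((a, b) => a.sent - b.sent); // $sort: { sent: 1 }
  doc.messages = doc.messages.slice(-maxSize); // $slice: -maxSize keeps the last N
  return doc;
}

const inbox = { _id: 1, messages: [] };
for (let i = 1; i <= 60; i++) {
  pushSortSlice(inbox, [{ from: "Your Boss", sent: new Date(i), message: `msg ${i}` }], 50);
}
console.log(inbox.messages.length, inbox.messages[0].message, inbox.messages[49].message);
```

After 60 pushes the array holds 50 elements, running from "msg 11" to "msg 60": the ten oldest messages have been trimmed.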

Page 30: Schema Design - Real world use case

Considerations

• Need to compute the size of the array based on retention period
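
One simple way to pick that size, assuming a rough estimate of message rate is available, is to multiply the expected messages per day by the retention period in days and round up. The helper `arraySizeFor` and the numbers below are hypothetical; a count-based cap only approximates a time-based retention rule when the real rate differs from the estimate.

```javascript
// Hypothetical sizing helper: expected messages per day times retention days.
function arraySizeFor(messagesPerDay, retentionDays) {
  if (messagesPerDay <= 0 || retentionDays <= 0) {
    throw new RangeError("rate and retention must be positive");
  }
  return Math.ceil(messagesPerDay * retentionDays);
}

// Example with made-up numbers: 20 messages a day kept for 30 days.
const sliceSize = arraySizeFor(20, 30);
console.log(sliceSize);
```

The result would then be used as the magnitude of the negative `$slice` value.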

Page 31: Schema Design - Real world use case

// messages: one doc per user per day
db.inbox.findOne()
{
    _id: 1,
    to: "Joe",
    sequence: ISODate("2013-02-04T00:00:00.392Z"),
    messages: [ ]
}

// Auto expires data after 31536000 seconds = 1 year
// (the TTL index must be on the collection that holds the "sequence" field)
db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )

TTL Collections
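
A small sketch of the date arithmetic behind daily buckets with a one-year TTL. `dayBucket` and `isExpired` are hypothetical helpers: truncating to midnight UTC is an assumption about how the `sequence` value is chosen, and `isExpired` only models when a document becomes eligible for removal. The TTL monitor in MongoDB runs periodically in the background, so actual deletion can lag behind.

```javascript
const EXPIRE_AFTER_SECONDS = 31536000; // 365 days, as in the index definition

// Truncate a timestamp to midnight UTC so all messages of a day share a bucket.
function dayBucket(date) {
  return new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())
  );
}

// A bucket becomes eligible for removal once its date is at least
// EXPIRE_AFTER_SECONDS in the past.
function isExpired(sequence, now) {
  return now.getTime() - sequence.getTime() >= EXPIRE_AFTER_SECONDS * 1000;
}

const sent = new Date("2013-02-04T15:30:00Z");
const bucket = dayBucket(sent);
console.log(bucket.toISOString()); // 2013-02-04T00:00:00.000Z
console.log(isExpired(bucket, new Date("2014-02-03T23:59:59Z"))); // false
console.log(isExpired(bucket, new Date("2014-02-04T00:00:00Z"))); // true
```

Since the expiry is measured from the bucket date rather than from each message, a message sent late in the day expires slightly less than a year after it was sent.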

Page 32: Schema Design - Real world use case

Conclusion

Page 33: Schema Design - Real world use case

Summary

• Multiple ways to model a domain problem

• Understand the key use cases of your app

• Balance between ease of query vs. ease of write

• Random IO should be avoided

• Scatter/gather should be avoided

Page 34: Schema Design - Real world use case

Questions?

Page 35: Schema Design - Real world use case

Thank You

Consulting Engineer, MongoDB

Matias Cascallares

#MongoDBDays

[email protected]