MongoDB Schema Design -- Inboxes

by Jared Rosoff. Dec 2012

Schema Design -- Inboxes!

Jared Rosoff (@forjared)

Technical Director, 10gen

#MongoSV 2012

Agenda

• Problem overview

• Design Options
  – Fan out on Read
  – Fan out on Write
  – Fan out on Write with Bucketing

• Conclusions

Problem Overview

Let’s get Social

Sending Messages

?

Reading my Inbox

?

Design Options

3 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Fan out on Write with Bucketing

Fan out on read

• Generally, not the right approach

• 1 document per message sent

• Multiple recipients in an array key

• Reading an inbox is finding all messages with my own name in the recipient field

• Requires scatter-gather on sharded cluster

• Then a lot of random IO on a shard to find everything

// Shard on "from"
sh.shardCollection("myapp.messages", { "from": 1 })

// Make sure we have an index to handle inbox reads
db.messages.ensureIndex({ "to": 1, "sent": 1 })

msg = {
    from: "Joe",
    to: ["Bob", "Jane"],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
db.messages.save(msg)

// Read my inbox
db.messages.find({ to: "Joe" }).sort({ sent: -1 })

Fan out on Read

Fan out on read – Send Message

[Diagram: Send Message against Shard 1, Shard 2, Shard 3]

Fan out on read – Inbox Read

[Diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
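The send and read paths above can be sketched with a small in-memory model. This is only an illustration, not MongoDB code: `toyShardFor`, `makeCluster`, `sendFanOutOnRead`, and `readInboxFanOutOnRead` are hypothetical names, and the character-sum hash is an assumption standing in for real shard routing. It shows that a send touches one shard while an inbox read has to ask every shard.

```javascript
// Toy routing function (assumption): sum of character codes modulo shard count.
function toyShardFor(key, shardCount) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shardCount;
}

// A "cluster" is just an array of shards; each shard is an array of documents.
function makeCluster(shardCount) {
  return Array.from({ length: shardCount }, () => []);
}

// Send: one document per message, routed by the sender ("from").
function sendFanOutOnRead(cluster, msg) {
  const shard = toyShardFor(msg.from, cluster.length);
  cluster[shard].push(msg);
  return [shard]; // shards written to
}

// Read: recipients are not the shard key, so every shard must be scanned.
function readInboxFanOutOnRead(cluster, user) {
  const shardsQueried = [];
  let results = [];
  cluster.forEach((docs, i) => {
    shardsQueried.push(i);
    results = results.concat(docs.filter((d) => d.to.includes(user)));
  });
  results.sort((a, b) => b.sent - a.sent); // newest first
  return { shardsQueried, results };
}

const demoCluster = makeCluster(3);
sendFanOutOnRead(demoCluster, { from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
sendFanOutOnRead(demoCluster, { from: "Ann", to: ["Bob"], sent: 2, message: "Hello" });
const bobInbox = readInboxFanOutOnRead(demoCluster, "Bob");
console.log(bobInbox.shardsQueried.length); // 3
console.log(bobInbox.results.length);       // 2
```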

Fan out on write

• Tends to scale better than fan out on read

• 1 document per recipient

• Reading my inbox is just finding all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

• But still lots of random IO on the shard

// Shard on "recipient" and "sent"
sh.shardCollection("myapp.messages", { "recipient": 1, "sent": 1 })

msg = {
    from: "Joe",
    to: ["Bob", "Jane"],
    sent: new Date(),
    message: "Hi!"
}

// Send a message: insert one copy per recipient
msg.to.forEach(function (recipient) {
    // Build a fresh document each time so every copy gets its own _id
    db.messages.insert({
        from: msg.from,
        to: msg.to,
        recipient: recipient,
        sent: msg.sent,
        message: msg.message
    })
})

// Read my inbox
db.messages.find({ recipient: "Joe" }).sort({ sent: -1 })

Fan out on Write

Fan out on write – Send Message

[Diagram: Send Message against Shard 1, Shard 2, Shard 3]

Fan out on write – Read Inbox

[Diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
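A comparable in-memory sketch for fan out on write, under the same caveats: `toyShard`, `sendFanOutOnWrite`, and `readInboxFanOutOnWrite` are hypothetical names and the hash is a stand-in for real routing. Each recipient gets their own copy, routed by recipient, so an inbox read only needs the recipient's shard.

```javascript
// Toy routing function (assumption): sum of character codes modulo shard count.
function toyShard(key, shardCount) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % shardCount;
}

// Send: one copy of the message per recipient, routed by recipient.
function sendFanOutOnWrite(cluster, msg) {
  const shardsWritten = new Set();
  for (const recipient of msg.to) {
    const copy = { ...msg, recipient }; // fresh object per recipient
    const shard = toyShard(recipient, cluster.length);
    cluster[shard].push(copy);
    shardsWritten.add(shard);
  }
  return [...shardsWritten];
}

// Read: the recipient is the routing key, so only one shard is consulted.
function readInboxFanOutOnWrite(cluster, user) {
  const shard = toyShard(user, cluster.length);
  const results = cluster[shard]
    .filter((d) => d.recipient === user)
    .sort((a, b) => b.sent - a.sent); // newest first
  return { shardsQueried: [shard], results };
}

const writeCluster = [[], [], []];
sendFanOutOnWrite(writeCluster, { from: "Joe", to: ["Bob", "Jane"], sent: 1, message: "Hi!" });
console.log(readInboxFanOutOnWrite(writeCluster, "Bob").shardsQueried.length); // 1
```

The cost moves to the send side: the number of writes grows with the number of recipients.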

Fan out on write with bucketing

• Generally the best approach

• Each “inbox” document is an array of messages

• Append a message onto “inbox” of recipient

• Bucket inbox documents so there are not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• 1 or 2 documents to read the whole inbox

// Shard on "owner / sequence"
sh.shardCollection("myapp.inbox", { "owner": 1, "sequence": 1 })
sh.shardCollection("myapp.users", { "user_name": 1 })

msg = {
    from: "Joe",
    to: ["Bob", "Jane"],
    sent: new Date(),
    message: "Hi!"
}

// Send a message
msg.to.forEach(function (recipient) {
    // Atomically bump the recipient's message counter and read the new value
    var count = db.users.findAndModify({
        query: { user_name: recipient },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true
    }).msg_count

    // Integer bucket number: roughly 50 messages per inbox document
    var sequence = Math.floor(count / 50)

    db.inbox.update(
        { owner: recipient, sequence: sequence },
        { $push: { "messages": msg } },
        { upsert: true }
    )
})

// Read my inbox
db.inbox.find({ owner: "Joe" }).sort({ sequence: -1 }).limit(2)

Fan out on Write

Bucketed fan out on write - Send

[Diagram: Send Message against Shard 1, Shard 2, Shard 3]

Bucketed fan out on write - Read

[Diagram: Read Inbox against Shard 1, Shard 2, Shard 3]
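The bucket arithmetic is easy to check in isolation. The sketch below is an in-memory stand-in, not MongoDB code: `makeInboxStore`, `deliver`, and `readInbox` are hypothetical names. It mirrors the slide's scheme of incrementing a per-user counter first and then dividing by the bucket size with integer division.

```javascript
// In-memory stand-in for the "users" counters and the bucketed "inbox" documents.
function makeInboxStore(bucketSize) {
  const counters = new Map(); // user name -> msg_count
  const buckets = new Map();  // "owner/sequence" -> { owner, sequence, messages }
  return {
    // Append a message to the recipient's current bucket; returns the bucket number.
    deliver(recipient, msg) {
      const count = (counters.get(recipient) || 0) + 1; // increment, then read
      counters.set(recipient, count);
      const sequence = Math.floor(count / bucketSize);  // integer bucket number
      const key = recipient + "/" + sequence;
      if (!buckets.has(key)) {
        buckets.set(key, { owner: recipient, sequence, messages: [] }); // "upsert"
      }
      buckets.get(key).messages.push(msg);
      return sequence;
    },
    // Return the newest `limit` buckets for an owner.
    readInbox(owner, limit) {
      return [...buckets.values()]
        .filter((b) => b.owner === owner)
        .sort((a, b) => b.sequence - a.sequence)
        .slice(0, limit);
    },
  };
}

const store = makeInboxStore(50);
for (let i = 1; i <= 120; i++) store.deliver("Bob", { from: "Joe", message: "msg " + i });
const newest = store.readInbox("Bob", 2);
console.log(newest.map((b) => [b.sequence, b.messages.length])); // [ [ 2, 21 ], [ 1, 50 ] ]
```

Because the counter is incremented before dividing, bucket 0 ends up with 49 messages and later buckets with 50. Reading the two newest buckets returns between 1 and 100 messages, which is why the read side needs only 1 or 2 documents.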

Discussion

Tradeoffs

                          | Fan out on Read       | Fan out on Write     | Bucketed Fan out on Write
Send Message Performance  | Best                  | Good                 | Worst
                          | Single shard          | Shard per recipient  | Shard per recipient
                          | Single write          | Multiple writes      | Appends (grows)
Read Inbox Performance    | Worst                 | Good                 | Best
                          | Broadcast all shards  | Single shard         | Single shard
                          | Random reads          | Random reads         | Single read
Data Size                 | Best                  | Worst                | Worst
                          | Message stored once   | Copy per recipient   | Copy per recipient
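The rows of the tradeoff table can be turned into rough counts. The function below is a back-of-the-envelope sketch only: `estimateCosts` and its field names are hypothetical, and the numbers simply restate the table (one write versus one per recipient, all shards versus one shard, one stored copy versus one per recipient). It ignores index maintenance, counter updates, and document growth.

```javascript
// Rough per-message counts for each design, restating the tradeoff table.
function estimateCosts(recipientCount, shardCount) {
  return {
    fanOutOnRead: {
      writesPerSend: 1,               // message stored once
      shardsPerInboxRead: shardCount, // broadcast to all shards
      copiesStored: 1,
    },
    fanOutOnWrite: {
      writesPerSend: recipientCount,  // one document per recipient
      shardsPerInboxRead: 1,          // routed by recipient
      copiesStored: recipientCount,
    },
    bucketedFanOutOnWrite: {
      writesPerSend: recipientCount,  // one append per recipient
      shardsPerInboxRead: 1,          // routed by owner
      copiesStored: recipientCount,
    },
  };
}

console.log(estimateCosts(100, 3).fanOutOnWrite.writesPerSend);     // 100
console.log(estimateCosts(100, 3).fanOutOnRead.shardsPerInboxRead); // 3
```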

Things to consider

• Lots of recipients
  – Fan out on write might become prohibitive
  – Consider introducing a “Group”

• Very large message size
  – Multiple copies of messages can be a burden
  – Consider single copy of message with a “pointer” per inbox

• More writes than reads
  – Fan out on read might be okay
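The “pointer” idea for very large messages can be sketched in memory as follows. This is one possible reading of the bullet, not the speaker's implementation: `makePointerStore`, `send`, `readInbox`, and `storedBodies` are hypothetical names. The message body is stored once, and each inbox holds only a small entry referencing it.

```javascript
// Single stored copy of each message, plus a small pointer per recipient inbox.
function makePointerStore() {
  const messages = new Map(); // messageId -> full message (stored once)
  const inboxes = new Map();  // user -> array of { messageId, sent }
  let nextId = 1;
  return {
    send(msg) {
      const messageId = nextId++;
      messages.set(messageId, msg);
      for (const recipient of msg.to) {
        if (!inboxes.has(recipient)) inboxes.set(recipient, []);
        inboxes.get(recipient).push({ messageId, sent: msg.sent }); // pointer only
      }
      return messageId;
    },
    // Resolve pointers to full messages, newest first.
    readInbox(user) {
      const pointers = inboxes.get(user) || [];
      return pointers
        .slice()
        .sort((a, b) => b.sent - a.sent)
        .map((p) => messages.get(p.messageId));
    },
    storedBodies() {
      return messages.size;
    },
  };
}

const pointerStore = makePointerStore();
pointerStore.send({ from: "Joe", to: ["Bob", "Jane", "Ann"], sent: 1, message: "big payload" });
console.log(pointerStore.storedBodies());                // 1
console.log(pointerStore.readInbox("Jane")[0].message);  // big payload
```

The saving in data size is paid for on the read path, which now needs an extra lookup per message to resolve the pointer.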

Comments – where do they live?

Conclusion

Summary

• Multiple ways to model status updates

• Bucketed fan out on write is typically the better approach

• Think about how your model distributes across shards

• Think about how much random IO needs to happen on a shard

Thank You
