Code JoeD gets you a 25% discount off the list price
Early Bird Registration Ends May 13, 2016
Back to Basics 2016 : Webinar 3
Thinking in Documents
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.2
4
Review
• Webinar 1 : Introduction to NoSQL
– Types of NoSQL database
– MongoDB is a document database
– Replica Sets and Shards
• Webinar 2
– Building a basic application
– Adding indexes
– Using Explain to measure database operators
5
Thinking in Documents
• Documents in MongoDB are Javascript Objects (JSON)
• Actually they are encoded as BSON
• BSON is “Binary JSON”
• BSON allows efficient encoding and decoding of JSON
• Required for efficient transmission and storage on disk
• Eliminates the need to “text parse” all the sub objects
• Full spec is online at http://bsonspec.org/
6
Example Document
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
Profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Fields can contain an array
of sub-documents
Fields
Typed field values
Fields can
contain arrays
7
Data Stores –Key Value
Key 1 Value
Key 1 Value
Key 1 Value
8
Data Stores - Relational
Key 1
Value 1
Value 1
Value 1
Value 1
Key 2
Value 1
Value 1
Value 1
Value 1
Key 3
Value 1
Value 1
Value 1
Value 1
Key 4
Value 1
Value 1
Value 1
Value 1
9
Data Stores - Document
Key3
Key4
Key5
Value 3
Value 5
Value 4Key6
Value 5Key7
Value 2
Value 1Key1
Key1
Key1
Key2
10
In Document Form
{ “key1” : “value 1” }
{ “key1” : { “key2” : “value 1”,
“key3” : { “key4” : “value 3”,
“key5” : “value 4” }
}
{ “key1” : { “key6” : “value 5”,
“key7” : “value 6” }
}
11
Some Example Queries
# Will find the first two documents
db.demo.find( { “key1” : “value” } )
# find the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )
# will find the third document
db.demo.find( { "key1.key6" : "value 4" } )
12
Modelling and Cardinality
• One to One
–Title to blog post
• One to Many
–Blog post to comments
• One to Millions
–Blog post to site views (e.g. Huffington Post)
13
One To One
{
“Title” : “This is a blog post”,
“Body” : “This is the body text of a very
short blog post”,
…
}
We can index on “Title” and “Body”.
14
One to Many
{
“Title” : “This is a blog post”,
“Body” : “This is the body text”,
“Comments” : [ { “name” : “Joe Drumgoole”,
“email” : “[email protected]”,
“comment” : “I love your writing style” },
{ “name” : “John Smith”,
“email” : “[email protected]”,
“comment” : “I hate your writing style” }]
}
Where we expect a small number of comments we can embed them
in the main document
15
Key Concerns
• What are the write patterns?
– Comments are added more frequently than posts
– Comments may have images, tags, large bodies of text
• What are the read patterns?
– Comments may not be displayed
– May be shown in their own window
– People rarely look at all the comments
16
Approach 2 –Separate Collection
• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display blog post and associated comments
• Requires two writes to create a comments
{
_id : ObjectID( “AAAA” ),
name : “Joe Drumgoole”,
email : “[email protected]”,
comment :“I love your writing style”,
}
{
_id : ObjectID( “AAAB” ),
name : “John Smith”,
email : “[email protected]”,
comment :“I hate your writing style”,
}
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : [ ObjectID( “AAAA” ),
ObjectID( “AAAB” )]
}
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : []
}
17
Approach 3 –A Hybrid Approach
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : [{
“_id” : ObjectID( “AAAA” )
“name” : “Joe Drumgoole”,
“email” : “[email protected]”,
comment :“I love your writing style”,
}
{
_id : ObjectID( “AAAB” ),
name : “John Smith”,
email : “[email protected]”,
comment :“I hate your writing style”,
}]
}
{
“_post_jd” : ObjectID( “ZZZZ” ),
“comments” : [{
“_id” : ObjectID( “AAAA” )
“name” : “Joe Drumgoole”,
“email” : “[email protected]”,
“comment” :“I love your writing
style”,
}
{...},{...},{...},{...},{...},{...}
,{..},{...},{...},{...} ]
18
What About One to A Million
• What is we were tracking mouse position for heat tracking?
– Each user will generate hundreds of data points per visit
– Thousands of data points per post
– Millions of data points per blog site
• Reverse the model
– Store a blog ID per event
{
“post_id” : ObjectID(“ZZZZ”),
“timestamp” : ISODate("2005-01-02T00:00:00Z”),
“location” : [24, 34]
“click” : False,
}
19
But – Finite number of events per second
{
post_id : ObjectID ( “ZZZZ” ),
timeStamp: ISODate("2005-01-02T00:00:00Z”),
events : {
0 : { 0 : { <Info> }, 1 : { <Info> }, … 99: { <Info> }},
1 : { 0 : { <Info> }, 1 : { <Info> }, … 99: { <Info> }},
2 : { 0 : { <Info> }, 1 : { <Info> }, … 99: { <Info> }},
3 : { 0 : { <Info> }, 1 : { <Info> }, … 99: { <Info> }},
...
59 :{ 0 : { <Info> }, 1 : { <Info> }, … 99: { <Info> }}
}
20
Guidelines
• Embed objects for one to one capabilities
• Look at read and write patterns to determine when to break out data
• Don’t get stuck in “one record” per item thinking
• Embrace the hierarchy
• Think about cardinality
• Grow your data by adding documents not be increasing document size
• Think about your indexes
• Document updates are transactions