
Back to Basics Webinar 3 - Thinking in Documents


Code JoeD gets you a 25% discount off the list price
Early Bird Registration Ends May 13, 2016


Back to Basics 2016 : Webinar 3

Thinking in Documents

Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole

V1.1


Review

• Webinar 1 : Introduction to NoSQL
  – Types of NoSQL database
  – MongoDB is a document database
  – Replica Sets and Shards

• Webinar 2
  – Building a basic application
  – Adding indexes
  – Using Explain to measure database operators


Thinking in Documents

• Documents in MongoDB are JavaScript Objects (JSON)
• Actually they are encoded as BSON
• BSON is “Binary JSON”
• BSON allows efficient encoding and decoding of JSON
• Required for efficient transmission and storage on disk
• Eliminates the need to “text parse” all the sub objects
• Full spec is online at http://bsonspec.org/
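To make the "binary, no text parsing" point concrete, here is a minimal sketch of a BSON encoder in Node.js. It is an illustration only, not the driver's implementation: `encodeBson` is a hypothetical helper that handles just string and 32-bit integer fields, following the layout in the spec (a length prefix, typed elements, a terminating zero byte).

```javascript
// Minimal BSON encoder sketch: supports only string and int32 field values.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const key = Buffer.from(name + "\0", "utf8"); // field name as a C string
    if (typeof value === "string") {
      const text = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(text.length); // string length includes the trailing NUL
      parts.push(Buffer.from([0x02]), key, len, text); // 0x02 = UTF-8 string
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value);
      parts.push(Buffer.from([0x10]), key, num); // 0x10 = 32-bit integer
    } else {
      throw new TypeError(`unsupported value for field ${name}`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1); // total size: header + body + terminator
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const encoded = encodeBson({ hello: "world" });
console.log(encoded.length);          // 22
console.log(encoded.toString("hex")); // 160000000268656c6c6f0006000000776f726c640000
```

Because every element carries its type and every string and document carries its length, a reader can skip over a sub-object without scanning its contents.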


Example Document

{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

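Slide callouts:

• Fields: the names to the left of each colon
• Typed field values: String (first_name), Number (cell), Geo-Location (location)
• Fields can contain arrays (Profession)
• Fields can contain an array of sub-documents (cars)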
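As a rough illustration of typed values, arrays, and arrays of sub-documents, the example document can be treated as an ordinary JavaScript object. This is a sketch in plain Node.js, not a MongoDB call; the elided car fields ("…") are simply left out, and `kinds` and `totalCarValue` are names invented for the example.

```javascript
// The example document as a plain JavaScript object (elided fields omitted).
const person = {
  first_name: "Paul",
  surname: "Miller",
  cell: 447557505611,
  city: "London",
  location: [45.123, 47.232],
  Profession: ["banking", "finance", "trader"],
  cars: [
    { model: "Bentley", year: 1973, value: 100000 },
    { model: "Rolls Royce", year: 1965, value: 330000 },
  ],
};

// Report the kind of value held by each field.
const kinds = Object.fromEntries(
  Object.entries(person).map(([field, v]) => [field, Array.isArray(v) ? "array" : typeof v])
);

// Sub-documents in an array are reachable without a join.
const totalCarValue = person.cars.reduce((sum, car) => sum + car.value, 0);

console.log(kinds.first_name, kinds.cell, kinds.cars); // string number array
console.log(totalCarValue);                            // 430000
```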


Data Stores – Key Value

[Diagram: three rows, each pairing a key with a single value]


Data Stores - Relational

[Diagram: four keys (Key 1 to Key 4), each associated with four values]


Data Stores - Document

[Diagram: three trees rooted at Key1. One holds a plain value; the others hold nested keys (Key2 to Key7) with their own values.]


In Document Form

{ "key1" : "value 1" }

{ "key1" : { "key2" : "value 1",
             "key3" : { "key4" : "value 3", "key5" : "value 4" } } }

{ "key1" : { "key6" : "value 5", "key7" : "value 6" } }


Some Example Queries

// finds the first document (exact match on the whole value of key1)
db.demo.find( { "key1" : "value 1" } )

// finds the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )

// finds the third document
db.demo.find( { "key1.key6" : "value 5" } )
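The matching rules can be sketched without a server. The snippet below is plain Node.js over an in-memory array holding the three documents from the "In Document Form" slide; `getPath` and `find` are hypothetical helpers that cover only exact equality on dotted paths (no arrays, no operators). Note that an equality match on key1 needs the full string "value 1" and matches only the first document, because in the other two key1 holds a sub-document.

```javascript
// In-memory stand-in for the "demo" collection.
const demo = [
  { key1: "value 1" },
  { key1: { key2: "value 1", key3: { key4: "value 3", key5: "value 4" } } },
  { key1: { key6: "value 5", key7: "value 6" } },
];

// Walk a dotted path such as "key1.key3.key4" through nested objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (node, key) => (node !== null && typeof node === "object" ? node[key] : undefined),
    doc
  );
}

// Equality-only subset of find(): every path in the query must match exactly.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([path, expected]) => getPath(doc, path) === expected)
  );
}

const first = find(demo, { key1: "value 1" });
const second = find(demo, { "key1.key3.key4": "value 3" });
const third = find(demo, { "key1.key6": "value 5" });

console.log(first.length, second.length, third.length); // 1 1 1
```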


Modelling and Cardinality

• One to One
  – Title to blog post

• One to Many
  – Blog post to comments

• One to Millions
  – Blog post to site views (e.g. Huffington Post)


One To One

{
  "Title" : "This is a blog post",
  "Body" : "This is the body text of a very short blog post",
  …
}

We can index on “Title” and “Body”.


One to Many

{
  "Title" : "This is a blog post",
  "Body" : "This is the body text",
  "Comments" : [
    { "name" : "Joe Drumgoole",
      "email" : "[email protected]",
      "comment" : "I love your writing style" },
    { "name" : "John Smith",
      "email" : "[email protected]",
      "comment" : "I hate your writing style" }
  ]
}

Where we expect a small number of comments we can embed them in the main document


Key Concerns

• What are the write patterns?
  – Comments are added more frequently than posts
  – Comments may have images, tags, large bodies of text

• What are the read patterns?
  – Comments may not be displayed
  – May be shown in their own window
  – People rarely look at all the comments


Approach 2 – Separate Collection

• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display blog post and associated comments
• Requires two writes to create a comment

{
  _id : ObjectID( "AAAA" ),
  name : "Joe Drumgoole",
  email : "[email protected]",
  comment : "I love your writing style"
}
{
  _id : ObjectID( "AAAB" ),
  name : "John Smith",
  email : "[email protected]",
  comment : "I hate your writing style"
}

{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ ObjectID( "AAAA" ), ObjectID( "AAAB" ) ]
}
{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : []
}
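The cost of the separate-collection approach (two writes per comment, two reads per page) can be sketched in plain Node.js. The two Maps below are in-memory stand-ins for the posts and comments collections, the IDs are plain strings rather than real ObjectIDs, and `addComment` and `loadPostWithComments` are hypothetical helper names.

```javascript
// In-memory stand-ins for the two collections, keyed by _id.
const posts = new Map();
const comments = new Map();

posts.set("ZZZZ", { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] });

// Creating a comment takes two writes: one per collection.
function addComment(postId, comment) {
  comments.set(comment._id, comment);           // write 1: the comment document
  posts.get(postId).comments.push(comment._id); // write 2: the reference on the post
}

// Displaying a post takes two reads: the post, then its comments by ID.
function loadPostWithComments(postId) {
  const post = posts.get(postId);                             // read 1
  const loaded = post.comments.map((id) => comments.get(id)); // read 2 (an $in query in MongoDB)
  return { ...post, comments: loaded };
}

addComment("ZZZZ", { _id: "AAAA", name: "Joe Drumgoole", comment: "I love your writing style" });
addComment("ZZZZ", { _id: "AAAB", name: "John Smith", comment: "I hate your writing style" });

const page = loadPostWithComments("ZZZZ");
console.log(page.comments.map((c) => c.name)); // [ 'Joe Drumgoole', 'John Smith' ]
```

The post document stays small however many comments arrive, at the price of the extra round trip on every read and write.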


Approach 3 – A Hybrid Approach

{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [
    { "_id" : ObjectID( "AAAA" ),
      "name" : "Joe Drumgoole",
      "email" : "[email protected]",
      "comment" : "I love your writing style" },
    { "_id" : ObjectID( "AAAB" ),
      "name" : "John Smith",
      "email" : "[email protected]",
      "comment" : "I hate your writing style" }
  ]
}

{
  "_post_id" : ObjectID( "ZZZZ" ),
  "comments" : [
    { "_id" : ObjectID( "AAAA" ),
      "name" : "Joe Drumgoole",
      "email" : "[email protected]",
      "comment" : "I love your writing style" },
    {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}
  ]
}
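One possible reading of the hybrid layout, sketched in plain Node.js: a few comments are embedded in the post so the page renders from a single read, and the rest go into separate documents that each hold a batch of comments for that post. The caps `EMBED_LIMIT` and `BUCKET_SIZE`, the `_post_id` field name, and the `addComment` helper are all assumptions made for this example; the slide does not state them.

```javascript
const EMBED_LIMIT = 2;  // assumed cap on comments embedded in the post
const BUCKET_SIZE = 10; // assumed cap on comments per overflow document

const post = { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] };
const overflow = []; // stand-in for a collection of { _post_id, comments: [...] } documents

function addComment(comment) {
  if (post.comments.length < EMBED_LIMIT) {
    post.comments.push(comment); // shown with the post in one read
    return;
  }
  let bucket = overflow[overflow.length - 1];
  if (!bucket || bucket.comments.length >= BUCKET_SIZE) {
    bucket = { _post_id: post._id, comments: [] }; // start a new overflow document
    overflow.push(bucket);
  }
  bucket.comments.push(comment);
}

for (let i = 0; i < 25; i++) {
  addComment({ _id: `C${i}`, comment: `comment ${i}` });
}

console.log(post.comments.length);                  // 2
console.log(overflow.map((b) => b.comments.length)); // [ 10, 10, 3 ]
```

Both the post and each overflow document have a bounded size, so neither grows without limit as comments accumulate.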


What About One to A Million

• What if we were tracking mouse position for heat tracking?
  – Each user will generate hundreds of data points per visit
  – Thousands of data points per post
  – Millions of data points per blog site

• Reverse the model
  – Store a blog ID per event

{
  "post_id" : ObjectID( "ZZZZ" ),
  "timestamp" : ISODate( "2005-01-02T00:00:00Z" ),
  "location" : [24, 34],
  "click" : false
}
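A minimal sketch of the reversed model in plain Node.js: the post document is never touched, and each mouse event becomes its own small document pointing back at the post. The array is an in-memory stand-in for an events collection, the IDs are plain strings, and `recordEvent` is a hypothetical helper.

```javascript
// Stand-in for an events collection: one small document per mouse event.
const events = [];

function recordEvent(postId, x, y, click, when = new Date()) {
  events.push({ post_id: postId, timestamp: when, location: [x, y], click });
}

recordEvent("ZZZZ", 24, 34, false, new Date("2005-01-02T00:00:00Z"));
recordEvent("ZZZZ", 30, 40, true, new Date("2005-01-02T00:00:01Z"));
recordEvent("YYYY", 5, 6, false, new Date("2005-01-02T00:00:02Z"));

// Reads go from the event back to the post: filter on post_id.
const forPost = events.filter((e) => e.post_id === "ZZZZ");
console.log(forPost.length); // 2
```

In a real deployment the filter on post_id would want an index, since the events collection is the one that grows into the millions.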


But – Finite number of events per second

{
  post_id : ObjectID( "ZZZZ" ),
  timeStamp : ISODate( "2005-01-02T00:00:00Z" ),
  events : {
    0 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    1 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    2 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    3 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    ...
    59 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } }
  }
}
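Read this way, the document is a bucket: one document per post per minute, keyed first by second (0 to 59) and then by a slot number (0 to 99). The sketch below shows that bookkeeping in plain Node.js. The Map is an in-memory stand-in for the collection, `recordEvent` is a hypothetical helper, and dropping events once a second is full is an assumption, since the slide does not say what happens at the limit.

```javascript
const MAX_PER_SECOND = 100; // slots 0..99, as drawn on the slide
const buckets = new Map();  // stand-in collection, keyed by post and minute

function recordEvent(postId, when, info) {
  const minute = new Date(when);
  minute.setUTCSeconds(0, 0); // truncate to the start of the minute
  const key = `${postId}|${minute.toISOString()}`;
  if (!buckets.has(key)) {
    buckets.set(key, { post_id: postId, timeStamp: minute, events: {} });
  }
  const doc = buckets.get(key);
  const second = new Date(when).getUTCSeconds();
  if (!doc.events[second]) doc.events[second] = {};
  const slots = doc.events[second];
  const next = Object.keys(slots).length;   // next free slot in this second
  if (next >= MAX_PER_SECOND) return false; // assumed policy: drop when full
  slots[next] = info;
  return true;
}

recordEvent("ZZZZ", "2005-01-02T00:00:03Z", { location: [24, 34], click: false });
recordEvent("ZZZZ", "2005-01-02T00:00:03Z", { location: [25, 35], click: true });
recordEvent("ZZZZ", "2005-01-02T00:01:10Z", { location: [1, 2], click: false });

console.log(buckets.size); // 2
```

Against a real server the same write would likely be a single upsert that sets a dotted path such as events.3.1, so the number of documents grows with minutes rather than with events.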


Guidelines

• Embed objects for one to one capabilities
• Look at read and write patterns to determine when to break out data
• Don’t get stuck in “one record” per item thinking
• Embrace the hierarchy
• Think about cardinality
• Grow your data by adding documents, not by increasing document size
• Think about your indexes
• Document updates are transactions (an update to a single document is atomic)
