Back to Basics Webinar 3: Schema Design Thinking in Documents

Code JoeD gets you a 25% discount off the list price

Early Bird Registration Ends May 13, 2016

Back to Basics 2016 : Webinar 3

Thinking in Documents

Joe Drumgoole

Director of Developer Advocacy, EMEA

@jdrumgoole

V1.2

Review

• Webinar 1 : Introduction to NoSQL

– Types of NoSQL database

– MongoDB is a document database

– Replica Sets and Shards

• Webinar 2

– Building a basic application

– Adding indexes

– Using explain() to measure database operations

Thinking in Documents

• Documents in MongoDB are JavaScript objects (JSON)

• Actually they are encoded as BSON

• BSON is “Binary JSON”

• BSON allows efficient encoding and decoding of JSON

• Required for efficient transmission and storage on disk

• Eliminates the need to “text parse” all the sub objects

• Full spec is online at http://bsonspec.org/
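
To make the "no text parsing" point concrete, here is a minimal sketch of the BSON layout for a flat document whose values are all strings. It assumes Node.js (for `Buffer`), and the function name `encodeStringOnlyBson` is hypothetical; it is not the MongoDB driver's encoder, only an illustration of the length-prefixed layout described at bsonspec.org.

```javascript
// Minimal BSON encoder for flat documents whose values are all strings.
// Illustration only: real drivers support many more types.
function encodeStringOnlyBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const keyBytes = Buffer.from(key + "\0", "utf8");     // field name as a C string
    const valueBytes = Buffer.from(value + "\0", "utf8"); // string bytes plus terminator
    const length = Buffer.alloc(4);
    length.writeInt32LE(valueBytes.length, 0);            // byte length, terminator included
    parts.push(Buffer.from([0x02]), keyBytes, length, valueBytes); // 0x02 marks a string element
  }
  const body = Buffer.concat(parts);
  const size = Buffer.alloc(4);
  size.writeInt32LE(4 + body.length + 1, 0);              // total size: prefix + elements + trailing 0x00
  return Buffer.concat([size, body, Buffer.from([0x00])]);
}

const encoded = encodeStringOnlyBson({ hello: "world" });
console.log(encoded.length);          // total document size in bytes
console.log(encoded.toString("hex")); // size prefix, then typed, length-prefixed elements
```

Because every document and every string carries its length up front, a reader can skip over a field it does not need without scanning its contents.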

Example Document

{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}

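Annotations on the slide:

• Fields: each name, such as first_name or city

• Typed field values: strings and numbers

• Fields can contain arrays: location, Profession

• Fields can contain an array of sub-documents: cars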

Data Stores - Key Value

[Diagram: three keys, each mapped to a single value]

Data Stores - Relational

[Diagram: a table of four rows, Key 1 to Key 4, each with four column values]

Data Stores - Document

[Diagram: keys mapped either to values or to further nested keys, forming a tree]

In Document Form

{ "key1" : "value 1" }

{ "key1" : { "key2" : "value 1",
             "key3" : { "key4" : "value 3",
                        "key5" : "value 4" } }
}

{ "key1" : { "key6" : "value 5",
             "key7" : "value 6" }
}

Some Example Queries

// finds the first document (exact match on the value of key1)
db.demo.find( { "key1" : "value 1" } )

// finds the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )

// finds the third document
db.demo.find( { "key1.key6" : "value 5" } )
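
How a dotted path such as "key1.key3.key4" selects a nested value can be sketched in plain JavaScript. This is a simplified model that runs without a database: `resolvePath` and `findByPath` are hypothetical helpers, they only handle exact matches on scalar values, and they do not reproduce MongoDB's full matching rules (arrays and operators are ignored).

```javascript
// The three documents from the "In Document Form" slide.
const demo = [
  { key1: "value 1" },
  { key1: { key2: "value 1", key3: { key4: "value 3", key5: "value 4" } } },
  { key1: { key6: "value 5", key7: "value 6" } },
];

// Walk a dotted path one key at a time.
// Returns undefined as soon as a step is missing or is not an object.
function resolvePath(doc, path) {
  let current = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = current[key];
  }
  return current;
}

// Hypothetical stand-in for find(): every path in the query must equal its expected value.
function findByPath(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([path, expected]) => resolvePath(doc, path) === expected)
  );
}

const first = findByPath(demo, { key1: "value 1" });
const second = findByPath(demo, { "key1.key3.key4": "value 3" });
const third = findByPath(demo, { "key1.key6": "value 5" });
```

Note that `{ key1: "value 1" }` does not match the second document: there, key1 holds a sub-document, and the string "value 1" only appears one level down under key2.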

Modelling and Cardinality

• One to One

– Title to blog post

• One to Many

– Blog post to comments

• One to Millions

– Blog post to site views (e.g. Huffington Post)

One To One

{
  "Title" : "This is a blog post",
  "Body" : "This is the body text of a very short blog post"
}

We can index on “Title” and “Body”.

One to Many

{
  "Title" : "This is a blog post",
  "Body" : "This is the body text",
  "Comments" : [ { "name" : "Joe Drumgoole",
                   "email" : "[email protected]",
                   "comment" : "I love your writing style" },
                 { "name" : "John Smith",
                   "email" : "[email protected]",
                   "comment" : "I hate your writing style" } ]
}

Where we expect a small number of comments, we can embed them in the main document.

Key Concerns

• What are the write patterns?

– Comments are added more frequently than posts

– Comments may have images, tags, large bodies of text

• What are the read patterns?

– Comments may not be displayed

– May be shown in their own window

– People rarely look at all the comments

Approach 2 – Separate Collection

• Keep all comments in a separate comments collection

• Add references to comments as an array of comment IDs

• Requires two queries to display blog post and associated comments

• Requires two writes to create a comment

Two documents in the comments collection:

{
  "_id" : ObjectID( "AAAA" ),
  "name" : "Joe Drumgoole",
  "email" : "[email protected]",
  "comment" : "I love your writing style"
}

{
  "_id" : ObjectID( "AAAB" ),
  "name" : "John Smith",
  "email" : "[email protected]",
  "comment" : "I hate your writing style"
}

The blog post, referencing both comments by ID:

{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ ObjectID( "AAAA" ),
                 ObjectID( "AAAB" ) ]
}

The same blog post before any comments are added:

{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : []
}
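
The "two writes, two queries" cost of the separate-collection approach can be sketched without a database. The two `Map` objects below are in-memory stand-ins for the posts and comments collections, and `addComment` and `loadPostWithComments` are hypothetical helper names; with a real server the second read would typically be a single query for all referenced IDs (for example with `$in`).

```javascript
// In-memory stand-ins for the two collections (assumed names).
const posts = new Map();
const comments = new Map();

posts.set("ZZZZ", { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] });

// Creating a comment takes two writes: one per collection.
function addComment(postId, comment) {
  comments.set(comment._id, comment);            // write 1: the comment document
  posts.get(postId).comments.push(comment._id);  // write 2: the reference held by the post
}

// Displaying a post takes two reads: the post, then its comments by ID.
function loadPostWithComments(postId) {
  const post = posts.get(postId);                              // read 1: the post
  const loaded = post.comments.map((id) => comments.get(id));  // read 2: the referenced comments
  return { ...post, comments: loaded };
}

addComment("ZZZZ", { _id: "AAAA", name: "Joe Drumgoole", comment: "I love your writing style" });
addComment("ZZZZ", { _id: "AAAB", name: "John Smith", comment: "I hate your writing style" });

const page = loadPostWithComments("ZZZZ");
console.log(page.comments.map((c) => c.name));
```

The trade-off is that the post document stays small and comments can grow independently, at the price of the extra round trip on both the write and the read path.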

Approach 3 – A Hybrid Approach

{
  "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ {
      "_id" : ObjectID( "AAAA" ),
      "name" : "Joe Drumgoole",
      "email" : "[email protected]",
      "comment" : "I love your writing style"
    },
    {
      "_id" : ObjectID( "AAAB" ),
      "name" : "John Smith",
      "email" : "[email protected]",
      "comment" : "I hate your writing style"
    } ]
}

{
  "post_id" : ObjectID( "ZZZZ" ),
  "comments" : [ {
      "_id" : ObjectID( "AAAA" ),
      "name" : "Joe Drumgoole",
      "email" : "[email protected]",
      "comment" : "I love your writing style"
    },
    {...},{...},{...},{...},{...},{...},
    {...},{...},{...},{...} ]
}
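
One possible reading of the hybrid approach is sketched below: a few comments stay embedded in the post, and the rest go into separate bucket documents that each point back at the post. The limits `EMBED_LIMIT` and `BUCKET_SIZE`, the helper `addHybridComment`, and the rule of filling the post first are all assumptions made for illustration; the slides do not specify them.

```javascript
const EMBED_LIMIT = 2; // assumed: how many comments live inside the post document
const BUCKET_SIZE = 3; // assumed: how many comments each overflow bucket holds

const post = { _id: "ZZZZ", Title: "A Blog Title", Body: "A blog post", comments: [] };
const buckets = [];    // stand-in for a separate collection of bucket documents

function addHybridComment(comment) {
  // Embed directly while the post still has room.
  if (post.comments.length < EMBED_LIMIT) {
    post.comments.push(comment);
    return;
  }
  // Otherwise append to the newest bucket, opening a new one when it is full.
  let bucket = buckets[buckets.length - 1];
  if (!bucket || bucket.comments.length >= BUCKET_SIZE) {
    bucket = { post_id: post._id, comments: [] };
    buckets.push(bucket);
  }
  bucket.comments.push(comment);
}

for (let i = 1; i <= 7; i++) {
  addHybridComment({ _id: `C${i}`, comment: `comment ${i}` });
}

console.log(post.comments.length, buckets.map((b) => b.comments.length));
```

With this layout a page view needs only the post document, and the remaining comments can be fetched one bucket at a time when a reader asks for more.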

What About One to a Million?

• What if we were tracking mouse position for heat tracking?

– Each user will generate hundreds of data points per visit

– Thousands of data points per post

– Millions of data points per blog site

• Reverse the model

– Store a blog ID per event

{
  "post_id" : ObjectID("ZZZZ"),
  "timestamp" : ISODate("2005-01-02T00:00:00Z"),
  "location" : [24, 34],
  "click" : false
}

But – Finite number of events per second

{
  post_id : ObjectID( "ZZZZ" ),
  timeStamp : ISODate("2005-01-02T00:00:00Z"),
  events : {
    0 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    1 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    2 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    3 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
    ...
    59 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } }
  }
}
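
The bucketed layout can be read as one document per post per minute, with a key for each second (0 to 59) and up to 100 event slots (0 to 99) under it. The sketch below follows that reading, which is an assumption; `newBucket` and `recordEvent` are hypothetical helpers, and the `<Info>` payload is modelled on the location and click fields of the per-event document.

```javascript
const SLOTS_PER_SECOND = 100; // slots 0..99, matching the slide

// One bucket document per post per minute (assumed granularity).
function newBucket(postId, minuteStart) {
  return { post_id: postId, timeStamp: minuteStart, events: {} };
}

// Store info in the next free slot for the given second (0..59).
// Returns false when that second is already full, so the caller can drop or sample the event.
function recordEvent(bucket, second, info) {
  const slots = bucket.events[second] || (bucket.events[second] = {});
  const next = Object.keys(slots).length;
  if (next >= SLOTS_PER_SECOND) return false;
  slots[next] = info;
  return true;
}

const bucket = newBucket("ZZZZ", new Date("2005-01-02T00:00:00Z"));
recordEvent(bucket, 0, { location: [24, 34], click: false });
recordEvent(bucket, 0, { location: [25, 36], click: true });
recordEvent(bucket, 59, { location: [10, 12], click: false });

console.log(Object.keys(bucket.events));
```

Capping the slots per second bounds the size of each bucket document, so the data grows by adding documents rather than by growing one document without limit.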

Guidelines

• Embed objects for one-to-one relationships

• Look at read and write patterns to determine when to break out data

• Don't get stuck in "one record per item" thinking

• Embrace the hierarchy

• Think about cardinality

• Grow your data by adding documents, not by increasing document size

• Think about your indexes

• Document updates are transactions: a write to a single document is atomic
