Back to Basics 1: Thinking in documents

Page 1: Back to Basics 1: Thinking in documents

Thinking in Documents

Perl Engineer & Evangelist, MongoDB, Inc

Mike Friedman

#mongodb

@friedo

Page 2: Back to Basics 1: Thinking in documents

Agenda

• What is a Record?

• Core Concepts

• What is an Entity?

• Associating Entities

• General Recommendations

Page 3: Back to Basics 1: Thinking in documents

All application development is Schema Design

Page 4: Back to Basics 1: Thinking in documents

Success comes from Proper Data Structure

Page 5: Back to Basics 1: Thinking in documents

What is a Record?

Page 6: Back to Basics 1: Thinking in documents

Key → Value

• One-dimensional storage

• Single value is a blob

• Query on key only

• No schema

• Value cannot be updated, only replaced

Key Blob

Page 7: Back to Basics 1: Thinking in documents

Relational

• Two-dimensional storage (tuples)

• Each field contains a single value

• Query on any field

• Very structured schema (table)

• In-place updates

• Normalization requires many tables, joins, and indexes, and results in poor data locality

Primary Key

Page 8: Back to Basics 1: Thinking in documents

Document

• N-dimensional storage

• Each field can contain 0, 1, many, or embedded values

• Query on any field & level

• Flexible schema

• Inline updates *

• Embedding related data improves data locality, requires fewer indexes, and can improve read performance

_id

Page 9: Back to Basics 1: Thinking in documents

Core Concepts

Page 10: Back to Basics 1: Thinking in documents

Traditional Schema Design: Focus on data storage

Page 11: Back to Basics 1: Thinking in documents

Document Schema Design: Focus on data use

Page 12: Back to Basics 1: Thinking in documents

Traditional: What answers do I have?

Document: What questions do I have?

Page 13: Back to Basics 1: Thinking in documents

Schema Design is Flexible

Page 14: Back to Basics 1: Thinking in documents

Flexibility

• Choices for schema design

• Each record can have different fields

• Field names consistent for programming

• Common structure can be enforced by application

• Easy to evolve as needed

Page 15: Back to Basics 1: Thinking in documents

Building Blocks of Document Schema Design

Page 16: Back to Basics 1: Thinking in documents

1 - Arrays

[ 1, 2, 3, "four", 5, "six", [ 7, 8, 9 ]]

Page 17: Back to Basics 1: Thinking in documents

1 – Arrays: Multiple Values per Field

Each field in a document can be:

• Absent

• Set to null

• Set to a single value

• Set to an array of many values

Page 18: Back to Basics 1: Thinking in documents

1 – Arrays: Multiple Values per Field

• Query for any matching value
  – Can be indexed, and each value in the array is in the index
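The array behaviour above can be sketched in plain Python, with no database involved. This is only an illustration of the idea, not MongoDB's implementation: `field_matches`, `build_index`, and the `tags` field are hypothetical names invented for the example.

```python
from collections import defaultdict


def field_matches(doc, field, value):
    """True if doc[field] equals value, or is an array containing value."""
    if field not in doc:
        return False
    stored = doc[field]
    if isinstance(stored, list):
        return value in stored
    return stored == value


def build_index(docs, field):
    """Map each indexed value to the _ids of the documents that hold it."""
    index = defaultdict(set)
    for doc in docs:
        if field not in doc:
            continue  # absent field: nothing to index in this sketch
        stored = doc[field]
        values = stored if isinstance(stored, list) else [stored]
        for value in values:
            index[value].add(doc["_id"])  # one entry per array element
    return index


# Hypothetical documents covering the four states a field can be in.
docs = [
    {"_id": 1, "tags": ["family", "work"]},  # array of many values
    {"_id": 2, "tags": "work"},              # single value
    {"_id": 3, "tags": None},                # set to null
    {"_id": 4},                              # absent
]

work_ids = [d["_id"] for d in docs if field_matches(d, "tags", "work")]
tag_index = build_index(docs, "tags")
```

A lookup such as `tag_index["work"]` then finds both the document with a single value and the document whose array contains that value.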

Page 19: Back to Basics 1: Thinking in documents

2 – Embedded Documents

{
  "foo": 42,
  "bar": 43,
  "stuff": { ... },
  ...
}

Page 20: Back to Basics 1: Thinking in documents

2 - Embedded Documents

• A value in a document can be another document

• Nested documents provide structure

• Query any field at any level
  – Can be indexed
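Reaching a field at any level is commonly written as a dotted path such as "stuff.size.width". A minimal sketch of that lookup in plain Python follows; `get_path` is a hypothetical helper, and the contents of "stuff" are invented for the example.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as "stuff.size.width" through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current


# Hypothetical document; the nested values are made up for illustration.
doc = {
    "foo": 42,
    "bar": 43,
    "stuff": {"color": "red", "size": {"width": 3, "height": 4}},
}

top_level = get_path(doc, "foo")
nested = get_path(doc, "stuff.size.width")
missing = get_path(doc, "stuff.weight")
```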

Page 21: Back to Basics 1: Thinking in documents

What is an Entity?

Page 22: Back to Basics 1: Thinking in documents

An Entity

• Object in your model

• Associations with other entities

Referencing (Relational)    Embedding (Document)
has_one                     embeds_one
belongs_to                  embedded_in
has_many                    embeds_many
has_and_belongs_to_many

Page 23: Back to Basics 1: Thinking in documents

Let's model something together: How about a business card?

Page 24: Back to Basics 1: Thinking in documents

Business Card

Page 25: Back to Basics 1: Thinking in documents

Referencing

Addresses

{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}

Page 26: Back to Basics 1: Thinking in documents

Embedding

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
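The practical difference between the two shapes shows up when reading a contact's city. The sketch below uses plain Python dicts in place of collections (an assumption made so the example runs without a database); the function names are hypothetical.

```python
# Referencing: two "collections", linked by address_id.
addresses = {
    1: {"street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA",
        "zip_code": "95014", "country": "USA"},
}
contacts_referencing = {
    2: {"name": "Steven Jobs", "company": "Apple Computer", "address_id": 1},
}

# Embedding: the address lives inside the contact.
contacts_embedding = {
    2: {"name": "Steven Jobs", "company": "Apple Computer",
        "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                    "state": "CA", "zip_code": "95014", "country": "USA"}},
}


def city_by_reference(contact_id):
    contact = contacts_referencing[contact_id]   # first lookup
    address = addresses[contact["address_id"]]   # second lookup
    return address["city"]


def city_by_embedding(contact_id):
    # One lookup returns the contact together with its address.
    return contacts_embedding[contact_id]["address"]["city"]


city_ref = city_by_reference(2)
city_emb = city_by_embedding(2)
```

Both calls return the same city; the referencing version simply needs a second fetch to get there.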

Page 27: Back to Basics 1: Thinking in documents

Relational Schema

Contact

• name
• company
• title
• phone

Address

• street
• city
• state
• zip_code

Page 28: Back to Basics 1: Thinking in documents

Document Schema

Contact

• name
• company
• address
  – street
  – city
  – state
  – zip_code
• title
• phone

Page 29: Back to Basics 1: Thinking in documents

How are they different? Why?

Relational

Contact

• name
• company
• title
• phone

Address

• street
• city
• state
• zip_code

Document

Contact

• name
• company
• address
  – street
  – city
  – state
  – zip_code
• title
• phone

Page 30: Back to Basics 1: Thinking in documents

Schema Flexibility

{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}

{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "[email protected]",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-618-1499",
  "fax": "650-330-0100"
}
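Because each record can carry different fields, application code usually checks for optional fields rather than assuming them. A small sketch, using trimmed versions of the two contacts above; `OPTIONAL_FIELDS` and `optional_fields_present` are hypothetical names chosen for the example.

```python
# Fields that only some contacts carry (an assumption for this example).
OPTIONAL_FIELDS = ("url", "fax")


def optional_fields_present(doc):
    """List the optional fields this particular document actually has."""
    return [field for field in OPTIONAL_FIELDS if field in doc]


steve = {"name": "Steven Jobs", "company": "Apple Computer",
         "phone": "408-996-1010"}
larry = {"name": "Larry Page", "url": "http://google.com/",
         "company": "Google!", "phone": "650-618-1499",
         "fax": "650-330-0100"}

steve_extras = optional_fields_present(steve)
larry_extras = optional_fields_present(larry)

# dict.get supplies a default when a field is absent.
steve_fax = steve.get("fax", "n/a")
```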

Page 31: Back to Basics 1: Thinking in documents

Example

Page 32: Back to Basics 1: Thinking in documents

Let’s Look at an Address Book

Page 33: Back to Basics 1: Thinking in documents

Address Book

• What questions do I have?

• What are my entities?

• What are my associations?

Page 34: Back to Basics 1: Thinking in documents

Address Book Entity-Relationship

Contacts (name, company, title)

Contacts 1 : N Addresses (type, street, city, state, zip_code)
Contacts 1 : N Phones (type, number)
Contacts 1 : N Emails (type, address)
Contacts 1 : 1 Thumbnails (mime_type, data)
Contacts 1 : 1 Portraits (mime_type, data)
Contacts N : N Groups (name)
Contacts 1 : 1 Twitters (name, location, web, bio)

Page 35: Back to Basics 1: Thinking in documents

Associating Entities

Page 36: Back to Basics 1: Thinking in documents

One to One


Page 37: Back to Basics 1: Thinking in documents

One to One: Schema Design Choices

• contact (twitter_id) 1 : 1 twitter

• contact 1 : 1 twitter (contact_id)

• Redundant to track relationship on both sides
  – Both references must be updated for consistency
  – May save a fetch?

• Contact (twitter) embeds 1 twitter
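The consistency cost of tracking the relationship on both sides can be illustrated with a short sketch. All names here (`link`, `is_consistent`, the sample documents) are hypothetical and plain dicts stand in for stored documents.

```python
def link(contact, twitter):
    """Record the one-to-one association on both sides."""
    contact["twitter_id"] = twitter["_id"]
    twitter["contact_id"] = contact["_id"]


def is_consistent(contact, twitter):
    """True only if both documents point at each other."""
    return (contact.get("twitter_id") == twitter["_id"]
            and twitter.get("contact_id") == contact["_id"])


contact = {"_id": 1, "name": "Example Contact"}
old_handle = {"_id": 100, "name": "old_handle"}
new_handle = {"_id": 101, "name": "new_handle"}

link(contact, old_handle)
linked_ok = is_consistent(contact, old_handle)

# Update only one side: the contact now points at new_handle ...
contact["twitter_id"] = new_handle["_id"]
# ... but neither twitter document was touched, so both pairings disagree.
stale_old = is_consistent(contact, old_handle)
stale_new = is_consistent(contact, new_handle)

# Embedding keeps the association in one place, so there is nothing to sync.
embedded_contact = {"_id": 1, "name": "Example Contact",
                    "twitter": {"name": "new_handle"}}
```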

Page 38: Back to Basics 1: Thinking in documents

One to One: General Recommendation

• Full contact info all at once
  – Contact embeds twitter

• Parent-child relationship
  – "contains"

• No additional data duplication

• Can query or index on embedded field
  – e.g., "twitter.name"

• Exceptional cases…
  – Reference portrait, which has very large data

• Contact (twitter) embeds 1 twitter

Page 39: Back to Basics 1: Thinking in documents

One to Many


Page 40: Back to Basics 1: Thinking in documents

One to Many: Schema Design Choices

• contact (phone_ids: [ ]) 1 : N phone

• contact 1 : N phone (contact_id)

• Redundant to track relationship on both sides
  – Both references must be updated for consistency
  – Not possible in relational DBs
  – Save a fetch?

• Contact (phones) embeds N phone

Page 41: Back to Basics 1: Thinking in documents

One to Many: General Recommendation

• Full contact info all at once
  – Contact embeds multiple phones

• Parent-children relationship
  – "contains"

• No additional data duplication

• Can query or index on any field
  – e.g., { "phones.type": "mobile" }

• Exceptional cases…
  – Scaling: maximum document size is 16MB

• Contact (phones) embeds N phone
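A query such as { "phones.type": "mobile" } matches a contact when any one of its embedded phones has that type. A plain-Python sketch of that matching rule follows; `has_phone_type` and the sample contacts are hypothetical.

```python
def has_phone_type(contact, phone_type):
    """True if any embedded phone of the contact has the given type."""
    return any(phone.get("type") == phone_type
               for phone in contact.get("phones", []))


# Hypothetical contacts; the numbers are placeholders.
contacts = [
    {"name": "A", "phones": [{"type": "work", "number": "555-0100"},
                             {"type": "mobile", "number": "555-0101"}]},
    {"name": "B", "phones": [{"type": "home", "number": "555-0102"}]},
    {"name": "C"},  # no phones field at all
]

mobile_names = [c["name"] for c in contacts if has_phone_type(c, "mobile")]
```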

Page 42: Back to Basics 1: Thinking in documents

Many to Many


Page 43: Back to Basics 1: Thinking in documents

Many to Many: Traditional Relational Association

Contacts

• name
• company
• title
• phone

Groups

• name

GroupContacts (join table)

• group_id
• contact_id

X  Use arrays instead

Page 44: Back to Basics 1: Thinking in documents

Many to Many: Schema Design Choices

• group (contact_ids: [ ]) N : N contact

• group N : N contact (group_ids: [ ])

• Redundant to track relationship on both sides
  – Both references must be updated for consistency

• group (contacts) embeds N contact

• contact (groups) embeds N group

• Redundant to track relationship on both sides
  – Duplicated data must be updated for consistency

Page 45: Back to Basics 1: Thinking in documents

Many to Many: General Recommendation

• Depends on use case
  1. Simple address book
     – Contact references groups
  2. Corporate email groups
     – Group embeds contacts for performance

• Exceptional cases
  – Scaling: maximum document size is 16MB
  – Scaling may affect performance and working set

• group N : N contact (group_ids: [ ])
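For the simple address book case, where a contact references its groups through a `group_ids` array, both directions of the association can be answered without a join table. The sketch below uses plain Python structures; the sample data and function names are hypothetical.

```python
# Hypothetical groups, keyed by _id.
groups = {
    10: {"name": "family"},
    11: {"name": "work"},
}

# Each contact references its groups through an array of ids.
contacts = [
    {"_id": 1, "name": "A", "group_ids": [10, 11]},
    {"_id": 2, "name": "B", "group_ids": [11]},
    {"_id": 3, "name": "C", "group_ids": []},
]


def contacts_in_group(group_id):
    """Contact names whose group_ids array contains the given id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


def group_names(contact):
    """Resolve a contact's group references to group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


work_members = contacts_in_group(11)
groups_of_a = group_names(contacts[0])
```

The relationship is stored on one side only, so adding or removing a membership is a single update to one contact.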

Page 46: Back to Basics 1: Thinking in documents

Document model - holistic and efficient representation

Contacts

• name
• company
• title
• addresses (N): type, street, city, state, zip_code
• phones (N): type, number
• emails (N): type, address
• thumbnail (1): mime_type, data
• twitter (1): name, location, web, bio

Contacts 1 : 1 Portraits (mime_type, data)
Contacts N : N Groups (name)

Page 47: Back to Basics 1: Thinking in documents

Contact document example

{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc.",
  "title": "Lead Engineer",
  "twitter": {
    "name": "Gary Murakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" }
  ],
  "phones": [
    { "type": "work", "number": "1-866-237-8815 x8015" }
  ],
  "emails": [
    { "type": "work", "address": "[email protected]" },
    { "type": "home", "address": "[email protected]" }
  ]
}

Page 48: Back to Basics 1: Thinking in documents

Working Set

To reduce the working set, consider…

• Reference bulk data, e.g., portrait

• Reference less-used data instead of embedding
  – Extract into referenced child document

Also for performance issues with large documents
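Extracting bulk data into a referenced child document might look like the sketch below. It is an illustration with plain dicts, not a migration tool: `extract_portrait`, the sample contact, and the placeholder image data are all assumptions.

```python
import json


def extract_portrait(contact, portraits, portrait_id):
    """Move the embedded portrait into `portraits`; return the slimmed contact."""
    slim = {key: value for key, value in contact.items() if key != "portrait"}
    portraits[portrait_id] = contact["portrait"]
    slim["portrait_id"] = portrait_id  # reference replaces the embedded data
    return slim


# Hypothetical contact with a large embedded portrait (placeholder bytes).
contact = {
    "name": "Example Contact",
    "phones": [{"type": "work", "number": "555-0100"}],
    "portrait": {"mime_type": "image/jpeg", "data": "x" * 100_000},
}

portraits = {}
slim_contact = extract_portrait(contact, portraits, portrait_id=1)

# Rough size comparison using the JSON encoding as a stand-in for BSON.
size_before = len(json.dumps(contact))
size_after = len(json.dumps(slim_contact))
```

Frequent reads of the contact now touch only the small document; the portrait is fetched by `portrait_id` when it is actually needed.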

Page 49: Back to Basics 1: Thinking in documents

General Recommendations

Page 50: Back to Basics 1: Thinking in documents

Legacy Migration

1. Copy existing schema & some data to MongoDB

2. Iterate schema design development: measure performance, find bottlenecks, and embed
   1. one to one associations first
   2. one to many associations next
   3. many to many associations

3. Migrate full dataset to new schema

New Software Application? Embed by default
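The first embedding step of a migration, folding a one-to-one association into its parent, can be sketched as follows. The rows reuse the business card data from earlier in the deck; `embed_addresses` is a hypothetical helper and lists of dicts stand in for the copied tables.

```python
def embed_addresses(contacts, addresses):
    """Replace each contact's address_id with the embedded address document."""
    by_id = {address["_id"]: address for address in addresses}
    migrated = []
    for contact in contacts:
        doc = {k: v for k, v in contact.items() if k != "address_id"}
        address = by_id.get(contact.get("address_id"))
        if address is not None:
            # Drop the child's own _id; it is no longer a separate document.
            doc["address"] = {k: v for k, v in address.items() if k != "_id"}
        migrated.append(doc)
    return migrated


addresses = [
    {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
     "state": "CA", "zip_code": "95014", "country": "USA"},
]
contacts = [
    {"_id": 2, "name": "Steven Jobs", "company": "Apple Computer",
     "phone": "408-996-1010", "address_id": 1},
    {"_id": 3, "name": "No Address"},  # hypothetical row without an address
]

migrated = embed_addresses(contacts, addresses)
```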

Page 51: Back to Basics 1: Thinking in documents

Embedding over Referencing

• Embedding is a bit like pre-joined data
  – BSON (Binary JSON) document ops are easy for the server

• Embed (90/10, following these rules of thumb)
  – When the "one" or "many" objects are viewed in the context of their parent
  – For performance
  – For atomicity

• Reference
  – When you need more scaling
  – For easy consistency with "many to many" associations without duplicated data

Page 52: Back to Basics 1: Thinking in documents

It’s All About Your Application

• Programs + Databases = (Big) Data Applications

• Your schema is the impedance matcher
  – Design choices: normalize/denormalize, reference/embed
  – Melds programming with MongoDB for best of both
  – Flexible for development and change

• Programs × MongoDB = Great Big Data Applications

Page 53: Back to Basics 1: Thinking in documents
Page 54: Back to Basics 1: Thinking in documents

Thank You

Perl Engineer & Evangelist, MongoDB

Mike Friedman

#mongodb

@friedo