46
Schema Design Buzz Moschetti Enterprise Architect, MongoDB [email protected] @buzzmoschetti

Webinar: Schema Design

  • Upload
    mongodb

  • View
    7.077

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Webinar: Schema Design

Schema Design

Buzz MoschettiEnterprise Architect, MongoDB

[email protected]

@buzzmoschetti

Page 2: Webinar: Schema Design

2

Before We Begin

• This webinar is being recorded

• Use The Chat Window for

• Technical assistance

• Q&A

• MongoDB Team will answer questions

• Common questions will be reviewed at the

end of the webinar

Page 3: Webinar: Schema Design

Theme #1:

Great Schema Design involves

much more than the database

• Easily understood structures

• Harmonized with software

• Acknowledging legacy issues

Page 4: Webinar: Schema Design

Theme #2:

Today’s solutions need to

accommodate tomorrow’s needs

• End of “Requirements Complete”

• Ability to economically scale

• Shorter solutions lifecycles

Page 5: Webinar: Schema Design

Theme #3:

MongoDB offers you choice

Page 6: Webinar: Schema Design

RDBMS MongoDB

Database Database

Table Collection

Index Index

Row Document

Join Embedding & Linking

Terminology

Page 7: Webinar: Schema Design

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

authors: [

{ _id: "kchodorow", name: "Kristina Chodorow“ },

{ _id: "mdirold", name: “Mike Dirolf“ }

],

published_date: ISODate("2010-09-24"),

pages: 216,

language: "English",

thumbnail: BinData(0,"AREhMQ=="),

publisher: {

name: "O’Reilly Media",

founded: 1980,

locations: ["CA”, ”NY” ]

}

}

What is a Document?

Page 8: Webinar: Schema Design

// Java: maps

DBObject query = new BasicDBObject(”publisher.founded”, 1980));

Map m = collection.findOne(query);

Date pubDate = (Date)m.get(”published_date”); // java.util.Date

// Javascript: objects

m = collection.findOne({”publisher.founded” : 1980});

pubDate = m.published_date; // ISODate

year = pubDate.getUTCFullYear();

# Python: dictionaries

m = coll.find_one({”publisher.founded” : 1980 });

pubDate = m[”pubDate”].year # datetime.datetime

Documents Map to Language Constructs

Page 9: Webinar: Schema Design

9

Traditional Schema Design• Static, Uniform Scalar Data

• Rectangles

• Low-level, physical

representation

Page 10: Webinar: Schema Design

10

Document Schema Design• Flexible, Rich Shapes

• Objects

• Higher-level, business

representation

Page 11: Webinar: Schema Design

Schema Design By

Example

Page 12: Webinar: Schema Design

12

Library Management Application

• Patrons/Users

• Books

• Authors

• Publishers

Page 13: Webinar: Schema Design

13

Question:

What is a Patron’s Address?

Page 14: Webinar: Schema Design

A Patron and their Address

> patron = db.patrons.find({ _id : “joe” })

{

_id: "joe“,

name: "Joe Bookreader”,

favoriteGenres: [ ”mystery”, ”programming” ]

}

> address = db.addresses.find({ _id : “joe” })

{

_id: "joe“,

street: "123 Fake St.",

city: "Faketon",

state: "MA",

zip: 12345

}

Page 15: Webinar: Schema Design

A Patron and their Address

> patron = db.patrons.find({ _id : “joe” })

{

_id: "joe",

name: "Joe Bookreader",

favoriteGenres: [ ”mystery”, ”programming” ]

address: {

street: "123 Fake St. ",

city: "Faketon",

state: "MA",

zip: 12345

}

}

Page 16: Webinar: Schema Design

Projection: Return only what you need

> patron = db.patrons.find({ _id : “joe” },

{“_id”: 0, ”address”:1})

{

address: {

street: "123 Fake St. ",

city: "Faketon",

state: "MA",

zip: 12345

}

}

> patron = db.patrons.find({ _id : “joe” },

{“_id”: 0, “name”:1, ”address.state”:1})

{

name: "Joe Bookreader",

address: {

state: "MA”

}

}

Page 17: Webinar: Schema Design

17

One-to-One Relationships

• “Belongs to” relationships are often embedded

• Holistic representation of entities with their

embedded attributes and relationships.

• Great read performance

Most important:

• Keeps simple things simple

• Frees up time to tackle harder schema

design issues

Page 18: Webinar: Schema Design

18

Question:

What are a Patron’s Addresses?

Page 19: Webinar: Schema Design

A Patron and their Addresses

> patron = db.patrons.find({ _id : “bob” })

{

_id: “bob",

name: “Bob Knowitall",

addresses: [

{street: "1 Vernon St.", city: "Newton", …},

{street: "52 Main St.", city: "Boston", …}

]

}

Page 20: Webinar: Schema Design

A Patron and their Addresses

> patron = db.patrons.find({ _id : “bob” })

{

_id: “bob",

name: “Bob Knowitall",

addresses: [

{street: "1 Vernon St.", city: "Newton", …},

{street: "52 Main St.", city: "Boston", …}

]

}

> patron = db.patrons.find({ _id : “joe” })

{

_id: "joe",

name: "Joe Bookreader",

address: { street: "123 Fake St. ", city: "Faketon", …}

}

Page 21: Webinar: Schema Design

21

Migration Options

• Migrate all documents when the schema changes.

• Migrate On-Demand

– As we pull up a patron’s document, we make the

change.

– Any patrons that never come into the library never get

updated.

• Leave it alone

– The code layer knows about both address and

addresses

Page 22: Webinar: Schema Design

22

The Utility of Substructure

Map d = collection.find(new BasicDBObject(”_id”,”Bob”));

Map addr = (Map) d.get(”address”);

If(addr == null) {

List<Map> addrl = (List) d.get(”addresses”);

addr = addrl.get(0);

}

doSomethingWithOneAddress(addr);

/**

If later we add “region” to the address substructure, none

of the queries have to change! Another value will appear

in the Map (or not -- and that can be interrogated) and be

processed.

**/

Page 23: Webinar: Schema Design

23

Question:

Who is the publisher of this

book?

Page 24: Webinar: Schema Design

24

Book

• MongoDB: The Definitive Guide,

• By Kristina Chodorow and Mike Dirolf

• Published: 9/24/2010

• Pages: 216

• Language: English

• Publisher: O’Reilly Media, CA

Page 25: Webinar: Schema Design

Book with embedded Publisher

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

authors: [ "Kristina Chodorow", "Mike Dirolf" ],

published_date: ISODate("2010-09-24"),

pages: 216,

language: "English",

publisher: {

name: "O’Reilly Media",

founded: 1980,

locations: ["CA”, ”NY” ]

}

}

Page 26: Webinar: Schema Design

Don’t Forget the Substructure!

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

authors: [

{ first: "Kristina”, last: “Chodorow” },

{ first: ”Mike”, last: “Dirolf” }

],

published_date: ISODate("2010-09-24"),

pages: 216,

language: "English",

publisher: {

name: "O’Reilly Media",

founded: 1980,

locations: ["CA”, ”NY” ]

}

}

Page 27: Webinar: Schema Design

27

Book with embedded Publisher

• Optimized for read performance of Books

• We accept data duplication

• An index on “publisher.name” provides:

– Efficient lookup of all books for given publisher name

– Efficient way to find all publisher names (distinct)

Page 28: Webinar: Schema Design

Publishers as a Separate Entity

> publishers = db.publishers.find()

{

_id: “oreilly”,

name: "O’Reilly Media",

founded: 1980,

locations: ["CA”, ”NY” ]

}

{

_id: “penguin”,

name: “Penguin”,

founded: 1983,

locations: [ ”IL” ]

}

Page 29: Webinar: Schema Design

Book with Linked Publisher

> book = db.books.find({ _id: “123” })

{

_id: “123”,

publisher_id: “oreilly”,

title: "MongoDB: The Definitive Guide",

}

> db.publishers.find({ _id : book.publisher_id })

{

_id: “oreilly”,

name: "O’Reilly Media",

founded: 1980,

locations: ["CA”, ”NY” ]

}

Page 30: Webinar: Schema Design

Books with Linked Publisher

db.books.find({ criteria } ).forEach(function(r) {

m[r.publisher.name] = 1; // Capture publisher ID

});

uniqueIDs = Object.keys(m);

cursor = db.publishers.find({"_id": {"$in": uniqueIDs } });

Page 31: Webinar: Schema Design

31

Question:

What are all the books a

publisher has published?

Page 32: Webinar: Schema Design

Publisher with linked Books

> publisher = db.publishers.find({ _id : “oreilly” })

{

_id: “oreilly”,

name: "O’Reilly Media",

founded: 1980,

locations: [ "CA“, ”NY” ],

books: [“123”, “456”, “789”, “10112”, …]

}

> books = db.books.find({ _id: { $in : publisher.books } })

Page 33: Webinar: Schema Design

33

Question:

Who are the authors of a given

book?

Page 34: Webinar: Schema Design

Books with linked Authors

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

authors: [

{ _id: “X12”, first: "Kristina”, last: “Chodorow” },

{ _id: “Y45”, first: ”Mike”, last: “Dirolf” }

],

}

> a2 = book.authors.map(function(r) { return r._id; });

> authors = db.authors.find({ _id : { $in : a2}})

{_id:”X12”,name:{first:"Kristina”,last:”Chodorow”},hometown: …

}

{_id:“Y45”,name:{first:”Mike”,last:”Dirolf”}, hometown: … }

Page 35: Webinar: Schema Design

35

Question:

What are all the books an author

has written?

Page 36: Webinar: Schema Design

> authors = db.authors.find({ _id : “X12” })

{

_id: ”X12",

name: { first: "Kristina”, last: “Chodorow” } ,

hometown: "Cincinnati",

books: [ {id: “123”, title : "MongoDB: The Definitive

Guide“ } ]

}

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

authors: [

{ _id: “X12”, first: "Kristina”, last: “Chodorow” },

{ _id: “Y45”, first: ”Mike”, last: “Dirolf” }

],

}

Double Link Books and Authors

Page 37: Webinar: Schema Design

> book = db.books.find({ _id : “123” })

{

authors: [

{ _id: “X12”, first: "Kristina”, last: “Chodorow” },

{ _id: “Y45”, first: ”Mike”, last: “Dirolf” }

],

}

> db.books.ensureIndex({“authors._id”: 1});

> db.books.find({ “authors._id” : “X12” }).explain();

{

"cursor" : "BtreeCursor authors.id_1",

"millis" : 0,

}

…or Use Indexes

Page 38: Webinar: Schema Design

38

Embedding vs. Linking

• Embedding

– Terrific for read performance

• Webapp “front pages” and pre-aggregated material

• Complex structures

– Inserts might be slower than linking

– Data integrity needs to be managed

• Linking

– Flexible

– Data integrity is built-in

– Work is done during reads

• But not necessarily more work than RDBMS

Page 39: Webinar: Schema Design

39

Question:

What are the personalized

attributes for each author?

Page 40: Webinar: Schema Design

> db.authors.find()

{

_id: ”X12",

name: { first: "Kristina”, last: “Chodorow” },

personalData: {

favoritePets: [ “bird”, “dog” ],

awards: [ {name: “Hugo”, when: 1983}, {name: “SSFX”,

when: 1992} ]

}

}

{

_id: ”Y45",

name: { first: ”Mike”, last: “Dirolf” } ,

personalData: {

dob: ISODate(“1970-04-05”)

}

}

Assign Dynamic Structure to a Known Name

Page 41: Webinar: Schema Design

> db.events.find()

{ type: ”click", ts: ISODate(“2015-03-03T12:34:56.789Z”,

data: { x: 123, y: 625, adId: “AE23A” } }

{ type: ”click", ts: ISODate(“2015-03-03T12:35:01.003Z”,

data: { x: 456, y: 611, adId: “FA213” } }

{ type: ”view", ts: ISODate(“2015-03-03T12:35:04.102Z”,

data: { scn: 2, reset: false, … } }

{ type: ”click", ts: ISODate(“2015-03-03T12:35:05.312Z”,

data: { x: 23, y: 32, adId: “BB512” } }

{ type: ”close", ts: ISODate(“2015-03-03T12:35:08.774Z”,

data: { snc: 2, … } }

{ type: ”click", ts: ISODate(“2015-03-03T12:35:10.114Z”,

data: { x: 881, y: 913, adId: “F430” } }

Polymorphism: Worth an Extra Slide

Page 42: Webinar: Schema Design

42

Question:

What are all the books about

databases?

Page 43: Webinar: Schema Design

Categories as an Array

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

categories: [“MongoDB”, “Databases”, “Programming”]

}

> db.books.find({ categories: “Databases” })

Page 44: Webinar: Schema Design

Categories as a Path

> book = db.books.find({ _id : “123” })

{

_id: “123”,

title: "MongoDB: The Definitive Guide",

category: “Programming/Databases/MongoDB”

}

> db.books.find({ category: ^Programming/Databases/* })

Page 45: Webinar: Schema Design

45

Summary

• Schema design is different in MongoDB

– But basic data design principles stay the same

• Focus on how an application accesses/manipulates data

• Seek out and capture belongs-to 1:1 relationships

• Use substructure to better align to code objects

• Be polymorphic!

• Evolve the schema to meet requirements as they change

Page 46: Webinar: Schema Design

Questions & Answers