Schema Design By Example

Emily Stolfo - twitter: @EmStolfo

DESCRIPTION

One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. Because documents can represent rich, schema-free data structures, there are many viable alternatives to the standard, normalized, relational model. MongoDB also has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions...


Page 1: Schema Design By Example

Schema Design By Example

Emily Stolfo - twitter: @EmStolfo

Page 2: Schema Design By Example

Differences with Traditional Schema Design

Traditional: Your application doesn't matter.
MongoDB: It's always about your application.

Traditional: Only about the data.
MongoDB: Not only about the data, but also how it's used.

Traditional: Only relevant during design.
MongoDB: Relevant during the lifetime of your application.

Page 3: Schema Design By Example

Schema Design is Evolutionary

Design and Development.
Deployment and Monitoring.
Iterative Modification.

Page 4: Schema Design By Example

3 Components to Schema Design

1. The Data Your Application Needs.
2. How Your Application Will Read the Data.
3. How Your Application Will Write the Data.

Page 5: Schema Design By Example

What is a Document Database?

Storage in JSON.
Application-defined schema.
Opportunity for choice.

Page 6: Schema Design By Example

Schema - Author

author = {
  "_id": int,
  "first_name": string,
  "last_name": string
}

Page 7: Schema Design By Example

Schema - User

user = {
  "_id": int,
  "username": string,
  "password": string
}

Page 8: Schema Design By Example

Schema - Book

book = {
  "_id": int,
  "title": string,
  "author": int,
  "isbn": string,
  "slug": string,
  "publisher": {
    "name": string,
    "date": timestamp,
    "city": string
  },
  "available": boolean,
  "pages": int,
  "summary": string,
  "subjects": [string, string],
  "notes": [{ "user": int, "note": string }],
  "language": string
}

Page 9: Schema Design By Example

Example: Library Application

An application for saving a collection of books.

Page 10: Schema Design By Example
Page 11: Schema Design By Example
Page 12: Schema Design By Example

Some Sample Documents

> db.authors.findOne()
{
  _id: 1,
  first_name: "F. Scott",
  last_name: "Fitzgerald"
}

> db.users.findOne()
{
  _id: 1,
  username: "[email protected]",
  password: "slsjfk4odk84k209dlkdj90009283d"
}

Page 13: Schema Design By Example

> db.books.findOne()
{
  _id: 1,
  title: "The Great Gatsby",
  slug: "9781857150193-the-great-gatsby",
  author: 1,
  available: true,
  isbn: "9781857150193",
  pages: 176,
  publisher: {
    name: "Everyman’s Library",
    date: ISODate("1991-09-19T00:00:00Z"),
    city: "London"
  },
  summary: "Set in the post-Great War...",
  subjects: ["Love stories", "1920s", "Jazz Age"],
  notes: [
    { user: 1, note: "One of the best..." },
    { user: 2, note: "It’s hard to..." }
  ],
  language: "English"
}

Page 14: Schema Design By Example

Design and Development.

Page 15: Schema Design By Example

The Data Our Library Needs

Embedded Documents

book.publisher: {
  name: "Everyman’s Library",
  date: ISODate("1991-09-19T00:00:00Z"),
  city: "London"
}

Embedded Arrays

book.subjects: ["Love stories", "1920s", "Jazz Age"]

Embedded Arrays of Documents

book.notes: [
  { user: 1, note: "One of the best..." },
  { user: 2, note: "It’s hard to..." }
]

Page 16: Schema Design By Example

How Our Library Reads Data.

Query for all the books by a specific author

> author = db.authors.findOne({first_name: "F. Scott", last_name: "Fitzgerald"});

> db.books.find({author: author._id})
{ ... }
{ ... }
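Because books store only the author's `_id`, the lookup above is a two-step, application-side join. The following is a minimal sketch of that flow in plain JavaScript, using in-memory arrays as stand-ins for the two collections. The helper names (`findAuthor`, `booksByAuthor`) and the extra sample rows are assumptions made for illustration and are not part of the original deck.

```javascript
// In-memory stand-ins for the authors and books collections (illustrative data).
const authors = [
  { _id: 1, first_name: "F. Scott", last_name: "Fitzgerald" },
  { _id: 2, first_name: "Ernest", last_name: "Hemingway" },
];
const books = [
  { _id: 1, title: "The Great Gatsby", author: 1 },
  { _id: 2, title: "Tender Is the Night", author: 1 },
  { _id: 3, title: "The Sun Also Rises", author: 2 },
];

// Step 1: look up the author document (mirrors db.authors.findOne(...)).
function findAuthor(firstName, lastName) {
  const match = authors.find(
    (a) => a.first_name === firstName && a.last_name === lastName
  );
  return match === undefined ? null : match;
}

// Step 2: select books by the author's _id
// (mirrors db.books.find({author: author._id})).
function booksByAuthor(firstName, lastName) {
  const author = findAuthor(firstName, lastName);
  if (author === null) return []; // unknown author: nothing to join against
  return books.filter((b) => b.author === author._id);
}

const titles = booksByAuthor("F. Scott", "Fitzgerald").map((b) => b.title);
console.log(titles); // prints [ 'The Great Gatsby', 'Tender Is the Night' ]
```

The cost of this design is one extra round trip per lookup, which is the trade-off the later slides on denormalized author names address.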

Page 17: Schema Design By Example

Query for books by title

> db.books.find({title: "The Great Gatsby"})
{ ... }

Page 18: Schema Design By Example

Query for books in which I have made notes

> db.books.find({"notes.user": 1})
{ ... }
{ ... }
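A dotted key such as `"notes.user"` reaches into an array of embedded documents, and a book matches when at least one of its notes has the requested user. The sketch below models that matching rule in plain JavaScript over an in-memory array. The function `matchesNotesUser` and the sample rows are hypothetical and serve only to illustrate the semantics.

```javascript
// Illustrative books, each with an embedded array of note documents.
const books = [
  { _id: 1, title: "The Great Gatsby",
    notes: [{ user: 1, note: "One of the best..." }, { user: 2, note: "It's hard to..." }] },
  { _id: 2, title: "Book with other readers' notes",
    notes: [{ user: 2, note: "Slow start." }] },
  { _id: 3, title: "Book without notes" }, // no notes field at all
];

// Hypothetical helper modelling the query {"notes.user": userId}:
// true when any element of the notes array has a matching user value.
function matchesNotesUser(book, userId) {
  return Array.isArray(book.notes) && book.notes.some((n) => n.user === userId);
}

const mine = books.filter((b) => matchesNotesUser(b, 1)).map((b) => b._id);
const theirs = books.filter((b) => matchesNotesUser(b, 2)).map((b) => b._id);
console.log(mine, theirs); // prints [ 1 ] [ 1, 2 ]
```

Note that the whole book document is returned for each match, including notes written by other users.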

Page 19: Schema Design By Example

How Our Library Writes Data.

Add notes to a book

> note = { user: 1, note: "I did NOT like this book." }

> db.books.update({ _id: 1 }, { $push: { notes: note }})

Page 20: Schema Design By Example

Take Advantage of Atomic Operations

$set - set a value
$unset - unset a value
$inc - increment an integer
$push - append a value to an array
$pushAll - append several values to an array (removed in MongoDB 3.6; use $push with $each)
$pull - remove a value from an array
$pullAll - remove several values from an array
$bit - bitwise operations
$addToSet - add a value to an array only if it isn't already present
$rename - rename a field
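The difference between `$push`, `$addToSet`, and `$pull` is easiest to see on a small array. The following is a simplified in-memory model of what each operator does to a document, written in plain JavaScript. The function `applyUpdate` is a hypothetical helper, not a MongoDB API. It handles only top-level fields with primitive values and does not model atomicity, which the server provides.

```javascript
// Simplified model of three array operators. Top-level fields and
// primitive values only; atomicity is not modelled here.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy

  // $push: always append, even if the value is already present.
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = (result[field] || []).concat([value]);
  }
  // $addToSet: append only when the value is not already in the array.
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    result[field] = current.includes(value) ? current : current.concat([value]);
  }
  // $pull: remove every occurrence of the value from an existing array.
  for (const [field, value] of Object.entries(update.$pull || {})) {
    if (Array.isArray(result[field])) {
      result[field] = result[field].filter((v) => v !== value);
    }
  }
  return result;
}

const book = { _id: 1, subjects: ["Love stories", "1920s"] };
const pushed = applyUpdate(book, { $push: { subjects: "1920s" } });
const added = applyUpdate(book, { $addToSet: { subjects: "1920s" } });
const pulled = applyUpdate(book, { $pull: { subjects: "1920s" } });
console.log(pushed.subjects.length, added.subjects.length, pulled.subjects.length);
// prints 3 2 1
```

For a field like `subjects`, where duplicates carry no meaning, `$addToSet` is usually the safer choice than `$push`.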

Page 21: Schema Design By Example

Deployment and Monitoring

Page 22: Schema Design By Example

How Our Library Reads Data

Query for books by an author's first name

> authors = db.authors.find({ first_name: /^f.*/i }, { _id: 1 })

> authorIds = authors.map(function(x) { return x._id; })

> db.books.find({ author: { $in: authorIds } })
{ ... }
{ ... }

Page 23: Schema Design By Example

Documents Should Reflect Query Patterns

Partially Embedded Documents

Query for books by an author's first name using an embedded document with a denormalized name.

book = {
  "_id": int,
  "title": string,
  "author": {
    "author": int,
    "name": string
  },
  ...
}

> db.books.find({ "author.name": /^f.*/i })
{ ... }
{ ... }

Page 24: Schema Design By Example

How Our Library Reads Data

Query for a book's notes and get user info

> db.books.find({title: "The Great Gatsby"}, {notes: 1})
{
  _id: 1,
  notes: [
    { user: 1, note: "One of the best..." },
    { user: 2, note: "It’s hard to..." }
  ]
}

Page 25: Schema Design By Example

Take Advantage of Immutable Data

Username is the natural key and is immutable.

Query for a book's notes

user = {
  "_id": string,
  "password": string
}

book = {
  //...
  "notes": [{ "user": string, "note": string }],
}

> db.books.find({title: "The Great Gatsby"}, {notes: 1})
{
  _id: 1,
  notes: [
    { user: "[email protected]", note: "One of the best..." },
    { user: "[email protected]", note: "It’s hard to..." }
  ]
}

Page 26: Schema Design By Example

How Our Library Writes Data

Add notes to a book

> note = { user: "[email protected]", note: "I did NOT like this book." }

> db.books.update({ _id: 1 }, { $push: { notes: note }})

Page 27: Schema Design By Example

Anti-pattern: Continually Growing Documents

Document Size Limit
Storage Fragmentation

Page 28: Schema Design By Example

Linking: One-to-One relationship

Move notes to another document, with one document per book id.

Keeps the book document size consistent.
Queries for books don't return all the notes.
Still suffers from a continually growing bookNotes document.

book = {
  "_id": int,
  ... // remove the notes field
}

bookNotes = {
  "_id": int, // this will be the same as the book id...
  "notes": [{ "user": string, "note": string }]
}

Page 29: Schema Design By Example

Linking: One-to-Many relationship

Move notes to another document, with one document per note.

Keeps the book document size consistent.
Queries for books don't return all the notes.
Notes not necessarily near each other on disk. Possibly slow reads.

book = {
  "_id": int,
  ... // remove the notes field
}

bookNotes = {
  "_id": int,
  "book": int, // this will be the same as the book id...
  "date": timestamp,
  "user": string,
  "note": string
}

Page 30: Schema Design By Example

Bucketing

Each bookNotes document contains a maximum number of notes.

Still use atomic operations to update or create a document.

bookNotes = {
  "_id": int,
  "book": int, // this is the book id
  "note_count": int,
  "last_changed": datetime,
  "notes": [{ "user": string, "note": string }]
}

> note = { user: "[email protected]", note: "I did NOT like this book." }

> db.bookNotes.update(
    { book: 1, note_count: { $lt: 10 } },
    {
      $inc: { note_count: 1 },
      $push: { notes: note },
      $set: { last_changed: new Date() }
    },
    true // upsert
  )
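To see how the upsert keeps each bucket bounded, the sketch below replays the same logic in plain JavaScript against an in-memory array. The function `addNote`, the `nextId` counter, and the `reader…` user names are assumptions made for illustration. In MongoDB the server performs the find-or-create step and the update as a single atomic operation, which this model does not attempt to reproduce.

```javascript
const BUCKET_SIZE = 10; // matches the { $lt: 10 } condition in the filter

// In-memory stand-in for the bookNotes collection.
const bookNotes = [];
let nextId = 1; // hypothetical _id generator for this sketch

// Mirrors the upsert: use a bucket for this book that still has room,
// otherwise create a new one, then apply $inc, $push and $set.
function addNote(bookId, note, now = new Date()) {
  let bucket = bookNotes.find(
    (b) => b.book === bookId && b.note_count < BUCKET_SIZE
  );
  if (bucket === undefined) {
    // Upsert path: no matching bucket, so insert a fresh document.
    bucket = { _id: nextId++, book: bookId, note_count: 0, notes: [] };
    bookNotes.push(bucket);
  }
  bucket.note_count += 1;    // $inc: { note_count: 1 }
  bucket.notes.push(note);   // $push: { notes: note }
  bucket.last_changed = now; // $set: { last_changed: new Date() }
  return bucket;
}

// Add 23 notes to book 1 and inspect how they were distributed.
for (let i = 0; i < 23; i++) {
  addNote(1, { user: "reader" + (i % 3), note: "note " + i });
}
console.log(bookNotes.map((b) => b.note_count)); // prints [ 10, 10, 3 ]
```

Reading all notes for a book then means fetching every bucket with that `book` value, so the bucket size trades document growth against the number of documents per read.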

Page 31: Schema Design By Example

Iterative Modification

Page 32: Schema Design By Example

The Data Our Library Needs

Users want to comment on other people's notes

bookNotes = {
  "_id": int,
  "book": int, // this is the book id
  "note_count": int,
  "last_changed": datetime,
  "notes": [{
    "user": string,
    "note": string,
    "comments": [{
      "user": string,
      "text": string,
      "replies": [{
        "user": string,
        "text": string,
        "replies": [{...}]
      }]
    }]
  }]
}

Page 33: Schema Design By Example

What benefits does this provide?

Single document for all comments on a note.
Single location on disk for the whole tree.
Legible tree structure.

What are the drawbacks of storing a tree in this way?

Difficult to search.
Difficult to get back partial results.
Document can get large very quickly.

Page 34: Schema Design By Example

Alternative solution: Store arrays of ancestors

How would you do it?

> t = db.mytree;

> t.find()
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
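With an `ancestors` array on every node, the common tree questions become simple lookups. The sketch below runs them in plain JavaScript over the same seven documents. The function names are hypothetical, and the shell queries in the comments are the equivalents one would expect to use against `db.mytree` (an assumption, since the deck does not show them).

```javascript
// The same tree as in the slide, held in memory.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
  { _id: "g", ancestors: ["a", "b", "d"], parent: "d" },
];

// All descendants of a node, at any depth.
// Shell equivalent: t.find({ ancestors: id })
function descendants(id) {
  return tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);
}

// Direct children only.
// Shell equivalent: t.find({ parent: id })
function children(id) {
  return tree.filter((n) => n.parent === id).map((n) => n._id);
}

// Path from the root down to the node's parent, read from the stored array.
// Shell equivalent: t.findOne({ _id: id }).ancestors
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  return node === undefined ? [] : node.ancestors || [];
}

console.log(descendants("b")); // prints [ 'c', 'd', 'g' ]
console.log(children("b"));    // prints [ 'c', 'd' ]
console.log(ancestorsOf("g")); // prints [ 'a', 'b', 'd' ]
```

The trade-off is on the write side: moving a subtree means rewriting the `ancestors` array of every node beneath it.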

Page 35: Schema Design By Example

General notes for read-heavy schemas

Think about indexes and sharding

Multi-key indexes.
Secondary indexes.
Shard key choice.

Page 36: Schema Design By Example

Schema design in MongoDB

It's about your application.
It's about your data and how it's used.
It's about the entire lifetime of your application.