MongoDB Schema Design Patterns Jumpstart Session @SigNarvaez

Webinar: MongoDB Schema Design and Performance Implications


MongoDB Schema Design Patterns
Jumpstart Session

@SigNarvaez

Sigfrido "Sig" Narváez
Sr. Solutions Architect, MongoDB
@SigNarvaez

Agenda

01 Medical Record Example
02 Schema Design: MongoDB vs. Relational
03 Modeling Relationships
04 Performance
05 What's new with 3.2
06 Summary, Q&A


Medical Record Example

Medical Records

• Collects all patient information in a central repository
• Provides a central point of access for
  • Patients
  • Care providers: physicians, nurses, etc.
  • Billing
  • Insurance reconciliation
• Hospitals, physicians, patients, procedures, records

[Diagram labels: Patient Records, Medications, Lab Results, Procedures, Hospital Records, Physicians, Patients, Nurses, Billing]

Medical Record Data

• Hospitals
  • Have physicians
• Physicians
  • Have patients
  • Perform procedures
  • Belong to hospitals
• Patients
  • Have physicians
  • Are the subject of procedures
• Procedures
  • Associated with a patient
  • Associated with a physician
  • Have a record
  • Variable metadata
• Records
  • Associated with a procedure
  • Binary data
  • Variable fields


Lot of Variability


Schema Design: MongoDB vs. Relational

MongoDB vs. Relational

MongoDB                      Relational
Collections                  Tables
Documents                    Rows
Data Use                     Data Storage
What questions do I have?    What answers do I have?

MongoDB vs. Relational

Attribute       MongoDB                    Relational
Storage         N-dimensional              Two-dimensional
Field Values    0, 1, many, or embed       Single value
Query           Any field, at any level    Any field
Schema          Flexible                   Very structured


Complex Normalized Schemas


Documents are Rich Data Structures

{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.', last_visit: 'Del Carmen Hospital', last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … }
  ]
}

• Fields hold strongly typed values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed: data is already an object!


Modeling Relationships


1-1

Referencing: use two collections with a reference field ("relational")

Procedure
• patient
• date
• type
• physician
• type

Results
• dataType
• size
• content: {…}

Embedding: Document Schema

Procedure
• patient
• date
• type
• results
  • equipmentId
  • data1
  • data2
• physician
• Results
  • type
  • size
  • content: {…}

Referencing

Procedure
{
  "_id" : 333,
  "date" : "2003-02-09T05:00:00",
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result_id" : 134
}

Results
{
  "_id" : 134,
  "type" : "txt",
  "size" : NumberInt(12),
  "content" : { value1: 343, value2: "abc", … }
}

Embedding

Procedure
{
  "_id" : 333,
  "date" : "2003-02-09T05:00:00",
  "hospital" : "County Hills",
  "patient" : "John Doe",
  "physician" : "Stephen Smith",
  "type" : "Chest X-ray",
  "result" : {
    "type" : "txt",
    "size" : NumberInt(12),
    "content" : { value1: 343, value2: "abc", … }
  }
}

Embedding

• Advantages
  • Retrieve all relevant information in a single query/document
  • Avoid implementing joins in application code
  • Update related information as a single atomic operation
    • MongoDB doesn't offer multi-document transactions (as of 3.2; they were added in 4.0)
• Limitations
  • Large documents mean more overhead if most fields are not relevant
  • 16 MB document size limit

Atomicity

• Document operations are atomic

db.patients.update(
  { _id: 12345 },
  {
    $inc : { numProcedures : 1 },
    $push : { procedures : "proc123" },
    $set : { "addr.state" : "TX" }
  }
)

• No multi-document transactions (as of 3.2), so nothing like the following is available:

db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedure.insert({_id: "proc123", …});
db.records.insert({_id: "rec123", …});
db.endTransaction();
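
The effect of the single-document update on the Atomicity slide can be sketched in plain JavaScript. This is only an illustration of what `$inc`, `$push` and `$set` change on one document; the helpers `setPath` and `applyUpdate` are hypothetical, handle only the simple cases shown, and are not how MongoDB implements updates. The starting values of the patient document are made up.

```javascript
// Illustrative only: mimics the effect of $inc, $push and $set on one
// in-memory object. Handles top-level fields for $inc/$push and dotted
// paths for $set; this is not MongoDB's implementation.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

function applyUpdate(doc, update) {
  const next = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, n] of Object.entries(update.$inc || {})) {
    next[field] = (next[field] || 0) + n;
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    next[field] = [...(next[field] || []), item];
  }
  for (const [path, value] of Object.entries(update.$set || {})) {
    setPath(next, path, value);
  }
  return next; // all three changes appear together, or not at all
}

// Made-up starting state for patient 12345.
const patient = {
  _id: 12345,
  numProcedures: 1,
  procedures: ["proc122"],
  addr: { state: "NH" },
};

const updatedPatient = applyUpdate(patient, {
  $inc: { numProcedures: 1 },
  $push: { procedures: "proc123" },
  $set: { "addr.state": "TX" },
});
```

Because the three modifiers target the same document, a reader never sees the counter incremented without the matching array entry. Spreading the same information over several collections gives up that guarantee in 3.2.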


Referencing

• Advantages
  • Smaller documents
  • Less likely to reach 16 MB document limit
  • Infrequently accessed information not accessed on every query
  • No duplication of data
• Limitations
  • Two queries required to retrieve information
  • Cannot update related information atomically
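
The read-side trade-off between embedding and referencing can be sketched without a database. The snippet below is a minimal illustration that uses in-memory Maps as stand-ins for collections; the helper names `findProcedureEmbedded` and `findProcedureReferenced` are hypothetical and not part of any MongoDB API.

```javascript
// In-memory stand-ins for collections (assumption: a Map keyed by _id).
const proceduresEmbedded = new Map([
  [333, { _id: 333, type: "Chest X-ray", result: { type: "txt", size: 12 } }],
]);
const proceduresReferenced = new Map([
  [333, { _id: 333, type: "Chest X-ray", result_id: 134 }],
]);
const resultsById = new Map([[134, { _id: 134, type: "txt", size: 12 }]]);

// Embedded: one lookup returns the procedure together with its result.
function findProcedureEmbedded(id) {
  return { doc: proceduresEmbedded.get(id), lookups: 1 };
}

// Referenced: a second lookup is needed to resolve result_id.
function findProcedureReferenced(id) {
  const procedure = proceduresReferenced.get(id);
  const result = resultsById.get(procedure.result_id);
  return { doc: { ...procedure, result }, lookups: 2 };
}

const embeddedRead = findProcedureEmbedded(333);
const referencedRead = findProcedureReferenced(333);
```

Both reads end up with the same information; the referenced form pays for its smaller documents with an extra round trip.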

1-1: General Recommendations

• Embed
  • No additional data duplication
  • Can query or index on embedded field
    • e.g., "result.type"
• Exceptional cases…
  • Embedding results in large documents
  • Set of infrequently accessed fields

{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result": {
    "type": "txt",
    "size": 12,
    "content": {
      "value1": 343,
      "value2": "abc"
    }
  }
}


1-M

1-M: Modeled in 2 possible ways

Embed

Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    { id: 12345, date: "2015-02-15", type: "Cat scan", … },
    { id: 12346, date: "2015-02-15", type: "blood test", … }
  ]
}

Reference

Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [12345, 12346]
}

Procedures
{ _id: 12345, date: "2015-02-15", type: "Cat scan", … }
{ _id: 12346, date: "2015-02-15", type: "blood test", … }

1-M: General Recommendations

• Embed, when possible
  • Many are weak entities
  • Access all information in a single query
  • Take advantage of update atomicity
  • No additional data duplication
  • Can query or index on any field
    • e.g., { "phones.type": "mobile" }
• Exceptional cases:
  • 16 MB document size
  • Large number of infrequently accessed fields

{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    { id: 12345, date: "2015-02-15", type: "Cat scan", … },
    { id: 12346, date: "2015-02-15", type: "blood test", … }
  ]
}


M-M

M-M: Traditional Relational Association

Physicians: name, specialty, phone
Hospitals: name
HosPhysicanRel (join table): hospitalId, physicianId

✗ Join table. Use arrays instead.

M-M: Embedding Physicians in Hospitals collection

{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    { id: 12345, name: "Joe Doctor", address: {…}, … },
    { id: 12346, name: "Mary Well", address: {…}, … }
  ]
}

{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    { id: 63633, name: "Harold Green", address: {…}, … },
    { id: 12345, name: "Joe Doctor", address: {…}, … }
  ]
}

Data duplication… is ok!

M-M: Referencing

Hospitals
{ _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] }
{ _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [63633, 12345] }

Physicians
{ _id: 63633, name: "Harold Green", hospitals: [2], … }
{ _id: 12345, name: "Joe Doctor", hospitals: [1, 2], … }
{ _id: 12346, name: "Mary Well", hospitals: [1], … }

M-M: General Recommendation

• Use case determines whether to reference or embed:
  1. Data Duplication
     • Embedding may result in data duplication
     • Duplication may be okay if reads dominate updates
     • Of the two, which one changes the least?
  2. Referencing may be required if there are many related items
  3. Hybrid approach
     • Potentially do both. It's ok!

Reference

Hospitals
{ _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] }

Physicians
{ _id: 12345, name: "Joe Doctor", address: {…}, … }
{ _id: 12346, name: "Mary Well", address: {…}, … }


Performance

Example 1: Hybrid Approach
Embed and Reference


Healthcare Example

patients

procedures

Tailor Schema to Queries

Find all patients from NH that have had chest x-rays

Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 },
  "physicians" : [10387, 33456],
  "procedures" : ["551ac", "343fs"]
}

Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}

Tailor Schema to Queries (cont.)

Find all patients from NH that have had chest x-rays

Patient
{
  "_id" : 593340651,
  "first" : "Gregorio",
  "last" : "Lang",
  "addr" : { "street" : "623 Flowers Rd", "city" : "Groton", "state" : "NH", "zip" : 3266 },
  "physicians" : [10387, 33456],
  "procedures" : [
    { id : "551ac", type : "Chest X-ray" },
    { id : "343fs", type : "Blood Test" }
  ]
}

Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}

3.2's $lookup!! (left outer join)
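
With the procedure type copied into the patient document, the question "find all patients from NH that have had chest x-rays" can be answered from the patients collection alone. The sketch below shows a filter one might pass to `find`, plus a small in-memory check that mirrors what that filter asks for. The collection name `patients` is taken from the Healthcare Example slide; the helper `matchesNhChestXray` and the second sample patient are hypothetical.

```javascript
// Filter one might pass to db.patients.find(...). Dot notation reaches
// into the embedded addr document and into each element of procedures.
const nhChestXrayFilter = {
  "addr.state": "NH",
  "procedures.type": "Chest X-ray",
};

// Hypothetical in-memory check that mirrors this particular filter.
function matchesNhChestXray(patient) {
  return (
    patient.addr.state === "NH" &&
    patient.procedures.some((p) => p.type === "Chest X-ray")
  );
}

const samplePatients = [
  {
    _id: 593340651,
    addr: { state: "NH" },
    procedures: [
      { id: "551ac", type: "Chest X-ray" },
      { id: "343fs", type: "Blood Test" },
    ],
  },
  // Made-up patient outside NH, included only to show a non-match.
  { _id: 1, addr: { state: "TX" }, procedures: [{ id: "9", type: "Chest X-ray" }] },
];

const matchedIds = samplePatients.filter(matchesNhChestXray).map((p) => p._id);
```

With only procedure ids stored in the patient document, the same question needs a query against procedures first and a second query against patients.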

Example 2: Time Series Data
Medical Devices

Vital Sign Monitoring Device

Vital signs measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels

Produces data at regular intervals
• Once per minute
• Many Devices, Many Hospitals

Data From Vital Signs Monitoring Device

{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:07:00.000-0500"),
  spO2: 88,
  pulse: 74,
  bp: [128, 80]
}

• One document per minute per device
• Relational approach

Document Per Hour (By minute)

{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:00:00.000-0500"),
  spO2: { 0: 88, 1: 90, …, 59: 92 },
  pulse: { 0: 74, 1: 76, …, 59: 72 },
  bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78] }
}

• 1 document per device per hour
• Store per-minute data at the hourly level
• Update-driven workload
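
One way to picture the update-driven workload is a helper that turns a single reading into the filter and `$set` update for its hourly document. This is a sketch under assumptions: the function name `hourlyBucketUpdate` and its parameters are hypothetical, and hours are truncated in UTC.

```javascript
// Builds the filter and update for one reading in the hourly-bucket schema.
// Assumption: buckets are aligned to UTC hours.
function hourlyBucketUpdate(deviceId, timestamp, reading) {
  const hour = new Date(timestamp);
  hour.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
  const minute = new Date(timestamp).getUTCMinutes();
  return {
    filter: { deviceId, ts: hour },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
  };
}

// 22:07 at UTC-05:00 on 2013-10-16, written here in UTC.
const bucketOp = hourlyBucketUpdate(123456, "2013-10-17T03:07:00.000Z", {
  spO2: 88,
  pulse: 74,
  bp: [128, 80],
});
```

The resulting pair could be handed to an update call with upsert enabled, so that the first reading of the hour creates the document and the remaining 59 readings update it.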

Characterizing Write Differences

• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:

Document Per Event    60 inserts
Document Per Hour     1 insert, 59 updates

Characterizing Read Differences

• Want to graph 24 hours of vital signs for a patient:

Document Per Event    1440 reads
Document Per Hour     24 reads

• Read performance is greatly improved

Characterizing Memory and Storage Differences

                          Document Per Minute    Document Per Hour
Number of Documents       52.6 Billion           876 Million
Total Index Size          6,364 GB               106 GB
  _id index               1,468 GB               24.5 GB
  {ts: 1, deviceId: 1}    4,895 GB               81.6 GB
Document Size             92 Bytes               758 Bytes
Database Size             4,503 GB               618 GB

• 100K devices
• 1 year's worth of data, at minute resolution (365 x 24 x 60)
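
The document counts in the comparison follow directly from the stated workload of 100K devices reporting once per minute for a year. A short arithmetic sketch (variable names are illustrative):

```javascript
// Workload from the slide: 100K devices, one reading per minute, one year.
const deviceCount = 100000;
const hoursPerYear = 365 * 24;            // 8,760
const minutesPerYear = hoursPerYear * 60; // 525,600

// One document per reading vs. one document per device per hour.
const docsPerMinuteSchema = deviceCount * minutesPerYear; // 52,560,000,000
const docsPerHourSchema = deviceCount * hoursPerYear;     // 876,000,000

// Reads needed to graph 24 hours for one device under each schema.
const readsPerEventSchema = 24 * 60; // 1,440
const readsPerHourSchema = 24;

const documentRatio = docsPerMinuteSchema / docsPerHourSchema; // 60
```

The hourly schema holds 60 times fewer documents, which is what shrinks the index sizes shown above even though each document is larger.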


MongoDB 3.2

MongoDB 3.2: a GIANT Release

2.2: Agg. Framework, Location-Aware Sharding
2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
2.6: $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
3.0: Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System

Tools

• mgenerate
  • Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
  • Model schema using a JSON definition
  • Generate millions of documents with random data
  • How well does the schema work?
    • Queries, Indexes, Data Size, Index Size, Replication
• Demo

Documents are Rich Data Structures

{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.', last_visit: 'Mission Hospital', last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … }
  ]
}

• Fields hold typed values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed: data is already an object!

Schema using mgenerate

{
  "first_name" : { "$string" : { "length" : 30 } },
  "last_name" : { "$string" : { "length" : 30 } },
  "cell" : "$number",
  "city" : { "$string" : { "length" : 30 } },
  "location" : [ "$number", "$number" ],
  "professions" : { "$array" : [
    { "$choose" : [ "banking", "finance", "trader" ] },
    { "$number" : [1, 3] }
  ] },
  "physicians" : { "$array" : [
    {
      "name" : { "$string" : { "length" : 30 } },
      "last_visit" : { "$string" : { "length" : 30 } },
      "last_visit_dt" : "$datetime"
    },
    { "$number" : [1, 5] }
  ] }
}

> mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json


Use Compass to visualize & query data!

Visual Query Profiler & Index Suggestions

• Visual Query Profiler: identify your slow-running queries with the click of a button
• Index Suggestions: index recommendations to improve your deployment


MongoDB 3.2 $lookup

Obtain Patient view with Procedure details, but without Physicians

Patient
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 },
  "physicians": [10387, 33456],
  "procedures": ["551ac", "343fs"]
}

Procedure
{
  "_id" : "551ac",
  "date" : "2000-04-26",
  "hospital" : 161,
  "patient" : 593340651,
  "physician" : 10387,
  "type" : "Chest X-ray",
  "records" : ["67bc6"]
}

MongoDB 3.2 $lookup

Obtain Patient view with Procedure details, but without Physicians

db.PatientsColl.aggregate([
  { "$match" : { "_id" : 593340651 } },
  { "$unwind" : "$procedures" },
  { "$lookup" : {
      "from" : "ProceduresColl",
      "localField" : "procedures",
      "foreignField" : "_id",
      "as" : "procs"
  } },
  { "$unwind" : "$procs" },
  { "$group" : {
      "_id" : { "_id" : "$_id", "first" : "$first", "last" : "$last", "addr" : "$addr" },
      "procedures" : { "$push" : "$procs" }
  } },
  { "$project" : {
      "_id" : "$_id._id",
      "first" : "$_id.first",
      "last" : "$_id.last",
      "addr" : "$_id.addr",
      "procedures._id" : 1,
      "procedures.type" : 1,
      "procedures.date" : 1
  } }
]);

https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

Result:
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 },
  "procedures": [
    { "_id": "551ac", "date": "2000-04-26", "type": "Chest X-ray" },
    { "_id": "343fs", "date": "2000-04-26", "type": "Blood Test" }
  ]
}
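
The left outer join behaviour of `$lookup` can be illustrated in a few lines of plain JavaScript. This is a simplified sketch, not the server's implementation: the helper `leftOuterLookup` is hypothetical, compares scalar values only (which is why the pipeline unwinds `procedures` first), and the id "zzz99" is made up to show an unmatched row.

```javascript
// For each input document, attach the array of documents from the other
// collection whose foreignField equals the input's localField.
// Unmatched inputs are kept with an empty array (left outer join).
function leftOuterLookup(inputDocs, fromDocs, localField, foreignField, as) {
  return inputDocs.map((doc) => ({
    ...doc,
    [as]: fromDocs.filter((other) => other[foreignField] === doc[localField]),
  }));
}

// Patient documents after unwinding the procedures array.
const unwoundPatients = [
  { _id: 593340651, procedures: "551ac" },
  { _id: 593340651, procedures: "zzz99" }, // made-up id with no match
];

const procedureDocs = [
  { _id: "551ac", date: "2000-04-26", type: "Chest X-ray" },
];

const joinedPatients = leftOuterLookup(
  unwoundPatients,
  procedureDocs,
  "procedures",
  "_id",
  "procs"
);
```

The second input document survives the join with an empty `procs` array. In the pipeline above, the following `$unwind` on `procs` would then drop it, since unwinding an empty array emits nothing by default.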

MongoDB 3.2 Document Validation

All Patient records must have string values for the first and last name, and a list of Physicians

db.runCommand({
  collMod: "Patients",
  validator: { $and: [
    { "first_name": { "$type": "string" } },
    { "last_name": { "$type": "string" } },
    { "physicians": { "$type": "array" } }
  ] },
  validationLevel: "strict"
});

https://docs.mongodb.com/manual/core/document-validation/

Summary

01 Embedding and Referencing
02 Context of Application Data and Query Workload
03 Decisions
   • 1-1: Embed
   • 1-M: Embed when possible
   • M-M: Hybrid
04 Iterate
   • Different schemas may result in dramatically different query performance, data/index size and hardware requirements!
05 Tools!
   • Measure data/index size, query performance
   • mgenerate/mtools
   • Compass
   • Cloud Manager / Ops Manager
06 3.2
   • $lookup
   • Document Validation

Q&A

Sigfrido Narváez
Sr. Solutions Architect, MongoDB