MongoDB Schema Design Patterns: Jumpstart Session
Sigfrido "Sig" Narváez, Sr. Solutions Architect, MongoDB (@SigNarvaez)
Agenda
01 Medical Record Example
02 Schema Design: MongoDB vs. Relational
03 Modeling Relationships
04 Performance
05 What's new with 3.2
06 Summary, Q&A
Medical Record Example
Medical Records
• Collects all patient information in a central repository
• Provides a central point of access for
  • Patients
  • Care providers: physicians, nurses, etc.
  • Billing
  • Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
[Diagram: Patient Records, connected to Medications, Lab Results, Procedures, Hospital Records, Physicians, Patients, Nurses, and Billing]
Medical Record Data
• Hospitals
  • Have physicians
• Physicians
  • Have patients
  • Perform procedures
  • Belong to hospitals
• Patients
  • Have physicians
  • Are the subject of procedures
• Procedures
  • Associated with a patient
  • Associated with a physician
  • Have a record
  • Variable metadata
• Records
  • Associated with a procedure
  • Binary data
  • Variable fields
Lot of Variability
Schema Design: MongoDB vs. Relational
MongoDB                      Relational
Collections                  Tables
Documents                    Rows
Data Use                     Data Storage
What questions do I have?    What answers do I have?
MongoDB vs. Relational
Attribute       MongoDB                    Relational
Storage         N-dimensional              Two-dimensional
Field Values    0, 1, many, or embed       Single value
Query           Any field, at any level    Any field
Schema          Flexible                   Very structured
MongoDB vs. Relational
Complex Normalized Schemas
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.', last_visit: 'Del Carmen Hospital', last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … }
  ]
}
• Fields hold strongly typed values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed: data is already an object!
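To make "queried at any level" concrete, here is a minimal in-memory sketch of how a dotted path such as "physicians.name" reaches through nested documents and arrays. The helpers `valuesAtPath` and `matches` are hypothetical names written for this illustration; they are not MongoDB APIs, and the model is simplified (for example, it ignores positional paths like "location.0").

```javascript
// Hypothetical helper: collect every value reachable through a dotted path,
// stepping into arrays along the way (a simplified model of dot notation).
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten one level so array-valued fields match on their elements too.
  return current.flat();
}

// Hypothetical helper: equality match on any value found at the path.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path).includes(expected);
}

const patient = {
  first_name: "Paul",
  city: "London",
  professions: ["banking", "finance", "trader"],
  physicians: [
    { name: "Canelo Álvarez, M.D.", last_visit: "Del Carmen Hospital" },
    { name: "Érik Morales, M.D.", last_visit: "Del Prado Hospital" },
  ],
};

console.log(matches(patient, "city", "London"));
console.log(matches(patient, "professions", "finance"));
console.log(matches(patient, "physicians.name", "Érik Morales, M.D."));
```

In the mongo shell the equivalent filter would be a plain query document such as { "physicians.name": "Érik Morales, M.D." }.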
Modeling Relationships
1-1
Referencing & Embedding
https://docs.mongodb.com/manual/core/data-modeling-introduction/
Referencing: use two collections with a reference field ("relational")
Procedure
• patient
• date
• type
• physician
• type
Results
• dataType
• size
• content: {…}
Embedding: document schema
Procedure
• patient
• date
• type
• results
  • equipmentId
  • data1
  • data2
• physician
• Results
  • type
  • size
  • content: {…}
Referencing
Procedure
{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result_id": 134
}
Results
{
  "_id": 134,
  "type": "txt",
  "size": NumberInt(12),
  "content": { value1: 343, value2: "abc", … }
}
Embedding
Procedure
{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result": {
    "type": "txt",
    "size": NumberInt(12),
    "content": { value1: 343, value2: "abc", … }
  }
}
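The difference between the two layouts shows up in how many lookups a read needs. The sketch below is illustrative only: plain JavaScript Maps stand in for the collections, and the function names are hypothetical.

```javascript
// In-memory stand-ins for collections (assumption: Map keyed by _id).
const proceduresByRef = new Map([
  [333, { _id: 333, type: "Chest X-ray", result_id: 134 }],
]);
const results = new Map([
  [134, { _id: 134, type: "txt", size: 12 }],
]);
const proceduresEmbedded = new Map([
  [333, { _id: 333, type: "Chest X-ray", result: { type: "txt", size: 12 } }],
]);

// Referencing: fetch the procedure, then follow result_id to the result.
function resultTypeByReference(procedureId) {
  const procedure = proceduresByRef.get(procedureId); // lookup 1
  const result = results.get(procedure.result_id);    // lookup 2
  return { resultType: result.type, lookups: 2 };
}

// Embedding: the result travels with the procedure document.
function resultTypeEmbedded(procedureId) {
  const procedure = proceduresEmbedded.get(procedureId); // single lookup
  return { resultType: procedure.result.type, lookups: 1 };
}

console.log(resultTypeByReference(333));
console.log(resultTypeEmbedded(333));
```

Both functions return the same result type; only the number of round trips differs, which is the trade-off weighed in the advantages and limitations lists.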
Embedding
• Advantages
  • Retrieve all relevant information in a single query/document
  • Avoid implementing joins in application code
  • Update related information as a single atomic operation
    • MongoDB 3.2 doesn't offer multi-document transactions (they were added later, in MongoDB 4.0)
• Limitations
  • Large documents mean more overhead if most fields are not relevant
  • 16 MB document size limit
Atomicity
• Document operations are atomic
db.patients.update(
  { _id: 12345 },
  {
    $inc: { numProcedures: 1 },
    $push: { procedures: "proc123" },
    $set: { "addr.state": "TX" }
  }
)
• No multi-document transactions: in 3.2 there is no equivalent of the following
db.beginTransaction();
db.patients.update({_id: 12345}, …);
db.procedure.insert({_id: "proc123", …});
db.records.insert({_id: "rec123", …});
db.endTransaction();
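As a toy model of what "atomic at the document level" means, the sketch below stages $inc, $push, and $set on a copy of one document and only hands back the copy if every operator succeeds. `applyUpdate` and `resolve` are hypothetical helpers written for this illustration and cover only these three operators; this is not how the server is implemented.

```javascript
// Walk a dotted path and return [parentObject, lastKey], creating
// intermediate objects as needed.
function resolve(root, path) {
  const keys = path.split(".");
  let node = root;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  return [node, keys[keys.length - 1]];
}

// Apply all operators to a deep copy; the original is never modified,
// so a failure part-way through leaves no partial change behind.
function applyUpdate(doc, update) {
  const copy = JSON.parse(JSON.stringify(doc));
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    const [parent, key] = resolve(copy, path);
    parent[key] = (parent[key] || 0) + amount;
  }
  for (const [path, value] of Object.entries(update.$push || {})) {
    const [parent, key] = resolve(copy, path);
    if (parent[key] === undefined) parent[key] = [];
    if (!Array.isArray(parent[key])) throw new Error(`${path} is not an array`);
    parent[key].push(value);
  }
  for (const [path, value] of Object.entries(update.$set || {})) {
    const [parent, key] = resolve(copy, path);
    parent[key] = value;
  }
  return copy;
}

const patientBefore = {
  _id: 12345,
  numProcedures: 2,
  procedures: ["proc001", "proc002"],
  addr: { state: "NH" },
};
const patientAfter = applyUpdate(patientBefore, {
  $inc: { numProcedures: 1 },
  $push: { procedures: "proc123" },
  $set: { "addr.state": "TX" },
});
console.log(patientAfter);
```

The design point is that all three changes target one document, so they can succeed or fail together; spreading them across patients, procedures, and records gives up that guarantee.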
Referencing
• Advantages
  • Smaller documents
  • Less likely to reach 16 MB document limit
  • Infrequently accessed information not accessed on every query
  • No duplication of data
• Limitations
  • Two queries required to retrieve information
  • Cannot update related information atomically
1-1: General Recommendations
• Embed
  • No additional data duplication
  • Can query or index on embedded field
    • e.g., "result.type"
• Exceptional cases…
  • Embedding results in large documents
  • Set of infrequently accessed fields
{
  "_id": 333,
  "date": "2003-02-09T05:00:00",
  "hospital": "County Hills",
  "patient": "John Doe",
  "physician": "Stephen Smith",
  "type": "Chest X-ray",
  "result": {
    "type": "txt",
    "size": 12,
    "content": {
      "value1": 343,
      "value2": "abc"
    }
  }
}
1-M: Modeled in 2 possible ways
Embed
Patients
{
  _id: 2,
  first: "Joe",
  last: "Patient",
  addr: { … },
  procedures: [
    { id: 12345, date: "2015-02-15", type: "Cat scan", … },
    { id: 12346, date: "2015-02-15", type: "blood test", … }
  ]
}
Reference
Patients
{ _id: 2, first: "Joe", last: "Patient", addr: { … }, procedures: [12345, 12346] }
Procedures
{ _id: 12345, date: "2015-02-15", type: "Cat scan", … }
{ _id: 12346, date: "2015-02-15", type: "blood test", … }
1-M: General Recommendations
• Embed, when possible
  • Many are weak entities
  • Access all information in a single query
  • Take advantage of update atomicity
  • No additional data duplication
  • Can query or index on any field
    • e.g., { "phones.type": "mobile" }
• Exceptional cases:
  • 16 MB document size
  • Large number of infrequently accessed fields
M-M
M-M: Traditional Relational Association
Join table
• Physicians: name, specialty, phone
• Hospitals: name
• HosPhysicanRel: hospitalId, physicianId
Use arrays instead
M-M: Embedding Physicians in Hospitals collection
{
  _id: 1,
  name: "Oak Valley Hospital",
  city: "New York",
  beds: 131,
  physicians: [
    { id: 12345, name: "Joe Doctor", address: {…}, … },
    { id: 12346, name: "Mary Well", address: {…}, … }
  ]
}
{
  _id: 2,
  name: "Plainmont Hospital",
  city: "Omaha",
  beds: 85,
  physicians: [
    { id: 63633, name: "Harold Green", address: {…}, … },
    { id: 12345, name: "Joe Doctor", address: {…}, … }
  ]
}
Data duplication… is ok!
M-M: Referencing
Hospitals
{ _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] }
{ _id: 2, name: "Plainmont Hospital", city: "Omaha", beds: 85, physicians: [63633, 12345] }
Physicians
{ _id: 63633, name: "Harold Green", hospitals: [2], … }
{ _id: 12345, name: "Joe Doctor", hospitals: [1, 2], … }
{ _id: 12346, name: "Mary Well", hospitals: [1], … }
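When both sides of a many-to-many relationship hold reference arrays, the two sides are updated separately and can drift apart. The following sketch checks that every hospital-to-physician link is mirrored on the other side. `danglingLinks` is a hypothetical helper and the data is a small illustrative sample, assumed to be loaded into memory.

```javascript
// Report links that appear on only one side of the relationship.
function danglingLinks(hospitals, physicians) {
  const problems = [];
  const physicianById = new Map(physicians.map((p) => [p._id, p]));
  const hospitalById = new Map(hospitals.map((h) => [h._id, h]));

  for (const h of hospitals) {
    for (const pid of h.physicians) {
      const p = physicianById.get(pid);
      if (!p || !p.hospitals.includes(h._id)) {
        problems.push({ hospital: h._id, physician: pid, missingOn: "physician" });
      }
    }
  }
  for (const p of physicians) {
    for (const hid of p.hospitals) {
      const h = hospitalById.get(hid);
      if (!h || !h.physicians.includes(p._id)) {
        problems.push({ hospital: hid, physician: p._id, missingOn: "hospital" });
      }
    }
  }
  return problems;
}

const sampleHospitals = [
  { _id: 1, name: "Oak Valley Hospital", physicians: [12345, 12346] },
  { _id: 2, name: "Plainmont Hospital", physicians: [63633, 12345] },
];
const samplePhysicians = [
  { _id: 63633, name: "Harold Green", hospitals: [2] },
  { _id: 12345, name: "Joe Doctor", hospitals: [1, 2] },
  { _id: 12346, name: "Mary Well", hospitals: [1] },
];

console.log(danglingLinks(sampleHospitals, samplePhysicians));
```

A check like this is one way to detect inconsistencies that arise because related documents cannot be updated atomically when referencing.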
M-M: General Recommendation
• Use case determines whether to reference or embed:
  1. Data duplication
     • Embedding may result in data duplication
     • Duplication may be okay if reads dominate updates
     • Of the two, which one changes the least?
  2. Referencing may be required if there are many related items
  3. Hybrid approach
     • Potentially do both... It's ok!
Reference
Hospitals
{ _id: 1, name: "Oak Valley Hospital", city: "New York", beds: 131, physicians: [12345, 12346] }
Physicians
{ _id: 12345, name: "Joe Doctor", address: {…}, … }
{ _id: 12346, name: "Mary Well", address: {…}, … }
Performance
Example 1: Hybrid Approach (Embed and Reference)
Healthcare Example
Collections: patients, procedures
Tailor Schema to Queries
Patient
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 },
  "physicians": [10387, 33456],
  "procedures": ["551ac", "343fs"]
}
Procedure
{
  "_id": "551ac",
  "date": "2000-04-26",
  "hospital": 161,
  "patient": 593340651,
  "physician": 10387,
  "type": "Chest X-ray",
  "records": ["67bc6"]
}
Query: find all patients from NH that have had chest x-rays
(With references only, the state lives in patients and the procedure type lives in procedures, so this query has to touch both collections.)
Tailor Schema to Queries (cont.)
Patient
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 },
  "physicians": [10387, 33456],
  "procedures": [
    { id: "551ac", type: "Chest X-ray" },
    { id: "343fs", type: "Blood Test" }
  ]
}
Procedure
{
  "_id": "551ac",
  "date": "2000-04-26",
  "hospital": 161,
  "patient": 593340651,
  "physician": 10387,
  "type": "Chest X-ray",
  "records": ["67bc6"]
}
Query: find all patients from NH that have had chest x-rays
(Embedding each procedure's id and type in the patient lets the patients collection answer this on its own.)
3.2's $lookup!! (left-outer join)
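With the procedure type copied into each patient, the NH chest x-ray question becomes a filter over one collection. The sketch below runs that filter over an in-memory array. The second and third patients are made-up sample rows, and `patientsWithProcedure` is a hypothetical helper.

```javascript
// Sample data: the first patient follows the hybrid shape from the deck;
// the other two are invented for contrast.
const patients = [
  {
    _id: 593340651,
    addr: { state: "NH" },
    procedures: [
      { id: "551ac", type: "Chest X-ray" },
      { id: "343fs", type: "Blood Test" },
    ],
  },
  { _id: 2, addr: { state: "NH" }, procedures: [{ id: "aaa11", type: "Blood Test" }] },
  { _id: 3, addr: { state: "TX" }, procedures: [{ id: "bbb22", type: "Chest X-ray" }] },
];

// Keep patients in the given state with at least one matching procedure.
function patientsWithProcedure(allPatients, state, procedureType) {
  return allPatients.filter(
    (p) =>
      p.addr.state === state &&
      p.procedures.some((proc) => proc.type === procedureType)
  );
}

const nhChestXray = patientsWithProcedure(patients, "NH", "Chest X-ray");
console.log(nhChestXray.map((p) => p._id));

// Shell equivalent on the hybrid schema:
// db.patients.find({ "addr.state": "NH", "procedures.type": "Chest X-ray" })
```

The price of this layout is that the duplicated type must be kept in sync if a procedure's type ever changes.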
Example 2: Time Series Data (Medical Devices)
Vital Sign Monitoring Device
Vital signs measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
• Many devices, many hospitals
Data From Vital Signs Monitoring Device
{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:07:00.000-0500"),
  spO2: 88,
  pulse: 74,
  bp: [128, 80]
}
• One document per minute per device
• Relational approach
Document Per Hour (By minute)
{
  deviceId: 123456,
  ts: ISODate("2013-10-16T22:00:00.000-0500"),
  spO2: { 0: 88, 1: 90, …, 59: 92 },
  pulse: { 0: 74, 1: 76, …, 59: 72 },
  bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78] }
}
• 1 document per device per hour
• Store per-minute data at the hourly level
• Update-driven workload
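One way to drive the hourly layout is to turn each reading into an upsert against its hour bucket, setting the fields keyed by minute. The sketch below only builds the filter and update documents. `hourlyUpsert` is a hypothetical helper, and truncating to the UTC hour is an assumption that suits whole-hour time zone offsets.

```javascript
// Build the filter/update pair that stores one reading in its hour bucket.
function hourlyUpsert(reading) {
  const hourStart = new Date(reading.ts.getTime());
  hourStart.setUTCMinutes(0, 0, 0); // truncate to the start of the hour
  const minute = reading.ts.getUTCMinutes();

  return {
    filter: { deviceId: reading.deviceId, ts: hourStart },
    update: {
      $set: {
        [`spO2.${minute}`]: reading.spO2,
        [`pulse.${minute}`]: reading.pulse,
        [`bp.${minute}`]: reading.bp,
      },
    },
    options: { upsert: true },
  };
}

const op = hourlyUpsert({
  deviceId: 123456,
  ts: new Date("2013-10-17T03:07:00.000Z"), // 22:07 at UTC-05:00
  spO2: 88,
  pulse: 74,
  bp: [128, 80],
});
console.log(JSON.stringify(op, null, 2));
```

Passing `op.filter`, `op.update`, and `op.options` to an update call would create the bucket on the first reading of the hour and update it for the remaining 59, which matches the update-driven workload described above.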
Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
  Document Per Event    60 inserts
  Document Per Hour     1 insert, 59 updates
Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
  Document Per Event    1440 reads
  Document Per Hour     24 reads
• Read performance is greatly improved
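The insert, update, and read counts can be reproduced with a small simulation that files one reading per minute into hour buckets. `simulateHourlyBuckets` is a hypothetical helper and the reading value is a placeholder.

```javascript
// Replay one reading per minute and count the write operations needed
// when readings are grouped into one document per hour.
function simulateHourlyBuckets(totalMinutes) {
  const buckets = new Map(); // hour index -> { minute: reading }
  let inserts = 0;
  let updates = 0;
  for (let m = 0; m < totalMinutes; m++) {
    const hour = Math.floor(m / 60);
    if (!buckets.has(hour)) {
      buckets.set(hour, {});
      inserts++; // first reading of the hour creates the document
    } else {
      updates++; // later readings update the existing document
    }
    buckets.get(hour)[m % 60] = { pulse: 70 }; // placeholder reading
  }
  return { inserts, updates, documents: buckets.size };
}

const oneHour = simulateHourlyBuckets(60);
const oneDay = simulateHourlyBuckets(24 * 60);
console.log(oneHour);                     // writes for one hour of data
console.log(oneDay.documents, 24 * 60);   // documents to read: per hour vs. per event
```

For a 24-hour graph, the per-event layout reads one document per minute (24 × 60), while the hourly layout reads one per hour.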
Characterizing Memory and Storage Differences
                          Document Per Minute    Document Per Hour
Number of Documents       52.6 Billion           876 Million
Total Index Size          6,364 GB               106 GB
  _id index               1,468 GB               24.5 GB
  {ts: 1, deviceId: 1}    4,895 GB               81.6 GB
Document Size             92 Bytes               758 Bytes
Database Size             4,503 GB               618 GB
• 100K devices
• 1 year's worth of data, at minute resolution (365 x 24 x 60)
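The document counts and database sizes in the table follow directly from the stated assumptions (100K devices, one reading per minute, one year). A quick check, assuming "GB" here means 2^30 bytes:

```javascript
const DEVICES = 100000;
const MINUTES_PER_YEAR = 365 * 24 * 60; // 525,600
const HOURS_PER_YEAR = 365 * 24;        // 8,760
const BYTES_PER_GB = 2 ** 30;           // assumption: binary gigabytes

// Number of documents for each layout.
const docsPerMinuteLayout = DEVICES * MINUTES_PER_YEAR;
const docsPerHourLayout = DEVICES * HOURS_PER_YEAR;

// Database size = document count x average document size from the table.
const sizePerMinuteGB = (docsPerMinuteLayout * 92) / BYTES_PER_GB;
const sizePerHourGB = (docsPerHourLayout * 758) / BYTES_PER_GB;

console.log(docsPerMinuteLayout, docsPerHourLayout);
console.log(Math.round(sizePerMinuteGB), Math.round(sizePerHourGB));
```

The hourly documents are about eight times larger, but there are sixty times fewer of them, which is where both the data and index savings come from.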
MongoDB 3.2
MongoDB 3.2: a GIANT Release
2.2: Agg. Framework, Location-Aware Sharding
2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring
2.6: $out, Index Intersection, Text Search, Field-Level Redaction, LDAP & x509, Auditing
3.0: Doc-Level Concurrency, Compression, Storage Engine API, ≤50 replicas, Auditing ++, Ops Manager
3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
Tools
• mgenerate
  • Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
  • Model schema using JSON definition
  • Generate millions of documents with random data
  • How well does the schema work?
    • Queries, Indexes, Data Size, Index Size, Replication
• Demo
Documents are Rich Data Structures
{
  first_name: 'Paul',
  last_name: 'Miller',
  cell: 1234567890,
  city: 'London',
  location: [45.123, 47.232],
  professions: ['banking', 'finance', 'trader'],
  physicians: [
    { name: 'Canelo Álvarez, M.D.', last_visit: 'Mission Hospital', last_visit_dt: '20160501', … },
    { name: 'Érik Morales, M.D.', last_visit: 'Del Prado Hospital', last_visit_dt: '20160302', … }
  ]
}
• Fields hold typed values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
• Fields can be indexed and queried at any level
• ORM layer removed: data is already an object!
Schema using mgenerate
{
  "first_name": { "$string": { "length": 30 } },
  "last_name": { "$string": { "length": 30 } },
  "cell": "$number",
  "city": { "$string": { "length": 30 } },
  "location": [ "$number", "$number" ],
  "professions": { "$array": [
    { "$choose": [ "banking", "finance", "trader" ] },
    { "$number": [1, 3] }
  ] },
  "physicians": { "$array": [
    {
      "name": { "$string": { "length": 30 } },
      "last_visit": { "$string": { "length": 30 } },
      "last_visit_dt": "$datetime"
    },
    { "$number": [1, 5] }
  ] }
}
> mgenerate --host localhost --port 27017 -d webinar -c patients --drop -n 100 patients.json
Use Compass to visualize & query data!
• Visual Query Profiler: identify your slow-running queries with the click of a button
• Index Suggestions: index recommendations to improve your deployment
MongoDB 3.2 $lookup
Patient
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": { "street": "623 Flowers Rd", "city": "Groton", "state": "NH", "zip": 3266 },
  "physicians": [10387, 33456],
  "procedures": ["551ac", "343fs"]
}
Procedure
{
  "_id": "551ac",
  "date": "2000-04-26",
  "hospital": 161,
  "patient": 593340651,
  "physician": 10387,
  "type": "Chest X-ray",
  "records": ["67bc6"]
}
Obtain Patient view with Procedure details, but without Physicians
MongoDB 3.2 $lookup
db.PatientsColl.aggregate([
  { "$match": { "_id": 593340651 } },
  { "$unwind": "$procedures" },
  { "$lookup": {
      "from": "ProceduresColl",
      "localField": "procedures",
      "foreignField": "_id",
      "as": "procs"
  } },
  { "$unwind": "$procs" },
  { "$group": {
      "_id": { "_id": "$_id", "first": "$first", "last": "$last", "addr": "$addr" },
      "procedures": { "$push": "$procs" }
  } },
  { "$project": {
      "_id": "$_id._id",
      "first": "$_id.first",
      "last": "$_id.last",
      "addr": "$_id.addr",
      "procedures._id": 1,
      "procedures.type": 1,
      "procedures.date": 1
  } }
]);
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
{
  "_id": 593340651,
  "first": "Gregorio",
  "last": "Lang",
  "addr": {
    "street": "623 Flowers Rd",
    "city": "Groton",
    "state": "NH",
    "zip": 3266
  },
  "procedures": [
    { "_id": "551ac", "date": "2000-04-26", "type": "Chest X-ray" },
    { "_id": "343fs", "date": "2000-04-26", "type": "Blood Test" }
  ]
}
Obtain Patient view with Procedure details, but without Physicians
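To show what "left outer join" means for $lookup, here is a minimal in-memory model of the stage applied to already-unwound patient rows. `lookup` is a hypothetical function written for this sketch, it matches on simple equality only, and the third row is invented to show the unmatched case.

```javascript
// For every local document, attach the array of foreign documents whose
// foreignField equals the local document's localField. Local documents
// with no match are kept with an empty array (left outer join).
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Patient rows as they look after { $unwind: "$procedures" }.
const unwoundPatients = [
  { _id: 593340651, first: "Gregorio", procedures: "551ac" },
  { _id: 593340651, first: "Gregorio", procedures: "343fs" },
  { _id: 593340651, first: "Gregorio", procedures: "zzz00" }, // no such procedure
];
const procedureDocs = [
  { _id: "551ac", date: "2000-04-26", type: "Chest X-ray" },
  { _id: "343fs", date: "2000-04-26", type: "Blood Test" },
];

const joined = lookup(unwoundPatients, procedureDocs, {
  localField: "procedures",
  foreignField: "_id",
  as: "procs",
});
console.log(JSON.stringify(joined, null, 2));
```

In the pipeline above, the second $unwind on "$procs" would then drop rows whose joined array is empty, and $group reassembles one document per patient.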
MongoDB 3.2 Document Validation
db.runCommand({
  collMod: "Patients",
  validator: {
    $and: [
      { "first_name": { "$type": "string" } },
      { "last_name": { "$type": "string" } },
      { "physicians": { "$type": "array" } }
    ]
  },
  validationLevel: "strict"
});
https://docs.mongodb.com/manual/core/document-validation/
All Patient records must have string values for the first and last name, and a list of Physicians
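The intent of the rule can be mirrored on the application side, for example to check documents before sending them. The sketch below is a plain JavaScript approximation of the stated intent. `isValidPatient` is a hypothetical helper and does not reproduce the server's $type semantics in every detail.

```javascript
// Accept a patient only if both names are strings and physicians is a list.
function isValidPatient(doc) {
  return (
    typeof doc.first_name === "string" &&
    typeof doc.last_name === "string" &&
    Array.isArray(doc.physicians)
  );
}

const goodPatient = { first_name: "Paul", last_name: "Miller", physicians: [] };
const numericName = { first_name: 42, last_name: "Miller", physicians: [] };
const missingPhysicians = { first_name: "Paul", last_name: "Miller" };

console.log(isValidPatient(goodPatient));
console.log(isValidPatient(numericName));
console.log(isValidPatient(missingPhysicians));
```

A missing field fails the check just as a wrongly typed one does, since every condition in the $and must hold.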
Summary
01 Embedding and Referencing
02 Context of Application Data and Query Workload
03 Decisions
   • 1-1: Embed
   • 1-M: Embed when possible
   • M-M: Hybrid
04 Iterate
   • Different schemas may result in dramatically different query performance, data/index size and hardware requirements!
05 Tools!
   • Measure data/index size, query performance
   • mgenerate/mtools
   • Compass
   • Cloud Manager / Ops Manager
06 3.2
   • $lookup
   • Document Validation
Q&A
Sigfrido Narváez, Sr. Solutions Architect, MongoDB