Traditional RDBMS
create table users (id int primary key, firstname text, lastname text);
Table definition with a fixed column structure
Traditional RDBMS
Table with checks
create table cat_pictures (
    id int not null,
    size int not null,
    picture blob not null,
    user_id int,
    primary key (id),
    foreign key (user_id) references users(id)
);
Null checks; primary key and foreign key checks
Is this Flexible?
• What happens when we need to change the schema?
– Add new fields
– Add new relations
– Change data types
• What happens when we need to scale out our data structure?
Flexible Schema
• No mandatory schema definition
• No structure restrictions
• No schema validation process
We start from code:
public class CatPicture {
    int size;
    byte[] blob;
}
public class User {
    int id;
    String firstname;
    String lastname;
    CatPicture[] cat_pictures;
}
Document Structure
{ _id: 1234, firstname: 'Juan', lastname: 'Olivo', cat_pictures: [ { size: 10, picture: HexData(0, "133334299399299432") } ] }
Rich data types: picture holds binary data
Embedded documents: cat_pictures is an array of sub-documents
Flexible Schema Databases
• Challenges
– Different Versions of Documents
– Different Structures of Documents
– Different Value Types for Fields in Documents
Different Versions of Documents
The same document changes the way it represents data over time
First version: { "_id" : 174, "firstname": "Juan" }
Second version: { "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Third version: { "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures": [{"size": 10, "picture": HexData(0, "133334299399299432")}] }
Different Versions of Documents
The same document changes the way it represents data over time
{ "_id" : 174, "firstname": "Juan" }
Different structure: { "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }
Different Structures of Documents
Different documents coexisting in the same collection
{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Both live within the same collection
Different Data Types for Fields
Different documents coexisting in the same collection
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312 }
{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27" }
{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27") }
Same field, different data type
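Code that reads such a collection has to cope with every representation of bdate. The following is a minimal Java sketch, not a driver API: it assumes documents arrive as plain Map<String, Object> values, that numeric values are epoch milliseconds (as in the 1435394461522 example on the "Change Field Data Type" slide), and that strings use the yyyy-MM-dd format. BirthDates and readBirthDate are hypothetical names. If your integers are epoch seconds instead (1224234312 would then be a date in 2008), the numeric branch must multiply by 1000 first.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.Map;

// Hypothetical helper: reads a "bdate" field that may be stored with different types.
final class BirthDates {
    private BirthDates() {}

    static LocalDate readBirthDate(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (bdate instanceof Number) {
            // Assumption: numbers (int32 or int64) are epoch milliseconds.
            long millis = ((Number) bdate).longValue();
            return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (bdate instanceof String) {
            // Assumption: strings use ISO-8601 "yyyy-MM-dd", as in "2015-06-27".
            return LocalDate.parse((String) bdate);
        }
        if (bdate instanceof Date) {
            // ISODate values are decoded as java.util.Date; interpret them in UTC.
            return ((Date) bdate).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        throw new IllegalArgumentException("Unsupported bdate value: " + bdate);
    }
}
```

Keeping the type switch in one helper means the rest of the code works with a single LocalDate type, whatever shape each stored document happens to have.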
Change Management
• Versioning
• Class loading
How do we set the correct data format versioning? What mechanisms are out there to make this work?
Coupled Architectures
Diagram: Application A, Application B and Application C all share one database. "Let me perform some schema changes!"
Decoupled Architectures
• Allows the business logic to evolve independently of the data layer
• Decouples the underlying storage / persistence option from the business service
• Changes are "requested" rather than imposed across all applications
• Better versioning control of each request and its mapping
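One way to picture this decoupling is a mapping layer that owns the translation between stored documents and the domain object, so applications never depend on a particular document structure. A minimal Java sketch, assuming documents are plain Map<String, Object> values and reusing the two user layouts from the "Different Versions of Documents" slides (flat firstname/lastname and nested name); User, UserMapper and their methods are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain object: business code only ever sees this type.
final class User {
    private final int id;
    private final String firstname;
    private final String lastname;

    User(int id, String firstname, String lastname) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
    }

    int getId() { return id; }
    String getFirstname() { return firstname; }
    String getLastname() { return lastname; }
}

// Hypothetical mapping layer: the only code that knows how users are stored.
final class UserMapper {
    private UserMapper() {}

    // Reads both the flat ("firstname"/"lastname") and the nested ("name") structure.
    static User fromDocument(Map<String, Object> doc) {
        int id = ((Number) doc.get("_id")).intValue();
        Object name = doc.get("name");
        if (name instanceof Map) {
            Map<?, ?> nested = (Map<?, ?>) name;
            return new User(id, (String) nested.get("first"), (String) nested.get("last"));
        }
        return new User(id, (String) doc.get("firstname"), (String) doc.get("lastname"));
    }

    // Always writes the nested structure, so stored documents converge as they are saved.
    static Map<String, Object> toDocument(User user) {
        Map<String, Object> name = new LinkedHashMap<>();
        name.put("first", user.getFirstname());
        name.put("last", user.getLastname());
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", user.getId());
        doc.put("name", name);
        return doc;
    }
}
```

Because only UserMapper knows both layouts, a schema change becomes a request to this layer instead of an edit to every application.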
ODM
• Reduces the impedance mismatch between code and databases
• Data management facilitator
• Hides the complexity of operators
• Tries to decouple business complexity with "magic" recipes
Spring Data
• POJO-centric model
• MongoTemplate or CrudRepository extensions to make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)
public interface UserRepository extends MongoRepository<User, Integer> {
}
public class User {
    @Id
    int id;
    @Field("first_name")
    String firstname;
    String lastname;
}
Spring Data Document Structure
{ "_id": 1, "first_name": "first", "lastname": "last", "catpictures": [ { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") } ] }
Spring Data Considerations
• Data formats, versions and types still need to be managed
• Does not solve issues like type validation out of the box
• Can make things more complicated, but more "controllable":
@Field("first_name")
String firstname;
Morphia
• Data-source centric
• Discovers all the POJOs in a given package
• Also uses annotations to perform overrides and deal with object mapping
@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
}
Morphia morphia = new Morphia();
morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);
Morphia Document Structure
{ "_id": 1, "className": "examples.odms.morphia.pojos.User", "firstname": "first", "lastname": "last", "catpictures": [ { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") } ] }
Class definition: the className field records the mapped Java class
Morphia Considerations
• Enables better control at class loading
• Like Spring Data, also facilitates field overriding (annotations to define field keys)
• Better support for object polymorphism
Versioning
Versioning of data structures (especially documents) can be very helpful for:
• Recreating documents over time
• Flow control
• Data / field multiversion requirements
• Archiving and history purposes
Versioning – Option 0
Update the existing document on every write, incrementing a version number stored inside it
{ "_id" : 174, "v" : 1, "firstname": "Juan" }
{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.update({"_id": 174}, {"$set": { ... }, "$inc": {"v": 1}})
Increment field value
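To make the in-place approach concrete without a running server, here is a minimal Java sketch that models the users collection as an in-memory Map keyed by _id. InPlaceVersioning and its methods are hypothetical; update only imitates what the $set plus $inc update above does to one document (with upsert-like creation of a missing document), it is not the driver API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory stand-in for the users collection: _id -> document.
final class InPlaceVersioning {
    private final Map<Integer, Map<String, Object>> users = new HashMap<>();

    // Imitates {"$set": changes, "$inc": {"v": 1}} with upsert: only the latest state survives.
    void update(int id, Map<String, ?> changes) {
        Map<String, Object> doc = users.get(id);
        if (doc == null) {
            doc = new LinkedHashMap<>();
            doc.put("_id", id);
            doc.put("v", 0);
            users.put(id, doc);
        }
        doc.putAll(changes);
        doc.put("v", (Integer) doc.get("v") + 1);
    }

    Map<String, Object> find(int id) {
        return users.get(id);
    }
}
```

After three updates only the document with v = 3 exists; versions 1 and 2 are overwritten, which is why this option cannot recreate documents over time.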
Versioning – Option 1
Store a full copy of the document on every write, with a monotonically increasing version number inside
{ "docId" : 174, "v" : 1, "firstname": "Juan" }
{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.insert( {"docId":174 …})
> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);
Always find the latest version
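The same idea as a minimal in-memory Java sketch: every write appends a complete snapshot with the next version number, and reads pick the snapshot with the highest v, mirroring the sort on v descending with a limit of one. SnapshotVersioning and its methods are hypothetical names, and a List stands in for the collection.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for a collection where every write inserts a full snapshot.
final class SnapshotVersioning {
    private final List<Map<String, Object>> users = new ArrayList<>();

    // Imitates db.users.find({"docId": id}).sort({"v": -1}).limit(-1).
    Optional<Map<String, Object>> findLatest(int docId) {
        return users.stream()
                .filter(d -> d.get("docId").equals(docId))
                .max(Comparator.comparingInt(d -> (Integer) d.get("v")));
    }

    // Copies the latest snapshot, applies the changes and appends it with v + 1.
    void write(int docId, Map<String, ?> changes) {
        Map<String, Object> next = new HashMap<>(findLatest(docId).orElseGet(HashMap::new));
        next.putAll(changes);
        next.put("docId", docId);
        next.put("v", (Integer) next.getOrDefault("v", 0) + 1);
        users.add(next);
    }

    int versionCount(int docId) {
        return (int) users.stream().filter(d -> d.get("docId").equals(docId)).count();
    }
}
```

Older snapshots are never modified, so every version stays readable; the cost is one extra document per write and a sort on every "latest" read, which an index on docId and v would likely make cheaper on a real collection.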
Versioning – Option 2
Store all document versions inside a single document.
> db.users.update({"_id": 174}, {"$set": {"current.attr1": ...}, "$inc": {"current.v": 1}, "$addToSet": {"prev": {...}}})
{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" }, "prev" : [ { "v" : 1, "attr1": 165 }, { "v" : 2, "attr1": 165, "attr2": "A-1" } ] }
Current value: current
Previous values: prev
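A minimal in-memory Java sketch of this layout, assuming one document modelled as a current map plus a prev list; EmbeddedVersioning and its methods are hypothetical. Each update archives the old current into prev, then applies the changes and bumps v, which reproduces the example document above after three updates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for one document shaped like {"_id", "current": {...}, "prev": [...]}.
final class EmbeddedVersioning {
    private Map<String, Object> current = new HashMap<>();
    private final List<Map<String, Object>> prev = new ArrayList<>();

    EmbeddedVersioning() {
        current.put("v", 0); // placeholder before the first real version
    }

    // Archives the old "current" into "prev", then applies changes and bumps "current.v".
    void update(Map<String, ?> changes) {
        if ((Integer) current.get("v") > 0) {
            prev.add(current); // safe: the archived map is never mutated afterwards
        }
        Map<String, Object> next = new HashMap<>(current);
        next.putAll(changes);
        next.put("v", (Integer) current.get("v") + 1);
        current = next;
    }

    Map<String, Object> getCurrent() { return current; }
    List<Map<String, Object>> getPrev() { return prev; }
}
```

Reads need a single document fetch, but the history grows inside that document, so this layout is bounded by MongoDB's 16 MB document size limit.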
Versioning – Option 3
Keep one collection for the "current" version and another for past versions
> db.users.find({"_id": 174})
> db.users_past.find({"pid": 174})
Previous versions collection (users_past):
{ "pid" : 174, "v" : 1, "firstname": "Juan" }
{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
Current collection (users):
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
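A minimal in-memory Java sketch of the two-collection layout: a Map plays users (keyed by _id) and a List plays users_past, where archived copies carry pid instead of _id. SeparateCollectionVersioning and its methods are hypothetical names, not driver calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory stand-ins for the "users" (current) and "users_past" collections.
final class SeparateCollectionVersioning {
    private final Map<Integer, Map<String, Object>> users = new HashMap<>();
    private final List<Map<String, Object>> usersPast = new ArrayList<>();

    // Archives the current version (keyed by "pid") before replacing it.
    void update(int id, Map<String, ?> changes) {
        Map<String, Object> old = users.get(id);
        Map<String, Object> next = new HashMap<>();
        if (old != null) {
            Map<String, Object> archived = new HashMap<>(old);
            archived.remove("_id");
            archived.put("pid", id);
            usersPast.add(archived);
            next.putAll(old);
        }
        next.putAll(changes);
        next.put("_id", id);
        next.put("v", old == null ? 1 : (Integer) old.get("v") + 1);
        users.put(id, next);
    }

    // Like db.users.find({"_id": id}).
    Map<String, Object> findCurrent(int id) {
        return users.get(id);
    }

    // Like db.users_past.find({"pid": id}).
    List<Map<String, Object>> findPast(int id) {
        return usersPast.stream().filter(d -> d.get("pid").equals(id)).collect(Collectors.toList());
    }
}
```

Against a real database the archive insert and the current update are two separate writes to two collections, and without multi-document transactions (available from MongoDB 4.0) a failure between them leaves the collections out of step, which is why recovering from a failure is harder with this option.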
Versioning schema | Fetch 1 | Fetch many | Update | Recover if fail
0) Increment version | Easy, Fast | Fast | Easy, Medium | N/A
1) New document | Easy, Fast | Not Easy, Slow | Medium | Hard
2) Embedded in single doc | Easy, Fastest | Easy, Fastest | Medium | N/A
3) Separate collection | Easy, Fastest | Easy, Fastest | Medium | Medium, Hard
Migrations
Several types of "migrations":
• Add/Remove Fields
• Change Field Names
• Change Field Data Type
• Extract Embedded Document into Collection
Add / Remove Fields
For a flexible-schema database, this is our bread and butter
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }
> db.users.update({"_id": 174}, {"$set": {"newfield": "value"}, "$unset": {"gender": ""}})
Change Field Names
Again, this can be done programmatically
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "first": "Juan", "last": "Olivo" }
> db.users.update({"_id": 174}, {"$rename": {"firstname": "first", "lastname": "last"}})
Change Field Data Type
Align with a new code change: move bdate from an integer timestamp to a string
Before: { ..., "bdate": 1435394461522 }
After: { ..., "bdate": "2015-06-27" }
1) Batch Process
2) Aggregation Framework
3) Change based on usage
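Option 3 can be read as a lazy migration: a document is converted only when the application touches it, so old and new formats coexist until every document has been read or written. A minimal Java sketch of that reading, assuming plain Map<String, Object> documents, numeric values that are epoch milliseconds, and dates rendered in UTC. LazyBdateMigration, upgrade and read are hypothetical; in a real application the writeBack callback would issue an update filtered on _id.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical "migrate when used" helper for the bdate number -> string change.
final class LazyBdateMigration {
    // Assumption: dates are rendered in UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    private LazyBdateMigration() {}

    // Converts a numeric bdate (assumed epoch milliseconds) in place.
    // Returns true when the document changed and should be written back.
    static boolean upgrade(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (!(bdate instanceof Number)) {
            return false; // already a string, or missing
        }
        long millis = ((Number) bdate).longValue();
        doc.put("bdate", FORMAT.format(Instant.ofEpochMilli(millis)));
        return true;
    }

    // Read path: upgrade a copy if needed and hand the new version to writeBack.
    static Map<String, Object> read(Map<String, Object> stored,
                                    Consumer<Map<String, Object>> writeBack) {
        Map<String, Object> doc = new HashMap<>(stored);
        if (upgrade(doc)) {
            writeBack.accept(doc);
        }
        return doc;
    }
}
```

Documents that are never read keep the old format, so code that queries on bdate still has to accept both types until a batch job finishes the remainder.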
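Change Field Data Type 1) Batch Process – Bulk API
import com.mongodb.client.model.UpdateOneModel;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import org.bson.Document;

public void migrateBulk() {
    // dd = day of month (DD would be day of year)
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate = new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()) {
        Object bdate = doc.get("bdate");
        if (!(bdate instanceof Number)) {
            continue; // already a string, or missing
        }
        // read as long: 1435394461522 does not fit in an int
        String dateAsString = df.format(new Date(((Number) bdate).longValue()));
        Document filter = new Document("_id", doc.getInteger("_id"));
        Document update = new Document("$set", new Document("bdate", dateAsString));
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    // bulkWrite throws on an empty list of requests
    if (!toUpdate.isEmpty()) {
        coll.bulkWrite(toUpdate);
    }
}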
Change Field Data Type 1) Batch Process – Bulk API
public void migrateBulk() {
    ...
    for (Document doc : coll.find()) { ... }
    coll.bulkWrite(toUpdate);
}
Is there any problem with this?
Change Field Data Type 1) Batch Process – Bulk API
public void migrateBulk() {
    ...
    // BSON type 16 represents the int32 data type (int64 is type 18)
    Document query = new Document("bdate", new Document("$type", 16));
    for (Document doc : coll.find(query)) { ... }
    coll.bulkWrite(toUpdate);
}
More efficient filtering: only documents whose bdate is still an int32 are fetched
Extract Document into Collection: Normalize Your Schema
Before, with the cat pictures embedded in users:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures": [{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}] }
> db.users.aggregate([
    {$unwind: "$cat_pictures"},
    {$project: {"_id": 0, "uid": "$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}},
    {$out: "cats"}
])
> db.users.update({}, {"$unset": {"cat_pictures": ""}}, {"multi": true})
After, in the cats collection:
{"uid": 174, "size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
And in the users collection:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
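What the $unwind and $project stages do to each user can be mirrored in plain Java. A minimal sketch, assuming in-memory Map<String, Object> documents and a List standing in for the cats collection; ExtractCatPictures is hypothetical. On a real server, $out would also give each output document a newly generated _id, because the projection excludes the original one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ExtractCatPictures {
    private ExtractCatPictures() {}

    // One output document per array element, like {$unwind: "$cat_pictures"}.
    static List<Map<String, Object>> extract(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pictures = user.get("cat_pictures");
            // Users without a cat_pictures array produce no output (this sketch only handles arrays).
            if (!(pictures instanceof List)) {
                continue;
            }
            for (Object item : (List<?>) pictures) {
                Map<?, ?> picture = (Map<?, ?>) item;
                // Like {$project: {"_id": 0, "uid": "$_id", "size": ..., "picture": ...}}.
                Map<String, Object> cat = new LinkedHashMap<>();
                cat.put("uid", user.get("_id"));
                cat.put("size", picture.get("size"));
                cat.put("picture", picture.get("picture"));
                cats.add(cat);
            }
        }
        return cats;
    }
}
```

The uid field is the link back to the owning user, which is what lets the embedded array be removed from users afterwards without losing the relationship.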
Tradeoffs
• Decoupled Architecture
– Positives: should be your default approach; clean solution; scalable
– Penalties: N/A
• Data Structure Variability
– Positives: reflects today's data structures; you can push decisions for later
– Penalties: more complex code base
• Data Structure Strictness
– Positives: simple to maintain; always aligned with your code base
– Penalties: will eventually need migrations; restricts your code iterations
Recap
• Flexible and dynamic schemas are a great tool
– Use them wisely
– Make sure you understand the tradeoffs
– Make sure you understand the different strategies and options
• Works well with strongly typed languages
Thank you!
Norberto Leite, Technical Evangelist
http://www.mongodb.com/norberto
@nleite