gaurav-awasthi
These slides accompany the talk delivered at the Agile NCR 2013 conference in New Delhi.
Big Data - The New World Order
• A massive volume of both structured and unstructured data that is difficult to process using traditional database and software techniques
• As of 2012, 2.5 quintillion bytes of data were created every day
• Data sources:
– Climate sensors
– Social media
– Digital pictures and videos
– Purchase transaction records
– Cell phone GPS signals
• Characteristics: Volume, Velocity and Variety
• Key usage: leverage data-driven strategies to innovate, compete, and capture value from deep and up-to-real-time information
NoSQL
Defining Characteristics
– Scaling out on commodity hardware
– Aggregate structure
– Schema-less attitude
– Impedance mismatch: relational model vs. in-memory data structures
– Big Data : Massive data being stored and transacted
– Reduced Data Management and Tuning Requirements
– Eventually consistent / BASE (not ACID)
MongoDB
• Open-source, Document-oriented, popular for its agile and scalable approach
• Notable Features :
– JSON/BSON data model with dynamic schema
– Auto-sharding for horizontal scalability
– Built-in replication with automated fail-overs
– Full, flexible index support including secondary indexes
– Rich document-based queries
– Aggregation framework and MapReduce
– GridFS for large file storage
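The GridFS idea of storing a large file as a series of smaller chunk documents can be sketched in plain Python. This is a simplified illustration, not the driver's implementation: the field names `files_id`, `n` and `data` follow the GridFS chunk layout, while the helper names, the `"file-1"` identifier and the 1024-byte chunk size are assumptions chosen for readability.

```python
def split_into_chunks(file_id, payload, chunk_size):
    """Split a bytes payload into ordered chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": payload[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(payload), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original payload by ordering chunks on their sequence number."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


# Hypothetical 2560-byte "file"; a real GridFS chunk size is much larger.
payload = bytes(range(256)) * 10
chunks = split_into_chunks("file-1", payload, chunk_size=1024)

print([len(chunk["data"]) for chunk in chunks])
print(reassemble(chunks) == payload)
```

Because every chunk carries its own sequence number, the chunks can be fetched in any order and still be reassembled correctly.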
Agile & MongoDB
Characteristics supporting Agility
– Allows dynamic schema (schemaless)
– JSON format, which maps well to object-style data.
– Simplified DB tuning
– Cost-effective and simple replica sets
– Easy scale out due to simplified sharding mechanism
– Rich content using GridFS
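The scale-out point can be illustrated with a small routing sketch: a shard key value is hashed and the hash decides which shard holds the document. This is a deliberately simplified model (real MongoDB sharding works with chunks and a balancer); the shard names, the product codes and the `shard_for` helper are assumptions made for the example.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(shard_key_value, shards=SHARDS):
    """Pick a shard deterministically from a hash of the shard key value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Route 300 hypothetical product codes and record where each one lands.
placement = {name: [] for name in SHARDS}
for n in range(300):
    code = f"PRODUCT-{n:04d}"
    placement[shard_for(code)].append(code)

for name, codes in placement.items():
    print(name, len(codes))
```

The application never chooses a shard itself; it only supplies the key, which is what keeps the scale-out step simple from the developer's point of view.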
A Demo of the Schema-less Approach
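The demo itself is not included in the slides. As a rough stand-in, the sketch below models a collection as a plain Python list of dictionaries, so it runs without a MongoDB server. The `products` list, the second product, the field names `packSize` and `tags`, and the `find` helper are all assumptions for illustration.

```python
# A "collection" modelled as a list of dict documents (no database involved).
products = []

# Two documents with different shapes can sit side by side.
products.append({"name": "iPhone 5 Pop Blue Case", "retailPrice": 19.99})
products.append({
    "name": "Screen Protector",
    "retailPrice": 4.99,
    "packSize": 3,                    # new field, no schema migration needed
    "tags": ["screen", "protector"],
})


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion.

    Documents that lack a requested field simply do not match.
    """
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


print([doc["name"] for doc in find(products, retailPrice=4.99)])
print([doc["name"] for doc in find(products, packSize=3)])
```

Adding `packSize` to one document required no change to the other, which is the property that lets the data model evolve release by release.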
A Demo of Query Plans and DB Tuning
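The tuning demo is not reproduced in the slides. The sketch below only illustrates the underlying idea that a query plan exposes: how many documents must be examined with and without an index. It uses a dictionary as a stand-in for an index (MongoDB's real indexes are B-tree based), and the data set and helper names are assumptions.

```python
from collections import defaultdict

# 1000 hypothetical documents; each product code is shared by 10 of them.
docs = [{"_id": i, "productCode": f"P{i % 100:03d}"} for i in range(1000)]


def collection_scan(collection, field, value):
    """Check every document; return the matches and the number examined."""
    examined = 0
    matches = []
    for doc in collection:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(collection, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(collection):
        index[doc.get(field)].append(position)
    return index


def index_scan(collection, index, value):
    """Use the index to touch only the matching documents."""
    positions = index.get(value, [])
    return [collection[p] for p in positions], len(positions)


scan_matches, scan_examined = collection_scan(docs, "productCode", "P042")
code_index = build_index(docs, "productCode")
index_matches, index_examined = index_scan(docs, code_index, "P042")

print("collection scan examined:", scan_examined)
print("index scan examined:", index_examined)
```

Both strategies return the same documents; the difference lies in how much work is done to find them, which is the figure to watch when reading a query plan.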
Replication
• Replica set: a cluster of mongod processes
• Ensures high availability, redundancy and automated failover
• Writes go to the primary; reads can be served by any member (secondaries serve reads only when the read preference allows it)
• Asynchronous replication
• In conventional terms, similar to master/slave replication
• Members can be configured as: secondary-only / non-voting / hidden / arbiter / delayed
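The effect of asynchronous replication can be sketched with a toy model: the primary records each write in an ordered log, and secondaries catch up only when they next pull from that log. This is a conceptual illustration, not MongoDB's replication protocol; the `Member` and `ReplicaSet` classes and the member names are assumptions.

```python
class Member:
    """One replica set member holding a copy of the data (simplified)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class ReplicaSet:
    """A primary plus secondaries that replicate from an ordered write log."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.oplog = []  # ordered log of writes accepted by the primary

    def write(self, key, value):
        """Writes are accepted by the primary only."""
        self.oplog.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def replicate(self):
        """Each secondary applies the log entries it has not yet seen."""
        for member in self.secondaries:
            for key, value in self.oplog[member.applied:]:
                member.data[key] = value
            member.applied = len(self.oplog)


rs = ReplicaSet(Member("a"), [Member("b"), Member("c")])
rs.write("productCode", "G4IC542G")
stale = "productCode" in rs.secondaries[0].data   # not replicated yet
rs.replicate()
fresh = rs.secondaries[0].data["productCode"]
print(stale, fresh)
```

The window between `write` and `replicate` is why a read from a secondary may briefly return older data than a read from the primary.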
Elastic Architecture
A Demo for Replica Set
• Run the 3 mongod processes
• Show that they are running on different ports using ps -ef
• Initiate the replica set and add members
• Show which members are primary and secondary using rs.status()
• Insert documents into a collection on the primary
• Show that they are replicated to the secondaries
• This demonstrates how straightforward replication is
• Briefly touch upon the steps for sharding too
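The first steps of the demo can be sketched as a small script that prints the mongod command lines and the configuration document that would be passed to rs.initiate() in the mongo shell. The set name `rs0`, the three ports and the `/data/...` paths are assumptions; the script only builds and prints these values and does not start any process.

```python
import json


def mongod_command(set_name, port, dbpath):
    """Build the argument list for one mongod process in the replica set."""
    return ["mongod", "--replSet", set_name, "--port", str(port), "--dbpath", dbpath]


def replica_set_config(set_name, hosts):
    """Build the configuration document for rs.initiate()."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


SET_NAME = "rs0"                 # hypothetical replica set name
PORTS = [27017, 27018, 27019]    # hypothetical local ports

commands = [mongod_command(SET_NAME, port, f"/data/{SET_NAME}-{port}") for port in PORTS]
config = replica_set_config(SET_NAME, [f"localhost:{port}" for port in PORTS])

for command in commands:
    print(" ".join(command))
print(json.dumps(config, indent=2))
```

Once the three processes are running and the set has been initiated with such a document, rs.status() reports which member was elected primary.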
Case Study – E-Commerce Shop
Architecture Diagram
[Diagram components: Product supplier catalog, App (container), Mongo, External Feeds, Payment Gateway]
Domain model
JSON structure
{
  "_id" : ObjectId("5082626144ae3a687919c094"),
  "name" : "iPhone 5 Pop Blue Case",
  "canonicalName" : "iphone-5-pop-blue-case",
  "retailPrice" : 19.99,
  "productCode" : "G4IC542G",
  "category" : {
    "categoryCode" : "CAS",
    "name" : "Cases",
    "canonicalName" : "cases"
  },
  "compatibleHandsets" : [
    {
      "manufacturer" : { "name" : "Apple", "canonicalName" : "apple" },
      "model" : "iPhone 5 16GB",
      "name" : "Apple_iPhone_5_16GB",
      "canonicalName" : "apple_iphone_5_16gb"
    }
  ],
  "review_ids" : ["review_id1", "review_id2"]
}
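The product document above mixes two modelling styles: the category is embedded, while reviews are linked through `review_ids`. The sketch below shows how an application might read each kind of data. The contents of the reviews (the `rating` field and its values), the extra `review_id3` entry and the `load_reviews` helper are assumptions, since the slides show only the identifiers.

```python
product = {
    "name": "iPhone 5 Pop Blue Case",
    "retailPrice": 19.99,
    "category": {"categoryCode": "CAS", "name": "Cases"},   # embedded document
    "review_ids": ["review_id1", "review_id2"],             # links to other documents
}

# Hypothetical reviews collection, keyed by _id for the sake of the example.
reviews = {
    "review_id1": {"_id": "review_id1", "rating": 5},
    "review_id2": {"_id": "review_id2", "rating": 4},
    "review_id3": {"_id": "review_id3", "rating": 1},   # belongs to another product
}


def load_reviews(product_doc, reviews_by_id):
    """Resolve linked reviews with a second lookup, skipping dangling ids."""
    return [
        reviews_by_id[review_id]
        for review_id in product_doc.get("review_ids", [])
        if review_id in reviews_by_id
    ]


# Embedded data is available as soon as the product is loaded.
category_name = product["category"]["name"]

# Linked data needs an extra lookup in the other collection.
product_reviews = load_reviews(product, reviews)
average_rating = sum(r["rating"] for r in product_reviews) / len(product_reviews)

print(category_name, average_rating)
```

Embedding suits data that is always read with its parent, while linking keeps data that grows without bound, such as reviews, out of the parent document.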
Design decisions with Mongo
• Agile incremental releases
• Unstructured data from multiple suppliers
• GridFS: stores large binary objects
• Spring Data Services
• Embedding and linking documents
• Easy replica set setup on AWS
Conclusion and Thanks
MongoDB: the right persistence tool for Agile development across a multitude of business problems in the new world order
References:
• mongodb.org