Product Catalog - Retail Reference Architecture with MongoDB
{ name : "Prasoon Kumar", mobile : "+91-9920138004", email : [email protected] }
What will we cover today?
Introduction
• Retail Solution
• MongoDB fit
Retail Component Overview
• Product Catalog
• Content
• Inventory
• Customer
Product Catalog
• Item, Variant
• Pricing
• Summary Pages
• Faceted Search
Introduction
Retail solution
• Retail is way too broad to tackle with one solution
• Retail data maps well to the document model
• Retail needs agility, performance and scaling
• Many (e-)retailers are already using MongoDB
• Let's define the best ways and places for it!
We have done it before
• Among the Fortune 500 and Global 500, MongoDB customers include 10 of the top retailers.
• eBay search suggestions
• May 2014, MongoDB press release: Snapdeal customers will be able to choose from over 4 million products across more than 500 categories, from more than 30,000 sellers.
MongoDB is a great fit
• Holds complex JSON structures
• Dynamic schema for agility
• Complex querying and in-place updating
• Secondary, compound and geo indexing
• Full consistency, durability, atomic operations
• Near-linear scaling via sharding
• Overall, MongoDB is a unique fit!
MongoDB Strategic Advantages
Horizontally Scalable - Sharding
Agile, Flexible
High Performance & Strong Consistency
Highly Available - Replica Sets
Application
{ customer: "chandra", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }
Build your data to fit your application

Relational

CustomerID  First Name  Last Name  City
0           Rajesh      Kumar      New Delhi
1           Ruby        Mishra     Mumbai
2           Prasoon     Kumar      Mumbai
3           Chandra     Verma      New Delhi
4           Saurabh     Sen        Kolkata

Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2

MongoDB

{
  customer_id: 1,
  name: "Ruby Mishra",
  city: "Mumbai",
  orders: [
    {
      order_number: 13,
      store_id: 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234, Qty: 3, Unit_price: 350 },
        { SKU: 98762345, Qty: 1, Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}
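
As a rough illustration of the mapping on this slide, the sketch below folds flat customer and order rows into one document per customer. It is plain JavaScript with no database involved. The helper name embedOrders and the camelCase row fields are assumptions made for this example, and the output keeps only the fields the two tables provide.

```javascript
// Rows copied from the relational tables on the slide (a subset).
const customerRows = [
  { customerId: 0, firstName: "Rajesh", lastName: "Kumar", city: "New Delhi" },
  { customerId: 1, firstName: "Ruby", lastName: "Mishra", city: "Mumbai" },
];
const orderRows = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
  { orderNumber: 14, storeId: 200, product: "Coffee table", customerId: 1 },
];

// Hypothetical helper: nest each customer's orders inside the customer document,
// so that reading a customer no longer needs a join.
function embedOrders(customers, orders) {
  return customers.map((c) => ({
    customer_id: c.customerId,
    name: c.firstName + " " + c.lastName,
    city: c.city,
    orders: orders
      .filter((o) => o.customerId === c.customerId)
      .map((o) => ({
        order_number: o.orderNumber,
        store_id: o.storeId,
        product: o.product,
      })),
  }));
}

const customerDocs = embedOrders(customerRows, orderRows);
```

Each element of customerDocs could then be stored as a single document, which is the "build your data to fit your application" idea in miniature.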
Notions
RDBMS MongoDB
Database Database
Table Collection
Row Document
Column Field
Retail Components Overview
Information Management
Product Catalog
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
Architecture Overview
Customer
Channels: Amazon, eBay, …
Stores: POS, Kiosk, …
Mobile: Smartphone, Tablet
Website
Contact Center
API
Data and Service Integration
Social: Facebook, Twitter, …
Data Warehouse
Analytics
Supply Chain Management System
Suppliers
3rd Party
In Network
Web Servers
Application Servers
Commerce Functional Components
Information Layer
Look & Feel
Navigation
Customization
Personalization
Branding
Promotions
Chat
Ads
Customer's Perspective
Research: Browse, Search
Select: Shopping Cart
Purchase: Checkout
Receive: Track
Use: Feedback, Maintain
Dialog: Assist
Market / Offer
Guide
Offer
Semantic Search
Recommend
Rule-based Decisions
Pricing
Coupons
Sell / Fulfill
Orders
Payments
Fraud Detection
Fulfillment
Business Rules
Insight
Session Capture
Activity Monitoring
Customer Enterprise
Information Management
Product Catalog
Content
Inventory
Customer
Channel
Sales & Fulfillment
Insight
Social
Product Catalog
Product Catalog
Product Catalog
MongoDB
Variant
Hierarchy
Pricing
Promotions
Ratings & Reviews
Calendar
Semantic Search
Item
Localization
Product Catalog - principles
• Single view of a product, one central catalog service
• Read volume is high and sustained, 100k reads/s
• Write volume spikes during catalog updates
• Advanced indexing and querying
• Geographical distribution and low latency
• No need for a cache layer; a CDN serves the assets
Product Catalog - requirements

Requirement: Single view of product
  Example challenge: Blended description and hierarchy of product to ensure availability on all channels
  MongoDB: Flexible document-oriented storage

Requirement: High sustained read volume with low latency
  Example challenge: Constant querying from online users and sales associates, requiring immediate response
  MongoDB: Fast indexed querying, replication allows local copy of catalog, sharding for scaling

Requirement: Spiky and real-time write volume
  Example challenge: Bulk update of full catalog without impacting production, real-time touch update
  MongoDB: Fast in-place updating, real-time indexing, sharding for scaling

Requirement: Advanced querying
  Example challenge: Find product based on color, size, description
  MongoDB: Ad-hoc querying on any field, advanced secondary and compound indexing
Product Catalog - Product Page
Product images
General Information
List of Variants
External Information
Localized Description
Product Catalog - Item Model
> db.item.findOne()
{
  _id: "301671",  // main item id
  department: "Shoes",
  category: "Shoes/Women/Pumps",
  brand: "Guess",
  thumbnail: "http://cdn…/pump.jpg",
  image: "http://cdn…/pump1.jpg",  // larger version of thumbnail
  title: "Evening Platform Pumps",
  description: "Those evening platform pumps put the perfect finishing touches on your most glamourous night-on-the-town outfit",
  shortDescription: "Evening Platform Pumps",
  style: "Designer",
  type: "Platform",
  rating: 4.5,  // user rating
  lastUpdated: new Date("2014/04/01"),  // last update time
  …
}
Product Catalog - Item Definition
• Get item by id
  db.item.findOne( { _id: "301671" } )
• Get items from product ids
  db.item.find( { _id: { $in: [ "301671", "301672" ] } } )
• Get items by department
  db.item.find( { department: "Shoes" } )
• Get items by category prefix
  db.item.find( { category: /^Shoes\/Women/ } )
• Indices
  _id (the product id, indexed by default), department, category, lastUpdated
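
The category-prefix query above hard-codes its regular expression. When the prefix comes from user navigation, it has to be escaped before it is anchored. A minimal sketch, assuming hypothetical helper names escapeRegExp and categoryPrefixFilter (neither appears in the original deck):

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the filter document for a category-prefix query.
// The pattern is anchored with ^ and case-sensitive, which is the form that
// can be answered from an index on the category field.
function categoryPrefixFilter(prefix) {
  return { category: new RegExp("^" + escapeRegExp(prefix)) };
}

const prefixFilter = categoryPrefixFilter("Shoes/Women");
// In the shell this would be passed as: db.item.find(prefixFilter)
```

Keeping the pattern anchored matters: an unanchored or case-insensitive expression generally cannot use the category index as a range scan.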
Product Catalog - Variant Model
> db.variant.findOne()
{
  _id: "730223104376",  // the sku
  itemId: "301671",  // references item id
  thumbnail: "http://cdn…/pump-red.jpg",  // variant specific
  image: "http://cdn…/pump-red.jpg",
  size: 6.0,
  color: "Red",
  width: "B",
  heelHeight: 5.0,
  lastUpdated: new Date("2014/04/01"),  // last update time
  …
}
Product Catalog - Variant Model
• Get variant from SKU
  db.variant.find( { _id: "730223104376" } )
• Get all variants for a product, sorted by SKU
  db.variant.find( { itemId: "301671" } ).sort( { _id: 1 } )
• Indices
  itemId, lastUpdated
Product Catalog - Product Text
Text: {
  _id: <unique value>,
  productId: "301671",  // main product id
  language: "en",
  title: "Evening Platform Pumps",
  description: "Those evening platform pumps put the perfect finishing touches on your most glamourous night-on-the-town outfit …",
  shortDescription: "Evening Platform Pumps",
  …
}
Indices: productId, full text search
Product Catalog - per store Pricing
Per-store pricing could result in billions of documents, unless you build it in a modular way.
Price: {
  _id: "sku730223104376_store123",
  currency: "USD",
  price: 89.95,
  lastUpdated: new Date("2014/04/01"),  // last update time
  …
}
_id: concatenation of item and store.
Item: can be an item id or sku.
Store: can be a store group or store id.
Indices: lastUpdated
Product Catalog - per store Pricing
• Get all prices for a given item
  db.prices.find( { _id: /^p301671_/ } )
• Get all prices for a given sku (price could be at item level)
  db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
• Get minimum and maximum prices for a sku
  db.prices.aggregate( [ { $match: { _id: /^sku730223104376_/ } },
                         { $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
• Get price for a sku and store id (returns up to 4 prices)
  db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
                                  "sku730223104376_sgroup0",
                                  "p301671_store1234",
                                  "p301671_sgroup0" ] } }, { price: 1 } )
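
The last lookup returns up to four price documents, so the application still has to choose one. The sketch below builds the four candidate _id values and picks the first one that exists. The precedence order (sku before item, store before store group) is an assumption, not something the slides state, and the helper names are hypothetical.

```javascript
// _id convention from the slides: "<item>_<store>".
function priceId(item, store) {
  return item + "_" + store;
}

// Candidate ids, listed from most specific to least specific (assumed order).
function candidatePriceIds(sku, itemId, storeId, storeGroup) {
  return [
    priceId("sku" + sku, "store" + storeId),
    priceId("sku" + sku, "sgroup" + storeGroup),
    priceId("p" + itemId, "store" + storeId),
    priceId("p" + itemId, "sgroup" + storeGroup),
  ];
}

// Hypothetical helper: given the documents returned by the $in query,
// return the one whose _id comes first in the candidate list, or null.
function pickPrice(candidates, priceDocs) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const id of candidates) {
    if (byId.has(id)) return byId.get(id);
  }
  return null;
}

const candidates = candidatePriceIds("730223104376", "301671", "1234", "0");
// Illustrative result set: only the two store-group prices exist.
const chosen = pickPrice(candidates, [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
]);
```

With this layout a store only needs its own price document when it deviates from its group, which is what keeps the collection from growing to one document per sku per store.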
Product Catalog - Product Hierarchy
• The hierarchy of items typically follows:
  • Company
    – Division
      • Department: Women's shoe store
        – Class: Pumps
          » Item: Guess classic pump
            • Variation: size 6 black
Product Catalog - Promotions
Product Catalog - Browse and Search products
Browse by category
Special Lists
Filter by attributes
Lists hundreds of item summaries
Ideally, a single query is issued to the database to obtain all items and metadata to display.
Product Catalog - Browse and Search products
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Attributes at the variant level (color, size, etc.), and the variant's image should be shown
• Thousands of variants for an item, which need de-duplication
• Efficient sorting on several attributes: price, popularity
• Pagination, which requires deterministic ordering
Product Catalog - Browse and Search products
Hundreds of sizes
One Item
Dozens of colors
A single item may have thousands of variants
Product Catalog - Browse and Search products
Images of the matching variants are displayed
Hierarchy
Sort parameter
Faceted Search
Product Catalog - Traditional Architecture
Relational DB (System of Records)
Full Text Search Engine
Indexing
#1 obtain search results IDs
Application
Cache
#2 obtain objects by ID
Pre-joined into objects
Product Catalog - Traditional Architecture
Issues with the traditional architecture:
• 3 different systems to maintain: RDBMS, search engine, caching layer
• Search returns a list of IDs to be looked up in the cache, which increases response latency
• RDBMS schema is complex and static
• The search index is expensive to update
• Setup does not allow efficient pagination
Product Catalog - Architecture
MongoDB Data Store
Summaries
Items
Pricing
Promotions
Variants
Ratings & Reviews
#1 Obtain results
Product Catalog - Summary Model
The summary relies on the following parameters:
• Department, e.g. "Shoes"
• An indexed attribute
  – Category path, e.g. "Shoes/Women/Pumps"
  – Price range
  – List of Item Attributes, e.g. Brand = Guess
  – List of Variant Attributes, e.g. Color = red
• A non-indexed attribute
  – List of Item Secondary Attributes, e.g. Style = Designer
  – List of Variant Secondary Attributes, e.g. heel height = 4.0
• Sorting, e.g. Price Low to High
Product Catalog - Summary Model
> db.summaries.findOne()
{
  "_id": "p39",
  "title": "Evening Platform Pumps 39",
  "department": "Shoes",
  "category": "Shoes/Women/Pumps",
  "thumbnail": "http://cdn…/pump-small-39.jpg",
  "image": "http://cdn…/pump-39.jpg",
  "price": 145.99,
  "rating": 0.95,
  "attrs": [ { "brand" : "Guess" }, … ],
  "sattrs": [ { "style" : "Designer" }, { "type" : "Platform" }, … ],
  "vars": [
    {
      "sku": "sku2441",
      "thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
      "image": "http://cdn…/pump-39.jpg.Blue",
      "attrs": [ { "size": 6.0 }, { "color": "Blue" }, … ],
      "sattrs": [ { "width" : "B" }, { "heelHeight" : 5.0 }, … ]
    },
    … Many more skus …
  ]
}
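
A summary document is essentially an item pre-joined with its variants and a price. The sketch below shows one way such a document could be assembled in application code. The split of fields into attrs (indexed) and sattrs (secondary) follows the sample document, but the constant lists, the helper names, and the use of the variant _id as the sku are assumptions made for illustration.

```javascript
// Assumed classification of fields, mirroring the sample summary document.
const ITEM_ATTRS = ["brand"];
const ITEM_SATTRS = ["style", "type"];
const VARIANT_ATTRS = ["size", "color"];
const VARIANT_SATTRS = ["width", "heelHeight"];

// Turn selected fields of a document into the [ { name: value }, ... ] form.
function pickAttrs(doc, names) {
  return names
    .filter((name) => doc[name] !== undefined)
    .map((name) => ({ [name]: doc[name] }));
}

// Hypothetical helper: pre-join one item, its variants and a display price.
function buildSummary(item, variants, price) {
  return {
    _id: "p" + item._id,
    title: item.title,
    department: item.department,
    category: item.category,
    thumbnail: item.thumbnail,
    image: item.image,
    price: price,
    rating: item.rating,
    attrs: pickAttrs(item, ITEM_ATTRS),
    sattrs: pickAttrs(item, ITEM_SATTRS),
    vars: variants.map((v) => ({
      sku: v._id,
      thumbnail: v.thumbnail,
      image: v.image,
      attrs: pickAttrs(v, VARIANT_ATTRS),
      sattrs: pickAttrs(v, VARIANT_SATTRS),
    })),
  };
}

const summary = buildSummary(
  { _id: "301671", title: "Evening Platform Pumps", department: "Shoes",
    category: "Shoes/Women/Pumps", brand: "Guess", style: "Designer",
    type: "Platform", rating: 4.5 },
  [{ _id: "730223104376", size: 6.0, color: "Red", width: "B", heelHeight: 5.0 }],
  89.95
);
```

Because everything a listing page needs sits in one document, the browse page can be served by a single query against the summaries collection.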
Product Catalog - Summary Model
• Get summary from item id
  db.summaries.find( { _id: "p301671" } )
• Get summary's specific variation from SKU
  db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
  db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
  db.summaries.find( { "department": "Shoes",
                       "vars.attrs": { "color": "Gray" },
                       "category": /^Shoes\/Women/,
                       "price": { "$gte": 65.99, "$lte": 180.99 } } )
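
In practice the "mix of parameters" comes from whatever the shopper has selected, so the filter is usually built conditionally. A minimal sketch, assuming a hypothetical buildSummaryFilter helper and parameter names; it handles a single variant attribute, as in the query above.

```javascript
// Hypothetical helper: only `department` is required. Every other parameter
// adds a condition to the filter when it is present.
function buildSummaryFilter({ department, categoryPattern, minPrice, maxPrice, variantAttr }) {
  const filter = { department };
  if (variantAttr !== undefined) {
    // Matches summaries with a variant whose attrs array contains this exact sub-document.
    filter["vars.attrs"] = variantAttr;
  }
  if (categoryPattern !== undefined) {
    filter.category = categoryPattern;
  }
  if (minPrice !== undefined || maxPrice !== undefined) {
    filter.price = {};
    if (minPrice !== undefined) filter.price.$gte = minPrice;
    if (maxPrice !== undefined) filter.price.$lte = maxPrice;
  }
  return filter;
}

const summaryFilter = buildSummaryFilter({
  department: "Shoes",
  categoryPattern: /^Shoes\/Women/,
  minPrice: 65.99,
  maxPrice: 180.99,
  variantAttr: { color: "Gray" },
});
// In the shell this would be passed as: db.summaries.find(summaryFilter)
```

Keeping department mandatory reflects the index layout, where every compound index starts with department.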
Product Catalog - Summary Model
• The following indices are used:
  – department + attrs + category + _id
  – department + vars.attrs + category + _id
  – department + category + _id
  – department + price + _id
  – department + rating + _id
• _id is used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red and size=6), which one is looked up?
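
The slides only say that _id is used for pagination. One common way to do that is range-based ("keyset") paging, where _id breaks ties in the sort so the order is deterministic. The sketch below is an assumption about how this could look, not the deck's stated method; nextPageFilter and pageSort are hypothetical names.

```javascript
// Sort specification: the chosen field first, then _id as a tie-breaker.
function pageSort(sortField) {
  return { [sortField]: 1, _id: 1 };
}

// Hypothetical helper: filter for the page that follows `last`, the final
// document of the previous page. Assumes baseFilter has no $or of its own.
function nextPageFilter(baseFilter, sortField, last) {
  return {
    ...baseFilter,
    $or: [
      // Strictly after the last sort value...
      { [sortField]: { $gt: last[sortField] } },
      // ...or the same sort value with a larger _id.
      { [sortField]: last[sortField], _id: { $gt: last._id } },
    ],
  };
}

const pageFilter = nextPageFilter({ department: "Shoes" }, "rating", { _id: "p39", rating: 0.95 });
// In the shell (pageSize is a hypothetical variable):
// db.summaries.find(pageFilter).sort(pageSort("rating")).limit(pageSize)
```

This shape lines up with the department + rating + _id index listed above, and it avoids the growing cost of skip() on deep pages.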
Product Catalog - Facet
Facet samples:
  { "_id" : "Accessory Type=Hosiery", "count" : 14 }
  { "_id" : "Ladder Material=Steel", "count" : 2 }
  { "_id" : "Gold Karat=14k", "count" : 10138 }
  { "_id" : "Stone Color=Clear", "count" : 1648 }
  { "_id" : "Metal=White gold", "count" : 10852 }
Single operation to insert / update (upsert):
  db.facet.update( { _id: "Accessory Type=Hosiery" }, { $inc: { count: 1 } }, true, false )
The facet with the lowest count is the most restrictive. It should come first in the query!
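
The "lowest count first" rule can be applied in application code before the query is built. A minimal sketch using the sample counts from this slide; the helper name orderByRestrictiveness and the handling of unknown facets are assumptions.

```javascript
// Facet counts copied from the samples on the slide, keyed by facet _id.
const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);

// Hypothetical helper: most restrictive (lowest count) facet first.
// Facets with no recorded count are assumed least restrictive and go last.
function orderByRestrictiveness(selectedFacets, counts) {
  const countOf = (facet) =>
    counts.has(facet) ? counts.get(facet) : Number.MAX_SAFE_INTEGER;
  return [...selectedFacets].sort((a, b) => countOf(a) - countOf(b));
}

const orderedFacets = orderByRestrictiveness(
  ["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"],
  facetCounts
);
```

Here "Stone Color=Clear" comes first because it has the lowest count, so it is the attribute worth driving the indexed lookup with.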
Product Catalog - Query stats

Department  Category  Price  Primary attribute  Average time (ms)  90th (ms)  95th (ms)
1           0         0      0                  2                  3          3
1           1         0      0                  1                  2          2
1           0         1      0                  1                  2          3
1           1         1      0                  1                  2          2
1           0         0      1                  0                  1          2
1           1         0      1                  0                  1          1
1           0         1      1                  1                  2          2
1           1         1      1                  0                  1          1
1           0         0      2                  1                  3          3
1           1         0      2                  0                  2          2
1           0         1      2                  10                 20         35
1           1         1      2                  0                  1          1
Thank You!
Prasoon Kumar
Consulting Engineer - APAC, MongoDB
@prasoonk