MongoDB@Baidu
Xiao Beibei, Project Owner & Senior Developer
Baidu
Who are we?
• Largest internet search service provider in China
• Various products, solutions & services
• NASDAQ: BIDU | Market cap: 64B | Revenue: 10B | Quarterly growth: 33.10%
Story between 2 “Giants”
Who am I?
• Senior NoSQL developer
• Owner of various MongoDB projects
• In charge of the LARGEST MongoDB cluster in CHINA
Where does MongoDB fit?
Small Step → Big Surprise
• Started with Baidu Address Book
  – Small project
  – Various sources
  – Flexible schema
• More than 300 million users
Success + Confidence = More Projects
• Message & multimedia message projects
• Netdisk picture metadata
• Facial recognition system
• User operation log system
• Baidu Cloud
• Baidu Post Bar
… …
• Over 100 businesses
• Drive metadata > 200B
• PB level
Big MongoDB Cluster
• Consolidate the entrance
• All use SSD + RAID 0
• Most replica sets: 1 master, 2 secondaries, 2 arbiters
• Some replica sets: 1 master, 2 secondaries, 1 arbiter
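The two replica-set layouts above differ in how many voting members they have. A minimal sketch, assuming every listed member (master/primary, secondaries and arbiters) is a voting member with one vote; electing a primary requires a strict majority of the votes. The helper names are illustrative only:

```python
def votes_needed(voting_members: int) -> int:
    """Strict majority of voting members needed to elect a primary."""
    return voting_members // 2 + 1


def failures_tolerated(voting_members: int) -> int:
    """Voting members that can be down while a majority is still reachable."""
    return voting_members - votes_needed(voting_members)


# Layouts from the slide, assuming every member (arbiters included) votes.
layouts = {
    "1 master + 2 secondaries + 2 arbiters": 1 + 2 + 2,
    "1 master + 2 secondaries + 1 arbiter": 1 + 2 + 1,
}

for name, members in layouts.items():
    print(f"{name}: {members} voters, majority {votes_needed(members)}, "
          f"tolerates {failures_tolerated(members)} down")
```

Under that assumption both layouts need 3 votes, but the four-member layout tolerates only one unavailable member versus two for the five-member one, which is why an odd number of voting members is the usual recommendation.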
Standard MongoDB Cluster
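[Diagram: standard MongoDB cluster. A REST MongoDB service API sits in front of multiple mongos routers; each shard is a replica set of a primary (P), secondaries (S) and arbiters (A); config servers complete the cluster.]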
How do we use MongoDB?
Throughput!!!
• Everything ran well, BUT when writes exceeded 10,000 QPS:
  – mongod memory usage increased
  – Writes timed out
  – Reads were impacted and queries slowed down
Problem
Simple way is the BEST!
Root cause: cache replacement
• In 3.0, cache replacement does not work efficiently
• Pilot an upgrade to 3.2
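Cache replacement is the policy that decides which cached pages to evict once the cache is full. WiredTiger's real eviction logic is considerably more involved; purely to illustrate the concept, here is a generic least-recently-used (LRU) policy. This is an assumed stand-in, not the algorithm used in 3.0 or 3.2:

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache (illustrative; not WiredTiger's eviction algorithm)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()
        self.evictions = 0

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
            self.evictions += 1


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes most recently used
cache.put("c", 3)   # cache is over capacity, so "b" is evicted
print(list(cache.items))  # ['a', 'c']
```

When new data arrives faster than the replacement policy can free space, memory usage climbs and operations stall, which matches the symptoms listed for the write-heavy workload.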
Solution
Replication makes this possible
Problem
Online index creation issue
• Time-consuming
• Direct (foreground) or background build
• Write timeouts during creation
Solution
• Create the index on each member in turn
• Secondaries first, primary last
• Oplog time: the oplog window must cover each member's build time
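The rolling approach can be sketched as a small plan generator: secondaries first, primary last, and a check that each member's build finishes within the oplog window so the member can catch up from the oplog afterwards. Member names, build-time estimates and the 24-hour oplog window below are hypothetical inputs:

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    is_primary: bool
    est_build_hours: float  # hypothetical estimate of this member's index build


def rolling_index_plan(members, oplog_window_hours):
    """Order members secondaries-first, primary-last, and flag any member whose
    estimated build exceeds the oplog window (it could not catch up afterwards)."""
    ordered = sorted(members, key=lambda m: m.is_primary)  # False (secondary) first
    return [(m.name, m.est_build_hours < oplog_window_hours) for m in ordered]


# Hypothetical replica set (arbiters hold no data, so they build no index).
members = [
    Member("node-a", True, 5.0),
    Member("node-b", False, 4.0),
    Member("node-c", False, 30.0),
]
for name, safe in rolling_index_plan(members, oplog_window_hours=24.0):
    print(name, "ok" if safe else "oplog window too short")
```

Because `sorted` is stable, secondaries keep their listed order; a member flagged as unsafe needs a larger oplog or a faster build before it is taken out of rotation.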
Big Issue
Problem
Why?
• The MongoDB balancer uses a single thread to move data
• Cons & pros
Query Slow!!!
Data grows rapidly → clusters grow accordingly
Largest cluster: 160 shards, 2 TB each
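A back-of-the-envelope calculation shows why a single-threaded balancer struggles at this scale. The 160 shards at 2 TB each come from the slide; the 50 MB/s per-migration throughput and the 1% of data to rebalance are assumptions chosen only for illustration:

```python
SHARDS = 160
TB_PER_SHARD = 2
TOTAL_TB = SHARDS * TB_PER_SHARD  # 320 TB in the largest cluster


def migration_hours(data_tb: float, mb_per_sec: float, parallel: int = 1) -> float:
    """Hours to move data_tb terabytes at mb_per_sec per migration stream,
    with `parallel` streams running concurrently (1 = single-threaded)."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return total_mb / (mb_per_sec * parallel) / 3600


# Hypothetical: rebalance 1% of the cluster at an assumed 50 MB/s per stream.
to_move_tb = TOTAL_TB / 100
print(f"total data: {TOTAL_TB} TB")
print(f"single-threaded: {migration_hours(to_move_tb, 50):.1f} h")
print(f"8 parallel:      {migration_hours(to_move_tb, 50, parallel=8):.1f} h")
```

Under these assumptions a single stream needs roughly 18 hours for just 1% of the data, while eight concurrent streams would need a little over 2 hours, provided disks and network are not the bottleneck.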
Mitigation
• Reduced the balancer window from 24 to 6 hours, so that it ran only during off-peak hours
• This worked well for a period of time, BUT when more …
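MongoDB lets operators restrict balancing to a time window through the `activeWindow` field (with `start` and `stop` given as "HH:MM" strings) of the balancer document in the config database's `settings` collection. Deciding whether the current time falls inside such a window has to handle windows that wrap past midnight. A minimal sketch of that check; the half-open [start, stop) boundary and the example windows are choices of this sketch, not MongoDB's documented semantics:

```python
from datetime import time


def parse_hhmm(s: str) -> time:
    hours, minutes = s.split(":")
    return time(int(hours), int(minutes))


def in_window(now: time, start: str, stop: str) -> bool:
    """True if `now` is inside [start, stop); handles windows that wrap midnight."""
    t_start, t_stop = parse_hhmm(start), parse_hhmm(stop)
    if t_start <= t_stop:
        return t_start <= now < t_stop
    # Window wraps past midnight, e.g. 23:00 -> 05:00.
    return now >= t_start or now < t_stop


# Hypothetical 6-hour off-peak windows.
print(in_window(time(2, 30), "00:00", "06:00"))   # True
print(in_window(time(12, 0), "00:00", "06:00"))   # False
print(in_window(time(23, 30), "23:00", "05:00"))  # True
```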
• Shard key: uid or hash?
• Pre-allocating chunks
• Balancer or oplog?
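With a hashed shard key, MongoDB hashes the field to a 64-bit integer, so the key space is the signed 64-bit range, and chunks can be pre-allocated by splitting that range evenly instead of waiting for the balancer to move data later. (MongoDB can create these initial chunks itself when sharding an empty collection on a hashed key, via the `numInitialChunks` option of `shardCollection`.) A sketch of computing evenly spaced split points; `hashed_split_points` is a hypothetical helper:

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def hashed_split_points(num_chunks: int) -> list[int]:
    """Evenly spaced boundaries dividing the signed 64-bit hash space into
    `num_chunks` ranges (returns num_chunks - 1 interior split points)."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be >= 1")
    step = 2 ** 64 // num_chunks  # size of the full signed 64-bit range / chunks
    return [INT64_MIN + i * step for i in range(1, num_chunks)]


print(hashed_split_points(4))  # [-4611686018427387904, 0, 4611686018427387904]
```

A ranged key such as `uid` keeps related documents together but can concentrate inserts on a few chunks; a hashed key spreads writes evenly at the cost of range queries touching every shard.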
Solution
Native Auto Balance
[Diagram: native auto balance between Config Server, Mongos, shard1 and shard2. Labeled steps: Move certain chunk to shard2; Please receive data; Data transferring …; Incremental data sync; Update chunk information; Update chunk manager; Update chunk cache; Delete or not delete.]
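The native steps named in the diagram can be modeled as a toy in-memory simulation: the donor shard clones the chunk to the recipient, writes that arrive during the copy are forwarded as the incremental sync, the chunk metadata is switched to the new owner, and the donor then deletes (or keeps) its copy. Everything here (the `Shard` class, dict-based shards, the `chunk_owner` map) is hypothetical and only mirrors the slide's steps, not MongoDB's internals:

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard: maps shard-key value -> document."""
    name: str
    docs: dict = field(default_factory=dict)


def move_chunk(chunk, lo, hi, donor, recipient, chunk_owner,
               writes_during_copy, delete_source=True):
    """Toy migration of the chunk covering shard keys [lo, hi)."""

    def in_chunk(key):
        return lo <= key < hi

    # "Please receive data" / data transferring: clone a snapshot of the chunk.
    snapshot = {k: v for k, v in donor.docs.items() if in_chunk(k)}
    recipient.docs.update(snapshot)

    # Writes keep landing on the donor while the clone is in progress.
    for key, value in writes_during_copy:
        donor.docs[key] = value

    # Incremental data sync: forward the buffered writes that hit this chunk.
    for key, value in writes_during_copy:
        if in_chunk(key):
            recipient.docs[key] = value

    # Update chunk information: routing metadata now names the recipient.
    chunk_owner[chunk] = recipient.name

    # Delete or not delete: optionally remove the donor's orphaned copy.
    if delete_source:
        for key in [k for k in donor.docs if in_chunk(k)]:
            del donor.docs[key]


shard1 = Shard("shard1", {1: "a", 5: "b", 12: "c"})
shard2 = Shard("shard2")
owners = {"chunk-0-10": "shard1"}
move_chunk("chunk-0-10", 0, 10, shard1, shard2, owners,
           writes_during_copy=[(5, "b2"), (7, "new")])
print(shard2.docs)  # {1: 'a', 5: 'b2', 7: 'new'}
print(shard1.docs)  # {12: 'c'}
print(owners)       # {'chunk-0-10': 'shard2'}
```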
Solution
Modified Balancer
[Diagram: modified balancer with Config Server, Mongos, shard1 and shard2. Labeled steps: Data transferring …; Update chunk information; Update chunk manager; Update when WriteBack.]
Iteration in Detail
• Identify: identify a range to be migrated.
• Record: take note of the current oplog time.
• Query: send a query to the source shard, and iterate over the returned cursor to write matching documents to the destination shard.
• Scan & Match: scan the oplog from the source shard for events recorded since the timestamp recorded at the start of this pass; matching events are then written to the destination shard.
• Apply: when the last oplog event has been applied, the pass has completed and the worker process can be stopped.
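The Identify, Record, Query, Scan & Match and Apply steps above can be sketched end to end with in-memory stand-ins: a dict of documents for the source shard, a list of timestamped oplog entries, and a dict for the destination shard. The oplog entry format (`ts`, `op`, `_id`, `uid`, `doc`) and all helper names are simplified assumptions, not MongoDB's actual oplog schema; the point is that replaying oplog events from the recorded timestamp catches up writes that happened during the query pass:

```python
# Hypothetical simplified oplog entry: {"ts", "op" ("i"/"u"/"d"), "_id", "uid", "doc"}.
# "uid" is the shard key; it is carried on every entry so events can be matched
# to the migrating range (MongoDB's real oplog format differs).


def current_oplog_time(oplog):
    """Record: timestamp of the newest oplog entry (0 if the oplog is empty)."""
    return oplog[-1]["ts"] if oplog else 0


def in_range(uid, lo, hi):
    return lo <= uid < hi


def copy_range(source, lo, hi, dest):
    """Query: iterate over the source documents and copy those in [lo, hi)."""
    for _id, doc in source.items():
        if in_range(doc["uid"], lo, hi):
            dest[_id] = dict(doc)


def replay_oplog(oplog, since_ts, lo, hi, dest):
    """Scan & Match, then Apply: replay events newer than since_ts whose shard
    key falls in [lo, hi). Returns the timestamp of the last event scanned."""
    last_ts = since_ts
    for entry in oplog:
        if entry["ts"] <= since_ts:
            continue
        last_ts = entry["ts"]
        if not in_range(entry["uid"], lo, hi):
            continue
        if entry["op"] in ("i", "u"):
            dest[entry["_id"]] = dict(entry["doc"])
        elif entry["op"] == "d":
            dest.pop(entry["_id"], None)
    return last_ts


source = {1: {"uid": 10, "v": "a"}, 2: {"uid": 20, "v": "b"}, 3: {"uid": 90, "v": "c"}}
oplog = [{"ts": 5, "op": "i", "_id": 3, "uid": 90, "doc": {"uid": 90, "v": "c"}}]

lo, hi = 0, 50                        # Identify
start_ts = current_oplog_time(oplog)  # Record (ts 5)
dest = {}
copy_range(source, lo, hi, dest)      # Query

# Writes that arrived after start_ts (they would also have changed `source`).
oplog += [
    {"ts": 6, "op": "u", "_id": 2, "uid": 20, "doc": {"uid": 20, "v": "b2"}},
    {"ts": 7, "op": "i", "_id": 4, "uid": 30, "doc": {"uid": 30, "v": "d"}},
    {"ts": 8, "op": "d", "_id": 1, "uid": 10, "doc": None},
    {"ts": 9, "op": "i", "_id": 5, "uid": 70, "doc": {"uid": 70, "v": "e"}},
]
last = replay_oplog(oplog, start_ts, lo, hi, dest)  # Scan & Match + Apply
print(dest)  # {2: {'uid': 20, 'v': 'b2'}, 4: {'uid': 30, 'v': 'd'}}
print(last)  # 9
```

Because the replay starts from the timestamp recorded before the query, an event that the query already saw is simply applied again with the same result, so the pass stays correct without locking the source range.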
Summary
Quick Summary
• Early adoption makes us
• 100+ diverse apps, and more are coming
• $$$ Cost savings with awesome scalability
• Continuous improvements = Confidence
• Add LSM to WT for better insert performance
• Multi-master as an option
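An LSM (log-structured merge) design favors inserts by buffering writes in an in-memory table and flushing it as immutable sorted runs, so inserts avoid in-place page updates; reads check the memtable first, then the runs from newest to oldest. A toy sketch of that idea only, with an arbitrary assumed flush threshold and no write-ahead log or compaction; it is not WiredTiger's implementation:

```python
import bisect


class ToyLSM:
    """Toy LSM store: in-memory memtable flushed to immutable sorted runs.
    No write-ahead log, no compaction, no deletes (simplifications)."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.memtable_limit = memtable_limit  # arbitrary, assumed flush threshold
        self.runs = []  # (sorted_keys, values) tuples, oldest first

    def put(self, key, value):
        # Inserts only touch the in-memory table: cheap, no in-place page update.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            items = sorted(self.memtable.items())
            self.runs.append(([k for k, _ in items], [v for _, v in items]))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


lsm = ToyLSM(memtable_limit=2)
lsm.put("a", 1)
lsm.put("b", 2)   # memtable full -> flushed as the first sorted run
lsm.put("a", 10)  # newer value stays in the memtable
print(lsm.get("a"), lsm.get("b"), lsm.get("z"), len(lsm.runs))  # 10 2 None 1
```

The trade-off is on the read side: a lookup may probe several runs, which real LSM engines mitigate with compaction and per-run filters.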
Key Takeaways
• Baidu = Big system + Big data + Big challenge
  – We need a strong & scalable DB architecture; MongoDB is fantastic!
• Upgrading to 3.x is a MUST: WT engine, document validation, …
• Innovation & automation via customized scripts
MongoDB CAN manage our “BIG DATA”
600 nodes, 160 shards, 200B documents
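Assuming the 200B documents are spread evenly over the 160 shards, and that the 2 TB-per-shard figure given earlier for the largest cluster describes this same cluster (both are assumptions; the slides do not say so), the averages work out as follows:

```python
DOCUMENTS = 200_000_000_000   # 200B documents
SHARDS = 160
BYTES_PER_SHARD = 2 * 10**12  # 2 TB per shard, decimal units (assumed)

docs_per_shard = DOCUMENTS / SHARDS
avg_doc_bytes = SHARDS * BYTES_PER_SHARD / DOCUMENTS

print(f"{docs_per_shard:,.0f} documents per shard")  # 1,250,000,000 documents per shard
print(f"~{avg_doc_bytes:,.0f} bytes per document")   # ~1,600 bytes per document
```

This treats 2 TB as data per shard; on-disk size also includes indexes and is affected by compression, so the true average document size may differ.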
Next Steps
• MongoDB is enhancing balancer performance
• Working with MongoDB as a beta tester for the new feature
  – Enabling parallel chunk migration
  – Removing throttling by default (for WiredTiger)
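Parallel chunk migration (introduced in MongoDB 3.4) lets the balancer run several migrations at once, with the constraint that each shard takes part in at most one migration at a time, so a cluster of n shards can run at most n/2 (rounded down) concurrent migrations. A greedy sketch of picking one round of non-conflicting migrations from a list of candidate (donor, recipient) moves; the candidate list and the greedy order are assumptions, not the balancer's actual selection policy:

```python
def schedule_round(candidates):
    """Greedily pick migrations so that no shard appears in two of them.

    candidates: list of (donor, recipient) shard-name pairs, in priority order.
    Returns the migrations that can run concurrently in this round."""
    busy = set()
    chosen = []
    for donor, recipient in candidates:
        if donor == recipient or donor in busy or recipient in busy:
            continue  # self-move, or a shard already in a migration this round
        chosen.append((donor, recipient))
        busy.update((donor, recipient))
    return chosen


def max_parallel(num_shards):
    """Upper bound on concurrent migrations when each shard joins at most one."""
    return num_shards // 2


# Hypothetical candidate moves.
candidates = [("s1", "s5"), ("s1", "s6"), ("s2", "s6"), ("s3", "s5"), ("s4", "s7")]
print(schedule_round(candidates))  # [('s1', 's5'), ('s2', 's6'), ('s4', 's7')]
print(max_parallel(160))           # 80
```

For the 160-shard cluster this bound allows up to 80 simultaneous migrations, compared with one at a time under the single-threaded balancer.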
Questions?