Rajat Bansal Chief Technology Officer
Scaling to Millions, FAST!
Agenda
- The Hike Journey
- Why MongoDB?
- MongoDB Use Cases: Overview and Deep Dive
- Lessons Learned & Next Steps
The fastest-growing “Made in India” IM app
AND TODAY
Why MongoDB?
Storage decisions governed by CAP Theorem
CAP: Consistency (C), Availability (A), Partition Tolerance (P)
Other deciding factors:
- Cost
- Availability of Talent
- Production Support
Cost
- Start small, grow big, FAST, REALLY FAST!
- Started with 1 replica set on EC2
- Grew to multiple replica sets and sharded clusters
Availability of Talent
- Small team of 3 people writing the entire server
- Easy ramp-up and management
- Low setup and administration requirements
Production Support
- Cost of downtime is very high
- Small team needed support during downtime
- The decision to take Production Support proved life-saving in outages
HIKE’s MongoDB Use Cases
- User Profile Store: 3000 reads / 500 writes per sec
- Temporary Message Store: 1000 reads / 3900 writes per sec
- Other miscellaneous usage: GridFS etc.
HIKE MongoDB Architecture
Offline Message Store
- App-level sharding
- 4 Mongo instances with 32 DBs each
- Horizontally scalable up to 128 instances when needed
- Tested up to 30K ops in a simulated environment
- Protected by a “Redis” layer to reduce queries
- Latencies < 1ms
Diagram: App Layer -> Shard Manager -> Mongod-1 to Mongod-4, each a replica set (1 Primary, 2 Secondaries) holding 32 DBs
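The slide does not say how the Shard Manager picks a database, so the following is only a minimal sketch of one way app-level sharding over a fixed set of 128 logical databases could work. The CRC32 hash, the `offline_msgs_NNN` naming, and the function names are assumptions made for illustration, not Hike's implementation.

```python
import zlib
from typing import Tuple

# 4 instances x 32 DBs each, per the slide. Keeping this number fixed is
# what lets the instance count grow from 4 up to 128 without rehashing.
TOTAL_LOGICAL_DBS = 128


def logical_db_for(user_id: str) -> int:
    """Map a user id to one of the fixed logical databases (stable hash)."""
    return zlib.crc32(user_id.encode("utf-8")) % TOTAL_LOGICAL_DBS


def locate(user_id: str, num_instances: int = 4) -> Tuple[int, str]:
    """Return (instance index, database name) for a user id.

    The database name is hypothetical; only the instance index depends on
    how many instances are currently deployed.
    """
    if num_instances < 1 or TOTAL_LOGICAL_DBS % num_instances != 0:
        raise ValueError("num_instances must evenly divide the logical DB count")
    dbs_per_instance = TOTAL_LOGICAL_DBS // num_instances
    db_index = logical_db_for(user_id)
    return db_index // dbs_per_instance, f"offline_msgs_{db_index:03d}"
```

Under these assumptions a user always maps to the same database name. Going from 4 to 8 or more instances then means moving whole databases to new hosts and updating the routing, while the per-user mapping stays untouched.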
HIKE MongoDB Architecture
User Profile Store
- Replica set (1 Primary, 2 Secondaries)
- Writes to Primary
- Reads from Secondary
- Latencies < 1ms
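In a replica set, writes always go to the primary. Sending reads to secondaries is a client-side choice, usually made through the `readPreference` connection string option. The sketch below only builds such a connection string. The host names and replica set name are hypothetical, and the `secondaryPreferred` default is an assumption, since the slide does not say which read preference mode was used.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that routes reads by preference."""
    query = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return f"mongodb://{','.join(hosts)}/?{query}"


# Hypothetical hosts and replica set name, for illustration only.
uri = replica_set_uri(
    ["profile-db-1:27017", "profile-db-2:27017", "profile-db-3:27017"],
    "profiles",
)
print(uri)
# A driver such as PyMongo accepts this string, e.g. MongoClient(uri).
```

Because replication is asynchronous, a read from a secondary can briefly lag the primary. That trade-off suits profile data that is read far more often than it is written.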
mongoDB (Happy State < 1ms)
[Latency graph: 0.65ms and 0.80ms]
mongoDB (1 Year Timeline)
mongoDB (Outage 1)
HIKE Learnings
Outage 1
- Latencies went through the roof: 1ms -> 1000ms
- What went wrong: a lot of operations on arrays
- “Production Support” to the rescue!
“Adding and modifying array entries can require a scan of much or all of each array being updated, resulting in slow operations”
mongoDB (Outage 2)
HIKE Learnings
Outage 2
- Latencies increased 20-50x
- What went wrong:
  - Disk I/O was the bottleneck
  - “ReadAhead” was high
“Read/Write Throughput Exceeds I/O”
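A proactive check for the readahead problem can be sketched as follows. On Linux the per-device setting is exposed under `/sys/block/<device>/queue/read_ahead_kb`. The 16 KB threshold and the function name are assumptions for illustration. The slides do not give the value Hike settled on, so the limit should come from the MongoDB production notes for the storage engine in use.

```python
from pathlib import Path


def read_ahead_report(sys_block="/sys/block", max_kb=16):
    """Return {device: read_ahead_kb} for block devices above max_kb.

    max_kb is an illustrative threshold, not a value from the slides.
    """
    flagged = {}
    for device in sorted(Path(sys_block).iterdir()):
        setting = device / "queue" / "read_ahead_kb"
        try:
            kb = int(setting.read_text().strip())
        except (OSError, ValueError):
            continue  # no queue settings for this entry, or unreadable
        if kb > max_kb:
            flagged[device.name] = kb
    return flagged
```

Run periodically on each database host, a report like this flags an oversized readahead before it shows up as disk I/O saturation.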
mongoDB (Outage 3)
HIKE Learnings
Outage 3
- MongoDB crashed
- An ad hoc script was doing a full table scan
- Need to protect your systems with “noTableScan”
“Protect your production systems. Use the mechanisms available”
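The “noTableScan” safeguard corresponds to mongod's `notablescan` server parameter, which makes queries that would need a full collection scan fail instead of run. A minimal sketch of setting it at runtime through the `setParameter` admin command is shown below. The helper names are hypothetical, and `client` is assumed to be a PyMongo-style client, so nothing here connects to a server.

```python
def notablescan_command(enabled=True):
    """Build the admin command document that toggles notablescan."""
    # Key order matters: the command name must be the first key.
    return {"setParameter": 1, "notablescan": bool(enabled)}


def apply_notablescan(client, enabled=True):
    """Send the command via a PyMongo-style client (assumed, not run here)."""
    return client.admin.command(notablescan_command(enabled))
```

The parameter applies to the whole mongod instance, so the caveats in the MongoDB documentation are worth reading before enabling it on a shared deployment.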
HIKE Learnings
- Proactive health checks
- Production Support helps
- Put mechanisms in place to safeguard production