Putting the mongo in MongoDB

Scaling your reads and writes for “web-scale” performance

Putting the mongo in MongoDB - Trifork · • MongoDB intro • Replication Sets • Sharding. Intro. Intro. Why so popular? Why so popular? Replication. Replication? Multi-node setup

About me

• Joris Kuipers, @jkuipers

• Manager Technology Delivery at Orange11 Software Pilot at Trifork Amsterdam

• Former SpringSource consultant

• Primary consultant in Trifork’s partnership with 10gen

Agenda

• MongoDB intro

• Replica Sets

• Sharding

Intro

Why so popular?

Replication

Replication?

Multi-node setup with primary and two or more replicating secondaries

Primary use case: Automatic failover in case of node failure

Replica Set - Creation

Replica Set - Initialization

Replica Set - Failure

Replica Set - Failover

Replica Set - Recovery

Replica Set - Recovered

I thought this was about scaling?

• Secondaries can be queried

• Eventually consistent reads

• Secondaries can be hidden

• Dedicated node for e.g. analytics
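
A hidden secondary is declared in the replica set configuration. The sketch below builds such a configuration as a plain Python dictionary. The field names (`_id`, `host`, `priority`, `hidden`) are the ones used in MongoDB's replica set configuration document; the set name `rs0`, the host names, and the helper `hidden_member` are made up for illustration.

```python
def hidden_member(member_id, host):
    """Build a replica set member document for a hidden secondary."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,   # a hidden member must not be electable as primary
        "hidden": True,  # not advertised to clients, so application reads skip it
    }


# Hypothetical three-member set with one analytics-only node.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        hidden_member(2, "analytics.example.net:27017"),
    ],
}
```

An analytics job would connect to the hidden node directly, so its heavy queries never compete with application reads.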

Strong Consistency

Delayed Consistency

Read Preferences Mode

5 modes:

• primary (only) - Default

• primaryPreferred

• secondary

• secondaryPreferred

• nearest

Closest node always used for reads (all modes but primary)
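
To make the five modes concrete, here is a simplified model of how a driver might choose a node for a read. It is a sketch, not driver code: the `Node` class, `pick_node` function, and ping values are assumptions, and "closest" is reduced to "lowest ping" (real drivers are more nuanced).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_primary: bool
    ping_ms: float
    up: bool = True


def pick_node(nodes, mode):
    """Return the node a read would go to under the given mode, or None."""
    live = [n for n in nodes if n.up]
    primaries = [n for n in live if n.is_primary]
    secondaries = [n for n in live if not n.is_primary]

    def nearest(candidates):
        # Simplification: "closest" means the lowest measured ping.
        return min(candidates, key=lambda n: n.ping_ms) if candidates else None

    if mode == "primary":
        return nearest(primaries)
    if mode == "primaryPreferred":
        return nearest(primaries) or nearest(secondaries)
    if mode == "secondary":
        return nearest(secondaries)
    if mode == "secondaryPreferred":
        return nearest(secondaries) or nearest(primaries)
    if mode == "nearest":
        return nearest(live)
    raise ValueError(f"unknown read preference mode: {mode}")


nodes = [
    Node("a", is_primary=True, ping_ms=12.0),
    Node("b", is_primary=False, ping_ms=3.0),
    Node("c", is_primary=False, ping_ms=40.0),
]
```

With these nodes, `pick_node(nodes, "secondary")` and `pick_node(nodes, "nearest")` both choose `b`, while `"primary"` always chooses `a` regardless of latency.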

Replica Sets for scaling

• Similar limitations to RDBMSs:

– Loses consistency

– Scales reads only

– No help if working set > RAM of single node

• Real scalability requires distributing data

Sharding

“Shards are the secret ingredient in the web scale sauce. They just work”

When Working Set > RAM

When R/W throughput > I/O

What is sharding?

• Distributing data across nodes in cluster

• Requires some form of partitioning

• Hard to do manually

– What if nodes are not balanced?

– What if you add or remove nodes?

• Automatic and at the core of MongoDB

Partition data based on ranges

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line
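
The "points on a line" picture can be sketched in a few lines of Python. The split points and shard names below are invented for illustration; the only idea taken from the slide is that each range is a half-open segment of the key space, so finding the segment for a key is a binary search over the split points.

```python
from bisect import bisect_right

# Hypothetical split points on the shard-key line. Segment i covers
# [split_points[i-1], split_points[i]), with the first segment open towards
# minus infinity and the last one open towards plus infinity.
split_points = [100, 200, 300]
segment_owner = ["shard0", "shard1", "shard0", "shard2"]  # one owner per segment


def segment_index(key):
    """Index of the segment whose range contains the key."""
    return bisect_right(split_points, key)


def shard_for(key):
    """Name of the shard that owns the segment containing the key."""
    return segment_owner[segment_index(key)]
```

A key equal to a split point belongs to the segment on its right, which is why `bisect_right` is used: `shard_for(100)` falls in the second segment.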

Data distributed in chunks across nodes

• Initially 1 chunk

• Default max chunk size: 64 MB

• Chunks automatically split & migrated when max reached

MongoDB manages data

• Queries routed to specific shards

• MongoDB balances cluster

• MongoDB migrates data to new nodes

MongoDB Auto-Sharding

• Minimal effort required

– Same interface as single mongod

• Two steps

– Enable Sharding for a database

– Shard collection within database
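
The two steps correspond to two admin commands, `enableSharding` and `shardCollection`. The sketch below only builds the command documents, so it runs without a cluster. The database `shop`, collection `orders`, and key `customer_id` are made-up names, and the helper `sharding_commands` is hypothetical.

```python
def sharding_commands(database, collection, shard_key):
    """Return the two admin command documents, in the order they are run."""
    return [
        # Step 1: enable sharding for the database.
        {"enableSharding": database},
        # Step 2: shard one collection on an ascending (range-based) key.
        {"shardCollection": f"{database}.{collection}", "key": {shard_key: 1}},
    ]


commands = sharding_commands("shop", "orders", "customer_id")
```

With a driver, each document would be sent to the `admin` database through a mongos, for example with PyMongo's `client.admin.command(cmd)` (assuming a `client` connected to the router).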

Data stored in shard

• Shard is a node of the cluster

• Single mongod or (typically) replica set

Config server stores metadata

• Stores cluster chunk ranges and locations

• Must have 3 (1 for dev/test only)

• Two-phase commit (not a replica set)

MongoS manages data

• Acts as a router / balancer

• No local data

– lightweight process

– persists to config database

• Can have 1 or many

Sharding Infrastructure

Mechanics

Partitioning

• Remember: range-based

Chunk is section of full range

Chunk Splitting

• Chunk split if > maximum size

• No split point if all docs have same shard key

• Chunk split is a logical operation

• Balancing round if chunk count diff > X
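
The splitting rules can be illustrated with a toy model. This is a sketch under simplifying assumptions, not MongoDB's actual algorithm: chunk size is measured in number of documents rather than bytes, the split point is taken near the median key, and `split_chunk` is a hypothetical helper. It only computes a new boundary, which mirrors the point that a split is a logical operation and moves no data.

```python
def split_chunk(keys, max_docs):
    """Try to split a chunk, given the shard-key values of its documents.

    Returns (split_point, left_keys, right_keys), or None when the chunk is
    small enough or no valid split point exists.
    """
    if len(keys) <= max_docs:
        return None                      # chunk is still within its limit
    ordered = sorted(keys)
    median = ordered[len(ordered) // 2]
    # The split point becomes the lower bound of the right-hand chunk, so it
    # must be strictly greater than the smallest key in the chunk.
    candidates = [k for k in ordered if k >= median and k > ordered[0]]
    if not candidates:
        return None                      # every document has the same shard key
    split_point = candidates[0]
    left = [k for k in ordered if k < split_point]
    right = [k for k in ordered if k >= split_point]
    return split_point, left, right
```

A chunk in which every document shares one shard-key value returns `None` however large it grows, which is why low-cardinality shard keys lead to oversized chunks.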

Balancing

• Balancer runs on mongos

• Starts when chunk count diff between most dense and least dense shard > migration threshold
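
The trigger condition is simple enough to write down. In this sketch the chunk counts and the threshold value are made-up inputs, and both function names are hypothetical; only the comparison itself comes from the slide.

```python
def needs_balancing(chunks_per_shard, migration_threshold):
    """True when the densest and sparsest shard differ by more than the threshold."""
    counts = chunks_per_shard.values()
    return max(counts) - min(counts) > migration_threshold


def pick_migration(chunks_per_shard, migration_threshold):
    """Return (source_shard, target_shard) for the next move, or None."""
    if not needs_balancing(chunks_per_shard, migration_threshold):
        return None
    source = max(chunks_per_shard, key=chunks_per_shard.get)  # most chunks
    target = min(chunks_per_shard, key=chunks_per_shard.get)  # fewest chunks
    return source, target
```

For example, `pick_migration({"s0": 10, "s1": 2, "s2": 5}, 4)` selects a move from `s0` to `s1`, while a cluster whose counts differ by 4 or less is left alone.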

Acquiring Balancing Lock

• Mongos balancer takes “balancer lock”

• Status stored in config.locks

Moving The Chunk

• mongos sends “moveChunk” command to source shard

• Source shard notifies target shard

• Target claims chunk’s shard-key range and starts pulling documents from source

Committing Migration

• When complete, target updates config server

– New chunk locations

Cleanup

• Source deletes moved data

– Waits for open cursors to close

• Lock released when old chunks deleted
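
The migration sequence (take the lock, copy, commit, clean up, release) can be walked through on an in-memory toy cluster. Everything here is an illustrative assumption: the dictionary layout, the `migrate_chunk` function, and the shard and chunk names. It leaves out networking, open cursors, and failure handling.

```python
def migrate_chunk(cluster, chunk_id, source, target):
    """Move one chunk between shards, following the order of steps above."""
    if cluster["balancer_lock"]:
        raise RuntimeError("another migration is in progress")
    cluster["balancer_lock"] = True                     # take the balancer lock
    try:
        lo, hi = cluster["ranges"][chunk_id]
        moving = [d for d in cluster["shards"][source] if lo <= d["key"] < hi]
        cluster["shards"][target].extend(moving)        # target pulls the documents
        cluster["config"][chunk_id] = target            # commit the new chunk location
        cluster["shards"][source] = [                   # source deletes the moved data
            d for d in cluster["shards"][source] if not (lo <= d["key"] < hi)
        ]
    finally:
        cluster["balancer_lock"] = False                # release the lock


cluster = {
    "balancer_lock": False,
    "ranges": {"c0": (0, 100), "c1": (100, 200)},       # chunk -> [low, high)
    "config": {"c0": "shardA", "c1": "shardA"},         # chunk -> owning shard
    "shards": {
        "shardA": [{"key": 10}, {"key": 150}, {"key": 199}],
        "shardB": [],
    },
}
migrate_chunk(cluster, "c1", "shardA", "shardB")
```

Note the ordering: the new location is committed before the source deletes anything, so the data is never without an owner.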

Querying a sharded cluster

Cluster Request Routing

• Targeted Queries

– Include shard key

– Aim for a large percentage of queries to be targeted!

• Scatter-Gather Queries

– Do not include shard key

• Note: queries include inserts/updates/deletes
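
The difference between the two routing styles can be shown with a minimal router model. The shard key name, split points, and shard names are invented, and the sketch handles only equality matches on the shard key; range predicates are not covered.

```python
from bisect import bisect_right

SHARD_KEY = "user_id"                           # hypothetical shard key
SPLIT_POINTS = [1000, 2000]                     # hypothetical chunk boundaries
CHUNK_OWNER = ["shard0", "shard1", "shard2"]    # one owner per chunk


def route(query):
    """Return the list of shards a query must be sent to."""
    if SHARD_KEY in query:
        # Targeted: the shard key value identifies exactly one chunk.
        index = bisect_right(SPLIT_POINTS, query[SHARD_KEY])
        return [CHUNK_OWNER[index]]
    # Scatter-gather: without the shard key, every shard must be asked.
    return sorted(set(CHUNK_OWNER))
```

Here `route({"user_id": 1500, "status": "active"})` touches a single shard, while `route({"status": "active"})` fans out to all three, and that fan-out grows with every shard added.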

Targeted Query Routing

Routed to appropriate shard

Results returned to mongos

Results returned to client

Non-targeted Query Routing

Routed to all shards

Results returned to mongos

Results returned to client

Shard Key

• Key and value immutable

– Choose wisely

– Key’s field(s) require index

• Considerations:

– Cardinality

– Write distribution

– Query isolation

– Data distribution
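
Write distribution is the easiest of these considerations to demonstrate. The sketch below counts how inserts land on a fixed set of range-based chunks for two kinds of key. The split points and key ranges are made up, and the model ignores later splits and migrations, so it shows only where new writes arrive.

```python
import random
from bisect import bisect_right
from collections import Counter

SPLIT_POINTS = [250, 500, 750]   # hypothetical boundaries: four chunks


def writes_per_chunk(keys):
    """Count how many inserts land in each chunk (by chunk index)."""
    return Counter(bisect_right(SPLIT_POINTS, k) for k in keys)


# Monotonically increasing key (such as a counter or timestamp): every new
# value is larger than all existing split points.
increasing = writes_per_chunk(range(1000, 2000))

# Key values spread evenly over the existing key space.
rng = random.Random(42)
spread = writes_per_chunk(rng.randrange(0, 1000) for _ in range(1000))
```

With the increasing key, all 1000 inserts hit the last chunk and therefore one shard; with the spread key, the inserts are shared among all four chunks.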

Conclusion

• Sharding is key to true scalability with Mongo

• Mechanism is automatic

• Understanding it is essential

– Choice in sharding key

– Writing efficient queries

– Don’t lock yourself in

– Monitor runtime behavior

Questions?