
Sharding morning session


Page 1: Sharding   morning session

Solutions Architect, 10gen

[email protected]

Twitter: @hungarianhc

Kevin Hanson

Sharding MongoDB

Q&A to follow the webinar… Feel free to ask questions in the Q&A box.

A recording will be available 24 hours from now. Any other issues? Contact [email protected]

Page 2: Sharding   morning session

Agenda

• What Does it Mean?

• When to Shard

• Replica Set Crash Course

• Architecture

• Configuration

• Mechanics

• Solutions

Page 3: Sharding   morning session

Traditional Scaling Up (costly)

Page 4: Sharding   morning session

Visual representation of horizontal scaling

Google, ~2000: Horizontal Scalability (scale out)

Page 5: Sharding   morning session

Data Store Scalability in 2005

• Custom Hardware
  – Oracle

• Custom Software
  – Facebook + MySQL

Page 6: Sharding   morning session

Data Store Scalability Today

• MongoDB auto-sharding available in 2009

• A data store that is
  – Publicly available
  – Open source (https://github.com/mongodb/mongo)
  – Horizontally scalable
  – Application independent

Page 7: Sharding   morning session

Why shard?

Page 8: Sharding   morning session

Throughput Exceeds I/O

Page 9: Sharding   morning session

Data Set Grows Too Large

Page 10: Sharding   morning session

Replica Sets: The Building Blocks of Sharding

Page 11: Sharding   morning session

Replica Sets

[Diagram: a Driver connected to a replica set of one Primary and two Secondaries; arrows labeled Read, Write, Read, Read]

Page 12: Sharding   morning session

Replica Sets

[Diagram: a Driver connected to a replica set of one Primary and two Secondaries; arrows labeled Read, Write, Read, Read]

• High Availability

• Read Scalability

Page 13: Sharding   morning session

Sharding Architecture

Page 14: Sharding   morning session

Data stored in shard

• Shard is a node of the cluster

• Shard can be a single mongod or a replica set

Page 15: Sharding   morning session

Config server stores metadata

• Config Server
  – Stores cluster chunk ranges and locations
  – Can have only 1 or 3 (production must have 3)
  – Two phase commit (not a replica set)

Page 16: Sharding   morning session

MongoS manages the data

• Mongos
  – Acts as a router / balancer
  – No local data (persists to config database)
  – Can have 1 or many

Page 17: Sharding   morning session

Sharding Infrastructure

Page 18: Sharding   morning session

Partition data based on ranges

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line

• Chunks are subsets of the entire range
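The line-and-segment picture above can be sketched in a few lines of Python. This is only an illustration and not how mongos is implemented: the split points and shard names are made up, and each chunk is assumed to cover a half-open range [lower, upper).

```python
from bisect import bisect_right

# Hypothetical split points on the key line; they divide (-inf, +inf)
# into len(SPLIT_POINTS) + 1 chunks, each covering [lower, upper).
SPLIT_POINTS = ["g", "n", "t"]
# Hypothetical placement of each chunk on a shard (one entry per chunk).
CHUNK_TO_SHARD = ["shard0", "shard1", "shard0", "shard2"]


def chunk_index(shard_key_value):
    """Return the index of the chunk whose range contains the value."""
    # bisect_right counts the split points <= value, which is exactly
    # the index of the half-open range [lower, upper) that holds it.
    return bisect_right(SPLIT_POINTS, shard_key_value)


def shard_for(shard_key_value):
    """Return the (hypothetical) shard that owns the value's chunk."""
    return CHUNK_TO_SHARD[chunk_index(shard_key_value)]
```

For example, `shard_for("kevin")` falls in the second chunk, the range from "g" up to but not including "n".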

Page 19: Sharding   morning session

Chunks

-∞ +∞

Page 22: Sharding   morning session

Chunks

-∞ +∞

[email protected]

[email protected]

[email protected]

Split!

This is a chunk

This is a chunk

Page 26: Sharding   morning session

Distribute data in chunks across nodes

• Initially 1 chunk

• Default max chunk size: 64 MB

• MongoDB automatically splits & migrates chunks when max reached

Page 27: Sharding   morning session

MongoDB manages data

• Queries routed to specific shards

• MongoDB balances cluster

• MongoDB migrates data to new nodes

Page 28: Sharding   morning session

MongoDB Auto-Sharding

• Transparent to Developers
  – Same interface as single mongod

• Two Steps
  – Enable Sharding for a database
  – Shard collection within database
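As a rough sketch of the two steps, the helper below builds the two admin commands as plain dictionaries. `enableSharding` and `shardCollection` are MongoDB command names; the helper itself, the database name `mydb`, the collection `users`, and the `email` shard key are placeholders chosen for illustration and do not come from the slides.

```python
def sharding_commands(database, collection, shard_key):
    """Build the two admin commands, in the order they would be run."""
    namespace = f"{database}.{collection}"
    return [
        # Step 1: enable sharding for the database.
        {"enableSharding": database},
        # Step 2: shard one collection in that database on the chosen key.
        {"shardCollection": namespace, "key": shard_key},
    ]


# Placeholder names; {"email": 1} means a ranged key on the "email" field.
commands = sharding_commands("mydb", "users", {"email": 1})
for command in commands:
    print(command)
```

With PyMongo and a client connected to a mongos, each dictionary could be sent with `client.admin.command(command)`; that call is left out so the sketch runs without a cluster.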

Page 29: Sharding   morning session

Shard Key

Page 30: Sharding   morning session

Shard Key

• Choose a field commonly used in queries

• Shard key is immutable

• Shard key values are immutable

• Shard key requires index on fields contained in key

• Uniqueness of `_id` field is only guaranteed within individual shard

• Shard key limited to 512 bytes in size

Page 31: Sharding   morning session

Shard Key Considerations

• Cardinality

• Write distribution

• Query isolation

• Data Distribution
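One way to get a feel for cardinality and data distribution is to measure them on a sample of documents before choosing a key. The sketch below is an assumption-laden illustration: the `key_stats` helper, the sample documents, and the two statistics it reports are invented here, and it expects a non-empty sample in which every document has the field.

```python
from collections import Counter


def key_stats(documents, field):
    """Summarize how a candidate shard key field spreads sample documents."""
    counts = Counter(doc[field] for doc in documents)
    total = sum(counts.values())
    most_common_count = counts.most_common(1)[0][1]
    return {
        # Cardinality: the number of distinct values of the field.
        "cardinality": len(counts),
        # Share of documents on the single most frequent value; a high
        # share means many documents that can never be split apart.
        "largest_share": most_common_count / total,
    }


# Made-up sample documents, for illustration only.
sample = [
    {"country": "US", "email": "a@example.com"},
    {"country": "US", "email": "b@example.com"},
    {"country": "US", "email": "c@example.com"},
    {"country": "HU", "email": "d@example.com"},
]
```

On this sample, `country` has low cardinality and concentrates most documents on one value, while `email` gives every document its own value.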

Page 32: Sharding   morning session

Mechanics

Page 33: Sharding   morning session

Chunk splitting

• A chunk is split once it exceeds the maximum size

• There is no split point if all documents have the same shard key

• Chunk split is a logical operation (no data is moved)

• If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
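The splitting rules above can be sketched with a chunk modeled as a sorted list of shard key values. This is a simplified illustration, not MongoDB's split algorithm: `max_docs` is a hypothetical stand-in for the maximum chunk size (which is really measured in bytes), and splitting near the median is an assumption made for the example.

```python
def split_chunk(keys, max_docs):
    """Split a chunk (a sorted list of shard key values) if it is too big.

    Returns a list holding one chunk (no split) or two chunks.
    """
    if len(keys) <= max_docs:
        return [keys]
    # Start at the median and look right for the first position where the
    # key value changes, so equal keys never straddle two chunks.
    middle = len(keys) // 2
    split_at = next(
        (i for i in range(middle, len(keys)) if keys[i] != keys[i - 1]),
        None,
    )
    if split_at is None:
        # Nothing to the right of the median: look left as well.
        split_at = next(
            (i for i in range(middle - 1, 0, -1) if keys[i] != keys[i - 1]),
            None,
        )
    if split_at is None:
        # Every document has the same shard key: there is no split point.
        return [keys]
    # The split only redraws range boundaries; no documents are moved.
    return [keys[:split_at], keys[split_at:]]
```

A chunk whose documents all share one shard key value comes back unsplit however large it is, which is why a low-cardinality key can leave oversized chunks behind.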

Page 34: Sharding   morning session

Chunk Moving

• Balancer is running on mongos

• Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
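A balancing round can be sketched as follows. The function is hypothetical and simplified: the threshold is passed in rather than taken from MongoDB, and the stopping rule (move one chunk at a time until the two extremes are within one chunk of each other) is an assumption made for the example.

```python
def balancing_round(chunk_counts, migration_threshold):
    """Return the (source, destination) chunk moves for one round.

    chunk_counts maps shard name -> number of chunks; it is not modified.
    """
    counts = dict(chunk_counts)
    moves = []
    most = max(counts, key=counts.get)
    least = min(counts, key=counts.get)
    # A round starts only when the gap is above the threshold.
    if counts[most] - counts[least] <= migration_threshold:
        return moves
    # Assumed stopping rule: keep moving single chunks from the fullest
    # shard to the emptiest until they differ by at most one chunk.
    while counts[most] - counts[least] > 1:
        counts[most] -= 1
        counts[least] += 1
        moves.append((most, least))
        most = max(counts, key=counts.get)
        least = min(counts, key=counts.get)
    return moves
```

For instance, with counts of 10, 2, and 6 chunks and a threshold of 4, the sketch moves four chunks from the fullest shard to the emptiest and ends with 6 chunks on each.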

Page 35: Sharding   morning session

Cluster Request Routing

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort
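The three cases can be told apart by whether the query names the shard key and whether it asks for a sort. The classifier below is a deliberately simplified sketch: `classify` is a made-up helper, queries are plain dictionaries, and only the presence of the shard key field in the query is checked.

```python
def classify(query, shard_key_field, sort=None):
    """Name the routing case for a query given as a plain dictionary."""
    if shard_key_field in query:
        # The router can work out which shard owns the matching range.
        return "targeted"
    # Without the shard key, every shard has to be asked.
    if sort:
        return "scatter gather with sort"
    return "scatter gather"


# Hypothetical queries against a collection sharded on "email".
print(classify({"email": "someone@example.com"}, "email"))
print(classify({"age": 30}, "email"))
print(classify({"age": 30}, "email", sort=[("age", 1)]))
```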

Page 36: Sharding   morning session

Cluster Request Routing: Targeted Query

Page 37: Sharding   morning session

Routable request received

Page 38: Sharding   morning session

Request routed to appropriate shard

Page 39: Sharding   morning session

Shard returns results

Page 40: Sharding   morning session

Mongos returns results to client

Page 41: Sharding   morning session

Cluster Request Routing: Non-Targeted Query

Page 42: Sharding   morning session

Non-Targeted Request Received

Page 43: Sharding   morning session

Request sent to all shards

Page 44: Sharding   morning session

Shards return results to mongos

Page 45: Sharding   morning session

Mongos returns results to client

Page 46: Sharding   morning session

Cluster Request Routing: Non-Targeted Query with Sort

Page 47: Sharding   morning session

Non-Targeted request with sort received

Page 48: Sharding   morning session

Request sent to all shards

Page 49: Sharding   morning session

Query and sort performed locally

Page 50: Sharding   morning session

Shards return results to mongos

Page 51: Sharding   morning session

Mongos merges sorted results

Page 52: Sharding   morning session

Mongos returns results to client
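The sort case walked through above (query and sort locally on each shard, then merge on the router) can be imitated in memory. In this sketch each shard is just a list of dictionaries, and `scatter_gather_sorted` and its parameters are invented for illustration; `heapq.merge` plays the part of the merge step.

```python
import heapq


def scatter_gather_sorted(shards, predicate, sort_field):
    """Filter and sort on every shard, then merge the sorted results."""
    # Each shard runs the query and the sort on its own documents.
    per_shard = [
        sorted((doc for doc in docs if predicate(doc)),
               key=lambda doc: doc[sort_field])
        for docs in shards
    ]
    # The router merges the already sorted streams; it does not re-sort.
    return list(heapq.merge(*per_shard, key=lambda doc: doc[sort_field]))


# Made-up documents spread over three shards.
shards = [
    [{"n": 5}, {"n": 1}],
    [{"n": 3}, {"n": 8}],
    [{"n": 2}],
]
result = scatter_gather_sorted(shards, lambda doc: doc["n"] != 8, "n")
```

Because each shard returns its results already sorted, the merge only ever compares the next document from each shard.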

Page 53: Sharding   morning session

[Diagram: Queries arrive at three MongoS routers; four shards, each a replica set of one Primary and two Secondaries, each hold Ranges of Chunks; three Config servers]

Page 54: Sharding   morning session

[Diagram: Queries arrive at three MongoS routers; four shards, each a replica set of one Primary and two Secondaries, each hold Ranges of Chunks; three Config servers; location labels Chicago, Beijing, London]

Page 55: Sharding   morning session

Kevin Hanson

Questions?

Solutions Architect, 10gen

[email protected]

Twitter: @hungarianhc