68
Kernel Engineer, 10gen Shaun Verch #MongoDBDays Introduction to Sharding

Basic Sharding in MongoDB presented by Shaun Verch

  • Upload
    mongodb

  • View
    1.801

  • Download
    6

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Basic Sharding in MongoDB presented by Shaun Verch

Kernel Engineer, 10gen

Shaun Verch

#MongoDBDays

Introduction to Sharding

Page 2: Basic Sharding in MongoDB presented by Shaun Verch

Enable Sharding?

> sh.enableSharding("production")

Mon Jun 17 13:45:39.929 not connected to a mongos at src/mongo/shell/utils_sh.js:6

>

Page 3: Basic Sharding in MongoDB presented by Shaun Verch

Agenda

• Scaling Data

• MongoDB's Approach

• Architecture

• Configuration

• Mechanics

Page 4: Basic Sharding in MongoDB presented by Shaun Verch

Scaling Data

Page 5: Basic Sharding in MongoDB presented by Shaun Verch

Do you need to shard?

• Hardware– Disks– Memory– Network– CPU

• Application– Access patterns– Indexing– Schema design

Page 6: Basic Sharding in MongoDB presented by Shaun Verch

Read/Write Throughput Exceeds I/O

Page 7: Basic Sharding in MongoDB presented by Shaun Verch

Working Set Exceeds Physical Memory

Working SetIndexes

Data

Working SetIndexes Data

Page 8: Basic Sharding in MongoDB presented by Shaun Verch

Vertical Scalability (Scale Up)

Page 9: Basic Sharding in MongoDB presented by Shaun Verch

Horizontal Scalability (Scale Out)

Page 10: Basic Sharding in MongoDB presented by Shaun Verch

Data Store Scalability

• Custom Hardware– Oracle

• Custom Software– Facebook + MySQL– Google

Page 11: Basic Sharding in MongoDB presented by Shaun Verch

Data Store Scalability Today

• MongoDB Auto-Sharding

• A data store that is– Free– Publicly available– Open Source

(https://github.com/mongodb/mongo)– Horizontally scalable– Application independent

Page 12: Basic Sharding in MongoDB presented by Shaun Verch

MongoDB's Approach to Sharding

Page 13: Basic Sharding in MongoDB presented by Shaun Verch

Partitioning

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line

Page 14: Basic Sharding in MongoDB presented by Shaun Verch

Shards and Shard Keys

Shard

Shard key range

Page 15: Basic Sharding in MongoDB presented by Shaun Verch

Data Distribution

• Initially 1 chunk

• Default max chunk size: 64mb

• MongoDB automatically splits & migrates chunks when max reached

Page 16: Basic Sharding in MongoDB presented by Shaun Verch

Routing and Balancing

• Queries routed to specific shards

• MongoDB balances cluster

• MongoDB migrates data to new nodes

Page 17: Basic Sharding in MongoDB presented by Shaun Verch

MongoDB Auto-Sharding

• Minimal effort required– Same interface as single mongod

• Two steps– Enable Sharding for a database– Shard collection within database

Page 18: Basic Sharding in MongoDB presented by Shaun Verch

Architecture

Page 19: Basic Sharding in MongoDB presented by Shaun Verch

What is a Shard?

• Shard is a node of the cluster

• Shard can be a single mongod or a replica set

Page 20: Basic Sharding in MongoDB presented by Shaun Verch

• Config Server– Stores cluster chunk ranges and locations– Can have only 1 or 3 (production must have

3)– Not a replica set

Meta Data Storage

Page 21: Basic Sharding in MongoDB presented by Shaun Verch

Routing and Managing Data

• Mongos– Acts as a router / balancer– No local data (persists to config database)– Can have 1 or many

Page 22: Basic Sharding in MongoDB presented by Shaun Verch

Sharding infrastructure

Page 23: Basic Sharding in MongoDB presented by Shaun Verch

Configuration

Page 24: Basic Sharding in MongoDB presented by Shaun Verch

Example Cluster

• Don’t use this setup in production!- Only one Config server (No Fault Tolerance)- Shard not in a replica set (Low Availability)- Only one mongos and shard (No Performance Improvement)- Useful for development or demonstrating configuration

mechanics

Page 25: Basic Sharding in MongoDB presented by Shaun Verch

Starting the Configuration Server

• mongod --configsvr• Starts a configuration server on the default port

(27019)

Page 26: Basic Sharding in MongoDB presented by Shaun Verch

Start the mongos Router

• mongos --configdb <hostname>:27019• For 3 configuration servers:

mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>

• This is always how to start a new mongos, even if the cluster is already running

Page 27: Basic Sharding in MongoDB presented by Shaun Verch

Start the shard database

• mongod --shardsvr• Starts a mongod with the default shard port (27018)• Shard is not yet connected to the rest of the cluster• Shard may have already been running in production

Page 28: Basic Sharding in MongoDB presented by Shaun Verch

Add the Shard

• On mongos: - sh.addShard(‘<host>:27018’)

• Adding a replica set: - sh.addShard(‘<rsname>/<seedlist>’)

Page 29: Basic Sharding in MongoDB presented by Shaun Verch

Verify that the shard was added

• db.runCommand({ listshards:1 }) { "shards" : [{"_id”: "shard0000”,"host”: ”<hostname>:27018” } ],

"ok" : 1 }

Page 30: Basic Sharding in MongoDB presented by Shaun Verch

Enabling Sharding

• Enable sharding on a database

sh.enableSharding(“<dbname>”)

• Shard a collection with the given key

sh.shardCollection(“<dbname>.people”,{“country”:1})

• Use a compound shard key to prevent duplicates

sh.shardCollection(“<dbname>.cars”,{“year”:1, ”uniqueid”:1})

Page 31: Basic Sharding in MongoDB presented by Shaun Verch

Tag Aware Sharding

• Tag aware sharding allows you to control the distribution of your data

• Tag a range of shard keys– sh.addTagRange(<collection>,<min>,<max>,<t

ag>)

• Tag a shard– sh.addShardTag(<shard>,<tag>)

Page 32: Basic Sharding in MongoDB presented by Shaun Verch

Mechanics

Page 33: Basic Sharding in MongoDB presented by Shaun Verch

Partitioning

• Remember it's based on ranges

Page 34: Basic Sharding in MongoDB presented by Shaun Verch

Chunk is a section of the entire range

Page 35: Basic Sharding in MongoDB presented by Shaun Verch

Chunk splitting

• A chunk is split once it exceeds the maximum size• There is no split point if all documents have the same

shard key• Chunk split is a logical operation (no data is moved)

Page 36: Basic Sharding in MongoDB presented by Shaun Verch

Balancing

• Balancer is running on mongos• Once the difference in chunks between the most

dense shard and the least dense shard is above the migration threshold, a balancing round starts

Page 37: Basic Sharding in MongoDB presented by Shaun Verch

Acquiring the Balancer Lock

• The balancer on mongos takes out a “balancer lock”• To see the status of these locks:

use configdb.locks.find({ _id: “balancer” })

Page 38: Basic Sharding in MongoDB presented by Shaun Verch

Moving the chunk

• The mongos sends a moveChunk command to source shard

• The source shard then notifies destination shard• Destination shard starts pulling documents from

source shard

Page 39: Basic Sharding in MongoDB presented by Shaun Verch

Committing Migration

• When complete, destination shard updates config server- Provides new locations of the chunks

Page 40: Basic Sharding in MongoDB presented by Shaun Verch

Cleanup

• Source shard deletes moved data- Must wait for open cursors to either close or time out- NoTimeout cursors may prevent the release of the lock

• The mongos releases the balancer lock after old chunks are deleted

Page 41: Basic Sharding in MongoDB presented by Shaun Verch

Routing Requests

Page 42: Basic Sharding in MongoDB presented by Shaun Verch

Cluster Request Routing

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort

Page 43: Basic Sharding in MongoDB presented by Shaun Verch

Cluster Request Routing: Targeted Query

Page 44: Basic Sharding in MongoDB presented by Shaun Verch

Routable request received

Page 45: Basic Sharding in MongoDB presented by Shaun Verch

Request routed to appropriate shard

Page 46: Basic Sharding in MongoDB presented by Shaun Verch

Shard returns results

Page 47: Basic Sharding in MongoDB presented by Shaun Verch

Mongos returns results to client

Page 48: Basic Sharding in MongoDB presented by Shaun Verch

Cluster Request Routing: Non-Targeted Query

Page 49: Basic Sharding in MongoDB presented by Shaun Verch

Non-Targeted Request Received

Page 50: Basic Sharding in MongoDB presented by Shaun Verch

Request sent to all shards

Page 51: Basic Sharding in MongoDB presented by Shaun Verch

Shards return results to mongos

Page 52: Basic Sharding in MongoDB presented by Shaun Verch

Mongos returns results to client

Page 53: Basic Sharding in MongoDB presented by Shaun Verch

Cluster Request Routing: Non-Targeted Query with Sort

Page 54: Basic Sharding in MongoDB presented by Shaun Verch

Non-Targeted request with sort received

Page 55: Basic Sharding in MongoDB presented by Shaun Verch

Request sent to all shards

Page 56: Basic Sharding in MongoDB presented by Shaun Verch

Query and sort performed locally

Page 57: Basic Sharding in MongoDB presented by Shaun Verch

Shards return results to mongos

Page 58: Basic Sharding in MongoDB presented by Shaun Verch

Mongos merges sorted results

Page 59: Basic Sharding in MongoDB presented by Shaun Verch

Mongos returns results to client

Page 60: Basic Sharding in MongoDB presented by Shaun Verch

Shard Key

Page 61: Basic Sharding in MongoDB presented by Shaun Verch

What is a Shard Key

• Shard key is used to partition your collection

• Shard key must exist in every document

• Shard key is immutable

• Shard key values are immutable

• Shard key must be indexed

• Shard key is used to route requests to shards

Page 62: Basic Sharding in MongoDB presented by Shaun Verch

Shard Key Considerations

• Cardinality

• Write Distribution

• Query Isolation

• Reliability

• Index Locality

Page 63: Basic Sharding in MongoDB presented by Shaun Verch

Conclusion

Page 64: Basic Sharding in MongoDB presented by Shaun Verch

Read/Write Throughput Exceeds I/O

Page 65: Basic Sharding in MongoDB presented by Shaun Verch

Working Set Exceeds Physical Memory

Working SetIndexes

Data

Working SetIndexes Data

Page 66: Basic Sharding in MongoDB presented by Shaun Verch

Sharding Enables Scale

• MongoDB’s Auto-Sharding– Easy to Configure– Consistent Interface– Free and Open Source

Page 67: Basic Sharding in MongoDB presented by Shaun Verch

• What’s next?– Advanced Sharding Features in MongoDB 2.4 -

Jeremy Mikola– MongoDB User Groups

• Resourceshttps://education.10gen.com/http://www.10gen.com/presentations

Page 68: Basic Sharding in MongoDB presented by Shaun Verch

Kernel Engineer, 10gen

Shaun Verch

#MongoDBDays

Thank You