Introduction To Sharding J. Randall Hunt Hackathoner, MongoDB @jrhunt, [email protected] #MongoDBDays Chicago

Sharding in MongoDB Days 2013

Sharding presentation used at the MongoDB Days 2013 conferences in North America: Seattle, Chicago,

Page 1: Sharding in MongoDB Days 2013

Introduction To Sharding
J. Randall Hunt
Hackathoner, MongoDB
@jrhunt, [email protected]

#MongoDBDays Chicago

Page 2: Sharding in MongoDB Days 2013

In Today's Talk

• What? Why? When?

• How?

• What's happening behind the scenes?

Page 3: Sharding in MongoDB Days 2013

What Is Sharding?

Page 4: Sharding in MongoDB Days 2013

This is a picture of my cat.

Page 5: Sharding in MongoDB Days 2013

This is a picture of ~100 cats.

http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg

Page 6: Sharding in MongoDB Days 2013

This is a cat trying to find a home

webserver mongod

Page 7: Sharding in MongoDB Days 2013

100 cats trying to find a home.

webserver mongod

(not to scale)

Page 8: Sharding in MongoDB Days 2013

Scale Up?

Page 9: Sharding in MongoDB Days 2013
Page 10: Sharding in MongoDB Days 2013

Data Store Scalability

• Custom Hardware

• Custom Software

In the past you've had two options for achieving data store scalability: 1) custom hardware (Oracle?) and 2) custom software (Google, Facebook).

These things were custom because the problems were not yet common enough. The number of people on the internet 10 years ago was incredibly small compared to the number of people who will be using web services 10 years from now.

Page 11: Sharding in MongoDB Days 2013

Scale Out?

Page 12: Sharding in MongoDB Days 2013

Scale Out?

Page 13: Sharding in MongoDB Days 2013

The MongoDB Sharding Solution

• Automatically partition your data

• Worry about failover at the partition layer

• Application independent

• Free and open source

Page 14: Sharding in MongoDB Days 2013

Why Do I Shard?

Page 15: Sharding in MongoDB Days 2013

Input/Output

Your input/output exceeds the capacity of a single node or replica set.

this is not easy to do!

Page 16: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM

Page 17: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM Data

Page 18: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM Data Indexes

Page 19: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM Data Indexes Sorts

Page 20: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM Data Indexes Sorts Aggregations

Page 21: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

RAM Data Indexes Sorts Aggregations

Page 22: Sharding in MongoDB Days 2013

Working Set Exceeds Physical Memory

Page 23: Sharding in MongoDB Days 2013

How Does Sharding Work?

Page 24: Sharding in MongoDB Days 2013

MongoDB's Sharding Infrastructure

Page 25: Sharding in MongoDB Days 2013

mongod

MongoDB's Sharding Infrastructure

app server

Page 26: Sharding in MongoDB Days 2013

mongod mongod mongod

MongoDB's Sharding Infrastructure

app server

Page 27: Sharding in MongoDB Days 2013

shard

MongoDB's Sharding Infrastructure

app server

Page 28: Sharding in MongoDB Days 2013

shard

MongoDB's Sharding Infrastructure

app server

Page 29: Sharding in MongoDB Days 2013

shard

MongoDB's Sharding Infrastructure

app server

mongos

Page 30: Sharding in MongoDB Days 2013

shard

MongoDB's Sharding Infrastructure

app server

mongos

mongod --configsvr

Page 31: Sharding in MongoDB Days 2013

shard

MongoDB's Sharding Infrastructure

app server

mongos

mongod --configsvr

Page 32: Sharding in MongoDB Days 2013

Terminology

• Shards

• Chunks

• Config Servers

• mongos

A shard is a server, or a collection of servers, that holds chunks of data split up according to a shard key. Each shard holds a subset of a collection's data.

A chunk is a group of documents whose shard key values fall in a particular range. Chunks can be moved logically from server to server.

Config servers hold information about where chunks live.

mongos is the router and balancer: it communicates with the config servers and figures out how to intelligently direct your query.

Page 33: Sharding in MongoDB Days 2013

What exactly is a shard?

• Shard is a node of the cluster

• Can be a single mongod or an entire replica set

Shard

Primary

Secondary

Secondary

Shard

or Mongod

Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.

Page 34: Sharding in MongoDB Days 2013

Partitioning

• User defines a shard key or uses hash based sharding

• Shard key defines a range of data

• The key space is like points on a line

• A range is a segment of that line

-∞ +∞

Key Space

Remember interval notation?
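The key-space picture can be sketched in a few lines of plain JavaScript. This is only an illustration, not how MongoDB stores its metadata: the `chunks` table, the boundary values, and the `findChunk` helper are made up for the example. Each chunk owns a half-open interval [min, max), with `-Infinity` and `Infinity` standing in for the two ends of the line.

```javascript
// Hypothetical chunk table for a numeric shard key.
// Each chunk owns the half-open interval [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0000" },
];

// Return the chunk whose range contains the given key.
function findChunk(table, key) {
  return table.find((c) => key >= c.min && key < c.max);
}

console.log(findChunk(chunks, 42).shard);  // shard0000
console.log(findChunk(chunks, 100).shard); // shard0001 (min is inclusive)
```

Because the intervals touch but never overlap, every possible key value belongs to exactly one chunk.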

Page 35: Sharding in MongoDB Days 2013

Data Distribution

Initially a single chunk

Default Max Chunk Size: 64 MB

MongoDB will automatically split and migrate chunks as they reach the max size

Config Server Shard 1

Mongos Mongos Mongos

Shard 2

Mongod
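A toy model of the split step, assuming chunk size is counted in documents rather than bytes and that the split point is the median key. Both are simplifications for the example; the slide only says that chunks are split when they reach the max size. `splitIfNeeded` and `maxDocs` are hypothetical names, and the keys are assumed to be sorted and distinct.

```javascript
// Toy model: a chunk holds a sorted list of distinct keys and is "too big"
// when it has more than maxDocs keys. Real chunks are measured in bytes.
function splitIfNeeded(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk];
  const mid = Math.floor(chunk.keys.length / 2);
  const splitKey = chunk.keys[mid];
  // The split key becomes the exclusive max of the left chunk
  // and the inclusive min of the right chunk.
  return [
    { min: chunk.min, max: splitKey, keys: chunk.keys.slice(0, mid) },
    { min: splitKey, max: chunk.max, keys: chunk.keys.slice(mid) },
  ];
}

// Initially a single chunk covering the whole key space.
const initial = { min: -Infinity, max: Infinity, keys: [1, 3, 5, 7, 9, 11] };
const parts = splitIfNeeded(initial, 4);
console.log(parts.map((p) => [p.min, p.max]));
```

After a split, the two halves are ordinary chunks, so either of them can be migrated to another shard.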

Page 36: Sharding in MongoDB Days 2013

Shards and Shard Keys

Page 37: Sharding in MongoDB Days 2013

Shards and Shard Keys

Chunks!

Page 38: Sharding in MongoDB Days 2013

Shards and Shard Keys

Chunks!

Shard Keys!

Page 39: Sharding in MongoDB Days 2013

What is a config server?

• A config server is for storing shard meta-data

• It stores chunk ranges and locations

• Run with 3 in production!

Config Server

or

Config Server Config Server Config Server

This is not a replica set; the three servers are purely for failover purposes.

Pro tip: use CNAMEs to identify these.

Page 40: Sharding in MongoDB Days 2013

What is a mongos?

• Acts as a router / balancer for queries and ops

• No local data (persists all info to the config servers)

• Can run with just one or many

App Server

Mongos Mongos

App Server App Server App Server

Mongos

or

Page 41: Sharding in MongoDB Days 2013

MongoDB's Sharding Infrastructure

Config Server Config Server Config Server

Shard Shard Shard

Mongos

App Server

Mongos

App Server

Mongos

App Server

Page 42: Sharding in MongoDB Days 2013

Get Started With Sharding?

1. Choose a shard key (we'll talk about this later)

2. Start config servers

3. Turn on sharding

4. Profit.

Page 43: Sharding in MongoDB Days 2013

Mechanics of Sharding

Oh hey there devops!

Page 44: Sharding in MongoDB Days 2013

Start the Configuration Server

mongod --configsvr

Starts a configuration server on the default port (27019)

Config Server

Page 45: Sharding in MongoDB Days 2013

Start the mongos router

mongos --configdb catconf.mongodb.com:27019

Config Server

Mongos

Page 46: Sharding in MongoDB Days 2013

Start the mongod

mongod --shardsvr

Starts a mongod with the default shard port (27018).

Shard is not yet connected to the rest of the cluster.

Could have already been a part of the cluster.

Config Server

Mongos

Mongod

Shard

Page 47: Sharding in MongoDB Days 2013

Add the Shard

On mongos:

sh.addShard('cat1.mongodb.com:27018')

For a replica set:

sh.addShard('<rsname>/<seedlist>')

Config Server

Mongos

Mongod

Shard
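For the replica set form, the argument is the replica set name, a slash, and a comma-separated list of host:port seeds. A small sketch of building that string follows; the set name `catsRS`, the second host name, and the `shardConnectionString` helper are placeholders invented for the example.

```javascript
// Build the "<rsname>/<seedlist>" string used by sh.addShard()
// for a replica set. The names below are placeholders.
function shardConnectionString(rsName, hosts) {
  return rsName + "/" + hosts.join(",");
}

const seed = shardConnectionString("catsRS", [
  "cat1.mongodb.com:27018",
  "cat2.mongodb.com:27018",
]);
console.log(seed); // catsRS/cat1.mongodb.com:27018,cat2.mongodb.com:27018

// In a mongo shell connected to a mongos, this string would be passed as:
//   sh.addShard(seed)
```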

Page 48: Sharding in MongoDB Days 2013

Check that everything is working!

[mongos] admin> db.runCommand({ listshards: 1 })
{
  "shards": [
    { "_id": "shard0000", "host": "cat1.mongodb.com:27018" }
  ],
  "ok": 1
}

Config Server

Mongos

Mongod

Shard

Page 49: Sharding in MongoDB Days 2013

Now enable sharding

• Enable sharding on a database: sh.enableSharding("<dbname>")

• Shard a collection (with a key): sh.shardCollection("<dbname>.cats", {"name": 1})

• Use a compound shard key to prevent duplicates: sh.shardCollection("<dbname>.cats", {"name": 1, "uniqueid": 1})

Page 50: Sharding in MongoDB Days 2013

Tag Aware Sharding

• Total control over the distribution of your data!

• Tag a range of shard keys: sh.addTagRange(<collection>,<min>,<max>,<tag>)

• Tag a shard: sh.addShardTag("shard0000","NYC")
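A rough model of how tags constrain placement, in plain JavaScript. Everything here is hypothetical: the second shard, the "SFO" tag, the numeric key ranges, and the `shardsForKey` helper are made up to illustrate the idea. The sketch assumes tag ranges are half-open [min, max) and that keys outside any tagged range may live on any shard.

```javascript
// Hypothetical tags, mirroring calls like sh.addShardTag("shard0000", "NYC").
const shardTags = {
  shard0000: ["NYC"],
  shard0001: ["SFO"],
};

// Hypothetical tag ranges on a numeric shard key, each [min, max),
// mirroring sh.addTagRange(<collection>, <min>, <max>, <tag>).
const tagRanges = [
  { min: 10000, max: 20000, tag: "NYC" },
  { min: 90000, max: 100000, tag: "SFO" },
];

// Which shards are allowed to hold a document with this shard key value?
function shardsForKey(key) {
  const range = tagRanges.find((r) => key >= r.min && key < r.max);
  const all = Object.keys(shardTags);
  if (!range) return all; // untagged keys are not restricted
  return all.filter((s) => shardTags[s].includes(range.tag));
}

console.log(shardsForKey(10001)); // [ 'shard0000' ]
console.log(shardsForKey(94110)); // [ 'shard0001' ]
console.log(shardsForKey(50000)); // [ 'shard0000', 'shard0001' ]
```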

Page 51: Sharding in MongoDB Days 2013

The Balancer

• Ensures even distribution of chunks across the cluster

• Transparent to driver and application

• Very tuneable but defaults are often sensible

try to minimize clock skew with ntpd
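The balancer's goal can be pictured with a toy loop: keep moving one chunk from the fullest shard to the emptiest one until the gap is small. This is a sketch of the idea only, not MongoDB's actual algorithm; the `balance` function and the threshold of 2 are assumptions for the example.

```javascript
// Toy balancer: repeatedly move one chunk from the shard with the most
// chunks to the shard with the fewest, until the gap is below a threshold.
// The threshold is an assumption; it must be at least 2 or the loop
// could move a chunk back and forth forever.
function balance(chunkCounts, threshold) {
  if (threshold < 2) throw new Error("threshold must be >= 2");
  const counts = { ...chunkCounts };
  const moves = [];
  while (true) {
    const names = Object.keys(counts);
    const most = names.reduce((a, b) => (counts[a] >= counts[b] ? a : b));
    const least = names.reduce((a, b) => (counts[a] <= counts[b] ? a : b));
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    moves.push({ from: most, to: least });
  }
  return { counts, moves };
}

const result = balance({ shard0000: 10, shard0001: 2, shard0002: 0 }, 2);
console.log(result.counts);       // { shard0000: 4, shard0001: 4, shard0002: 4 }
console.log(result.moves.length); // 6
```

None of this is visible to the application: the driver keeps talking to mongos while chunks move underneath.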

Page 52: Sharding in MongoDB Days 2013

Routing Requests

(Oh hi there application developers!)

Page 53: Sharding in MongoDB Days 2013

Cluster Request Routing

Scatter Gather

Targeted

Choose your own adventure!

Page 54: Sharding in MongoDB Days 2013

Targeted Query

Shard Shard Shard

Mongos

Page 55: Sharding in MongoDB Days 2013

Routable request received

Shard Shard Shard

Mongos

1

Page 56: Sharding in MongoDB Days 2013

Request routed to appropriate shard

Shard Shard Shard

Mongos

1

2

Page 57: Sharding in MongoDB Days 2013

Shard returns results

Shard Shard Shard

Mongos

1

2

3

Page 58: Sharding in MongoDB Days 2013

mongos returns results to client

Shard Shard Shard

Mongos

1

2

3

4

Page 59: Sharding in MongoDB Days 2013

Non-targeted queries

Shard Shard Shard

Mongos

Page 60: Sharding in MongoDB Days 2013

request received

Shard Shard Shard

Mongos

1

Page 61: Sharding in MongoDB Days 2013

Farm request out to all shards

Shard Shard Shard

Mongos

1

2 2 2

Page 62: Sharding in MongoDB Days 2013

shards return results to mongos

Shard Shard Shard

Mongos

1

2 2 2

3 3 3

Page 63: Sharding in MongoDB Days 2013

mongos returns results to client

Shard Shard Shard

Mongos

1

2 2 2

3 3 3

4
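The two routing paths can be sketched as a single decision: if the query contains the shard key, send it to the one shard that owns that key; otherwise send it to every shard and merge the results. The routing table, the `color` field, and the `route` function below are hypothetical and only handle simple equality queries.

```javascript
// Hypothetical routing table for a collection sharded on "name".
// String ranges are [min, max); "" and "\uffff" stand in for -∞ and +∞.
const SHARD_KEY = "name";
const routingTable = [
  { min: "", max: "h", shard: "shard0000" },
  { min: "h", max: "q", shard: "shard0001" },
  { min: "q", max: "\uffff", shard: "shard0002" },
];

// Decide which shards a simple equality query must visit.
function route(query) {
  if (SHARD_KEY in query) {
    const key = query[SHARD_KEY];
    const chunk = routingTable.find((c) => key >= c.min && key < c.max);
    return { type: "targeted", shards: [chunk.shard] };
  }
  // No shard key in the query: every shard has to be asked.
  const all = [...new Set(routingTable.map((c) => c.shard))];
  return { type: "scatter-gather", shards: all };
}

console.log(route({ name: "mittens" })); // targeted, shard0001 only
console.log(route({ color: "orange" })); // scatter-gather, all three shards
```

This is why the next section suggests picking a shard key that appears in your queries: targeted requests touch one shard, while scatter-gather requests cost every shard some work.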

Page 64: Sharding in MongoDB Days 2013

Choosing A Shard Key

Page 65: Sharding in MongoDB Days 2013

Things to remember!

• Shard Key is immutable

• Shard key values are immutable

• Shard key must be indexed

• It is limited to 512 bytes in size

• Try to choose a field used in queries

• Only the shard key can be guaranteed unique across shards

should not be monotonically increasing!
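A small simulation of why a monotonically increasing key hurts. With range-based chunks, every new key is larger than all existing ones, so every insert lands in the chunk that ends at +∞. Hashing the key first spreads the same inserts around. The chunk boundaries, the three-shard layout, and the multiplicative hash are all assumptions for the example; the hash is not the function MongoDB uses for hash-based sharding.

```javascript
const NUM_SHARDS = 3;

// Range-based placement with hypothetical boundaries from older data:
// chunk 0 is (-∞, 1000), chunk 1 is [1000, 2000), chunk 2 is [2000, +∞).
const bounds = [1000, 2000];
function rangeShard(key) {
  let i = 0;
  while (i < bounds.length && key >= bounds[i]) i++;
  return i;
}

// Hash-based placement: a simple multiplicative hash (illustrative only)
// mapped onto equal slices of the 32-bit hash space.
function hashedShard(key) {
  const h = Math.imul(key, 2654435761) >>> 0;
  return Math.floor((h / 2 ** 32) * NUM_SHARDS);
}

// Insert n consecutive, ever-increasing keys and count where they land.
function countInserts(place, firstKey, n) {
  const counts = new Array(NUM_SHARDS).fill(0);
  for (let k = firstKey; k < firstKey + n; k++) counts[place(k)]++;
  return counts;
}

const rangeCounts = countInserts(rangeShard, 3000, 3000);
const hashedCounts = countInserts(hashedShard, 3000, 3000);
console.log("range :", rangeCounts); // every insert hits the last chunk
console.log("hashed:", hashedCounts); // inserts spread over all three
```

Splits and migrations will eventually move the older data elsewhere, but at any moment all new writes still go to a single shard.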

Page 66: Sharding in MongoDB Days 2013

How to choose your key?

• Cardinality

• Write Distribution

• Query Isolation

• Reliability

• Index Locality

Cardinality: can your data be broken down enough?
Query isolation: query targeting to a specific shard.
Reliability: shard outages!

A good shard key can:
• Optimize routing
• Minimize (unnecessary) traffic
• Allow best scaling

Consider pre-splitting. No unique indexes unless they are part of the shard key.

Geo keys cannot be part of a shard key. $near won't work, but the $geo commands work fine.
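Cardinality is easy to check on a sample of your data. All documents that share one shard key value have to stay in the same chunk, so the number of distinct values is an upper bound on the number of chunks. The documents, field names, and `maxChunks` helper below are made up for the illustration.

```javascript
// Hypothetical documents; field names are made up for the example.
const cats = [
  { name: "mittens", color: "black" },
  { name: "tiger", color: "orange" },
  { name: "felix", color: "black" },
  { name: "smokey", color: "grey" },
  { name: "ginger", color: "orange" },
];

// Documents with the same shard key value cannot be split apart,
// so the count of distinct values caps the number of chunks.
function maxChunks(docs, field) {
  return new Set(docs.map((d) => d[field])).size;
}

console.log(maxChunks(cats, "color")); // 3
console.log(maxChunks(cats, "name"));  // 5
```

A key like `color` would stop scaling after three chunks no matter how many cats are added; `name` keeps gaining distinct values as the data grows.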

Page 67: Sharding in MongoDB Days 2013

Thanks!

• What's Next?

• Resources:
https://education.mongodb.com/
https://www.mongodb.com/presentations

• Me: @jrhunt, [email protected]

In summary -- and this is not a sales pitch... lots of other databases out there have sharding and replication... not many of them provide the granularity of control that you need for your applications while maintaining sensible defaults.