Sharding with MongoDB -- MongoNYC 2012



Tyler Brock, tyler@10gen.com, @TylerBrock

Philosophy

Concepts

Architecture

Mechanics

Philosophy

MongoDB is a database for developers.

Build. Scale.

How to Draw an Owl

Draw Two Circles:

> db.runCommand({ enablesharding: "<dbname>" })

> db.runCommand({ shardcollection: "<namespace>", key: <shardkeypatternobject> })

Concepts

[Diagram: Simple Web Application. An app sends reads and writes to a single datastore.]

What happens when your working set exceeds memory?

What happens if your write load is enormous?

[Diagram: Vertical Scaling. More app servers share one datastore that keeps moving to bigger hardware: first 68 GB RAM with RAID10 on EBS, then 128 GB RAM with RAID10 on SSD.]

[Diagram: Horizontal Scaling. A 60gb datastore is split across three datastores of 20gb each. The app reaches them through Routing Logic backed by metadata. As the data grows, a Balancer keeps it evenly spread, for example 30gb on each of the three datastores.]

Architecture

Shard
Really just a mongod (or a replica set).
Where your data lives.

Config Server
A mongod started with the --configsvr option.
Must have 3 (or 1 in development).
Data is committed using two-phase commit.

mongos
Acts as a shard router / proxy.
One, or as many as you want.
Lightweight: can run on app servers.
Caches metadata from the config servers.

[Diagram: The Routing Logic, Balancing and metadata from the Horizontal Scaling picture become mongos and the config servers, and each datastore becomes a mongod.]

[Diagram: The full cluster. The app talks to mongos, which reads metadata from three config servers and routes requests to three shards, each a replica set (RS) of three mongods.]
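The config servers' commit step can be pictured as a textbook two-phase commit. The sketch below is a minimal in-memory model under that assumption, not MongoDB's actual config-server protocol; the class ConfigServer and the function twoPhaseCommit are hypothetical names. Every server must vote yes in the prepare phase before any of them applies the change, so either all three see a metadata update or none do.

```javascript
// In-memory stand-in for one config server (hypothetical model).
class ConfigServer {
  constructor(name) {
    this.name = name;
    this.healthy = true; // flip to false to simulate an unreachable server
    this.data = {};      // committed metadata
    this.staged = null;  // change accepted in phase 1, not yet visible
  }

  // Phase 1: stage the change and vote yes, or vote no if unavailable.
  prepare(change) {
    if (!this.healthy) return false;
    this.staged = { ...change };
    return true;
  }

  // Phase 2, success: make the staged change visible.
  commit() {
    Object.assign(this.data, this.staged);
    this.staged = null;
  }

  // Phase 2, failure: discard the staged change.
  abort() {
    this.staged = null;
  }
}

// Coordinator: apply `change` on all servers or on none of them.
function twoPhaseCommit(servers, change) {
  const votes = servers.map((server) => server.prepare(change));
  if (votes.every((vote) => vote)) {
    servers.forEach((server) => server.commit());
    return true;
  }
  servers.forEach((server) => server.abort());
  return false;
}
```

Setting one server's healthy flag to false makes the prepare phase fail, and the staged change is then discarded on every server.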

Mechanics

How does MongoDB balance my data?

Keys

Collection test.users:

{ name: "Joe", email: "Joe@fake.com" }
{ name: "Bob", email: "bob@fake.com" }
{ name: "Tyler", email: "tyler@fake.com" }

> db.runCommand({
    shardcollection: "test.users",
    key: { email: 1 }
  })
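A range-based shard key such as { email: 1 } puts documents in the order of the key pattern. The following minimal sketch, assuming plain JavaScript comparison of same-typed values (BSON's cross-type ordering is not modeled), sorts the three test.users documents that way; compareByKeyPattern is a hypothetical helper.

```javascript
// Compare two documents field by field according to a key pattern such as
// { email: 1 } or { node: 1, time: -1 }; -1 flips the order of that field.
function compareByKeyPattern(a, b, pattern) {
  for (const [field, direction] of Object.entries(pattern)) {
    const x = a[field];
    const y = b[field];
    if (x < y) return -direction;
    if (x > y) return direction;
  }
  return 0;
}

const users = [
  { name: "Joe", email: "Joe@fake.com" },
  { name: "Bob", email: "bob@fake.com" },
  { name: "Tyler", email: "tyler@fake.com" },
];

// Sort a copy of the collection by the shard key { email: 1 }.
const byEmail = [...users].sort((a, b) => compareByKeyPattern(a, b, { email: 1 }));
console.log(byEmail.map((u) => u.email));
```

In this plain code-unit comparison, uppercase letters sort before lowercase ones, which is why Joe@fake.com comes first.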

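Chunks

[Diagram: The key space of test.users runs from -∞ to +∞. At first the whole range is one chunk ("This is a chunk"). Split! The range is divided at joe@fake.com, moe@fake.com and tyler@fake.com, and each resulting range is also a chunk.]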
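The chunk picture can be modeled as half-open key ranges [min, max). This sketch assumes the symbols MIN and MAX in place of the -∞ and +∞ bounds (MongoDB's own names for those bounds are MinKey and MaxKey) and uses the hypothetical helpers splitChunk and findChunk; it splits the single initial chunk at the three email addresses from the slide.

```javascript
// Stand-ins for the -∞ / +∞ chunk bounds.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare bounds, treating MIN and MAX as below/above every real value.
function compareBound(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Split one chunk at distinct split points that lie strictly inside it and
// return the resulting chunks in key order, all still on the same shard.
function splitChunk(chunk, splitPoints) {
  const points = [...splitPoints].sort(compareBound);
  points.forEach((p, i) => {
    const inside = compareBound(p, chunk.min) > 0 && compareBound(p, chunk.max) < 0;
    const duplicate = i > 0 && compareBound(points[i - 1], p) === 0;
    if (!inside || duplicate) throw new Error(`bad split point ${String(p)}`);
  });
  const bounds = [chunk.min, ...points, chunk.max];
  const chunks = [];
  for (let i = 0; i + 1 < bounds.length; i++) {
    chunks.push({ min: bounds[i], max: bounds[i + 1], shard: chunk.shard });
  }
  return chunks;
}

// Find the chunk whose range [min, max) contains `key`.
function findChunk(chunks, key) {
  return chunks.find(
    (c) => compareBound(c.min, key) <= 0 && compareBound(key, c.max) < 0
  );
}

const wholeChunk = { min: MIN, max: MAX, shard: "shard1" };
const pieces = splitChunk(wholeChunk, ["moe@fake.com", "joe@fake.com", "tyler@fake.com"]);
console.log(pieces.length, findChunk(pieces, "kate@fake.com").min);
```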

Splitting

[Diagram: mongos, three config servers, and Shard 1 to Shard 4 holding chunks. "Split this big chunk into 2 chunks." Afterwards: "These chunks have split."]
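The decision to split a big chunk can be sketched as a size check followed by picking a split key. This assumes a 64 MB limit (the default chunk size in MongoDB releases of that period) and a split at the median key by document count, which is a simplifying assumption rather than MongoDB's exact rule; chooseSplitPoint is a hypothetical name.

```javascript
// Assumed split threshold: 64 MB.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024;

const compareKeys = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// docs: the documents in one chunk, as { key, bytes } pairs.
// Returns the key at which to split, or null when the chunk is small enough
// or when the median key equals the smallest key (e.g. all keys identical).
function chooseSplitPoint(docs, maxBytes = MAX_CHUNK_BYTES) {
  const totalBytes = docs.reduce((sum, doc) => sum + doc.bytes, 0);
  if (totalBytes <= maxBytes) return null;
  const keys = docs.map((doc) => doc.key).sort(compareKeys);
  const median = keys[Math.floor(keys.length / 2)];
  // The split key becomes the lower bound of the upper half; if it equals the
  // smallest key, the lower half would be empty, so no useful split exists.
  if (compareKeys(median, keys[0]) === 0) return null;
  return median;
}
```

The null result for identical keys foreshadows the Cardinality rule: a chunk whose documents all share one shard key value cannot be split.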

Balancing

[Diagram: mongos, three config servers, and Shard 1 to Shard 4, with Shard 1 holding most of the chunks. Step by step: "Shard1, move a chunk to Shard2", "Shard1, move another chunk to Shard3", "Shard1, move another chunk to Shard4", until the chunks are spread across all four shards.]
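The balancing sequence, where Shard1 hands one chunk at a time to the emptiest shard, can be sketched as a loop over chunk counts. This is a toy model assuming balance is measured in chunks and that an imbalance of at most `threshold` chunks is acceptable; balance and threshold are hypothetical and are not MongoDB's migration thresholds.

```javascript
// Move one chunk at a time from the fullest shard to the emptiest shard until
// their chunk counts differ by at most `threshold` (must be >= 1, otherwise a
// difference of 1 would make the loop swap a chunk back and forth forever).
function balance(chunkCounts, threshold = 1) {
  if (threshold < 1) throw new RangeError("threshold must be at least 1");
  const counts = { ...chunkCounts };
  const moves = [];
  const shards = Object.keys(counts);
  if (shards.length === 0) return { counts, moves };
  for (;;) {
    const from = shards.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const to = shards.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[from] - counts[to] <= threshold) break;
    counts[from] -= 1;
    counts[to] += 1;
    moves.push(`${from}, move a chunk to ${to}`);
  }
  return { counts, moves };
}

console.log(balance({ Shard1: 5, Shard2: 1, Shard3: 1, Shard4: 1 }).moves);
```

Starting from { Shard1: 5, Shard2: 1, Shard3: 1, Shard4: 1 }, the loop emits the same three moves as the slides, to Shard2, Shard3 and Shard4, and ends with two chunks on every shard.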

How does MongoDB route my queries?

Routed Request

1. Query arrives at mongos
2. mongos routes query to a single shard
3. Shard returns results of query
4. Results returned to client
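To decide between a routed request and a broadcast, mongos checks the query against its cached chunk ranges. The sketch below assumes a hypothetical routing table for test.users keyed on email, treats only an exact value for the shard key as routable (range conditions are simply broadcast here), and uses null for an unbounded end; targetShards is a hypothetical name.

```javascript
// Hypothetical routing table for test.users, sharded on { email: 1 }.
// Each chunk covers [min, max); null marks the unbounded -∞ / +∞ end.
const usersRoutingTable = [
  { min: null, max: "joe@fake.com", shard: "shard1" },
  { min: "joe@fake.com", max: "moe@fake.com", shard: "shard2" },
  { min: "moe@fake.com", max: "tyler@fake.com", shard: "shard3" },
  { min: "tyler@fake.com", max: null, shard: "shard1" },
];

function chunkContains(chunk, key) {
  const aboveMin = chunk.min === null || chunk.min <= key;
  const belowMax = chunk.max === null || key < chunk.max;
  return aboveMin && belowMax;
}

// Return the shards a query must visit. An exact value for the shard key
// targets the single owning shard (routed request); any other filter goes
// to every shard that owns a chunk (scatter gather).
function targetShards(filter, shardKeyField, routingTable) {
  const value = filter[shardKeyField];
  if (typeof value === "string" || typeof value === "number") {
    const owner = routingTable.find((chunk) => chunkContains(chunk, value));
    return [owner.shard];
  }
  return [...new Set(routingTable.map((chunk) => chunk.shard))];
}

console.log(targetShards({ email: "bob@fake.com" }, "email", usersRoutingTable));
console.log(targetShards({ state: "NY" }, "email", usersRoutingTable));
```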

Scatter Gather Request

1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client
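The gather step can be sketched with in-memory arrays standing in for shards. Under that assumption, every shard runs the same equality filter and the router simply concatenates the answers, so no particular order across shards is promised; shardData, runOnShard and scatterGather are hypothetical.

```javascript
// In-memory stand-ins for three shards of test.users (hypothetical data).
const shardData = {
  shard1: [{ name: "Ann", state: "NY" }, { name: "Bob", state: "CA" }],
  shard2: [{ name: "Cy", state: "NY" }],
  shard3: [{ name: "Dee", state: "TX" }, { name: "Eve", state: "NY" }],
};

// Each shard evaluates the filter locally (equality on every filter field).
function runOnShard(docs, filter) {
  return docs.filter((doc) =>
    Object.entries(filter).every(([field, value]) => doc[field] === value)
  );
}

// Scatter gather: broadcast the filter to every shard, then combine the
// per-shard results by concatenation.
function scatterGather(shards, filter) {
  const perShard = Object.values(shards).map((docs) => runOnShard(docs, filter));
  return perShard.flat();
}

console.log(scatterGather(shardData, { state: "NY" }).map((d) => d.name));
```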

Distributed Merge Sort Request

1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. mongos merges sorted results
6. Combined results returned to client
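Step 5, where mongos merges results that each shard has already sorted, is a k-way merge. This minimal sketch scans the head of each shard's list and takes the smallest, where a heap would be the usual choice for many shards; the data and mergeSorted are hypothetical.

```javascript
// k-way merge of per-shard results that are each already sorted by `compare`.
function mergeSorted(perShardResults, compare) {
  const positions = perShardResults.map(() => 0);
  const merged = [];
  for (;;) {
    // Pick the shard whose next unread element is smallest.
    let best = -1;
    for (let i = 0; i < perShardResults.length; i++) {
      if (positions[i] >= perShardResults[i].length) continue;
      if (
        best === -1 ||
        compare(perShardResults[i][positions[i]], perShardResults[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(perShardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

const byStateAsc = (a, b) => (a.state < b.state ? -1 : a.state > b.state ? 1 : 0);

// Step 3: each shard sorts its own results locally.
const sortedPerShard = [
  [{ state: "NY" }, { state: "CA" }],
  [{ state: "TX" }, { state: "AK" }],
  [{ state: "NJ" }],
].map((docs) => docs.slice().sort(byStateAsc));

console.log(mergeSorted(sortedPerShard, byStateAsc).map((d) => d.state));
```

Because a tie keeps the element from the earlier shard, the merge is stable with respect to shard order.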

Queries

By shard key             Routed                  db.users.find({ email: "bob@10gen.com" })
Sorted by shard key      Routed in order         db.users.find().sort({ email: -1 })
Find by non shard key    Scatter gather          db.users.find({ state: "NY" })
Sorted by non shard key  Distributed merge sort  db.users.find().sort({ state: 1 })
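The four rows of the table reduce to two questions: does the filter include the shard key, and is the result sorted on it? The sketch assumes a single-field shard key and top-level equality filters only; classifyQuery is a hypothetical helper that simply reproduces the table.

```javascript
// Reproduce the Queries table for a single-field shard key.
function classifyQuery(filter, sort, shardKeyField) {
  if (shardKeyField in filter) return "routed";
  const sortFields = Object.keys(sort);
  if (sortFields.length === 0) return "scatter gather";
  if (sortFields[0] === shardKeyField) return "routed in order";
  return "distributed merge sort";
}

console.log(classifyQuery({ state: "NY" }, {}, "email"));
```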

Writes

Inserts   Requires shard key  db.users.insert({ name: "Bob", email: "Bob@fake.com" })
Removes   Routed              db.users.remove({ email: "bob@fake.com" })
Removes   Scattered           db.users.remove({ name: "Bob" })
Updates   Routed              db.users.update({ email: "bob@10gen.com" }, { $set: { state: "NY" } })
Updates   Scattered           db.users.update({ state: "CA" }, { $set: { state: "NY" } }, { multi: true })
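The Writes table follows the same targeting rule, with one extra constraint for inserts. In this sketch, assuming a hypothetical two-shard split where emails sorting below "m" live on shard1, targetWrite returns the shards a write must visit and rejects an insert that lacks the shard key.

```javascript
// Decide which shards a write must visit.
// op: { type: "insert", doc } or { type: "remove" | "update", filter }.
function targetWrite(op, shardKeyField, shards, ownerOf) {
  if (op.type === "insert") {
    // Inserts require the shard key so the owning chunk can be found.
    if (!(shardKeyField in op.doc)) {
      throw new Error(`insert is missing shard key field "${shardKeyField}"`);
    }
    return [ownerOf(op.doc[shardKeyField])];
  }
  // Removes and updates: routed with the shard key, scattered without it.
  if (shardKeyField in op.filter) return [ownerOf(op.filter[shardKeyField])];
  return shards;
}

// Hypothetical owner lookup: keys sorting below "m" on shard1, the rest on shard2.
const ownerOf = (email) => (email < "m" ? "shard1" : "shard2");
const allShards = ["shard1", "shard2"];

console.log(targetWrite({ type: "remove", filter: { name: "Bob" } }, "email", allShards, ownerOf));
```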

How do I choose my shard key?

Rule of Thumb: choose a field that is common to your queries.

Write Scaling

Writes should be distributed.

{ node: "ny153.example.com", application: "apache", time: "2011-01-02T21:21:56Z", level: "ERROR", msg: "something is broken" }

Bad: { time: 1 }
Better: { node: 1, application: 1, time: 1 }

Time only ever increases, so with { time: 1 } every new entry falls into the chunk holding the highest time values and all inserts hit one shard. Leading with node and application spreads inserts from many machines across many chunks.
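Why { time: 1 } concentrates writes can be checked with a small simulation. It assumes hypothetical existing chunk boundaries for each candidate key and uses the chunk index as a stand-in for the shard that owns it; chunkIndex and cmpCompound are illustrative helpers.

```javascript
// ISO-8601 strings in one fixed format compare lexicographically in time order.
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Compare compound shard key values (arrays) field by field.
function cmpCompound(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = cmp(a[i], b[i]);
    if (c !== 0) return c;
  }
  return cmp(a.length, b.length);
}

// Index of the chunk that owns `key`, given sorted split points:
// chunk 0 is below splits[0], chunk i is [splits[i-1], splits[i]).
function chunkIndex(splits, key, compare) {
  let i = 0;
  while (i < splits.length && compare(key, splits[i]) >= 0) i++;
  return i;
}

// New log entries, all written "now" by different machines.
const newEntries = [
  { node: "ny101.example.com", application: "apache", time: "2011-01-03T10:00:00Z" },
  { node: "ny153.example.com", application: "apache", time: "2011-01-03T10:00:01Z" },
  { node: "ny170.example.com", application: "apache", time: "2011-01-03T10:00:02Z" },
];

// Hypothetical existing chunk boundaries for each candidate shard key.
const timeSplits = ["2011-01-01T00:00:00Z", "2011-01-02T00:00:00Z"];
const compoundSplits = [
  ["ny150.example.com", "apache", "2011-01-01T00:00:00Z"],
  ["ny160.example.com", "apache", "2011-01-01T00:00:00Z"],
];

const timeTargets = newEntries.map((e) => chunkIndex(timeSplits, e.time, cmp));
const compoundTargets = newEntries.map((e) =>
  chunkIndex(compoundSplits, [e.node, e.application, e.time], cmpCompound)
);
console.log(timeTargets);     // [2, 2, 2]: every insert hits the last chunk
console.log(compoundTargets); // [0, 1, 2]: inserts spread over all three chunks
```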

Query Isolation & Data Locality

Queries should be routed to one shard.

{ node: "ny153.example.com", application: "apache", time: "2011-01-02T21:21:56Z", level: "ERROR", msg: "something is broken" }

Bad: { msg: 1, node: 1 }
Better: { node: 1, time: 1 }

Cardinality

Chunks should be able to split.

{ node: "ny153.example.com", application: "apache", time: "2011-01-02T21:21:56Z", level: "ERROR", msg: "something is broken" }

Bad: { node: 1 }
Better: { node: 1, time: 1 }

With { node: 1 }, every entry from one machine has the same shard key value, so a chunk holding a busy node cannot be split any further. Adding time gives the key many more distinct values.
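The cardinality rule comes down to counting distinct shard key values inside a chunk, since a chunk can only be split between two different values. This sketch assumes a hypothetical chunk of 1000 log entries from a single node with timestamps spread over 60 seconds; distinctKeyCount and canSplit are illustrative helpers.

```javascript
// Count the distinct shard key values among a chunk's documents.
function distinctKeyCount(docs, fields) {
  const seen = new Set(docs.map((d) => JSON.stringify(fields.map((f) => d[f]))));
  return seen.size;
}

// A split needs at least two different key values in the chunk.
function canSplit(docs, fields) {
  return distinctKeyCount(docs, fields) > 1;
}

// Hypothetical chunk: 1000 log lines, all from the same node.
const logChunk = Array.from({ length: 1000 }, (_, i) => ({
  node: "ny153.example.com",
  application: "apache",
  time: new Date(Date.UTC(2011, 0, 2, 21, 0, i % 60)).toISOString(),
  level: "ERROR",
  msg: "something is broken",
}));

console.log(canSplit(logChunk, ["node"]));         // false
console.log(canSplit(logChunk, ["node", "time"])); // true
```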

Configuration

Bring up mongods or Replica Sets

mongod --shardsvr
mongod --replSet <setname> --shardsvr

Bring up Config Servers

mongod --configsvr

Bring up Mongos

mongos --configdb <list of configdb uris>

Connect to Mongos + Add Shards

> use admin
> db.runCommand({ addShard: <shard uri> })

Enable Sharding

> db.runCommand({ enablesharding: "<dbname>" })

Shard a Collection

> db.runCommand({ shardcollection: "<namespace>", key: <key> })
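The connect-and-shard steps can also be written down as the ordered list of command documents sent to mongos. The sketch assumes hypothetical replica-set shard URIs in the setname/host:port form and only builds the documents; each one would then be passed to db.runCommand() from the admin database on mongos. shardingSetupCommands is a hypothetical helper.

```javascript
// Build the admin commands for the steps above, in order (hypothetical helper).
function shardingSetupCommands({ shardUris, dbName, collection, key }) {
  return [
    ...shardUris.map((uri) => ({ addShard: uri })),
    { enablesharding: dbName },
    { shardcollection: `${dbName}.${collection}`, key },
  ];
}

// Placeholder shard URIs for three replica-set shards.
const setup = shardingSetupCommands({
  shardUris: ["rs0/host1:27018", "rs1/host2:27018", "rs2/host3:27018"],
  dbName: "test",
  collection: "users",
  key: { email: 1 },
});
setup.forEach((cmd) => console.log(JSON.stringify(cmd)));
```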

Recommended