MongoDB Auto-Sharding at Mongo Seattle

MongoDBauto shardingAaron [email protected] SeattleJuly 27, 2010

MongoDB v1.6out next week!

Why Scale Horizontally?

Vertical scaling is expensive

Horizontal scaling is more incremental – works well in the cloud

Will always be able to scale wider than higher

Distribution Models

Ad-hoc partitioning

Consistent hashing (dynamo)

Range based partitioning (BigTable/PNUTS)

Auto Sharding

Each piece of data is exclusively controlled by a single node (shard)

Each node (shard) has exclusive control over a well defined subset of the data

Database operations run against one shard when possible, multiple when necessary

As system load changes, assignment of data to shards is rebalanced automatically

Mongo Sharding

Basic goal is to make this

client

?

mongod mongod mongod

Mongo Sharding

Look like this

client

mongod

Mongo Sharding

Mapping of documents to shards controlled by shard key

Can convert from single master to sharded cluster with 0 downtime

Most functionality of a single Mongo master is preserved

Fully consistent

Shard Key

Collection

Min Max location

users {name:’Miller’} {name:’Nessman’}

shard 2

users {name:’Nessman’}

{name:’Ogden’} Shard 4

…

Shard Key Examples

{ user_id: 1 }

{ state: 1 }

{ lastname: 1, firstname: 1 }

{ tag: 1, timestamp: -1 }

{ _id: 1 } This is the default Careful when using ObjectId

Architecture Overview

client

mongos

mongod mongod mongod

-inf <= user_id < 20002000 <= user_id < 55005500 <= user_id < +inf

Query I

Shard key { user_id: 1 }

db.users.find( { user_id: 5000 } )

Query appropriate shard

Query II


db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )

Query appropriate shard(s)

{ a : …, b : …, c : … } a is declared shard key

find( { a : { $gt : 333, $lt : 400 } )

Range Shard

a in [-∞,2000) 2

a in [2000,2100) 8

a in [2100,5500) 3

… …

a in [88700, ∞) 0


find( { a : { $gt : 333, $lt : 2012 } )

Range Shard

a in [-∞,2000) 2

a in [2000,2100) 8

a in [2100,5500) 3

… …

a in [88700, ∞) 0

Query III


db.users.find( { hometown: ‘Seattle’ } )

Query all shards


find( { a : { $gt : 333, $lt : 2012 } )

Range Shard

a in [-∞,2000) 2

a in [2000,2100) 8

a in [2100,5500) 3

… …

a in [88700, ∞) 0

{ a : …, b : …, c : … } secondary query, secondary index

ensureIndex({b:1})find( { b : 99 } )

This case good when m is small (such as here), also when the queries are large tasks

Range Shard

a in [-∞,2000) 2

a in [2000,2100) 8

a in [2100,5500) 3

… …

a in [88700, ∞) 0

Query IV


db.users.find( { hometown: ‘Seattle’ } ).sort( { user_id: 1 } )

Query all shards, in sequence

Query V


db.users.find( { hometown: ‘Seattle’ } ).sort( { lastname: 1 } )

Query all shards in parallel, perform merge sort

Secondary index in { lastname: 1 } can be used

Map/Reduce

Map/Reduce was designed for distributed systems

Map/Reduce jobs will run on all relevant shards in parallel, subject to query spec

Map/Reduce

> m = function() { emit(this.user_id, 1); }

> r = function(k,vals) { return 1; }

> res = db.events.mapReduce(m, r, { query : {type:'sale'} });

> db[res.result].find().limit(2)

{ "_id" : 8321073716060 , "value" : 1 }

{ "_id" : 7921232311289 , "value" : 1 }

Writes

Inserts routed to appropriate shard (inserted doc must contain shard key)

Removes call upon matching shards

Updates call upon matching shards

Writes parallel if asynchronous, sequential if synchronous

Updates cannot modify the shard key

Choice of Shard Key

Key per document that is generally a component of your queries Often you want a unique key If not, consider granularity of key and potentially

add fields

The shard key is generally comprised of fields you would put in an index if operating on a single machine But in a sharded configuration, the shard key will

be indexed automatically

Shard Key Examples (again)

{ user_id: 1 }

{ state: 1 }

{ lastname: 1, firstname: 1 }

{ tag: 1, timestamp: -1 }

{ _id: 1 } This is the default Careful when using ObjectId

Bit.ly Example

~50M users

~10K concurrently using server at peak

~12.5B shortens per month (1K/sec peak)

History of all shortens per user stored in mongo

Bit.ly Example - Schema

{ "_id" : "lthp", "domain" : null, "keyword" : null, "title" : "bit.ly, a simple url shortener", "url" : "http://jay.bit.ly/", "labels" : [{"text" : "lthp",

"name" : "user_hash"},{"text" : "BUFx",

"name" : "global_hash"},{"text" : "http://jay.bit.ly/",

"name" : "url"},{"text" : "bit.ly, a simple url shortener",

"name" : "title"}], "ts" : 1229375771, "global_hash" : "BUFx", "user" :

"jay", "media_type" : null }

Bit.ly Example

Shard key { user: 1 }

Indices { _id: 1 } { user: 1 } { global_hash: 1 } { user: 1, ts: -1 }

Query db.history.find( {

user : user,

labels.text : …

} ).sort( { ts: -1 } )

Balancing The whole point of auto sharding is that

mongo balances the shards for you

The balancing algorithm is complicated, basic idea is that this

1000 <= user_id < +inf-inf <= user_id < 1000

Balancing

Becomes this

3000 <= user_id < +inf-inf <= user_id < 3000

Balancing

Current balancing metric is data size Future possibilities – cpu, disk utilization

There is some flexibility built into the partitioning algorithm – so we aren’t thrashing data back and forth between shards

Only move one ‘chunk’ of data at a time – a conservative choice that limits total overhead of balancing

System Architecture

Shard

Regular mongod process(es), storing all documents for a given key range

Handles all reads/writes for this key range as well Each shard indexes the data contained within it

Can be single mongod, master/slave, or replica set

Shard - Chunk

In a sharded cluster, shards partitioned by shard key

Within a shard, chunks partitioned by shard key

A chunk is the smallest unit of data for balancing Data moves between chunks at chunk

granularity

Upper limit on chunk size is 200MB

Special case if shard key range is open ended

Shard - Replica Sets

Replica sets provide data redundancy and auto failover

In the case of sharding, this means redundancy and failover per shard

All typical replica set operations are possible For example, write with w=N Replica sets were specifically designed to work

as shards

System Architecture

Mongos

Sharding router – distributes reads/writes to sharded cluster

Client interface is the same as a mongod

Can have as many mongos instances as you want

Can run on app server machine to avoid extra network traffic

Mongos also initiates balancing operations

Keeps metadata per chunk in RAM – 1MB RAM per 1TB of user data in cluster

System Architecture

Config Server

3 Config servers

Changes are made with a 2 phase commit

If any of the 3 servers goes down, config data becomes read only

Sharded cluster will remain online as long as 1 of the config servers is running

Config metadata size estimate 1MB metadata per 1TB data in cluster

Shared Machines

Bit.ly Architecture

Limitations

Unique index constraints not expressed by shard key are not enforced across shards

Updates to a document’s shard key aren’t allowed (you can remove and reinsert, but it’s not atomic)

Balancing metric is limited to # of chunks right now – but this will be enhanced

Right now only one chunk moves in the cluster at a time – this means balancing can be slow, it’s a conservative choice we’ve made to keep the overhead of balancing low for now

20 petabyte size limit

Start Up Config Servers

$ mkdir -p ~/dbs/config

$ ./mongod --dbpath ~/dbs/config --port 20000

Repeat as necessary

Start Up Mongos

$ ./mongos --port 30000 --configdb localhost:20000

No dbpath

Repeat as necessary

Start Up Shards

$ mkdir -p ~/dbs/shard1

$ ./mongod --dbpath ~/dbs/shard1 --port 10000

Repeat as necessary

Configure Shards

$ ./mongo localhost:30000/admin

> db.runCommand({addshard : "localhost:10000", allowLocal : true})

{

"added" : "localhost:10000",

"ok" : true

}

Repeat as necessary

Shard Data

> db.runCommand({"enablesharding" : "foo"})

> db.runCommand({"shardcollection" : "foo.bar", "key" : {"_id" : 1}})

Repeat as necessary

Production Configuration

Multiple config servers

Multiple mongos servers

Replica sets for each shard

Use getLastError/w correctly

Looking at config data

Mongo shell connected to config server, config database

> db.shards.find()

{ "_id" : "shard0", "host" : "localhost:10000" }

{ "_id" : "shard1", "host" : "localhost:10001" }


> db.databases.find()

{ "_id" : "admin", "partitioned" : false, "primary" : "config" }

{ "_id" : "foo", "partitioned" : false, "primary" : "shard1" }

{ "_id" : "x", "partitioned" : false, "primary" : "shard0” }

{"_id" : "test", "partitioned" : true, "primary" : "shard0", "sharded" :

{

"test.foo" : { "key" : {"x" : 1}, "unique" : false }

}

}


> db.chunks.find()

{"_id" : "test.foo-x_MinKey",

"lastmod" : { "t" : 1276636243000, "i" : 1 },

"ns" : "test.foo",

"min" : {"x" : { $minKey : 1 } },

"max" : {"x" : { $maxKey : 1 } },

"shard" : "shard0”

}


> db.printShardingStatus() --- Sharding Status ---sharding version: { "_id" : 1, "version" : 3 } shards:

{ "_id" : "shard0", "host" : "localhost:10000" }{ "_id" : "shard1", "host" : "localhost:10001" }

databases:{ "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "foo", "partitioned" : false, "primary" : "shard1" } { "_id" : "x", "partitioned" : false, "primary" : "shard0" } { "_id" : "test", "partitioned" : true, "primary" : "shard0","sharded" : { "test.foo" : { "key" : { "x" : 1 }, "unique" : false } } }

test.foo chunks:{ "x" : { $minKey : 1 } } -->> { "x" : { $maxKey : 1 } } on : shard0 { "t" : 1276636243 …

Give it a Try!

Download from mongodb.org

Sharding production ready in 1.6, which is scheduled for release next week

For now use 1.5 (unstable) to try sharding

Technology

MongoDB Auto-Sharding at Mongo Seattle