Introduction to MongoDB sharding

Jordi Soucheiron - @jordixou
Barcelona MongoDB User Group – 2015-06-29

About me

• Product engineer at ServerDensity
• Working with MongoDB in production for more than 4 years
• Python and PHP programmer
• Pybcn co-organizer
• FOSDEM volunteer

What is sharding?

It’s the system MongoDB uses to:
• Distribute writes
• Distribute primary reads
• Distribute data
• Or, in other words, grow horizontally and scale

What does it look like?

[Two diagrams of a sharded cluster were shown here; the images are not included in this text.]

Nomenclature:

• Shard:
  • Logical data partition
  • Each shard is handled by a server or replica set

• Shard key:
  • Key that all documents MUST have
  • Decided by the user

• Chunk:
  • Logical data partition inside a shard
  • They can be split into 2 smaller chunks
  • They can be moved to another shard for balancing

What does each component do?

• Mongos processes route data
• Config servers hold metadata:
  • What chunks are there
  • What shard holds each chunk
  • Which chunks are being migrated
• The shard servers hold the actual data

How does it work?

Whenever you read/write data this happens:
1. You run your query in your shell/driver
2. Your driver contacts the mongos process (a proxy)
3. The mongos process retrieves metadata from the config servers
4. Based on the metadata, mongos asks the shards affected by the query to run their part of the job
5. Mongos returns the result
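
The routing decision in step 4 can be sketched in plain JavaScript. This is only an illustration, not mongos code: the chunk table, the shard names and the targetShards helper are all made up for the example, and -Infinity/Infinity stand in for the lowest and highest possible key.

```javascript
// Hypothetical chunk metadata, roughly what the config servers describe.
// Each chunk covers shard key values in [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 500, shard: "shard0" },
  { min: 500, max: 1000, shard: "shard1" },
  { min: 1000, max: Infinity, shard: "shard0" },
];

// Return the names of the shards that must take part in a query.
// `range` is {min, max} (half-open) on the shard key, or null when the
// query does not filter on the shard key at all.
function targetShards(chunks, range) {
  const shards = new Set();
  for (const chunk of chunks) {
    const overlaps =
      range === null || (range.min < chunk.max && range.max > chunk.min);
    if (overlaps) shards.add(chunk.shard);
  }
  return [...shards].sort();
}

// Equality match on the shard key (value 700): one shard is contacted.
const pointQuery = targetShards(chunks, { min: 700, max: 701 });
// No shard key in the query: every shard has to be asked.
const scatterGather = targetShards(chunks, null);
console.log(pointQuery, scatterGather);
```

A query that includes the shard key only touches the shards owning the matching chunks, while a query without it is sent to every shard.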

Data partitioning

Your data will be split into chunks based on your shard key. Each chunk covers a contiguous range of shard key values.

Choosing a good shard key

A good shard key has to:
• Be used in ALL queries
• Allow a huge amount of possible values:
  • SHA-1 hash -> good
  • Phone number -> not bad
  • Zip code -> bad
  • Boolean -> awful
• Have values evenly distributed across all the key space

If your shard key has a high cardinality, but it’s not evenly distributed across the key space: use a hashed shard key
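
To see why hashing helps with unevenly distributed keys, here is a small Node.js simulation. It is a sketch under assumptions: the hashKey function (first 4 bytes of an MD5 digest) is a stand-in and not necessarily the hash MongoDB uses, and the key space and chunk count are arbitrary values picked for the example.

```javascript
const crypto = require("crypto");

// Stand-in hash: first 4 bytes of an MD5 digest as an unsigned 32-bit integer.
function hashKey(value) {
  return crypto.createHash("md5").update(String(value)).digest().readUInt32BE(0);
}

// Count documents per chunk when the key space [0, keySpace) is cut into
// `numChunks` equally sized ranges.
function countPerChunk(keys, keySpace, numChunks) {
  const counts = new Array(numChunks).fill(0);
  for (const key of keys) {
    const index = Math.min(numChunks - 1, Math.floor((key / keySpace) * numChunks));
    counts[index] += 1;
  }
  return counts;
}

const KEY_SPACE = 2 ** 32;

// 10,000 distinct keys (high cardinality) clustered in a tiny part of the key space.
const skewedKeys = [];
for (let i = 0; i < 10000; i++) skewedKeys.push(1000000 + i);

// Ranges on the raw key: every document lands in the same chunk.
const rangeCounts = countPerChunk(skewedKeys, KEY_SPACE, 4);
// Ranges on the hashed key: documents spread over all chunks.
const hashedCounts = countPerChunk(skewedKeys.map(hashKey), KEY_SPACE, 4);
console.log(rangeCounts, hashedCounts);
```

The trade-off is that neighbouring key values end up in unrelated chunks, so range queries on the shard key can no longer be sent to a single shard.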

Chunk partitioning

Whenever a chunk reaches a certain size, the mongos process will try to split it into two.

This will fail if all docs in this chunk share the same shard key value.
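
The reason a split can fail is that a chunk boundary is itself a shard key value, so documents with identical keys can never be separated. A minimal sketch of the idea follows; splitChunk is a hypothetical helper that picks the median key as the split point, which is an assumption of this example and not a description of MongoDB's actual split logic.

```javascript
// Try to split a chunk (given as an array of shard key values) in two.
// Returns null when no valid split point exists.
function splitChunk(keys) {
  const sorted = [...keys].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  // The split point must be strictly greater than the smallest key,
  // otherwise the left half would be empty.
  const splitPoint =
    median > sorted[0] ? median : sorted.find((k) => k > sorted[0]);
  if (splitPoint === undefined) return null;
  return {
    splitPoint,
    left: sorted.filter((k) => k < splitPoint),
    right: sorted.filter((k) => k >= splitPoint),
  };
}

// Distinct keys: the chunk splits at the median.
const ok = splitChunk([1, 2, 3, 4, 5, 6]);
// Every document has the same key: there is nowhere to cut.
const stuck = splitChunk([7, 7, 7, 7]);
console.log(ok, stuck);
```

This is why a low-cardinality shard key (a boolean, for instance) leads to chunks that keep growing and cannot be split.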

Balancing

• Inevitably, some shards will get more chunks than others
• The sharded cluster will automatically move chunks from crowded shards to under-populated shards
• It’s possible to start/stop and customize the balancing algorithm
• It’s possible to manually move chunks around
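
The balancing idea can be illustrated with a toy model that only looks at chunk counts. Everything here is an assumption made for the example: the balance function, the shard names and the threshold of 2 are hypothetical, and the real balancer applies its own thresholds and migrates actual data.

```javascript
// Move one chunk at a time from the most loaded shard to the least loaded
// one until the difference in chunk counts drops below `threshold`.
function balance(chunkCounts, threshold = 2) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const counts = { ...chunkCounts };
  const migrations = [];
  while (true) {
    // Shard names ordered from fewest to most chunks.
    const names = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    if (names.length < 2) break;
    const least = names[0];
    const most = names[names.length - 1];
    if (counts[most] - counts[least] < threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

const result = balance({ shard0: 10, shard1: 2, shard2: 3 });
console.log(result.counts, result.migrations.length);
```

In this run shard0 gives away five chunks and all three shards end up with five chunks each.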

HA in a sharded cluster

In order to achieve HA in a sharded cluster you’ll need:
• 3 config servers:
  • As long as 1 is up you’ll be able to read/write into the collection
  • If a config server is down the metadata collection will be read-only, so you won’t be able to:
    • Split chunks
    • Balance the cluster
    • Add shards
• N shards; each one with, at least:
  • 2 data-bearing nodes
  • An arbiter or another data-bearing node

Demo time!

Creating a new demo sharded cluster:

sudo service mongod stop
mkdir shard0
mkdir shard1
mkdir config

# Start the config server
mongod --fork --syslog --configsvr --dbpath config --port 27019

# Start the shard servers
mongod --fork --syslog --dbpath shard0 --port 30000
mongod --fork --syslog --dbpath shard1 --port 30001

# Start the mongos process
mongos --fork --syslog --configdb localhost:27019

# Add shards
mongo initSharding.js

Demo time!

Creating a new demo sharded cluster:

// Creating shards
sh.addShard("localhost:30000");
sh.addShard("localhost:30001");

// Adding test data
for (var i = 0; i < 10000; i++) {
    db.testdata.insert({"i": i});
}

// Creating index on the future shard key
db.testdata.createIndex({"i": 1});

// Enabling sharding
sh.enableSharding("test");
sh.shardCollection("test.testdata", {i: 1});

// Manually splitting chunks at i = 500, 1000, ..., 9500
for (var i = 1; i < 20; i++) {
    sh.splitAt("test.testdata", {"i": i * 500});
}

// Status (sh.status prints its own output)
sh.status(true);
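
As a sanity check on what the manual splits should produce, the expected chunk layout can be computed without a cluster. This is plain JavaScript and not part of the demo script; demoChunks is a name invented for this sketch, and -Infinity/Infinity stand in for the open-ended lower and upper bounds.

```javascript
// Split points used by the demo script: 500, 1000, ..., 9500.
const splitPoints = [];
for (let i = 1; i < 20; i++) splitPoints.push(i * 500);

// Chunk boundaries, including the open-ended first and last chunk.
const bounds = [-Infinity, ...splitPoints, Infinity];
const demoChunks = [];
for (let b = 0; b < bounds.length - 1; b++) {
  demoChunks.push({ min: bounds[b], max: bounds[b + 1], docs: 0 });
}

// Place the 10,000 test documents ({i: 0} ... {i: 9999}) into their chunk.
for (let i = 0; i < 10000; i++) {
  demoChunks.find((c) => i >= c.min && i < c.max).docs += 1;
}
console.log(demoChunks.length, demoChunks.map((c) => c.docs));
```

The 19 split points give 20 chunks of 500 documents each, which the balancer can then distribute between the two shards.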

Questions?

We’re hiring!

We’re looking for awesome engineers!

Talk to me after the presentation or go to: https://www.serverdensity.com/jobs/

Code

https://github.com/jsoucheiron/mongodb-barcelona-sharding-introduction

Slides

http://www.slideshare.net/jordixou (soon)
