Presentation held at MongoUK, September 2012
SHORTCUTS AROUND THE MISTAKES I’VE MADE SCALING MONGODB
Theo, Chief Architect at Burt
Wednesday 21 September 2011
What we do
We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.
Ads
Data
Assembling sessions
(diagram: an exposure, pings, and events arrive as fragments and are assembled into a session)
Crunching
(diagram: many sessions are crunched down into a single number)
Reports
What we do
Track ads, make pretty reports.
That doesn’t sound so hard
We don’t know when sessions end
There’s a lot of data
It’s all done in (close to) real time
Numbers
40 GB of data
50 million documents
per day
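Assuming gigabytes, those numbers imply a back-of-the-envelope average document size:

```python
# Rough sanity check of the figures above (assuming GB = 10**9 bytes).
data_per_day_bytes = 40 * 10**9
docs_per_day = 50 * 10**6

avg_doc_size = data_per_day_bytes / docs_per_day  # bytes per document
print(avg_doc_size)  # 800.0
```

Roughly 800 bytes per document on average, which is why document size comes up again later in the talk.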
How we use MongoDB
“Virtual memory” to offload data while we wait for sessions to finish
Short-term storage (<48 hours) for batch jobs
Metrics storage
Why we use MongoDB
Schemalessness makes things so much easier; the data we collect changes as we come up with new ideas
Sharding makes it possible to scale writes
Secondary indexes and a rich query language are great features (for the metrics store)
It’s just… nice
Btw.
We use JRuby, it’s awesome
A story in 7 iterations
1st iteration: secondary indexes and updates
One document per session, update as new data comes along
Outcome: 1000% write lock
#1 Everything is about working around the GLOBAL WRITE LOCK
MongoDB 2.0.0
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
MongoDB 1.8.1
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)
db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)
2nd iteration: using scans for two-step assembling
Instead of updating, save each fragment, then scan over _id to assemble sessions
Outcome: not as much lock, but still not great performance. We also realised we couldn’t remove data fast enough
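The two-step idea can be sketched like this (a hypothetical simplification: the _id scheme and field names are made up, and a real implementation would walk a MongoDB cursor in _id order rather than a Python list):

```python
from itertools import groupby

# Each ping/event is inserted as its own document whose _id is prefixed with
# the session id. Because MongoDB keeps the _id index sorted, a later scan
# sees all fragments of a session next to each other.
fragments = [
    {"_id": "sessionA:1", "type": "ping"},
    {"_id": "sessionA:2", "type": "event"},
    {"_id": "sessionB:1", "type": "ping"},
]

def session_of(fragment):
    # the session id is everything before the first ":" in the _id
    return fragment["_id"].split(":", 1)[0]

# Scanning in _id order lets us assemble sessions with a single group-by pass.
sessions = {
    sid: list(parts)
    for sid, parts in groupby(sorted(fragments, key=lambda f: f["_id"]), key=session_of)
}
print(sorted(sessions))  # ['sessionA', 'sessionB']
```

Inserts are append-only, so there are no in-place updates fighting over the write lock; the cost moves to the batch job that scans and assembles.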
#2 Everything is about working around the GLOBAL WRITE LOCK
#3 Give a lot of thought to your PRIMARY KEY
3rd iteration: partitioning
We came up with the idea of partitioning the data by writing to a new collection every hour
Outcome: lots of complicated code, lots of bugs, but we didn’t have to care about removing data
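The hour-partitioning scheme boils down to a naming convention (a hypothetical helper; the complicated part the slide alludes to is routing reads and batch jobs across many such collections):

```python
from datetime import datetime

def collection_for(ts):
    # one collection per hour, e.g. "sessions_2011092113"
    # (the "sessions_" prefix and timestamp format are made up for illustration)
    return "sessions_" + ts.strftime("%Y%m%d%H")

print(collection_for(datetime(2011, 9, 21, 13, 5)))  # sessions_2011092113
```

The payoff is retention: dropping a whole stale collection is a cheap metadata operation, whereas removing millions of individual documents competes with live writes for the lock.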
#4 Make sure you can REMOVE OLD DATA
4th iteration: sharding
To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging
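One reason the shard key deserves so much thought, sketched as a toy (the shard count and modulo placement are made up; real MongoDB routes documents to shards by chunk ranges of the shard key, not a hash modulo):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(key):
    # Hash-based placement spreads writes evenly. A monotonically increasing
    # key (e.g. a timestamp-prefixed _id) would instead land every new write
    # in the same range, i.e. on a single "hot" shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

keys = [f"session{i}" for i in range(1000)]
used = {shard_for(k) for k in keys}
print(len(used))
```

With a well-spread key every shard takes a share of the insert load, which is the whole point of sharding for write throughput.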
#5 Everything is about working around the GLOBAL WRITE LOCK
#6 SHARDING IS NOT A SILVER BULLET, and it’s buggy; if you can, avoid it
#7 IT WILL FAIL, design for it
5th iteration: moving things to separate clusters
We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster
#8 Everything is about working around the GLOBAL WRITE LOCK
#9 ONE DATABASE with one usage pattern PER CLUSTER
#10 MONITOR EVERYTHING, look at your health graphs daily
6th iteration: monster machines
We got new problems removing data and needed some room to breathe and think.
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).
#11 Don’t try to scale up, SCALE OUT
#12 When you’re out of ideas, CALL THE EXPERTS
7th iteration: partitioning (again) and pre-chunking
We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
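Pre-chunking amounts to computing the chunk boundaries for the next day’s database before any writes arrive, so the balancer never has to split and migrate chunks under load. A sketch, assuming a numeric shard-key space (the key space and chunk count here are made up; the actual splitting is done server-side with MongoDB’s split machinery):

```python
def split_points(num_chunks, key_space=2**32):
    # Evenly spaced boundaries over a hypothetical numeric shard-key space.
    # Each adjacent pair of boundaries becomes one pre-created chunk.
    step = key_space // num_chunks
    return [i * step for i in range(1, num_chunks)]

print(split_points(4))  # [1073741824, 2147483648, 3221225472]
```

N chunks need N-1 interior split points; spreading them evenly assumes the shard key is uniformly distributed over the key space.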
#13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
#14 Give a lot of thought to your PRIMARY KEY
#15 Everything is about working around the GLOBAL WRITE LOCK
#16 Everything is about working around the GLOBAL WRITE LOCK
KTHXBAI
@iconara
architecturalatrocities.com
burtcorp.com
Since we got time…
Tips: Safe mode
Run every Nth insert in safe mode
This will give you warnings when bad things happen, like failovers
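The every-Nth-insert idea as a sketch (the class and names are illustrative, not a real driver API; in a real driver you would pass the chosen write concern to each insert call):

```python
# Unacknowledged writes keep throughput high; a periodic acknowledged ("safe")
# write surfaces errors such as failovers instead of losing them silently.
class SafeEveryNth:
    def __init__(self, every=100):
        self.every = every
        self.count = 0

    def next_write_concern(self):
        # every Nth write is acknowledged, the rest are fire-and-forget
        self.count += 1
        return "safe" if self.count % self.every == 0 else "fire-and-forget"

w = SafeEveryNth(every=3)
print([w.next_write_concern() for _ in range(6)])
# ['fire-and-forget', 'fire-and-forget', 'safe', 'fire-and-forget', 'fire-and-forget', 'safe']
```

The sampling rate trades detection latency against throughput: with N=100 you notice a broken replica set within a hundred writes rather than never.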
Tips: Avoid bulk inserts
Very dangerous if there’s a possibility of duplicate key errors
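Why that is dangerous, simulated with a fake unique-_id collection: a bulk insert of that era stopped at the first duplicate-key error, so every later document in the batch was silently lost.

```python
# A dict stands in for a collection with a unique index on _id.
def bulk_insert(collection, docs):
    for doc in docs:
        if doc["_id"] in collection:
            # the error aborts the whole batch; the remaining docs never land
            raise KeyError(f"duplicate key: {doc['_id']}")
        collection[doc["_id"]] = doc

store = {}
try:
    bulk_insert(store, [{"_id": 1}, {"_id": 1}, {"_id": 2}])
except KeyError:
    pass
print(sorted(store))  # [1]
```

Document 2 was never inserted even though it had no conflict, which is exactly the failure mode the tip warns about; inserting one document at a time keeps each error isolated.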
Tips: EC2
You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.