Advanced Administration, Monitoring and Backup

Advanced Administration, Monitoring and Backup at Scale

Dr. Jeffrey Berger, Lead Database Engineer - Sailthru

Scale The Universe!

Flat FRW Metric for isotropic cosmological geometry

Scale Factor

Related to the Hubble constant for an expanding universe, the scale factor does a great job of actually scaling our universe.

In fact, the rate of expansion is accelerating!
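For reference, the flat FRW metric in its standard textbook form (sign convention −,+,+,+), together with the scale factor's relation to the Hubble parameter, whose present-day value is the Hubble constant. Accelerating expansion corresponds to a positive second derivative of the scale factor.

```latex
ds^2 = -c^2\,dt^2 + a(t)^2\left[dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)\right]

H(t) = \frac{\dot{a}(t)}{a(t)}, \qquad H_0 = H(t_0), \qquad \text{accelerating expansion: } \ddot{a}(t) > 0
```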

Uhh maybe just the galaxy

The world is still a lot

Keep zooming in...

‘Big Data’...?

Sailthru

● Extremely early adopter of MongoDB ~2009

● 4 sharded clusters and 9 stand-alone replica sets (RS)

● Largest is 32 shards and 5.5TB with ~1.5 billion profiles

● All production systems are housed in a colo data center on hardware owned and operated by Sailthru

Sailthru

● 4 DB Team Members

○ Me

○ Dr. Joshua Wickman

○ Chandrakant Gopalan

○ Tim Burrington

Sailthru

● Our systems are composed of replica sets of 2 data-bearing nodes and 1 arbiter

● Many of our systems are ‘microsharded’: mongod instances from several replica sets share the same server

[Diagram: two replica sets, each with a PRIMARY, a SECONDARY and an ARBITER]
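A minimal Python sketch of the configuration document such a set could be initiated with, for example by passing it to the `replSetInitiate` admin command through pymongo's `client.admin.command(...)`. The set name and host names are hypothetical. With three voting members a strict majority is two, so the set can still elect a primary after losing any one member, while the arbiter avoids paying for a third copy of the data.

```python
def two_node_arbiter_config(set_name, data_hosts, arbiter_host):
    """Build a replica set config: two data-bearing members plus one arbiter."""
    if len(data_hosts) != 2:
        raise ValueError("expected exactly two data-bearing hosts")
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    # The arbiter votes in elections but stores no data.
    members.append({"_id": len(members), "host": arbiter_host, "arbiterOnly": True})
    return {"_id": set_name, "members": members}


def majority(voters):
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voters // 2 + 1


config = two_node_arbiter_config(
    "shard01",  # hypothetical set name
    ["db1.example.internal:27017", "db2.example.internal:27017"],  # hypothetical hosts
    "arb1.example.internal:27017",
)
voters = len(config["members"])  # each member keeps the default of one vote
print(voters, majority(voters))  # 3 2
```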

Two tales of DBA struggle

● No DBEs to DB team

● Mass migration

What do you do if you have to move data from one data center to another, while moving 17 replica sets into a single sharded cluster, with no (or minimal) downtime?

What do you do when you join an organization which has been using MongoDB without any DBA oversight?

Welcome to the DB team

What are the most important things for a DB team to set up?

MONITORING and BACKUPS

Monitoring

Microsharded systems are not easy to monitor!

● Multiple replica sets on a single machine

● Primaries and Secondaries often sharing hardware

● Monitoring systems for Mongo work at the instance level, not the server level

[Diagram: a SHARD 1 PRIMARY and a SHARD 2 SECONDARY on the same server, sharing MEMORY, DISK IO and NETWORK IO]
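Because the tooling reports per instance, one workaround is to roll instance metrics up to the physical server. A minimal sketch, assuming you already collect numeric metrics per `host:port` (for example resident memory from `serverStatus`); the metric names and numbers below are hypothetical.

```python
from collections import defaultdict


def rollup_by_server(instance_metrics):
    """Sum per-instance metrics into per-server totals.

    instance_metrics maps "host:port" to a dict of numeric metrics; every
    mongod that shares a host is summed into that host's totals.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for address, metrics in instance_metrics.items():
        host = address.rsplit(":", 1)[0]  # drop the port to get the server
        for name, value in metrics.items():
            totals[host][name] += value
    return {host: dict(values) for host, values in totals.items()}


# Hypothetical sample: a shard 1 primary and a shard 2 secondary share db1.
sample = {
    "db1:27017": {"resident_mb": 40000, "net_out_mb_s": 60},
    "db1:27018": {"resident_mb": 30000, "net_out_mb_s": 25},
    "db2:27017": {"resident_mb": 35000, "net_out_mb_s": 20},
}
servers = rollup_by_server(sample)
print(servers["db1"])  # {'resident_mb': 70000.0, 'net_out_mb_s': 85.0}
```

Alert thresholds can then be set against each server's capacity rather than each instance's share of it.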

Monitoring - MMS

MMS is a great tool for all Mongo deployments

● Built-in user-level permissions

● Automatic topology discovery

● Graphs and time series data

● Breakdown by replica set for clusters

● Pulls a wealth of data

Monitoring - MMS

● Built-in alerting

● Many configurable alerting criteria

● Integration with email, SMS, PagerDuty and more

Monitoring - MMS

MMS is our secondary (fallback) monitoring system

● Alerts sometimes lag behind the onset of an issue

● Organizational decision not to self-host MMS, and to rely on an internal monitoring system as our main monitor

Monitoring - MMS

What we are looking forward to:

● Proactive Support has some great features coming through MMS

● Enhanced monitoring and alerting options

● Logging long queries? Non-indexed queries?

● Perhaps we can run custom scripts and checks against the system eventually!

Monitoring - Zabbix

“Quis custodiet ipsos custodes?” - ZABBIX

Monitoring - Zabbix

Monitoring Mongo with Zabbix: https://github.com/sailthru/mongodb-zabbix

● Number of voting members

● Long query logging

● Chunk distribution in a sharded cluster

● Fsync lock status

● Failover notification
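A sketch of the logic behind three of these checks, written as pure functions over command output so they can be wrapped by a Zabbix item however suits your agent setup. The field names (`members`, `votes`, `stateStr`, `name`, `shard`) follow `replSetGetConfig`, `replSetGetStatus` and `config.chunks` output; the function names and sample documents are hypothetical and are not taken from the mongodb-zabbix repository.

```python
from collections import Counter


def voting_member_count(rs_config):
    """Count voting members in a replSetGetConfig "config" document."""
    # "votes" defaults to 1 when a member does not set it.
    return sum(1 for m in rs_config["members"] if m.get("votes", 1) > 0)


def current_primary(rs_status):
    """Return the name of the PRIMARY in replSetGetStatus output, or None."""
    for member in rs_status["members"]:
        if member.get("stateStr") == "PRIMARY":
            return member["name"]
    return None


def failed_over(previous_primary, rs_status):
    """True if the primary changed since the previous poll."""
    return previous_primary is not None and current_primary(rs_status) != previous_primary


def chunk_spread(chunks):
    """Chunks per shard from config.chunks documents, plus max minus min.

    Shards that own no chunks do not appear, so compare against listShards
    if you need to catch an empty shard.
    """
    per_shard = Counter(chunk["shard"] for chunk in chunks)
    if not per_shard:
        return {}, 0
    return dict(per_shard), max(per_shard.values()) - min(per_shard.values())


config = {"members": [{"host": "a:27017"}, {"host": "b:27017"},
                      {"host": "c:27017", "arbiterOnly": True}]}
status = {"members": [{"name": "a:27017", "stateStr": "SECONDARY"},
                      {"name": "b:27017", "stateStr": "PRIMARY"}]}
print(voting_member_count(config))     # 3
print(failed_over("a:27017", status))  # True
print(chunk_spread([{"shard": "s1"}, {"shard": "s1"}, {"shard": "s2"}]))  # ({'s1': 2, 's2': 1}, 1)
```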

Monitoring - Zabbix

Custom checks and graphs - cluster monitoring

Monitoring - Zabbix

Long Query Logging
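A minimal sketch of the filtering step behind long query logging, assuming the input is the document returned by MongoDB's `currentOp` command. The threshold and the sample operations are hypothetical, and this is not the mongodb-zabbix implementation.

```python
def long_running_ops(current_op, threshold_secs=10):
    """(opid, ns, secs_running) for active operations at or over the threshold.

    current_op is the document returned by the currentOp command. Entries in
    "inprog" may lack "secs_running" when they have only just started.
    """
    slow = []
    for op in current_op.get("inprog", []):
        secs = op.get("secs_running", 0)
        if op.get("active") and secs >= threshold_secs:
            slow.append((op.get("opid"), op.get("ns"), secs))
    # Longest first, so the worst offender leads the log entry.
    return sorted(slow, key=lambda item: item[2], reverse=True)


sample = {"inprog": [
    {"opid": 101, "active": True, "secs_running": 42, "op": "query", "ns": "app.users"},
    {"opid": 102, "active": True, "secs_running": 3, "op": "query", "ns": "app.users"},
    {"opid": 103, "active": True, "secs_running": 15, "op": "update", "ns": "app.events"},
    {"opid": 104, "active": False, "op": "none", "ns": ""},
]}
print(long_running_ops(sample))  # [(101, 'app.users', 42), (103, 'app.events', 15)]
```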

Monitoring - Zabbix

Zabbix has no automated topology discovery for MongoDB!

Sailthru has created its own MongoDB topology discovery tool: DB Map

● Python process

● Automatically discovers nodes or config changes

● Outputs all servers and information to a Mongo collection
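DB Map's internals are not described here, so this is only a sketch of one discovery step under assumptions: given the output of the `listShards` command run against a mongos, whose `host` values look like `rs0/host1:27017,host2:27017`, flatten it into one document per node. The output field names and the cluster name are hypothetical, not DB Map's schema.

```python
def parse_shard_host(host_string):
    """Split a listShards "host" value such as "rs0/h1:27017,h2:27017"."""
    if "/" in host_string:
        set_name, hosts = host_string.split("/", 1)
    else:
        set_name, hosts = None, host_string  # shard that is not a replica set
    return set_name, hosts.split(",")


def topology_documents(cluster_name, list_shards_result):
    """One record per mongod node (hypothetical schema, not DB Map's)."""
    docs = []
    for shard in list_shards_result["shards"]:
        set_name, hosts = parse_shard_host(shard["host"])
        for host in hosts:
            docs.append({"cluster": cluster_name, "shard": shard["_id"],
                         "replica_set": set_name, "host": host})
    return docs


result = {"shards": [
    {"_id": "rs0", "host": "rs0/db1:27017,db2:27017"},
    {"_id": "rs1", "host": "rs1/db1:27018,db2:27018"},
]}
nodes = topology_documents("profiles", result)
print(len(nodes))  # 4
```

The resulting records could then be written to a Mongo collection, for example with pymongo's `insert_many`, so other tools can query them.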

Admin Tool - DB Map

Useful for many processes in our system

● Management scripts

● Execute aggregation queries to pull specific systems

● Keep Zabbix in sync using it as a source of truth

● Exportable for Ansible inventory files or other management software

● Soon to be Open Sourced

Built by: Dr. Joshua Wickman
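As an illustration of the Ansible export, a sketch that turns node records into an INI inventory with one group per replica set. The record shape is hypothetical; a microsharded server simply appears in several groups.

```python
def to_ansible_ini(nodes):
    """Render node records as an Ansible INI inventory, one group per replica set.

    nodes: dicts with "replica_set" and "host" ("host:port") keys, a
    hypothetical record shape. Ports are dropped because Ansible targets servers.
    """
    groups = {}
    for node in nodes:
        group = node.get("replica_set") or "standalone"
        groups.setdefault(group, set()).add(node["host"].rsplit(":", 1)[0])
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")
    return "\n".join(lines)


nodes = [
    {"replica_set": "rs0", "host": "db1:27017"},
    {"replica_set": "rs0", "host": "db2:27017"},
    {"replica_set": "rs1", "host": "db1:27018"},  # microsharded: db1 again
]
print(to_ansible_ini(nodes))
```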

Backups

Many ways to skin a… cluster…?

● Volume snapshots (within our Datacenter)

● Snapshots of cloud secondaries (Hybrid Cloud)

● MMS Backups

Backups - Hybrid Cloud

[Diagram: PRIMARY and SECONDARY in the DATACENTER; a hidden SECONDARY in the CLOUD]

Sailthru had a hybrid cloud-physical topology.
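In replica set terms, the cloud node is a hidden member: priority 0, so it can never become primary, and not advertised to drivers, so application reads never reach it. A minimal sketch of adding one, assuming you then submit the result with the `replSetReconfig` admin command; the host names are hypothetical.

```python
import copy


def add_hidden_member(rs_config, host):
    """Return a new replica set config with `host` added as a hidden member."""
    new_config = copy.deepcopy(rs_config)  # leave the caller's config untouched
    next_id = max(m["_id"] for m in new_config["members"]) + 1
    # Hidden members must have priority 0.
    new_config["members"].append(
        {"_id": next_id, "host": host, "priority": 0, "hidden": True}
    )
    # replSetReconfig expects the config version to increase.
    new_config["version"] = new_config.get("version", 1) + 1
    return new_config


base = {"_id": "rs0", "version": 3, "members": [
    {"_id": 0, "host": "db1:27017"}, {"_id": 1, "host": "db2:27017"}]}
updated = add_hidden_member(base, "backup1.cloud.example:27017")
print(updated["version"], updated["members"][-1])
```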

Backups - Hybrid Cloud

There are benefits to a hybrid setup:

● Disaster recovery is immediate

● Backups can be taken care of by EC2 snapshotting
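The backup step on such a node can be as simple as: lock, snapshot, unlock. A sketch with the three actions passed in as callables, so the one thing it enforces is that the node is always unlocked. In a real setup these might be pymongo's `client.admin.command("fsync", lock=True)`, boto3's `create_snapshot(VolumeId=...)` and the fsync unlock appropriate to your server version; treat that wiring as an assumption, not a description of Sailthru's scripts.

```python
def snapshot_locked(lock, snapshot, unlock):
    """Run snapshot() while the node is locked; unlock even if it fails."""
    lock()
    try:
        return snapshot()
    finally:
        unlock()  # never leave a secondary fsync-locked


events = []


def take_snapshot():
    events.append("snapshot")
    return "snap-0001"  # hypothetical snapshot id


snap_id = snapshot_locked(lambda: events.append("lock"), take_snapshot,
                          lambda: events.append("unlock"))

failure_events = []


def failing_snapshot():
    raise RuntimeError("snapshot API error")


try:
    snapshot_locked(lambda: failure_events.append("lock"), failing_snapshot,
                    lambda: failure_events.append("unlock"))
except RuntimeError:
    pass

print(snap_id, events, failure_events)
# snap-0001 ['lock', 'snapshot', 'unlock'] ['lock', 'unlock']
```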

Backups - Hybrid Cloud

[Diagram: four replica sets, each with a PRIMARY and a SECONDARY in the DC and a hidden SECONDARY in the Cloud]

Backups - Hybrid Cloud

[Diagram: two replica sets, each with a PRIMARY, a SECONDARY and a hidden SECONDARY]

● Are these secondaries on hardware provisioned equally to the others?

● Is there enough bandwidth?

● Can the disks keep up with bursts of write activity?

● Are the oplogs on these secondaries long enough?

● Is the connection to the cloud secure and stable?
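For the oplog question, the number to watch is the oplog window: the time between the oldest and newest oplog entries. A sketch, assuming you read the `ts` field of the first and last entries in `local.oplog.rs` and pass their seconds (in pymongo, `bson.timestamp.Timestamp.time`); the 24-hour requirement and the timestamps are hypothetical. Because the oplog is a capped collection, write bursts shrink this window, which ties it to the disk question above.

```python
def oplog_window_hours(first_ts_secs, last_ts_secs):
    """Hours of history held by the oplog: newest entry minus oldest entry."""
    return (last_ts_secs - first_ts_secs) / 3600.0


def oplog_long_enough(first_ts_secs, last_ts_secs, required_hours):
    """A member that falls further behind than this window needs a full resync."""
    return oplog_window_hours(first_ts_secs, last_ts_secs) >= required_hours


first = 1_400_000_000          # hypothetical oldest entry (Unix seconds)
last = first + 36 * 3600       # hypothetical newest entry, 36 hours later
print(oplog_window_hours(first, last))     # 36.0
print(oplog_long_enough(first, last, 24))  # True
```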

Backups - Hybrid Cloud

DO YOU HAVE THE TIME AND RESOURCES TO DO ALL OF THAT WORK??

We all just want backups that are fire-and-forget!

Backups - MMS

● Save on your team’s time

● Save on the provisioned hardware

● Much cheaper than the hybrid cloud solution

Sailthru has saved almost 1 million dollars year over year

Backups - MMS

● UI is easy to use and great for small/individual sets

● Need automation in order to bring up a cluster of any reasonable size

○ Automation tools not yet available out of the box

● Pulls your data across the internet - make sure you budget time for the transfer!
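A back-of-the-envelope sketch for budgeting that time: transfer time is the data size in bits divided by the usable link bandwidth. The 1 Gbit/s link and the 50% efficiency factor are hypothetical; the 5.5 TB figure is the largest cluster mentioned earlier. Treat the result as a lower bound.

```python
def transfer_hours(data_bytes, link_bits_per_sec, efficiency=1.0):
    """Hours to move data_bytes over a link running at `efficiency` of line rate."""
    return data_bytes * 8 / (link_bits_per_sec * efficiency) / 3600


ideal = transfer_hours(5.5e12, 1e9)                      # 5.5 TB over 1 Gbit/s
realistic = transfer_hours(5.5e12, 1e9, efficiency=0.5)  # hypothetical 50% of line rate
print(round(ideal, 1), round(realistic, 1))  # 12.2 24.4
```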

The Power is Turning Off...

In 2014, Sailthru was forced to move data centers.

Additionally, we made the infrastructure decision to consolidate 17+ separate replica sets into a single sharded cluster.

Data Migrations

[Diagram: DC1, DC2 and the CLOUD]

With limited bandwidth and servers, this becomes an interview brain teaser

Data Migrations - Dumps

[Diagram: DC1 → DC2 via mongodump, netcat, writing to file, then mongorestore]

● Lots of combinations; none ended up being fast enough

● Hampered by disk writes and reads

● If you touch disk you lose! The floor is lava!

Data Migrations - Mongopipe

Custom multiprocessing Python process that inserts without touching disk

● Using Python, multiprocessing, ZMQ, and some custom C objects

● Got around a MongoDB 2.4 bulk insert issue by sorting on the shard key

● Never touches disk; all processing is done in memory

● Inserts directly into many local mongos instances

● Open source coming soon!
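Mongopipe itself was not yet published, so this is only a sketch of the shape the slide describes: batches, each sorted on the shard key, are handed to several writers, entirely in memory. As a stated substitution, threads and `queue.Queue` stand in for multiprocessing and ZMQ, and `insert_batch` stands in for a bulk insert through a local mongos (with pymongo, something like `collection.insert_many(batch, ordered=False)`).

```python
import queue
import threading


def run_pipeline(batches, shard_key, insert_batch, writers=3):
    """Fan batches out to writer threads, each batch sorted on the shard key.

    Error handling is omitted: a failing insert_batch stops its writer thread.
    """
    work = queue.Queue(maxsize=writers * 2)  # bounded, so reading cannot outrun writing

    def writer():
        while True:
            batch = work.get()
            if batch is None:  # sentinel: no more work for this writer
                break
            insert_batch(batch)

    threads = [threading.Thread(target=writer) for _ in range(writers)]
    for t in threads:
        t.start()
    for batch in batches:
        # Sort on the shard key before handing off, as the slide describes.
        work.put(sorted(batch, key=lambda doc: doc[shard_key]))
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()


inserted = []
inserted_lock = threading.Lock()


def fake_insert(batch):
    with inserted_lock:
        inserted.append(batch)


source = [[{"uid": 5}, {"uid": 1}, {"uid": 3}], [{"uid": 9}, {"uid": 2}]]
run_pipeline(source, "uid", fake_insert)
```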

Data Migrations - Mongopipe

[Diagram: DC1 → DC2. Cursors → ZMQ batch inserts (sorted on shard key) → Writers → Mongos instances → Target Cluster]

Data Migrations - Mongopipe

insert       query     update  delete  getmore  command
64982        25        *0      *0      0        45|0
62484        23        *0      *0      0        50|0
37490        15        *0      *0      0        25|0
-1073585030  -4978381  *0      *0      -163     -5042014|0
197448       70        *0      *0      0        144|0
227440       105       *0      *0      0        181|0
49986        45        *0      *0      0        59|0

Data Migrations - Mongo Connector

● Mongo Connector mirrors MongoDB operations by tailing the oplog, creating almost a virtual secondary without adding it to the replica set

● Great for data migrations without downtime

https://github.com/10gen-labs/mongo-connector

Data Migrations - Mongo Connector

[Diagram: MONGO CONNECTOR's oplog manager (OPLOG MNGR.) reads the MONGO OPLOG (1…2…3…) and passes operations to several DOC MNGR. instances, which write to a TARGET DATASTORE (Elasticsearch, Solr, MongoDB…)]
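A sketch of the replay idea: read oplog entries in order and hand each one to a doc manager for the target system. The entry fields `op` (`i`, `u`, `d`), `ns`, `o` and `o2` follow the classic oplog format; newer servers can log updates in a different form. The doc manager class is hypothetical and is not mongo-connector's DocManager interface.

```python
class InMemoryDocManager:
    """Hypothetical stand-in for a target store (Elasticsearch, Solr, MongoDB...)."""

    def __init__(self):
        self.docs = {}

    def upsert(self, ns, doc):
        self.docs[(ns, doc["_id"])] = dict(doc)

    def update_fields(self, ns, _id, fields):
        self.docs.setdefault((ns, _id), {"_id": _id}).update(fields)

    def remove(self, ns, _id):
        self.docs.pop((ns, _id), None)


def apply_oplog_entry(entry, doc_manager):
    """Replay one entry: "i" insert, "u" update, "d" delete; others are skipped."""
    op, ns = entry["op"], entry["ns"]
    if op == "i":
        doc_manager.upsert(ns, entry["o"])
    elif op == "d":
        doc_manager.remove(ns, entry["o"]["_id"])
    elif op == "u":
        _id, change = entry["o2"]["_id"], entry["o"]
        if any(key.startswith("$") for key in change):
            # Only $set is handled in this sketch; other operators are skipped.
            if "$set" in change:
                doc_manager.update_fields(ns, _id, change["$set"])
        else:
            doc_manager.upsert(ns, dict(change, _id=_id))  # full replacement


entries = [
    {"op": "i", "ns": "app.users", "o": {"_id": 1, "email": "a@example.com"}},
    {"op": "u", "ns": "app.users", "o2": {"_id": 1}, "o": {"$set": {"email": "b@example.com"}}},
    {"op": "i", "ns": "app.users", "o": {"_id": 2, "email": "c@example.com"}},
    {"op": "d", "ns": "app.users", "o": {"_id": 2}},
    {"op": "n", "ns": "", "o": {"msg": "noop"}},
]
manager = InMemoryDocManager()
for e in entries:
    apply_oplog_entry(e, manager)
print(manager.docs)  # {('app.users', 1): {'_id': 1, 'email': 'b@example.com'}}
```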

Access Patterns - Keystore

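● What if I want to do a lot of findOnes on a cluster?

● On many unique fields?

● Am I doomed to many scatter gathers?

[Diagram: a MONGOS in front of four SHARDs; assume the collection is sharded on _id: hashed]

findOne({"ssn": X})   findOne({"cell_phone": X})   findOne({"_id": X})

Only the _id lookup can be routed to a single shard; the ssn and cell_phone lookups scatter-gather across every shard.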

Created by: Ian White

Access Patterns - Keystore

Find by SSN

● Keystore collection, sharded on {_id: hashed}, holding documents like { _id: SSN, sid: ObjectId() }: query on _id (the shard key) to get back an ObjectId

● Main collection, sharded on {_id: hashed}: use the sid that was found to query _id in the main collection
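A minimal sketch of the two-step lookup, assuming pymongo-style collection objects that expose `find_one`; with both collections sharded on `{_id: "hashed"}`, each `find_one` on `_id` is routed to a single shard. The function name, the fake collection class and the sample values are hypothetical.

```python
def find_by_secondary_key(keystore, main, key_value):
    """Two targeted lookups instead of one scatter-gather."""
    entry = keystore.find_one({"_id": key_value})  # keystore: {_id: <SSN>, sid: <ObjectId>}
    if entry is None:
        return None
    return main.find_one({"_id": entry["sid"]})


class FakeCollection:
    """Stand-in exposing find_one over an {_id: document} dict, counting calls."""

    def __init__(self, docs):
        self.by_id = {doc["_id"]: doc for doc in docs}
        self.calls = 0

    def find_one(self, query):
        self.calls += 1
        return self.by_id.get(query["_id"])


keystore = FakeCollection([{"_id": "000-00-0000", "sid": "profile-1"}])  # sid is an ObjectId in practice
main = FakeCollection([{"_id": "profile-1", "name": "Example"}])
profile = find_by_secondary_key(keystore, main, "000-00-0000")
print(profile, keystore.calls + main.calls)  # {'_id': 'profile-1', 'name': 'Example'} 2
```

The trade-off sits on the write path: every insert or change of an SSN also has to write the matching keystore document.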

Access Patterns - Keystore

2 queries rather than n, where n is your number of shards

** Not useful unless you are sharded out very far **

● Average time with the keystore: ~30 seconds

● Average time with direct lookups: ~170 seconds

** Tests done on a 32-shard cluster
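To make the "2 rather than n" arithmetic concrete, a toy model that counts how many shards a single lookup contacts. The md5-modulo router is a hypothetical simplification (mongos actually routes hashed keys by chunk ranges), and the model counts shard round trips only; it does not predict the measured timings above.

```python
import hashlib


def shard_for(key, num_shards):
    """Toy router: hash the key and pick a shard (not MongoDB's chunk routing)."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards


def shards_contacted(query_field, shard_key, num_shards, key_value):
    """Shards one lookup touches: one if it filters on the shard key, else all."""
    if query_field == shard_key:
        return {shard_for(key_value, num_shards)}
    return set(range(num_shards))


n = 32
direct = len(shards_contacted("ssn", "_id", n, "000-00-0000"))
keystore = (len(shards_contacted("_id", "_id", n, "000-00-0000"))
            + len(shards_contacted("_id", "_id", n, "sid-placeholder")))
print(direct, keystore)  # 32 2
```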

Other Tools - Mongoexup

Business need: regularly execute mongoexport and upload the results

● Cron jobs are unreliable

● Any ‘prototype’ inevitably becomes production

● Constructed a Python scheduler daemon to execute these tasks

● Looking to open source in the future

Built by: Chandrakant Gopalan
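MongoExUp itself is not yet published; as a sketch of the core idea under assumptions, here is a scheduler step that runs due jobs and records each job's status instead of failing silently like a broken cron line. The `Job` fields are hypothetical, the greenlets from the slide are left out, and a real export action might call `mongoexport` via `subprocess` and upload with an S3 client.

```python
import time


class Job:
    """A recurring task with its interval and last status (hypothetical fields)."""

    def __init__(self, name, interval_secs, action):
        self.name = name
        self.interval_secs = interval_secs
        self.action = action
        self.next_run = 0.0
        self.last_status = None


def run_pending(jobs, now=None):
    """Run every due job, recording success or failure instead of crashing."""
    now = time.time() if now is None else now
    for job in jobs:
        if now < job.next_run:
            continue
        try:
            job.action()
            job.last_status = "ok"
        except Exception as exc:  # one failing export must not stop the others
            job.last_status = f"failed: {exc}"
        job.next_run = now + job.interval_secs


runs = []


def export_and_upload():
    runs.append("export")  # would run mongoexport and upload the file


def broken():
    raise RuntimeError("S3 timeout")


jobs = [Job("daily_export", 86400, export_and_upload), Job("flaky", 3600, broken)]
run_pending(jobs, now=1000.0)
run_pending(jobs, now=2000.0)  # neither job is due again yet
print(runs, [j.last_status for j in jobs])  # ['export'] ['ok', 'failed: S3 timeout']
```

A daemon would call `run_pending` in a loop with a short sleep, and the recorded statuses are what a job status view would read.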

Other Tools - Mongoexup

[Diagram: Mongo, MongoExUp and S3, connected through greenlets; Job Status Information]

What are we doing next?

● Open source even more of our tools

● Ansible automation

● Building API layers around all our DBs

○ Tornado - ASYNC RULES

● MongoDB + other data stores

○ Enhancing the Keystore concept

● Upgrading

○ WT (WiredTiger)

○ RocksDB
