Deployment Best Practices

Sandeep Parikh, Solutions Architect, 10gen
#mongodbdays



Page 2: Deployment Best Practices

The Cycle of Deployment Prep

[Diagram: Prototype → Test → Monitor → Scale → Script, then back to Prototype]

Page 3: Deployment Best Practices

Prototype Your Deployment

• You have to start somewhere

• Development is complete; deployment is next

• Sketch out some initial deployment parameters
– Hardware sizing
– Operating system
– Disk setup
– Storage layout: data vs. journal vs. log


Page 4: Deployment Best Practices

Prototyping Considerations

• Additional considerations
– Horizontal vs. vertical scaling options
– Multiple datacenters

• Start thinking about data growth
– Do you know how your data will evolve?
– Does your data live in multiple collections or databases?
– Is your workload read-centric, write-centric, or both?

• The more you think about it up front, the better
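To put rough numbers on data growth, a linear projection is a reasonable first pass. This is a minimal sketch under stated assumptions: a constant daily insert rate, a fixed average document size, and a fixed per-document index overhead, all of which are hypothetical planning inputs. It ignores padding, fragmentation, and file preallocation, so treat the result as a lower bound.

```python
GIB = 1024 ** 3


def project_storage_bytes(current_bytes: int, docs_per_day: int,
                          avg_doc_bytes: int, index_bytes_per_doc: int,
                          days: int) -> int:
    """Project data plus index size after `days` of linear growth.

    All inputs are hypothetical planning numbers. Padding, fragmentation,
    and file preallocation are ignored, so this is a lower bound.
    """
    inputs = (current_bytes, docs_per_day, avg_doc_bytes, index_bytes_per_doc, days)
    if any(x < 0 for x in inputs):
        raise ValueError("inputs must be non-negative")
    bytes_per_doc = avg_doc_bytes + index_bytes_per_doc
    return current_bytes + docs_per_day * bytes_per_doc * days
```

For example, `project_storage_bytes(50 * GIB, 2_000_000, 1024, 200, 365) / GIB` estimates the size in GiB after a year, starting from 50 GiB and adding 2 million documents of about 1 KiB (plus about 200 bytes of index) per day.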


Page 5: Deployment Best Practices

Test, Test, Test

• Generate a lot of data
– Write tests to measure bulk-loading throughput
– Scaffolding can be used for staging and validation

• Build your indexes
– All in the beginning
– On the fly

• Script your app
– Can you simulate “expected” usage?
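To measure bulk-loading throughput, a test can generate synthetic documents and time how fast a loader consumes them in batches. This sketch keeps the database out of the harness: `insert_batch` is a hypothetical callable you would back with your driver's bulk insert (for example PyMongo's `insert_many`), and the document shape is purely illustrative.

```python
import random
import string
import time
from typing import Callable, Dict, Iterator, List


def synthetic_docs(n: int, seed: int = 42) -> Iterator[Dict]:
    """Yield n illustrative documents with a reproducible pseudo-random payload."""
    rng = random.Random(seed)
    for i in range(n):
        yield {
            "user_id": i,
            "name": "".join(rng.choices(string.ascii_lowercase, k=8)),
            "tags": rng.sample(["a", "b", "c", "d", "e"], k=2),
        }


def measure_bulk_load(docs: Iterator[Dict],
                      insert_batch: Callable[[List[Dict]], None],
                      batch_size: int = 1000) -> Dict[str, float]:
    """Feed docs to insert_batch in fixed-size batches and report throughput."""
    total, batch = 0, []
    start = time.perf_counter()
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            insert_batch(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        total += len(batch)
    elapsed = time.perf_counter() - start
    return {"docs": total, "seconds": elapsed,
            "docs_per_sec": total / elapsed if elapsed > 0 else float("inf")}
```

With PyMongo, `measure_bulk_load(synthetic_docs(100_000), coll.insert_many)` would time real inserts into a test collection `coll` (a hypothetical name); passing `lambda batch: None` measures only the generator overhead.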


Page 6: Deployment Best Practices

Monitor Your Resources

• Watch everything

• The goal is to understand the numbers before deploying

• Monitor using
– SNMP, Munin, Nagios
– mongostat, mongotop, iostat, cpustat
– MongoDB Monitoring Service (MMS)

• Other stats
– Database and collection level


Page 7: Deployment Best Practices

Monitoring Key Metrics

• Op counters
– Inserts, updates, deletes, reads (more is generally better)
– Some differences in primary vs. secondary ops

• Resident memory
– Want this lower than available physical memory
– Correlated with page faults and index misses

• Queues
– Readers and writers
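These metrics can be pulled from the `serverStatus` command. The sketch assumes two snapshots of its output taken a known interval apart (for example via PyMongo's `client.admin.command("serverStatus")`) and reads the `opcounters`, `mem.resident` (in MB), and `globalLock.currentQueue` fields; verify those names against your server version. Opcounters are cumulative since startup, so rates come from the difference between snapshots, and a restart resets them.

```python
from typing import Dict


def opcounter_rates(before: Dict, after: Dict, interval_s: float) -> Dict[str, float]:
    """Per-second op rates from two serverStatus snapshots taken interval_s apart.

    opcounters are cumulative since process start, so the rate is the
    delta divided by the sampling interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    b, a = before["opcounters"], after["opcounters"]
    return {op: (a[op] - b[op]) / interval_s for op in a if op in b}


def memory_and_queues(status: Dict, physical_mb: int) -> Dict[str, float]:
    """Resident memory as a fraction of RAM, plus current reader/writer queues."""
    queue = status["globalLock"]["currentQueue"]
    return {
        "resident_fraction": status["mem"]["resident"] / physical_mb,
        "queued_readers": queue["readers"],
        "queued_writers": queue["writers"],
    }
```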


Page 8: Deployment Best Practices

Monitoring Key Metrics

• Page faults and B-tree misses
– How often are you having to hit the disk?
– Persistently non-zero? Your working set might not fit.

• Lock percentage
– If it is high and the queues are filled, you are hitting write capacity

• IO and CPU stats
– Sustained or fluctuating IO => IO bound
– CPU hitting IOWAIT
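One way to turn “persistently non-zero” into a check is to collect counter deltas per sampling window and flag patterns that hold across every window. The `Window` container is hypothetical, and mapping it to server fields (for example `extra_info.page_faults`, plus the B-tree and global-lock counters that 2.x-era `serverStatus` exposed) varies by version, so treat that mapping as an assumption. The 50% lock threshold is illustrative, not an official figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    """Counter deltas for one sampling window (hypothetical container)."""
    page_faults: int
    btree_accesses: int
    btree_misses: int
    lock_time_us: int    # time the global lock was held during the window
    total_time_us: int   # length of the window
    queued_writers: int


def miss_ratio(w: Window) -> float:
    """Fraction of B-tree accesses that missed memory in this window."""
    return w.btree_misses / w.btree_accesses if w.btree_accesses else 0.0


def warnings_for(windows: List[Window], lock_pct_limit: float = 50.0) -> List[str]:
    """Flag the slide's patterns; thresholds are illustrative assumptions."""
    notes: List[str] = []
    if windows and all(w.page_faults > 0 for w in windows):
        notes.append("page faults persistently non-zero: working set might not fit")
    if windows and all(w.btree_misses > 0 for w in windows):
        notes.append("B-tree misses persistently non-zero: index pages not staying in RAM")
    for i, w in enumerate(windows):
        lock_pct = 100.0 * w.lock_time_us / w.total_time_us if w.total_time_us else 0.0
        if lock_pct > lock_pct_limit and w.queued_writers > 0:
            notes.append(f"window {i}: lock {lock_pct:.0f}% with writers queued: near write capacity")
    return notes
```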


Page 9: Deployment Best Practices

Scale Your Setup

• Monitor those metrics while testing

• They should tell you where to add capacity
– CPU, RAM, disks

• Storage configuration
– RAID levels (RAID 10 preferred)
– Filesystem selection
– Block sizing
– Readahead setting


Page 10: Deployment Best Practices

Script Your Plays

• Backups

• Restores (backups are not enough)

• Maintenance and Upgrades

• Replica set operations
– Stepping primaries down, adding new secondaries

• Sharding operations
– Consistent backups, balancer operations

• Check out the Backup talk later today
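Scripting a play means its steps always run in the same order and cleanup always happens. The sketch below is a small, hypothetical runner: each step is a description plus a callable, a failure stops the play, and cleanup steps (such as restarting the balancer) run regardless. The sharded-backup ordering is a simplified version of the commonly documented approach (stop the balancer, back up a config server and then each shard, restart the balancer); the callables are placeholders for your own tooling, for example a PyMongo update that sets `stopped: true` on the `balancer` document in `config.settings`.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_play(steps: Iterable[Step], cleanup: Iterable[Step] = (),
             dry_run: bool = False) -> List[str]:
    """Run steps in order, stopping at the first failure; always run cleanup."""
    log: List[str] = []

    def execute(desc: str, action: Callable[[], None]) -> bool:
        if dry_run:
            log.append(f"DRY RUN: {desc}")
            return True
        try:
            action()
        except Exception as exc:  # record the failure and tell the caller
            log.append(f"FAILED: {desc}: {exc}")
            return False
        log.append(f"OK: {desc}")
        return True

    for desc, action in steps:
        if not execute(desc, action):
            break
    for desc, action in cleanup:  # e.g. restart the balancer no matter what
        execute(desc, action)
    return log


def sharded_backup_play(shards: List[str], stop_balancer, backup_config,
                        backup_shard, start_balancer) -> Tuple[List[Step], List[Step]]:
    """Hypothetical ordering: balancer off, config server, each shard; balancer on."""
    steps: List[Step] = [("stop balancer", stop_balancer),
                         ("back up config server", backup_config)]
    steps += [(f"back up shard {s}", lambda s=s: backup_shard(s)) for s in shards]
    return steps, [("start balancer", start_balancer)]
```

Running with `dry_run=True` first lists the play without touching anything, which is a cheap way to review it before maintenance.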


Page 11: Deployment Best Practices

Lather, Rinse, Repeat

[Diagram: the Prototype → Test → Monitor → Scale → Script cycle]

Page 12: Deployment Best Practices

Perfect. I know what to do. How Do I Do It?

Page 13: Deployment Best Practices

Balancing Priorities

[Diagram: a scale balancing Product Development against Infrastructure Development, with labels for Integration, QA, Code, Operations, and Monitoring]

Page 14: Deployment Best Practices

The Scale Tips To One Side

• Product development is the priority
– As it should be, but…

• Infrastructure development can’t be overlooked

• Know the downsides of not being prepared
– Downtime
– Data safety

• Disaster will strike

Page 15: Deployment Best Practices

Integrate With The Dev Cycle

• Why are ops typically skipped over until it’s too late?
– Planning

• Make operations development a part of the dev cycle
– Put it into the schedule
– Make it a development milestone

• Use it to your advantage
– Script deployment of development and test systems

Page 16: Deployment Best Practices

That’s all well and good, but we are already deployed

Page 17: Deployment Best Practices

Let’s Avoid This Situation

Page 18: Deployment Best Practices

Start The Cycle Again

[Diagram: the Prototype → Test → Monitor → Scale → Script cycle]

Page 19: Deployment Best Practices

Start With Monitoring

• Monitor your deployment
– Munin, Nagios
– MMS

• Instrument your app
– Know your queries
– Read/write/update/delete behaviors
– Index utilization

• Database and collection stats
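Instrumenting the app can start as simply as counting calls and timing them per operation type. The sketch below is a hypothetical decorator-based tally; the labels are whatever you choose (for example `"read:users"`), and the wrapped functions stand in for your own data-access code. If your driver offers command monitoring (PyMongo has a `pymongo.monitoring` module), that is a less invasive alternative.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict


class OpStats:
    """Counts calls and total latency per operation label (hypothetical helper)."""

    def __init__(self) -> None:
        self.calls: Dict[str, int] = defaultdict(int)
        self.seconds: Dict[str, float] = defaultdict(float)

    def track(self, label: str) -> Callable:
        """Decorator that times every call to the wrapped data-access function."""
        def decorator(fn: Callable) -> Callable:
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return fn(*args, **kwargs)
                finally:  # record the call even if it raises
                    self.calls[label] += 1
                    self.seconds[label] += time.perf_counter() - start
            return wrapper
        return decorator

    def summary(self) -> Dict[str, Dict[str, float]]:
        """Call count and average latency in milliseconds for each label."""
        return {label: {"calls": n, "avg_ms": 1000 * self.seconds[label] / n}
                for label, n in self.calls.items()}
```

For example, decorate a hypothetical `find_user` function with `@stats.track("read:users")` and log `stats.summary()` periodically to see your read/write mix and latencies.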


Page 20: Deployment Best Practices

Scaling Deployment

• The numbers don’t lie
– But individual measurements don’t always tell the whole story

• Are you hardware bound?
– Memory, disks, CPU

• Is your app the problem?

• What about system settings?
– Low resident memory > readahead > page faults


Page 21: Deployment Best Practices

Basic Solutions

• Low opcounters + high page faults
– More memory

• High paddingFactor and fragmentation
– Data model changes

• Balancer running a lot, chunks always migrating
– Better shard key

• Persistent B-tree misses, high page faults
– Queries aren’t hitting the indexes or aren’t using them
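The symptom-to-fix pairs on this slide can be written down as explicit rules, so a monitoring script reports the same suggestions every time. This is a minimal sketch: the `Symptoms` flags are hypothetical, and deciding what counts as “low” or “high” is left to your own baselines.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Symptoms:
    """Boolean observations from your dashboards; thresholds are up to you."""
    low_opcounters: bool = False
    high_page_faults: bool = False
    high_padding_factor: bool = False
    high_fragmentation: bool = False
    balancer_busy: bool = False
    persistent_btree_misses: bool = False


# (condition, suggested fix) pairs transcribed from the "Basic Solutions" slide.
RULES: List[Tuple[Callable[[Symptoms], bool], str]] = [
    (lambda s: s.low_opcounters and s.high_page_faults, "Add more memory"),
    (lambda s: s.high_padding_factor and s.high_fragmentation, "Revisit the data model"),
    (lambda s: s.balancer_busy, "Choose a better shard key"),
    (lambda s: s.persistent_btree_misses and s.high_page_faults,
     "Check that queries are hitting (and using) the indexes"),
]


def suggest(s: Symptoms) -> List[str]:
    """Return every suggested fix whose condition matches, in rule order."""
    return [fix for matches, fix in RULES if matches(s)]
```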


Page 22: Deployment Best Practices

Continue Through the Cycle

• Script your setup
– This will save time as you iterate

• Prototype the fixes
– Evaluate queries, how documents change, expected usage

• Test the new setup
– Scripts to build the deployment and model usage


Page 23: Deployment Best Practices

Deployment is about not being surprised

Page 24: Deployment Best Practices

Questions?

Page 25: Deployment Best Practices

How To Get Help

• Ask the Experts sessions

• We are here to help, come find us

• Refer to our docs: docs.mongodb.org (hint: they’re great!)

• Other things we monitor
– mongodb-user Google group
– Stack Overflow

• Submit a ticket

Page 26: Deployment Best Practices

Backup Slides: Problem > Diagnosis > Solution

Page 27: Deployment Best Practices

Problem 1: Social Networking

• Suboptimal write throughput

• Where is the bottleneck?
– Check the metrics

Page 28: Deployment Best Practices

Diagnosis 1

• Are opcounters reasonably accurate?

• Check the queues

• Examine lock percentages

• How does resident memory look?

• How large are your indexes?

Page 29: Deployment Best Practices

Solution 1

• Opcounters aren’t as high as you’d expect but memory is saturated

• Correlated with high page faults

• You might need more memory

• MongoDB wants to fit your working set into memory
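A rough way to ask whether the working set fits is to add index size to an estimate of frequently accessed data and compare the sum with usable RAM. Index sizes are available from `db.stats()` (`indexSize`) or collection stats (`totalIndexSize`); the “hot data” figure and the 20% memory reserve below are assumptions you supply, not MongoDB recommendations.

```python
def working_set_fits(index_bytes: int, hot_data_bytes: int, ram_bytes: int,
                     reserve_fraction: float = 0.2) -> bool:
    """Rough check: do indexes plus frequently accessed data fit in usable RAM?

    reserve_fraction holds back memory for the OS, connections, and other
    processes; the 20% default is an illustrative assumption.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    usable = ram_bytes * (1 - reserve_fraction)
    return index_bytes + hot_data_bytes <= usable
```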

Page 30: Deployment Best Practices

Problem 2: Tracking FB Friends

• Update-heavy workload is slow

• Document paddingFactor is increasing

Page 31: Deployment Best Practices

Diagnosis 2

• High paddingFactor
– Fragmentation!

• More memory/disk is taken up by new documents
– Inefficient space usage

• Documents are having to be relocated regularly

Page 32: Deployment Best Practices

Solution 2

• Check your queries
– Are your documents growing because of arrays or added fields?

• Pre-create the required document structure, or…

• Move growing elements out into individual documents in a separate collection
– Data model changes, app changes
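The two data model options can be sketched as document shapes. This is an illustration with hypothetical field and collection names: an embedded `friends` array grows the document on every update (which is what pushes `paddingFactor` up), whereas one small document per friendship in a separate collection keeps each document a fixed size. Pre-creating fields avoids growth from added fields, although values that get longer can still grow the document.

```python
from typing import Dict, List


def embedded_user(user_id: int, friend_ids: List[int]) -> Dict:
    """Growth-prone shape: every new friend enlarges this document in place."""
    return {"_id": user_id, "friends": list(friend_ids)}


def friendship_docs(user_id: int, friend_ids: List[int]) -> List[Dict]:
    """Alternative shape: one small, fixed-size document per friendship,
    meant for a separate (hypothetical) "friendships" collection."""
    return [{"_id": f"{user_id}:{fid}", "user": user_id, "friend": fid}
            for fid in friend_ids]


def precreated_profile(user_id: int) -> Dict:
    """Pre-create fields that are filled in later, so updates replace values
    instead of adding fields. Field names are illustrative."""
    return {"_id": user_id, "bio": "", "location": "", "last_login": None}
```

With PyMongo, the alternative shape would be written with something like `db.friendships.insert_many(friendship_docs(1, [2, 3]))`, and reads become a query on `{"user": 1}` (ideally indexed).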

Page 33: Deployment Best Practices

Problem 3: Status Updates

• Write-heavy sharded deployment
– Is one shard getting burned?
– Balancer lock held all the time

• Balancer is constantly migrating chunks

Page 34: Deployment Best Practices

Diagnosis 3

• Check the mongos logs
– How often is migration occurring?
– Are chunks constantly moving from one shard to the next?

• Shard key distribution
– Sequential keys?
– One shard always getting new writes?

Page 35: Deployment Best Practices

Solution 3

• If there is no “natural” key that distributes well, consider hashing, byte swapping, etc.
– Avoids the “hot” shard problem

• High writes and high balancer lock
– Manage the balancer window
– Run it during low utilization
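Why hashing or byte swapping helps can be shown without a cluster. The sketch splits a 64-bit key space into equal ranges (a crude stand-in for chunks) and counts where sequential keys land: raw, byte-swapped, and MD5-hashed. It is illustrative only; MongoDB 2.4 and later can hash a shard key server-side (`{field: "hashed"}`). The balancer window helper builds the filter/update pair for the `activeWindow` setting on the `balancer` document in `config.settings`, which you could apply with PyMongo's `update_one(..., upsert=True)`.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

KEY_SPACE = 2 ** 64  # treat keys as unsigned 64-bit integers


def md5_spread(value: int) -> int:
    """Hash a key to a pseudo-random 64-bit integer (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(str(value).encode()).digest()[:8], "big")


def byte_swapped(value: int) -> int:
    """Reverse the byte order of a 64-bit key so the fastest-changing byte leads."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


def range_distribution(keys: Iterable[int], ranges: int,
                       transform: Callable[[int], int] = lambda k: k) -> Counter:
    """Count keys per fixed, equal-width key range (a crude stand-in for chunks)."""
    return Counter(transform(k) * ranges // KEY_SPACE for k in keys)


def balancer_window(start: str, stop: str) -> Tuple[Dict, Dict]:
    """Filter/update pair restricting the balancer to a daily time window."""
    return {"_id": "balancer"}, {"$set": {"activeWindow": {"start": start, "stop": stop}}}
```

With 1,000 sequential keys and four ranges, the raw keys all land in a single range (the “hot” shard), while the byte-swapped and hashed keys spread across all four.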

Page 36: Deployment Best Practices

Problem 4: File Sharing

• Storing files in GridFS

• Uploads are taking too long

Page 37: Deployment Best Practices

Diagnosis 4

• Check CPU and IO stats

• Is the CPU stuck in IOWAIT?

• High sustained IO operations

• Lots of queued operations

• IO-bound workload

Page 38: Deployment Best Practices

Solution 4

• Ensure storage is in good health
– RAID status
– SAN or NAS devices functioning properly
– Virtualized disks

• Consider separating data and journal
– --directoryperdb
– Symlink the journal to another location

• Ensure other processes aren’t hitting storage

Page 39: Deployment Best Practices

Problem 5: Reading Logs

• Indexes are underperforming

• Queries are using indexes but yielding quite a bit

Page 40: Deployment Best Practices

Diagnosis 5

• Use .explain() and .hint() with your queries

• Check out the B-tree metrics
– Persistent non-zero misses?
– Correlated with memory, page faults, IO stats?

• B-trees are best for range queries over a single dimension
– Range queries on {A} when the index is {A,B} could be suboptimal
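A quick review of `.explain()` output can be scripted. This sketch assumes the pre-3.0 explain format (`cursor`, `n`, `nscanned`, `nYields`), which is what 2.x-era servers return; newer servers report `executionStats` instead, so the keys would need adapting. The 10x scan-ratio threshold is an illustrative choice.

```python
from typing import Dict, List


def review_explain(plan: Dict, max_scan_ratio: float = 10.0) -> List[str]:
    """Flag common problems in a legacy (pre-3.0) explain() document.

    Assumes the fields cursor, n, nscanned, and nYields; adapt for the
    executionStats format on newer servers.
    """
    notes: List[str] = []
    if plan.get("cursor", "").startswith("BasicCursor"):
        notes.append("collection scan: no index used")
    n, scanned = plan.get("n", 0), plan.get("nscanned", 0)
    if scanned > max_scan_ratio * max(n, 1):
        notes.append(f"scanned {scanned} entries to return {n} documents")
    if plan.get("nYields", 0) > 0:
        notes.append(f"yielded {plan['nYields']} times (can indicate page faults or lock contention)")
    return notes
```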

Page 41: Deployment Best Practices

Solution 5

• Revisit your indexing strategy

• Consider data model changes to optimize queries and indexes

• Some functionality doesn’t hit the index
– $where JavaScript clauses
– $mod, $not, $ne
– Complex regular expressions
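Before reaching for `.explain()`, a code review can catch the constructs listed above. This minimal sketch walks a query document and reports any of the operators named on the slide; the regex check is only a heuristic, based on the fact that prefix-anchored (`^...`) regular expressions are the ones that can use index bounds efficiently.

```python
from typing import Any, Set

# Operators the slide lists as not (or poorly) served by indexes.
FLAGGED_OPERATORS = {"$where", "$mod", "$not", "$ne"}


def flagged_operators(query: Any) -> Set[str]:
    """Recursively collect flagged operators used anywhere in a query document."""
    found: Set[str] = set()
    if isinstance(query, dict):
        for key, value in query.items():
            if key in FLAGGED_OPERATORS:
                found.add(key)
            found |= flagged_operators(value)
    elif isinstance(query, list):
        for item in query:
            found |= flagged_operators(item)
    return found


def unanchored_regex(pattern: str) -> bool:
    """Heuristic: patterns not anchored with ^ cannot use a prefix index scan."""
    return not pattern.startswith("^")
```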

Page 42: Deployment Best Practices

Miscellaneous Deployment Notes

• Warm the cache
– Use touch via db.runCommand()

• Dynamically change log levels

• Synchronize all clocks to the same NTP server
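These notes map to commands that can be scripted. This sketch only builds the command documents and measures clock skew, assuming you pass it the `localTime` values read from each host's `serverStatus`. `touch` applies to MMAPv1-era servers only, and `logLevel` accepts values from 0 to 5. With PyMongo you would run the documents via `db.command(...)` or `client.admin.command(...)`; the command name must be the first key, which plain dicts preserve on Python 3.7+.

```python
from datetime import datetime
from typing import Dict


def touch_command(collection: str, data: bool = True, index: bool = True) -> Dict:
    """Command document asking the server to load a collection's data and/or
    indexes into memory (touch; MMAPv1-era servers only)."""
    if not (data or index):
        raise ValueError("touch needs data=True, index=True, or both")
    return {"touch": collection, "data": data, "index": index}


def log_level_command(level: int) -> Dict:
    """Command document that changes the server log verbosity at runtime."""
    if not 0 <= level <= 5:
        raise ValueError("logLevel must be between 0 and 5")
    return {"setParameter": 1, "logLevel": level}


def max_clock_skew_seconds(clocks: Dict[str, datetime]) -> float:
    """Largest difference between host clocks, e.g. serverStatus localTime values.

    Readings taken over the network include round-trip time, so small values
    are noise; large ones suggest hosts are not syncing to the same NTP server.
    """
    if len(clocks) < 2:
        return 0.0
    readings = list(clocks.values())
    return (max(readings) - min(readings)).total_seconds()
```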