Deployment Best Practices
Sandeep Parikh
Solutions Architect, 10gen
#mongodbdays
The Cycle of Deployment Prep
Prototype > Test > Monitor > Scale > Script
Prototype Your Deployment
• You have to start somewhere
• Development is complete, deployment is next
• Sketch out some initial deployment parameters
– Hardware sizing
– Operating system
– Disk setup
– Storage layout: data vs. journal vs. log
Prototyping Considerations
• Additional considerations
– Horizontal vs. vertical scale options
– Multiple datacenters
• Start thinking about data growth
– Do you know how your data will evolve?
– Does your data live in multiple collections/databases?
– Read-centric, write-centric or both?
• The earlier you start thinking about it, the better
Test, Test, Test
• Generate a lot of data
– Write tests to measure bulk loading throughput
– Scaffolding can be used for staging, validation
• Build your indexes
– All in the beginning
– On the fly
• Script your app
– Can you simulate “expected” usage?
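One way to measure bulk loading throughput is a small harness like the sketch below. It is only an illustration: the document fields, the `generate_docs` and `measure_bulk_load` names, and the in-memory `sink` are all made up here so the code runs without a database. Against a real deployment you would pass a driver's batch-insert callable (for example PyMongo's `collection.insert_many`) in place of `sink.extend`.

```python
import time
from typing import Callable, Iterator, List


def generate_docs(n: int) -> Iterator[dict]:
    """Yield n synthetic documents; the fields are invented for illustration."""
    for i in range(n):
        yield {"user_id": i, "status": "active", "payload": "x" * 64}


def measure_bulk_load(insert_batch: Callable[[List[dict]], object],
                      docs: Iterator[dict],
                      batch_size: int = 1000) -> dict:
    """Feed docs to insert_batch in batches and report documents per second."""
    total = 0
    batch: List[dict] = []
    start = time.perf_counter()
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            insert_batch(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        total += len(batch)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return {"docs": total, "seconds": elapsed, "docs_per_sec": rate}


# Stand-in sink so the sketch runs without a database.
sink: List[dict] = []
result = measure_bulk_load(sink.extend, generate_docs(2500), batch_size=1000)
print(result["docs"], "documents loaded")
```

Running the same harness with indexes built up front and again with indexes built afterwards gives a rough comparison of the two index-building options.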
Monitor Your Resources
• Watch everything
• The goal is to understand the numbers before deploying
• Monitor using
– SNMP, munin, nagios
– mongostat, mongotop, iostat, cpustat
– MongoDB Monitoring Service (MMS)
• Other stats
– Database, Collection level
Monitoring Key Metrics
• Op Counters
– Inserts, updates, deletes, reads (more is generally better)
– Some differences in primary vs. secondary ops
• Resident memory
– Want this lower than available physical memory
– Correlated with page faults and index misses
• Queues
– Readers and writers
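Opcounters are reported as running totals, so turning them into rates means sampling twice and dividing by the interval. The sketch below assumes two snapshots shaped like the `opcounters` section of `serverStatus`; the sample numbers and the `opcounter_rates` helper are invented for illustration.

```python
def opcounter_rates(before: dict, after: dict, interval_seconds: float) -> dict:
    """Convert two cumulative opcounter snapshots into operations per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return {op: (after[op] - before[op]) / interval_seconds for op in before}


# Hypothetical snapshots taken 60 seconds apart.
sample_t0 = {"insert": 1000, "query": 5000, "update": 200, "delete": 10}
sample_t1 = {"insert": 1600, "query": 8000, "update": 260, "delete": 10}

rates = opcounter_rates(sample_t0, sample_t1, 60.0)
for op, per_sec in rates.items():
    print(f"{op}: {per_sec:.1f}/s")
```

A negative rate would indicate that the counters were reset between samples (for example by a restart), so a real collector should discard that interval.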
Monitoring Key Metrics
• Page faults and B-tree misses
– How often are you having to hit the disk?
– Persistently non-zero? Working set might not fit.
• Lock percentage
– If high and queues are filled, hitting write capacity
• IO and CPU stats
– IO sustained or fluctuating => IO bound
– CPU hitting IOWAITs
Scale Your Setup
• Monitor those metrics while testing
• Should tell you where to add capacity
– CPU, RAM, Disks
• Storage configuration
– RAID levels (10 preferred)
– Filesystem selection
– Block sizing
– Readahead setting
Script Your Plays
• Backups
• Restores (backups are not enough)
• Maintenance and Upgrades
• Replica Set operations
– Stepping primaries down, adding new secondaries
• Sharding operations
– Consistent backups, balancer operations
• Check out the Backup talk later today
Lather, Rinse, Repeat
Perfect. I know what to do.
How Do I Do It?
Balancing Priorities
[Figure: Product Development vs. Infrastructure Development; labels: Integration, QA, Code, Operations, Monitoring]
The Scale Tips To One Side
• Product development is the priority
– As it should be, but…
• Infrastructure development can’t be overlooked
• Know the downsides of not being prepared
– Downtime
– Data safety
• Disaster will strike
Integrate With The Dev Cycle
• Why are ops typically skipped over until it’s too late?
– Planning
• Make operations development a part of the dev cycle
– Put it into the schedule
– Make it a development milestone
• Use it to your advantage
– Script deployment of development and test systems
That’s all well and good, but we are already deployed
Let’s Avoid This Situation
Start The Cycle Again
Start With Monitoring
• Monitor your deployment
– Munin, nagios
– MMS
• Instrument your app
– Know your queries
– Read/write/update/delete behaviors
– Index utilization
• Database and collection stats
Scaling Deployment
• The numbers don’t lie
– But individual measurements don’t always tell the whole story
• Are you hardware bound?
– Memory, Disks, CPU
• Is your app the problem?
• What about system settings?
– Low Resident Memory > Readahead > Page Faults
Basic Solutions
• Low opcounters + high page faults
– More memory
• High paddingFactor and fragmentation
– Data model changes
• Balancer running a lot, chunks always migrating
– Better shard key
• Persistent b-tree misses, high page faults
– Queries aren’t hitting the indexes or aren’t using them
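The symptom-to-fix pairs on this slide can be encoded as a small lookup table, which is handy as a first-pass triage in a monitoring script. This is only a sketch: the symptom names are labels made up here, and deciding when a metric counts as "high" or "low" is left to whatever thresholds your own testing establishes.

```python
# Each rule pairs a set of required symptoms with the slide's suggested fix.
RULES = [
    ({"low_opcounters", "high_page_faults"}, "More memory"),
    ({"high_padding_factor", "fragmentation"}, "Data model changes"),
    ({"balancer_always_running", "chunks_always_migrating"}, "Better shard key"),
    ({"persistent_btree_misses", "high_page_faults"},
     "Check that queries hit and use the indexes"),
]


def suggest_fixes(symptoms: set) -> list:
    """Return every fix whose required symptoms are all present."""
    return [fix for required, fix in RULES if required <= symptoms]


observed = {"low_opcounters", "high_page_faults", "persistent_btree_misses"}
for fix in suggest_fixes(observed):
    print(fix)
```

More than one rule can match at once, which fits the earlier point that individual measurements don't always tell the whole story.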
Continue Through the Cycle
• Script your setup
– This will save time as you iterate
• Prototype the fixes
– Evaluate queries, how documents change, expected usage
• Test the new setup
– Scripts to build the deployment and model usage
Deployment is about not being surprised
Questions?
How To Get Help
• Ask the Experts sessions
• We are here to help, come find us
• Refer to our docs: docs.mongodb.org (hint: they’re great!)
• Other things we monitor
– mongodb-user Google group
– Stack Overflow
• Submit a ticket
Backup
Problem > Diagnosis > Solution
Problem 1: Social Networking
• Suboptimal write throughput
• Where is the bottleneck?
– Check the metrics
Diagnosis 1
• Are opcounters reasonably accurate?
• Check the queues
• Examine lock percentages
• How does resident memory look?
• How large are your indexes?
Solution 1
• Opcounters aren’t as high as you’d expect but memory is saturated
• Correlated with high page faults
• You might need more memory
• MongoDB wants to fit your working set into memory
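A back-of-the-envelope check for "does the working set fit?" might look like the sketch below. It assumes the working set is roughly all indexes plus some hot fraction of the data; that fraction is a guess you would have to measure for your own workload, and the helper name and sample sizes are hypothetical. The index and data sizes themselves can be read from `db.stats()` (`indexSize`, `dataSize`).

```python
GIB = 1024 ** 3


def working_set_estimate(index_bytes: int, data_bytes: int,
                         hot_fraction: float) -> float:
    """Rough working set: all indexes plus the frequently touched share of data."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return index_bytes + data_bytes * hot_fraction


# Hypothetical deployment: 8 GiB of indexes, 40 GiB of data, 25% of it hot.
estimate = working_set_estimate(8 * GIB, 40 * GIB, 0.25)
fits_in_16 = estimate <= 16 * GIB
fits_in_32 = estimate <= 32 * GIB
print(f"estimated working set: {estimate / GIB:.0f} GiB")
print("fits in 16 GiB:", fits_in_16, "| fits in 32 GiB:", fits_in_32)
```

Treat the result as a prompt to look at page faults and resident memory, not as a substitute for them.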
Problem 2: Tracking FB Friends
• Update-heavy workload is slow
• Document paddingFactor is increasing
Diagnosis 2
• High paddingFactor
– Fragmentation!
• More memory/disk is taken up by new documents
– Inefficient space usage
• Documents are having to be relocated regularly
Solution 2
• Check your queries
– Are your documents growing because of arrays or added fields?
• Pre-create required document structure or…
• Move growing elements out into individual objects in a separate collection
– Data model changes, app changes
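Both options can be sketched in a few lines. The first helper pre-creates every field a document will ever need so later updates (for example `$inc` on `hits_by_hour.13`) change values in place instead of growing the document. The second moves a growing array out into one small document per element. The field names, helper names, and sample data are all hypothetical.

```python
def preallocated_daily_doc(page: str, day: str) -> dict:
    """Build a per-day counter document with all 24 hourly slots already present."""
    return {
        "page": page,
        "day": day,
        "hits_by_hour": {str(hour): 0 for hour in range(24)},
    }


def split_friends(user_doc: dict) -> tuple:
    """Separate a growing 'friends' array into one document per friendship."""
    parent = {k: v for k, v in user_doc.items() if k != "friends"}
    children = [{"user_id": user_doc["_id"], "friend_id": friend}
                for friend in user_doc.get("friends", [])]
    return parent, children


daily = preallocated_daily_doc("/home", "2012-10-01")
user, friendships = split_friends(
    {"_id": 42, "name": "example", "friends": [7, 9, 11]}
)
print(len(daily["hits_by_hour"]), "hourly slots;", len(friendships), "friend docs")
```

The second approach trades one growing document for many fixed-size ones, which is why the slide notes that it requires both data model and application changes.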
Problem 3: Status Updates
• Write-heavy sharded deployment
– Is one shard getting burned?
– Balancer locked all the time
• Balancer is constantly migrating chunks
Diagnosis 3
• Check the mongos logs
– How often is migration occurring?
– Are chunks constantly moving from one shard to the next?
• Shard key distribution
– Sequential keys?
– One shard always getting new writes?
Solution 3
• Consider using hash, byte swapping, etc. if no “natural” key that distributes well
– Avoids the “hot” shard problem
• High writes and high balancer lock
– Manage balancer window
– Run it during low utilization
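The "hot" shard effect of sequential keys, and how hashing avoids it, can be seen in a small simulation. This is a toy model, not MongoDB's actual chunk routing: it assumes four shards that each own an equal contiguous slice of a 64-bit key space, uses MD5 from the standard library as a stand-in hash, and invents a timestamp-like starting key.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
KEY_SPACE = 2 ** 64


def shard_for(value: int) -> int:
    """Range partitioning: each shard owns an equal contiguous slice of the key space."""
    return value * NUM_SHARDS // KEY_SPACE


def hashed(value: int) -> int:
    """Map a key to a pseudo-random 64-bit position (first 8 bytes of its MD5)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


base = 1_350_000_000  # hypothetical timestamp-like starting key
keys = range(base, base + 10_000)

sequential = Counter(shard_for(k) for k in keys)
hashed_counts = Counter(shard_for(hashed(k)) for k in keys)

print("sequential keys:", dict(sequential))
print("hashed keys:    ", dict(sorted(hashed_counts.items())))
```

With sequential keys every write lands on the same shard, while the hashed keys spread across all four. The cost is that range queries on the original key no longer map to a contiguous slice.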
Problem 4: File Sharing
• Storing files in GridFS
• Uploads are taking too long
Diagnosis 4
• Check CPU and IO stats
• Is the CPU stuck in IOWAITS?
• High sustained IO operations
• Lots of queued operations
• IO bound workload
Solution 4
• Ensure storage is in good health
– RAID status
– SAN or NAS devices functioning properly
– Virtualized disks
• Consider separating data and journal
– --directoryperdb
– Symlink journal to another location
• Ensure other processes aren’t hitting storage
Problem 5: Reading Logs
• Indexes are underperforming
• Queries are using indexes but yielding quite a bit
Diagnosis 5
• Use .explain() and .hint() with your queries
• Check out the b-tree metrics
– Persistent non-zero misses?
– Correlated with memory, page faults, IO stats
• B-trees best for range queries over single dimension
– Range queries on {A} if index is {A,B} could be suboptimal
Solution 5
• Revisit your indexing strategy
• Consider data model changes to optimize queries and indexes
• Some functionality doesn’t hit the index
– $where javascript clauses
– $mod, $not, $ne
– Complex regular expressions
Miscellaneous Deployment Notes
• Warm the cache
– Use touch via db.runCommand()
• Dynamically change log levels
• Synchronize all clocks to the same NTP server
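For the cache-warming note above, the `touch` command document can be built as in this sketch. The helper name and the `records` collection are hypothetical, and the command's availability depends on server version and storage engine, so check that your deployment supports it before scripting around it.

```python
def touch_command(collection: str, data: bool = True, index: bool = True) -> dict:
    """Build a touch command document; the command name must be the first key."""
    return {"touch": collection, "data": data, "index": index}


cmd = touch_command("records")
print(cmd)

# With a PyMongo Database object `db` (not created in this sketch), the command
# could be sent as:  db.command(cmd)
# In the mongo shell the equivalent is db.runCommand() with the same fields.
```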