scott-hernandez
Learn MongoDB best practices from real examples from the field.
Operational Best Practices
Tales from the field
The Plan
● Review support cases
    ○ Taken from real issues
    ○ Names/IPs/dates changed to protect identities
● Analyze reported issues
● Distill best practices
● Summarize takeaways
● Repeat...
Scenario 1
● Fire, it is on fire!
● Users notice response times of 1-3 sec
● App logs show timeouts
● Server logs show socket exceptions
Scenario 1 - Diagnostics
● Logs
● Understanding the timeouts
    ○ Client read timeout set
    ○ Connection closed/discarded
    ○ Symptom, not cause
● Server connection exceptions
    ○ Match timing of client timeouts
    ○ Symptom, not cause
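Matching the timing of client timeouts against server socket exceptions can be sketched as a small helper. This is a minimal illustration, not the tooling used in the case: the function names, the 500 ms window, and the timestamps are all assumptions.

```javascript
// Pair each client timeout with the server socket exceptions that occur
// within `windowMs` of it. Timestamps are epoch milliseconds.
// The window size is an assumption to tune per deployment.
function correlateEvents(clientTimeouts, serverExceptions, windowMs) {
  return clientTimeouts.map(function (t) {
    const matches = serverExceptions.filter(function (s) {
      return Math.abs(s - t) <= windowMs;
    });
    return { clientTimeout: t, serverMatches: matches };
  });
}

// Fraction of client timeouts with at least one nearby server exception.
function matchRate(pairs) {
  if (pairs.length === 0) return 0;
  const matched = pairs.filter(function (p) {
    return p.serverMatches.length > 0;
  }).length;
  return matched / pairs.length;
}

// Hypothetical timestamps, not taken from the support case.
const pairs = correlateEvents([1000, 5000, 9000], [1200, 5100, 20000], 500);
console.log(matchRate(pairs));
```

A high match rate supports the reading on the slide: both log entries describe the same event, so neither one is the root cause.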
Scenario 1 - Monitoring
Graphs speak a thousand words
Scenario 1 - Takeaways
● Monitor logs
    ○ Alert, escalate
    ○ Correlate
● Disk
    ○ Monitor
    ○ Moved to RAID (10)
● Instrument/monitor app
● Know your application and application (write) characteristics
Scenario 2
● Alerts warn that server is running hot
● Random (small) slowdowns
● Increased traffic/queries
Scenario 2 - Symptoms
● High CPU use
● Similar query pattern
Scenario 2 - Diagnostics
● Turn on DB profiling
● Look at logs
Identify the query patterns that take longest or run most often, and run explain on them
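One way to find the patterns that take longest or run most often is to group profiler entries by query shape. The sketch below assumes each entry carries `ns`, `millis`, and a `query` object, as in older `system.profile` documents; the exact layout varies by server version, and the helper names and sample entries are hypothetical.

```javascript
// Reduce a query document to its "shape": the sorted field names,
// ignoring the values being matched.
function queryShape(query) {
  return Object.keys(query || {}).sort().join(",");
}

// Group profiler entries by namespace + query shape, ranked by total time.
function summarizeProfile(entries) {
  const groups = {};
  entries.forEach(function (e) {
    const key = e.ns + " {" + queryShape(e.query) + "}";
    if (!groups[key]) {
      groups[key] = { pattern: key, count: 0, totalMillis: 0, maxMillis: 0 };
    }
    const g = groups[key];
    g.count += 1;
    g.totalMillis += e.millis;
    g.maxMillis = Math.max(g.maxMillis, e.millis);
  });
  return Object.keys(groups)
    .map(function (k) { return groups[k]; })
    .sort(function (a, b) { return b.totalMillis - a.totalMillis; });
}

// Hypothetical entries. In the mongo shell they could come from
// db.system.profile.find().toArray() after db.setProfilingLevel(1, 100).
const sample = [
  { ns: "app.users", millis: 120, query: { email: "a@example.com" } },
  { ns: "app.users", millis: 90, query: { email: "b@example.com" } },
  { ns: "app.orders", millis: 150, query: { status: "open", userId: 7 } }
];
const ranked = summarizeProfile(sample);
console.log(ranked[0].pattern, ranked[0].count, ranked[0].totalMillis);
```

Ranking by `totalMillis` surfaces both slow queries and cheap queries that run very often; the top patterns are the ones worth passing to explain.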
Scenario 2 - Explain
db.scenario2.find({...}).sort({...}).explain()
{
    "cursor" : "BtreeCursor ABC",
    "nscanned" : 160677,
    "nscannedObjects" : 12015,
    "n" : 55,
    "millis" : 99,
    "scanAndOrder" : true,
    "indexBounds" : {...}
}
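The explain output above shows two warning signs: `nscanned` is far larger than `n`, and `scanAndOrder` is true, which means the sort is done in memory rather than by index order. A minimal sketch of that reading, where the helper name and the ratio threshold of 10 are assumptions:

```javascript
// Flag common problems in an old-style explain() document.
// `maxRatio` is the tolerated number of index entries scanned per
// returned document; the value passed below is an assumption.
function diagnoseExplain(plan, maxRatio) {
  const findings = [];
  const returned = Math.max(plan.n, 1); // avoid dividing by zero
  const ratio = plan.nscanned / returned;
  if (ratio > maxRatio) {
    findings.push("scans " + Math.round(ratio) + " index entries per returned document");
  }
  if (plan.scanAndOrder) {
    findings.push("sorts in memory (scanAndOrder) instead of using index order");
  }
  if (/^BasicCursor/.test(plan.cursor)) {
    findings.push("no index used (BasicCursor)");
  }
  return findings;
}

// Values copied from the explain output on the slide.
const plan = {
  cursor: "BtreeCursor ABC",
  nscanned: 160677,
  nscannedObjects: 12015,
  n: 55,
  millis: 99,
  scanAndOrder: true
};
diagnoseExplain(plan, 10).forEach(function (f) { console.log(f); });
```

With the slide's numbers the query scans roughly 2,900 index entries for each of the 55 documents it returns, and then sorts the results in memory.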
Scenario 2 - Diagnostics
● Create a compound index
    ○ Used for criteria and sort
    ○ Reduced CPU dramatically
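A compound index that serves both the criteria and the sort usually lists the equality-matched fields first and the sort fields after them; that ordering is a common rule of thumb, not something the slides spell out. The sketch below builds such a key spec. The query fields in the deck are elided, so `status` and `createdAt` are hypothetical placeholders.

```javascript
// Build a compound index key spec: equality fields first (ascending),
// then the sort fields with their sort direction.
function compoundIndexSpec(equalityFields, sortSpec) {
  const spec = {};
  equalityFields.forEach(function (f) { spec[f] = 1; });
  Object.keys(sortSpec).forEach(function (f) {
    if (!(f in spec)) spec[f] = sortSpec[f];
  });
  return spec;
}

// Hypothetical field names; substitute the real criteria and sort keys.
const spec = compoundIndexSpec(["status"], { createdAt: -1 });
console.log(JSON.stringify(spec)); // {"status":1,"createdAt":-1}
```

In the mongo shell the resulting spec would be passed to `db.scenario2.createIndex(spec)` (`ensureIndex` in older shells). Re-running explain afterwards should show `scanAndOrder` gone and `nscanned` close to `n`.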
Scenario 2 - Takeaways
● Performance test/analyze system behavior
● Load test before deployment
● Alert on abnormal states
● High CPU is a sign of poorly indexed queries
● Rolling upgrade for indexes
Scenario 3
● General slowdown on login
● High disk utilization
Scenario 3 - Diagnostics
iostat
Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
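The row above is alarming because `await` is over 20 seconds and `%util` is pinned at 100 while the device serves only 0.5 reads per second. A minimal sketch of parsing such a row and flagging saturation follows; the helper names and the thresholds (100 ms await, 90% utilization) are assumptions, not values from the deck.

```javascript
// Parse one iostat extended-statistics row into an object keyed by the
// column names from the header line.
function parseIostatRow(headerLine, rowLine) {
  const headers = headerLine.trim().split(/\s+/);
  const values = rowLine.trim().split(/\s+/);
  const row = { device: values[0] };
  for (let i = 1; i < headers.length; i++) {
    row[headers[i]] = parseFloat(values[i]);
  }
  return row;
}

// A device is treated as saturated when requests wait too long or the
// device is busy nearly all the time. Thresholds are assumptions.
function isSaturated(row, maxAwaitMs, maxUtilPct) {
  return row["await"] > maxAwaitMs || row["%util"] >= maxUtilPct;
}

// Header and row copied from the slide.
const header = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util";
const line = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00";
const row = parseIostatRow(header, line);
console.log(row.device, row["await"], row["%util"], isSaturated(row, 100, 90));
```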
Scenario 3
$ blockdev --report
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  8096   512  4048          0   1099494850560   /dev/sdp
Huge read-ahead of 4MB (RA is reported in 512-byte sectors)
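The 4MB figure follows from the RA column, which `blockdev --report` gives in 512-byte sectors. A short worked calculation, with helper names chosen for illustration:

```javascript
// blockdev reports read-ahead (RA) in 512-byte sectors.
const SECTOR_BYTES = 512;

function readAheadBytes(raSectors) {
  return raSectors * SECTOR_BYTES;
}

function readAheadMiB(raSectors) {
  return readAheadBytes(raSectors) / (1024 * 1024);
}

// RA value from the slide.
console.log(readAheadBytes(8096)); // 4145152
console.log(readAheadMiB(8096).toFixed(2));
```

With small, randomly accessed documents, each read pulls in roughly 4MB from disk, which fits the saturated iostat numbers. Read-ahead can be lowered with `blockdev --setra <sectors> <device>`; the right value depends on the workload and should be load tested.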
Scenario 3 - Takeaways
● Pay attention to disk configurations
● Load testing would have found this early
● MongoDB depends on the OS a lot
● Connect the dots from disproportionate effects
Best Practices Learned
● System provisioning
    ○ Capacity
    ○ Performance
    ○ Scale
    ○ Configuration
● Logs
    ○ Review
    ○ Alert
    ○ Rotate and collect (per cluster)
Best Practices Learned
● Query/Index Analysis
    ○ Database Profiler
    ○ Run explain periodically (sampled)
    ○ Instrument code, generate metrics
● Plan/test rollouts
    ○ Rolling upgrade for Replica Set
    ○ Generate indexes on secondaries first
    ○ Name services, use redirection
Thanks, more refs
Please take a look at http://mongodb.org (docs)
● Ask on the mongodb-user group
● Use MMS or historic monitoring
    ○ Watch for trends
    ○ Create alerts
    ○ Forecast capacity for provisioning
● logrotate Unix command
● Monitor disk - munin or the like
● iostat, dstat, vmstat, free, netstat
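Forecasting capacity from historic monitoring data can start with a straight-line fit over recent disk usage samples. This is a rough sketch under the assumption that growth is roughly linear; the function names and the sample figures are hypothetical, and a monitoring product may offer its own trend features.

```javascript
// Least-squares line through the points (xs[i], ys[i]).
function linearFit(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
  const meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) * (xs[i] - meanX);
  }
  const slope = num / den;
  return { slope: slope, intercept: meanY - slope * meanX };
}

// Days remaining after the last sample until usage reaches capacity.
// Returns Infinity when usage is flat or shrinking.
function daysUntilFull(days, usedGb, capacityGb) {
  const fit = linearFit(days, usedGb);
  if (fit.slope <= 0) return Infinity;
  const fullDay = (capacityGb - fit.intercept) / fit.slope;
  return fullDay - days[days.length - 1];
}

// Hypothetical samples: 10 GB/day growth on a 200 GB volume.
console.log(daysUntilFull([0, 1, 2, 3], [100, 110, 120, 130], 200));
```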
Questions