Storage Engine Wars at Parse

1. Charity Majors (@mipsytipsy), Parse Production Engineer; Igor Canadi (@igorcanadi), Facebook Software Engineer
2. Storage Engine Wars at Parse
3. Parse does crazy shit with MongoDB
4. Parse: 500k+ workloads; millions of collections, 10s of millions of indexes; ~35 replica sets; 240 GB primary data; nearly 1 PB data on AWS; ~2 DBA-type engineers
5. Storage Engine Goals: handle 10M collections+indexes; compression; document-level locking; no stalls or outliers; faster writes, ballpark read latencies
6. Storage Engine Wars
7. RocksDB
8. RocksDB: open source storage engine; write-optimized (LSM trees); highly compressible; heavy investment by Facebook; vibrant open-source community
9. Battle tested: LinkedIn (Feed, Apache Samza); Yahoo (Sherpa's local storage); Facebook (tens of PB, tens of billions of QPS, hundreds of different workloads)
10. RocksDB::Internals
11. LSM architecture [diagram labels: memtable; write request from application; read request from application; transaction log; read-only data; in RAM; on disk; periodic compaction]
12. Write amplification - B tree [diagram: rows in a page; read, modify, write]
13. Write amplification - Leveled LSM [diagram: Level 0, Level 1, Level 2, Level 3; compaction]
14. Fragmentation - B tree [diagram: rows in a page; split]
15. Fragmentation - LSM [diagram: Level 0, Level 1, Level 2, Level 3]
16. Comparing with InnoDB [charts: database size (relative) and bytes written (relative), InnoDB vs. RocksDB]
17. LSM read penalty. B-tree: internal nodes, leaf nodes; a range scan with a covering index is a sequential read. LSM: memtable, L0, L1, L2, L3; we need more reads for range scans
18. Integration with MongoDB [table: columns type, key, value; rows Collection, Unique index, Non-unique index]
19. Storage Engine Goals: handle 10M collections+indexes; compression; document-level locking; no stalls or outliers; faster writes, ballpark read latencies
20. Storage efficiency: higher throughput; no stalls; possibly trade off read latencies
21. Latencies [TODO: latency graph]
22. Key findings: 90% compression; 50-200x faster writes; much less IO exercised when records are smaller; queries marginally slower when: scanning a lot of documents, large documents, querying cached data; capped collections suck; RocksDB IS AWESOME
23. Operations::Rollout
24. Gaining confidence: snapshot and replay (Flashback); hidden secondaries; secondaries; primaries; mixed replsets for a loooong time :)
25. MongoDB 3.0 issues: $nearSphere 10x performance regression (https://jira.mongodb.org/browse/SERVER-18056); long-running reads on secondaries blocking replication (https://jira.mongodb.org/browse/SERVER-18190)
26. Tombstone trap [diagram: R T R T R T R T R T R T R R]. Solution: automagically compact
27. Operations::Production
28. Backing up RocksDB: table files are immutable, so backups are easy. Just hardlink! We're building a tool that will send incremental backups to S3
29. Monitoring: db.serverStatus()["rocksdb"]
30. Monitoring: db.serverStatus()["rocksdb"]
31. Monitoring: tombstones; disk I/O saturation; CPU usage; latency; db.serverStatus()["rocksdb"]
32. Current status: deployed as primary on 25% of replica sets; secondaries on 50% of replica sets; made ops tools storage engine agnostic; made monitoring storage engine aware
33. Next steps: more performance improvements; improve operational tooling, monitoring; continue to test alongside WT, TokuMX; ** wider community adoption **
34. Future of RocksDB
35. Future of Mongo+Rocks: World domination. Let us know what you think. :)
36. Resources: http://rocksdb.org; http://blog.parse.com/announcements/mongodb-rocksdb-parse/; http://blog.parse.com/learn/engineering/mongodb-rocksdb-benchmark-setup-compression/; http://blog.parse.com/learn/engineering/mongodb-rocksdb-writing-so-fast-it-makes-your-head-spin/; http://www.acmebenchmarking.com/
37. Charity Majors (@mipsytipsy), Parse Production Engineer; Igor Canadi (@igorcanadi), Facebook Software Engineer
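
Sketch for the LSM slides (architecture, read penalty, tombstone trap). The deck describes writes landing in a memtable, immutable data on disk, periodic compaction, extra reads for scans, and tombstones piling up until a compaction runs. The toy below is a minimal illustration of that idea only. It is not RocksDB's implementation or API, and `ToyLSM`, `memtable_limit`, `runs`, and the probe/touch counters are hypothetical names invented for this example.

```python
# Toy LSM sketch: illustrative only, not how RocksDB is actually implemented.
TOMBSTONE = object()  # marker stored in place of a deleted value


class ToyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}  # mutable in-memory buffer for recent writes
        self.runs = []      # immutable sorted runs ("table files"), newest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it records a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into an immutable sorted run.
        if self.memtable:
            self.runs.insert(0, tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        """Return (value or None, number of structures probed)."""
        probes = 1
        if key in self.memtable:
            value = self.memtable[key]
            return (None if value is TOMBSTONE else value), probes
        for run in self.runs:  # newest run wins
            probes += 1
            for k, value in run:  # linear search keeps the toy short
                if k == key:
                    return (None if value is TOMBSTONE else value), probes
        return None, probes

    def scan(self):
        """Return (live items, entries touched), tombstones included in the count."""
        merged, touched = {}, 0
        for run in reversed(self.runs):  # oldest first, newer entries overwrite
            for k, value in run:
                touched += 1
                merged[k] = value
        for k, value in self.memtable.items():
            touched += 1
            merged[k] = value
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        return live, touched

    def compact(self):
        # Merge everything into one run and drop tombstones and shadowed values.
        self.flush()
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        live = tuple(sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE))
        self.runs = [live] if live else []


db = ToyLSM(memtable_limit=2)
for i in range(6):
    db.put(f"k{i}", i)
for i in range(5):
    db.delete(f"k{i}")

live_before, touched_before = db.scan()  # 1 live key, 11 entries touched
db.compact()
live_after, touched_after = db.scan()    # same live key, 1 entry touched
print(live_before, touched_before, live_after, touched_after)
```

In this toy, a scan that returns one live key has to walk eleven entries before compaction and one afterwards, which is the shape of the tombstone problem the deck solves by compacting automatically. Because flushed runs are never modified in place, it also hints at why the backup slide can rely on hardlinking table files.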