Benchmarking MongoDB and CouchBase

1. BenchmarkingNo-SQL DatabasesAlex Voss Chris Choi

2. TOP 2 QuestionsShould a social scientist buyMORE or UPGRADEcomputers?Which DATABASE(s)? 3. Document OrientedDatabases 4. The central concept of adocument-oriented databaseis the notion of a Document 5. BothandUses JSON to store theseDocuments 6. Example Documents: {{ FirstName:"Jonathan",FirstName:"Bob", Address:"15 Wanamassa Point Road",Address:"5 Oak St.", Children:[Hobby:"sailing" {Name:"Michael",Age:10},} {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2} ] }there are no empty fields, and it doesnot require explicitly stating if otherpieces of information are left out 7. The Database Tests 8. Database TestAggregate/Count: Most User Mentioned Most Hashed Tags Most Shared URLs 9. Database TestAggregate/Count:Typical questions Most User Mentioned Most Hashed Tags Most Shared URLs 10. Has two query approaches: AggregationMap Reduce & Framework 11. Map Reduce For Aggregation Uses JavaScript 12. Aggregation Framework Similar to Map Reduce But Uses C++ (Faster!) 13. Single Node SSD vs HDD Map Reduce Aggregation Framework60 60 Most User Mentioned Most Hashed-Tags Most User Mentioned50 50 Most Hashed-Tags Most Shared URLsMost Shared URLs40 4030 3020 2010 1000 HDDSSDHDD SSD 14. Single NodeAggregation Frameworkis roughly 2 times faster than Map Reduce 15. When we have more than 1node ~ ScalingGenerally . . . 16. ScalingBIGGERVerticallyandFATTERmachine!ScalingHorizontally MORE machines! 17. Replica-SetMaster-SlaveReplicationReplicates datafrom master toslaveOnly one node is incharge (the Master), anddata only travels in onedirection (to the Slave) 18. Replica-SetSingle Node vs Replica Set: Aggregation Framework 25Single NodeReplica Set 20 15Time (minutes) 10 5 0User Mentioned Hashed-Tags Shared URLs 19. Replica-SetSingle Node vs Replica Set:Map Reduce 80 Single Node 70Replica Set 60 50Time (minutes) 40 30 20 10 0User Mentioned Hashed-Tags Shared URLs 20. Replicating Data did not improvequery performance(MapReduce and Aggregation Framework) . . . so we tried Sharding 21. Sharding One BIG stackTwo or more smaller stacks 22. democratic uses a hierarchical scaling approachMaster-SlaveReplication 23. But Deployment is anissue with MongoDB . . .If you want to shard . . . 24. documentation suggests minimum 11 nodes, to build a fail-safe sharded cluster2 Shards 6 nodes (3 each)2 mongos (load balancer) 2 nodes3 mongod (config servers) 3 nodes------------------------------------------- Total: 11 nodes 25. The investment to go fromsingle node --> sharded clusteris very HIGH! 26. By omitting Fail-Safe featuresFor our tests, we kind of cheated and used:Note: X is variable2 X Shards 6 2 nodes2 1 mongos (load balancer) 2 0.5 node3 1 mongod (config servers) 3 0.5 node------------------------------------------- Total: 11 3 nodesNOT RECOMMENDEDIN PRODUCTION 27. Which looks something like this: 4 Shard configuration 28. Or this: 8 Shard configuration 29. Another problem is withquerying in a shardedenvironment . . . 30. We couldnt perform queries usingthe Aggregation Framework in theshards, because it was limited by:- Output at every stage of theaggregation can only contain 16MB- >10% RAM usage MongoDB would fail if the limits were reached 31. 4 Cores 8GB RAMSharded ClusterMap Reduce: On 3 modest machines Most hash tags (mins)30.00Most user mentions (mins) Most shared urls (mins)25.0020.00 Time (minutes)15.0010.00 5.00 0.002 48MostNumber of Shards Optimal 32. 4 Cores8GB RAM Sharded Cluster Map Reduce: On 3 modest machinesSingle Node Three Nodes60.00 60Most hash tags (mins) Most Hashed-Tags Most user mentions (mins)50.00 50Most User MentionedMost shared urls (mins) Most Shared URLs40.00 40 Time (minutes)Time (minutes) 30 30.00 20 20.00 10 10.00 0 0.00 HDDSSD 24 8 Number of Shards 33. 4 Cores8GB RAMSharded ClusterMap Reduce: On 3 modest machines Single NodeThree Nodes60.0060 60 Most Hashed-Tags Most hash tags (mins)Most Hashed-Tags Most User MentionedMost user mentions (mins)50.0050 50 Most User Mentioned Most Shared URLs Most shared urls (mins)Most Shared URLs40 40 40.00 Time (minutes)Time (minutes) Time (minutes)30 30 30.0020 20 20.0010 10 10.00 0 0 0.00 HDDHDDSSDSSD 24 8 Number of Shards 34. Performance of Single Node (SSD)is very comparable to 3 nodes (HDD)of different number of shards!(Single Node with SSD is2-3 minutes slower than 3nodes . . . [40 GB corpus])Perhaps its worth purchasing a goodSSD over investing more nodes toimprove aggregate performance . . . 35. What happens when we scale verticallywith ONE BIG BOX? we get ---> 36. 32 Cores128GB RAMSharded Cluster Map Reduce: On a Big Box 12.00Most hash tags (mins)Most user mentions (mins) Most 10.00Most shared urls (mins) Optimal8.00Time (minutes)6.004.002.000.002 4 8 16 242830Number of Shards 37. On a SINGLE NODE , whileCouchBase is FASTER thanMongoDBs MapReduceMongoDBs Aggregation Framework is FASTER thanCouchBase 38. That said adding moreCouchbase nodes wasdisappointing. . . 39. 12 VS 3 VSNodeNodesNodes One Node90 Two Nodes Three Nodes80706050403020100 Most user mentioned Most hashed tags Most shared urls 40. Adding more nodes did notimprove query performance . . . 41. Other users have found similar issues,Couchbase Support says: One known issue with the current developerpreview of Couchbase is that it slows downon smaller clusters. The views are optimalon very large clusters. [1]But they did not specify what is very large [1] Couchbase performance - real tests - its horrible - or am I doing somethin wrong? 42. That wasNovember 2011 . . . 43. Which one? 44. The obvious answer isFrom a performance standpoint. 45. Simple flat file aggregation with Javatook approx 60 minsOn the same machine, MongoDB tookapprox30 mins (MapReduce)10 mins (Aggregation Framework) 46. Deployment Options 140.002500 1 NodeTotal ExecutionHard driveHDD Time2200Map Reduce MR Costs 120.00 19502000 100.00Approximate Total Cost (GBP) Total Execution Time (mins)1 Node 150080.00SSDMR60.00100010009001000 3 Nodes40.00650 HDDMR MR50020.00 AFBig Box1 Node #2HDD 0.00Fast SSD0 MR Aggregate Framework 47. But!Has problems with: - Full Text Search - Social Network Graphs - Aggregation 48. s Full Text Searchis slow1 simple query on a singlenode, 44GB corpus tookapprox10 mins! 49. Solution: A Fast Full Text Search Engine The same search took only a fraction of a second 50. To create aSocialNetworkGraph withCan be Complex,since you query multipletimes to obtainrelationships 51. Solution: A Popular Graph DatabaseWhere you can obtain a network graph with VERY FEW queries 52. Example: START john=node:node_auto_index(name = John) MATCH john-[:friend]->()-[:friend]->fof RETURN john, fof The above query fetches all nodes who are friends of friends of Johns 53. With Aggregation Is perhaps not the best, we have yet to play with 54. Or evenAn SQL language tobuild MapReducequeriesWith 55. Moral of the storyUse the right toolsfor the right job 56. Future workWe aim to experiment with:With possibility in integrating bothwith