dibaunaumh
DESCRIPTION
My improvised/copied preso for some short talk I gave.
2. Agenda
3. The solution
4. Benefits
5. Cost
6. Example: Cassandra
7. The problem
8. Data (petabytes generated daily)
9. Processing (all this data needs processing)
10. Exponential growth (surging, unpredictable demands)
11. The problem (contd.)
12. MS SQL
13. Sybase
14. MySQL
15. PostgreSQL
Even with their high-end clustering solutions
16. The problem (contd.)
17. Existing RDBMS clustering solutions require scale-up, which is limited & not really scalable when dealing with exponential growth
18. Machines have upper limits on capacity, & sharding the data & processing across machines is very complex & app-specific
19. The problem (contd.)
20. Basically, you end up denormalizing everything & losing all the benefits of relational databases
21. Who faced this problem?
22. Yahoo!
23. Amazon
24. Facebook
25. Twitter
26. LinkedIn
27. & many more
28. One difference, though
29. Web search result (Google/Yahoo!)
30. Item added to cart (Amazon)
31. The solution
32. Must we sacrifice something?
33. Availability
34. Partition-tolerance
BTW, the theorem (CAP) was later proved by MIT scientists in 2002
35. Simple example
36. This means that when writing a record, all replicas must be updated too
37. Now you need to choose between:
38. Don't lock the replicas => be less consistent
39. The consequence
40. Drop availability (CP)
41. Drop consistency (AP)
"Drop" here is usually not meant as binary, but rather as tunable
42. Non-relational databases
43. HBase (developed at Yahoo!)
44. Dynamo (developed at Amazon)
45. Cassandra (developed at Facebook)
46. Voldemort (developed at LinkedIn)
47. & a few more:
48. Benefits
49. Extremely fast
50. Highly available, decentralized & fault tolerant (no single-point-of-failure)
51. Transparent sharding (consistent hashing)
52. Elasticity
53. Parallel processing
54. Dynamic schema
55. Automatic conflict resolution
56. Consistent hashing
57. Replication
58. Replication node joining
59. Replication node leaving
60. Scale-out / elasticity?
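The consistent hashing and replication slides (51, 56-59) can be illustrated with a small sketch. This is not taken from the talk or from any particular database: the `HashRing` class, the use of MD5, and the `vnodes` parameter are all assumptions made for illustration. It places nodes on a hash ring, routes each key to the next node clockwise, and picks replicas by continuing around the ring.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several points to even out the load.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        """Return the node owning the first ring point clockwise from the key."""
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

    def get_replicas(self, key, n):
        """Return up to n distinct nodes, walking clockwise from the key."""
        if n <= 0 or not self._points:
            return []
        n = min(n, len(set(self._owner.values())))
        start = bisect.bisect_right(self._points, ring_hash(key))
        replicas = []
        for i in range(len(self._points)):
            node = self._owner[self._points[(start + i) % len(self._points)]]
            if node not in replicas:
                replicas.append(node)
                if len(replicas) == n:
                    break
        return replicas
```

In this sketch, a joining node only takes over the keys that now hash to its ring points, and a leaving node hands its keys to the next nodes clockwise; all other keys stay where they were.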
61. Runs on a large number of cheap commodity machines
62. Replication
63. Gossip protocol
64. Transparently handles adding/removing nodes
65. Tunable consistency?
66. Read-your-writes consistency
67. Session consistency
68. Monotonic read consistency
69. Eventual consistency
Tunable means: how many replicas to lock on write
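One common way to reason about this tuning is quorum arithmetic: with N replicas, W replicas acknowledging a write and R replicas consulted on a read, every read is guaranteed to touch at least one up-to-date replica when R + W > N. The sketch below is illustrative only; the function names are made up here and are not part of any database API. It states the rule and brute-force checks it for small clusters.

```python
from itertools import combinations


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """The usual rule of thumb: read and write sets must overlap."""
    return r + w > n


def always_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every write set of size w meet every read set of size r?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            print(f"N={n} W={w} R={r} overlap guaranteed: {always_overlap(n, w, r)}")
```

Lower values of W and R make operations faster and more available but allow stale reads; using `quorum(n)` for both is the usual middle ground.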
70. Quorum
71. Dealing with inconsistency
72. Vector clock conflict resolution
73. Dynamic schema
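The vector clock idea from slide 72 can be sketched as follows. This is a minimal illustration in the style of Dynamo-like stores, not code from the talk; the helper names (`increment`, `compare`, `merge`) are assumptions. Each version carries a per-node counter map, and two versions conflict when neither has seen everything the other has.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a: dict, b: dict) -> bool:
    """True if version a has seen every update that version b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'equal', 'after', 'before' or 'concurrent'."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # a real conflict: neither version supersedes the other


def merge(a: dict, b: dict) -> dict:
    """Clock for a reconciled value: the component-wise maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}
```

When `compare` reports `"concurrent"`, the store (or the client) has to reconcile the two values, and the reconciled value is written back with the merged clock.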
74. Dynamic schema (contd.)
75. Record can have several supercolumns
76. Data processing
77. Brings the workers to the data: an excellent fit for non-relational databases
78. Minimizes the programming to 2 simple functions (map & reduce)
79. Example: count appearances of a word in a giant table of large texts
80. Map/Reduce (contd.)
81. Storage
82. Cost
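The word-count example from slide 79 can be sketched with the two functions from slide 78. This is a single-process toy, assuming records are `(doc_id, text)` pairs; the `run_mapreduce` driver is a hypothetical stand-in for a real framework, which would run the map calls on the machines that hold the data and shuffle the results between them.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Emit (word, 1) for every word in one text."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Toy driver: map every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, values) for k, values in groups.items())


if __name__ == "__main__":
    table = [("doc1", "the cat sat"), ("doc2", "The cat ran")]
    print(run_mapreduce(table, map_fn, reduce_fn))
```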
83. Non-standard new API model
84. Non-standard new schema model
85. New knowledge required to tune/optimize
86. Less mature
87. API model
88. Put(key, value)
89. Delete(key)
90. Execute(operation, key_list)
value can be
91. a record (list of columns: )
92. Schema model
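The API model on slides 88-90 can be pictured as a thin key-value interface. The sketch below is an in-memory stand-in, not any real database client: the class name, the `get` method, and the exact semantics of `execute` (apply a function to each listed key that exists) are assumptions made for illustration.

```python
class KeyValueStore:
    """Hypothetical in-memory store mirroring the Put/Delete/Execute model."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value can be a plain blob or a record (here: a dict of columns).
        self._data[key] = value

    def get(self, key, default=None):
        # Assumed counterpart to put; not listed in the slide text.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

    def execute(self, operation, key_list):
        """Apply operation(key, value) to every listed key that exists."""
        return [operation(k, self._data[k]) for k in key_list if k in self._data]


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("user:1", {"name": "Ada", "city": "London"})
    store.put("user:2", {"name": "Linus", "city": "Helsinki"})
    store.delete("user:2")
    print(store.execute(lambda k, v: v["name"], ["user:1", "user:2"]))
```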
93. No schema
94. Example: Cassandra
95. Eventual consistency
Values are structured, indexed
96. Columns / column families
97. Slicing with predicates (queries)
98. PartitionOrderer
99. Cassandra performance
100. 350 ms read
Cassandra:
101. 15 ms read
How come writes are so fast?
102. Use any node (closest to you)
103. Cassandra API
104. Cassandra API (contd.)
105. Example: Cassandra (contd.)
106. Simple client
107. Cassandra usage
108. Digg
109. Twitter
110. Further information
NoSQL patterns:
NoSQL conference videos:
Hebrew podcast covering NoSQL & Cassandra (episodes 56, 57 & more):
111. Further information (contd.)
112. http://prettyprint.me/2010/01/20/introduction-to-nosql-and-cassandra-part-2/