noSQL choices

What is MySQL?

What is noSQL?

Types of noSQL databases

Why noSQL?

Differences between noSQL and MySQL

Aggregated data vs tuples

ACID vs BASE transactions

• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
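
To make the "A" in ACID concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the account names, and the helper functions are hypothetical, chosen only for illustration. The point is that a transfer either applies both updates or neither.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were, because the two updates are undone together.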

Schema vs Schema-less

The 5 main data stores

• Relational databases
• Key-value stores
• Document databases
• Graph stores
• Column stores

Relational Databases (AKA RDBMS)

Why is it good?

• Super flexible
• Proven to work, dominant in the market for decades
• Robust, stable
• Very consistent
• Follows ACID transactions, making it the industry standard

Why is it bad?

• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• ONLY EFFICIENT WITH STRUCTURED DATA
• Vertical scaling, need to buy a bigger computer to process bigger data

MySQL

noSQL databases

Key-value stores

Why is it good?

• Hyper fast data storing and retrieval
• Good for storing sessions from users
  – User profiles on forums
  – Shopping carts on websites

Why is it bad?

• Can’t query for values within the values
• Need to know the key to properly query
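
The trade-off above can be sketched with a toy in-memory store. This is only a dictionary wrapper written for illustration, not the API of Redis or any other real product; the class name, the `session:` keys, and the `scan` helper are all assumptions.

```python
class KeyValueStore:
    """A toy key-value store: the store treats every value as opaque."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        # Fast path: a single lookup when the key is known.
        return self._data.get(key, default)

    def scan(self, predicate):
        # Slow path: without the key, every stored value must be inspected.
        return [key for key, value in self._data.items() if predicate(value)]


store = KeyValueStore()
store.put("session:42", {"user": "ann", "cart": ["book", "pen"]})
store.put("session:43", {"user": "bob", "cart": []})
```

Fetching `session:42` is one lookup, while finding "every session with a pen in the cart" means walking the whole store.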

Examples of key-value stores

• CouchDB
• Aerospike
• HyperDex
• Flare
• Dynamo
• Redis

Most popular key-value store: Redis

• Able to write 114293.71 requests per second
• Able to read 81234.77 requests per second
• https://redis-docs.readthedocs.org/en/latest/Benchmarks.html

Companies that use Redis

• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist

Document Databases

Why is it good?

• Very easy to write up
• Turn objects directly into JSON files and easily turn JSON files into objects
• Easy to store data, documents contain whatever key and value you want
• No schema
• Documents are independent units, easy to distribute
• No need for data to be related at all
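
The object-to-JSON round trip and the lack of a schema can be sketched with the standard library alone. The `BlogPost` class, its fields, and the `collection` list are hypothetical stand-ins for a real document database.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class BlogPost:
    title: str
    author: str
    tags: list = field(default_factory=list)


def to_document(post):
    """Serialize an object straight into a JSON document."""
    return json.dumps(asdict(post), sort_keys=True)


def from_document(text):
    """Rebuild the object from its JSON document."""
    return BlogPost(**json.loads(text))


# No schema: documents in the same collection need not share any fields.
collection = [
    to_document(BlogPost("Hello", "ann", ["intro"])),
    json.dumps({"title": "Untyped note", "pinned": True}),
]
```

Each document is a self-contained unit, which is what makes it easy to place documents on different machines.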

Why is it good? (cont)

• Very, very programmer friendly
• Good for:
  – Event logging
  – Content management systems
  – E-commerce applications
  – Real-time analytics

Why is it bad?

• Tends to struggle when the database is too big
• Not good at handling data that are closely related to each other
• Not designed to handle cross-document operations
• Can’t slice data

Examples of document stores

• MongoDB
• Lotus Notes
• Apache CouchDB

Most popular document store: MongoDB

Companies that use MongoDB

• Expedia
• The Weather Channel
• Forbes
• Otto

Graph Stores

“If you can whiteboard it, you can graph it”

Why is it good?

• Well suited for analyzing interconnections
• Very good for data that involve complex relationships
• High interest in mining social media data
• Used for creating “recommended products” on sales websites
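
A "recommended products" feature boils down to a short walk over a graph: customer, to product, to other customers, to their products. The sketch below does that walk over a small in-memory dictionary. The purchase data and the `recommend` function are invented for illustration and do not use Neo4j or any graph database API.

```python
from collections import Counter

# Hypothetical purchase graph: customer -> set of products bought.
PURCHASES = {
    "ann": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "keyboard"},
    "cat": {"laptop", "monitor"},
    "dan": {"phone"},
}


def recommend(customer, purchases=PURCHASES):
    """Rank products bought by customers who share a purchase with `customer`."""
    mine = purchases[customer]
    counts = Counter()
    for other, theirs in purchases.items():
        # Follow the edge only if the two customers share at least one product.
        if other != customer and mine & theirs:
            counts.update(theirs - mine)
    # Most frequently co-purchased first; ties broken alphabetically.
    return [product for product, _ in
            sorted(counts.items(), key=lambda item: (-item[1], item[0]))]
```

A graph database answers this kind of multi-hop question directly, without the joins a relational schema would need.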

Why is it bad?

• Not good at updating all, or a subset of entities

• Changing a property on all nodes is not straightforward

• Some databases may not be able to handle large amounts of data

Most popular graph database: Neo4j

Companies that use Neo4j

• eBay
• TomTom
• HP
• Walmart
• eHarmony

Column Stores

Row vs Column store

Why is it good?

• Designed for gigantic amounts of data
• Far better than a row store, doesn’t waste time searching
• Example: with 10,000 rows, if you are looking for a value in a single column, there is no need to read every single row
• Good for blogs, forums
• Event logging
• When you want to count and categorize certain values
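
The row-versus-column idea can be shown with plain Python lists. This is a simplified in-memory illustration with invented sample data; real column stores organize data on disk in their own ways.

```python
from collections import Counter

# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "user": "ann", "country": "CA", "views": 10},
    {"id": 2, "user": "bob", "country": "US", "views": 7},
    {"id": 3, "user": "cat", "country": "CA", "views": 3},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every record is visited, including fields we do not need.
by_country_rows = Counter(row["country"] for row in rows)

# Column layout: only the "country" column is read.
by_country_cols = Counter(columns["country"])
```

Both counts agree, but the column layout gets there by reading one list instead of every record.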

Why is it bad?

• Not good at working with systems that require ACID transactions for writes and reads
• If the data set is small, it is better to use a relational database
  – If you just need to look at rows, or a bunch of columns, a relational database is much better

Most popular Column-family store: Cassandra

Companies that use Cassandra

• Walmart
• VMware
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty

Scaling in Cassandra

• Horizontal scaling
• A matter of adding more nodes
• Adding more nodes = the cluster supports more writes and reads
• You can still add more nodes while the cluster is running
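
One general technique that makes "just add a node" workable is consistent hashing: keys and nodes are placed on a ring, and a new node takes over only a slice of the keys. The sketch below is a simplified illustration of that idea, assuming MD5 tokens and one token per node. It is not Cassandra's actual partitioner, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def _token(value):
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """A minimal consistent-hash ring with one token per node."""

    def __init__(self, nodes=()):
        self._tokens = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        bisect.insort(self._tokens, (_token(node), node))

    def node_for(self, key):
        # The owner is the first node clockwise from the key's position.
        index = bisect.bisect(self._tokens, (_token(key), ""))
        return self._tokens[index % len(self._tokens)][1]
```

When a node joins, every key that changes owner moves to the new node, and all other keys stay where they were, so the rest of the cluster keeps serving reads and writes.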

Benchmark reports

Throughput

• The higher, the better
• Reflects the power of the database engine

Latency guidelines

• Excellent: < 1ms
• Very good: < 5ms
• Good: 5 – 10ms
• Poor: 10 – 20ms
• Bad: 20 – 100ms
• Really bad: 100 – 500ms
• OMG!: > 500ms
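
The guidelines above translate directly into a small lookup function. The slide leaves the exact boundaries ambiguous (10ms appears in both "Good" and "Poor"), so this sketch assumes a shared boundary belongs to the better band, which matches "OMG!" starting strictly above 500ms.

```python
def rate_latency(ms):
    """Return the guideline label for a latency measured in milliseconds."""
    if ms < 1:
        return "Excellent"
    if ms < 5:
        return "Very good"
    # Assumption: from here on, a shared boundary counts as the better band.
    if ms <= 10:
        return "Good"
    if ms <= 20:
        return "Poor"
    if ms <= 100:
        return "Bad"
    if ms <= 500:
        return "Really bad"
    return "OMG!"
```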

The University of Toronto test (2012)

• Cassandra 1.0.0 rc2
• Redis 2.4.2
• HBase 0.90.4
• Voldemort 0.90.1
• MySQL 5.5.17

The tests

• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)

Conclusion

• Cassandra – Highest Scalability, suffered in latency

• Redis – Highest initial throughput in read-intensive workloads. Latency very low

Conclusion (cont.)

• MySQL – Almost the same as Cassandra, latency is better

• HBase – Lowest throughput. Highest latency for reading. Lower latency for writing

EndPoint: Benchmarking Top NoSQL Databases

• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1) and Hadoop (2.6.0)

What was updated?

• Cassandra’s and HBase’s performance went up considerably in the updated results

Workload selection

• Workloads selected to be similar to today’s applications
• Database nodes: 30.5 GB RAM, 4 CPU cores, and a single volume of 800 GB of local SSD storage
• No data was lost
• Used data volumes that exceeded the RAM capacity of each node

Workloads

• Read-mostly: 95% read, 5% update
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write
• Insert-mostly: 90% insert, 10% read
• 9 million operations per workload
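
To see what such a mix means in practice, the sketch below draws a stream of operation names with the ratios listed above. It is only an illustration of a weighted workload mix, not the harness the benchmark used; the `WORKLOADS` table and `generate_operations` function are hypothetical names.

```python
import random
from collections import Counter

# Operation ratios per workload, copied from the list above.
WORKLOADS = {
    "read-mostly": {"read": 0.95, "update": 0.05},
    "read/write": {"read": 0.50, "update": 0.50},
    "read-modify-write": {"read": 0.50, "read-modify-write": 0.50},
    "insert-mostly": {"insert": 0.90, "read": 0.10},
}


def generate_operations(workload, count, seed=0):
    """Return `count` operation names drawn with the workload's ratios."""
    mix = WORKLOADS[workload]
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


sample = Counter(generate_operations("read-mostly", 10_000))
```

A real run would issue each drawn operation against the database and record its latency.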

Problems

• Couchbase
• HBase
• MongoDB

Conclusion

• Cassandra heavily outperformed everyone in latency and throughput
• HBase or Couchbase came second
• MongoDB came last in most test cases

Altoros: The NoSQL Technical Comparison Report

• Published September 2014
• Pretty unbiased
• Couchbase: 2.5.1
• MongoDB: 2.6.1
• Cassandra: 2.0.8

Workload B

• 50% read operations
• 40% update operations
• 5% insert operations
• 5% delete operations
• 50 million 1 KB records

Workload B

• 3 million 10 KB records

Workload C

• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations
• 3 million 10 KB records (with 50 million records, results are similar to workload B)

Scalability

Conclusions

• Cassandra has amazing scalability again
• Cassandra is weaker at reading in terms of latency
• MongoDB has the worst latency results in almost all fields

Overall conclusion

• Can’t state that a single noSQL structure beats all
• How about combining?
• POLYGLOT PERSISTENCE

Example: Shopping Site

[Diagram, built up over several slides: an e-commerce platform made of four components (Orders, Cart, Catalog & Reviews, Suggestions), with the store types Key/value, RDBMS, Document, and Graph added one at a time.]
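
Polyglot persistence for the shopping site can be sketched as a router that sends each component to its own kind of store. The pairing of components to store types below is an assumption made for illustration, since the slides name the components and the store types without spelling out the pairing. The class name is hypothetical, and plain dictionaries stand in for the real databases.

```python
# Assumed pairing of components to store types (illustrative only).
STORE_FOR_COMPONENT = {
    "cart": "key/value",
    "orders": "rdbms",
    "catalog & reviews": "document",
    "suggestions": "graph",
}


class PolyglotRouter:
    """Route each component's data to the store type chosen for it."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        # One dict per store type stands in for a real database connection.
        self._stores = {kind: {} for kind in set(self._mapping.values())}

    def store_kind(self, component):
        return self._mapping[component]

    def save(self, component, key, value):
        self._stores[self._mapping[component]][key] = value

    def load(self, component, key):
        return self._stores[self._mapping[component]].get(key)


router = PolyglotRouter(STORE_FOR_COMPONENT)
router.save("cart", "session:1", ["book", "pen"])
```

Each component then gets the strengths of its store without forcing the whole site onto a single database.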