NoSQL choices


What is MySQL?

What is NoSQL?

Types of NoSQL databases

Why NoSQL?

Differences between NoSQL and MySQL

Aggregated data vs tuples

ACID vs BASE transactions

• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
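Atomicity is the easiest of the four properties to show in a few lines. The sketch below is only an illustration, assuming an in-memory SQLite database and a hypothetical `accounts` table and `transfer` helper (none of these come from the slides): either both updates of a transfer are committed, or neither is.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates apply or neither does."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` fails the balance check, so the rollback leaves both balances exactly as they were.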

Schema vs Schema-less

The 5 main data stores

• Relational Databases
• Key-value Stores
• Document Databases
• Graph Stores
• Column Stores

Relational Databases (AKA RDBMS)

Why is it good?

• Super flexible
• Proven to work, dominant in the market for 3 years
• Robust, stable
• Very consistent
• Follows ACID transactions, making it the industry standard

Why is it bad?

• Strongly typed columns
• Inefficient with high volumes of data
• Not designed for clusters
• ONLY EFFICIENT WITH STRUCTURED DATA
• Vertical scaling: you need to buy a bigger computer to process bigger data

MySQL

NoSQL databases

Key-value stores

Why is it good?

• Hyper-fast data storage and retrieval
• Good for storing user sessions
  – User profiles on forums
  – Shopping carts on websites

Why is it bad?

• Can’t query for values within the values
• Need to know the key to properly query
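Both limitations show up even in a toy store. The sketch below is a minimal in-memory stand-in (the `KeyValueStore` class and the `cart:` key names are hypothetical, not any real product's API): values are opaque strings to the store, so a lookup by key is direct, while a question about what is inside a value means decoding every entry.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque strings to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Serialize so the store itself never looks inside the value.
        self._data[key] = json.dumps(value)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def keys_where(self, predicate):
        """Workaround for value queries: decode and test every entry (O(n))."""
        return sorted(
            key for key, raw in self._data.items() if predicate(json.loads(raw))
        )


# Hypothetical shopping carts keyed by cart id.
store = KeyValueStore()
store.put("cart:42", {"user": "ann", "items": ["book", "pen"]})
store.put("cart:43", {"user": "raj", "items": ["lamp"]})
```

`store.get("cart:42")` is a single lookup, whereas finding every cart that contains a lamp needs `store.keys_where(lambda cart: "lamp" in cart["items"])`, which scans the whole store.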

Examples of key-value stores

• CouchDB
• Aerospike
• HyperDex
• Flare
• Dynamo
• Redis

Most popular key-value store: Redis

• Able to write 114293.71 requests per second
• Able to read 81234.77 requests per second
• https://redis-docs.readthedocs.org/en/latest/Benchmarks.html

Companies that use Redis

• Twitter
• GitHub
• Pinterest
• Snapchat
• Flickr
• Hulu
• Vine
• Imgur
• Craigslist

Document Databases

Why is it good?

• Very easy to write
• Objects turn directly into JSON files, and JSON files turn easily back into objects
• Easy to store data: documents contain whatever keys and values you want
• No schema
• Documents are independent units, so they are easy to distribute
• No need for data to be related at all
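The object-to-JSON round trip and the lack of a schema can be sketched with the standard library alone. The `Product` class, the helper functions and the sample documents below are hypothetical and only stand in for what a document database would store.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Product:
    name: str
    price: float
    tags: list = field(default_factory=list)


def to_document(product):
    """Serialize an object into a JSON document."""
    return json.dumps(asdict(product))


def from_document(document):
    """Rebuild the object from its JSON document."""
    return Product(**json.loads(document))


# No schema: documents in the same collection may carry different fields.
collection = [
    to_document(Product("kettle", 24.5, ["kitchen"])),
    json.dumps({"name": "gift card", "amount": 50}),
]
```

The second document has no `price` or `tags` field, and nothing in the collection objects to that; the application decides how to interpret each document.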

Why is it good? (cont)

• Very, very programmer friendly
• Good for:
  – Event logging
  – Content management systems
  – E-commerce applications
  – Real-time analytics

Why is it bad?

• Tends to struggle when the database is too big
• Not good at handling data that is highly interrelated
• Not designed to handle cross-document operations
• Can’t slice data

Examples of document stores

• MongoDB
• Lotus Notes
• Apache CouchDB

Most popular document store: MongoDB

Companies that use MongoDB

• Expedia
• The Weather Channel
• Forbes
• Otto

Graph Stores

“If you can whiteboard it, you can graph it”

Why is it good?

• Well suited for analyzing interconnections
• Very good for data that involves complex relationships
• High interest in mining social media data
• Used for creating “recommended products” on sales websites
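A "recommended products" feature is essentially a short walk over a graph: from a customer to the products they bought, to other customers who bought the same products, to what those customers bought as well. The sketch below uses plain dictionaries and made-up purchase data; a graph database would express the same traversal in its own query language.

```python
from collections import Counter

# Hypothetical bipartite graph: customer -> products purchased.
purchases = {
    "ann": {"tent", "stove", "lantern"},
    "raj": {"tent", "stove"},
    "mia": {"tent", "boots"},
    "lee": {"novel"},
}


def recommend(customer, graph, limit=3):
    """Suggest products bought by customers who share a purchase with `customer`."""
    owned = graph[customer]
    scores = Counter()
    for other, items in graph.items():
        if other == customer or not (owned & items):
            continue  # skip the customer and anyone with no purchase in common
        for item in items - owned:
            scores[item] += 1
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]
```

With this data, `recommend("mia", purchases)` ranks `stove` ahead of `lantern`, because two neighbouring customers bought a stove and only one bought a lantern.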

Why is it bad?

• Not good at updating all, or a subset of, entities
• Changing a property on all nodes is not straightforward
• Some databases may not be able to handle large amounts of data

Most popular graph database: Neo4j

Companies that use Neo4j

• eBay
• TomTom
• HP
• Walmart
• eHarmony

Column Stores

Row vs Column store

Why is it good?

• Designed for gigantic amounts of data
• Far better than a row store for column lookups: doesn’t waste time searching
• Example: with 10,000 rows, if you are looking for a value in a single column, there is no need to read every single row
• Good for blogs, forums
• Event logging
• When you want to count and categorize certain values
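The difference between the two layouts can be sketched with ordinary Python structures. The event data and helper names below are invented for the illustration, and the conversion assumes every row has the same fields: once the data is stored column by column, counting the values of one column touches only that column's list.

```python
from collections import Counter

# Row layout: each record holds all of its fields together.
rows = [
    {"id": 1, "user": "ann", "event": "login"},
    {"id": 2, "user": "raj", "event": "click"},
    {"id": 3, "user": "ann", "event": "click"},
]


def to_columns(rows):
    """Pivot row records into one list per column (assumes uniform fields)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def count_values(columns, name):
    """Count and categorize the values of one column without reading the others."""
    return Counter(columns[name])


# Column layout: each field's values are stored contiguously.
columns = to_columns(rows)
```

`count_values(columns, "event")` reads only the `event` list, while the row layout would have to visit every full record to answer the same question.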

Why is it bad?

• Not good at working with systems that require ACID transactions for writes and reads
• If the data set is small, it is better to use a relational database
  – If you just need to look at rows, or at a bunch of columns, a relational database is much better

Most popular Column-family store: Cassandra

Companies that use Cassandra

• Walmart
• VMware
• Unity
• Ubisoft
• Sony
• Reddit
• PayPal
• Netflix
• NASA
• Instagram
• IBM
• FedEx
• eBay
• Call of Duty

Scaling in Cassandra

• Horizontal scaling
• A matter of adding more nodes
• More nodes = the cluster supports more writes and reads
• You can add more nodes while the cluster is running
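Adding nodes to a running cluster works because keys are placed on a hash ring, so a new node takes over only a share of the existing keys. The sketch below is a simplified consistent-hashing ring, not Cassandra's actual partitioner or replication logic; the `HashRing` class, the SHA-256 hash and the node names are assumptions made for the illustration.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._tokens = []  # sorted token positions on the ring
        self._owner = {}   # token -> name of the node that owns it
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            token = self._hash(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owner[token] = node

    def node_for(self, key):
        # A key belongs to the first token clockwise from its hash.
        idx = bisect.bisect(self._tokens, self._hash(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
```

After `ring.add_node("node-d")`, every key that changes owner moves to `node-d`; all other keys stay where they were, which is why the cluster can keep serving requests while it grows.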

Benchmark reports

Throughput

• The higher, the better
• Reflects the power of the database engine
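Throughput is simply operations completed per unit of time, usually reported next to per-operation latency. The helper below is a rough sketch of how a benchmark harness might measure both; the `measure` function and its result keys are hypothetical, and a real benchmark would also use warm-up runs and many concurrent clients.

```python
import time


def measure(operation, count):
    """Run `operation(i)` for i in range(count); report throughput and latency."""
    latencies_ms = []
    start = time.perf_counter()
    for i in range(count):
        t0 = time.perf_counter()
        operation(i)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "throughput_ops_per_s": count / elapsed,
        "mean_latency_ms": sum(latencies_ms) / count,
        "max_latency_ms": max(latencies_ms),
    }
```

For example, `measure(lambda i: table.__setitem__(i, i), 1000)` with `table = {}` times a thousand in-memory writes.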

Latency guidelines

• Excellent: < 1ms
• Very good: < 5ms
• Good: 5 – 10ms
• Poor: 10 – 20ms
• Bad: 20 – 100ms
• Really bad: 100 – 500ms
• OMG!: > 500ms
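These bands translate directly into a small lookup function. The guideline ranges share their endpoints (10ms appears in both "Good" and "Poor"), so the sketch below assumes a boundary value falls into the better band; that tie-breaking rule and the function name are assumptions, not part of the guidelines.

```python
def rate_latency(ms):
    """Map a latency in milliseconds to its guideline band."""
    if ms < 1:
        return "Excellent"
    if ms < 5:
        return "Very good"
    if ms <= 10:  # assumption: shared endpoints go to the better band
        return "Good"
    if ms <= 20:
        return "Poor"
    if ms <= 100:
        return "Bad"
    if ms <= 500:
        return "Really bad"
    return "OMG!"
```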

The University of Toronto test (2012)

• Cassandra 1.0.0 rc2
• Redis 2.4.2
• HBase 0.90.4
• Voldemort 0.90.1
• MySQL 5.5.17

The tests

• Workload R (95% reads)
• Workload RW (50% writes, 50% reads)
• Workload W (99% writes)

Conclusion

• Cassandra – Highest scalability, suffered in latency
• Redis – Highest initial throughput in read-intensive workloads. Latency very low

Conclusion (cont.)

• MySQL – Almost the same as Cassandra, latency is better

• HBase – Lowest throughput. Highest latency for reading. Lower latency for writing

EndPoint: Benchmarking Top NoSQL Databases

• Published: April 13, 2015
• Updated: May 27, 2015
• Cassandra (2.1.0)
• Couchbase (3.0.1)
• MongoDB (3.0)
• HBase (0.98.6-1) and Hadoop (2.6.0)

What was updated?

• Cassandra’s and HBase’s reported performance rose sharply in the updated results

Workload selection

• Workloads selected to be similar to today’s applications
• Database nodes: 30.5 GB RAM, 4 CPU cores, and a single 800 GB volume of local SSD storage
• No data loss across the tests
• Used data volumes that exceeded the RAM capacity of each node

Workloads

• Read-mostly: 95% read, 5% update
• Read/write: 50% read, 50% update
• Read-modify-write: 50% read, 50% read-modify-write
• Insert-mostly: 90% insert, 10% read
• 9 million operations per workload
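A workload of this kind is just a weighted random stream of operation types. The sketch below shows one way such a mix could be generated; the `WORKLOADS` table mirrors the ratios listed above, while the function name, the seeding and the operation labels are assumptions and do not reproduce the benchmark's actual tooling.

```python
import random

# Operation mixes taken from the workload list; values are probabilities.
WORKLOADS = {
    "read-mostly": {"read": 0.95, "update": 0.05},
    "read/write": {"read": 0.50, "update": 0.50},
    "read-modify-write": {"read": 0.50, "read-modify-write": 0.50},
    "insert-mostly": {"insert": 0.90, "read": 0.10},
}


def generate_ops(workload, count, seed=0):
    """Return `count` operation names drawn according to the workload's ratios."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    mix = WORKLOADS[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=count)
```

A harness would then loop over `generate_ops("read-mostly", 9_000_000)` and dispatch each name to the matching database call.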

Problems

• Couchbase
• HBase
• MongoDB

Conclusion

• Cassandra heavily outperformed everyone in latency and throughput
• HBase or Couchbase came second
• MongoDB came last in most test cases

Altoros: The NoSQL Technical Comparison Report

• Published September 2014
• Pretty unbiased
• Couchbase: 2.5.1
• MongoDB: 2.6.1
• Cassandra: 2.0.8

Workload B

• 50% read operations
• 40% update operations
• 5% insert operations
• 5% delete operations
• 50 million 1 KB records

Workload B

• 3 million 10 KB records

Workload C

• 90% read operations
• 8% update operations
• 1% insert operations
• 1% delete operations
• 3 million 10 KB records (results with 50 million records are similar to the Workload B results)

Scalability

Conclusions

• Cassandra has amazing scalability again
• Cassandra is weaker at reading in terms of latency
• MongoDB has the worst latency results in almost all fields

Overall conclusion

• Can’t say that a single NoSQL structure beats all the others
• How about combining?
• POLYGLOT PERSISTENCE

Example: Shopping Site

[Diagram, built up over several slides: an e-commerce platform made of Orders, Cart, Catalog & Reviews, and Suggestions components, backed by key/value, RDBMS, document, and graph stores.]
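The shopping-site example can be sketched as one application object that sends each concern to a different kind of store. The mapping used below (cart in a key/value store, orders in an RDBMS, catalog in a document store, suggestions in a graph) is an assumed reading of the example, and every store is an in-memory stand-in: dictionaries for the key/value, document and graph parts, SQLite for the relational part. The class and method names are hypothetical.

```python
import json
import sqlite3


class PolyglotShop:
    """Routes each part of a shop to its own (stand-in) data store."""

    def __init__(self):
        self.carts = {}        # key/value stand-in: session id -> product ids
        self.catalog = {}      # document stand-in: product id -> JSON document
        self.bought_with = {}  # graph stand-in: product id -> co-purchased ids
        self.orders = sqlite3.connect(":memory:")  # RDBMS stand-in
        self.orders.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, user TEXT, product TEXT)"
        )

    def add_product(self, product_id, **fields):
        self.catalog[product_id] = json.dumps(fields)

    def add_to_cart(self, session_id, product_id):
        self.carts.setdefault(session_id, []).append(product_id)

    def checkout(self, session_id, user):
        """Turn a cart into order rows and record co-purchase edges."""
        items = self.carts.pop(session_id, [])
        with self.orders:  # one transaction for the whole order
            self.orders.executemany(
                "INSERT INTO orders (user, product) VALUES (?, ?)",
                [(user, product) for product in items],
            )
        for a in items:
            for b in items:
                if a != b:
                    self.bought_with.setdefault(a, set()).add(b)
        return len(items)

    def suggestions(self, product_id):
        return sorted(self.bought_with.get(product_id, set()))
```

In a real deployment each attribute would be a client for a separate database, which is the trade-off of polyglot persistence: each store fits its data, at the cost of operating several systems.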