To SQL or NoSQL? That is the question

Krishnakumar S, DEVCON Kochi, 28th March 2015

History of Database Systems

1960s: Hierarchical and network models (IMS, CODASYL etc.)

1970s: Beginnings of the theory of the relational model

1980s: Rise of RDBMS and SQL

1990s: Spreadsheets and MySQL; evolution of the web

2000s: Large enterprise & open source; Google & Amazon

2010s: Emergence of NoSQL systems

2020s: NewSQL?

CAP theorem

RDBMS

Strong foundation – Relational Model

Highly Structured – rows, columns, data types

Structured Query Language – standardized

ACID properties – all or nothing

Joins – new views from relationships
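The "all or nothing" and join points above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration only; the table names, column names and sample values are assumptions made up for the example.

```python
import sqlite3

# In-memory database; tables, columns and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL CHECK (amount > 0))"
)

# A successful transaction: both rows are committed together.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 250.0)")

# A failing transaction: the CHECK constraint rejects the second insert,
# so the first insert of the same transaction is rolled back as well.
try:
    with conn:
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 99.0)")
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, -5.0)")
except sqlite3.IntegrityError:
    pass

# A join builds a new view from the relationship between the two tables.
rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY o.id"
).fetchall()
```

Only the order from the committed transaction appears in `rows`; the 99.0 order disappears along with the invalid one.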

RDBMS – Weakness

Joins – Not scalable

Transactions – Read & write operations will be slow because of locking resources

Fixed definitions – Difficult to work with highly variable data

Document integration – Difficult to create reports based on structured & unstructured data

Changing scenarios...

What drives the move to NoSQL?

• Volume
• Velocity
• Variability
• Agility

Any existing solution?

• Data partitioning
• Replication
• Clustering
• Query distribution
• Load balancing
• Consistency/Syncing
• Latency/Concurrency
• Network bottlenecks
• Multiple data centers
• Distributed backups
• Node failures
• Voting algorithms for failure detection
• Administration of many systems
• Monitoring

RDBMS is scalable only if designed & administered correctly (Period)
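To give a feel for the first few items in that list, here is a rough sketch of hash-based data partitioning with replica placement and query distribution. The node names, the replica count and the helper functions are all hypothetical; a real deployment would also need the failure handling, syncing and monitoring listed above.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def owners(key: str, replicas: int = 2) -> list[str]:
    """Return the nodes that should hold `key`: a primary plus its replicas."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replicas)]


def route_query(keys: list[str]) -> dict[str, list[str]]:
    """Group keys by primary node so that each node is contacted once."""
    plan: dict[str, list[str]] = {}
    for key in keys:
        plan.setdefault(owners(key)[0], []).append(key)
    return plan


plan = route_query(["user:1", "user:2", "user:3", "user:4"])
```

Note that plain modulo hashing moves most keys when a node is added or removed, which is one reason the administration burden grows quickly.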

NoSQL! What is in a name?

1998 :

• Carlo Strozzi developed an open-source relational database, “Strozzi NoSQL”
• The database stores tables as ASCII files, with tuples as tab-separated values
• It doesn’t use SQL as its query language, hence the name “NoSQL”
• Instead, it used UNIX shell scripts and pipelines to retrieve data

Irony! A relational database is named as NoSQL!

2009 :

• Johan Oskarsson organized a meetup of people developing open-source, distributed, non-relational databases on June 11, 2009
• He wanted a simple Twitter hashtag for the meetup: quick, memorable, and easy to find with a Google search
• Eric Evans came up with the name NoSQL, for the single meetup

NoSQL! What is in a name?

• The name is negative
• The name does not describe the purpose of the meetup
• The name does not define the new database systems
• But the name satisfied the need for a Twitter tag, and it caught on like wildfire

What does it stand for?

• “No to SQL”? Not exactly
• “Not Only SQL”? Then what about SQL Server, Oracle etc.?

The answer is “Don’t worry about what it stands for!”

NoSQL

• NoSQL is a movement
• NoSQL is an ecosystem for future database technology
• NoSQL is an accidental neologism. There is no prescriptive definition

Characteristics of NoSQL

• Not using the relational model
• Running well in clusters
• Open-source
• Built for 21st century web estates
• Schemaless

The most important result of the NoSQL movement is polyglot persistence

Theorems Ahead!

Brewer’s CAP theorem

• In 2000, Eric Brewer presented the CAP principle as a conjecture
• In 2002, Seth Gilbert & Nancy Lynch published a formal proof, turning the principle into the CAP theorem

There are three essential system requirements necessary for the successful design, implementation, and deployment of applications in distributed computing

1. Consistency
2. Availability
3. Partition Tolerance

In the majority of instances, a distributed system can guarantee only two of the three at once

Brewer’s CAP theorem

Consistency refers to whether all nodes within a cluster see the data they are supposed to see: a read returns the most recent write, whichever node answers. It is related to the consistency in ACID, although CAP consistency is about replicas agreeing with each other, while ACID consistency is about transactions preserving the database's rules.

Availability means just what it sounds like. Is the given service or system available when requested? Does every request get a response, whether it reports success or failure?

Partition Tolerance means that a given system continues to operate even when messages between nodes are lost or part of the system fails. A single node failure should not cause the entire system to collapse.

Large-scale, distributed, non-relational systems need availability and partition tolerance, so consistency suffers and strict ACID guarantees give way
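The trade-off can be seen in a toy model of two replicas joined by a network link. This is only a sketch under simplifying assumptions: the `Cluster` and `Replica` classes and the `"CP"`/`"AP"` mode flag are invented for the illustration and do not correspond to any real database API.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Cluster:
    """Two replicas with a switchable network link; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas may now disagree.
            replica.data[key] = value
            return
        # No partition: replicate synchronously to the other node.
        replica.data[key] = value
        other.data[key] = value


cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)  # accepted, but replica b still holds the old value
```

During the partition the CP cluster keeps both replicas identical but turns the client away, while the AP cluster answers the client and lets the replicas diverge.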

Brewer’s CAP theorem

Pick any two

Consistency: All clients always have the same view of data

Availability: Each client can always read and write

Partition Tolerance: The system works well despite physical network partitions

CA: RDBMSs (SQL Server, Oracle, MySQL etc.)

CP: Bigtable, MongoDB, BerkeleyDB, MemcacheDB, HBase etc.

AP: Cassandra, CouchDB, Dynamo, Voldemort

BASE

Basically Available: The system guarantees availability in the sense of the CAP theorem; there will be a response to any request. That response could still be a ‘failure’ to obtain the requested data, or the data may be in an inconsistent or changing state

Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency’; thus the state of the system is always ‘soft’

Eventual Consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and does not check the consistency of every transaction before it moves on to the next one

It’s OK to use stale data; it’s OK to give approximate answers.
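A rough sketch of what "stale now, consistent later" can look like: two nodes accept writes independently and reconcile in the background, with the higher version number winning. The `Node` class, the version numbers and the `anti_entropy` function are assumptions made for this illustration; real systems use more careful schemes for ordering and conflict resolution.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the entry only if it is newer than what this node already has.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(x, y):
    """Exchange entries in both directions; the newer version wins."""
    for src, dst in ((x, y), (y, x)):
        for key, (version, value) in list(src.store.items()):
            dst.write(key, value, version)


n1, n2 = Node("n1"), Node("n2")
n1.write("cart", ["book"], version=1)
anti_entropy(n1, n2)

n1.write("cart", ["book", "pen"], version=2)
stale = n2.read("cart")      # n2 has not heard about version 2 yet
anti_entropy(n1, n2)         # background sync
fresh = n2.read("cart")      # now both nodes agree
```

Between the second write and the sync, a client reading from `n2` gets an older but still usable answer.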

NoSQL Data Architecture Patterns

• Key-Value
• Column-Family
• Graph
• Document

Key-Value

Keys used to access opaque blobs of data

Values can contain any type of data (images, video)

Pros: scalable, simple API (put, get, delete)

Cons: no way to query based on the content of the value
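The put/get/delete API is small enough to sketch in a few lines. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not the interface of any particular product.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def keys(self):
        return list(self._data)


store = KeyValueStore()
store.put("avatar:42", b"\x89PNG-image-bytes")
store.put("note:7", b"hello")

found = store.get("note:7")
store.delete("note:7")
missing = store.get("note:7")

# The store cannot search inside values, so a content query means the
# client fetches every value and inspects it itself.
png_keys = [k for k in store.keys() if store.get(k).startswith(b"\x89PNG")]
```

The last line shows the con in practice: finding values by content turns into a full scan on the client side.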

Column family

Key includes a row, column family and column name

Store versioned blobs in one large table

Queries can be done on rows, column families and column names

Pros: Good scale out

Cons: Cannot query blob content; row and column designs are critical
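As a rough sketch of that key structure, the table below maps a (row, column family, column name) key to a list of timestamped blobs. The `ColumnFamilyTable` class and the sample rows are hypothetical and kept in memory for the illustration.

```python
from collections import defaultdict


class ColumnFamilyTable:
    def __init__(self):
        # (row, family, column) -> list of (timestamp, blob), oldest first
        self._cells = defaultdict(list)

    def put(self, row, family, column, timestamp, blob):
        versions = self._cells[(row, family, column)]
        versions.append((timestamp, blob))
        versions.sort(key=lambda item: item[0])  # order by timestamp only

    def get(self, row, family, column):
        """Return the newest blob for the key, or None if it is absent."""
        versions = self._cells.get((row, family, column))
        return versions[-1][1] if versions else None

    def scan(self, row=None, family=None, column=None):
        """Return keys matching the given parts; None matches anything."""
        wanted = (row, family, column)
        return sorted(
            key for key in self._cells
            if all(w is None or w == part for w, part in zip(wanted, key))
        )


table = ColumnFamilyTable()
table.put("user1", "profile", "name", 1, b"Asha")
table.put("user1", "profile", "name", 2, b"Asha K")
table.put("user1", "activity", "last_login", 1, b"2015-03-28")
table.put("user2", "profile", "name", 1, b"Binu")

latest = table.get("user1", "profile", "name")
profile_keys = table.scan(family="profile")
```

Queries work on the parts of the key only; the blobs themselves stay opaque, which is why row and column design matters so much.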

Graph Store

Data is stored in a series of nodes and properties

Queries are really graph traversals

Ideal when relationships between data are key: e.g. social networks

Pros: Fast network search, works with public linked data sets

Cons: Poor scalability when graphs don't fit into RAM, specialized query language
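"Queries are really graph traversals" can be illustrated with a breadth-first search over a tiny social graph. The people, properties and "follows" edges are made up for the example, and a real graph store would expose this through its own query language.

```python
from collections import deque

# Hypothetical social graph: node properties plus outgoing "follows" edges.
nodes = {
    "asha": {"city": "Kochi"},
    "binu": {"city": "Kochi"},
    "chitra": {"city": "Chennai"},
    "deepa": {"city": "Pune"},
    "esha": {"city": "Delhi"},
}
edges = {
    "asha": ["binu", "chitra"],
    "binu": ["deepa"],
    "chitra": ["deepa"],
    "deepa": ["esha"],
    "esha": [],
}


def within_hops(start, max_hops):
    """Breadth-first traversal: everyone reachable in at most max_hops steps."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(current, []):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


network = within_hops("asha", 2)
nearby = sorted(name for name in network if nodes[name]["city"] == "Kochi")
```

The same question in a relational schema would take one self-join per hop.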

Document Store

Data stored in nested hierarchies

Logical data remains stored together as a unit

Any item in the document can be queried

Pros: No object-relational mapping layer, ideal for search

Cons: Complex to implement, incompatible with SQL
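A minimal sketch of querying nested documents, using plain Python dictionaries as stand-ins for stored documents. The `get_path` and `find` helpers and the dotted-path syntax are assumptions made for the illustration, not the query API of a specific document database.

```python
def get_path(document, path):
    """Follow a dotted path such as 'customer.address.city' through nested dicts."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, expected):
    """Return every document whose value at `path` equals `expected`."""
    return [doc for doc in collection if get_path(doc, path) == expected]


# Each order keeps its logical data together; the shapes need not match.
orders = [
    {"id": 1, "customer": {"name": "Asha", "address": {"city": "Kochi"}},
     "items": [{"sku": "book", "qty": 1}]},
    {"id": 2, "customer": {"name": "Binu"}, "items": []},
    {"id": 3, "customer": {"name": "Chitra", "address": {"city": "Kochi"}},
     "gift_note": "Happy birthday"},
]

kochi_ids = [doc["id"] for doc in find(orders, "customer.address.city", "Kochi")]
```

Order 2 has no address at all and is simply skipped, which is the schemaless behaviour at work.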

NoSQL & Functional Programming

Polyglot Persistence

Different database systems are designed to solve different problems

Using a single database engine for all the requirements leads to non-performant solutions

The solution is polyglot persistence; a hybrid approach to data persistence
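As a rough sketch of the idea, the hypothetical `ShopPersistence` class below keeps orders in a relational store (`sqlite3`), sessions in a key-value structure and product descriptions as nested documents. The plain dictionaries only stand in for real key-value and document stores, and every name here is invented for the example.

```python
import sqlite3


class ShopPersistence:
    """One application, three kinds of storage, each matched to its data."""

    def __init__(self):
        # Relational store for orders, where transactions matter.
        self.orders = sqlite3.connect(":memory:")
        self.orders.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, user TEXT, total REAL)"
        )
        self.sessions = {}  # stand-in for a key-value store
        self.catalog = {}   # stand-in for a document store

    def place_order(self, user, total):
        with self.orders:
            cursor = self.orders.execute(
                "INSERT INTO orders (user, total) VALUES (?, ?)", (user, total)
            )
        return cursor.lastrowid

    def revenue(self):
        return self.orders.execute("SELECT SUM(total) FROM orders").fetchone()[0]


shop = ShopPersistence()
shop.sessions["session:abc"] = b"user=asha"
shop.catalog["book-1"] = {"title": "NoSQL Distilled", "tags": ["database"]}
first_id = shop.place_order("asha", 450.0)
shop.place_order("binu", 50.0)
total = shop.revenue()
```

The cost of this approach is operational: each extra store is another system to deploy, monitor and keep in sync.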

NoSQL - Evolution

© Natalino Busa

References

• Making Sense of NoSQL - Dan McCreary and Ann Kelly
• NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence - Pramod J. Sadalage and Martin Fowler
• Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence - John Sharp, Douglas McMurtry, Andrew Oakley, Mani Subramanian, Hanzhong Zhang