RDBMS vs NoSQL
Relational Database Management Systems versus Big Data Management (NoSQL) Systems

Arnab Bhattacharya ([email protected])
Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur, India

9th August, 2017
TEQIP Short Course on Big Data



Database

Concept of a database

  • A database is a collection of interrelated data
  • A database management system (DBMS) provides an environment that is efficient and convenient to use
  • Programs and an interface to
      • Store data
      • Visualize data
      • Access (query) data
      • Manipulate data

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 2 / 24

RDBMS

Relational DBMS

  • Table-based
  • Relational algebra as mathematical background
      • Operators precisely defined
      • Operands are relations
  • Relations are sets of tuples
  • Tuples consist of named attributes
  • Concept of candidate keys to uniquely identify tuples
  • Queries across relations (joins) are natural
  • Procedures can be coded into the RDBMS engine
  • Triggers and views are supported
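
The points about relations being sets of tuples with named attributes can be made concrete with a minimal Python sketch of three relational-algebra operators. The helper names (`row`, `select`, `project`, `natural_join`) and the sample student data are assumptions made for illustration; they are not taken from the slides or from any RDBMS.

```python
def row(**attrs):
    """A tuple with named attributes, stored as a hashable frozenset of pairs."""
    return frozenset(attrs.items())

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(dict(t))}

def project(relation, names):
    """Projection: keep only the named attributes (duplicates collapse, as in a set)."""
    return {frozenset((k, v) for k, v in t if k in names) for t in relation}

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attribute names."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample relations.
students = {row(roll=1, name="Asha"), row(roll=2, name="Ravi")}
enrolled = {
    row(roll=1, course="CS315"),
    row(roll=2, course="CS315"),
    row(roll=1, course="CS771"),
}

# Names of students enrolled in CS771.
result = project(
    select(natural_join(students, enrolled), lambda t: t["course"] == "CS771"),
    {"name"},
)
```

Because every operator takes relations and returns a relation, the operators compose freely, which is what makes joins across relations natural in this model.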

SQL

  • Structured Query Language
  • Formally defined programming language based on relational algebra
  • Declarative language
      • RDBMS engine free to choose implementation of operations
      • Decades of query optimization
      • Indexing
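
A small sketch of what "declarative" means in practice, using Python's built-in `sqlite3` module. The `emp` table, its rows, and the index name `idx_emp_dept` are assumptions made for this example. The query states only what is wanted; the engine decides how to evaluate it, and `EXPLAIN QUERY PLAN` reports the access path it picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO emp (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "CSE", 90), ("Ravi", "EE", 80), ("Meera", "CSE", 95)],
)
# An index gives the optimizer another way to evaluate the same query.
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")

# The query says WHAT is wanted, not HOW to find it.
query = "SELECT name FROM emp WHERE dept = ? ORDER BY name"
names = [r[0] for r in conn.execute(query, ("CSE",))]

# Ask the engine which plan it chose; the last column holds the description.
plan = "; ".join(
    str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + query, ("CSE",))
)
```

The exact wording of the plan text differs between SQLite versions, so it is best read as a diagnostic rather than parsed.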

Transactions

  • RDBMSs offer in-built transaction support
  • A transaction is a logical unit of a program
  • ACID properties to preserve data integrity
      • Atomicity: either all operations or none
      • Consistency: database remains consistent before and after a transaction
      • Isolation: one transaction has no effect on another even if they run concurrently
      • Durability: effect of a transaction is permanent
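
Atomicity and consistency can be illustrated with a minimal `sqlite3` sketch. The `account` table, the names, and the `transfer` helper are hypothetical. The transfer deliberately credits before debiting, so a failed debit shows that the earlier credit is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # commits on success, rolls back on exception
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither."""
    try:
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit above was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
final = balances(conn)
```

The `CHECK` constraint plays the role of the consistency rule, and the rollback in the failing transfer is atomicity at work.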

Schedules

  • A schedule is a chronological sequence of instructions from concurrent transactions
      • If a transaction appears in a schedule, all instructions of the transaction must appear in the schedule
      • Order of instructions within a transaction must be maintained in the schedule
  • To increase concurrency
      • Multiple transactions should be able to run simultaneously
  • Serializability ensures correctness
  • Recoverability ensures consistency despite failures
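
One common way to test serializability is the precedence-graph check for conflict serializability; the slides do not name a specific test, so this choice is an assumption. In the sketch below a schedule is a list of `(transaction, operation, item)` triples with operations `"R"` and `"W"`, and the schedule is conflict serializable exactly when the graph has no cycle.

```python
from itertools import combinations

def precedence_edges(schedule):
    """Edge Ti -> Tj when an earlier op of Ti conflicts with a later op of Tj."""
    edges = set()
    for (_, (t1, op1, x1)), (_, (t2, op2, x2)) in combinations(
        enumerate(schedule), 2
    ):
        # Two operations conflict if they come from different transactions,
        # touch the same item, and at least one of them is a write.
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True when the precedence graph is acyclic (checked by topological sort)."""
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for u, v in edges:
            if u == n:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return visited == len(nodes)

# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2.
good = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# T2 writes A between T1's read and write: edges T1 -> T2 and T2 -> T1 form a cycle.
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
```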

Issues of RDBMS

  • Scalability of RDBMS is a problem
      • It is at most vertical, i.e., across relations
      • All tuples in a relation must stay in one machine
  • Distributed design is harder
      • Indexing across distributed machines is not provided naturally
  • Hard to model complex data
      • Hierarchical
      • Spatio-temporal
      • Graphs
      • Semi-structured
  • Unnatural way to model data

NoSQL

NoSQL

  • NoSQL is not “no-SQL”
  • It is “not only SQL”
  • It does not aim to provide the ACID properties
  • Originated as no-SQL though
  • Later changed since RDBMS is too powerful to always ignore
  • Scalability is horizontal, i.e., can put tuples across distributed machines
  • Flexibility to model any kind of data
  • Natural way of modeling data
  • Distribution support is in-built
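
Horizontal scalability means the tuples of one collection are spread over several machines. A minimal sketch of hash partitioning follows; the function names and the use of SHA-256 as a stable hash are assumptions for illustration. Production systems typically use more elaborate schemes such as consistent hashing so that adding a node moves fewer keys.

```python
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def partition(keys, num_nodes):
    """Distribute keys over num_nodes shards; each key lands on exactly one."""
    shards = [[] for _ in range(num_nodes)]
    for key in keys:
        shards[node_for(key, num_nodes)].append(key)
    return shards

keys = [f"user:{i}" for i in range(1000)]
shards = partition(keys, 4)
```

Any client can recompute `node_for` locally, so a lookup goes straight to the machine that holds the key without consulting a central directory.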

CAP theorem

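  • All of C (consistency), A (availability) and P (partition tolerance) cannot be satisfied simultaneously
      • CA: single-site; partitioning is not allowed
      • CP: what is available is consistent
      • AP: everything is available but may not be consistent
  • Originally stated as a conjecture rather than a proven theorem; a formal proof for an asynchronous network model followed (Gilbert and Lynch, 2002)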
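
The CP and AP choices can be sketched with a toy two-replica store. The class `TwoNodeStore` and its `mode` and `partitioned` fields are invented for this illustration and do not model any real system. During a partition, CP refuses the write to stay consistent, while AP accepts it and lets the replicas diverge.

```python
class TwoNodeStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, replica_index, key, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: update both replicas together.
            for rep in self.replicas:
                rep[key] = value
            return True
        if self.mode == "CP":
            # Cannot reach the other replica: refuse rather than diverge.
            return False
        # AP: stay available by accepting the write on the reachable replica.
        self.replicas[replica_index][key] = value
        return True

    def consistent(self):
        return self.replicas[0] == self.replicas[1]

cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write(0, "x", 1)
    store.partitioned = True

cp_accepted = cp.write(0, "x", 2)  # rejected: consistent but unavailable
ap_accepted = ap.write(0, "x", 2)  # accepted: available but inconsistent
```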

BASE properties

  • Basically Available: System guarantees availability
  • Soft state: State of system is soft, i.e., it may change without input to maintain consistency
  • Eventual consistency: Data will eventually become consistent, provided there is no interim perturbation (i.e., no further updates)
  • Sacrifices consistency
  • To counter ACID
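
Eventual consistency can be sketched as replicas that exchange state until they agree. The example below assumes a last-writer-wins rule on timestamps, which is one possible reconciliation policy and is not prescribed by the slides; the function names and sample data are hypothetical.

```python
def merge(local, remote):
    """Last-writer-wins merge of two replica states: key -> (timestamp, value)."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

def anti_entropy(replicas):
    """One full synchronisation round: every replica ends with the merged state."""
    combined = {}
    for state in replicas:
        combined = merge(combined, state)
    return [dict(combined) for _ in replicas]

# Two replicas that diverged while updates were still arriving.
r1 = {"x": (1, "old")}
r2 = {"x": (2, "new"), "y": (1, "only-on-r2")}

# With no further updates, one round makes them agree.
s1, s2 = anti_entropy([r1, r2])
```

The replica states change during synchronisation without any client input, which is the "soft state" property in the list above.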

Types

Four main types of NoSQL data stores:
  1. Columnar families
  2. Bigtable systems
  3. Document databases
  4. Graph databases

http://nosql-database.org/

Columnar storage

  • Instead of rows being stored together, columns are stored consecutively
  • A single disk block (or a set of consecutive blocks) stores a single column family
      • A column family may consist of one or multiple columns
      • This set of columns is called a super column
  • Two main types
      • Columnar relational models
      • Key-value stores and/or big tables

Columnar relational models

  • Not NoSQL and is actually RDBMS
  • Column-wise storage on the disk
      • Allows faster querying when only a few columns are touched on the entire data
      • Allows compression of columns
      • Provides better memory caching
      • Joins are faster since they are mostly on similar columns from two tables
  • Not good for updates
  • Not good when many columns of a few tuples are accessed
  • Good for OLAP (online analytical processing)
  • Not good for OLTP (online transaction processing)
  • Example: MonetDB
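
The trade-offs listed above follow from the storage layout, which a short Python sketch can show. The sample rows and the run-length encoder `rle` are assumptions for illustration only; MonetDB's actual storage format is not modelled here.

```python
# Row layout: each tuple is stored together.
rows = [(1, "CSE", 90), (2, "EE", 80), (3, "CSE", 95), (4, "CSE", 70)]

# Column layout: each attribute is stored contiguously.
ids, depts, salaries = (list(col) for col in zip(*rows))
columns = {"id": ids, "dept": depts, "salary": salaries}

# An analytical (OLAP-style) query touches only the column it needs.
total_salary = sum(columns["salary"])

def rle(values):
    """Run-length encode a column: repeated neighbours collapse into (value, count)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# Values within one column are similar, so they compress well, especially when sorted.
compressed_dept = rle(sorted(columns["dept"]))

# Fetching one whole tuple must visit every column, which is why
# row-at-a-time (OLTP-style) access is the weak case for this layout.
third_row = tuple(columns[name][2] for name in ("id", "dept", "salary"))
```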

Key-value stores

  • Two columns: a key and a value
      • Key is mostly text
      • Value can be anything and is simply an object
  • Essentially, actual data becomes “value” and a unique id is generated which becomes “key”
  • Whole database is then just one big table with these two columns
  • Becomes schema-less
  • Can be distributed and is, thus, highly scalable
  • In essence, a big distributed hash table
  • All queries are on keys
  • Keys are necessarily indexed
  • Examples: Cassandra, CouchDB, HBase, Tokyo Cabinet, Redis
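
A single-machine sketch of the key-value model follows. The class `KeyValueStore` and its methods are hypothetical and do not mirror the API of any system named above. It shows the generated key, the opaque value, and why queries on anything other than the key need a full scan.

```python
import uuid

class KeyValueStore:
    """One big two-column table: key -> opaque value."""

    def __init__(self):
        self._table = {}  # a hash table, so keys are indexed by construction

    def put(self, value, key=None):
        """Store a value; generate a unique key if none is supplied."""
        if key is None:
            key = uuid.uuid4().hex
        self._table[key] = value
        return key

    def get(self, key, default=None):
        """The only efficient query: lookup by key."""
        return self._table.get(key, default)

    def scan(self, predicate):
        """Querying by value is possible only by examining every entry."""
        return [k for k, v in self._table.items() if predicate(v)]

store = KeyValueStore()
k1 = store.put({"name": "Asha", "dept": "CSE"})  # schema-less: any object
k2 = store.put("just a string")
cse_keys = store.scan(lambda v: isinstance(v, dict) and v.get("dept") == "CSE")
```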

Bigtable systems

  • Started from Google’s BigTable implementation
  • Uses a key-value store
  • Data can be replicated for better availability
  • Uses a timestamp
  • Timestamp is used to
      • Expire data
      • Delete stale data
      • Resolve read-write conflicts
  • Same value can be indexed using multiple keys
  • Map-reduce framework to compute
  • Examples: BigTable, HBase, Cassandra, HyperTable, SimpleDB
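
The role of the timestamp can be sketched with a small versioned store. The class `VersionedStore` and its method names are assumptions for illustration and are not the BigTable or HBase API. Each key holds several timestamped versions; reads pick the newest one (optionally as of a given time), and old versions can be expired.

```python
class VersionedStore:
    """key -> list of (timestamp, value) versions, kept sorted by timestamp."""

    def __init__(self):
        self._cells = {}

    def put(self, key, value, ts):
        versions = self._cells.setdefault(key, [])
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0])

    def get(self, key, as_of=None):
        """Newest value, or the newest value written at or before as_of."""
        versions = self._cells.get(key, [])
        if as_of is not None:
            versions = [v for v in versions if v[0] <= as_of]
        return versions[-1][1] if versions else None

    def expire(self, cutoff):
        """Drop every version older than cutoff (stale data)."""
        for key in list(self._cells):
            kept = [v for v in self._cells[key] if v[0] >= cutoff]
            if kept:
                self._cells[key] = kept
            else:
                del self._cells[key]

store = VersionedStore()
store.put("row1", "a", ts=1)
store.put("row1", "b", ts=5)
latest = store.get("row1")
earlier = store.get("row1", as_of=3)
store.expire(cutoff=2)
after_expiry = store.get("row1", as_of=3)
```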

Document databases

  • Uses documents as the main storage format of data
  • Popular document formats are XML, JSON, BSON, YAML
  • Document itself is the key while the content is the value
  • Document can be indexed by id or simply its location (e.g., URI)
  • Content needs to be parsed to make sense
  • Content can be organised further
  • Extremely useful for insert-once read-many scenarios
  • Can use map-reduce framework to compute
  • Examples: MongoDB, CouchDB
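
The points about parsing content and computing with map-reduce can be sketched in plain Python. The documents, the `map_tags` and `reduce_counts` functions, and the in-memory `map_reduce` driver are hypothetical; this is not the MongoDB or CouchDB API.

```python
import json
from collections import defaultdict

# Documents indexed by id; the content is stored as unparsed JSON text.
raw_documents = {
    "doc1": '{"author": "asha", "tags": ["nosql", "graphs"]}',
    "doc2": '{"author": "ravi", "tags": ["nosql"]}',
    "doc3": '{"author": "meera"}',  # no fixed schema: "tags" may be absent
}

def map_tags(doc_id, doc):
    """Map step: emit (tag, 1) for every tag in a parsed document."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(documents, mapper, reducer):
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        doc = json.loads(text)  # content must be parsed to make sense
        for key, value in mapper(doc_id, doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

tag_counts = map_reduce(raw_documents, map_tags, reduce_counts)
```

In a distributed store the map step would run where each document lives and only the emitted pairs would travel across the network.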

Graph databases

  • Nodes represent entities or objects
  • Edges encode relationships between nodes
      • Can be directed
      • Can have hyper-edges as well
  • Easier to find distances and neighbors
  • Examples: Neo4J, HyperGraph, Infinite Graph, Titan, FlockDB
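
Finding neighbors and distances is direct when the data is stored as a graph. A minimal sketch with an adjacency list and breadth-first search follows; the sample graph and function name are assumptions for illustration and do not reflect any graph database's query language.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Hop distance from source to every node reachable along directed edges."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Directed graph: node -> list of out-neighbors.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

neighbors_of_a = graph["a"]          # neighbors are a single lookup
distances = bfs_distances(graph, "a")
```

The same question in a relational schema would need one self-join on an edge table per hop.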

NoSQL Systems

NoSQL systems

  • Three most popular ones are
      • HBase
      • Cassandra
      • MongoDB
  • https://www.linkedin.com/pulse/real-comparison-nosql-databases-hbase-cassandra-mongodb-sahu

HBase

  • Based on Hadoop, HDFS and BigTable
  • Key-value store
  • Column store
  • Uses column families
  • Requires ZooKeeper to maintain distributed coordination, configuration and maintenance
  • Centralized master that dictates slaves
  • Strong consistency
  • Versioning can be done
  • Scales by adding nodes
  • CP

Cassandra

  • Column-store based on BigTable
  • Decentralized architecture
  • Replicated
  • Any node can perform any action
  • Strong security
  • Continuous availability
  • Extremely good single-tuple read performance
  • Not fully consistent
  • Requires quorum reads for consistency
  • AP
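
Why quorum reads restore consistency can be sketched with N replicas, a write quorum W and a read quorum R: when R + W > N, every read set overlaps every write set, so the newest version is always seen. The functions below are a toy model with made-up names and a deterministic choice of replicas; they are not Cassandra's API or its replica-selection logic.

```python
def write(replicas, key, value, version, w):
    """Acknowledge the write once the first w replicas have stored it."""
    for rep in replicas[:w]:
        rep[key] = (version, value)

def read(replicas, key, r):
    """Ask the last r replicas and return the value with the highest version."""
    answers = [rep[key] for rep in replicas[-r:] if key in rep]
    if not answers:
        return None
    return max(answers, key=lambda answer: answer[0])[1]

N = 3
replicas = [{} for _ in range(N)]
write(replicas, "x", "old", version=1, w=3)  # reaches every replica
write(replicas, "x", "new", version=2, w=2)  # reaches replicas 0 and 1 only

quorum_read = read(replicas, "x", r=2)  # R + W = 4 > N: overlaps the write set
single_read = read(replicas, "x", r=1)  # R + W = 3 = N: may miss the write
```

Reading from the last replicas is the worst case for the reader here, which is why the single-replica read returns the stale value.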

MongoDB

  • Document store
  • Data stored in JSON or BSON format
  • Supports master-slave replication
  • Strong consistency
  • Good index support
  • Data modeling flexibility
  • CP

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 21 / 24

NoSQL Systems

MongoDB

Document storeData stored in JSON or BSON formatSupports master-slave replication

Strong consistencyGood index supportData modeling flexibilityCP

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 21 / 24

NoSQL Systems

MongoDB

Document storeData stored in JSON or BSON formatSupports master-slave replicationStrong consistency

Good index supportData modeling flexibilityCP

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 21 / 24

NoSQL Systems

MongoDB

Document storeData stored in JSON or BSON formatSupports master-slave replicationStrong consistencyGood index support

Data modeling flexibilityCP

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 21 / 24

NoSQL Systems

MongoDB

Document storeData stored in JSON or BSON formatSupports master-slave replicationStrong consistencyGood index supportData modeling flexibility

CP

Arnab Bhattacharya ([email protected]) RDBMS vs NoSQL 09/08/17 21 / 24

NoSQL Systems

MongoDB

Document store
Data stored in JSON or BSON format
Supports master-slave (primary-secondary) replication
Strong consistency
Good index support
Data modeling flexibility
CP under the CAP theorem (consistency and partition tolerance)
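
The document model and its modeling flexibility can be illustrated with plain Python dictionaries and the standard `json` module. This is a sketch only and does not use the MongoDB driver; the collection `students`, its fields, and the helper `find` are hypothetical.

```python
import json

# Two documents in the same "collection" with different fields:
# nothing forces them to share a schema.
students = [
    {"_id": 1, "name": "Asha",
     "courses": [{"code": "CS315", "grade": "A"}]},
    {"_id": 2, "name": "Ravi", "email": "ravi@example.org"},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# A document serialises directly to JSON text and back.
encoded = json.dumps(students[0])
decoded = json.loads(encoded)

if __name__ == "__main__":
    print(find(students, name="Ravi"))
    print(encoded)
```

Nested values such as `courses` live inside the document itself, which is what makes single-document reads cheap and cross-document queries harder.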

Issues of NoSQL systems

No join support (unless columnar RDBMS)
    Cannot work across tables
Requires unraveling of data values to answer deeper queries
No natural or direct procedural support
Consistency
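
Where the store offers no join, the first and third issues typically push work into the application. The sketch below, in plain Python over hypothetical `orders` and `customers` collections, shows a join done by hand and a "deeper" query that has to unravel nested values.

```python
orders = [
    {"order_id": 10, "customer_id": 1,
     "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
    {"order_id": 11, "customer_id": 2,
     "items": [{"sku": "pen", "qty": 5}]},
]
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]


def join_orders_with_customers(orders, customers):
    """Application-side join: index one collection, then probe it."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**order, "customer_name": by_id[order["customer_id"]]["name"]}
        for order in orders
        if order["customer_id"] in by_id
    ]


def total_quantity(orders, sku):
    """Unravel the nested items of every order to total one product."""
    return sum(
        item["qty"]
        for order in orders
        for item in order["items"]
        if item["sku"] == sku
    )


if __name__ == "__main__":
    for row in join_orders_with_customers(orders, customers):
        print(row["order_id"], row["customer_name"])
    print(total_quantity(orders, "pen"))
```

In an RDBMS both operations would be a single declarative query; here the application must fetch, index, and iterate.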

Discussion

NoSQL, although it started as anti-SQL, is no longer so
It is more a realisation that, for some cases,
    an RDBMS does not scale or distribute, or
    ACID guarantees are overkill
NoSQL is not good for every scenario
Consistency cannot always be sacrificed
Most legacy systems still use RDBMS
Many NoSQL systems are increasingly adopting features of RDBMS
The NoSQL horizon is shifting rapidly with almost no control or sense of direction
However, the trend favours NoSQL, as cloud computing and big data rely on it
https://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB%3BPostgreSQL

Conclusions

RDBMS is still going strong
NoSQL is catching up as a real choice and not simply a buzzword
https://db-engines.com/en/

THANK YOU!

Questions? Answers!
