NoSQL Databases

NoSQL DatabasesAn efficient way to store and query heterogeneous

astronomical data in DACE Nicolas Buchschacher - University of Geneva - ADASS 2018

• Data and Analysis Center for Exoplanets.• Facility to store, exchange and analyse data related to exoplanets

(observations and synthetic populations).• Web front end and python API to query the database (in dev).• 3D visualisation (come to the demo booth !)

Nicolas Buchschacher - University of Geneva - ADASS 2018

DACEhttps://dace.unige.ch

• Store heterogeneous observational data produced by different instruments.

• Regularly adapt the data model (we are at the end of the chain).• Store synthetic population simulations data points (~500M / Pop).• Ensure high availability (load balancing) and persistence (replications).

The Requirements

CASSANDRA• Fully distributed database developed by Apache.• Column oriented storage => can have thousand columns in a table.• High availability (no Master-Slave). Every node is equivalent.• Asynchronous and automatic replication across the nodes based on

a replication factor.• Consistency level can be controlled for each query.

CAP (Brewer) TheoremThis is impossible for a distributed database to simultaneously provide more than 2 of the 3 following requirements: • Consistency: Every read receives the most recent value, or an error.• Availability: Every read or write request receives a non-error reply.• Partition tolerance: The system continues to work properly if

there are some missing nodes.

=> Cassandra is considered as an AP database. But consistency can be managed with the “consistency level” in each query.

CASSANDRA (Storage)

ID Param1 Param2 Param31 v 11 v 21 NULL2 v 12 NULL NULL3 v 13 NULL v 334 v 14 v 24 NULL

ID Param11 v 112 v 123 v 134 v 14

ID Param21 v 214 v 24

ID Param33 v 33

Row Oriented (SQL) Column Oriented

CASSANDRA (Storage)

ID Param1 Param2 Param3 Param41 v 11 v 21 NULL NULL2 v 12 NULL NULL NULL3 v 13 NULL v 33 NULL4 v 14 v 24 NULL NULL5 v 15 NULL v 35 v 45

ID Param11 v 112 v 123 v 134 v 145 v 51

Column Oriented

ID Param45 v 45

Row Oriented (SQL)

CASSANDRA Partitions & Replication

ID Param1 Param2 Param3 Param4

1 v 11 v 21 NULL v 41

2 v 12 NULL NULL NULL

3 v 13 NULL v 33 v 43

4 v 14 v 24 NULL v 44

DC 1 (rep = 2)

ID Param1 Param2 Param3 Param4

1 v 11 v 21 NULL v 41

2 v 12 NULL NULL NULL

3 v 13 NULL v 33 v 43

4 v 14 v 24 NULL v 44

query ID = 2 or 4

DC 1 (rep = 2)

DC 1 (rep = 2) DC 2 (rep = 3)

DC 3 (rep = 2)

CASSANDRA Pros & Cons+ Store a lot of heterogenous columns without performance impact.+ Fully distributed (no Master-Slave): all nodes are equivalent.+ Easily scalable by adding new nodes.+ Open source, compatible with major operating systems.+ Easily expandable to the PB scale (Apple ex: 75’000 nodes).

- No relation between tables => No JOIN operations.- Not transactional (not important in our case).- Poor set of search and filter operations.

Solr / SolrCloud• Open source enterprise search platform.• Powerful indexer (full-text search, spatial, filtering, sort …) based on

Apache Lucene.• REST API with a lot of supported formats (JSON, CSV, Python,…).• SolrCloud is the distributed version of Solr.

DSE combine both in a single software … but it’s not free !!!

SQL NoSQLFixed schema, avoid column / table manipulations

Flexible schema, easily add new columns / parameters

Vertical scalability. Increase CPU, RAM and disks Horizontal scalability (add nodes)

JOIN operations No JOIN, no subqueriesTransactional operations Not transactional

Centralised approach. Load balance using Master-Slave

De-centralised approach. Naturally load balanced

Long history and experience (1970) New generation (~2000)

SQL vs NoSQL comparisons

Conclusion

• Is NoSQL better ? No !!! It’s different …• Consider NoSQL databases if you need to store heterogeneous data

and want a flexible data model.• NoSQL is Big Data oriented.• Consider NoSQL databases if availability and scalability is a strong

requirement.• SQL is still on the stage and very powerful (we still use Postgres)• Are VO standards ready for NoSQL ?

Thank you

NoSQL Databases - UMD

Documents

SQL vs. NoSQL Databases

A Survey on NoSQL Databases - ایران عرضه · A Survey on NoSQL Databases Santhosh Kumar Gajendran 1. Abstract NoSQL databases have gained popularity in the recent years and

NoSQL Databases Introduction - UTN 2013

NoSQL Databases Oracle - Berkeley DB

Test Automation for NoSQL Databases

NoSQL Databases and Monitoring - IT Monitoring Working Groupjeromebelleman.gitlab.io/talks/belleman-nosql-itmon.pdf · NoSQL Databases and Monitoring IT Monitoring Working Group J

NoSQL Databases - CouchDB

NoSQL: Graph Databases. Databases Why NoSQL Databases?

Dynamo and NoSQL Databases

A Comparison of SQL and NoSQL Databases Databases

Query mechanisms for NoSQL databases

Introduction to NOSQL Databases

NoSql Databases and Polyglot Applications

Traditional Databases vs NOSQL

Scaling the Web: Databases & NoSQL

Front Range PHP NoSQL Databases

Introduction to NoSQL Databases - Uppsala University to NoSQL Databases Tore Risch ... Mongo-AS is slightly slower ...

INFO-H415 - Advanced Databases NoSQL databases and Cassandracs.ulb.ac.be/public/_media/teaching/cassandra_2017.pdf · INFO-H415 - Advanced Databases NoSQL databases and Cassandra

TECHNOLOGY NEWS Will NoSQL Databases Live Up to Their … · NOSQL PROS AND CONS NoSQL databases have numerous advantages and disadvantages. Advantages NoSQL databases generally pro-cess