Is multi-model the future of NoSQL? Max Neunhöffer SouthBay.NET Meetup, 5 March 2015

Is multi-model the future of NoSQL?

Max Neunhöffer

SouthBay.NET Meetup, 5 March 2015

www.arangodb.com

Max Neunhöffer

- I am a mathematician
- “Earlier life”: research in computer algebra (computational group theory)
- Always juggled with big data
- Now: working in database development, NoSQL, ArangoDB

I like:
- research,
- hacking,
- teaching,
- tickling the highest performance out of computer systems.

ArangoDB GmbH

triAGENS GmbH has offered consulting services since 2004:
- software architecture
- project management
- software development
- business analysis

- a lot of experience with specialised database systems
- have done NoSQL before the term was coined at all
- 2011/2012, an idea emerged: to build the database one had wished to have all those years!
- development of ArangoDB as open source software since 2012
- ArangoDB GmbH: spin-off to take care of ArangoDB (2014)

Document and Key/Value Stores

Document store
A document store stores a set of documents, which usually means JSON data; these sets are called collections. The database has access to the contents of the documents.
- each document in the collection has a unique key
- secondary indexes possible, leading to more powerful queries
- different documents in the same collection: structure can vary
- no schema is required for a collection
- database normalisation can be relaxed

Key/value store
Opaque values, only key lookup without secondary indexes:
⇒ high performance and perfect scalability
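The difference between the two models can be sketched in a few lines of Python. This is a toy illustration only, not how any real store is implemented; the class names, the product data and the session key are all made up for the example. The document collection can build a secondary index because it can read the documents, while the key/value store only ever sees opaque bytes.

```python
import json
from collections import defaultdict


class DocumentCollection:
    """Toy document collection: unique keys plus optional secondary indexes."""

    def __init__(self):
        self.docs = {}      # primary index: key -> document
        self.indexes = {}   # attribute name -> {value -> set of keys}

    def ensure_index(self, attribute):
        # Build a secondary index over the documents stored so far.
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if attribute in doc:
                index[doc[attribute]].add(key)
        self.indexes[attribute] = index

    def insert(self, key, doc):
        if key in self.docs:
            raise KeyError(f"duplicate key: {key}")
        self.docs[key] = doc
        # Keep every secondary index up to date.
        for attribute, index in self.indexes.items():
            if attribute in doc:
                index[doc[attribute]].add(key)

    def find_by(self, attribute, value):
        # Possible only because the store can look inside the documents.
        keys = self.indexes[attribute].get(value, set())
        return [self.docs[k] for k in sorted(keys)]


class KeyValueStore:
    """Toy key/value store: values are opaque bytes, lookup is by key only."""

    def __init__(self):
        self.data = {}

    def put(self, key, value: bytes):
        self.data[key] = value

    def get(self, key):
        return self.data[key]


products = DocumentCollection()
products.ensure_index("category")
products.insert("p1", {"name": "kettle", "category": "kitchen", "watts": 2000})
products.insert("p2", {"name": "novel", "category": "books"})  # different structure, no schema
products.insert("p3", {"name": "pan", "category": "kitchen"})
kitchen = products.find_by("category", "kitchen")

carts = KeyValueStore()
carts.put("session-42", json.dumps(["p1", "p3"]).encode())
cart = json.loads(carts.get("session-42"))
```

The key/value store cannot answer "all kitchen products" without the application decoding every value itself, which is the price paid for its simplicity.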

Graph databases

Graph database
A graph database stores a labelled graph. Vertices and edges can be documents. Graphs are good for modelling relations.
- graphs often describe data very naturally (e.g. the Facebook friendship graph)
- graphs can be stored using tables; however, graph queries notoriously lead to expensive joins
- there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”
- need a good query language to reap the benefits
- horizontal scalability is troublesome
- graph databases vary widely in scope and usage, no standard
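As a rough illustration of the two graph algorithms named above, here is a minimal Python sketch on an in-memory adjacency list. The friendship data and function names are invented for the example; a graph database would run such algorithms inside the engine rather than in application code.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in graph[vertex]:
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None


def neighbourhood(graph, start, depth):
    """All vertices reachable from start in at most `depth` steps, excluding start."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {n for v in frontier for n in graph[v]} - seen
        seen |= frontier
    return seen - {start}


path = shortest_path(friends, "alice", "erin")
near = neighbourhood(friends, "alice", 2)
```

With tables, each step of such a traversal becomes another self-join, which is why the path length matters so much for relational storage.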

Column-oriented data stores

Column-oriented data stores
A column-oriented database stores tables but “keeps columns together” rather than rows.
- access to a whole column is fast
- sparse rows are handled efficiently
- particularly good for certain types of data analysis
- often implemented in a key/value-like fashion
- row access can be slow
- columns have homogeneous data, so compression works well
- prominent examples: C-Store and Cassandra
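A small Python sketch can make the trade-offs concrete. The table contents are made up, and run-length encoding is used here only as one simple example of a compression scheme that benefits from homogeneous columns; real column stores use more elaborate layouts.

```python
# Hypothetical table in row-oriented form.
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 25},
    {"id": 3, "country": "US", "amount": 7},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Analytic query: touches a single column only.
total = sum(columns["amount"])


def run_length_encode(values):
    """Compress runs of equal values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


compressed = run_length_encode(columns["country"])


def fetch_row(columns, position):
    """Row access has to visit every column, which is why it can be slow."""
    return {name: values[position] for name, values in columns.items()}
```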

Massively parallel: map-reduce and friends

The area of massively parallel databases
A massively parallel database can use thousands of servers distributed all over the world and still appears as a single service.
- humongous data capacity and very high read/write performance
- examples are Apache Cassandra, Apache Hadoop, Google’s Spanner, Riak and others
- these systems have important use cases, in particular in the analytic domain
- query capabilities are somewhat limited, for example to only “map/reduce”

⇒ good horizontal scalability at the cost of reduced query flexibility
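To show what a “map/reduce”-only query interface looks like, here is a single-process Python sketch of the three phases. The order records and function names are assumptions for the example; in a real system the map and reduce phases run in parallel on many servers and the shuffle moves data between them.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input records.
orders = [
    {"customer": "c1", "total": 30},
    {"customer": "c2", "total": 5},
    {"customer": "c1", "total": 12},
]


def map_phase(record):
    """Emit (key, value) pairs for one record; needs no other record."""
    yield record["customer"], record["total"]


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key; needs no other key."""
    return key, reduce(lambda a, b: a + b, values)


pairs = [pair for record in orders for pair in map_phase(record)]
revenue = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each phase works on records or keys independently, the work distributes well, but anything that does not fit this shape (such as a join or a traversal) is awkward to express.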

Polyglot Persistence

Idea
Use the right data model for each part of a system.

For an application, persist
- an object or structured data as a JSON document,
- a hash table in a key/value store,
- relations between objects in a graph database,
- a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use a column-oriented database.

Take scalability needs into account!

A typical use case: an online shop

We need to hold
- customer data: usually homogeneous, but still with variations
  ⇒ use a relational DB: MySQL
- product data: even for a specialised business quite inhomogeneous
  ⇒ use a document store
- shopping carts: need very fast lookup by session key
  ⇒ use a key/value store
- order and sales data: relate customers and products
  ⇒ use a document store
- recommendation engine data: links between different entities
  ⇒ use a graph database

Polyglot Persistence is nice, but ...

Consequence: One needs multiple database systems in the persistence layer of a single project!

Polyglot persistence introduces some friction through
- data synchronisation,
- data conversion,
- increased installation and administration effort,
- more training needs.

Wouldn’t it be nice ...
... to enjoy the benefits without the disadvantages?

The Multi-Model Approach

Multi-model database
A multi-model database combines a document store with a graph database and is at the same time a key/value store. Vertices are documents in a vertex collection, edges are documents in an edge collection.
- a single, common query language for all three data models
- is able to compete with specialised products on their turf
- allows for polyglot persistence using a single database
- queries can mix the different data models
- can replace an RDBMS in many cases
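The idea that vertices and edges are both just documents can be sketched in Python with plain dictionaries. The collections, keys and attributes below are invented; the `_from` and `_to` attribute names follow the convention ArangoDB uses for edge documents, but this in-memory toy says nothing about how the database stores or queries them.

```python
# Vertex collection: ordinary documents addressed by key
# (looking one up by key is the key/value view of the same data).
people = {
    "ann": {"name": "Ann", "city": "Cologne"},
    "ben": {"name": "Ben", "city": "Aachen"},
    "cid": {"name": "Cid", "city": "Cologne"},
}

# Edge collection: documents too, each naming its two end points.
knows = [
    {"_from": "ann", "_to": "ben", "since": 2010},
    {"_from": "ann", "_to": "cid", "since": 2014},
    {"_from": "ben", "_to": "cid", "since": 2012},
]


def friends_in_city(start, city):
    """Mixed query: follow edges (graph), then filter on document attributes."""
    targets = [edge["_to"] for edge in knows if edge["_from"] == start]
    return sorted(key for key in targets if people[key]["city"] == city)


lookup = people["ben"]                      # key/value access
result = friends_in_city("ann", "Cologne")  # graph plus document access
```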

Why is this possible at all?

Document stores and key/value stores

- Document stores: have primary key, are key/value stores.
- Without using secondary indexes, performance is nearly as good as with opaque data instead of JSON.
- Good horizontal scalability can be achieved for key lookups.

Why is this possible at all?

Document stores and graph databases

Graph database: would like to associate arbitrary data with vertices and edges, so JSON documents are a good choice.

- A good edge index, giving fast access to neighbours. This can be a secondary index.
- Graph support in the query language.
- Implementations of graph algorithms in the DB engine.
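An edge index can be pictured as a pair of hash tables over the edge documents, one keyed by the source vertex and one by the target. The following Python sketch assumes edge documents with `_key`, `_from` and `_to` attributes and made-up data; it is a conceptual illustration, not a description of ArangoDB's actual index implementation.

```python
from collections import defaultdict

# Hypothetical edge documents.
edges = [
    {"_key": "e1", "_from": "a", "_to": "b"},
    {"_key": "e2", "_from": "a", "_to": "c"},
    {"_key": "e3", "_from": "c", "_to": "a"},
]


def build_edge_index(edges):
    """Index every edge twice: by its source and by its target vertex."""
    outbound = defaultdict(list)
    inbound = defaultdict(list)
    for edge in edges:
        outbound[edge["_from"]].append(edge)
        inbound[edge["_to"]].append(edge)
    return outbound, inbound


outbound, inbound = build_edge_index(edges)


def neighbours(vertex, direction="any"):
    """Neighbours of a vertex via index lookups, without scanning all edges."""
    found = set()
    if direction in ("outbound", "any"):
        found.update(e["_to"] for e in outbound.get(vertex, []))
    if direction in ("inbound", "any"):
        found.update(e["_from"] for e in inbound.get(vertex, []))
    return found
```

Each traversal step then costs a lookup proportional to the number of neighbours rather than to the size of the edge collection.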

A Map of the NoSQL Landscape

[Figure: map of the NoSQL landscape. Labels: transaction processing DBs, analytic processing DBs, map/reduce, column stores, extensibility, documents, massively distributed, graphs, structured data, key/value, complex queries.]

Use case: Aircraft fleet management

One of our customers uses ArangoDB to
- store each part, component, unit or aircraft as a document
- model containment as a graph
- thus can easily find all parts of some component
- keep track of maintenance intervals
- perform queries orthogonal to the graph structure
- thereby getting good efficiency for all needed queries
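The two kinds of queries in this use case can be sketched as follows. All part names, dates and attribute names are hypothetical and have nothing to do with the customer's real data model; the point is only that "all parts of some component" follows the containment graph, while "what is due for maintenance" ignores it.

```python
# Hypothetical documents: one per part, component or aircraft.
parts = {
    "aircraft-1": {"type": "aircraft", "next_check": "2015-09-01"},
    "engine-L": {"type": "component", "next_check": "2015-04-15"},
    "fan-L": {"type": "part", "next_check": "2015-03-20"},
    "pump-L": {"type": "part", "next_check": "2016-01-10"},
    "gear-N": {"type": "component", "next_check": "2015-03-10"},
}

# Hypothetical containment graph: parent key -> directly contained keys.
contains = {
    "aircraft-1": ["engine-L", "gear-N"],
    "engine-L": ["fan-L", "pump-L"],
}


def all_parts_of(key):
    """Graph query: everything contained in `key`, at any depth."""
    found, stack = [], list(contains.get(key, []))
    while stack:
        child = stack.pop()
        found.append(child)
        stack.extend(contains.get(child, []))
    return sorted(found)


def due_before(date):
    """Query orthogonal to the graph: select documents by maintenance date."""
    # ISO dates compare correctly as strings.
    return sorted(k for k, doc in parts.items() if doc["next_check"] < date)


engine_parts = all_parts_of("engine-L")
due = due_before("2015-04-01")
```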

Use case: Family tree management

For genealogy, the natural object is a family tree.
- data naturally comes as a (directed) graph
- many queries are traversals or shortest path
- but not all, for example:
  - “all people with name James” in a family tree, sorted by birthday
  - “all family members who studied at Berkeley”, sorted by number of children
- quite often, queries mixing the different models are useful
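The first example query mixes both models: the tree is collected by a traversal, and the result is then filtered and sorted on document attributes. A minimal Python sketch, with invented people and an assumed `born` attribute holding ISO dates:

```python
# Hypothetical person documents.
people = {
    "p1": {"name": "James", "born": "1901-05-02"},
    "p2": {"name": "Mary", "born": "1925-11-30"},
    "p3": {"name": "James", "born": "1950-01-17"},
    "p4": {"name": "James", "born": "1948-07-04"},
    "p5": {"name": "James", "born": "1930-03-03"},  # not in the tree of p1
}

# Hypothetical directed parent -> children edges.
children = {"p1": ["p2"], "p2": ["p3", "p4"]}


def descendants(key):
    """Graph part: all descendants of one person, depth first."""
    result = []
    for child in children.get(key, []):
        result.append(child)
        result.extend(descendants(child))
    return result


def named_in_tree(root, name):
    """Document part: filter the tree members by name, sort by birthday."""
    members = [root] + descendants(root)
    matching = [k for k in members if people[k]["name"] == name]
    return sorted(matching, key=lambda k: people[k]["born"])


james = named_in_tree("p1", "James")
```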

Use case: knowledge bases

- encode nearly arbitrary knowledge
- often produced by machine learning
- queried in very complex ways by expert systems
- often in connection to an inference engine
- need linked data with lots of associations
- typical queries have unpredictable path length, thus graph queries shine
- nevertheless, often queries orthogonal to the links are needed

Recently: Key/Value stores adding other models

- Riak (by Basho), originally a key/value store, adds support for documents with their 2.0 version (late 2014)
- Redis (sponsored by Pivotal), originally an in-memory key/value store, has over time added more data types and more complex operations
- FoundationDB (by FoundationDB) is a key/value store, but is now marketed as a multi-model database by adding additional layers on top
- OrientDB (by Orient Technologies) started as an object database and nowadays calls itself a multi-model database

Recently: DataStax acquired Aurelius

In February 2015, DataStax (which offers a commercialised version of Cassandra, a column-oriented store) announced the acquisition of Aurelius, the company behind TitanDB (a distributed graph database on top of Cassandra).

In their own words:
- “Bringing Graph Database Technology To Cassandra.”
- “Will deliver massively scalable, always-on graph database technology.”
- “Will simplify the adoption of leading NoSQL technologies to support multi-model use case environments.”

Recently: MongoDB 3.0 adds pluggable DB engine

MongoDB is one of the most popular document stores. In February 2015, they announced their 3.0 version, to be released in March, featuring
- a pluggable storage engine layer
- transparent on-disk compression
- etc.

This indicates their interest in supporting more data models than “just documents”.
It will be very interesting indeed to see if and how they extend their query language ...

ArangoDB
- is a multi-model database (document store & graph database),
- is open source and free (Apache 2 license),
- offers convenient queries (via HTTP/REST and AQL),
- is memory efficient by shape detection,
- uses JavaScript throughout (Google’s V8 built into server),
- has an API extensible by JavaScript code in the Foxx framework,
- offers many drivers for a wide range of languages,
- is easy to use with web front end and good documentation,
- enjoys good professional as well as community support
- and has sharding since Version 2.0.

Configurable consistency

ArangoDB offers
- atomic and isolated CRUD operations for single documents,
- transactions spanning multiple documents and multiple collections,
- snapshot semantics for complex queries,
- very secure durable storage using append only and storing multiple revisions,
- all this for documents as well as for graphs.

In the near future, ArangoDB will
- implement complete MVCC semantics to allow for lock-free concurrent transactions
- and offer the same ACID semantics even with sharding.

Replication and Sharding: horizontal scalability

Right now, ArangoDB provides
- easy setup of (asynchronous) replication,
- which allows read access parallelisation (master/slaves setup),
- sharding with automatic data distribution to multiple servers.

Very soon, ArangoDB will feature
- fault tolerance by automatic failover and synchronous replication in cluster mode,
- zero administration by a self-repairing and self-balancing cluster architecture,
- full integration with Apache Mesos and Mesosphere.
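Automatic data distribution is commonly done by hashing a shard key. The Python sketch below shows that general technique with three in-memory "servers"; the hash function, the number of shards and the key names are assumptions for illustration, and it is not a description of how ArangoDB itself assigns documents to shards.

```python
import hashlib


def shard_for(key, number_of_shards):
    """Map a document key to a shard number, deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


# Three hypothetical servers, each holding one shard.
servers = {shard: {} for shard in range(3)}


def put(key, document):
    servers[shard_for(key, len(servers))][key] = document


def get(key):
    # A key lookup goes straight to the one server responsible for the key.
    return servers[shard_for(key, len(servers))].get(key)


for i in range(100):
    put(f"user-{i}", {"n": i})
```

Key lookups touch exactly one server, which is why they scale well; queries that are not restricted to one key have to be sent to all shards and merged.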

Powerful query language: AQL

The built-in Arango Query Language AQL allows
- complex, powerful and convenient queries,
- with transaction semantics,
- allowing joins,
- with user-definable functions (in JavaScript).
AQL is independent of the driver used and offers protection against injections by design.

For Version 2.3, we have reengineered the AQL query engine:
- use a C++ implementation for high performance,
- optimise distributed queries in the cluster.
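One way the injection protection shows up in practice is through bind parameters: the query text contains placeholders such as `@name`, and the values are sent separately. The Python sketch below only builds the JSON body of such a request, using the `query` and `bindVars` fields that ArangoDB's HTTP cursor API accepts to my understanding; the collection `people`, its attributes and the helper `cursor_request` are made up for the example, and nothing is sent to a server.

```python
import json


def cursor_request(query, bind_vars):
    """Build a JSON request body; values travel separately from the query text."""
    return json.dumps({"query": query, "bindVars": bind_vars})


# The query text is fixed; @name is a bind parameter, not string concatenation.
query = "FOR p IN people FILTER p.name == @name SORT p.born RETURN p"

# A hostile value stays a plain string value and never becomes query text.
hostile = 'James" || true || "'
body = cursor_request(query, {"name": hostile})
decoded = json.loads(body)
```

Because the same query string and bind variables are sent regardless of the client library, the query is independent of the driver used.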

Extensible through JavaScript and Foxx

The HTTP API of ArangoDB
- can be extended by user-defined JavaScript code,
- that is executed in the DB server for high performance.
- This is formalised by the Foxx microservice framework, which allows implementing complex, user-defined APIs with direct access to the DB engine.
- Very flexible and secure authentication schemes can be implemented conveniently by the user in JavaScript.
- Because JavaScript runs everywhere (in the DB server as well as in the browser), one can use the same libraries in the back-end and in the front-end.

⇒ implement your own microservices

The Future of NoSQL: My Observations

I observe
- 2 decades ago the most versatile solutions eventually dominated the relational DB market (Oracle, MySQL, PostgreSQL),
- the rise of the polyglot persistence idea
- a trend towards multi-model databases
- specialised products broadening their scope
- even relational systems add support for JSON documents
- DevOps gaining influence (Docker phenomenon)

The Future of NoSQL: My Predictions

In 5 years’ time ...
- the default approach is to use a multi-model database,
- the big vendors will all add other data models,
- the NoSQL solutions will conquer a sizable portion of what is now dominated by the relational model,
- specialised products will only survive if they find a niche.

Links

https://www.arangodb.com

http://guesser.9hoeffer.de:8000

https://github.com/ArangoDB/guesser

https://github.com/triAGENS/ArangoDB-NET
