NOSQL Overview Tobias Lindaaker Software Developer @ Neo Technology twitter: @thobe / @neo4j / #neo4j email: [email protected] web: http://neo4j.org/ web: http://thobe.org/ CON6449


Presented at JavaOne 2013, Wednesday September 25.


Page 2: NOSQL Overview

Agenda

๏Key/Value Stores

๏Document Databases

๏NewSQL Databases

๏Graph Databases

๏Column Oriented Databases

๏Caches

๏Message Queues

๏Hadoop

Page 3: NOSQL Overview

General


Page 4: NOSQL Overview

Two main categories

๏Aggregate oriented
๏Graph

Distinction defined by Martin Fowler

Source: NoSQL Distilled

Page 5: NOSQL Overview

Trend: Less uniformity


Page 6: NOSQL Overview

Sparse data - Relational mismatch

[Table: wide, sparsely populated table; column headers α β γ δ ε ζ η θ ι κ λ μ, id, π, τ; row ids 1337, 2468, 3145, 3579, 4468, 7878]

entity  key  value
1337    a    lorem ipsum
1337    b    lorem ipsum
3145    b    lorem ipsum
3578    a    lorem ipsum
3579    f    lorem ipsum
3579    j    lorem ipsum
4468    c    lorem ipsum
4468    f    lorem ipsum
7878    g    lorem ipsum
7878    f    lorem ipsum

Page 7: NOSQL Overview

Sparse data - Relational mismatch

Data Table

id    data
1337  {"foo":"bar", ...}
2468  {"foo":"bar", ...}
3145  {"foo":"bar", ...}
3579  {"foo":"bar", ...}
4468  {"foo":"bar", ...}
7878  {"foo":"bar", ...}

Search Tables

id    foo
1337  bar
2468  baz
3145  quux
3579  quux
4468  waldo
7878  fred

id    bar
1337  foo
2468  baz
3145  quux
3579  quux
4468  waldo
7878  fred
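The data table / search table layout can be sketched in a few lines. This is only an illustration of the idea, not any particular database's implementation; the names `data_table`, `build_search_table` and `find_ids`, and the sample documents, are assumptions made up for the example.

```python
import json

# Data table: one opaque JSON document per id (sample values are made up).
data_table = {
    1337: json.dumps({"foo": "bar", "bar": "foo"}),
    2468: json.dumps({"foo": "baz", "bar": "baz"}),
    3145: json.dumps({"foo": "quux"}),  # sparse: this document has no "bar"
    3579: json.dumps({"foo": "quux", "bar": "quux"}),
}


def build_search_table(table, field):
    """Extract (id, value) rows for one field, skipping documents without it."""
    rows = []
    for doc_id, raw in sorted(table.items()):
        doc = json.loads(raw)
        if field in doc:
            rows.append((doc_id, doc[field]))
    return rows


def find_ids(search_table, value):
    """Look up ids by field value without parsing any JSON documents."""
    return [doc_id for doc_id, v in search_table if v == value]


foo_table = build_search_table(data_table, "foo")
bar_table = build_search_table(data_table, "bar")
quux_ids = find_ids(foo_table, "quux")
```

The search tables have to be kept in sync with the data table on every write, which is the price paid for storing the documents as opaque values.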

Page 8: NOSQL Overview

Trend: Exponential data growth

[Chart: x-axis 2005 2006 2007 2008 2009 2010 2011 2012]

Page 9: NOSQL Overview

Trend: Data becomes more connected

[Chart: connectedness over time]

Page 10: NOSQL Overview

Nothing is new - everything changes

Then
๏Navigational databases: IDS (Codasyl), IMS (IBM)
๏Multivalued databases: PICK/BASIC
๏Key/Value databases: MUMPS/M
๏COPYBOOK: COBOL
๏Object databases: Objectivity, db4o
๏XML databases

Now
๏Graph databases: Neo4j
๏Column databases: Cassandra
๏Key/Value databases: Couchbase
๏Document databases: MongoDB, Redis

Still recent enough to not have “new” counterparts...

Page 11: NOSQL Overview

Key/Value stores


Page 12: NOSQL Overview

Key/Value stores


๏Amazon SimpleDB

๏memcached

๏Oracle NoSQL Database

๏Redis

Page 13: NOSQL Overview

Key/Value stores

[Diagram: nodes A, B, C, D, E, F and G arranged in a ring]

Page 17: NOSQL Overview


Sample use case: Content sharing

Page 18: NOSQL Overview

Document Databases


Page 19: NOSQL Overview

Document Databases

๏Lotus Notes

๏MongoDB

๏Riak

๏Redis

๏CouchDB


Page 23: NOSQL Overview

Document Databases

‣ id: 99CC
‣ fname: John
‣ lname: Smith
‣ clock:
    ‣ type: Fob watch
    ‣ make: Gallifreyan
    ‣ diameter: 2”

‣ id: 1337
‣ fname: Martha
‣ lname: Jones
‣ occupation: MD

‣ id: 2468
‣ fname: Rose
‣ lname: Tyler
‣ in_love_with: 99CC
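As a rough sketch, the three documents could be represented as plain nested structures. No specific document database API is used here; the `people` mapping and the `resolve` helper are hypothetical names for the example. The point is that each document carries its own set of fields, and that a reference to another document is just an id the application has to follow itself.

```python
# Each document is a nested structure; no two need to share the same fields.
people = {
    "99CC": {
        "fname": "John",
        "lname": "Smith",
        "clock": {"type": "Fob watch", "make": "Gallifreyan", "diameter": '2"'},
    },
    "1337": {"fname": "Martha", "lname": "Jones", "occupation": "MD"},
    "2468": {"fname": "Rose", "lname": "Tyler", "in_love_with": "99CC"},
}


def resolve(doc_id, field):
    """Follow an id-valued field to the referenced document, or return None."""
    target_id = people[doc_id].get(field)
    return people.get(target_id)


loved = resolve("2468", "in_love_with")
```

Note that the store itself knows nothing about `in_love_with` being a reference; that interpretation lives in application code.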

Page 25: NOSQL Overview

Document Databases

post
  title: ___
  text: ___
  tags: [...]
  comments
    text: ___
    text: ___

Page 26: NOSQL Overview

The rise of REST for databases

๏It’s actually all about Hypermedia:
•When one aggregate root references another
•Not necessarily on the same host
•Hyperlinks provide the desired decoupling, and can reference documents qualified by host
๏HTTP, and the ease of developing client drivers for it, is a further driver

Page 27: NOSQL Overview

NewSQL


Page 28: NOSQL Overview

NewSQL defined

๏Relational databases with (primarily) a SQL interface that adopt the scaling benefits of NoSQL databases.

๏Automatic/Transparent sharding of data

๏Distributed, Fault Tolerant, Highly Available

Page 29: NOSQL Overview

NewSQL databases


๏Google Spanner

๏VoltDB

๏TokuDB (MySQL engine)

๏Clustrix

๏RethinkDB

Page 30: NOSQL Overview

Graph Databases


Page 31: NOSQL Overview

Neo4j is a Graph Database


Page 32: NOSQL Overview

(Neo4j) -[:IS_A]-> (Graph Database)

Page 33: NOSQL Overview

Example Graph Databases

๏Neo4j
๏InfiniteGraph (by Objectivity)
๏AllegroGraph (by Franz Inc.)
๏HyperGraphDB
๏InfoGrid
๏DEX
๏VertexDB
๏FlockDB

Page 41: NOSQL Overview

[Graph diagram: nodes connected by relationships labelled from, stole, companion, married, enemy, plays and in; two of the nodes are labelled "A Good Man Goes to War" and "Bad Wolf"]

Page 42: NOSQL Overview

Graph Databases


Page 45: NOSQL Overview

Querying Graph Databases (Neo4j)

Graph Patterns

[Diagram: A --LOVES--> B]

ASCII art

A -[:LOVES]-> B

START A=node:person(name="A")
MATCH A -[:LOVES]-> B
RETURN B as lover
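To make the pattern idea concrete, here is a minimal sketch of what matching `A -[:LOVES]-> B` means, written against a plain in-memory edge list. It does not use Neo4j or its drivers; the `edges` data and the `match` function are assumptions for illustration only.

```python
# Relationships as (start node, relationship type, end node) triples.
# The sample data is made up.
edges = [
    ("A", "LOVES", "B"),
    ("A", "KNOWS", "C"),
    ("C", "LOVES", "A"),
]


def match(start, rel_type):
    """Return every node b for which (start) -[:rel_type]-> (b) exists."""
    return [end for s, rel, end in edges if s == start and rel == rel_type]


lovers = match("A", "LOVES")
```

A graph database answers the same question by following relationships from the start node rather than scanning all edges, which is what makes the pattern cheap to evaluate on large graphs.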

Page 46: NOSQL Overview

Column Oriented Databases


Page 47: NOSQL Overview

Column Store


Page 48: NOSQL Overview

Column Oriented Databases

๏Cassandra

๏BigTable (internal at Google)

๏HBase (part of Hadoop)

๏Hypertable


Page 49: NOSQL Overview

Column DB - Classic example


Twitter clone

Page 50: NOSQL Overview

Column Databases

๏Use as underlying storage for a higher level data storage model
๏E.g. a graph database model implemented on top of Cassandra
•Notable example: Aurelius Titan

Page 51: NOSQL Overview

Caches


Page 52: NOSQL Overview

Caches - Improving Reads

๏Read from cache first, only read from DB on cache miss
๏Preferably cache aggregates, possibly after passing through App-level processing
๏memcached - mainly a cache, though there have been attempts to re-position it as a NOSQL DB
•as has been tried with other cache products
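The read path described here (cache first, DB only on a miss) can be sketched as follows. This is a minimal in-process illustration, assuming a hypothetical `load_from_db` function; a real deployment would put memcached or a similar product where the dictionary is, and would also need an eviction and invalidation strategy.

```python
class CacheAside:
    """Read from the cache first; only go to the DB on a cache miss."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # hypothetical function: key -> value
        self._cache = {}
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.misses += 1
        value = self._load(key)  # the expensive read
        self._cache[key] = value
        return value


# A dict stands in for the database; the cached value is a whole aggregate.
db = {"post:1": {"title": "Hello", "comments": ["first!"]}}
cache = CacheAside(db.__getitem__)
first = cache.get("post:1")
second = cache.get("post:1")
```

Only the first `get` reaches the database; the second is served from the cache.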

Page 53: NOSQL Overview

Message Queues


Page 54: NOSQL Overview

Message Queues - Improving Writes

๏Write to Queue, process work from Queue in batches
•Alleviates transactional overhead by grouping writes
•Still guarantees writes if the Queue has durability guarantees
•Needs tx synchronization with DB (2PC)
๏Writes not immediately visible, delayed through queue
•Write-to-cache can be used to get around this, if a cache is used

๏Amazon SQS

๏RabbitMQ

๏ZeroMQ
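A small sketch of the write path: writes go onto a queue and are later applied to the database in batches, one transaction per batch. The example uses Python's in-process `queue.Queue` and an in-memory SQLite database purely as stand-ins; the in-process queue has no durability guarantees, and the `events` table and function names are assumptions for the example.

```python
import queue
import sqlite3

write_queue = queue.Queue()  # stand-in for a durable message queue
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")


def enqueue(payload):
    """Accept a write; it is not visible in the DB until flushed."""
    write_queue.put(payload)


def flush(batch_size=100):
    """Drain up to batch_size queued writes and commit them in one transaction."""
    batch = []
    while len(batch) < batch_size:
        try:
            batch.append((write_queue.get_nowait(),))
        except queue.Empty:
            break
    if batch:
        with db:  # commits once for the whole batch
            db.executemany("INSERT INTO events (payload) VALUES (?)", batch)
    return len(batch)


for i in range(5):
    enqueue(f"event-{i}")
count_before = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
flushed = flush()
count_after = db.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Between `enqueue` and `flush` the rows are not yet in the database, which is the delayed visibility mentioned on the slide.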

Page 56: NOSQL Overview

Hadoop - Big Data processing

[Diagram: Oracle, Neo4j, Cassandra]

Page 58: NOSQL Overview


Hadoop - Big Data processing

MapReduce

Page 59: NOSQL Overview

Hadoop - Data Analysis/Processing

๏Batch process large amounts of data, typically offline or semi-online, not for interactive querying
๏Ingest data from your DB, process and generate report
•Ex. Read Neo4j graph, generate centrality analysis report
๏Ingest data from event stream, process and generate data for DB
•Ex. Read access logs, create Neo4j data for security analysis
๏Ingest data from one DB, process and generate data for another
•Ex. Read MySQL transaction logs, create Neo4j data for query acceleration
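The centrality example can be illustrated with a toy map/reduce written in plain Python. This is not Hadoop code; `map_reduce`, `map_phase` and `reduce_phase` are hypothetical helpers that only mimic the map, shuffle and reduce steps on a single machine, and degree centrality is chosen here simply because it is the easiest centrality measure to express this way.

```python
from collections import defaultdict

# Relationships as (start, end) pairs, as might be exported from a graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]


def map_phase(edge):
    """Emit (node, 1) for both endpoints of a relationship."""
    start, end = edge
    yield start, 1
    yield end, 1


def reduce_phase(node, counts):
    """Sum the emitted counts to get the node's degree."""
    return node, sum(counts)


def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())


degree = map_reduce(edges, map_phase, reduce_phase)
```

On a real cluster the map and reduce calls run in parallel across machines; the structure of the computation stays the same.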

Page 60: NOSQL Overview

More DB history


Page 61: NOSQL Overview

Building Databases is hard


๏The current NOSQL wave took off in 2009

๏ ... many much older databases still have issues...

๏Most likely there will be issues

๏https://github.com/aphyr/jepsen (by Kyle Kingsbury / @aphyr)

• ... most distributed databases fail in the event of Partitions

๏Test, Test, Test, and Test

•Test the database heavily before you put it in production

•Test for your use cases - generic benchmarks are useless

•Test with real load

•Test continuously

Page 62: NOSQL Overview

Serious Database Vendors take Data Seriously

๏Make sure to test their product under “real” load

๏Make sure to test their product in the event of failures

๏But you still need to Test!

๏Report issues to the vendor

๏Data loss is too embarrassing - will be fixed!

๏Performance is important - you’ll be heard!


Page 63: NOSQL Overview

Polyglot Persistence: combining multiple databases

Page 64: NOSQL Overview

Polyglot Persistence - Multiple DBs

๏Real world examples:
•RDBMS as system of record, Neo4j for accelerating (join) queries
•Neo4j for storing metadata and structure, Cassandra for storing event logs, S3 for storing BLOB data

Page 65: NOSQL Overview

Conclusion


Page 66: NOSQL Overview

It is all about modelling

Simplify the world enough
‣ to reason about
‣ to store and process

Page 67: NOSQL Overview

Model mis-match

Real World Model

Page 68: NOSQL Overview

Complex problem? - right tool for each job!

Image credits: Unknown :’(

Page 69: NOSQL Overview

Key/Value stores

๏Examples:

•Amazon SimpleDB, memcached, Oracle NoSQL, Redis

๏Use when Data is opaque

๏Scalability is important

๏Scale simply with the addition of more servers

• rebalance equally simply
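One common way to get this kind of simple scaling and rebalancing is consistent hashing. The slides do not name the technique, so treat that as an assumption; the `HashRing` class below is a hypothetical minimal sketch, not the implementation of any of the listed products. Each server is placed on a ring at many hashed positions, and a key belongs to the first server position at or after the key's own hash.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers, replicas=100):
        self._replicas = replicas  # virtual nodes per server
        self._ring = []            # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (_hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["A", "B", "C"])
before = {k: ring.server_for(k) for k in keys}
ring.add("D")  # scale out by one server
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

When server D joins, only the keys that D takes over change owner; everything else stays where it was, which is what keeps rebalancing cheap.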


Page 70: NOSQL Overview

Document Databases

๏Examples:

•MongoDB, Riak

๏Use when data is collections of similar entities

• But semi structured (sparse) rather than tabular

•When fields in entries have multiple values


Page 71: NOSQL Overview

Column Family Databases

๏Examples:

•Cassandra

๏Use when scalability is the main issue

•Both scaling size and scaling load

‣In particular scaling write load

๏Linear scalability (as you add servers) both in read and write

๏Low level - will require you to duplicate data to support queries
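The need to duplicate data can be illustrated with the Twitter-clone example from earlier in the deck. The sketch below uses plain dictionaries as stand-ins for column families; it is not Cassandra code, and the names `tweets_by_user`, `timeline_by_user` and `post_tweet` are assumptions. Each tweet is written once for every query it has to serve, because there are no joins at read time.

```python
from collections import defaultdict

# Two "column families", each mapping row key -> {column name: value}.
tweets_by_user = defaultdict(dict)
timeline_by_user = defaultdict(dict)
followers = {"alice": ["bob", "carol"]}  # made-up sample data


def post_tweet(author, timestamp, text):
    """Store the tweet under the author and copy it to each follower's timeline."""
    tweets_by_user[author][timestamp] = text
    for follower in followers.get(author, []):
        timeline_by_user[follower][(timestamp, author)] = text


def timeline(user):
    """Read a timeline from a single row, ordered by timestamp."""
    row = timeline_by_user[user]
    return [(author, row[(ts, author)]) for ts, author in sorted(row)]


post_tweet("alice", 1, "hello")
post_tweet("alice", 2, "world")
bob_timeline = timeline("bob")
```

Writes become more expensive (one copy per follower), while each read touches exactly one row, which fits a store optimised for write load.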


Page 72: NOSQL Overview

Graph Databases

๏Examples:

•Neo4j, DEX, InfiniteGraph

๏Use when (deep) traversals are important

๏For complex domains

๏When how entities relate is an important aspect of the domain
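As a rough illustration of what a (deep) traversal is, the sketch below walks an in-memory adjacency list breadth-first up to a given depth. It is not Neo4j's traversal API; the `graph` data and the `within` function are hypothetical names for the example.

```python
from collections import deque

# Adjacency list: node -> nodes it points to. The sample data is made up.
graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}


def within(start, max_depth):
    """Map each node reachable in at most max_depth hops to its hop count."""
    depth = {start: 0}
    todo = deque([start])
    while todo:
        node = todo.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                todo.append(neighbour)
    del depth[start]
    return depth


two_hops = within("a", 2)
```

In a relational schema each extra hop is typically another join; in a graph model it is one more step along stored relationships.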


Page 73: NOSQL Overview

When not to use a NOSQL Database

๏RDBMSes have been the de-facto standard for years, and still have better tools for some tasks

• Especially for reporting

๏When maintaining a system that works already

๏Sometimes when data is uniform / structured

๏When aggregations over (subsets of) the entire dataset are key

๏But please don’t use a Relational database for persisting objects


Page 74: NOSQL Overview

http://neotechnology.com

Questions?