Modelling Data in Neo4j, plus a few best practices and lessons learned, by Michal Bachman (GraphAware™)

Modelling Data in Neo4j (plus a few tips)

DESCRIPTION

Modelling Data in Neo4j, bidirectional relationships, qualifying relationships with properties vs. relationship types (performance comparison), Neo4j hardware sizing, Cypher vs. Java API

Contents

Quick intro
1x mistake
1x experiment
1x FAQ
1x case-study

Data Has Changed

Larger Volumes
Less Structured
More Interconnected
Polyglot Persistence

NoSQL

Key-Value Stores
Column-Family Stores
Document Databases
Graph Databases

Graph Databases

The first three use aggregate data models; graph databases work with simple records and complex interconnections.

Neo4j

Open-source
Schema-less
JVM-based
Fully ACID

Property Graph

[Figure: an example property graph built around the film Pulp Fiction.]

Nodes:
name: "Drama", type: "genre"
name: "Thriller", type: "genre"
name: "Pulp Fiction", year: 1994, type: "movie"
name: "Quentin Tarantino", type: "person"
name: "Samuel L. Jackson", type: "person"
name: "Director", type: "occupation"
name: "Actor", type: "occupation"

Relationships:
DIRECTED
IS_OF_GENRE (two)
IS_A (three)
ACTED_IN (two, with properties role: "Jules Winnfield" and role: "Jimmie Dimmick")
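As a rough illustration of the property graph model on this slide, the following plain-Java sketch builds a fragment of the Pulp Fiction example in memory. The classes `PropertyGraphSketch`, `Node` and `Relationship` are hypothetical and are not Neo4j's API; they only show that nodes and relationships both carry key-value properties and that every relationship has a type, a start node and an end node. Which nodes each relationship connects is assumed here, since the slide text does not preserve it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropertyGraphSketch {

    // Hypothetical node type: nothing more than a bag of key-value properties.
    static class Node {
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // Hypothetical relationship type: a typed, directed edge that can carry properties too.
    static class Relationship {
        final Node start;
        final String type;
        final Node end;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }

        Relationship with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // Builds a fragment of the graph from the slide (the endpoints are an assumption).
    static List<Relationship> buildExample() {
        Node movie = new Node().with("name", "Pulp Fiction").with("year", 1994).with("type", "movie");
        Node thriller = new Node().with("name", "Thriller").with("type", "genre");
        Node tarantino = new Node().with("name", "Quentin Tarantino").with("type", "person");
        Node jackson = new Node().with("name", "Samuel L. Jackson").with("type", "person");

        List<Relationship> relationships = new ArrayList<>();
        relationships.add(new Relationship(tarantino, "DIRECTED", movie));
        relationships.add(new Relationship(movie, "IS_OF_GENRE", thriller));
        relationships.add(new Relationship(jackson, "ACTED_IN", movie).with("role", "Jules Winnfield"));
        return relationships;
    }

    public static void main(String[] args) {
        for (Relationship r : buildExample()) {
            System.out.println(r.start.properties.get("name") + " -[" + r.type + " " + r.properties + "]-> "
                    + r.end.properties.get("name"));
        }
    }
}
```

The role lives on the ACTED_IN relationship rather than on either node, which is the modelling choice this slide's graph makes.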

Traversal

[Figure: the Pulp Fiction property graph from the previous slide, shown again to illustrate traversal.]

Modelling Data as Graphs

There is no single correct way.

One Way

[Figure: the Pulp Fiction property graph again: genres and occupations are separate nodes, and each role is a property of an ACTED_IN relationship.]

Another Way

[Figure: an alternative model of the same data.]

Nodes:
name: "Pulp Fiction", year: 1994, type: "movie", genres: "Drama", "Thriller"
name: "Quentin Tarantino", type: "person", occupation: "Actor", "Director"
name: "Samuel L. Jackson", type: "person", occupation: "Actor"
name: "Jules Winnfield", type: "role"
name: "Jimmie Dimmick", type: "role"

Relationships:
DIRECTED
ACTED_AS (two)
CHARACTER_IN (two)

Bidirectional Relationships

a common mistake

Ice Hockey

[Figure: Czech Republic and Sweden connected by a DEFEATED relationship.]

Ice Hockey (Implied Relationship)

[Figure: Czech Republic and Sweden connected by a DEFEATED relationship and, in the opposite direction, the implied DEFEATED_BY relationship.]

Traversals

In Neo4j, the speed of traversal does not depend on the direction of the relationships being traversed.

Why?

Neo4j Data Layout

Node Record in the Node Store (9 bytes), first bit = inUse flag:
next relationship (35 bits)
next property (36 bits)

Relationship Record in the Relationship Store (33 bytes), first bit = inUse flag, second bit unused:
first node (35 bits)
second node (35 bits)
type (16 bits)
first node's previous relationship (35 bits)
first node's next relationship (35 bits)
second node's previous relationship (35 bits)
second node's next relationship (35 bits)
next property (36 bits)
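The record layout suggests why direction does not affect traversal speed: each relationship record stores both of its nodes and a "next relationship" pointer for each of them. The following plain-Java sketch is a heavily simplified, hypothetical model of that idea (it drops the "previous" pointers, types, properties and the on-disk encoding, and the class and method names are invented for illustration). It is not Neo4j's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class RelationshipChainSketch {
    static final int NONE = -1;

    // Simplified node record: only the pointer to the first relationship in its chain.
    static class NodeRecord {
        int firstRel = NONE;
    }

    // Simplified relationship record: both nodes plus one "next" pointer per node.
    static class RelRecord {
        final int firstNode;
        final int secondNode;
        int firstNodeNextRel = NONE;
        int secondNodeNextRel = NONE;

        RelRecord(int firstNode, int secondNode) {
            this.firstNode = firstNode;
            this.secondNode = secondNode;
        }
    }

    final List<NodeRecord> nodes = new ArrayList<>();
    final List<RelRecord> rels = new ArrayList<>();

    int createNode() {
        nodes.add(new NodeRecord());
        return nodes.size() - 1;
    }

    // Inserts the new relationship at the head of both nodes' chains.
    int createRelationship(int from, int to) {
        RelRecord rel = new RelRecord(from, to);
        int id = rels.size();
        rel.firstNodeNextRel = nodes.get(from).firstRel;
        rel.secondNodeNextRel = nodes.get(to).firstRel;
        rels.add(rel);
        nodes.get(from).firstRel = id;
        nodes.get(to).firstRel = id;
        return id;
    }

    // Walks a node's chain; outgoing and incoming relationships cost the same pointer hops.
    List<Integer> neighbours(int node) {
        List<Integer> result = new ArrayList<>();
        int relId = nodes.get(node).firstRel;
        while (relId != NONE) {
            RelRecord rel = rels.get(relId);
            if (rel.firstNode == node) {
                result.add(rel.secondNode);
                relId = rel.firstNodeNextRel;
            } else {
                result.add(rel.firstNode);
                relId = rel.secondNodeNextRel;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        RelationshipChainSketch store = new RelationshipChainSketch();
        int czechRepublic = store.createNode();
        int sweden = store.createNode();
        store.createRelationship(czechRepublic, sweden); // a single directed relationship
        System.out.println(store.neighbours(czechRepublic)); // reached from the start node
        System.out.println(store.neighbours(sweden));        // reached from the end node
    }
}
```

In this sketch a single stored relationship is found from either end by the same chain walk, so a second relationship in the opposite direction adds storage without making the reverse lookup any cheaper.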

Company Partnership (Naturally Bidirectional)

[Figure: Neo Technology and GraphAware connected by two PARTNER relationships, one in each direction.]

[Figure: Neo Technology and GraphAware connected by a single PARTNER relationship.]

Why?

Neo4j APIs allow developers to completely ignore relationship direction when querying the graph.

Cypher

MATCH (neo)-[:PARTNER]->(partner)

MATCH (neo)<-[:PARTNER]-(partner)

MATCH (neo)-[:PARTNER]-(partner)

Qualifying Relationships

performance comparison

Qualifying by Properties

[Figure: Michal, Mark and Daniela each connected to Pulp Fiction by a RATED relationship; the relationships carry the properties rating: 5, rating: 1 and rating: 4.]

Who liked Pulp Fiction? (Cypher)

START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:RATED]-(fan)
WHERE r.rating > 3
RETURN fan

Who liked Pulp Fiction? (Java)

for (Relationship r : pulpFiction.getRelationships(INCOMING, RATED)) {
    if ((int) r.getProperty("rating") > 3) {
        Node fan = r.getStartNode(); //do something with it
    }
}

Qualifying by Relationship Type

[Figure: Michal, Mark and Daniela each connected to Pulp Fiction by a relationship whose type carries the rating: LOVED, HATED or LIKED.]

Who liked Pulp Fiction? (Cypher)

START pulpFiction=node({id})
MATCH (pulpFiction)<-[r:LIKED|LOVED]-(fan)
RETURN fan

Who liked Pulp Fiction? (Java)

for (Relationship r : pF.getRelationships(INCOMING, LIKED, LOVED)) {
    Node fan = r.getStartNode(); //do something with it
}
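To make the difference between the two models concrete without a running database, here is a small self-contained Java sketch. The `Rel` class and both helper methods are hypothetical stand-ins, not Neo4j's API, and the assignment of ratings to people is an assumption made for the example. It only shows what each approach has to look at: the property model reads and compares a property on every RATED relationship, while the type model decides on the relationship type alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QualifyingRelationshipsSketch {

    // Hypothetical incoming relationship of the Pulp Fiction node.
    static class Rel {
        final String fan;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Rel(String fan, String type) {
            this.fan = fan;
            this.type = type;
        }
    }

    // Helper for model 1: a RATED relationship with a numeric "rating" property.
    static Rel rated(String fan, int rating) {
        Rel r = new Rel(fan, "RATED");
        r.properties.put("rating", rating);
        return r;
    }

    // Model 1: one RATED type, qualified by a property that must be read and compared.
    static List<String> fansByProperty(List<Rel> incoming) {
        List<String> fans = new ArrayList<>();
        for (Rel r : incoming) {
            if (r.type.equals("RATED") && (int) r.properties.get("rating") > 3) {
                fans.add(r.fan);
            }
        }
        return fans;
    }

    // Model 2: the qualification is the relationship type itself; no property is read.
    static List<String> fansByType(List<Rel> incoming) {
        List<String> fans = new ArrayList<>();
        for (Rel r : incoming) {
            if (r.type.equals("LIKED") || r.type.equals("LOVED")) {
                fans.add(r.fan);
            }
        }
        return fans;
    }

    public static void main(String[] args) {
        // Who gave which rating is assumed for illustration only.
        List<Rel> byProperty = Arrays.asList(rated("Michal", 5), rated("Mark", 1), rated("Daniela", 4));
        List<Rel> byType = Arrays.asList(
                new Rel("Michal", "LOVED"), new Rel("Mark", "HATED"), new Rel("Daniela", "LIKED"));

        System.out.println(fansByProperty(byProperty));
        System.out.println(fansByType(byType));
    }
}
```

Both methods return the same fans for equivalent data; the sketch says nothing about timings, which depend on how the database stores and caches relationships and properties.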

Winner!

[Figure: the model qualified by relationship type: Michal, Mark and Daniela connected to Pulp Fiction by relationships of types LOVED, HATED and LIKED.]

Other interesting info?

Hardware Sizing

frequently asked question

Neo4j Architecture

[Figure: layered architecture diagram. Labels: HDD; Record Files; Nodes; Relationships; Properties; Relationship Types; Transaction Log; Operating System; File System Cache; JVM; Neo4j; Object Cache; Core API; Other APIs; Transaction Management.]

Disk Space

> cd data
> ls -ah

drwxr-xr-x  5 bachmanm  wheel  170B 19 Oct 12:56 index
-rw-r--r--  1 bachmanm  wheel   31K 19 Oct 12:56 messages.log
-rw-r--r--  1 bachmanm  wheel   69B 19 Oct 12:56 neostore
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.id
-rw-r--r--  1 bachmanm  wheel  8.8K 19 Oct 12:56 neostore.nodestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.nodestore.db.id
-rw-r--r--  1 bachmanm  wheel   39M 19 Oct 12:56 neostore.propertystore.db
-rw-r--r--  1 bachmanm  wheel  153B 19 Oct 12:56 neostore.propertystore.db.arrays
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.arrays.id
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.id
-rw-r--r--  1 bachmanm  wheel   43B 19 Oct 12:56 neostore.propertystore.db.index
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.propertystore.db.index.keys
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.index.keys.id
-rw-r--r--  1 bachmanm  wheel  154B 19 Oct 12:56 neostore.propertystore.db.strings
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.propertystore.db.strings.id
-rw-r--r--  1 bachmanm  wheel   31M 19 Oct 12:56 neostore.relationshipstore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshipstore.db.id
-rw-r--r--  1 bachmanm  wheel   38B 19 Oct 12:56 neostore.relationshiptypestore.db
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.id
-rw-r--r--  1 bachmanm  wheel  140B 19 Oct 12:56 neostore.relationshiptypestore.db.names
-rw-r--r--  1 bachmanm  wheel    9B 19 Oct 12:56 neostore.relationshiptypestore.db.names.id

Record sizes:
node 9B
relationship 33B
property 41B

Disk Space (Example)

1,000 nodes x 9B = 8.8 kB
1,000,000 rels x 33B = 31.5 MB
2,010,000 props x 41B = 78.6 MB
TOTAL 110.1 MB
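The arithmetic on this slide can be reproduced with a few lines of Java. This is a minimal sketch using the record sizes quoted in the slides; the class and method names are made up for the example, and it assumes 1 MB = 1024 x 1024 bytes, which is the convention that matches the slide's rounded figures.

```java
public class DiskSpaceEstimate {

    // Record sizes in bytes, as quoted on the slides.
    static final long NODE_BYTES = 9;
    static final long RELATIONSHIP_BYTES = 33;
    static final long PROPERTY_BYTES = 41;

    // Raw store size: one fixed-size record per node, relationship and property.
    static long totalBytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    // Assumes binary megabytes (1024 * 1024 bytes).
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long total = totalBytes(1_000, 1_000_000, 2_010_000);
        System.out.printf("%d bytes, about %.1f MB%n", total, toMegabytes(total));
    }
}
```

For the slide's example the total is 115,419,000 bytes, which rounds to the 110.1 MB shown above. Index files, logs and the string and array stores are not covered by this estimate.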

Low Level Cache

How about low level cache? Any guesses?

Same as disk space

High Level Cache

node 344B
relationship 208B
property 116B
...
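Applying these per-object sizes to the earlier disk space example gives a feel for how much more memory the high level cache needs than the record files. The sketch below is only derived arithmetic, not a figure from the slides: it assumes the whole example graph (1,000 nodes, 1,000,000 relationships, 2,010,000 properties) is cached, ignores whatever the trailing "..." on the slide stands for, and uses invented class and method names.

```java
public class ObjectCacheEstimate {

    // Per-object sizes in bytes, as quoted on the slide.
    static final long NODE_BYTES = 344;
    static final long RELATIONSHIP_BYTES = 208;
    static final long PROPERTY_BYTES = 116;

    // Lower bound for caching every node, relationship and property as an object.
    static long totalBytes(long nodes, long relationships, long properties) {
        return nodes * NODE_BYTES
                + relationships * RELATIONSHIP_BYTES
                + properties * PROPERTY_BYTES;
    }

    // Assumes binary megabytes (1024 * 1024 bytes).
    static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long total = totalBytes(1_000, 1_000_000, 2_010_000);
        System.out.printf("%d bytes, about %.0f MB%n", total, toMegabytes(total));
    }
}
```

Under these assumptions the estimate is 441,504,000 bytes, roughly 421 MB, or about four times the 110.1 MB of record files for the same graph.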

Other interesting info?

Java API vs. Cypher

case study

Data Model

[Figure: User 1, User 2, User 3 and User 4 connected by TRAVELLED_WITH, TRAVELLED_TOGETHER and FRIEND relationships, each carrying a weight property (weight: 5, weight: 1, weight: 3, weight: 4).]

START from=node:node_auto_index(user_id="{FROM}"),
      to=node:node_auto_index(user_id="{TO}")
MATCH p = from-[r*1..5]->to
RETURN extract(n in nodes(p) : n.user_id),
       extract(rel in relationships(p) : rel.weight),
       extract(rel in relationships(p) : type(rel))
ORDER BY length(p),
         reduce(totalWeight = 0, rel in relationships(p) : totalWeight + rel.weight)
LIMIT 3

> 1 second
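To spell out what this query asks for, here is a plain-Java sketch of the same logic on an in-memory graph: enumerate every path of 1 to 5 outgoing relationships between two users, order the paths by length and then by total weight, and keep the first three. It is not the author's Java API code and does not use Neo4j at all; the class, the method names and the sample relationships in `main` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WeightedPathsSketch {

    // Hypothetical directed, typed, weighted relationship between two user ids.
    static class Rel {
        final String from, to, type;
        final int weight;

        Rel(String from, String to, String type, int weight) {
            this.from = from;
            this.to = to;
            this.type = type;
            this.weight = weight;
        }
    }

    final Map<String, List<Rel>> outgoing = new HashMap<>();

    void connect(String from, String to, String type, int weight) {
        outgoing.computeIfAbsent(from, k -> new ArrayList<>()).add(new Rel(from, to, type, weight));
    }

    static int totalWeight(List<Rel> path) {
        int sum = 0;
        for (Rel r : path) {
            sum += r.weight;
        }
        return sum;
    }

    // Depth-first enumeration of paths with at most maxDepth relationships,
    // never reusing a relationship within one path.
    private void collect(String current, String target, int maxDepth, List<Rel> path, List<List<Rel>> found) {
        if (path.size() == maxDepth) {
            return;
        }
        for (Rel r : outgoing.getOrDefault(current, new ArrayList<>())) {
            if (path.contains(r)) {
                continue;
            }
            path.add(r);
            if (r.to.equals(target)) {
                found.add(new ArrayList<>(path));
            }
            collect(r.to, target, maxDepth, path, found);
            path.remove(path.size() - 1);
        }
    }

    // Mirrors ORDER BY length(p), total weight ... LIMIT n.
    List<List<Rel>> bestPaths(String from, String to, int maxDepth, int limit) {
        List<List<Rel>> found = new ArrayList<>();
        collect(from, to, maxDepth, new ArrayList<>(), found);
        found.sort(Comparator.<List<Rel>>comparingInt(List::size)
                .thenComparingInt(WeightedPathsSketch::totalWeight));
        return new ArrayList<>(found.subList(0, Math.min(limit, found.size())));
    }

    public static void main(String[] args) {
        WeightedPathsSketch graph = new WeightedPathsSketch();
        // Sample data: which users each relationship connects is assumed.
        graph.connect("User 1", "User 2", "TRAVELLED_WITH", 5);
        graph.connect("User 1", "User 3", "FRIEND", 1);
        graph.connect("User 3", "User 2", "TRAVELLED_WITH", 3);
        graph.connect("User 2", "User 4", "TRAVELLED_TOGETHER", 4);

        for (List<Rel> path : graph.bestPaths("User 1", "User 2", 5, 3)) {
            System.out.println(path.size() + " hop(s), total weight " + totalWeight(path));
        }
    }
}
```

The enumeration is brute force and grows quickly with depth and node degree, so the sketch explains the query's semantics rather than either of the timings quoted in the slides.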

10 - 20 ms

Java API vs. Cypher

Cypher is great!
Cypher is improving
But don’t be afraid of writing some Java

Thanks!

www.graphaware.com
@graph_aware