Titan @ Gitpro Conference 2014

Presents Titan and Faunus at the Gitpro conference held on April 12, 2014.

AURELIUS THINKAURELIUS.COM

TITAN Scalable Graph Database

Matthias Broecheler @mbroecheler April 12th, MMXIV

Database

real time

high throughput

transactional

Graph Database

scalable

integrated

open source

[Figure: example e-commerce graph, built up over several slides: first the vertices alone, then labeled edges between them, then a time property on each edge]

Vertices:

name: Newton type: user
name: Hercules type: user
title: “How to deal with Father issues” type: book
title: “Muscle building for beginners” type: book
title: “Dancing with the Stars” type: DVD
title: “Friends forever bracelet” type: Accessory

Edges:

bought time:24
bought time:22
bought time:20
viewed time:05
in-Cart time:09

1.  Home-grown solution

2.  Relational Database

3.  Graph Database

Home-grown Solution

- Start with your favorite NoSQL database
  - Cassandra, MongoDB, HBase, etc.

1.  Error-prone

2.  Data model moves into application code

3.  Maintainability hazard

4.  No query language support

5.  No performance optimization

Relational Database

- Relationship tables, SQL and joins

1.  Join processing is expensive

2.  Join processing on large tables does not scale

3.  Cumbersome query language

4.  Inflexible data model

SELECT P.title
FROM User U1
  JOIN Purchase P1 ON P1.buyerid = U1.userid
  JOIN Purchase P2 ON P1.productid = P2.productid
  JOIN Purchase P3 ON P2.buyerid = P3.buyerid
  JOIN Product P ON P3.productid = P.productid
WHERE U1.name = 'xyz' AND P1.time > T1 AND P2.time > T1

[Figure: the same graph extended with a friends edge between the users, a duration property on the viewed edge, and an author vertex linked to products by author edges]

Vertices:

name: Newton type: user
name: Hercules type: user
name: Saturn type: author
title: “How to deal with Father issues” type: book
title: “Muscle building for beginners” type: book
title: “Dancing with the Stars” type: DVD
title: “Friends forever bracelet” type: Accessory

Edges:

friends
bought time:24
bought time:22
bought time:20
viewed time:05 duration: 60
in-Cart time:09
author
author

1.  Home-grown solution

2.  Relational Database

3.  Graph Database

UML

Entity Relationship Model

name: Hercules type: user

bought

time:24

Vertex

Edge Label

Edge

Property: key → value

title: “Muscle building for beginners” type: book
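The property graph model labeled on this slide (vertices, labeled edges, and key-value properties on both) can be sketched in a few lines of Python. This is only an in-memory illustration, not how Titan stores data; the names `Vertex`, `Edge`, and `add_edge` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    properties: dict                                 # key -> value pairs
    out_edges: list = field(default_factory=list)    # edges leaving this vertex
    in_edges: list = field(default_factory=list)     # edges arriving at this vertex


@dataclass(eq=False)
class Edge:
    label: str          # edge label, e.g. "bought"
    out_v: Vertex       # source (tail) vertex
    in_v: Vertex        # target (head) vertex
    properties: dict = field(default_factory=dict)


def add_edge(out_v, label, in_v, **properties):
    """Create a labeled edge and register it on both endpoints."""
    edge = Edge(label, out_v, in_v, properties)
    out_v.out_edges.append(edge)
    in_v.in_edges.append(edge)
    return edge


# The fragment shown on the slide: Hercules bought a book at time 24.
hercules = Vertex({"name": "Hercules", "type": "user"})
book = Vertex({"title": "Muscle building for beginners", "type": "book"})
bought = add_edge(hercules, "bought", book, time=24)
```

Keeping the edge list on each vertex is what makes a traversal step a local lookup rather than a table join.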

g.V.has('name','xyz').outE('bought').has('time',gt,T1).inV
   .inE('bought').has('time',gt,T1).outV
   .out('bought').title

http://gremlindocs.com/
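To make the three hops of the traversal concrete, here is a rough Python sketch over a plain edge list. It is not Gremlin and not Titan's execution model; the function name `also_bought`, the `EDGES` list, and all sample data are made up for illustration.

```python
# Edges as (buyer, label, product, time); sample data is invented.
EDGES = [
    ("xyz", "bought", "book-a", 24),
    ("bob", "bought", "book-a", 22),
    ("bob", "bought", "dvd-b", 5),
    ("eve", "bought", "book-a", 3),
    ("eve", "bought", "toy-c", 20),
]


def also_bought(user, t1, edges=EDGES):
    """Products bought by people who recently bought what `user` recently bought."""
    # Hop 1: products the user bought after t1.
    products = [p for (u, lbl, p, t) in edges
                if u == user and lbl == "bought" and t > t1]
    # Hop 2: buyers who bought those products after t1.
    buyers = [u for prod in products for (u, lbl, p, t) in edges
              if p == prod and lbl == "bought" and t > t1]
    # Hop 3: everything those buyers bought (no time filter, as in the query).
    return [p for b in buyers for (u, lbl, p, t) in edges
            if u == b and lbl == "bought"]


print(also_bought("xyz", 10))  # ['book-a', 'book-a', 'dvd-b']
```

Like the traversal on the slide, the sketch neither removes duplicates nor excludes the starting user, so repeated titles simply indicate more paths leading to the same product.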

Architecture Analogy

MyISAM

Flexible Persistence

Partitionability

Availability Consistency

Vertex-Centric Indices

- Sort and index edges per vertex by sort key
  - Sort key can be composite
- Enables efficient focused traversals
  - Only retrieve edges that matter
- Uses push-down predicates for quick, index-driven retrieval
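The idea behind a vertex-centric index can be sketched as follows: keep each vertex's edges sorted by a composite sort key, here assumed to be (label, time), so that a predicate such as "bought after time T" becomes a binary search instead of a scan over all incident edges. The class `VertexEdges` and its methods are hypothetical; this illustrates the technique, not Titan's implementation.

```python
import bisect


class VertexEdges:
    """Edges of one vertex, kept sorted by the composite key (label, time)."""

    def __init__(self):
        self._keys = []     # sorted (label, time) tuples
        self._targets = []  # target vertex ids, aligned with _keys

    def add(self, label, time, target):
        key = (label, time)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._targets.insert(i, target)

    def out(self, label, min_time):
        """Targets of edges with this label and time > min_time."""
        lo = bisect.bisect_right(self._keys, (label, min_time))
        hi = bisect.bisect_left(self._keys, (label, float("inf")))
        return self._targets[lo:hi]


edges = VertexEdges()
edges.add("bought", 24, "book1")
edges.add("viewed", 5, "dvd")
edges.add("bought", 20, "book2")
edges.add("bought", 22, "book3")
print(edges.out("bought", 21))  # ['book3', 'book1']
```

Only the matching slice of the sorted list is touched, which is what keeps traversals through very high-degree vertices focused.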

Token Ring

Graph Partitioning

Assigns ids to map vertices into “optimal” token range

Lots of interesting questions for future work

Uses BOP

Educating the Planet

Person

Person Student Teacher

Course

Institution

Concept

Discussion

Comment

Share

enrolledIn

teaches

relatesTo

hasCourse

belongsTo

follows

author

references

hasComment relatesTo

author

partOf

relatesTo

121 Billion Edges 6.2 Billion Vertices

1 Million Universities 3.5 Billion Students

Placement Group

hi1.4xl

Setup

1.1 million edges / sec

using batch mode

Data Ingestion

80 m1.medium

10,200 transactions / sec

16 randomly chosen complex traversal templates

Throughput

Transaction Description                                                   Avg (ms)   Stdev (ms)
Student retrieves all content for a single course in their course list    279.32     81.83
Student follows another student                                           193.72     22.77
Student is recommended people to follow                                   241.33     256.48
Student reads their stream and shares an item with followers              284.07     68.20
Student retrieves their profile                                           53.740     22.61
Student reads the most recent comments for their courses                  211.07     45.56

x = [] as Set; m = [:]
m = user.out('follows').aggregate(x)[0..(num*2)]
    .out('follows').except(x)[0..limit]
    .groupCount(m);

m.sort{-it.value}[0..num]._()
    .transform{ [userid: it.key.id,
                 points: it.value]};

Follow Recommendation
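The logic of the follow recommendation (count how often each not-yet-followed user is followed by the people you follow, then rank by that count) can be sketched in plain Python. The function `recommend` and the sample `follows` mapping are hypothetical. Unlike the traversal above, this sketch omits the range limits and additionally excludes the user themself, which is an assumption made for the example.

```python
from collections import Counter


def recommend(follows, user, num):
    """Rank users followed by the people `user` follows."""
    # Users already followed, plus the user (the latter is an added assumption).
    already = set(follows.get(user, ())) | {user}
    counts = Counter(
        candidate
        for friend in follows.get(user, ())
        for candidate in follows.get(friend, ())
        if candidate not in already
    )
    return [{"userid": uid, "points": pts}
            for uid, pts in counts.most_common(num)]


# Invented sample data: who follows whom.
follows = {"a": ["b", "c"], "b": ["c", "d", "e"], "c": ["d", "a"]}
print(recommend(follows, "a", 2))
# [{'userid': 'd', 'points': 2}, {'userid': 'e', 'points': 1}]
```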

Faunus Batch Graph Analytics

- Hadoop-based Graph Computing Framework
- Graph Analytics
- Breadth-first Traversals
- Global Graph Computations
- Batch Big Graph Data

Faunus Features

Faunus Architecture

g._()

Faunus Work Flow

hdfs://user/ubuntu/

output/job-0/

output/job-1/

output/job-2/ { graph*

sideeffect*

g.V.out.out.count()

Compressed HDFS Graphs

- stored in sequence files
- variable length encoding
- prefix compression
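As an illustration of what variable length encoding buys, here is a minimal sketch of a common scheme (7 data bits per byte, with the high bit marking continuation). The slides do not say which encoding Faunus uses, so this is an assumption for illustration only; `encode_varint` and `decode_varint` are hypothetical names.

```python
def encode_varint(n):
    """Encode a non-negative integer using 7 data bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data):
    """Decode the first varint found in `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(5).hex())    # 05
print(encode_varint(300).hex())  # ac02
```

Small ids and counts, which dominate in graph data, take one byte instead of a fixed eight.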

Degree Distribution

GitHub Network

g.V.sideEffect{
  it.degree = it.out('follows').count()
}.degree.groupCount

Degree Distribution

P(k) ~ k^(-γ)

γ = 2.2
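The degree distribution computed by the traversal above amounts to counting, for each out-degree k, how many vertices have that degree. A small Python sketch of the same computation on an in-memory adjacency map follows; the function names and the sample data are invented, and the data is far too small to exhibit a power law.

```python
from collections import Counter


def degree_distribution(follows):
    """Map out-degree k to the number of vertices with that degree."""
    return Counter(len(targets) for targets in follows.values())


def degree_probabilities(follows):
    """Empirical P(k): fraction of vertices with out-degree k."""
    counts = degree_distribution(follows)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


# Every vertex must appear as a key, even with no outgoing edges.
follows = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}
print(sorted(degree_distribution(follows).items()))  # [(0, 1), (1, 2), (2, 1)]
```

Plotting P(k) against k on log-log axes and fitting a straight line is the usual way to read off an exponent such as the γ reported on the slide.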

Global Recommendations

gremlin> g.E.has('label','pushed','to').keep.
           V.out('pushed').out('to').
           in('to').in('pushed').
           sideEffect('{it.score =it.pathCounter}').
           score.order(F.decr,'name')

# Top 5:
Jippi             60892182927
garbear           30095282886
FakeHeal          30038040349
brianchandotcom   24684133382
nyarla            15230275746

Aurelius Graph Cluster

OLTP OLAP

Hadoop MapReduce

Analysis results back into Titan

Apache 2

g.V.label.groupCount g.v(101).out

titan.thinkaurelius.com faunus.thinkaurelius.com

aureliusgraphs@googlegroups.com

@AURELIUSGRAPHS
