matthias-broecheler
Presents Titan and Faunus at the Gitpro conference held April 12, 2014.
AURELIUS THINKAURELIUS.COM
TITAN Scalable Graph Database
Matthias Broecheler @mbroecheler April 12th, MMXIV
Database
real time
high throughput
transactional
Graph Database
scalable
integrated
open source
[Figure: example e-commerce graph, built up over several slides]
Vertices:
name: Newton type: user
name: Hercules type: user
title: “How to deal with Father issues” type: book
title: “Muscle building for beginners” type: book
title: “Dancing with the Stars” type: DVD
title: “Friends forever bracelet” type: Accessory
Edges (each with a time property; values shown: 24, 22, 20, 05, 09):
bought (three edges)
viewed
in-Cart
1. Home-grown solution
2. Relational Database
3. Graph Database
Home-grown Solution
- Start with your favorite NoSQL database
- Cassandra, MongoDB, HBase, etc.
1. Error-prone
2. Data model moves into application code
3. Maintainability hazard
4. No query language support
5. No performance optimization
Relational Database
- Relationship tables, SQL and joins
1. Join processing is expensive
2. Join processing on large tables does not scale
3. Cumbersome query language
4. Inflexible data model
SELECT P.title
FROM User U1
  JOIN Purchase P1 ON P1.buyerid = U1.userid
  JOIN Purchase P2 ON P1.productid = P2.productid
  JOIN Purchase P3 ON P2.buyerid = P3.buyerid
  JOIN Product P ON P3.productid = P.productid
WHERE U1.name = 'xyz' AND P1.time > T1 AND P2.time > T1
[Figure: the example graph extended with new vertex and edge types]
Vertices:
name: Newton type: user
name: Hercules type: user
name: Saturn type: author
title: “How to deal with Father issues” type: book
title: “Muscle building for beginners” type: book
title: “Dancing with the Stars” type: DVD
title: “Friends forever bracelet” type: Accessory
Edges (time values shown: 24, 22, 20, 05, 09; the time:05 edge also has duration: 60):
bought (three edges)
viewed
in-Cart
friends
author (two edges)
1. Home-grown solution
2. Relational Database
3. Graph Database
UML
Entity Relationship Model
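[Figure: property graph model, annotated on one edge of the example]
Vertex: name: Hercules type: user
Edge Label: bought
Edge: from the user vertex to the book vertex
Property (key, value): time:24
Vertex: title: “Muscle building for beginners” type: book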
g.V.has('name','xyz')
   .outE('bought').has('time',gt,T1).inV
   .inE('bought').has('time',gt,T1).outV
   .out('bought').title
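The traversal above can be read step by step as nested loops over edges. The following is a minimal Python sketch of that reading, assuming a toy in-memory graph; the `Graph` class, `recommend_titles`, the vertex ids, and the sample edges are all hypothetical and are not the Titan or Gremlin API.

```python
from collections import namedtuple

# Hypothetical in-memory property graph, for illustration only (not the Titan API).
Edge = namedtuple("Edge", ["src", "label", "dst", "props"])

class Graph:
    def __init__(self):
        self.vertices = {}   # vertex id -> property dict
        self.out_edges = {}  # vertex id -> list of outgoing Edge
        self.in_edges = {}   # vertex id -> list of incoming Edge

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        self.out_edges.setdefault(vid, [])
        self.in_edges.setdefault(vid, [])

    def add_edge(self, src, label, dst, **props):
        e = Edge(src, label, dst, props)
        self.out_edges[src].append(e)
        self.in_edges[dst].append(e)

def recommend_titles(g, name, t1):
    """Titles bought by users who recently bought what `name` recently bought."""
    titles = []
    for vid, props in g.vertices.items():
        if props.get("name") != name:                  # g.V.has('name', name)
            continue
        for e1 in g.out_edges[vid]:                    # .outE('bought').has('time', gt, t1)
            if e1.label != "bought" or e1.props["time"] <= t1:
                continue
            for e2 in g.in_edges[e1.dst]:              # .inV.inE('bought').has('time', gt, t1)
                if e2.label != "bought" or e2.props["time"] <= t1:
                    continue
                for e3 in g.out_edges[e2.src]:         # .outV.out('bought')
                    if e3.label == "bought":
                        titles.append(g.vertices[e3.dst]["title"])  # .title
    return titles

# Sample data: vertex names come from the slides, the edge endpoints are made up.
g = Graph()
g.add_vertex("hercules", name="Hercules", type="user")
g.add_vertex("newton", name="Newton", type="user")
g.add_vertex("b1", title="Muscle building for beginners", type="book")
g.add_vertex("d1", title="Dancing with the Stars", type="DVD")
g.add_edge("hercules", "bought", "b1", time=24)
g.add_edge("newton", "bought", "b1", time=22)
g.add_edge("newton", "bought", "d1", time=20)

recent = recommend_titles(g, "Hercules", 10)
very_recent = recommend_titles(g, "Hercules", 23)
```

Like the Gremlin version, the sketch neither removes duplicates nor excludes items the starting user already bought, so `recent` contains the user's own purchase as well.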
http://gremlindocs.com/
Architecture Analogy
MyISAM
Flexible Persistence
Partitionability
Availability Consistency
Vertex-Centric Indices
- Sort and index edges per vertex by sort key
- Sort key can be composite
- Enables efficient focused traversals
- Only retrieve edges that matter
- Uses push-down predicates for quick, index-driven retrieval
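To make the idea concrete, here is a minimal Python sketch of one vertex's edge list kept sorted by a composite sort key, so that a predicate such as "bought edges with time greater than T1" becomes a binary search instead of a scan. The `VertexEdges` class and the choice of (label, time) as the key are assumptions for illustration; this is not how Titan lays out its storage.

```python
import bisect

class VertexEdges:
    """Edges of one vertex, kept sorted by the composite sort key (label, time)."""

    def __init__(self):
        self._keys = []   # sorted list of (label, time)
        self._edges = []  # (label, time, target) tuples, aligned with _keys

    def add(self, label, time, target):
        key = (label, time)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._edges.insert(i, (label, time, target))

    def edges_after(self, label, time):
        """Edges with this label and a strictly greater time, found by binary search."""
        lo = bisect.bisect_right(self._keys, (label, time))
        hi = bisect.bisect_left(self._keys, (label, float("inf")))
        return self._edges[lo:hi]

# Hypothetical edges for a single user vertex.
ve = VertexEdges()
ve.add("bought", 24, "b1")
ve.add("viewed", 5, "d1")
ve.add("bought", 20, "d1")
ve.add("bought", 22, "b2")
ve.add("in-Cart", 9, "a1")

recent_purchases = ve.edges_after("bought", 21)
```

Only the two matching edges are touched by the slice; the viewed and in-Cart edges are never inspected, which is the "only retrieve edges that matter" point.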
Token Ring
Graph Partitioning
assigns ids to map vertices into “optimal” token range
Lots of interesting questions for future work
uses BOP
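One way id assignment could steer vertices into a token range is to put a partition number in the high bits of the id and store keys in byte order. The Python sketch below illustrates that idea only; the bit widths and the helpers `make_vertex_id`, `row_key`, and `partition_of` are hypothetical, it assumes "BOP" refers to an order-preserving (byte-ordered) partitioner, and it is not Titan's actual id scheme.

```python
import struct

# Hypothetical id layout: high bits name the partition, low bits are a local counter.
PARTITION_BITS = 8
COUNTER_BITS = 56

def make_vertex_id(partition, counter):
    if not 0 <= partition < (1 << PARTITION_BITS):
        raise ValueError("partition out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return (partition << COUNTER_BITS) | counter

def row_key(vertex_id):
    # Big-endian encoding, so byte-wise key order equals numeric id order.
    return struct.pack(">Q", vertex_id)

def partition_of(vertex_id):
    return vertex_id >> COUNTER_BITS

# Ids created in arbitrary order across three partitions.
ids = [make_vertex_id(p, c) for p in (2, 0, 1) for c in (5, 1)]

# An order-preserving partitioner stores keys in byte order, so vertices that
# share a partition prefix end up next to each other in the key space.
ordered = sorted(row_key(i) for i in ids)
partitions_in_key_order = [partition_of(struct.unpack(">Q", k)[0]) for k in ordered]
```

Under these assumptions, choosing the partition for a new vertex is what decides which token range, and therefore which machine, it lands on.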
Educating the Planet
[Figure: schema of the education graph]
Vertex types: Person, Student, Teacher, Course, Institution, Concept, Discussion, Comment, Share
Edge labels: enrolledIn, teaches, relatesTo, hasCourse, belongsTo, follows, author, references, hasComment, partOf
121 Billion Edges 6.2 Billion Vertices
1 Million Universities 3.5 Billion Students
Placement Group
hi1.4xl
Setup
1.1 million edges / sec
using batch mode
Data Ingestion
80 m1.medium
10,200 transactions / sec
16 randomly chosen complex traversal templates
Throughput
Transaction Description                                                   Avg (ms)   Stdev (ms)
Student retrieves all content for a single course in their course list     279.32       81.83
Student follows another student                                             193.72       22.77
Student is recommended people to follow                                     241.33      256.48
Student reads their stream and shares an item with followers                284.07       68.20
Student retrieves their profile                                              53.740      22.61
Student reads the most recent comments for their courses                    211.07       45.56
x = [] as Set; m = [:]
m = user.out('follows').aggregate(x)[0..(num*2)]
        .out('follows').except(x)[0..limit]
        .groupCount(m);
m.sort{-it.value}[0..num]._()
        .transform{ [userid: it.key.id,
                     points: it.value] };
Follow Recommendation
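The logic of the follow recommendation can be restated outside Gremlin: collect who the user already follows, walk two `follows` hops, skip anyone already followed, count how often each candidate is reached, and return the top entries. The Python sketch below assumes the graph is a plain dict of adjacency lists; `recommend_follows` and the sample data are hypothetical, and the range limits `[0..(num*2)]` and `[0..limit]` from the slide are left out for brevity.

```python
from collections import Counter

def recommend_follows(follows, user, num):
    """follows: dict mapping a user to the list of users they follow."""
    already = set(follows.get(user, []))            # aggregate(x)
    counts = Counter()
    for friend in follows.get(user, []):            # user.out('follows')
        for candidate in follows.get(friend, []):   # .out('follows')
            if candidate not in already:            # .except(x)
                counts[candidate] += 1              # .groupCount(m)
    # Highest counts first, shaped like the transform step on the slide.
    return [{"userid": uid, "points": pts} for uid, pts in counts.most_common(num)]

# Hypothetical follow graph.
follows = {
    "a": ["b", "c"],
    "b": ["c", "d", "e"],
    "c": ["d"],
}
suggestions = recommend_follows(follows, "a", 2)
```

As in the slide's traversal, nothing prevents the user from being suggested to themselves if someone they follow follows them back; a production version would likely filter that case.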
Faunus Batch Graph Analytics
- Hadoop-based Graph Computing Framework
- Graph Analytics
- Breadth-first Traversals
- Global Graph Computations
- Batch Big Graph Data
Faunus Features
Faunus Architecture
g._()
Faunus Work Flow
hdfs://user/ubuntu/
  output/job-0/
  output/job-1/
  output/job-2/
    graph*
    sideeffect*
g.V.out.out.count()
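A breadth-first evaluation of a query like this does not need to enumerate every path. Each vertex can carry a counter of how many traversers currently sit on it, and each step pushes those counters along the out-edges. The Python sketch below shows that idea on a dict of adjacency lists; `propagate` and `count_two_hop_paths` are hypothetical names, and this is a single-process illustration rather than Faunus's MapReduce implementation.

```python
def propagate(counters, out_edges):
    """One traversal step: push each vertex's path counter along its out-edges."""
    nxt = {}
    for v, n in counters.items():
        for w in out_edges.get(v, []):
            nxt[w] = nxt.get(w, 0) + n
    return nxt

def count_two_hop_paths(out_edges):
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)
    counters = {v: 1 for v in vertices}         # g.V: one traverser per vertex
    counters = propagate(counters, out_edges)   # .out
    counters = propagate(counters, out_edges)   # .out
    return sum(counters.values())               # .count()

# Hypothetical graph: a->b, a->c, b->c, c->a
edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
two_hop_paths = count_two_hop_paths(edges)
```

The five two-step paths here are a-b-c, a-c-a, b-c-a, c-a-b, and c-a-c. The work per step is proportional to the number of edges, not the number of paths, which is what makes this style practical on very large graphs.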
Compressed HDFS Graphs
- stored in sequence files
- variable length encoding
- prefix compression
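Variable length encoding stores small integers in fewer bytes than large ones, which helps when most ids and counts are small. The Python sketch below shows one common scheme (7 payload bits per byte plus a continuation bit); it is a generic illustration, and the slides do not say which exact encoding Faunus uses.

```python
def encode_varint(n):
    """Encode a non-negative int using 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # last byte has the high bit clear
            return bytes(out)

def decode_varint(data):
    """Decode a single varint from the start of `data`."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")

small = encode_varint(101)       # fits in one byte
medium = encode_varint(300)      # needs two bytes
large = encode_varint(2 ** 40)   # needs six bytes instead of a fixed eight
```

Values below 128 take a single byte, so a graph whose vertex ids and property values are mostly small compresses well under this kind of scheme.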
Degree Distribution
GitHub Network
g.V.sideEffect{ it.degree = it.out('follows').count() }.degree.groupCount
Degree Distribution
P(k) ~ k^(-γ)
γ = 2.2
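For readers who want to reproduce this kind of analysis on a small graph, the Python sketch below counts how many vertices have each out-degree and then estimates γ as the negative slope of a least-squares line through the log-log points. Both function names are hypothetical, and a plain log-log fit is a crude estimator; the slides do not say how the value of 2.2 was obtained.

```python
import math
from collections import Counter

def degree_distribution(follows):
    """Map out-degree k to the number of vertices with that degree."""
    return Counter(len(targets) for targets in follows.values())

def estimate_gamma(dist):
    """Negative least-squares slope of log(count) against log(k).

    Needs at least two distinct positive degrees.
    """
    pts = [(math.log(k), math.log(c)) for k, c in dist.items() if k > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx

# Hypothetical follow graph and a synthetic distribution with counts = 1000 * k^(-2).
toy = degree_distribution({"a": ["b", "c"], "b": ["c"], "c": []})
gamma = estimate_gamma({1: 1000, 2: 250, 10: 10})
```

On the synthetic distribution the fit recovers an exponent of 2, as expected for counts that fall off exactly as k^(-2).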
Global Recommendations
gremlin> g.E.has('label','pushed','to').keep.
           V.out('pushed').out('to').
           in('to').in('pushed').
           sideEffect('{it.score =it.pathCounter}').
           score.order(F.decr,'name')
# Top 5:
Jippi            60892182927
garbear          30095282886
FakeHeal         30038040349
brianchandotcom  24684133382
nyarla           15230275746
Aurelius Graph Cluster
OLTP OLAP
Hadoop MapReduce
Analysis results back into Titan
Apache 2
g.V.label.groupCount g.v(101).out
titan.thinkaurelius.com faunus.thinkaurelius.com
@AURELIUSGRAPHS