AURELIUS THINKAURELIUS.COM
TITAN Distributed Graph Computing
Matthias Broecheler, CTO @mbroecheler June XI, MMXIII
#CASSANDRA13
This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century.
Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine built on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster, which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.
Thank You!
pull requests, feature suggestions
bug reports, community support
June 14th 2012
September 2012
December 2012
March 2013
May 2013
Alpha Release
Titan 0.1.0
Titan 0.2.0
Titan 0.3.0
Titan 0.3.1
Experimental release of a distributed, open-source graph database
First stable release
Rewrite of core, Indexing + ElasticSearch
Performance, Bugfixing
Faunus Release
Titan
Graph Database
distributed
real time
open source
name: Hercules type: demigod
name: Cerberus type: monster
battled
time:12
Vertex
Edge Label
Edge
Property
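The property graph model on this slide (vertices and edges that both carry key-value properties, with a label on every edge) can be sketched in a few lines. This is a minimal in-memory illustration, not Titan's API: the class names, the numeric vertex ids, and the edge direction from Hercules to Cerberus are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int  # hypothetical id, chosen for the example
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # vertex the edge leaves (assumed direction)
    in_vertex: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


# The two vertices and the single labeled edge shown on the slide.
hercules = Vertex(1, {"name": "Hercules", "type": "demigod"})
cerberus = Vertex(2, {"name": "Cerberus", "type": "monster"})
battled = Edge("battled", hercules, cerberus, {"time": 12})


def describe(edge):
    """Render an edge as 'source -label-> target' using the name property."""
    source = edge.out_vertex.properties["name"]
    target = edge.in_vertex.properties["name"]
    return f"{source} -{edge.label}-> {target}"
```

Calling `describe(battled)` renders the relationship as a single string, and `battled.properties["time"]` returns the edge property from the slide.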
Value in Relationships low high
Key-Value
When should you use a Graph Database?
K V
BigTable K V V V V
Document
Relational
Graph
Educating the Planet
Person
Person Student Teacher
Course
Institution
Concept
Discussion
Comment
Share
enrolledIn
teaches
relatesTo
hasCourse
belongsTo
follows
author references
hasComment relatesTo
author
partOf
relatesTo
Titan
Integrative Data Model in a polyglot storage world
Student
Person
Teacher
Course
Institution
Concept
Discussion
Comment
Share
enrolledIn
teaches
relatesTo
hasCourse
belongsTo
follows
author references
hasComment relatesTo
author
partOf
DiscussionRank
relatesTo
Titan
Analyze Relationships in real time
Scaling Titan
number of transactions
size of the graph
121 Billion Edges 6.2 Billion Vertices
1 Million Universities
Placement Group
hi1.4xl
1.1 million edges / sec
using batch mode
Data Ingestion
80 m1.medium
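// Collect everyone the user already follows into x, then look one hop further.
x = [] as Set;
m = user.out('follows').aggregate(x)[0..(num*2-1)]
        .out('follows').except(x)[0..limit]
        .groupCount.cap.next();
// Rank candidates by how many followed users follow them; keep the top num.
m.sort{-it.value}[0..(num-1)]
        ._().transform{ [userid: it.key.id, points: it.value] };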
Follow Recommendation
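The follow recommendation traversal can be read as a friends-of-friends count. The sketch below mirrors its steps on a plain dictionary rather than on a Titan graph; the function name `recommend_follows`, the `follows` dictionary, and the sample users are hypothetical. Like the Gremlin version, it does not filter out the user themselves, and it treats the Groovy ranges as inclusive.

```python
from collections import Counter
from itertools import islice


def recommend_follows(follows, user, num, limit):
    """follows maps a user id to the list of user ids that user follows."""
    followed = follows.get(user, [])
    already = set(followed)             # aggregate(x): everyone already followed
    first_hop = followed[: num * 2]     # [0..(num*2-1)]
    # Second hop, skipping users already followed: out('follows').except(x)
    candidates = (
        candidate
        for friend in first_hop
        for candidate in follows.get(friend, [])
        if candidate not in already
    )
    # [0..limit] is an inclusive Groovy range, so keep limit + 1 items.
    counts = Counter(islice(candidates, limit + 1))   # groupCount
    ranked = sorted(counts.items(), key=lambda item: -item[1])[:num]
    return [{"userid": uid, "points": points} for uid, points in ranked]


# Hypothetical sample data for illustration only.
sample = {
    "ann": ["bob", "cat"],
    "bob": ["dan", "eve", "cat"],
    "cat": ["dan"],
}
top = recommend_follows(sample, "ann", num=2, limit=100)
```

With the sample data, `dan` is followed by both of Ann's followees and `eve` by one, so `top` ranks `dan` ahead of `eve`.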
Generic Graph API
Dataflow Processing
TraversalLanguage
Object-GraphMapper
GraphAlgorithms
GraphServer
cool stuff coming
REST + JSON Titan's Ecosystem
query language
10,200 transactions / sec
16 randomly chosen complex traversal templates
Throughput
Transaction Description                                                  Avg (ms)   Stdev (ms)
Student retrieves all content for a single course in their course list   279.32     81.83
Student follows another student                                          193.72     22.77
Student is recommended people to follow                                  241.33     256.48
Student reads their stream and shares an item with followers             284.07     68.20
Student retrieves their profile                                          53.740     22.61
Student reads the most recent comments for their courses                 211.07     45.56
Scaling Titan
technical perspective
Vertex Representation
time: 1
5
8
4
9
2
7
mother
battled
battled
battled
fought
time: 4
time: 7
induced order
name: Hercules type: demigod
5
Property
Property
Edge
Edge
Edge
Edge
Edge
row indices for fast vertex-centric queries
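The idea behind a vertex-centric query is that the edges of one vertex are stored in a sort order (the "induced order" above), so a query such as "battled edges with time between 1 and 5" becomes a range scan instead of a filter over all incident edges. The sketch below illustrates this with a sorted list and binary search. The pairing of neighbour ids with labels and times is made up for the example, the time of 0 on the `mother` and `fought` edges is a placeholder, and nothing here reflects Titan's actual storage code.

```python
import bisect

# Edges of a single vertex as (label, time, neighbour id), kept sorted.
# Which neighbour goes with which edge is hypothetical.
row = sorted([
    ("battled", 1, 8),
    ("battled", 4, 4),
    ("battled", 7, 9),
    ("mother", 0, 2),
    ("fought", 0, 7),
])


def edges_in_range(row, label, t_lo, t_hi):
    """Return edges with the given label and t_lo <= time < t_hi.

    Two binary searches locate the slice; no other edges are inspected.
    """
    lo = bisect.bisect_left(row, (label, t_lo))
    hi = bisect.bisect_left(row, (label, t_hi))
    return row[lo:hi]


early_battles = edges_in_range(row, "battled", 1, 5)
```

Here `early_battles` holds the two `battled` edges with times 1 and 4; the cost depends on the size of the result and the logarithm of the row length, not on the vertex's total degree.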
Edge Representation
Column: label id + direction, primary key, edge id, Δ vertex id
Value: signature properties, other properties
compressed serialized objects
variable long encoding
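A variable-length encoding for long values stores small numbers in fewer bytes, which is why storing the adjacent vertex as a difference (Δ) from the row's vertex id pays off. The slide does not give Titan's exact byte layout, so the sketch below uses the common 7-bits-per-byte scheme with a continuation bit as a stand-in; the function names are hypothetical.

```python
def encode_varlong(n):
    """Encode a non-negative integer using 7 payload bits per byte.

    The high bit of each byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative values are supported in this sketch")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


def decode_varlong(data):
    """Decode bytes produced by encode_varlong back into an integer."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value
    raise ValueError("truncated input")


small = encode_varlong(12)
large = encode_varlong(6_200_000_000)
```

A small value such as 12 fits in one byte, while a value in the billions takes five bytes instead of a fixed eight.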
Token Ring
Graph Partitioning
assigns ids to map vertices into "optimal" token range
Lots of interesting questions for future work
uses BOP
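With an order-preserving partitioner, keys that sort next to each other land on the same node, so the choice of vertex id decides where a vertex is stored. The sketch below illustrates that idea under loud assumptions: the id layout (a partition number in the high bits, a counter in the low bits), the constant `COUNTER_BITS`, the node names, and the range boundaries are all hypothetical and are not taken from Titan or Cassandra.

```python
import bisect

COUNTER_BITS = 32  # hypothetical split between partition number and counter


def make_vertex_id(partition, counter):
    """Build an id whose high bits carry the partition (assumed layout)."""
    return (partition << COUNTER_BITS) | counter


# Each node owns the ids from its start token up to the next node's start token.
range_starts = [make_vertex_id(0, 0), make_vertex_id(4, 0), make_vertex_id(8, 0)]
nodes = ["node-a", "node-b", "node-c"]


def node_for(vertex_id):
    """Find the node whose token range contains the id."""
    index = bisect.bisect_right(range_starts, vertex_id) - 1
    return nodes[index]


# Two vertices assigned to the same partition end up on the same node.
student = make_vertex_id(5, 17)
course = make_vertex_id(5, 9000)
placement = (node_for(student), node_for(course))
```

Because both ids share partition 5, `placement` names the same node twice; a partitioning scheme that gives densely connected vertices the same partition number would therefore keep their traversals local.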
Aurelius Graph Cluster
Stores a massive-scale property graph allowing real-time traversals and updates
Batch processing of large graphs with Hadoop
Runs global graph algorithms on large, compressed,
in-memory graphs
Map/Reduce Load & Compress
Analysis results back into Titan
Bulk Load
TITAN FAUNUS FULGORA
Apache 2
titan.thinkaurelius.com faunus.thinkaurelius.com
Special Thanks
Steve Hill (@kindageeky) Director Architecture & Innovation
at Pearson Education
We are Hiring