Slides from the meetup presentation in NYC (March 2014). Covers the current version of Titan and Faunus.
AURELIUS | THINKAURELIUS.COM
TITAN: Scalable Graph Database
Matthias Broecheler, @mbroecheler, March 6th, MMXIII
Graph Database: distributed, real time, open source
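[Figure: anatomy of a property graph. A vertex (name: Hercules, type: demigod) is joined to a vertex (name: Cerberus, type: monster) by an edge with edge label battled and property time: 12. A property is a key + value pair.]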
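To make the vertex / edge / edge label / property vocabulary concrete, here is a minimal plain-Python model of the Hercules and Cerberus example. The class and field names are hypothetical and are not Titan's or Blueprints' API; the sketch only shows that edges are directed, carry one label, and, like vertices, hold key + value properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)  # property = key + value


@dataclass
class Edge:
    out_vertex: Vertex  # tail of the directed edge
    label: str          # edge label, e.g. "battled"
    in_vertex: Vertex   # head of the directed edge
    properties: dict = field(default_factory=dict)  # edges carry properties too


# Hypothetical ids; the slide only gives names, types, and time: 12.
hercules = Vertex(1, {"name": "Hercules", "type": "demigod"})
cerberus = Vertex(2, {"name": "Cerberus", "type": "monster"})
battled = Edge(hercules, "battled", cerberus, {"time": 12})

print(f'{battled.out_vertex.properties["name"]} -{battled.label}-> '
      f'{battled.in_vertex.properties["name"]} {battled.properties}')
```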
[Figure: example graph of Roman gods. Vertices: Saturn (type: titan, age: 10000), Jupiter (type: god), Neptune (type: god, age: 4500), Pluto (type: god, age: 4000), Alcmene (type: human, age: 45), Hercules (type: demigod), Cerberus (type: monster), Hydra (type: monster). Edges: two father, one mother, two brother, one pet, and two battled edges (time: 12 and time: 2).]
g.V, g.E (all vertices, all edges)
v = g.V.has('name','Hercules').next()
v.out('father','mother')
v.out('father').out('brother').name
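The has / out walk above can be mimicked over an in-memory edge list. This is a plain-Python sketch, not Gremlin: the functions `has`, `out`, and `names` are hypothetical stand-ins for the Gremlin steps, and the edge wiring (Hercules' father is Jupiter, Jupiter's brothers are Neptune and Pluto, and so on) is an assumption inferred from the queries on these slides.

```python
# Hypothetical in-memory copy of the example graph from the slides.
VERTICES = {
    "saturn":   {"name": "Saturn", "type": "titan", "age": 10000},
    "jupiter":  {"name": "Jupiter", "type": "god"},
    "neptune":  {"name": "Neptune", "type": "god", "age": 4500},
    "pluto":    {"name": "Pluto", "type": "god", "age": 4000},
    "alcmene":  {"name": "Alcmene", "type": "human", "age": 45},
    "hercules": {"name": "Hercules", "type": "demigod"},
    "cerberus": {"name": "Cerberus", "type": "monster"},
    "hydra":    {"name": "Hydra", "type": "monster"},
}
# (out-vertex, label, in-vertex, edge properties); wiring is assumed.
EDGES = [
    ("hercules", "father", "jupiter", {}),
    ("hercules", "mother", "alcmene", {}),
    ("jupiter", "father", "saturn", {}),
    ("jupiter", "brother", "neptune", {}),
    ("jupiter", "brother", "pluto", {}),
    ("hercules", "battled", "cerberus", {"time": 12}),
    ("hercules", "battled", "hydra", {"time": 2}),
    ("pluto", "pet", "cerberus", {}),
]


def has(key, value):
    """Like g.V.has(key, value): ids of vertices whose property equals value."""
    return [v for v, props in VERTICES.items() if props.get(key) == value]


def out(vs, *labels):
    """Like out(labels...): follow outgoing edges carrying any of the labels."""
    return [i for v in vs for (o, label, i, _) in EDGES
            if o == v and label in labels]


def names(vs):
    """Like .name: project each vertex to its name property."""
    return [VERTICES[v]["name"] for v in vs]


v = has("name", "Hercules")
print(names(out(v, "father", "mother")))        # ['Jupiter', 'Alcmene']
print(names(out(out(v, "father"), "brother")))  # ['Neptune', 'Pluto']
```

Each step maps a list of vertices to a new list, which is the pipeline idea behind chaining Gremlin steps.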
v.outE('battled').has('time',T.gt,5).inV.name
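The outE / has / inV query filters on an edge property before stepping to the adjacent vertex. A minimal plain-Python sketch of that shape follows, with `operator.gt` standing in for `T.gt`; the helper names `out_e`, `has`, and `in_v` are hypothetical, and the edge list is the assumed subset of the slide graph around Hercules.

```python
import operator

# Hypothetical subset of the slide graph: Hercules's outgoing edges.
# (out-vertex, label, in-vertex, edge properties)
EDGES = [
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Hercules", "battled", "Hydra", {"time": 2}),
]


def out_e(v, label):
    """Like outE(label): the outgoing edges of v with the given label."""
    return [e for e in EDGES if e[0] == v and e[1] == label]


def has(edges, key, op, value):
    """Like has(key, op, value): keep edges whose property satisfies op."""
    return [e for e in edges if key in e[3] and op(e[3][key], value)]


def in_v(edges):
    """Like inV: the head vertex of each edge."""
    return [e[2] for e in edges]


# v.outE('battled').has('time', T.gt, 5).inV.name
result = in_v(has(out_e("Hercules", "battled"), "time", operator.gt, 5))
print(result)  # ['Cerberus']
```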
g.V.has('age',T.gt,4200)
g.E.has('time',T.lt,5)
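Both global queries apply a comparison predicate to every vertex or every edge. The plain-Python sketch below does this as a full scan over the slide data, skipping elements that lack the key. It makes no claim about how Titan plans or indexes such queries. `v_has` and `e_has` are hypothetical helpers, and the edge list is an assumed subset of the slide graph.

```python
import operator

# Slide data; only some vertices carry an 'age' property.
VERTICES = {
    "Saturn": {"age": 10000}, "Jupiter": {}, "Neptune": {"age": 4500},
    "Pluto": {"age": 4000}, "Alcmene": {"age": 45}, "Hercules": {},
    "Cerberus": {}, "Hydra": {},
}
# (out-vertex, label, in-vertex, edge properties); assumed subset of edges.
EDGES = [
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Hercules", "battled", "Hydra", {"time": 2}),
    ("Pluto", "pet", "Cerberus", {}),
]


def v_has(key, op, value):
    """Like g.V.has(key, op, value): full scan; vertices lacking key are skipped."""
    return sorted(v for v, props in VERTICES.items()
                  if key in props and op(props[key], value))


def e_has(key, op, value):
    """Like g.E.has(key, op, value): the same filter applied to every edge."""
    return [e[:3] for e in EDGES if key in e[3] and op(e[3][key], value)]


print(v_has("age", operator.gt, 4200))  # ['Neptune', 'Saturn']
print(e_has("time", operator.lt, 5))    # [('Hercules', 'battled', 'Hydra')]
```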
saturn.as('x').in('father').loop('x'){it.loops < 3}.next()
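The loop step repeats a section of the pipeline while a condition holds. The Gremlin 2 documentation gives `out.loop(1){it.loops < 3}` as equivalent to `out.out`. By that rule, the query above applies in('father') twice starting from Saturn and reaches his grandson. A minimal plain-Python sketch of that behavior follows. The father map is an assumption consistent with the slides, and `in_father` / `loop_in_father` are hypothetical names.

```python
# Assumed 'father' edges from the slide graph: child -> father.
FATHER = {"Hercules": "Jupiter", "Jupiter": "Saturn"}


def in_father(v):
    """Like in('father'): vertices whose 'father' edge points at v."""
    return [child for child, father in FATHER.items() if father == v]


def loop_in_father(start, max_loops):
    """Emulates start.as('x').in('father').loop('x'){it.loops < max_loops}.

    Following the documented equivalence out.loop(1){it.loops < 3} == out.out,
    the in('father') step is applied max_loops - 1 times.
    """
    frontier = [start]
    for _ in range(max_loops - 1):
        frontier = [child for v in frontier for child in in_father(v)]
    return frontier


print(loop_in_father("Saturn", 3))  # ['Hercules']
```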
g.V.sideEffect{it.rank = it.both.both.both.count()}
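Here `it.both.both.both.count()` counts the walks of length three that start at each vertex and ignore edge direction. A plain-Python sketch over the assumed edge wiring of the example graph follows. The names `adj`, `both`, and `rank` are hypothetical, and parallel edges would be counted with multiplicity, as a pipeline would.

```python
from collections import defaultdict

# Assumed wiring of the example graph, as (out-vertex, in-vertex) pairs.
EDGES = [("Hercules", "Jupiter"), ("Hercules", "Alcmene"), ("Jupiter", "Saturn"),
         ("Jupiter", "Neptune"), ("Jupiter", "Pluto"), ("Hercules", "Cerberus"),
         ("Hercules", "Hydra"), ("Pluto", "Cerberus")]

adj = defaultdict(list)
for o, i in EDGES:
    adj[o].append(i)  # follow the edge forward (out)
    adj[i].append(o)  # and backward (in)


def both(vs):
    """Like both: neighbors along edges in either direction, with multiplicity."""
    return [u for v in vs for u in adj[v]]


# sideEffect{it.rank = it.both.both.both.count()}: number of 3-step walks
rank = {v: len(both(both(both([v])))) for v in list(adj)}
print(rank["Hercules"], rank["Pluto"], rank["Saturn"])  # 22 14 8
```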
Titan Database: Architecture Overview
Titan Features
I. Data Management
II. Vertex-Centric Indices
III. Graph Partitioning
IV. Edge Compression
Architecture Analogy: MyISAM
Flexible Persistence: Partitionability, Availability, Consistency
g.E.has('location',WITHIN,Geoshape.circle(38,24,50))
Full text & Geo Search
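The geo query keeps edges whose location lies inside a circle. This assumes the circle arguments are (latitude, longitude, radius in km). A plain-Python sketch of such a point-in-circle test follows, using the haversine great-circle distance. It is a generic technique, not necessarily how Titan or its index backend evaluates WITHIN. The edge ids and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle(point, center_lat, center_lon, radius_km):
    """True if point = (lat, lon) lies within radius_km of the center."""
    return haversine_km(center_lat, center_lon, *point) <= radius_km


# Hypothetical edge locations as (latitude, longitude).
locations = {"e1": (37.98, 23.73), "e2": (39.0, 24.0)}
hits = [e for e, p in locations.items() if within_circle(p, 38, 24, 50)]
print(hits)  # ['e1']
```

Point e1 is roughly 24 km from the center. Point e2 is one degree of latitude away, about 111 km, so it falls outside the 50 km radius.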
I. Navigate Memory: sequential data access
II. Manage Concurrency: multiple users, units of work
Transactions: atomicity, isolation, consistency, distribution
Vertex Representation
[Figure: one row per vertex, keyed by the vertex id (e.g. 5). The row's cells (cell = column + value) hold the vertex's properties, out-edges, and in-edges, sorted in byte order; this ordering provides row indices for fast vertex-centric queries.]
Titan Storage Model
Adjacency list in one column family
Row key = vertex id
Each property and edge in one column
Edges are denormalized, i.e. stored twice (once in each endpoint's row)
Direction and label/key as column prefix
Use slice predicate for quick retrieval
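The storage model above can be sketched as rows of sorted columns. In this sketch, a slice predicate on a column-name prefix fetches, for example, only the outgoing battled edges of a vertex. This is a plain-Python model under stated assumptions. Tuples stand in for byte-encoded column names, since both compare element by element, and the column layout (category, direction, label, other vertex id) is hypothetical rather than Titan's exact encoding. `Row`, `put`, `slice`, and `add_edge` are illustrative names.

```python
from bisect import bisect_left

# Hypothetical column-name components; tuples stand in for sortable bytes.
PROPERTY, EDGE = 0, 1
OUT, IN = 0, 1


class Row:
    """One adjacency-list row: all properties and edges of one vertex."""

    def __init__(self, vertex_id):
        self.key = vertex_id  # row key = vertex id
        self.columns = []     # (column, value) pairs kept sorted by column

    def put(self, column, value):
        self.columns.insert(bisect_left(self.columns, (column,)), (column, value))

    def slice(self, prefix):
        """Slice predicate: all columns whose name starts with prefix."""
        start = bisect_left(self.columns, (prefix,))
        result = []
        for column, value in self.columns[start:]:
            if column[:len(prefix)] != prefix:
                break  # sorted order: no later column can match
            result.append((column, value))
        return result


rows = {vid: Row(vid) for vid in (1, 2, 3)}


def add_edge(out_id, label, in_id, props):
    """Denormalized: the edge is written into both endpoints' rows."""
    rows[out_id].put((EDGE, OUT, label, in_id), props)
    rows[in_id].put((EDGE, IN, label, out_id), props)


rows[1].put((PROPERTY, "name"), "Hercules")
add_edge(1, "battled", 2, {"time": 12})
add_edge(1, "father", 3, {})
print(rows[1].slice((EDGE, OUT, "battled")))  # [((1, 0, 'battled', 2), {'time': 12})]
print(rows[2].slice((EDGE, IN)))              # [((1, 1, 'battled', 1), {'time': 12})]
```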
Edge Representation
Column: label id + direction | sort key | Δ vertex id | Δ edge id
Value: signature properties | other properties
Columns and values are compressed serialized objects using variable long encoding.
Properties & edges are atomic.
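Storing ids as deltas pays off when combined with variable-length integer encoding, because small numbers then need only one or two bytes. The sketch below uses a generic LEB128-style varint (7 payload bits per byte, high bit set when more bytes follow) plus a zigzag mapping for signed deltas. These are stand-ins chosen for illustration; Titan's own variable long layout may differ. The two vertex ids are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """LEB128-style unsigned varint: 7 payload bits per byte, high bit = continue."""
    if n < 0:
        raise ValueError("unsigned varint requires n >= 0")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte: high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


def zigzag(n: int) -> int:
    """Map signed ints to unsigned (0, -1, 1, -2 -> 0, 1, 2, 3) so small negatives stay short."""
    return n * 2 if n >= 0 else -n * 2 - 1


# Hypothetical ids: the row's vertex and the adjacent vertex of one edge.
row_vertex_id = 1_000_000_004
other_vertex_id = 1_000_000_012
delta = other_vertex_id - row_vertex_id  # 8
print(len(encode_varint(other_vertex_id)), len(encode_varint(zigzag(delta))))  # 5 1
```

The absolute id needs five bytes, while the delta fits in one.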
Vertex-Centric Indices
Sort and index edges per vertex by sort key; the sort key can be composite.
Enables efficient focused traversals: only retrieve the edges that matter.
Uses push-down predicates for quick, index-driven retrieval.
[Figure: vertex v with two incoming fought edges and outgoing father, mother, and four battled edges (time: 1, 3, 5, 9). Each step narrows the result.]
v.query() (all edges of v)
v.query().direction(OUT) (the incoming fought edges drop out)
v.query().direction(OUT).labels('battled') (the four battled edges remain)
v.query().direction(OUT).labels('battled').has('time',T.lt,5) (time: 1 and time: 3 remain)
=
v.outE('battled').has('time',T.lt,5).inV
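A vertex-centric index turns the final query into a range lookup. If a vertex's edges are grouped by (direction, label) and sorted by the sort key, then `has('time', T.lt, 5)` becomes a prefix of one sorted group, found by binary search instead of a scan. The following plain-Python model illustrates this; `VertexCentricIndex` and `query_lt` are hypothetical names, and the adjacent vertices a to d, f, and m are placeholders for the unlabeled vertices in the figure.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class VertexCentricIndex:
    """Edges of one vertex grouped by (direction, label), each group sorted by sort key."""

    def __init__(self):
        self.groups = defaultdict(list)

    def add(self, direction, label, sort_key, other):
        insort(self.groups[(direction, label)], (sort_key, other))

    def query_lt(self, direction, label, upper):
        """direction(..).labels(label).has(key, T.lt, upper) as a binary-searched prefix."""
        group = self.groups[(direction, label)]
        return group[:bisect_left(group, (upper,))]  # stops before sort_key == upper


v = VertexCentricIndex()
# The four battled edges from the figure (times 1, 3, 5, 9); targets are placeholders.
for time, other in [(9, "d"), (1, "a"), (5, "c"), (3, "b")]:
    v.add("OUT", "battled", time, other)
v.add("OUT", "father", None, "f")  # other labels live in their own groups
v.add("OUT", "mother", None, "m")

# v.query().direction(OUT).labels('battled').has('time', T.lt, 5)
print(v.query_lt("OUT", "battled", 5))  # [(1, 'a'), (3, 'b')]
```

Only the battled group is touched; the father and mother edges are never read.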
Query Optimization
Consistency
On eventually consistent storage backends, Titan can enforce consistency constraints by configuring types with UniquenessConsistency.LOCK.
Titan acquires locks to avoid conflicting changes.
Acquiring locks is expensive: use with care.
The locking protocol is configurable.
Reasonably safe implementation, but not completely fail-safe.
Graph Partitioning
Token Ring: Titan assigns ids so that vertices map into an “optimal” token range.
Uses BOP (ByteOrderedPartitioner).
Lots of interesting questions for future work.
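Id assignment can control placement under a byte-ordered partitioner because that partitioner keeps row keys in byte order. Ids that share their high-order bits therefore land in one contiguous token range. The sketch below illustrates this with a hypothetical layout, placing a partition number in the high bits and a counter in the low bits. The bit widths, `make_id`, `row_key`, and `token_range` are all assumptions for illustration; Titan's real id layout is not described on the slide.

```python
PARTITION_BITS = 4  # hypothetical: 16 partitions
COUNTER_BITS = 32   # hypothetical per-partition counter width


def make_id(partition: int, counter: int) -> int:
    """Put the partition in the high bits so ids of one partition are contiguous."""
    assert 0 <= partition < 2 ** PARTITION_BITS
    assert 0 <= counter < 2 ** COUNTER_BITS
    return (partition << COUNTER_BITS) | counter


def row_key(vertex_id: int) -> bytes:
    """Big-endian, fixed width: byte order equals numeric order."""
    return vertex_id.to_bytes(8, "big")


def token_range(partition: int):
    """First and last row key a byte-ordered partitioner would see for a partition."""
    lo = row_key(make_id(partition, 0))
    hi = row_key(make_id(partition, 2 ** COUNTER_BITS - 1))
    return lo, hi


a = row_key(make_id(3, 17))
b = row_key(make_id(3, 99))
lo, hi = token_range(3)
print(lo <= a <= hi and lo <= b <= hi)  # True: same partition, same key range
print(row_key(make_id(4, 0)) > hi)      # True: next partition sorts after it
```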
Educating the Planet
[Figure: schema. Vertex types: Person (Student, Teacher), Course, Institution, Concept, Discussion, Comment, Share. Edge labels: enrolledIn, teaches, relatesTo, hasCourse, belongsTo, follows, author, references, hasComment, partOf.]
121 billion edges, 6.2 billion vertices
1 million universities, 3.5 billion students
Setup: placement group, hi1.4xl instances
Data Ingestion: 1.1 million edges/sec using batch mode
Throughput (80 m1.medium): 10,200 transactions/sec over 16 randomly chosen complex traversal templates
Titan Local Caching
Local Deployment
Application + Titan | Storage Backend
Application + Titan + Storage Backend (embedded)
Remote Deployment
Application + Titan | Storage Backend Cluster
Server Deployment II
Application | Cluster of (2 JVMs): Titan + Rexster, Storage Backend (via localhost)
Titan Ecosystem
Native Blueprints Implementation
Gremlin Query Language
Rexster Server: any Titan graph can be exposed as a REST endpoint
Faunus: Batch Graph Analytics
Faunus Features
Hadoop-based graph computing framework
Graph analytics
Breadth-first traversals
Global graph computations
Batch processing of big graph data
Faunus Architecture
g._()
Faunus Work Flow
Compressed HDFS: graphs stored in sequence files, with variable-length encoding and prefix compression
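Prefix compression exploits the fact that neighboring sorted keys share leading bytes. A common form is front coding, where each key is stored as (length of prefix shared with the previous key, remaining suffix). The following plain-Python illustration of that generic technique is not a description of Faunus's exact on-disk format; the keys are hypothetical.

```python
def prefix_compress(sorted_keys):
    """Front coding: store (shared-prefix length, remaining suffix) per key."""
    out, prev = [], b""
    for key in sorted_keys:
        n = 0
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out


def prefix_decompress(entries):
    """Rebuild each key from the previous key's prefix plus the stored suffix."""
    keys, prev = [], b""
    for n, suffix in entries:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"follows:0001", b"follows:0002", b"follows:0117", b"pushed:0003"]
enc = prefix_compress(keys)
print(enc)  # [(0, b'follows:0001'), (11, b'2'), (9, b'117'), (0, b'pushed:0003')]
```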
Degree Distribution: GitHub Network
g.V.sideEffect{it.degree = it.out('follows').count()}.degree.groupCount
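The query stores each vertex's out-degree over follows edges and then counts how many vertices share each degree. The plain-Python sketch below runs that two-stage computation on a hypothetical five-user graph, with `collections.Counter` playing the role of groupCount.

```python
from collections import Counter

# Hypothetical 'follows' edges: (follower, followed).
FOLLOWS = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("dan", "cat"), ("eve", "ann")]
USERS = ["ann", "bob", "cat", "dan", "eve"]

# sideEffect{it.degree = it.out('follows').count()}: out-degree per user
degree = {u: sum(1 for src, _ in FOLLOWS if src == u) for u in USERS}

# .degree.groupCount: number of users having each degree
distribution = Counter(degree.values())
print(sorted(distribution.items()))  # [(0, 1), (1, 3), (2, 1)]
```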
The degree distribution follows a power law: P(k) ~ k^(-γ), with γ = 2.2
Global Recommendations
gremlin> g.E.has('label','pushed','to').keep.V.out('pushed').out('to').in('to').in('pushed').sideEffect('{it.score = it.pathCounter}').score.order(F.decr,'name')
# Top 5:
Jippi 60892182927
garbear 30095282886
FakeHeal 30038040349
brianchandotcom 24684133382
nyarla 15230275746
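The recommendation score comes from path counting. Every vertex starts with one path, and each step moves the path counts along matching edges, summing them where paths converge. A user's final count is the number of user, push, repository, push, user paths that end at that user. Below is a plain-Python sketch of that idea on a hypothetical toy graph. The user, push, and repository names are invented, and the `step` helper is an illustration rather than Faunus's implementation.

```python
from collections import Counter

# Hypothetical toy graph: user -pushed-> push event -to-> repository.
PUSHED = [("alice", "p1"), ("alice", "p2"), ("bob", "p3"), ("carol", "p4")]
TO = [("p1", "repoA"), ("p2", "repoB"), ("p3", "repoA"), ("p4", "repoB")]


def step(counts, edges, reverse=False):
    """Move path counts along edges (an out-step, or an in-step when reverse=True)."""
    nxt = Counter()
    for src, dst in edges:
        if reverse:
            src, dst = dst, src
        if src in counts:
            nxt[dst] += counts[src]  # paths arriving at dst accumulate
    return nxt


# Start with one path per vertex.
vertices = {v for e in PUSHED + TO for v in e}
counts = Counter({v: 1 for v in vertices})
counts = step(counts, PUSHED)                # out('pushed')
counts = step(counts, TO)                    # out('to')
counts = step(counts, TO, reverse=True)      # in('to')
counts = step(counts, PUSHED, reverse=True)  # in('pushed')

# order(F.decr, 'name'): highest score first
ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranking)  # [('alice', 4), ('bob', 2), ('carol', 2)]
```

Working with counts rather than explicit paths keeps each step linear in the number of edges. This is what makes the approach practical for very large path totals like the scores above.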
Big Picture: Closing Thoughts
Why Graph Databases?
[Figure: data models ordered by value in relationships, from low to high: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph.]
The value of data is proportional to the number of meaningful relationships.
Social Networks, Recommendations, Path Finding, Graph Search, Knowledge Graph
Economy: Markets & Risks
Health: Health & Medicine
June 14th, 2012: Alpha Release. Experimental release of a distributed, open-source graph database
September 2012: Titan 0.1.0. First stable release
December 2012: Titan 0.2.0. Rewrite of core
March 2013: Titan 0.3.0. Indexing & ElasticSearch
November 2013: Titan 0.4.0. Performance, feature extension, Fulgora
Faunus Release
What's Coming
Creating and updating indexes: vertex-centric indexes, graph indexes
Log integration
Tighter Titan-Faunus integration
Graph partitioning
Declarative query answering
Usability improvements
Aurelius Graph Cluster
OLTP (Titan) and OLAP (Faunus on Hadoop MapReduce)
Analysis results back into Titan
Apache 2
g.V.label.groupCount
g.v(101).out
titan.thinkaurelius.com
faunus.thinkaurelius.com
@AURELIUSGRAPHS