Jason Plurad • [email protected] • @pluradj
IBM Open Technology • Apache TinkerPop
January 14, 2017 • Graph Day Texas • #ddtx17 #gdtx17
Enabling Multimodel Graphs with Apache TinkerPop™
Agenda
Apache TinkerPop
Multimodel Graphs
Graph Traversal Strategies
Provider Optimizations
On the Horizon
2 @pluradj #ddtx17 #gdtx17
Apache TinkerPop
§ Open source, vendor-agnostic, graph computing framework
§ Gremlin graph traversal language
Apache TinkerPop™
Maintainer: Apache Software Foundation
License: Apache
Latest Release: 3.2.3 (October 2016)
https://tinkerpop.apache.org
Multimodel Database
§ Graphs often are not alone in a data application
§ Multimodel: Combining capabilities of different database types
§ Choose the right tool for the job
§ Use graphs for highly connected data
§ Single persistence layer
OrientDB®
Maintainer: OrientDB
License: Apache
Latest Release: 2.2.14 (December 2016)
https://orientdb.com
Multimodel Platform
§ Graphs often are not alone in a data application
§ Multimodel: Combining capabilities of different database types
§ Choose the right tool for the job
§ Use graphs for highly connected data
§ Take advantage of existing storage architectures
DataStax Enterprise Graph
Maintainer: DataStax
License: Commercial
Latest Release: 5.0.5 (December 2016)
https://datastax.com
Gremlin Machine: Everything Is a Traversal
§ Traversal
§ Step
§ Traverser
§ Traversal Source
§ Traversal Strategy
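To make the first three terms concrete, here is a minimal toy sketch in Python. It is not TinkerPop code: the `GRAPH` data, the `Traverser` class, and the `V`, `out`, `has_id`, and `run` helpers are all hypothetical names chosen for illustration. A traversal is modeled as a list of steps, each step maps a stream of traversers to a new stream, and a traverser is just a pointer to a location in the graph. The traversal source and strategies are left out of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

# Hypothetical toy graph: vertex id -> outgoing neighbours.
GRAPH: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}


@dataclass(frozen=True)
class Traverser:
    """Toy traverser: a pointer to the current location in the graph."""
    location: str


# A step consumes a stream of traversers and produces a new stream.
Step = Callable[[Iterator[Traverser]], Iterator[Traverser]]


def V() -> Step:
    """Start step: ignore the input and emit one traverser per vertex."""
    def step(_: Iterator[Traverser]) -> Iterator[Traverser]:
        return (Traverser(v) for v in GRAPH)
    return step


def out() -> Step:
    """Map step: move every traverser to each outgoing neighbour."""
    def step(traversers: Iterator[Traverser]) -> Iterator[Traverser]:
        for t in traversers:
            for neighbour in GRAPH[t.location]:
                yield Traverser(neighbour)
    return step


def has_id(vertex_id: str) -> Step:
    """Filter step: keep only traversers sitting on the given vertex."""
    def step(traversers: Iterator[Traverser]) -> Iterator[Traverser]:
        return (t for t in traversers if t.location == vertex_id)
    return step


def run(traversal: List[Step]) -> List[str]:
    """Execute a traversal (a list of steps) by chaining the streams."""
    stream: Iterator[Traverser] = iter(())
    for step in traversal:
        stream = step(stream)
    return [t.location for t in stream]


# Roughly the shape of "start at a, walk two hops out".
result = run([V(), has_id("a"), out(), out()])
```

In this toy graph only the path a, b, c is two hops long, so `result` holds a single location.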
explain()
§ Details on how a traversal is compiled into a final execution plan
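The idea behind an explain-style report can be sketched with a toy model: start from the original step list, apply each strategy in turn, and record the traversal after every rewrite. The code below is an illustration only, not TinkerPop's implementation. The `explain` function, the string-based step lists, and the order in which the two rewrites run are assumptions; the two rewrites themselves mirror the AdjacentToIncident and IncidentToAdjacent descriptions in this deck.

```python
from typing import Callable, List, Tuple

Steps = List[str]
Strategy = Tuple[str, Callable[[Steps], Steps]]


def incident_to_adjacent(steps: Steps) -> Steps:
    """Collapse an outE step directly followed by inV into a single out."""
    result: Steps = []
    for step in steps:
        if step == "inV" and result and result[-1] == "outE":
            result[-1] = "out"
        else:
            result.append(step)
    return result


def adjacent_to_incident(steps: Steps) -> Steps:
    """Rewrite out directly followed by count into outE, count."""
    result = list(steps)
    for i in range(len(result) - 1):
        if result[i] == "out" and result[i + 1] == "count":
            result[i] = "outE"
    return result


def explain(steps: Steps, strategies: List[Strategy]) -> List[Tuple[str, Steps]]:
    """Record the traversal before, during, and after strategy application."""
    report = [("Original Traversal", list(steps))]
    current = list(steps)
    for name, apply in strategies:
        current = apply(current)
        report.append((name, list(current)))
    report.append(("Final Traversal", list(current)))
    return report


report = explain(
    ["V", "outE", "inV", "count"],
    [
        ("IncidentToAdjacent", incident_to_adjacent),
        ("AdjacentToIncident", adjacent_to_incident),
    ],
)
for name, steps in report:
    print(f"{name:<20} {steps}")
```

Reading the report line by line shows which strategy changed the plan and which left it alone, which is the main use of such a report when debugging a slow traversal.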
withStrategies() / withoutStrategies()
§ Add or remove specific traversal strategies to a traversal source
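A toy sketch of the pattern, under the assumption that a traversal source is an immutable holder of strategies and that adding or removing a strategy returns a new source. The `TraversalSource` and `Strategy` classes, the snake_case method names, the `MUTATING_STEPS` set, and the name-based removal are all hypothetical simplifications written for this example, not the TinkerPop API. The example strategy imitates the ReadOnly verification listed later in this deck.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Steps = Tuple[str, ...]

# Assumption: a toy list of step names treated as graph mutations.
MUTATING_STEPS = {"addV", "addE", "property", "drop"}


def read_only(steps: Steps) -> Steps:
    """Toy ReadOnly check: reject any traversal containing a mutating step."""
    for step in steps:
        if step in MUTATING_STEPS:
            raise ValueError(f"mutating step not allowed: {step}")
    return steps


@dataclass(frozen=True)
class Strategy:
    name: str
    apply: Callable[[Steps], Steps]


@dataclass(frozen=True)
class TraversalSource:
    strategies: Tuple[Strategy, ...] = ()

    def with_strategies(self, *added: Strategy) -> "TraversalSource":
        """Return a new source with the given strategies added."""
        names = {s.name for s in added}
        kept = tuple(s for s in self.strategies if s.name not in names)
        return TraversalSource(kept + added)

    def without_strategies(self, *names: str) -> "TraversalSource":
        """Return a new source with the named strategies removed."""
        return TraversalSource(
            tuple(s for s in self.strategies if s.name not in names)
        )

    def compile(self, steps: Steps) -> Steps:
        """Apply every registered strategy to the step list."""
        for strategy in self.strategies:
            steps = strategy.apply(steps)
        return steps


g = TraversalSource()
locked = g.with_strategies(Strategy("ReadOnly", read_only))
unlocked = locked.without_strategies("ReadOnly")

try:
    locked.compile(("addV",))
    rejected = False
except ValueError:
    rejected = True

accepted = unlocked.compile(("addV",))
```

Because each call returns a new source, the original `g` is untouched, so one application can keep a locked-down source and an unrestricted one side by side.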
Traversal Strategy Types
1. Decoration
2. Optimization
3. Provider Optimization
4. Finalization
5. Verification
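The numbering above can be read as the order in which the categories are applied. A small sketch, assuming each strategy is tagged with its category and that the relative order within a category is preserved; the `order_strategies` helper and the (name, category) tuples are hypothetical:

```python
from typing import List, Tuple

# Category order as listed on the slide.
CATEGORY_ORDER = [
    "Decoration",
    "Optimization",
    "Provider Optimization",
    "Finalization",
    "Verification",
]


def order_strategies(strategies: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (name, category) pairs by category; sorted() is stable, so
    strategies within one category keep their registration order."""
    rank = {category: i for i, category in enumerate(CATEGORY_ORDER)}
    return sorted(strategies, key=lambda s: rank[s[1]])


registered = [
    ("ReadOnly", "Verification"),
    ("IncidentToAdjacent", "Optimization"),
    ("Partition", "Decoration"),
    ("Profile", "Finalization"),
    ("AdjacentToIncident", "Optimization"),
]
ordered = order_strategies(registered)
names = [name for name, _ in ordered]
```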
Decoration
§ Application-level feature that can be embedded into the traversal logic
§ Event: raise events for graph mutations
§ Partition: use partition names to restrict element reads/writes
§ Sack: use a sack to store data that gets updated as traversers split/merge
§ Subgraph: restrict element reads based on traversals
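As an illustration of the Partition idea, the toy rewrite below decorates a step list so that reads are filtered by partition and writes are tagged with one. It is a simplified sketch, not the TinkerPop strategy: the `partition` function, the tuple-based step encoding, and the `_partition` key are assumptions, and only start steps and add steps are handled.

```python
from typing import List, Sequence, Tuple

Step = Tuple  # e.g. ("has", "name", "marko")


def partition(
    steps: Sequence[Step],
    key: str,
    write_partition: str,
    read_partitions: Sequence[str],
) -> List[Step]:
    """Decorate a traversal: filter reads and tag writes by partition."""
    decorated: List[Step] = []
    for step in steps:
        decorated.append(step)
        if step[0] in ("V", "E"):
            # Reads only see elements in one of the readable partitions.
            decorated.append(("has", key, tuple(read_partitions)))
        elif step[0] in ("addV", "addE"):
            # Writes are stamped with the write partition.
            decorated.append(("property", key, write_partition))
    return decorated


read_plan = partition(
    [("V",), ("has", "name", "marko")], "_partition", "a", ["a", "b"]
)
write_plan = partition([("addV", "person")], "_partition", "a", ["a", "b"])
```

The application code never mentions the partition key; the decoration adds it, which is what makes this an application-level feature embedded in the traversal logic.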
Finalization
§ Enforce final adjustment, cleanup, or analysis required before executing the traversal
§ MatchAlgorithm: used in match() step to reorder execution plan
– CountMatchAlgorithm: largest traversal reduction goes first (default)
– GreedyMatchAlgorithm: traversers drain in order
§ Profile: injects profile steps into traversal to measure runtime/counts
Verification
§ Prevent traversals that are not legal for the application or traversal engine
§ LambdaRestriction: Do not allow use of lambdas
§ ReadOnly: Do not allow graph mutations
§ StandardVerification: Vertex computing steps must be executed by a graph computer. Reducing barrier steps cannot immediately follow repeat steps.
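A verification strategy does not rewrite anything; it inspects the traversal and fails fast. The toy check below imitates the LambdaRestriction idea by rejecting any step that carries a callable argument. The `lambda_restriction` function and the tuple-based step encoding are hypothetical and only meant to show the shape of such a check.

```python
from typing import List, Tuple

Step = Tuple  # (step name, *arguments)


def lambda_restriction(steps: List[Step]) -> List[Step]:
    """Raise if any step argument is a callable (a stand-in for a lambda)."""
    for name, *args in steps:
        if any(callable(arg) for arg in args):
            raise ValueError(f"lambda not allowed in step: {name}")
    return steps


declarative = [("V",), ("has", "age", 30)]
with_lambda = [("V",), ("filter", lambda v: v["age"] > 30)]

# A purely declarative traversal passes through unchanged.
checked = lambda_restriction(declarative)

try:
    lambda_restriction(with_lambda)
    lambda_rejected = False
except ValueError:
    lambda_rejected = True
```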
Optimization
§ A more efficient way to express the traversal using TinkerPop steps only
§ AdjacentToIncident: replace out().count() with outE().count()
§ IncidentToAdjacent: replace outE().inV() with out()
§ Connective: rewrites binary conjunction (and/or steps)
§ FilterRanking: reorders filter and order steps to prioritize steps that will keep traversers small and bulkable
§ InlineFilter: removes parent filters when child traversals are pure filters
§ PathRetraction: traversers shed unneeded path information, reducing path footprint, increasing likelihood of bulking
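The InlineFilter description can be sketched as a rewrite over a nested step list: when a parent filter wraps a child traversal made only of filter steps, the parent is dropped and the children are spliced into the main traversal. This is a toy model, not the TinkerPop implementation; the `inline_filter` function, the tuple encoding, and the `FILTER_STEPS` set are assumptions.

```python
from typing import List, Tuple

Step = Tuple  # ("filter", [child steps]) or (name, *arguments)

# Assumption: step names treated as pure filters in this toy model.
FILTER_STEPS = {"has", "hasLabel", "is"}


def inline_filter(steps: List[Step]) -> List[Step]:
    """Replace filter(child) with the child steps when all are pure filters."""
    result: List[Step] = []
    for step in steps:
        is_parent_filter = step[0] == "filter"
        if is_parent_filter and all(c[0] in FILTER_STEPS for c in step[1]):
            result.extend(step[1])
        else:
            result.append(step)
    return result


original = [
    ("V",),
    ("filter", [("has", "name", "marko")]),           # pure filter child
    ("filter", [("out",), ("has", "age", 29)]),       # child moves, keep parent
]
optimized = inline_filter(original)
```

Only the first parent filter is removed here; the second one contains an `out` step, so inlining it would change where the traverser ends up.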
Sqlg
§ Implementation of Apache TinkerPop over RDBMS
– PostgreSQL
– HSQLDB (HyperSQL Database)
– H2 Database Engine
§ Optimizes Gremlin by reducing the number of calls to the RDBMS
§ Analyze the steps and where possible combine them into a single SqlgGraphStepCompiled or SqlgVertexStepCompiled
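The effect of combining steps can be illustrated with a small, self-contained experiment using Python's built-in sqlite3 module. This is not Sqlg's code or schema: the `vertex` and `edge` tables, their columns, and both functions are hypothetical. The sketch compares a naive plan that issues one query per traverser per step with a compiled plan that answers a two-hop traversal in a single join.

```python
import sqlite3

# Hypothetical schema, chosen for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vertex (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (out_id INTEGER, in_id INTEGER);
    INSERT INTO vertex VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO edge VALUES (1, 2), (1, 3), (2, 4), (3, 4);
""")


def two_hops_step_by_step(start_id):
    """Naive plan: one database call per traverser per out step."""
    calls = 0
    frontier = [start_id]
    for _ in range(2):
        next_frontier = []
        for vertex_id in frontier:
            rows = conn.execute(
                "SELECT in_id FROM edge WHERE out_id = ?", (vertex_id,)
            ).fetchall()
            calls += 1
            next_frontier.extend(row[0] for row in rows)
        frontier = next_frontier
    return sorted(frontier), calls


def two_hops_compiled(start_id):
    """Compiled plan: both out steps folded into a single join."""
    rows = conn.execute(
        "SELECT e2.in_id FROM edge e1 "
        "JOIN edge e2 ON e2.out_id = e1.in_id "
        "WHERE e1.out_id = ?",
        (start_id,),
    ).fetchall()
    return sorted(row[0] for row in rows), 1


naive_result, naive_calls = two_hops_step_by_step(1)
compiled_result, compiled_calls = two_hops_compiled(1)
```

Both plans return the same vertices, but the naive plan's call count grows with the number of traversers at each hop while the compiled plan stays at one call.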
Sqlg
Maintainer: Pieter Martin
License: MIT
Latest Release: 1.3.2 (November 2016)
https://github.com/pietermartin/sqlg
TitanDB
§ Scalable graph database distributed on multi-machine clusters
§ Pluggable storage backends
– Apache Cassandra®
– Apache HBase®
§ Pluggable index backends
– Apache Solr™
– Elasticsearch™
TitanDB™
Maintainer: DataStax
License: Apache
Latest Release: 1.0 (November 2015)
https://titandb.io
TitanDB + ScyllaDB storage backend
§ Scylla is a drop-in replacement for Apache Cassandra 2.1
– Higher throughput, lower latency
– C++ implementation, I/O scheduler
§ Scylla on IBM Compose (beta)
– https://www.compose.com/scylladb
§ Titan 1.0 compatibility starting with Scylla 1.3
ScyllaDB™
Maintainer: ScyllaDB
License: AGPL
Latest Release: 1.5 (December 2016)
https://scylladb.com
IBM Graph
§ Fully-managed, Apache TinkerPop compatible OLTP graph database
§ Focus on your data, not on install and operations
§ #sleepMore
IBM Graph
Maintainer: IBM
License: Commercial
Latest Release: GA (July 2016)
https://ibm.biz/IBMGraph
Unipop
§ Data federation and virtualization engine
– Elasticsearch®
– JDBC
§ Models your data as a "virtual" graph
§ Uses Gremlin as graph query language
Unipop
Maintainer: Sean Barzilay, Ran Magen
License: Apache
Latest Release: 0.2 (September 2016)
https://github.com/unipop-graph/unipop
Apache S2Graph (incubating)
§ A graph database designed for distributed and scalable management of highly interconnected data at web scale
§ Built with Apache HBase, Scala
§ S2Graph powers 20+ services in production at Kakao (mobile messaging app)
§ Apache TinkerPop support coming soon [JIRA S2GRAPH-72]
Apache S2Graph (incubating)
Maintainer: Apache Software Foundation
License: Apache
Latest Release: 0.1 (October 2016)
https://s2graph.incubator.apache.org
HGraphDB
§ Apache HBase as an Apache TinkerPop Graph Database
§ Allows user-supplied ids
§ Integration with Apache Giraph for OLAP
HGraphDB
Maintainer: Robert Yokota
License: Apache
Latest Release: 0.4.12 (January 2017)
https://github.com/rayokota/hgraphdb
JanusGraph
§ Fork of TitanDB code base
§ Scalable graph database distributed on multi-machine clusters with pluggable storage and indexing
§ Vendor-neutral, open community with open governance
JanusGraph™
Maintainer: Linux Foundation
License: Apache
First Release: Planned 1Q 2017
https://janusgraph.org
Acknowledgements
§ The Crew from Aurelius
§ The Apache Software Foundation
§ The Linux Foundation
§ Ketrina Yim