Introduction to TitanDB
Bharat Singh
Software Consultant
Knoldus Software LLP.
Agenda
● Graph Database
– What is Graph Database
– Need for Graph Database
● Titan DB
– Why Titan DB
– CAP theorem
– Architecture overview
– Future of TitanDB
● Apache TinkerPop
– What is Apache TinkerPop
– Need for Apache TinkerPop
What is Graph Database
● A database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
● Most graph databases are NoSQL in nature
● Many store their data in an underlying key-value store or document-oriented database.
● Relationships between values are stored as first-class citizens.
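The node, edge and property model described above can be sketched in a few lines of Python. This is only an in-memory illustration; the class and field names (`Vertex`, `Edge`, `PropertyGraph`, `out_v`, `in_v`) are assumptions made for this example and are not taken from Titan or any other product.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int  # id of the source vertex
    in_v: int   # id of the target vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical minimal property graph: edges are stored as first-class objects."""

    def __init__(self):
        self.vertices = {}   # vertex id -> Vertex
        self.out_edges = {}  # vertex id -> list of outgoing Edge objects

    def add_vertex(self, id, label, **props):
        vertex = Vertex(id, label, props)
        self.vertices[id] = vertex
        self.out_edges[id] = []
        return vertex

    def add_edge(self, out_v, label, in_v, **props):
        edge = Edge(label, out_v, in_v, props)
        self.out_edges[out_v].append(edge)
        return edge


graph = PropertyGraph()
graph.add_vertex(1, "person", name="alice")
graph.add_vertex(2, "person", name="bob")
graph.add_edge(1, "knows", 2, since=2014)
```

Note that the relationship carries its own label and properties (`since=2014`) instead of being implied by a foreign key.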
Need for Graph Database
● Data is increasingly connected and shared across multiple applications on the web
● Highly connected data is easier to query when it is stored in a graph structure
● Traversing to adjacent neighbours removes the need for multiple join operations
● It allows the use of many graph algorithms that help in optimization
● It allows visualizing data, inferring hidden relationships and deriving predictions from the data
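To illustrate the point about joins, here is a small sketch of a "friends of friends" query over adjacency lists. Each hop is a direct lookup from a node to its neighbours, where a relational schema would typically need one self-join per hop. The data and the function name `friends_of_friends` are made up for this example.

```python
# Adjacency lists: each person maps directly to the people they know.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": [],
}


def friends_of_friends(person):
    """Return people two hops away who are not already direct friends."""
    direct = set(knows[person])
    result = set()
    for friend in direct:
        for candidate in knows[friend]:
            if candidate != person and candidate not in direct:
                result.add(candidate)
    return result


fof = friends_of_friends("alice")
print(sorted(fof))  # ['dave', 'erin']
```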
Why Titan DB
● Support for very large graphs. Titan graphs scale with the number of machines in the cluster.
● Support for ACID properties and eventual consistency.
● Support for very many concurrent transactions and operational graph processing.
● Titan’s transactional capacity scales with the number of machines in the cluster, and it answers complex traversal queries on huge graphs in milliseconds.
● Vertex-centric indices provide vertex-level querying to solve the infamous super node problem.
● Provides an optimized disk representation to allow for efficient use of storage and speed of access.
● Open source with the liberal Apache 2 license.
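The idea behind vertex-centric indices can be sketched as follows: each vertex keeps its incident edges grouped by label and sorted by a key, so a query on a super node touches only the matching slice and does not scan every edge. This is a simplified in-memory illustration, not Titan's actual implementation; the class `Vertex` and the method names are assumptions made for the example.

```python
import bisect
from collections import defaultdict


class Vertex:
    """Hypothetical vertex with a per-label, sorted edge index."""

    def __init__(self, name):
        self.name = name
        # edge label -> sorted list of (sort_key, neighbour) tuples
        self._index = defaultdict(list)

    def add_edge(self, label, sort_key, neighbour):
        # Keep each label's edge list sorted by sort_key on insertion.
        bisect.insort(self._index[label], (sort_key, neighbour))

    def edges(self, label, lo, hi):
        """Return neighbours over `label` edges with lo <= sort_key < hi."""
        entries = self._index[label]
        # A 1-tuple (k,) sorts before every (k, neighbour), so these two
        # binary searches bound exactly the entries with lo <= key < hi.
        start = bisect.bisect_left(entries, (lo,))
        end = bisect.bisect_left(entries, (hi,))
        return [neighbour for _, neighbour in entries[start:end]]


# A "super node" with many followers, each edge keyed by a timestamp.
hub = Vertex("hub")
for t in range(1000):
    hub.add_edge("followedBy", t, f"user{t}")
hub.add_edge("posted", 5, "post1")

recent = hub.edges("followedBy", 997, 1000)
print(recent)  # ['user997', 'user998', 'user999']
```

The range query runs two binary searches over one label's edges, so its cost depends on the size of the result and not on the total degree of the vertex.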
Features of Titan DB
● Support for various storage backends:
– Apache Cassandra
– Apache HBase
– Oracle BerkeleyDB
● Support for global graph data analytics, reporting, and ETL through integration with big data platforms:
– Apache Spark
– Apache Giraph
– Apache Hadoop
● Support for geo, numeric range, and full-text search via:
– ElasticSearch
– Solr
– Lucene
● Native integration with the TinkerPop graph stack:
– Gremlin graph query language
– Gremlin graph server
– Gremlin applications
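The storage and index backends listed above are chosen through configuration properties. The sketch below renders a small properties file from Python. The key names (`storage.backend`, `storage.hostname`, `index.search.backend`, `index.search.hostname`) and values follow the Titan 1.0 configuration reference as best recalled and should be checked against the documentation for the version in use; the helper `render_properties` is purely illustrative.

```python
def render_properties(settings):
    """Hypothetical helper: render a dict as Java-style .properties text."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Assumed example: Cassandra for storage, Elasticsearch for search indexing.
titan_settings = {
    "storage.backend": "cassandra",
    "storage.hostname": "127.0.0.1",
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "127.0.0.1",
}

config_text = render_properties(titan_settings)
print(config_text)
```

Switching to another backend from the list would, under the same assumptions, mean changing only the `storage.backend` or `index.search.backend` value.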
CAP Theorem
● CAP Theorem
– C=Consistency
– A=Availability
– P=Partition tolerance
● HBase favours consistency
– At expense of yield
– i.e. non completed requests
● Cassandra favours availability
– At expense of harvest
– i.e. completeness of answer
● Berkeley DB is non distributed
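The yield and harvest trade-off above can be made concrete with a toy simulation. Yield is the fraction of requests that complete; harvest is the fraction of the full data set reflected in each completed answer. The sketch is a deliberate simplification and does not model how HBase or Cassandra actually behave; the node names, the `CP`/`AP` mode flags and both functions are assumptions made for this example.

```python
# Three nodes, each holding one part of the data.
nodes = {"n1": {"a": 1}, "n2": {"b": 2}, "n3": {"c": 3}}


def query(reachable, mode):
    """Answer a 'read everything' request while only `reachable` nodes respond.

    mode="CP": refuse unless every node is reachable (favours consistency).
    mode="AP": always answer with whatever data can be reached.
    Returns the merged data, or None when the request is refused.
    """
    if mode == "CP" and set(reachable) != set(nodes):
        return None
    merged = {}
    for name in reachable:
        merged.update(nodes[name])
    return merged


def measure(mode, requests):
    """Return (yield, harvest) for a list of per-request reachable node sets."""
    total_items = sum(len(part) for part in nodes.values())
    answers = [query(reachable, mode) for reachable in requests]
    completed = [a for a in answers if a is not None]
    yield_ = len(completed) / len(answers)
    if completed:
        harvest = sum(len(a) for a in completed) / (len(completed) * total_items)
    else:
        harvest = 0.0
    return yield_, harvest


# Four requests; during the middle two, node n3 is cut off by a partition.
requests = [["n1", "n2", "n3"], ["n1", "n2"], ["n1", "n2"], ["n1", "n2", "n3"]]
cp = measure("CP", requests)
ap = measure("AP", requests)
print("CP yield/harvest:", cp)
print("AP yield/harvest:", ap)
```

In this run the consistency-favouring mode completes only half of the requests but every completed answer is whole, while the availability-favouring mode completes all requests with some answers missing the unreachable node's data.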
Architecture overview of Titan DB
Future of TitanDB
● Aurelius is the startup behind Titan, an open source graph database
● DataStax, the company that delivers Apache Cassandra™ to the enterprise, acquired Aurelius on Feb 3rd, 2015
● The Aurelius team will join DataStax to build DataStax Enterprise (DSE) Graph, adding graph database capabilities into DSE alongside Apache Cassandra
What is Apache TinkerPop
● A Graph processing system, currently under Apache incubation
● Has TinkerPop3 Structure API
– Graph, Element, Property
● Has TinkerPop3 Process API
– TraversalSource, GraphComputer
● Gremlin query language
– A scripting language for graph traversal and mutation
● REST API
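Gremlin traversals are written as chains of steps, for example `g.V().has('name', 'alice').out('knows').values('name')`. The sketch below imitates that step-chaining style in plain Python over an in-memory dictionary. It is a toy and not the TinkerPop API; the `Traversal` class, the `V` function and the graph layout are assumptions made for this example.

```python
class Traversal:
    """Toy imitation of Gremlin-style step chaining (not the TinkerPop API)."""

    def __init__(self, graph, vertices):
        self.graph = graph
        self.vertices = list(vertices)

    def has(self, key, value):
        # Keep only vertices whose property `key` equals `value`.
        kept = [v for v in self.vertices
                if self.graph["props"][v].get(key) == value]
        return Traversal(self.graph, kept)

    def out(self, label):
        # Move along outgoing edges with the given label.
        reached = [dst
                   for v in self.vertices
                   for (edge_label, dst) in self.graph["edges"].get(v, [])
                   if edge_label == label]
        return Traversal(self.graph, reached)

    def values(self, key):
        # Read one property from each current vertex.
        return [self.graph["props"][v][key] for v in self.vertices]


def V(graph):
    """Start a traversal at every vertex of the graph."""
    return Traversal(graph, graph["props"].keys())


graph = {
    "props": {1: {"name": "alice"}, 2: {"name": "bob"}, 3: {"name": "carol"}},
    "edges": {1: [("knows", 2), ("knows", 3)], 2: [("knows", 3)]},
}

names = V(graph).has("name", "alice").out("knows").values("name")
print(names)  # ['bob', 'carol']
```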
Need for Apache TinkerPop
Dealing with such complex databases requires a well-implemented API from the vendor. But using a vendor-specific API makes migrating to another database very difficult.
The solution is provided by Apache TinkerPop, which offers a common, vendor-neutral API.
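The value of a vendor-neutral API can be sketched with a small abstraction layer: application code is written once against an interface, and the storage engine behind it can be swapped. The interface and both backends below are hypothetical stand-ins invented for this example; they are not TinkerPop classes.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical vendor-neutral interface the application codes against."""

    @abstractmethod
    def add_edge(self, src, dst):
        ...

    @abstractmethod
    def neighbours(self, src):
        ...


class DictBackend(GraphBackend):
    """Stand-in for one vendor: stores adjacency lists."""

    def __init__(self):
        self._adjacency = {}

    def add_edge(self, src, dst):
        self._adjacency.setdefault(src, []).append(dst)

    def neighbours(self, src):
        return list(self._adjacency.get(src, []))


class EdgeListBackend(GraphBackend):
    """Stand-in for another vendor: stores a flat list of edges."""

    def __init__(self):
        self._edges = []

    def add_edge(self, src, dst):
        self._edges.append((src, dst))

    def neighbours(self, src):
        return [d for s, d in self._edges if s == src]


def load_and_query(backend):
    # Application logic: identical regardless of the backend in use.
    backend.add_edge("alice", "bob")
    backend.add_edge("alice", "carol")
    return backend.neighbours("alice")


results = [load_and_query(b) for b in (DictBackend(), EdgeListBackend())]
print(results)  # [['bob', 'carol'], ['bob', 'carol']]
```

Because `load_and_query` depends only on the interface, moving from one backend to the other requires no change to the application logic.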
Thank You