moumie-soulemane
Outline
Introduction
Graph in real world
What is a Graph?
Data Model
Graph in RDBMS
Graph-based modeling
Graph Databases
Graph Query Languages
Demo with Neo4j and OrientDB
Conclusions
Introduction
We live in a connected world. Information does not come in isolated pieces; it forms rich, connected domains all around us.
The interconnectivity of data is an important aspect.
Early adopters of graph technology re-imagined their businesses around the value of data relationships.
These companies quickly grew from unknown startups into large corporations.
Examples: Google, LinkedIn, PayPal, Facebook, Twitter.
Data model
Definition: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world.
Ref: 7
Graph in RDBMS?: Issues
While storing a graph in a relational database is simple, querying it, and particularly traversing it, can be time-inefficient because of the number of joins its complex queries may require.
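The join problem can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (person, knows) and the sample rows are assumptions made up for this illustration; they do not come from the slides.

```python
import sqlite3

# Hypothetical schema: a node table and an edge table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE knows (src INTEGER REFERENCES person(id),
                        dst INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4);
""")

# A fixed-depth traversal: every extra hop adds one more self-join on the edge table.
three_hops = conn.execute("""
    SELECT p.name
    FROM knows k1
    JOIN knows k2 ON k2.src = k1.dst
    JOIN knows k3 ON k3.src = k2.dst
    JOIN person p ON p.id = k3.dst
    WHERE k1.src = ?
""", (1,)).fetchall()

# A traversal of unknown depth needs a recursive query instead.
reachable = conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT dst FROM knows WHERE src = ?
        UNION
        SELECT k.dst FROM knows k JOIN reach r ON k.src = r.id
    )
    SELECT p.name FROM reach JOIN person p ON p.id = reach.id
    ORDER BY p.name
""", (1,)).fetchall()

print(three_hops)
print(reachable)
```

Storing the graph took two small tables, but the three-hop question already needs three joins on the same edge table, which is the cost the slide refers to.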
Graph Database?: Definition
“A graph database is any storage system that provides index-free adjacency.”
• Each vertex serves as a “mini index” of its adjacent elements.
• No index lookups are necessary.
• The cost of a local step remains constant as the graph grows.
• Cheaper than global indexes.
Ref: 10
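As a rough in-memory sketch of the idea (not how any particular product stores data), index-free adjacency can be pictured as each vertex holding direct references to its neighbours, in contrast to searching a global edge collection. The class and function names below are assumptions chosen for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    # Direct references to adjacent vertices: the vertex is its own "mini index".
    neighbors: list["Vertex"] = field(default_factory=list, repr=False)


def neighbors_via_adjacency(vertex):
    # Local step: cost depends only on this vertex's degree, not on graph size.
    return [n.name for n in vertex.neighbors]


def neighbors_via_edge_scan(edges, name):
    # Contrast: without an index, a global edge list must be scanned in full.
    return [dst for src, dst in edges if src == name]


a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
a.neighbors = [b, c]
b.neighbors = [c]
edges = [("a", "b"), ("a", "c"), ("b", "c")]

print(neighbors_via_adjacency(a))
print(neighbors_via_edge_scan(edges, "a"))
```

Both calls return the same neighbours; the difference is that the first touches only the vertex itself, while the second depends on the total number of edges (or on a global index built to avoid the scan).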
Graph Database?: Definition
“A database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.”
This definition is independent of the way the data is stored internally.
It's really the model and the implemented algorithms that matter.
Ref: 12
Graph data model : Building blocks
Nodes: entities
Relationships: connect entities and structure the domain
Properties: attributes and metadata
Labels: group nodes by role
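The four building blocks above can be sketched as plain Python data classes. This is only an illustration of the model; the class names, field names and sample data (Alice, Acme, WORKS_AT) are assumptions and do not reflect the API of Neo4j or OrientDB.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: int
    labels: set[str] = field(default_factory=set)                # group nodes by role
    properties: dict[str, Any] = field(default_factory=dict)     # attributes


@dataclass
class Relationship:
    type: str
    start: int   # id of the source node
    end: int     # id of the target node
    properties: dict[str, Any] = field(default_factory=dict)     # metadata on the edge


def nodes_with_label(nodes, label):
    # Labels let us select all nodes that play a given role.
    return [n for n in nodes if label in n.labels]


alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_at = Relationship("WORKS_AT", alice.id, acme.id, {"since": 2015})

print(nodes_with_label([alice, acme], "Person"))
```

Note that both nodes and relationships carry properties, which is what distinguishes this model from a bare mathematical graph.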
Graph data model : Why ?
For applications where 'interconnectivity and topology' matter.
It allows for a more natural modeling of connected data. Graph structures are visible to the user and allow a natural way of handling application data, for example hypertext or geographic data.
Queries can refer directly to the graph structure.
This enables graph-specific operations such as finding a shortest path or determining a subgraph.
For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations.
Ref: 14
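As an example of a graph-specific operation, here is a minimal breadth-first search for a shortest path in an unweighted graph. The adjacency-dictionary representation and the sample graph are assumptions for illustration; a graph database would typically offer such an operation through its query language.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt in previous:
                continue
            previous[nxt] = current
            if nxt == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical directed graph as an adjacency dictionary.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(shortest_path(graph, "A", "E"))
```

Breadth-first search is only one choice; for weighted edges an algorithm such as Dijkstra's would be used instead.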
Graph data model : Motivation and Application
Criticism of classical database models: their drawbacks, and the difficulty for users of seeing data connectivity.
Applications whose complexity exceeds the capabilities of relational databases, e.g. managing transport networks.
Limited expressive power of current query languages.
The appearance of online hypertext showed the need for other database models.
In technological networks, the spatial and geographical aspects of the structure are dominant.
Ref: 14
Database model : Notions
Schema:
- A database schema is the skeleton of the database.
- It is designed before the database exists.
- A database schema does not contain any data or information.
Instance:
- It is the state of an operational database, with its data, at a given time.
- It contains a snapshot of the database.
- Database instances tend to change with time.
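The schema/instance distinction can be sketched in a few lines of Python. The schema below holds only structure (a label and its property types), while the instance holds data and changes over time. The label, property names and values are assumptions invented for this sketch.

```python
# Schema: structure only, no data (hypothetical label and property types).
schema = {"Person": {"name": str, "age": int}}


def conforms(label, properties, schema):
    """Check that a record has exactly the properties its label declares."""
    expected = schema.get(label)
    if expected is None:
        return False
    return properties.keys() == expected.keys() and all(
        isinstance(properties[key], typ) for key, typ in expected.items()
    )


# Instance: a snapshot of the data at one point in time.
instance = [("Person", {"name": "Ann", "age": 30})]

# The instance changes with time; the schema stays the same.
instance.append(("Person", {"name": "Bob", "age": 41}))

print(all(conforms(label, props, schema) for label, props in instance))
print(conforms("Person", {"name": "Cid", "age": "unknown"}, schema))
```

The second check fails because the age value has the wrong type, which is the kind of type checking a schema makes possible.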
Graph database model : representation
Representation of database:
flat graph: has many interconnected nodes; not expressive, extendible; difficult to present the information to the user in a clear way.
hypernode: a set of nested graphs; expressive. It is a graph whose nodes can themselves be graphs. It offers the ability to represent each real-world object as a separate database entity.
Graph database model : Data structures
1. Logical Data model
The schema uses two basic type nodes for representing data values (N, L), and two product type nodes (NL, PP) to establish relations among data values in a relational style. The instance is a collection of tables, one for each node of the schema.
Ref: 14
Graph database model : Data structures
2. Hypernode Data Model
The schema defines a person as a complex object with the properties name and lastname of type string, and parent of type person (recursively defined). The instance shows the relations in the genealogy among different instances of person.
Ref: 14
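The recursive person schema described above can be approximated with a self-referencing data class. This is a loose sketch of the idea, not the hypernode model's actual notation: the parents field is a list (an assumption, so that a person can have more than one parent) and the genealogy data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    lastname: str
    # Recursive definition: the parents of a person are themselves persons.
    parents: list["Person"] = field(default_factory=list)


def ancestors(person):
    """Collect all ancestors by following the nested parent objects."""
    found = []
    for parent in person.parents:
        found.append(parent)
        found.extend(ancestors(parent))
    return found


# Hypothetical instance: a three-generation genealogy.
grandmother = Person("Ana", "Diaz")
mother = Person("Eva", "Diaz", parents=[grandmother])
child = Person("Leo", "Diaz", parents=[mother])

print([p.name for p in ancestors(child)])
```

Each person is a separate object that nests other person objects, which mirrors how a hypernode represents a real-world entity as a graph whose nodes can themselves be graphs.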
Graph database model : Data structures
3. Hypergraph-Based Data Model (GROOVY)
GROOVY: Graphically Represented Object-Oriented data model with Values
The schema level models an object PERSON as a hypergraph that relates the attributes
NAME, LASTNAME and PARENTS.
Ref: 14
Graph database model : Integrity constraints
Integrity constraints are general statements and rules that define the set of consistent database states, or changes of state. In the case of graph db-models, they include:
Schema-instance integrity: entity types and type checking.
Schema-instance separation: the degree to which schema and instance are different objects in the database.
Redundancy of data: preserve uniqueness of data.
Object identity and referential integrity: entity integrity assures that each hypernode is a unique real-world entity identified by its content; referential integrity requires that only existing entities be referenced.
Ref: 14
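Two of these constraints lend themselves to a small sketch: a uniqueness check for entities and a referential-integrity check for edges. The version below is a simplification that identifies entities by an id rather than by content, and all function names and sample data are assumptions for illustration.

```python
from collections import Counter


def duplicate_ids(node_ids):
    """Entity integrity (simplified): report ids that occur more than once."""
    return sorted(i for i, count in Counter(node_ids).items() if count > 1)


def dangling_references(node_ids, edges):
    """Referential integrity: report edges that point at non-existent nodes."""
    known = set(node_ids)
    return [(src, dst) for src, dst in edges if src not in known or dst not in known]


# Hypothetical data with one duplicate node and one dangling edge.
node_ids = ["n1", "n2", "n3", "n2"]
edges = [("n1", "n2"), ("n2", "n9")]

print(duplicate_ids(node_ids))
print(dangling_references(node_ids, edges))
```

A database enforcing these constraints would reject the offending insertions rather than report them afterwards; the reporting style here just keeps the sketch short.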
Graph Databases : Criticism
Yes, the graph model is more versatile than the relational model, but that doesn't make it universal: in some cases, this versatility is a roadblock for optimizations.
In fact, modern graph databases are niche solutions for a narrow set of tasks: finding a route from A to B, working with friends in a social network, information technology in medicine.
For most business applications, relational databases continue to prevail.
Graph Databases : Criticism
Relational databases were designed to aggregate data, graph databases to find relations.
E.g. in the financial domain, all connections are known; you only aggregate data by other data to find sums and so on.
Graph Databases : Criticism
Usually you need to learn a new query language such as Cypher, Gremlin or SPARQL.
You have to use an API.
There are fewer vendors to choose from and a smaller user base, so it is harder to get support when you run into issues.
Graph Databases : Criticism
Graph databases are relatively immature compared to well-established RDBMSs.
They require a conceptual shift.
No standardization.
Graph Query Language : Extended SQL & Gremlin
OrientDB is a second-generation distributed graph database with the flexibility of documents in one product. It also operates as a document database or an object-oriented database. Its query language is based on SQL to make it 'more familiar to TSQL developers'. Like Neo4j, it has a community edition available, and licensing for enterprise is very reasonable.
Ref: 19
Graph Query Language: Gremlin
Many graph databases support their own custom languages (e.g. Cypher in Neo4j).
These languages are really useful, but they do not carry over to other databases.
Gremlin is a powerful domain-specific traversal language for graph databases.
It is supported by many popular graph databases.
Learning Gremlin for graph databases is comparable to learning SQL for relational databases.
Ref: 21
References
[1] https://neo4j.com/blog/rdbms-graphs-basics-for-relational-developer/
[2] http://images.google.de/imgres?imgurl=http%3A%2F%2Fcdn2.business2community.com%2Fwp-content%2Fuploads%2F2014%2F03%2Fistock_000006832296xsmall_small.jpg&imgrefurl=http%3A%2F%2Fwww.business2community.com%2Fcontent-marketing%2Fbrand-boring-content-marketing-0819786&h=232&w=300&tbnid=GUe7dYIZl9-29M%3A&docid=jRPxqmLQ1TLKWM&ei=2V1uV4-eG8yWgAad_baAAQ&tbm=isch&client=firefox-b&iact=rc&uact=3&dur=226&page=3&start=42&ndsp=27&ved=0ahUKEwjP7pz8_MLNAhVMC8AKHZ2-DRAQMwiLASgrMCs&bih=634&biw=1366
[3] http://www.slideshare.net/infinitegraph/an-introduction-to-graph-databases, slide 5
[4] Trees and Hierarchies in SQL for Smarties, Joe Celko, Morgan Kaufmann, ISBN: 1558609202
[5] http://www.slideshare.net/ehildebrandt/trees-and-hierarchies-in-sql
[6] http://www.slideshare.net/navicorevn/hierarchical-data-models-in-relational-databases
[7] https://en.wikipedia.org/wiki/Data_model
[8] http://www.slideshare.net/slidarko/graph-windycitydb2010/25-Representing_a_Graph_in_a
[9] GRAPH DATABASES AND ORIENTDB. INFO-H-415: Advanced Databases (Project). Professor: Esteban Zimányi,
cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/orientdb.pdf
[10] http://systemg.research.ibm.com/database.html
[11] https://www.youtube.com/watch?v=kpLqfFGubKM
[12] https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/
[13] https://neo4j.com/blog/rdbms-vs-graph-data-modeling/
[14] Angles, R., & Gutierrez, C., "Survey of graph database models", ACM Computing Surveys, Vol. 40, No. 1, Article 1, Feb. 2008
[15] http://db-engines.com/en/ranking_trend/graph+dbms
[16] R. Campbell et al., "A performance evaluation of open source graph databases", ACM, PPAA '14, February 16, 2014.
[17] http://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm
[18] https://neo4j.com/developer/cypher-query-language/
[19] http://orientdb.com/docs/last/index.html
[20] http://pettergraff.blogspot.de/2014/01/getting-started-with-orientdb.html
[21] http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
[22] http://sql2gremlin.com/
[23] http://gremlindocs.spmallette.documentup.com/