Introduction to Graph Databases and Neo4j
What Is a Graph Database?
Roland Guijt, www.rmgsolutions.nl
@rolandguijt
Agenda
What is a Graph?
What is a Graph Database?
Why a Graph Database?
Graph Databases vs. Relational Databases
Graph Databases vs. NoSQL Databases
Examples of Graph Databases
Property Graph Model
- Contains nodes and relationships
- Nodes and relationships contain properties
- Relationships are named and directed, with a start node and an end node
Example: (Joanna {Name: Joanna, City: Salt Lake City, Married: true})-[:WORKS_FOR {Since: 2010/1/1}]->(Pluralsight {Name: Pluralsight, City: Salt Lake City, Rocks: true})
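To make the model concrete, here is a minimal Python sketch of the Joanna/Pluralsight example as plain objects. It is not how Neo4j stores data; the `Node` and `Relationship` classes and the `Person`/`Company` labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # key (string) -> value


@dataclass
class Relationship:
    type: str                                       # relationships are named...
    start: "Node"                                   # ...and directed: a start node...
    end: "Node"                                     # ...and an end node
    properties: dict = field(default_factory=dict)  # they can hold properties too


# Hypothetical labels; the properties come from the example above.
joanna = Node({"Person"}, {"Name": "Joanna", "City": "Salt Lake City", "Married": True})
pluralsight = Node({"Company"}, {"Name": "Pluralsight", "City": "Salt Lake City", "Rocks": True})
works_for = Relationship("WORKS_FOR", joanna, pluralsight, {"Since": "2010/1/1"})

print(f"{works_for.start.properties['Name']} -[:{works_for.type}]-> "
      f"{works_for.end.properties['Name']}")
```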
Why a Graph Database?
“Use a relational database for all applications”
“Consider the type of database for every application you’re writing”
Why a Graph Database?
- Flexible schema
- Structure and queries are brain friendly (that is, easier)
- Highly related data
Graph Databases vs. Relational Databases
Relational | Graph
Tables | Nodes
Schema with nullable columns | No schema
Relations with foreign keys | Relationships are first-class citizens
Related data fetched with joins | Related data fetched with a pattern
Relational Databases Advantages
- Calculations within one table
- Grouping of data
- Highly structured data
The Foreign Key System
Customer
CustomerId | Name | City
1 | Joanna | Salt Lake City

Order
OrderId | CustomerId | Date
1 | 1 | 2015/1/1

LineItem
OrderId | ProductId | Quantity
1 | 1 | 5

Product
ProductId | Description | Use
1 | Candle | Inside
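To see why related tables are the weak point, here is the foreign-key chain above rebuilt with Python's built-in `sqlite3` module (chosen only for illustration; the deck does not name a specific relational product here). Answering "which products did Joanna order?" already takes three joins across four tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Order" is a reserved word in SQL, so the table name is quoted.
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT, City TEXT);
CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY,
                      CustomerId INTEGER REFERENCES Customer, Date TEXT);
CREATE TABLE LineItem (OrderId INTEGER REFERENCES "Order",
                       ProductId INTEGER REFERENCES Product, Quantity INTEGER);
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Description TEXT, Use TEXT);
INSERT INTO Customer VALUES (1, 'Joanna', 'Salt Lake City');
INSERT INTO "Order" VALUES (1, 1, '2015/1/1');
INSERT INTO LineItem VALUES (1, 1, 5);
INSERT INTO Product VALUES (1, 'Candle', 'Inside');
""")

# Each foreign key hop costs one JOIN.
rows = conn.execute("""
SELECT c.Name, p.Description, li.Quantity
FROM Customer c
JOIN "Order" o   ON o.CustomerId = c.CustomerId
JOIN LineItem li ON li.OrderId = o.OrderId
JOIN Product p   ON p.ProductId = li.ProductId
WHERE c.Name = ?
""", ("Joanna",)).fetchall()
print(rows)
```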
Partner and Vukotic’s Experiment
• Social network with a friends-of-friends structure
• MySQL and Neo4j
• 1,000,000 people, each with an average of 50 friends
• Depth 2: find all friends of a user's friends
• Depth 3: find all friends of friends of a user's friends
• Etc.
Depth | Rel. DB (s) | Neo4j (s) | # records
2 | 0.016 | 0.01 | ~2,500
3 | 30.267 | 0.168 | ~110,000
4 | 1543.505 | 1.359 | ~600,000
5 | Unfinished | 2.132 | ~8,000,000
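The depth-N queries boil down to a breadth-first expansion. The Python sketch below, over a tiny hypothetical friendship graph stored as adjacency lists, shows the shape of the work: each extra level only touches the neighbours of the previous level, whereas a relational query needs one more self-join on the friendship table per level. It illustrates the query, not the benchmark itself.

```python
def friends_within(graph, start, depth):
    """Everyone reachable from start in 1..depth friendship hops (start excluded)."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for person in frontier:
            for friend in graph.get(person, []):
                if friend not in seen:  # each person is expanded at most once
                    seen.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return seen - {start}


# Hypothetical friendship graph (symmetric adjacency lists).
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "eve"],
    "dan": ["bob", "fay"],
    "eve": ["cat"],
    "fay": ["dan"],
}
for depth in (1, 2, 3):
    print(depth, sorted(friends_within(graph, "ann", depth)))
```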
A Document
Customer:
{
  Name: "Joanna",
  City: "Salt Lake City",
  Order: {
    id: 1,
    Date: "2015/1/1",
    LineItems: [{
      Quantity: 3,
      Product: { Description: "Candle", Use: "Inside" }
    }]
  }
}
Document Databases
- Duplication of data is not something to avoid
- Copy master data
- All related data in one entity
Documents
Customer:
{
  Name: "Joanna",
  City: "Salt Lake City",
  Order: {
    id: 1,
    Date: "2015/1/1",
    LineItems: [{
      Quantity: 3,
      Product: { Description: "Candle", Use: "Inside" }
    }]
  }
}
Customer:
{
  Name: "Peter",
  City: "Dallas",
  Order: {
    id: 2,
    Date: "2015/2/1",
    LineItems: [{
      Quantity: 2,
      Product: { Description: "Matches", Use: "Inside" }
    }]
  }
}
Graph Databases vs. Document Databases
Document | Graph
Documents | Nodes
No schema | No schema
Relations with foreign keys or embedded | Relationships are first-class citizens
Related data fetched with joins or embedded | Related data fetched with a pattern
A Social Graph
(Figure: the people John, Cathy, Deborah, Jennifer and Mike, linked to the skills Graphs, ALM, Testing, Java, .Net and Web API, and to the companies Cyber IT and Active.)
Who shares Cathy’s skills?
Who works in the same company as Cathy and shares the most skills?
Security
(Figure: the users John, Cathy, Deborah, Jennifer and Mike, linked to the roles Admin, Editor, Poster and Reader, and to the rights Login, Read, Insert, Update, Delete and Grant rights.)
Which rights does Deborah have?
Who edited a blog post and when?
Summary
• A graph is a collection of nodes connected by relationships.
• Graph databases are flexible and perform well with highly related data.
• All database types have their place.
• Relational databases suit reporting and calculations on a single table. Weak point: related tables.
• Document databases suit storing objects. Weak point: related documents.
• Graph databases are great in many scenarios, but not all.
Building Block 1: Node
- Schemaless entities/objects
- Contain properties: key = string, value = primitive data type
- Properties support indexing and a unique constraint
- Can have labels
Data Types
boolean (true/false)
byte (8 bits)
short (16 bits)
int (32 bits)
long (64 bits)
float (32 bits)
double (64 bits)
char (Unicode)
string (Unicode)
arrays
- Type is set implicitly
- Automatic conversion when updating
- No nulls: setting a property to null removes it
Building Block 2: Relationships
Connect nodes
Are directed
Are named
Can contain properties (same as nodes)
Installing Neo4j
Windows: desktop app (installer), console app, Windows service
Linux: Unix console app, Linux service
Mac OS X: Homebrew, Terminal, OS X service
• The Community Edition is a great way to start
• Requires a JDK
Summary
• Neo4j is a reliable graph database implemented in Java, with enterprise features and a REST API.
• It uses the property graph model and is schemaless by default.
• Cypher is Neo4j's primary query language.
• The Community Edition is free and open source.
• Enterprise features are available in the other editions.
• Node properties get an implicit data type when set with a query, and support indexing and a unique constraint.
• Relationship properties work the same way.
• Neo4j can be installed as a desktop app (Windows only), a command-line app or a service.
• Neo4j is configurable using text files.
Cypher Is About Pattern Matching
Recipe to make a query:
- Think of a whiteboard-friendly pattern or structure you would like to retrieve
- Translate it into ASCII art
- Surround it with clauses
Diagram: Node -PLAYED-> Node
()-[:PLAYED]->()
Cypher Is About Pattern Matching
Diagram: Actor {name: Matt Smith} -PLAYED-> Character
(:Actor{name:'Matt Smith'})-[:PLAYED]->(:Character)
Cypher Is About Pattern Matching
Diagram: Actor {name: Matt Smith} -PLAYED-> Character -COMES_FROM-> Planet {name: Gallifrey}
(:Actor{name:'Matt Smith'})-[:PLAYED]->(:Character)-[:COMES_FROM]->(:Planet{name:'Gallifrey'})
The MATCH and RETURN Clauses
Diagram: Actor {name: Matt Smith} -PLAYED-> Character
The pattern:
(:Actor{name:'Matt Smith'})-[:PLAYED]->(:Character)
Name the node you want back:
(:Actor{name:'Matt Smith'})-[:PLAYED]->(c:Character)
Surround it with clauses:
MATCH (:Actor{name:'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c
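A naive way to picture what MATCH does: try every relationship, keep those whose start and end nodes fit the pattern, and bind `c` to the end node. The in-memory Python sketch below uses hypothetical data and a hypothetical `fits` helper; a real query planner would typically start from a label or index lookup instead of scanning everything, but the resulting bindings are the same.

```python
# Hypothetical data: node id -> (labels, properties); relationships as (start, type, end).
nodes = {
    1: ({"Actor"}, {"name": "Matt Smith"}),
    2: ({"Actor"}, {"name": "David Tennant"}),
    3: ({"Character"}, {"name": "Doctor"}),
    4: ({"Planet"}, {"name": "Gallifrey"}),
}
rels = [(1, "PLAYED", 3), (2, "PLAYED", 3), (3, "COMES_FROM", 4)]


def fits(node_id, label, props):
    """True if the node has the label and every given property, like (:Label{...})."""
    labels, properties = nodes[node_id]
    return label in labels and all(properties.get(k) == v for k, v in props.items())


# MATCH (:Actor{name:'Matt Smith'})-[:PLAYED]->(c:Character) RETURN c
result = [
    nodes[end][1]["name"]
    for start, rel_type, end in rels
    if rel_type == "PLAYED"
    and fits(start, "Actor", {"name": "Matt Smith"})
    and fits(end, "Character", {})
]
print(result)
```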
Query Examples: 2 Loose Ends
MATCH (actors:Actor)-[:REGENERATED_TO]-> (others)
RETURN actors.name, others.name;
Return the name property of every node with the Actor label, side by side with the name property of each node at the other end of its outgoing REGENERATED_TO relationship.
MATCH (:Character{name:'Doctor'})<-[:ENEMY_OF]-(:Character)-[:COMES_FROM]->(p:Planet)
RETURN p.name as Planet, count(p) AS Count;
Find all nodes with the Character label that have an ENEMY_OF relationship to the Doctor, follow their COMES_FROM relationship to nodes with the Planet label, and return the planet names along with the number of occurrences.
Query Examples: More Complex
MATCH (:Actor{name:"Matt Smith"}) -[:APPEARED_IN]-> (ep:Episode) <-[:APPEARED_IN]- (:Character{name:'Amy Pond'}),
(ep) <-[:APPEARED_IN]-(enemies:Character) <-[:ENEMY_OF]-(:Character{name:'Doctor'})
RETURN ep AS Episode, collect(enemies.name) AS Enemies;
Return every episode in which both the actor Matt Smith and the character Amy Pond appeared, with the names of the Doctor's enemies who appeared in that episode listed beside it.
Where
MATCH (:Actor{name:'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c
is equivalent to
MATCH (a:Actor)-[:PLAYED]->(c:Character)
WHERE a.name = 'Matt Smith'
RETURN c
• Filters the result set
Order By
MATCH (a:Actor)-[:PLAYED]->(c:Character)
WHERE a.name = 'Matt Smith'
RETURN c
ORDER BY c.name
• Orders the result set
• Supports multiple properties
• Use DESC to reverse the order
Skip and Limit
MATCH (:Actor{name:'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c
SKIP 5
LIMIT 10
• SKIP drops the first rows of the result set; LIMIT caps the number of rows returned
• SKIP must come before LIMIT
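The clause order matters because Cypher sorts first, then skips, then limits. In Python terms that is a sort followed by a single slice; the sketch below uses a hypothetical `order_skip_limit` helper and made-up character names.

```python
def order_skip_limit(rows, key, skip, limit):
    """Mimic ORDER BY key SKIP skip LIMIT limit on a list of rows."""
    ordered = sorted(rows, key=key)        # ORDER BY
    return ordered[skip:skip + limit]      # SKIP, then LIMIT


characters = ["Rory", "Amy", "Clara", "River", "Doctor", "Dalek", "Silurian"]
page = order_skip_limit(characters, key=str, skip=2, limit=3)
print(page)
```

Asking for a page past the end simply yields an empty list, which matches an empty result set.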
Union
MATCH (a:Actor) RETURN a.name AS name
UNION
MATCH (c:Character) RETURN c.name AS name
• Glues result sets together; every part must return the same column names
• Use UNION ALL to include duplicates
With
MATCH (a:Actor)
WITH a.name AS name, count(a) AS count
ORDER BY name
WHERE count > 10
RETURN name
• Manipulates the result set for the rest of the query
• Can have an ORDER BY clause
Predicates
• Return true or false for a given input
• Input can be properties or patterns
• Mostly used in the WHERE clause
• ALL, ANY, NONE, SINGLE, EXISTS
MATCH (a:Actor)
WHERE EXISTS((a)-[:PLAYED]->())
RETURN a.name
Scalar Functions
• Return a single value
• LENGTH, TYPE, ID, COALESCE, HEAD, LAST, TIMESTAMP, TOINT, TOFLOAT, TOSTRING
MATCH p = (:Actor)-[:PLAYED]->(:Character)
RETURN LENGTH(p)
Collection Functions
• Return collections of 'things'
• NODES, RELATIONSHIPS, LABELS
• EXTRACT, FILTER, TAIL
• RANGE, REDUCE
MATCH p = (:Actor)-[:PLAYED]->(:Character)
RETURN NODES(p)
Advanced Syntax: Number of Hops
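MATCH (:Actor)-[*2]->(p:Planet)
RETURN p
• [*2] matches exactly two hops over relationships of any type
MATCH (c:Character)-[:COMPANION_OF*1..2]-(:Character)
RETURN c
• [:COMPANION_OF*1..2] matches one or two COMPANION_OF hops, in either direction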
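Variable-length patterns never reuse the same relationship within one path. The Python sketch below enumerates the paths that a pattern like `-[:COMPANION_OF*1..2]-` would walk from one starting character; `var_length_paths` is a hypothetical helper and the companion data is made up.

```python
def var_length_paths(rels, start, rel_type, min_hops, max_hops):
    """Return node paths from start that follow min_hops..max_hops relationships of
    rel_type in either direction, never reusing a relationship within one path."""
    results = []

    def walk(node, path, used):
        hops = len(path) - 1
        if hops >= min_hops:
            results.append(list(path))
        if hops == max_hops:
            return
        for index, (a, t, b) in enumerate(rels):
            if t != rel_type or index in used:
                continue
            if a == node:          # follow the relationship forwards...
                nxt = b
            elif b == node:        # ...or backwards (undirected pattern)
                nxt = a
            else:
                continue
            path.append(nxt)
            used.add(index)
            walk(nxt, path, used)
            path.pop()
            used.remove(index)

    walk(start, [start], set())
    return results


# Hypothetical data: (start, type, end)
rels = [("Amy", "COMPANION_OF", "Doctor"), ("Rory", "COMPANION_OF", "Doctor")]
for path in var_length_paths(rels, "Amy", "COMPANION_OF", 1, 2):
    print(" - ".join(path))
```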
Advanced Syntax: Shortest Path
MATCH (earth:Planet {name:"Earth"}), (gallifrey:Planet {name:"Gallifrey"}),
      p = shortestPath((earth)-[*..15]-(gallifrey))
RETURN p
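Conceptually, an unweighted shortest path with an upper bound on its length is a breadth-first search that stops at the first hit. Below is a Python sketch over a hypothetical, undirected planet graph (matching the direction-less `-[*..15]-`); the planet links are invented, and this says nothing about which algorithm Neo4j uses internally.

```python
from collections import defaultdict, deque


def shortest_path(adjacency, source, target, max_hops=15):
    """Breadth-first search: the first time target is reached, the path is shortest.
    Paths longer than max_hops are not explored, like [*..15]."""
    parents = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk the parent links back to source
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for neighbour in adjacency[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append((neighbour, hops + 1))
    return None                          # no path within max_hops


# Hypothetical planet links, treated as undirected.
edges = [("Earth", "Skaro"), ("Skaro", "Gallifrey"),
         ("Earth", "Mars"), ("Mars", "Telos"), ("Telos", "Gallifrey")]
adjacency = defaultdict(list)
for a, b in edges:
    adjacency[a].append(b)
    adjacency[b].append(a)

print(shortest_path(adjacency, "Earth", "Gallifrey"))
```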
Summary
• Cypher is a powerful, declarative query language for Neo4j.
• It uses patterns to query data.
• Cypher's main clauses are MATCH and RETURN.
• There are more SQL-like clauses, such as WHERE.
• Many powerful functions complement the language.
• Going beyond the basic syntax opens up even more powerful query possibilities.
Agenda
Creating, Updating and Deleting
Importing CSV
Indexes and Unique Constraint
Advanced data manipulation
Create: creates nodes and relationships
CREATE (n)
CREATE (n:Actor{name: 'Peter Capaldi'}) RETURN n
MATCH (matt:Actor{name: 'Matt Smith'}), (chris:Actor{name: 'Christopher Eccleston'})
CREATE (matt)-[:REGENERATED_TO]->(chris)
Create Complete Path
CREATE p = (:Actor{name: 'Peter Capaldi'})-[:APPEARED_IN]->(:Episode{name: 'The Time of The Doctor'})
RETURN p
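Delete: deletes nodes and relationships
MATCH (matt:Actor{name: 'Matt Smith'})
DELETE matt
A node that still has relationships cannot be deleted, so delete its relationships along with it:
MATCH (matt:Actor{name: 'Matt Smith'})-[r]-()
DELETE matt, r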
Set: manipulates properties
MATCH (matt:Actor{name: 'Matt Smith'})
SET matt.salary = 100000, matt.active = true
Copy all properties of one node onto another (matt's existing properties are replaced):
MATCH (matt:Actor{name: 'Matt Smith'}), (chris:Actor{name: 'Christopher Eccleston'})
SET matt = chris
Setting a property to NULL removes it:
MATCH (matt:Actor{name: 'Matt Smith'})
SET matt.salary = NULL
Remove: removes properties or labels
MATCH (matt:Actor{name: 'Matt Smith'})
REMOVE matt:Doctor
MATCH (matt:Actor{name: 'Matt Smith'})
REMOVE matt.salary
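Merge: a MATCH replacement that returns a pattern if it exists and creates it otherwise (with already-bound nodes, only the rest of the pattern is created)
MERGE (peter:Actor{name: 'Peter Capaldi'})
RETURN peter
MERGE matches on every property in the pattern, so this creates a second node if the existing Peter Capaldi node has no matching salary:
MERGE (peter:Actor{name: 'Peter Capaldi', salary: 100000})
RETURN peter
MATCH (peter:Actor{name: 'Peter Capaldi'}), (doctor:Character{name: 'Doctor'})
MERGE (peter)-[r:PLAYED]->(doctor)
RETURN r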
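MERGE behaves like a get-or-create over the whole pattern. The single-node Python sketch below (a hypothetical `merge_node` helper, with an in-memory list standing in for the database and no concurrency handling at all) also shows why the salary example can end up with a duplicate node.

```python
def merge_node(store, label, props):
    """Return the first node with this label whose properties include every pair in
    props (the match step); otherwise create and return a new node (the create step)."""
    for node in store:
        if node["label"] == label and all(node["props"].get(k) == v for k, v in props.items()):
            return node
    node = {"label": label, "props": dict(props)}
    store.append(node)
    return node


store = []
first = merge_node(store, "Actor", {"name": "Peter Capaldi"})
again = merge_node(store, "Actor", {"name": "Peter Capaldi"})       # matched, not created
with_salary = merge_node(store, "Actor", {"name": "Peter Capaldi", "salary": 100000})
print(first is again, len(store))  # the salary pattern did not match, so a 2nd node exists
```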
Foreach: helper to set a property or label on every element of a path
MATCH p = (actors:Actor)-[r:PLAYED]->(others)
WHERE actors.salary > 100000
FOREACH (n IN nodes(p) | SET n.done = true)
Index: performance gain when querying for a certain property value
CREATE INDEX ON :Actor(name)
- Keeps a dictionary of values
- Watch for performance issues while writing
- The index is used automatically, for example by this query:
MATCH (matt:Actor{name: 'Matt Smith'}) RETURN matt
DROP INDEX ON :Actor(name)
Unique Constraint: ensures uniqueness of a property value
- Currently the only constraint available
- Watch for performance issues while writing
- Also adds an index
CREATE CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE
DROP CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE
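The "dictionary of values" idea and the write-side cost can be sketched in a few lines of Python. `PropertyIndex` is a hypothetical stand-in for an index on `:Actor(name)`, not Neo4j's implementation; with `unique=True` it also behaves like the unique constraint.

```python
class PropertyIndex:
    """Hypothetical stand-in for an index on :Label(property): a dictionary from
    property value to the ids of the nodes holding that value."""

    def __init__(self, unique=False):
        self.unique = unique
        self.entries = {}

    def add(self, node_id, value):
        # Every write pays for this extra bookkeeping (the write-side cost).
        ids = self.entries.setdefault(value, set())
        if self.unique and ids - {node_id}:
            raise ValueError(f"{value!r} already exists")  # unique constraint violated
        ids.add(node_id)

    def lookup(self, value):
        # Reads jump straight to the matching ids instead of scanning all nodes.
        return self.entries.get(value, set())


names = PropertyIndex(unique=True)
names.add(1, "Matt Smith")
names.add(2, "Peter Capaldi")
print(names.lookup("Matt Smith"))
try:
    names.add(3, "Matt Smith")
    duplicate_rejected = False
except ValueError as err:
    duplicate_rejected = True
    print("rejected:", err)
```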
Importing CSV
- Cypher supports importing CSV
- CSV files can be loaded from the local file system or via HTTPS, HTTP and FTP
- Use CREATE and MERGE in conjunction with LOAD CSV
- Example: actors, movies, connections
Importing CSV: Step 1
- Import actors
- The CSV looks like this:
id | name
3 | Michael Douglas
4 | Martin Sheen
5 | Morgan Freeman
LOAD CSV WITH HEADERS FROM 'http://docs.neo4j.org/chunked/2.1.6/csv/import/persons.csv'
AS csvLine
CREATE (p:Person {id: toInt(csvLine.id), name: csvLine.name})
Importing CSV: Step 2
- Import movies, normalize countries
- The CSV looks like this:
id | title | country
1 | Wall Street | USA
2 | The American President | USA
LOAD CSV WITH HEADERS FROM 'http://docs.neo4j.org/chunked/2.1.6/csv/import/movies.csv'
AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInt(csvLine.id), title: csvLine.title})
CREATE (movie)-[:MADE_IN]->(country)
Importing CSV: Step 3
- Import the actor -> movie relationships
- The CSV looks like this:
personId | movieId | role
4 | 1 | Carl Fox
4 | 2 | A.J. MacInerney
LOAD CSV WITH HEADERS FROM 'http://docs.neo4j.org/chunked/2.1.6/csv/import/roles.csv'
AS csvLine
MATCH (actor:Person {id: toInt(csvLine.personId)}), (movie:Movie {id: toInt(csvLine.movieId)})
CREATE (actor)-[:PLAYED {role: csvLine.role}]->(movie)
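The three import steps can be mirrored with Python's `csv` module to see what each Cypher clause contributes. This is a sketch under assumptions: the file contents are inlined as comma-separated text with the headers shown above, and plain dicts and tuples stand in for nodes and relationships.

```python
import csv
import io

# Assumption: the three files are comma-separated with the headers shown above.
persons_csv = "id,name\n3,Michael Douglas\n4,Martin Sheen\n5,Morgan Freeman\n"
movies_csv = "id,title,country\n1,Wall Street,USA\n2,The American President,USA\n"
roles_csv = "personId,movieId,role\n4,1,Carl Fox\n4,2,A.J. MacInerney\n"

# Step 1: CREATE one Person per row (toInt -> int).
persons = {}
for row in csv.DictReader(io.StringIO(persons_csv)):
    persons[int(row["id"])] = {"id": int(row["id"]), "name": row["name"]}

# Step 2: MERGE countries (one node per distinct name), CREATE movies and MADE_IN.
countries, movies, made_in = {}, {}, []
for row in csv.DictReader(io.StringIO(movies_csv)):
    country = countries.setdefault(row["country"], {"name": row["country"]})
    movie = {"id": int(row["id"]), "title": row["title"]}
    movies[movie["id"]] = movie
    made_in.append((movie["title"], "MADE_IN", country["name"]))

# Step 3: MATCH both ends by id, then CREATE a PLAYED relationship with a role.
played = []
for row in csv.DictReader(io.StringIO(roles_csv)):
    actor = persons.get(int(row["personId"]))
    movie = movies.get(int(row["movieId"]))
    if actor is None or movie is None:
        continue  # MATCH found nothing, so nothing is created for this row
    played.append((actor["name"], "PLAYED", {"role": row["role"]}, movie["title"]))

print(len(countries), made_in, played, sep="\n")
```

Because the country is merged rather than created, both movies end up pointing at a single USA node.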
Summary
• Use CREATE to create nodes and relationships.
• With DELETE you can remove them.
• Set and update property values, and add labels to nodes, with SET.
• REMOVE deletes properties and labels.
• MERGE only creates if needed.
• An index on a property makes querying faster, but writing slower.
• Use the unique constraint to make property values unique.
• Import data from other systems with Cypher's support for reading CSV.
Agenda
REST
Indexes and Unique Constraint
Node and Relationship Operations
Service Root
Cypher via REST
Client Access
A Typical Request and Response
Request:
POST http://someurl
Accept: application/json; charset=UTF-8
Content-Type: application/json

{"name": "Peter Capaldi"}

Response:
201 Created
Content-Length: 1239
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/107

{<Some Data>}
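The same request can be assembled with Python's standard `urllib.request`. This sketch assumes a local server at `http://localhost:7474` exposing the legacy `/db/data` REST API described here; `build_create_node_request` is a hypothetical helper, and the actual send is left commented out so the snippet runs without a server.

```python
import json
import urllib.request


def build_create_node_request(base_url, properties):
    """POST /db/data/node with the node's properties as a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/db/data/node",
        data=json.dumps(properties).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json; charset=UTF-8",
            "Content-Type": "application/json",
        },
    )


req = build_create_node_request("http://localhost:7474", {"name": "Peter Capaldi"})
# Against a running server, a 201 response carries the new node's URL in Location:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.headers["Location"])
```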
Service Root
GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8
- Provides a REST starting point
- Returns a list of hypermedia links
Node Operations: Get by Id
GET http://localhost:7474/db/data/node/1
Accept: application/json; charset=UTF-8
- GET HTTP method
- On the node URL from the service root
- Returns a data object with the properties
- And hypermedia links to get the rest
Node Operations: Create
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
- POST HTTP method
- On the node URL from the service root
- Returns the created node
Node Operations: Create with Properties
POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{"name": "Peter Capaldi"}
- Attach the content to the POST request
Node Operations: Delete
DELETE http://localhost:7474/db/data/node/100
Accept: application/json; charset=UTF-8
- DELETE HTTP method
Node Operations: Properties
PUT http://localhost:7474/db/data/node/1/properties/salary
Accept: application/json; charset=UTF-8
Content-Type: application/json
100000
- Use the same base URL to GET all properties of a node
- PUT HTTP method: set a property on the node
- Property name in the URL, value attached
- PUT without a property name replaces all properties
- DELETE HTTP method: remove a property from the node
Node Operations: Labels
POST http://localhost:7474/db/data/node/1/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json
["Person", "Actor"]
- Works like properties: GET lists, POST adds, PUT replaces
Relationship Operations: Get by node
GET http://localhost:7474/db/data/node/1/relationships/all
Accept: application/json; charset=UTF-8
GET http://localhost:7474/db/data/node/1/relationships/all/PLAYED&REGENERATED_TO
Accept: application/json; charset=UTF-8
Relationship Operations: Create
POST http://localhost:7474/db/data/node/1/relationships
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "to" : "http://localhost:7474/db/data/node/19",
  "type" : "LOVES",
  "data" : {"intensity" : "medium"}
}
- POST
- Include JSON with the details
Node Operations: Traversals
- Traverse the graph
- One node as the starting point
- Paged traversals are stored for later retrieval
Ingredients:
- URL of the starting node
- What to return, as a URL extension: path, fullpath, node or relationship
- Further details in the attachment
Node Operations: Traversals
POST http://localhost:7474/db/data/node/1/traverse/node
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "order" : "breadth_first",
  "return_filter" : {
    "body" : "position.endNode().getProperty('name').toLowerCase().contains('p')",
    "language" : "javascript"
  },
  "prune_evaluator" : {
    "body" : "position.length() > 10",
    "language" : "javascript"
  },
  "uniqueness" : "node_global",
  "relationships" : [
    {"direction" : "out", "type" : "REGENERATED_TO"},
    {"direction" : "all", "type" : "PLAYED"}
  ],
  "max_depth" : 3
}
Batch Operations
POST http://localhost:7474/db/data/batch
Accept: application/json; charset=UTF-8
Content-Type: application/json
[
  {"method" : "POST", "to" : "/node", "id" : 0, "body" : {"name" : "bob"}},
  {"method" : "POST", "to" : "/node", "id" : 1, "body" : {"age" : 12}},
  {
    "method" : "POST",
    "to" : "{0}/relationships",
    "id" : 3,
    "body" : {"to" : "{1}", "data" : {"since" : "2010"}, "type" : "KNOWS"}
  }
]
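Building the batch body in code makes the `{0}`/`{1}` back-references easy to check before sending. In this Python sketch, `batch_jobs` reproduces the body above and `check_references` is a hypothetical client-side sanity check (not part of Neo4j) that every placeholder points at a job id appearing earlier in the batch.

```python
import json
import re


def batch_jobs():
    """The batch body above as Python data. "{0}" and "{1}" are placeholders for the
    URIs created by the jobs with id 0 and id 1."""
    return [
        {"method": "POST", "to": "/node", "id": 0, "body": {"name": "bob"}},
        {"method": "POST", "to": "/node", "id": 1, "body": {"age": 12}},
        {"method": "POST", "to": "{0}/relationships", "id": 3,
         "body": {"to": "{1}", "data": {"since": "2010"}, "type": "KNOWS"}},
    ]


def check_references(jobs):
    """Raise if a job refers to a job id that does not appear earlier in the batch."""
    seen = set()
    for job in jobs:
        for ref in re.findall(r"\{(\d+)\}", json.dumps(job)):
            if int(ref) not in seen:
                raise ValueError(f"job {job['id']} refers to unknown job {ref}")
        seen.add(job["id"])


jobs = batch_jobs()
check_references(jobs)                 # passes: jobs 0 and 1 come before job 3
payload = json.dumps(jobs, indent=2)   # request body for POST /db/data/batch

try:
    check_references(list(reversed(jobs)))
    reversed_ok = True
except ValueError:
    reversed_ok = False                # job 3 would run before the nodes exist
```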
Indexes
GET http://localhost:7474/db/data/schema/index/Actor
Accept: application/json; charset=UTF-8
• List all indexes for a label
POST http://localhost:7474/db/data/schema/index/Actor
Accept: application/json; charset=UTF-8
Content-Type: application/json
{"property_keys" : ["name"]}
• Create an index on a label
DELETE http://localhost:7474/db/data/schema/index/Actor/name
Accept: application/json; charset=UTF-8
• Drop the index on a label and property
Transactional Cypher Endpoint
• Execute Cypher via the REST API
• Supports different output styles, all in JSON
• A transaction can remain open between requests
• A transaction can time out
Begin a Transaction and Commit in One Request
POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "statements" : [ {
    "statement" : "CREATE (n {props}) RETURN n",
    "parameters" : {
      "props" : {"name" : "Peter Capaldi", "salary" : 100000}
    }
  } ]
}
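A small Python sketch of a client for this endpoint: one helper builds the commit body shown above and another pulls rows out of a response in the default "columns and contents" style. Both helper names are hypothetical, and the sample response is hand-written for illustration rather than captured from a server.

```python
import json


def commit_body(statement, parameters):
    """JSON body for POST /db/data/transaction/commit with a single statement."""
    return json.dumps({"statements": [{"statement": statement, "parameters": parameters}]})


def rows_of_first_result(response_text):
    """Return the rows of the first statement's result, or raise on reported errors."""
    response = json.loads(response_text)
    if response.get("errors"):
        raise RuntimeError(response["errors"])  # statement errors are listed in the body
    return [entry["row"] for entry in response["results"][0]["data"]]


body = commit_body("CREATE (n {props}) RETURN n",
                   {"props": {"name": "Peter Capaldi", "salary": 100000}})

# Hand-written sample response; not captured from a real server.
sample = ('{"results": [{"columns": ["n"], "data": [{"row": '
          '[{"name": "Peter Capaldi", "salary": 100000}]}]}], "errors": []}')
rows = rows_of_first_result(sample)
print(rows)
```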
Output Styles
• Specify the style after the statement:
{
  "statements" : [ {
    "statement" : "CREATE (n) RETURN n",
    "resultDataContents" : [ "REST" ]
  } ]
}
• Default: columns and contents
• REST: same output as the REST operations
• Graph: to reconstruct a graph
Begin a Transaction
• POST to the transaction base URL:
POST http://localhost:7474/db/data/transaction
• Returns information about the transaction:
201 Created
Content-Type: application/json
Location: http://localhost:7474/db/data/transaction/7
{
  "commit" : "http://localhost:7474/db/data/transaction/7/commit",
  "results" : …..,
  "transaction" : {"expires" : "Mon, 2 Feb 2015 20:53:51 +0000"}
}
Execute Subsequent Request in Transaction
• POST to the transaction URL returned earlier:
POST http://localhost:7474/db/data/transaction/7
• POST the final statements of the transaction to the commit URL:
POST http://localhost:7474/db/data/transaction/7/commit
• To roll back, send a DELETE to the transaction URL:
DELETE http://localhost:7474/db/data/transaction/7
• Or let the timeout expire
Client Access
Create HTTP requests and parse JSON:
- Do it yourself
- More work
- No dependency
- Total freedom
Use a client library:
- Someone else does the work
- Ready to go
- Dependency
- Maybe not entirely what you want
Client Access Demo
Create HTTP requests and parse JSON:
- C# .NET console app
- Microsoft HttpClient
- Class models for request/response
Use a client library:
- C# .NET console app
- Readify Neo4jClient
- Class models for the actor node
Summary
• The REST API provides access from various platforms.
• It does so by leveraging HTTP.
• Call the service root to get a list of URLs, called hypermedia controls, that provide a starting point.
• There are two ways to operate on the REST API: use pure REST operations or execute Cypher.
• To access Neo4j from your app, a client library is the easiest way, but low-level HTTP calls are also possible.