Graph Database Workshop

Jeremy Deane - http://jeremydeane.net

(UberConf:Conference)-[:HOSTS]->(session:Session)
(developer:Person)-[:ATTENDS]->(session)
(session)-[:PROVIDES]->(skill:Skill)
(developer)-[:LEARNS]->(skill)

Cover.cql  

Agenda

• Environment Setup
• Introduction
• Fundamentals
• Architecture
• Advanced Concepts

Generated with Graphgen - http://bit.ly/1HkTP20

Environment  Setup  

① Download Neo4j (2.2.3) - http://neo4j.com/download/
② Install to $NEO4J_HOME
③ Start Neo4j ($NEO4J_HOME/bin/neo4j start or %NEO4J_HOME%\bin\Neo4j.bat)
④ Launch Browser - http://localhost:7474
⑤ Default UID/PW - neo4j/neo4j

Cypher Syntax Highlighting:
• Sublime 2 Package (Sublime 3 Manual Install)
• Vim Bundle
• IntelliJ Plug-in

#Start Neo4j (Bash)
function neorun() {
  cd "$NEO4J_HOME/bin"
  ./neo4j start
  cd "$HOME"
}

#Stop Neo4j (Bash)
function neostop() {
  cd "$NEO4J_HOME/bin"
  ./neo4j stop
  cd "$HOME"
}

Workshop  Setup  

① Clone or Download Github Repo - https://github.com/jtdeane/graph-workshop
② Unpack to $HOME/$WORKSHOP_HOME
③ Open $HOME/$WORKSHOP_HOME/Data Cheat Sheet
④ Bookmark or Open - http://neo4j.com/docs/stable/cypher-refcard/
⑤ Bookmark or Open - http://neo4j.com/docs/stable/

Suggested Naming Conventions
• Labels - CamelCase
• Relationships - SNAKE_CASE_UPPER_CASE
• Properties - snake_case_lower_case
• Indexes - snake_case_lower_case
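The naming conventions can be checked mechanically. The following is a minimal sketch, assuming regular expressions of my own choosing for each convention; the PATTERNS table and the follows_convention helper are hypothetical and not part of the workshop repository.

```python
import re

# Hypothetical patterns approximating the suggested conventions.
PATTERNS = {
    "label": re.compile(r"[A-Z][a-z0-9]*(?:[A-Z][a-z0-9]*)*"),      # CamelCase
    "relationship": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),   # SNAKE_CASE_UPPER_CASE
    "property": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),       # snake_case_lower_case
    "index": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),          # snake_case_lower_case
}


def follows_convention(kind, name):
    """Return True when the whole name matches the convention for its kind."""
    return PATTERNS[kind].fullmatch(name) is not None


print(follows_convention("label", "Practitioner"))        # label from the domain model
print(follows_convention("relationship", "TREATED_BY"))   # relationship from the domain model
print(follows_convention("property", "birth_date"))       # property used in Hello.cql
```

A check like this could run over a script before it is loaded, flagging names such as treated_by used as a relationship type.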

Domain Model

[Diagram: Practitioner, Patient, and Organization nodes, with labels TREATED_BY, WORKS_FOR, LOCATION, and MAINTAINS]

Explore  Web  Console  

//Create Node
CREATE (:Practitioner {name:"Zachary Smith", specialty:"General Medicine"})

//Retrieve Node
MATCH (p:Practitioner)
RETURN p

//Update Node
MATCH (p)
WHERE p.name="Zachary Smith"
SET p.specialty="Neurosurgery"

//Retrieve Updated Node
MATCH (p:Practitioner)
WHERE p.specialty="Neurosurgery"
RETURN p.name, p.specialty

//Retrieve Node by ID
MATCH (p)
WHERE ID(p)=0
RETURN p

//Delete Node By ID
MATCH (p)
WHERE ID(p)=0
DELETE p

//Merge Node
MERGE (p:Practitioner {name:"Zachary Smith"})
ON CREATE SET p.created=timestamp()
ON MATCH SET p.updated=timestamp()
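The MERGE statement behaves like a get-or-create. The sketch below imitates that behaviour with a plain dictionary, purely as an illustration; the store, the merge_practitioner helper, and the now argument are hypothetical stand-ins and say nothing about how Neo4j implements MERGE.

```python
def merge_practitioner(store, name, now):
    """Get-or-create keyed on name, mirroring ON CREATE SET / ON MATCH SET."""
    node = store.get(name)
    if node is None:
        # No match: create the node and stamp it (ON CREATE SET).
        node = {"name": name, "created": now}
        store[name] = node
    else:
        # Match found: only stamp the update (ON MATCH SET).
        node["updated"] = now
    return node


store = {}
first = merge_practitioner(store, "Zachary Smith", now=1)
second = merge_practitioner(store, "Zachary Smith", now=2)
print(len(store), second)
```

Running the merge twice leaves a single entry in the store, which is the point of MERGE over CREATE.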

Hello.cql  

Explore Web Console

//Create Node
CREATE (:Patient {name:"Tim Smith", birth_date:"1965-06-27", conditions:["Diabetes", "Epilepsy"]})

//Create Relationship Long - Requires Patient Tim Smith and Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
MATCH (m:Patient {name:"Tim Smith"})
CREATE (m)-[r:TREATED_BY]->(p)
RETURN m, r, p

//Create Relationship Medium - Requires Practitioner Zachary Smith
MATCH (p:Practitioner {name:"Zachary Smith"})
CREATE (m:Patient {name:"Holly Goodwin", birth_date:"1991-11-17"})-[r:TREATED_BY]->(p)
RETURN m, r, p

//Create Nodes and Relationship Short
CREATE (m:Patient {name:"Jackie Bonk", birth_date:"1978-12-15"})-[r:TREATED_BY]->(p:Practitioner {name:"Yuri Zhivago", specialty:"Immunology"})
RETURN m, r, p

//Clean out all Nodes and Relationships (careful!)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

Hello.cql  

Introduction

Neuron from mouse cerebellum (160x) - http://bit.ly/1Ja1VrJ

What  are  Graphs?  

Graph Theory: a graph is a representation of a set of Objects where some pairs of objects are connected by Links

Seven Bridges of Königsberg - http://bit.ly/1Lv7C66

Objects are Vertices (Nodes)

Links are Edges (Relationships)

Property Graph: Nodes & Relationships with key-value pairs (Properties)

Neo4j Property Graph: Nodes grouped by Labels
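To make the property graph definitions concrete, here is a minimal sketch of nodes, labels, relationships, and properties as plain Python data. The Node and Relationship classes and the nodes_with_label helper are hypothetical illustrations, not a Neo4j API; the sample values come from the workshop examples.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # Labels group nodes
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    start: Node
    type: str                                        # e.g. TREATED_BY
    end: Node
    properties: dict = field(default_factory=dict)


doctor = Node({"Practitioner"}, {"name": "Zachary Smith", "specialty": "General Medicine"})
patient = Node({"Patient"}, {"name": "Tim Smith"})
treated_by = Relationship(patient, "TREATED_BY", doctor, {"pcp": True})


def nodes_with_label(nodes, label):
    """Return the nodes carrying the given label."""
    return [n for n in nodes if label in n.labels]


print([n.properties["name"] for n in nodes_with_label([doctor, patient], "Patient")])
```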

NoSQL  Landscape  

Sadalage/Fowler - http://amzn.to/1Lv8W8Z

• Column
• Key-Value
• Document
• Graph

Graph - Relational Database Comparison

Relational Databases are great for storing transactional data in tables

Graph Databases are great for storing semantically rich connected data in nodes and relationships

Depth   RDB time (ms)   GDB time (ms)   # records
2       16              10              ~2,500
3       30,267          168             ~110,000
4       1,543,505       1,359           ~600,000
5       hang            2,132           ~800,000

From "Graph Databases" by Robinson, Webber and Eifrem, 2013, page 20

Degrees of separation between you and Kevin Bacon; a Relational Database falls over...

Relational Databases require considerable up-front design (e.g. Normalization), resulting in a rigid schema

Graph Databases require no schema and support an emergent design approach
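The degrees-of-separation question is a depth-limited walk over relationships, which is why it suits a graph. A minimal sketch of the idea in plain Python follows, using breadth-first search over an adjacency list; the KNOWS data and the degrees_of_separation helper are hypothetical and unrelated to the benchmark figures quoted from the book.

```python
from collections import deque


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == goal:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return None


# Hypothetical social graph for illustration only.
KNOWS = {
    "You": ["Ann"],
    "Ann": ["You", "Bob"],
    "Bob": ["Ann", "Kevin Bacon"],
    "Kevin Bacon": ["Bob"],
}

print(degrees_of_separation(KNOWS, "You", "Kevin Bacon"))
```

Each extra degree only follows the relationships of nodes already reached. A relational schema typically needs another self-join per degree, which is consistent with the growth in the RDB column of the table.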

Graph  Database  Use  Cases  

• Social (Professional) Network
• Route Finding and Logistics
• Network and System Operations
• Security and Advanced Analytics

http://bit.ly/1fYwEOO

Fundamentals  

Custom Circuit Board Design - http://bit.ly/1Ja4kTb

Domain Model

[Diagram: Practitioner, Patient, and Organization nodes, with labels TREATED_BY, WORKS_FOR, LOCATION, and MAINTAINS]

Initial Data Load

① Execute Favorite "Clean database of nodes and relationships" OR execute the statement listed after step ⑤
② Import new Favorite "Initial Data Load"
③ Execute "Initial Data Load" OR
④ Open data.cql and copy contents
⑤ Paste and execute in Web Console

//Clean out all Nodes and Relationships (careful!)
MATCH (n)
OPTIONAL MATCH (n)-[r]-()
DELETE n,r

Clean.cql  

Nodes  

[Diagram: a node labeled Smith]

(Node) is a thing or noun

(Node) has :Properties

{name: "Zachary Smith", specialty: "General Medicine"}

(:Label) groups (Node)s, e.g. :Practitioner

//Retrieve a Node with Label Practitioner with a name property equal to Zachary Smith
MATCH (p:Practitioner)
WHERE p.name="Zachary Smith"
RETURN p

//Retrieve all Nodes with Label Patient and order by birth date
MATCH (m:Patient)
RETURN UPPER(m.name), m.birth_date
ORDER BY m.birth_date

//Retrieve all Nodes with Label Patient and with diabetes
MATCH (m:Patient)
WHERE "Diabetes" IN m.conditions
RETURN m

//Retrieve all Nodes with Label Patient and without diabetes
MATCH (m:Patient)
WHERE NOT("Diabetes" IN m.conditions)
RETURN m

Fundamentals.cql  

Relationships

(:Relationship) describes how (Node)s are related

(Patient)-[:TREATED_BY]->(Practitioner)

(:Relationship)s are directional and cannot exist without both (Node)s

//Retrieve all Nodes with WORKS_AT Relationship
MATCH (a)-[r:WORKS_AT]->(b)
RETURN a,r,b

(:Relationship)s are verbs and can have :Properties

{pcp: true}

//Retrieve all Nodes with TREATED_BY Relationship with PCP false
MATCH (a)-[r:TREATED_BY {pcp:false}]->(b)
RETURN a,r,b

//Count the distinct Nodes that MAINTAIN a Node
MATCH (a)-[:MAINTAINS]->(b)
RETURN COUNT(DISTINCT a)

Fundamentals.cql  

Modeling

Graphs read as natural language

Subject {Noun} - Acts Upon {Verb} -> Object {Noun}

Graphs are modeled with Circles, Boxes and Arrows

Graph models translate to ASCII art

MATCH (Identifier:Label)-[Identifier:Relationship]->(Identifier:Label)

Graph modeling is very expressive and whiteboard friendly

Modeling Strategies - model using Domain Driven Design (DDD) or model by Questions (e.g. What do you want to do?)

http://amzn.to/1GUkNKA

Modeling Guidelines

• Do not replicate all entity details into Node Properties. Leverage a relational or document database as System of Record or History.
• Create semantically rich relationships, avoiding generic verbs like HAS, CONTAINS, or IS.
• When possible, qualify a relationship with additional information (e.g. weight, origin, or date range) - "Strengthen vs. Atrophy"
• Avoid duplicate relationships - (a)-[:likes]->(b)-[:likes]->(a)
• Use Linked Lists to increase performance (e.g. head, previous)
• Leverage an intermediate Node for n-ary relationships (e.g. Software, Version, Developer, Organization)

Application Programming Interfaces

• REST Web Service API
• Java Platform Support
• Other Popular Languages (C#, Ruby, Python, PHP)

Under the covers - Java Options:
• Core API
• Traversal Framework
• Cypher Query Language (CQL)

Cypher Transactional HTTP Endpoint
POST http://localhost:7474/db/data/transaction/commit

GET http://localhost:7474/db/data

HTTP Interactions

① Install Postman Chrome Plug-in: http://bit.ly/1NooOJr (or similar)
② Set Authorization Header (HTTP Basic)
③ Issue GET http://localhost:7474/db/data and follow the links in the response
④ Explore links (e.g. GET http://localhost:7474/db/data/relationship/types)
⑤ Query via HTTP Transactional Endpoint:

POST http://localhost:7474/db/data/transaction/commit
Accept: application/json
Content-Type: application/json

{
  "statements" : [ {
    "statement" : "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    "parameters" : {
      "name" : "Zachary Smith"
    }
  } ]
}
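The same request can be assembled from a script instead of Postman. This is a minimal sketch using only the Python standard library; the build_cypher_request helper and the "secret" password are hypothetical placeholders, and the request is built but not sent so the example runs without a server.

```python
import base64
import json
import urllib.request


def build_cypher_request(statement, parameters, user, password,
                         url="http://localhost:7474/db/data/transaction/commit"):
    """Build (but do not send) a POST to the transactional endpoint."""
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    # HTTP Basic: base64 of "user:password".
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


request = build_cypher_request(
    "MATCH (p:Practitioner) WHERE p.name={name} RETURN p",
    {"name": "Zachary Smith"},
    user="neo4j",
    password="secret",  # placeholder, substitute your own credentials
)
print(request.get_method(), request.full_url)
# With Neo4j running, urllib.request.urlopen(request) would send it.
```

Passing the name as a parameter rather than concatenating it into the statement keeps the query text constant across calls.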

Testing

Options:
• Manual testing via REST Clients
• Unit Testing via Framework (e.g. JUnit)
• Functional Testing via Framework (e.g. RobotFramework or SoapUI)

① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/testing
③ To execute tests, enter mvn test
④ Optionally update Java to output results to console
⑤ To re-execute tests, enter mvn test

Architecture  

http://bit.ly/1Ja2npT

Graph Database - Architecture

[Diagram: Neo4j on the Java Runtime Environment, with Language APIs, Caches, Files, HA Support, Logging, and Plug-ins and Extensions]

Community & Enterprise Edition
• Community is GPLv3
• Enterprise Edition relaxes Consistency (ACID)

$NEO4J_HOME

Graph Database - Server Modes

[Diagram: Embedded Neo4j (Application, Server Libraries, Java Runtime Environment) versus Neo4j Server (Embedded Web Server, Extensions & Plug-ins, Server Libraries, Java Runtime Environment, with an External Application as Client)]

Graph Database - Server Extension

① Requires - https://github.com/jtdeane/graph-workshop
② Navigate to $HOME/$WORKSHOP_HOME/extension
③ Build the extension JAR - graph-extension-1.0.0.jar
④ Copy the JAR from ../target to $NEO4J_HOME/plugins
⑤ Register the extension by updating $NEO4J_HOME/conf
⑥ Restart Neo4j
⑦ Using REST Browser Client (e.g. Postman) query practitioners

Registration setting (step ⑤):
org.neo4j.server.thirdparty_jaxrs_classes=ws.cogito.graphs=/extensions/

Query URL (step ⑦):
http://localhost:7474/extensions/directory/practitioners

Deployment  Topologies  

Single Community Server (Non-Production Environments)

Non-Clustered Community Servers - Cold Standby

HA Clustered Enterprise Servers (Master-Slave)

[Diagram: Enterprise Edition High Availability. Three Linux VMs, each with a Java Runtime Environment: one Neo4j (Master) and two Neo4j (Slave)]

Read  Consistent  –  Write  Lock    

Read  Write  Consistent    

Operations & Security

• Operating System and Server Process Monitoring (e.g. Zabbix)
• Log Monitoring and Alerting (e.g. Splunk or Logstash)
• Secure Communications via SSL
• Use HTTP Basic Authentication for Console and REST API Access
• Web Console and REST API are on the same Port
• HTTP Basic requires HTTPS
• Graph Governance is up to you!

Advanced  Concepts  

Bee Pollination - http://bit.ly/1HkAa2c

Domain Model

[Diagram: Practitioner, Patient, Organization, and Caregiver nodes, with labels TREATED_BY, WORKS_FOR (shown twice), and LOCATION]

Bulk  Loads  

Batch API (transactional) - POST http://localhost:7474/db/data/batch

Batch Inserter (bypasses Transactions) - Java Only

Importing Comma Separated Values (CSV)

//load caregiver nodes
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/CaregiverNodes.csv" AS csvLine
CREATE (g:Caregiver {name: csvLine.name, guardian: csvLine.guardian})
RETURN *

//load caregiver patient relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/PatientRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (patient:Patient { name:(csvLine.patient)})
CREATE (giver)-[:CARES_FOR { type:(csvLine.type) }]->(patient)
RETURN *

//load caregiver organization relationships
LOAD CSV WITH HEADERS FROM "file:///YOUR_LOCATION/OrganizationRelationships.csv" AS csvLine
MATCH (giver:Caregiver { name:(csvLine.giver)}), (org:Organization { name:(csvLine.organization)})
CREATE (giver)-[:WORKS_FOR { type:(csvLine.status) }]->(org)
RETURN *
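With WITH HEADERS, each CSV row is exposed as a map keyed by the header names, which is why the statements read csvLine.name and csvLine.guardian. A minimal Python sketch of that row-as-map idea follows; the SAMPLE contents and the caregiver_rows helper are hypothetical, and only the column names name and guardian come from the statement for CaregiverNodes.csv.

```python
import csv
import io

# Hypothetical file contents; the real CaregiverNodes.csv may differ.
SAMPLE = "name,guardian\nFlorence Nightingale,false\n"


def caregiver_rows(csv_text):
    """Yield one property map per CSV row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"name": row["name"], "guardian": row["guardian"]} for row in reader]


for properties in caregiver_rows(SAMPLE):
    print(properties)
```

As with LOAD CSV, every value arrives as a string, so a field such as guardian needs an explicit conversion if a boolean is wanted.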

Advanced.cql  

More  Graph  Queries  

//Find patients who are also practitioners
MATCH (m:Patient), (p:Practitioner)
WHERE m.name=p.name
RETURN p

//All paths to Lovee Johnson
MATCH paths = (m:Patient)-[*]-(node)
WHERE m.name="Lovee Johnson"
RETURN paths

//Shortest path from Lovee Johnson to Florence Nightingale
MATCH (m:Patient {name:"Lovee Johnson"}), (g:Caregiver {name:"Florence Nightingale"}),
      path = shortestPath((m)-[*..10]-(g))
RETURN path

//Patients with more than one practitioner
MATCH (patient:Patient)-[:TREATED_BY]->(practitioner)
WITH patient, count(practitioner) AS practitioners
WHERE practitioners > 1
RETURN patient

//All patients with a PCP having a name ending in 'y' (REGEX)
MATCH (m:Patient)-[:TREATED_BY {pcp:true}]->(p:Practitioner)
WHERE p.name =~ ".*y"
RETURN m,p

Java 1.7 Regex - http://bit.ly/1LEvt3j

//Return the patients with a family caregiver and their practitioners
MATCH (g:Caregiver)-[:CARES_FOR {type:"Family"}]->(m:Patient)-[:TREATED_BY]->(p:Practitioner)
RETURN m, p, g

Advanced.cql  
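The regular expression query relies on =~ matching the whole property value, not a substring, which is why the pattern is ".*y" and not just "y". A minimal Python sketch of the same distinction follows, using re.fullmatch as a stand-in for Cypher's =~; the ends_in_y helper is hypothetical and the names are taken from the workshop examples.

```python
import re


def ends_in_y(name):
    """Whole-string match, analogous to name =~ ".*y" in Cypher."""
    return re.fullmatch(r".*y", name) is not None


names = ["Zachary Smith", "Yuri Zhivago", "Leonard McCoy"]
matching = [n for n in names if ends_in_y(n)]
print(matching)

# A substring search is looser: "Zachary Smith" contains a "y" but does not end in one.
print(re.search(r"y", "Zachary Smith") is not None, ends_in_y("Zachary Smith"))
```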

Traversals  

[Diagram: tree of nodes numbered 1 to 8]

Depth-first search (DFS) - Default Neo4j Behavior: 1,2,5,6,3,4,7,8

Breadth-first search (BFS): 1,2,3,4,5,6,7,8

Evaluators - e.g. Maximum Depth

Filters - e.g. Uniqueness

Path Expanders - e.g. Direction

• REST API - Executes arbitrary JavaScript code
• Java API - Requires in-depth knowledge of your Graph
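The two visiting orders can be reproduced with a small sketch. The TREE structure below is an assumption inferred from the two orders on the slide (node 1 with children 2, 3, 4; node 2 with children 5, 6; node 4 with children 7, 8), and the functions are plain Python illustrations, not the Neo4j Traversal Framework.

```python
from collections import deque

# Assumed tree shape, inferred from the DFS and BFS orders on the slide.
TREE = {1: [2, 3, 4], 2: [5, 6], 4: [7, 8]}


def depth_first(tree, root):
    """Visit a node, then fully explore each child before its siblings."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(tree.get(node, [])))
    return order


def breadth_first(tree, root):
    """Visit all nodes at one depth before moving to the next depth."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order


print(depth_first(TREE, 1))    # [1, 2, 5, 6, 3, 4, 7, 8]
print(breadth_first(TREE, 1))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The only difference between the two functions is the container: a stack gives depth-first order, a queue gives breadth-first order.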

Indexes  

Automatic Indexing - $NEO4J_HOME/conf/neo4j.properties

# Enable auto-indexing for nodes, default is false.
node_auto_indexing=true

# The node property keys to be auto-indexed, if enabled.
node_keys_indexable=name

# Enable auto-indexing for relationships, default is false.
relationship_auto_indexing=true

# The relationship property keys to be auto-indexed, if enabled.
relationship_keys_indexable=pcp,type

Cypher Index Commands

//create index on Patient Label
CREATE INDEX ON :Patient(name)

//drop index on Patient Label
DROP INDEX ON :Patient(name)

GET http://localhost:7474/db/data/schema/index/Patient

Advanced.cql  

Constraints  

Create  and  Drop  Constraints  

//create Unique Practitioner constraint
CREATE CONSTRAINT ON (practitioner:Practitioner)
ASSERT practitioner.name IS UNIQUE

//attempt to create duplicate Practitioner - should fail
CREATE (McCoy:Practitioner {name:"Leonard McCoy", specialty:"General Medicine"})

//drop Unique Practitioner constraint
DROP CONSTRAINT ON (practitioner:Practitioner)
ASSERT practitioner.name IS UNIQUE

No way to retrieve list of Indexes or Constraints via Cypher (yet)

GET http://localhost:7474/db/data/schema/constraint/Practitioner

Advanced.cql  

Visualization

• Neo4j Web Console - http://localhost:7474/browser
• Data Driven Documents (D3.js) - http://d3js.org/
• Alchemy.js - http://bit.ly/1NwH7fB
• Linkurious.js - http://linkurio.us/toolkit/
• VivaGraph.js - https://github.com/anvaka/VivaGraphJS

Boston Hubway Graph - By Max De Marzi

Execution from Scripts: <script> or Node.JS

Requires data transformation (e.g. Nodes and Relationship Arrays)
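The data transformation step usually means reshaping query rows into a nodes array and a relationships array. The sketch below shows one such reshaping; the to_d3 helper, the triple-based input, and the output keys (nodes, links, source, target) are assumptions modeled on the index-based layout commonly fed to D3 force layouts, so check the chosen library's expected format before relying on it.

```python
import json


def to_d3(rows):
    """Turn (source_name, relationship_type, target_name) rows into node and link arrays."""
    index = {}            # node name -> position in the nodes array
    nodes, links = [], []
    for source, rel_type, target in rows:
        for name in (source, target):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
        links.append({"source": index[source], "target": index[target], "type": rel_type})
    return {"nodes": nodes, "links": links}


# Rows as they might come back from a TREATED_BY query in the workshop data.
rows = [
    ("Tim Smith", "TREATED_BY", "Zachary Smith"),
    ("Holly Goodwin", "TREATED_BY", "Zachary Smith"),
]
print(json.dumps(to_d3(rows), indent=2))
```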

Questions & Feedback

My Contact Information:

Jeremy Deane
Director of Software Architecture, NaviNet
jeremy.deane@gmail.com
http://jeremydeane.net

https://github.com/jtdeane/graph-workshop

Supplemental

//Aggregate all providers
MATCH (c:Caregiver)
RETURN c.name AS names
UNION
MATCH (p:Practitioner)
RETURN p.name AS names

Supplemental.cql

//Practitioners with patient counts
MATCH (m:Patient)-[:TREATED_BY]->(p:Practitioner)
WITH p, COUNT(m) AS patients
RETURN p.name, patients

//Patients with provider counts (Practitioner and/or Caregiver)
MATCH (m:Patient)-[:TREATED_BY|:CARES_FOR]-(r)
WITH DISTINCT (m), COUNT(r) AS providers
RETURN m.name, providers

//All Patients with Caregiver (and without = null)
MATCH (m:Patient)
OPTIONAL MATCH (m)<-[:CARES_FOR]-(c:Caregiver)
RETURN m.name, COALESCE(c.name,"INDEPENDENT")

//Profile simple query
PROFILE MATCH (p:Practitioner)
WHERE p.name="Zachary Smith"
RETURN p

//Profile complex query
PROFILE MATCH (m:Patient), (p:Practitioner)
WHERE m.name=p.name
RETURN p