Connections to the Real World: Graph Databases and Applications

Achim Friedland <[email protected]>, Aperis GmbH
1st University-Industrial Meeting on Graph Databases, 7-8 Feb. 2011, Barcelona, Spain

1st UIM-GDB - Connections to the Real World



Let’s change our point of view...


Welcome to the customer side... ;)

www.graph-database.org


The Graph Representation Problem

Adjacency matrix vs. incidence matrix vs. adjacency list vs. edge list vs. classes; index-based vs. index-free adjacency; dense vs. sparse graphs; on-disk vs. in-memory graphs; all-indexed vs. specific index creation; directed vs. undirected edges; hypergraphs? hierarchical graphs? dynamic graphs?

• Different levels of expressivity
• Sometimes very application-specific
• Hard to optimize a single one for every use case
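The following small Java sketch makes the trade-off concrete. It stores the same four-edge directed graph as an edge list, an adjacency list and an adjacency matrix. Java is used because it is one of the two platforms this talk targets. Numbering vertices 0..n-1 and the class name `Representations` are illustrative assumptions.

```java
import java.util.*;

// One small directed graph (0->1, 0->2, 1->2, 2->0) in three representations.
// Vertices are numbered 0..n-1; this encoding is an assumption for the example.
class Representations {
    // Edge list: one {source, target} pair per edge.
    static final int[][] EDGES = { {0, 1}, {0, 2}, {1, 2}, {2, 0} };

    // Adjacency list: for each vertex, the list of its out-neighbours.
    // Memory grows with |V| + |E|, which suits sparse graphs.
    static List<List<Integer>> toAdjacencyList(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) adj.get(e[0]).add(e[1]);
        return adj;
    }

    // Adjacency matrix: m[u][v] is true iff there is an edge u -> v.
    // Memory grows with |V|^2, which only pays off for dense graphs,
    // but testing for a single edge is one array access.
    static boolean[][] toAdjacencyMatrix(int n, int[][] edges) {
        boolean[][] m = new boolean[n][n];
        for (int[] e : edges) m[e[0]][e[1]] = true;
        return m;
    }
}
```

A boolean matrix cannot hold parallel edges or edge properties. This is one reason why the best representation depends on the application.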


The GraphDB Vendor Problem


• Multiple APIs from different vendors
• Unknown internal graph representation
• Unclear design goals
• Community involvement?


Step 1) Define a common API


The Property-Graph Model

The most common graph model within the NoSQL GraphDB space:

• directed: each edge has a source and a destination vertex
• attributed: vertices and edges carry key/value pairs
• edge-labeled: the label denotes the type of relationship
• multi-graph: multiple edges between any two vertices are allowed


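[Figure: vertex Id 1 (name: Alice, age: 21) is connected to vertex Id 2 (name: Bob, age: 23) by an edge labeled "Friends" with the edge property since: 2009/09/21. Callouts mark the vertex properties, the edge properties and the edge label.]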


Property-Graph Constraints?

• Vertex type vs. vertex interfaces?
• Edge label/type vs. edge interfaces?
• Vertex<->Edge constraints?
• Extension: Undirected edges?
• Extension: Hyperedges?
• Extension: Semantic graphs?
• Extension: Dynamic graphs?
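One way to read the "Vertex<->Edge constraints?" question is as a schema check on edge labels. The Java sketch below is hypothetical. The class `EdgeConstraints`, the idea of typed vertices and the one-rule-per-label format are illustrative assumptions, not a feature of any GraphDB mentioned here.

```java
import java.util.*;

// Hypothetical schema: which vertex types an edge label may connect.
// Type names and the one-rule-per-label format are illustrative assumptions.
class EdgeConstraints {
    // edge label -> allowed {source type, target type}
    private final Map<String, String[]> allowed = new HashMap<>();

    void allow(String label, String sourceType, String targetType) {
        allowed.put(label, new String[] { sourceType, targetType });
    }

    // True if an edge with this label may run from a vertex of sourceType
    // to a vertex of targetType. Labels without a rule are rejected.
    boolean check(String label, String sourceType, String targetType) {
        String[] rule = allowed.get(label);
        return rule != null && rule[0].equals(sourceType) && rule[1].equals(targetType);
    }
}
```

Whether such a check runs inside the database or in the application is exactly the open design question listed above.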


A Property Graph Model Interface for Java and .NET

// Use a class-based in-memory graph
var graph = new InMemoryGraph();

var v1 = graph.AddVertex(new VertexId(1));
var v2 = graph.AddVertex(new VertexId(2));
v1.SetProperty("name", "Alice");
v1.SetProperty("age", 21);
v2.SetProperty("name", "Bob");
v2.SetProperty("age", 23);

var e1 = graph.AddEdge(v1, v2, new EdgeId(1), "Friends");
e1.SetProperty("since", "2009/09/21");

structured data (XML, JSON)
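The following sketch models the same Alice/Bob example in Java with a minimal in-memory property graph. The class and method names (`PropertyGraph`, `addVertex`, `addEdge`, `setProperty`) are hypothetical stand-ins that loosely mirror the .NET calls above. This is not the Blueprints API.

```java
import java.util.*;

// Minimal in-memory property graph loosely mirroring AddVertex/AddEdge/SetProperty.
// All names are hypothetical; this is not the Blueprints API.
class PropertyGraph {
    static class Element {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        Element(long id) { this.id = id; }
        void setProperty(String key, Object value) { properties.put(key, value); }
        Object getProperty(String key) { return properties.get(key); }
    }

    static class Vertex extends Element {
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();
        Vertex(long id) { super(id); }
    }

    // Directed, labeled edge. Nothing stops two edges from joining the
    // same pair of vertices, so this is a multi-graph.
    static class Edge extends Element {
        final Vertex out, in;
        final String label;
        Edge(long id, Vertex out, Vertex in, String label) {
            super(id);
            this.out = out;
            this.in = in;
            this.label = label;
        }
    }

    private final Map<Long, Vertex> vertices = new HashMap<>();
    private final Map<Long, Edge> edges = new HashMap<>();

    Vertex addVertex(long id) {
        Vertex v = new Vertex(id);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(Vertex out, Vertex in, long id, String label) {
        Edge e = new Edge(id, out, in, label);
        out.outEdges.add(e); // each vertex keeps direct references
        in.inEdges.add(e);   // to its incident edges
        edges.put(id, e);
        return e;
    }

    Vertex getVertex(long id) { return vertices.get(id); }
}
```

Each vertex holds direct references to its edges, so stepping from Alice to Bob needs no index lookup. This corresponds to the "index-free adjacency" option from the representation list.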


Supported datatypes?

• Strings
• Integers
• DateTime?
• byte[]?
• Structured data like XML/JSON?
• List<...>
• ...
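A store has to decide which property values it accepts. The Java sketch below shows one possible whitelist check. The chosen types (String, Integer, Long, Double, Boolean, byte[], and lists of these) are an assumption for illustration. DateTime and XML/JSON are left out on purpose, since those are the open questions above.

```java
import java.util.*;

// Hypothetical whitelist of property value types; each GraphDB decides this itself.
class PropertyTypes {
    static boolean isSupported(Object value) {
        if (value instanceof String || value instanceof Integer
                || value instanceof Long || value instanceof Double
                || value instanceof Boolean || value instanceof byte[]) {
            return true;
        }
        if (value instanceof List) {
            // Lists are accepted if every element is itself supported (recursively).
            for (Object item : (List<?>) value) {
                if (!isSupported(item)) return false;
            }
            return true;
        }
        return false; // null, dates and arbitrary objects are rejected in this sketch
    }
}
```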


Step 2) Declarative ways for querying


Querying a Graph Database


• Programmatic / API
  - From any programming language, Pipes, ...
  - Synchronous or asynchronous
  - Allow bypassing all optimizations
  - Do not try to be smarter than the application developer

• Ad hoc / Explorative
  - Gremlin a.k.a. “high-level pipes”?
  - sones GQL, OrientDB QL a.k.a. “SQL style”?
  - Pattern matching a.k.a. “SPARQL style”?
  - Easy embedding of domain-specific query languages?


A data flow framework for property graph models

ISideEffectPipe<in S, out E, out T> : IEnumerator<E>, IEnumerable<E>

• S: source elements
• E: emitted elements
• T: side effect
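Java has no declaration-site variance (the `in`/`out` modifiers of the .NET interface), so a Java rendering of the same idea uses plain generics. The sketch below defines hypothetical `Pipe` and `SideEffectPipe` interfaces modeled on the diagram, not copied from any library. It also defines a `CountPipe` that passes elements through unchanged and reports how many it has seen as its side effect.

```java
import java.util.*;

// Hypothetical pipe: consumes source elements S and emits elements E.
interface Pipe<S, E> extends Iterator<E>, Iterable<E> {
    void setStarts(Iterator<S> starts);
}

// A side-effect pipe additionally exposes a side-effect value of type T.
interface SideEffectPipe<S, E, T> extends Pipe<S, E> {
    T getSideEffect();
}

// Emits every element unchanged and counts the emitted elements as its side effect.
class CountPipe<S> implements SideEffectPipe<S, S, Long> {
    private Iterator<S> starts = Collections.emptyIterator();
    private long count = 0;

    public void setStarts(Iterator<S> starts) { this.starts = starts; }

    public boolean hasNext() { return starts.hasNext(); }

    public S next() {
        S s = starts.next(); // throws NoSuchElementException when exhausted
        count++;
        return s;
    }

    public Iterator<S> iterator() { return this; }

    public Long getSideEffect() { return count; }
}
```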


Pipeline<S, E>


Create complex pipes by combining pipes into pipelines:

pipe1<S,A> -> pipe2<A,C> -> pipe3<C,E>

S: source elements, E: emitted elements
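The key property of a pipeline is that composing pipes yields another pipe, from the first input type to the last output type. The Java sketch below models a pipe as a lazy function from `Iterator<S>` to `Iterator<E>`; this modeling is an assumption. The names `then`, `map` and `filter` are hypothetical.

```java
import java.util.*;
import java.util.function.*;

// Hypothetical pipe: a lazy transformation from an iterator of S to an iterator of E.
interface Pipe<S, E> extends Function<Iterator<S>, Iterator<E>> {
    // Composition: the output type E of this pipe is the input type of 'next'.
    default <F> Pipe<S, F> then(Pipe<E, F> next) {
        return starts -> next.apply(this.apply(starts));
    }

    // Emits f(x) for every incoming x.
    static <A, B> Pipe<A, B> map(Function<A, B> f) {
        return starts -> new Iterator<B>() {
            public boolean hasNext() { return starts.hasNext(); }
            public B next() { return f.apply(starts.next()); }
        };
    }

    // Emits only the incoming elements that satisfy p.
    static <A> Pipe<A, A> filter(Predicate<A> p) {
        return starts -> new Iterator<A>() {
            private A buffered;
            private boolean hasBuffered = false;

            public boolean hasNext() {
                while (!hasBuffered && starts.hasNext()) {
                    A a = starts.next();
                    if (p.test(a)) { buffered = a; hasBuffered = true; }
                }
                return hasBuffered;
            }

            public A next() {
                if (!hasNext()) throw new NoSuchElementException();
                hasBuffered = false;
                return buffered;
            }
        };
    }
}
```

Every stage is an iterator, so no work happens until the consumer asks for the next element. A long traversal can therefore stop early.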


A “perl”-style Ad Hoc query language for graphs

// Friends-of-a-friend
var pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe2 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
var pipe5 = new LabelFilterPipe("Friends", ComparisonFilter.EQUALS);
var pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
var pipe7 = new PropertyPipe("name");

var pipeline = new Pipeline(pipe1, pipe2, pipe3, pipe4, pipe5, pipe6, pipe7);
pipeline.SetSource(new SingleEnumerator(graph.GetVertex(new VertexId(1))));

The same query in Gremlin:
g:id-v(1)/outE[@label='Friends']/inV/outE[@label='Friends']/inV/@name
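For comparison, the same friends-of-a-friend traversal can be written with Java streams over a tiny graph model. The `Vertex` and `Edge` classes and the `link` helper are hypothetical, not a library API. Each stage is commented with the Gremlin step it mirrors.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical minimal graph model for the friends-of-a-friend traversal.
class FriendsOfFriends {
    static class Vertex {
        final Map<String, Object> props = new HashMap<>();
        final List<Edge> outEdges = new ArrayList<>();
        Vertex(String name) { props.put("name", name); }
    }

    static class Edge {
        final String label;
        final Vertex in; // target vertex of this outgoing edge
        Edge(String label, Vertex in) { this.label = label; this.in = in; }
    }

    static void link(Vertex from, String label, Vertex to) {
        from.outEdges.add(new Edge(label, to));
    }

    static List<String> friendsOfFriendNames(Vertex start) {
        return start.outEdges.stream()                     // outE
                .filter(e -> e.label.equals("Friends"))     // [@label='Friends']
                .map(e -> e.in)                             // inV
                .flatMap(v -> v.outEdges.stream())          // outE
                .filter(e -> e.label.equals("Friends"))     // [@label='Friends']
                .map(e -> e.in)                             // inV
                .map(v -> (String) v.props.get("name"))     // @name
                .collect(Collectors.toList());
    }
}
```

Like the pipe version, this keeps duplicates. If Alice has two friends who share a friend, that name appears twice.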


A “SQL”-style Ad Hoc query language for graphs

The same friends-of-a-friend query in sones GQL:
From User u SELECT u.Friends.Friends.name WHERE u.Id = 1
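One possible way to evaluate an SQL-style path such as `u.Friends.Friends.name` is to treat every segment except the last as an edge label to follow, and the last as a property to read. The Java sketch below illustrates that reading. It is not how sones GQL is actually implemented, and the class and method names are assumptions.

```java
import java.util.*;

// Hypothetical evaluator for dotted paths like "Friends.Friends.name".
class PathQuery {
    static class Vertex {
        final Map<String, Object> props = new HashMap<>();
        final Map<String, List<Vertex>> out = new HashMap<>(); // label -> targets
        Vertex(String name) { props.put("name", name); }
        void link(String label, Vertex to) {
            out.computeIfAbsent(label, k -> new ArrayList<>()).add(to);
        }
    }

    static List<Object> evaluate(Vertex start, String path) {
        String[] segments = path.split("\\.");
        List<Vertex> frontier = List.of(start);
        // Every segment but the last is an edge label: follow it outwards.
        for (int i = 0; i < segments.length - 1; i++) {
            List<Vertex> next = new ArrayList<>();
            for (Vertex v : frontier) {
                next.addAll(v.out.getOrDefault(segments[i], List.of()));
            }
            frontier = next;
        }
        // The last segment is a property name: read it from every reached vertex.
        String property = segments[segments.length - 1];
        List<Object> result = new ArrayList<>();
        for (Vertex v : frontier) result.add(v.props.get(property));
        return result;
    }
}
```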


Step 3) Query result formats


Query Result Formats


• Graphs
  - The query result (QR) may be queried over and over again
  - The QR may be stored/cached as a graph
  - But again: (too) many graph representations are available

• Other data structures
  - If the result is just a list, why convert it to a graph?
  - Simple for programming languages
  - Much more complicated for query languages


• Reduced 2-tier architecture (GraphDB -> Client)
  - Higher performance
  - Avoids relational architecture anti-patterns

• Link-aware, self-describing hypermedia (see Neo4J)
  - e.g. ATOM, XML + XLINK, RDFa

• User-defined/application-specific protocols
  - e.g. serve HTML/GEXF directly (see CouchDB)
  - Allows creating powerful embedded applications


Step 4) Accessing remote graphs


An HTTP/REST interface for property graphs

• rexster server
  - Exposes a graph via HTTP/REST
  - Vertices and edges are REST resources
  - Neo4J and OrientDB are available, InfiniteGraph announced

• rexster client
  - Accessing remote graphs
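On the client side, reading a vertex resource is an ordinary HTTP GET. The Java sketch below uses the JDK 11+ `java.net.http` client. The URL layout (`/graph1/vertices/1` on port 8182) is taken from the curl example in this talk. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for a vertex resource such as /graph1/vertices/1.
class RemoteGraphClient {
    // Builds a GET request that asks for the JSON representation of one vertex.
    static HttpRequest vertexRequest(String baseUrl, String graph, String vertexId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + graph + "/vertices/" + vertexId))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body; needs a running server.
    static String fetchVertex(HttpClient client, HttpRequest request) throws Exception {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Building the request works offline. Calling `fetchVertex` requires a server listening at that address.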


Common CRUD operations...



What about other HTTP verbs?

• PATCH for applying small changes?
• NEIGHBORS?
• EXPLORE (more neighbors...)
• SHORTESTPATH
• CENTRALITY
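HTTP methods are just tokens, so a client can already send the verbs proposed above. The Java sketch below uses `java.net.http` to build a PATCH request and a request with a custom NEIGHBORS method. Both the NEIGHBORS verb and the JSON body format are hypothetical; no server discussed here is claimed to support them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical requests for the extra verbs; only the request objects are built here.
class GraphVerbs {
    // PATCH: send only the changed properties of a vertex.
    static HttpRequest patchVertex(String vertexUrl, String jsonChanges) {
        return HttpRequest.newBuilder(URI.create(vertexUrl))
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(jsonChanges))
                .build();
    }

    // NEIGHBORS: a custom method token asking for the adjacent vertices.
    static HttpRequest neighbors(String vertexUrl) {
        return HttpRequest.newBuilder(URI.create(vertexUrl))
                .header("Accept", "application/json")
                .method("NEIGHBORS", HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Proxies or firewalls that only pass standard methods may reject a custom verb. Exposing such operations as a GET on a sub-resource (for example a hypothetical `/vertices/1/neighbors`) is therefore the more conservative design.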


Default resource representation: JSON

curl -H Accept:application/json http://localhost:8182/graph1/vertices/1

{ "version" : "0.1",
  "results" : { "_type" : "vertex", "_id" : "1", "name" : "Alice", "age" : 21 },
  "query_time" : 0.014235 }
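The JSON representation above is simple enough to produce by hand. The Java sketch below builds it from an ordered map without a JSON library. It only handles strings, numbers and nested maps, and it escapes only quotes and backslashes, so it is a simplification rather than a general serializer. The class name `VertexJson` is hypothetical.

```java
import java.util.*;

// Hand-rolled serializer for the vertex JSON shape; a deliberate simplification.
class VertexJson {
    // Escapes only backslashes and double quotes (control characters are not handled).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String value(Object v) {
        return (v instanceof Number) ? v.toString() : quote(String.valueOf(v));
    }

    static String object(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{ ");
        boolean first = true;
        for (Map.Entry<String, Object> f : fields.entrySet()) {
            if (!first) sb.append(", ");
            first = false;
            sb.append(quote(f.getKey())).append(" : ");
            Object v = f.getValue();
            if (v instanceof Map) {
                // Nested maps become nested JSON objects.
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) v;
                sb.append(object(nested));
            } else {
                sb.append(value(v));
            }
        }
        return sb.append(" }").toString();
    }
}
```

Use a `LinkedHashMap` for the fields. It preserves insertion order, so the keys appear in the same order as in the curl output.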


Advanced HTTP/REST concepts

• HTTP caching support?
• HTTP authentication support?
• Conditional PUT/POST requests?
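Caching and conditional requests both rely on entity tags (ETags). The Java sketch below shows how a server might answer a conditional PUT (`If-Match`) and a revalidating GET (`If-None-Match`) for a vertex. Deriving the ETag from a per-vertex version counter is an assumption. Weak tags (`W/`) and `If-None-Match: *` are not handled.

```java
// Hypothetical precondition checks for a vertex resource; returns HTTP status codes.
class ConditionalRequests {
    // Entity tags are quoted strings; here they encode a per-vertex version counter.
    static String etag(long version) {
        return "\"v" + version + "\"";
    }

    // PUT with If-Match: 204 if the client saw the current version (or sent no
    // precondition / "*"), otherwise 412 Precondition Failed (a concurrent change).
    static int conditionalPut(long currentVersion, String ifMatch) {
        if (ifMatch == null || ifMatch.equals("*") || matches(ifMatch, currentVersion)) {
            return 204;
        }
        return 412;
    }

    // GET with If-None-Match: 304 Not Modified lets the client reuse its cached copy.
    static int conditionalGet(long currentVersion, String ifNoneMatch) {
        return (ifNoneMatch != null && matches(ifNoneMatch, currentVersion)) ? 304 : 200;
    }

    // Accepts a comma-separated list of entity tags.
    private static boolean matches(String header, long version) {
        String current = etag(version);
        for (String tag : header.split(",")) {
            if (tag.trim().equals(current)) return true;
        }
        return false;
    }
}
```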


The GraphDB Graph...

• Neo4J for GIS
• InfoGrid for WebApps
• In-Memory for Caching
• OrientDB for Documents
• OrientDB for Ad Hoc
• TinkerGraph & Gremlin for Ad Hoc
• Neo4J for HA
• InfiniteGraph for Clustering


Questions?

http://www.graph-database.org
http://www.twitter.com/graphdbs