FOSDEM 2013, Petra Selmer: Flexible querying of graph data

DESCRIPTION

These are the slides from a talk I presented in the Graph Processing room at FOSDEM 2013, in which I discussed my PhD topic: a query language allowing for the flexible querying of complex paths within graph-structured data.

Page 1

Graph processing room

FOSDEM, 2 Feb 2013

Petra Selmer

[email protected]

http://www.dcs.bbk.ac.uk/~lselm01/

Flexible querying of graph data

Page 2

Introduction

I shall be presenting my PhD topic, which involves a declarative query language allowing for the flexible querying of graph-structured data with complex paths.

Page 3

Agenda

Who (am I)?

Why (the motivation)?

Some background info

What (is the query language and what can it do)?

Illustrative examples

How (is it done)?

Page 4

Who?

Petra Selmer

Part-time PhD student:

Birkbeck College, University of London

Prof. Alexandra Poulovassilis

Dr. Peter T. Wood

Software Architect:

University College London’s Institute of Neurology

(Wellcome Trust Centre for Neuroimaging)

Page 5

Why?

The amount of graph-structured data is growing fast

The structure of this data is becoming more complex, especially when multiple, heterogeneous data sources are integrated

The structure of the data is also always subject to change...

Page 6

Why?

Users of such systems may not be familiar with the underlying data structure: the available paths, etc.

The user may not be able to obtain meaningful answers (or indeed, any answers) from the data if the querying system is limited to exact matching of users’ queries

Also, the user may wish to explore the data by starting from a set of initial answers and proceeding from there

The user may additionally wish to derive some intelligence from the connections...

[Diagram: the user, the data, the query]

Page 7

Background: Ontologies

Currently part of the Semantic Web stack (Tim Berners-Lee, RDF, triple stores)

Models a domain of interest, supporting inference and reasoning

It can be thought of as a “schema” for graph data

The following inference rules are included (among others):

Subclass: ‘History’ and ‘Languages’ are subclasses of ‘Humanities’

Subproperty, Domain, Range...

Page 8

What?

Data model: G = (V, E), a very general model

V: vertices (or nodes), each labelled with some constant

E: directed, labelled edges, with labels drawn from the alphabet Σ ∪ {type}

The query language is called Flex-It (it is declarative)

The basis is that of conjunctive regular path queries

There are two operators, APPROX (approximation) and RELAX (relaxation), which may be applied to the original query
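A minimal sketch of this data model, assuming edges are stored as (source, label, target) triples. The names `Edge`, `vertices` and `labels_ok` and the sample vertices are hypothetical and not part of Flex-It; in this sketch V is simply taken to be the set of edge endpoints, so isolated vertices are ignored.

```python
from typing import Iterable, NamedTuple, Set

class Edge(NamedTuple):
    """A directed, labelled edge of G = (V, E)."""
    source: str
    label: str
    target: str

def vertices(edges: Iterable[Edge]) -> Set[str]:
    """Vertices mentioned by some edge (isolated vertices are not represented here)."""
    return {v for e in edges for v in (e.source, e.target)}

def labels_ok(edges: Iterable[Edge], sigma: Set[str]) -> bool:
    """True if every edge label is drawn from the alphabet Σ ∪ {type}."""
    allowed = set(sigma) | {"type"}
    return all(e.label in allowed for e in edges)

# Hypothetical fragment: an episode with a qualification whose class is given by 'type'.
E = [Edge("ep1", "qualif", "q1"), Edge("q1", "type", "EnglishStudies")]
print(sorted(vertices(E)))                         # ['EnglishStudies', 'ep1', 'q1']
print(labels_ok(E, {"qualif", "prereq", "next"}))  # True
```

Class membership is just another edge labelled `type`, which is why the alphabet is Σ extended with that one reserved label.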

Page 9

What?

Conjunctive regular path queries:

The paths to be traversed in the graph are expressed with a regular expression

A single regular path query conjunct: (X, R, Y)

X, Y: either constants or variables

R: the regular expression

“Conjunctive”: joining multiple conjuncts; e.g. (X, R1, Y), (Y, R2, Z), (Z, R3, A)

The Y’s must match, the Z’s must match, etc. (shared variables are joined)

[Figure: example graph N1 --n--> N2 --n--> N3 --p--> N4]

1) (N1, n+, ?Y):

• Y = N2, N3

2) (N1, n*p, ?Y):

• Y = N4
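The two example queries on this slide can be checked with a small sketch that walks the graph and a hand-built NFA in step, using breadth-first search over (node, state) pairs. The NFA encoding (a transition dict, a start state and a set of final states) and the name `eval_rpq` are assumptions made for illustration; the edge list reproduces the example graph N1 -n-> N2 -n-> N3 -p-> N4.

```python
from collections import deque

# Example graph from the slide: N1 -n-> N2 -n-> N3 -p-> N4
EDGES = [("N1", "n", "N2"), ("N2", "n", "N3"), ("N3", "p", "N4")]

def eval_rpq(edges, nfa, start_node):
    """Return every node Y such that some path from start_node to Y spells a word in L(nfa)."""
    trans, q0, finals = nfa  # trans: {(state, label): {next_state, ...}}
    out = {}
    for s, label, t in edges:
        out.setdefault(s, []).append((label, t))
    seen = {(start_node, q0)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in finals:
            answers.add(node)  # the path so far spells an accepted word
        for label, nxt in out.get(node, []):
            for nxt_state in trans.get((state, label), ()):
                pair = (nxt, nxt_state)
                if pair not in seen:  # visit each (node, state) pair once
                    seen.add(pair)
                    queue.append(pair)
    return answers

N_PLUS = ({(0, "n"): {1}, (1, "n"): {1}}, 0, {1})    # n+
N_STAR_P = ({(0, "n"): {0}, (0, "p"): {1}}, 0, {1})  # n*p

print(sorted(eval_rpq(EDGES, N_PLUS, "N1")))    # ['N2', 'N3']
print(sorted(eval_rpq(EDGES, N_STAR_P, "N1")))  # ['N4']
```

Because each (node, state) pair is visited at most once, the search terminates even when the regular expression contains a Kleene star and the graph contains cycles.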

Page 10

What?

Approximation allows for the approximate matching of labels in the path

Edit operations may be applied to the edge labels in the path denoted by the regular expression:

Edit operations: insertions, deletions, inversions, substitutions and transpositions of labels

Each operation has a ‘cost’: usually 1

Example: Query conjunct: (X, a*.b, Y)

R = a*.b [answers returned at cost 0]

R’ = p.a*.b (insertion of ‘p’) [answers returned at cost 1]

R’’ = p.a*.b- (insertion of ‘p’ plus inversion of ‘b’) [answers returned at cost 2]
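To make the costs concrete, here is one way (a sketch, not Flex-It's own algorithm) to compute the minimum edit cost between a regular expression, given as an NFA, and the label sequence of a concrete path. It runs Dijkstra's algorithm over (labels consumed, NFA state) pairs; insertion, deletion, substitution and inversion each cost 1, transposition is omitted, and writing a reversed edge as a label with a trailing "-" is an assumption borrowed from the slide's `b-` notation.

```python
import heapq

def approx_cost(path, nfa):
    """Minimum edit cost of turning some word accepted by `nfa` into the label
    sequence `path`, with insertion, deletion, substitution and inversion each
    costing 1 (transposition is omitted in this sketch)."""
    trans, q0, finals = nfa  # trans: {state: [(label, next_state), ...]}
    n = len(path)
    dist = {(0, q0): 0}      # key: (labels of `path` consumed, NFA state)
    heap = [(0, 0, q0)]
    while heap:
        d, i, q = heapq.heappop(heap)
        if d > dist[(i, q)]:
            continue  # stale heap entry
        if i == n and q in finals:
            return d  # Dijkstra pops states in cost order, so this is the minimum
        moves = []
        for label, q2 in trans.get(q, []):
            moves.append((1, i, q2))  # deletion: query label not present in the path
            if i < n:
                if path[i] == label:
                    moves.append((0, i + 1, q2))  # exact match
                elif path[i] == label + "-":
                    moves.append((1, i + 1, q2))  # inversion: edge traversed backwards
                else:
                    moves.append((1, i + 1, q2))  # substitution of path[i] for label
        if i < n:
            moves.append((1, i + 1, q))  # insertion: extra path label, NFA stays put
        for cost, i2, q2 in moves:
            if d + cost < dist.get((i2, q2), float("inf")):
                dist[(i2, q2)] = d + cost
                heapq.heappush(heap, (d + cost, i2, q2))
    return None  # no accepting state is reachable in the NFA

# NFA for a*.b: state 0 loops on 'a' and moves to final state 1 on 'b'.
A_STAR_B = ({0: [("a", 0), ("b", 1)]}, 0, {1})
print(approx_cost(["a", "a", "b"], A_STAR_B))        # 0
print(approx_cost(["p", "a", "b"], A_STAR_B))        # 1 (insertion of 'p')
print(approx_cost(["p", "a", "a", "b-"], A_STAR_B))  # 2 (insertion plus inversion)
```

The three calls reproduce the costs listed for R, R’ and R’’ above.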

Page 11

What?

Relaxation is applied by using inference rules from an ontology (if one exists): the query conditions are logically relaxed using the data’s ontology definition

Relaxation operations: subclass, subproperty, domain and range

Each operation has a ‘cost’: usually 1

Example: we have an ontology with Humanities (superclass), and Languages and History (subclasses of Humanities)

Assume our query states that Languages may be relaxed

Languages is relaxed to Humanities:

Instances of Languages will be returned at cost 0

Instances of History will be returned at cost 1
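The subclass case can be sketched as follows, assuming single inheritance, an acyclic hierarchy and a cost of 1 per step up the hierarchy; the function names are hypothetical. Relaxing the class in (?X, type, C) to its k-th superclass costs k, and instances of any class under that superclass then become answers at cost k. The EnglishStudies sc Languages link is taken from the example on page 20.

```python
def ancestors(cls, subclass_of):
    """cls followed by its superclasses, nearest first (single inheritance assumed)."""
    chain = [cls]
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain

def relaxation_costs(subclass_of, query_class):
    """Lowest cost at which instances of each class answer (?X, type, query_class)
    when the class may be relaxed to a superclass at cost 1 per step.
    Classes that no relaxation step ever covers are left out."""
    up = ancestors(query_class, subclass_of)  # up[k] is reached after k relaxation steps
    classes = set(subclass_of) | set(subclass_of.values()) | {query_class}
    costs = {}
    for d in classes:
        d_up = set(ancestors(d, subclass_of))  # instances of d are also instances of these
        for k, a in enumerate(up):
            if a in d_up:
                costs[d] = k
                break
    return costs

ONTOLOGY = {"EnglishStudies": "Languages", "Languages": "Humanities", "History": "Humanities"}
print(sorted(relaxation_costs(ONTOLOGY, "Languages").items()))
# [('EnglishStudies', 0), ('History', 1), ('Humanities', 1), ('Languages', 0)]
```

Relaxing from EnglishStudies instead puts History at cost 2, matching the distance used in the page 20 example.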

Page 12

What?

Answers are ranked according to how closely they match the original query; higher-cost answers have a lower ranking

All answers at a certain distance d are ranked the same and are returned before answers at a higher distance

We allow for incremental execution: exact answers are returned first, then answers at distance 1, and so on
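The ranking rule can be illustrated with a short sketch that groups scored answers into batches of equal distance. It only shows the ordering contract: the function `ranked_batches` (a hypothetical name) sorts a finished answer set, whereas the slides describe computing answers incrementally. The sample data are the (Y, Z) answers and distances reported for the approximated query on pages 17 and 18.

```python
from itertools import groupby

def ranked_batches(scored_answers):
    """Yield (distance, answers) in increasing distance; answers at equal distance share a rank."""
    ordered = sorted(scored_answers, key=lambda pair: pair[1])
    for distance, group in groupby(ordered, key=lambda pair: pair[1]):
        yield distance, sorted(answer for answer, _ in group)

ANSWERS = [
    (("ep23", "Journalist"), 2),
    (("ep22", "AirTravelAssistant"), 1),
    (("ep24", "AssistantEditor"), 2),
]

for distance, batch in ranked_batches(ANSWERS):
    print(distance, batch)
# 1 [('ep22', 'AirTravelAssistant')]
# 2 [('ep23', 'Journalist'), ('ep24', 'AssistantEditor')]
```

`groupby` only merges adjacent items, which is why the answers are sorted by distance first.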

Page 13

Example – ‘Lifelong learner metadata’

[Figure: lifelong learner example graph (surviving labels: History, sc)]

Page 14

[Figure: lifelong learner example graph (surviving labels: History, sc)]

Page 15

Query: “What work positions can I reach, having a degree in English?”

Y = the episode; Z = the job

(?Y, ?Z)

(?X, type, University),

(?X, qualif.type, EnglishStudies),

(?X, prereq+, ?Y),

(?Y, type, Work),

(?Y, job.type, ?Z)
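Once each conjunct has been evaluated, its variable bindings are joined on the shared variables (here ?X and ?Y) and projected onto the head (?Y, ?Z). The sketch below uses a naive nested-loop natural join; the function names and all bindings are hypothetical placeholders, not the slides' data.

```python
from functools import reduce

def join(left, right):
    """Natural join of two lists of bindings (dicts) on their shared variables."""
    return [
        {**lb, **rb}
        for lb in left
        for rb in right
        if all(lb[v] == rb[v] for v in lb.keys() & rb.keys())
    ]

def evaluate(conjunct_bindings, head):
    """Join the bindings of all conjuncts, then project onto the head variables."""
    rows = reduce(join, conjunct_bindings)
    return sorted({tuple(row[v] for v in head) for row in rows})

# Hypothetical bindings for each conjunct of the query above.
bindings = [
    [{"X": "e1"}, {"X": "e2"}],                                  # (?X, type, University)
    [{"X": "e1"}],                                               # (?X, qualif.type, EnglishStudies)
    [{"X": "e1", "Y": "e3"}, {"X": "e2", "Y": "e4"}],            # (?X, prereq+, ?Y)
    [{"Y": "e3"}, {"Y": "e4"}],                                  # (?Y, type, Work)
    [{"Y": "e3", "Z": "Librarian"}, {"Y": "e4", "Z": "Clerk"}],  # (?Y, job.type, ?Z)
]
print(evaluate(bindings, ["Y", "Z"]))  # [('e3', 'Librarian')]
```

A query engine would normally choose a join order and a cheaper join algorithm; the nested loop just keeps the semantics visible.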

Page 16

Query: “What work positions can I reach, having a degree in English?”

Y = the episode; Z = the job

(?Y, ?Z)

(?X, type, University),

(?X, qualif.type, EnglishStudies),

(?X, prereq+, ?Y),

(?Y, type, Work),

(?Y, job.type, ?Z)

No results from User 2 will be returned, even though they are relevant!

Page 17

Allowing query approximation can yield some answers:

By replacing the edge label prereq with next, at an edit cost of 1, we obtain answers to this approximated query:

(?Y, ?Z)

(?X, type, University),

(?X, qualif.type, EnglishStudies),

APPROX(?X, prereq+, ?Y),

(?Y, type, Work),

(?Y, job.type, ?Z)

prereq+ can be approximated by next.prereq* at edit distance 1:

Result: Y = ep22, Z = AirTravelAssistant

Page 18

Allowing query approximation can yield some answers:

By replacing the edge label prereq with next, at an edit cost of 1, we obtain answers to this approximated query:

(?Y, ?Z)

(?X, type, University),

(?X, qualif.type, EnglishStudies),

APPROX(?X, prereq+, ?Y),

(?Y, type, Work),

(?Y, job.type, ?Z)

next.prereq* can be approximated by next.next.prereq*, now at edit distance 2:

Results:

Y = ep23, Z = Journalist

Y = ep24, Z = AssistantEditor

Page 19

[Figure: lifelong learner example graph (surviving labels: History, sc)]

Page 20

Query: “What jobs are open to me if I study English, or something similar, at University?”

(?Y, ?Z)

(?X, type, University), (?X, qualif, ?D),

RELAX (?D, type, EnglishStudies),

APPROX (?X, prereq+, ?Y),

(?Y, type, Work), (?Y, job.type, ?Z)

In addition to the answers (from User 2) obtained by the previous query, we now also have answers from the timeline of User 3

prereq+ can be approximated by next.prereq* (distance 1), and EnglishStudies can be relaxed, via Languages, to Humanities (distance 2), which encompasses History

Result: Y = ep32, Z = PersonalAssistant (distance 3 from the original query)

Page 21

Query: “What jobs are open to me if I study English, or something similar, at University?”

(?Y, ?Z)

(?X, type, University), (?X, qualif, ?D),

RELAX (?D, type, EnglishStudies),

APPROX (?X, prereq+, ?Y),

(?Y, type, Work), (?Y, job.type, ?Z)

next.prereq* can be approximated by next.next.prereq* (distance 2), with EnglishStudies again relaxed to Humanities (distance 2)

Results: (both at distance 4 from the original query)

Y = ep33, Z = Author

Y = ep34, Z = AssociateEditor

Page 22

How?

Theory

Construction of a weighted non-deterministic finite automaton (NFA) to represent the regular expression

We add new states and transitions to the NFA to represent the approximation and relaxation operations

Formation of a product automaton from the NFA and the data graph G

We perform a lowest-cost path traversal of the product automaton, then construct the query tree, perform joins, etc.

Polynomial time complexity

Correctness of the algorithms has been proven
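The lowest-cost traversal of the product automaton can be sketched with Dijkstra's algorithm over (graph node, NFA state) pairs, yielding answers in non-decreasing cost, which gives the incremental behaviour described on page 12. The transition encoding is an assumption for illustration: a label of `None` is an epsilon move (deletion), a trailing "-" follows an edge backwards (inversion), and "*" matches any label (substitution). The graph, the NFA and the name `ranked_answers` are hypothetical; this is not the Flex-It implementation.

```python
import heapq

def ranked_answers(edges, wnfa, start_node):
    """Yield (node, cost) pairs in non-decreasing cost, each node once, by running
    Dijkstra's algorithm over the product of the data graph and a weighted NFA."""
    trans, q0, finals = wnfa  # trans: {state: [(label, next_state, cost), ...]}
    fwd, rev = {}, {}
    for s, lab, t in edges:
        fwd.setdefault(s, []).append((lab, t))
        rev.setdefault(t, []).append((lab, s))
    dist = {(start_node, q0): 0}
    heap = [(0, start_node, q0)]
    emitted = set()
    while heap:
        d, v, q = heapq.heappop(heap)
        if d > dist[(v, q)]:
            continue  # stale entry: a cheaper route to (v, q) was already processed
        if q in finals and v not in emitted:
            emitted.add(v)
            yield v, d  # first time v is reached in a final state: its lowest cost
        for label, q2, cost in trans.get(q, []):
            if label is None:  # epsilon: advance the NFA only (models a deletion)
                targets = [v]
            elif label.endswith("-"):  # follow a matching edge backwards (inversion)
                targets = [u for lab, u in rev.get(v, []) if lab == label[:-1]]
            else:  # follow a forward edge; "*" matches any label (substitution)
                targets = [u for lab, u in fwd.get(v, []) if label in ("*", lab)]
            for u in targets:
                nd = d + cost
                if nd < dist.get((u, q2), float("inf")):
                    dist[(u, q2)] = nd
                    heapq.heappush(heap, (nd, u, q2))

# Hypothetical graph and a prereq+ NFA with cost-1 "*" substitution transitions added.
EDGES = [("a", "next", "b"), ("b", "prereq", "c"), ("a", "prereq", "d")]
PREREQ_PLUS_APPROX = (
    {0: [("prereq", 1, 0), ("*", 1, 1)], 1: [("prereq", 1, 0), ("*", 1, 1)]},
    0,
    {1},
)
print(list(ranked_answers(EDGES, PREREQ_PLUS_APPROX, "a")))  # [('d', 0), ('b', 1), ('c', 1)]
```

Here `c` is reached through next.prereq at cost 1, mirroring how prereq+ is approximated by next.prereq* in the lifelong learner example. Because the function is a generator, a caller can stop after the first batch of exact answers. The product has at most |V| × |states| pairs, so the traversal stays polynomial.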

Page 23

How?

Implementation of prototype

Graph database: DEX (http://www.sparsity-technologies.com/dex)

Programming language: C#

Further work

A new flexible operation, FLEX, combining APPROX and RELAX

Optimisation!

Page 24

Any questions?

Thank you for your attention!

[email protected]