Query-generation-for-provo-data-201406

Query-generation-for-provo-data for provAnalytics 2014 at Provenance Week: http://provenanceweek.org/2014/analytics/

Towards Query Generation for PROV-O Data

Jun Zhao¹, Honghan Wu² and Jeff Z. Pan²

¹ Lancaster University, @junszhao | j.zhao5 at lancaster.ac.uk

² University of Aberdeen, honghan.wu | jeff.z.pan at abdn.ac.uk

Outline

• Motivation
• Profile-driven query generation
  – K-Drive
  – ProvQ
• Result discussion
• Future work

The Big Picture of PROV: A Motivation Scenario

http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png

Adapted from: http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png

Provenance information

http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png

Provenance in the Wild vs. ProvBench

Taverna-PROV

Vistrails PROV

Wings PROV

Wikipedia-PROV

Twitter-PROV

OBIAMA (social simulation)

Workflow / scientific domain

• 11 repositories so far
• Various representations
• Spanning different domains
• Openly accessible under different open licenses

Web resources

Social domain

https://github.com/provbench

https://sites.google.com/site/provbench/home

Next Step: Access PROV Datasets

Taverna-PROV

Vistrails PROV

Wings PROV

Wikipedia-PROV

Twitter-PROV

OBIAMA (social simulation)

Can we query across them?

Can we learn something by querying across them?

What can we do with them?

……

Query Generation: A Bottom-up Approach

Taverna-PROV

Wings PROV

Wikipedia-PROV

OBIAMA (social simulation)

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for PROV-O datasets

Example profiles:
• Class associations
• Property associations

Query Generation: A First Step

A PROV Dataset

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for the PROV-O dataset

Example profiles:
• Class associations
• Property associations

Big City:

Big Road:

Slide credit: Dr Wu at Scottish Linked Data Workshop 2014

http://www.kdrive-project.eu (EU FP7 Marie-Curie 286348)

Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116.

• University of Aberdeen
• A generic query generation tool for semantic web data
• Find key sub-graphs in the RDF data
  – Big City: the most instantiated concepts in the data
  – Big Road: the most frequent relations connecting those big cities
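The Big City / Big Road idea can be illustrated with a small sketch. This is not the K-Drive implementation: the toy triples, the function names `big_cities` and `big_roads`, and the ranking by raw counts are all assumptions made for illustration.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy triples (hypothetical data, not taken from any ProvBench dataset).
TRIPLES = [
    ("e1", RDF_TYPE, "prov:Entity"),
    ("e2", RDF_TYPE, "prov:Entity"),
    ("e3", RDF_TYPE, "prov:Entity"),
    ("a1", RDF_TYPE, "prov:Activity"),
    ("a2", RDF_TYPE, "prov:Activity"),
    ("ag1", RDF_TYPE, "prov:Agent"),
    ("e1", "prov:wasGeneratedBy", "a1"),
    ("e2", "prov:wasGeneratedBy", "a1"),
    ("e3", "prov:wasGeneratedBy", "a2"),
    ("a1", "prov:used", "e3"),
    ("a1", "prov:wasAssociatedWith", "ag1"),
]


def big_cities(triples, top_n):
    """Return the top_n classes ranked by their number of instances."""
    counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(top_n)]


def big_roads(triples, cities, top_n):
    """Return the top_n (class, property, class) links between big cities."""
    # Map each resource to the set of classes it is typed with.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Count every property that connects instances of two big cities.
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                if subject_class in cities and object_class in cities:
                    roads[(subject_class, p, object_class)] += 1
    return [road for road, _ in roads.most_common(top_n)]


cities = big_cities(TRIPLES, 2)
roads = big_roads(TRIPLES, set(cities), 2)
print(cities)
print(roads)
```

In this toy data the two big cities are prov:Entity and prov:Activity, and the link to prov:Agent is dropped because prov:Agent is not among the top classes.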

K-Drive Query Generation

K-Drive Generator

Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html

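PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
  # ?Generation is an instance of prov:Generation
  ?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
  # the activity and the role recorded on that generation
  ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
  ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
  # the resource that points to this generation
  ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
}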

ProvQ: Property Association Mining

A PROV Dataset

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for the PROV-O dataset

Discover the properties that are used together with each PROV-O property

Expand a set of “seed” PROV-O queries using the discovered associated properties

https://github.com/junszhao/ProvQ
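A minimal sketch of the property association step follows. It is not the ProvQ code: it assumes that "used together" means appearing on the same subject resource, and the toy triples, the `min_support` threshold, and the function name `property_associations` are hypothetical.

```python
from collections import Counter, defaultdict

PROV_NS = "http://www.w3.org/ns/prov#"
WFPROV_NS = "http://purl.org/wf4ever/wfprov#"

# Toy triples (hypothetical data; the ex: resources are made up).
TRIPLES = [
    ("ex:out1", PROV_NS + "wasGeneratedBy", "ex:run1"),
    ("ex:out1", WFPROV_NS + "describedByParameter", "ex:param1"),
    ("ex:out1", WFPROV_NS + "wasOutputFrom", "ex:run1"),
    ("ex:out2", PROV_NS + "wasGeneratedBy", "ex:run2"),
    ("ex:out2", WFPROV_NS + "wasOutputFrom", "ex:run2"),
    ("ex:run1", PROV_NS + "used", "ex:in1"),
]


def property_associations(triples, min_support=1):
    """For each PROV-O property, count the other properties that occur
    on the same subject (assumed meaning of "used together")."""
    # Collect the set of properties used on each subject.
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)
    # Only PROV-O properties act as seeds, which keeps the search space small.
    counts = defaultdict(Counter)
    for props in props_by_subject.values():
        for seed in props:
            if not seed.startswith(PROV_NS):
                continue
            for other in props - {seed}:
                counts[seed][other] += 1
    # Keep only the associations seen on at least min_support subjects.
    return {
        seed: {p: n for p, n in assoc.items() if n >= min_support}
        for seed, assoc in counts.items()
    }


for seed, assoc in property_associations(TRIPLES).items():
    print(seed, assoc)
```

Restricting seeds to the PROV-O namespace is what makes the output provenance-centric; properties that never co-occur with a PROV-O term are never reported.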

ProvQ: Property Association Mining

• Advantages
  – Reduce the performance challenge usually faced in association rule mining
  – Produce provenance-centric queries
• Disadvantages
  – Could miss queries that are not related to PROV-O terms at all

Expanding Starting Queries

Approach Walk-Through

• Given a seed atomic query, we have a seed property
• We find all properties used together with the seed property:
  – http://purl.org/wf4ever/wfprov#describedByParameter
  – http://purl.org/wf4ever/wfprov#wasOutputFrom
  – http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
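The expansion step can be sketched as below. This is an illustration under stated assumptions, not the ProvQ query builder: the slide does not show the seed property, so `SEED` is a placeholder IRI, and sharing one subject variable `?s` across all patterns is an assumed reading of "used together". The function name `expand_seed_query` is hypothetical.

```python
# Placeholder seed property (hypothetical; the slide's actual seed is not shown).
SEED = "http://example.org/seedProperty"

# Associated properties as listed on the slide.
ASSOCIATED = [
    "http://purl.org/wf4ever/wfprov#describedByParameter",
    "http://purl.org/wf4ever/wfprov#wasOutputFrom",
    "http://www.w3.org/ns/prov#qualifiedGeneration",
]


def expand_seed_query(seed_property, associated_properties):
    """Build a conjunctive SPARQL query with one triple pattern per property,
    all sharing the subject variable ?s (an assumption of this sketch)."""
    properties = [seed_property] + list(associated_properties)
    variables = ["?x%d" % i for i in range(len(properties))]
    patterns = [
        "  ?s <%s> %s ." % (prop, var)
        for prop, var in zip(properties, variables)
    ]
    return "SELECT ?s %s\nWHERE {\n%s\n}" % (
        " ".join(variables),
        "\n".join(patterns),
    )


query = expand_seed_query(SEED, ASSOCIATED)
print(query)
```

Using full IRIs in angle brackets avoids the need for PREFIX declarations, so the generated string can be sent to a SPARQL endpoint as is.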

Results Comparison

• K-Drive Generator
  – 7 queries
  – 3 of them are not exactly provenance queries
  – Probably easier to understand because classes are included in the queries
  – But queries can be complex
• ProvQ
  – 7 queries
  – 1 not returned by K-Drive (prov:wasDerivedFrom)
  – Only provenance queries are returned
  – Queries are simple, based on property associations starting from “seed” PROV-O properties

https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt

Future Work

• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets

Thank you!

These slides have been created by Jun Zhao

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/