Slides on query generation for PROV-O data, for ProvAnalytics 2014 at Provenance Week: http://provenanceweek.org/2014/analytics/
Towards Query Generation for PROV-O Data
Jun Zhao¹, Honghan Wu² and Jeff Z. Pan²
¹ Lancaster University, @junszhao | j.zhao5 at lancaster.ac.uk
² University of Aberdeen
honghan.wu | jeff.z.pan at abdn.ac.uk
Outline
• Motivation
• Profile-driven query generation
  – K-Drive
  – ProvQ
• Result discussion
• Future work
The Big Picture of PROV: A Motivation Scenario
[Figure: http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png]
[Figure: "Provenance information" highlighted; adapted from http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png]
[Figure: http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png]
Provenance in the Wild vs. ProvBench
Workflow / scientific domain: Taverna-PROV, Vistrails PROV, Wings PROV
Web resources: Wikipedia-PROV
Social domain: Twitter-PROV, OBIAMA (social simulation)
• 11 repositories so far
• Various representations
• Across different domains
• Openly accessible under different open licenses
https://github.com/provbench
https://sites.google.com/site/provbench/home
Next Step: Access PROV Datasets
Taverna-PROV
Vistrails PROV
Wings PROV
Wikipedia-PROV
Twitter-PROV
OBIAMA (social simulation)
Can we query across them?
Can we learn something by querying across them?
What can we do with them?
……
Query Generation: A Bottom-up Approach
Taverna-PROV
Wings PROV
Wikipedia-PROV
OBIAMA (social simulation)
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for PROV-O datasets
Example profiles:
• Class associations
• Property associations
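As a rough illustration of what the Profile Generator step could compute, the Python sketch below derives both example profiles from a list of (subject, predicate, object) tuples. The definitions are assumptions made for illustration: a class association is counted as a (subject class, property, object class) link, and a property association as a pair of properties used on the same subject. The function name `build_profile` and the toy data are hypothetical, not taken from either tool.

```python
from collections import Counter
from itertools import combinations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"


def build_profile(triples):
    """Return (class_associations, property_associations) as Counters.

    Hypothetical definitions, for illustration only:
      - class association: (class of subject, property, class of object)
      - property association: a pair of properties used on the same subject
    """
    # Collect the asserted rdf:type classes of each resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    class_assoc = Counter()
    props_by_subject = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        props_by_subject.setdefault(s, set()).add(p)
        # Count a class association only when both ends are typed.
        for cs in types.get(s, ()):
            for co in types.get(o, ()):
                class_assoc[(cs, p, co)] += 1

    prop_assoc = Counter()
    for props in props_by_subject.values():
        # Sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(props), 2):
            prop_assoc[pair] += 1
    return class_assoc, prop_assoc


# Hypothetical toy data using PROV-O terms.
triples = [
    ("ex:e1", RDF_TYPE, PROV + "Entity"),
    ("ex:a1", RDF_TYPE, PROV + "Activity"),
    ("ex:e1", PROV + "wasGeneratedBy", "ex:a1"),
    ("ex:e1", PROV + "wasAttributedTo", "ex:ag1"),
]
class_assoc, prop_assoc = build_profile(triples)
print(class_assoc)
print(prop_assoc)
```

The untyped agent `ex:ag1` contributes to the property profile but not to the class profile, which is one reason the two profiles can suggest different queries.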
Query Generation: A First Step
A PROV dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
Example profiles:
• Class associations
• Property associations
[Figure labels: Big City, Big Road]
Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
http://www.kdrive-project.eu (EU FP7 Marie-Curie 286348)
Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116.
• University of Aberdeen
• A generic query generation tool for Semantic Web data
• Finds key sub-graphs in the RDF data:
  – Big City: the most instantiated concepts in the data
  – Big Road: the most frequent relations connecting those Big Cities
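The Big City / Big Road idea can be sketched in a few lines of Python, assuming the data is available as (subject, predicate, object) tuples. This is an illustrative reading of the two definitions above, not the K-Drive implementation: the names `big_cities` and `big_roads`, the parameter `k`, and the toy data are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"


def big_cities(triples, k):
    """Return the k classes with the most rdf:type instances."""
    counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(k)]


def big_roads(triples, cities, k):
    """Return the k most frequent (class, property, class) links
    whose subject and object are both instances of a Big City class."""
    city_set = set(cities)
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o in city_set:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for cs in types.get(s, ()):
            for co in types.get(o, ()):
                roads[(cs, p, co)] += 1
    return [road for road, _ in roads.most_common(k)]


# Hypothetical toy data.
triples = [
    ("ex:e1", RDF_TYPE, PROV + "Entity"),
    ("ex:e2", RDF_TYPE, PROV + "Entity"),
    ("ex:e3", RDF_TYPE, PROV + "Entity"),
    ("ex:a1", RDF_TYPE, PROV + "Activity"),
    ("ex:a2", RDF_TYPE, PROV + "Activity"),
    ("ex:ag1", RDF_TYPE, PROV + "Agent"),
    ("ex:e1", PROV + "wasGeneratedBy", "ex:a1"),
    ("ex:e2", PROV + "wasGeneratedBy", "ex:a2"),
    ("ex:a1", PROV + "used", "ex:e3"),
    ("ex:a1", PROV + "wasAssociatedWith", "ex:ag1"),
]
cities = big_cities(triples, 2)
roads = big_roads(triples, cities, 2)
print(cities)
print(roads)
```

With `k = 2`, prov:Agent is not a Big City, so the prov:wasAssociatedWith link is ignored when ranking roads.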
K-Drive Query Generation
K-Drive Generator
Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
  ?Generation rdf:type prov:Generation .
  ?Generation prov:activity ?x4_1 .
  ?Generation prov:hadRole ?x3_1 .
  ?x0_1 prov:qualifiedGeneration ?Generation .
}
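A query of this shape could be assembled from a single profile entry: one central class (here prov:Generation), the properties leaving it, and the properties pointing into it. The sketch below is a hypothetical builder, `build_star_query`, written for illustration only; it does not reproduce K-Drive's variable naming (e.g. `?x4_1`) and uses simple `?o`/`?s` variables instead.

```python
PROV = "http://www.w3.org/ns/prov#"


def local_name(iri):
    """Return the part of an IRI after its last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def build_star_query(cls, outgoing, incoming):
    """Build a SPARQL SELECT centred on instances of `cls`.

    outgoing: properties whose subject is the class instance
    incoming: properties whose object is the class instance
    """
    center = "?" + local_name(cls)
    out_vars = [f"?o{i}" for i in range(1, len(outgoing) + 1)]
    in_vars = [f"?s{i}" for i in range(1, len(incoming) + 1)]
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT " + " ".join([center] + out_vars + in_vars),
        "WHERE {",
        f"  {center} rdf:type <{cls}> .",
    ]
    lines += [f"  {center} <{p}> {v} ." for p, v in zip(outgoing, out_vars)]
    lines += [f"  {v} <{p}> {center} ." for p, v in zip(incoming, in_vars)]
    lines.append("}")
    return "\n".join(lines)


query = build_star_query(
    PROV + "Generation",
    outgoing=[PROV + "activity", PROV + "hadRole"],
    incoming=[PROV + "qualifiedGeneration"],
)
print(query)
```

Separating outgoing from incoming properties matters here: prov:qualifiedGeneration points from an entity to the Generation node, so it must be emitted with the class variable in object position.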
ProvQ: Property Association Mining
A PROV dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
Discover the properties that are used together with each PROV-O property
Expand a set of “seed” PROV-O queries using the discovered associated properties
https://github.com/junszhao/ProvQ
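A minimal sketch of the mining step, assuming that "used together" means "asserted on the same subject" (an assumption; the slide does not define it). Only PROV-O properties are treated as seeds, so pairs of non-PROV properties are never counted. The function `mine_associations`, its `min_support` threshold and the toy data are hypothetical, not ProvQ's code.

```python
from collections import Counter, defaultdict

PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"


def mine_associations(triples, min_support=1, seed_namespace=PROV):
    """Map each seed property to {co-occurring property: subject count}.

    Assumption: two properties are "used together" when they are asserted
    on the same subject. Only properties in `seed_namespace` act as seeds.
    """
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)

    assoc = defaultdict(Counter)
    for props in props_by_subject.values():
        for seed in props:
            if not seed.startswith(seed_namespace):
                continue  # non-PROV-O properties are never seeds
            for other in props:
                if other != seed:
                    assoc[seed][other] += 1

    # Drop associations seen on fewer than min_support subjects.
    return {
        seed: {p: n for p, n in counts.items() if n >= min_support}
        for seed, counts in assoc.items()
    }


# Hypothetical toy data: two workflow outputs.
triples = [
    ("ex:out1", PROV + "wasGeneratedBy", "ex:run1"),
    ("ex:out1", WFPROV + "wasOutputFrom", "ex:run1"),
    ("ex:out2", PROV + "wasGeneratedBy", "ex:run2"),
    ("ex:out2", WFPROV + "wasOutputFrom", "ex:run2"),
    ("ex:out2", WFPROV + "describedByParameter", "ex:param1"),
]
print(mine_associations(triples, min_support=2))
```

Restricting seeds to one namespace keeps the number of counted pairs proportional to the PROV-O properties present rather than to all property pairs in the dataset.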
ProvQ: Property Association Mining
• Advantages
  – Reduces the performance cost usually faced in association rule mining, since mining starts only from PROV-O properties
  – Produces provenance-centric queries
• Disadvantages
  – Could miss queries that are not related to PROV-O terms at all
Expanding Starting Queries
Approach Walk-Through
• Given a seed atomic query, we have a seed property:
• We find all properties used together with it:
  – http://purl.org/wf4ever/wfprov#describedByParameter
  – http://purl.org/wf4ever/wfprov#wasOutputFrom
  – http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
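The expansion step can be sketched as a function that joins the seed pattern and every associated property on one shared subject variable, giving a conjunctive query. The seed property IRI below is a hypothetical placeholder, since the slide's actual seed is not reproduced in this text; the three associated properties are the ones listed above. `expand_seed_query` is an illustrative name, not ProvQ's API, and sharing the subject variable is an assumption about how the patterns are joined.

```python
# Hypothetical placeholder seed; not the seed shown on the original slide.
SEED = "http://example.org/hypothetical#seedProperty"
ASSOCIATED = [
    "http://purl.org/wf4ever/wfprov#describedByParameter",
    "http://purl.org/wf4ever/wfprov#wasOutputFrom",
    "http://www.w3.org/ns/prov#qualifiedGeneration",
]


def expand_seed_query(seed, associated):
    """Expand the atomic pattern `?s <seed> ?o0` into a conjunctive query.

    Every associated property is attached to the same subject variable ?s,
    so a result must satisfy all patterns at once.
    """
    props = [seed] + list(associated)
    objs = [f"?o{i}" for i in range(len(props))]
    patterns = [f"  ?s <{p}> {o} ." for p, o in zip(props, objs)]
    head = "SELECT ?s " + " ".join(objs)
    return "\n".join([head, "WHERE {"] + patterns + ["}"])


print(expand_seed_query(SEED, ASSOCIATED))
```

Because all patterns share `?s`, each added property narrows the result set, which is what makes the expanded query conjunctive rather than a union of atomic queries.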
Results Comparison
• K-Drive Generator
  – 7 queries
  – 3 of them are not strictly provenance queries
  – Probably easier to understand, because classes are included in the queries
  – But the queries can be complex
• ProvQ
  – 7 queries
  – 1 not returned by K-Drive (prov:wasDerivedFrom)
  – Only provenance queries are returned
  – Queries are simple, based on property associations starting from “seed” PROV-O properties
https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
Future Work
• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets
Thank you!
These slides have been created by Jun Zhao
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/