Prov4J: A Semantic Web Framework for Generic Provenance Management

André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry

Digital Enterprise Research Institute, www.deri.ie

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Paper: http://andrefreitas.org/papers/Prov4J%20A%20Semantic%20Web%20Framework%20for%20Generic%20Provenance%20Management.pdf

Outline

Motivation.

Generic provenance management on the Web.

Prov4J: Capture, Representation, Consumption, Deployment.

Motivation: Data on the Web

Accelerated by the adoption and uptake of Linked Data.

Paradigm shift: change in the way information is consumed on the Web.

Main issue: quality assessment and trustworthiness.

Motivation

Provenance as a cornerstone element for quality assessment.

Expansion of the application of provenance into different domains and types of systems.

Generic applications generating or consuming data on the Web need to become provenance-aware.

Provenance-aware: Ability to capture, represent and consume provenance information associated with the data.

Generic Provenance Management

Provenance management for this larger audience.

Covers the set of the most frequent requirements for provenance capture and consumption on the Web.

Generic Provenance Management

Provenance for the Masses

Research Questions

Are Semantic Web standards and tools appropriate for capturing, representing and consuming provenance on the Web?

What are the key software engineering aspects which need to be employed to reduce the barriers for the construction of provenance-aware applications?

Research Goals

Answer these questions.

Provide a Generic Provenance Management Framework for the Web.

Make it available for experimentation by the community.

Main Components

Provenance Representation, Provenance Consumption, Provenance Capture.

W3P, Prov4J.

W3P

Lightweight provenance ontology for the Web.

Focused on provenance for data quality assessment.

Designed to be compatible with the Open Provenance Model.

Dimensions: Workflow, Publishing and Social Provenance.

Building W3P: use cases; data quality dimensions; literature review; requirements; core provenance concepts; use and refinement.
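
To make the Open Provenance Model compatibility more concrete, the sketch below records a tiny provenance graph as subject-predicate-object triples, using the OPM edge types used, wasGeneratedBy and wasControlledBy. The `opm:` and `ex:` identifiers, the `Triple` class and the `ProvenanceGraphSketch` helper are hypothetical and are not taken from W3P or Prov4J; the slides do not list W3P's actual class and property names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a provenance graph held as plain triples.
// Edge names follow the Open Provenance Model; all ex: names are made up.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class ProvenanceGraphSketch {
    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Returns every object linked from the given subject by the given predicate.
    List<String> objectsOf(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    // Builds a three-edge example: a process reads a source artifact,
    // produces a report, and is controlled by an agent.
    static ProvenanceGraphSketch example() {
        ProvenanceGraphSketch g = new ProvenanceGraphSketch();
        g.add("ex:cleaningRun", "opm:used", "ex:rawData");
        g.add("ex:report", "opm:wasGeneratedBy", "ex:cleaningRun");
        g.add("ex:cleaningRun", "opm:wasControlledBy", "ex:alice");
        return g;
    }
}
```

Asking the example graph for `objectsOf("ex:report", "opm:wasGeneratedBy")` yields the single process that produced the report, which is the kind of lookup a data quality assessment would start from.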

W3P: Classes & Properties (excerpt)

Core Workflow Model

Building Prov4J

Core requirements for a generic provenance management framework: Capture, Consumption.

Provenance architecture.

Core software engineering aspects for capturing provenance.

Deployment in a real world scenario.

Core requirements coverage analysis.

Core Requirements

Provenance capture: minimum number of software adaptations; low impact on performance; expressive interface; scalability; structured provenance data; publication of provenance data.

Provenance consumption: query expressivity; query performance & scalability; provenance discovery; mapping from different provenance models; usability.

Core Requirements (cont’d)

Common requirements: user data representation independence; separation of concerns; reliable provenance storage; basic system administration support; security.

High-Level Architecture

Consumption: Components

Consumption: Query Types

Query types: SPARQL based queries; queries supported by reasoning; path queries; navigational queries; similarity queries.

Query type distribution (API): 33% used transitivity; 9% used rules reasoning; 9% used path features; 20% used SPARQL extensions; 30% pure SPARQL; 4% similarity.
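
Transitivity and path queries can be illustrated without a triple store. The sketch below computes every transitive ancestor of an artifact over a hypothetical "was derived from" relation using breadth-first search. It is a plain-Java stand-in for what a reasoner or a path-query extension would provide, not Prov4J's query API; the class name, method names and sample identifiers are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: transitive closure over direct derivation edges.
final class DerivationPaths {
    // Maps an artifact to the artifacts it was directly derived from.
    private final Map<String, List<String>> derivedFrom = new HashMap<>();

    void addDerivation(String artifact, String source) {
        derivedFrom.computeIfAbsent(artifact, k -> new ArrayList<>()).add(source);
    }

    // Breadth-first search from the artifact; the visited set keeps
    // discovery order and also guards against cycles in the data.
    Set<String> ancestorsOf(String artifact) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(artifact);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> sources =
                derivedFrom.getOrDefault(current, Collections.<String>emptyList());
            for (String source : sources) {
                if (visited.add(source)) {
                    queue.add(source);
                }
            }
        }
        return visited;
    }
}
```

With the chain report, cleaned data, raw data, `ancestorsOf` on the report returns both the cleaned and the raw data, whereas a pure triple-pattern match would return only the direct source. In SPARQL 1.1, a property path of the form `?x ex:wasDerivedFrom+ ?y` expresses the same closure, with `ex:wasDerivedFrom` again a placeholder property.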

Capture: Software Engineering Principles

Aspect Oriented Programming & Annotations.

Pushback capture.

Minimization of Adaptations.

Context-based provenance construction.

Provenance URIs.
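
One way to read "Aspect Oriented Programming & Annotations" together with "Minimization of Adaptations" is that application code only gains an annotation, while interception happens elsewhere. The sketch below approximates that idea with a JDK dynamic proxy rather than an AOP weaver; the `@CaptureProvenance` annotation, the `DataCleaner` interface, the `ProvenanceCapture` class and the log format are all hypothetical and do not reflect Prov4J's actual annotations or API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical marker: the only change the application code needs.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface CaptureProvenance {
    String process();
}

interface DataCleaner {
    @CaptureProvenance(process = "ex:cleaning")
    String clean(String input);
}

final class TrimmingCleaner implements DataCleaner {
    public String clean(String input) {
        return input.trim();
    }
}

final class ProvenanceCapture {
    // In-memory log standing in for a provenance store.
    static final List<String> LOG = new ArrayList<>();

    // Wraps a target so that calls to annotated interface methods are
    // logged after they return; unannotated methods pass through untouched.
    static <T> T wrap(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the application's own exception
            }
            CaptureProvenance meta = method.getAnnotation(CaptureProvenance.class);
            if (meta != null) {
                LOG.add(meta.process() + " used " + Arrays.toString(args)
                        + " generated " + result);
            }
            return result;
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

A caller would obtain the cleaner through `ProvenanceCapture.wrap(new TrimmingCleaner(), DataCleaner.class)` and use it as before; the implementation class itself is not modified, which is the property the adaptation-minimizing principle asks for.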

Capture: Adaptations

Capture: Logging & Storage

Scenario

Core Requirements Coverage

Summary

Semantic Web standards and tools played a fundamental role in the construction of the framework.

Query expressivity over original SPARQL was improved.

Transitivity and path queries proved to be very important features.

Framework is usable in a realistic scenario.

High coverage of core requirements.

Available for download from early November 2010.

Future Work

Evaluation of query expressivity and performance.

W3C Prov-XG requirements coverage analysis.

Improvement of the coverage of the core requirements.

http://prov4j.org