Prov4J: A Semantic Web Framework for Generic Provenance Management

Authors: André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry

Paper: http://andrefreitas.org/papers/Prov4J%20A%20Semantic%20Web%20Framework%20for%20Generic%20Provenance%20Management.pdf

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Prov4J: A Semantic Web Framework for Generic Provenance Management

André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry

Outline

Motivation.

Generic provenance management on the Web.

Prov4J: Capture, Representation, Consumption, Deployment.

Motivation: Data on the Web

Accelerated by the adoption and uptake of Linked Data.

Paradigm shift: change in the way information is consumed on the Web.

Main issue: quality assessment and trustworthiness.

Motivation

Provenance as a cornerstone element for quality assessment.

Provenance is being applied in an expanding range of domains and types of systems.

Generic applications generating or consuming data on the Web need to become provenance-aware.

Provenance-aware: Ability to capture, represent and consume provenance information associated with the data.

Generic Provenance Management

Provenance management for this larger audience.

Covers the set of the most frequent requirements for provenance capture and consumption on the Web.

Generic Provenance Management

Provenance for the Masses

Research Questions

Are Semantic Web standards and tools appropriate for capturing, representing and consuming provenance on the Web?

Which software engineering aspects are key to lowering the barriers to building provenance-aware applications?

Research Goals

Answer the two research questions.

Provide a Generic Provenance Management Framework for the Web.

Make it available for experimentation by the community.

Main Components

Provenance Representation, Provenance Consumption, Provenance Capture.

W3P, Prov4J.

W3P

Lightweight provenance ontology for the Web.

Focused on provenance for data quality assessment.

Designed to be compatible with the Open Provenance Model.

Dimensions: Workflow, Publishing and Social Provenance.

Building W3P: use cases; data quality dimensions; literature review; requirements; core provenance concepts; use and refinement.
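To make the idea of an Open Provenance Model compatible representation more concrete, the following is a minimal sketch of how a single derivation might be written down as RDF statements in N-Triples form. It is an illustration under stated assumptions, not the W3P vocabulary: the namespace `http://example.org/prov#`, the class `ProvenanceTriplesSketch`, and its methods are hypothetical, and only the edge names (used, wasGeneratedBy, wasDerivedFrom) are borrowed from the Open Provenance Model.

```java
import java.util.List;

public class ProvenanceTriplesSketch {

    /** Placeholder namespace for illustration only; NOT the real W3P or OPM namespace. */
    static final String NS = "http://example.org/prov#";

    /** Formats one statement whose three terms are all URIs as an N-Triples line. */
    static String triple(String subject, String property, String object) {
        return "<" + subject + "> <" + NS + property + "> <" + object + "> .";
    }

    /** Describes "process used input and generated output" with OPM-style edges. */
    static List<String> describeDerivation(String output, String process, String input) {
        return List.of(
            triple(process, "used", input),
            triple(output, "wasGeneratedBy", process),
            triple(output, "wasDerivedFrom", input));
    }

    public static void main(String[] args) {
        // Hypothetical resource URIs, chosen only for the example.
        String base = "http://example.org/data/";
        describeDerivation(base + "report", base + "cleaning-run-1", base + "raw")
            .forEach(System.out::println);
    }
}
```

Running `main` prints three lines, one per statement. Identifying every artifact and process by a URI is what lets provenance statements produced by different applications be linked and queried together.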

W3P: Classes & Properties (excerpt)

Core Workflow Model

Building Prov4J

Core requirements for a generic provenance management framework: capture and consumption.

Provenance architecture.

Core software engineering aspects for capturing provenance.

Deployment in a real-world scenario.

Core requirements coverage analysis.

Core Requirements

Provenance capture: minimum number of software adaptations; low impact on performance; expressive interface; scalability; structured provenance data; publication of provenance data.

Provenance consumption: query expressivity; query performance and scalability; provenance discovery; mapping from different provenance models; usability.

Core Requirements (cont’d)

Common requirements: user data representation independence; separation of concerns; reliable provenance storage; basic system administration support; security.

High-Level Architecture

Consumption: Components

Consumption: Query Types

Query types: SPARQL-based queries; queries supported by reasoning; path queries; navigational queries; similarity queries.

Query type distribution (API): 33% used transitivity; 9% used rules reasoning; 9% used path features; 20% used SPARQL extensions; 30% pure SPARQL; 4% similarity.
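Since transitivity accounts for the largest share of the queries listed above, a small sketch of what a transitive provenance query computes may help. The code below is an assumption-laden illustration in plain Java over an in-memory map, not the Prov4J query API: the class `TransitiveProvenanceQuery`, the method `allSources`, and the sample artifact names are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TransitiveProvenanceQuery {

    /**
     * Returns every artifact reachable from 'start' by following
     * "derived from" edges one or more times (the transitive closure).
     */
    static Set<String> allSources(Map<String, List<String>> derivedFrom, String start) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.push(start);
        while (!frontier.isEmpty()) {
            String current = frontier.pop();
            for (String source : derivedFrom.getOrDefault(current, List.of())) {
                // Set.add returns false for already-seen nodes, so cycles terminate.
                if (visited.add(source)) {
                    frontier.push(source);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical provenance graph: artifact -> artifacts it was directly derived from.
        Map<String, List<String>> derivedFrom = Map.of(
            "report", List.of("cleanedData"),
            "cleanedData", List.of("rawData", "lookupTable"));
        System.out.println(allSources(derivedFrom, "report"));
        // prints [cleanedData, rawData, lookupTable]
    }
}
```

A single triple pattern matches only the direct edge from `report` to `cleanedData`. Reaching `rawData` requires following the edge repeatedly, which is why transitivity and path features matter for provenance consumption.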

Capture: Software Engineering Principles

Aspect-Oriented Programming and annotations.

Pushback capture.

Minimization of adaptations.

Context-based provenance construction.

Provenance URIs.
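The combination of annotations and interception is what keeps adaptations to application code small. The following is a minimal sketch of that idea, assuming a JDK dynamic proxy as a stand-in for aspect weaving (it is not AspectJ and not the Prov4J capture API). The annotation `CaptureProvenance`, the interface `Converter`, the in-memory `LOG`, and the method `withCapture` are all hypothetical names introduced for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AnnotationCaptureSketch {

    /** Hypothetical marker annotation; not part of any real provenance library. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface CaptureProvenance {
        String process();
    }

    /** Example application interface; the annotation is its only adaptation. */
    interface Converter {
        @CaptureProvenance(process = "currencyConversion")
        double toEuro(double dollars);
    }

    /** In-memory stand-in for a provenance store. */
    static final List<String> LOG = new ArrayList<>();

    /** Wraps 'target' so that calls to annotated methods are recorded in LOG. */
    @SuppressWarnings("unchecked")
    static <T> T withCapture(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result = method.invoke(target, args);
            CaptureProvenance meta = method.getAnnotation(CaptureProvenance.class);
            if (meta != null) {
                LOG.add(meta.process() + " used " + Arrays.toString(args)
                        + " generated " + result);
            }
            return result;
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Converter plain = dollars -> dollars * 0.5; // made-up exchange rate
        Converter captured = withCapture(Converter.class, plain);
        captured.toEuro(10.0);
        System.out.println(LOG);
        // prints [currencyConversion used [10.0] generated 5.0]
    }
}
```

The business logic in `toEuro` contains no provenance code. The record of which inputs were used and which output was generated is produced entirely by the interceptor, which illustrates the separation of concerns that an aspect-oriented approach aims for.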

Capture: Adaptations

Capture: Logging & Storage

Scenario

Core Requirements Coverage

Summary

Semantic Web standards and tools played a fundamental role in the construction of the framework.

Query expressivity was improved beyond that of plain SPARQL.

Transitivity and path queries proved to be very important features.

Framework is usable in a realistic scenario.

High coverage of core requirements.

Available for download from early November 2010.

Future Work

Evaluation of query expressivity and performance.

W3C Prov-XG requirements coverage analysis.

Improvement of the coverage of the core requirements.

http://prov4j.org