
A Flexible Framework for Understanding the Dynamics of Evolving RDF Datasets

Y. Roussakis¹, I. Chrysakis¹, K. Stefanidis¹, G. Flouris¹, and Y. Stavrakas²

1: Institute of Computer Science, FORTH, Heraklion, Greece
2: Institute for the Management of Information Systems, ATHENA, Athens, Greece


ISWC 2015, Bethlehem PA

Introduction

• LOD datasets are constantly evolving
  – Inclusion of new experimental evidence
  – Correction of erroneous conceptualizations
  – Other observations

• Detection and analysis of the differences (deltas) between datasets is crucial
  – Synchronization of autonomously developed versions
  – Visualization of the evolution history of a dataset
  – Synchronization of interconnected LOD datasets


Questions to address:
1. Which changes?
2. How to detect?
3. How to represent?


Contributions

• A framework for detecting and analyzing changes and the evolution history of LOD datasets

• Key Features
  ✔ Change Definition
    • Different change types w.r.t. the domain of expertise
  ✔ Change Detection
    • Efficient change detection algorithm
  ✔ Change Representation
    • Based on ontologies
    • Allows queries that span multiple versions and consider both changes and data


Motivation

[Figure: two versions, D1 and D2, of an RDF graph. D1 shows the nodes Real_Madrid_CF, SoccerClub and Real_Madrid with the edge label name. D2 adds the node Mikel_Lasa with the edges team (to Real_Madrid_CF), type (to Athlete) and name (to “Mikel Lasa”).]

Low-level changes
  (Mikel_Lasa, team, Real_Madrid_CF)
  (Mikel_Lasa, name, “Mikel Lasa”)
  (Mikel_Lasa, rdf:type, Athlete)

Simple changes
  Add_Property_Instance(Mikel_Lasa, team, Real_Madrid_CF)
  Add_Property_Instance(Mikel_Lasa, name, “Mikel Lasa”)
  Add_Type_To_Individual(Mikel_Lasa, Athlete)

Complex changes
  Add_Athlete(“Mikel Lasa”, Real_Madrid_CF)
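The three levels of the Mikel_Lasa example can be traced in a few lines of code. What follows is only a rough sketch under stated assumptions, not the framework's own detection code (which relies on SPARQL queries): versions are modelled as Python sets of triples, the D1 triples are an assumed reading of the figure, and the helper names (`low_level_delta`, `simple_changes`, `add_athlete_changes`) are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def low_level_delta(old, new):
    """Return the (added, deleted) triples between two versions."""
    return new - old, old - new


def simple_changes(added):
    """Map every added triple to exactly one simple change.

    Simplification: only the two simple changes of the example are handled.
    """
    changes = []
    for t in sorted(added):
        if t.p == "rdf:type":
            changes.append(("Add_Type_To_Individual", t.s, t.o))
        else:
            changes.append(("Add_Property_Instance", t.s, t.p, t.o))
    return changes


def add_athlete_changes(simple):
    """Group simple changes into Add_Athlete(name, team) complex changes."""
    athletes = {c[1] for c in simple
                if c[0] == "Add_Type_To_Individual" and c[2] == "Athlete"}
    result = []
    for athlete in sorted(athletes):
        props = {c[2]: c[3] for c in simple
                 if c[0] == "Add_Property_Instance" and c[1] == athlete}
        if "name" in props and "team" in props:
            result.append(("Add_Athlete", props["name"], props["team"]))
    return result


# D1: assumed reading of the figure (the extracted text only shows the labels).
d1 = {
    Triple("Real_Madrid_CF", "rdf:type", "SoccerClub"),
    Triple("Real_Madrid_CF", "name", "Real_Madrid"),
}
# D2: D1 plus the three triples listed under "Low-level changes".
d2 = d1 | {
    Triple("Mikel_Lasa", "team", "Real_Madrid_CF"),
    Triple("Mikel_Lasa", "name", "Mikel Lasa"),
    Triple("Mikel_Lasa", "rdf:type", "Athlete"),
}

added, deleted = low_level_delta(d1, d2)
simple = simple_changes(added)
complex_changes = add_athlete_changes(simple)
```

Each added triple yields exactly one simple change, while the complex change groups three simple changes into a single domain-level event.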


Simple vs Complex Changes

                     | Simple Changes                              | Complex Changes
---------------------+---------------------------------------------+------------------------------------------
Terminology          | Fixed, domain-independent                   | Custom, domain-dependent
                     | (e.g., Add_Property_Instance)               | (e.g., Add_Athlete)
Changes Set          | Fixed, pre-defined                          | Variable, user-defined
Composition          | Low-level changes                           | Simple changes
Usability            | Fine-grained types of evolution             | Coarse-grained types of evolution
Changes Partitioning | Perfect, i.e., complete and unambiguous     | Cannot capture all the evolution aspects


Change Detection - Approach

• Detection based on plain SPARQL queries
  – One query per change

• Simple changes
  – Prebuilt queries

• Complex changes
  – Queries constructed dynamically upon change definition

• Query results create the change-set

ADD_SUPERCLASS

    SELECT ?sub ?sup WHERE {
      GRAPH <v2> { ?sub rdfs:subClassOf ?sup }
      FILTER NOT EXISTS { GRAPH <v1> { ?sub rdfs:subClassOf ?sup } }
    }

ADD_ATHLETE

    SELECT ?athlete WHERE {
      GRAPH <changes> {
        ?sc a co:Add_Type_To_Individual .
        ?sc co:atti_p1 ?athlete .
        # The slide text repeats co:atti_p1 here; co:atti_p2 (the type parameter) is assumed.
        ?sc co:atti_p2 dbpedia:athlete .
      }
    }
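Since there is one query per change and the ADD_SUPERCLASS query follows a "present in the new version, absent from the old one" shape, the prebuilt queries for simple changes could plausibly be produced from a template. The sketch below is an assumption about how that might be organised, not the tool's actual code: the function names and the `SIMPLE_CHANGE_PATTERNS` table are hypothetical, and only the SUPERCLASS pattern is taken from the slide. PREFIX declarations are omitted, as on the slide.

```python
def added_pattern_query(variables, pattern, old_graph, new_graph):
    """Build a SPARQL query for bindings of `pattern` that hold in
    `new_graph` but not in `old_graph`."""
    select = " ".join("?" + v for v in variables)
    return (
        f"SELECT {select} WHERE {{\n"
        f"  GRAPH <{new_graph}> {{ {pattern} }}\n"
        f"  FILTER NOT EXISTS {{ GRAPH <{old_graph}> {{ {pattern} }} }}\n"
        f"}}"
    )


def deleted_pattern_query(variables, pattern, old_graph, new_graph):
    """A deletion is an addition with the two graphs swapped."""
    return added_pattern_query(variables, pattern, new_graph, old_graph)


# Hypothetical pattern table; only SUPERCLASS mirrors the query on the slide.
SIMPLE_CHANGE_PATTERNS = {
    "SUPERCLASS": (("sub", "sup"), "?sub rdfs:subClassOf ?sup"),
    "SUPERPROPERTY": (("sub", "sup"), "?sub rdfs:subPropertyOf ?sup"),
    "DOMAIN": (("prop", "dom"), "?prop rdfs:domain ?dom"),
    "RANGE": (("prop", "rng"), "?prop rdfs:range ?rng"),
}


def build_queries(old_graph, new_graph):
    """Return one detection query per simple change name."""
    queries = {}
    for name, (variables, pattern) in SIMPLE_CHANGE_PATTERNS.items():
        queries["ADD_" + name] = added_pattern_query(
            variables, pattern, old_graph, new_graph)
        queries["DELETE_" + name] = deleted_pattern_query(
            variables, pattern, old_graph, new_graph)
    return queries


queries = build_queries("v1", "v2")
```

Running each generated query against the store and collecting the bindings would then yield the change-set for that pair of versions.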


Change Representation - Motivation

What can I do with all these changes?


• Interesting queries
  – Example: return all the left backs born before 1980 who were transferred to Athletic Bilbao between versions V1 and V2 and who played for Real Madrid CF in any version

• Access to both the changes and the data is required, across multiple versions
  – Changes are first-class citizens


Change Representation - Approach


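[Figure: ontology of changes, drawn on two levels labeled Schema and Data. Recoverable labels at the Schema level: the classes Change, Simple_Change and Complex_Change, with a "consumes" relation between complex and simple changes; the simple change types Add_Property_Instance and Add_Type_To_Individual; the complex change type Add_Player, which carries a sparql_info value ("SELECT …") and a "consumes" link; the graph name D/changes/App1/schema. Recoverable labels at the Data level: the change instances AddPropIn1 (with the values team, Real_Madrid_CF and “Mikel Lasa”) and AddPlayer1; the graph name D/changes/v1-v2.]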
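Storing detected changes as objects next to the data is what makes queries like the left-back example possible, because one query can then filter on both. The sketch below imitates that idea in plain Python rather than in SPARQL over the ontology of changes. Everything in it is hypothetical and for illustration only: the `Change` record, the `Transfer_Player` change type, the property names and all data values (including Player_X).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    kind: str          # change type, e.g. the hypothetical "Transfer_Player"
    params: tuple      # (player, from_team, to_team)
    old_version: str
    new_version: str


# Toy data: each version is a set of (subject, property, value) triples.
versions = {
    "V1": {
        ("Mikel_Lasa", "position", "left_back"),
        ("Mikel_Lasa", "birthYear", 1971),
        ("Mikel_Lasa", "team", "Real_Madrid_CF"),
        ("Player_X", "position", "left_back"),
        ("Player_X", "birthYear", 1985),
        ("Player_X", "team", "Real_Madrid_CF"),
    },
    "V2": {
        ("Mikel_Lasa", "position", "left_back"),
        ("Mikel_Lasa", "birthYear", 1971),
        ("Mikel_Lasa", "team", "Athletic_Bilbao"),
        ("Player_X", "position", "left_back"),
        ("Player_X", "birthYear", 1985),
        ("Player_X", "team", "Athletic_Bilbao"),
    },
}

# Detected changes, kept alongside the data and tagged with their versions.
changes = [
    Change("Transfer_Player",
           ("Mikel_Lasa", "Real_Madrid_CF", "Athletic_Bilbao"), "V1", "V2"),
    Change("Transfer_Player",
           ("Player_X", "Real_Madrid_CF", "Athletic_Bilbao"), "V1", "V2"),
]


def holds_in_any_version(triple):
    """True if the triple appears in at least one stored version."""
    return any(triple in graph for graph in versions.values())


def transferred_players(team_to, old, new, position, born_before, former_team):
    """Combine a filter on changes with filters on the versioned data."""
    result = []
    for c in changes:
        if c.kind != "Transfer_Player" or (c.old_version, c.new_version) != (old, new):
            continue
        player, _, to_team = c.params
        if to_team != team_to:
            continue
        graph = versions[new]
        born = next((o for s, p, o in graph
                     if s == player and p == "birthYear"), None)
        if ((player, "position", position) in graph
                and born is not None and born < born_before
                and holds_in_any_version((player, "team", former_team))):
            result.append(player)
    return result


left_backs = transferred_players(
    "Athletic_Bilbao", "V1", "V2", "left_back", 1980, "Real_Madrid_CF")
```

The condition on the change (a transfer between V1 and V2) and the conditions on the data (position, birth year, former team in any version) are evaluated together, which would not be possible if only the version snapshots were stored.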


Experimental Evaluation

• Purpose
  1. Identify the number and type of simple and complex changes that usually occur in real-world settings
  2. Study the performance and quantify the effect of different parameters of our change detection process

• Setting
  – Open source version of Virtuoso Universal Server, hosted on a machine with a 12-core Intel Xeon E5-2630 at 2.30 GHz and 64 GB of RAM dedicated to Virtuoso
  – For the multi-threaded part of query execution and file ingestion, the experiments were conducted using 8 of the cores in parallel


Datasets

• We used 3 real RDF datasets of different sizes:
  – A subset of the English DBpedia (consisting of article categories, instance types, labels and mapping-based properties) - http://dbpedia.org
  – The Foundational Model of Anatomy ontology (FMA) - http://sig.biostr.washington.edu/projects/fm/AboutFM.html
  – The Experimental Factor Ontology (EFO) - http://www.ebi.ac.uk/efo


Simple Changes

Simple Changes for all datasets

ADDITIONS               | DELETIONS
------------------------+----------------------------
ADD_TYPE_CLASS          | DELETE_TYPE_CLASS
ADD_TYPE_PROPERTY       | DELETE_TYPE_PROPERTY
ADD_SUPERCLASS          | DELETE_SUPERCLASS
ADD_SUPERPROPERTY       | DELETE_SUPERPROPERTY
ADD_DOMAIN              | DELETE_DOMAIN
ADD_RANGE               | DELETE_RANGE
ADD_PROPERTY_INSTANCE   | DELETE_PROPERTY_INSTANCE
ADD_COMMENT             | DELETE_COMMENT
ADD_LABEL               | DELETE_LABEL
ADD_TYPE_TO_INDIVIDUAL  | DELETE_TYPE_FROM_INDIVIDUAL


Complex Changes

Complex Changes per dataset

DBpedia         | FMA                 | EFO
----------------+---------------------+--------------------
Add_Subject     | Add_Concept         | Add_Definition
Delete_Subject  | Delete_Concept      | Delete_Definition
Add_Thing       | Add_Restriction     | Add_Synonym
Delete_Thing    | Delete_Restriction  | Delete_Synonym
Add_Athlete     | Add_Synonym         | Mark_as_Obsolete
Update_Label    | Update_Comment      | Update_Comment
Add_Place       | Update_Domain       | Update_Domain
Delete_Place    | Update_Range        | Update_Range
Add_Person      | Add_Observation     | Add_Observation
Delete_Person   | Delete_Observation  | Delete_Observation


1. Simple Changes Analysis

[Figure: bar chart of the detected simple changes, one bar per entry (DBp1, DBp2, FMA1, FMA2, EFO1 to EFO6), each scaled from 0% to 100% and split by change type. Legend: ADD_COMMENT, ADD_DOMAIN, ADD_LABEL, ADD_PROPERTY_INSTANCE, ADD_RANGE, ADD_SUPERCLASS, ADD_TYPE_CLASS, ADD_TYPE_TO_INDIVIDUAL, DELETE_COMMENT, DELETE_DOMAIN, DELETE_LABEL, DELETE_PROPERTY_INSTANCE, DELETE_RANGE, DELETE_SUPERCLASS, DELETE_TYPE_CLASS, DELETE_TYPE_FROM_INDIVIDUAL. The per-segment counts are not recoverable from the extracted text.]


1. Complex Change Analysis

[Figure: bar chart of the detected complex changes, one bar per entry (DBp1, DBp2, FMA1, FMA2, EFO1 to EFO6), each scaled from 0% to 100% and split by change type. Legend: Add Athlete, Add Concept, Add Definition, Add Observation, Add Person, Add Place, Add Restriction, Add Subject, Add Synonym, Add Thing, Delete Definition, Delete Observation, Delete Person, Delete Place, Delete Restriction, Delete Subject, Delete Synonym, Delete Thing, Mark as Obsolete, Update Comment, Update Domain, Update Label, Update Property, Update Range. The per-segment counts are not recoverable from the extracted text.]


2. Simple Changes Performance

• Size of the compared versions
• Type of the detected simple changes
• Number of detected simple changes

# detected simple changes >> size of compared versions


2. Complex Changes Performance

• Changes ontology size
• Number of detected complex changes
• Type of detected complex changes
• Number of consumed simple changes per detected complex change: more complex queries


Related Work

• Low-level change detection (e.g., [1])
  – Reports simple add/delete operations
  – Not concise or intuitive enough for human users

• High-level change detection (e.g., [2,3])
  – Provides more human-readable deltas
  – Predefined list of considered changes w.r.t. the context

• [3] is the closest to our work
  – Proposes fixed high-level changes that are not configurable
  – No cross-version, data-and-changes queries
  – Significantly less scalable and efficient implementation

1. D. Zeginis et al. On computing deltas of RDF/S knowledge bases. ACM Trans. Web, 2011
2. P. Plessers et al. Understanding ontology evolution: A change detection approach. Web Semant., 2007
3. V. Papavasileiou et al. High-level change detection in RDF(S) KBs. ACM Trans. Database Syst., 2013


Summary

• We proposed an approach to cope with the dynamicity of web datasets that:
  – Manages fixed, application-generic changes with formal properties (simple changes)
  – Allows defining customized changes at run-time (complex changes)
  – Offers a scalable generic detection mechanism (via SPARQL queries)

• We performed sophisticated analysis on top of the detected changes
  – We represent changes in an ontology and treat them as first-class citizens
  – We consider cross-snapshot queries, and queries involving evolution history and data


Q & A

• Demo
  – D2V: A Tool for Defining, Detecting and Visualizing Changes on the Data Web. www.ics.forth.gr/isl/D2VSystem

• Acknowledgements