Y. Roussakis¹, I. Chrysakis¹, K. Stefanidis¹, G. Flouris¹, and Y. Stavrakas²
¹ Institute of Computer Science, FORTH, Heraklion, Greece
² Institute for the Management of Information Systems, ATHENA, Athens, Greece
A Flexible Framework for Understanding the Dynamics
of Evolving RDF Datasets
ISWC 2015, Bethlehem PA
Introduction
• LOD datasets are constantly evolving
  – Inclusion of new experimental evidence
  – Correction of erroneous conceptualizations
  – Other observations
• Detection and analysis of the differences (deltas) between datasets is crucial
  – Synchronization of autonomously developed versions
  – Visualization of the evolution history of a dataset
  – Synchronization of interconnected LOD datasets
Questions to address:
1. Which changes?
2. How to detect?
3. How to represent?
Contributions
• A framework for detecting and analyzing changes and the evolution history of LOD datasets
• Key Features
  ✔ Change Definition
    • Different change types w.r.t. the domain of expertise
  ✔ Change Detection
    • Efficient change detection algorithm
  ✔ Change Representation
    • Based on ontologies
    • Allows queries that span multiple versions and consider both changes and data
Motivation
[Figure: two versions of a dataset, D1 and D2. Both contain the nodes Real_Madrid_CF, Real_Madrid and SoccerClub (edge label: name). D2 additionally contains Mikel_Lasa, with edges team, type and name pointing to Real_Madrid_CF, Athlete and “Mikel Lasa”.]

The difference between D1 and D2 is described at three levels:
• Low-level changes
  – (Mikel_Lasa, team, Real_Madrid_CF)
  – (Mikel_Lasa, name, “Mikel Lasa”)
  – (Mikel_Lasa, rdf:type, Athlete)
• Simple changes
  – Add_Property_Instance(Mikel_Lasa, team, Real_Madrid_CF)
  – Add_Property_Instance(Mikel_Lasa, name, “Mikel Lasa”)
  – Add_Type_To_Individual(Mikel_Lasa, Athlete)
• Complex changes
  – Add_Athlete(“Mikel Lasa”, Real_Madrid_CF)
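The three levels of the Motivation example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: versions are modelled as plain sets of (subject, predicate, object) tuples, the two triples placed in D1 are an assumed reading of the figure, and the function names are hypothetical rather than part of the presented framework.

```python
# A version is a set of (subject, predicate, object) triples.
# The D1 triples are an assumed reading of the figure.
D1 = {
    ("Real_Madrid_CF", "rdf:type", "SoccerClub"),
    ("Real_Madrid_CF", "name", "Real_Madrid"),
}
D2 = D1 | {
    ("Mikel_Lasa", "team", "Real_Madrid_CF"),
    ("Mikel_Lasa", "name", "Mikel Lasa"),
    ("Mikel_Lasa", "rdf:type", "Athlete"),
}


def low_level_delta(old, new):
    """Low-level changes: (added, deleted) triples between two versions."""
    return new - old, old - new


def to_simple_change(triple):
    """Map one added triple to a simple change (only the two types on the slide)."""
    s, p, o = triple
    if p == "rdf:type":
        return ("Add_Type_To_Individual", s, o)
    return ("Add_Property_Instance", s, p, o)


def detect_add_athlete(simple_changes):
    """Group simple changes into Add_Athlete(name, team) complex changes."""
    athletes = {c[1] for c in simple_changes
                if c[0] == "Add_Type_To_Individual" and c[2] == "Athlete"}
    props = {(c[1], c[2]): c[3] for c in simple_changes
             if c[0] == "Add_Property_Instance"}
    return [("Add_Athlete", props[(a, "name")], props[(a, "team")])
            for a in sorted(athletes)
            if (a, "name") in props and (a, "team") in props]


added, deleted = low_level_delta(D1, D2)
simple = sorted(to_simple_change(t) for t in added)
complex_changes = detect_add_athlete(simple)
print(complex_changes)
```

Each level consumes the output of the previous one, which is why a single coarse-grained Add_Athlete can stand in for three added triples.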
Simple vs Complex Changes
                     | Simple Changes                                             | Complex Changes
Terminology          | Fixed, domain-independent (e.g., Add_Property_Instance)    | Custom, domain-dependent (e.g., Add_Athlete)
Changes Set          | Fixed, pre-defined                                         | Variable, user-defined
Composition          | Low-level changes                                          | Simple changes
Usability            | Fine-grained types of evolution                            | Coarse-grained types of evolution
Changes Partitioning | Perfect, i.e., complete and unambiguous                    | Cannot capture all the evolution aspects

[Figure: three small diagrams, each labeled SC1 and SC2]
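One way to read "complete and unambiguous" is that every low-level change is consumed by at least one simple change (complete) and by no more than one (unambiguous). The sketch below checks both properties. The function name, the identifiers t1 to t3 and the mapping format are hypothetical, chosen only for illustration.

```python
from collections import Counter


def check_partition(low_level, consumed_by):
    """Return (complete, unambiguous) for a set of low-level changes.

    consumed_by maps a simple-change id to the set of low-level
    changes it consumes (hypothetical representation).
    """
    counts = Counter(t for triples in consumed_by.values() for t in triples)
    complete = all(counts[t] >= 1 for t in low_level)
    unambiguous = all(n == 1 for n in counts.values())
    return complete, unambiguous


low_level = {"t1", "t2", "t3"}

# Every low-level change is consumed exactly once: a perfect partition.
perfect = {"SC1": {"t1"}, "SC2": {"t2", "t3"}}

# t2 is consumed twice and t3 is never consumed.
broken = {"SC1": {"t1", "t2"}, "SC2": {"t2"}}

print(check_partition(low_level, perfect))
print(check_partition(low_level, broken))
```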
Change Detection - Approach
• Detection based on plain SPARQL queries – one per change
• Simple changes
  – Prebuilt queries
• Complex changes
  – Queries constructed dynamically upon change definition
• Query results create the change-set

ADD_SUPERCLASS
  SELECT ?sub ?sup WHERE {
    GRAPH <v2> { ?sub rdfs:subClassOf ?sup }
    FILTER NOT EXISTS { GRAPH <v1> { ?sub rdfs:subClassOf ?sup } }
  }

ADD_ATHLETE
  SELECT ?athlete WHERE {
    GRAPH <changes> {
      ?sc a co:Add_Type_To_Individual .
      ?sc co:atti_p1 ?athlete .
      # atti_p2 is assumed here; the slide repeats atti_p1, presumably a typo
      ?sc co:atti_p2 dbpedia:athlete .
    }
  }
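The ADD_SUPERCLASS query keeps the subclass triples found in v2 for which no identical triple exists in v1. The same logic can be sketched as a set difference in Python. This is a minimal illustration, not the framework's implementation: the class names in the sample data are hypothetical, and defining the deletion case by swapping the two versions is an assumption.

```python
RDFS_SUBCLASS = "rdfs:subClassOf"


def detect_add_superclass(v1, v2):
    """(sub, sup) pairs asserted in v2 but not in v1.

    Mirrors GRAPH <v2> {...} FILTER NOT EXISTS { GRAPH <v1> {...} }.
    """
    return sorted((s, o) for (s, p, o) in v2
                  if p == RDFS_SUBCLASS and (s, p, o) not in v1)


def detect_delete_superclass(v1, v2):
    """Assumed symmetric case: pairs asserted in v1 but not in v2."""
    return detect_add_superclass(v2, v1)


# Hypothetical sample versions, each a set of (s, p, o) triples.
v1 = {
    ("Athlete", RDFS_SUBCLASS, "Person"),
    ("Coach", RDFS_SUBCLASS, "Person"),
}
v2 = {
    ("Athlete", RDFS_SUBCLASS, "Person"),
    ("SoccerPlayer", RDFS_SUBCLASS, "Athlete"),
}

print(detect_add_superclass(v1, v2))
print(detect_delete_superclass(v1, v2))
```

Because each change type has its own query, the rows returned by all of them together form the change-set.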
Change Representation - Motivation
What can I do with all these changes?
• Interesting queries
  – Return all the left backs, born before 1980, who were transferred to Athletic Bilbao between versions V1 and V2 and used to play for Real Madrid CF in any version
• Access to both the changes and the data is required, across multiple versions
  – Changes are first-class citizens
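The example query combines three sources: the changes between V1 and V2, the data of one version, and the data of every version. The sketch below shows that combination over in-memory structures. Everything in it is an illustrative assumption: the property names position, birthYear and team, the change type Transfer_Player, the sample values, and the helper functions.

```python
# Versioned data: each version is a set of (s, p, o) triples (illustrative values).
versions = {
    "V1": {
        ("Mikel_Lasa", "position", "Left_Back"),
        ("Mikel_Lasa", "birthYear", 1971),
        ("Mikel_Lasa", "team", "Real_Madrid_CF"),
    },
    "V2": {
        ("Mikel_Lasa", "position", "Left_Back"),
        ("Mikel_Lasa", "birthYear", 1971),
        ("Mikel_Lasa", "team", "Athletic_Bilbao"),
    },
}

# Detected changes, stored alongside the data and keyed by version pair.
changes = {
    ("V1", "V2"): [
        {"type": "Transfer_Player", "player": "Mikel_Lasa", "to": "Athletic_Bilbao"},
    ],
}


def holds(version, subject, predicate, test):
    """True if some triple of the version matches subject, predicate and test(object)."""
    return any(s == subject and p == predicate and test(o)
               for (s, p, o) in versions[version])


def transferred_left_backs(old, new, to_team, past_team, born_before):
    result = []
    for change in changes.get((old, new), []):          # query over changes
        if change["type"] != "Transfer_Player" or change["to"] != to_team:
            continue
        player = change["player"]
        left_back = holds(new, player, "position", lambda o: o == "Left_Back")
        old_enough = holds(new, player, "birthYear", lambda o: o < born_before)
        played_for = any(holds(v, player, "team", lambda o: o == past_team)
                         for v in versions)             # query over all versions
        if left_back and old_enough and played_for:
            result.append(player)
    return result


print(transferred_left_backs("V1", "V2", "Athletic_Bilbao", "Real_Madrid_CF", 1980))
```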
Change Representation - Approach
[Figure: ontology of changes, built up over several slides. Labels in the final build: Schema and Data layers; classes Change, Simple_Change and Complex_Change with a consumes edge; Add_Property_Instance and Add_Type_To_Individual; Add_Player with a sparql_info edge to “SELECT …”; instances AddPropIn1 (team, Real_Madrid_CF, “Mikel Lasa”, …) and AddPlayer1 with a consumes edge; named graphs D/changes/App1/schema and D/changes/v1-v2.]
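The labels of the change ontology figure suggest a layout in which change definitions live in a schema graph and detected changes live in a per-version-pair graph. The sketch below models that reading with quads of the form (graph, subject, predicate, object). The graph names, consumes and sparql_info come from the slide. The rdfs:subClassOf and rdf:type edges and the parameter properties api_p2 and api_p3 are assumptions made for illustration.

```python
SCHEMA_GRAPH = "D/changes/App1/schema"
CHANGES_GRAPH = "D/changes/v1-v2"

# (graph, subject, predicate, object)
quads = [
    (SCHEMA_GRAPH, "Simple_Change", "rdfs:subClassOf", "Change"),
    (SCHEMA_GRAPH, "Complex_Change", "rdfs:subClassOf", "Change"),
    (SCHEMA_GRAPH, "Add_Property_Instance", "rdfs:subClassOf", "Simple_Change"),
    (SCHEMA_GRAPH, "Add_Type_To_Individual", "rdfs:subClassOf", "Simple_Change"),
    (SCHEMA_GRAPH, "Add_Player", "rdfs:subClassOf", "Complex_Change"),
    (SCHEMA_GRAPH, "Add_Player", "sparql_info", "SELECT …"),  # query elided on the slide
    (CHANGES_GRAPH, "AddPropIn1", "rdf:type", "Add_Property_Instance"),
    (CHANGES_GRAPH, "AddPropIn1", "api_p2", "team"),            # hypothetical property
    (CHANGES_GRAPH, "AddPropIn1", "api_p3", "Real_Madrid_CF"),  # hypothetical property
    (CHANGES_GRAPH, "AddPlayer1", "rdf:type", "Add_Player"),
    (CHANGES_GRAPH, "AddPlayer1", "consumes", "AddPropIn1"),
]


def subclasses_of(cls):
    """cls plus all its transitive subclasses in the schema graph."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for (g, s, p, o) in quads:
            if g == SCHEMA_GRAPH and p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls):
    """Detected changes whose type is cls or one of its subclasses."""
    classes = subclasses_of(cls)
    return sorted(s for (g, s, p, o) in quads
                  if g == CHANGES_GRAPH and p == "rdf:type" and o in classes)


def consumed_by(change):
    """Simple changes consumed by the given complex change."""
    return sorted(o for (g, s, p, o) in quads if s == change and p == "consumes")


print(instances_of("Change"))
print(consumed_by("AddPlayer1"))
```

Keeping definitions and detected instances in separate graphs lets a query address the changes of one version pair without touching the others.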
Experimental Evaluation
• Purpose
  1. Identify the number and type of simple and complex changes that usually occur in real-world settings
  2. Study the performance and quantify the effect of different parameters of our change detection process
• Setting
  – Open source version of Virtuoso Universal Server hosted on a machine with a 12-core Intel Xeon E5-2630 at 2.30GHz and 64GB of RAM dedicated to Virtuoso
  – For the multi-threaded part of query execution and file ingestion, the experiments used 8 of the cores in parallel
Datasets
• We used 3 real RDF datasets of different sizes:
  – A subset of the English DBpedia 6 (consisting of article categories, instance types, labels and mapping-based properties) - http://dbpedia.org
  – The Foundational Model of Anatomy ontology (FMA) - http://sig.biostr.washington.edu/projects/fm/AboutFM.html
  – The Experimental Factor Ontology (EFO) - http://www.ebi.ac.uk/efo
Simple Changes
Simple Changes for all datasets

ADDITIONS              | DELETIONS
ADD_TYPE_CLASS         | DELETE_TYPE_CLASS
ADD_TYPE_PROPERTY      | DELETE_TYPE_PROPERTY
ADD_SUPERCLASS         | DELETE_SUPERCLASS
ADD_SUPERPROPERTY      | DELETE_SUPERPROPERTY
ADD_DOMAIN             | DELETE_DOMAIN
ADD_RANGE              | DELETE_RANGE
ADD_PROPERTY_INSTANCE  | DELETE_PROPERTY_INSTANCE
ADD_COMMENT            | DELETE_COMMENT
ADD_LABEL              | DELETE_LABEL
ADD_TYPE_TO_INDIVIDUAL | DELETE_TYPE_FROM_INDIVIDUAL
Complex Changes
Complex Changes per dataset

DBpedia        | FMA                | EFO
Add_Subject    | Add_Concept        | Add_Definition
Delete_Subject | Delete_Concept     | Delete_Definition
Add_Thing      | Add_Restriction    | Add_Synonym
Delete_Thing   | Delete_Restriction | Delete_Synonym
Add_Athlete    | Add_Synonym        | Mark_as_Obsolete
Update_Label   | Update_Comment     | Update_Comment
Add_Place      | Update_Domain      | Update_Domain
Delete_Place   | Update_Range       | Update_Range
Add_Person     | Add_Observation    | Add_Observation
Delete_Person  | Delete_Observation | Delete_Observation
1. Simple Changes Analysis
[Figure: stacked bar chart (0% to 100%) of detected simple changes by type, one bar per label DBp1, DBp2, FMA1, FMA2 and EFO1 to EFO6. Legend: ADD_COMMENT, ADD_DOMAIN, ADD_LABEL, ADD_PROPERTY_INSTANCE, ADD_RANGE, ADD_SUPERCLASS, ADD_TYPE_CLASS, ADD_TYPE_TO_INDIVIDUAL, DELETE_COMMENT, DELETE_DOMAIN, DELETE_LABEL, DELETE_PROPERTY_INSTANCE, DELETE_RANGE, DELETE_SUPERCLASS, DELETE_TYPE_CLASS, DELETE_TYPE_FROM_INDIVIDUAL.]
1. Complex Change Analysis
[Figure: stacked bar chart (0% to 100%) of detected complex changes by type, one bar per label DBp1, DBp2, FMA1, FMA2 and EFO1 to EFO6. Legend: Add Athlete, Add Concept, Add Definition, Add Observation, Add Person, Add Place, Add Restriction, Add Subject, Add Synonym, Add Thing, Delete Definition, Delete Observation, Delete Person, Delete Place, Delete Restriction, Delete Subject, Delete Synonym, Delete Thing, Mark as Obsolete, Update Comment, Update Domain, Update Label, Update Property, Update Range.]
2. Simple Changes Performance
• Size of the compared versions
• Type of the detected simple changes
• Number of detected simple changes

# detected simple changes >> size of compared versions
2. Complex Changes Performance
• Changes ontology size
• Number of detected complex changes
• Type of detected complex changes
• Number of consumed simple changes per detected complex change (more complex queries)
Related Work
• Low-level change detection (e.g., [1])
  – Reports simple add/delete operations
  – Not concise or intuitive enough for human users
• High-level change detection (e.g., [2,3])
  – Provides more human-readable deltas
  – Predefined list of considered changes w.r.t. the context
• [3] is the closest to our work
  – Proposes a fixed set of high-level changes that is not configurable
  – No cross-version queries over both data and changes
  – Significantly less scalable and efficient implementation

1. D. Zeginis et al. On computing deltas of RDF/S knowledge bases. ACM Trans. Web, 2011.
2. P. Plessers et al. Understanding ontology evolution: A change detection approach. Web Semant., 2007.
3. V. Papavasileiou et al. High-level change detection in RDF(S) KBs. ACM Trans. Database Syst., 2013.
Summary
• We proposed an approach to cope with the dynamicity of web datasets that:
  – Manages fixed, application-generic changes with formal properties (simple changes)
  – Allows defining customized changes at run-time (complex changes)
  – Offers a scalable generic detection mechanism (via SPARQL queries)
• We performed sophisticated analysis on top of the detected changes
  – We represent changes in an ontology and treat them as first-class citizens
  – We consider cross-snapshot queries, and queries involving evolution history and data
Q & A
• Demo
  – D2V: A Tool for Defining, Detecting and Visualizing Changes on the Data Web. www.ics.forth.gr/isl/D2VSystem
• Acknowledgements