DETECTING SEMANTIC DRIFT FOR ONTOLOGY MAINTENANCE
Sándor Darányi (University of Borås, Sweden)
Panos Mitzias (CERTH/ITI, Greece)

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138”.

Detecting Semantic Drift for ontology maintenance - Acting on Change 2016



Page 2: Outline

▶ Evolving Semantics & Digital Preservation
▶ The PERICLES Approach
▶ PERICLES Tools
  ◦ Somoclu
  ◦ SemaDrift
▶ Putting it All Together
  ◦ Data
  ◦ Workflow
  ◦ Sample Results
▶ Conclusions

Page 3: Evolving Semantics & Digital Preservation

▶ Schlieder (2010) gives three reasons why long-term digital preservation (LTDP) is paramount:
  ◦ Because hardware and software evolve (technology drift)
  ◦ Because language changes (semantic drift)
  ◦ Because the value systems underlying societies change (social value shifts)
▶ Apart from digital preservation (DP), formalizing change scenarios so that they become manageable by computers is also a hot research topic in:
  ◦ Semantic Web, Knowledge Engineering & Management, Natural Language Processing, Document Engineering, Digital Humanities, Data Science

Page 4: Evolving Semantics & LTDP Horizons

▶ With DP acting roughly in the 5 to 50 years interval, recent advances in LTDP look at longer ranges:
  ◦ 2000 years: the use of DNA for very long term DP [Grass et al., 2015]
  ◦ 13.8 billion years: DNA combined with nanostructured glass storage [Kazansky et al., 2016]
▶ The ultimate question is the return on investment in DP and LTDP, should one lose access to already preserved content
  ◦ Currently proposed preventive measure: develop scalable methodologies of context-aware content interpretation by monitoring semantic vs. conceptual drifts

Page 5: The PERICLES Approach

▶ We address evolving semantics from two perspectives:
  ◦ Change-sensitive ontologies necessitate logic
  ◦ Scalability and the distributed nature of content call for statistical processing
▶ These two major components complement and inform each other and become tools of the model-driven DP paradigm
  ◦ E.g. collection-specific domain ontologies and change monitoring options help appraisal

Page 6: PERICLES Tools - Somoclu

▶ Time-dependent content displacement in vector space affects categorization & retrieval
▶ Model such content dynamics on a vector field, by metaphorical use of physical concepts
▶ Somoclu (Self-Organizing Map Over a CLUster), using the ESOM (Emergent Self-Organizing Maps) algorithm
  ◦ Fastest massively parallel open-source SOM algorithm available, developed in PERICLES

https://github.com/peterwittek/somoclu

Page 7: Somoclu and Self-Organizing Maps

▶ Reduces a high-dimensional space to a low-dimensional one (2-d)
▶ Preserves local topology
▶ Suitable for drift detection of feature/object locations
▶ After training the algorithm, each data instance has a node (Best Matching Unit, BMU) on the map
▶ Intense colours on the map indicate high distances between the original data points
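
The BMU idea can be illustrated in a few lines of plain Python. This is a minimal sketch, not Somoclu's implementation: it assumes a tiny, already-trained 2×2 map with made-up codebook vectors, finds the BMU of a feature vector by Euclidean distance, and measures how far the BMU of the same item moves between two epochs. The names `codebook`, `best_matching_unit` and `bmu_shift`, and all numbers, are assumptions for illustration only.

```python
import math

# Hypothetical, already-trained 2x2 map: grid position -> codebook vector.
codebook = {
    (0, 0): [0.0, 0.0],
    (0, 1): [0.0, 1.0],
    (1, 0): [1.0, 0.0],
    (1, 1): [1.0, 1.0],
}

def best_matching_unit(vector, codebook):
    """Return the grid position whose codebook vector is closest to `vector`."""
    return min(codebook, key=lambda node: math.dist(vector, codebook[node]))

def bmu_shift(vector_t0, vector_t1, codebook):
    """Grid distance between the BMUs of the same item in two epochs."""
    node_t0 = best_matching_unit(vector_t0, codebook)
    node_t1 = best_matching_unit(vector_t1, codebook)
    return math.dist(node_t0, node_t1)

# Illustrative feature vectors of one index term in two consecutive epochs.
term_epoch_1 = [0.1, 0.2]
term_epoch_2 = [0.9, 0.8]

print(best_matching_unit(term_epoch_1, codebook))       # (0, 0)
print(best_matching_unit(term_epoch_2, codebook))       # (1, 1)
print(bmu_shift(term_epoch_1, term_epoch_2, codebook))  # distance on the grid
```

A shift of zero means the item stays on the same node; larger values suggest that its location on the map, and hence its neighbourhood, has changed.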

Page 8: Drift Detection Workflow

Page 9: PERICLES Tools - SemaDrift

Problem
▶ To measure semantic drift in ontologies across time & versions
▶ Related to ontology evolution, versioning, drift/shift/decay

Page 10: PERICLES Tools - SemaDrift

▶ A suite of tools for measuring drift in ontologies across time/versions
  ◦ SemaDrift Library (API)
  ◦ SemaDrift Protege Plugin (GUI desktop application)
  ◦ SemaDrift FX (GUI desktop application)
▶ Cross-domain, no prior programming knowledge required
▶ Apache V2 License
▶ Two proof-of-concept use cases: Tate and OWL-S

Page 11: SemaDrift Workflow

▶ Input: series of ontologies
▶ For all concepts, compare each concept to all concepts of the next ontology:
  ◦ Collection of rdfs:labels > Concept to concept Label Drift
  ◦ Collection of property triples > Concept to concept Intensional Drift
  ◦ Collection of instance URIs > Concept to concept Extensional Drift
▶ Average of label, intensional and extensional drift > Concept to concept Whole Drift
▶ Average of all concepts > Average Label, Intension, Extension Drift for the series
▶ Output
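
The workflow can be sketched roughly as follows. This is not the SemaDrift API. It is a plain-Python illustration under stated assumptions: each aspect is scored as a similarity in [0, 1] (1.0 meaning no drift), labels are compared with `difflib.SequenceMatcher`, property triples and instance URIs are compared with Jaccard similarity, and the series average takes the best-matching concept of the next version for each concept. The measures SemaDrift actually uses may differ, and the two toy ontology versions are invented.

```python
from difflib import SequenceMatcher

def jaccard(a, b):
    """Jaccard similarity of two collections; 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def compare_concepts(old, new):
    """Label, intensional, extensional and whole score for one concept pair."""
    label = SequenceMatcher(None, old["label"].lower(), new["label"].lower()).ratio()
    intension = jaccard(old["properties"], new["properties"])
    extension = jaccard(old["instances"], new["instances"])
    whole = (label + intension + extension) / 3  # average of label, int, ext
    return {"label": label, "intension": intension,
            "extension": extension, "whole": whole}

def compare_versions(old_onto, new_onto):
    """Compare each concept to all concepts of the next ontology."""
    return {(o, n): compare_concepts(old_onto[o], new_onto[n])
            for o in old_onto for n in new_onto}

def series_average(pairs, old_onto):
    """Average each aspect over all concepts, using each concept's best match
    (an assumption of this sketch)."""
    best = [max((s for (o, _), s in pairs.items() if o == concept),
                key=lambda s: s["whole"])
            for concept in old_onto]
    return {k: sum(s[k] for s in best) / len(best)
            for k in ("label", "intension", "extension", "whole")}

# Two invented ontology versions: the instances of one concept are split.
v1 = {"Landscape": {"label": "landscape",
                    "properties": {("subClassOf", "Nature")},
                    "instances": {"art1", "art2", "art3", "art4"}}}
v2 = {"Landscape": {"label": "landscape",
                    "properties": {("subClassOf", "Nature")},
                    "instances": {"art1", "art2"}},
      "Seascape": {"label": "seascape",
                   "properties": {("subClassOf", "Nature")},
                   "instances": {"art3", "art4"}}}

pairs = compare_versions(v1, v2)
print(pairs[("Landscape", "Landscape")])
print(series_average(pairs, v1))
```

In this toy case the label and the properties of Landscape are unchanged while half of its instances move elsewhere, so only the extensional score drops (to 0.5).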

Page 12: SemaDrift GUI Desktop Applications

Page 13: Morphing Chains

▶ Extracted offline
▶ In this scenario, extensional drift shows clearly

Page 14: Putting it All Together

▶ Monitor feature (index term) drifts over time
▶ Apply threshold to at-risk (splitting/merging) index terms
▶ Extract index terms above threshold
▶ Ontology Creation (creation of a Digital Ecosystem Model, DEM)
  SOMOCLU > Propose least volatile terms to be included in the model
▶ Ontology Maintenance
  SOMOCLU+SemaDrift > Assess at-risk terminology, update model, alert user
▶ Appraisal
  SOMOCLU > Extract period-specific objects
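
The monitoring and thresholding steps might look like the sketch below. The slides do not specify the drift measure or the threshold, so both are assumptions here: volatility is taken as the mean grid distance an index term's BMU moves between consecutive epochs, and `THRESHOLD` is an arbitrary illustrative value. The BMU tracks are invented.

```python
import math

# Hypothetical BMU positions of index terms on the map, one per epoch.
positions = {
    "landscape": [(2, 3), (2, 3), (2, 4)],
    "natural":   [(0, 0), (5, 5), (9, 1)],
    "portrait":  [(7, 7), (7, 7), (7, 7)],
}

def volatility(track):
    """Mean grid distance moved between consecutive epochs."""
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)

THRESHOLD = 2.0  # illustrative value, not taken from the experiment

scores = {term: volatility(track) for term, track in positions.items()}

# Ontology maintenance: index terms above the threshold are flagged as at-risk.
at_risk = sorted(term for term, score in scores.items() if score > THRESHOLD)

# Ontology creation: rank terms from least to most volatile.
by_stability = sorted(scores, key=scores.get)

print(at_risk)       # ['natural']
print(by_stability)  # ['portrait', 'landscape', 'natural']
```

The same scores serve both uses named on the slide: the low end of the ranking proposes candidate terms for the model, the high end raises alerts.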

Page 15: Tate Collection Statistics for Drift Analysis

▶ Catalog as open data for 69,202 artworks in JSON format (53,698 time-stamped)
▶ Indexed by Tate’s own hierarchical subject index (three levels, from general to specific index terms)
▶ Two acquisition peaks: 1796-1844 (33,625 artworks) and 1960-2009 (12,756 artworks), broken down into 10 five-year epochs each
▶ 46,381 artworks in the experiment
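
Splitting a period into five-year epochs can be sketched as below. The function name `epoch_of` and its parameters are illustrative, and the epoch boundaries are an assumption: epochs are counted from the first year of each peak, which makes the tenth epoch of the early period 1841-1845.

```python
def epoch_of(year, start, width=5, count=10):
    """Return (first_year, last_year) of the epoch containing `year`,
    or None if the year falls outside the `count` epochs after `start`."""
    index = (year - start) // width
    if not 0 <= index < count:
        return None
    first = start + index * width
    return (first, first + width - 1)

print(epoch_of(1798, 1796))  # (1796, 1800)
print(epoch_of(1803, 1796))  # (1801, 1805)
print(epoch_of(2009, 1960))  # (2005, 2009)
print(epoch_of(1950, 1960))  # None

# Sanity check on the reported counts: the two peaks add up to the experiment size.
print(33625 + 12756)  # 46381
```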

Page 16: A Typical Tate Artefact: J.E. Millais’ Ophelia...

Page 18: J.E. Millais’ Ophelia in the Domain Ontology

Page 19: Semantic Drift Example: Concept Splitting in the Tate Collection

Page 20: Drift Detection at Work

With semantic relations impacted...
...ontology relations also change

Page 21: Another Type of Change in Conceptual Coherence: natural and inland merge

Epochs shown: 1796-1800, 1801-1805
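
A merge of this kind could be flagged with a simple distance test between two epochs. The sketch below is an assumption, not the method used in the experiment: two index terms count as merged when their BMUs were farther apart than `radius` in the earlier epoch and lie within it in the later one. The positions are invented and are not values from the Tate analysis.

```python
import math

def merged(term_a, term_b, before, after, radius=1.0):
    """True if two terms were farther apart than `radius` before
    and are within `radius` of each other after."""
    d_before = math.dist(before[term_a], before[term_b])
    d_after = math.dist(after[term_a], after[term_b])
    return d_before > radius and d_after <= radius

# Hypothetical BMU positions in two consecutive epochs.
epoch_1796_1800 = {"natural": (1, 1), "inland": (6, 4)}
epoch_1801_1805 = {"natural": (3, 3), "inland": (3, 3)}

print(merged("natural", "inland", epoch_1796_1800, epoch_1801_1805))  # True
```

Swapping the two epoch arguments turns the same test into a check for concept splitting.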

Page 22: Conclusions

▶ Statistical analysis of scalable collections reveals content dislocations over time
▶ Such drifts are the norm, not an exception
▶ They influence future access to content by their impact on the efficiency of information retrieval and classification
▶ As a remedy, drift detection can alert ontology maintenance and artefact (object) appraisal by designated workflows