CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION
Cédric PRUSKI
Drift-a-LOD@EKAW 2016, November 20th, Bologna, Italy
MOTIVATION
Outdated mappings and annotations may trigger undesirable results in biomedical systems
Keeping mappings and annotations valid is crucial
Large size and complexity prevent fully manual maintenance
[Figure: the concept “malignancy” (KS) mapped with = to “Malignant neoplasm” (KT) and used to annotate data; after evolution the mapping is in question (?) and data becomes inaccessible]
• What is the impact of concept drift (or ontology evolution) on ontology mappings and semantic annotations?
  • Quantitative
  • Qualitative
• How can we formally characterize concept drift?
  • Basic changes (addition/deletion of concepts)
  • Complex changes (split, merge, move of concepts)
• Can we reuse information that characterizes concept drift to adapt ontology mappings and semantic annotations?
  • Prevention of re-alignment / re-annotation of whole datasets
PROBLEM STATEMENT
① Concept drift for mapping adaptation
  a. DynaMO research project
  b. Change patterns
② Concept drift for semantic annotation maintenance
  a. ELISA research project
  b. Background knowledge
③ Discussion
  a. Concept drift for LOD
AGENDA
THE CASE OF MAPPING ADAPTATION
“Adaptation of existing mappings according to modifications affecting KOS elements at evolution time”
Definition and Problem Statement
ONTOLOGY MAPPING ADAPTATION
MV1=(s, t, r) MV2=(s’, t, r’)
Hypothesis: There is a correlation between the way KOS’ elements evolve and the way mappings are adapted
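The triple notation above can be sketched in code. This is a minimal illustration, assuming a mapping is a (source, target, relation) triple and that adaptation may change the source and the relation while the target stays fixed, as in (s, t, r) and (s’, t, r’). The class name `Mapping` and the function `adapt` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    source: str    # s: concept identifier in the source KOS
    target: str    # t: concept identifier in the target KOS
    relation: str  # r: semantic relation, e.g. "≡" or "≤"


def adapt(mapping, new_source=None, new_relation=None):
    """Return the adapted mapping (s', t, r'); the target t is left untouched."""
    return replace(
        mapping,
        source=new_source if new_source is not None else mapping.source,
        relation=new_relation if new_relation is not None else mapping.relation,
    )


# Placeholder identifiers, not taken from any real KOS
m_v1 = Mapping(source="s", target="t", relation="≡")
m_v2 = adapt(m_v1, new_source="s_prime", new_relation="≤")
print(m_v1, m_v2)
```

Keeping the mapping immutable means each version of a mapping remains available for comparison after adaptation.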
UNDERSTANDING MAPPING EVOLUTION
• Identify potential interdependencies between changes affecting KOS entities and the mapping evolution
• Empirically examine official and real-world mappings over time
  • Evolution of SNOMED CT and ICD9CM as a case study
~400 000 mappings analyzed
[Figure: timeline of the analyzed releases. SNOMED CT: Jan/10, Jul/10, Jan/11, Jul/11. ICD9CM: 2009, 2010. Mapping sets between them: MST 1 (Jan/10), MST 2 (Jul/10), MST 3 (Jan/11), MST 4 (Jul/11)]
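Comparing two successive releases of a mapping set can be sketched as a set difference. This is an illustrative sketch only, assuming each mapping is a (source, target, relation) triple; the function name `diff_mapping_sets` and the identifiers in the toy data are hypothetical and do not come from SNOMED CT or ICD9CM.

```python
def diff_mapping_sets(old, new):
    """Compare two releases of a mapping set, each a set of (source, target, relation) triples."""
    old_by_pair = {(s, t): r for s, t, r in old}
    new_by_pair = {(s, t): r for s, t, r in new}
    # Mappings whose (source, target) pair only exists in the new release
    added = {(s, t, r) for (s, t), r in new_by_pair.items() if (s, t) not in old_by_pair}
    # Mappings whose (source, target) pair only exists in the old release
    removed = {(s, t, r) for (s, t), r in old_by_pair.items() if (s, t) not in new_by_pair}
    # Same pair in both releases, but a different semantic relation
    relation_changed = {
        (s, t, old_by_pair[(s, t)], r)
        for (s, t), r in new_by_pair.items()
        if (s, t) in old_by_pair and old_by_pair[(s, t)] != r
    }
    return added, removed, relation_changed


# Toy data with made-up identifiers
mst_1 = {("A", "X", "≡"), ("B", "X", "≤"), ("C", "Y", "≤")}
mst_2 = {("A", "X", "≡"), ("B", "X", "≡"), ("D", "Y", "≤")}
added, removed, relation_changed = diff_mapping_sets(mst_1, mst_2)
print(added, removed, relation_changed)
```

The three result sets can then be correlated with the changes observed in the KOS releases covering the same period.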
How does concept drift impact mappings?
How to identify these attributes?
KEY FINDINGS
[Figure: mappings between SNOMED CT and ICD9CM before and after evolution. ICD9CM concept 560.39 changed and a new concept was added (560.32). SNOMED CT concepts shown: 168000, 44635007, 29162007, 40515007, 197063004, linked by is-a relations and mapped to ICD9CM with ≡ and ≤ relations, with similarity indicated between concepts. Labels shown: “Enterolith (disorder)”, “Typhlolithiasis (disorder)”, “Concretion of intestine (disorder)”, “Impaction of intestine”, “Fecal impaction”, “Fecal impaction of colon”]
Observed modifications over time
Attributes:
  - Concretion of intestine
  - Enterolith
  - Fecal impaction
Mapping adaptation based on the evolution of relevant concept attributes
Lexical change patterns
CHARACTERIZATION OF CHANGES
[Figure: attributes a1, a2, …, an of a concept (cs0 at time j, cs1 at time j+1) shown together with the attributes of its context: super concepts (asup1, asup2, …, asupn), sub concepts (asub1, asub2, …, asubn) and siblings (asib1, asib2), where CONTEXT = SUP ∪ SUB ∪ SIB]
• Total Copy (TC)
• Total Transfer (TT)
• Partial Copy (PC)
• Partial Transfer (PT)
Example attribute values: “unspecified mental behavioral problem”, “specified behavioral problem”, “bronzed diabetes”, “inflammatory bowel diseases”
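The four lexical change patterns can be illustrated with a small classifier. The slide does not define the patterns, so this sketch rests on stated assumptions: "total" means every word of an attribute value reappears in one attribute of a context concept, "partial" means only some words do, "copy" means the value is still present on the evolved concept, and "transfer" means it is not. The function names and the toy attribute lists are hypothetical.

```python
def tokens(value):
    """Lower-cased word set of an attribute value."""
    return set(value.lower().split())


def lexical_change_pattern(value, evolved_attrs, context_attrs):
    """Classify how one attribute value of cs0 (time j) reappears at time j+1.

    evolved_attrs: attribute values of cs1, the evolved concept.
    context_attrs: attribute values of the concepts in CONTEXT = SUP ∪ SUB ∪ SIB.
    """
    v = tokens(value)
    # Assumption: "copy" if the value is still attached to the evolved concept
    kept = any(tokens(a) == v for a in evolved_attrs)
    # Largest word overlap with any single context attribute
    best = max((len(v & tokens(a)) for a in context_attrs), default=0)
    if best == 0:
        return None  # the value does not reappear in the context
    scope = "Total" if best == len(v) else "Partial"
    kind = "Copy" if kept else "Transfer"
    return f"{scope} {kind}"


# Toy data: the value disappears from the concept and shows up in its context
pattern = lexical_change_pattern(
    "fecal impaction",
    evolved_attrs=["concretion of intestine", "enterolith"],
    context_attrs=["fecal impaction"],
)
print(pattern)
```

A word-overlap test this naive also fires on shared function words such as "of", so a real implementation would need a more careful similarity measure.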
Semantic change patterns
CHARACTERIZATION OF CHANGES
[Figure: attributes a1, a2, …, an of a concept (cs0 at time j, cs1 at time j+1) shown together with the attributes of its context: super concepts (asup1, asup2, …, asupn), sub concepts (asub1, …, asubn) and siblings (asib1, asib2), where CONTEXT = SUP ∪ SUB ∪ SIB]
• Equivalent (EQV)
• Partial Match (PTM)
• More Specific (MSP)
• Less Specific (LSP)
Example attribute value pairs: “Diabetes type 1” / “Diabetes type I”; “Focal atelectasis” / “Helical atelectasis”; “familial chylomicronemia” / “familial hyperchylomicronemia”; “Kappa chain disease” / “Kappa light chain disease”
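The four semantic change patterns can be approximated with a word-set comparison. This is only a rough sketch under a strong assumption that is not stated in the slides: a value that adds words to the old one is treated as more specific, one that drops words as less specific, and one that merely shares words as a partial match. The function name `semantic_change_pattern` is hypothetical.

```python
def semantic_change_pattern(old_value, new_value):
    """Label the relation of new_value to old_value using word sets only."""
    old = set(old_value.lower().split())
    new = set(new_value.lower().split())
    if old == new:
        return "EQV"  # same words, ignoring case and order
    if old < new:
        return "MSP"  # assumption: added qualifiers make the value more specific
    if new < old:
        return "LSP"  # assumption: dropped qualifiers make the value less specific
    if old & new:
        return "PTM"  # some, but not all, words are shared
    return None       # no lexical evidence of a relation


for old, new in [
    ("Kappa chain disease", "Kappa light chain disease"),
    ("Focal atelectasis", "Helical atelectasis"),
]:
    print(old, "->", new, ":", semantic_change_pattern(old, new))
```

The labels printed here are the output of this sketch, not classifications taken from the slides. A purely lexical test cannot see that “Diabetes type 1” and “Diabetes type I” denote the same thing, which is one reason a real method needs normalization or background knowledge.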
Heuristics
LINKING CP AND MAINTENANCE ACTIONS
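[Figure: a mapping mst with semantic type semType links source concept cs0 (attributes a1, …, ak) in KOS KS to target concept ct in KOS KT. cs0 is affected by KOS changes between time j and time j+1 and becomes cs1. Its relevant attributes are compared with the attributes (as1, as2, as3, …, asn; asib1, asib2, …, asibn) of the concepts in CONTEXT = SUP ∪ SUB ∪ SIB, which contains the candidate concept ccand1]
Conditions shown: ∃! Lexical CP (Total Transfer), Semantic CP, unchanged
Action shown: MoveM(mst, ccand1)
Example labels: “Kappa light chain disease”, “Kappa chain disease”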
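One way to read the heuristic above is as a rule that fires a maintenance action when exactly one candidate concept shows a given change pattern. The sketch below assumes that MoveM re-attaches the mapping to the candidate concept on the source side and leaves the semantic type unchanged; the rule shape, the `Mapping` class and the function names `move_m` and `adapt_mapping` are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    source: str    # concept in KOS KS
    target: str    # concept ct in KOS KT
    sem_type: str  # semantic type of the mapping (semType)


def move_m(mapping, candidate):
    """MoveM(mst, ccand1): re-attach the mapping to the candidate source concept."""
    return replace(mapping, source=candidate)


def adapt_mapping(mapping, lexical_patterns):
    """lexical_patterns: {candidate concept: pattern observed for the relevant attributes}."""
    total_transfers = [c for c, p in lexical_patterns.items() if p == "Total Transfer"]
    if len(total_transfers) == 1:  # ∃!: exactly one candidate qualifies
        return move_m(mapping, total_transfers[0])
    return mapping  # no rule fired: leave the mapping for review


# Toy data with placeholder identifiers
mst = Mapping(source="cs0", target="ct", sem_type="≡")
adapted = adapt_mapping(mst, {"ccand1": "Total Transfer", "ccand2": "Partial Copy"})
print(adapted)
```

Returning the unchanged mapping when zero or several candidates qualify keeps ambiguous cases out of the automatic path.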
• Concept drift has a huge impact on ontology mappings, but some changes in concepts do not affect mappings
• Drift of attribute values governs the mapping adaptation process
• In most cases, concept drift results in local changes
  • Changes in super concepts, sub concepts and siblings
• Considering ontology versions alone is not enough to characterize concept drift
  • External background knowledge is needed to better determine the semantic relationship between versions of a concept
  • Cf. semantic annotation adaptation
Lessons learned
CONCEPT DRIFT FOR MAPPING ADAPTATION
THE CASE OF SEMANTIC ANNOTATIONS ADAPTATION
www.elisa-project.lu
Problem
SEMANTIC ANNOTATIONS ADAPTATION
Impact of concept drift on semantic annotations
METHODOLOGY
RESULTS
• A concept may have labels before and after evolution that are disjoint from the syntactic or lexical point of view
  • Ex: Cancer → Malignant neoplasm
• Lexical and semantic change patterns cannot be applied
• External knowledge sources are required to characterize the evolution of concepts in such situations
• We propose a method exploiting BioPortal to overcome this limitation
  • Ontologies
  • Mappings
• The method is able to find the semantic relationship between two versions of the same concept
  • Equivalent, less specific, more specific, unrelated, partially matched
Use of external knowledge source
CONCEPT DRIFT FOR ANNOTATIONS
Example
USE OF EXTERNAL KNOWLEDGE SOURCE
1. Search in ontologies (direct method)
  “Pituitary dwarfism” (MeSH) found in: SNOMED CT, ICD9CM, MEDDRA, NCIT, DOID, RCD, HP, DERMLEX, NATPRO, CRISP, SOPHARM, BDO, SNMI
  “Pituitary dwarfism II” (MeSH) found in: OMIM, NDFRT
  No common ontologies
2. Use mappings (indirect method)
  15 mappings available (OMIM ontology)
  “Pituitary dwarfism II” (OMIM) Mapped_to “Laron-type isolated somatotropin defect” (SNOMED CT)
  SNOMED CT is the common ontology
3. “Laron-type isolated somatotropin defect” and “Pituitary dwarfism” have the same super concept (“short stature disorder”): they are siblings
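The direct and indirect steps of this example can be sketched over small in-memory dictionaries. This is not the BioPortal API: the three lookup tables (`index`, `mappings`, `parents`) and the functions `common_ontologies` and `relate` are hypothetical stand-ins, filled with a subset of the values from the example, and the sketch only detects the sibling case.

```python
def common_ontologies(label_a, label_b, index):
    """Direct method: ontologies that contain both labels."""
    return index.get(label_a, set()) & index.get(label_b, set())


def relate(label_a, label_b, index, mappings, parents):
    """Return ("sibling", ontology) if a shared super concept is found, else None."""
    candidates = [(label_b, o) for o in sorted(common_ontologies(label_a, label_b, index))]
    if not candidates:
        # Indirect method: follow mappings from label_b into an ontology that contains label_a
        for onto in sorted(index.get(label_b, set())):
            for mapped_label, mapped_onto in mappings.get((label_b, onto), []):
                if mapped_onto in index.get(label_a, set()):
                    candidates.append((mapped_label, mapped_onto))
    for other_label, onto in candidates:
        parent_a = parents.get((label_a, onto))
        parent_b = parents.get((other_label, onto))
        if parent_a is not None and parent_a == parent_b:
            return ("sibling", onto)
    return None


# Toy tables built from part of the example above
index = {
    "Pituitary dwarfism": {"SNOMED CT", "ICD9CM"},
    "Pituitary dwarfism II": {"OMIM", "NDFRT"},
}
mappings = {
    ("Pituitary dwarfism II", "OMIM"): [
        ("Laron-type isolated somatotropin defect", "SNOMED CT")
    ],
}
parents = {
    ("Pituitary dwarfism", "SNOMED CT"): "short stature disorder",
    ("Laron-type isolated somatotropin defect", "SNOMED CT"): "short stature disorder",
}
result = relate("Pituitary dwarfism", "Pituitary dwarfism II", index, mappings, parents)
print(result)
```

Trying the direct method first keeps the mapping lookup, which depends on the availability and quality of mappings, as a fallback.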
• Ontology regions do not evolve in the same way
  • Unstable regions → handle with care
  • Interesting for predicting concept drift
• Concept drift has a different impact on annotation tools
  • GATE
  • NCBO Annotator
• Background knowledge gives promising results for characterizing concept drift
  • BioPortal ontologies
  • RDF datasets, Web data under investigation
• Will machine learning help in understanding concept drift?
  • Identification of relevant features
  • What ML techniques to use?
Lessons learned (so far …)
CONCEPT DRIFT IN ANNOTATION ADAPTATION
• Linked Open Data requires vocabularies for semantic interoperability purposes
• LOD for characterizing concept drift
  • Quality of LOD is problematic
  • Some datasets rely on outdated vocabularies
• Concept drift impacting LOD:
  • FOAF, DC not as dynamic as domain ontologies
  • No control over the datasets using controlled vocabularies
→ How to propagate changes observed in the vocabulary to RDF datasets?
Concept drift for LOD
DISCUSSION
• Silvio Cardoso
• Dr. Marcos Da Silveira
• Dr. Duy Dinh
• Dr. Julio Dos Reis
• Dr. Anika Gross
• Pr. Erhard Rahm
• Pr. Chantal Reynaud-Delaître
• And all the others …
COLLABORATORS