Ontology Mapping Tool for Diabetes By Madhuri Gopal

Topics covered:

• Project overview
• Design Principles
• Technology Stack
• Approach and Methodology
• Execution Framework
• Modules Covered
• Results

Project Overview: Background

• The aim of the project is to overcome semantic heterogeneity on the WWW by using ontology mapping techniques that find the semantic correspondences between similar elements of two ontologies.
• We aim to map ontologies that are created from standard documents in the diabetes medical domain.
• Our approach will enable better decision-making support for queries on these documents.

Challenges in the existing systems

• Identification of a safer drug regimen requires searching through a space of indicated regimens that outnumbers the pages Google searches 1000 to 1.
• A single criterion is insufficient to guide the selection of a safer regimen.
• Clinical data is gathered and stored in a fragmented way.
• There is no formal, standardized knowledge representation of clinical data.

Design Principles

Open/Closed Principle: Software entities such as classes, modules and functions should be open for extension but closed for modification.

Dependency Inversion Principle: a) High-level modules should not depend on low-level modules. Both should depend on abstractions. b) Abstractions should not depend on details. Details should depend on abstractions.

Interface Segregation Principle: Clients should not be forced to depend upon interfaces that they don't use.

Single Responsibility Principle: A class should have only one reason to change.

Liskov Substitution Principle: Derived types must be completely substitutable for their base types.
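As a minimal sketch of how these principles could apply to a mapping tool (the slides do not show the actual class design, so every name below is hypothetical): each similarity measure sits behind one small interface, and the aggregating module depends only on that abstraction, so a new measure can be added without modifying existing classes.

```java
/** Hypothetical abstraction: any similarity measure between two element names. */
interface SimilarityMeasure {
    double similarity(String e1, String e2);
}

/** One concrete measure: 1.0 for a case-insensitive exact match, otherwise 0.0. */
final class ExactNameMeasure implements SimilarityMeasure {
    @Override
    public double similarity(String e1, String e2) {
        return e1.equalsIgnoreCase(e2) ? 1.0 : 0.0;
    }
}

/** High-level module: depends only on the SimilarityMeasure abstraction. */
final class AverageAggregator {
    private final SimilarityMeasure[] measures;

    AverageAggregator(SimilarityMeasure... measures) {
        this.measures = measures;
    }

    /** Plain average of all registered measures (0.0 if none are registered). */
    double aggregate(String e1, String e2) {
        if (measures.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (SimilarityMeasure m : measures) {
            sum += m.similarity(e1, e2);
        }
        return sum / measures.length;
    }

    public static void main(String[] args) {
        // A second measure is plugged in as a lambda without touching existing classes.
        AverageAggregator agg = new AverageAggregator(new ExactNameMeasure(), (a, b) -> 0.5);
        System.out.println(agg.aggregate("Insulin", "insulin"));
    }
}
```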

Technology Stack

The tool follows a two-tier architecture.

Front-end: Java
Back-end: Ontology (.owl files)

Development Hardware

Processor: Intel(R) Core™ 2 Duo CPU T6400 @ 2.00 GHz
Memory (RAM): 4 GB
System type: 32-bit operating system

Tools used

• Protégé: ontology creation (Stanford open-source tool)
• PDPTools: neural network simulator (Stanford open-source tool)

Approach and Methodology

• A software prototyping (incremental prototyping) methodology is used for development.
• The final product is built as separate prototypes.
• At the end, the separate prototypes are merged into an overall design.
• The steps are:
a) Identification of basic requirements
b) Development of the initial prototype
c) Review of the prototype
d) Revision and enhancement of the prototype

Execution Framework

• Eclipse IDE is used as the execution framework.

• All the required plugins (jar files) from protégé/plugins/edu.stanford.smi.protegex.owl and the OWL API (an open-source API) are included in the build path of the Java project to access the ontology built with Protégé (Stanford open-source tool).

• The IAC neural network is implemented using the PDPTools suite of neural network software (Stanford tool for Parallel Distributed Processing), which runs in MATLAB. All required inputs are taken from the Java environment through connectivity between Eclipse and MATLAB.

Overall Architecture

Modules covered

1) Creation of the diabetes ontologies from the American Association of Clinical Endocrinologists (benchmark document) and from Wikipedia.

2) Name similarity matrix, calculated for all terms in both ontologies using the Levenshtein distance formula (a dynamic programming technique).

3) Profile similarity matrix, calculated using term frequency-inverse document frequency (tf.idf, a statistical data mining algorithm).

4) Conversion of ontology terms to a vector space model and computation of the cosine similarity matrix.

5) Structural similarity matrix, which measures the structural similarity between the ontologies using basic structural features such as depth from root, number of children and number of instances.

6) Similarity aggregator, which combines the name similarity, profile similarity and structural similarity.

7) Harmony function estimation, which keeps the most useful similarities and eliminates erroneous ones.

8) IAC neural network algorithm, which solves a constraint satisfaction problem to improve the mapping between the two ontologies.

Ontology Creation (using Protégé)

[Figures: Ontology 1, Ontology 2]

Ontology Mapping

Input: two homogeneous ontologies O1 and O2 expressed in a formal ontology language (OWL/RDF).

Output: a 4-tuple M(e1i, e2j, r, s)

where:
M is the mapping
e1i is an element in O1
e2j is an element in O2
r is the mapping relation between e1i and e2j
s is the confidence measure of the mapping, normalized to [0, 1]
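A minimal sketch of how one such tuple might be held in Java. The class name, the field names and the use of a plain string for the relation r are assumptions for illustration only; the slides do not show the tool's own data structure.

```java
/** Hypothetical value type for one mapping M(e1i, e2j, r, s). */
final class Mapping {
    final String element1;   // e1i, an element of O1
    final String element2;   // e2j, an element of O2
    final String relation;   // r, e.g. "equivalence" (assumed label)
    final double confidence; // s, must lie in [0, 1]

    Mapping(String element1, String element2, String relation, double confidence) {
        // The comparison is written so that NaN is rejected as well.
        if (!(confidence >= 0.0 && confidence <= 1.0)) {
            throw new IllegalArgumentException("confidence must be in [0, 1]: " + confidence);
        }
        this.element1 = element1;
        this.element2 = element2;
        this.relation = relation;
        this.confidence = confidence;
    }

    @Override
    public String toString() {
        return "M(" + element1 + ", " + element2 + ", " + relation + ", " + confidence + ")";
    }

    public static void main(String[] args) {
        // Illustrative values only, not results from the project.
        System.out.println(new Mapping("Insulin", "InsulinTherapy", "equivalence", 0.82));
    }
}
```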

IR Based Similarity Generator

Input: ontologies O1, O2

Output: three similarity matrices that contain similarity scores for each pair of elements in the ontologies.

Similarity matrices:
• Name Similarity
• Profile Similarity
• Structural Similarity

Name Similarity

This is calculated from the edit distance between the names (IDs) of the elements:

NameSim(e1i, e2j) = 1 - EditDist(e1i, e2j) / max(l(e1i), l(e2j))

where:
EditDist(e1i, e2j) is the Levenshtein distance between the two element names
l(e1i) and l(e2j) are the lengths of the strings e1i and e2j
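The formula can be sketched in Java as follows. This is an illustrative implementation under stated assumptions, not the project's code: the class and method names are hypothetical, the comparison is case-sensitive, and two empty names are treated as identical.

```java
final class NameSimilarity {
    /** Levenshtein distance by dynamic programming, keeping only two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** NameSim = 1 - EditDist / max(length). Assumption: two empty names score 1.0. */
    static double nameSim(String e1, String e2) {
        int maxLen = Math.max(e1.length(), e2.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(e1, e2) / maxLen;
    }

    /** Name similarity matrix: one row per element of O1, one column per element of O2. */
    static double[][] matrix(String[] o1, String[] o2) {
        double[][] sim = new double[o1.length][o2.length];
        for (int i = 0; i < o1.length; i++) {
            for (int j = 0; j < o2.length; j++) {
                sim[i][j] = nameSim(o1[i], o2[j]);
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        System.out.println(nameSim("Insulin", "InsulinTherapy"));
    }
}
```

For the classic pair "kitten" and "sitting", the edit distance is 3 and the longer string has 7 characters, so NameSim = 1 - 3/7.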

Sample Output for two Ontologies with 6 elements each

Name similarity matrix of dimension 37*26

Profile Similarity:

The profile similarity is defined in 3 steps:

• Profile Enrichment
• Profile Propagation
• Profile Mapping

Profile Enrichment and Propagation

• Profile of a class = Class ID + Comments + Property Profiles + Instance Profiles
• Profile of a property = Property ID + Property Domain + Property Range
• Profile of an instance = Instance ID + Descriptive information

Profile Mapping

• The cosine similarity between the profiles of the two elements e1i and e2j is calculated in a vector space model:

ProfileSim(e1i, e2j) = (Ve1i · Ve2j) / (|Ve1i| |Ve2j|)

where Ve1i and Ve2j are the vectors representing the profiles of elements e1i and e2j respectively.
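A sketch of profile mapping in Java, assuming each profile has already been flattened into one text string. The tokenizer, the idf variant log(N / df) computed over all supplied profiles, and every name below are assumptions for illustration; the slides do not specify these details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ProfileSimilarity {
    /** Lower-cases a profile and splits it into alphanumeric tokens. */
    static List<String> tokenize(String profile) {
        List<String> tokens = new ArrayList<>();
        for (String t : profile.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds one sparse tf-idf vector (term -> weight) per profile. */
    static List<Map<String, Double>> tfIdf(List<String> profiles) {
        List<Map<String, Double>> vectors = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String p : profiles) {
            Map<String, Double> counts = new HashMap<>();
            for (String t : tokenize(p)) {
                counts.merge(t, 1.0, Double::sum);      // raw term frequency
            }
            for (String t : counts.keySet()) {
                docFreq.merge(t, 1, Integer::sum);      // profiles containing the term
            }
            vectors.add(counts);
        }
        int n = profiles.size();
        for (Map<String, Double> v : vectors) {
            // Assumed weighting: tf * log(N / df); a term found in every profile gets weight 0.
            v.replaceAll((term, tf) -> tf * Math.log((double) n / docFreq.get(term)));
        }
        return vectors;
    }

    /** Cosine of the angle between two sparse vectors; 0.0 if either has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Illustrative profile texts only.
        List<Map<String, Double>> v = tfIdf(Arrays.asList(
                "insulin therapy dose", "insulin pump", "diet exercise plan"));
        System.out.println(cosine(v.get(0), v.get(1)));
    }
}
```

Sparse maps are used here because most profile terms occur in only a few elements, so dense vectors over the full vocabulary would be mostly zeros.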

Property domain range of Ontology1

Property Domain Range of Ontology 2

Cosine Similarity Matrix

Structural similarity

• This is applicable to classes alone, as they have hierarchical information.

StructSim(e1i, e2j) = ( ∑k (1 - diffk(e1i, e2j)) ) / N

where:
e1i, e2j are two class elements in the ontologies O1 and O2 respectively
N is the total number of structural features
diffk(e1i, e2j) denotes the difference for feature k:

diffk(e1i, e2j) = |sf(e1i) - sf(e2j)| / max(sf(e1i), sf(e2j))

where sf(e1i) and sf(e2j) denote the value of structural feature k for elements e1i and e2j.
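A sketch of the structural similarity in Java, under these assumptions: feature values are non-negative numbers (for example depth from root, number of children, number of instances), the difference is taken as an absolute value so that it stays in [0, 1], and two zero-valued features count as identical. The names are hypothetical.

```java
final class StructuralSimilarity {
    /** Relative difference of one structural feature, in [0, 1]. */
    static double diff(double sf1, double sf2) {
        double max = Math.max(sf1, sf2);
        if (max == 0.0) {
            return 0.0; // assumption: both features are zero, so they do not differ
        }
        return Math.abs(sf1 - sf2) / max;
    }

    /** StructSim = sum over k of (1 - diff_k), divided by the number of features N. */
    static double structSim(double[] features1, double[] features2) {
        if (features1.length != features2.length || features1.length == 0) {
            throw new IllegalArgumentException("feature vectors must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int k = 0; k < features1.length; k++) {
            sum += 1.0 - diff(features1[k], features2[k]);
        }
        return sum / features1.length;
    }

    public static void main(String[] args) {
        // Illustrative features: {depth from root, number of children, number of instances}
        double[] c1 = {2, 4, 0};
        double[] c2 = {2, 2, 0};
        System.out.println(structSim(c1, c2));
    }
}
```

In the example, the per-feature differences are 0, 0.5 and 0, so the similarity is (1 + 0.5 + 1) / 3.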

Identical Ontologies Similarity Calculation

Structural Similarity Matrix

Harmony

• Harmony estimates the importance and reliability of the different similarities.

Harmony (h) = #s_max / min(#e1, #e2)

where:
#s_max is the number of pairs of elements that have the highest similarity in both their row and their column of the similarity matrix
#ei is the number of elements in ontology Oi
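A sketch of the harmony estimate in Java. Tie handling is not specified in the slides; this version assumes a cell counts only when it is strictly greater than every other cell in its row and column, which keeps h within [0, 1]. The names are hypothetical.

```java
final class Harmony {
    /** h = (cells that are the strict maximum of both their row and column) / min(rows, cols). */
    static double harmony(double[][] sim) {
        int rows = sim.length;
        int cols = sim[0].length;
        int count = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                if (isStrictRowAndColumnMax(sim, i, j)) {
                    count++;
                }
            }
        }
        return (double) count / Math.min(rows, cols);
    }

    /** Assumption: ties do not count, so at most one cell per row and per column qualifies. */
    private static boolean isStrictRowAndColumnMax(double[][] sim, int i, int j) {
        double v = sim[i][j];
        for (int c = 0; c < sim[i].length; c++) {
            if (c != j && sim[i][c] >= v) {
                return false;
            }
        }
        for (int r = 0; r < sim.length; r++) {
            if (r != i && sim[r][j] >= v) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative 2 x 3 similarity matrix.
        double[][] sim = {
            {0.9, 0.1, 0.2},
            {0.3, 0.8, 0.4}
        };
        System.out.println(harmony(sim));
    }
}
```

A matrix whose row and column maxima agree for most elements yields a harmony near 1, which signals that the corresponding similarity measure is a reliable one for the given pair of ontologies.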

Similarity matrices Harmony Estimation

Adaptive Similarity Aggregator

Input: individual similarity matrices

Output: aggregated similarity matrix

FinalSim(e1i, e2j) = ( ∑k hk * Simk(e1i, e2j) ) / n

where:
hk is the harmony of the kth similarity matrix
n is the total number of similarity matrices
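A sketch of the aggregation step in Java, following the formula as stated (harmony-weighted sum divided by the number of matrices n). The names are hypothetical and all matrices are assumed to have the same dimensions.

```java
final class SimilarityAggregator {
    /**
     * FinalSim[i][j] = (sum over k of h_k * Sim_k[i][j]) / n,
     * where sims[k] is the kth similarity matrix and harmonies[k] its harmony.
     */
    static double[][] aggregate(double[][][] sims, double[] harmonies) {
        int n = sims.length;
        if (n == 0 || n != harmonies.length) {
            throw new IllegalArgumentException("need one harmony value per similarity matrix");
        }
        int rows = sims[0].length;
        int cols = sims[0][0].length;
        double[][] finalSim = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double weighted = 0.0;
                for (int k = 0; k < n; k++) {
                    weighted += harmonies[k] * sims[k][i][j];
                }
                finalSim[i][j] = weighted / n;
            }
        }
        return finalSim;
    }

    public static void main(String[] args) {
        // Two illustrative 1 x 2 matrices with harmonies 1.0 and 0.5.
        double[][][] sims = {{{1.0, 0.0}}, {{0.5, 0.5}}};
        double[][] out = aggregate(sims, new double[] {1.0, 0.5});
        System.out.println(out[0][0] + " " + out[0][1]);
    }
}
```

With these inputs, the first cell is (1.0 * 1.0 + 0.5 * 0.5) / 2 and the second is (1.0 * 0.0 + 0.5 * 0.5) / 2.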

Final Aggregated Similarity Matrix

IAC Neural Network with Constraint Satisfaction

[Figure: Architecture. Node labels H11, H12, H1n; Synapsis 1; H21, H22, H2n; Synapsis 2; H31, H32, H3n]
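The slides do not give the network's parameters or weights, and the project ran the network in PDPTools under MATLAB rather than in Java. As a rough illustration only, the sketch below shows the generic interactive activation and competition (IAC) update rule: each node stands for one candidate mapping, a negative weight between two nodes encodes a constraint that the two mappings conflict, and the external input is the aggregated similarity. The parameter values, the step size RATE and all names are assumptions.

```java
final class IacNetwork {
    // Assumed parameter values, chosen for illustration only.
    static final double MAX = 1.0;    // maximum activation
    static final double MIN = -0.2;   // minimum activation
    static final double REST = -0.1;  // resting activation
    static final double DECAY = 0.1;  // pull back toward REST
    static final double RATE = 0.1;   // step size per iteration (assumption)

    /**
     * Runs synchronous IAC updates.
     * w[i][j] is the weight from node j to node i (negative = the hypotheses conflict);
     * ext[i] is the external input to node i, e.g. an aggregated similarity score.
     */
    static double[] run(double[][] w, double[] ext, int iterations) {
        int n = ext.length;
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = REST;
        }
        for (int t = 0; t < iterations; t++) {
            double[] next = new double[n];
            for (int i = 0; i < n; i++) {
                double net = ext[i];
                for (int j = 0; j < n; j++) {
                    net += w[i][j] * Math.max(a[j], 0.0); // only positive activations propagate
                }
                // Excitation pushes toward MAX, inhibition toward MIN, decay toward REST.
                double drive = net > 0 ? (MAX - a[i]) * net : (a[i] - MIN) * net;
                double delta = drive - DECAY * (a[i] - REST);
                next[i] = Math.min(MAX, Math.max(MIN, a[i] + RATE * delta));
            }
            a = next;
        }
        return a;
    }

    public static void main(String[] args) {
        // Two conflicting candidate mappings for the same element; the first has more support.
        double[][] w = {{0.0, -1.0}, {-1.0, 0.0}};
        double[] a = run(w, new double[] {0.5, 0.2}, 200);
        System.out.println(a[0] + " " + a[1]);
    }
}
```

In this two-node example the better-supported hypothesis ends with the higher activation, while the mutual inhibition suppresses its competitor.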

Neural Networks Constraint Satisfaction Sample Output

Thank You