Xiaomeng Su & Jon Atle Gulla Dept. of Computer and Information Science Norwegian University of Science and Technology Trondheim Norway Semantic Enrichment for Ontology Mapping


Xiaomeng Su & Jon Atle Gulla

Dept. of Computer and Information Science

Norwegian University of Science and Technology

Trondheim Norway

June 2004

Semantic Enrichment for Ontology Mapping


NLDB’04 Page 2

Semantic interoperability

The Semantic Web vision
Ontology heterogeneity problem
Comparison of ontologies should be aided by an automatic process
Ontology mapping typically involves identifying correspondences between the source ontologies


Prerequisite

Focusing on light-weight ontologies
The ontologies share the same domain
The same representation language is assumed
Our approach is based on the Referent Modelling Language (RML)


Idea illustration

Enrich each concept with its extension.
Textual documents that belong to a concept are considered its extension.
Information Retrieval techniques are used to compute a representative feature vector of the extension information.
When no extension is available, text categorization is used to assign documents to concepts.
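
As an illustration of turning a concept's documents into feature vectors, here is a minimal sketch. The slides do not name the term weighting scheme, so tf-idf is used here purely as an assumption; the stop word list, the tokenizer, and the function names `tokenize` and `tfidf_vectors` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Assumption: weight = term frequency * log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in term_freq.items()}
        )
    return vectors


docs = ["The hotel room", "The hotel pool"]
vectors = tfidf_vectors(docs)
```

With this weighting, a term that occurs in every document of the collection (here "hotel") gets weight zero, while terms specific to one document keep a positive weight.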


Architecture

[Figure: system architecture diagram. Labels recoverable from the slide: Semantic enrichment (Text categorization, FVC, WordNet, Stop word list, "Cns + Manual"); Mapper (Mapping); Enhancer (Post-processing, WordNet); Presenter (Presentation & Refinement); Exporter (Translation & Storage); Configuration profile (XML); Categorization results (XML); Mapping assertions (XML).]


Functional view of the system

Document assignment (optional)
Feature vector construction
    Pre-processing
    Document representation
    Concept vector construction
        leaf node: average vector of the document vectors
        non-leaf node: weighted sum of the document vectors, sub-concept vectors and related-concept vectors
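
The concept vector construction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides give neither the weights nor how each group of vectors is aggregated, so the coefficients `alpha`, `beta`, `gamma` and the choice to average each group before weighting are assumptions, as are all function names.

```python
def average(vectors):
    """Component-wise mean of sparse vectors (dict: term -> weight)."""
    if not vectors:
        return {}
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def weighted_sum(parts):
    """Sum of coeff * vector over (coeff, vector) pairs."""
    result = {}
    for coeff, vec in parts:
        for term, weight in vec.items():
            result[term] = result.get(term, 0.0) + coeff * weight
    return result


def leaf_vector(doc_vectors):
    """Leaf concept: average of the vectors of its documents."""
    return average(doc_vectors)


def non_leaf_vector(doc_vectors, sub_vectors, related_vectors,
                    alpha=0.5, beta=0.3, gamma=0.2):
    """Non-leaf concept: weighted sum of its own document vectors,
    its sub-concept vectors and its related-concept vectors.
    The weights and the per-group averaging are illustrative assumptions."""
    return weighted_sum([
        (alpha, average(doc_vectors)),
        (beta, average(sub_vectors)),
        (gamma, average(related_vectors)),
    ])


leaf = leaf_vector([{"hotel": 1.0}, {"hotel": 3.0, "room": 2.0}])
parent = non_leaf_vector([{"hotel": 2.0}], [leaf], [{"travel": 1.0}])
```

Sparse dictionaries are used because most terms do not occur in most documents; a dense array representation would work equally well.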


Functional view of the system

Similarity calculation
    The similarity of two concepts: cosine measure
    The similarity of relations: domain and range
    The similarity of clusters and that of the two ontologies: based on the above two
Post-processing the assertions, using WordNet to update the rank of assertions
Mapping assertion generation and user feedback management
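
A minimal sketch of the two basic similarity measures follows. The cosine measure for concepts is as named on the slide. For relations, the slide only says that similarity is based on domain and range; combining the two with an arithmetic mean is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dict: term -> weight)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def relation_similarity(rel_a, rel_b):
    """Similarity of two relations, each given as a
    (domain_concept_vector, range_concept_vector) pair.
    Assumption: the mean of domain similarity and range similarity."""
    domain_a, range_a = rel_a
    domain_b, range_b = rel_b
    return 0.5 * (cosine(domain_a, domain_b) + cosine(range_a, range_b))


hotel = {"hotel": 2.0, "room": 1.0}
lodging = {"hotel": 1.0, "room": 0.5}
city = {"city": 1.0}
concept_sim = cosine(hotel, lodging)
rel_sim = relation_similarity((hotel, city), (lodging, city))
```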


Post-processing

To update ranks according to the concept relatedness in WordNet
We use a simple path length measure, computed with JWNL (Java WordNet Library)
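
To illustrate path-length-based re-ranking, here is a small sketch. It does not call JWNL or WordNet: the graph below is a hand-built toy stand-in for WordNet links, and the relatedness formula `1 / (1 + path length)`, the `boost` factor, and the way the score is updated are all assumptions, since the slides do not specify them.

```python
from collections import deque


def path_length(graph, start, goal):
    """Shortest number of edges between two words, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(graph, a, b):
    """Assumed measure: shorter paths give higher relatedness in (0, 1]."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


def rerank(assertions, graph, boost=0.2):
    """Add a relatedness bonus to each (source, target, score) assertion
    and return the assertions sorted by the updated score."""
    rescored = [(s, t, score + boost * relatedness(graph, s, t))
                for s, t, score in assertions]
    return sorted(rescored, key=lambda a: a[2], reverse=True)


# Toy undirected graph standing in for WordNet relations (hypothetical data).
toy_graph = {
    "hotel": ["lodging"],
    "hostel": ["lodging"],
    "lodging": ["hotel", "hostel"],
    "car": ["vehicle"],
    "vehicle": ["car"],
}
ranked = rerank([("hotel", "car", 0.50), ("hotel", "hostel", 0.45)], toy_graph)
```

In this toy run the pair ("hotel", "hostel") overtakes ("hotel", "car") because the two words are two edges apart in the graph, while "hotel" and "car" are not connected at all.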


A prototype - iMapper


Validation

To measure the accuracy of the mapping algorithm
    Focusing on concepts
    Using mappings manually identified by users as gold standards
Measures
    Precision (the fraction of automatically discovered mappings that are correct)
    Recall (the fraction of the correct mappings that have been discovered)
Experiment on two domains
    Product catalogue integration
    Tourism ontology comparison
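
The two measures can be computed directly from the set of discovered mappings and the gold standard. The sketch below follows the definitions given on the slide; the function name and the example mappings are made up for illustration.

```python
def precision_recall(found, gold):
    """Return (precision, recall) for discovered mappings against a gold standard.

    Each mapping is any hashable value, e.g. a (source_concept, target_concept) pair.
    """
    found, gold = set(found), set(gold)
    correct = found & gold
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: 4 mappings discovered, 3 of them correct, 6 in the gold standard.
gold_standard = {("a", "1"), ("b", "2"), ("c", "3"), ("d", "4"), ("e", "5"), ("f", "6")}
discovered = {("a", "1"), ("b", "2"), ("c", "3"), ("x", "9")}
precision, recall = precision_recall(discovered, gold_standard)
```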


Preliminary results

Precision is measured at 11 standard recall levels
Experiment in both domains
The algorithm has identified most of the mappings and ranked them in the correct order.

[Figure: precision vs. recall curves for the two tasks (product catalogue domain, tourism domain); x-axis: recall, y-axis: precision]
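
Precision at the 11 standard recall levels (0 %, 10 %, ..., 100 %) is commonly obtained by interpolation over a ranked result list. The sketch below uses the usual Information Retrieval convention, where the interpolated precision at a recall level is the highest precision observed at any recall at or above that level. Whether the authors used exactly this interpolation is an assumption; the function name and data are illustrative.

```python
def eleven_point_precision(ranked, gold):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    ranked: mappings ordered from most to least confident.
    gold:   collection of correct mappings.
    """
    gold = set(gold)
    points = []  # (recall, precision) measured at each correct mapping
    hits = 0
    for rank, mapping in enumerate(ranked, start=1):
        if mapping in gold:
            hits += 1
            points.append((hits / len(gold), hits / rank))
    levels = [i / 10 for i in range(11)]
    # Small tolerance guards against floating-point rounding of recall values.
    return [
        max((p for r, p in points if r >= level - 1e-9), default=0.0)
        for level in levels
    ]


curve = eleven_point_precision(["a", "x", "b"], {"a", "b"})
```

Here the first correct mapping is found at rank 1 and the second at rank 3, so the curve stays at 100 % precision up to 50 % recall and drops to two thirds beyond that.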


Preliminary results

Using WordNet to update the rank showed different effects on the two domains

No topic related semantics in WordNet

[Figure: WordNet effect on tourism domain; precision vs. recall, with WordNet and without WordNet]

[Figure: WordNet effect on product domain; precision vs. recall, with WordNet and without WordNet]


Evaluation remarks

Encouraging results
Failure analysis
    Quality of the gold standards
    Quality of the feature vector
Further evaluation
    Gold standard
    Sensitivity tests
    Alternative measures


Summing up

An approach to ontology mapping based on the idea of semantic enrichment

The approach has been implemented and evaluated

Explored the possibility of incorporating WordNet into the system


The end...

Thank you!