Designing clustering methods for ontology building: The Mo’K workbench Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero Presenter: Ovidiu Fortu

Designing clustering methods for ontology building: The Mo’K workbench

Authors: Gilles Bisson, Claire Nédellec and Dolores Cañamero

Presenter: Ovidiu Fortu

INTRODUCTION

Paper objectives:
  • Present a workbench for developing and evaluating methods that learn ontologies
  • Report experimental results that illustrate how well the model characterizes methods for learning semantic classes

INTRODUCTION

Ontology building general strategy:
  • Define a distance metric (as good an approximation of the semantic distance as possible)
  • Devise or use a classifying algorithm that uses this distance to build the ontology

Harris’ hypothesis

Formulation: The study of syntactic regularities leads to the identification of syntactic schemata made of combinations of word classes that reflect specific domain knowledge

Consequence: similarity can be measured using co-occurrence in syntactic patterns

Conceptual clustering

Ontologies are organized as acyclic graphs:
  • Nodes represent concepts
  • Links represent inclusion (generality relation)

The methods considered in this paper rely on bottom-up construction of the graph

The Mo’K model

Representation of examples:
  • Binary syntactic patterns of the form <head – grammatical relation – modifier head>, where <modifier head> is the object and the rest of the pattern is the attribute
  • Example: "This causes a decrease in […]" gives <cause Dobj decrease>
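
The representation above can be sketched in a few lines of Python. This is only an illustration: the triples are made-up data, and the helper name `to_example` and the `Counter`-based contingency table are assumptions, not part of Mo’K itself.

```python
from collections import Counter

# Hypothetical parsed patterns: (head, grammatical relation, modifier head).
triples = [
    ("cause", "Dobj", "decrease"),
    ("cause", "Dobj", "increase"),
    ("observe", "Dobj", "decrease"),
    ("cause", "Dobj", "decrease"),
]


def to_example(triple):
    """Split a binary pattern into (object, attribute).

    The modifier head is the object; the rest of the pattern
    (head plus grammatical relation) is the attribute.
    """
    head, relation, modifier_head = triple
    return modifier_head, (head, relation)


# Contingency counts: how often each object occurs with each attribute.
contingency = Counter(to_example(t) for t in triples)
```

Here `contingency[("decrease", ("cause", "Dobj"))]` counts how many times the object "decrease" was seen with the attribute <cause Dobj>.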

Clustering

Bottom-up clustering by joining classes that are near:
  • Join classes of objects (nouns or actions, i.e. tuples <verb, relation>) that are frequently determined by the same attributes
  • Join attribute classes that frequently determine the same objects
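
A minimal sketch of bottom-up joining of object classes, under stated assumptions: Jaccard overlap of attribute sets stands in for the distance (it ignores frequencies, unlike the distances studied in the paper), and the names `bottom_up` and `threshold` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two attribute sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bottom_up(object_attrs, threshold=0.5):
    """Greedily merge the two nearest classes until none are near enough.

    object_attrs maps each object to the set of attributes it occurs with.
    Returns the merged classes in the order they were built.
    """
    clusters = [({obj}, set(attrs)) for obj, attrs in object_attrs.items()]
    merges = []
    while len(clusters) > 1:
        # Find the pair of classes with the highest attribute overlap.
        (i, j), best = max(
            (((i, j), jaccard(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        members = clusters[i][0] | clusters[j][0]
        attrs = clusters[i][1] | clusters[j][1]
        merges.append(frozenset(members))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members, attrs))
    return merges


object_attrs = {
    "decrease": {"cause_Dobj", "observe_Dobj"},
    "increase": {"cause_Dobj", "observe_Dobj"},
    "pan": {"heat_Dobj"},
}
merges = bottom_up(object_attrs)
```

Each entry of `merges` corresponds to a new, more general node placed above the classes it joins.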

Corpora

Specialized corpora are used for domain-specific ontologies

Corpora are pruned (rare examples are eliminated). The workbench allows the specification of:
  • Minimum number of occurrences for a pattern to be considered
  • Minimum number of occurrences for an attribute/object to be considered
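
The pruning step could look roughly like the sketch below. The function name, the three threshold parameters, and the single-pass strategy (totals are not recomputed after dropping entries) are assumptions made for illustration; the counts are made-up data.

```python
from collections import Counter


def prune(counts, min_pattern=2, min_object=2, min_attribute=2):
    """Drop rare patterns, then rare objects and attributes.

    counts maps (object, attribute) -> number of occurrences.
    """
    # Keep only patterns seen often enough.
    kept = {p: n for p, n in counts.items() if n >= min_pattern}

    # Total occurrences of each object and each attribute among kept patterns.
    obj_total, attr_total = Counter(), Counter()
    for (obj, attr), n in kept.items():
        obj_total[obj] += n
        attr_total[attr] += n

    return {
        (obj, attr): n
        for (obj, attr), n in kept.items()
        if obj_total[obj] >= min_object and attr_total[attr] >= min_attribute
    }


counts = {
    ("decrease", "cause_Dobj"): 3,
    ("increase", "cause_Dobj"): 2,
    ("decrease", "observe_Dobj"): 1,
    ("rise", "report_Dobj"): 2,
}
pruned = prune(counts, min_pattern=2, min_object=3, min_attribute=3)
```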

Distance modeling

Consider only distances that:
  • Take syntactic analysis as input
  • Do not use other ontologies (like WordNet)
  • Are based on distributions of the attributes of an object

Identify general steps in the computation of these distances to formulate a general model

Distance computation

Step 1: weighting phase
  Modify the frequencies of elements in the contingency matrix using a general algorithm:
    Initialization of the weight of each example E: W(E)
    Initialization of the weight of each attribute A: W(A)
    For each example E
      For each attribute A of the example
        Calculate W(A) in the context of E
        Update global W(E)
      For each attribute A of the example
        Normalization of the W(A) by W(E)

Step 2: similarity computation phase
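
The two steps can be illustrated with a small sketch. The general algorithm leaves the weighting function and the similarity open, so both choices here are assumptions: `log(1 + frequency)` as the local weight and cosine as the similarity. Attribute weights are also kept per example rather than in a global table, for brevity.

```python
import math


def weight_examples(matrix, local_weight=lambda freq: math.log(1 + freq)):
    """Step 1: weighting phase.

    matrix maps each example E (an object) to {attribute A: frequency}.
    local_weight is a hypothetical choice for W(A) in the context of E.
    """
    weighted = {}
    for example, attributes in matrix.items():
        w_example = 0.0  # W(E), accumulated over the attributes of E
        w_attr = {}
        for attr, freq in attributes.items():
            w_attr[attr] = local_weight(freq)  # W(A) in the context of E
            w_example += w_attr[attr]          # update global W(E)
        # Normalize each W(A) by W(E).
        weighted[example] = {
            a: (w / w_example if w_example else 0.0) for a, w in w_attr.items()
        }
    return weighted


def cosine(u, v):
    """Step 2: similarity computation (cosine is an assumed stand-in)."""
    dot = sum(w * v.get(a, 0.0) for a, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


matrix = {
    "decrease": {"cause_Dobj": 3, "observe_Dobj": 1},
    "increase": {"cause_Dobj": 2, "observe_Dobj": 1},
    "pan": {"heat_Dobj": 4},
}
weighted = weight_examples(matrix)
sim_close = cosine(weighted["decrease"], weighted["increase"])
sim_far = cosine(weighted["decrease"], weighted["pan"])
```

Swapping `local_weight` or `cosine` for another function is the kind of parameterization the workbench is meant to make easy to compare.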

Distance evaluation

The workbench provides support for evaluation of metrics

The procedure is:
  • Divide the corpus into training and test sets
  • Perform clustering on the training set
  • Use similarities computed on the training set to classify examples in the test set and compute precision and recall; negative examples are produced by randomly combining objects and attributes
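
A rough sketch of the scoring part of this procedure. The helper names and the toy `accepts` classifier are hypothetical; in the workbench the accept/reject decision would come from the classes learned on the training set.

```python
import random


def make_negatives(positives, n, seed=0):
    """Build negative examples by randomly combining objects and attributes.

    Only pairs that never occur among the positives are kept.
    """
    rng = random.Random(seed)
    objects = sorted({o for o, _ in positives})
    attributes = sorted({a for _, a in positives})
    known = set(positives)
    # Never ask for more negatives than there are unseen combinations.
    n = min(n, len(objects) * len(attributes) - len(known))
    negatives = set()
    while len(negatives) < n:
        pair = (rng.choice(objects), rng.choice(attributes))
        if pair not in known:
            negatives.add(pair)
    return sorted(negatives)


def precision_recall(accepts, positives, negatives):
    """Score a classifier on held-out positives and generated negatives."""
    tp = sum(1 for p in positives if accepts(p))
    fp = sum(1 for p in negatives if accepts(p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall


test_positives = [
    ("decrease", "cause_Dobj"),
    ("increase", "cause_Dobj"),
    ("pan", "heat_Dobj"),
    ("oven", "heat_Dobj"),
]
test_negatives = make_negatives(test_positives, 2)


def accepts(pair):
    """Toy classifier: accepts any pair whose attribute is cause_Dobj."""
    return pair[1] == "cause_Dobj"


precision, recall = precision_recall(accepts, test_positives, test_negatives)
```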

Experiments

Purpose: evaluate Mo’K’s parameterization possibilities and the impact of the parameters on results

Corpora: two French corpora
  • One with cooking recipes from the Web: nearly 50000 examples
  • One with agricultural data (Agrovoc): 168287 examples

Results (Asium’s distance, 20% test)

Corpus    Learning object   % Induced learned triplets   Recall (test set)   Precision
Agrovoc   Action            40%                          4.7%                45%
Agrovoc   Noun              38%                          5.3%                45%
Cooking   Action            34%                          12%                 32%
Cooking   Noun              38%                          9.1%                52%

Recall rate

X-axis: the number of disjoint classes on which recall is evaluated

Class efficiency

Class efficiency: ratio between triplets learned and triplets effectively used in the evaluation of recall

Conclusions

Comments? Questions?

Ontology Learning and Its Application to Automated Terminology Translation

Authors: Roberto Navigli, Paola Velardi and Aldo Gangemi

Presenter: Ovidiu Fortu

Introduction

Paper objectives:
  • Present OntoLearn, a system for automated construction of ontologies by extraction of relevant domain terms from text corpora
  • Present the use of OntoLearn in the task of translating multiword terms from English to Italian

The OntoLearn architecture

A complex system that uses external resources such as WordNet and the Ariosto language processor

The OntoLearn

New important feature:
  • Semantic interpretation of terms (word sense disambiguation)

Three main phases:
  • Terminology extraction
  • Semantic interpretation
  • Creation of a specialized view of WordNet

Terminology extraction

Terms selected with shallow stochastic methods

Better quality if syntactic features are used

High frequency in a corpus is not necessarily sufficient:
  • credit card: is a term
  • last week: not a term

Terminology extraction, continued

Comparing frequencies in texts from different domains eliminates constructs such as “last week”; this is the domain relevance score

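Relevance of term t in domain D_k:

\[
DR_{t,k} = \frac{P(t \mid D_k)}{\sum_{j=1}^{n} P(t \mid D_j)},
\]

where P(t | D_k) is estimated by

\[
E(P(t \mid D_k)) = \frac{f_{t,k}}{\sum_{t' \in D_k} f_{t',k}},
\qquad f_{t,k} = \text{frequency of term } t \text{ in domain } D_k
\]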
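
A small sketch of the domain relevance score, assuming it divides P(t | D_k) by the sum of P(t | D_j) over all n domains, and that P(t | D_k) is estimated by the relative frequency of t in D_k. The function name and the frequency table are made up for illustration.

```python
def domain_relevance(term, domain, freq):
    """Relevance of a term in one domain relative to all domains.

    freq maps domain name -> {term: frequency in that domain}.
    """
    def p(t, d):
        # Estimate P(t | D) by the relative frequency of t in D.
        total = sum(freq[d].values())
        return freq[d].get(t, 0) / total if total else 0.0

    denom = sum(p(term, d) for d in freq)
    return p(term, domain) / denom if denom else 0.0


freq = {
    "finance": {"credit card": 8, "last week": 2},
    "tourism": {"credit card": 1, "last week": 4, "room service": 5},
}
dr_card = domain_relevance("credit card", "finance", freq)
dr_week = domain_relevance("last week", "finance", freq)
```

With these toy counts "credit card" scores higher in the finance domain than "last week", which is spread across both domains.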

Terminology extraction, continued

Domain consensus of a term t in class D_k exploits the frequency of t across documents:

\[
DC_{t,k} = \sum_{d \in D_k} \left( P_t(d) \log \frac{1}{P_t(d)} \right),
\]

where P_t(d) is the probability that document d includes t
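
Domain consensus is an entropy over the documents of a domain, so it can be sketched as below. The estimate of P_t(d) as the share of the term's occurrences that fall in document d is an assumption, and the function name is hypothetical.

```python
import math


def domain_consensus(term_counts):
    """Entropy of a term's distribution over the documents of one domain.

    term_counts lists the frequency of the term in each document of D_k.
    """
    total = sum(term_counts)
    if total == 0:
        return 0.0
    # Assumed estimate: share of the term's occurrences in each document.
    probs = [c / total for c in term_counts if c > 0]
    return sum(p * math.log(1 / p) for p in probs)


dc_spread = domain_consensus([5, 5, 5, 5])      # term used evenly everywhere
dc_single = domain_consensus([20, 0, 0, 0])     # term confined to one document
```

A term spread evenly over many documents gets a high consensus; a term confined to a single document gets zero.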

Terminology extraction, continued

A combination of the two scores is used to detect relevant terms

\[
DW_{t,k} = \alpha \, DR_{t,k} + (1 - \alpha) \, DC^{norm}_{t,k},
\]

where DC^{norm}_{t,k} is a normalized entropy and \alpha \in (0,1)

Only the terms with DW larger than a threshold are retained
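
A sketch of the filtering step, assuming the combined weight is a convex combination alpha * DR + (1 - alpha) * DC_norm with alpha in (0, 1). The scores, the threshold value, and the function names are illustrative only.

```python
def domain_weight(dr, dc_norm, alpha=0.5):
    """Combine domain relevance and normalized domain consensus."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return alpha * dr + (1 - alpha) * dc_norm


def select_terms(scores, threshold, alpha=0.5):
    """Keep the terms whose combined weight exceeds the threshold.

    scores maps term -> (DR, normalized DC).
    """
    return sorted(
        term
        for term, (dr, dc_norm) in scores.items()
        if domain_weight(dr, dc_norm, alpha) > threshold
    )


# Hypothetical scores for three candidate terms.
scores = {
    "credit card": (0.9, 0.8),
    "last week": (0.3, 0.5),
    "room service": (0.7, 0.6),
}
retained = select_terms(scores, threshold=0.5)
```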

Semantic interpretation

Step 1: create semantic nets for every word w_k ∈ t and any synset of w_k by following all WordNet links, but limiting the path length to 3 (after disambiguation of words)

Step 2: intersect the networks and compute a score based on the number and type of semantic patterns connecting the networks
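
The two steps can be illustrated on a toy graph. This is not real WordNet data, and counting shared nodes is a simplified stand-in for the score, which in OntoLearn depends on the number and type of semantic patterns; all names below are hypothetical.

```python
from collections import deque


def semantic_net(graph, start, max_depth=3):
    """Nodes reachable from a sense within max_depth links (breadth-first)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # path length limit reached, do not expand further
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen


def overlap_score(graph, sense_a, sense_b, max_depth=3):
    """Simplified score: number of nodes shared by the two semantic nets."""
    net_a = semantic_net(graph, sense_a, max_depth)
    net_b = semantic_net(graph, sense_b, max_depth)
    return len(net_a & net_b)


# Toy sense graph for the term "archeological site".
graph = {
    "site#1": ["location#1"],
    "location#1": ["region#1"],
    "site#2": ["computer#1"],
    "archeological#1": ["archeology#1"],
    "archeology#1": ["excavation#1"],
    "excavation#1": ["site#1"],
}
score_place = overlap_score(graph, "archeological#1", "site#1")
score_web = overlap_score(graph, "archeological#1", "site#2")
```

In this toy setup the "place" sense of site overlaps with the net of archeological while the "web site" sense does not, so the first sense would be preferred.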

Semantic interpretation, continued

Semantic patterns are instances of 13 predefined metapatterns

Example: Topic, as in archeological site:

\[
S_1 \xrightarrow{\text{topic}} S_2
\]

Compute the score (S_i^k is sense k of word i in the term) for all possible pairs:

\[
I = SN(S_{i}^{j}) \cap SN(S_{i+1}^{k})
\]

Semantic interpretation, continued

Use the common paths in the semantic networks to detect semantic relations (taxonomic knowledge) between concepts:
  • Select a set of domain-specific semantic relations
  • Use inductive learning to learn semantic relations given ontological knowledge
  • Apply the model to detect semantic relations

Errors from the disambiguation phase can be corrected here

Creation of a specialized view of WordNet

In the last phase of the process, construct the ontology by eliminating from the semantic networks the WordNet nodes that are not domain terms

A domain core ontology can also be used as a backbone

Translating multiword terms

Classic approach: use of parallel corpora
  • Advantage: easy to implement
  • Disadvantage: few such corpora, especially in specific domains

OntoLearn-based solution:
  • Use EuroWordNet and build ontologies in both languages, associating them to synsets

Translation – the experiment

Experiment on 405 complex terms in a tourism corpus

Problem: poor encoding of Italian words in EuroWordNet (fewer terms than in the English version), which reduced the set to 113 examples

Use semantic relations given by OntoLearn to translate: room service → servizio in camera

Quality of translation     Good   Acceptable   Poor
Manually corrected input   74%    14%          12%
OntoLearn input            70%    14%          16%

Conclusions

Questions? Comments?