Definition Clustering, Sense Naming & Lexical Augmentation. Fabien JALABERT, Mathieu LAFOURCADE


Page 1: Definition Clustering, Sense Naming & Lexical Augmentation

Definition Clustering, Sense Naming & Lexical Augmentation

Fabien JALABERT

Mathieu LAFOURCADE

Page 2

Natural Language Processing

• Lexical Semantics - WSD - Document indexing

• Dictionary construction and vectorization. Problem: extracting the definition meta-language. Example: ‘cannibale’ = ‘qui mange l’Homme en parlant de l’Homme’ (‘who eats human beings, said of a human being’); themes: homme (human), manger (eat), rhétorique (rhetoric)

• Multi-source approach for noise reduction. Problem: the atomic element is the definition, and a definition ≠ a sense

• Objectives
  - clustering definitions to obtain senses
  - naming these senses

Study context 1/2

Page 3

[Figure: processing pipeline for a term T]

• Multi-source base: the definitions of T from several dictionaries (def 1, 2 and 3 from Source 1; def 1 and 2 from Source 2; def 1 and 2 from Source 3)
• Clustering into an ‘acception’ (sense) base:
  - Sense 1: def 1 - Source 1
  - Sense 2: def 2 - Source 1, def 2 - Source 2, def 1 - Source 3
  - Sense 3: def 3 - Source 1, def 1 - Source 2, def 2 - Source 3
• Sense naming: each sense receives a name (Sense 1 - Name, …)
• Re-injection of the named senses as a new lexical source (terms t1, t2, …, tn)

Study context 2/2

Page 4

• Model, Construction, Organization

• Definition Clustering

• Sense Naming

• Lexical Augmentation

• Results

Summary

Page 5

• An idea = a vector

• A vector component = a primitive concept as defined in a thesaurus

– Thesaurus Larousse: 873 concepts

– Concepts are inter-related

Generator space

• A definition → a vector

Conceptual Vector Model 1/2

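[Figure: the vector of ‘frégate’ in the generator space, with axes arme (weapon), transports maritimes et fluviaux (sea and river transport) and oiseau (bird)]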

Most activated primitives for ‘frégate’ :(oiseau 6134) (transports maritimes et fluviaux 5644) (arme 4891) …

Salton Deerwester

Chauché Lafourcade

Page 6

Terms thematically close to ‘frégate’: (destroyer 0.2246) (youyou 0.2267) (voilier 0.2268) (contre-torpilleur 0.2274) (chlamydère 0.2276) (oiseau-jardinier 0.2295) (trois-mâts 0.233) …

Terms thematically close to ‘frégate/oiseau/’: (oiseau-jardinier 0.1237) (plumeur 0.1319) (goglu 0.136) (travailleur 0.136) (chlamydère 0.1385) (penne 0.141) (Galliformes 0.1422) (agami 0.1428) …

Terms thematically close to ‘frégate/bateau/’: (démâtage 0.1604) (dégréer 0.1676) (naval 0.1718) (bateau-piège 0.1774) (bateau-vanne 0.1821) (batelet 0.1824) …

Conceptual Vector Model 2/2


Thematic distance = angle between two vectors
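The thematic distance is the angle $D_A(x, y) = \arccos\left(\frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}\right)$. The lists of terms close to ‘frégate’ are rankings by this angle. Below is a minimal sketch of both. It assumes conceptual vectors are plain lists of activations, one per primitive. The function names are hypothetical, and the slides do not state the unit of the listed distances.

```python
import math
from typing import Dict, List, Sequence, Tuple


def angular_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Thematic distance D_A(x, y): the angle, in radians, between two conceptual vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / norms))  # clamp rounding errors before acos
    return math.acos(cos)


def closest_terms(target: Sequence[float],
                  lexicon: Dict[str, Sequence[float]],
                  k: int = 5) -> List[Tuple[str, float]]:
    """Return the k lexicon terms whose vectors form the smallest angle with target."""
    scored = [(term, angular_distance(target, vec)) for term, vec in lexicon.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]
```

The angle depends only on the direction of the vectors, not on their length. Raw activations such as (oiseau 6134) can therefore be compared without first normalizing them.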

Page 7

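SYGMART

[Figure: SYGMART analysis of the ambiguous sentence ‘la petite brise la glace’. The two readings (PH) sit under an ambiguity node (PHAMBG).
Reading 1, ‘le petit briser le glace’ (‘the little girl breaks the ice’): GN (governor petit, adjective) + GV (governor briser) + GN (le glace, governor a feminine noun).
Reading 2, ‘le petit brise glacer le’ (‘the light breeze freezes her’): GN (le, GA petit, brise) + GV (glacer, with the pronoun GN le).]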

Definition Vector Computation (Chauché)

Page 8

Learning agents: SYGMART, computation of vectors from definitions, synonymy, antonymy, …

Multi-Agent Organization: Double Loop

Lecerf Schwab

[Figure: each agent runs an endogenous loop and an exogenous loop, the latter involving the other agents (society)]

Page 9

Grouping definitions into senses

Clustering

Objective

Page 10

• Deep analysis: several criteria

• No training (but enhancement through the exogenous loop)

• Frontier between senses and definitions

- Centroid approach

- Heuristics (preferences):
  - number of clusters = maximum number of definitions in a single dictionary
  - two definitions from the same source go to two different clusters
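The two preference heuristics can be written as simple checks. This is a sketch: definitions are identified by hypothetical (source, index) pairs, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A definition is identified by (source name, index of the definition in that source).
Definition = Tuple[str, int]


def number_of_clusters(defs_per_source: Dict[str, int]) -> int:
    """Heuristic 1: as many clusters as the largest number of definitions in one dictionary."""
    return max(defs_per_source.values())


def respects_source_constraint(clusters: List[List[Definition]]) -> bool:
    """Heuristic 2: two definitions from the same source never share a cluster."""
    for cluster in clusters:
        counts = Counter(source for source, _ in cluster)
        if any(n > 1 for n in counts.values()):
            return False
    return True
```

Take the three sources of the pipeline figure, with 3, 2 and 2 definitions. `number_of_clusters({"Source 1": 3, "Source 2": 2, "Source 3": 2})` gives 3, and the three senses shown there satisfy `respects_source_constraint`.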

Clustering 1/5: Strategy

Page 11

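Example: ‘botte’

- ‘chaussure montante (quel qu'en soit l'usage)’ (a boot, whatever its use), or a distinction between ‘chaussure élégante’ (dress boot) and ‘chaussure tout-terrain’ (all-terrain boot)?
- ‘coup porté (en escrime ou non)’ (a thrust, in fencing or not), or a distinction between ‘le coup en escrime’ (the fencing thrust) and ‘l'attaque surprise’ (the surprise attack)?
- ‘réunion de végétaux’ (a bundle of plants)

Clustering 2/5: Difficulty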

Page 12

• Source-by-source iteration until a minimum-value distribution is obtained

• Minimum-value assignment of each source's definitions to the clusters, computed from a distance matrix with the Hungarian method, O(n³)

Clustering 3/5: Algorithm 1/2

Kuhn Ford, Fulkerson
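Here is a sketch of the source-by-source assignment. Its assumptions:
- Definitions are conceptual vectors.
- A cluster is represented by the sum of its members' vectors. That sum is the centroid up to scale, which is all the angle needs.
- The sources are traversed in a single pass.

`hungarian` is a textbook Kuhn-Munkres implementation with potentials. `angle` and `cluster_sources` are hypothetical names. The iteration until a minimum-value distribution is reached is not reproduced.

```python
import math
from typing import List, Sequence, Tuple

INF = float("inf")
Vector = Sequence[float]


def hungarian(cost: Sequence[Sequence[float]]) -> List[int]:
    """Minimum-cost assignment of rows to distinct columns (Kuhn-Munkres with potentials).

    cost is n x m with n <= m (rows = definitions of one source, columns = clusters).
    Returns assignment[i] = column given to row i. O(n^2 m), i.e. O(n^3) when square.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many clusters as definitions")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # dual potentials, 1-indexed
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (0 = free)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:  # shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update the potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def angle(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def cluster_sources(sources: List[List[Vector]]) -> List[List[Tuple[int, int]]]:
    """Seed one cluster per definition of the largest source, then assign each other
    source's definitions to distinct clusters at minimum total angular distance."""
    order = sorted(range(len(sources)), key=lambda s: len(sources[s]), reverse=True)
    seed = order[0]
    members = [[(seed, d)] for d in range(len(sources[seed]))]
    centroids = [list(vec) for vec in sources[seed]]  # running sum of member vectors
    for s in order[1:]:
        if not sources[s]:
            continue
        cost = [[angle(vec, c) for c in centroids] for vec in sources[s]]
        for d, k in enumerate(hungarian(cost)):
            members[k].append((s, d))
            centroids[k] = [a + b for a, b in zip(centroids[k], sources[s][d])]
    return members
```

Seeding with the largest source yields exactly as many clusters as the largest dictionary has definitions. The assignment gives each definition of a source a distinct column, so both preference heuristics hold by construction.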

Page 13

• For each criterion: one evaluation, one distance matrix

• Criteria
  - comparing the lexical contents of definitions (term frequency, co-occurrences, etc.)
  - angular distance
  - symbolic markers:
    - morphology
    - etymology (‘avocat’: ‘ahuacatl’ (avocado) / ‘advocatus’ (lawyer))
    - use (‘vieux’ (old), ‘ancien’ (archaic), ‘poétique’ (poetic), …)
    - language level (‘argot’ (slang), ‘familier’ (informal), …)
    - domain (‘médecine’ (medicine), ‘zoologie’ (zoology), …)

Clustering 4/5: Algorithm 2/2
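Below is one possible way to realize "one criterion, one distance matrix". The Jaccard distance over marker sets and the weighted-mean aggregation are assumptions, not the authors' stated measures. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")
Matrix = List[List[float]]


def marker_distance(a: Set[str], b: Set[str]) -> float:
    """Assumed symbolic-marker criterion: Jaccard distance between the marker sets
    (etymology, use, language level, domain, ...) of two definitions."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(rows: Sequence[T], cols: Sequence[T],
                    dist: Callable[[T, T], float]) -> Matrix:
    """One criterion -> one evaluation of every (row, column) pair -> one matrix."""
    return [[dist(r, c) for c in cols] for r in rows]


def combine(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Assumed aggregation: weighted mean of the per-criterion distance matrices."""
    total = sum(weights)
    n, m = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * mat[i][j] for mat, w in zip(matrices, weights)) / total
             for j in range(m)] for i in range(n)]
```

Consider the two ‘avocat’ definitions marked ‘ahuacatl’ and ‘advocatus’. Their etymology markers give distance 1, so this criterion alone keeps them apart even when their lexical contents overlap.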

Page 14

We would like to designate meanings

‘botte’

Correct results in many cases: 90 % for nouns, 70 % for verbs; adjectives still to be done

Problems with very strong polysemy (vagueness, continuity between meanings), e.g. the support verb ‘prendre’ (take), …

To study: increasing the number of clusters

Clustering 5/5: Results

Page 15

Sense Naming

Objective

To give the system some capacity to « talk about a sense »

Page 16

• Dictionary independent

• Interface (man-system & system-system)

• A new lexical source: looping :-)

Semantic annotation

La frégate/vaisseau/ naviguait à travers les océans (‘The frigate/vessel/ sailed across the oceans’)

La frégate/oiseau/ planait à travers les nues en poussant son cri incomparable (‘The frigate/bird/ soared through the clouds, uttering its incomparable cry’)

Sense Naming 1/10: Properties

Page 17

1. Extraction

2. Validation (bijection) and dispatching of polysemous bags

3. Evaluation of candidates: ordering them and extracting the most appropriate ones

Sense Naming 2/10: Procedure

Page 18

• Extraction attached to a meaning
  – morpho-syntactic analysis of the definition
  – extraction of markers: « anc. », « méd. », …
  – extraction from unstructured or semi-structured data (XML, …)

‘frégate’ : [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet]
(‘frigate’: [feminine noun] [archaic] In the 15th century, a large half-decked boat rigged with two lateen sails on a yard, providing the link between the ports and the galley squadrons.)

• Extraction from polysemous bags
  – word lists (e.g. the synonym list of the Université de Caen)

Sense Naming 3/10: Extraction

Ploux, Victori

Example: ‘botte’ = chaussure, bottillon, coup, attaque, amas, bouquet, … (boot, ankle boot, thrust, attack, heap, bunch, …)

Page 19

Bijection: being able to re-associate the proper meaning

$f : (\text{term}, \text{sense}) \to (\text{term}, \text{annotation})$

$f^{-1} : (\text{term}, \text{annotation}) \to (\text{term}, \text{sense})$
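The bijection can be sketched as two dictionaries, together with a check that no annotation is reused for two senses of the same term. If one were reused, $f^{-1}$ would not exist. Sense identifiers such as `frégate.1` and the function name are hypothetical.

```python
from typing import Dict, Tuple

TermSense = Tuple[str, str]       # (term, sense identifier)
TermAnnotation = Tuple[str, str]  # (term, annotation), e.g. ("frégate", "oiseau")


def build_bijection(names: Dict[TermSense, str]) -> Tuple[Dict[TermSense, TermAnnotation],
                                                         Dict[TermAnnotation, TermSense]]:
    """Build f: (term, sense) -> (term, annotation) and its inverse f^-1.

    Raises ValueError when two senses of one term receive the same annotation,
    because the original sense could then no longer be recovered."""
    f: Dict[TermSense, TermAnnotation] = {}
    f_inv: Dict[TermAnnotation, TermSense] = {}
    for (term, sense), annotation in names.items():
        key = (term, annotation)
        if key in f_inv:
            raise ValueError(f"annotation {annotation!r} is ambiguous for {term!r}")
        f[(term, sense)] = key
        f_inv[key] = (term, sense)
    return f, f_inv
```

As an example, call `build_bijection({("frégate", "frégate.1"): "vaisseau", ("frégate", "frégate.2"): "oiseau"})`. The returned `f_inv` then maps `("frégate", "oiseau")` back to `("frégate", "frégate.2")`. This is how an annotated text such as ‘frégate/oiseau/’ can be resolved to its sense.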

Sense Naming 4/10

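• A candidate associated with a sense should be closer to its own sense than to any other

• Unattached candidates are associated with the closest meaning

• A candidate should not appear in the definition of a competing sense

For a candidate $a$ associated with the sense $s_i$, where $D_A$ is the angular distance: $\forall s_j \neq s_i,\; D_A(a, s_i) \le D_A(a, s_j)$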

Validation
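The three validation rules can be sketched as one function. The sketch assumes that every sense and every candidate has a conceptual vector, and that every competing sense has a definition text. The function names, and the word-boundary matching used for "appears in a definition", are assumptions.

```python
import math
import re
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def angle(x: Vector, y: Vector) -> float:
    """Angular (thematic) distance D_A between two conceptual vectors, in radians."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def validate_candidate(candidate: str, cand_vec: Vector, own_sense: Optional[str],
                       senses: Dict[str, Vector],
                       definitions: Dict[str, str]) -> Optional[str]:
    """Return the sense the candidate may name, or None if it is rejected."""
    closest = min(senses, key=lambda s: angle(cand_vec, senses[s]))
    # An unattached candidate is associated with the closest meaning.
    target = closest if own_sense is None else own_sense
    # D_A(a, s_i) <= D_A(a, s_j) for every competing sense s_j.
    if angle(cand_vec, senses[target]) > angle(cand_vec, senses[closest]):
        return None
    # The candidate must not occur in the definition of a competing sense.
    pattern = r"\b" + re.escape(candidate.lower()) + r"\b"
    for sense, text in definitions.items():
        if sense != target and re.search(pattern, text.lower()):
            return None
    return target
```

Comparing against the single closest sense is enough to check the inequality against every $s_j$, since the closest sense minimizes $D_A$.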

Page 20

• Extraction grade

• Evaluating the capacity to disambiguate (to distinguish a sense from all others)

• Evaluating the capacity to associate (cognitive cost reduction)

Sense Naming 5/10: Evaluation

Prince

Page 21

‘frégate’ : [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet]

[Figure: syntactic analysis of the definition (Sujet, GV, COD, CC) and the candidate terms extracted from it, each shown with an extraction grade between 1 and 7: XVe, grande barque demi-pontée, barque demi-pontée, gréant, voiles latines, antenne, …]

Sense Naming 6/10: Extraction grade

Page 22

Absolute margin: $M_A = d_2 - d_1$

Relative margin: $M_R = \frac{M_A}{d_1}$

Risk of ‘non-sens’: $R_{NS} = \frac{d_3}{M_R}$

Sense Naming 7/10

Disambiguation capacity 1/2

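[Figure: the senses of ‘frégate’ (w.1 oiseau, w.2 navire ancien, w.3 navire moderne) and of the candidate name ‘vaisseau’ (t.11 navire, t.12 sanguin), with the angular distances between them; d1 = 0.3, d2 = 0.4, d3 = 0.2]

$M_A = d_2 - d_1 = 0.4 - 0.3 = 0.1$

$M_R = 0.1 / 0.3 \approx 0.33$

$R_{NS} = 0.2 / 0.33 \approx 0.6$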

Page 23

Sense Naming 8/10

Disambiguation capacity 2/2

[Figure, left: the ‘frégate’ / ‘vaisseau’ example of the previous slide (d1 = 0.3, d2 = 0.4, d3 = 0.2, giving $M_A = 0.1$, $M_R \approx 0.33$, $R_{NS} \approx 0.6$)]

[Figure, right: the senses of ‘frégate’ (w.1 oiseau, w.2 navire ancien, w.3 navire moderne) and of the candidate name ‘voilier’ (t.11 oiseau, t.12 navire); d1 = 0.25, d2 = 0.29, d3 = 0.65]

$M_A = d_2 - d_1 = 0.29 - 0.25 = 0.04$

$M_R = 0.04 / 0.25 = 0.16$

$R_{NS} = 0.65 / 0.16 \approx 4$
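The sketch below computes the disambiguation measures for the two examples above. It only assumes d1 > 0, and it returns an infinite risk when the margin is zero. The names are hypothetical.

```python
import math
from typing import NamedTuple


class Disambiguation(NamedTuple):
    absolute_margin: float  # M_A = d2 - d1
    relative_margin: float  # M_R = M_A / d1
    risk_non_sense: float   # R_NS = d3 / M_R


def disambiguation_capacity(d1: float, d2: float, d3: float) -> Disambiguation:
    """Margins and risk of 'non-sens' from the three distances d1, d2, d3."""
    if d1 <= 0:
        raise ValueError("d1 must be positive")
    m_a = d2 - d1
    m_r = m_a / d1
    risk = math.inf if m_r == 0 else d3 / m_r
    return Disambiguation(m_a, m_r, risk)


for name, d in (("vaisseau", (0.3, 0.4, 0.2)), ("voilier", (0.25, 0.29, 0.65))):
    r = disambiguation_capacity(*d)
    print(f"{name}: M_A={r.absolute_margin:.2f} M_R={r.relative_margin:.2f} "
          f"R_NS={r.risk_non_sense:.2f}")
```

For ‘voilier’ the exact risk is about 4.06, which the slide rounds to 4. That is roughly seven times the risk for ‘vaisseau’, consistent with ‘voilier’ also having a bird sense.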

Page 24

Survey:

- collocations (botte de paille (bale of straw), …)

- co-occurrences (Tintin / Milou)

- synonyms and hyperonyms (manger (eat) → se nourrir (feed); mouche (fly) → insecte → animal)

- domain / context for technical terms (médecine, architecture, agriculture, sport, …)

Conducted on 13 terms totalling 38 definitions: 134 answers

Sense Naming 9/10: Cognitive cost

Church Daille Véronis

Page 25

‘botte’

- the multi-criteria approach seems well suited
- easily extensible
- high precision
- enhancements needed: meta-language processing, implementation of further criteria (associative memory, lexical functions)
- synthesis grammar (botte/secret/ vs. botte/secrète/)

Useful for multilingual lexical databases

Sense Naming 10/10: Results

Mel’cuk, Schwab

Page 26

Multilingual lexical database: some terms are not lexicalized in some languages

Objective: lexicalize these terms

Lexical Augmentation

Page 27

[Figure: French terms (abats, déchet, abats de volaille, abats de bœuf, abats de porc), English terms (offal, giblets, refuse, scrap, beef offal, pork offal) and the interlingual acceptions linking them (offal.1, offal.2, giblets, refuse)]

Lexical Augmentation 1/2: Papillon project (Boitet, Lepage, Mangeot-Lerebours, Sérasset)

Page 28

• Extraction from definitions and sense names (dictionary glosses): abats = {‘porc’, ‘volaille’, ‘bœuf’, …}

• Patterns: ‘abats de volaille’, ‘abats en volaille’, …

• Pattern validation with co-occurrences: relative number of hits in Google

• Difficulties: ‘dog meat’ → ‘viande pour chien’ (meat for dogs) or ‘viande de chien’ (meat of dogs)?

Lexical Augmentation 2/2: Procedure
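Here is a sketch of the pattern generation and validation steps. The hit counter is passed in as a function, because no particular search API is assumed; the slides used Google hit counts. Reading "relative number of hits" as "share of hits among the variants of one compound" is an assumption, as is the `min_share` threshold. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PREPOSITIONS = ("de", "en")  # the variants shown on the slide: 'abats de X', 'abats en X'


def candidate_patterns(head: str, modifiers: Sequence[str],
                       preps: Sequence[str] = PREPOSITIONS) -> Dict[str, List[str]]:
    """For each modifier extracted from the glosses, list the compounds 'head prep modifier'."""
    return {mod: [f"{head} {p} {mod}" for p in preps] for mod in modifiers}


def validate_patterns(patterns: Dict[str, List[str]],
                      hits: Callable[[str], int],
                      min_share: float = 0.5) -> List[Tuple[str, float]]:
    """For each modifier, keep the variant with the largest share of hits among its
    variants, provided that share reaches min_share. `hits` stands in for a
    search-engine hit count; no real search service is called here."""
    kept = []
    for variants in patterns.values():
        counts = [hits(v) for v in variants]
        total = sum(counts)
        if total == 0:  # no evidence at all: leave the compound unlexicalized
            continue
        best = max(range(len(variants)), key=lambda i: counts[i])
        share = counts[best] / total
        if share >= min_share:
            kept.append((variants[best], share))
    return kept
```

A relative share, rather than a raw count, keeps frequent modifiers such as ‘porc’ from winning just because they are common. It does not settle cases like ‘viande pour chien’ / ‘viande de chien’, where both variants are attested with different meanings.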

Page 29

Clustering
• promising results: manual evaluation on 100 difficult terms, 70 % correct clusters, 30 % bad assignments (locutions)
• problem: increasing the number of clusters (maturing of the basic clusters)

Sense Naming
• complementary with conceptual vectors
• good precision: manual evaluation 90 % relevant terms, automatic evaluation 70 % (angular distance)
• towards a synthesis grammar (botte/secret/ → botte/secrète/)

Future work
• more criteria (associative memory, more lexical functions)
• enhanced definition analysis (meta-language)

Conclusion

Page 30

Theoretical
• formalization of the ‘disambiguation capacity’ and of the ‘risk of non-sense’
• formalization of annotation in lexical semantics
• proposal of a generic similarity measure between definitions

Practical
• implementation as agents
• categorization, naming (web services)
• lexical augmentation (in progress)

Dissemination
• a poster at RECITAL’2003 (Batz sur Mer, 10-14 June 2003)
• a paper at Papillon’2003 (Sapporo, 2-6 July 2003)
• a submission to RFIA’2004

Contribution

Page 31

Thank you