Mathieu LAFOURCADE, Fabien JALABERT. Definition Clustering, Sense Naming & Lexical Augmentation.
Definition Clustering, Sense Naming & Lexical Augmentation
Fabien JALABERT
Mathieu LAFOURCADE
Natural Language Processing
• Lexical semantics, WSD, document indexing
• Dictionary construction and vectorization
  Problem: extracting the definition meta-language
  Example: ‘cannibale’ = ‘qui mange l’Homme en parlant de l’Homme’ (one who eats humans, said of a human)
  Themes: homme (human), manger (to eat), rhétorique (rhetoric)
• Multi-source approach: noise reduction
  Problem: the atomic element is the definition, and definition ≠ sense
• Objectives
  - clustering definitions to obtain senses
  - naming these senses
Study context 1/2
[Figure: processing chain for a term T]
Multi-source base: def 1, def 2, def 3 (Source 1); def 1, def 2 (Source 2); def 1, def 2 (Source 3)
Clustering
‘Acception’ or sense base:
  Sense 1 (Category 1): def 1 - Source 1
  Sense 2: def 2 - Source 1, def 2 - Source 2, def 1 - Source 3
  Sense 3: def 3 - Source 1, def 1 - Source 2, def 2 - Source 3
Sense naming: each sense receives a name (Sense 1 - Name, Sense 2 - Name, ...)
Re-injection as a new lexical source
Terms: t1, t2, t3, t4, t5, t6, ..., tn
Study context 2/2
• Model, Construction, Organization
• Definition Clustering
• Sense Naming
• Lexical Augmentation
• Results
Summary
• An idea = a vector
• A vector component = a primitive as defined in a thesaurus
  - Thesaurus Larousse: 873 concepts
  - Concepts are inter-related
Generator space
• A definition → a vector
Conceptual Vector Model 1/2
[Figure: vector of ‘frégate’ over the primitives arme (weapon), transports maritimes et fluviaux (sea and river transport), oiseau (bird)]
Most activated primitives for ‘frégate’: (oiseau 6134) (transports maritimes et fluviaux 5644) (arme 4891) …
Salton, Deerwester
Chauché, Lafourcade
Terms thematically close to ‘frégate’: (destroyer 0.2246) (youyou 0.2267) (voilier 0.2268) (contre-torpilleur 0.2274) (chlamydère 0.2276) (oiseau-jardinier 0.2295) (trois-mâts 0.233) …
Terms thematically close to ‘frégate/oiseau/’: (oiseau-jardinier 0.1237) (plumeur 0.1319) (goglu 0.136) (travailleur 0.136) (chlamydère 0.1385) (penne 0.141) (Galliformes 0.1422) (agami 0.1428) …
Terms thematically close to ‘frégate/bateau/’: (démâtage 0.1604) (dégréer 0.1676) (naval 0.1718) (bateau-piège 0.1774) (bateau-vanne 0.1821) (batelet 0.1824) …
Conceptual Vector Model 2/2
Thematic distance = angle between two vectors x and y
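The thematic distance above can be sketched in a few lines. This is a minimal illustration, assuming vectors are plain lists of numbers; the function name `angular_distance` is ours, not the authors'. It computes the angle as the arc cosine of the normalized dot product.

```python
import math


def angular_distance(x, y):
    """Return the angle, in radians, between two vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the angle is undefined for a null vector")
    # Clamp to [-1, 1] so rounding errors never push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


if __name__ == "__main__":
    print(angular_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
    print(angular_distance([1.0, 2.0], [2.0, 4.0]))  # collinear vectors
```

With non-negative components, as in activation vectors over thesaurus concepts, the angle stays between 0 (same themes) and π/2 (no shared theme).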
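SYGMART
Input: la petite brise la glace
Lemmas: le petit briser le glace
Labels: GN - Gouv - adj | GV - Gouv | GN - Gouv - nf
[Figure: parse tree of the ambiguous sentence, nodes numbered 1 to 23]
1
  2 PHAMBG
    3 PH (reading "the little girl breaks the ice")
      4 GN: 5 le, 6 petit
      7 GV: 8 briser
        9 GN: 10 le, 11 glace
      12 .
    13 PH (reading "the light breeze chills her")
      14 GN: 15 le, 16 GA (17 petit), 18 brise
      19 GV: 20 GN (21 le), 22 glacer
      23 .
Definition Vector Computation
Chauché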
Learning agents: SYGMART, computation of vectors from definitions, synonymy, antonymy, …
Multi-Agent Organization: Double loop
Lecerf Schwab
Endogenous loop
Exogenous loop
Other agents (society)
Agent
Grouping definitions into senses
Clustering
Objective
• Deep analysis using several criteria
• No training (but enhancement through the exogenous loop)
• Frontier between senses and definitions
  - Centroid approach
  - Heuristics (preferences)
    - number of clusters = maximum number of definitions found in any one dictionary
    - two definitions from the same source go to two different clusters
Clustering 1/5: Strategy
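[Figure: senses of ‘botte’ and the distinctions that make clustering difficult]
- Boot: high-topped footwear, whatever its use (‘chaussure montante’); distinction between "chaussure élégante" (elegant shoe) and "chaussure tout-terrain" (all-terrain shoe)
- Thrust: a blow struck, in fencing or not (‘coup porté’); distinction between "le coup en escrime" (the fencing thrust) and "l'attaque surprise" (the surprise attack)
- Bundle: a gathering of plants (‘réunion de végétaux’)
Clustering 2/5: Difficulty
‘botte’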
• Source-by-source iteration until a minimum-value distribution is obtained
• Minimum-value assignment between a source's definitions and the clusters, from a distance matrix: Hungarian method, O(n³)
Clustering 3/5: Algorithm 1/2
Kuhn; Ford, Fulkerson
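The assignment step can be illustrated on a small distance matrix. The sketch below is not the Hungarian method named on the slide: it swaps in an exhaustive search over permutations, which returns the same optimum but only scales to a handful of definitions. The function name and the toy matrices are assumptions made for illustration. It relies on the heuristic stated earlier that two definitions from the same source go to different clusters.

```python
from itertools import permutations


def assign_definitions(cost):
    """Assign each definition of one source to a distinct cluster.

    cost[i][j] is the distance between definition i and cluster j.
    Returns (assignment, total), where assignment[i] is the cluster index
    chosen for definition i and total is the summed distance.
    Brute-force stand-in for the Hungarian method: same result, O(n!) time.
    """
    n_defs = len(cost)
    n_clusters = len(cost[0])
    if n_defs > n_clusters:
        raise ValueError("need at least as many clusters as definitions")
    best_total = float("inf")
    best_assignment = None
    for perm in permutations(range(n_clusters), n_defs):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total = total
            best_assignment = perm
    return list(best_assignment), best_total


if __name__ == "__main__":
    # Toy distances, invented for the example.
    matrix = [[0.1, 0.2],
              [0.1, 0.9]]
    print(assign_definitions(matrix))
```

In the toy matrix, a greedy choice (definition 0 takes cluster 0) would force definition 1 into cluster 1 at a high cost; the global optimum crosses the two, which is why an optimal assignment method is used rather than a per-definition minimum.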
• For each criterion: one evaluation, one distance matrix
• Criteria
  - Comparison of the lexical contents of definitions (term frequency, co-occurrences, etc.)
  - Angular distance
  - Symbolic markers
    - morphology
    - etymology (‘avocat’: ‘ahuacatl’ / ‘advocatus’)
    - use (‘vieux’, ‘ancien’, ‘poétique’, …)
    - language level (‘argot’, ‘familier’, …)
    - domain (‘médecine’, ‘zoologie’, …)
Clustering 4/5: Algorithm 2/2
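The "one criterion, one distance matrix" idea can be sketched as follows. Everything here is an assumption for illustration: the record layout (`text`, `markers`), the function names, and the use of a plain Jaccard distance, which is a much simpler stand-in for the term-frequency and co-occurrence comparison mentioned on the slide. How the matrices are then combined is not stated in the slides and is left out.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def lexical_distance(def_a, def_b):
    """Hypothetical criterion: compare the word sets of two definitions."""
    words_a = set(def_a["text"].lower().split())
    words_b = set(def_b["text"].lower().split())
    return jaccard_distance(words_a, words_b)


def marker_distance(def_a, def_b):
    """Hypothetical criterion: compare symbolic markers (use, level, domain)."""
    return jaccard_distance(set(def_a["markers"]), set(def_b["markers"]))


# One entry per criterion; each yields its own distance matrix.
CRITERIA = {"lexical": lexical_distance, "markers": marker_distance}


def matrices_per_criterion(defs_a, defs_b):
    """Return {criterion name: matrix}, matrix[i][j] = distance(defs_a[i], defs_b[j])."""
    return {
        name: [[distance(a, b) for b in defs_b] for a in defs_a]
        for name, distance in CRITERIA.items()
    }


if __name__ == "__main__":
    source_1 = [{"text": "chaussure montante", "markers": []}]
    source_2 = [{"text": "chaussure élégante", "markers": ["familier"]}]
    print(matrices_per_criterion(source_1, source_2))
```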
We would like to designate meanings
‘botte’
• Correct results in many cases: 90 % for nouns, 70 % for verbs; adjectives remain to be done
• Problems with very strong polysemy: vagueness, continuity between meanings, support verbs (‘prendre’, …)
• To study: increasing the number of clusters
Clustering 5/5: Results
Sense Naming
Objective
To give the system some capacity to "talk about a sense"
• Dictionary independent
• Interface (man-system and system-system)
• A new lexical source: looping :-)
Semantic annotation
La frégate/vaisseau/ naviguait à travers les océans
La frégate/oiseau/ planait à travers les nues en poussant son cri incomparable
Sense Naming 1/10: Properties
1. Extraction
2. Validation and dispatching of polysem bags (bijection)
3. Evaluation of candidates: ordering and extracting the most appropriate ones
Sense Naming 2/10: Procedure
• Extraction attached to a meaning
  - Morpho-syntactic analysis of the definition
  - Extraction of markers: « anc. », « méd. », …
  - Extraction from unstructured or semi-structured data (XML…)
‘frégate’ : [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet]
• Extraction from polysem bags
  - Word list (like the synonym list of Université de Caen : )
Sense Naming 3/10: Extraction
Ploux, Victorri
ex: ‘botte’ = chaussure, bottillon, coup, attaque, amas, bouquet, …
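The marker extraction step can be illustrated on the ‘frégate’ entry quoted above. This is a sketch under stated assumptions: the entry layout (quoted term, colon, leading bracketed markers, definition text, trailing bracketed source) is inferred from that single example, and `parse_entry` is a hypothetical helper, not part of the authors' system.

```python
import re

# Assumed layout: ‘term’ : [marker] [marker] definition text [source]
ENTRY = re.compile(r"^\s*[‘'](?P<term>[^’']+)[’']\s*:\s*(?P<rest>.*)$", re.S)
LEADING_MARKER = re.compile(r"\[([^\]]+)\]\s*")
TRAILING_SOURCE = re.compile(r"\s*\[([^\]]+)\]\s*$")


def parse_entry(entry):
    """Split a dictionary entry into term, markers, definition text and source."""
    match = ENTRY.match(entry)
    if match is None:
        raise ValueError("entry does not follow the assumed layout")
    rest = match.group("rest").strip()

    # Peel off the bracketed markers that precede the definition text.
    markers = []
    marker = LEADING_MARKER.match(rest)
    while marker:
        markers.append(marker.group(1))
        rest = rest[marker.end():]
        marker = LEADING_MARKER.match(rest)

    # A final bracket, if any, is taken to name the source dictionary.
    source = None
    tail = TRAILING_SOURCE.search(rest)
    if tail:
        source = tail.group(1)
        rest = rest[:tail.start()]

    return {
        "term": match.group("term"),
        "markers": markers,
        "text": rest.strip(),
        "source": source,
    }


if __name__ == "__main__":
    example = ("‘frégate’ : [nf] [ancien] Au XVe s., grande barque "
               "demi-pontée gréant deux voiles latines sur antenne. [Club Internet]")
    print(parse_entry(example))
```

Markers such as [ancien] can then serve both as symbolic clustering criteria and as candidate annotations for the sense.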
Bijection: being able to re-associate the proper meaning
ƒ : (term, sense) → (term, annotation)
ƒ⁻¹ : (term, annotation) → (term, sense)
Sense Naming 4/10
• A candidate associated with a sense should be closer to its own sense than to any other:
  $\forall s_j \neq s_i,\; D_A(a, s_i) \le D_A(a, s_j)$
• Unattached candidates are associated with the closest meaning
• A candidate should not be present in a competing definition
Validation
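The three validation rules can be sketched as a single filter. This is an illustration only: the data layout (dictionaries keyed by candidate word and by sense), the function name `validate`, and all distances in the example are invented; in the slides the distance would be the angular distance D_A between the candidate and each sense.

```python
def validate(candidates, distances, definition_words):
    """Keep the candidates that satisfy the validation rules.

    candidates:        {word: sense it was extracted for, or None if unattached}
    distances:         {word: {sense: distance from the word to that sense}}
    definition_words:  {sense: set of words used in that sense's definitions}
    Returns {sense: [accepted words]}.
    """
    accepted = {}
    for word, sense in candidates.items():
        dist = distances[word]
        if sense is None:
            # Unattached candidates go to the closest sense.
            sense = min(dist, key=dist.get)
        elif any(dist[sense] > d for other, d in dist.items() if other != sense):
            # Attached candidates must be at least as close to their own sense
            # as to every other sense.
            continue
        # A candidate must not appear in a competing definition.
        if any(word in words
               for other, words in definition_words.items() if other != sense):
            continue
        accepted.setdefault(sense, []).append(word)
    return accepted


if __name__ == "__main__":
    # Toy values, invented for the example.
    candidates = {"navire": "bateau", "plume": None, "voile": "oiseau", "mer": "bateau"}
    distances = {
        "navire": {"oiseau": 0.9, "bateau": 0.3},
        "plume": {"oiseau": 0.2, "bateau": 0.8},
        "voile": {"oiseau": 0.7, "bateau": 0.4},
        "mer": {"oiseau": 0.6, "bateau": 0.5},
    }
    definition_words = {"oiseau": {"mer", "oiseau"}, "bateau": {"navire"}}
    print(validate(candidates, distances, definition_words))
```

In this toy run ‘voile’ is dropped because it is closer to a sense other than its own, and ‘mer’ because it occurs in the competing definition.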
• Extraction grade
• Evaluating the capacity to disambiguate (to distinguish a sense from all others)
• Evaluating the capacity to associate: cognitive cost reduction
Sense Naming 5/10: Evaluation
Prince
‘frégate’ : [nf] [ancien] Au XVe s., grande barque demi-pontée gréant deux voiles latines sur antenne et assurant la liaison entre les ports et les escadres de galère. [Club Internet]
[Figure: syntactic analysis of the definition (Sujet, GV, COD, CC) over "au XVe, grande barque demi-pontée gréant deux voiles latines sur antennes …", with an extraction grade under each candidate]
Candidates: XVe grande barque demi-pontée barque demi-pontée
Grades, in the same order: (6) (2) (1) (3) (1)
Candidates: gréant voiles latines voiles latines antenne
Grades, in the same order: (4) (5) (6) (5) (7)
Sense Naming 6/10: Extraction grade
absolute margin: $M_A = d_2 - d_1$
relative margin: $M_R = \dfrac{M_A}{d_1}$
risk of ‘non-sens’ (nonsense): $R_{NS} = \dfrac{d_3}{M_R}$
Sense Naming 7/10
Disambiguation capacity 1/2
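Example: ‘frégate’ / ‘vaisseau’
Senses of ‘frégate’: w.1 (oiseau), w.2 (navire ancien), w.3 (navire moderne)
Senses of ‘vaisseau’: t.11 (navire), t.12 (sanguin)
[Figure: distances shown between these senses: 0.95; 1.2; 0.8; 0.85; d1 = 0.3; d2 = 0.4; d3 = 0.2]
Ma = d2 - d1 = 0.1
Mr = 0.1 / d1 = 0.33
Rns = d3 / 0.33 = 0.6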
Sense Naming 8/10
Disambiguation capacity 2/2
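Example: ‘frégate’ / ‘voilier’
Senses of ‘frégate’: w.1 (oiseau), w.2 (navire ancien), w.3 (navire moderne)
Senses of ‘voilier’: t.11 (oiseau), t.12 (navire)
[Figure: distances shown between these senses: 0.3; 0.7; 0.72; 0.72; d1 = 0.25; d2 = 0.29; d3 = 0.65]
Ma = d2 - d1 = 0.04
Mr = 0.04 / d1 = 0.16
Rns = d3 / 0.16 = 4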
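The margin arithmetic of the two examples can be reproduced directly. This is a small sketch: the function name is ours, and d1, d2 and d3 are taken as given from the figures, since the slide text does not spell out which pair of senses each distance measures. The absolute margin is computed as d2 - d1, which is the reading that yields the positive values shown on the slides.

```python
def disambiguation_margins(d1, d2, d3):
    """Return (absolute margin, relative margin, risk of non-sens).

    M_A = d2 - d1, M_R = M_A / d1, R_NS = d3 / M_R,
    with d1, d2, d3 read off the slide figures.
    """
    if d1 == 0:
        raise ValueError("relative margin is undefined when d1 is zero")
    absolute = d2 - d1
    relative = absolute / d1
    if relative == 0:
        raise ValueError("risk is undefined when the relative margin is zero")
    risk = d3 / relative
    return absolute, relative, risk


if __name__ == "__main__":
    # Values from the two slide examples.
    for name, (d1, d2, d3) in {
        "frégate / vaisseau": (0.3, 0.4, 0.2),
        "frégate / voilier": (0.25, 0.29, 0.65),
    }.items():
        m_a, m_r, r_ns = disambiguation_margins(d1, d2, d3)
        print(f"{name}: Ma={m_a:.2f} Mr={m_r:.2f} Rns={r_ns:.1f}")
```

The comparison shows why ‘vaisseau’ scores better than ‘voilier’ here: its relative margin is about twice as large, and its risk value is several times smaller.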
Survey
- collocations (botte de paille, …)
- co-occurrences (Tintin, Milou)
- synonyms and hypernyms (manger → se nourrir, mouche → insecte → animal)
- domain / context for technical terms (médecine, architecture, agriculture, sport, …)
Done for 13 terms totalling 38 definitions: 134 answers
Sense Naming 9/10: Cognitive cost
Church, Daille, Véronis
‘botte’
- the multi-criteria approach seems well suited
- easily extensible
- high precision
- enhancement needed for meta-language processing
- more criteria to implement (associative memory, lexical functions)
- synthesis grammar (botte/secret/ vs. botte/secrète/)
Useful for multilingual lexical databases
Sense Naming 10/10: Results
Mel’cuk, Schwab
Multilingual Lexical Database: some terms are not lexicalized in some languages
Objective: lexicalize these terms
Lexical Augmentation
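[Figure: excerpt of the multilingual base in three columns, FRANCAIS, ACCEPTIONS, ENGLISH, with links between terms and acceptions]
French terms: abats, déchet, abats de volaille, abats de bœuf, abats de porc
Acceptions: abats, giblets, offal.1, offal.2, refuse
English terms: offal, giblets, refuse, scrap, beef offal, pork offal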
Lexical Augmentation 1/2: Papillon project
Boitet, Lepage, Mangeot-Lerebours, Sérasset
• Extraction from the definition and the sense name (dictionary glosses): abats = {‘porc’, ‘volaille’, ‘bœuf’, …}
• Patterns: ‘abats de volaille’, ‘abats en volaille’, …
• Pattern validation with co-occurrences: relative number of hits in Google
• Difficulties: ‘dog meat’ → ‘viande pour chien’ / ‘viande de chien’?
Lexical Augmentation 2/2: Procedure
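The pattern step can be sketched as generate-then-rank. The two templates come from the slide examples; everything else is an assumption: the function names, the `hit_count` callable (supplied by the caller, since no search-engine API is specified in the slides), and the counts in the example, which are placeholders and not measured values.

```python
# Templates taken from the slide examples ('abats de volaille', 'abats en volaille').
PATTERNS = ("{head} de {modifier}", "{head} en {modifier}")


def candidate_phrases(head, modifier, patterns=PATTERNS):
    """Instantiate every pattern for one head term and one modifier."""
    return [p.format(head=head, modifier=modifier) for p in patterns]


def rank_by_relative_hits(phrases, hit_count):
    """Rank phrases by their share of the total number of hits.

    hit_count is any callable mapping a phrase to a non-negative count.
    Returns [(phrase, share)], best first; phrases with no hits are dropped.
    """
    counts = {phrase: hit_count(phrase) for phrase in phrases}
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(phrase, count / total) for phrase, count in ranked if count > 0]


if __name__ == "__main__":
    # Placeholder counts, invented for the example.
    fake_counts = {"abats de volaille": 90, "abats en volaille": 10}
    phrases = candidate_phrases("abats", "volaille")
    print(rank_by_relative_hits(phrases, lambda p: fake_counts.get(p, 0)))
```

Relative counts alone do not settle the ‘dog meat’ case noted above, where both ‘viande pour chien’ and ‘viande de chien’ are frequent but mean different things.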
Clustering
• Promising results: manual evaluation on 100 difficult terms, 70 % proper clusters, 30 % incorrect assignments (locutions)
• Problem: increasing the number of clusters, maturing of the basic clusters
Sense Naming, complementary with conceptual vectors
• Good precision: manual evaluation, 90 % of pertinent terms; automatic evaluation, 70 % (angular distance)
• Towards a synthesis grammar: botte/secret/ → botte/secrète/
Future work
• More criteria (associative memory, more lexical functions)
• Enhance definition analysis (meta-language)
Conclusion
Theoretical
- formalization of the ‘disambiguation capacity’ and of the ‘risk of non-sens’
- formalization of annotation in lexical semantics
- proposal of a generic similarity measure between definitions
Practical
- implementation as agents
- categorization, naming (services on the Web)
- lexical augmentation (in progress)
Dissemination
- a poster at RECITAL’2003 (Batz-sur-Mer, 10-14 June 2003)
- a paper at Papillon’2003 (Sapporo, 2-6 July 2003)
- submission to RFIA’2004
Contribution
Thank you