Word Sense Disambiguation

German Rigau i Claramunt
http://www.lsi.upc.es/~rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

WSD
Outline

– Setting
– Unsupervised WSD systems
– Supervised WSD systems
– Using the Web and EWN for WSD

Using the Web and EWN for WSD
Setting

Word Sense Disambiguation
– is the problem of assigning the appropriate meaning (sense) to a given word in a text
– "WSD is perhaps the great open problem at the lexical level of NLP" (Resnik & Yarowsky 97)
– WSD resolution would allow:
  • acquisition of subcategorisation structure: parsing
  • improving existing Information Retrieval
  • Machine Translation
  • Natural Language Understanding

Setting

Example
– Senses (WordNet 1.5):
  • age 1: the length of time something (or someone) has existed; "his age was 71"; "it was replaced because of its age"
  • age 2: a historic period; "the Victorian age"; "we live in a litigious age"
– DSO Corpus examples (Ng 96):
  • He was mad about stars at the >> age 1 << of nine .
  • About 20,000 years ago the last ice >> age 2 << ended .

Setting

Knowledge-Driven WSD (Unsupervised)
– knowledge-based WSD
– 100% coverage
– 55% accuracy (SensEval-1)
– No training process
– Large-scale lexical knowledge resources
  • WordNet
  • MRDs
  • Thesauri

Setting

Data-Driven (Supervised)
– corpus-based WSD
– statistical WSD
– Machine-Learning WSD
– no full coverage
– 75% accuracy (SensEval-1)
– Training process
  • learning from large amounts of sense-annotated corpora
  • (Ng 97): effort of 16 man/year per year per language

Unsupervised Word Sense Disambiguation Systems





Unsupervised WSD Systems
Outline

– Setting
– Knowledge-driven WSD methods
  • MRDs
  • Thesauri & Corpus
  • LKBs
    – LKBs & Conceptual Distance
    – LKBs & Conceptual Density
  • LKBs & Corpus
– Experiments: Genus Sense Disambiguation
– Future Work

Setting

Knowledge-Driven (Unsupervised)
+ No need for large annotated corpora
+ Tested on unrestricted domains (words and senses)
- Worse results

MRDs

Lesk Method
– (Lesk 86)
  • counts the word overlap between the context and the senses of the word
– (Cowie et al. 92)
  • simulated annealing to overcome the combinatorial explosion, using LDOCE
– (Wilks & Stevenson 97)
  • simulated annealing
  • 57% accuracy at the sense level
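The overlap counting behind the Lesk method can be sketched in a few lines. This is a minimal illustration only: the function name `lesk`, the whitespace tokenisation and the two-entry dictionary `AGE_SENSES` (loosely based on the WordNet 1.5 glosses of "age") are assumptions made for the example, not part of (Lesk 86).

```python
def lesk(context_words, sense_definitions):
    """Pick the sense whose definition shares the most words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        # Overlap = number of distinct words common to context and definition.
        overlap = len(context & set(definition.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical mini-dictionary, for illustration only.
AGE_SENSES = {
    "age_1": "the length of time something or someone has existed",
    "age_2": "a historic period",
}

print(lesk("a historic period of ice".split(), AGE_SENSES))
```

No stop-word filtering is applied here, so function words such as "of" also count towards the overlap; a realistic system would probably remove them.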

MRDs

Cooccurrence Word Vectors
– (Wilks et al. 93)
  • word-context vectors from LDOCE
  • testing a large set of relatedness functions
  • 13 senses of the word "bank"
  • 45% accuracy
– (Rigau et al. 97)
  • (Noun) Genus Sense Disambiguation
  • 60% accuracy

MRDs

371,616 connections (sample: Spanish word pairs involving queso, "cheese")

11.8004   9.8   16   elaborado     queso       35   113
10.8938   8.0   23   pasta         queso      178   113
10.4846   7.5   25   leche         queso      274   113
10.2483   9.2   13   oveja         queso       45   113
 9.1513   7.6   16   queso         sabor      113   160
 7.4956   8.3    8   queso         tortilla   113    51
 6.7732   7.5    8   queso         vaca       113    84
 6.5830   6.1   12   maíz          queso      347   113
 6.2208   8.9    5   queso         suero      113    21
 6.1509   8.8    5   mantequilla   queso       22   113
 6.1474   7.9    6   compacta      queso       50   113
 5.9918   7.7    6   picante       queso       55   113
 5.9002   9.8    4   manchego      queso        9   113
 5.6805   7.3    6   cabra         queso       75   113
 5.6300   5.9    9   pan           queso      287   113

Thesauri & Corpus

– (Yarowsky 92)
  • uses Roget's Thesaurus to partition Grolier's Encyclopedia
  • 1042 categories
  • 92% accuracy for 12 polysemous words
– (Yarowsky 95)
  • seed words
– (Liddy & Paik 92)
  • subject-code correlation matrix
  • 122 LDOCE semantic codes
  • 166 sentences of the Wall Street Journal
  • 89% correct subject code

LKBs & Conceptual Distance

– (Rada et al. 92)
  • length of the shortest path
– (Sussna 93)
– (Agirre et al. 94) (Rigau 94; Rigau et al. 95, 97; Atserias et al. 97)
  • length of the shortest path
  • specificity of the concepts

\[
dist(w_1, w_2) = \min_{c_{1_i} \in w_1,\; c_{2_j} \in w_2} \;\sum_{c_k \in path(c_{1_i},\, c_{2_j})} \frac{1}{depth(c_k)}
\]
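As a worked illustration of the conceptual distance measure (the minimum, over all pairs of senses of the two words, of the sum of 1/depth(c_k) along the shortest path between the senses), here is a small sketch. The `PARENT` hierarchy is a made-up toy taxonomy, not WordNet data, and the convention that the root has depth 1 is an assumption.

```python
# Toy is-a hierarchy (child -> parent); hypothetical, for illustration only.
PARENT = {
    "entity": None,
    "object": "entity",
    "food": "object",
    "dairy": "food",
    "cheese": "dairy",
    "milk": "dairy",
    "vehicle": "object",
    "car": "vehicle",
}


def ancestors(c):
    """Return the chain [c, parent(c), ..., root]."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def depth(c):
    """Depth of a concept; the root is given depth 1 so that 1/depth is defined."""
    return len(ancestors(c))


def path(c1, c2):
    """Concepts on the shortest path from c1 to c2 via their lowest common ancestor."""
    up1, up2 = ancestors(c1), ancestors(c2)
    common = next(a for a in up1 if a in up2)
    return up1[: up1.index(common) + 1] + up2[: up2.index(common)][::-1]


def conceptual_distance(senses1, senses2):
    """Minimum over sense pairs of the depth-weighted path length."""
    return min(
        sum(1.0 / depth(ck) for ck in path(c1, c2))
        for c1 in senses1
        for c2 in senses2
    )


print(conceptual_distance(["cheese"], ["milk", "car"]))
```

Because each step is weighted by 1/depth, a path through specific (deep) concepts costs less than a path of the same length near the top of the hierarchy, which is how the specificity of the concepts enters the measure.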

LKBs & Conceptual Density

– (Agirre & Rigau 95, 96)

[Figure: the word to be disambiguated, W, with its senses sense1, sense2, sense3 and sense4, shown together with the context words w1 w2 w3 w4 ...]

LKBs & Conceptual Density

– (Agirre & Rigau 95, 96)
  • length of the shortest path
  • the depth in the hierarchy
  • concepts in a dense part of the hierarchy are relatively closer than those in a sparser region
  • the measure should be independent of the number of concepts involved

\[
CD(c, m) = \frac{\sum_{i=0}^{m-1} nhyp^{\,i}}{descendants_c}
\]
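A small numeric sketch of the conceptual density formula, CD(c, m) = (sum for i = 0..m-1 of nhyp^i) / descendants_c. The reading of the symbols is an assumption based on the usual presentation of (Agirre & Rigau 95, 96): `nhyp` is taken as the mean number of hyponyms per node in the sub-hierarchy rooted at c, `m` as the number of relevant senses falling inside that sub-hierarchy, and `descendants` as the number of concepts in it.

```python
def conceptual_density(nhyp, m, descendants):
    """CD(c, m): expected size of a sub-hierarchy holding m senses,
    divided by the actual size of the sub-hierarchy rooted at c."""
    return sum(nhyp ** i for i in range(m)) / descendants


# Same sub-hierarchy (7 concepts, 2 hyponyms per node on average):
# the more context senses it contains, the denser it is.
print(conceptual_density(nhyp=2, m=2, descendants=7))
print(conceptual_density(nhyp=2, m=3, descendants=7))
```

Under these assumptions, the sense of W whose sub-hierarchy reaches the highest density would be the one selected.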

LKBs & Corpus

– (Resnik 95)
  • Information Content
– (Richardson et al. 94)
– (Jiang & Conrath 97)

Experiments: Genus Sense Disambiguation

– Unsupervised WSD
– Unrestricted WSD (coverage 100%)
– Eight heuristics (McRoy 92)
– Combining several lexical resources
– Combining several methods

0) Monosemous Genus Term
1) Entry Sense Ordering
2) Explicit Semantic Domain
3) Word Matching (Lesk 86)
4) Simple Concordance
5) Cooccurrence Word Vectors
6) Semantic Vectors
7) Conceptual Distance



Results:

                                          Polysemous       Overall
Heuristic 1: Monosemous Genus Term          -      -     100%    16%
Heuristic 2: Entry Sense Ordering         70%   100%      75%   100%
Heuristic 3: Explicit Semantic Domain    100%     1%     100%     2%
Heuristic 4: Word Matching                72%    61%      79%    56%
Heuristic 5: Simple Concordance           57%   100%      65%    95%
Heuristic 6: Cooccurrence Vectors         60%   100%      66%    97%
Heuristic 7: Semantic Vectors             58%    99%      63%    94%
Heuristic 8: Conceptual Distance          49%    95%      57%    89%
Ensemble                                  79%   100%      83%   100%



Knowledge provided by each heuristic:

                                              Overall
- Heuristic 1: Monosemous Genus Term        79%   100%
- Heuristic 2: Entry Sense Ordering         72%   100%
- Heuristic 3: Explicit Semantic Domain     82%    98%
- Heuristic 4: Word Matching                81%   100%
- Heuristic 5: Simple Concordance           81%   100%
- Heuristic 6: Cooccurrence Vectors         81%   100%
- Heuristic 7: Semantic Vectors             81%   100%
- Heuristic 8: Conceptual Distance          77%   100%
Ensemble                                    83%   100%



Supervised Word Sense Disambiguation Systems





WSD using ML algorithms
Outline

– Setting
– Methodology
– Machine Learning algorithms
  • Naive Bayes (Mooney 98)
  • Snow (Dagan et al. 97)
  • Exemplar-based (Ng 97)
  • LazyBoosting (Escudero et al. 00)
– Experimental Results
  • Naive Bayes vs. Exemplar-based
  • Portability and Tuning of Supervised WSD

WSD using ML algorithms
Setting

Data-Driven (Supervised)
+ Better results
- Need for large corpora: the knowledge acquisition bottleneck (Gale et al. 93, Ng 97)
- Tested on limited domains (words and senses)

Setting

Current research lines: "open the bottleneck"
• Design of efficient example sampling methods (Engelson & Dagan 96; Fujii et al. 98)
• Use of WordNet and the Web to automatically obtain examples (Leacock et al. 98; Mihalcea & Moldovan 99)
• Use of unsupervised methods for estimating parameters (Pedersen & Bruce 98)

Setting

• Contradictory previous work:
• (Mooney 98)
  + Student's t-test of significance
  + n-fold cross-validation
  - Word "line" with 4,149 examples and 6 senses (Leacock et al. 93)
  - Neither parameter setting nor algorithm tuning
• (Ng 97)
  + Large corpora (192,800 occurrences of 191 words)
  - Direct test (no n-fold cross-validation)
  - Small set of features


WSD using ML algorithms
Methodology

• Main goals:
  – Study supervised methods for WSD
  – Use them with "Automatically Extracted Examples from the Web using WordNet"
  – Rigorous direct comparisons
• Supervised WSD methods
  – Naive Bayes: state-of-the-art accuracy (Mooney 98)
  – Snow: from "Text Categorization" (Dagan et al. 97)
  – Exemplar-based: state-of-the-art accuracy (Ng 97)
  – Boosting: from "Text Categorization" (Schapire & Singer "to appear"; Escudero, Màrquez & Rigau 2000)

WSD using ML algorithms
Methodology

• Evaluation (Dietterich 98)
  – 10-fold cross-validation
  – Student's t-test of significance
• Data
  – LDC (Ng 96)
    192,800 occurrences of 191 words (121 nouns + 70 verbs)
    Avg. number of senses: 7.2 N, 12.6 V, 9.2 (all)
  – WSJ Corpus (Corpus A)
  – Brown Corpus (Corpus B)
• Sets of attributes
  – Set A (Ng 97)
    Small set of features
    No broad-context attributes
  – Set B (Ng 96)
    Large set of features
    Broad-context attributes


Naive Bayes

• Based on Bayes' Theorem (Duda & Hart 73)
• Frequencies used as probabilities
• Assumed independence of example features
• Smoothing technique (Ng 97)

\[
P(v_j \mid C_i) = \frac{P(C_i)}{N}, \qquad N: \text{number of training examples}
\]
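The Naive Bayes classifier described here can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function names and the tuple-based example format are hypothetical, and the smoothing P(v_j | C_i) = P(C_i)/N is assumed to apply only to value/sense pairs never seen together in training.

```python
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (features, sense); features is a tuple of attribute values."""
    sense_counts = Counter(sense for _, sense in examples)
    value_counts = defaultdict(Counter)  # (position, value) -> counts per sense
    for feats, sense in examples:
        for j, v in enumerate(feats):
            value_counts[(j, v)][sense] += 1
    return sense_counts, value_counts, len(examples)


def classify_nb(model, feats):
    """Return the sense maximising P(C_i) * prod_j P(v_j | C_i)."""
    sense_counts, value_counts, n = model
    best_sense, best_score = None, -1.0
    for sense, count in sense_counts.items():
        prior = count / n
        score = prior
        for j, v in enumerate(feats):
            c = value_counts[(j, v)][sense]
            # Frequencies used as probabilities; unseen pairs get P(C_i)/N.
            score *= c / count if c else prior / n
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy training set: (w-1, w+1) around the target word "age".
TOY = [
    (("average", "of"), "age_1"),
    (("the", "of"), "age_1"),
    (("this", "of"), "age_2"),
    (("ice", "ended"), "age_2"),
]
print(classify_nb(train_nb(TOY), ("average", "of")))
```

With many attributes a real system would sum log-probabilities instead of multiplying, to avoid underflow.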

Exemplar-based WSD

• k-NN approach (Ng 96; Ng 97)
• Distances
  – Hamming
  – Modified Value Difference Metric, MVDM (Cost & Salzberg 93)
• Variants
  – Example weighting
  – Attribute weighting (RLM 91)

[Figure: k-NN example with k=3]
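A minimal sketch of the k-NN approach with Hamming distance and unweighted majority voting. The function names and the toy examples are hypothetical; MVDM and the example/attribute weighting variants are not covered here.

```python
from collections import Counter


def hamming(a, b):
    """Number of attribute positions where two examples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train, feats, k=3):
    """Majority sense among the k training examples closest to feats."""
    nearest = sorted(train, key=lambda ex: hamming(ex[0], feats))[:k]
    votes = Counter(sense for _, sense in nearest)
    return votes.most_common(1)[0][0]


# Toy examples: (w-2, w-1, w+1, w+2) around the target word "age".
TRAIN = [
    (("an", "average", "of", "42"), "age_1"),
    (("the", "average", "of", "nine"), "age_1"),
    (("at", "the", "of", "nine"), "age_1"),
    (("in", "this", "of", "nuclear"), "age_2"),
    (("last", "ice", "ended", "."), "age_2"),
]
print(knn_classify(TRAIN, ("an", "average", "of", "nine"), k=3))
```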

Snow

• Snow (Golding & Roth 99)
  – Sparse Network of Winnows
  – On-line learning system
• Winnow (Littlestone 88)
  – linear threshold
  – mistake-driven (updates only when the predicted class is wrong)
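A single Winnow node, the building block of Snow, can be sketched as below. This is the standard multiplicative, mistake-driven update over binary features; the promotion and demotion factors (`alpha`, `beta`), the threshold of half the number of features and the number of epochs are illustrative assumptions, not values taken from the slides.

```python
def predict(w, theta, active):
    """Linear threshold unit over the indices of the active (binary) features."""
    return sum(w[i] for i in active) > theta


def train_winnow(examples, n_features, alpha=2.0, beta=0.5, epochs=5):
    """examples: list of (active feature indices, boolean label)."""
    w = [1.0] * n_features
    theta = n_features / 2
    for _ in range(epochs):
        for active, label in examples:
            predicted = predict(w, theta, active)
            if predicted and not label:      # false positive: demote
                for i in active:
                    w[i] *= beta
            elif label and not predicted:    # false negative: promote
                for i in active:
                    w[i] *= alpha
            # Correct predictions leave the weights untouched (mistake-driven).
    return w, theta


# Toy task: the label is True exactly when feature 0 is active.
EXAMPLES = [((0, 1), True), ((2, 3), False), ((0, 2), True), ((1, 3), False)]
weights, threshold = train_winnow(EXAMPLES, n_features=4)
print(weights)
```

In a Snow-style architecture one such node would be trained per sense and the sense with the highest activation chosen.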

Snow

[Figure: Snow network for the word "age". Input features w-1="average", w+2="42", w+1="of", w+2="nuclear" are connected through weights w_f to two nodes, Winnow Sense 1 and Winnow Sense 2, whose outputs are combined by MAX. Example contexts: "... an average <age_1> of 42 ..." and "... in this <age_2> of nuclear ..."]

Boosting

• AdaBoost.MH (Freund & Schapire 00)
  – Combines many simple weak classifiers (hypotheses)
  – Weak classifiers are trained sequentially
  – Each iteration concentrates on the most difficult cases
+ Results: better than NB and EB
- Problem: computational complexity
  Time and space grow linearly with the number of examples.
+ Solution: LazyBoosting!


Experimental Results (LazyBoosting)

Features from Set A (Ng 97):
w-2, w-1, w+1, w+2, (w-2, w-1), (w-1, w+1), (w+1, w+2)

15 reference words (10 N, 5 V)

Average
                  ns      ex     att
nouns (121)      8.6    1040    3978
verbs (70)      17.9    1266    4432
total (191)     12.1    1115    4150

Accuracy %
                 MFS     NB    EB1   EB15   AB750   ABSC
nouns (121)     57.4   71.7   65.8   71.1    73.5   73.4
verbs (70)      46.6   57.6   51.1   58.1    59.3   59.1
total (191)     53.3   66.4   60.2   66.2    68.1   68.0
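The seven local-context features of Set A (w-2, w-1, w+1, w+2 and the three collocations) can be extracted as in this sketch. The function name, the dictionary keys and the `<pad>` symbol used beyond sentence boundaries are assumptions made for the example.

```python
def set_a_features(tokens, i):
    """Return the 7 Set A features for the target word at position i."""
    def w(offset):
        j = i + offset
        # Hypothetical padding symbol for positions outside the sentence.
        return tokens[j] if 0 <= j < len(tokens) else "<pad>"

    return {
        "w-2": w(-2),
        "w-1": w(-1),
        "w+1": w(1),
        "w+2": w(2),
        "(w-2,w-1)": (w(-2), w(-1)),
        "(w-1,w+1)": (w(-1), w(1)),
        "(w+1,w+2)": (w(1), w(2)),
    }


tokens = "He was mad about stars at the age of nine .".split()
print(set_a_features(tokens, tokens.index("age")))
```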

Accelerating the WeakLearner

– Reducing the feature space
  • Frequency filtering (Freq): discards features occurring fewer than N times
  • Local frequency filtering (LFreq): selects the N most frequent features of each sense
  • RLM ranking (López de Mántaras 91): selects the N most relevant features
– Reducing the number of attributes examined
  • LazyBoosting: a small proportion of the attributes is randomly selected at each iteration
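The two frequency filters and the random attribute selection used by LazyBoosting can be sketched as follows. All names are hypothetical and examples are represented as (set of features, sense) pairs; the RLM ranking and the boosting loop itself are not shown.

```python
import random
from collections import Counter, defaultdict


def freq_filter(examples, n):
    """Freq: keep the features occurring at least n times in the training set."""
    counts = Counter(f for feats, _ in examples for f in feats)
    return {f for f, c in counts.items() if c >= n}


def lfreq_filter(examples, n):
    """LFreq: keep the n most frequent features of each sense."""
    per_sense = defaultdict(Counter)
    for feats, sense in examples:
        per_sense[sense].update(feats)
    keep = set()
    for counts in per_sense.values():
        keep.update(f for f, _ in counts.most_common(n))
    return keep


def lazy_sample(features, proportion, rng):
    """LazyBoosting-style step: examine only a random fraction of the attributes."""
    k = max(1, int(len(features) * proportion))
    return rng.sample(sorted(features), k)


EXAMPLES = [
    ({"of", "average"}, "age_1"),
    ({"of", "nine"}, "age_1"),
    ({"ice", "of"}, "age_2"),
    ({"ice", "ended"}, "age_2"),
]
print(sorted(freq_filter(EXAMPLES, 2)), sorted(lfreq_filter(EXAMPLES, 1)))
```

In a boosting loop, `lazy_sample` would be called once per iteration so that the weak learner only searches the sampled attributes when looking for a weak rule.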



Accelerating the WeakLearner

– All methods perform quite well
  • many irrelevant attributes in the domain
– LFreq is slightly better than Freq
– RLM performs better than LFreq and Freq
– LazyBoosting is better than all other methods
  • acceptable performance with 1% of exploration when looking for a weak rule
  • 10% achieves the same performance as 100%, 7 times faster!



7 features from Set A (Ng 97):
w-2, w-1, w+1, w+2, (w-2, w-1), (w-1, w+1), (w+1, w+2)

15 reference words (10 N, 5 V)

Average
                  ns      ex     att
nouns (121)      8.6    1040    3978
verbs (70)      17.9    1266    4432
total (191)     12.1    1115    4150

Accuracy %
                 MFS     NB   EB15   LB10SC
nouns (121)     56.4   68.7   68.0     70.8
verbs (70)      46.7   64.8   64.9     67.5
total (191)     52.3   67.1   66.7     69.5



Experimental Results (NB vs EB)

Experiments on Set A with 15 words:
• Results

                      EB
         MFS    NB    h,1    h,7   h,15,e  h,7,a  h,7,e,a  cs,1   cs,10  cs,10,e
nouns   57.4   71.7   65.8   70.0   71.1   72.1    72.6    70.6   73.6    73.7
verbs   46.6   57.6   51.1   56.3   58.1   56.4    58.1    55.9   60.3    60.5
all     53.3   66.4   60.2   64.8   66.2   66.1    67.2    65.0   68.6    68.7

time: 00:07 00:08 00:11 09:56

• Conclusions:
  – NB and EB are better than MFS
  – k-NN performs better with k>1
  – Variants of EB improve on plain EB
  – The MVDM (cs) metric is better than Hamming distance
  – EB performs better than NB

Experiments on Set B with 15 words:
• Results

         MFS     NB     EB (h,15)
nouns   57.4    72.2    64.3
verbs   46.6    55.2    43.0
all     53.3    65.8    56.2
time            16:13   06:04

What happened?
• Problem with the binary representation of the broad-context attributes:
  – Examples are represented as sparse vectors (5,000 positions).
  – Any two examples coincide in the majority of values.
  – This biases the similarity measure in favour of the shortest sentences.

Related work "clarified":
(Mooney 98)
• Poor results of the k-NN algorithm
(Ng 96; Ng 97)
• Lower results of a system with a large number of attributes



Improving both methods (NB and EB) (Escudero et al. 00b)
• Use only positive information
• Treat the broad-context attributes as multivalued attributes

  Let two values be:

  \[
  V_i = \{v_{i_1}, v_{i_2}, \ldots, v_{i_n}\}, \qquad V_j = \{v_{j_1}, v_{j_2}, \ldots, v_{j_m}\}
  \]

  The similarity S between two values has to be redefined as:

  \[
  S(V_i, V_j) = \lVert V_i \cap V_j \rVert
  \]

• This representation allows a very computationally efficient implementation:
  Positive Naive Bayes (PNB)
  Positive Exemplar-based (PEB)
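The difference between the binary representation and the "positive" one can be illustrated with a small sketch. It assumes that the redefined similarity between two multivalued attributes is the number of words they share (the size of the set intersection); the function names and the 12-word vocabulary are made up for the example.

```python
def positive_similarity(vi, vj):
    """Positive information only: number of context words the two examples share."""
    return len(set(vi) & set(vj))


def binary_matches(vi, vj, vocabulary):
    """Binary representation: one position per vocabulary word, so words
    absent from both examples also count as coincidences."""
    vi, vj = set(vi), set(vj)
    return sum((w in vi) == (w in vj) for w in vocabulary)


VOCAB = ["ice", "years", "ended", "ago", "last", "glacier",
         "melted", "cold", "north", "sheet", "stars", "nine"]
query = {"ice", "years", "ended", "ago", "last"}
long_related = {"ice", "ended", "glacier", "melted", "cold", "north", "sheet"}
short_unrelated = {"stars"}

# The binary view prefers the short, unrelated example ...
print(binary_matches(query, long_related, VOCAB), binary_matches(query, short_unrelated, VOCAB))
# ... while the positive view prefers the example that actually shares words.
print(positive_similarity(query, long_related), positive_similarity(query, short_unrelated))
```

The set-based form also only touches the words actually present in each example, which is consistent with the efficiency gain reported for PNB and PEB.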



Experiments on Set B with 15 words:
• Results

                              EB     PEB
         MFS    NB    PNB    h,15    h,1    h,7   h,7,e  h,7,a  h,10,e,a  cs,1   cs,10  cs,10,e
nouns   57.4   72.2   72.4   64.3   70.6   72.4   73.7   72.5    73.4     73.2   75.4    75.6
verbs   46.6   55.2   55.3   43.0   54.7   57.7   59.5   58.9    60.2     58.6   61.9    62.1
all     53.3   65.8   66.0   56.2   64.6   66.8   68.4   67.4    68.4     67.7   70.3    70.5

time: 16:13 00:12 06:04 00:25 03:55 49:43

• Conclusions:
  – PEB improves the accuracy of EB by 12.2 points
  – PEB accuracy is higher than on Set A, except for PEB (h,10,e,a)
  – PNB is at least as accurate as NB
  – The positive approach greatly increases the efficiency of the algorithms (80 times for NB and 15 for EB)
  – PEB accuracy is higher than PNB



Global Results (191 words):

                 Set A                     Set B
         MFS     PNB    PEBh   PEBcs      PNB    PEBh   PEBcs
nouns   56.4    68.7    68.5    70.2     69.2    70.1     -
verbs   48.7    64.8    65.3    66.4     63.4    67.0     -
all     53.2    67.1    67.2    68.6     66.8    68.8     -
time           00:33   00:47   92:22    01:06   01:46     -

• Conclusions:
  – In Set A, the best option is Exemplar-based using the MVDM metric
  – In Set B, the best option is Exemplar-based using Hamming distance and example weighting; the MVDM metric has higher accuracy but is currently computationally prohibitive
  – Positive Exemplar-based allows the addition of unordered contextual attributes with an accuracy improvement
  – Positive information greatly improves efficiency



Experimental Results (Portability)

15 features from Set A (Ng 96):
p-3, p-2, p-1, p+1, p+2, p+3, w-1, w+1, (w-2, w-1), (w-1, w+1), (w+1, w+2), (w-3, w-2, w-1), (w-2, w-1, w+1), (w-1, w+1, w+2), (w+1, w+2, w+3)

21 reference words (13 N, 8 V)

DSO Corpus
– Wall Street Journal (Corpus A)
– Brown Corpus (Corpus B)

7 combinations of training-test sets
– A+B-A+B, A+B-A, A+B-B
– A-A, B-B, A-B, B-A

Forcing the number of examples of corpus A and B to be the same (reducing the size to the smallest)

First Experiment (% accuracy)

Method   A+B-A+B   A+B-A   A+B-B
MFC        46.6     53.0    39.2
NB         61.6     67.3    55.9
EB         63.0     69.0    57.0
Snow       60.1     65.6    56.3
LB         66.3     71.8    60.9

Method    A-A     B-B     A-B     B-A
MFC      56.0    45.5    36.4    38.7
NB       65.9    56.8    41.4    47.7
EB       69.0    57.4    45.3    51.1
Snow     67.1    56.1    44.1    49.8
LB       71.3    59.0    47.1    52.0

Conclusions of First Experiment

LazyBoosting outperforms all other methods in all cases
The knowledge acquired from a single corpus almost covers the knowledge of combining both corpora
Very disappointing results!

Looking at Kappa values
– NB most similar to MFC
– LB most similar to DSO
– LB most dissimilar to MFC
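The slides do not say which Kappa statistic was used to compare the classifiers. As an assumption, the sketch below uses Cohen's kappa, which measures the agreement between two sequences of sense assignments corrected for the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa between two equally long sequences of sense labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from the two label distributions.
    expected = sum(ca[s] * cb[s] for s in ca) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical outputs of two taggers on four occurrences.
print(cohen_kappa([1, 1, 2, 2], [1, 1, 2, 1]))
```

Note that the expression is undefined when the expected agreement is 1 (both taggers always output the same single sense); a production version would need to handle that case.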

Second Experiment

Adding tuning material
– B+%A-A, A+%B-B, %A-A, %B-B
– ranging from 10% to 50% (the remaining 50% kept for test)

For NB, EB and Snow it is not worth keeping the original corpus

LB shows a moderate (but consistent) improvement when retaining the original training set

Third Experiment

Two main reasons
– Corpus A and B have very different distributions of senses
– Examples of corpus A and B contain different information

New sense-balanced corpus
– Forcing the number of examples of each sense in corpus A and B to be the same (reducing the size to the smallest)
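Building the sense-balanced corpus can be sketched as follows: for each sense, both corpora are cut down to the size of the smaller one. The function name and the (example, sense) pair format are hypothetical, and taking the first n examples of each sense is an assumption (random sampling would serve equally well).

```python
from collections import defaultdict


def sense_balance(corpus_a, corpus_b):
    """Return versions of both corpora with the same number of examples per sense."""
    def by_sense(corpus):
        groups = defaultdict(list)
        for example, sense in corpus:
            groups[sense].append((example, sense))
        return groups

    ga, gb = by_sense(corpus_a), by_sense(corpus_b)
    balanced_a, balanced_b = [], []
    # Senses missing from either corpus are dropped (their minimum size is 0).
    for sense in sorted(ga.keys() & gb.keys()):
        n = min(len(ga[sense]), len(gb[sense]))
        balanced_a += ga[sense][:n]
        balanced_b += gb[sense][:n]
    return balanced_a, balanced_b


A = [("a1", "s1"), ("a2", "s1"), ("a3", "s1"), ("a4", "s2")]
B = [("b1", "s1"), ("b2", "s2"), ("b3", "s2"), ("b4", "s3")]
print(sense_balance(A, B))
```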

Third Experiment (% accuracy)

Method   A+B-A+B   A+B-A   A+B-B
MFC        48.6     48.6    48.5
LB         64.4     66.2    62.5

Method    A-A     B-B     A-B     B-A
MFC      48.6    48.5    48.7    48.7
LB       65.2    61.7    56.1    58.0

Even when the same distribution of senses is preserved between training and test examples, portability is not guaranteed!


Future Work

• Other methods (SVMs, DLs, ...)
• Other corpora (Semcor, Senseval, Bruce, ...)
• Comparison with unsupervised methods
• Combination of classifiers
• Search for the optimal set of features for each method
• Try new sets of features (semantic features, ...)
• The 3 research lines for solving the knowledge acquisition bottleneck
• Other tagsets (synsets, semantic fields, base concepts, groups of synsets, ...)

Using the Web and EuroWordNet for Word Sense Disambiguation





Outline

– Setting
– Exploiting EWN Semantic Relations
– Collecting a training corpus from the Web

Setting

Our approach
– Unsupervised: automatically obtain training corpora using the Web or on-line corpora
– to feed a supervised ML WSD system


Exploiting EWN Semantic Relations

WordNet
– WordNet is organized conceptually
– 123,497 content words
  • 11,514 polysemous
– 99,642 synsets

wine, vino -- (fermented juice (of grapes especially))
=> sake, saki -- (Japanese beverage from fermented rice ...)
=> vintage -- (a season's yield of wine from a vineyard)
=> red wine -- (wine having a red color derived from skins ...)
=> Pinot noir -- (dry red California table wine ...)
=> claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
=> Saint Emilion -- (full-bodied red wine from ...)
=> Chianti -- (dry red Italian table wine from the Chianti ...)
=> Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
=> Rioja -- (dry red table wine from the Rioja ...)
=> zinfandel -- (dry fruity red wine from California)

SR           PoS    Examples
Synonymy     Noun   coche, automóvil (car, automobile)
             Verb   salir, pasear (go out, take a walk)
             Adj    feliz, contento (happy, glad)
             Adv    duramente, severamente (harshly, severely)
Hyponymy     Noun   coche -> vehículo (car -> vehicle)
Meronymy     Noun   motor -> coche (engine -> car)
Troponymy    Verb   marchar -> caminar (march -> walk)
Entailment   Verb   roncar -> dormir (snore -> sleep)

[Figure: two EWN hierarchies for the Spanish word "partido". In the first, <partido_1> (match) appears with <evento> (event), <evento social> (social event) and <competición, concurso> (competition, contest) above it, and <semifinal> and <cuartos_de_final> (quarter-finals) below it. In the second, <partido_2, partido_político> (political party) appears with <agrupación grupo colectivo> (grouping), <grupo_social> (social group) and <organización> (organization) above it, and <partido_laborista> (labour party) below it.]

partido 1

Todos los partidos piden reformas legales para TV3. (All the parties call for legal reforms for TV3.)
La derecha planea agruparse en un partido. (The right plans to unite in a single party.)
El diputado reiteró que ni él ni UDC, "como partido", han recibido dinero de Pellerols. (The deputy reiterated that neither he nor UDC, "as a party", has received money from Pellerols.)

partido 2

Pero España puso al partido intensidad, ritmo y coraje. (But Spain brought intensity, pace and courage to the match.)
El seleccionador cree que el partido de hoy contra Italia dará la medida de España. (The national coach believes that today's match against Italy will show what Spain is capable of.)
El Racing no gana en su campo desde hace seis partidos. (Racing has not won at home for six matches.)




partido 1

– No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  (We will never negotiate with a political party that supports the independence of Taiwan.)
– Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  (Once again, the diversion of funds intended for vocational training to the financing of a political party is in the news.)
– Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
  (These laws were passed thanks to a general consensus among the political parties.)

partido 2

– Rivera pide el soporte de la afición para encarrilar las semifinales.
  (Rivera asks for the fans' support to get the semifinals on track.)
– Sólo el equipo de Valero Rivera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  (Only Valero Rivera's team can settle a semifinal the way it did yesterday in a completely devoted Palau Blaugrana.)
– El Racing ganó los cuartos de final en su campo.
  (Racing won the quarter-finals at home.)
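The examples above appear to have been found through relatives of each sense rather than through the ambiguous word itself ("partido político" for one sense; "semifinal" and "cuartos de final" for the other). The following is a minimal sketch of that idea, not the author's implementation: the `RELATIVES` map and the `label_by_relatives` function are hypothetical names, the sense labels simply follow the slide, and matching is plain whole-word search.

```python
import re

# Hypothetical map from each sense label (as used on the slide) to relatives
# that are assumed to be unambiguous markers of that sense.
RELATIVES = {
    "partido 1": ["partido político", "partidos políticos"],
    "partido 2": ["semifinal", "semifinales", "cuartos de final"],
}

def label_by_relatives(sentence, relatives=RELATIVES):
    """Return the sense labels whose relatives occur in the sentence."""
    text = sentence.lower()
    hits = []
    for sense, words in relatives.items():
        # Whole-word match, so "semifinal" does not fire inside another word.
        if any(re.search(r"\b" + re.escape(w) + r"\b", text) for w in words):
            hits.append(sense)
    return hits

print(label_by_relatives("El Racing ganó los cuartos de final en su campo."))
```

A sentence that matches relatives of exactly one sense could then be kept as a training example for that sense; sentences matching none or several would be discarded.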




11,514 polysemous words

1 sense

          synonym   brother   father   daughter   grandchild
1 step       2095      8903     3894        759          116
2 step          -         3     1331         16            3
3 step          -         -      512          -            -
4 step          -         -      147          -            -
5 step          -         -       43          -            -
total        2905      8906     5927        775          119





2 senses

          synonym   brother   father   daughter   grandchild
1 step        479      6988      584        408           87
2 step          -        24       97          8            2
3 step          -         -        9          -            -
4 step          -         -        3          -            -
total         479      7012      693        417           89





3 senses

          synonym   brother   father   daughter   grandchild
1 step        108      5640       76        239           59
2 step          -        22        -          6            1
total         108      5662       76        245           60





1 sense

            S+B     S+D   S+B+D   S+B+D+F   S+B+D+F+C
1 step     8903    3461    9257     10284       10284
2 step        3      34     188      1068        1068
3 step        -       2      30       137         137
4 step        -       -       4        19          19
total      8906    3487    9479     11508       11508





2 senses

            S+B     S+D   S+B+D   S+B+D+F   S+B+D+F+C
1 step     7580    1282    8048      8891        8899
2 step      281      16     461      1196        1213
3 step       11       1      33       264         245
4 step        -       -       2        80          74
5 step        -       -       -        13          13
6 step        -       -       -         2           2
total      7872    1299    8544     10446       10446





3 senses

            S+B     S+D   S+B+D   S+B+D+F   S+B+D+F+C
1 step     6116     568    6691      7657        7673
2 step      274       5     482      1030        1039
3 step        5       -      46       295         311
4 step        -       -       7        91          78
5 step        -       -       1        28          12
6 step        -       -       3         3           -
total      6395     573    7230      9104        9113
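As a quick check on how these tables read, each total is the sum of the per-step counts in its column. The sketch below does this for the S+B+D+F column of the 3-senses table. It also computes a share of the 11,514 polysemous words quoted earlier, which assumes (the slides do not say so explicitly) that the counts are words drawn from that set.

```python
POLYSEMOUS_WORDS = 11514  # figure quoted on the slides

# Per-step counts (steps 1 to 6) for the S+B+D+F column of the 3-senses table.
steps_sbdf = [7657, 1030, 295, 91, 28, 3]

total = sum(steps_sbdf)           # should reproduce the "total" row
share = total / POLYSEMOUS_WORDS  # assumption: counts are words out of 11,514

print(f"total = {total}, share = {share:.1%}")
```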


Outline

– Setting
– Exploiting EWN Semantic Relations
– Collecting training Corpus from the Web


Collecting training Corpus from the Web

(Mihalcea & Moldovan 99)
– Search engines: AltaVista
– Complex queries
     synonyms
     definitions
– 120 word senses
– 91% precision
– Example:
     <grow, raise, farm, produce> (cultivate by growing)
     cultivate NEAR growing AND (grow OR raise OR farm OR produce)
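The example query above combines the content words of the definition with NEAR and the synonyms of the synset with OR. A minimal sketch of how such a query string could be assembled is given below. The function name `build_query` and the small stopword list are hypothetical, and the slide shows only this one query form, so this should not be read as the full procedure of (Mihalcea & Moldovan 99).

```python
# Hypothetical stopword list, just large enough for the example gloss.
STOPWORDS = {"a", "an", "and", "by", "in", "of", "or", "the", "to"}

def build_query(synonyms, gloss):
    """Join gloss content words with NEAR and synset synonyms with OR."""
    content = [w for w in gloss.lower().split() if w not in STOPWORDS]
    near_part = " NEAR ".join(content)
    or_part = " OR ".join(synonyms)
    return f"{near_part} AND ({or_part})"

query = build_query(["grow", "raise", "farm", "produce"], "cultivate by growing")
print(query)  # cultivate NEAR growing AND (grow OR raise OR farm OR produce)
```

NEAR, AND and OR are used here only as they appear in the example query; whether a given search engine accepts them would need to be checked.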
