Intelligent Database Systems Lab

Presenter: Kung, Chien-Hao

Authors: Yoong Keok Lee and Hwee Tou Ng

2002, EMNLP

An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• Natural language is inherently ambiguous.
• A word can have multiple meanings (or senses).

Objectives

• This paper evaluates a variety of knowledge sources and supervised learning algorithms for word sense disambiguation on the SENSEVAL-2 and SENSEVAL-1 data.

Methodology

Knowledge Sources
• Part-of-speech (POS) of neighboring words
• Single words in the surrounding context
• Local collocations
• Syntactic relations

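Methodology

• Part-of-Speech (POS) of Neighboring Words
  – This paper uses 7 features to encode this knowledge source: the POS tags of the three words to the left of the target word, of the target word itself, and of the three words to the right.
  – Sentence segmentation program (Reynar and Ratnaparkhi, 1997)
  – POS tagger (Ratnaparkhi, 1996)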

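Example (target word: bars)

Reid saw me looking at the iron bars.
NNP VBD PRP VBG IN DT NN NNS .

Features: {IN, DT, NN, NNS, ., ε, ε}, where ε marks the two positions that fall beyond the end of the sentence.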
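The seven POS features can be sketched in Python as follows, assuming the sentence has already been segmented and tagged (the slide cites the Reynar and Ratnaparkhi segmenter and the Ratnaparkhi tagger for that step). The function name `pos_window_features` and the use of `None` for out-of-sentence positions are illustrative choices, not the paper's code.

```python
from typing import List, Optional


def pos_window_features(tags: List[str], target_index: int, width: int = 3) -> List[Optional[str]]:
    """Return the POS tags at offsets -width..+width around the target word.

    With width=3 this gives 7 features: three tags to the left, the tag of
    the target word itself, and three tags to the right. Offsets that fall
    outside the sentence are filled with None (a hypothetical placeholder).
    """
    features: List[Optional[str]] = []
    for offset in range(-width, width + 1):
        i = target_index + offset
        features.append(tags[i] if 0 <= i < len(tags) else None)
    return features


tokens = ["Reid", "saw", "me", "looking", "at", "the", "iron", "bars", "."]
tags = ["NNP", "VBD", "PRP", "VBG", "IN", "DT", "NN", "NNS", "."]
print(pos_window_features(tags, tokens.index("bars")))
# ['IN', 'DT', 'NN', 'NNS', '.', None, None]
```

The window is computed over the tag sequence only, so the same helper works for any target position; a target at the start of a sentence simply gets `None` in its three left slots.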

Methodology

• Single Words in the Surrounding Context
  – Feature selection method
    • Parameter: M2

Example (target word: bars)

Selected words: {chocolate, iron, beer}
Reid saw me looking at the iron bars.
Feature vector: <0, 1, 0> (only "iron" occurs in the context)
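A minimal sketch of the binary surrounding-word vector, assuming the selected words are already known (how the parameter M2 controls their selection is not reproduced here) and that matching is case-insensitive. `surrounding_word_vector` is a hypothetical helper name.

```python
from typing import List, Sequence


def surrounding_word_vector(context_tokens: Sequence[str], selected_words: Sequence[str]) -> List[int]:
    """Binary vector: 1 if a selected word occurs anywhere in the context, else 0."""
    # Lowercase the context so "Iron" and "iron" count as the same word (an assumption).
    present = {tok.lower() for tok in context_tokens}
    return [1 if word in present else 0 for word in selected_words]


selected = ["chocolate", "iron", "beer"]  # feature words from the slide example
sentence = "Reid saw me looking at the iron bars .".split()
print(surrounding_word_vector(sentence, selected))  # [0, 1, 0]
```

Word order and position are ignored here by design: this knowledge source only records whether each selected word appears, which is what distinguishes it from the local collocations.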

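Methodology

• Local Collocations
  – This paper extracted 11 features, where $C_{i,j}$ is the ordered sequence of tokens from offset $i$ to offset $j$ relative to the target word (negative offsets lie to its left):
    $C_{-1,-1}, C_{1,1}, C_{-2,-2}, C_{2,2}, C_{-2,-1}, C_{-1,1}, C_{1,2}, C_{-3,-1}, C_{-2,1}, C_{-1,2}, C_{1,3}$

Example (target word: bars)

Possible values of $C_{-2,-1}$: {a_chocolate, the_wine, the_iron}
Reid saw me looking at the iron bars.
Feature: $C_{-2,-1}$ = the_iron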
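The 11 collocation features can be sketched as below. Two details are assumptions not stated on the slide: when a span crosses the target word, the target token itself is skipped, and offsets outside the sentence are filled with a hypothetical `<none>` token. Tokens are joined with underscores to match the slide's `the_iron`, and the period is treated as its own token, as in the POS example.

```python
from typing import Dict, List, Sequence, Tuple

# The 11 spans (i, j) listed on the slide, as offsets relative to the target word.
SPANS: List[Tuple[int, int]] = [
    (-1, -1), (1, 1), (-2, -2), (2, 2), (-2, -1), (-1, 1),
    (1, 2), (-3, -1), (-2, 1), (-1, 2), (1, 3),
]
PAD = "<none>"  # hypothetical filler for offsets outside the sentence


def local_collocations(tokens: Sequence[str], target_index: int) -> Dict[Tuple[int, int], str]:
    """Map each span (i, j) to the underscore-joined tokens from offset i to j.

    Assumption: offset 0 (the target word) is skipped when a span crosses it,
    so span (-1, 1) joins the tokens at offsets -1 and +1.
    """
    features: Dict[Tuple[int, int], str] = {}
    for i, j in SPANS:
        words: List[str] = []
        for offset in range(i, j + 1):
            if offset == 0:
                continue  # skip the target word itself (assumption)
            k = target_index + offset
            words.append(tokens[k] if 0 <= k < len(tokens) else PAD)
        features[(i, j)] = "_".join(words)
    return features


tokens = "Reid saw me looking at the iron bars .".split()
feats = local_collocations(tokens, tokens.index("bars"))
print(feats[(-2, -1)])  # the_iron
print(feats[(-1, 1)])   # iron_.
```

Each span becomes one categorical feature whose value is the joined string, so a learner sees `the_iron` as a single symbol rather than two separate words.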

Methodology

• Syntactic Relations
  (a) The target word w and its POS
  (b) The sentence in which w occurs
  (c) The feature vector corresponding to the syntactic relations

Methodology

• Learning Algorithms
  – Support Vector Machines
  – AdaBoost
  – Naïve Bayes
  – Decision Trees
• Evaluation Data Sets
  – SENSEVAL-2
  – SENSEVAL-1
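Each knowledge source above yields name/value features, so the sources can be merged into one feature dictionary per example and handed to any of the listed learners. The sketch below uses Naïve Bayes with add-one smoothing only because it fits in a few lines of standard-library Python; it is not the paper's implementation, and the feature names (`P-1`, `C(-2,-1)`), sense labels (`rod`, `pub`), and training examples are invented toy data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Feature name -> value, e.g. {"P-1": "NN", "C(-2,-1)": "the_iron"} (hypothetical names).
Features = Dict[str, object]


class NaiveBayesWSD:
    """Minimal categorical Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, examples: List[Tuple[Features, str]]) -> "NaiveBayesWSD":
        self.sense_counts = Counter(sense for _, sense in examples)
        self.total = len(examples)
        # (sense, feature name) -> Counter of values observed with that sense
        self.value_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        # feature name -> set of values seen in training (used for smoothing)
        self.vocab: Dict[str, set] = defaultdict(set)
        for feats, sense in examples:
            for name, value in feats.items():
                self.value_counts[(sense, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, feats: Features) -> str:
        best_sense, best_score = None, -math.inf
        for sense, n_sense in self.sense_counts.items():
            score = math.log(n_sense / self.total)  # log prior P(sense)
            for name, value in feats.items():
                counts = self.value_counts[(sense, name)]
                # Add-one smoothing; the extra +1 reserves mass for unseen values.
                k = len(self.vocab[name]) + 1
                score += math.log((counts[value] + 1) / (sum(counts.values()) + k))
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy training data (invented): "bar" as a metal rod vs. as a pub.
train = [
    ({"P-1": "NN", "C(-2,-1)": "the_iron"}, "rod"),
    ({"P-1": "NN", "C(-2,-1)": "the_steel"}, "rod"),
    ({"P-1": "DT", "C(-2,-1)": "at_the"}, "pub"),
    ({"P-1": "DT", "C(-2,-1)": "to_the"}, "pub"),
]
model = NaiveBayesWSD().fit(train)
print(model.predict({"P-1": "NN", "C(-2,-1)": "the_iron"}))  # rod
```

Swapping in SVM, AdaBoost, or a decision tree would change only the learner; the feature extraction from the four knowledge sources stays the same.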

Experiments

Conclusions

• Combining all of these knowledge sources with SVM achieves higher accuracy than the best official scores on both the SENSEVAL-2 and SENSEVAL-1 test data.

Comments

• Advantages
  – This paper is easy to read.
• Applications
  – Word sense disambiguation (WSD)
