Leveraging Sentiment to Compute Word Similarity
GWC 2012, Matsue, Japan

Authors:
Balamurali A R*,+   Subhabrata Mukherjee+   Akshat Malu+   Pushpak Bhattacharyya+
* IITB-Monash Research Academy, IIT Bombay
+ Dept. of Computer Science and Engineering, IIT Bombay

Slides Acknowledgement: Akshat Malu
Similarity Metrics

An unavoidable component of many NLP systems. Examples: word sense disambiguation (Banerjee & Pedersen, 2002), malapropism detection (Hirst & St-Onge, 1997)

Underlying principle: distributional similarity of words in terms of their meaning. Example: "refuge" and "asylum" are similar

Existing approaches: find the similarity between a word pair based on their meaning (definition)
Similarity & Sentiment
Our hypothesis
“Knowing the sentiment content of the words is beneficial in measuring their similarity”
SenSim Metric

Uses sentiment along with the meaning of the words to calculate their similarity

The gloss of the synset is the most informative piece: we leverage it to compute both the meaning-based similarity and the sentiment similarity of the word pair

We use a gloss-vector-based approach with cosine similarity in our metric
Gloss Vector

A gloss vector is created by representing all the words of the gloss in the form of a vector (assumption: the synset for the word is already known)

Each dimension of the gloss vector holds a sentiment score of the respective content word

Sentiment scores are obtained from different scoring functions based on an external lexicon; SentiWordNet 1.0 is used as the external lexicon

Problem: the vector thus formed is too sparse
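A minimal sketch of this construction (the lexicon entries, scores, and function names below are illustrative stand-ins, not values from the paper):

```python
# Sketch: build a gloss vector whose dimensions are the content words of the
# gloss, each holding a sentiment score from an external lexicon.
# TOY_LEXICON is a stand-in for SentiWordNet lookups.

TOY_LEXICON = {  # word -> (positive score, negative score); illustrative values
    "shelter": (0.625, 0.0),
    "danger": (0.0, 0.5),
    "hardship": (0.0, 0.375),
}

def gloss_vector(gloss, lexicon, score_fn):
    """Map each content word of the gloss to a sentiment score."""
    vec = {}
    for word in gloss.split():
        if word in lexicon:              # content words only
            pos, neg = lexicon[word]
            vec[word] = score_fn(pos, neg)
    return vec

# Sentiment Difference (SD) as the scoring function
vec = gloss_vector("a shelter from danger or hardship", TOY_LEXICON,
                   lambda pos, neg: pos - neg)
print(vec)  # {'shelter': 0.625, 'danger': -0.5, 'hardship': -0.375}
```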
Augmenting Gloss Vector

To counter the sparsity of gloss vectors, they are augmented with the glosses of the related synsets

The context is further extended by adding the glosses of synsets of the words present in the gloss of the original word
[Example figure: gloss augmentation for Refuge. The gloss "a shelter from danger or hardship" is extended through hypernymy and hyponymy links to related synsets (e.g. Area, Country, Harborage), yielding the concatenated gloss "a shelter from danger or hardship; a particular geographical region of indefinite boundary; a place of refuge; a structure that provides privacy and protection from danger; the condition of being susceptible to harm or injury; a state of misfortune or affliction".]
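A minimal sketch of the augmentation step, using hand-coded toy tables in place of WordNet (the glosses are the ones from the Refuge example; the relation table is illustrative):

```python
# Sketch: augment a synset's gloss with the glosses of synsets related to it
# (e.g. via hypernymy/hyponymy). The toy tables below stand in for WordNet.

GLOSS = {  # synset -> gloss (from the Refuge example)
    "refuge": "a shelter from danger or hardship",
    "area": "a particular geographical region of indefinite boundary",
    "harborage": "a place of refuge",
}

RELATED = {  # synset -> related synsets (hypernyms/hyponyms); illustrative
    "refuge": ["area", "harborage"],
}

def augmented_gloss(synset):
    """Concatenate the synset's gloss with the glosses of its related synsets."""
    parts = [GLOSS[synset]]
    parts += [GLOSS[rel] for rel in RELATED.get(synset, [])]
    return "; ".join(parts)

print(augmented_gloss("refuge"))
# a shelter from danger or hardship; a particular geographical region of
# indefinite boundary; a place of refuge
```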
Scoring Functions

Sentiment Difference (SD): difference between the positive and negative sentiment values
Sentiment Max (SM): the greater of the positive and negative sentiment values
Sentiment Threshold Difference (TD): same as SD, but with a minimum threshold value
Sentiment Threshold Max (TM): same as SM, but with a minimum threshold value

ScoreSD(A) = SWNpos(A) - SWNneg(A)
ScoreSM(A) = max(SWNpos(A), SWNneg(A))
ScoreTD(A) = sign(max(SWNpos(A), SWNneg(A))) * (1 + abs(SWNpos(A) - SWNneg(A)))
ScoreTM(A) = sign(max(SWNpos(A), SWNneg(A))) * (1 + abs(max(SWNpos(A), SWNneg(A))))
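The four scoring functions transcribe directly into code; the SentiWordNet scores used below are hypothetical placeholders:

```python
# The four scoring functions from the slide, applied to a word's positive and
# negative SentiWordNet scores (pos, neg are values in [0, 1]).

def sign(x):
    return (x > 0) - (x < 0)

def score_sd(pos, neg):           # Sentiment Difference
    return pos - neg

def score_sm(pos, neg):           # Sentiment Max
    return max(pos, neg)

def score_td(pos, neg):           # Sentiment Threshold Difference
    return sign(max(pos, neg)) * (1 + abs(pos - neg))

def score_tm(pos, neg):           # Sentiment Threshold Max
    return sign(max(pos, neg)) * (1 + abs(max(pos, neg)))

# Hypothetical SentiWordNet scores for a word
pos, neg = 0.625, 0.25
print(score_sd(pos, neg))  # 0.375
print(score_sm(pos, neg))  # 0.625
print(score_td(pos, neg))  # 1.375
print(score_tm(pos, neg))  # 1.625
```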
SenSim Metric

SenSim_x(A, B) = cosine(glossvec(sense(A)), glossvec(sense(B)))

glossvec = 1:score_x(1) 2:score_x(2) ... n:score_x(n)
score_x(Y) = sentiment score of word Y using scoring function x
x = scoring function of type SD/SM/TD/TM
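A sketch of the metric: score each gloss word with one of the scoring functions, then take the cosine of the two gloss vectors over their shared dimensions. The vectors and scores below are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors represented as {dimension: score} dicts."""
    dot = sum(u[d] * v[d] for d in u if d in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)

def sensim(glossvec_a, glossvec_b):
    """SenSim_x(A, B) = cosine(glossvec(sense(A)), glossvec(sense(B)))."""
    return cosine(glossvec_a, glossvec_b)

# Illustrative gloss vectors: dimension -> sentiment score (e.g. via SD)
vec_refuge = {"shelter": 0.625, "danger": -0.5}
vec_asylum = {"shelter": 0.5, "danger": -0.5, "protection": 0.25}
print(round(sensim(vec_refuge, vec_asylum), 3))
```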
Evaluation

Intrinsic evaluation: comparing scores given by SenSim with those given by human annotators (all scores normalized to a scale of 1-5: 1 = least similar, 5 = most similar)
- Correlation with human annotators, under two annotation strategies: annotation based on meaning, and annotation based on sentiment and meaning combined
- Correlation with other metrics

Extrinsic evaluation:
- Synset replacement using similarity metrics
Synset Replacement using Similarity Metrics

Addresses the unknown-feature problem in supervised classification:
1. Get Train_Synset_List from the training corpus (synsets) and Test_Synset_List from the test corpus (synsets)
2. If a test synset T is not in Train_Synset_List, use similarity metric S to find a similar synset R
3. Replace T with R to obtain the new test corpus (synsets); otherwise keep T

Metrics used: LIN (Lin, 1998), LCH (Leacock and Chodorow, 1998), Lesk (Banerjee and Pedersen, 2002)
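The replacement loop can be sketched as follows; the similarity function is passed in (it could be SenSim, LIN, LCH, or Lesk), and all names and the toy metric are illustrative:

```python
# Sketch of the synset replacement strategy: every test synset absent from the
# training vocabulary is replaced by its most similar training synset under a
# given similarity metric.

def replace_unknown_synsets(test_synsets, train_synsets, sim):
    """Return the test synset list with unknown synsets replaced."""
    train_set = set(train_synsets)
    replaced = []
    for t in test_synsets:
        if t in train_set:
            replaced.append(t)                      # known feature: keep as-is
        else:
            r = max(train_set, key=lambda s: sim(t, s))
            replaced.append(r)                      # unknown: use closest synset
    return replaced

# Toy similarity: shared-character overlap (stand-in for SenSim/LIN/LCH/Lesk)
def toy_sim(a, b):
    return len(set(a) & set(b))

train = ["refuge", "danger", "safety"]
test = ["refuge", "asylum"]
print(replace_unknown_synsets(test, train, toy_sim))  # ['refuge', 'safety']
```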
Results – Intrinsic Evaluation (1/4): Sentiment as a parameter for finding similarity (1/2)

Adding sentiment to the context yields better correlation among the annotators

The decrease in correlation on adding sentiment in the case of NOUN may be because sentiment does not play as important a role for nouns

Annotation Strategy   | Overall | NOUN  | VERB  | ADJECTIVE | ADVERB
Meaning               | 0.768   | 0.803 | 0.750 | 0.527     | 0.759
Meaning + Sentiment   | 0.799   | 0.750 | 0.889 | 0.720     | 0.844

Pearson Correlation Coefficient between two annotators for various annotation strategies
Results – Intrinsic Evaluation (2/4): Sentiment as a parameter for finding similarity (2/2)

Metric Used  | Overall | NOUN | VERB  | ADJECTIVE | ADVERB
LESK         | 0.22    | 0.51 | -0.91 | 0.19      | 0.37
LIN          | 0.27    | 0.24 | 0.00  | NA        | NA
LCH          | 0.36    | 0.34 | 0.44  | NA        | NA
SenSim (SD)  | 0.46    | 0.73 | 0.55  | 0.08      | 0.76
SenSim (SM)  | 0.50    | 0.62 | 0.48  | 0.06      | 0.54
SenSim (TD)  | 0.45    | 0.73 | 0.55  | 0.08      | 0.59
SenSim (TM)  | 0.48    | 0.62 | 0.48  | 0.06      | 0.78

Pearson Correlation (r) of various metrics with gold-standard data; all experiments performed on gold-standard data consisting of 48 sense-marked word pairs
Results – Extrinsic Evaluation (3/4): Effect of SenSim on the synset replacement strategy

Baseline denotes the experiment in which there are no synset replacements

Metric Used  | Accuracy (%) | PP    | NP    | PR    | NR
Baseline     | 89.10        | 91.50 | 87.07 | 85.18 | 91.24
LESK         | 89.36        | 91.57 | 87.46 | 85.68 | 91.25
LIN          | 89.27        | 91.24 | 87.61 | 85.85 | 90.90
LCH          | 89.64        | 90.48 | 88.86 | 86.47 | 89.63
SenSim (SD)  | 89.95        | 91.39 | 88.65 | 87.11 | 90.93
SenSim (SM)  | 90.06        | 92.01 | 88.38 | 86.67 | 91.58
SenSim (TD)  | 90.11        | 91.68 | 88.69 | 86.97 | 91.23
SenSim (TM)  | 90.17        | 91.81 | 88.71 | 87.09 | 91.36

Classification results of the synset replacement experiment using different similarity metrics; PP = Positive Precision (%), NP = Negative Precision (%), PR = Positive Recall (%), NR = Negative Recall (%)
Results – Extrinsic Evaluation (4/4): Effect of SenSim on the synset replacement strategy

The improvement is only marginal because no complex features are used for training the classifier (figures as in the preceding table)
Conclusions

- Proposed that sentiment content can aid in similarity measurement, which to date has been done on the basis of meaning alone
- Verified this hypothesis by measuring the correlation between annotators under different annotation strategies
- Annotator correlation when sentiment was included as an additional parameter for similarity measurement was higher than with semantic similarity alone
- SenSim, based on this hypothesis, performs better than the existing metrics, which fail to account for sentiment while calculating similarity
References

Balamurali, A., Joshi, A. & Bhattacharyya, P. (2011), Harnessing WordNet senses for supervised sentiment classification, in 'Proc. of EMNLP-2011'.
Banerjee, S. & Pedersen, T. (2002), An adapted Lesk algorithm for word sense disambiguation using WordNet, in 'Proc. of CICLing-02'.
Banerjee, S. & Pedersen, T. (2003), Extended gloss overlaps as a measure of semantic relatedness, in 'Proc. of IJCAI-03'.
Esuli, A. & Sebastiani, F. (2006), SentiWordNet: A publicly available lexical resource for opinion mining, in 'Proc. of LREC-06', Genova, IT.
Grishman, R. (2001), Adaptive information extraction and sublanguage analysis, in 'Proc. of IJCAI-01'.
Hirst, G. & St-Onge, D. (1997), 'Lexical chains as representations of context for the detection and correction of malapropisms'.
Jiang, J. J. & Conrath, D. W. (1997), Semantic similarity based on corpus statistics and lexical taxonomy, in 'Proc. of ROCLING X'.
Leacock, C. & Chodorow, M. (1998), Combining local context with WordNet similarity for word sense identification, in 'WordNet: A Lexical Reference System and its Application'.
Leacock, C., Miller, G. A. & Chodorow, M. (1998), 'Using corpus statistics and WordNet relations for sense identification', Comput. Linguist. 24.
Lin, D. (1998), An information-theoretic definition of similarity, in 'Proc. of ICML '98'.
Partington, A. (2004), 'Utterly content in each other's company: semantic prosody and semantic preference', International Journal of Corpus Linguistics 9(1).
Patwardhan, S. (2003), Incorporating dictionary and corpus information into a context vector measure of semantic relatedness, Master's thesis, University of Minnesota, Duluth.
Pedersen, T., Patwardhan, S. & Michelizzi, J. (2004), WordNet::Similarity: measuring the relatedness of concepts, in 'Demonstration Papers at HLT-NAACL '04'.
Rada, R., Mili, H., Bicknell, E. & Blettner, M. (1989), 'Development and application of a metric on semantic nets', IEEE Transactions on Systems, Man, and Cybernetics 19(1).
Resnik, P. (1995a), Disambiguating noun groupings with respect to WordNet senses, in 'Proceedings of the Third Workshop on Very Large Corpora', Somerset, New Jersey.
Resnik, P. (1995b), Using information content to evaluate semantic similarity in a taxonomy, in 'Proc. of IJCAI-95'.
Richardson, R., Smeaton, A. F. & Murphy, J. (1994), Using WordNet as a knowledge base for measuring semantic similarity between words, Technical report, Proc. of AICS-94.
Sinclair, J. (2004), Trust the Text: Language, Corpus and Discourse, Routledge.
Wan, S. & Angryk, R. A. (2007), Measuring semantic similarity using WordNet-based context vectors, in 'Proc. of SMC '07'.
Wu, Z. & Palmer, M. (1994), Verb semantics and lexical selection, in 'Proc. of ACL-94', New Mexico State University, Las Cruces, New Mexico.
Zhong, Z. & Ng, H. T. (2010), It Makes Sense: A wide-coverage word sense disambiguation system for free text, in 'ACL (System Demonstrations) '10'.
Metrics Used for Comparison

LIN: uses the information content individually possessed by two concepts in addition to that shared by them
Lesk: based on the overlap of words in their individual glosses
Leacock & Chodorow (LCH): based on the shortest path through the hypernymy relation

simLIN(A,B) = 2 * log Pr(lso(A,B)) / (log Pr(A) + log Pr(B))
simLCH(A,B) = -log(len(A,B) / 2D)
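Both formulas are direct to implement; the probabilities, path length, and taxonomy depth passed in below are illustrative inputs, not values from the paper:

```python
import math

def sim_lin(p_a, p_b, p_lso):
    """LIN similarity: 2 * log Pr(lso(A,B)) / (log Pr(A) + log Pr(B))."""
    return 2 * math.log(p_lso) / (math.log(p_a) + math.log(p_b))

def sim_lch(path_len, depth):
    """LCH similarity: -log(len(A,B) / 2D), with D the taxonomy depth."""
    return -math.log(path_len / (2 * depth))

# Illustrative values: two concepts with probability 0.01 whose lowest common
# subsumer has probability 0.05, and a path of length 4 in a taxonomy of depth 16
print(round(sim_lin(0.01, 0.01, 0.05), 3))
print(round(sim_lch(4, 16), 3))
```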
How? Use of WordNet Similarity Metrics

1. Get Train_Synset_List from the training corpus (synsets) and Test_Synset_List from the test corpus (synsets)
2. If a test synset T is not in Train_Synset_List, use similarity metric S to find a similar synset R
3. Replace T with R to obtain the new test corpus (synsets); otherwise keep T

Metrics used: LIN (Lin, 1998), LCH (Leacock and Chodorow, 1998), Lesk (Banerjee and Pedersen, 2002)
Experimental Setup

Datasets used:
- Intrinsic evaluation: gold-standard data containing 48 sense-marked word pairs
- Extrinsic evaluation: dataset provided by Balamurali et al. (2011)

Word sense disambiguation carried out using the WSD engine by Zhong & Ng (2010) (82% accuracy)
WordNet::Similarity 2.05 package used for computing the other metrics' similarity scores
Pearson Correlation Coefficient used to measure inter-annotator agreement
Sentiment classification done using C-SVM; all results are averages of five-fold cross-validation accuracies
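For reference, the Pearson coefficient used for the agreement numbers can be computed directly; the annotator score lists below are illustrative:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative annotator scores on the 1-5 similarity scale
annotator_1 = [5, 4, 2, 1, 3]
annotator_2 = [4, 5, 2, 1, 3]
print(round(pearson(annotator_1, annotator_2), 3))  # 0.9
```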