OUTLINE
What is an Opinion?
Problem definition
Word Sentiment Classifier
Sentence Sentiment Classifier
Experimental Analysis
Shortcomings
Future Work
WHAT IS AN OPINION?
An opinion is a quadruple [Topic, Holder, Claim, Sentiment]: the Holder believes a Claim about the Topic and, in many cases, associates a Sentiment with it.
An opinion may or may not contain sentiment, e.g. I believe the world is flat. (sentiment absent)
Sentiment can be explicit or implicit, e.g. I like apple. (explicit); We should decrease our dependence on oil. (implicit)
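The quadruple can be held in a simple record type. The Python sketch below is illustrative only: the class name `Opinion` and its field names are hypothetical, and `sentiment` is optional because an opinion need not carry one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record for the [Topic, Holder, Claim, Sentiment] quadruple."""
    topic: str
    holder: str
    claim: str
    sentiment: Optional[str] = None  # "positive", "negative", or None if absent


# "I believe the world is flat.": a claim that carries no sentiment.
flat_world = Opinion(topic="shape of the world", holder="I",
                     claim="the world is flat")

# "I like apple.": the holder states a sentiment explicitly.
apple = Opinion(topic="apple", holder="I", claim="I like apple",
                sentiment="positive")
```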
PROBLEM DEFINITION
Opinion = [Topic, Holder, Claim, Sentiment]
Given: a Topic and a set of texts about the topic
Find: the sentiment (positive or negative only) expressed about the topic in each sentence, and identify the people who hold that sentiment
AUTHORS' APPROACH
Four basic stages:
Calculate the polarity of sentiment-bearing words (Word Sentiment Classifier)
Select sentences that contain both the topic and a holder
Identify the sentiment region based on the holder
Combine these polarities to produce the sentence sentiment (Sentence Sentiment Classifier)
WORD SENTIMENT CLASSIFIER
To build a classifier, we need training data.
How can we generate training data for the word sentiment classifier?
Assemble a small set of seed words by hand
The seed word list contains only positive- and negative-polarity words
Then grow this list by adding synonyms (same polarity) and antonyms (opposite polarity) from WordNet [1]
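The seed-growing step can be sketched as a breadth-first expansion over a thesaurus graph. This is a minimal Python sketch under stated assumptions: the tiny `SYNONYMS`/`ANTONYMS` dictionaries are a hypothetical stand-in for WordNet (not real WordNet data), and the fixed number of expansion rounds is an assumption, since the slides do not say when growth stops. Synonyms inherit the polarity of the word they were reached from; antonyms get the opposite polarity.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy thesaurus standing in for WordNet (not real WordNet data).
SYNONYMS: Dict[str, List[str]] = {
    "good": ["nice", "great"],
    "bad": ["awful"],
    "awful": ["terrible", "great"],  # deliberately ambiguous toy link
}
ANTONYMS: Dict[str, List[str]] = {
    "good": ["bad"],
    "nice": ["nasty"],
}
OPPOSITE = {"positive": "negative", "negative": "positive"}


def expand_seeds(seeds: Dict[str, Iterable[str]],
                 synonyms: Dict[str, List[str]],
                 antonyms: Dict[str, List[str]],
                 rounds: int = 2) -> Dict[str, Set[str]]:
    """Grow polarity lists: synonyms keep the polarity, antonyms flip it."""
    lists: Dict[str, Set[str]] = {"positive": set(), "negative": set()}
    frontier: List[Tuple[str, str]] = []
    for label, words in seeds.items():
        for word in words:
            lists[label].add(word)
            frontier.append((word, label))

    for _ in range(rounds):  # number of rounds is an assumption
        next_frontier: List[Tuple[str, str]] = []
        for word, label in frontier:
            neighbours = [(s, label) for s in synonyms.get(word, [])]
            neighbours += [(a, OPPOSITE[label]) for a in antonyms.get(word, [])]
            for new_word, new_label in neighbours:
                # A word may enter both lists if reached with both polarities.
                if new_word not in lists[new_label]:
                    lists[new_label].add(new_word)
                    next_frontier.append((new_word, new_label))
        frontier = next_frontier
    return lists


lists = expand_seeds({"positive": ["good"], "negative": ["bad"]},
                     SYNONYMS, ANTONYMS)
```

In this toy graph “great” is reachable from both seeds, so it ends up in both lists, which is how an ambiguous word can land in both categories.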
WORD SENTIMENT CLASSIFIER
WORDNET (CONTD.)
Figure: An example of the relationship between hyponyms and hypernyms [source: Wikipedia]
WORD SENTIMENT CLASSIFIER (CONTD.)
Initial seed word list:
Adjectives (15 positive and 19 negative)
Verbs (23 positive and 21 negative)
Final word list (after expansion):
Adjectives (5,880 positive and 6,233 negative)
Verbs (2,840 positive and 3,239 negative)
Some words, e.g. “great” and “strong”, appear in both the positive and the negative lists.
WORD SENTIMENT CLASSIFIER (CONTD.)
Now we have a set of words, each with a class label (polarity) of either positive or negative.
How do we calculate the strength of the sentiment polarity?
For a new word w, we first compute its synonym set (syn_1, syn_2, …, syn_n) from WordNet. Then we compute arg max_c P(c|w), which is equivalent to arg max_c P(c|syn_1, syn_2, …, syn_n).
Here c is the sentiment category (positive or negative).
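WORD SENTIMENT CLASSIFIER (CONTD.)
There are two possible ways to calculate arg max_c P(c|w).
Approach 1
$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c)\,P(syn_1, syn_2, \ldots, syn_n \mid c) \\
&= \arg\max_c P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\mathrm{count}(f_k,\,\mathrm{synset}(w))}
\end{aligned}
$$
where f_k is the kth feature of category c (a word in c's list that also occurs in the synonym set of w), and count(f_k, synset(w)) is the total number of occurrences of f_k in the synonym set of w.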
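Approach 1 can be scored in log space to avoid numerical underflow. The Python sketch below treats each category's word list as its training "document" and the words of that list as the features f_k. Assumptions not stated on the slides: P(c) is estimated as the share of list words in c, P(f_k|c) uses add-one (Laplace) smoothing via `alpha`, and `CLASS_LISTS` is a hypothetical toy example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def classify_approach1(synset: List[str], class_lists: Dict[str, List[str]],
                       alpha: float = 1.0) -> Tuple[str, Dict[str, float]]:
    """log P(c) + sum_k count(f_k, synset(w)) * log P(f_k | c); needs alpha > 0."""
    total_words = sum(len(words) for words in class_lists.values())
    vocab = set().union(*class_lists.values())   # all list words act as features
    syn_counts = Counter(synset)                 # count(f_k, synset(w))
    scores: Dict[str, float] = {}
    for c, words in class_lists.items():
        list_counts = Counter(words)
        score = math.log(len(words) / total_words)  # log P(c), assumed estimate
        for f_k, n in syn_counts.items():
            if f_k not in vocab:
                continue  # not a feature of any category
            # Laplace-smoothed estimate of P(f_k | c): an assumption.
            p = (list_counts[f_k] + alpha) / (len(words) + alpha * len(vocab))
            score += n * math.log(p)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy lists; "great" sits in both, as some real words do.
CLASS_LISTS = {
    "positive": ["good", "nice", "amusing", "fun", "great"],
    "negative": ["bad", "awful", "blame", "great"],
}
label, scores = classify_approach1(["amusing", "fun", "diverting"], CLASS_LISTS)
```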
WORD SENTIMENT CLASSIFIER (CONTD.)
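There are two possible ways to calculate arg max_c P(c|w).
Approach 2
$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c) \prod_{i=1}^{n} \frac{\mathrm{count}(syn_i, c)}{\mathrm{count}(c)}
\end{aligned}
$$
where count(syn_i, c) is the number of occurrences of w's synonym syn_i in the word list of category c, and count(c) is the number of words in that list.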
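Approach 2 multiplies one ratio count(syn_i, c)/count(c) per synonym. The Python sketch below works in log space; with the default `alpha=0` it follows the formula directly, so a synonym missing from c's list sends that category's score to minus infinity. Assumptions (not from the slides): P(c) is the share of list words in c, synonyms that occur in no list are skipped, and `alpha > 0` switches on optional add-alpha smoothing. The word lists are a hypothetical toy example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def classify_approach2(synset: List[str], class_lists: Dict[str, List[str]],
                       alpha: float = 0.0) -> Tuple[str, Dict[str, float]]:
    """log P(c) + sum_i log(count(syn_i, c) / count(c))."""
    total_words = sum(len(words) for words in class_lists.values())
    vocab = set().union(*class_lists.values())
    scores: Dict[str, float] = {}
    for c, words in class_lists.items():
        counts = Counter(words)
        count_c = len(words)
        score = math.log(count_c / total_words)  # log P(c), assumed estimate
        for syn in synset:
            if syn not in vocab:
                continue  # assumption: ignore synonyms found in no list
            numerator = counts[syn] + alpha
            if numerator == 0:
                score = -math.inf  # a zero factor zeroes the whole product
                break
            score += math.log(numerator / (count_c + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical toy lists.
CLASS_LISTS = {
    "positive": ["good", "nice", "amusing", "fun", "great"],
    "negative": ["bad", "awful", "blame", "great"],
}
neg_label, _ = classify_approach2(["blame", "bad"], CLASS_LISTS)
pos_label, _ = classify_approach2(["amusing", "fun"], CLASS_LISTS)
```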
WORD SENTIMENT CLASSIFIER (CONTD.)
The word “amusing”, for example, is classified as carrying primarily positive sentiment, and “blame” as primarily negative.
“afraid”, with strength -0.99, represents strong negativity, while “abysmal”, with strength -0.61, represents weaker negativity.
SENTENCE SENTIMENT CLASSIFIER
Consists of four parts:
Identification of the topic in the sentence (by direct matching)
Identification of the opinion holder
Identification of the sentiment region
A model to combine the word sentiments
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
HOLDER IDENTIFICATION
Assumption: persons and organizations are the only opinion holders. For a sentence with more than one holder, pick the one closest to the Topic.
Method: the BBN IdentiFinder named entity tagger [2]
A software tool (http://www.bbn.com/technology/speech/identifinder)
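Topic matching and the closest-holder rule can be sketched as follows. The entity triples stand in for the output of a named entity tagger such as IdentiFinder; their format, the type strings `PERSON`/`ORGANIZATION`, the case-insensitive matching, and the token-distance measure are all assumptions for illustration, not IdentiFinder's actual interface.

```python
from typing import List, Optional, Tuple

HOLDER_TYPES = {"PERSON", "ORGANIZATION"}  # assumed entity type names


def find_topic(tokens: List[str], topic: List[str]) -> Optional[int]:
    """Index of the first direct (case-insensitive) match of the topic phrase."""
    n = len(topic)
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in topic]
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            return i
    return None


def closest_holder(entities: List[Tuple[str, int, str]],
                   topic_index: int) -> Optional[Tuple[str, int]]:
    """Pick the PERSON/ORGANIZATION entity nearest (in tokens) to the topic.

    entities: hypothetical (text, token_index, entity_type) triples.
    """
    candidates = [(text, idx) for text, idx, etype in entities
                  if etype in HOLDER_TYPES]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e[1] - topic_index))


tokens = "Senator Smith said the NRA opposes gun control laws".split()
entities = [("Senator Smith", 0, "PERSON"), ("NRA", 4, "ORGANIZATION")]
topic_idx = find_topic(tokens, ["gun", "control"])
holder = closest_holder(entities, topic_idx)
```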
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
SENTIMENT REGION IDENTIFICATION
Where should we look for the sentiment? Four candidate sentiment regions are proposed:
Window 1: the full sentence
Window 2: the words between the holder and the Topic
Window 3: Window 2 ± 2 words
Window 4: from Window 2 to the end of the sentence
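The four windows can be expressed as token slices. In this Python sketch the holder and the topic are each reduced to a single token index, and "between" excludes both endpoints; both are simplifying assumptions rather than details given on the slides.

```python
from typing import List


def sentiment_region(tokens: List[str], holder_idx: int, topic_idx: int,
                     window: int) -> List[str]:
    """Return the tokens of Window 1-4 for one holder/topic pair."""
    start, end = sorted((holder_idx, topic_idx))
    if window == 1:   # full sentence
        return tokens[:]
    if window == 2:   # words strictly between holder and topic
        return tokens[start + 1:end]
    if window == 3:   # Window 2 widened by 2 words on each side
        return tokens[max(0, start - 1):end + 2]
    if window == 4:   # from the start of Window 2 to the end of the sentence
        return tokens[start + 1:]
    raise ValueError("window must be 1, 2, 3 or 4")


tokens = "Senator Smith said that term limits would hurt democracy".split()
regions = {w: sentiment_region(tokens, holder_idx=1, topic_idx=4, window=w)
           for w in (1, 2, 3, 4)}
```

In this toy sentence only Window 1 and Window 4 reach the verb “hurt”.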
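SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
Three different models:
Model 0: product of the signs (positive or negative) of the sentiment words in the region
$$
\prod \left(\text{signs in region}\right)
$$
Model 1: arithmetic mean of the word sentiment strengths in the region
$$
P(c \mid s) = \frac{1}{n(c)} \sum_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$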
SENTENCE SENTIMENT CLASSIFIER (CONTD.)
CLASSIFICATION MODEL
Model 1 (contd.): n(c) is the number of words in the region whose sentiment category is c, and s is the sentence (its sentiment region).
Model 2: geometric-mean-style combination (a scaled product) of the sentiment in the region
$$
P(c \mid s) = 10^{\,n(c)-1} \times \prod_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j p(c_j \mid w_i) = c
$$
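The three combination rules can be sketched directly from the formulas. Each sentiment word in the region is represented by its word-level probabilities p(c|w_i) for the two categories, and its category is the argmax. Assumptions not on the slides: an empty region yields None (N/A), a category with n(c) = 0 scores 0, and ties go to “positive”.

```python
import math
from typing import Dict, List, Optional

CATEGORIES = ("positive", "negative")
WordProbs = Dict[str, float]  # p(c | w_i) for each category c


def category(probs: WordProbs) -> str:
    """argmax_j p(c_j | w_i); ties go to "positive"."""
    return max(CATEGORIES, key=lambda c: probs[c])


def model0(region: List[WordProbs]) -> Optional[str]:
    """Model 0: product of the signs of the sentiment words in the region."""
    if not region:
        return None  # no sentiment word: treat as N/A
    sign = 1
    for probs in region:
        sign *= 1 if category(probs) == "positive" else -1
    return "positive" if sign > 0 else "negative"


def model1(region: List[WordProbs]) -> Optional[str]:
    """Model 1: (1/n(c)) * sum of p(c|w_i) over words whose category is c."""
    if not region:
        return None
    scores = {}
    for c in CATEGORIES:
        members = [p[c] for p in region if category(p) == c]
        scores[c] = sum(members) / len(members) if members else 0.0
    return max(scores, key=scores.get)


def model2(region: List[WordProbs]) -> Optional[str]:
    """Model 2: 10^(n(c)-1) * product of p(c|w_i) over words whose category is c."""
    if not region:
        return None
    scores = {}
    for c in CATEGORIES:
        members = [p[c] for p in region if category(p) == c]
        scores[c] = 10 ** (len(members) - 1) * math.prod(members) if members else 0.0
    return max(scores, key=scores.get)


region = [
    {"positive": 0.6, "negative": 0.4},
    {"positive": 0.7, "negative": 0.3},
    {"positive": 0.1, "negative": 0.9},
]
results = (model0(region), model1(region), model2(region))
```

The toy region shows how the models can disagree: Model 0 turns negative because of the single negative sign, Model 1 prefers the stronger negative average, and the 10^(n(c)-1) factor in Model 2 favours the category with more words.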
EXPERIMENTAL ANALYSIS
Two sets of experiments, for the:
Word Sentiment Classifier
Sentence Sentiment Classifier
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Dataset: the word list from the TOEFL exam, and a predefined list containing 19,748 English adjectives and 8,011 English verbs
Take the intersection of these two lists, then randomly select 462 adjectives and 502 verbs.
Classification of the dataset: Human 1 and Human 2 label the adjectives; Human 2 and Human 3 label the verbs.
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Class labels: Positive, Negative and Neutral
Measurement types:
Strict: all three class labels are considered
Lenient: two class labels, Negative vs. Positive merged with Neutral
Table: Inter-Human Agreement
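Strict and lenient agreement can be computed as raw percentage agreement between two annotators. This is a hypothetical sketch: the slides do not say which agreement statistic the table reports, the label sequences are made up, and `LENIENT_MERGE` encodes one reading of the lenient setting (Neutral merged with Positive).

```python
from typing import Dict, List, Optional

# Assumed reading of the lenient setting: Neutral is merged into Positive.
LENIENT_MERGE = {"neutral": "positive"}


def agreement(labels_a: List[str], labels_b: List[str],
              merge: Optional[Dict[str, str]] = None) -> float:
    """Fraction of items on which two annotators give the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    merge = merge or {}
    same = sum(merge.get(a, a) == merge.get(b, b)
               for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


human1 = ["positive", "neutral", "negative", "neutral"]
human2 = ["positive", "positive", "negative", "negative"]
strict = agreement(human1, human2)                  # all three labels kept
lenient = agreement(human1, human2, LENIENT_MERGE)  # two labels
```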
EXPERIMENTAL ANALYSIS (CONTD.)
WORD SENTIMENT CLASSIFIER
Table: Human-Machine Agreement (Small Seed Set)
Table: Human-Machine Agreement (Larger Seed Set)
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Dataset: 100 sentences from the DUC 2001 corpus [3]; topics covered: “illegal alien”, “term limit”, “gun control” and “NAFTA”
Classification of sentences: two humans classify each sentence into one of three class labels: positive, negative and N/A.
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Experiment variants: three different models, four different windows, two different word classifier models, and manually annotated vs. automatically identified holders
In total, this gives 16 variants each for Model 1 and Model 2 (4 windows × 2 word classifiers × 2 holder settings) and 8 variants for Model 0 (4 windows × 2 holder settings).
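The variant counts follow from a Cartesian product of the experimental factors. In the sketch below the two word classifier models are assumed to be Approach 1 and Approach 2, and Model 0 is assumed not to vary the word classifier, which is what the count of 8 implies.

```python
from itertools import product

WINDOWS = [1, 2, 3, 4]
WORD_CLASSIFIERS = ["approach 1", "approach 2"]  # assumed identities
HOLDER_SETTINGS = ["manual", "automatic"]

variants = {
    # Models 1 and 2: every window x word classifier x holder setting.
    "model 1": list(product(WINDOWS, WORD_CLASSIFIERS, HOLDER_SETTINGS)),
    "model 2": list(product(WINDOWS, WORD_CLASSIFIERS, HOLDER_SETTINGS)),
    # Model 0 uses only signs; the word classifier is assumed not to be varied.
    "model 0": list(product(WINDOWS, HOLDER_SETTINGS)),
}
counts = {name: len(combos) for name, combos in variants.items()}
```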
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Table: Results with manually annotated Holder
Table: Results with automatic Holder
EXPERIMENTAL ANALYSIS (CONTD.)
SENTENCE SENTIMENT CLASSIFIER
Performance metric: correctness, i.e. correct identification of both the holder and the sentiment
Best model: Model 0; best window: Window 4
Accuracy: 81% with manually annotated holders; 67% with automatically identified holders
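Correctness counts a sentence as right only when both the holder and the sentiment match. A minimal sketch with made-up gold and predicted (holder, sentiment) pairs; the function name and data are hypothetical.

```python
from typing import List, Tuple

Judgement = Tuple[str, str]  # (holder, sentiment) for one sentence


def correctness(gold: List[Judgement], predicted: List[Judgement]) -> float:
    """Fraction of sentences whose holder AND sentiment both match the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two equally long, non-empty lists")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return hits / len(gold)


gold = [("NRA", "negative"), ("Smith", "positive"),
        ("Jones", "negative"), ("Lee", "positive")]
predicted = [("NRA", "negative"), ("Smith", "negative"),  # 2nd: wrong sentiment
             ("Jones", "negative"), ("Kim", "positive")]  # 4th: wrong holder
score = correctness(gold, predicted)
```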
SHORTCOMINGS
Only a unigram model is considered. As a result, the model fails for words that can carry both positive and negative sentiment, e.g.: Term limit really hit at democracy.
The model cannot infer sentiment from facts: the absence of adjective, verb and noun sentiment words prevents classification, e.g.: She thinks term limit will give women more opportunities in politics.
FUTURE WORK
One assumption of this work is that the topic is given. Can we extract the topic automatically? E.g. Twitter hashtags
Sentiment beyond just positive or negative
Context-dependent sentiment (bigram or trigram analysis)
REFERENCES
[1] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cosgi.princeton.edu/~wn.
[2] BBN IdentiFinder named entity tagger. http://www.bbn.com/technology/speech/identifinder
[3] DUC 2001 Corpus. http://www-nlpir.nist.gov/projects/duc/data.html