DETERMINING THE SENTIMENT OF OPINIONS

Presentation by Md Mustafizur Rahman (mr4xb)

OUTLINE

What is an Opinion?

Problem definition

Word Sentiment Classifier

Sentence Sentiment Classifier

Experimental Analysis

Shortcomings

Future work

WHAT IS AN OPINION?

An opinion is a quadruple [Topic, Holder, Claim, Sentiment].

The Holder believes a Claim about the Topic and in many cases associates a Sentiment with it.

An opinion may or may not contain sentiment.

e.g. "I believe the world is flat." (sentiment absent)

Sentiment can be explicit or implicit.

e.g. "I like apple." (explicit)

e.g. "We should decrease our dependence on oil." (implicit)

PROBLEM DEFINITION

Opinion = [Topic, Holder, Claim, Sentiment]

Given

A Topic and a set of texts about the topic

Find

The sentiment (positive or negative only) about the topic in each sentence

The people who hold that sentiment
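The quadruple above can be written down as a small record type. This is only a minimal sketch: the class name `Opinion`, the helper `with_polar_sentiment`, and the example topic and holder strings are illustrative assumptions, not part of the presented system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion quadruple: [Topic, Holder, Claim, Sentiment]."""
    topic: str
    holder: str
    claim: str
    sentiment: Optional[str]  # "positive", "negative", or None if absent


def with_polar_sentiment(opinions: List[Opinion]) -> List[Opinion]:
    """Keep only opinions whose sentiment is positive or negative."""
    return [o for o in opinions if o.sentiment in ("positive", "negative")]


# Illustrative values only; the topic and holder strings are made up.
examples = [
    Opinion("apples", "I", "I like apples", "positive"),
    Opinion("shape of the world", "I", "the world is flat", None),
]
polar = with_polar_sentiment(examples)
```

Only the first example survives the filter, which mirrors the restriction to positive or negative sentiment in the problem definition.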

AUTHORS' APPROACH

4 basic stages

Calculation of the polarity of sentiment-bearing words (Word Sentiment Classifier)

Selection of sentences containing both topic and holder

Holder-based region identification

Combination of these polarities to provide the sentence sentiment (Sentence Sentiment Classifier)

WORD SENTIMENT CLASSIFIER

To build a classifier we need training data.

How to generate training data for the word sentiment classifier?

Assemble a small number of seed words by hand.

The seed word list contains only positive- and negative-polarity words.

Then grow this list by adding synonyms and antonyms from WordNet [1].
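The list-growing step can be sketched as follows. The sketch assumes that synonyms inherit the polarity of the seed word and antonyms take the opposite polarity. Two small hand-written dictionaries stand in for WordNet lookups so that the example is self-contained; the function name `expand_seeds` and all toy entries are hypothetical.

```python
def expand_seeds(positive, negative, synonyms, antonyms, rounds=1):
    """Grow seed lists: synonyms keep polarity, antonyms flip it (assumption)."""
    pos, neg = set(positive), set(negative)
    for _ in range(rounds):
        new_pos, new_neg = set(pos), set(neg)
        for word in pos:
            new_pos.update(synonyms.get(word, ()))
            new_neg.update(antonyms.get(word, ()))
        for word in neg:
            new_neg.update(synonyms.get(word, ()))
            new_pos.update(antonyms.get(word, ()))
        pos, neg = new_pos, new_neg
    return pos, neg


# Toy thesaurus standing in for WordNet (made-up entries).
toy_synonyms = {"good": ["fine", "great"], "bad": ["poor", "great"]}
toy_antonyms = {"good": ["bad", "evil"], "bad": ["good"]}

pos, neg = expand_seeds({"good"}, {"bad"}, toy_synonyms, toy_antonyms)
```

With this toy data the word "great" lands in both lists, so an expansion of this kind does not by itself give each word a single polarity.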

WORD SENTIMENT CLASSIFIER: WORDNET

WORD SENTIMENT CLASSIFIER: WORDNET (CONTD.)

Figure: An example of the relationship between hyponyms and hypernyms [source: Wikipedia]

WORD SENTIMENT CLASSIFIER (CONTD.)

Initial Seed word list

Adjectives (15 positive and 19 negative)

Verbs (23 positive and 21 negative)

Final Seed word list

Adjectives (5880 positive and 6233 negative)

Verbs (2840 positive and 3239 negative)

Some words, e.g. “great” and “strong”, appear in both the positive and negative categories.


Now we have

A set of words

Each word has a class label (or polarity) of either positive or negative

How to calculate the strength of the sentiment polarity?

For a new word w, we first compute its synonym set (syn_1, syn_2, …, syn_n) from WordNet.

Then we compute arg max_c P(c|w), which is equivalent to arg max_c P(c | syn_1, syn_2, …, syn_n).

Here c is a sentiment category (positive or negative).

WORD SENTIMENT CLASSIFIER (CONTD.)

There are two possible ways to calculate arg max P(c|w)

Approach 1

$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c)\,P(\mathrm{syn}_1, \mathrm{syn}_2, \ldots, \mathrm{syn}_n \mid c) \\
&= \arg\max_c P(c) \prod_{k=1}^{m} P(f_k \mid c)^{\mathrm{count}(f_k,\ \mathrm{synset}(w))}
\end{aligned}
$$

where f_k is the k-th feature of category c, and count(f_k, synset(w)) is the total number of occurrences of f_k in the synonym set of w.
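A minimal sketch of Approach 1, i.e. arg max over c of P(c) times the product of P(f_k|c) raised to count(f_k, synset(w)). The slides do not say how P(f_k|c) is estimated, so the add-one smoothing over per-class word counts used here is an assumption, as are the function name `classify_approach1` and the toy counts. Scores are computed in log space to avoid underflow.

```python
import math
from collections import Counter


def classify_approach1(synset, class_lists, priors, alpha=1.0):
    """Return (best class, log scores) for a word given its synonym set."""
    vocab = set()
    for words in class_lists.values():
        vocab.update(words)
    syn_counts = Counter(synset)  # count(f_k, synset(w)) for every feature
    scores = {}
    for c, words in class_lists.items():
        total = sum(words.values())
        log_score = math.log(priors[c])
        for feature in vocab:
            n = syn_counts[feature]
            if n == 0:
                continue  # exponent 0 contributes a factor of 1
            # Smoothed estimate of P(f_k | c); the smoothing is an assumption.
            p = (words[feature] + alpha) / (total + alpha * len(vocab))
            log_score += n * math.log(p)
        scores[c] = log_score
    return max(scores, key=scores.get), scores


# Toy class lists with made-up counts.
class_lists = {
    "positive": Counter({"good": 3, "nice": 2, "fine": 1}),
    "negative": Counter({"bad": 3, "poor": 2, "fine": 1}),
}
priors = {"positive": 0.5, "negative": 0.5}
label, scores = classify_approach1(["nice", "good", "fine"], class_lists, priors)
```

Synonyms that appear in neither list are ignored, since they are not features of any category.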


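There are two possible ways to calculate arg max P(c|w)

Approach 2

$$
\begin{aligned}
\arg\max_c P(c \mid w) &= \arg\max_c P(c)\,P(w \mid c) \\
&= \arg\max_c P(c)\,\frac{\sum_{i=1}^{n} \mathrm{count}(\mathrm{syn}_i, c)}{\mathrm{count}(c)}
\end{aligned}
$$

where count(syn_i, c) is the count of occurrences of w's synonyms in the list of c.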
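Approach 2, i.e. P(c) multiplied by the sum of count(syn_i, c) divided by count(c), can be sketched with plain sets. Treating count(c) as the size of the word list for c is an assumption, and the function name `score_approach2` and the toy lists are hypothetical.

```python
def score_approach2(synonyms, class_lists, priors):
    """Score each class by the share of its list covered by w's synonyms."""
    scores = {}
    for c, words in class_lists.items():
        # count(syn_i, c): 1 if the synonym is in the list of c, else 0.
        hits = sum(1 for syn in synonyms if syn in words)
        # count(c) is taken to be the size of the list (assumption).
        scores[c] = priors[c] * hits / len(words)
    return scores


# Toy lists (made up); "great" sits in both on purpose.
toy_lists = {
    "positive": {"good", "nice", "fine", "great"},
    "negative": {"bad", "poor", "awful", "great"},
}
toy_priors = {"positive": 0.5, "negative": 0.5}
approach2_scores = score_approach2(["nice", "good", "agreeable"], toy_lists, toy_priors)
best_class = max(approach2_scores, key=approach2_scores.get)
```

Here two of the three synonyms are found in the positive list and none in the negative list, so the positive score is 0.5 × 2 / 4.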

WORD SENTIMENT CLASSIFIER (CONTD.)

The word “amusing”, for example, is classified as carrying primarily positive sentiment, and “blame” as primarily negative.

“afraid” with strength -0.99 represents strong negativity, while “abysmal” with strength -0.61 represents weaker negativity.

SENTENCE SENTIMENT CLASSIFIER

Consists of 4 parts:

Identification of Topic in the sentence (i.e. direct matching)

Identification of opinion holder

Identification of region

Development of model to combine sentiments


SENTENCE SENTIMENT CLASSIFIER (CONTD.): HOLDER IDENTIFICATION

Assumptions

Persons and organizations are the only opinion holders.

For a sentence with more than one holder, just pick the one closest to the Topic.

Method

BBN named entity tagger IdentiFinder [2]

A software tool [http://www.bbn.com/technology/speech/identifinder]
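The "closest holder" rule can be sketched as below. It assumes that a named entity tagger has already returned candidate holders with token positions and that distance is measured in tokens; the function name `closest_holder` and the example entities are hypothetical.

```python
def closest_holder(candidates, topic_idx):
    """Pick the (name, token_index) candidate nearest to the topic position."""
    return min(candidates, key=lambda cand: abs(cand[1] - topic_idx))


# Made-up candidates: two tagged entities and a topic at token 7.
holder = closest_holder([("Smith", 0), ("NRA", 9)], topic_idx=7)
```

On a tie, `min` keeps the first candidate in the list; the slides do not say how ties are resolved.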

SENTENCE SENTIMENT CLASSIFIER (CONTD.): SENTIMENT REGION IDENTIFICATION

Where to look for the sentiment?

Different sentiment regions are proposed:

Window 1: Full sentence

Window 2: Words between Holder and Topic

Window 3: Window 2 ± 2 words

Window 4: Window 2 to the end of the sentence
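The four windows can be sketched as slices over a tokenized sentence. The sketch assumes that Holder and Topic are single tokens and that Window 2 excludes both of them; the slides do not state either detail. The function name `sentiment_windows` and the example sentence are made up.

```python
def sentiment_windows(tokens, holder_idx, topic_idx):
    """Return the four candidate sentiment regions as token lists."""
    first, last = sorted((holder_idx, topic_idx))
    start, end = first + 1, last  # half-open span strictly between the two
    n = len(tokens)
    return {
        1: tokens[:],                                   # full sentence
        2: tokens[start:end],                           # between Holder and Topic
        3: tokens[max(0, start - 2):min(n, end + 2)],   # Window 2 +/- 2 words
        4: tokens[start:],                              # Window 2 to sentence end
    }


tokens = ["Smith", "says", "he", "hates", "NAFTA",
          "and", "its", "terrible", "effects"]
windows = sentiment_windows(tokens, holder_idx=0, topic_idx=4)
```

In this example only Window 1 and Window 4 reach the word "terrible", which shows how the choice of region changes which sentiment words are seen.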

SENTENCE SENTIMENT CLASSIFIER (CONTD.): CLASSIFICATION MODEL

3 different models

Model 0: Product of the signs in the region (signs can be positive or negative)

$$
\prod (\text{signs in region})
$$

Model 1: Harmonic mean of the sentiment in the region

$$
P(c \mid s) = \frac{1}{n(c)} \sum_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j\, p(c_j \mid w_i) = c
$$

SENTENCE SENTIMENT CLASSIFIER (CONTD.): CLASSIFICATION MODEL

Model 1 (Contd.)

n(c) is the number of words in the region whose sentiment category is c.

s is the sentiment strength.

Model 2: Geometric mean of the sentiment in the region

$$
P(c \mid s) = 10^{\,n(c)-1} \times \prod_{i=1}^{n} p(c \mid w_i), \quad \text{if } \arg\max_j\, p(c_j \mid w_i) = c
$$
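The three models can be compared on a toy region. Each word is given as a (category, strength) pair, where the category is arg max_j p(c_j|w_i) and the strength is p(c|w_i) for that category. Model 0 multiplies signs, Model 1 averages the strengths of the words in category c (dividing by n(c)), and Model 2 multiplies them and scales by 10^(n(c)-1). Picking the category with the larger score is an assumption of this sketch, and all function names and numbers are made up.

```python
import math


def model0(words):
    """Product of signs: +1 for a positive word, -1 for a negative word."""
    sign = 1
    for category, _ in words:
        sign *= 1 if category == "positive" else -1
    return "positive" if sign > 0 else "negative"


def model1_score(words, c):
    """(1 / n(c)) * sum of p(c | w_i) over words whose category is c."""
    strengths = [p for category, p in words if category == c]
    return sum(strengths) / len(strengths) if strengths else 0.0


def model2_score(words, c):
    """10^(n(c) - 1) * product of p(c | w_i) over words whose category is c."""
    strengths = [p for category, p in words if category == c]
    if not strengths:
        return 0.0
    return 10 ** (len(strengths) - 1) * math.prod(strengths)


def classify(words, score):
    """Pick the category with the larger score (decision rule is assumed)."""
    return max(("positive", "negative"), key=lambda c: score(words, c))


# Toy region: two negative words and one positive word (made-up strengths).
region = [("negative", 0.9), ("negative", 0.6), ("positive", 0.7)]
```

On this region Model 0 returns positive, because two negative signs cancel, while Models 1 and 2 both return negative, so the models can disagree on the same words.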

SYSTEM ARCHITECTURE


EXPERIMENTAL ANALYSIS

Two sets of experiments for

Word Sentiment Classifier

Sentence Sentiment Classifier

EXPERIMENTAL ANALYSIS (CONTD.): WORD SENTIMENT CLASSIFIER

Dataset

Word list from the TOEFL exam

A predefined list containing 19748 English adjectives and 8011 English verbs

Take the intersection of the above two lists.

Finally, randomly select 462 adjectives and 502 verbs.

Classification of dataset

Human 1 and Human 2: label adjectives

Human 2 and Human 3: label verbs


Class Labels

Positive, Negative and Neutral

Measurement Type

Strict: considers all three class labels

Lenient: two class labels, Negative and Positive merged with Neutral

Table: Inter-Human Agreement
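The two agreement measures can be sketched as below, reading the lenient measure as Negative versus Positive-merged-with-Neutral. The function name `agreement` and the label sequences are made up for illustration and are not the experimental data.

```python
def agreement(labels_a, labels_b, lenient=False):
    """Fraction of items on which two annotators agree."""
    def norm(label):
        # Lenient: Neutral is merged into Positive, leaving two classes.
        return "positive" if lenient and label == "neutral" else label

    matches = sum(norm(a) == norm(b) for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Made-up labels from two annotators for four words.
annotator_1 = ["positive", "neutral", "negative", "neutral"]
annotator_2 = ["positive", "positive", "negative", "negative"]

strict = agreement(annotator_1, annotator_2)
lenient = agreement(annotator_1, annotator_2, lenient=True)
```

Lenient agreement can never be lower than strict agreement, because merging classes only removes possible disagreements.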


Table: Human-Machine Agreement (Small Seed Set)

Table: Human-Machine Agreement (Larger Seed Set)


EXPERIMENTAL ANALYSIS (CONTD.): SENTENCE SENTIMENT CLASSIFIER

Dataset

100 sentences from the DUC 2001 Corpus [3]

Topics covered: “illegal alien”, “term limit”, “gun control” and “NAFTA”

Classification of sentences

Two humans classify each sentence into three class labels: positive, negative and N/A.


Experiment Variants

Three different models

Four different windows

Two different word classifier models

Manually annotated holder vs. automatic holder

So in total there are 16 different variants each for Model 1 and Model 2, and 8 different variants for Model 0.
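The variant counts are consistent with a simple product of the options: 4 windows × 2 word classifiers × 2 holder settings = 16. The count of 8 for Model 0 matches 4 × 2, which would follow if Model 0, using only signs, does not depend on the word classifier; that reading is an assumption. The labels below are illustrative.

```python
from itertools import product

windows = [1, 2, 3, 4]
word_classifiers = ["approach 1", "approach 2"]
holders = ["manual", "automatic"]

# Models 1 and 2 vary over window, word classifier and holder source.
variants_model_1_or_2 = list(product(windows, word_classifiers, holders))

# Model 0 is assumed not to vary with the word classifier.
variants_model_0 = list(product(windows, holders))
```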


Table: Results with manually annotated Holder

Table: Results with automatic Holder



Performance Metric: Correctness

Correct identification of both holder and sentiment

Best model: Model 0

Best window: Window 4

Accuracy

81% accuracy obtained with manually annotated holders

67% accuracy obtained with automatic holders

SHORTCOMINGS

Considers only a unigram model.

As a result, the model fails for words that carry both positive and negative sentiment.

E.g.: "Term limit really hit at democracy."

The model cannot infer sentiment from facts.

The absence of sentiment-bearing adjectives, verbs and nouns prevents classification.

E.g.: "She thinks term limit will give women more opportunities in politics."

FUTURE WORK

One assumption of this work is that the topic is given.

Can we extract the topic automatically? E.g. Twitter hashtags?

Not only positive or negative sentiment

Context-dependent sentiment (bigram or trigram analysis)

REFERENCES

[1] Miller, G.A., R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. 1993. Introduction to WordNet: An On-Line Lexical Database. http://www.cosgi.princeton.edu/~wn.

[2] BBN named entity tagger IdentiFinder. http://www.bbn.com/technology/speech/identifinder

[3] DUC 2001 Corpus. http://www-nlpir.nist.gov/projects/duc/data.html
