A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts

Cataldo Musto, Giovanni Semeraro, Marco Polignano

(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

DART 2014, 8th International Workshop on Information Filtering and Retrieval, Pisa (Italy)

December 10, 2014

Outline

• Background
• Sentiment Analysis
• Lexicon-based approaches
• Methodology
• State-of-the-art lexicons
• Experiments
• Conclusions

Cataldo Musto, Giovanni Semeraro, Marco Polignano. A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa (Italy), 10.12.2014

Background: One minute on the Web

Background: Information Overload

Obstacle or Opportunity?

Opportunities: (Social) Content Analytics

Insight: aggregate raw human-generated data to obtain valuable people-based findings.

Social Content Analytics: Applications

- Online brand monitoring
- Social CRM
- Real-time polls

All these applications share a common denominator: they all need a methodology to automatically associate an opinion and/or a polarity with each piece of content.

Solution: Sentiment Analysis

Sentiment Analysis: Definition

“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)

(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.

We will focus on the polarity detection task.

Sentiment Analysis: State of the art

- Supervised approaches (machine learning-based)
- Unsupervised approaches (lexicon-based)

Sentiment Analysis: Supervised approaches

Learn a classification model relying on labeled examples.

Sentiment Analysis: Unsupervised approaches

Rely on external lexical resources that associate a polarity score with each term.

The sentiment of the content depends on the sentiment of the terms that compose it.

e.g. joy: +++
     frustration: - -

Sentiment Analysis: Supervised vs Unsupervised

Approach       Pros                                               Cons
Supervised     Higher accuracy (*) (**)                           Pre-labeled examples
Unsupervised   No training; several lexical resources available   Accuracy depends on lexical resources

(*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013.
(**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014.

We focus on lexicon-based approaches.

Contributions

1. We provide a comparison of lexical resources for sentiment analysis of microblog posts.
2. We propose a novel unsupervised lexicon-based approach for sentiment analysis.

Methodology: Lexicon-based approach

Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases that compose it.

A microphrase is built whenever a splitting cue is found in the text. Conjunctions, adverbs and punctuation marks are used as splitting cues.

Example: “I don’t like this food, it’s terrible” is split at the comma (the splitting cue) into two microphrases, m1 = “I don’t like this food” and m2 = “it’s terrible”.
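The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the slides name the categories of splitting cues (conjunctions, adverbs, punctuation) but not the actual words, so the `SPLITTING_CUES` list and the function name `split_microphrases` are assumptions.

```python
import re

# Hypothetical cue words: the slides only name the categories, not the words.
SPLITTING_CUES = {"and", "but", "or", "however", "although"}
PUNCTUATION = ",.;:!?"


def split_microphrases(text):
    """Split a text into microphrases (lists of tokens) at every splitting cue."""
    # Tokens are either runs of non-space, non-punctuation characters
    # or single punctuation marks.
    tokens = re.findall(r"[^\s,.;:!?]+|[,.;:!?]", text)
    microphrases, current = [], []
    for token in tokens:
        if token in PUNCTUATION or token.lower() in SPLITTING_CUES:
            # A cue closes the current microphrase; the cue itself is dropped.
            if current:
                microphrases.append(current)
                current = []
        else:
            current.append(token)
    if current:
        microphrases.append(current)
    return microphrases


example = split_microphrases("I don't like this food, it's terrible")
```

On the example sentence from the slide this yields two microphrases, one on each side of the comma.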

Methodology

Formally, a Tweet T is a set of microphrases, T = {m_1 … m_k}, and its polarity is the sum of their polarities:

\[ \mathrm{pol}(T) = \sum_{i=1}^{k} \mathrm{pol}(m_i) \]

The polarity of a microphrase depends on the polarity of the terms that compose it. Given a microphrase m_i = {t_1 … t_n}:

\[ \mathrm{pol}(m_i) = \sum_{j=1}^{n} \mathrm{score}(t_j) \]

Methodology: Four variants proposed

In all four variants the polarity of the Tweet is the sum of the polarities of its microphrases:

\[ \mathrm{pol}(T) = \sum_{i=1}^{k} \mathrm{pol}(m_i) \]

The variants differ in how pol(m_i) is computed.

Basic:

\[ \mathrm{pol}(m_i) = \sum_{j=1}^{n} \mathrm{score}(t_j) \]

Normalized: the score of each microphrase is normalized according to its length.

\[ \mathrm{pol}(m_i) = \sum_{j=1}^{n} \frac{\mathrm{score}(t_j)}{|m_i|} \]

Emphasized: specific categories are given a higher weight. The categories are adverbs, verbs, adjectives and valence shifters (intensifiers and downtoners). Several weights have been evaluated.

\[ \mathrm{pol}(m_i) = \sum_{j=1}^{n} \mathrm{score}(t_j) \cdot w(t_j) \]

Normalized-Emphasized: combination of the two.

\[ \mathrm{pol}(m_i) = \sum_{j=1}^{n} \frac{\mathrm{score}(t_j) \cdot w(t_j)}{|m_i|} \]
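The four variants differ only in two switches (normalization and emphasis), which the following Python sketch makes explicit. It is an illustration under stated assumptions: the weight value `EMPHASIS_WEIGHT`, the category names, the toy lexicon and all function names are hypothetical, since the slides do not report the weights that were evaluated.

```python
# Hypothetical settings: the slides say several weights were evaluated
# but do not give their values.
EMPHASIS_WEIGHT = 1.5
EMPHASIZED_CATEGORIES = {"adverb", "verb", "adjective", "valence_shifter"}


def weight(category):
    """w(t_j): higher weight for emphasized categories, 1.0 otherwise."""
    return EMPHASIS_WEIGHT if category in EMPHASIZED_CATEGORIES else 1.0


def pol_microphrase(microphrase, score, normalize=False, emphasize=False):
    """pol(m_i) for a microphrase given as a list of (term, category) pairs."""
    total = 0.0
    for term, category in microphrase:
        term_score = score(term)
        if emphasize:
            term_score *= weight(category)
        total += term_score
    if normalize and microphrase:
        total /= len(microphrase)  # |m_i| is the number of terms
    return total


def pol_tweet(microphrases, score, normalize=False, emphasize=False):
    """pol(T): sum of the polarities of the microphrases."""
    return sum(pol_microphrase(m, score, normalize, emphasize) for m in microphrases)


TOY_LEXICON = {"like": 0.5, "terrible": -1.0}  # made-up scores


def toy_score(term):
    """score(t_j) from the toy lexicon; unknown terms count as 0."""
    return TOY_LEXICON.get(term, 0.0)


tweet = [[("like", "verb"), ("food", "noun")], [("terrible", "adjective")]]
basic = pol_tweet(tweet, toy_score)
normalized = pol_tweet(tweet, toy_score, normalize=True)
emphasized = pol_tweet(tweet, toy_score, emphasize=True)
norm_emph = pol_tweet(tweet, toy_score, normalize=True, emphasize=True)
```

Passing `score` as a function keeps the variants independent of the lexical resource that supplies score(t_j).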

Methodology: We have a problem

All four variants rely on score(t_j). How do we calculate score(t_j)?

Solution: lexical resources

Lexical Resources: State of the art

We evaluated four state-of-the-art resources for sentiment analysis:

- SentiWordNet: http://sentiwordnet.isti.cnr.it
- WordNet Affect: http://wndomains.fbk.eu/wnaffect.html
- SenticNet: http://sentic.net
- MPQA: http://mpqa.cs.pitt.edu

Lexical Resources: SentiWordNet (*)

Each WordNet synset is provided with three different sentiment scores (positivity, negativity, objectivity).

(*) Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.

Lexical Resources: WordNet Affect (*)

WordNet extension: affect-related synsets are mapped to an A-Label.

e.g. euphoria -> positive-emotion
     illness -> physical state

(*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.

Lexical Resources: SenticNet (*)

Inspired by the Hourglass of Emotions model.

Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness).

The activation level of each dimension defines 16 basic emotions.

According to the triggered emotions, each term is provided with an aggregated polarity score.

SenticNet also models a sentiment score for some bigrams and trigrams.

(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-eighth AAAI conference on artificial intelligence. 2014.

Lexical Resources: MPQA (*)

Each term is (manually) provided with a discrete sentiment score:

+1 positive
 0 neutral
-1 negative

(*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the conference on human language technology and empirical methods in natural language processing. Association for Computational Linguistics, 2005.

Lexical Resources: Comparison

Resource         Coverage (terms)
SentiWordNet     117,659
WordNet Affect   200
SenticNet        14,000
MPQA             8,222

Lexical Resources: Score calculation (SentiWordNet)

Given a term t_j, score(t_j) is the mean of the sentiment scores of all the possible synsets of t_j.

\[ \mathrm{score}(good) = \frac{0.75 + 0 + 1 + 1}{4} \approx 0.687 \]

score(benevolence) =
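The mean over synsets can be checked with a short Python sketch. The function name `mean_synset_score` is hypothetical, and returning 0.0 for a term with no synsets is an assumption; the four input values are the synset scores shown on the slide for "good".

```python
def mean_synset_score(synset_scores):
    """score(t_j) as the mean of the sentiment scores of the term's synsets.

    Returning 0.0 for a term without synsets is an assumption of this sketch.
    """
    if not synset_scores:
        return 0.0
    return sum(synset_scores) / len(synset_scores)


# Synset scores taken from the slide's worked example.
score_good = mean_synset_score([0.75, 0, 1, 1])
```

The exact mean is 0.6875, which the slide reports as 0.687.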

Lexical Resources: Score calculation (WordNet Affect)

Given a term t_j, the WordNet Affect hierarchy is climbed until an A-Label that occurs in SentiWordNet is found. t_j inherits the sentiment score of that A-Label.

score(good) = 0.339
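The hierarchy climb can be sketched as a walk over parent links. Everything in this Python example is a toy assumption: the dictionaries, the labels in them, the score 0.5 and the function name `wordnet_affect_score` are invented for illustration and do not reproduce the real WordNet Affect data or any library API.

```python
def wordnet_affect_score(term, a_label_of, parent_of, label_scores, default=0.0):
    """Climb the A-Label hierarchy until a label with a known score is found.

    a_label_of:   term -> its A-Label
    parent_of:    A-Label -> parent A-Label
    label_scores: A-Label -> sentiment score (standing in for SentiWordNet)
    """
    label = a_label_of.get(term)
    visited = set()  # guards against cycles in malformed toy data
    while label is not None and label not in visited:
        if label in label_scores:
            return label_scores[label]
        visited.add(label)
        label = parent_of.get(label)
    return default  # assumption: terms without a scored ancestor get 0.0


# Toy hierarchy, invented for this example.
A_LABEL_OF = {"euphoria": "elation"}
PARENT_OF = {"elation": "joy", "joy": "positive-emotion"}
LABEL_SCORES = {"joy": 0.5}

inherited = wordnet_affect_score("euphoria", A_LABEL_OF, PARENT_OF, LABEL_SCORES)
```

Here "elation" has no score of its own, so the term inherits the score of the first scored ancestor, "joy".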

Lexical Resources: Score calculation (SenticNet)

Given a term t_j, the SenticNet APIs are queried and the sentiment score is extracted.

score(good) = 0.883

Lexical Resources: Score calculation (MPQA)

Given a term t_j, the MPQA Lexicon is queried and the sentiment score is extracted.

score(good) = 1

Experimental Evaluation: Research Hypothesis

1. How do the different versions of the algorithm perform on state-of-the-art datasets?
2. What is the best lexical resource to detect the polarity of microblog posts?
Experimental Evaluation: Description of the datasets

• SemEval-2013
  • 14,435 Tweets
  • 8,180 training
  • 3,255 test
  • Positive, Negative, Neutral
• STS Dataset
  • 1,600,000 Tweets
  • only 359 test
  • Positive, Negative
Experimental Evaluation: Statistics about Coverage

Lexicon           SemEval-2013-Test   STS-Test
Vocabulary Size   18,309              6,711
SentiWordNet      4,314               883
WordNet-Affect    149                 48
MPQA              897                 224
SenticNet         1,497               326
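The coverage counts are easier to compare as fractions of the vocabulary. The Python sketch below only divides the numbers from the table; the dictionary layout and the function name `coverage_ratio` are choices made for this example.

```python
# Counts copied from the coverage table.
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}


def coverage_ratio(lexicon, dataset):
    """Fraction of the dataset vocabulary that the lexicon covers."""
    return COVERED[lexicon][dataset] / VOCABULARY[dataset]


for lexicon in COVERED:
    row = ", ".join(f"{d}: {coverage_ratio(lexicon, d):.1%}" for d in VOCABULARY)
    print(f"{lexicon}: {row}")
```

SentiWordNet covers a bit less than a quarter of the SemEval-2013 test vocabulary, while WordNet-Affect stays below 1% on both datasets.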


Experiment 1: Intra-Lexicons evaluation

SemEval :: SentiWordNet (accuracy, %)

Configuration   Accuracy
Basic           57.67
Normalized      58.10
Emphasized      58.65
Norm-Emph       58.99

Emphasis and Normalization improve the accuracy. The gap between norm and norm+emph is significant (p < 0.0001).


Experiment 1: SemEval :: WordNet Affect (accuracy, %)

Configuration   Accuracy
Basic           53.92
Normalized      55.05
Emphasized      53.95
Norm-Emph       55.08

Emphasis and Normalization improve the accuracy. The gaps are not significant.


Experiment 1: SemEval :: MPQA (accuracy, %)

Configuration   Accuracy
Basic           58.03
Normalized      57.97
Emphasized      58.25
Norm-Emph       58.10

Emphasis improves the accuracy. Normalization doesn’t. The gaps are not significant.


Experiment 1: SemEval :: SenticNet (accuracy, %)

Configuration   Accuracy
Basic           48.69
Normalized      47.25
Emphasized      48.29
Norm-Emph       48.08

No improvement. The gap between norm and norm+emph is significant (p < 0.0001).


Experiment 1: General Outcomes (SemEval; SentiWordNet, WordNet Affect, MPQA, SenticNet)

1. Emphasis leads to improvements (7 out of 8 comparisons).
2. Normalization doesn’t (1 out of 4 comparisons).


Experiment 1: STS :: SentiWordNet (accuracy, %)

Configuration   Accuracy
Basic           71.87
Normalized      72.42
Emphasized      71.31
Norm-Emph       71.59

Normalization improves the accuracy. Emphasis doesn’t. The gaps are not significant.


Experiment 1: STS :: WordNet Affect (accuracy, %)

Configuration   Accuracy
Basic           62.95
Normalized      62.67
Emphasized      62.96
Norm-Emph       62.95

Emphasis improves the accuracy. Normalization doesn’t. The gaps are not significant.

Experiment 1: STS :: MPQA (accuracy, %)

Configuration   Accuracy
Basic           69.54
Normalized      70.75
Emphasized      69.92
Norm-Emph       70.76

Both Emphasis and Normalization improve the accuracy. The gaps are not significant.

Experiment 1: STS :: SenticNet (accuracy, %)

Configuration   Accuracy
Basic           74.37
Normalized      74.65
Emphasized      73.82
Norm-Emph       74.65

Normalization improves the accuracy. Emphasis doesn’t. The gaps are not significant.

Experiment 1: General Outcomes (STS; SentiWordNet, WordNet Affect, MPQA, SenticNet)

1. Controversial behavior (normalization typically improves, emphasis doesn’t).
2. Little statistical significance (small dataset).


Experiment 2: Inter-Lexicons evaluation

Comparison between lexicons (accuracy, %, best configuration of each lexicon)

Lexicon          SemEval-2013   STS
SentiWordNet     58.99          72.42
SenticNet        48.69          74.65
WordNet-Affect   55.08          62.96
MPQA             58.25          70.76

- SentiWordNet is the best-performing configuration on SemEval data.
- MPQA performs well on SemEval data.
- SenticNet has a controversial behavior: worst on SemEval, best on STS. Reason: SenticNet can hardly classify neutral Tweets (threshold learning?).
- SentiWordNet and MPQA confirm their performance on STS.
- Poor coverage negatively influences WordNet-Affect performance.

Statistical Analysis: the chart marks the best lexicon on each dataset and labels the other gaps as significant or not significant (p < 0.11, p < 0.0001, p < 0.42, p < 0.0001, p < 0.001, p < 0.50).

Conclusions: the chart highlights the best-performing lexicons.
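The difficulty with neutral Tweets comes from the last step of any lexicon-based classifier, where a real-valued polarity has to be turned into a label. The Python sketch below shows one simple way to do that with a symmetric threshold. It is an assumption for illustration: the slides mention a classification threshold as a parameter but give neither its value nor its exact form, so `threshold=0.1` and the function name `classify` are hypothetical.

```python
def classify(polarity, threshold=0.1):
    """Map pol(T) to a label; scores within [-threshold, threshold] are neutral.

    The symmetric band and its default width are assumptions of this sketch.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


labels = [classify(p) for p in (0.5, -0.5, 0.05)]
```

With a lexicon whose scores rarely fall close to zero, few Tweets land inside the neutral band, which is one way the neutral class can end up under-predicted on a three-class dataset such as SemEval-2013.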

Conclusions: Lessons Learned

Investigation of the effectiveness of lexical resources in polarity classification of microblog posts

1. Comparison of 4 state-of-the-art resources: SentiWordNet, SenticNet, MPQA, WordNet Affect.
2. Evaluation. Research question: what is the impact of each lexical resource on the task of polarity classification?
   - MPQA and SentiWordNet typically outperform the other resources (an interesting result, given the smaller coverage of MPQA).
   - SenticNet's behavior deserves deeper investigation.

Future Research

- Evaluation against different datasets and with more lexical resources.
- Better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources.
- Integration of the algorithm into a recommendation framework to exploit sentiment-based information to model user interests.


Questions?

Cataldo Musto, Ph.D

[email protected]
