35
Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium 2014 Associate Professor Co-founder & Chief Scientist Wednesday, March 5, 14

Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

  • Upload
    others

  • View
    9

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

Practical Sentiment Analysis Tutorial

Jason Baldridge @jasonbaldridge

Sentiment Analysis Symposium 2014

Associate Professor Co-founder & Chief Scientist

Wednesday, March 5, 14

Page 2: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Polarity classification [slide from Lillian Lee]

Consider just classifying an avowedly subjective text unit as either positive or negative (“thumbs up or “thumbs down”).

One application: review summarization.

Elvis Mitchell, May 12, 2000: It may be a bit early to make such judgments, but Battlefield Earth may well turn out to be the worst movie of this century.

Can’t we just look for words like “great”, “terrible”, “worst”?

Yes, but ... learning a sufficient set of such words or phrases is an active challenge.

17

Wednesday, March 5, 14

Page 3: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Using a lexicon [slide from Lillian Lee]

From a small scale human study:

18

Proposed word lists Accuracy

Subject 1

Positive: dazzling, brilliant, phenomenal, excellent, fantasticNegative: suck, terrible, awful, unwatchable, hideous 58%

Subject 2

Positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting Negative: bad, cliched, sucks, boring, stupid, slow

64%

Automatically determined (from data)

Positive: love, wonderful, best, great, superb, beautiful, still Negative: bad, worst, stupid, waste, boring, ?, !

69%

Wednesday, March 5, 14

Page 4: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Polarity words are not enough [slide from Lillian Lee]

Can’t we just look for words like “great” or “terrible”?

Yes, but ...

This laptop is a great deal.

A great deal of media attention surrounded the release of the new laptop.

This laptop is a great deal ... and I’ve got a nice bridge you might be interested in.

19

Wednesday, March 5, 14

Page 5: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Polarity words are not enough

Polarity flippers: some words change positive expressions into negative ones and vice versa.

Negation: America still needs to be focused on job creation. Not among Obama's great accomplishments since coming to office !! [From a tweet in 2010]

Contrastive discourse connectives: I used to HATE it. But this stuff is yummmmmy :) [From a tweet in 2011 -- the tweeter had already bolded “HATE” and “But”!]

Multiword expressions: other words in context can make a negative word positive:

That movie was shit. [negative]

That movie was the shit. [positive] (American slang from the 1990’s)

20

Wednesday, March 5, 14

Page 6: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

More subtle sentiment (from Pang and Lee)

With many texts, no ostensibly negative words occur, yet they indicate strong negative polarity.

“If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.” (review by Luca Turin and Tania Sanchez of the Givenchy perfume Amarige, in Perfumes: The Guide, Viking 2008.)

“She runs the gamut of emotions from A to B.” (Dorothy Parker, speaking about Katharine Hepburn.)

“Jane Austen’s books madden me so that I can’t conceal my frenzy from the reader. Every time I read ‘Pride and Prejudice’ I want to dig her up and beat her over the skull with her own shin-bone.” (Mark Twain.)

21

Wednesday, March 5, 14

Page 7: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Thwarted expectations (from Pang and Lee)

22

This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.

There are also highly negative texts that use lots of positive words, but ultimately are reversed by the final sentence. For example

This is referred to as a thwarted expectations narrative because in the final sentence the author sets up a deliberate contrast to the preceding discourse, giving it more impact.

Wednesday, March 5, 14

Page 8: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Polarity classification: it’s more than positive and negative

Positive: “As a used vehicle, the Ford Focus represents a solid pick.”

Negative: “Still, the Focus' interior doesn't quite measure up to those offered by some of its competitors, both in terms of materials quality and design aesthetic.”

Neutral: “The Ford Focus has been Ford's entry-level car since the start of the new millennium.”

Mixed: “The current Focus has much to offer in the area of value, if not refinement.”

23

http://www.edmunds.com/ford/focus/review.html

Wednesday, March 5, 14

Page 9: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Other dimensions of sentiment analysis

Subjectivity: is an opinion even being expressed? Many statements are simply factual.

Target: what exactly is an opinion being expressed about?

Important for aggregating interesting and meaningful statistics about sentiment.

Also, it affects how the language use indicates polarity: e.g, unpredictable is usually positive for movie reviews, but is very negative for a car’s steering

Ratings: rather than a binary decision, it is often of interest to provide or interpret predictions about sentiment on a scale, such as a 5-star system.

24

Wednesday, March 5, 14

Page 10: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Other dimensions of sentiment analysis

Perspective: an opinion can be positive or negative depending on who is saying it

entry-level could be good or bad for different people

it also affects how an author describes a topic: e.g. pro-choice vs pro-life, affordable health care vs obamacare.

Authority: was the text written by someone whose opinion matters more than others?

it is more important to identify and address negative sentiment expressed by a popular blogger than a one-off commenter or supplier of a product reviewer on a sales site

follower graphs (where applicable) are very useful in this regard

Spam: is the text even valid or at least something of interest?

many tweets and blog post comments are just spammers trying to drive traffic to their sites

25

Wednesday, March 5, 14

Page 11: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Sentiment lexicons: Bing Liu’s opinion lexicon

Bing Liu maintains and freely distributes a sentiment lexicon consisting of lists of strings.

Distribution page (direct link to rar archive)

Positive words: 2006

Negative words: 4783

Useful properties: includes mis-spellings, morphological variants, slang, and social-media mark-up

Note: may be used for academic and commercial purposes.

39Slide by Chris Potts

Wednesday, March 5, 14

Page 12: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Sentiment lexicons: MPQA

The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon is maintained by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (Wiebe, Wilson, and Cardie 2005).

Note: distributed under a GNU Public License (not suitable for most commercial uses).

40Slide by Chris Potts

Wednesday, March 5, 14

Page 13: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Other sentiment lexicons

SentiWordNet (Baccianella, Esuli, and Sebastiani 2010) attaches positive and negative real-valued sentiment scores to WordNet synsets (Fellbaum1998).

Note: recently changed license to permissive, commercial-friendly terms.

Harvard General Inquirer is a lexicon attaching syntactic, semantic, and pragmatic information to part-of-speech tagged words (Stone, Dunphry, Smith, and Ogilvie 1966).

Linguistic Inquiry and Word Counts (LIWC) is a proprietary database consisting of a lot of categorized regular expressions. Its classifications are highly correlated with those of the Harvard General Inquirer.

41Slide by Chris Potts

Wednesday, March 5, 14

Page 14: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

When you have a big lexicon, use it!

42

Debate08 HCR STS IMDB

Super simple

Small lexicon

Opinion lexicon

20.5 21.6 19.4 27.4

21.5 22.1 25.5 51.4

47.8 42.3 62.0 73.6

Using Bing Liu’s Opinion Lexicon, scores across all datasets go up dramatically.

Well above (three-way) coin-flipping!

Wednesday, March 5, 14

Page 15: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

Wednesday, March 5, 14

Page 16: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

Wednesday, March 5, 14

Page 17: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

w=so

Wednesday, March 5, 14

Page 18: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

Wednesday, March 5, 14

Page 19: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

art adj noun noun verb adv adv adj noun punc

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

Wednesday, March 5, 14

Page 20: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

art adj noun noun verb adv adv adj noun punc

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

wt=that_artwt=new_adjwt=300_nounwt=movie_nounwt=looks_verbwt=sooo_advwt=friggin_advwt=bad_adjwt=ass_noun

Wednesday, March 5, 14

Page 21: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

art adj noun noun verb adv adv adj noun punc

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

wt=that_artwt=new_adjwt=300_nounwt=movie_nounwt=looks_verbwt=sooo_advwt=friggin_advwt=bad_adjwt=ass_noun

NP

NP

NP

NP

NP

NPVP

S

Wednesday, March 5, 14

Page 22: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

art adj noun noun verb adv adv adj noun punc

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

wt=that_artwt=new_adjwt=300_nounwt=movie_nounwt=looks_verbwt=sooo_advwt=friggin_advwt=bad_adjwt=ass_noun

NP

NP

NP

NP

NP

NPVP

S

subtree=S_NP_movie-S_VP_looks-S_VP_NP_bad_ass

Wednesday, March 5, 14

Page 23: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Features for classification

53

That new 300 movie looks sooo friggin BAD ASS .

w=thatw=neww=300w=moview=looksw=sooow=frigginw=badw=ass

art adj noun noun verb adv adv adj noun punc

w=so

bi=<START>_that bi=that_newbi=new_300 bi=300_moviebi=movie_looksbi=looks_sooobi=sooo_frigginbi=friggin_badbi=bad_assbi=ass_.bi=._<END>

wt=that_artwt=new_adjwt=300_nounwt=movie_nounwt=looks_verbwt=sooo_advwt=friggin_advwt=bad_adjwt=ass_noun

NP

NP

NP

NP

NP

NPVP

S

subtree=NP_sooo_bad_ass

subtree=S_NP_movie-S_VP_looks-S_VP_NP_bad_ass

Wednesday, March 5, 14

Page 24: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Complexity of features

Features can be defined on very deep aspects of the linguistic content, including syntactic and rhetorical structure.

The models for these can be quite complex, and often require significant training material to learn them, which means it is harder to employ them for languages without such resources.

I’ll show an example for part-of-speech tagging in a bit.

Also: the more fine-grained the feature, the more likely it is rare to see in one’s training corpus. This requires more training data, or effective semi-supervised learning methods.

54

Wednesday, March 5, 14

Page 25: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Recall the four sentiment datasets

55

Dataset Topic Year # Train # Dev #Test Reference

Debate08Obama vs McCain debate

2008 795 795 795Shamma, et al. (2009) "Tweet the Debates: Understanding Community Annotation of

Uncollected Sources."

HCRHealth care

reform2010 839 838 839

Speriosu et al. (2011) "Twitter Polarity Classification with

Label Propagation over Lexical Links and the Follower Graph."

STS(Stanford)

Twitter Sentiment

2009 - 216 -Go et al. (2009) "Twitter

sentiment classification using distant supervision"

IMDBIMDB movie

reviews2011 25,000 25,000 -

Mas et al. (2011) "Learning Word Vectors for Sentiment

Analysis"

Wednesday, March 5, 14

Page 26: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Logistic regression, in domain

56

Debate08 HCR STS IMDB

Opinion Lexicon + IMDB1000

Logistic Regressionw/ bag-of-words

Logistic Regressionw/ extended

features

62.4 49.1 56.0 66.1

60.9 56.0(no labeled training set) 86.7

70.2 60.5 -

When training on labeled documents from the same corpus.

Models trained with Liblinear (via ScalaNLP Nak)

Note: for IMDB, the logistic regression classifier only predicts positive or negative (because there are no neutral training examples), effectively making it easier than for the lexicon-based method.

Wednesday, March 5, 14

Page 27: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Logistic regression (using extended features), cross-domain

57

Debate08 HCR STS

Debate08

HCR

Debate08+HCR

70.2 51.3 56.5

56.4 60.5 54.2

70.3 61.2 59.7Trai

ning

cor

pora

Evaluation corpora

In domain training examples add 10-15% absolutely accuracy (56.4 -> 70.2 for Debate08, and 51.3 -> 60.5 for HRC).

More labeled examples almost always help, especially if you have no in-domain training data (e.g. 56.5/54.2 -> 59.7 for STS).

Wednesday, March 5, 14

Page 28: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

Aspect-based sentiment analysis

Why NLP is hardSentiment analysis overview

Document classification

VisualizationSemi-supervised learning

Stylistics & author modelingBeyond text

Wrap up

Wednesday, March 5, 14

Page 29: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Is it coherent to ask what the sentiment of a document is?

Documents tend to discuss many entities and ideas, and they can express varying opinions, even toward the same entity.

This is true even in tweets, e.g.

positive towards the HCR bill

negative towards Mitch McConnell

67

Here's a #hcr proposal short enough for Mitch McConnell to read: pass the damn bill now

Wednesday, March 5, 14

Page 30: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Fine-grained sentiment

Two products, iPhone and Blackberry

Overall positive to iPhone, negative to Blackberry

Postive aspect/features of iPhone: touch screen, voice quality. Negative (for the mother): expensive.

68Slide adapted from Bing Liu

Wednesday, March 5, 14

Page 31: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Components of fine-grained analysis

Opinion targets: entities and their features/aspects

Sentiment orientations: positive, negative, or neutral

Opinion holders: persons holding the opinions

Time: when the opinions are expressed

69Slide adapted from Bing Liu

Wednesday, March 5, 14

Page 32: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

An entity e is a product, person, event, organization, or topic. e is represented as

a hierarchy of components, sub-components, and so on.

Each node represents a component and is associated with a set of attributes of the component.

An opinion can be expressed on any node or attribute of the node.

For simplicity, we use the term aspects (features) to represent both components and attributes.

Entity and aspect (Hu and Liu, 2004; Liu, 2006)

70

iPhone

screen battery

{cost,size,appearance,...}

{battery_life,size,...}{...} ...

Slide adapted from Bing Liu

Wednesday, March 5, 14

Page 33: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Opinion definition (Liu, Ch. in NLP handbook, 2010)

An opinion is a quintuple (e,a,so,h,t) where:

e is a target entity.

a is an aspect/feature of the entity e.

so is the sentiment value of the opinion from the opinion holder h on feature a of entity e at time t. so is positive, negative or neutral (or more granular ratings).

h is an opinion holder.

t is the time when the opinion is expressed.

Examples from the previous passage:

71

(iPhone, GENERAL, +, Abc123, 5-1-2008) (iPhone, touch_screen, +, Abc123, 5-1-2008) (iPhone, cost, -, mother_of(Abc123), 5-1-2008)

Slide adapted from Bing Liu

Wednesday, March 5, 14

Page 34: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

The goal: turn unstructured text into structured opinions

Given an opinionated document (or set of documents)

discover all quintuples (e,a, so, h, t)

or solve a simpler form of it, such as the document level task considered earlier

Having extracted the quintuples, we can feed them into traditional visualization and analysis tools.

72Slide adapted from Bing Liu

Wednesday, March 5, 14

Page 35: Practical Sentiment Analysis Tutorial - BIUu.cs.biu.ac.il/~89-680/sentiment.pdf · Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium

© 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014

Several sub-problems

e is a target entity: Named Entity Extraction (more)

a is an aspect of e: Information Extraction

so is sentiment: Sentiment Identification

h is an opinion holder: Information/Data Extraction

t is the time: Information/Data Extraction

73Slide adapted from Bing Liu

All of these tasks can make use of deep language processing methods, including parsing, coreference, word sense disambiguation, etc.

Wednesday, March 5, 14