Sentiment Analysis with Neural Networks Duyu Tang Associate Researcher Natural Language Computing Group Microsoft Research Meishan Zhang Associate Professor School of Computer Science and Technology Heilongjiang University


Sentiment Analysis with Neural Networks

Duyu Tang

Associate Researcher

Natural Language Computing Group

Microsoft Research

Meishan Zhang

Associate Professor

School of Computer Science and Technology

Heilongjiang University


Outline

Definition of sentiment analysis

Sentiment-specific word embedding (Duyu)

Sentence composition (Meishan)

Document composition (Duyu)

Fine-grained sentiment classification/extraction (Meishan)


Sentiment/Opinion

Why are sentiments/opinions so important?

Sentiments are key influencers of our behaviors.

Our beliefs and perceptions of reality are conditioned on how others see the world.

Whenever we need to make a decision, we often seek out others' opinions. This is true for both individuals and organizations.

It is simply human nature:

We want to express our opinions

We also want to hear others’ opinions


Sentiment Analysis

Computational study of opinions, sentiments, appraisal, and emotions expressed in text.

Reviews of movies, hotels, restaurants, etc.

Reviews of products

Comments for news

Tweets

Yelp/Dianping/TripAdvisor/RT/IMDB, etc.

Amazon/Taobao

Twitter/FB/Weibo

Bing Liu. 2012. Sentiment analysis and opinion mining. In Synthesis lectures on human language technologies, 1-167.


Sentiment Analysis

Definition: A sentiment is a quadruple

Opinion targets: entities/aspects to be evaluated

Sentiments: positive and negative

Opinion holders: persons who hold opinions

Time: when opinions are given

Id: Alice on 1-May-2014 "I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. However, the price is a little high"

Target         Sentiment   Holder   Time
iPhone         positive    Alice    1-May-2014
touch screen   positive    Alice    1-May-2014
price          negative    Alice    1-May-2014

Bing Liu. 2012. Sentiment analysis and opinion mining. In Synthesis lectures on human language technologies, 1-167.
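The quadruple definition can be written down as a small data structure. The following is only a sketch: the class name `SentimentQuadruple` and its field names are assumptions made for illustration, and the three records are the rows of the table for Alice's review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentQuadruple:
    """One opinion: (target, sentiment, holder, time). Hypothetical helper type."""
    target: str
    sentiment: str  # "positive" or "negative"
    holder: str
    time: str


# The three quadruples listed in the table for Alice's review.
alice_review = [
    SentimentQuadruple("iPhone", "positive", "Alice", "1-May-2014"),
    SentimentQuadruple("touch screen", "positive", "Alice", "1-May-2014"),
    SentimentQuadruple("price", "negative", "Alice", "1-May-2014"),
]

# Structured data makes simple queries possible, e.g. all negatively rated targets.
negative_targets = [q.target for q in alice_review if q.sentiment == "negative"]
print(negative_targets)
```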


Sentiment Analysis Tasks

Objective: Given an opinion document, discover all or part of the sentiment quadruples (t, s, h, time)

Unstructured text -> Structured data

Tasks

Word level sentiment analysis

Sentence/Document level sentiment classification

Target/Aspect level sentiment classification

Aspect extraction


Sentiment Classification

Input

Text (sentences, reviews, tweets, etc.)

Target/Aspect

Output

Label: positive, negative or neutral

Score


Sentiment Classification

[Figure: example texts labeled Positive/Negative at the sentence/document level and at the target/aspect level, including "The price is great and the service even better" (Positive).]

Representation Learning is Important for Sentiment Analysis

Inferring the sentiment of a text requires a deep understanding of its semantic meaning.

Dominant approaches (including the state of the art) are machine-learning driven, and their performance depends heavily on the choice of feature representation.

It is desirable to learn text representations from data, leverage the knowledge in big data, depend less on feature engineering, and make progress towards AI.

Saif Mohammad, Svetlana Kiritchenko, Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In SemEval 2013.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. 2013. Representation learning: A review and new perspectives. IEEE Trans. Pattern Analysis and Machine Intelligence


Outline

Definition of sentiment analysis

Sentiment-specific word embedding (Duyu)

Sentence composition (Meishan)

Document composition (Duyu)

Fine-grained sentiment classification/extraction (Meishan)


Word Embedding

Traditional: one-hot representation

Words are treated as atomic symbols

Microsoft = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]

MSFT = [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

Microsoft & MSFT = 0

Embedding: continuous representation of meaning

Microsoft = [1.045 0.912 -0.894 -1.053 0.459]

[Figure: two-dimensional embedding space (axes x1, x2) showing the words Microsoft, MSFT, MS, Beijing, and Shenzhen.]
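The contrast between the two representations can be checked numerically. This is a minimal sketch: the one-hot positions and the dense vector for "Microsoft" come from the slide, while the dense vector for "MSFT" is invented for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# One-hot vectors from the slide: positions 16 and 10 in a 21-word vocabulary.
microsoft_1hot = one_hot(16, 21)
msft_1hot = one_hot(10, 21)
one_hot_similarity = dot(microsoft_1hot, msft_1hot)  # 0 for any two distinct words

# Dense vector for "Microsoft" from the slide; the "MSFT" vector is an assumption.
microsoft_vec = [1.045, 0.912, -0.894, -1.053, 0.459]
msft_vec = [0.98, 0.85, -0.91, -1.10, 0.52]
dense_similarity = cosine(microsoft_vec, msft_vec)

print(one_hot_similarity, round(dense_similarity, 3))
```

One-hot vectors of different words are always orthogonal, so they carry no notion of similarity; dense vectors can place related words close together.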

Context-based Models

Neural language model

Prediction-based approach

Objective function: Loss(target word | context words; Vectors)

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137-1155.

Context-based Models

Ranking-based approach

Distinguish between a real and a corrupted word sequence

Objective function

[Figure: network with layers Lookup -> Linear -> HTanh -> Linear applied to the window w_{i-c}, w_{i-c+1}, …, w_i, …, w_{i+c-1}, w_{i+c}.]

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 12, 2493-2537.

Context-based Models

[Figure: the network applied to the original window w_{i-c}, …, w_i, …, w_{i+c} and to a corrupted window in which w_i is replaced by w_i'. The Lookup layer maps each word to a column of the d x |V| embedding matrix E, followed by Linear (W_1 x + b_1), HTanh, and Linear (W_2 x + b_2).]

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res. 12, 2493-2537.
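The ranking idea can be sketched in a few lines. The slide text does not preserve the objective, so the hinge form used here, max(0, 1 - f(t) + f(t^r)), is taken from the usual description of the Collobert and Weston model and should be read as an assumption; the tiny dimensions, random weights, and vocabulary are illustrative only.

```python
import random


def hard_tanh(x):
    return max(-1.0, min(1.0, x))


def matvec(W, x, b):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def score(window, E, W1, b1, W2, b2):
    """Lookup -> Linear -> HTanh -> Linear, returning a scalar score."""
    x = [value for word in window for value in E[word]]  # concatenated embeddings
    hidden = [hard_tanh(h) for h in matvec(W1, x, b1)]
    return matvec(W2, hidden, b2)[0]


def ranking_loss(window, corrupted, params):
    """Hinge loss: the real window should outscore the corrupted one by a margin of 1."""
    return max(0.0, 1.0 - score(window, *params) + score(corrupted, *params))


rng = random.Random(0)
vocab = ["formed", "the", "good", "habit", "of", "banana"]
dim, hidden_size, window_size = 4, 3, 5


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


E = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
params = (E, rand_matrix(hidden_size, dim * window_size), [0.0] * hidden_size,
          rand_matrix(1, hidden_size), [0.0])

real = ["formed", "the", "good", "habit", "of"]
corrupted = ["formed", "the", "banana", "habit", "of"]  # centre word replaced
loss = ranking_loss(real, corrupted, params)
print(loss)
```

Training would lower this loss by gradient descent on the embeddings and layer weights, which pushes real windows to score higher than corrupted ones.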

Context-based Models

Prediction-based approach: word2vec

Objective function: Loss(context words | target word; Vectors)

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proceedings of NIPS, 3111-3119.

Measuring Linguistic Regularity

Syntactic/Semantic Test

Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig. 2013. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL 2013.

Word Embedding Results (Web 130G)

[Figure: groups of related words in the learned embedding space, with panel labels: electronic products, Bosses of China, Bosses of Company, comparative adjective, feeling adjective.]

Sentiment-Specific Word Embedding

Existing embedding learning models are context-based

A word is represented by the company it keeps [Firth, J.R. 1959]

… formed the good habit of …

… formed the bad habit of …

Same contexts

Words with similar contexts but opposite sentiment polarity are mapped to nearby vectors.

[Figure: two-dimensional embedding space (axes x1, x2) in which "good" and "bad" lie close together; Microsoft, MSFT, MS, Beijing, and Shenzhen are also shown.]

Sentiment-Specific Word Embedding

The intuition: use the contexts of words and the sentiment of texts (e.g. sentences)

Solution: incorporate sentiment information into the standard context-based approach

[Figure: context-based network (Lookup -> Linear -> HTanh -> Linear) over the window w_{i-c}, w_{i-c+1}, …, w_i, …, w_{i+c-1}, w_{i+c}, with a context output.]

Duyu Tang, Furu Wei, Bing Qin, Nan Yang, Ting Liu, Ming Zhou. 2016. Sentiment Embeddings with Applications to Sentiment Analysis. IEEE Transactions on Knowledge and Data Engineering (TKDE).

Sentiment-Specific Word Embedding

[Figure: network (Lookup -> Linear -> HTanh -> Linear) over the window w_{i-c}, w_{i-c+1}, …, w_i, …, w_{i+c-1}, w_{i+c}, with positive/negative outputs for predicting sentiment.]

Sentiment-Specific Word Embedding

[Figure: hybrid network (Lookup -> Linear -> HTanh -> Linear) over the window w_{i-c}, w_{i-c+1}, …, w_i, …, w_{i+c-1}, w_{i+c}, with a context output and positive/negative sentiment outputs: contexts + sentiment.]

$loss(t, t^r) = \alpha \cdot loss_{cw}(t, t^r) + (1 - \alpha) \cdot loss_s(t)$
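The combined objective is a weighted sum of a context loss and a sentiment loss. The sketch below only illustrates that interpolation: the slides do not spell out the form of loss_s, so the softmax cross-entropy used here is an assumption, and the numeric inputs are hypothetical.

```python
import math


def sentiment_loss(scores, gold_index):
    """Softmax cross-entropy over [positive, negative] scores (assumed form of loss_s)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_index]


def hybrid_loss(context_loss, sent_loss, alpha):
    """loss = alpha * loss_cw + (1 - alpha) * loss_s."""
    return alpha * context_loss + (1.0 - alpha) * sent_loss


loss_cw = 0.4                              # hypothetical context (ranking) loss
loss_s = sentiment_loss([2.0, -1.0], 0)    # network scores for a tweet labeled positive
total = hybrid_loss(loss_cw, loss_s, alpha=0.5)
print(total)
```

Setting alpha to 1 recovers a purely context-based model, and setting it to 0 trains on the sentiment signal alone.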

Model Training

Use emoticons/smileys as sentiment signals to collect massive tweets as training data

We use 5 positive emoticons and 3 negative emoticons [Hu et al. 2013]

5 million positive and 5 million negative tweets from April 2013

Positive emoticons: :) : ) :-) :D =)

Negative emoticons: :( : ( :-(

Parameter learning: back-propagation, SGD
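Labeling tweets by emoticon can be sketched as follows. The emoticon lists are the ones given on the slide; skipping tweets with no emoticon or with conflicting emoticons, and stripping the emoticon from the text, are assumptions of this sketch rather than steps stated in the slides.

```python
POSITIVE_EMOTICONS = [":)", ": )", ":-)", ":D", "=)"]
NEGATIVE_EMOTICONS = [":(", ": (", ":-("]


def label_tweet(text):
    """Return (label, cleaned_text), or None if the signal is missing or mixed."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    # Remove the emoticons so that the label is not visible in the input text.
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, "")
    return ("positive" if has_pos else "negative", " ".join(text.split()))


print(label_tweet("Soooo nice~ :)"))
print(label_tweet("It's horrible :("))
```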

Querying Similar Words

[Figure: two groups of similar words.]

good, sweet, favorite, cool, movie, excited, amazing, awesome, well, love, great, favourite, happy

bad, cry, wrong, hard, alone, annoying, hate, tired, lost, happened, pain, sorry, jealous, mad

Querying Similar Words

Find top 𝑲 nearest neighbors in the embedding space, and calculate the accuracy of sentiment consistency

We conduct experiments on existing sentiment lexicons

[Figure: two-dimensional embedding space (axes x1, x2) with the words cool, awesome, great, nice, interesting, fantastic, excellent, love, good, bad, and terrible.]

Lexicon     #Positive  #Negative  #Total
BL-Lex      2,006      4,780      6,786
MPQA-Lex    2,301      4,150      6,451
NRC-Lex     2,231      3,324      5,555
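The sentiment-consistency measure can be sketched as below. It assumes cosine similarity and restricts neighbors to words that appear in the lexicon; both are assumptions of this sketch, and the toy vectors and polarities are invented.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sentiment_consistency(embeddings, lexicon, k):
    """Fraction of top-k neighbors that share the query word's polarity."""
    words = [w for w in lexicon if w in embeddings]
    agree = total = 0
    for query in words:
        others = [w for w in words if w != query]
        others.sort(key=lambda w: cosine(embeddings[query], embeddings[w]), reverse=True)
        for neighbor in others[:k]:
            agree += lexicon[neighbor] == lexicon[query]
            total += 1
    return agree / total if total else 0.0


# Toy 2-D vectors and polarities, invented for illustration.
embeddings = {
    "good": [1.0, 0.9], "great": [0.9, 1.0], "nice": [1.0, 1.1],
    "bad": [-1.0, -0.8], "terrible": [-0.9, -1.0], "awful": [-1.1, -0.9],
}
lexicon = {"good": 1, "great": 1, "nice": 1, "bad": -1, "terrible": -1, "awful": -1}
accuracy = sentiment_consistency(embeddings, lexicon, k=2)
print(accuracy)
```

An embedding that separates polarities well scores close to 1, while one that places "good" next to "bad" scores lower.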

Querying Similar Words

Experimental results: accuracy of sentiment consistency (%)

            BL-Lex  MPQA-Lex  NRC-Lex
C&W         65.5    60.1      58.8
word2vec    70.8    65        63
SSWE-s      73.6    68.4      66.2
SSWE-Hy     78.3    73.1      69.4

Twitter Sentiment Classification

Determine the sentiment polarity of a tweet

Run experiments on the benchmark dataset from SemEval 2013

[Figure: pipeline. Massive Tweets -> Embedding Learning -> Feature Representation; Training Data and Feature Representation -> Learning Algorithm (SVM) -> Sentiment Classifier.]

Dataset      #Positive  #Negative  #Total
Training     2,642      994        3,636
Development  408        219        627
Test         1,570      601        2,171

From word vector to tweet vector

Each word is represented by a 50-dimension vector

Each sentence/tweet is represented by a 150-dimension vector (50 dimensions for mean, 50 dimensions for max, 50 dimensions for min)

Optimal: from words to phrases

Learn embeddings for n-grams in the same way as for unigrams
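The pooling step that turns word vectors into a tweet vector can be sketched as follows. The function name `tweet_vector` is hypothetical, and the toy vectors have 2 dimensions instead of the 50 used in the slides.

```python
def tweet_vector(word_vectors):
    """Concatenate the element-wise mean, max, and min of the word vectors."""
    columns = list(zip(*word_vectors))          # one tuple per dimension
    mean = [sum(c) / len(c) for c in columns]
    return mean + [max(c) for c in columns] + [min(c) for c in columns]


# Three toy 2-dimensional word vectors.
words = [[1.0, -2.0], [3.0, 0.0], [2.0, 5.0]]
vec = tweet_vector(words)
print(vec)
```

With 50-dimensional word vectors the result has 150 dimensions regardless of tweet length, which is what a fixed-input classifier such as an SVM needs.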

Twitter Sentiment Classification

Compare with different classification algorithms

Method              Score
Distant+Ngrams      73.7
SVM+ngrams          72.1
SVM+Feature         83.8
SVM+SSWE            82.3
SVM+Feature+SSWE    85.2

Twitter Sentiment Classification

Compare with different embedding learning algorithms

Embedding   Score
C&W         70.2
word2vec    70.7
SSWE-s      79.8
SSWE-Hy     82.3

Building Sentiment Lexicon

A sentiment lexicon is a list of words, each of which is assigned a positive/negative score

Positive words: excellent (0.99); awesome (0.98); good (0.97)

Negative words: bad (-0.98); poor (-0.97); awful (-0.76)

We treat lexicon construction as a classification problem

Train a word-level sentiment classifier using word embeddings as features

[Figure: the words good, love, bad, and poor in the embedding space.]

Building Sentiment Lexicon

The framework

[Figure: lexicon-building framework. Tweets with Emoticons (e.g. "Soooo nice~ :)", "It's horrible :(") -> Embedding Learning -> Word Embedding (e.g. good: [1.31, 0.97], coool: [0.99, 1.17], bad: [-0.81, -0.7], mess: [-0.8, -0.72]). Sentiment Seeds (POS: good :), NEG: poor :(, NEU: when he) -> Seed Expansion -> Training Data (POS: wanted fave, NEG: goon looser, NEU: again place). Training Data and Word Embedding -> Learning Algorithm -> Sentiment Classifier -> Sentiment Lexicon.]

Building Sentiment Lexicon

Lexicon scale

Lexicon          #Positive  #Negative  #Total   Construction
BL-Lex           2,006      4,780      6,786    Manually labeled
MPQA-Lex         2,301      4,150      6,451    Manually labeled
NRC-Lex          2,231      3,324      5,555    Manually labeled
HashtagLex       32,048     22,081     54,129   Automatically generated
Sentiment140Lex  38,312     24,156     62,468   Automatically generated
Our Lexicon      31,591     33,012     64,603   Automatically generated

Building Sentiment Lexicon

Applying sentiment lexicon as features to Twitter sentiment classification

Feature templates

total count of tokens in the tweet with score greater than 0;

the sum of the scores for all tokens in the tweet;

the maximal score;

the non-zero score of the last token in the tweet.

Saif Mohammad, Svetlana Kiritchenko, Xiaodan Zhu. 2013. NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In SemEval 2013.
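The four feature templates can be sketched as a single function. The reading of the last template as "the score of the last token that has a non-zero score" is an interpretation, and the function name and the small lexicon are hypothetical (the scores are borrowed from the example entries shown earlier).

```python
def lexicon_features(tokens, lexicon):
    """Return [count of positive tokens, sum of scores, max score, last non-zero score]."""
    scores = [lexicon.get(token, 0.0) for token in tokens]
    non_zero = [s for s in scores if s != 0.0]
    return [
        float(sum(1 for s in scores if s > 0)),
        sum(scores),
        max(scores) if scores else 0.0,
        non_zero[-1] if non_zero else 0.0,
    ]


lexicon = {"excellent": 0.99, "good": 0.97, "bad": -0.98, "poor": -0.97}
features = lexicon_features("good food but poor service".split(), lexicon)
print(features)
```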


Building Sentiment Lexicon

Compare with different sentiment lexicons

Lexicon       Score
BL-Lex        66.9
MPQA-Lex      65
NRC-Lex       62.3
HashtagLex    63
Sent140Lex    70.8
Our Lexicon   73.7

Building Sentiment Lexicon

Compare with different embedding learning algorithms

Embedding   Score
C&W         68
word2vec    69.8
SSWE-s      72.8
SSWE-Hy     73.7

Extend SkipGram

Extend SkipGram model to encode sentiment information

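[Figure: SkipGram + Sentiment. SkipGram part: the embedding e_i of word w_i predicts its context words w_{i-2}, w_{i-1}, w_{i+1}, w_{i+2}. Sentiment part: the representation se_j of sentence s_j predicts its polarity pol_j.]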

Duyu Tang, Furu Wei, Bing Qin, Ting Liu, Ming Zhou. 2014. Building Large-Scale Twitter-Specific Sentiment Lexicon : A Representation Learning Approach. International Conference on Computational Linguistics(COLING).


Extension on SSWE

Incorporate topic information

Predict the topic distribution of the text based on the input n-grams

The topic distribution is generated using LDA (Blei et al., 2003)

[Figure: network (Lookup -> Linear -> HTanh -> Linear) over the window w_{i-c}, w_{i-c+1}, …, w_i, …, w_{i+c-1}, w_{i+c}, with a context output, positive/negative sentiment outputs, and a Softmax over topics 1 … M: contexts + sentiment + topic.]

$loss(t, t^r) = \alpha \cdot loss_{cw}(t, t^r) + \beta \cdot loss_s(t) + (1 - \alpha - \beta) \cdot loss_{top}(t)$

Yafeng Ren, Yue Zhang, Meishan Zhang, and Donghong Ji. 2016. Improving Twitter Sentiment Classification Using Topic-Enriched Multi-Prototype Word Embeddings. In Proceedings of AAAI.


Aspect Level Sentiment Classification

Task definition

Input: Sentence + Aspect

Output: The sentiment of the sentence towards the aspect

Sentence                                   Aspect    Polarity
great food but the service was dreadful    food      positive
great food but the service was dreadful    service   negative

Existing Solutions

Feature-based SVM

Cons: relies on feature engineering, …

[Figure: pipeline with Data, Representation (using Lexicon and Parser), Learning Algorithm, and Classifier.]

Existing Solutions

LSTM RNN

Pros: Learning from data

Cons: Cannot explicitly reveal the importance/contribution of context words with regard to the aspect

[Figure: target-dependent LSTM. LSTM_L reads left to right over w_1, …, w_l and the target words w_{l+1}, …, w_{r-1}, ending in hidden state h_{r-1}. LSTM_R reads right to left over w_n, …, w_r and the target words w_{r-1}, …, w_{l+1}, ending in hidden state h_{l+1}. The two final states h_{r-1} and h_{l+1} are fed to a Softmax.]

Duyu Tang, Bing Qin, Xiaocheng Feng, Ting Liu. 2016. Target-Dependent Sentiment Classification with Long Short Term Memory. http://arxiv.org/abs/1512.01100.

Deep Memory Network

The model

[Figure: deep memory network with three hops (hop 1, hop 2, hop 3). For the sentence "great food but the service was dreadful" and the aspect "service", the context words on both sides of the aspect form the memory. Each hop applies an Attention layer over the memory and a Linear layer, and sums their outputs; the result of the last hop is fed to a Softmax that predicts the polarity.]

Duyu Tang, Bing Qin, Ting Liu. 2016. Aspect Level Sentiment Classification with Deep Memory Network. Conference on Empirical Methods in Natural Language Processing (EMNLP 2016).

Content based Attention

Calculate 𝑣𝑒𝑐 based on the representation of each piece of memory 𝑚𝑖

Calculate the attention weights 𝛼
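The two steps can be sketched as follows. The slide text does not preserve the formulas, so the scoring function used here, g_i = tanh(w_att · [m_i; v_aspect] + b_att) followed by a softmax, is an assumption based on the usual description of this model; the vectors and weights are toy values.

```python
import math


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def content_attention(memory, aspect, w_att, b_att):
    """Score each memory cell against the aspect, then take the weighted sum."""
    # g_i = tanh(w_att . [m_i; aspect] + b_att): one scalar score per memory cell
    scores = [
        math.tanh(sum(w * x for w, x in zip(w_att, m_i + aspect)) + b_att)
        for m_i in memory
    ]
    alpha = softmax(scores)                       # attention weights, sum to 1
    dim = len(memory[0])
    vec = [sum(a * m_i[d] for a, m_i in zip(alpha, memory)) for d in range(dim)]
    return alpha, vec


memory = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy context word vectors
aspect = [0.0, 1.0]                               # toy aspect vector
w_att = [0.1, 2.0, 0.0, 0.0]                      # hypothetical weights, length 2 * dim
alpha, vec = content_attention(memory, aspect, w_att, b_att=0.0)
print(alpha, vec)
```

The weights alpha show how much each context word contributes to the output vector, which is what makes the model's decisions inspectable.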


Location Enhanced Attention

Each memory cell 𝑚𝑖 is calculated by elementwise multiplication between word vec 𝑒𝑖 and location vec 𝑣𝑖

$m_i = e_i \odot v_i$

Model Training

Supervised Learning, minimize cross-entropy error

Parameter learning

Use GloVe vectors and clamp their values

Back-propagation, SGD

Experimental Setting

We use two datasets from SemEval 2014

Evaluation metric: classification accuracy


Results

Compare with different classification algorithms

Accuracy (%)

Method        Laptop  Restaurant
Majority      53.4    65
SVM+Feature   72.1    80.89
LSTM          66.45   74.28
TDLSTM        68.13   75.63
ContextAVG    61.22   71.3
MemNet(9)     72.2    80.95

Results

The influence of the number of hops

Accuracy (%) by number of hops

Hops   Laptop  Restaurant
1      67.66   76.1
2      71.14   78.61
3      71.74   79.06
4      72.21   79.87
5      71.89   80.14
6      72.21   80.05
7      72.37   80.32
8      72.05   80.14
9      72.21   80.95

Visualize the Attention Weights

[Figure: attention weights over the sentence "great food but the service was dreadful", in two panels: Content-based Attention and Location-enhanced Attention.]

Document Level Sentiment Classification

Task definition

Input: a document

Output: the overall sentiment/polarity expressed in the document

Sentiment: positive/negative, or 1-5 stars

Example: I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. Despite it is a little expensive, I love it.

Lexicon based Approach

Basic idea

Use the dominant polarity of the opinion words in the document to determine its polarity

If positive/negative opinion prevails, the opinion document is regarded as positive/negative

Lexicon + Counting

Lexicon + Grammar Rule + Inference Method

Minqing Hu and Bing Liu. Mining and summarizing customer reviews. KDD: 168-177, 2004.

Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, and Manfred Stede. Lexicon-Based Methods for Sentiment Analysis. Computational Linguistics: 37(2), 267-307. 2011.
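The "Lexicon + Counting" variant can be sketched in a few lines. The tiny word lists, the tokenization, and the "neutral" fallback for ties are assumptions made for illustration; the document is the example review from the task definition.

```python
def lexicon_polarity(tokens, positive_words, negative_words):
    """Label a document by whichever kind of opinion word is more frequent."""
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # tie or no opinion words: an assumed fallback


positive_words = {"nice", "cool", "love"}      # toy lexicon
negative_words = {"expensive", "bad"}
doc = ("I bought an iPhone a few days ago. It is such a nice phone. "
       "The touch screen is really cool. Despite it is a little expensive, I love it.")
tokens = [t.strip(".,").lower() for t in doc.split()]
label = lexicon_polarity(tokens, positive_words, negative_words)
print(label)
```

Counting ignores negation and intensity ("not nice", "a little expensive"), which is what the grammar-rule variants try to address.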

Feature based SVM

Basic idea

Treat sentiment classification simply as a special case of topic-based categorization

With the two "topics" being positive sentiment and negative sentiment

Use a machine learning approach (e.g. SVM/NB) + features

Pang et al. (2002) show that SVM with bag-of-words features performs well. It is a very strong baseline for document-level sentiment classification.

Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP, 2002.

Latent N-Gram Analysis

Basic idea

Project n-grams to a low-dimensional latent semantic space

Word -> Phrase -> Document

End-to-End training with SGD

Dmitriy Bespalov, Bing Bai, Yanjun Qi, Ali Shokoufandeh. Sentiment Classification Based on Supervised Latent n-gram Analysis. Proceedings of the Conference on Information and Knowledge Management, 2011.

Paragraph Vector

Basic idea

Represent each document by a dense vector that is trained to predict words in the document

Motivation

Bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore the semantics of the words

Quoc Le, Tomas Mikolov. Distributed Representations of Sentences and Documents. In ICML 2014.

Convolution NN

Basic idea

Word -> Sentence -> Document

Misha Denil, Alban Demiraj, Nal Kalchbrenner, Phil Blunsom, Nando de Freitas. Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network. arxiv.org. 1406.3830

Hierarchical NN

A human writes and reads an article in a hierarchical way.

[Figure: the example document "I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. Despite it is a little expensive, I love it." with three panels, Human writing, Human reading, and Machine reading, each showing the hierarchy Word, Sentence, Document, Thought.]

Hierarchical NN


Duyu Tang, Bing Qin, Ting Liu. 2015. Document modeling with gated recurrent neural network for sentiment classification. In EMNLP 2015.


Sentence Modeling

CNN with multiple filters

Use unigram, bigram, and trigram information

Filter length = 3: the same linear layer is applied to every window of three consecutive word vectors

$W[e_1; e_2; e_3] + b$

$W[e_2; e_3; e_4] + b$
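Sliding one filter over a sentence can be sketched as below. The function name `conv_layer` and the toy weights are hypothetical; a full model would use several filter widths (1, 2, 3) and then pool the window outputs into one sentence vector.

```python
def conv_layer(embeddings, W, b, width=3):
    """Apply the linear filter W[e_i; ...; e_{i+width-1}] + b to every window."""
    outputs = []
    for i in range(len(embeddings) - width + 1):
        window = [x for e in embeddings[i:i + width] for x in e]   # concatenation
        outputs.append([
            sum(w * x for w, x in zip(row, window)) + bias
            for row, bias in zip(W, b)
        ])
    return outputs


# Four toy 2-dimensional word vectors e1..e4 and a single all-ones filter row.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
W = [[1.0] * 6]     # one output unit, input size = width * dim = 6
b = [0.0]
outputs = conv_layer(embeddings, W, b)
print(outputs)
```

The first output corresponds to W[e1; e2; e3] + b and the second to W[e2; e3; e4] + b, with the same W and b shared across windows.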

Recurrent Neural Network

Unfolded RNN for Language Modeling

[Figure: RNN unfolded over the sequence "<s> the cat sat on the mat </s>". At each step the previous word is the input x_{t-1}, which updates the hidden state h_t; the output y_t predicts the next word.]

Vanishing Gradient Problem

[Figure: RNN unrolled from step 1 to step t with weight matrices U, W, and V; at each step the hidden layer has an input h_{in} and an output h_{out}, and produces the output y.]

$\delta_{in,1} = \delta_{out,t} \times \frac{\partial h_{out,t}}{\partial h_{in,t}} \times \frac{\partial h_{in,t}}{\partial h_{out,t-1}} \times \cdots \times \frac{\partial h_{in,2}}{\partial h_{out,1}} \times \frac{\partial h_{out,1}}{\partial h_{in,1}}$
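The effect of this long product can be seen with a scalar RNN. The sketch assumes the recurrence h_t = tanh(w * h_{t-1} + x_t), so each pair of factors in the chain is (1 - h_t^2) * w; the weight and inputs are arbitrary toy values.

```python
import math


def gradient_through_time(w, inputs):
    """Return |d h_t / d h_0| after each step of h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad, history = 0.0, 1.0, []
    for x in inputs:
        h = math.tanh(w * h + x)          # h_in = w * h + x, h_out = tanh(h_in)
        grad *= (1.0 - h * h) * w         # (d h_out / d h_in) * (d h_in / d h_prev)
        history.append(abs(grad))
    return history


history = gradient_through_time(w=0.5, inputs=[0.3] * 20)
print(history[0], history[-1])
```

Every factor here has magnitude at most 0.5, so after 20 steps the gradient reaching the first step is close to zero, which is why plain RNNs struggle to learn long-range dependencies.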

LSTM: Long Short Term Memory

Input gate (does $x^{(t)}$ matter?): $i^{(t)} = \sigma(W^{(i)} x^{(t)} + U^{(i)} h^{(t-1)})$

Forget gate (should $c^{(t-1)}$ be forgotten?): $f^{(t)} = \sigma(W^{(f)} x^{(t)} + U^{(f)} h^{(t-1)})$

Output/exposure gate (how much of $c^{(t)}$ should be exposed?): $o^{(t)} = \sigma(W^{(o)} x^{(t)} + U^{(o)} h^{(t-1)})$

New memory: $\tilde{c}^{(t)} = \tanh(W^{(c)} x^{(t)} + U^{(c)} h^{(t-1)})$

Final memory: $c^{(t)} = f^{(t)} \circ c^{(t-1)} + i^{(t)} \circ \tilde{c}^{(t)}$

Hidden state: $h^{(t)} = o^{(t)} \circ \tanh(c^{(t)})$
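A single LSTM step can be sketched directly from the gate definitions. This is a plain-Python illustration without bias terms; the parameter dictionary keys ("Wi", "Ui", and so on) are hypothetical names, and the all-zero weights are chosen only so the result is easy to verify by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, x, U, h):
    """Compute W x + U h for list-of-rows matrices."""
    return [
        sum(w * xi for w, xi in zip(w_row, x)) + sum(u * hi for u, hi in zip(u_row, h))
        for w_row, u_row in zip(W, U)
    ]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step: gates, new memory, final memory, hidden state."""
    i = [sigmoid(z) for z in affine(p["Wi"], x, p["Ui"], h_prev)]        # input gate
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev)]        # forget gate
    o = [sigmoid(z) for z in affine(p["Wo"], x, p["Uo"], h_prev)]        # output gate
    c_new = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev)]  # new memory
    c = [ft * cp + it * cn for ft, cp, it, cn in zip(f, c_prev, i, c_new)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


# All-zero weights: every gate is sigmoid(0) = 0.5 and the new memory is tanh(0) = 0,
# so the previous cell state is simply halved.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("Wi", "Ui", "Wf", "Uf", "Wo", "Uo", "Wc", "Uc")}
h, c = lstm_step(x=[1.0, -1.0], h_prev=[0.0, 0.0], c_prev=[2.0, -4.0], p=params)
print(h, c)
```

Because the cell state is updated additively (f * c_prev + i * c_new), gradients can flow through it without being squashed at every step.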

Document Modeling with RNN/LSTM

Two options

Use the last hidden vector as the document representation

Use all the hidden vectors (average them to get the document vector)

Model Training

Objective function: minimize the cross-entropy error

Dataset: a large collection of reviews from Yelp and IMDB, with the user-generated star rating used as the sentiment label. Train/Dev/Test = 8:1:1

Multi-class classification

Dataset   #documents   #sentences/document   #words/document   #vocabulary   #classes   Class distribution
Yelp      1,569,264    8.97                  151.9             612,636       5          .10/.09/.14/.30/.37
IMDB      348,415      14.02                 325.6             115,831       10         .07/.04/.05/.05/.08/.11/.15/.17/.12/.18
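The cross-entropy objective for multi-class rating prediction can be sketched as follows. This assumes the model emits one unnormalized score per rating class and that the gold rating is a zero-based class index; both conventions and all names are illustrative.

```python
import math

def softmax(scores):
    """Turn unnormalized class scores into probabilities (shifted for numerical stability)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, gold):
    """Negative log-probability assigned to the gold class."""
    return -math.log(softmax(scores)[gold])

def mean_cross_entropy(batch):
    """Average loss over (scores, gold) pairs; this is the quantity to minimize."""
    return sum(cross_entropy(scores, gold) for scores, gold in batch) / len(batch)

if __name__ == "__main__":
    # Five classes, as in the Yelp data; gold rating index 2.
    print(cross_entropy([0.0, 0.0, 0.0, 0.0, 0.0], 2))
    print(cross_entropy([0.0, 0.0, 3.0, 0.0, 0.0], 2))
```

A uniform prediction over five classes costs log(5); raising the score of the gold class lowers the loss.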



Experimental Results

Compare with different classification algorithms

Accuracy (%):

Method       Yelp 2014   IMDB
Majority     36.1        17.9
SVM+Bigram   62.4        40.9
SVM+AvgW2V   56.8        31.9
CNN          61.4        36.6
Our Model    67.6        45.3


Experimental Results

Compare with different compositional models

Accuracy (%):

Model         Yelp 2014   IMDB
Average       60.5        36.6
Recurrent     30.6        17.6
Rec-Avg       59.1        34.3
GatedNN       65.6        43.0
GatedNN-Avg   65.9        41.6


Hierarchical Attention Networks

Four components:

A word sequence encoder

A word-level attention layer

A sentence encoder

A sentence-level attention layer

HN stands for Hierarchical Network, AVE indicates averaging, MAX indicates max-pooling, and ATT indicates the hierarchical attention model.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy. 2016. Hierarchical Attention Networks for Document Classification. In NAACL 2016.
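Both attention layers in this architecture pool a sequence of encoder states into one vector using learned weights. The sketch below follows the general form u_i = tanh(W h_i + b), alpha_i = softmax(u_i . context), output = sum_i alpha_i h_i; the function and argument names are illustrative, and the same routine would serve at the word level and at the sentence level with different parameters.

```python
import math

def attention_pool(hidden_states, W, b, context):
    """Pool hidden states into one vector; returns (pooled_vector, attention_weights)."""
    scores = []
    for h in hidden_states:
        # Hidden representation u_i of the state h_i.
        u = [math.tanh(sum(w * hi for w, hi in zip(row, h)) + bi) for row, bi in zip(W, b)]
        # Similarity between u_i and the learned context vector.
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights

if __name__ == "__main__":
    states = [[1.0, 2.0], [3.0, 6.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(attention_pool(states, identity, [0.0, 0.0], [1.0, 0.0]))
```

With a zero context vector all scores are equal, and the layer reduces to plain averaging (the AVE variant).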


FastText

The word representations are averaged into a text representation, which is in turn fed to a linear classifier.

Does not use pre-trained word embeddings

fastText takes less than a minute to train on these datasets. The GRNNs method of Tang et al. (2015) takes around 12 hours per epoch on CPU with a single thread.

Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov. 2016. Bag of Tricks for Efficient Text Classification. arXiv:1607.01759.
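The prediction path described on this slide (average the word vectors, then apply a linear classifier) can be sketched as below. This is not the fastText library's API: the embeddings and classifier weights are hand-set toy values, whereas fastText learns both jointly during training, and all names are illustrative.

```python
def text_vector(tokens, embeddings, dim):
    """Average the vectors of the known tokens; zero vector if none is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def linear_scores(vector, weights, bias):
    """One score per class from a linear layer."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]

def predict(tokens, embeddings, weights, bias):
    """Return the index of the highest-scoring class."""
    dim = len(weights[0])
    scores = linear_scores(text_vector(tokens, embeddings, dim), weights, bias)
    return max(range(len(scores)), key=scores.__getitem__)

if __name__ == "__main__":
    toy_embeddings = {"good": [1.0, 0.0], "bad": [0.0, 1.0]}   # toy values
    toy_weights = [[0.0, 1.0], [1.0, 0.0]]                     # class 0: negative, class 1: positive
    print(predict(["good", "good", "bad"], toy_embeddings, toy_weights, [0.0, 0.0]))
```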


Directly learning embedding of text regions

Apply CNN to high-dimensional (one-hot) text data

CNN with two conv layers in parallel

[Figures: seq-CNN and bow-CNN]

Rie Johnson and Tong Zhang. Effective use of word order for text categorization with convolutional neural networks. In NAACL 2015.

Rie Johnson and Tong Zhang. Semi-supervised convolutional neural networks for text categorization via region embedding. In NIPS 2015.

Rie Johnson and Tong Zhang. Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings. In ICML 2016.
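The two input representations named on this slide can be illustrated on a toy vocabulary. In this sketch a seq-CNN region vector is the concatenation of the one-hot vectors of the words in the region, so word order is kept, while a bow-CNN region vector is a single bag-of-words vector over the region. Using a binary indicator for the bag-of-words entries is an assumption here, and the vocabulary is made up.

```python
def one_hot(index, size):
    vector = [0] * size
    vector[index] = 1
    return vector

def seq_region(tokens, vocab):
    """seq-CNN style: concatenate one-hot vectors (length = region size * vocabulary size)."""
    region = []
    for token in tokens:
        region.extend(one_hot(vocab[token], len(vocab)))
    return region

def bow_region(tokens, vocab):
    """bow-CNN style: one vocabulary-sized vector marking which words occur in the region."""
    region = [0] * len(vocab)
    for token in tokens:
        region[vocab[token]] = 1
    return region

if __name__ == "__main__":
    toy_vocab = {"I": 0, "love": 1, "it": 2}
    print(seq_region(["I", "love"], toy_vocab))
    print(bow_region(["I", "love"], toy_vocab))
```

The bag-of-words form ignores word order inside the region but keeps the input dimension independent of the region size.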


Character Level CNN

Represent text at the character level with convolutional networks, from 6 layers (Zhang et al., 2015) up to 29 layers (Conneau et al., 2016)

The alphabet consists of 70 characters, including 26 English letters, 10 digits, 33 other characters and the new line character.

Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. In NIPS 2015.

Alexis Conneau, Holger Schwenk, Loïc Barrault, and Yann LeCun. 2016. Very Deep Convolutional Networks for Natural Language Processing. arXiv:1606.01781.
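Character-level input of this kind is usually prepared by turning each character into a one-hot vector over the alphabet. The sketch below uses a stand-in alphabet of the same size (26 letters, 10 digits, Python's 32 punctuation characters, space and newline); it is not the exact character list from the paper. Lower-casing, zero vectors for unknown characters, and padding or truncating to a fixed length are likewise assumptions of this sketch.

```python
import string

# Stand-in 70-character alphabet (assumption, not the paper's exact list).
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " \n"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

def quantize(text, max_len):
    """Return max_len one-hot rows; unknown characters and padding are all-zero rows."""
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        index = CHAR_TO_INDEX.get(ch)
        if index is not None:
            row[index] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows

if __name__ == "__main__":
    matrix = quantize("Great food!", 16)
    print(len(matrix), len(matrix[0]))
```

The resulting max_len by 70 matrix is what a character-level convolutional network would take as input.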


Take User Bias into Consideration

From a sentiment analysis perspective, users differ in how they:

Assign sentiment ratings on IMDB, Yelp, …

Use sentiment words to express their feelings

[Figures: Example (a) and Example (b)]


User and Product Enhanced Neural Model for Sentiment Analysis

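Take into account evidence from the text, the user and the product to infer the sentiment label (numeric rating).

[Figure: UPNN architecture. After the lookup layer, each word vector wi is multiplied by the user matrix Uk and the product matrix Pj (user-text and product-text preferences), followed by linear, convolution, pooling and tanh layers that produce the document vector vd. The softmax layer takes vd together with the user vector uk and the product vector pj (user-rating and product-rating preferences); the example has gold rating = 2. Pj: product, Uk: user, wi: word.]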

Duyu Tang, Bing Qin, Ting Liu. Learning semantic representations of users and products for document level sentiment classification. In ACL 2015.
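The two kinds of preference in this model can be illustrated with a rough sketch that follows only the labels of the architecture diagram: a user matrix and a product matrix modify each word vector (text preference), and a user vector and a product vector are appended to the document vector before the softmax (rating preference). The function names, shapes and the omission of the intermediate linear, convolution and pooling layers are all simplifications and assumptions, not the published implementation.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def user_product_word(word_vec, user_matrix, product_matrix):
    """Text preference: concatenate the user-modified and product-modified word vectors."""
    return matvec(user_matrix, word_vec) + matvec(product_matrix, word_vec)

def rating_features(doc_vec, user_vec, product_vec):
    """Rating preference: concatenate document, user and product vectors for the softmax layer."""
    return list(doc_vec) + list(user_vec) + list(product_vec)

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(user_product_word([1.0, 2.0], identity, zeros))
    print(rating_features([0.3, 0.7], [0.1], [0.9]))
```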


Experimental Results

The effects of different preferences

Accuracy (%), removing one type of preference:

Model            Yelp 2014   IMDB
UPNN (full)      60.8        43.5
UPNN - rating    58.5        40.9
UPNN - word      59.7        42.6

Accuracy (%), removing user or product information:

Model            Yelp 2014   IMDB
UPNN (full)      60.8        43.5
UPNN - user      57.7        32.4
UPNN - product   59.5        39.7


User Product Attention

Compute the sentence and document vectors with user-product (UP) attention

Huimin Chen, Maosong Sun, Cunchao Tu, Yankai Lin, Zhiyuan Liu. Neural Sentiment Classification with User and Product Attention. In EMNLP 2016.


Emotion Cause Extraction

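A new task for sentiment analysis

Objective: given an emotional document, identify the cause of the emotion.

Tasks:

Clause-level classification

Phrase-level extraction

Data: http://hlt.hitsz.edu.cn/?page_id=694

Example (the slide marks the emotion, the cause clause and the cause phrase):

在劝说过程中 (During the persuasion,)

消防官兵了解到 (the firefighters learned that)

该女子是由于对方拖欠工程款 (the woman, because the other party had not paid what it owed for a construction project,)

家中又急需用钱 (and because her family urgently needed money,)

无奈才选择跳楼轻生 (in desperation chose to attempt suicide by jumping from the building.)

Lin Gui, Dongyin Wu, Ruifeng Xu, Qin Lu, Yu Zhou. Event-Driven Emotion Cause Extraction with Corpus Construction. In EMNLP 2016.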


Emotion Cause Extraction

Gui et al. proposed an event-driven method:

Use linguistic rules to extract events

Use multi-kernel SVMs to identify the cause

Discussion:

No deep learning approach has been applied to this task yet.

The performance of the existing method is limited (F-measure of 0.67).

Lin Gui, Dongyin Wu, Ruifeng Xu, Qin Lu, Yu Zhou. Event-Driven Emotion Cause Extraction with Corpus Construction. In EMNLP 2016.