Automatic Identification of Pro and Con Reason in Online Reviews. Soo-Min Kim and Eduard Hovy, COLING’06. Advisor: Chia-Hui Chang. Presenter: Teng-Kai Fan. Date: 2008-05-20

Automatic Identification Of Pro And Con Reason In Online Reviews



Soo-Min Kim and Eduard Hovy, COLING’06

Advisor: Chia-Hui Chang

Presenter: Teng-Kai Fan

Date: 2008-05-20


Abstract

The authors present a system that automatically extracts pros and cons from online reviews. Their focus is on extracting the reasons behind the opinions, which may take the form of either facts or opinions.

They propose a system based on a maximum entropy model for aligning the pros and cons to their corresponding sentences in the review text.


Outline

Introduction
Pro and Con in Online Reviews
Finding Pros and Cons
Dataset
Experiments and Results
Conclusion


Introduction

Many opinions are expressed on the Web in settings such as product reviews, personal blogs, and newsgroup messages.

This trend has given rise to many research topics, such as subjectivity detection, semantic orientation classification, and review classification.


Introduction cont.

Subjectivity detection: the task of identifying subjective words, expressions, and sentences.

Semantic orientation classification: the task of determining the positive or negative sentiment of words, phrases, sentences, or documents.


Introduction cont.

The opinion reason identification problem seeks to answer the question “What are the reasons that the author of this review likes or dislikes the product?”

Hence, they focus on extracting pros and cons, which include not only sentences that contain opinion-bearing expressions about products and their features but also sentences that state reasons.


Introduction cont.

Manually labeling each sentence is a time-consuming and costly task. The authors propose a framework for automatically identifying reasons in online reviews and introduce a novel technique for labeling training data.

The experimental results show that the system identifies pros and cons with 66% precision and 76% recall.
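Since the task labels sentences, precision and recall are presumably computed over sentences. As a minimal sketch, assuming predictions and gold labels are sets of sentence IDs for one label (the IDs below are hypothetical), the metrics and the F1 implied by the reported figures can be computed as follows. The F1 value is derived arithmetic, not a number stated on the slides.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Precision, recall, and F1 of predicted sentence IDs against gold sentence IDs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical example: sentences 2, 3, and 5 are correctly identified.
p, r, f = precision_recall_f1({1, 2, 3, 5}, {2, 3, 4, 5, 6})
print(p, r)  # 0.75 0.6

# F1 implied by 66% precision and 76% recall (derived, not reported on the slides).
print(round(f1_from(0.66, 0.76), 3))  # 0.706
```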


Pros and Cons in Online Reviews

Researchers study opinions at three different levels: word, sentence, and document.

They assume that the reasons in a review are closely related to the pros and cons expressed in the review. Pros in a product review are sentences that describe reasons why the author of the review likes the product; cons are sentences that describe reasons why the author dislikes it.


Automatically Labeling Pro and Con Sentences

Many websites that host product reviews, such as amazon.com and epinions.com, explicitly list pro and con phrases.

Hence, the automatic labeling system first collects the phrases in the pro and con fields and then searches the main review text to collect the sentences corresponding to those phrases.


Automatically Labeling Pro and Con Sentences cont.

First, the system generates two sets of phrases, {P1, P2, …, Pn} and {C1, C2, …, Cn}, by extracting each pro and con field. Example: “beautiful display”.

Then, the system checks each sentence to find the one that covers most of the words in the phrase. Example: “I’m personally quite happy with it because of the beautiful display.”

Last, the system annotates this sentence with the “pro” label.

(Figure: Pro and Con fields alongside the Main Review text.)
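The labeling procedure above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes lowercase word tokenization, reads “covers most of the words” as the highest fraction of phrase words found in a sentence, and adds a hypothetical `min_coverage` threshold so that phrases with no good match are skipped. All function names are illustrative.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase word tokens (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def coverage(phrase: str, sentence: str) -> float:
    """Fraction of the phrase's words that also appear in the sentence."""
    phrase_words = tokenize(phrase)
    if not phrase_words:
        return 0.0
    return len(phrase_words & tokenize(sentence)) / len(phrase_words)


def label_sentences(pros, cons, sentences, min_coverage=0.5):
    """Map sentence index -> 'pro' or 'con' for the best-covering sentence of each phrase."""
    labels = {}
    for label, phrases in (("pro", pros), ("con", cons)):
        for phrase in phrases:
            scores = [coverage(phrase, s) for s in sentences]
            if not scores:
                continue
            best = max(range(len(sentences)), key=lambda i: scores[i])
            # Hypothetical threshold: skip phrases that no sentence covers well.
            # A later phrase overwrites an earlier label on the same sentence.
            if scores[best] >= min_coverage:
                labels[best] = label
    return labels


review = [
    "I bought this player last month.",
    "I'm personally quite happy with it because of the beautiful display.",
    "The battery dies after a few hours.",
]
print(label_sentences(["beautiful display"], ["short battery life"], review))  # {1: 'pro'}
```

Here only the pro phrase is matched: “short battery life” shares just one of its three words with the battery sentence, so it falls below the assumed threshold.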


Modeling with Maximum Entropy Classification

They use Maximum Entropy classification for the task of finding pro and con sentences in a given review.

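The conditional probability of a class c given a feature vector x is:

$$p(c \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{i} \lambda_i f_i(c, x)\Big), \qquad Z(x) = \sum_{c'} \exp\Big(\sum_{i} \lambda_i f_i(c', x)\Big)$$

where:

f_i(c, x) is a feature function with a boolean value, λ_i is a weight parameter for the feature function, and Z(x) is a normalization factor that makes the probabilities sum to one over all classes.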
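The formula can be evaluated directly once the weights are known. In this minimal sketch each feature function is assumed to pair a boolean predicate on the sentence with one class, so f_i(c, x) = 1 only when the predicate holds and c matches. The feature names and weights are hypothetical; in practice λ is learned from the training data, which this sketch does not do.

```python
import math


def maxent_prob(active_features, weights, classes):
    """Return p(c | x) for every class c under a maximum entropy model.

    active_features: names of the boolean predicates that hold for input x.
    weights: maps (feature_name, class) -> lambda. The feature function for
        that key equals 1 when the predicate holds for x and the class is c,
        and 0 otherwise.
    """
    scores = {
        c: sum(weights.get((feat, c), 0.0) for feat in active_features)
        for c in classes
    }
    # Subtracting the max score before exponentiating avoids overflow
    # and does not change the normalized probabilities.
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())  # normalization factor Z(x), up to the shift
    return {c: e / z for c, e in exp_scores.items()}


# Hypothetical features and weights; a real model learns lambda from data.
weights = {
    ("has_opinion_word", "PR"): 1.2,
    ("has_opinion_word", "CR"): 0.8,
    ("has_negation", "CR"): 1.5,
}
probs = maxent_prob({"has_opinion_word", "has_negation"}, weights, ["PR", "CR", "NR"])
print(max(probs, key=probs.get))  # CR
```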


Modeling with Maximum Entropy Classification cont.

To build an efficient model, the task of finding pro and con sentences is separated into two phases. The identification phase separates pro and con candidate sentences (PR and CR) from sentences irrelevant to either of them (NR).

The classification phase classifies the candidates into pros and cons.

(Figure: Identification phase followed by Classification phase.)
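The two-phase design can be sketched as a cascade in which the second classifier only sees sentences that pass the first. The scorer functions, the 0.5 threshold, and the keyword rules below are hypothetical stand-ins for the two trained maximum entropy classifiers.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str], float]


def two_phase_label(sentence: str, identify: Scorer, classify: Scorer,
                    threshold: float = 0.5) -> Optional[str]:
    """Label one sentence as 'pro', 'con', or None (irrelevant, NR).

    identify: returns a score that the sentence is a candidate (PR or CR) rather than NR.
    classify: returns a score that a candidate sentence is a pro rather than a con.
    """
    if identify(sentence) < threshold:
        return None  # Phase 1 (identification): discard NR sentences.
    # Phase 2 (classification): only candidates reach this step.
    return "pro" if classify(sentence) >= threshold else "con"


def label_review(sentences: List[str], identify: Scorer, classify: Scorer) -> List[Optional[str]]:
    return [two_phase_label(s, identify, classify) for s in sentences]


# Toy keyword scorers standing in for the two trained classifiers (hypothetical).
def toy_identify(sentence: str) -> float:
    words = ("great", "love", "poor", "broke")
    return 1.0 if any(w in sentence.lower() for w in words) else 0.0


def toy_classify(sentence: str) -> float:
    return 1.0 if any(w in sentence.lower() for w in ("great", "love")) else 0.0


print(label_review(["I love the sound.", "It arrived on Tuesday.", "The screen broke."],
                   toy_identify, toy_classify))
# ['pro', None, 'con']
```

Splitting the task this way lets each classifier focus on one decision, which matches the motivation on the slide of building an efficient model.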


Features

1. News Corpus
2. WordNet


Dataset

Two different sources: Epinions.com for training and Complaints.com for testing.

Dataset 1: Automatically labeled data
MP3 player: 3,241 reviews (115,029 sentences)
Restaurant: 7,524 reviews (194,391 sentences)

Dataset 2: Complaints.com data
MP3 player: 59 reviews
Restaurant: 322 reviews


Experimental Results

Two goals:
1. How well the pro and con detection model performs.
2. How well the trained model performs on Complaints.com data.

Data split: 80% for training, 10% for development, and 10% for testing.
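A minimal sketch of the 80% / 10% / 10% split, assuming a random shuffle with a fixed seed. The slides do not say whether the split is made over reviews or sentences, or how items are shuffled, so `split_80_10_10` and its seed are illustrative only.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into train/dev/test at 80% / 10% / 10%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n = len(shuffled)
    n_train = n * 8 // 10
    n_dev = n // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # any rounding remainder goes to test
    return train, dev, test


train, dev, test = split_80_10_10(range(100))
print(len(train), len(dev), len(test))  # 80 10 10
```

Integer division keeps the three parts disjoint and exhaustive; when the item count is not a multiple of ten, the leftover items land in the test set.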


Experiments on Dataset 1: Identification step


Experiments on Dataset 1: Classification step


Experiments on Dataset 2

Gold-standard annotation: four human annotators labeled the test sets.

Only the identification step is evaluated on this dataset.


Conclusions

This paper proposes a framework for identifying pro and con reasons in online product reviews.

They present a novel technique that automatically labels a large set of pro and con sentences by using clue phrases.