1
Automatic Identification of Pro and Con Reasons in Online Reviews
Soo-Min Kim and Eduard Hovy, COLING’06
Advisor: Chia-Hui Chang
Presenter: Teng-Kai Fan
Date: 2008-05-20
2
Abstract
The authors present a system that automatically extracts pros and cons from online reviews. Their focus is on extracting the reasons behind opinions, which may themselves take the form of either facts or opinions.
They propose a system based on a maximum entropy model for aligning pros and cons with their sentences in review texts.
3
Outline
Introduction
Pro and Con in Online Reviews
Finding Pros and Cons
Dataset
Experiments and Results
Conclusion
4
Introduction
Many opinions are being expressed on the Web in settings such as product reviews, personal blogs, and newsgroup messages.
This trend has raised many interesting research topics, such as subjectivity detection, semantic orientation classification, and review classification.
5
Introduction cont.
Subjectivity detection: the task of identifying subjective words, expressions, and sentences.
Semantic orientation classification: the task of determining the positive or negative sentiment of words, phrases, sentences, or documents.
6
Introduction cont.
The opinion reason identification problem seeks to answer the question “What are the reasons that the author of this review likes or dislikes the product?”
Hence, they focus on extracting pros and cons which include not only sentences that contain opinion-bearing expressions about products and features but also sentences with reasons.
7
Introduction cont.
Labeling each sentence is a time-consuming and costly task.
The authors propose a framework for automatically identifying reasons in online reviews and introduce a novel technique to label training data.
The experimental results show that the system identifies pros and cons with 66% precision and 76% recall.
8
Pros and Cons in Online Reviews
Researchers study opinions at three different levels: word, sentence, and document level.
They assume that reasons in a review are closely related to the pros and cons expressed in the review.
Pros in a product review are sentences that describe reasons why the author of the review likes the product.
9
Automatically Labeling Pro and Con Sentences
Many web sites that host product reviews, such as amazon.com and epinions.com, explicitly state pro and con phrases.
Hence, the automatic labeling system first collects the phrases in the pro and con fields and then searches the main review text to collect the sentences corresponding to those phrases.
10
Automatically Labeling Pro and Con Sentences cont.
First, the system generates two sets of phrases, {P1, P2,…,Pn} and {C1, C2,…,Cn}, by extracting the pro and con fields. Ex.: beautiful display.
Then, for each phrase, the system checks each sentence of the main review to find the sentence that covers most of the words in the phrase. Ex.: I’m personally quite happy with it because of the beautiful display.
Last, the system annotates this sentence with the “pro” label.
[Figure labels: Pro, Con, Main Review]
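The labeling steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the simple regex tokenizer, and the `min_coverage` threshold (used to reject weak matches) are all hypothetical choices that the slides do not specify.

```python
import re

def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9']+", text.lower())

def best_sentence(phrase, sentences):
    # Return (index, coverage) of the sentence covering most words of the phrase.
    words = set(tokenize(phrase))
    if not words:
        return None, 0.0
    best_idx, best_cov = None, 0.0
    for i, sentence in enumerate(sentences):
        coverage = len(words & set(tokenize(sentence))) / len(words)
        if coverage > best_cov:
            best_idx, best_cov = i, coverage
    return best_idx, best_cov

def label_review(sentences, pro_phrases, con_phrases, min_coverage=0.5):
    # min_coverage is a hypothetical cutoff, not a value from the paper.
    labels = {}
    for label, phrases in (("pro", pro_phrases), ("con", con_phrases)):
        for phrase in phrases:
            idx, coverage = best_sentence(phrase, sentences)
            if idx is not None and coverage >= min_coverage:
                labels[idx] = label
    return labels

review = [
    "I'm personally quite happy with it because of the beautiful display.",
    "The battery died after two hours.",
    "I bought it last week.",
]
labels = label_review(review, ["beautiful display"], ["battery died quickly"])
print(labels)
```

Sentences that receive no label here would serve as the "neither pro nor con" examples in training.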
11
Modeling with Maximum Entropy Classification
They use Maximum Entropy classification for the task of finding pro and con sentences in a given review.
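The conditional probability of a class c given a feature vector x:
$$P(c \mid x) = \frac{1}{Z_x} \exp\Big(\sum_i \lambda_i f_i(c, x)\Big)$$
where:
f_i(c, x): a feature function with a boolean value.
λ_i: a weight parameter for the feature function f_i.
Z_x: a normalization factor that makes the probabilities sum to 1 over all classes.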
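As a small worked illustration, the standard maximum entropy form, in which P(c | x) is proportional to exp of the weighted sum of feature functions, can be computed directly. The two feature functions and their weights below are made up for the example; they are not the features or parameters learned in the paper.

```python
import math

def maxent_probs(x, classes, features, weights):
    # Score each class by the weighted sum of its active feature functions.
    scores = {
        c: sum(w * f(c, x) for f, w in zip(features, weights))
        for c in classes
    }
    # Z normalizes the exponentiated scores into a probability distribution.
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}

# Hypothetical boolean feature functions f_i(c, x); x is a set of words.
features = [
    lambda c, x: 1 if c == "pro" and "great" in x else 0,
    lambda c, x: 1 if c == "con" and "broken" in x else 0,
]
weights = [1.5, 2.0]  # illustrative lambda values, not trained

probs = maxent_probs({"great", "display"}, ["pro", "con"], features, weights)
print(probs)
```

In practice the weights are estimated from the labeled training sentences rather than set by hand.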
12
Modeling with Maximum Entropy Classification cont.
To build an efficient model, the task of finding pro and con sentences is separated into two phases:
Identification separates pro and con candidate sentences (PR and CR) from sentences irrelevant to either of them (NR).
Classification classifies the candidates into pros and cons.
Pipeline: Identification, then Classification
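The two-phase design can be pictured as a small cascade. The sketch below assumes each phase is available as a callable; the keyword-based stand-ins are purely hypothetical placeholders for the two trained maximum entropy models.

```python
def two_phase_label(sentence, is_candidate, pro_or_con):
    # Phase 1 (identification): filter out sentences that are neither pro nor con.
    if not is_candidate(sentence):
        return "NR"
    # Phase 2 (classification): decide between PR and CR for the candidates.
    return pro_or_con(sentence)

# Toy stand-ins for the two trained classifiers (assumptions for illustration).
def toy_identifier(sentence):
    return any(w in sentence.lower() for w in ("love", "great", "terrible", "broke"))

def toy_classifier(sentence):
    return "PR" if any(w in sentence.lower() for w in ("love", "great")) else "CR"

examples = [
    "I love the great screen.",
    "It broke after a week.",
    "I bought it in May.",
]
results = [two_phase_label(s, toy_identifier, toy_classifier) for s in examples]
print(results)
```

Separating the phases means the second classifier only has to distinguish pros from cons, since irrelevant sentences have already been filtered out.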
13
Features
1. News Corpus
2. WordNet
14
DataSet
Two different sources: Epinions.com for training, Complaints.com for testing.
Dataset 1: Automatically Labeled Data
Mp3 player: 3241 reviews (115029 sentences)
Restaurant: 7524 reviews (194391 sentences)
Dataset 2: Complaints.com Data
Mp3 player: 59 reviews
Restaurant: 322 reviews
15
Experimental Results
Two goals:
How well the pro and con detection model performs.
How well the trained model performs on complaints.com data.
Data split: 80% for training, 10% for development, and 10% for testing.
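An 80/10/10 split like the one on this slide might be produced as follows. This is a minimal sketch: the function name is hypothetical, and shuffling with a fixed seed is an assumption, since the slides do not say how the sentences were assigned to each portion.

```python
import random

def split_80_10_10(items, seed=0):
    # Shuffle a copy so the caller's data is left untouched (shuffling is an assumption).
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_dev = len(items) // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test

train, dev, test = split_80_10_10(range(100))
print(len(train), len(dev), len(test))
```

The development portion is typically used for tuning choices such as feature combinations, leaving the test portion untouched until the final evaluation.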
16
Experiments on Dataset 1: Identification step
17
Experiments on Dataset 1: Classification step
18
Experiment on DataSet 2
Gold Standard Annotation: Four humans annotated test sets.
Only Identification:
19
Conclusions
This paper proposes a framework for identifying the reasons behind opinions (pros and cons) in online product reviews.
They present a novel technique that automatically labels a large set of pro and con sentences by using clue phrases.