Estimating review score from words
Işık Barış Fidaner
CMPE 545 Artificial Neural Networks
$\text{Metascore} = \frac{1}{N} \sum_{i=1}^{N} \text{score}_i$
• Score: the rating given to this product ($r^t$)
• Reviewer: the source of this review
• Quote: a few sentences that summarize this review ($x^t$ = ?)
Bag of words representation: existence of some words in the quote
+ exuberant
+ embrace
+ affectionate
Purposes
1. A new database that relates text to score
(...)An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.(...)
90?
Purposes
2. Quantify meaning with machine learning
riveting exhilarating affectionate crafted exuberant dull lacking embrace
0 0 1 0 1 0 0 1
Review quote:
An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.
[Figure: the input vector $x^t$ and the weight vector $w^T$, with the values 73, 70, 65]
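The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the vocabulary is the eight example words from the slide, and the `tokenize` and `encode` helpers are hypothetical names, not part of the original project.

```python
import re

# Illustrative vocabulary: the eight example words shown on the slide.
VOCABULARY = ["riveting", "exhilarating", "affectionate", "crafted",
              "exuberant", "dull", "lacking", "embrace"]

def tokenize(quote):
    """Lowercase the quote and split it into words, keeping apostrophes and hyphens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", quote.lower())

def encode(quote, vocabulary=VOCABULARY):
    """Return a binary vector: 1 if the vocabulary word occurs in the quote, else 0."""
    words = set(tokenize(quote))
    return [1 if word in words else 0 for word in vocabulary]

quote = ("An affectionate, exuberant picture that seeks to bring even those "
         "who don't know Klingon from Portuguese into the embrace of a "
         "pop-culture phenomenon.")
print(encode(quote))  # [0, 0, 1, 0, 1, 0, 0, 1]
```

Only the existence of a word is recorded, so a word that appears twice in a quote still contributes a single 1.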
Purposes
3. Meta-metacritic deductions, such as
Positive words
riveting, exhilarating, crafted, superb, extraordinary, brilliant
Negative words
unfunny, tedious, fails, mess, dull, lacking
Obtaining the database
• Developed a PHP web crawler
• It ran for a few days
• TV show reviews: 8,335 records
• Music album reviews: 62,293 records
• Movie reviews: 113,456 records
• Tools: PHP, MySQL
Bag of words assumption
• Features affect the result independently
An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.
=
phenomenon from an exuberant picture those into a portuguese don’t pop-culture affectionate to embrace bring klingon of who know seeks
• Semantic organization does not matter
Bag of words assumption
• The problem with modifiers:
This is not good. Is this not good?
• We rely on the information encoded in the vocabulary, not grammar
• Opinions expressed clearly and simply:
Excellent, wonderful! This is dreadful.
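The modifier problem can be checked directly. The sketch below uses a hypothetical `bag_of_words` helper and shows that the statement and the question from this slide produce exactly the same bag, so no model built on this representation can tell them apart.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word occurrences, ignoring word order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

statement = bag_of_words("This is not good.")
question = bag_of_words("Is this not good?")

print(statement == question)  # True: word order is lost
print("good" in statement)    # True: "good" still counts as present despite "not"
```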
Word selection
1. Quote count (QC)
2. Product count (PC)
3. Score mean (SM)
4. Score stdev (SS)
• Meaningful words (SS < SSmax = 20)
• Frequently used words (PC > PCmin = 20)
• Non-grammatical words (PC < PCmax = 100)
~20 thousand words reduced to ~300 words
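A minimal sketch of the three selection criteria, assuming the four statistics have already been computed for each word. The `WordStats` record, the reading of QC and PC in its comments, and the sample numbers are all made up for illustration; only the thresholds come from the slide.

```python
from dataclasses import dataclass

# Thresholds from the slide.
SS_MAX = 20    # maximum score standard deviation
PC_MIN = 20    # minimum product count
PC_MAX = 100   # maximum product count

@dataclass
class WordStats:
    word: str
    quote_count: int     # QC (assumed: number of quotes containing the word)
    product_count: int   # PC (assumed: number of products reviewed with the word)
    score_mean: float    # SM
    score_stdev: float   # SS

def is_selected(stats):
    """Keep words that are meaningful, frequent enough, and not grammatical."""
    return (stats.score_stdev < SS_MAX
            and PC_MIN < stats.product_count < PC_MAX)

# Invented statistics, for illustration only.
candidates = [
    WordStats("unfunny", 60, 45, 38.0, 15.0),
    WordStats("the", 90000, 9000, 68.0, 18.0),    # too common: PC >= PC_MAX
    WordStats("zeitgeist", 5, 4, 75.0, 9.0),      # too rare: PC <= PC_MIN
    WordStats("picture", 300, 80, 66.0, 24.0),    # scores too spread: SS >= SS_MAX
]
selected = [s.word for s in candidates if is_selected(s)]
print(selected)  # ['unfunny']
```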
Significant words for TV and movies
unfunny
waste, disappointment, supposed, fails
fancy words! casual words!
Movies are overrated!
TV takes too much time!
Significant words for music albums
masterpiece, artists: Music is art
date, modern: Music ages quickly
personality: Albums are attached to the musician’s personality
The input vector and estimation
• Example input vector (divided by quote size)
– $x^t$ = [1 0 0 1 0 0 0 1 0 0 0 0 ... 0] / 3
• Estimation function
• There is a weight for every selected word
• $x^t$ chooses the subset of contained words
• Estimation is the sum of $w_0$ and the arithmetic mean of the weights of contained words
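One way to write the estimation described above is $y^t = w_0 + w^T x^t$, where the symbol $y^t$ for the estimate is an assumption. Because $x^t$ is divided by the number of contained words, the dot product reduces to the arithmetic mean of their weights. The sketch below uses invented weights and hypothetical helper names.

```python
def make_input(contained_indices, vocabulary_size):
    """Binary indicator vector divided by the number of contained selected words."""
    x = [0.0] * vocabulary_size
    for i in contained_indices:
        x[i] = 1.0
    count = len(set(contained_indices))
    return [v / count for v in x] if count else x

def estimate(w0, weights, x):
    """Bias w0 plus the dot product of the weights and the input vector."""
    return w0 + sum(w * v for w, v in zip(weights, x))

# Invented weights for a 12-word vocabulary, for illustration only.
weights = [6.0, -4.0, 1.0, 9.0, -7.0, 2.0, 0.0, -3.0, 5.0, -1.0, 4.0, -2.0]
w0 = 65.0

# Same pattern as the slide's example: words 0, 3 and 7 are contained.
x = make_input([0, 3, 7], len(weights))
print(estimate(w0, weights, x))  # about 69: 65 plus the mean of 6, 9 and -3
```

A quote that contains none of the selected words gets an all-zero vector, so its estimate falls back to $w_0$.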
Linear and SVM regression
• Linear regression uses the squared-difference error.
• This implies these update equations:
• SVM regression uses the ε-sensitive error function.
• With these simpler update equations:
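The update equations themselves did not survive in this transcript. As a hedged sketch, the standard stochastic gradient forms are shown below, assuming the estimate is $y^t = w_0 + w^T x^t$, a learning rate `eta`, and the ε-sensitive error $\max(0, |r^t - y^t| - \epsilon)$. These may differ from the equations on the original slides; the function names and the numbers are illustrative.

```python
def predict(w0, w, x):
    """Linear estimate: bias plus dot product."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))

def squared_error_step(w0, w, x, r, eta):
    """One gradient step on 0.5 * (r - y)**2: the step scales with the error."""
    err = r - predict(w0, w, x)
    return w0 + eta * err, [wj + eta * err * xj for wj, xj in zip(w, x)]

def eps_sensitive_step(w0, w, x, r, eta, eps):
    """One subgradient step on max(0, |r - y| - eps): only the sign of the error is used."""
    err = r - predict(w0, w, x)
    if abs(err) <= eps:
        return w0, list(w)   # inside the tolerance band: no update
    sign = 1.0 if err > 0 else -1.0
    return w0 + eta * sign, [wj + eta * sign * xj for wj, xj in zip(w, x)]

# Illustrative numbers: two contained words, true score 90, current estimate 60.
x = [0.5, 0.5]
print(squared_error_step(60.0, [0.0, 0.0], x, 90.0, 0.1))
print(eps_sensitive_step(60.0, [0.0, 0.0], x, 90.0, 0.1, 5.0))
```

In the squared-error step a single outlying review moves the weights in proportion to its error, while the ε-sensitive step has a fixed size, which fits the more stable learning reported for SVM regression.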
Linear regression learning
Unstable learning in validation set
Error of 17 points
Error of 14 points
SVM regression learning
Robustness increased because the SVM error function is linear and tolerant of small errors.
Error of 13 points
Error of 11 points
Better results with SVM!
Possible improvements
• Non-linear model that actually weighs the importance of words
• Normalization by estimating reviewer parameters
• Adding two-word combinations to the input vector
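The last improvement can be sketched as follows, assuming two-word combinations means adjacent word pairs (bigrams). The helper name is hypothetical. With pairs added, the statement and the question from the modifier slide no longer produce the same features.

```python
import re

def unigrams_and_bigrams(quote):
    """Return single words followed by adjacent two-word combinations."""
    words = re.findall(r"[a-z']+", quote.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams

statement = unigrams_and_bigrams("This is not good.")
question = unigrams_and_bigrams("Is this not good?")

print(statement)
# ['this', 'is', 'not', 'good', 'this is', 'is not', 'not good']
print(set(statement) == set(question))  # False: 'is not' vs 'this not'
```

The pair "not good" becomes a feature of its own, so it could receive a negative weight even though "good" alone is positive.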