GoodReviews
Deliver the most helpful book reviews
Krishna Karthik
Readers often rely on book reviews by other users to choose a book
Currently, reviews on Amazon are rated by users
Potentially helpful reviews can go unnoticed if unrated
Let a machine classify reviews as helpful
GoodReviews: Motivation
goodreviews.co
Excerpt of a helpful unrated review
Text file with book reviews from Amazon
Unstructured data
Data
Book reviews as they appear on Amazon
Response used in the modeling: fraction of users who rated the review that found it useful
J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
Three class classification
One vs Rest with Random Forest classifier
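A minimal sketch of a one-vs-rest Random Forest setup, assuming scikit-learn (the slides do not name the library used for the classifier). The data here is synthetic and the hyperparameters are placeholders, not the values behind the reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the review features (hypothetical data, not the
# Amazon reviews): 300 rows, 4 numeric features, 3 classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 100)   # 0 = not helpful, 1 = middle, 2 = helpful
X = rng.normal(size=(300, 4))
X[:, 0] += 3.0 * y              # make the first feature informative

# One binary Random Forest is fitted per class; at prediction time the
# class whose forest gives the highest score is chosen.
clf = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0)
)
clf.fit(X, y)

pred = clf.predict(X)
print(len(clf.estimators_))     # number of fitted binary forests
```

Wrapping the forest in `OneVsRestClassifier` makes the per-class binary problems explicit, which is convenient when per-class ROC or precision-recall curves are wanted.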
Features (NLP)
No. of adverbs, adjectives ….. in review and summary
References to genre
No. of words, sentences
Subjectivity, polarity and lexical diversity
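The counting features above can be sketched with the standard library alone. This is an illustration, not the deck's extraction code: the backup slides say NLTK and TextBlob were used, the regex tokenization here is a simplification, and defining lexical diversity as unique words divided by total words is an assumption (the slides do not define it). The function name `basic_text_features` is hypothetical.

```python
import re

def basic_text_features(text):
    """Return word count, sentence count and lexical diversity for a review."""
    # Crude tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Crude sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    # Assumed definition: unique words / total words.
    diversity = len(set(words)) / n_words if n_words else 0.0
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "lexical_diversity": diversity,
    }

features = basic_text_features("A good book. A very good book!")
print(features)
```

Part-of-speech counts, subjectivity and polarity need a tagger and a sentiment lexicon, so they are left out of this sketch.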
• Precision and Recall used as success metrics
• ~70% average precision and recall in the test and validation sets
• ~75% precision and recall for the most helpful reviews
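For reference, per-class precision and recall can be computed as below. The labels and predictions are made-up toy values, and `precision_recall` is a hypothetical helper, not code from the project.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels (illustrative only).
y_true = ["helpful", "helpful", "middle", "not helpful", "helpful"]
y_pred = ["helpful", "middle", "middle", "not helpful", "helpful"]

p, r = precision_recall(y_true, y_pred, "helpful")
print(p, r)
```

Averaging these values over the three classes gives a macro average; whether the reported ~70% is a macro or a weighted average is not stated in the slides.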
Fractional helpfulness: Not helpful | 20% | Middle | 80% | Helpful
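The 20% and 80% marks on the fractional-helpfulness axis read as the class boundaries. Assuming that, a review's class could be derived from its vote counts as sketched below. The function name, the handling of values exactly on a boundary, and returning `None` for unrated reviews are all assumptions.

```python
def helpfulness_class(helpful_votes, total_votes, low=0.2, high=0.8):
    """Map vote counts to one of three classes via fractional helpfulness."""
    if total_votes == 0:
        return None  # unrated review: no response available
    fraction = helpful_votes / total_votes
    if fraction < low:
        return "not helpful"
    if fraction > high:
        return "helpful"
    return "middle"  # boundary values fall here (assumed)

print(helpfulness_class(9, 10))
print(helpfulness_class(1, 10))
print(helpfulness_class(5, 10))
```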
Helpful reviews have more neutral sentiment
Helpful reviews have no extremes of sentiment
Review polarity
              Mean   Variance
Helpful       0.17   0.12
Not helpful   0.05   0.25
Helpful reviews are longer and have fewer unique words
Helpful reviews are longer on average
Helpful reviews are less lexically diverse
Number of words in the review
Lexical diversity
              Mean   Variance
Helpful       0.61   0.10
Not helpful   0.73   0.13
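Per-class summaries like the mean and variance tables can be produced with a short grouping step. This sketch uses made-up values rather than the deck's data, and population variance is an assumption (the slides do not say which estimator was used). `summarize_by_class` is a hypothetical helper.

```python
from statistics import mean, pvariance

def summarize_by_class(rows):
    """Group (label, value) pairs and return {label: (mean, variance)}."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    # pvariance is the population variance (assumed choice).
    return {label: (mean(vals), pvariance(vals)) for label, vals in groups.items()}

# Toy lexical-diversity values (illustrative only).
rows = [
    ("helpful", 0.5),
    ("helpful", 0.7),
    ("not helpful", 0.6),
    ("not helpful", 1.0),
]
summary = summarize_by_class(rows)
print(summary)
```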
About me
Krishna Karthik
Experimental particle physics
New York University and CERN
Reading, hiking and soccer
Backup
Details of data and algorithm
• Features extracted using NLTK and TextBlob
• Trained on ~60,000 reviews evenly split between the 3 classes
• The full dataset is imbalanced, roughly 3:3:1 (Bad:Middle:Good, i.e. Not helpful:Middle:Helpful)
• Test and validation sets of 30,000 reviews each
• One vs Rest using Random Forest classifier
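One way to get an evenly split training set from a 3:3:1 dataset is to downsample every class to the same size (~60,000 reviews over 3 classes is about 20,000 per class). The sketch below shows that idea on toy data. Whether the deck downsampled or used another balancing method is not stated, and `balanced_sample` is a hypothetical helper.

```python
import random

def balanced_sample(rows, per_class, seed=0):
    """Draw the same number of (features, label) rows from every class."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in rows:
        by_class.setdefault(label, []).append((features, label))
    sample = []
    for label, items in by_class.items():
        if len(items) < per_class:
            raise ValueError(f"class {label!r} has only {len(items)} rows")
        # Sampling without replacement within each class.
        sample.extend(rng.sample(items, per_class))
    rng.shuffle(sample)  # avoid class-ordered training data
    return sample

# Toy data with a 3:3:1 imbalance (illustrative only).
rows = (
    [(i, "not helpful") for i in range(30)]
    + [(i, "middle") for i in range(30)]
    + [(i, "helpful") for i in range(10)]
)
train = balanced_sample(rows, per_class=10)
print(len(train))
```

The per-class size is capped by the rarest class, which is why the smallest class in the 3:3:1 split limits how large a balanced training set can be.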
Feature importance
Book rating
Number of words
Polarity
Lexical diversity
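A ranking like the one above can be read off a fitted Random Forest. The sketch assumes scikit-learn, uses synthetic data, and the column names are hypothetical labels for the four listed features. How importances were combined across the one-vs-rest forests in the deck is not stated, so a single forest is shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names matching the four features listed above.
feature_names = ["book_rating", "n_words", "polarity", "lexical_diversity"]

# Synthetic data: the label depends on the second column only.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds impurity-based importances that sum to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```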
Test set plots
Test set - ROC
Test set - Precision Recall
~4% helpful misclassified as not helpful
Validation set plots
~4% helpful misclassified as not helpful