Demo june29

GoodReviews

Deliver the most helpful book reviews

Krishna Karthik

Readers often rely on book reviews by other users to choose a book

Currently, reviews on Amazon are rated by users

Potentially helpful reviews can go unnoticed if unrated

Let a machine classify reviews as helpful

GoodReviews: Motivation

goodreviews.co

Excerpt of a helpful unrated review

http://goodreviews.co

Text file with book reviews from Amazon

Unstructured data

Data

Book reviews as they appear on Amazon

Response used in the modeling

Fraction of users who rated the review that found it to be useful

J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.

http://i.stanford.edu/~julian/pdfs/recsys13.pdf

Three class classification

One vs Rest with Random Forest classifier
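A minimal sketch of the one-vs-rest setup named above, assuming scikit-learn is the library in use (the slides do not say). The feature matrix is synthetic stand-in data, and the tree count and random seeds are illustrative choices, not the values used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the review feature matrix (hypothetical data).
rng = np.random.default_rng(0)
n_per_class = 100
centers = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 3.0, 0.0, 0.0],
    [0.0, 3.0, 3.0, 3.0],
])
X = np.vstack([rng.normal(c, 1.0, size=(n_per_class, 4)) for c in centers])
y = np.repeat([0, 1, 2], n_per_class)  # 0 = not helpful, 1 = middle, 2 = helpful

# One binary Random Forest is fitted per class (that class vs. the other two).
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, y)

predictions = clf.predict(X)
class_scores = clf.predict_proba(X)  # one column of scores per class
```

The wrapper keeps one fitted forest per class in `clf.estimators_`, so each class can be inspected on its own.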

Features (NLP)

No. of adverbs, adjectives, … in review and summary

References to genre

No. of words, sentences

Subjectivity, polarity and lexical diversity
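The length and diversity features above can be sketched with the standard library alone. This assumes lexical diversity means unique words divided by total words, which the slides do not define. The tokenizer and sentence splitter are simple regular expressions standing in for the NLTK tooling. Part-of-speech counts, polarity and subjectivity are left out because they need NLTK or TextBlob.

```python
import re

def review_features(text):
    """Return simple length and diversity features for one review."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    # Assumption: lexical diversity = unique words / total words.
    diversity = len(set(words)) / n_words if n_words else 0.0
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "lexical_diversity": diversity,
    }

features = review_features("A great book. A great plot and a great ending!")
```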

• Precision and Recall used as success metrics

• ~70% average precision and recall in the test and validation sets

• ~75% precision and recall for the most helpful reviews
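For reference, per-class precision and recall can be computed as in the sketch below. The labels and predictions are made-up toy values, not results from the project.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example (hypothetical labels).
y_true = ["helpful", "helpful", "middle", "not helpful", "helpful", "middle"]
y_pred = ["helpful", "middle", "middle", "not helpful", "helpful", "helpful"]
helpful_precision, helpful_recall = precision_recall(y_true, y_pred, "helpful")
```

Averaging these values over the three classes gives the kind of average precision and recall quoted above.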

[Figure: fractional helpfulness axis divided into Not helpful, Middle and Helpful, with boundaries at 20% and 80%]
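A minimal sketch of how the response and the three classes could be derived from vote counts. The 20% and 80% cut points are read from the figure. How values exactly on a boundary are assigned, and the handling of unrated reviews, are assumptions. The function names are hypothetical.

```python
def helpfulness_fraction(helpful_votes, total_votes):
    """Fraction of raters who found the review useful; None if unrated."""
    if total_votes == 0:
        return None
    return helpful_votes / total_votes

def helpfulness_class(fraction, low=0.2, high=0.8):
    """Map a helpfulness fraction to one of the three classes."""
    if fraction is None:
        return None  # unrated reviews have no training label
    if fraction < low:
        return "not helpful"
    if fraction > high:
        return "helpful"
    return "middle"  # assumption: boundary values fall in the middle class

label = helpfulness_class(helpfulness_fraction(9, 10))
```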

Helpful reviews have more neutral sentiment

Helpful reviews have no extremes of sentiment

Review polarity

              Mean   Variance
Helpful       0.17   0.12
Not helpful   0.05   0.25

Helpful reviews are longer and have fewer unique words

Helpful reviews longer on average

Helpful reviews are less lexically diverse

Number of words in the review

Lexical diversity

              Mean   Variance
Helpful       0.61   0.1
Not helpful   0.73   0.13

About me

Krishna Karthik

Experimental particle physics

New York University and CERN

Reading, hiking and soccer

Backup

Details of data and algorithm

• Features extracted using NLTK and TextBlob

• Trained on ~60,000 reviews, evenly split between the 3 classes

• In reality, the dataset is distributed as 3:3:1 (Bad:Middle:Good)

• Test and validation sets consisting of 30,000 reviews each

• One vs Rest using Random Forest classifier
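The even split described above amounts to drawing the same number of reviews from each class despite the 3:3:1 imbalance. A hedged sketch follows, using a small toy label list in the same ratio. The function name, seed and sample sizes are illustrative, and the slides do not say how the sampling was actually done.

```python
import random
from collections import defaultdict

def balanced_sample(labels, per_class, seed=0):
    """Return shuffled indices with exactly `per_class` items from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        # Sample without replacement; raises ValueError if a class is too small.
        chosen.extend(rng.sample(indices, per_class))
    rng.shuffle(chosen)
    return chosen

# Toy labels in the 3:3:1 ratio (Bad:Middle:Good).
labels = ["bad"] * 300 + ["middle"] * 300 + ["good"] * 100
train_indices = balanced_sample(labels, per_class=100)
```

With ~60,000 training reviews and three classes, `per_class` would be about 20,000.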

Feature importance

Book rating

Number of words

Polarity

Lexical diversity

Test set plots

Test set - ROC

Test set - Precision Recall

~4% helpful misclassified as not helpful

Validation set plots

Validation set

Validation set

~4% helpful misclassified as not helpful

