Page 1: Demo june29

GoodReviews

Deliver the most helpful book reviews

Krishna Karthik

Page 2: Demo june29

GoodReviews: Motivation

Readers often rely on book reviews by other users to choose a book

Currently, reviews on Amazon are rated by users

Potentially helpful reviews can go unnoticed if unrated

Let a machine classify reviews as helpful

goodreviews.co

[Figure: excerpt of a helpful unrated review]

Page 3: Demo june29

Data

Text file with book reviews from Amazon

Unstructured data

[Figure: book reviews as they appear on Amazon]

Response used in the modeling: fraction of the users who rated a review who found it useful

J. McAuley and J. Leskovec. Hidden factors and hidden topics: understanding rating dimensions with review text. RecSys, 2013.
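
The response defined on this slide can be computed from a per-review helpfulness field. The following is a minimal sketch, assuming the field is stored as a string such as "7/9" (helpful votes over total votes); that format and the function name `helpfulness_fraction` are assumptions for illustration, not details given in the slides.

```python
def helpfulness_fraction(field):
    """Return helpful_votes / total_votes, or None if nobody rated the review.

    `field` is assumed to look like "7/9" (helpful votes / total votes).
    """
    helpful, total = (int(part) for part in field.strip().split("/"))
    if total == 0:
        return None  # unrated review: the response is undefined
    return helpful / total


for raw in ["7/9", "0/3", "0/0"]:
    print(raw, helpfulness_fraction(raw))
```

Returning `None` for unrated reviews keeps them out of training while leaving them available as the reviews the classifier is meant to score.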

Page 4: Demo june29

Three-class classification

One vs Rest with Random Forest classifier

Features (NLP)

• No. of adverbs, adjectives … in review and summary

• References to genre

• No. of words, sentences

• Subjectivity, polarity and lexical diversity
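
A few of the surface features listed above can be sketched with the standard library alone. This is an illustrative approximation: the regex tokenization, the sentence splitting, and the definition of lexical diversity as unique words divided by total words are assumptions, and the function name `surface_features` is hypothetical. Part-of-speech counts, genre references, subjectivity and polarity are not covered here.

```python
import re


def surface_features(text):
    """Count words and sentences and compute lexical diversity for one review."""
    words = re.findall(r"[a-z']+", text.lower())
    # Treat runs of '.', '!' or '?' as sentence boundaries (a crude approximation).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        # Lexical diversity: unique words divided by total words.
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
    }


features = surface_features("Great plot. Great characters! Would read again.")
print(features)
```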

• Precision and Recall used as success metrics

• ~70% average precision and recall in the test and validation sets

• ~75% precision and recall for the most helpful reviews
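
Per-class precision and recall, the success metrics named above, can be computed directly from true and predicted labels. The sketch below uses made-up labels purely to show the arithmetic; the function name `precision_recall` is hypothetical and the numbers are not results from the project.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # everything predicted as `label`
    actual = sum(t == label for t in y_true)     # everything that truly is `label`
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels, for illustration only.
y_true = ["helpful", "helpful", "middle", "not helpful", "middle", "helpful"]
y_pred = ["helpful", "middle", "middle", "not helpful", "helpful", "helpful"]

for label in ("not helpful", "middle", "helpful"):
    print(label, precision_recall(y_true, y_pred, label))
```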

[Figure: fractional helpfulness axis split into three classes, Not helpful | Middle | Helpful, with boundaries at 20% and 80%]
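
Reading the figure as thresholds at 20% and 80% of fractional helpfulness, the class assignment can be sketched as follows. Which class the exact boundary values fall into is not stated in the slides; assigning them to the middle class is an assumption, as is the function name `helpfulness_class`.

```python
def helpfulness_class(fraction, low=0.2, high=0.8):
    """Map a fractional helpfulness in [0, 1] to one of three class labels."""
    if fraction < low:
        return "not helpful"
    if fraction > high:
        return "helpful"
    return "middle"  # boundary values 0.2 and 0.8 land here (assumption)


for value in (0.1, 0.5, 0.9):
    print(value, helpfulness_class(value))
```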

Page 5: Demo june29

Helpful reviews have more neutral sentiment

Helpful reviews have no extremes of sentiment

Review polarity

              Mean   Variance
Helpful       0.17   0.12
Not helpful   0.05   0.25

Page 6: Demo june29

Helpful reviews are longer and have fewer unique words

Helpful reviews are longer on average

Helpful reviews are less lexically diverse

[Figure: number of words in the review]

Lexical diversity

              Mean   Variance
Helpful       0.61   0.1
Not helpful   0.73   0.13
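
Summary tables like the one above report a mean and a variance per class. A minimal sketch of that aggregation follows, using made-up feature values; whether the slides report population or sample variance is not stated, so the use of `pvariance` is an assumption, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, pvariance


def summarize(values_by_class):
    """Return {class label: (mean, population variance)} for one feature."""
    return {
        label: (mean(values), pvariance(values))
        for label, values in values_by_class.items()
    }


# Toy lexical-diversity values, for illustration only.
toy = {"helpful": [0.5, 0.6, 0.7], "not helpful": [0.6, 0.8, 1.0]}
summary = summarize(toy)
print(summary)
```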

Page 7: Demo june29

About me

Krishna Karthik

Experimental particle physics

New York University and CERN

Reading, hiking and soccer

Page 8: Demo june29

Backup

Page 9: Demo june29

Details of data and algorithm

• Features extracted using NLTK and TextBlob

• Trained on ~60,000 reviews evenly split between the 3 classes

• In the full dataset, the classes are distributed as 3:3:1 (Bad:Middle:Good)

• Test and validation sets consist of 30,000 reviews each

• One vs Rest using Random Forest classifier
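
An evenly split training set can be drawn from an imbalanced dataset by sampling the same number of reviews from each class. The sketch below assumes simple downsampling, which the slides do not confirm as the method used; `balanced_sample` and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(rows, label_of, per_class, seed=0):
    """Draw `per_class` rows from every class without replacement.

    Raises ValueError if any class has fewer than `per_class` rows.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    rng = random.Random(seed)
    sample = []
    for label in sorted(by_class):
        sample.extend(rng.sample(by_class[label], per_class))
    rng.shuffle(sample)  # avoid feeding the classifier one class at a time
    return sample


# Toy dataset with the 3:3:1 class ratio mentioned above.
rows = (
    [("bad", i) for i in range(30)]
    + [("middle", i) for i in range(30)]
    + [("good", i) for i in range(10)]
)
train = balanced_sample(rows, label_of=lambda row: row[0], per_class=10)
counts = Counter(label for label, _ in train)
print(counts)
```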

Page 10: Demo june29

Feature importance

Book rating

Number of words

Polarity

Lexical diversity
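
One way to obtain a feature ranking from a One vs Rest random forest is to average the importances of the per-class forests. The sketch below assumes scikit-learn, which the slides do not name, and runs on synthetic data in which the class depends only on the second column. It illustrates the mechanics and does not reproduce the ranking shown on this slide; the feature names are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
feature_names = ["book_rating", "n_words", "polarity", "lexical_diversity"]

# Synthetic stand-in data: 300 reviews, 4 features, 3 classes.
X = rng.normal(size=(300, 4))
# The class (0, 1 or 2) is determined by the second column only.
y = np.digitize(X[:, 1], bins=[-0.5, 0.5])

model = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, random_state=0)
)
model.fit(X, y)

# One binary forest per class; average their importances feature by feature.
importances = np.mean(
    [forest.feature_importances_ for forest in model.estimators_], axis=0
)
ranking = sorted(
    zip(feature_names, importances), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```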

Page 11: Demo june29

Test set plots

[Figure: test set ROC curve]

[Figure: test set precision-recall curve]

~4% helpful misclassified as not helpful
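
A figure like the one above can be read as the share of truly helpful reviews that the classifier labels as not helpful. That reading is an assumption, since the slide does not define the denominator. A minimal sketch with made-up labels, using the hypothetical helper `confusion_rate`:

```python
def confusion_rate(y_true, y_pred, true_label, predicted_label):
    """Fraction of reviews with `true_label` that were predicted as `predicted_label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == true_label]
    if not relevant:
        return 0.0
    return sum(p == predicted_label for p in relevant) / len(relevant)


# Toy labels, for illustration only.
truth = ["helpful"] * 4 + ["not helpful"] * 2
preds = ["helpful", "helpful", "middle", "not helpful", "not helpful", "helpful"]
rate = confusion_rate(truth, preds, "helpful", "not helpful")
print(rate)
```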

Page 12: Demo june29

Validation set plots

[Two figures, each titled "Validation set"]

~4% helpful misclassified as not helpful

Page 13: Demo june29

Screenshot 1

Page 14: Demo june29

Screenshot 2

Page 15: Demo june29

Screenshot 3