Predicting Star Ratings based on Annotated Reviews of Mobile Apps Talk at the 6th International Workshop on Advances in Semantic Information Retrieval ASIR 2016 Prof. Dr. Dagmar Monett, Hermann Stolte


Predicting Star Ratings based on Annotated Reviews of Mobile Apps

Talk at the 6th International Workshop on Advances in Semantic Information Retrieval (ASIR 2016)

Prof. Dr. Dagmar Monett, Hermann Stolte

D. Monett

Reviews and star ratings

Gdańsk, Poland, September 11 – 14, 2016

Example of reviews and star ratings of the Evernote App, Google Play Store (07/2016)

Star ratings matter

15% would consider downloading an app with a 2-star rating

50% would consider downloading an app with a 3-star rating

96% would consider downloading an app with a 4-star rating

Source: Aptentive 2015 Consumer Study, The Mobile Marketer's Guide to App Store Ratings & Reviews

Star ratings matter

© and source: Aptentive 2015 Consumer Study, The Mobile Marketer's Guide to App Store Ratings & Reviews

Our motivation

Some questions…

■ Could we (a program) teach users how to rate apps consistently with the review they are writing for a mobile app?

■ I.e., could we (a program) suggest to users the most appropriate star rating to give a product, depending on the semantic orientation of what they have already written in the review?

■ Would this improve users' engagement and satisfaction with the app?

Background

Review rating prediction

■ Also called sentiment rating prediction: the task of inferring an author's implied numerical rating, i.e., predicting a rating score from a given written review

■ E.g., recommendation systems often suggest products based on star ratings of similar products previously rated by other users

Suggested readings

Other related work

■ Analysing textual reviews and inferring sentiment polarity (positive/negative/neutral) (Pang et al., 2002; Liu, 2010)

■ Using not only textual semantics but also other information, e.g., about the author and/or the product (Tang et al., 2015; Li et al., 2011)

■ Considering phrase-level sentiment polarity (Qu et al., 2010)

■ Considering aspect-based opinion mining (Zhang et al., 2006; Ganu et al., 2013; Klinger & Cimiano, 2013; Sänger, 2015)

Our approach

■ We do not deal with aspect identification or sentiment classification

■ We assume that these tasks have already been performed before the star ratings are predicted

■ We focus on predicting star ratings based solely on available annotated, fine-grained opinions

■ I.e., a complement to works like (Sänger, 2015), which extends (Klinger & Cimiano, 2013) and uses a German annotated corpus of mobile apps

The Data

SCARE Corpus

Mario Sänger, Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger. SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).

■ Fine-grained annotations for mobile application reviews from the Google Play Store

■ 1,760 German application reviews with 2,487 aspects and 3,959 subjective phrases

■ SCARE corpus v.1.0.0 (annotations only)

■ Available at http://www.romanklinger.de/scare/

Analysing the Data

Polarity and star ratings

[Chart omitted: 69.1%, 23.1%; thumbs-up-thumbs-down scheme (Liu, 2012)]

Avg. of labelled star ratings vs. avg. of subjective phrase polarity

Number of star ratings vs. number of subjective phrases

Predicting Star Ratings

Prediction process

We “played” with different models

Computational models

For example, x0 = 1

x1: no. of subjective phrases with positive polarity

x2: no. of subjective phrases with negative polarity

x3: no. of subjective phrases with neutral polarity
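These features feed a linear, multivariate regression. A minimal sketch of that setup; the training counts and star ratings below are made up for illustration and are not taken from the SCARE corpus:

```python
import numpy as np

# Feature vectors as on the slide: x0 = 1 (bias), then per-review counts of
# subjective phrases with positive (x1), negative (x2), and neutral (x3)
# polarity. Counts and ratings here are illustrative only.
X = np.array([
    [1, 5, 0, 1],
    [1, 0, 4, 1],
    [1, 2, 2, 0],
    [1, 3, 1, 2],
    [1, 1, 3, 0],
], dtype=float)
y = np.array([5.0, 1.0, 3.0, 4.0, 2.0])  # labelled star ratings

# Ordinary least squares fit of the linear model y ≈ X @ theta.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_stars(n_pos, n_neg, n_neu):
    """Predict a star rating from polarity counts, clamped to [1, 5]."""
    raw = np.array([1.0, n_pos, n_neg, n_neu]) @ theta
    return float(np.clip(raw, 1.0, 5.0))
```

A positive-heavy review then scores higher than a negative-heavy one, e.g. `predict_stars(4, 0, 1)` vs. `predict_stars(0, 4, 1)`.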

Computational models

RSS: review rating score (Ganu et al., 2009, 2013)
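The review rating score aggregates phrase-level polarity into one per-review number. The sketch below assumes a simple linear mapping of the positive-phrase share onto the 1–5 star scale; the exact definition used by Ganu et al. may differ, so treat this formula as an assumption:

```python
def review_rating_score(n_pos, n_neg, low=1.0, high=5.0):
    """Share of positive subjective phrases, mapped linearly onto the
    star scale [low, high]. An illustrative form, not necessarily the
    exact formula from Ganu et al. (2009, 2013)."""
    if n_pos + n_neg == 0:
        return None  # no sentiment-bearing phrases to predict from
    share_positive = n_pos / (n_pos + n_neg)
    return low + share_positive * (high - low)
```

For example, a review with 3 positive and 1 negative phrase maps to 4.0 stars under this scheme.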

Experiments

(1) Assessing the importance of sentiment in the reviews:

■ Neutral phrases (yes/no)?

■ Reviews with no sentiment (yes/no)?

(2) Using other predictors

■ Each individual experiment is run 10,000 times

■ Monte Carlo cross-validation: a 70% training dataset and a 30% testing dataset, drawn randomly on each iteration
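The evaluation loop above can be sketched as follows; the toy "model" (predicting the mean training rating, scored by mean absolute error) and the reduced iteration count are illustrative choices, not the talk's actual models:

```python
import random

def monte_carlo_cv(data, fit, score, iterations=100, train_frac=0.7, seed=0):
    """Monte Carlo cross-validation: each iteration draws a fresh random
    train/test split, fits on the training part, and scores on the held-out
    part. (The talk uses 10,000 iterations with a 70/30 split; fewer here
    to keep the sketch fast.)"""
    rng = random.Random(seed)
    n_train = int(len(data) * train_frac)
    scores = []
    for _ in range(iterations):
        shuffled = data[:]
        rng.shuffle(shuffled)
        model = fit(shuffled[:n_train])
        scores.append(score(model, shuffled[n_train:]))
    return sum(scores) / len(scores)  # average test performance

# Toy usage: predict the mean training rating, score by mean absolute error.
ratings = [1, 2, 3, 4, 5] * 6

def fit_mean(train):
    return sum(train) / len(train)

def mae(model, test):
    return sum(abs(model - r) for r in test) / len(test)

avg_mae = monte_carlo_cv(ratings, fit_mean, mae)
```

Averaging over many random splits gives a more stable performance estimate than a single fixed split.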

Some results

“Best” model, exp. (1)

■ It considers only the average value of the polarities of a review, in one feature

■ Plus: filtering out both subjective phrases with neutral polarity and reviews with no sentiment orientation at all

■ No normalisation
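The single feature of this configuration can be computed as below; the +1/-1/0 encoding of phrase polarities is an assumption for illustration:

```python
def average_polarity(phrase_polarities):
    """Average polarity of a review's subjective phrases, with neutral
    phrases (0) filtered out. Returns None for reviews with no sentiment
    orientation at all, which this configuration also drops from the data.
    Polarities are assumed encoded as +1 (positive) / -1 (negative) /
    0 (neutral)."""
    signed = [p for p in phrase_polarities if p != 0]
    if not signed:
        return None
    return sum(signed) / len(signed)
```

E.g., a review annotated with two positive, one negative, and one neutral phrase averages to 1/3.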

Results

Conclusion

■ Textually-derived rating prediction can perform well even when only phrase-level sentiment polarity is available

■ Phrases with neutral sentiment can be filtered out of the corpus

■ Computing the overall sentiment of a review using the review rating score (Ganu et al., 2009, 2013) provides the best star rating predictions

Further work

■ To consider the aspects' relevance (aspect-oriented subjective phrases)

■ To analyse the strengths of the opinions (Wilson et al., 2004), not only positive/negative/neutral sentiment

■ To deal with models other than linear, multivariate regression ones

Sources

Related work:

- See the references list in our paper!

■ https://www.researchgate.net/publication/304244445_Predicting_Star_Ratings_based_on_Annotated_Reviews_of_Mobile_Apps

Contact:
[email protected]
monettdiaz