Predicting Star Ratings based on Annotated Reviews of Mobile Apps Talk at the 6th International Workshop on Advances in Semantic Information Retrieval ASIR 2016 Prof. Dr. Dagmar Monett, Hermann Stolte


Predicting Star Ratings based on Annotated Reviews of Mobile Apps

Talk at the 6th International Workshop on Advances in Semantic Information Retrieval (ASIR 2016)

Prof. Dr. Dagmar Monett, Hermann Stolte

D. Monett

Reviews and star ratings

Gdańsk, Poland, September 11 – 14, 2016

Example of reviews and star ratings of the Evernote App, Google Play Store (07/2016)

Star ratings matter

15% would consider downloading an app with a 2-star rating

50% would consider downloading an app with a 3-star rating

96% would consider downloading an app with a 4-star rating

Source: Aptentive 2015 Consumer Study, The Mobile Marketer's Guide to App Store Ratings & Reviews

Star ratings matter

© and source: Aptentive 2015 Consumer Study, The Mobile Marketer's Guide to App Store Ratings & Reviews

Our motivation

Some questions…

■ Could we (a program) teach users how to rate apps consistently with the review they are writing for a mobile app?

■ I.e., could we (a program) suggest to users the most appropriate star rating to give a product, depending on the semantic orientation of what they have already written in the review?

■ Would this improve users' engagement and satisfaction with the app?

Background

Review rating prediction

■ Also called sentiment rating prediction: the task of inferring an author's implied numerical rating, i.e., predicting a rating score from a given written review

■ E.g., recommendation systems often suggest products based on star ratings of similar products previously rated by other users

Suggested readings

Other related work

■ Analysing textual reviews and inferring sentiment polarity (positive/negative/neutral) (Pang et al., 2002; Liu, 2010)

■ Using not only textual semantics but also other information, e.g., about the author and/or the product (Tang et al., 2015; Li et al., 2011)

■ Considering phrase-level sentiment polarity (Qu et al., 2010)

■ Considering aspect-based opinion mining (Zhang et al., 2006; Ganu et al., 2013; Klinger & Cimiano, 2013; Sänger, 2015)

Our approach

■ We do not deal with aspect identification or sentiment classification

■ We assume that these tasks have already been performed before the star ratings are predicted

■ We focus on predicting star ratings based solely on available annotated, fine-grained opinions

■ I.e., a complement to works like (Sänger, 2015), which extends (Klinger & Cimiano, 2013) and uses a German annotated corpus of mobile apps

The Data

SCARE Corpus

Mario Sänger, Ulf Leser, Steffen Kemmerer, Peter Adolphs, and Roman Klinger. SCARE - The Sentiment Corpus of App Reviews with Fine-grained Annotations in German. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).

■ Fine-grained annotations for mobile application reviews from the Google Play Store

■ 1,760 German application reviews with 2,487 aspects and 3,959 subjective phrases

■ SCARE corpus v.1.0.0 (annotations only)

■ Available at http://www.romanklinger.de/scare/

Analysing the Data

Polarity and star ratings

[Chart omitted: 69.1%, 23.1%; thumbs-up-thumbs-down scheme (Liu, 2012)]

Avg. of labelled star ratings vs. avg. of subjective phrase polarity

Number of star ratings vs. number of subjective phrases

Predicting Star Ratings

Prediction process

We “played” with different models

Computational models

For example, x0 = 1

x1: no. of subjective phrases with positive polarity

x2: no. of subjective phrases with negative polarity

x3: no. of subjective phrases with neutral polarity
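These features feed a linear, multivariate regression. A minimal sketch of that setup; the training counts and star ratings below are made up for illustration and are not taken from the SCARE corpus:

```python
import numpy as np

# Feature vectors as on the slide: x0 = 1 (bias), then per-review counts of
# subjective phrases with positive (x1), negative (x2), and neutral (x3)
# polarity. Counts and ratings here are illustrative only.
X = np.array([
    [1, 5, 0, 1],
    [1, 0, 4, 1],
    [1, 2, 2, 0],
    [1, 3, 1, 2],
    [1, 1, 3, 0],
], dtype=float)
y = np.array([5.0, 1.0, 3.0, 4.0, 2.0])  # labelled star ratings

# Ordinary least squares fit of the linear model y ≈ X @ theta.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_stars(n_pos, n_neg, n_neu):
    """Predict a star rating from polarity counts, clamped to [1, 5]."""
    raw = np.array([1.0, n_pos, n_neg, n_neu]) @ theta
    return float(np.clip(raw, 1.0, 5.0))
```

A positive-heavy review then scores higher than a negative-heavy one, e.g. `predict_stars(4, 0, 1)` vs. `predict_stars(0, 4, 1)`.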

Computational models

RSS: review rating score (Ganu et al., 2009, 2013)
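The review rating score aggregates phrase-level polarity into one per-review number. The sketch below assumes a simple linear mapping of the positive-phrase share onto the 1–5 star scale; the exact definition used by Ganu et al. may differ, so treat this formula as an assumption:

```python
def review_rating_score(n_pos, n_neg, low=1.0, high=5.0):
    """Share of positive subjective phrases, mapped linearly onto the
    star scale [low, high]. An illustrative form, not necessarily the
    exact formula from Ganu et al. (2009, 2013)."""
    if n_pos + n_neg == 0:
        return None  # no sentiment-bearing phrases to predict from
    share_positive = n_pos / (n_pos + n_neg)
    return low + share_positive * (high - low)
```

For example, a review with 3 positive and 1 negative phrase maps to 4.0 stars under this scheme.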

Experiments

(1) Assessing the importance of sentiment in the reviews:

■ Neutral phrases (yes/no)?

■ Reviews with no sentiment (yes/no)?

(2) Using other predictors

■ Each individual experiment is run 10,000 times

■ Monte Carlo cross-validation: a 70% training dataset and a 30% testing dataset, drawn randomly on each iteration
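The evaluation loop above can be sketched as follows; the toy "model" (predicting the mean training rating, scored by mean absolute error) and the reduced iteration count are illustrative choices, not the talk's actual models:

```python
import random

def monte_carlo_cv(data, fit, score, iterations=100, train_frac=0.7, seed=0):
    """Monte Carlo cross-validation: each iteration draws a fresh random
    train/test split, fits on the training part, and scores on the held-out
    part. (The talk uses 10,000 iterations with a 70/30 split; fewer here
    to keep the sketch fast.)"""
    rng = random.Random(seed)
    n_train = int(len(data) * train_frac)
    scores = []
    for _ in range(iterations):
        shuffled = data[:]
        rng.shuffle(shuffled)
        model = fit(shuffled[:n_train])
        scores.append(score(model, shuffled[n_train:]))
    return sum(scores) / len(scores)  # average test performance

# Toy usage: predict the mean training rating, score by mean absolute error.
ratings = [1, 2, 3, 4, 5] * 6

def fit_mean(train):
    return sum(train) / len(train)

def mae(model, test):
    return sum(abs(model - r) for r in test) / len(test)

avg_mae = monte_carlo_cv(ratings, fit_mean, mae)
```

Averaging over many random splits gives a more stable performance estimate than a single fixed split.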

Some results

“Best” model, exp. (1)

■ It considers only the average value of the polarities of a review, in one feature

■ Plus: filtering out both subjective phrases with neutral polarity and reviews with no sentiment orientation at all

■ No normalisation
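The single feature of this configuration can be computed as below; the +1/-1/0 encoding of phrase polarities is an assumption for illustration:

```python
def average_polarity(phrase_polarities):
    """Average polarity of a review's subjective phrases, with neutral
    phrases (0) filtered out. Returns None for reviews with no sentiment
    orientation at all, which this configuration also drops from the data.
    Polarities are assumed encoded as +1 (positive) / -1 (negative) /
    0 (neutral)."""
    signed = [p for p in phrase_polarities if p != 0]
    if not signed:
        return None
    return sum(signed) / len(signed)
```

E.g., a review annotated with two positive, one negative, and one neutral phrase averages to 1/3.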

Results

Conclusion

■ Textually-derived rating prediction can perform well even when only phrase-level sentiment polarity is available

■ Phrases with neutral sentiment can be filtered out of the corpus

■ Computing the overall sentiment of a review using the review rating score (Ganu et al., 2009, 2013) provides the best star rating predictions

Further work

■ To consider the aspects' relevance (aspect-oriented subjective phrases)

■ To analyse the strengths of the opinions (Wilson et al., 2004), not only positive/negative/neutral sentiment

■ To deal with models other than linear, multivariate regression ones

Sources

Related work:

- See the references list in our paper!

■ https://www.researchgate.net/publication/304244445_Predicting_Star_Ratings_based_on_Annotated_Reviews_of_Mobile_Apps

Contact:
[email protected]
monettdiaz