Exploiting Social Context for Review Quality Prediction


Yue Lu (University of Illinois at Urbana-Champaign), Panayiotis Tsaparas (Microsoft Research), Alexandros Ntoulas (Microsoft Research), Livia Polanyi (Microsoft)

April 28, WWW’2010 Raleigh, NC


Why do we care about Predicting Review Quality?

User reviews (1764)

User “helpfulness” votes help prioritize reading

But not all reviews have votes:
1. New reviews
2. Reviews aggregated from multiple sources


What has been done?

• Treated as a classification or regression problem: learn from labeled reviews (√ / ×), predict the quality of unlabeled reviews (?)
• Textual features
• Meta-data features

[Zhang & Varadarajan '06] [Kim et al. '06] [Liu et al. '08] [Ghose & Ipeirotis '10]


Reviews are NOT Stand-Alone Documents

We also observe…

Reviewer Identity + Social Network = Social Context

Our Work: Exploiting Social Context for Review Quality Prediction


Roadmap

• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions


Text-only Baseline

FeatureVector(review) = Textual Features:

• Text Statistics: NumSent, NumTokens, SentLen, CapRatio, UniqWordRatio
• Syntactic: POS:RB, POS:PP, POS:V, POS:CD, POS:JJ, POS:NN, POS:SYM, POS:COM, POS:FW
• Conformity: KLDiv
• Sentiment: SentiPositive, SentiNegative
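As an illustration (not the authors' code), the Text Statistics group could be computed roughly as follows; the Syntactic, Conformity, and Sentiment groups would additionally need a POS tagger, a reference language model for the KL-divergence, and a sentiment lexicon.

```python
import re

def text_statistics(review_text):
    """Compute the Text Statistics features of a review (illustrative sketch).
    CapRatio is assumed here to be the fraction of capitalized tokens."""
    sentences = [s for s in re.split(r"[.!?]+", review_text) if s.strip()]
    tokens = review_text.split()
    num_sent = len(sentences)
    num_tokens = len(tokens)
    sent_len = num_tokens / num_sent if num_sent else 0.0            # avg tokens per sentence
    cap_ratio = sum(t[0].isupper() for t in tokens) / max(num_tokens, 1)
    uniq_word_ratio = len({t.lower() for t in tokens}) / max(num_tokens, 1)
    return [num_sent, num_tokens, sent_len, cap_ratio, uniq_word_ratio]
```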


Base Model: Linear Regression

Quality(r_i) = Weights × FeatureVector(r_i) = $w^\top x_i$

$w = \arg\min_w \sum_{i \in \text{labeled}} \left( w^\top x_i - q_i \right)^2$, where $q_i$ is the gold-standard quality of review $r_i$

Closed-form: $w = (X^\top X)^{-1} X^\top q$, with $X$ the review-feature matrix and $q$ the quality vector of the labeled reviews
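A minimal numpy sketch of this base model (illustrative, not the authors' code): `X` stacks the textual feature vectors of the labeled reviews and `q` holds their gold-standard quality scores.

```python
import numpy as np

def fit_base_model(X, q):
    """Ordinary least squares in closed form: w = (X^T X)^{-1} X^T q."""
    return np.linalg.solve(X.T @ X, X.T @ q)

def predict_quality(X, w):
    """Predicted quality of each review is the dot product of its features with w."""
    return X @ w
```

In practice a small ridge term ($X^\top X + \lambda I$) is often added so the linear system stays well conditioned.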


Straightforward Approach: Adding Social Context as Features

FeatureVector(review) = Textual Features + Social Context Features

• Reviewer History: NumReview, AvgRating
• Social Network: InDegree, OutDegree, PageRank

Disadvantages:
• Social context features are not always available
  – Anonymous reviews?
  – A new reviewer?
• Need more training data
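A tiny sketch of this AddFeatures baseline (function and field names are illustrative): the social-context block is simply appended to the textual vector and has to be zero-filled when the reviewer is anonymous or new, which is exactly the weakness noted above.

```python
import numpy as np

def add_features(text_features, social_features=None):
    """Concatenate textual and social-context features.
    social_features = [NumReview, AvgRating, InDegree, OutDegree, PageRank];
    falls back to zeros when the reviewer is anonymous or has no history."""
    if social_features is None:
        social_features = np.zeros(5)
    return np.concatenate([np.asarray(text_features, dtype=float),
                           np.asarray(social_features, dtype=float)])
```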


Our Approach: Social Context as Constraints

Our Intuitions:
• Reviewer Identity: Quality(review_1) is related to Quality(review_2) when both reviews come from the same reviewer
• Social Network: Quality(review) is related to the reviewer's Social Network

How to combine such intuitions with Textual info?


Formally: Graph-based Regularizers

$w = \arg\min_w \left\{ \sum_{i \in \text{labeled}} (w^\top x_i - q_i)^2 + \beta \times \text{GraphRegularizer}(w) \right\}$

The first term is the baseline loss function over the labeled data; the graph regularizer is designed to "favor" our intuitions and also covers the unlabeled data; β is a trade-off parameter.

Advantages:
• Semi-supervised: makes use of unlabeled data
• Applicable to reviews without social context

We will define four regularizers based on four hypotheses.
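Read as code, the framework looks roughly like this (a sketch, not the paper's implementation): the squared loss only sees labeled reviews, while the regularizer, plugged in as a function of w, may touch every review.

```python
import numpy as np

def regularized_objective(w, X_labeled, q_labeled, beta, graph_regularizer):
    """Baseline squared loss on labeled reviews plus a graph-based penalty.
    graph_regularizer(w) implements one of the four consistency hypotheses
    and may use both labeled and unlabeled reviews."""
    loss = np.sum((X_labeled @ w - q_labeled) ** 2)
    return loss + beta * graph_regularizer(w)
```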


1. Reviewer Consistency Hypothesis

Quality(r_i) ~ Quality(r_j) for reviews r_i and r_j written by the same reviewer

Reviewers are consistent!


Regularizer for Reviewer Consistency

Reviewer Regularizer = $\sum_{(r_i, r_j) \in A} \left[ \text{Quality}(r_i) - \text{Quality}(r_j) \right]^2$

Sum over all data (train + test), for all pairs of reviews in the Same-Author Graph (A)

Closed-form solution! [Zhou et al. '03] [Zhu et al. '03] [Belkin et al. '06]

$w = (X_L^\top X_L + \beta X^\top L_A X)^{-1} X_L^\top q_L$, where $L_A$ is the graph Laplacian of the Same-Author Graph, $X$ is the review-feature matrix over all reviews, and $X_L$, $q_L$ cover the labeled reviews
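A numpy sketch of that closed form under the stated objective (variable names are mine, not the paper's): the quadratic penalty over same-author pairs equals $(Xw)^\top L_A (Xw)$, so the whole problem remains a linear system.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric adjacency matrix W."""
    return np.diag(W.sum(axis=1)) - W

def fit_reviewer_consistency(X_lab, q_lab, X_all, A, beta):
    """Closed form for the Reviewer Consistency regularizer (a sketch):
    minimizes ||X_lab w - q_lab||^2 + beta * (X_all w)^T L_A (X_all w),
    where A is the same-author adjacency over all (train + test) reviews."""
    L_A = graph_laplacian(A)
    lhs = X_lab.T @ X_lab + beta * X_all.T @ L_A @ X_all
    return np.linalg.solve(lhs, X_lab.T @ q_lab)
```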


2. Trust Consistency Hypothesis

If reviewer A trusts reviewer B: Quality(A) - Quality(B) ≤ 0

"I trust people with quality at least as good as mine!"

Quality(reviewer) is defined as AVG( Quality(reviews written by that reviewer) )


Regularizer for Trust Consistency

Trust Regularizer = $\sum_{(A,B):\, A \text{ trusts } B} \max\left[ 0, \text{Quality}(A) - \text{Quality}(B) \right]^2$

Sum over all data (train + test), for all pairs of reviewers connected in the Trust Graph

No closed-form solution… but still convex → Gradient Descent
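A sketch of how the gradient-descent step could look (the reviewer-by-review matrix `M`, the edge encoding, and the hyper-parameters are my own framing of the slide, not the paper's code): reviewer quality is the average predicted quality of that reviewer's reviews, and only violated trust edges contribute to the gradient.

```python
import numpy as np

def fit_trust_consistency(X_lab, q_lab, X_all, M, trust_edges, beta,
                          lr=1e-4, n_iter=2000):
    """Minimize ||X_lab w - q_lab||^2 + beta * sum max(0, r_a - r_b)^2 over
    trust edges (a trusts b), where r = M @ (X_all @ w) is the vector of
    reviewer qualities and M is a row-normalized reviewer-by-review matrix."""
    w = np.zeros(X_lab.shape[1])
    src = np.array([a for a, b in trust_edges], dtype=int)  # trusting reviewers
    dst = np.array([b for a, b in trust_edges], dtype=int)  # trusted reviewers
    for _ in range(n_iter):
        grad = 2 * X_lab.T @ (X_lab @ w - q_lab)   # gradient of the squared loss
        r = M @ (X_all @ w)                        # current reviewer qualities
        viol = np.maximum(0.0, r[src] - r[dst])    # hinge: only violated edges
        g_r = np.zeros_like(r)                     # gradient w.r.t. reviewer qualities
        np.add.at(g_r, src, 2 * viol)
        np.add.at(g_r, dst, -2 * viol)
        grad += beta * X_all.T @ (M.T @ g_r)       # chain rule back to w
        w -= lr * grad
    return w
```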


3. Co-Citation Consistency Hypothesis

If reviewers A and B are co-cited (trusted by the same person in the Trust Graph): Quality(A) - Quality(B) → 0

"I am consistent with my 'trust standard'!"


Regularizer for Co-citation Consistency

Co-citation Regularizer = $\sum_{(A,B) \in C} \left[ \text{Quality}(A) - \text{Quality}(B) \right]^2$

Sum over all data (train + test), for all pairs of reviewers connected in the Co-citation Graph (C)

Closed-form solution! $w = (X_L^\top X_L + \beta X^\top M^\top L_C M X)^{-1} X_L^\top q_L$, where $M$ is the review-reviewer matrix mapping review qualities to reviewer qualities and $L_C$ is the graph Laplacian of C
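The corresponding numpy sketch (my derivation of the slide's closed form, with illustrative names): the regularizer becomes $(MXw)^\top L_C (MXw)$, so reviewer-level smoothing still yields a linear system. Passing the Link Graph instead of C gives the Link Consistency regularizer of the next slide.

```python
import numpy as np

def fit_reviewer_graph_consistency(X_lab, q_lab, X_all, M, G, beta):
    """Closed form for a reviewer-level graph regularizer (a sketch).
    M: row-normalized reviewer-by-review matrix, so M @ (X_all @ w) gives the
       average predicted quality per reviewer.
    G: symmetric reviewer adjacency (co-citation graph C, or the link graph)."""
    L_G = np.diag(G.sum(axis=1)) - G               # graph Laplacian of G
    P = M @ X_all                                  # reviewer-level feature matrix
    lhs = X_lab.T @ X_lab + beta * P.T @ L_G @ P
    return np.linalg.solve(lhs, X_lab.T @ q_lab)
```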


4. Link Consistency Hypothesis

If reviewers A and B are linked in the Trust Graph (in either direction): Quality(A) - Quality(B) → 0

"I trust people with similar quality as mine!"


Regularizer for Link Consistency

Link Regularizer = $\sum_{(A,B) \in \text{Link Graph}} \left[ \text{Quality}(A) - \text{Quality}(B) \right]^2$

Sum over all data (train + test), for all pairs of reviewers connected in the Link Graph

Closed-form solution! (same form as the Co-citation regularizer, with the Link Graph Laplacian in place of $L_C$)


Roadmap

• Motivation
• Review Quality Prediction Algorithms
• Experimental Evaluation
• Conclusions


Data from Ciao UK

Statistics                      Cellphone   Beauty   Digital Camera
# Reviews                       1943        4849     3697
Reviews/Reviewer ratio          2.21        2.84     1.06
Trust Graph Density             0.0075      0.014    0.0006

Summary                         Cellphone   Beauty   Digital Camera
Social Context                  rich        rich     sparse
Gold-std Quality Distribution   balanced    skewed   balanced


Hypotheses Testing: Reviewer Consistency

Compare |Qg(r_1) - Qg(r_2)| for review pairs from the same reviewer with |Qg(r_1) - Qg(r_3)| for pairs from different reviewers (Qg: gold-standard quality).

[Figure: density of the difference in review quality, for pairs from the same reviewer vs. pairs from different reviewers (Cellphone)]

The Reviewer Consistency Hypothesis is supported by the data


Hypotheses Testing: Social Network-based Consistencies

Compare Qg(B) - Qg(A) for reviewer pairs where: B is not linked to A, B trusts A, B is co-cited with A, B is linked to A.

[Figure: density of the difference in reviewer quality for each type of reviewer pair (Cellphone)]

The Social Network-based Consistencies are supported by the data


Prediction Performance: Exploiting Social Context

[Figure: % of MSE difference relative to the text-only baseline (lower is better) for AddFeatures, Reg:Reviewer, Reg:Trust, Reg:Cocitation, and Reg:Link, at 10%, 25%, 50%, and 100% of the training data (Cellphone)]

AddFeatures is most effective given sufficient training data

With limited training data, Reg methods work best

Reg:Reviewer > Reg:Trust > Reg:Cocitation > Reg:Link


Prediction Performance: Compare Three Categories

[Figure: % of MSE difference relative to the text-only baseline (lower is better) for Reg:Reviewer, Reg:Trust, Reg:Cocitation, and Reg:Link on Cellphone, Beauty, and Digital Camera]

Improvement on Digital Camera is smaller due to sparse social context (Reviews/Reviewer ratio = 1.06)


Parameter Sensitivity

[Figure: Mean Squared Error vs. the regularization parameter β, compared to the Text-only Baseline, on Cellphone and Beauty]

The regularized models are consistently better than the Baseline when the parameter is below 0.1


Conclusions

• Improve Review Quality Prediction using Social Context
• Formalize into a Semi-supervised Graph Regularization framework
  – Utilize both labeled and unlabeled data
  – Applicable to data with no social context
• Promising results on real-world data
  – Especially with limited labels and rich social context


Future Work

• Combine multiple regularizers
• Optimize for nDCG instead of MSE
• Infer the trust network
• Spam detection

Thank you! Questions?
