Download pdf - Tutorial on Robustness of Recommender Systems

Tutorial on Robustness of Recommender Systems

Tutorial on Robustness of RecommenderSystems

Neil Hurley

Complex Adaptive System LaboratoryComputer Science and Informatics

University College Dublin

Clique Strategic Research Clusterclique.ucd.ie

October 2011

RecSys 2011: Tutorial on Recommender Robustness

clique.ucd.ie


Outline

1 What is Robustness?Profile Injection AttacksRelevance of RobustnessMeasuring Robustness

2 Attack StrategiesAttacking kNN Algorithms

3 Attack DetectionPCA-based Attack DetectionStatistical Attack DetectionCost-Benefit Analysis

4 Robustness of Model-based Algorithms

5 Attack Resistant Recommendation AlgorithmsProvably Manipulation Resistant Algorithms

6 Stability, Trust and Privacy



Outline









Outline









Outline









Outline









Outline






6 Stability, Trust and PrivacyRecSys 2011: Tutorial on Recommender Robustness


What is Robustness?

Outline








What is Robustness?

Profile Injection Attacks

Outline








What is Robustness?


Defining the Problem

Recommender Systems use personal information aboutend-users to make useful personalised recommendations.

When ratings are provided explicitly, recommender algorithmshave been designed on the assumption that the providedinformation is correct.

However . . .

“One can have, some claim, as many electronic per-sonas as one has time and energy to create”

–Judith Donath (1998) as quoted in Douceur (2002)

How does the system perform if multiple identities are used totry to deliberately bias the recommender output?



What is Robustness?



In 2002, John Douceur of Microsoft Research coined the termSybil Attack to refer to an attack against identity onpeer-to-peer systems in which an individual entitymasquerades as multiple separate entities

“If the local entity has no direct physical knowledge of remote entities, it perceives them onlyas informational abstractions that we call identities. The system must ensure that distinctidentities refer to distinct entities; otherwise, when the local entity selects a subset of identitiesto redundantly perform a remote operation, it can be duped into selecting a single remoteentity multiple times, thereby defeating the redundancy”

In the same year, the first paper (O’Mahony et al. 2002)appeared on the vulnerability of Recommender Systems tomalicious strategies for “recommendation promotion” – laterdubbed profile injection attacks.



What is Robustness?



Robustness refers to the ability of a system to operate understressful conditions.

While there are many possible stresses that can be applied toRecommender Systems, research on RS robustness hasfocused on performance when the dataset is stressedspecifically when

the dataset is full of noisy, erroneous data;typically, imagined to have been corrupted through a concertedsybil attack, with an aim of biasing the recommender output.



What is Robustness?


Robust RS Research

The goal of robust recommendation is to prevent attackersfrom manipulating an RS through large-scale insertion of falseuser profiles: a profile injection attack

An attack is a concerted effort to bias the results of arecommender system by the insertion of a large number ofprofiles using false identities or sybils.Each identity is referred to as an attack profile.Research has concentrated on attacks designed to achieve aparticular recommendation outcome

A Product Push attack: attempt to secure positiverecommendations for an item or items;A Product Nuke attack: attempt to secure negativerecommendations for an item or items.

We can also think of attacks that aim to simply destroy theaccuracy of the system.



What is Robustness?


Robust RS Research

The goal of robust recommendation is to prevent attackersfrom manipulating an RS through large-scale insertion of falseuser profiles: a profile injection attackAn attack is a concerted effort to bias the results of arecommender system by the insertion of a large number ofprofiles using false identities or sybils.

Each identity is referred to as an attack profile.Research has concentrated on attacks designed to achieve aparticular recommendation outcome





What is Robustness?


Robust RS Research

The goal of robust recommendation is to prevent attackersfrom manipulating an RS through large-scale insertion of falseuser profiles: a profile injection attackAn attack is a concerted effort to bias the results of arecommender system by the insertion of a large number ofprofiles using false identities or sybils.Each identity is referred to as an attack profile.

Research has concentrated on attacks designed to achieve aparticular recommendation outcome





What is Robustness?


Robust RS Research

The goal of robust recommendation is to prevent attackersfrom manipulating an RS through large-scale insertion of falseuser profiles: a profile injection attackAn attack is a concerted effort to bias the results of arecommender system by the insertion of a large number ofprofiles using false identities or sybils.Each identity is referred to as an attack profile.Research has concentrated on attacks designed to achieve aparticular recommendation outcome





What is Robustness?


Robust RS Research


A Product Push attack: attempt to secure positiverecommendations for an item or items;

A Product Nuke attack: attempt to secure negativerecommendations for an item or items.




What is Robustness?


Robust RS Research






What is Robustness?


Robust RS Research






What is Robustness?


Robust RS Research

We assume that the attacker has no direct access to theratings database – manipulation achieved via the creation offalse profiles only.

We ignore system-level methods (e.g. Captchas) forpreventing the generation of false identities or ratings

Focus is on the recommendation algorithm’s ability to resistmanipulation either by

Identifying false profiles from their statistical properties andignoring or lessening their impact on the generation ofrecommendations; orGenerating recommendations in a manner that is inherentlyinsensitive to manipulation.



What is Robustness?


Robust RS Research







What is Robustness?


Robust RS Research







What is Robustness?


Robust RS Research




Identifying false profiles from their statistical properties andignoring or lessening their impact on the generation ofrecommendations;

orGenerating recommendations in a manner that is inherentlyinsensitive to manipulation.



What is Robustness?


Robust RS Research







What is Robustness?


Example



What is Robustness?


Example



What is Robustness?


Example



What is Robustness?


Threats to Reputation Systems I

It is interesting to compare the scenario studied in RS researchwith the threats identified for reputation systems in the 2007ENISA report (Carrara and Hogben 2007):

1 Whitewashing attack: reseting a poor reputation byrejoining the system with a new identity.

2 Sybil attack or pseudospoofing: the attacker uses multipleidentities (sybils) in order to manipulate a reputation score.

3 Impersonation and reputation theft: acquiring the identityof another and stealing her reputation.

4 Bootstrap issues and related threats: the initial reputationof a newcomer may be particularly vulnerable to attack.



What is Robustness?


Threats to Reputation Systems II

5 Extortion: co-ordinated campaigns aimed at blackmail bydamaging an individual’s reputation for malicious motives.

6 Denial-of-reputation: attack designed to damage reputationand create an opportunity for blackmail in order to have thereputation cleaned.

7 Ballot stuffing and bad mouthing: reporting of a falsereputation score; the attackers collude to givepositive/negative feedback, to increase or lower a reputation.

8 Collusion: multiple users conspire to influence a givenreputation.

9 Repudiation of data and transaction: denial that atransaction occurred, or denial of the existence of data forwhich individual is responsible.



What is Robustness?


Threats to Reputation Systems III

10 Recommender dishonesty: dishonest reputation scoring.

11 Privacy threats for voters and reputation owners: forexample, anonymity improves the accuracy of votes.

12 Social threats: Discriminatory behaviour, herd behaviour,penalisation of innovative, controversial opinions, vocalminority effect etc.

13 Threats to the underlying networks: e.g. denial of serviceattack.

14 Trust topology threats: e.g.targeting most highly influentialnodes.

15 Threats to ratings: exploiting features of metrics used bythe system to calculate the aggregate reputation rating



What is Robustness?


The Recommendation Attack Game

An attack has an associated context-dependent cost

real dollar cost if the insertion of a sybil rating requires thepurchase of the corresponding item;time/effort cost;risk cost, associated with the likelihood of being detected;

In any case, we may model the cost as proportional to

The number of sybil profiles created ;the total number of constituent ratings within the sybil profiles.



What is Robustness?




real dollar cost if the insertion of a sybil rating requires thepurchase of the corresponding item;

time/effort cost;risk cost, associated with the likelihood of being detected;





What is Robustness?




real dollar cost if the insertion of a sybil rating requires thepurchase of the corresponding item;time/effort cost;

risk cost, associated with the likelihood of being detected;





What is Robustness?









What is Robustness?









What is Robustness?






The number of sybil profiles created ;

the total number of constituent ratings within the sybil profiles.



What is Robustness?









What is Robustness?


Impact Curve

Figure: Impact curve from Burke et al. (2011)



What is Robustness?


Notation

Consider R to be an n×m database of ratings provided by aset of n genuine users for m items in a system catalogue.

Let G be the set of n genuine users, and letra = (ra,1, . . . , ra,m)T represent the set of ratings provided byuser a for each item.

ra,i ∈ L, some quality label, typically a numerical value oversome discrete range. Write ra,i = ∅ if user a has not rateditem i.

Let A be the set of attack profiles of size nA = number ofprofiles and mA = number of ratings. Let ai denote a singleattack profile 1 ≤ i ≤ nA.

Let R′ be the (n+ nA)×m database of genuine and attackprofiles available to the recommendation algorithmpost-attack.



What is Robustness?


Notation








What is Robustness?


Notation








What is Robustness?


Notation








What is Robustness?


Notation

Let cA = cA(nA,mA) denote the cost of mounting an attack.

The recommendation algorithm may be represented as afunction φ, that uses the rating database R and a user’shistory of previous ratings, ru, to make a set of predictionsp(u, i) = φ(i, R, ru) ∈ L on the set of items.

More generally, φ(.) results in a probability mass function overthe label space L for each predicted item.

Let l(φ(., R′, .), φ(., R, .)) be a loss function representing thequality loss of the recommendation process due to an attack.

This function may depend on the goal of the attack e.g. theoverall loss in accuracy, if the attack seeks to distort generalrecommendation performance,or the shift in ratings for some targeted set of items over sometargeted set of users, in a focused attack.



What is Robustness?


Notation








What is Robustness?


Notation








What is Robustness?


Notation








What is Robustness?


Notation





This function may depend on the goal of the attack e.g. theoverall loss in accuracy, if the attack seeks to distort generalrecommendation performance,

or the shift in ratings for some targeted set of items over sometargeted set of users, in a focused attack.



What is Robustness?


Notation








What is Robustness?


The Recommender Attack Game

Then the manipulation game, from the attacker’s point ofview is to choose the attack A∗ that maximises the loss for agiven cost bound cmax.

Attack Goal

A∗ = arg max{A|cA≤cmax}minφ l(φ(R′), φ(R))

While the system designer strives to find a recommendationalgorithm that militates against attack

Defense Goal

φ∗ = arg minφ max{A|cA≤cmax} l(φ(R′), φ(R))



What is Robustness?


The Recommender Attack Game

Then the manipulation game, from the attacker’s point ofview is to choose the attack A∗ that maximises the loss for agiven cost bound cmax.

Attack Goal

A∗ = arg max{A|cA≤cmax}minφ l(φ(R′), φ(R))

While the system designer strives to find a recommendationalgorithm that militates against attack

Defense Goal

φ∗ = arg minφ max{A|cA≤cmax} l(φ(R′), φ(R))



What is Robustness?

Relevance of Robustness

Outline








What is Robustness?


Our Original Motivation

Inspired by Work in Digital Watermarking . . .



What is Robustness?


Our Original Motivation

Inspired by Work in Digital Watermarking . . .



What is Robustness?


How Realistic Is this Scenario?

news.bbc.co.uk/2/hi/4741259.stm (2001) “ ... ads for films including Hollow Man and A Knight’s Tale quoted

praise from a reviewer called David Manning, who was exposed as being invented ...”

http://tinyurl.com/3d6d969 (Sept. 2003) “AuctionBytes conducted a reader survey to find out how serious

these problems were. According to the survey, 39% of respondents felt that feedback retaliation was a very big

problem on eBay. Nineteen percent of respondents had received retaliatory feedback within the previous 6

months, and 16% had been a victim of feedback extortion within the previous 6 months.”

http://tinyurl.com/n3d5lp (June 2009) “Elsevier officials said Monday that it was a mistake for the publishing

giant’s marketing division to offer $25 Amazon gift cards to anyone who would give a new textbook five stars in a

review posted on Amazon or Barnes & Noble. While those popular Web sites’ customer reviews have long been

known to be something less than scientific, and prone to manipulation if an author has friends write on behalf of

a new work, the idea that a major academic publisher would attempt to pay for good reviews angered some

professors who received the e-mail pitch..”


news.bbc.co.uk/2/hi/4741259.stm

http://tinyurl.com/3d6d969

http://tinyurl.com/n3d5lp


What is Robustness?




















What is Robustness?




















What is Robustness?



http://tinyurl.com/cfmqce (2009) Paul Lamere cites an example of “precision hacking” of a Time Poll.

http://tinyurl.com/mupy7d (2009) Hotel review manipulation

Lang et al. (2010) Social Manipulation in Buzznet “...Hey, I know you don’t know me, but could you do me a

huge fav and vote for me in this contest? All you have to do is buzz me...”


http://tinyurl.com/cfmqce

http://tinyurl.com/mupy7d


What is Robustness?











What is Robustness?











What is Robustness?







All examples of types of manipulation attacks onrecommender systems.

No hard evidence of automated shilling attacks by sybilinsertion bots.





What is Robustness?

Measuring Robustness

Outline








What is Robustness?


Simple Measures of Attack Impact

Average Prediction Shift

The change in an item’s predicted rating before and after attack,averaged over all predictions or over predictions that are targettedby the attack.

pshift(u, i) = φ(i,R′, ru)− φ(i,R, ru)

Average Hit Ratio

The average likelihood over tested users that a top-Nrecommendation will recommend an item that is the target of anattack. For each such item i and each tested user, u, in a test setof t users, let h(u, i) = 1 if i ∈ Ru, the recommended set. Then,H(i) = 1

t

∑u∈U h(u, i)



What is Robustness?


Simple Measures of Attack Impact

Average Prediction Shift

The change in an item’s predicted rating before and after attack,averaged over all predictions or over predictions that are targettedby the attack.

pshift(u, i) = φ(i,R′, ru)− φ(i,R, ru)

Average Hit Ratio

The average likelihood over tested users that a top-Nrecommendation will recommend an item that is the target of anattack. For each such item i and each tested user, u, in a test setof t users, let h(u, i) = 1 if i ∈ Ru, the recommended set. Then,H(i) = 1

t

∑u∈U h(u, i)



What is Robustness?


Other Measures of Attack Impact

Considering φ(i,R, ru) as a pmf over L, attack impact can bemeasured in terms of the change in this distribution as aresult of the attack.

For instance (Yan and Roy 2009), measure impact in terms ofthe average Kullback-Liebler distance between φ(i,R, ru) andφ(i,R′, ru) over the set of inspected items.

Resnick and Sami (2007) propose loss functions L(l, q) wherel ∈ L, q = φ(i,R, ru) where the true label of item i is l

e.g. the quadratic loss over a two rating scale [HI,LO], withq the probability of HI is given by

L(HI, q) = (1− q)2; L(LO, q) = q2



What is Robustness?


Other Measures of Attack Impact

Considering φ(i,R, ru) as a pmf over L, attack impact can bemeasured in terms of the change in this distribution as aresult of the attack.

For instance (Yan and Roy 2009), measure impact in terms ofthe average Kullback-Liebler distance between φ(i,R, ru) andφ(i,R′, ru) over the set of inspected items.

Resnick and Sami (2007) propose loss functions L(l, q) wherel ∈ L, q = φ(i,R, ru) where the true label of item i is l

e.g. the quadratic loss over a two rating scale [HI,LO], withq the probability of HI is given by

L(HI, q) = (1− q)2; L(LO, q) = q2



Attack Strategies

Outline








Attack Strategies

Forming Attack Profiles

(Mobasher et al. 2007) introduce the following notation forthe components of an attack profile:



Attack Strategies



IT , the target item(s) receive typically the maximum (resp.minimum) rating for a push (resp. nuke) attack.



Attack Strategies



IS , the selected item(s) are chosen and rated in a manner tosupport the attack.



Attack Strategies



IF , the filler items(s) fill out the remainder of the ratings inthe attack profile.



Attack Strategies


The goal of attack profile creation must be to

Effectively support the purpose of the attack – selection oftarget rating and IS ;

But, remain unobtrusive so as to avoid detection and filtering– selection of the filler items IF .

Must also consider what information is available to theattacker

Typically assume some knowledge of the statistics of the ratingdatabase is available;May also have knowledge of the recommendation algorithm –an informed attack.

Note Kerkchoff’s principal – avoid “security throughobscurity”.



Attack Strategies



Effectively support the purpose of the attack – selection oftarget rating and IS ;But, remain unobtrusive so as to avoid detection and filtering– selection of the filler items IF .






Attack Strategies









Attack Strategies





Typically assume some knowledge of the statistics of the ratingdatabase is available;

May also have knowledge of the recommendation algorithm –an informed attack.




Attack Strategies









Attack Strategies









Attack Strategies

Attacking kNN Algorithms

Outline








Attack Strategies


User-based kNN Attack

The first CF profile insertion attack (O’Mahony et al. 2002),was an informed attack that exploited a particular weakness inthe basic version of Resnick’s user-based kNN algorithm.

User-based CF predicts a rating pa,j for item j, user a asfollows:

– Form a neighbourhood by picking the top-k most similar usersto a

– Pearson Correlation

wa,i =P

j(ra,j−ra)(ri,j−ri)√Pj∈Na∩Ni

(ra,j−ra)2P

j(ri,j−ri)2

– Make a prediction by taking a weighted average of

neighbours ratings using: pa,j = ra + κ∑n

i=1 wa,i(ri,j − ri)where κ is a normalising factor



Attack Strategies







wa,i =P


(ra,j−ra)2P

j(ri,j−ri)2






Attack Strategies







wa,i =P


(ra,j−ra)2P

j(ri,j−ri)2






Attack Strategies



Correlation calculated over the items which the target userand attack profile have in common. A small intersection setcan lead to high correlations.

Therefore a small profile of popular items will have a largecorrelation with users who have rated these items

Implies a low-cost, effective attack on a large proportion ofthe userbase.

However, not at all unobtrusive.



Attack Strategies









Attack Strategies









Attack Strategies









Attack Strategies


Other Attacks Strategies

Many other attack variants proposed by (Lam and Riedl 2004) and(Mobasher et al. 2007). Assuming a push attack – a target item isgiven the maximum rating and the attack profile is filled out asfollows:

Random Attack Randomly chosen filler items get randomlydrawn rating values.

Average Attack Randomly chosen filler items. Ratings drawnfrom normal distribution with item means set tothose of rating database.

Probe Attack Filler items filled by starting with a set of seedratings, querying the recommender system to fillthe ratings of the remaining items.



Attack Strategies









Attack Strategies









Attack Strategies




Segment Attack Identifying popular items in a particular usersegment, these are given maximum ratingand remaining filler items are given minimumrating.

Bandwagon Attack A set of popular items is given the maximumrating, while remaining filler items are givenrandom ratings.



Attack Strategies




Segment Attack Identifying popular items in a particular usersegment, these are given maximum ratingand remaining filler items are given minimumrating.

Bandwagon Attack A set of popular items is given the maximumrating, while remaining filler items are givenrandom ratings.



Attack Strategies


Evaluation User-based kNNToward Trustworthy Recommender Systems • 23:21

Fig. 8. Comparison of attacks against user-based algorithm, prediction shift (left) and hit ratio(right). The baseline in the right panel indicates the hit ratio results prior to attack.

target item. The filler size, in this case, is the proportion of the remaining itemsthat are assigned random ratings based on the overall data distribution acrossthe whole database (see Figure 4). In the case of the MovieLens data, thesefrequently-rated items are predictable box office successes including such titlesas Star Wars, Return of Jedi, Titanic, etc. The attack profiles consist of highratings given to these popular titles in conjunction with high ratings for thepushed movie. Figure 7 shows the effect of filler size on the effectiveness ofthis attack. In this particular experiment, we selected only a single popularmovie as the “bandwagon” movie with which to associate the target and usedan attack size of 10% (i.e., the number of attack profiles were about 10% of thesize of the original database).

For the bandwagon attack, the best results were obtained by using a 3%filler size (set IF containing ratings for approximately 3% of the movies in thedatabase). This number seems to correspond closely to the average number ofmovies per user in the database. For subsequent experiments with this attackmodel we used five “bandwagon” movies. The five movies chosen were those withthe most ratings in the database. Obviously, this is a form of system-specificknowledge, increasing the knowledge requirements of this attack somewhat.However, we verified the general popularity of these movies using externaldata sources2,3 and found they would be among anyone’s list of movies likelyto have been seen by many viewers.

Figure 8 shows the results of a comparative experiment examining threealgorithms at different attack sizes. The algorithms included the average attack(3% filler size), the bandwagon attack (using one frequently rated item and3% filler size), and the random attack (6% filler size). These parameters werechosen pessimistically as they are the versions of each attack that were foundto be most effective. We see that even without system-specific data an attacklike the bandwagon attack can be successful at higher attack levels. The moreknowledge-intensive average attack was still better, with the best performance

2http://www.the-numbers.com/movies/records/inflation.html.3http://www.imdb.com/boxoffice/alltimegross.

ACM Transactions on Internet Technology, Vol. 7, No. 4, Article 23, Publication date: October 2007.

Results from (Mobasher et al. 2007) on the Movielens 100K dataset. User-based kNN algorithm withk = 20. All users who have rated at least 20 items.

Attack Size given as a percentage of the total number of users in the dataset on the x-axis. Results shownfor different filler sizes, written as a percentage of the total number of items.

Bandwagon attack uses a single popular item and 3% filler size.

Baseline refers to hit-ratio pre-attack



Attack Strategies


Evaluation Item-based kNN

Results from (Mobasher et al. 2007) on the Movielens 100K dataset. Item-based kNN algorithm withk = 20.



Attack Strategies


Evaluation Item-based kNNToward Trustworthy Recommender Systems • 23:23

Fig. 10. Prediction shift and hit ratio results for the horror movie segment in the item-basedalgorithm.

the target item for use as the segment portion of the attack profile IS . In therealm of movies, we might imagine selecting movies of a similar genre or moviescontaining the same actors.

If we evaluate the segmented attack based on its average impact on all users,there is nothing remarkable. The attack has an effect but does not approach thenumbers reached by the average attack. However, we must recall our marketsegment assumption: namely, that recommendations made to in-segment usersare much more useful to the attacker than recommendations to other users. Ourfocus must therefore be with the “in-segment” users, those users who have ratedthe segment movies highly and presumably are desirable customers for pusheditems that are similar: an attacker using the Horror segment would presumablybe interested in pushing a new movie of this type.

To build our segmented attack profiles, we identified the user segment asall users who had given above average scores (4 or 5) to any three of the fiveselected horror movies, namely, Alien, Psycho, The Shining, Jaws, and TheBirds.4 For this set of five movies, we then selected all combinations of threemovies that had at least 50 users, support, and chose 50 of those users randomlyand averaged the results.

The power of the segmented attack is emphasized in Figure 10, in which theimpact of the attack is compared within the targeted user segment and withinthe set of all users. The left panel in the figure shows the comparison in termsof prediction shift and varying attack sizes, while the right panel depicts thehit ratio at 1% attack.5 While the segmented attack does show some impactagainst the system as a whole, it truly succeeded in its mission: to push theattacked movie precisely to those users defined by the segment. Indeed, in thecase of in-segment users, the hit ratio was much higher than average attack.The chart also depicts the effect of hit ratio before any attack. Clearly thesegmented attack had a bigger impact than any other attack we have previouslyexamined against item-based algorithm. Our prediction shift results show thatthe segmented attack was more effective against in-segment users than even

4The list was generated from online sources of the popular horror films: http://www.imdb.

com/chart/horror and http://www.filmsite.org/afi100thrillers1.html.5Note that our previous hit ratio figures have used 10% attack size. Here we see comparableperformance with 1/10 of the number of attack profiles.

ACM Transactions on Internet Technology, Vol. 7, No. 4, Article 23, Publication date: October 2007.

Results from (Mobasher et al. 2007) on the Movielens 100K dataset. Item-based kNN algorithm withk = 20.

Prediction shift and hit ratio results for the Horror Movie Segment.



Attack Detection

Outline








Attack Detection


Two well-studied attacks :

Construct spam profile with max (or min) rating for targeteditem.Choose a set of filler items at randomRandom Attack – insert ratings in filler items according toN (µ, σ)Average Attack – insert ratings in filler item i according to

N (µi, σi)

For evaluations, genuine profiles are drawn from 1,000,000rating Movielens dataset.



Attack Detection

Detection Flow



Attack Detection

PCA-based Attack Detection

Outline








Attack Detection


PCA Detector (Mehta et al. 2007)

A method of identifying and removing a cluster ofhighly-correlated attack profiles.

A cluster C defined by an indicator vector x such that xi = 1if user ui ∈ C and xi = 0 otherwise.S a profile similarity matrix, with eigenvectors/values ei, λi

Quadratic form

xT Sx =∑

i∈C,j∈C

S(i, j) =m∑

i=1

(x.ei)2λi .

Find x that correlates most with first few eigenvectors of S.



Attack Detection



Success of PCA depends on how S is calculated.S = Z0, covariance of profiles ignoring missing values.



Attack Detection



Success of PCA depends on how S is calculated.S = Z1, covariance of profiles treating missing values as 0.



Attack Detection



Success of PCA depends on how S is calculated.Small Overlap → low genuine/attack covariance.



Attack Detection

Statistical Attack Detection

Outline








Attack Detection


Neyman-Pearson Statistical Detection

y ∈ Y be an observation i.e. a user profile over all possibleprofiles YTwo hypotheses

H0 – that y is a genuine profile, associated pdf fY|H1(y)

H1 – that y is an attack profile, associated pdf fY|H0(y)

N-P criterion sets a bound α on false alarm probability pf andmaximises the good detection probability pD

ψ∗(y) ={H1 if l(y) > ηH0 if l(y) < η

where l(y) =fY|H1

(y)

fY|H0(y)

(likelihood ratio)



Attack Detection


Modelling Attack Profiles

As attacks follow well-defined construction, easy to constructmodel:

m∏i=1

Pr[Yi = yi]

=m∏i=1

Pr[Yi = φ]θi (Pr[Yi = yi|Yi 6= φ]Pr[Yi 6= φ])1−θi

Which items are rated?



Attack Detection




m∏i=1

Pr[Yi = yi]

=m∏i=1

Pr[Yi = φ]θi (Pr[Yi = yi|Yi 6= φ]Pr[Yi 6= φ])1−θi

What ratings used?



Attack Detection




m∏i=1

Pr[Yi = yi]

=m∏i=1

Pr[Yi = φ]θi

(Q

(yi − 1

2 − µiσi

)−Q

(yi + 1

2 − µiσi

))1−θi

Q(.) = Gaussian Q-function



Attack Detection


Modelling Genuine Profiles

Simple model – identical to attack model except thatprobability of selecting a filler item is estimated aspi , Pr[Yi = φ] from a dataset of genuine profiles.

m∏i=1

Pr[Yi = yi]

=m∏i=1

piθi (Pr[Yi = yi|Yi 6= φ](1− pi))1−θi

Not a realistic model of genuine ratings – assumes user’sratings are all independent

But sufficient (almost) to distinguish from attack profiles.



Attack Detection


Random Attack (Filler Size=5%)



Attack Detection


Average Attack (Filler Size=5%)



Attack Detection


Lesson Learned

Filler item selection is key to the success of the standardattacks on k-NN user-based algorithm.

Low (attack profile / genuine profile) overlap (few commonratings) makes extreme correlations possible – hence attackprofiles are unusually influential. (Basis of original attackproposed in O’Mahony et al. (2002).)But also allows for successful detection – also highlyperceptible.



Attack Detection


Attack Obfuscation

Several obfuscation strategies evaluated previously (Williamset al. 2006).

Effective obfuscation must try to reduce the statisticaldifferences between genuine and attack profiles.



Attack Detection


Average over Popular (AoP) attack

It is clear that a major weakness of the average and randomattacks is their unrealistic selection of filler items.

To be imperceptible an attacker must choose items to rate ina similar fashion to genuine users.

Average Over Popular Attack – identical to average

attack, but filler items are chosen from x-% most popularitems.

AoP is a less perceptible but also less effective attack onkNN user-based algorithm.

Nevertheless it does work . . .



Attack Detection


AoP Prediction Shift (Attack Size=3%)



Attack Detection


AoP Hits (rating of ≥ 4) (Attack Size=3%)



Attack Detection


AoP PCA Detection



Attack Detection


Improved Genuine Profile Model

Adopt model that takes account of correlations betweenratings.

Assume ratings follow multivariate normal distribution –parameters:

µ0, m-dimensional vector of mean-ratings for each item; andΣ0, m×m matrix of item correlations.

Assume attack profiles also multivariate gaussian withparameters µ1, Σ1

Hence, attacked database can be modelled as a GaussianMixture Model.



Attack Detection



DifficultyVery high dimensional vectorsMissing values

Factor Model : Assume that the rating matrix can berepresented by the linear model

YT = AXT + N ,

X : n× k matrix, xu,j = extent user u likes category jA : m× k matrix ai,j = extent item i belongs in category jN : independent noiseX(i, :) assumed independent normal.

Expectation maximisation to learn A from dataset ([Canny,2002]).



Attack Detection



N-P test on multivariate normal k-dimensional vectorobtained by projection with A

w = ATYT = AT (AXT + N)

X : n× k matrix, xu,j = extent user u likes category jA : m× k matrix ai,j = extent item i belongs in category jN : independent noiseX(i, :) assumed independent normal.



Attack Detection


Projection by Clustering

N-P test on multivariate normal k-dimensional vectorobtained by projection with P

w = PTYT

Obtain a clustering of item-set into k clusters of similar items.P : n× k projection matrix sums all ratings belonging to acluster.



Attack Detection


Supervised AoP Detection

N-P test reduces to

(w−µw0)TΣw0−1(w−µw0)−(w−µw1)TΣw1

−1(w−µw1) ≶ η

Need

Database of genuine profiles andDatabase of attack profiles

to train the detector.



Attack Detection


AoP Detection (Factor Analysis)

Actual attack=20% AoP Attack. Plot shows training ondifferent attack types.



Attack Detection


AoP Detection (Clustering)

Actual attack=20% AoP Attack. Plot shows training ondifferent attack types.



Attack Detection


Unsupervised AoP Detection

Unsupervised Gaussian mixture model to learn modelparameters

µw0, Σw0, µw1, Σw1, a0 and a1, where ai is the probabilitythat a profile belongs in cluster i.

Implementation from William Wong, Purdue University,http://web.ics.purdue.edu/~wong17/.


http://web.ics.purdue.edu/~wong17/


Attack Detection


AoP Detection



Attack Detection

Cost-Benefit Analysis

Outline








Attack Detection



In O’Mahony et al. (2006), we examined a simple cost-benefitmodel

The ROI for an attack on item j given by:

ROIj =cjN(n′j−nj) −

Pk∈Ij

ckPk∈Ij

ck

Simplify by assuming cp = cq, ∀ p, q ∈ IROIj = N(n′j−nj) − sj

sj

sj – total number of ratings inserted in an attack on item j

Approximated the fractions nj & n′j using count of goodpredictions and browser-to-buyer conversion probability



Attack Detection


Results

Profitj = c(αj(GP′j − GPj)− sj

); c = $10, α = 10%

0 2000 4000 6000 8000 10000−200

0

200

400

600

N

Pro

fit (

$)

r2 = 0.9894



Attack Detection


Cost-benefit Analysis

Vu et al. (2010) examines the adversarial cost to attacking aranking system in terms of nA = number of profiles andmA=number of ratings.

They assume a rating function r(u, i) ∈ {−1, 0, 1} and aquality-popularity score for each item

f(i) =∑u∈G

r(u, i)



Attack Detection


Cost-benefit Analysis

Using a trust mechanism that can detect a malicious ratingwith probability γ, they show that a ranking function of theform

f(i) =∑

u∈G∪Ar(u, i)t(u, i)

can be used to design a ranking system in which the minimumadversarial cost in expectation to boost the rank of an itemfrom k to 1 includes the cost of posting mA=nA ratings, with

nA = (x1 + xk)1− 2ε+ εγ

1− γ

where xi denotes the number of honest ratings for item i andε is the probability of an erroneous rating.



Robustness of Model-based Algorithms

Outline









Model-based Algorithms

Attack takes effect when model is re-trained, using corruptedratings database.


































Manipulation-resistance of Model-based Algorithms

Initial work such as Mobasher et al. (2006) showed thatmodel-based algorithms are more resistant to manipulationthan memory-based algorithms.











The user-base is clustered into segments using k-means orPLSA clustering and users matched to closest segmentprofiles.






Sybil profiles tend to be clustered into same segment, therebyreducing power of attack.






However, as shown in Cheng and Hurley (2009a), a diversifiedattack can create less similar but yet still effective attackprofiles.





From Mobasher et al. (2006): Evaluation on MovieLens 100Kdataset





From Cheng and Hurley (2009a): Evaluation on MovieLens 1Mdataset




Manipulation-resistance of Matrix Factorisation

Modern matrix factorization algorithms use a least squaresstep to find the factors. This is known to be sensitive tooutliers.




Manipulation-resistance of Matrix Factorisation

Cheng and Hurley (2010) Prediction shift of basic Bellkoralgorithm kNN for a single randomly chosen unpopular item.




Summary of Robustness Results on Standard Algorithms

A number of general attack strategies have been proposedthat work effectively in particular on memory-basedalgorithms.

Standard attacks are generally detectable:Cost effectiveness implies that a sybil profile should beunusually influential.

Special purpose (informed) attacks can be tailored towardsparticular CF algorithms

e.g. a high diversity attack proved effective against thek-means clustering algorithm.

Sybil profiles can be obfuscated to make them less detectableSelection of filler items is Achille’s heal of standard averageattack, but also explains its power.Obfuscation reduces the effectiveness of attacks butmemory-based algorithms are still vulnerable.






Standard attacks are generally detectable:

Cost effectiveness implies that a sybil profile should beunusually influential.









Standard attacks are generally detectable:Cost effectiveness implies that a sybil profile should beunusually influential.






Attack Resistant Recommendation Algorithms

Outline









VarSelectSVD

Some early work (O’Mahony et al. 2004) appliedneighbourhood filtering to the standard user-based algorithm,to cluster and filter suspicious neighbours.

Mehta and Nejdl (2008) introduces the the VarSelectSVDmethod – a matrix factorisation strategy taking robustnessinto account.

An SVD factorization of the rating matrix R requires thefollowing to be solved:

arg minG,H‖R−GH‖

Users are flagged as suspicious using PCA clustering.Flagged users do not contribute to the update of righteigenvector – they do not contribute to the model.




VarSelectSVD









VarSelectSVD









VarSelectSVD





Users are flagged as suspicious using PCA clustering.

Flagged users do not contribute to the update of righteigenvector – they do not contribute to the model.




VarSelectSVD









VarSelectSVD




VarSelectSVD Results

MAE, Prediction Shift and Hit Ratio for 5% Average Attack,7% Filler on Movielens 1M




Manipulation Resistance Through Robust Statistics

Robust statistics describes an alternative approach to classicalstatistics where the motivation is to produce estimators thatare not unduly affected by small departures from the model.

Robust regression uses a bounded cost function which limitsthe effect of outliers.

Two Approaches







Two Approaches

1 M-estimators –

arg minG,H

∑rij 6=0

ρ(rij−gihj) where ρ(r) ={ 1

2γ r2 |r| ≤ γ

|r| − γ2 |r| > γ







Two Approaches

2 Least Trimmed Squares –

arg minG,H

h∑i=1

e2(i) where e2(1) ≤ e2(2) ≤ · · · ≤ e

2(n)




Robust Statistics Results on Bellkor Method




Provably Manipulation Resistant Algorithms

Outline










The Influence Limiter (Resnick and Sami 2007)

Output of recommendation systemqj passes through aninfluence-limiting process toproduce a modified output qj .

qj is a weighted average of qj−1

and qj

The weighting depends on thereputation that user j hasaccumulated with respect to thetarget.





The Influence Limiter (Resnick and Sami 2007)

A scoring function assignsreputation to j based on whetheror not the target actually likes theitem.

Tuned so that honest users canreach full credibility after O(log n)steps.

The main result states that thetotal impact of n sybils in terms ofperformance reduction is bounded

by −neλ .





Yan and Roy (2009)’s Manipulation Robust Algorithm

A class of CF algorithms – which the authors’ call linear CFalgorithms – is presented in which the manipulationdistortion is bounded above by

1n

11−r .

where r is the fraction of the data that is generated bymanipulators and n is the number of products that havealready been rated by a user whose future ratings are to bepredicted.

“Suppose a CF system that accepts binary ratings predictsfuture ratings correctly 80% of the time in the absence ofmanipulation. If 10% of all ratings are provided bymanipulators, according to our bound, the system can maintaina 75% rate of correct predictions by requiring each new user torate at least 21 products before receiving recommendations.”






Let r ∈ L be a rating in the label space.

Let ν be a permutation of the items such that νn is the nth

item rated by an active user, a.

Let rn−1a be the active user’s profile after rating n− 1 items.

Then a CF algorithm may be described as

A PMF over the rating that the active user gives for the νn

item.Depending on the active user’s current profilethe training set, Rthe order in which the user rates items.











P (ra,νn = r|rn−1a ,R,ν)


item.Depending on the active user’s current profilethe training set, Rthe order in which the user rates items.













item.

Depending on the active user’s current profilethe training set, Rthe order in which the user rates items.













item.

Depending on the active user’s current profile

the training set, Rthe order in which the user rates items.













item.Depending on the active user’s current profile

the training set, R

the order in which the user rates items.













item.Depending on the active user’s current profilethe training set, R

the order in which the user rates items.






A probabilistic CF algorithm predicts independently of theorder ν.

Let R′ = (R,Y) be a database divided in proportion (1− r)to r between R and Y.

Then a linear probabilistic CF algorithm is one in which

P (.|rn−1a , (R,Y)) = (1− r)P (.|rn−1

a ,R) + rP (.|rn−1a ,Y)

As the active user rates more and more products, his ratingswill tend to be distinguished as sampled from P (.|rn−1

a ,R)and hence the influence of P (.|rn−1

a ,Y) diminishes as n grows.






The kNN algorithm is not linear and does not satisfy thebound

The Naive Bayes (NB) algorithm is an asymptotically linearCF algorithm.

Kernel Density Estimation (KDE) is a linear CF algorithm.



Stability, Trust and Privacy

Outline









Robustness and Related Concepts

Stability refers to a recommender system’s ability to provideconsistent recommendations

in the face of noise in the dataset.using different training sample;over time, if new ratings ‘agree’ with past ratings

Adomavicius and Zhang (2010), explores stability from thepoint of view of adding a CF algorithm’s predictions to thedataset as new ratings, and measuring the prediction shiftthat is incurred.






in the face of noise in the dataset.

using different training sample;over time, if new ratings ‘agree’ with past ratings







in the face of noise in the dataset.using different training sample;

over time, if new ratings ‘agree’ with past ratings







in the face of noise in the dataset.using different training sample;over time, if new ratings ‘agree’ with past ratings






Amatriain et al. (2009) carries out a user study of test re-testreliability and shows that inconsistencies negatively impact thequality of the predictions.





Trust has been widely explored in collaborative filtering

. . . avoid manipulation by only using rating of trusted users orbuild trust based on user behaviour (c.f. the Influence Limiter). . . nevertheless systems based on trust are still manipulatable– tinyurl.com/6z6k6kr two fictitious Facebook userssuccessfully made friends with 95 users within two weeks.


tinyurl.com/6z6k6kr





. . . avoid manipulation by only using rating of trusted users orbuild trust based on user behaviour (c.f. the Influence Limiter)

. . . nevertheless systems based on trust are still manipulatable– tinyurl.com/6z6k6kr two fictitious Facebook userssuccessfully made friends with 95 users within two weeks.


tinyurl.com/6z6k6kr





. . . avoid manipulation by only using rating of trusted users orbuild trust based on user behaviour (c.f. the Influence Limiter). . . nevertheless systems based on trust are still manipulatable– tinyurl.com/6z6k6kr two fictitious Facebook userssuccessfully made friends with 95 users within two weeks.


tinyurl.com/6z6k6kr




Privacy is an important security concern from thepoint-of-view of the end-user

In Cheng and Hurley (2009b), we demonstrate that a systemarchitecture to support privacy can provide new opportunitiesfor manipulation attacks.Differential privacy has been studied in the context ofrecommender systems by a number of researchers. Oneapproach to differential privacy is the use of robust statistics.The connection to manipulation resistance may be worthpursuing.






In Cheng and Hurley (2009b), we demonstrate that a systemarchitecture to support privacy can provide new opportunitiesfor manipulation attacks.

Differential privacy has been studied in the context ofrecommender systems by a number of researchers. Oneapproach to differential privacy is the use of robust statistics.The connection to manipulation resistance may be worthpursuing.






In Cheng and Hurley (2009b), we demonstrate that a systemarchitecture to support privacy can provide new opportunitiesfor manipulation attacks.Differential privacy has been studied in the context ofrecommender systems by a number of researchers. Oneapproach to differential privacy is the use of robust statistics.The connection to manipulation resistance may be worthpursuing.




More complicated shilling scenarios

We have focused in this tutorial on rating manipulation – thecreation of sybil profiles and ratings that distort arecommendation system’s output.

However, in the real world, shilling can have a morecomplicated form.

e.g. The text of hotel reviews can be used to persuade users toselect certain hotels.Wu et al. (2010) has carried out work on identifying suspiciousreviews in TripAdvisor.




Conclusion

We’ve reviewed research that has been carried out in lastnumber of years on robustness of RS.The conclusions are quite positive from system managers’points-of-view:

If desired, recommendation algorithms that are largelymanipulation-resistant may be adopted.Filtering strategies can effectively find unusual rating patterns.Obfuscating attack profiles to avoid filtering generally resultsin less effective attacks.

Good recommendation systems are personalised and hence,should be sensitive to the peculiarities of each user’s ratingbehaviour.

A system designer needs to find the right balance betweensensitivity to the melting pot of human behaviour andavoiding easy manipulation.




Thank You

My research is sponsored by Science Foundation Ireland undergrant 08/SRC/I1407: Clique: Graph and Network Analysis Cluster




References I

Adomavicius, G. and Zhang, J.: 2010, On the stability ofrecommendation algorithms, Proceedings of the fourth ACMconference on Recommender systems, RecSys ’10, ACM, NewYork, NY, USA, pp. 47–54.URL: http://doi.acm.org/10.1145/1864708.1864722

Amatriain, X., Pujol, J. M. and Oliver, N.: 2009, I like it . . . I likeit not: Evaluating user ratings noise in recommender systems,Proceedings of the 17th International Conference on UserModeling, Adaptation, and Personalization: formerly UM andAH, UMAP ’09, Springer-Verlag, Berlin, Heidelberg,pp. 247–258.URL: http://dx.doi.org/10.1007/978-3-642-02247-0 24




References II

Burke, R., O’Mahony, M. P. and Hurley, N. J.: 2011, Robustcollaborative recommendation, in F. Ricci, L. Rokach, B. Shapiraand P. B. Kantor (eds), Recommender Systems Handbook,Springer, pp. 805–835.

Carrara, E. and Hogben, G.: 2007, Enisa position paper no.2,reputation-based systems: a security analysis, Technical report,ENISA.

Cheng, Z. and Hurley, N.: 2009a, Effective diverse and obfuscatedattacks on model-based recommender systems, in L. D.Bergman, A. Tuzhilin, R. D. Burke, A. Felfernig andL. Schmidt-Thieme (eds), RecSys, ACM, pp. 141–148.




References III

Cheng, Z. and Hurley, N.: 2009b, Trading robustness for privacy indecentralized recommender systems, in K. Z. Haigh andN. Rychtyckyj (eds), IAAI, AAAI.

Cheng, Z. and Hurley, N.: 2010, Robust collaborative filtering byleast trimmed squares matrix factorisation, Proceedings of theInternational Conference on Tools in Artificial Intelligence.

Donath, J.: 1998, Communities in Cyberspace, Routledge, chapterIdentity and Deception in the Virtual Community.

Douceur, J.: 2002, The sybil attack, Proceedings of the FirstInternational Workshop on Peer-to-Peer Systems.

Lam, S. K. and Riedl, J.: 2004, Shilling recommender systems forfun and profit, In Proceedings of the 13th International WorldWide Web Conference pp. 393–402.




References IV

Lang, J., Spear, M. and Wu, S. F.: 2010, Social manipulation ofonline recommender systems, Proceedings of the Secondinternational conference on Social informatics, SocInfo’10,Springer-Verlag, Berlin, Heidelberg, pp. 125–139.URL: http://dl.acm.org/citation.cfm?id=1929326.1929336

Mehta, B., Hofmann, T. and Fankhauser, P.: 2007, Lies andpropaganda: Detecting spam users in collaborative filtering,Proceedings of the 12th international conference on Intelligentuser interfaces, pp. 14–21.

Mehta, B. and Nejdl, W.: 2008, Attack resistant collaborativefiltering, Proceedings of the 31st annual international ACMSIGIR conference on Research and development in informationretrieval, SIGIR ’08, ACM, New York, NY, USA, pp. 75–82.URL: http://doi.acm.org/10.1145/1390334.1390350




References V

Mobasher, B., Burke, R., Bhaumik, R. and Williams, C.: 2007,Toward trustworthy recommender systems: An analysis of attackmodels and algorithm robustness, ACM Transactions on InternetTechnology 7(4).

Mobasher, B., Burke, R. D. and Sandvig, J. J.: 2006, Model-basedcollaborative filtering as a defense against profile injectionattacks, AAAI, AAAI Press.

O’Mahony, M. P., Hurley, N. J. and Silvestre, C. C. M.: 2004, Anevaluation of neighbourhood formation on the performance ofcollaborative filtering, Artificial Intelligence Review21(1), 215–228.




References VI

O’Mahony, M. P., Hurley, N. J. and Silvestre, G. C. M.: 2002,Promoting recommendations: An attack on collaborativefiltering, in A. Hameurlain, R. Cicchetti and R. Traunmuller(eds), DEXA, Vol. 2453 of Lecture Notes in Computer Science,Springer, pp. 494–503.

O’Mahony, M. P., Hurley, N. J. and Silvestre, G. C. M.: 2006,Attacking recommender systems: The cost of promotion,Workshop on Recommender Systems at the 17th EuropeanConference on Artificial Intelligence (ECAI’06), 28th–29th, Rivadel Garda, Italy.




References VII

Resnick, P. and Sami, R.: 2007, The influence limiter: provablymanipulation-resistant recommender systems, RecSys ’07:Proceedings of the 2007 ACM conference on Recommendersystems, ACM, New York, NY, USA, pp. 25–32.

Vu, L.-H., Papaioannou, T. and Aberer, K.: 2010, Impact of trustmanagement and information sharing to adversarial cost inranking systems, in M. Nishigaki, A. Jsang, Y. Murayama andS. Marsh (eds), Trust Management IV, Vol. 321 of IFIPAdvances in Information and Communication Technology,Springer Boston, pp. 108–124.




References VIII

Williams, C., Mobasher, B., Burke, R., Bhaumik, R. and Sandvig,J.: 2006, Detection of obfuscated attacks in collaborativerecommender systems, In Proceedings of the 17th EuropeanConference on Artificial Intelligence (ECAI’06) .

Wu, G., Greene, D. and Cunningham, P.: 2010, Merging multiplecriteria to identify suspicious reviews, in X. Amatriain,M. Torrens, P. Resnick and M. Zanker (eds), RecSys, ACM,pp. 241–244.

Yan, X. and Roy, B. V.: 2009, Manipulation robustness ofcollaborative filtering systems, CoRR abs/0903.0064.