A Privacy-Preserving Framework for Personalized
Social Recommendations
Zach Jorgensen1 and Ting Yu1,2
1 NC State University Raleigh, NC, USA
2 Qatar Computing Research Institute Doha, Qatar
EDBT, March 24-28, 2014, Athens, Greece
Motivation
• Social recommendation task: predict the items a user might like based on the items his/her friends like

[Figure: a social recommendation system takes social relations and item preferences as input and outputs item recommendations (i1–i5) for each user.]
Motivation
Model: Top-n Social Recommender

For every item i:
    For every user u:
        Compute μ(i, u)
For every user u:
    Sort items by utility
    Recommend the top n items

μ(i, u): the utility of recommending item i to user u

Input: items, users, social graph, preference graph, number of recommendations n
Output: a personalized list of the top n items (by utility), for each user
Motivation

The utility of recommending item i to user u:

    μ(i, u) = Σ_{v ∈ Users} sim(u, v) · w(v, i),    μ(i, u) ∈ ℝ≥0

where sim(u, v) is a social similarity measure computed over the social graph, and w(v, i) = 1 if the preference edge (v, i) exists, 0 otherwise.

e.g., Common Neighbors:

    sim(u, v) = |nbrs(u) ∩ nbrs(v)|
Motivation
• Many existing structural similarity measures could be used [Survey: Lu & Zhou, 2011]
• We considered:
    – Common Neighbors
    – Adamic-Adar
    – Graph Distance
    – Katz
Motivation
Two main privacy problems:
1. Protect privacy of user data from a malicious service provider (i.e., the recommender)
2. Protect privacy of user data from malicious/curious users

Our focus: preventing disclosure of individual item preferences through the output
Motivation
A simple attack on Common Neighbors:

[Figure: example attack in which Alice infers from the output that Bob listens to Bieber.]
Motivation
Adversary:
• Knowledge of all preferences except the target edge
• Observes all recommendations
• Knowledge of the algorithm

Goal: to deduce the presence/absence of a single preference edge (the target edge)
Motivation
Differential Privacy [Dwork, 2006]
• Provides strong, formal privacy guarantees
• Informally: guarantees that recommendations will be (almost) the same with/without any one preference edge in the input
Motivation
Related work: Machanavajjhala et al. (VLDB 2011)
• Task: for each node, recommend the node with highest social similarity (Common Neighbors, Katz)
• No distinction between users/items or between preference/social edges
• Negative theoretical results
Motivation
• We assume that the social graph is public
• Often true in practice…
Motivation
• Main contribution: a framework that enables differential privacy guarantees for preference edges
• We demonstrate on real data sets that accurate and private social recommendation is feasible
Outline
• Motivation
• Differential Privacy
• Our Approach
• Experimental Results
• Conclusions
Differential Privacy
A randomized algorithm A gives ε-differential privacy if, for any neighboring data sets D, D′ and any S ⊆ Range(A):

    Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]

Neighboring data sets D, D′ differ in a single record.

[Dwork, 2006]
Achieving Differential Privacy
    A : Dⁿ → ℝᵈ   (the function to compute on data set D, released with noise)

Global sensitivity of A:

    ΔA = max over neighboring D, D′ of ‖A(D) − A(D′)‖₁

Theorem: releasing A(D) + Lap(ΔA/ε)ᵈ satisfies ε-differential privacy

Smaller ε = more noise / more privacy
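The Laplace mechanism above can be sketched as follows (a minimal illustration, not the paper's code; `laplace_mechanism` is our name):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(sensitivity / epsilon) noise. This satisfies
    epsilon-differential privacy for a query whose global (L1)
    sensitivity is `sensitivity`."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / epsilon)

# Smaller epsilon -> larger noise scale -> more privacy.
rng = np.random.default_rng(0)
print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=1.0, rng=rng))
print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.1, rng=rng))
```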
Properties of Differential Privacy
• Sequential Composition: answering multiple queries on the same data set D, with privacy budgets ε₁, …, εₙ:

    A(D) + Lap(ΔA/ε₁), …, A(D) + Lap(ΔA/εₙ)

  Together, the releases satisfy (Σᵢ εᵢ)-differential privacy.
• Parallel Composition: answering one query on each of the disjoint subsets D₁, …, Dₙ of D, each with budget ε:

    A(D₁) + Lap(ΔA/ε), …, A(Dₙ) + Lap(ΔA/ε)

  Together, the releases satisfy ε-differential privacy.
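A small sketch of how the two composition rules affect the privacy budget (the counting query and data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(data, eps):
    # A counting query has global sensitivity 1, so Lap(1/eps) noise suffices.
    return sum(data) + rng.laplace(scale=1.0 / eps)

data = [1, 0, 1, 1]

# Sequential composition: k queries over the SAME data set consume
# eps_1 + ... + eps_k of the privacy budget.
answers = [noisy_count(data, eps=0.5) for _ in range(3)]
sequential_budget = 3 * 0.5  # the releases are 1.5-differentially private

# Parallel composition: one query per DISJOINT partition of the data
# costs only the maximum epsilon.
part_a, part_b = data[:2], data[2:]
a = noisy_count(part_a, eps=0.5)
b = noisy_count(part_b, eps=0.5)
parallel_budget = 0.5  # the pair (a, b) is 0.5-differentially private
```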
Outline
• Motivation
• Differential Privacy
• Our Approach
    – Simplifying observations
    – Naïve Approaches
    – Our Approach
• Experimental Results
• Conclusions
Simplifying Observations
For every item i:                      (iterations use disjoint inputs)
    For every user u:
        Compute μ(i, u)
For every user u:                      (post-processing)
    Sort items by utility
    Recommend the top n items

Our focus: an ε-differentially private procedure for computing μ(i, u), for all users u and a given item i
Naïve Approaches
Approach 1: Noise-on-Utilities

For each item i:
    For every user u:
        Compute μ(i, u) + Lap(Δ/ε)
For each user u:
    Sort items by utility
    Recommend the top n items

where the global sensitivity is Δ = max_u Σ_v sim(v, u)

Satisfies ε-differential privacy, but… destroys accuracy!
Naïve Approaches
Approach 2: Noise-on-Edges
1. Add Laplace noise independently to each edge weight
2. Run the non-private algorithm on the resulting sanitized preference graph

Example: let

Noise will destroy accuracy!
Our Approach
[Figure: preference graph Gᵢ for item i over users u1–u8, with 0/1 edge weights; a strategy S partitions the preference edges into clusters c1, c2, c3.]

For now, assume S randomly assigns edges to clusters.
Our Approach
[Figure: for each cluster c1, c2, c3 of Gᵢ, compute the noisy average edge weight (true average + Laplace noise).]
Our Approach
[Figure: each edge weight in Gᵢ is replaced with the noisy average of its respective cluster.]
Our Approach
[Figure: the sanitized preference graph Gᵢ, with each edge weight replaced by its cluster's noisy average, is fed to the non-private recommender.]

For every item i:
    For every user u:
        Compute μ(i, u)
For every user u:
    Sort items by utility
    Recommend the top n items

Our Approach: Rationale

• Adding/removing a single preference edge affects one cluster average by at most 1/|cᵢ|
• The noise added to the average of cluster cᵢ is therefore Lap(1/(ε·|cᵢ|))
• The bigger the cluster, the smaller the noise

Example: let ε = 0.1 and |c| = 50 edges; the noise scale is then 1/(0.1 × 50) = 0.2

Intuition: the bigger the cluster, the less sensitive its average weight is to any one preference edge
Our Approach: Rationale
• The catch: averaging introduces approximation error!
• Need a better clustering strategy that will keep the approximation error relatively low
• The strategy must not leak privacy
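The cluster-and-average step can be sketched as follows (a minimal illustration under the binary-edge-weight model; `sanitize_edge_weights` and its input format are our assumptions, not the paper's API):

```python
import numpy as np

def sanitize_edge_weights(clusters, epsilon, rng=None):
    """clusters: list of clusters, each a list of 0/1 preference-edge
    weights for one item i. A single edge changes a cluster's average
    by at most 1/|c|, so Lap(1/(epsilon * |c|)) noise suffices."""
    rng = rng or np.random.default_rng()
    sanitized = []
    for c in clusters:
        noisy_avg = sum(c) / len(c) + rng.laplace(scale=1.0 / (epsilon * len(c)))
        # Every edge in the cluster is replaced by the same noisy average.
        sanitized.append([noisy_avg] * len(c))
    return sanitized

# Bigger clusters get less noise: with epsilon = 0.1 and |c| = 50,
# the noise scale is 1 / (0.1 * 50) = 0.2.
clusters = [[1, 0, 1, 1], [0, 0, 1]]
print(sanitize_edge_weights(clusters, epsilon=0.1))
```

The cost of the reduced noise is the approximation error discussed above: all edges in a cluster share one value, so the clustering strategy must group edges whose true weights are similar.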
Our Approach: Clustering Strategy
[Figure: community detection on the public social graph partitions users u1–u8 into user clusters c0 and c1.]

Cluster the users based on the natural community structure of the public social graph.
Our Approach: Clustering Strategy
[Figure: the user clusters c0 and c1 induce clusters over the preference edges (u, i).]

For each item, derive clusters for the preference edges based on the user clusters.
Our Approach: Clustering Strategy
Note: we only need to cluster the social graph once; the resulting clusters are used for all items.
Our Approach: Clustering Strategy
Key point: clustering based on the public social graph does not leak privacy!
Our Approach: Clustering Strategy
• Louvain Method [Blondel et al. 2008]
    – Greedy modularity maximization
    – Well-studied and known to produce good communities
    – Fast enough for graphs with millions of nodes
    – No parameters to tune
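For illustration, the clustering step could be run with the Louvain implementation that ships with networkx (version ≥ 2.8); the tiny graph below is ours, not a data set from the paper:

```python
# Illustrative use of networkx's Louvain implementation (networkx >= 2.8).
import networkx as nx

G = nx.Graph()
G.add_edges_from([("u1", "u2"), ("u1", "u3"), ("u2", "u3"),  # triangle A
                  ("u4", "u5"), ("u4", "u6"), ("u5", "u6"),  # triangle B
                  ("u3", "u4")])                             # bridge edge

# Cluster users once, on the public social graph; the resulting user
# clusters then induce the preference-edge clusters for every item.
communities = nx.community.louvain_communities(G, seed=42)
print(communities)
```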
Outline
• Motivation
• Preliminaries
• Our Approach
• Experimental Results
• Conclusions
Data Sets
Last.fm:
• 1,892 users
• 17,632 items
• Avg. user deg. = 13.4 (std. 17.3)
• Avg. prefs per user = 48.7 (std. 6.9)

Flixster:
• 137,372 users
• 48,756 items
• Avg. user deg. = 18.5 (std. 31.1)
• Avg. prefs per user = 54.8 (std. 218.2)

Both publicly available:
Last.fm <http://ir.ii.uam.es/hetrec2011/datasets>
Flixster <http://www.sfu.ca/~sja25/datasets>
Measuring Accuracy
• Normalized Discounted Cumulative Gain [Järvelin and Kekäläinen, 2002]
• NDCG at n: measures the quality of the private recommendations relative to the non-private recommendations, taking rank and utility into account
• Ranges from 0.0 to 1.0, with 1.0 meaning the private recommender achieves the ideal ranking
• Averaged over all users in the data set
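A minimal sketch of NDCG at n as described above (the exact variant used in the paper may differ in details such as gain scaling; the names and toy utilities are ours):

```python
import math

def dcg(gains):
    """DCG = sum over ranks j = 1..n of gain_j / log2(j + 1)."""
    return sum(g / math.log2(j + 1) for j, g in enumerate(gains, start=1))

def ndcg_at_n(private_ranking, true_utility, n):
    """Quality of the private top-n list relative to the ideal ranking
    by true (non-private) utility; 1.0 means the ideal ranking."""
    gains = [true_utility[item] for item in private_ranking[:n]]
    ideal = sorted(true_utility.values(), reverse=True)[:n]
    return dcg(gains) / dcg(ideal)

true_utility = {"a": 3.0, "b": 2.0, "c": 1.0}
print(ndcg_at_n(["a", "b", "c"], true_utility, 3))  # ideal order -> 1.0
print(ndcg_at_n(["c", "b", "a"], true_utility, 3))  # reversed -> below 1.0
```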
Experiments: Last.fm

[Figure: average accuracy (NDCG at n = 50) vs. privacy, from low to high privacy.]
Experiments: Flixster

[Figure: average NDCG at 50 vs. privacy, from low to high, for 10,000 random users. Note: different y-axis scale.]
Experiments: Naïve Approaches
• Naïve approaches on the Last.fm data set

[Figure: accuracy for Katz, Common Neighbors, Graph Distance, and Adamic-Adar, at ε = 1.0 and ε = 0.1.]
Conclusions
• Differential privacy guarantees for item preferences
• Use clustering and averaging to trade Laplace noise for some approx. error
• Clustering via the community structure of the social graph is a useful heuristic for clustering the edges without violating privacy
• Personalized social recommendations can be both private and accurate
THANK YOU!
BACKUP SLIDES
Accuracy Metric: NDCG
• Normalized Discounted Cumulative Gain
    – Private list: the items recommended to user u by the private recommender, sorted by noisy utility
    – Ideal list: the items recommended to user u by the non-private recommender, sorted by true utility
    – NDCG ranges from 0…1
    – Averaged over all users in a data set
Social Similarity Measures
• Adamic-Adar:

    sim(u, v) = Σ_{x ∈ nbrs(u) ∩ nbrs(v)} 1 / log |nbrs(x)|

• Graph Distance:

    sim(u, v) = 1 / ShortestPathLength(u, v)

• Katz (α is a small damping factor):

    sim(u, v) = Σ_{l=1}^{k} αˡ · |paths of length l between u and v|
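Illustrative implementations of these measures on an adjacency-set graph (the names and the toy graph are ours; note the Katz sketch counts walks of length l, a common relaxation of the path count in the formula):

```python
import math

# nbrs: dict mapping each user to the set of its neighbors.

def adamic_adar(nbrs, u, v):
    """sim(u, v) = sum over common neighbors x of 1 / log|nbrs(x)|."""
    return sum(1.0 / math.log(len(nbrs[x]))
               for x in nbrs[u] & nbrs[v] if len(nbrs[x]) > 1)

def graph_distance(nbrs, u, v):
    """sim(u, v) = 1 / shortest-path length (0.0 if v is unreachable)."""
    seen, frontier, dist = {u}, {u}, 0
    while frontier:
        dist += 1
        frontier = {y for x in frontier for y in nbrs[x]} - seen
        if v in frontier:
            return 1.0 / dist
        seen |= frontier
    return 0.0

def katz(nbrs, u, v, alpha=0.05, k=3):
    """sim(u, v) = sum over l = 1..k of alpha^l * (# walks of length l)."""
    counts = {x: 0 for x in nbrs}  # counts[x]: walks of length l from u to x
    counts[u] = 1
    score = 0.0
    for l in range(1, k + 1):
        counts = {x: sum(counts[y] for y in nbrs[x]) for x in nbrs}
        score += (alpha ** l) * counts[v]
    return score

nbrs = {"u1": {"u2"}, "u2": {"u1", "u3"}, "u3": {"u2"}}
print(graph_distance(nbrs, "u1", "u3"))  # path u1-u2-u3 -> 1/2 = 0.5
```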
Experiments: Last.fm
[Figure: NDCG at 10 and NDCG at 100.]
Experiments: Flixster
[Figure: NDCG at 10 and NDCG at 100.]
[Figure: comparison of approaches on the Last.fm data set.]
Low Rank Mechanism (LRM): Yuan et al., PVLDB '12
Group and Smooth (GS): Kellaris & Papadopoulos, PVLDB '13
[Figure: relationship between user degree and accuracy, due to approximation error (Common Neighbors).]