A Privacy-Preserving Framework for Personalized
Social Recommendations
Zach Jorgensen1 and Ting Yu1,2
1 NC State University Raleigh, NC, USA
2 Qatar Computing Research Institute Doha, Qatar
EDBT, March 24-28, 2014, Athens, Greece
Motivation
• Social recommendation task: predict the items a user might like based on the items his/her friends like

[Figure: a social recommendation system takes social relations and item preferences as input and outputs item recommendations (i1–i5) for each user.]
Motivation
Model: Top-n Social Recommender

For every item i:
    For every user u:
        Compute μ(i, u)
For every user u:
    Sort items by utility
    Recommend the top n items

μ(i, u): the utility of recommending item i to user u

Input: items, users, social graph, preference graph, number of recommendations n
Output: a personalized list of the top n items (by utility), for each user
Motivation

The utility of recommending item i to user u:

    μ(i, u) = Σ_{v ∈ Users} sim(u, v) · w(v, i),    μ(i, u) ∈ ℝ≥0

where sim(u, v) is a social similarity measure computed over the social graph, and w(v, i) = 1 if the preference edge (v, i) exists, 0 otherwise.

e.g., Common Neighbors:

    sim(u, v) = |nbrs(u) ∩ nbrs(v)|
Motivation
• Many existing structural similarity measures could be used [Survey: Lu & Zhou, 2011]
• We considered:
    – Common Neighbors
    – Adamic-Adar
    – Graph Distance
    – Katz
Motivation
Two main privacy problems:
1. Protect privacy of user data from a malicious service provider (i.e., the recommender)
2. Protect privacy of user data from malicious/curious users

Our focus: preventing disclosure of individual item preferences through the output
Motivation
A simple attack on Common Neighbors:

[Figure: example attack in which Alice infers from the output that Bob listens to Bieber.]
Motivation
Adversary:
• Knowledge of all preferences except the target edge
• Observes all recommendations
• Knowledge of the algorithm

Goal: to deduce the presence/absence of a single preference edge (the target edge)
Motivation
Differential Privacy [Dwork, 2006]
• Provides strong, formal privacy guarantees
• Informally: guarantees that recommendations will be (almost) the same with/without any one preference edge in the input
Motivation
Related work: Machanavajjhala et al. (VLDB 2011)
• Task: for each node, recommend the node with highest social similarity (Common Neighbors, Katz)
• No distinction between users/items or between preference/social edges
• Negative theoretical results
Motivation
• We assume that the social graph is public
• Often true in practice…
Motivation
• Main contribution: a framework that enables differential privacy guarantees for preference edges
• We demonstrate on real data sets that accurate and private social recommendation is feasible
Outline
• Motivation
• Differential Privacy
• Our Approach
• Experimental Results
• Conclusions
Differential Privacy
A randomized algorithm A gives ε-differential privacy if, for any neighboring data sets D, D′ and any S ⊆ Range(A):

    Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]

Neighboring data sets D, D′ differ in a single record.

[Dwork, 2006]
Achieving Differential Privacy
    A : Dⁿ → ℝᵈ   (the function to compute on data set D, released with noise)

Global sensitivity of A:

    ΔA = max over neighboring D, D′ of ‖A(D) − A(D′)‖₁

Theorem: releasing A(D) + Lap(ΔA/ε)ᵈ satisfies ε-differential privacy

Smaller ε = more noise / more privacy
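The Laplace mechanism above can be sketched as follows (a minimal illustration, not the paper's code; `laplace_mechanism` is our name):

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(sensitivity / epsilon) noise. This satisfies
    epsilon-differential privacy for a query whose global (L1)
    sensitivity is `sensitivity`."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / epsilon)

# Smaller epsilon -> larger noise scale -> more privacy.
rng = np.random.default_rng(0)
print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=1.0, rng=rng))
print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.1, rng=rng))
```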
Properties of Differential Privacy
• Sequential Composition: answering multiple queries on the same data set D, with privacy budgets ε₁, …, εₙ:

    A(D) + Lap(ΔA/ε₁), …, A(D) + Lap(ΔA/εₙ)

  Together, the releases satisfy (Σᵢ εᵢ)-differential privacy.
• Parallel Composition: answering one query on each of the disjoint subsets D₁, …, Dₙ of D, each with budget ε:

    A(D₁) + Lap(ΔA/ε), …, A(Dₙ) + Lap(ΔA/ε)

  Together, the releases satisfy ε-differential privacy.
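A small sketch of how the two composition rules affect the privacy budget (the counting query and data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_count(data, eps):
    # A counting query has global sensitivity 1, so Lap(1/eps) noise suffices.
    return sum(data) + rng.laplace(scale=1.0 / eps)

data = [1, 0, 1, 1]

# Sequential composition: k queries over the SAME data set consume
# eps_1 + ... + eps_k of the privacy budget.
answers = [noisy_count(data, eps=0.5) for _ in range(3)]
sequential_budget = 3 * 0.5  # the releases are 1.5-differentially private

# Parallel composition: one query per DISJOINT partition of the data
# costs only the maximum epsilon.
part_a, part_b = data[:2], data[2:]
a = noisy_count(part_a, eps=0.5)
b = noisy_count(part_b, eps=0.5)
parallel_budget = 0.5  # the pair (a, b) is 0.5-differentially private
```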
Outline
• Motivation
• Differential Privacy
• Our Approach
    – Simplifying observations
    – Naïve Approaches
    – Our Approach
• Experimental Results
• Conclusions
Simplifying Observations
For every item i:                      (iterations use disjoint inputs)
    For every user u:
        Compute μ(i, u)
For every user u:                      (post-processing)
    Sort items by utility
    Recommend the top n items

Our focus: an ε-differentially private procedure for computing μ(i, u), for all users u and a given item i
Naïve Approaches
Approach 1: Noise-on-Utilities

For each item i:
    For every user u:
        Compute μ(i, u) + Lap(Δ/ε)
For each user u:
    Sort items by utility
    Recommend the top n items

where the global sensitivity is Δ = max_u Σ_v sim(v, u)

Satisfies ε-differential privacy, but… destroys accuracy!
Naïve Approaches
Approach 2: Noise-on-Edges
1. Add Laplace noise independently to each edge weight
2. Run the non-private algorithm on the resulting sanitized preference graph

Example: let

Noise will destroy accuracy!
Our Approach
[Figure: preference graph Gᵢ for item i over users u1–u8, with 0/1 edge weights; a strategy S partitions the preference edges into clusters c1, c2, c3.]

For now, assume S randomly assigns edges to clusters.
Our Approach
[Figure: for each cluster c1, c2, c3 of Gᵢ, compute the noisy average edge weight (true average + Laplace noise).]
Our Approach
[Figure: each edge weight in Gᵢ is replaced with the noisy average of its respective cluster.]
Our Approach
[Figure: the sanitized preference graph Gᵢ, with each edge weight replaced by its cluster's noisy average, is fed to the non-private recommender.]

For every item i:
    For every user u:
        Compute μ(i, u)
For every user u:
    Sort items by utility
    Recommend the top n items

Our Approach: Rationale

• Adding/removing a single preference edge affects one cluster average by at most 1/|cᵢ|
• The noise added to the average of cluster cᵢ is therefore Lap(1/(ε·|cᵢ|))
• The bigger the cluster, the smaller the noise

Example: let ε = 0.1 and |c| = 50 edges; the noise scale is then 1/(0.1 × 50) = 0.2

Intuition: the bigger the cluster, the less sensitive its average weight is to any one preference edge
Our Approach: Rationale
• The catch: averaging introduces approximation error!
• Need a better clustering strategy that will keep the approximation error relatively low
• The strategy must not leak privacy
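The cluster-and-average step can be sketched as follows (a minimal illustration under the binary-edge-weight model; `sanitize_edge_weights` and its input format are our assumptions, not the paper's API):

```python
import numpy as np

def sanitize_edge_weights(clusters, epsilon, rng=None):
    """clusters: list of clusters, each a list of 0/1 preference-edge
    weights for one item i. A single edge changes a cluster's average
    by at most 1/|c|, so Lap(1/(epsilon * |c|)) noise suffices."""
    rng = rng or np.random.default_rng()
    sanitized = []
    for c in clusters:
        noisy_avg = sum(c) / len(c) + rng.laplace(scale=1.0 / (epsilon * len(c)))
        # Every edge in the cluster is replaced by the same noisy average.
        sanitized.append([noisy_avg] * len(c))
    return sanitized

# Bigger clusters get less noise: with epsilon = 0.1 and |c| = 50,
# the noise scale is 1 / (0.1 * 50) = 0.2.
clusters = [[1, 0, 1, 1], [0, 0, 1]]
print(sanitize_edge_weights(clusters, epsilon=0.1))
```

The cost of the reduced noise is the approximation error discussed above: all edges in a cluster share one value, so the clustering strategy must group edges whose true weights are similar.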
Our Approach: Clustering Strategy
[Figure: community detection on the public social graph partitions users u1–u8 into user clusters c0 and c1.]

Cluster the users based on the natural community structure of the public social graph.
Our Approach: Clustering Strategy
[Figure: the user clusters c0 and c1 induce clusters over the preference edges (u, i).]

For each item, derive clusters for the preference edges based on the user clusters.
Our Approach: Clustering Strategy
Note: we only need to cluster the social graph once; the resulting clusters are used for all items.
Our Approach: Clustering Strategy
Key point: clustering based on the public social graph does not leak privacy!
Our Approach: Clustering Strategy
• Louvain Method [Blondel et al. 2008]
    – Greedy modularity maximization
    – Well-studied and known to produce good communities
    – Fast enough for graphs with millions of nodes
    – No parameters to tune
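For illustration, the clustering step could be run with the Louvain implementation that ships with networkx (version ≥ 2.8); the tiny graph below is ours, not a data set from the paper:

```python
# Illustrative use of networkx's Louvain implementation (networkx >= 2.8).
import networkx as nx

G = nx.Graph()
G.add_edges_from([("u1", "u2"), ("u1", "u3"), ("u2", "u3"),  # triangle A
                  ("u4", "u5"), ("u4", "u6"), ("u5", "u6"),  # triangle B
                  ("u3", "u4")])                             # bridge edge

# Cluster users once, on the public social graph; the resulting user
# clusters then induce the preference-edge clusters for every item.
communities = nx.community.louvain_communities(G, seed=42)
print(communities)
```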
Outline
• Motivation
• Preliminaries
• Our Approach
• Experimental Results
• Conclusions
Data Sets
Last.fm:
• 1,892 users
• 17,632 items
• Avg. user deg. = 13.4 (std. 17.3)
• Avg. prefs per user = 48.7 (std. 6.9)

Flixster:
• 137,372 users
• 48,756 items
• Avg. user deg. = 18.5 (std. 31.1)
• Avg. prefs per user = 54.8 (std. 218.2)

Both publicly available:
Last.fm <http://ir.ii.uam.es/hetrec2011/datasets>
Flixster <http://www.sfu.ca/~sja25/datasets>
Measuring Accuracy
• Normalized Discounted Cumulative Gain [Järvelin and Kekäläinen, 2002]
• NDCG at n: measures the quality of the private recommendations relative to the non-private recommendations, taking rank and utility into account
• Ranges from 0.0 to 1.0, with 1.0 meaning the private recommender achieves the ideal ranking
• Averaged over all users in the data set
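A minimal sketch of NDCG at n as described above (the exact variant used in the paper may differ in details such as gain scaling; the names and toy utilities are ours):

```python
import math

def dcg(gains):
    """DCG = sum over ranks j = 1..n of gain_j / log2(j + 1)."""
    return sum(g / math.log2(j + 1) for j, g in enumerate(gains, start=1))

def ndcg_at_n(private_ranking, true_utility, n):
    """Quality of the private top-n list relative to the ideal ranking
    by true (non-private) utility; 1.0 means the ideal ranking."""
    gains = [true_utility[item] for item in private_ranking[:n]]
    ideal = sorted(true_utility.values(), reverse=True)[:n]
    return dcg(gains) / dcg(ideal)

true_utility = {"a": 3.0, "b": 2.0, "c": 1.0}
print(ndcg_at_n(["a", "b", "c"], true_utility, 3))  # ideal order -> 1.0
print(ndcg_at_n(["c", "b", "a"], true_utility, 3))  # reversed -> below 1.0
```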
Experiments: Last.fm

[Figure: average accuracy (NDCG at n = 50) vs. privacy, from low to high privacy.]
Experiments: Flixster

[Figure: average NDCG at 50 vs. privacy, from low to high, for 10,000 random users. Note: different y-axis scale.]
Experiments: Naïve Approaches
• Naïve approaches on the Last.fm data set

[Figure: accuracy for Katz, Common Neighbors, Graph Distance, and Adamic-Adar, at ε = 1.0 and ε = 0.1.]
Conclusions
• Differential privacy guarantees for item preferences
• Use clustering and averaging to trade Laplace noise for some approx. error
• Clustering via the community structure of the social graph is a useful heuristic for clustering the edges without violating privacy
• Personalized social recommendations can be both private and accurate
THANK YOU!
BACKUP SLIDES
Accuracy Metric: NDCG
• Normalized Discounted Cumulative Gain
    – Private list: the items recommended to user u by the private recommender, sorted by noisy utility
    – Ideal list: the items recommended to user u by the non-private recommender, sorted by true utility
    – NDCG ranges from 0…1
    – Averaged over all users in a data set
Social Similarity Measures
• Adamic-Adar:

    sim(u, v) = Σ_{x ∈ nbrs(u) ∩ nbrs(v)} 1 / log |nbrs(x)|

• Graph Distance:

    sim(u, v) = 1 / ShortestPathLength(u, v)

• Katz (α is a small damping factor):

    sim(u, v) = Σ_{l=1}^{k} αˡ · |paths of length l between u and v|
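Illustrative implementations of these measures on an adjacency-set graph (the names and the toy graph are ours; note the Katz sketch counts walks of length l, a common relaxation of the path count in the formula):

```python
import math

# nbrs: dict mapping each user to the set of its neighbors.

def adamic_adar(nbrs, u, v):
    """sim(u, v) = sum over common neighbors x of 1 / log|nbrs(x)|."""
    return sum(1.0 / math.log(len(nbrs[x]))
               for x in nbrs[u] & nbrs[v] if len(nbrs[x]) > 1)

def graph_distance(nbrs, u, v):
    """sim(u, v) = 1 / shortest-path length (0.0 if v is unreachable)."""
    seen, frontier, dist = {u}, {u}, 0
    while frontier:
        dist += 1
        frontier = {y for x in frontier for y in nbrs[x]} - seen
        if v in frontier:
            return 1.0 / dist
        seen |= frontier
    return 0.0

def katz(nbrs, u, v, alpha=0.05, k=3):
    """sim(u, v) = sum over l = 1..k of alpha^l * (# walks of length l)."""
    counts = {x: 0 for x in nbrs}  # counts[x]: walks of length l from u to x
    counts[u] = 1
    score = 0.0
    for l in range(1, k + 1):
        counts = {x: sum(counts[y] for y in nbrs[x]) for x in nbrs}
        score += (alpha ** l) * counts[v]
    return score

nbrs = {"u1": {"u2"}, "u2": {"u1", "u3"}, "u3": {"u2"}}
print(graph_distance(nbrs, "u1", "u3"))  # path u1-u2-u3 -> 1/2 = 0.5
```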
Experiments: Last.fm
[Figure: NDCG at 10 and NDCG at 100.]
Experiments: Flixster
[Figure: NDCG at 10 and NDCG at 100.]
[Figure: comparison of approaches on the Last.fm data set.]
Low Rank Mechanism (LRM): Yuan et al., PVLDB '12
Group and Smooth (GS): Kellaris & Papadopoulos, PVLDB '13
[Figure: relationship between user degree and accuracy, due to approximation error (Common Neighbors).]