Privacy preserving hybrid recommender system based on deep

Turk J Elec Eng & Comp Sci(2021) 29: 2385 – 2402© TÜBİTAKdoi:10.3906/elk-2010-40

Turkish Journal of Electrical Engineering & Computer Sciences

http :// journa l s . tub i tak .gov . t r/e lektr ik/

Research Article

Privacy preserving hybrid recommender system based on deep learning

Sangeetha SELVARAJ1,∗, Sudha Sadasivam GANGADHARAN21Department of Information Technology, PSG College of Technology, Tamil Nadu, India

2Department of Computer Science and Engineering, PSG College of Technology, Tamil Nadu, India

Received: 08.10.2020 • Accepted/Published Online: 19.04.2021 • Final Version: 23.09.2021

Abstract: Deep learning models are widely being used to provide relevant recommendations in hybrid recommendersystems. These hybrid systems combine the advantages of both content based and collaborative filtering approaches.However, these learning systems hamper the user privacy and disclose sensitive information. This paper proposesa privacy preserving deep learning based hybrid recommender system. In hybrid deep neural network, user’s sideinformation such as age, location, occupation, zip code along with user rating is embedded and provided as input. Theseembedding’s pose a severe threat to individual privacy. In order to eliminate this breach of privacy, we have proposed aprivate embedding scheme that protects user privacy while ensuring that the nonlinear latent factors are also learnt. Inthis paper, we address the privacy in hybrid system using differential privacy, a rigorous mathematical privacy mechanismin statistical and machine learning systems. In the reduced feature set, the proposed adaptive perturbation mechanismis used to achieve higher accuracy. The performance is evaluated using three datasets with root mean square error(RMSE), mean absolute error (MAE), mean squared error (MSE), R squared, precision and recall. These evaluationmetrics are compared with varying values of privacy parameter ϵ . The experimental results show that the proposedsolution provides high user privacy with reasonable accuracy than the existing system. As the engine is generic, it canbe used on any recommendation framework.

Key words: Differential privacy, adaptive perturbation, private hybrid recommender, embedding perturbation, deepneural network, laplace noise, randomized response

1. IntroductionRecommendation systems facilitate users to choose from a wide range of items by providing suggestions onrelevant items based on the user’s interests. Furthermore, recommendations are useful to people as it helpsthem to choose from a variety of items available with the service provider. As they provide relevant suggestionsto customers, these systems are crucial in ecommerce based industries. Hence, state of art research focuses ondesigning and implementing optimized algorithms to provide personalized user recommendations. Adding moreinformation about the user, results in good recommendations. But the results are generated at the cost of user’sprivacy. Hence, the objective of the research is to propose an optimal recommendation algorithm that protectsthe privacy of individual and to provide relevant recommendations.

Recommender systems are broadly classified into three categories: content based, collaborative andhybrid method. Content based mechanism uses both the user profile and the product information to offerrecommendations. The term ’content’ signifies the attribute of the product that is liked by the user, which canbe obtained from the tags, keywords, or side information associated with a product. Collaborative mechanism∗Correspondence: [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.2385

https://orcid.org/0000-0003-2513-5502

https://orcid.org/0000-0001-6845-1869

SELVARAJ and GANGADHARAN/Turk J Elec Eng & Comp Sci

uses information from user profiles to calculate similarities between users or similarity between items and providerecommendations based on it. Matrix factorization, which is a type of collaborative filtering, constructs latentfeatures for users and items. These features are learnt from the past ratings provided by the user. Based onthe learned features, predictions for unrated products can be made and the product with topmost predictionrate is recommended to the user. Hybrid method is a combination of content based and collaborative filteringtechniques. To improve efficiency without losing the advantage of the above two mechanisms, a hybrid approachoffers greater synergy, as compared to the individual recommender system.

These traditional linear models can effectively memorize sparse feature interactions using cross productfeature transformation, whereas deep neural networks can generalize interactions through low dimensionalembedding to previously unseen features. These networks, therefore, have a clear understanding of the userand item, so it delivers exceptional results. Adoption of deep learning techniques in tabular or structured dataresulted in a huge improvement in performance. But this adoption also created few shortcomings. For example,deep neural networks make use of a huge amount of user data to make decisions, that apparently pose a threatto an individual’s privacy.

From a privacy perspective, recommendation systems are broadly classified into trusted recommendersystem and untrusted recommender system [1]. In the trusted recommender, the system is trusted by the user,and they send original raw data. In addition to this, a private recommendation algorithm is run by the trustedrecommender to produce the results. Whereas, in an untrusted recommender system, users are confined fromsending the original raw data and adds noise in the user rating. Further recommendations are made with usualnonprivate algorithms. Such trusted and untrusted recommender systems are called global differential privacy(GDP) and local differential privacy (LDP) respectively. In this paper the development of a privacy-preservingalgorithm for trusted recommender system is discussed.Major contributions of the paper are as follows:

1. Analyze and experimentally evaluate the differentially private hybrid deep neural network.2. Improve the performance using adaptive perturbation, which provides varying perturbation for outlier

features and common features.3. Avoid huge noise addition by employing dimensionality reduction techniques.4. Combine randomized response with Laplace noise addition for improved accuracy.5. Experimentally compare the recommendation results generated by the differentially private hybrid system

with other non-private baseline algorithms.The rest of this research paper is organized as follows:- Section 2 includes a literature review of the

existing privacy mechanisms. Section 3 outlines the background of differential privacy and the nonprivate deepneural network architecture used in the paper. Section 4 details the proposed private deep learning algorithmalong with the theoretical proof for utility and privacy. Section 5 summarizes the experimental results withthree datasets namely Movielens100K, Book-Crossing, FilmTrust and result comparison with other baselinenonprivate algorithms. Section 6 deals with the conclusion.

2. Related workA brief survey of the privacy preserving mechanisms such as anonymity, perturbation, and suppression basedmethods was reviewed by Sangeetha and Sudha Sadasivam [2]. A study by Zhang et al. [3] broadly classifiesprivacy preservation in collaborative filtering into secure multiparty communication, homomorphic encryption,and differential privacy in recommender system. Several researchers [4–6] use differential privacy in distributed

2386


multiparty computation.This paper focuses on introducing differential privacy in the DL model-based hybrid recommendation

system. Hence, the literature survey includes sections on privacy in recommenders, and privacy in DL systems.

2.1. Privacy in model and content based recommender system

Narayanan and Shmatikov [7] performed a statistical de-anonymization of large sparse datasets. The authorslaunched a deanonymization attack on the anonymized Netflix dataset. They proved that an adversary with littleinformation about the subscriber can easily provide the identity of a particular person in the database includingthe person’s entire movie watching history. The attack was demonstrated using the Internet Movie Database(IMDb) as the source of background knowledge. Even though the researchers used the movie dataset, the attackis generic and applicable to any recommendation system like healthcare, e-commerce, etc. Later researchersprovided solutions for deanonymization attacks in the recommender system and our literature review presentsan overview of solutions to the attack.

A novel application of differential privacy in recommendation was introduced by McSherry and Mironov[8]. The authors perturbed the rating matrix with differentially private noise addition and proved that itis feasible to design a private recommendation system without losing accuracy. The authors also concludedthat loss in accuracy decreases as more data becomes available. Hence, differential privacy based solutionsare more suitable for problems involving big data. In this scenario, recommendation system is a practical bigdata framework that is popular and used by industries like Amazon, Netflix, MovieLens, etc. Thus, a privacypreserving algorithm designed using differential privacy based approach offers realistic solution to the issue ofbig data privacy.

Friedman et al. [9] proposed a generic framework to apply differential privacy to matrix factorization.The privacy framework proposed by Friedman et al. categorizes privacy-preserving algorithms into inputperturbation, gradient perturbation, and output perturbation. In input perturbation, the rating matrix isperturbed with Laplace noise. While in the gradient perturbation two private algorithms based on alternatingleast square (ALS) and stochastic gradient descent (SGD) are proposed, in the output perturbation thenonprivate Matrix factorization algorithms are executed and the resulting latent factors are perturbed. Theauthors also compared the results and concluded that input perturbation produced better results.

It can be observed from the literature that the existing techniques cannot prevent the inference of usersfrom the output of neural network model and are orthogonal to the techniques discussed in this paper.

2.2. Privacy in deep learning

Shokri et al. [10] demonstrated that machine learning models leak the information about the individual dataon which the model is trained. Such attacks are called inference attacks. Shokri et al. trained the model withcommercial machine learning as a service and proved that the model is vulnerable to membership inferenceattacks. Fredrikson et al. [11] perform a model inversion attack that recovers images from the facial recognitionsystem. Fredrikson et al. demonstrated that the inversion attack only requires black-box access to the trainedmodel. Abadi et al. [12] proposed a deep learning-based privacy-preserving algorithm with differential privacy.The primary focus of the work is to design a private algorithm for the classification task. The differentiallyprivate noise is added to the nonconvex optimization technique called stochastic gradient descent to protectthe user from model inversion attack and inference attack. Though model inversion attack is demonstrated onimage classification, the same attack is possible in any trained model. These attacks indicate the need for robust

2387


model training. Hence our work protects users from such attacks on trained model through perturbation of theuser embedding’s. The differential privacy used in proposed private algorithm design provides a mathematicalguarantee against the attacks. Another benefit of our approach is that it takes advantage of offline perturbationand can converge faster than gradient perturbation.

Other than differential privacy, few cryptography based solutions are available to protect model inversionattack. Hence, crytography based solutions are surveyed further. CryptoNet [13] is a combination of homomor-phic encryption and neural network. The data are shared by secret encryption and predictions are obtainedfrom the pretrained neural network model. The core part of CryptoNet is that it is able to make such predictionsusing the encrypted data and returns the predictions in encrypted form. The encrypted classification resultsobtained, thus, can be decrypted only by the corresponding sender device and the predictions are used by thesender. Ma et al. [14] created a privacy-preserving ensemble-based classification algorithm for face recognitionbased on secret sharing and edge computing. The features are collaboratively learned from encrypted face im-ages using two edge servers. Ma et al. [15] proposed privacy-preserving long short term memory (LSTM) basedneural network for smart Internet of Things (IoT) devices. Secret sharing is used for secure communication ofvoice information from IoT devices. The features are extracted from encrypted audio information using edgedevices. These cryptography based mechanisms demand higher computational costs. Our proposed solutionavoids this overhead by using differentially private perturbations. Also, homomorphic encryption with machinelearning assumes that a pretrained model is available [13]. Hence, secret sharing is primarily used in the in-ference stage and it cannot be used for secured model training. Our work focuses on creating a private modelthrough private training. Differential privacy is more suitable for our problem approach. Usage of differentialprivacy during model training controls the amount of information leaked from an individual record in a dataset.

It should be highlighted that differential privacy is a powerful mechanism that protects the privacy ofusers in a dataset with a strong privacy guarantee. As inferred from the literature there are several worksusing differential privacy in recommendation systems using matrix factorization and classification tasks in deeplearning. Most of the privacy works in deep learning [12, 16, 18, 19] address the classification task. Butthe privacy model in classification setting cannot be directly used for recommender systems where each andevery user rating and presence or absence of rating is a threat to user privacy. Recently, deep learning-basedmechanisms are gaining popularity and are extensively used for improved accuracy in recommender system.As observed in most of the deep learning-based recommender systems [20–23], accuracy can be improved byleveraging user-item rating matrix and side information. Thus, this improved accuracy comes at the cost ofuser’s privacy, and our goal is to develop a novel deep learning-based privacy-preserving hybrid recommendationsystem. The proposed work is first of its kind with a private hybrid algorithm. Section 3 elaborates on thebackground of the work with the definition of differential privacy and explains the proposed nonprivate hybridrecommender.

3. Preliminaries3.1. Differential privacy

Differential privacy was originally proposed for privacy preserving statistical data release by DWork [24, 25]. Itprovides a mathematical guarantee for private data release. However, the original differential privacy was laterused in numerous other applications in industries [26–28] and academic research [8, 9, 29].Definition1: A randomized algorithm M satisfies ϵ - differential privacy if for any two neighboring databases

2388


D and D′ any measurable subset [25],

Pr[M(x) ∈ S] ≤ exp(ϵ)× Pr[M(y) ∈ S] + δ (1)

Where the probability is over randomness of ϵ . If δ = 0 we say that M is ϵ – differentially private. Theprivacy parameter ϵ controls privacy and accuracy tradeoff. The neighboring databases D and D

′ differ in onerecord in the rating matrix.Definition2: The Laplace distribution (centered at 0) with scale b is the distribution with probability densityfunction [25]:

Lap(x) =1

2bexp

(−|x|

b

)(2)

Definition3:(Sequential Composition[1])suppose a set of privacy mechanisms M = M1, .....Mm are se-quentially performed on a dataset, and each Mi provides, ϵi privacy guarantee, M will provide (

∑mi=1 ϵi) -

differential privacy. The set of randomized mechanisms are performed sequentially on a dataset, and the finalprivacy guarantee is determined by the summation of total privacy budgets. In our work, Algorithm 1 usesLaplace noise addition (Definition 2) along with sequential composition (Definition 3).

3.2. Deep learning-based hybrid recommender systemHybrid recommendation algorithms are more significant and produces outstanding results when compared tonon hybrid solutions. Such improvement is achieved with the combination of rating and side information usedin the algorithm. Two key issues in recommendation system are cold start problem and accuracy improvement.A cold start problem occurs when sufficient information about a user or item is not available. Recent hybridrecommendation algorithms based on deep learning addresses these issues and produces outstanding results.But these results comes at the cost of user privacy. The existing works does not address privacy in hybridalgorithms. Hence, we extend the recommender proposed by Kiran et al.[20]. Kiran et al. devised a novelhybrid deep learning-based recommender that uses side information and their primary focus was to improve theaccuracy. Further, we investigate and propose a privacy-preserving hybrid algorithm.

The deep neural network consists of multiple layers between the input and output layer [30]. Each layerconsists of multiple simple processing units called neurons. Neurons in each layer are connected with every otherneuron in the previous layer. Every connection is associated with a weight that is randomly initialized and it isimproved through multiple epochs. These weights are modified based on the optimization function. The inputuser id and item id are embedded and concatenated along with the side information (Table 1). Each hidden layercomputes a linear function which is input to LeakyReLU (rectified linear unit) function, followed by DropOut.LeakyReLU is a popular activation function used in the deep neural network that overcomes the ”dying ReLU”issue. The ReLU returns the value provided as input directly for positive input and returns a 0.0 for negativeinputs. The major drawback of ReLU is ”dying ReLU” which occurs when a set of nodes output an activationvalue of 0.0 forever in the training process. ”dying ReLU” is solved by LeakyReLU by permitting small negativevalues. Dropout is a regularization mechanism used in the neural network. During the neural network training,randomly chosen neurons are dropped out by this mechanism to avoid overfitting. The number of neurons ineach hidden layer and the activation function used in each layer are all hyperparameters that can be tuned. Eachhidden layer employs batch normalization. Batch normalization is used to normalize the values in the hiddenlayer which in turn ensures a faster convergence. The output from the previous activation layer is normalized

2389


by subtracting batch mean and dividing by batch standard deviation. The final layer is a fully-connected layerwith 1 node, which is input to the sigmoid activation function that predicts the ratings in the range 0–5.

In a deep hybrid algorithm, the traditional representation of user id and item id is replaced by embedding.Embedding identifies correlation among data and enables the deep learning model to extract more features fromuser and item id when compared to one-hot encoding. An embedding is a representation of categorical valueas a vector in N-dimensional space. Embeddings are lookup matrices of size K , where K is the number ofembeddings. For each user, an array of size Ku is returned, and the user embedding matrix collected from allthe users is denoted by Euser . For each item, an array of size KI is returned, and the item embedding matrixcollected from all the users is denoted by Eitem . Ku and KI are the hyperparameters and it is tuned bythe analyst. These hyperparameters used in embedding are more suitable for a scalable big data environment;whereas, one hot encoded matrices requires a column updation for every change in the item addition or removal.These user Euser and item Eitem features are similar to latent factors in collaborative filtering based matrixfactorization algorithms. But the latent factors in collaborative filtering only capture linear features; whereas,embeddings are capable of capturing both linear and nonlinear user and item factors.

As stated earlier, a hybrid recommender is a combination of content-based and collaborative filteringapproaches. In our hybrid algorithm, the embedded user Euser and item Eitem features are similar to thecollaborative filtering approach. Further content-based features are added to make the deep learning model ahybrid model. Table 1 indicates the side information of various datasets used in the hybrid model training.

Table 1. Side information about users and items in different datasets.

Dataset User or item Side information

MovieLens 100K Item Tag, title, genre of moviesUser Not Available

Film trust Item Not availableUser Trust ratings on the user on other users and vice versa

Book-crossing Item Year of publication. publisher, book titleUser Age, location, author name

In Section 4, the proposed differentially private algorithm is described. The section also elaborates theusage of differentially privacy on the deep learning architecture.

4. System overviewThis section presents the system overview and the proposed Global differentially private trusted recommenderalgorithm.

The global differential private algorithm performs Laplace noise addition along with sequential composi-tion. In this setting, the recommender system is trusted, and the user sends original rating information to theserver. However, privacy has to be ensured during model training to prevent model inversion attack [11] andinference attack [10]. An algorithm designed using differential privacy is resistant against these attacks. Theproposed algorithm is ϵ differentially private and the privacy analysis is proved in Section 4.2.

Differential privacy is already used [12] as resistance against these attacks. Hence our algorithm uses Dif-ferentially Private perturbation to preserve user privacy and to protect from these attacks. Existing algorithmsadd noise to the optimization functions like stochastic gradient descent and alternating least square [9, 12].

2390


But our new approach perturbs the input user embedding and bias used by the deep learning algorithm. In ahybrid recommendation system, user embedding and bias added to the weights in the neural network conveyuser-specific information, and perturbing these values preserves user privacy.

For training any deep neural network, the categorical features in the dataset cannot be used directly andit requires some preprocessing. As a preprocessing step, the categorical features present in the dataset have tobe converted to numerical form and given as input to the neural network. A naive approach is to convert thecategorical text information to numerical form using one hot encoding. For example, if there are 10000 uniquewords present in the dataset, then one-hot encoding constructs matrix with a slot for each word. Further, asthe name one-hot encoding suggests if a word is used by an item, the value for the corresponding index in thematrix is made ’1’, and the remaining indices values are made ’0’. But one hot encoding produces a highlysparse input features with very few non zero values. Such sparse input features need more weights in a neuralnetwork with a large amount of data and higher computation. Another major issue with one-hot encoding isthat it is not capable of capturing the semantic relationship between the features.

As a solution to the issues in one-hot encoding, embeddings are used. It transforms the large sparsefeatures into lower-dimensional space. The obtained lower-dimensional features are capable of identifying andpreserving the semantic relationship between the categorical items. The embedding to our neural networkis a combination of user rating and side information like genre, tag, etc. These semantic relationships helpin improved model accuracy, using personal information about user behavior. Such personal information issensitive, and its usage in the model thwarts user privacy. So the embedding is perturbed to ensure privacy.For training a network model, it is not mandatory to use bias information. If a model is trained without bias,it provides only a generic recommendation. It is used in the model to comprehend the user and the item better.Thus the bias added for a user is also sensitive and hence it is also perturbed to enhance privacy.

Initially, the collaborative filtering based model is used to train the original input, from which theembeddings are obtained. These embeddings, thus obtained contain user and item embeddings, along withuser and item biases. From these values our approach perturbs only the user embedding and bias values, usingAlgorithm 1. However, the user embedding obtained has a higher dimension, and this increased dimensionalityresults in more noise addition, which degrades the model accuracy. Hence, the dimensionality of obtained userembeddings are reduced using principal component analysis (PCA), and user features are obtained. In thereduced feature set users with similar tastes are close to each other and the users whose characteristics are notsimilar to others are far away.

The features that are close to each other are similar to the K anonymization mechanism and user privacyis retained for the points that are next to each other. K anonymization is a privacy preservation mechanismproposed by Latanya Sweeney [31]. Sweeny defines K-anonymity as the privacy requirement for publishingmicrodata that requires each equivalent class to contain atleast K records.

But, for the outlier features the privacy is easily violated. So, the outlier user information that is at aconsiderable distance from other features is more prone to attacks, and these features are identified initially.Further, more noise is added to the outliers, and less random noise is added to other features. The noise isobtained from the differentially private Laplace mechanism. The user bias information is also perturbed usingLaplace mechanism. The principal user features are converted back to dense embeddings before deep neuralnetwork training. The overall flow of our proposed methodology is depicted in Figure 1.

2391


Matrix

FactorizationUser Ratingand

Side Information

Pretraineduser

weights and biases

Perturbed

Pretraineduser

weights and biases

Global DifferentiallyPrivate

Algorithm

Deep NeuralNetwork

TransferLearningOutput

Figure 1. Overview flow diagram.

4.1. Proposed private recommender

In the proposed system, the user’s privacy is protected by adding a differentially private noise to the embeddingand bias values. The user, item embedding are generated from user id, user side information, movie id, andmovie side information respectively. Such embedding is not suitable for private model training; hence, perturbedembedding is obtained using Algorithm 1. The deep learning model depicted in Figure 2 is trained with theperturbed embeddings and biases using transfer learning mechanism. Transfer learning is a key advancementin deep learning that supports model training with pretrained weights. The perturbed pretrained embeddingsbring in randomness and prevents the model from memorizing user-specific information. The fc in the figureindicates fully connected layer. Totally three fully connected deep neural networks are used for transfer learning.

Perturbed User

Embedding

Perturbed Item

Embedding

Concat

fc1 (Leaky ReLU+

Dropout)

fc2 (Leaky ReLU+

Dropout)

fc3 (Leaky ReLU+

Dropout)outSigmoid

Predicted

rating

Figure 2. Proposed private hybrid recommendation system.

Most of the existing works [9, 12, 29] perform noise addition during model training stage, which is splitamong multiple epochs and noise addition increases with epochs. Offline noise addition in our work ensures areduction in noise compared to the existing algorithms. Algorithm1 is executed offline and the steps are briefedin the following paragraphs.

Before the execution of the proposed algorithm, the embeddings with ’d’ dimensions are obtained from

2392


Algorithm 1 Global differentially private algorithmInput: ϵ1 and ϵ2 - Privacy parameter

∆S1,∆S2 - SensitivityE(n×d) - Embedding for ’n’ users with dimension ’d’bi, i ∈ 1, 2, ....n - User bias parameter of the model for ’n’ users

Output:E

′(n×d) , b′i, i ∈ 1, 2, ....n - Perturbed user Embedding and bias

1. Apply PCA on E(n×d) and obtain the reduced feature set f (n×g)

2. Identify the outlier in f (n×g) by applying z score normalization3. Store the outlier user id and feature id in O4. for i = 1, 2, ...n do5. for j = 1, 2, ...g do6. if i,j in O

7. f′

ij = fij + Lap(

g∗∆S1

ϵ1

)8. Clamp f

′

ij to [fijmin, fijmax]9. else

10. v = randomized response()11. if v = = True12. Retain the True Value f

′

ij = fij13. else14. f

′

ij = fij + Lap(

g∗∆S2

ϵ1

)15. Clamp f

′

ij to[fijmin, fijmax

]16. end for17. b

′i = bi + Lap

(∆S3

ϵ2

)18. end for19. Convert f

′(n×g) into dense embeddings W′(n×d)

function randomized response()1. Initialize v1 = 1

2. v′1 =

1 with probability 1

2p

0 with probability 12p

v1 with probability 1− p

3. ifv′1 == 1

4. Return True5. else6. Return False

collaborative filtering based matrix factorization. A naïve approach is to add ‘d’ dimensional noise to theembeddings. However, adding ‘d’ dimensional noise E

′userij = Euserij+Lap

(d∗∆S

ϵ

)introduces a huge amount

of noise and completely degrades the utility of data. Hence, the proposed solution uses principal componentanalysis (PCA), and a reduced dimension ‘g’ is obtained. The PCA algorithm produces a reduced featureset f (n×g) in step2. Further, a Laplace noise addition is performed on the reduced dimension ‘g’, whicheventually increases recommendation accuracy. The algorithm uses adaptive noise addition that adds largernoise to outlier and minimal random noise to remaining features. The outliers in the embedding signify theusers who have unique features and these users completely deviate from other users and they are more proneto attacks. Therefore, our algorithm initially identifies the outliers present in the feature set in step 3 using z

2393


score normalization using Equation 3.

z =

(x− µ

σ

)(3)

Where, µ signifies mean and σ denotes standard deviation. After outlier identification more noise isadded to outliers with sensitivity ∆s1 = fjmax

− fjminand ϵ1 in step 7. To avoid excess noise addition, the

feature is clamped (Step 15) as follows:

f′

ij =

fij

minif f

′

ij < fijmin

fijmax

if f′

ij > fijmax

fij otherwise

(4)

Hence, steps 3 to 8 identify the outliers and adds more noise to the outlier features. The remaining featuresare perturbed with minimal noise by a randomized response mechanism. In step 10, randomized response iscalled, which is a traditional coin flip used in differential privacy. The coin flip is made with probability 0.75returning the true answer and 0.25 returning the perturbed answer. The minimal perturbation is made in step14 with ∆s2 = 1 and ϵ1 . The coin flip and less sensitivity are chosen to minimize the randomness introducedin the model and to improve the utility. Hence, our perturbation mechanism is an adaptive mechanism.

In step 17 user bias is perturbed with sensitivity ∆s3 = bimax− bimin and epsilon of ϵ2 . Throughout thealgorithm, the perturbation is applied in two steps. Therefore, based on the composability property (Definition3)of differential privacy the epsilon is divided into 0.75 for ϵ1 and 0.25 for ϵ2 . This division is user-defined andit is made in-line with the existing algorithms [8], where the composability partition is made based on theimportance of the feature.

Since the proposed global differentially private algorithm (Algorithm 1) is generic it can be extended andapplied to any deep learning network. Such network can perform classification or regression, but the privatealgorithm requires perturbed embedding as input.

4.2. Privacy and utility analysisIn this section, we present a theoretical analysis for privacy and utility of the proposed global differentiallyprivate algorithm.

4.2.1. Privacy analysisThe proposed algorithm contains one private operation that is embedding perturbation. In this section, weanalyze the privacy guarantee of the embedding perturbation. Along with embedding the bias values arealso perturbed whose privacy is guaranteed by the composition of differential privacy [17]. The compositionundertakes the privacy guarantee for a sequence of differentially private computation.

Theorem 1 Algorithm 1 satisfies ϵ - differential privacy.

Proof Suppose two Datasets D and D’ differ in the ratings of one user. Let ’f’ be the reduced feature set.

Pr(R(D))

Pr (R (D′))=

∏gj=1(Pr(fj(D) + Lap(∆S

ϵ ) = R)∏gj=1(Pr(fj(D

′) + Lap(∆Sϵ ) = R)

≤ eϵ

2394


=

∏gj=1 exp

(−∥R−fj(D)∥1ϵ

∆S

)∏g

j=1 exp(

−∥R−fj(D′)∥1ϵ

∆S

)=

g∏j=1

exp( ϵ

∆S(∥R− fj(D)∥1 − ∥R− fj(D

′)∥1))

=

g∏j=1

exp( ϵ

∆S(∥fj(D)− fj(D

′))∥1))≤ eϵ

2

Lemma 1 Composition [17], M = m1,m2, .....mn if each mi provides ϵ′ privacy guarantee, the sequence ofM will provide n ∗ ϵ′ differential privacy.

Proof In the reduced feature set we add independent Laplace noise to the features.

f′

ij = fij +g ∗∆S1

ϵ1

further noise is added to bias

b′

ij = bij +∆S3

ϵ2

when combining both operations the proposed method preserves ϵ -differential privacy by applying the compo-sition lemma. 2

4.2.2. Utility analysis

In order to protect data, privacy noise is added to the low dimensional embeddings. A naive solution addsnoise to all the embeddings, which degrades the performance of the algorithm. This is due to the fact that themagnitude of noise added is directly proportional to the performance of the algorithm.

Theorem 2 For a given privacy parameter ϵ , Algorithm1 adds less noise compared to the naive solution.

Proof In algorithm1 the reduced feature set has n ∗ g elements. To each element noise drawn from Lap(∆Sϵ

)is added. The magnitude of noise M1 = O(n∗g∗∆S2

ϵ2 ) .In naive solution noise is added to n ∗m elements. To each element noise drawn from Lap

(∆Sϵ

)is added. The

magnitude of noise M2 = O(n∗m∗∆S2

ϵ2 ) .

M1 = O(n ∗ g ∗∆S2

ϵ2) < O(

n ∗m ∗∆S2

ϵ2) = M2

where g<<m. Hence, we observe that M1 < M2. That is the Algorithm.1 adds less noise than naive approach.2

2395


5. Experimental evaluation5.1. DatasetThe experiments are conducted using three datasets MovieLens(100k), Book-Crossing, and FilmTrust. Theproperties of the dataset are briefed in Table 2.

Table 2. Datasets.

Dataset Users Items Ratings Sparsity ScaleBook-Crossing 278,858 271,379 1,149,780 99.99% [0-10]ML100K 610 9,742 100,836 93.7% [1-5]FilmTrust 1,508 2,071 35,497 98.86% [1-5]

5.2. Computing environmentThe experiments are implemented in python 3.7.3 by leveraging the libraries like scikit-learn 0.20.3, pandas0.24.2, and numpy 1.16.2. The computing environment with Nvidia GPU and Linux operating system with12GB RAM are used. PyTorch 1.1.0 library was used for training the deep learning models.

5.3. Evaluation criteriaThe algorithm is evaluated with varying privacy parameter ϵ and six evaluation metrics root mean squarederror (RMSE), mean squared error (MSE), mean absolute error (MAE), R-squared, Precision@N, and Recall@Nvalues are used. The accuracy metrics are calculated with eight runs. The mathematical definitions of theevaluation metrics namely MSE, RMSE, and MAE are in the Equations 5,6 and 7. For these equations, pij isa matrix with a cell value pij = 1 only if the user i rated item j and 0 otherwise. In Equations 5,6 and 7 ractualij

denotes the rating provided by user i to item j and rpredictedij is the rating predicted by the model. For a user u,

Precision@N, and Recall@N are computed by |Relu ∩Recu||Recu| and |Relu ∩Recu|

|Relu| , respectively. Where Recu denote

a set of N items recommended to u, and Relu denote a set of items considered relevant.

RMSE =

√√√√m,n∑i,j

pij

(ractualij − rpredictedij

)2

(5)

MSE =

m,n∑i,j

pij

(ractualij − rpredictedij

)2

(6)

MAE =

m,n∑i,j

pij

∣∣∣ractualij − rpredictedij

∣∣∣ (7)

5.4. ResultsThe nonprivate deep neural network (DNNRec) recommendation algorithm results are chosen as the baselineand the private recommendation results are computed for varying values of epsilon ϵ . In Table 3,4, and 5 theboldfaced results signify that the accuracy is close to the nonprivate results.

2396


Choosing a value for ϵ is an open question and the works done so far [9, 12, 24, 28] record the results withvarious epsilon values of ϵ ranging from 0.1 (very low) to 40 (very high). As per the literature our experimentalresults tabulate the ϵ value of 0.1 to 40. Where 0.1 denotes high privacy and 40 denotes low privacy.

Table 3. Performance of global differentially private algorithm on Movielens 100K dataset.

Measure MSE(lower is better)

RMSE(lower is better)

MAE(lower is better)

R-Squared(higher is better)

Precision @10(higher is better)

Recall @10(higher is better)

non private 0.747 0.864 0.666 0.338 0.907 0.783ϵ =0.1 0.883 0.939 0.739 0.199 0.903 0.780ϵ =0.5 0.874 0.935 0.734 0.207 0.904 0.782ϵ =1 0.898 0.947 0.741 0.186 0.901 0.782ϵ =5 0.878 0.937 0.734 0.203 0.903 0.780ϵ =10 0.854 0.924 0.723 0.225 0.901 0.778ϵ =15 0.826 0.909 0.711 0.250 0.906 0.780ϵ =20 0.813 0.901 0.701 0.262 0.904 0.778ϵ =25 0.805 0.897 0.699 0.270 0.905 0.779ϵ =30 0.789 0.888 0.696 0.284 0.906 0.781ϵ =35 0.780 0.883 0.685 0.292 0.905 0.780ϵ =40 0.787 0.887 0.689 0.286 0.905 0.782

Table 4. Performance of global differentially private algorithm on film trust dataset.








As observed in Table 3,4 MovieLens 100K, and Film Trust performance match the nonprivate algorithmwith ϵ = 25 to 40. On the other hand the Book-Crossing dataset Table 4 match the nonprivate results for allthe values of ϵ . As expected the performance improves with lower privacy and vice versa. The findings clearlyindicate that the private algorithm is practical and it provides higher accuracy even for highly sparse datasets.As observed from Table 2, Book-crossing have the highest sparsity of 99% and it produces outstanding results.Also, it can be observed from Table 2 that Book-crossing has more users with a count of 278,858 and it is inferredfrom the results that the differentially private algorithm is more suitable for datasets with a large number ofusers. Hence, we conclude that our algorithm is more suitable for the realistic recommendation, which is highlysparse and consists of a large number of users. The experimental results confirm that the proposed algorithmis on par with the baseline concerning less error, good recommendation quality, and high coverage rate. The

2397


Table 5. Performance of global differentially private algorithm on Book-Crossing dataset.








error rate is measured with MSE, RMSE, MAE, and R-Squared. Recommendation quality and coverage rateare evaluated with Precision @ 10 and Recall @ 10, respectively.

5.5. Accuracy comparison to other nonprivate collaborative filtering approaches

In this section, we have compared the RMSE, MAE of our private algorithm with other nonprivate algorithms.The proposed deep learning-based private approach produces outstanding results. Table 6 highlights thebaseline, nonprivate algorithms used for comparison.

The accuracy measures are compared with the baseline algorithms and graphs are plotted accordingly inthis section. The results that are better than the baseline indicate high utility. We also plot the non-privatematrix factorization, non-deep neural network, and deep neural network algorithms which produce outstandingresults. For private algorithms, such exceptional results are difficult to achieve. Therefore we chose all thesethree algorithms as the upper bound and plot them along with our results for brevity.

Table 6. Summary of nonprivate baseline RMSE and MAE.

ML 100K Film Trust Book CrossingBaseline RMSE MAE RMSE MAE RMSE MAEGlobal average 1.062 0.842 0.915 0.716 1.859 1.509Item average 1.005 0.782 0.915 0.723 1.952 1.537User KNN Pearson 0.902 0.693 0.839 0.655 1.851 1.429SVD 0.937 0.718 0.814 0.629 1.932 1.537

The experimental results indicate that the baseline global and item average is achieved by all the datasets.However, attaining the performance of the remaining algorithms varies from one dataset to the other andis explained separately for every dataset. Figure 3a RMSE comparison indicates that beyond ϵ = 5 theperformance of the private algorithm is better than SVD and it is better than user KNN Pearson beyondϵ = 20. Hence the baseline RMSE accuracy is achieved for Movielens 100 K dataset, but the upper boundis not achieved. Figure 3b MAE comparison indicates that beyond ϵ = 11 the performance of the privatealgorithm is better than SVD and it is better than user KNN Pearson. The upper bound non-deep neural

2398


network performance is achieved beyond ϵ = 30. So, the baseline MAE accuracy is achieved for Movielens 100K dataset, and one upper bound algorithm result is achieved.

Figure 4a RMSE comparison indicates that beyond ϵ = 30, the performance of the private algorithm isbetter than SVD, and it is not able to achieve the user KNN Pearson results. Thus, the Film Trust privatealgorithm is better than three baseline RMSE results, but the upper bound results are not attained. In Figure 4btwo baseline, MAE accuracy results are achieved and upper bound algorithm results are not achieved by theFilm Trust dataset.

0.1 0.5 1 5 10 15 20 25 30 35 40

Epsilon

0.875

0.900

0.925

0.950

0.975

1.000

1.025

1.050

RM

SE

Global Average

Item Average

User Knn Pearson

SVDBiased MF

non Deep NN

DNNRec

Private DNNRec

(a) RMSE

0.675

0.700

0.725

0.750

0.775

0.800

0.825

0.850

MA

E

0.1 0.5 1 5 10 15 20 25 30 35 40

Epsilon

Global Average

Item Average

User Knn Pearson

SVD

Biased MF

non Deep NN

DNNRec

Private DNNRec

(b) MAE

Figure 3. Accuracy of Movielens100K.

0.80

0.82

0.84

0.86

0.88

0.90

0.92Global Average

Item Average

User Knn Pearson

SVD

Biased MF

non Deep NN

DNNRec

Private DNNRec

0.1 0.5 1 5 10 15 20 25 30 35 40

Epsilon

RM

SE

(a) RMSE

0.64

0.66

0.68

0.70

0.72 Global Average

Item Average

User Knn Pearson

SVD

Biased MF

non Deep NN

DNNRec

Private DNNRecMA

E

0.1 0.5 1 5 10 15 20 25 30 35 40

Epsilon

(b) MAE

Figure 4. Accuracy of Film Trust.

Figure 5a and Figure 5b indicate that our algorithm produces extraordinary performance for the Book-Crossing dataset and satisfies all the baseline accuracy. In upper bound, it shows good performance for allalgorithms except deep neural network.

To the best of our knowledge, none of the existing private algorithms attain the performance of state-of-the-art collaborative filtering algorithms like SVD and KNN. Our exhaustive experimental results confirmthat our private algorithm outperforms SVD and KNN for a few values of epsilon. We further observe that itis possible to attain most of the upper bound accuracies.

2399


0.1 0.5 1 5 10 15 20 25 30 35 40Epsilon

1.65

1.70

1.75

1.80

1.85

1.90

1.95

RM

SE Global Average

Item Average

User Knn Pearson

SVD

Biased MF

non Deep NN

DNNRec

Private DNNRec

(a) RMSE

0.1 0.5 1 5 10 15 20 25 30 35 40Epsilon

1.25

1.30

1.35

1.40

1.45

1.50

1.55

MA

E

Global Average

Item Average

User Knn Pearson

SVD

Biased MF

non Deep NN

DNNRec

Private DNNRec

(b) MAE

Figure 5. Accuracy of Book-Crossing.

5.6. Accuracy comparison to other private differential privacy based approaches

In Fig. 6, other differentially privacy based algorithm [32] is compared with proposed algorithm. [32] proposedPersonalized Differential Privacy (PDP-PMF) based probabilistic matrix factorization and compares the resultswith Differential Privacy (DP-PMF). Figure 6 clearly indicates that the proposed algorithms outperforms theexisting private algorithms. Section 6 concludes our work.

0

0.5

1

1.5

2

2.5

0.1 0.2 0.3 0.4 0.5 0.6Epsilon

ESM

R

Zhang et. al (DP-PMF)

Zhang et. al (PDP-PMF)

Private DNNRec

Figure 6. Comparison to other differential privacy based algorithm.

6. ConclusionIn this paper, a privacy-preserving hybrid deep learning algorithm based on differential privacy is proposed.The accuracy is enhanced using adaptive perturbation where large noise is added to outliers in the data and aminimal random noise addition is performed on all other features.

The contributions of this paper are fivefold. First, the experimental results of the proposed private deeplearning algorithm prove that it is feasible to achieve the benefits of deep learning with reasonable privacy.Second, the proposed solution is based on differential privacy which provides a mathematical guarantee forprotecting individual privacy. The implementation is compared with varying values of epsilon with very high

2400


privacy of ϵ = 0.1 to low privacy of ϵ = 40. The accuracy comparison with other algorithms indicates thatthe proposed noisy private algorithm outperforms the nonnoisy baseline approaches in terms of accuracy. Ourfindings also indicate that attaining upper bound is feasible for few datasets with deep learning approach.Third, since our noise addition is not performed during model training it does not require additional modelconvergence time like the existing algorithms. Fourth, the noise addition is performed on the embeddings in acompressed form obtained from PCA. This mechanism avoids excess noise addition and also ensures accuracy.Fifth, multiple epochs can be performed without incremental noise as the proposed algorithm uses perturbedpretrained weights with transfer learning. Transfer learning is a key advancement in deep learning that runsthe model with pretrained weights. The private deep learning approach is generic, so it can be applied to anybig data recommendation engine.

References

[1] Tianqing Z, Gang L, Wanlei Z, Philip S Y. Differential Privacy and Applications. NY, USA: Springer InternationalPublishing, 2017.

[2] Sangeetha S, Sudhasadasivam G. Privacy of big data: a review. In: Dehghantanha A, Choo KK (editors). Handbookof Big Data and IoT Security, NY, USA: Springer Cham, 2019, pp. 5-23.

[3] Zhang D, Chen X, Wang D, Shi J. A survey on collaborative deep learning and privacy- preserving. In: IEEE ThirdInternational Conference on Data Science in Cyberspace, Guangzhou, China; 2018. pp. 652-658.

[4] Hamm J, Cao P, Belkin M. Learning privately from multiparty data. In: Proceedings of 33rd InternationalConference on Machine Learning, NY, USA; 2016. pp. 555-563.

[5] Arun R, Shivani A. A differentially private stochastic gradient descent algorithm for multiparty classification. In:Proceedings of Machine Learning Research, La Palma, Canary Islands, Spain; 2012, pp. 933-941.

[6] Pathak MA, Rane S, Raj B. Multiparty differential privacy via aggregation of locally trained classifiers. In:Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BritishColumbia, Canada; 2010, pp. 1876-1884.

[7] Narayanan A, Shmatikov V. Robust de-anonymization of large sparse datasets. In: IEEE Symposium on Securityand Privacy (sp 2008), Oakland, CA; 2008, pp. 111-125.

[8] Mironov I, McSherry F. Differentially private recommender systems: Building privacy into the Netflix prize con-tenders. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,Paris, France; 2009, pp. 627-636.

[9] Friedman A, Berkovsky S, Kaafar MA. A differential privacy framework for matrix factorization recommendersystems. User Modeling and User-Adapted Interaction 2016; 26(5): 425-458. doi: 10.1007/s11257-016-9177-7.

[10] Shokri R, Stronati M, Song C, Shmatikov V. Membership inference attacks against machine learning models. In:IEEE Symposium on Security and Privacy (SP), San Jose, CA; 2017, pp. 3-18.

[11] Fredrikson M, Jha S, Ristenpart T. Model inversion attacks that exploit confidence information and basic coun-termeasures. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security,Denver, Colorado, USA; 2015, pp. 1322-1333.

[12] Abadi M, Chu A, Goodfellow I, McMahan HB, Mironov I et al. Deep Learning with Differential Privacy. In:Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria;2016, pp. 308-318.

[13] Dowlin N, Gilad-Bachrach R, Laine K, Lauter K, Naehrig M et al. Cryptonets: Applying neural networks toencrypted data with high throughput and accuracy. In: Proceedings of the 33rd International Conference onInternational Conference on Machine Learning, NY, USA; 2016, pp. 201–210.

2401


[14] Ma Z, Liu Y, Liu X, Ma J, Ren K. Lightweight privacy-preserving ensemble classification for face recognition. IEEEInternet of Things Journal 2019; 6 (3): 5778-5790. doi: 10.1109/JIOT.2019.2905555.

[15] Ma Z, Liu Y, Liu X, Ma J, Li F. Privacy-preserving outsourced speech recognition for smart iot devices. IEEEInternet of Things Journal 2019; 6 (5): 8406-8420.

[16] McMahan HB, Ramage D, Talwar K, Zhang Li. Learning differentially private recurrent language models. In: 6thInternational Conference on Learning Representations; Vancouver, BC, Canada ; 2018. pp. 1–14.

[17] McSherry F, Talwar K. Mechanism design via differential privacy. In: Proceedings of the 48th Annual IEEESymposium on Foundations of Computer Science, IEEE Computer Society, Washington, DC, USA; 2007. pp. 94-103.

[18] Phan N, Wang Y, Wu X, Dou D. Differential privacy preservation for deep auto-encoders: an application of humanbehavior prediction. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona,2016; pp.1309-1316.

[19] Phan N, Wu X, Hu H, Dou D. Adaptive Laplace mechanism : differential privacy preservation in deep learning. In:Proceedings - IEEE International Conference on Data Mining ICDM; New Orleans, USA; 2017. pp. 385-394.

[20] Kiran R, Pradeep K, Bharat B. DNNRec: A novel deep learning based hybrid recommender system. Expert Systemswith Applications 2020; 144. doi: 10.1016/j.eswa.2019.113054.

[21] Sahoo A, Pradhan C, Barik R, Dubey H. DeepReco : deep learning based health recommender system usingcollaborative filtering. Computation 2019; 7: 25. doi: 10.3390/computation7020025.

[22] Covington P, Adams J, Sargin E. Deep neural networks for YouTube recommendations. In: Proceedings of the 10thACM Conference on Recommender Systems, Boston, Massachusetts, USA, 2016; pp. 191-198.

[23] Wei J, He J, C K, Zhou Y, Tang Z . Collaborative filtering and deep learning based recommendation system forcold start items, Expert Systems With Applications 2016, 69: 29-39, doi: 10.1016/j.eswa.2016.09.040.

[24] Dwork C. Differential privacy. In: Bugliesi M., Preneel B., Sassone V., Wegener I. (eds). Automata, Languages andProgramming, ICALP 2006. Lecture Notes in Computer Science. Heidelberg, Berlin: Springer, 2006, pp. 1-12.

[25] Dwork C, Roth A. The algorithmic foundations of differential privacy. Hanover, MA, USA: Now Publishers Incor-poration, 2014.

[26] Fanti G, Pihur V, Erlingsson Ú. Building a RAPPOR with the unknown: privacy-preserving learning of associationsand data dictionaries. Proceedings on Privacy Enhancing Technologies 2016; (3): 41-61.

[27] Erlingsson Ú, Pihur V, Korolova A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In:Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security - CCS ’14, ScottsdaleArizona, USA; 2014, pp. 1054-1067.

[28] Shin H, Kim S, Shin J, Xiao X. Privacy enhanced matrix factorization for reccommendation with lo-cal differential privacy. IEEE Transactions on Knowledge and Data Engineering 2018; 30 (9): 1770-1782,doi:10.1109/TKDE.2018.2805356.

[29] Berlioz A, Friedman A, Kaafar MA, Boreli R, Berkovsky S. Applying differential privacy to matrix factorization.In: Proceedings of the 9th ACM Conference on Recommender Systems; Vienna, Austria; 2015. pp. 107–114.

[30] Goodfellow I, Bengio Y, Courville A. Deep Learning. USA: The MIT Press, 2016.

[31] Sweeney L. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness andKnowlege-Based Systems 2002; 10 (5): 557 – 570, doi: 10.1142/S0218488502001648.

[32] Shun Zhang, Laixiang Liu, Zhili Chen, Hong Zhong. Probabilistic matrix factorization with personalized differentialprivacy.Knowledge-Based Systems 2019; 183: 104864, doi: https://doi.org/10.1016/j.knosys.2019.07.035.

2402

Documents

Privacy preserving hybrid recommender system based on deep