SHAPr: An Efficient and Versatile Membership Privacy Risk Metric for Machine Learning
Vasisht Duddu
University of Waterloo
Waterloo, Canada
vasisht.duddu@uwaterloo.ca
Sebastian Szyller
Aalto University
Espoo, Finland
contact@sebszyller.com
N. Asokan
University of Waterloo
Waterloo, Canada
asokan@acm.org
ABSTRACT
Data used to train machine learning (ML) models can be sensitive.
Membership inference attacks (MIAs), attempting to determine
whether a particular data record was used to train an ML model, risk
violating membership privacy. ML model builders need a principled
definition of a metric that enables them to quantify the privacy risk
of (a) individual training data records, (b) independently of specific
MIAs, (c) efficiently. None of the prior work on membership privacy
risk metrics simultaneously meets all of these criteria.
We propose such a metric, SHAPr, which uses Shapley values
to quantify a model’s memorization of an individual training data
record by measuring its influence on the model’s utility. This mem-
orization is a measure of the likelihood of a successful MIA.
Using ten benchmark datasets, we show that SHAPr is effective
(precision: 0.94 ± 0.06, recall: 0.88 ± 0.06) in estimating susceptibility
of a training data record to MIAs, and is efficient (computable
within minutes for smaller datasets and in ≈90 minutes for the
largest dataset).
SHAPr is also versatile in that it can be used for other purposes
like assessing fairness or assigning valuation for subsets of a dataset.
For example, we show that SHAPr correctly captures the dispro-
portionate vulnerability of different subgroups to MIAs.
Using SHAPr, we show that the membership privacy risk of a
dataset is not necessarily improved by removing high risk training
data records, thereby confirming an observation from prior work
in a significantly extended setting (in ten datasets, removing up to
50% of data).
KEYWORDS
Membership Inference Attacks, Data Privacy, Deep Learning.
1 INTRODUCTION
Assessing the privacy risks of data is necessary as highlighted by
several official reports from government institutions (NIST [44],
the White House [18], and the UK Information Commissioner’s
Office [24]). Membership inference attacks (MIAs) are a potential
threat to privacy of an individual’s data used for training ML mod-
els [38, 40, 42, 46]. These attacks exploit the difference in a model’s
prediction on training and testing datasets to infer whether a given
data record was used to train that model. For datasets contain-
ing sensitive data, MIAs constitute a privacy threat. For instance,
identifying that a randomly sampled person’s data was used to
train a health-related ML model can allow an adversary to infer
the health status of that person. Hence, measuring the membership
privacy risks of training data records is essential for data privacy
risk assessment.
Several existing tools, like MLPrivacyMeter [32] and MLDoc-
tor [29] can quantify membership privacy risk. They are based on
measuring the success rate of specific known MIAs [34, 38, 40, 46].
In addition, these tools use aggregate metrics such as accuracy,
precision and recall over all training data records, and are not de-
signed for quantifying individual, record-level membership privacy
risk. Song and Mittal [42] proposed a record-level probabilistic met-
ric (hereafter referred as SPRS) defined as the likelihood of a data
record being present in the target model’s training dataset. SPRS is
intended to be used by adversaries rather than model builders. It
also relies on specific MIAs.
Ideally, a membership privacy risk metric should capture the root
cause of MIAs, namely the memorization of training data records.
Such a metric will be independent of any specific attack and thus
be applicable to any future MIAs as well. Hence, there is a need
for a principled definition of membership privacy risk metric for
individual training data records.
We present SHAPr, a membership privacy risk metric designed
for model builders. The intuition behind SHAPr is that membership
privacy risk for a model can be estimated by measuring the extent
to which the model memorizes an individual training data record.
This is done by estimating the influence of each training data record
on the model’s utility using the leave-one-out training approach [13,
30]. However, directly using the leave-one-out approach for each
data record is computationally expensive [15, 21–23].
Shapley values are a well-known notion in game theory, used
to quantify the contributions of individuals within groups [39]. It
was shown to be capable of capturing the influence of training
data records on a model’s utility by approximating the leave-one-
out training [15, 22]. Crucially, Shapley values can be efficiently
computed in one go for every training data record (see Section 4)
without having to train two models for each training data record
(with and without that data record in the training dataset). Shapley
values have been recently used in the context of data valuation in
ML (to estimate economic value of a data record) [14, 15, 21, 22].
They have also been used for estimating attribute influence for
explainability [31]. We propose using Shapley values as the basis to
quantify the membership privacy risks of individual training data
records. We make the following contributions:
(1) SHAPr1, the first efficiently computable metric with a principled
approach (independent of specific MIAs), using Shapley values,
for estimating membership privacy risk for individual training
data records. (Section 4)
(2) Showing that
1We plan to open-source our implementation.
• SHAPr is effective at assessing susceptibility of a training
data record to state-of-the-art MIAs (0.93 ± 0.06 precision
and 0.88 ± 0.06 recall across ten benchmark datasets). This
is comparable to SPRS (adapted to the model builder’s setting);
(Section 6.1)
• unlike SPRS, SHAPr can correctly estimate how adding noise
to a subset of the training dataset impacts membership pri-
vacy risk, (Section 6.2);
• metrics like SPRS, based on specific MIAs, may not correctly
assess susceptibility to future attacks (Section 6.3); and
• SHAPr scores can be computed significantly more efficiently
than the direct application of the leave-one-out approach.
(Section 6.4)
(3) Demonstrating that SHAPr is versatile in that it
• can be used to estimate fairness of the distribution of mem-
bership privacy risk across subsets of a dataset grouped ac-
cording to different attribute values (Section 7.1), and
• inherits other established applications of Shapley values like
data valuation in data marketplaces [15, 21–23]. (Section 7.2)
(4) Using SHAPr to show that removing data records with high
membership privacy risk does not necessarily reduce risk for
the remaining data records, confirming the observation by Long
et al. [30], but on a broader scale, using ten large datasets (vs.
one), and exploring the effect of removing up to 50% of training
data records (vs. 2%). (Section 8)
2 BACKGROUND
Consider a training dataset $D_{tr} = \{x_i, y_i\}_{i=1}^{n}$ containing input
features $x_i \in X$ and corresponding classification labels $y_i \in Y$, where
$X$ and $Y$ are the spaces of all possible inputs and corresponding
labels. A machine learning (ML) classifier is a model $f_\theta$ which maps
the inputs to the corresponding classification labels, $f_\theta : X \to Y$.
The function parameters $\theta$ are updated by minimizing the loss
between the model's prediction $f_\theta(x)$ on input $x$ and the true label
$y$. The loss is minimized using gradient-based algorithms such as
stochastic gradient descent.
2.1 Membership Inference Attacks
MIAs differentiate between members and non-members of the train-
ing dataset of a model using the output predictions of that model,
or some function of them. We identify four main types of MIAs
proposed in the literature:
Shadow Models [40]. Shokri et al. proposed the first attack that
uses an ML attack model to distinguish between a member and
non-member based on the predictions of the target model. The
attack assumes that the adversary has auxiliary data including
some training data records used by the target model. This auxiliary
data is used to train multiple shadow models to mimic the utility of
the target model. An attack ML model is then trained to distinguish
between members and non-members using the predictions of the
shadow models. Given a prediction from the target model for an
arbitrary input, the attack model can classify it as a member or a
non-member. This attack has two main drawbacks: first, it assumes
a strong adversary who has partial knowledge about the target
model’s training data, and second it incurs a high computational
overhead due to the need to train multiple (shadow) models.
Prediction Correctness [46]. An alternative approach, which makes
weaker assumptions regarding adversary capabilities, relies on the
fact that models which do not generalize well make correct pre-
dictions on training data records but not on testing data records.
The adversary decides that a data record is a member if it is cor-
rectly predicted by the target model, and a non-member otherwise.
This attack is particularly applicable in settings where the target
model outputs only a label. However, the attack works for poorly
generalizing models and assumes the adversary knows the ground
truth labels for the data records used to probe the target model.
Prediction Confidence [38, 46]. A third approach uses prediction
confidence reported by the target model across all classes. Given
an input data record, the target model outputs a vector describing
the confidence that the record belongs to each class. The maximum
confidence value is likely to be higher for an input data record that
was also part of the training set, than for one that was not [40,
41]. The prediction confidence attack relies on this observation: it
declares an input data record as a member if the associated highest
confidence is higher than an adversary-chosen threshold, and as
a non-member otherwise. Unlike Prediction Correctness attacks,
Prediction Confidence attacks do not require the adversary to have
any knowledge of the target model’s training data or the ground
truth for the input data record. However, it assumes that the target
model outputs confidence values for all classes.
Prediction Entropy [38, 40, 42]. Rather than relying on the max-
imum confidence value in the output prediction, an adversary may
resort to a more sophisticated function defined over the set of
confidence values in the prediction. The entropy in a model’s pre-
diction (i.e., information gain for the adversary) is the uncertainty
in predictions [38, 40]. The entropy differs for training and testing
data records which the adversary can use as the basis for deciding
whether an input data record was in the training set. For instance,
the output for a training data record is likely to be close to a one-hot
encoding, resulting in a prediction entropy close to zero. Testing
data records are likely to have higher prediction entropy values. As
with the previous method, the adversary can choose a threshold
for the prediction entropy to decide whether an input data record
is a member or not.
In the rest of this paper, we focus on Prediction Entropy attacks;
we use a modified entropy function as the basis for the state-of-the-
art MIA [42] (described in detail in Section 6.1).
2.2 Memorization in Deep Learning
Membership privacy risk (susceptibility to MIAs) occurs due to the
fact that deep learning models, with their inherent large capacity,
tend to memorize training data records [12, 34]. Memorization of
the $i^{th}$ data record $z_i = (x_i, y_i)$, with input features $x_i$ and label $y_i$,
in a training dataset $D_{tr}$ can be estimated as the difference in the
prediction of a model on input features $x_i$ when the model was
trained with and without $z_i$ in its training set [13]. Formally, for
a specific model $f_\theta$ drawn from the set of models for a training
algorithm $\mathcal{A}$, this can be written as follows:

$$mem(z_i, D_{tr}, \mathcal{A}) = \left| \Pr_{f_\theta \sim \mathcal{A}(D_{tr})}[f_\theta(x_i) = y_i] - \Pr_{f_\theta \sim \mathcal{A}(D_{tr} \setminus z_i)}[f_\theta(x_i) = y_i] \right| \qquad (1)$$
If $mem(z_i, D_{tr}, \mathcal{A})$ is high, the model is likely to have memorized
$z_i$. The above formulation of memorization is a notion of leave-
one-out stability, which captures the extent to which the presence
of a record in the training dataset influences the model's output
predictions [13].
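To make Equation 1 concrete, the following is a minimal sketch that estimates the memorization of a single record via the direct leave-one-out approach (the expensive baseline that Section 2.3 avoids). The train_fn callable and the averaging over n_runs training runs are illustrative assumptions, not part of the original formulation:

    import numpy as np

    def memorization_score(train_fn, D_tr, z_i, n_runs=10):
        """Estimate mem(z_i, D_tr, A) from Equation 1 by retraining
        with and without z_i and comparing correct-prediction rates."""
        x_i, y_i = z_i
        D_without = [z for z in D_tr if z is not z_i]
        p_with, p_without = 0.0, 0.0
        for _ in range(n_runs):  # average over training randomness
            f_with = train_fn(D_tr)          # model trained with z_i
            f_without = train_fn(D_without)  # model trained without z_i
            p_with += float(np.argmax(f_with(x_i)) == y_i)
            p_without += float(np.argmax(f_without(x_i)) == y_i)
        return abs(p_with - p_without) / n_runs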
2.3 Shapley Values
An alternative approach to capture the influence of a training data
record is by estimating Shapley values [15, 21–23]. Shapley values
are of the form

$$\phi_i = \frac{1}{|D_{tr}|} \sum_{S \subseteq D_{tr} \setminus \{z_i\}} \frac{1}{\binom{|D_{tr}|-1}{|S|}} \left[ U(S \cup \{z_i\}) - U(S) \right] \qquad (2)$$

where $S$ is a randomly chosen subset of $D_{tr} \setminus \{z_i\}$ and $U(S)$ (the
accuracy of $f_\theta$ on a testing dataset $D_{te}$ when trained on $S$) is a
utility metric. $\binom{|D_{tr}|-1}{|S|}$ denotes the binomial coefficient for choosing
$|S|$ elements from a set of $|D_{tr}|-1$ elements. Here, the Shapley
value of $z_i$ is defined as the average marginal contribution of $z_i$ to
$U(S)$ over all training data subsets $S \subseteq D_{tr} \setminus \{z_i\}$. Evaluating the
Shapley function naïvely for all possible subsets with and without
$z_i$ is computationally expensive (complexity of $O(2^{|D_{tr}|})$ for $|D_{tr}|$
data records [23]) and not scalable (leading to the same problem as
with naïve leave-one-out) [13, 30].
However, Shapley values can be efficiently computed using a
$K$-Nearest Neighbours ($K$-NN) classifier as a surrogate model [23].
Unlike the naïve approach to computing Shapley values, which
requires training two models for each training data record, the
$K$-NN model, once trained, can be used to compute the Shapley
values for all training data records. This improves the computational
complexity to $O(|D_{tr}| \log(|D_{tr}| \cdot |D_{te}|))$ compared to the exponential
complexity of the formulation in Equation 2. We now outline this
approach [23]2. For a given $z_i$, we can first compute the partial
contribution $\phi_i^{test}$ of a single test data record $z_{test}$ to the Shapley
value $\phi_i$ of $z_i$, and then add up these partial contributions across
the entire $D_{te}$.
Step 1. The training of the $K$-NN classifier consists of passing $D_{tr}$
and a single testing data record $z_{test} = (x_{test}, y_{test}) \in D_{te}$ as
an input to the target classifier $f_\theta$, which outputs the probability
scores across all classes. The outputs $f_\theta(D_{tr})$ and $f_\theta(x_{test})$ and
their corresponding true labels are used for further computation.
Step 2. During inference, for $z_{test}$, the $K$-NN classifier identifies
the top $K$ closest training data records $(x_{\alpha_1}, \cdots, x_{\alpha_K})$ with labels
$(y_{\alpha_1}, \cdots, y_{\alpha_K})$ using the distance between the predictions
$(f_\theta(x_{\alpha_1}), \cdots, f_\theta(x_{\alpha_K}))$ and $f_\theta(x_{test})$. We use $\alpha_j(S)$ to indicate the
index of the training data record, among all data records in $S$,
whose output prediction is the $j^{th}$ closest to $f_\theta(x_{test})$. For brevity,
$\alpha_j(D_{tr})$ is written simply as $\alpha_j$. Following prior work on data
valuation [21, 23], we use $K = 5$.
Step 3. The $K$-NN classifier assigns the majority label of the
top $K$ training data records as the label for $x_{test}$. The probability
of the classifier assigning the correct label is given as $P[f_\theta(x_{test}) =
y_{test}] = \frac{1}{K} \sum_{i=1}^{K} \mathbb{1}[y_{\alpha_i} = y_{test}]$. Hence, the utility of the classifier
with respect to the subset $S$ and the single test data record $z_{test}$ is
computed as $U^{test}(S) = \frac{1}{K} \sum_{k=1}^{\min\{K, |S|\}} \mathbb{1}[y_{\alpha_k(S)} = y_{test}]$.
2Source code: https://github.com/AI-secure/Shapley-Study
Step 4. Consider all data records in $D_{tr}$ sorted as above:
$\{\cdots, z_{\alpha_{i-1}}, z_{\alpha_i}, z_{\alpha_{i+1}}, \cdots\}$. From Equation 2, the difference between
the partial contributions for two adjacent data records $z_{\alpha_i}, z_{\alpha_{i+1}} \in D_{tr}$
is given by

$$\phi_{\alpha_i}^{test} - \phi_{\alpha_{i+1}}^{test} = \frac{1}{|D_{tr}|-1} \sum_{S \subseteq D_{tr} \setminus \{z_{\alpha_i}, z_{\alpha_{i+1}}\}} \frac{U^{test}(S \cup \{z_{\alpha_i}\}) - U^{test}(S \cup \{z_{\alpha_{i+1}}\})}{\binom{|D_{tr}|-2}{|S|}} \qquad (3)$$

Using the $K$-NN utility function: $U^{test}(S \cup \{z_{\alpha_i}\}) - U^{test}(S \cup \{z_{\alpha_{i+1}}\}) = \frac{\mathbb{1}[y_{\alpha_i} = y_{test}] - \mathbb{1}[y_{\alpha_{i+1}} = y_{test}]}{K}$.

Once the label for $x_{test}$ is assigned, the partial contributions can
be computed recursively, starting from the farthest data record:

$$\phi_{\alpha_{|D_{tr}|}}^{test} = \frac{\mathbb{1}[y_{\alpha_{|D_{tr}|}} = y_{test}]}{|D_{tr}|} \qquad (4)$$

$$\phi_{\alpha_i}^{test} = \phi_{\alpha_{i+1}}^{test} + \frac{\mathbb{1}[y_{\alpha_i} = y_{test}] - \mathbb{1}[y_{\alpha_{i+1}} = y_{test}]}{K} \cdot \frac{\min\{K, i\}}{i} \qquad (5)$$

The fraction $\frac{\min\{K,i\}}{i}$ is obtained by simplifying the binomial
coefficient (the full derivation can be found in Theorem 1 of Jia et
al. [21]). The intuition behind Equation 5 is that the contribution of
$z_{\alpha_i}$ is 0 if the nearest neighbor of $z_{\alpha_i}$ in $S$ is closer to $z_{test}$ than $z_{\alpha_i}$,
and 1 otherwise. Using the above steps, we get a vector $\phi^{test}$ of size
$|D_{tr}| \times 1$ for each $z_{test}$. The recursive formulation in Equation 5 can be
extended across all of $D_{te}$ to obtain a matrix $[\phi_i^{test}]$ of size $|D_{tr}| \times |D_{te}|$.
The final Shapley values are obtained by aggregating the partial
contributions $\phi_i^{test}$ across $D_{te}$.
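The following is a minimal sketch of the recursion above (Equations 4 and 5), assuming the target model's prediction vectors on $D_{tr}$ and $D_{te}$ have already been collected; the function and variable names are ours, and the reference implementation is the Shapley-Study repository cited in footnote 2. The partial contributions are summed across $D_{te}$; averaging instead would only rescale all scores uniformly.

    import numpy as np

    def knn_shapley(preds_tr, y_tr, preds_te, y_te, K=5):
        """K-NN surrogate Shapley values (Equations 4 and 5).
        preds_tr: (n_tr, n_classes) target-model outputs on D_tr.
        preds_te: (n_te, n_classes) target-model outputs on D_te.
        Returns one score per training record, aggregated over D_te."""
        n_tr, n_te = len(y_tr), len(y_te)
        phi = np.zeros((n_tr, n_te))
        for j in range(n_te):
            # Step 2: sort training records by the distance between
            # the model's predictions f(x_alpha) and f(x_test).
            dist = np.linalg.norm(preds_tr - preds_te[j], axis=1)
            alpha = np.argsort(dist)  # alpha[0] is the nearest record
            match = (y_tr[alpha] == y_te[j]).astype(float)
            phi_j = np.zeros(n_tr)
            # Equation 4: start from the farthest record ...
            phi_j[-1] = match[-1] / n_tr
            # ... Equation 5: recurse towards the nearest record
            # (index i below is 0-based, i.e. 1-based rank i+1).
            for i in range(n_tr - 2, -1, -1):
                phi_j[i] = phi_j[i + 1] + \
                    (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
            phi[alpha, j] = phi_j  # map ranks back to training indices
        # Property P2 (additivity): aggregate the partial contributions.
        return phi.sum(axis=1)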
2.4 Song and Mittal’s Privacy Risk Scores
Song and Mittal [42] describe a membership privacy risk metric3
(which we refer to as SPRS) that defines the membership privacy risk
score of $z_i = (x_i, y_i)$ as the posterior probability that $z_i \in D_{tr}$, given
the output predictions of the model, $f_\theta(x_i)$. They compute the
score as $r(z_i) = P(z_i \in D_{tr} | f_\theta(x_i))$. This probability is computed
using Bayes' theorem as

$$r(z_i) = \frac{P(z_i \in D_{tr}) P(f_\theta(x_i) | z_i \in D_{tr})}{P(z_i \in D_{tr}) P(f_\theta(x_i) | z_i \in D_{tr}) + P(z_i \in D_{te}) P(f_\theta(x_i) | z_i \in D_{te})} \qquad (6)$$

They assume that a data record is equally likely to belong to the
training or testing dataset: $P(z_i \in D_{tr}) = P(z_i \in D_{te}) = 0.5$. The
membership privacy risk scores rely on training shadow models on
an auxiliary dataset to mimic the functionality of the target model.
The conditional probabilities $P(f_\theta(x_i) | z_i \in D_{tr})$ and
$P(f_\theta(x_i) | z_i \in D_{te})$ are then computed using the shadow models'
output predictions on the auxiliary training and testing datasets.
Further, instead of using a fixed-threshold prediction entropy MIA,
each class has its own threshold for deciding a data record's membership,
computed using the auxiliary dataset. The conditional probabilities
are estimated per class: $P(f_\theta(x_i) | z_i \in D_{tr}) = \{P(f_\theta(x_i) | z_i \in D_{tr}, y = y_i)\}$
across all class labels $y = y_i$.
Traditional MIAs require the adversary to sample arbitrary data
records to infer their membership status. SPRS is designed as a tool
for adversaries to identify data samples which are more likely to
be members instead of sampling a large number of data records.
3Source Code: https://github.com/inspire-group/membership-inference-evaluation/
blob/master/privacy_risk_score_utils.py
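A simplified sketch of how the SPRS score of Equation 6 could be computed, assuming the conditional probabilities are approximated by per-class histograms of an entropy-based statistic over shadow-model outputs; this is our interpretation, and the reference implementation (footnote 3) should be consulted for the exact estimation procedure:

    import numpy as np

    def sprs_scores(mentr_target, y_target, mentr_shadow_tr, y_shadow_tr,
                    mentr_shadow_te, y_shadow_te, bins=50):
        """Sketch of Equation 6 with prior P(member) = 0.5; the
        conditionals are per-class histogram densities of the shadow
        models' (modified) entropy values."""
        scores = np.zeros(len(y_target))
        for c in np.unique(y_target):
            # shared bin edges for class c over both shadow splits
            edges = np.histogram_bin_edges(
                np.concatenate([mentr_shadow_tr[y_shadow_tr == c],
                                mentr_shadow_te[y_shadow_te == c]]), bins=bins)
            p_tr, _ = np.histogram(mentr_shadow_tr[y_shadow_tr == c],
                                   bins=edges, density=True)
            p_te, _ = np.histogram(mentr_shadow_te[y_shadow_te == c],
                                   bins=edges, density=True)
            idx = np.clip(np.digitize(mentr_target[y_target == c], edges) - 1,
                          0, bins - 1)
            num = 0.5 * p_tr[idx]                 # P(member) * P(f(x)|member)
            den = num + 0.5 * p_te[idx]           # + P(non-member) term
            scores[y_target == c] = np.divide(
                num, den, out=np.full_like(num, 0.5), where=den > 0)
        return scores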
3 PROBLEM STATEMENT
Our goal is to design an efficient metric for model builders to assess
membership privacy risk, independent of specific MIAs. We lay
out the system and adversary models (Section 3.1), describe the
desiderata for such a metric (Section 3.2), and outline the challenges
in designing such a metric by discussing limitations of prior work
(Section 3.3).
3.1 System and Adversary Model
System Model. We consider the perspective of a model builder
M who trains a model using a dataset contributed to by multiple
participants. M wants to estimate the susceptibility of individual
data records to MIAs. M has full access to the training ($D_{tr}$) and
testing ($D_{te}$) datasets and can use them to compute the membership
privacy risk score.
Adversary Model. The ground truth for the membership privacy
risk metric for a given training data record is the degree to which
an actual state-of-the-art MIA [38, 40, 42, 46] succeeds against that
record. We adapt the standard adversary model for MIAs [40, 42].
The adversary A has access to the prediction interface of a model
$f_\theta$ built using a training dataset $D_{tr}$. A submits data records via
the prediction interface and receives model outputs (this is a widely
adopted setting for cloud-based ML models in the industry). Given
an input data record 𝑥 , A can only observe the final output pre-
diction $f_\theta(x)$. The MIAs considered use the full prediction vec-
tor [40, 42] instead of the prediction labels [11, 26]. A does not
know the underlying target model architecture. Furthermore, we
assume that A does not know 𝐷𝑡𝑟 but has access to an auxiliary
dataset 𝐷𝑎𝑢𝑥 sampled from the same distribution as 𝐷𝑡𝑟 .
3.2 Requirements
We identify three requirements for an effective record-level mem-
bership privacy risk metric:
R1 Principled Definition. Membership privacy risk scores re-
sulting from the metric must be independent of specific MIAs,
thereby allowing the scores to assess the risks associated with
any MIA, current or future. (Sections 4 and 6.3)
R2 Correlation with attack success. The estimated risk score
of a particular data record must correlate with the likelihood
of success of the best MIA against that record. (Sections 6.1
and 6.2).
R3 Versatility. The membership privacy risk score, once com-
puted, should be useful to estimate other characteristics of the
dataset such as fairness (defined as susceptibility of subgroups,
as determined by one or more attributes, to MIAs; Section 7.1)
and economic value (as defined in prior work on Shapley scores;
Section 7.2).
3.3 Limitations of Existing Metrics
Privacy assessment libraries such as MLPrivacyMeter [32] and ML-
Doctor [29] quantify the membership privacy risk using existing
MIAs. They use aggregate metrics such as accuracy, precision and re-
call for MIAs across all training data records, and are not optimized
for estimating the privacy risks of individual data records [42].
Song and Mittal propose SPRS which is a probabilistic mem-
bership privacy risk metric for individual data records [42]. SPRS
computes its estimates using a specific MIA. The more effective the attack against
a particular data record, the higher the score. Hence, it is unclear
whether SPRS reflects vulnerability to future, more effective attacks.
Additionally, SPRS does not satisfy the versatility requirement R3
(c.f. Sections 6.2 and 7.1).
Long et al. [30] propose Differential Training Privacy as a mem-
bership privacy metric based on the direct leave-one-out approach:
computing the difference between model predictions with and with-
out a given training record in the training dataset and hence, the
influence of that record on the model utility. However, as we saw in
Section 2.3, naïve application of the leave-one-out approach cannot
scale to large datasets and models since it requires retraining the
model to estimate the score for each data record.
4 SHAPR: A MEMBERSHIP PRIVACY RISK
METRIC
Shapley values (Section 2.3), originally designed as a game-theoretic
notion to quantify the contributions of individuals within groups [39],
were previously proposed for data valuation [14, 15, 21, 22] and
explainability [31]. We propose SHAPr: using Shapley values to es-
timate individual record-level membership privacy risk associated
with an ML model. SHAPr scores are computed using a 𝐾-NN clas-
sifier trained on the output predictions of the target ML classifier.
SHAPr scores inherit certain properties from Shapley values
which allows SHAPr to satisfy the requirements introduced in
Section 3. In the context of membership privacy risk scores, these
properties can be formulated as follows:
P1 Interpretable. The SHAPr score $\phi_i$ of a data record $z_i =
(x_i, y_i)$ is captured by measuring how $z_i$'s addition to a training
dataset $S$ influences the utility $U()$ of the resulting model (Equation 2).
Consequently, no influence (i.e., $U(S) = U(S \cup z_i)$) leads
to a zero score. Similarly, if two data records $z_i$ and $z_j$ have the
same influence (i.e., $U(S \cup z_i) = U(S \cup z_j)$), then they are assigned
the same score. We can identify three ranges of SHAPr
scores that have associated semantics:
(a) Case 1: $U(S \cup \{z_i\}) = U(S) \rightarrow \phi = 0$: There is no difference
in the model's output regardless of the presence of $z_i$ in
the training dataset: $z_i$ has no membership privacy risk.
(b) Case 2: $U(S \cup \{z_i\}) > U(S) \rightarrow \phi > 0$: $z_i$ contributed to
increasing the model utility. Higher scores indicate a higher
likelihood of memorization, which increases the susceptibility
to MIAs.
(c) Case 3: $U(S \cup \{z_i\}) < U(S) \rightarrow \phi < 0$: $z_i$ was harmful
to the model's utility (not learnt well by the model or is
an outlier). It has a higher loss and is indistinguishable
from testing data records, which makes it less susceptible
to MIAs.
This clear semantic association allows us to set meaningful
thresholds for SHAPr scores that can be used to decide whether
a data record is susceptible to MIAs.
P2 Additive. $\phi_i$ is computed using the testing dataset $D_{te}$. Specifically,
$\phi_i(U_k)$ represents the influence of $z_i$ on the utility $U()$ with
respect to the $k^{th}$ testing data record. For two testing data records $k$ and $l$,
$\phi_i(U_{\{k,l\}}) = \phi_i(U_k) + \phi_i(U_l)$. Hence, $\phi_i$ is the sum of the membership
privacy risk scores of $z_i$ with respect to each testing data
record. This property further implies group rationality, where
the entire model utility is fairly and completely distributed
amongst all the training data records.
P3 Heterogeneous. Different data records have different influence
on the model's utility and hence have varying susceptibility
to MIAs (referred to as “heterogeneity”). SHAPr assigns
scores to training data records based on their individual influence
on the model's utility. This is referred to as equitable
distribution of utility among the training data records in prior
work [22].
We will refer back to these properties while interpreting the
results of our experiments (Sections 6 and 7).
5 EXPERIMENT SETUP
Using several datasets (Section 5.1), we systematically evaluated the
effectiveness of SHAPr with respect to the requirements identified
in Section 3. In the rest of this section, we describe the model
architecture (Section 5.2), the state-of-the-art MIA (Section 5.3), and
the metrics (Section 5.4) we used in our experiments.
5.1 Datasets
We used ten datasets for our experiments. We used the same number
of training and testing data records for all the datasets, except in
the case of MNIST and FMNIST, where we used the entire dataset to
ensure the utility of the resulting model is sufficiently high. Three
datasets, TEXAS, LOCATION and PURCHASE, were also used to
evaluate SPRS [42] – we refer to them as SPRS datasets. To facilitate
comparison with SPRS, we used the same dataset partitions for the
three SPRS datasets as described in [42]. We summarize the dataset
partitions in Table 1.
Table 1: Summary of dataset partitions for our experiments.

Dataset      Training Set Size   Testing Set Size
SPRS Datasets
LOCATION     1000                1000
PURCHASE     19732               19732
TEXAS        10000               10000
Additional Datasets
MNIST        60000               10000
FMNIST       60000               10000
USPS         3000                3000
FLOWER       1500                1500
MEPS         7500                7500
CREDIT       15000               15000
CENSUS       24000               24000
We briefly describe each of the ten datasets, starting with the
SPRS datasets:
LOCATION contains the location check-in records of individu-
als [4]. We used the pre-processed dataset from [40] which con-
tains 5003 data samples with 446 binary features corresponding to
whether an individual has visited a particular location. The data is
divided into 30 classes representing different location types. The
classification task is to predict the location type given the location
check-in attributes of individuals. As in prior work [20, 42], we
used 1000 training data records and 1000 testing data records.
PURCHASE consists of shopping records of different users [5].
We used a pre-processed dataset from [40] containing 197,324 data
records with 600 binary features corresponding to a specific product.
Each record represents whether an individual has purchased the
product or not. The data has 100 classes each representing the
purchase style for the individual record. The classification task is
to predict the purchase style given the purchase history. We used
19,732 train and test records as in prior work [42].
TEXAS consists of Texas Department of State Health Services’ in-
formation about patients discharged from public hospitals [6]. Each
data record contains information about the injury, diagnosis, the
procedures the patient underwent and some demographic details.
We used the pre-processed version of the dataset from [40], which
contains 100 classes of patients' procedures, consisting of 67,330 data
samples with 6,170 binary features. The classification task is to
predict the procedure given a patient's attributes. We used 10,000
train and test records as in prior work [20, 42].
Additionally, we used seven other datasets described below. We
rounded down the number of data records in each dataset to
the nearest 1000 and split it in half between the training and testing
datasets.
MNIST consists of a training dataset of 60,000 images and a test
dataset of 10,000 images that represent handwritten digits (0-9).
Each data record is a 28x28 grayscale image with a corresponding
class label identifying the digit. The classification task is to identify
the handwritten digits. We used the entire training and testing set.
FMNIST consists of a training dataset of 60,000 data records and a
test dataset of 10,000 data records that represent pieces of clothing.
Each data record is a 28x28 grayscale image with a corresponding
class from one of ten labels. The classification task is to identify the
piece of clothing.
USPS consists of 7291 16x16 grayscale images of handwritten dig-
its [7]. There are a total of 10 classes. The classification task is to
identify the handwritten digits. We used 3000 training data records
and 3000 testing data records.
FLOWER consists of 3670 images of flowers categorized into five
classes—chamomile, tulip, rose, sunflower, and dandelion—with
each class having about 800 320x240 images. The dataset was col-
lected from Flickr, Google Images and Yandex Images [3]. The
classification task is to predict the flower category given an image.
We used 1500 train and 1500 testing data records.
CREDIT is an anonymized dataset from the UCI Machine Learning
dataset repository which contains 30000 records with 24 attributes
for each record [2]. It contains information about different credit
card applicants, including a sensitive attribute: the gender of the
applicant. There are two classes indicating whether the application
was approved or not. The classification task is to predict whether
the applicant will default. We used 15000 training data records and
15000 testing data records.
MEPS contains 15830 records of different patients that used medi-
cal services, and captures the frequency of their visits. Each data
record includes the gender of the patient, which is considered a sen-
sitive attribute. The classification task is to predict the utilization
of medical resources as “High” or “Low” based on whether the total
number of patient visits is greater than 10. We use 7500 training
data records and 7500 testing data records.
CENSUS consists of 48842 data records with 103 attributes about
individuals from the 1994 US Census data obtained from UCI Ma-
chine Learning dataset repository [1]. It includes sensitive attributes
such as gender and race of the participant. Other attributes include
marital status, education, occupation, job hours per week among
others. The classification task is to estimate whether the individ-
ual’s annual income is at least 50,000 USD. We used 24000 training
data records and 24000 testing data records.
5.2 Model Architecture
While the proposed SHAPr scores are compatible with all types of
machine learning models, we focus on deep neural networks in our
evaluation. We used a fully connected model with the following
architecture: [1024, 512, 256, 128, 𝑛] with tanh() activation functions
where 𝑛 is the number of classes. This model architecture has been
used in prior work on MIAs [20, 33, 34, 38, 40, 42].
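A sketch of this architecture follows; the text does not name a deep learning framework, so the PyTorch rendering and the helper name are our assumptions:

    import torch.nn as nn

    def make_target_model(n_features, n_classes):
        """Fully connected model [1024, 512, 256, 128, n] with tanh()
        activations, n being the number of classes."""
        dims = [n_features, 1024, 512, 256, 128]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.Tanh()]
        layers.append(nn.Linear(128, n_classes))  # class logits
        return nn.Sequential(*layers)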
5.3 Membership Inference Attack
We used a modification of the prediction entropy attack, as proposed
by Song and Mittal [42]. The prediction entropy attack (described in
Section 2.1) outputs a zero prediction entropy for data records whose
classification, correct or incorrect, is predicted with high confidence by the
model. For a given data record $(x, y)$, the modified entropy function

$$Mentr(f_\theta(x), y) = -(1 - f_\theta(x)_y)\log(f_\theta(x)_y) - \sum_{i \neq y} f_\theta(x)_i \log(1 - f_\theta(x)_i)$$

accounts for this problem. Here, $f_\theta(x)_y$ indicates the prediction
on record $x$ for the correct label $y$. $\mathcal{A}$ thresholds the modified
prediction entropy to determine the membership status:
$I_{ment}(f_\theta(x), y) = \mathbb{1}\{Mentr(f_\theta(x), y) \leq \tau_y\}$. Instead of using a fixed threshold of
0.5 over the prediction confidence, as in the original prediction
entropy attack (Section 2.1), the thresholds $\tau_y$ are adapted for each
class using the shadow models trained on the auxiliary dataset to
improve the attack accuracy. This modified entropy function along
with the adaptive threshold gives the best attack accuracy [42]. We
refer to this attack as $I_{ment}$.
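A direct transcription of the modified entropy function and the thresholded decision rule; the numerical-stabilization term eps is our addition:

    import numpy as np

    def mentr(probs, y, eps=1e-12):
        """Modified prediction entropy Mentr(f(x), y) of Song and Mittal.
        probs: (n, n_classes) prediction vectors; y: (n,) true labels."""
        n = len(y)
        p_y = probs[np.arange(n), y]                    # f(x)_y
        ent = -(1.0 - p_y) * np.log(p_y + eps)          # correct-class term
        mask = np.ones_like(probs, dtype=bool)
        mask[np.arange(n), y] = False
        p_other = probs[mask].reshape(n, -1)            # f(x)_i for i != y
        ent -= np.sum(p_other * np.log(1.0 - p_other + eps), axis=1)
        return ent

    def i_ment(probs, y, tau):
        """Declare a record a member if Mentr <= tau_y for its class,
        with per-class thresholds tau fit on the auxiliary data."""
        return (mentr(probs, y) <= tau[y]).astype(int)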
5.4 Metrics
For all the experiments, we used the accuracy of MIAs as the primary
metric along with the average membership privacy risk score. At-
tack accuracy is the percentage of data records with correct mem-
bership predictions by the attack model. Average membership
privacy risk score is the average over the membership privacy
risk scores assigned to training data records by a metric to evaluate
the membership privacy risk across a group of data records.
As in prior work [42], we used two additional metrics to measure
the success of the attack: precision and recall. Precision is the ratio
of true positives to the sum of true positives and false positives. This
indicates the fraction of data records inferred as members by the A
which are indeed members of the training dataset. Recall is the ratio
of true positives to the sum of true positives and false negatives.
This indicates the fraction of the training dataset’s members which
are correctly inferred as members by the A.
6 COMPARATIVE EVALUATION
We experimentally evaluated the effectiveness of SHAPr and SPRS
along two dimensions: their ability to (a) capture susceptibility to
MIAs (Section 6.1), (b) correctly assess the effectiveness of adding
noise to thwart MIAs (Section 6.2). We then tested how well SPRS
would perform against future MIAs by computing SPRS scores us-
ing an older MIA, and examining how that impacts the effectiveness
of SPRS in assessing susceptibility to the state-of-the-art MIA (Sec-
tion 6.3). Finally, we report on the performance cost of computing
SHAPr scores (Section 6.4).
Table 2 presents the baselines: the test accuracy of target models
trained with each dataset, and the attack accuracy of the state-of-the-art
MIA (modified entropy attack, $I_{ment}$) against each model.
Table 2: Test accuracy of target models for each dataset, and
corresponding attack accuracy using $I_{ment}$.

Dataset      Test Accuracy   $I_{ment}$ Attack Accuracy
SPRS Datasets
LOCATION     69.0            79.6
PURCHASE     84.65           65.1
TEXAS        49.92           83.0
Additional Datasets
MNIST        98.1            54.3
FMNIST       89.3            58.0
USPS         95.5            54.9
FLOWER       89.6            61.1
MEPS         84.0            62.4
CREDIT       79.9            58.6
CENSUS       82.2            56.5
6.1 Susceptibility to MIAs
Our goal was to evaluate how well the scores from a given met-
ric correlate with the success rate of MIAs. For SPRS, we used a
threshold of 0.5 that was shown to be optimal (according to the F1-
score [42]); data records with scores > 0.5 were deemed members.
SHAPr scores were thresholded at 0 (Property P1); data records
with scores > 0 were considered members. We used the results
of 𝐼𝑚𝑒𝑛𝑡 attacks as the ground truth for computing precision and
recall in order to assess both metrics.
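Concretely, the evaluation reduces to comparing thresholded metric scores against the attack's decisions; a minimal sketch (the function name and argument layout are ours):

    import numpy as np

    def metric_precision_recall(scores, attack_member_pred, threshold):
        """Records the metric flags (score > threshold) are the
        predictions; records the MIA identifies as members are the
        ground truth. SPRS uses threshold 0.5, SHAPr uses 0 (P1)."""
        flagged = scores > threshold
        truth = attack_member_pred.astype(bool)   # I_ment decisions
        tp = np.sum(flagged & truth)
        precision = tp / max(flagged.sum(), 1)
        recall = tp / max(truth.sum(), 1)
        return precision, recall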
For each dataset, we repeated the experiment ten times. We re-
port the mean precision, recall and their corresponding standard
deviations (Table 3). The results are color-coded: 1) orange indicates
that SPRS and SHAPr are comparable (similar mean and
small standard deviation); 2) red indicates that SPRS outperformed
SHAPr; and 3) green indicates that SHAPr outperformed SPRS.
Additionally, we report the statistical significance of this difference
(corresponding p-value of a Student's t-test). Our null hypothesis was
that both sets of results came from distributions with the same
mean. For p-value < 0.05, there is enough evidence to say that one
metric outperforms the other. For p-value < 0.01, the confidence
with which we can reject the null hypothesis is even stronger. Otherwise
(p-value > 0.05) we do not have enough evidence to say
that one metric consistently outperformed the other.
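The significance test is a standard two-sample t-test; a sketch with illustrative (not actual) per-run recall values:

    import numpy as np
    from scipy.stats import ttest_ind

    # Per-run recalls over the ten repetitions (illustrative values).
    recalls_sprs = np.array([0.57, 0.55, 0.58, 0.56, 0.57,
                             0.59, 0.56, 0.58, 0.57, 0.55])
    recalls_shapr = np.array([0.94, 0.94, 0.95, 0.94, 0.93,
                              0.94, 0.95, 0.94, 0.94, 0.94])

    # Null hypothesis: both sets of results share the same mean;
    # p < 0.05 rejects it, p < 0.01 rejects it with stronger confidence.
    t_stat, p_value = ttest_ind(recalls_sprs, recalls_shapr)
    print(f"p-value: {p_value:.4g}")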
SHAPr has comparable precision to SPRS for six datasets, and
clearly outperformed SPRS for two (TEXAS and MEPS). On the
other hand, SPRS has better precision for two datasets (CREDIT
and CENSUS). In terms of recall, SHAPr outperformed SPRS scores
Table 3: SHAPr correlates with the success rate of MIAs.
Neither metric consistently outperforms the other. Orange
indicates comparable results, red indicates SPRS outperforms
SHAPr, and green SHAPr outperforms SPRS.

Dataset     Metric   Precision       p-value   Recall          p-value
SPRS Datasets
LOCATION    SPRS     0.96 ± 1e-16    >0.05     0.93 ± 1e-16    <0.01
            SHAPr    0.96 ± 0.000              0.85 ± 0.000
PURCHASE    SPRS     0.95 ± 1e-16    >0.05     0.80 ± 0.000    <0.01
            SHAPr    0.95 ± 1e-16              0.81 ± 0.000
TEXAS       SPRS     0.92 ± 1e-16    <0.01     0.95 ± 0.000    <0.01
            SHAPr    0.96 ± 1e-16              0.74 ± 1e-16
Additional Datasets
MNIST       SPRS     0.99 ± 0.002    <0.01     0.57 ± 0.013    <0.01
            SHAPr    0.99 ± 8e-4               0.94 ± 0.001
FMNIST      SPRS     0.99 ± 0.005    0.05      0.98 ± 0.026    <0.01
            SHAPr    0.99 ± 0.005              0.89 ± 0.026
USPS        SPRS     0.79 ± 0.201    0.84      0.76 ± 0.074    <0.01
            SHAPr    0.77 ± 0.230              0.98 ± 0.009
FLOWER      SPRS     0.98 ± 0.010    0.81      0.81 ± 0.040    <0.01
            SHAPr    0.98 ± 0.010              0.94 ± 0.008
MEPS        SPRS     0.96 ± 1e-16    <0.01     0.99 ± 0.000    <0.01
            SHAPr    0.97 ± 1e-16              0.91 ± 1e-16
CREDIT      SPRS     0.94 ± 0.006    <0.01     0.81 ± 2e-4     <0.01
            SHAPr    0.89 ± 0.004              0.92 ± 0.002
CENSUS      SPRS     0.98 ± 0.000    <0.05     1.00 ± 0.000    <0.05
            SHAPr    0.93 ± 0.000              0.84 ± 0.000
for five datasets, while SPRS outperformed SHAPr for the other
five.
It is worth noting that for a metric used to assess membership pri-
vacy risk, recall is more important than precision because minimiz-
ing false negatives (i.e., failing to correctly identify a training data
record at risk) is undesirable from a privacy perspective, whereas
false positives (i.e., incorrectly flagging a record as risky) consti-
tutes erring on the safe side. Both metrics performed comparably
in terms of recall.
Furthermore, we repeated the same experiment under the system
model used in [42] – the datasets used to compute the scores did not
overlap with the training data of the target model – and confirmed
that both metrics still perform comparably (See Appendix B).
6.2 Effectiveness of Adding Noise
A seemingly plausible way to thwart MIAs is to add noise to
(“perturb”) data records before training the model. The rationale is that
the adversary A, who wants to check the presence of a data record
in the training data, is likely to be thwarted because the A cannot
know what perturbation was added to that record.
We divided the original training set (“No Noise”) into two subsets
of equal size: 1) a clean subset without any noise and 2) a noisy sub-
set with perturbed samples. We crafted the noise using FGSM [16],
and tested two different amounts $\epsilon = \{64/255, 128/255\}$ (under $\ell_\infty$).4

[Figure 1: For MNIST and USPS, panels (a)–(h) plot the average SPRS
and SHAPr scores of the clean and noisy subsets against the noise
level (None, 64/255, 128/255) together with the attack accuracy, and
show histograms of the score distributions.]

Figure 1: Effect of adding noise. The gray lines for attack
accuracy, forming a “sideways-V” shape, act as the ground
truth. Ideally, the privacy risk scores (black lines) should exhibit
the same shape. This is observed for SHAPr but not for SPRS.
In the experiment, A used an auxiliary dataset D𝑎𝑢𝑥 that was
identical to the original “No Noise” dataset used to mount the attack.
We assumed this complete overlap between M’s and A’s data as it
corresponds to the scenario in which M is trying to estimate the
privacy risk of their own model.
4Adding Gaussian noise led to similar behavior as with FGSM.
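A minimal PyTorch sketch of the FGSM perturbation used here, assuming inputs normalized to [0, 1]; the function name is ours:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps):
        """FGSM noise (Goodfellow et al. [16]) at strength eps under
        the l-infinity norm, e.g. eps = 64/255 or 128/255."""
        x = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # one signed-gradient step, clipped back to the valid pixel range
        return (x + eps * x.grad.sign()).clamp(0, 1).detach()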
When a subset of the data was perturbed, the attack accuracy of
the clean subset increased while that for the noisy subset decreased.
This “sideways-V” shape shown with gray lines in Figure 1 for the
attack accuracy acts as the ground truth.
Due to space constraints, we present the results only for two
datasets: MNIST and USPS. Nevertheless, we observed the same
trend for other datasets (see Appendix D). Adding noise reduces
the average privacy risk score of the noisy subset (Figure 1). These
noisy samples are more difficult to learn and contribute negatively
to the model utility. Hence, their corresponding SHAPr score is
lower. The more noise we add, the lower the SHAPr score, and the
lower the attack accuracy. However, as a result, clean data points
in D𝑎𝑢𝑥 become more vulnerable to MIAs as they become more
influential to the utility of the model.
The scores of a good membership privacy risk metric should
exhibit the same “sideways-V” shape as the ground truth. We note
that unlike SPRS scores, SHAPr satisfies this condition. For SPRS,
the average score is not impacted by the added noise. Also, there is
no consistent correlation between the score and the attack accuracy.
The lack of sensitivity of SPRS to training data noise can be attrib-
uted to clustering of SPRS scores around 0.5 indicating indecisive
membership (Figure 1 (c) and (g)). SHAPr scores are fine-grained
and heterogeneous (Property P3) which make them sensitive to
noise added to training data records.
6.3 Is SPRS future-proof?
Recall that to satisfy requirement R1, we compute SHAPr scores
independently of any particular MIA. SPRS on the other hand uses
the best known attack for computing scores. In the preceding sec-
tions, we used the same state-of-the-art MIA (𝐼𝑚𝑒𝑛𝑡 ) for computing
the ground truth as well as for computing SPRS scores.
To fairly assess whether the reliance on a specific attack will
affect the ability of SPRS to correctly assess susceptibility to future
MIAs, we simulated a “future proofness scenario”. We used 𝐼𝑚𝑒𝑛𝑡 to
compute the ground truth as before (simulating “future”), but used
the original prediction entropy function to generate SPRS scores
(simulating “past”).
Table 4 summarizes the results. The “Simulated” rows correspond
to a “past” where SPRS scores were computed using the original
prediction entropy attack, and the baseline is the same as in Table 3,
where A uses the modified prediction entropy attack as the MIA. As
mentioned previously, recall is the most important measure for a
membership privacy metric. We see that in all but two datasets, the
recall for SPRS scores drops sharply in the simulated past setting.
This indicates that SPRS scores may not be effective at assessing
susceptibility to unknown, future MIAs as SPRS relies on specific
MIAs for computing the scores.
6.4 Performance Evaluation of SHAPr
Next, we show that SHAPr scores can be computed in reasonable
time. Table 5 shows the average execution time for computing
SHAPr scores across datasets of different sizes over ten runs. We
ran the evaluation on an Intel Core i9-9900K CPU @ 3.60GHz with
65.78GB memory. We used the Python function time.time() from the
time library, which returns the time in seconds since the Unix epoch.
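A sketch of this timing harness; the wrapper name is ours, and the usage example assumes the knn_shapley sketch from Section 2.3:

    import time

    def timed(fn, *args, **kwargs):
        """Wall-clock timing as used for Table 5."""
        start = time.time()  # seconds since the Unix epoch
        result = fn(*args, **kwargs)
        return result, time.time() - start

    # e.g.: scores, seconds = timed(knn_shapley, preds_tr, y_tr,
    #                               preds_te, y_te)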
Table 4: Assessing “future proofness” of SPRS. The baseline
setting is the same as the SPRS scores reported in Table 3.
“Simulated” indicates the “past” setting where SPRS scores are
computed using the original prediction entropy function.
Red indicates datasets for which the recall of SPRS scores
drops sharply in the simulated “past” setting.

Dataset     Setting     Precision       Recall
SPRS Datasets
LOCATION    Baseline    0.96 ± 1e-16    0.93 ± 1e-16
            Simulated   0.95 ± 1e-16    0.97 ± 1e-16
PURCHASE    Baseline    0.95 ± 1e-16    0.80 ± 0.000
            Simulated   0.99 ± 1e-16    0.50 ± 1e-16
TEXAS       Baseline    0.92 ± 1e-16    0.95 ± 0.000
            Simulated   0.94 ± 6e-4     0.79 ± 0.002
Additional Datasets
MNIST       Baseline    0.99 ± 0.002    0.57 ± 0.013
            Simulated   0.99 ± 0.001    0.56 ± 0.028
FMNIST      Baseline    0.99 ± 0.005    0.98 ± 0.026
            Simulated   1.0 ± 0.000     0.64 ± 0.035
USPS        Baseline    0.79 ± 0.201    0.76 ± 0.074
            Simulated   0.86 ± 0.160    0.64 ± 0.050
FLOWER      Baseline    0.98 ± 0.010    0.81 ± 0.040
            Simulated   0.99 ± 0.006    0.66 ± 0.094
MEPS        Baseline    0.96 ± 1e-16    0.99 ± 0.000
            Simulated   0.94 ± 0.001    0.67 ± 6e-4
CREDIT      Baseline    0.94 ± 0.006    0.81 ± 2e-4
            Simulated   0.79 ± 0.032    0.39 ± 0.038
CENSUS      Baseline    0.98 ± 0.000    1.00 ± 0.000
            Simulated   0.99 ± 1e-16    0.28 ± 0.000
Table 5: Performance of SHAPr across different datasets.

Dataset     # Records   # Features   Execution Time (s)
SPRS Datasets
LOCATION    1000        446          130.77 ± 3.90
PURCHASE    19732       600          3065.58 ± 19.24
TEXAS       10000       6170         5506.79 ± 17.47
Additional Datasets
MNIST       60000       784          2747.41 ± 22.65
FMNIST      60000       784          3425.90 ± 34.03
USPS        3000        256          238.67 ± 1.74
FLOWER      1500        2048         174.27 ± 11.74
MEPS        7500        42           732.43 ± 4.95
CREDIT      15000       24           1852.66 ± 30.92
CENSUS      24000       103          3718.26 ± 18.25
Computation time for SHAPr scores ranges from ≈2 mins for the
LOCATION dataset to ≈91 mins for TEXAS. Since the scores
are computed once and designed for M with access to hardware
resources, these execution times are reasonable.
Long et al.'s naïve leave-one-out scores require training $|D_{tr}|$ additional
models (compared to training a single model for SHAPr) [30].
To benchmark, we used a subset of the LOCATION dataset with
100 training data records and found that SHAPr is ≈100x faster
than a straightforward leave-one-out based approach: 3640.21
± 244.08s (leave-one-out) vs. 34.65 ± 1.74s (SHAPr), averaged
across five runs.5
7 VERSATILITY OF SHAPR
We now explore the versatility (Requirement R3) of SHAPr in
terms of applicability in other settings such as estimating fairness
(Section 7.1) and data valuation (Section 7.2).
7.1 Fairness
Prior work has shown that different subgroups with a sensitive
attribute (e.g., race or gender) have disparate vulnerability to MIAs [45].
We evaluated whether SPRS and SHAPr can correctly identify this
disparity.
We used three datasets that have sensitive attributes: CENSUS,
CREDIT, and MEPS. CENSUS has two sensitive attributes, gender
and race, while CREDIT and MEPS have gender. For gender, the
majority class is “Male” and the minority class is “Female”. For race,
“White” is the majority class and “Black” the minority class. We
computed the ground truth MIA accuracy, separately for each class,
using 𝐼𝑚𝑒𝑛𝑡 .
Figure 2 shows that subgroups are disparately vulnerable to MIAs
(indicated in yellow). SHAPr can capture this difference (shown
in red) – the scores are higher for subgroups with higher attack
accuracy. On the other hand, SPRS scores stay the same for both
subgroups, regardless of the attack accuracy (shown in blue).
SHAPr scores are additive (Property P2) – we can compute
the membership privacy risk over subgroups by averaging the
scores within each subgroup. However, for SPRS scores there is no
semantically meaningful notion of adding or averaging probability
scores. We conjecture that the lack of the additivity property makes
SPRS ineffective at this task.
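Because SHAPr scores are additive (Property P2), subgroup-level risk can be computed by averaging per-record scores within each subgroup; a minimal sketch (names are ours):

    import numpy as np

    def subgroup_risk(shapr_scores, sensitive_attr):
        """Average SHAPr score per subgroup, e.g. sensitive_attr
        holding the gender or race value of each training record."""
        return {g: shapr_scores[sensitive_attr == g].mean()
                for g in np.unique(sensitive_attr)}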
7.2 Data Valuation
We briefly discuss the application of SHAPr for data valuation. We
did not carry out separate experiments but refer to the extensive
prior literature on the use of Shapley values for data valuation
[15, 21–23].
Two relevant properties of Shapley values are additivity (Prop-
erty P2) which includes group rationality, where the complete util-
ity is distributed among all training data records, and heterogeneity
(Property P3), which indicates equitable assignment of model util-
ity to training data records based on their influence. These make
Shapley values useful for data valuation [15, 22]. Since SHAPr uses
Shapley values, once computed, SHAPr scores can be used directly
for data valuation of both individual data records as well as groups
of data records.
SPRS was not designed to be additive (Property P2) and hence cannot
guarantee group rationality of scores among training data records. SPRS
scores are not heterogeneous (Property P3) either, which means equitable
assignment of privacy risk scores is not guaranteed (as discussed in
Section 4). We discuss the lack of heterogeneity in Appendix C,
visualizing the distribution of SPRS scores (Figure 5). Given the
lack of these properties (heterogeneity, additivity, group rationality,
and equitable assignment), we argue that SPRS is unlikely to be
applicable for data valuation.
5For larger datasets leave-one-out took unreasonably long.
[Figure 2: Bar plots for (a) CENSUS (Race), (b) CENSUS (Gender),
(c) CREDIT (Gender), and (d) MEPS (Gender), comparing SHAPr
scores, SPRS scores, and MIA attack accuracy (“AttAcc”) for the two
subgroups (Group 1 and Group 2) in each dataset.]

Figure 2: Different subgroups are vulnerable to MIAs to a
different extent. SHAPr scores (red) have the same trend
as the ground truth (yellow). However, SPRS scores (blue)
are the same for all subgroups. “AttAcc” refers to the MIA
accuracy.
8 PITFALLS OF DATA REMOVAL
Having established that SHAPr is an effective metric for measuring
membership privacy risk, we now describe an example application
where such a metric can be useful. In data valuation research, it is well
known that removing records with high Shapley values will harm
the utility of the model, and removing records with low values will
improve it [22, 23]. This raises the question of whether removing
records with high SHAPr scores improves the membership privacy
of a dataset by reducing its overall susceptibility to MIAs. To
explore this question, we removed a fraction (up to 50%) of records
with the highest SHAPr scores; a sketch of the procedure follows below.
We also randomly removed testing data records so as to keep the
same number of member and non-member records as in previous
experiments.
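A sketch of this removal experiment, assuming train_fn and score_fn callables for model training and SHAPr scoring; the subsampling of testing records mentioned above is omitted for brevity:

    import numpy as np

    def removal_experiment(train_fn, score_fn, X_tr, y_tr, X_te, y_te,
                           fractions=(0.1, 0.2, 0.3, 0.4, 0.5)):
        """For each fraction, drop the records with the highest SHAPr
        scores, retrain, and re-score the remaining records."""
        base = score_fn(train_fn(X_tr, y_tr), X_tr, y_tr, X_te, y_te)
        order = np.argsort(-base)               # highest risk first
        avg_scores = {}
        for frac in fractions:
            keep = order[int(frac * len(y_tr)):]  # drop riskiest frac
            model = train_fn(X_tr[keep], y_tr[keep])
            new = score_fn(model, X_tr[keep], y_tr[keep], X_te, y_te)
            avg_scores[frac] = new.mean()       # risk of the remainder
        return avg_scores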
[Figure 3: Average SHAPr scores of the remaining records (log
scale) plotted against the fraction of data removed (0.0–0.5) for all
ten datasets.]

Figure 3: Removing a fraction of training data records with
high SHAPr scores does not reduce the risk for the remaining
records.
Figure 3 summarizes the results. Removing an increasing num-
ber of records with high SHAPr scores does not necessarily reduce
the membership privacy risk for the remaining records. No consis-
tent upward (or downward) trend was visible for the scores of the
remaining records. Interestingly, depending on the number of re-
moved samples, the scores fluctuate. A possible explanation is that
once risky data records are removed, and a new model is trained
using the remaining records, their contribution to the utility of the
revised model changes, thereby changing their SHAPr scores.
Long et al. [30] observed a similar result. However, their experiment
was limited to a single small dataset (≈1000 training
data records) and minimal removal (only up to 2%). Thanks to
the superior efficiency of SHAPr, we are able to confirm that this
observation holds broadly across more datasets (10 vs. 1) and for
more extensive data removal (up to 50% vs. 2%).
9 RELATEDWORK
Efficient Computation of Shapley Values. Estimating the influ-
ence of a training data record on model utility is useful in settings
where data providers receive compensation for sharing their data
with model builders. This is referred to as data valuation. Shapley
values for ML were recently explored in the context of data valua-
tion [14, 15, 21, 22, 25]. Naïve approaches for computing Shapley
values are computationally expensive because they require retraining
the model for each training data record. There are two proposed variants
for approximate computation of Shapley values [15]: Monte Carlo
(drawing a random permutation of the training data and averaging
the marginal contribution of training data records over multiple
permutations) and using gradients (computing marginal contribu-
tion using gradients for each data record during training). However,
these metrics are still computationally expensive. In particular,
Monte Carlo based metrics still require training multiple models.
Defending Membership Privacy. Several defences against MIAs
have been proposed. Notably, they rely on regularization [33, 38, 40,
47], low precision models [12] or output perturbation with accuracy
guarantees [20]. However, recent attacks indicated that defences
such as adversarial regularization [33] and MemGuard [20] are
ineffective [42]. Hence, an effective defence against MIAs remains
an open problem.
Estimating Membership Privacy Risk. The adversary's advantage
was proposed as a metric for evaluating differential privacy mecha-
nisms using MIAs [46]. It was later improved by using recall instead
of attack accuracy [19]. However, both metrics compute aggregate
membership privacy risk across all data records and not for indi-
vidual records. Two metrics that are the closest to our work were
described earlier: SPRS [42] (Section 2.4) and Long et al. [30] (Sec-
tions 3.3 and 6.4).
Membership privacy was also studied in the context of generative
models [28], including for individual records. Additionally, it was
shown that Fisher information can be used to compute the influence
of attributes towards the model utility [17]. However, this is limited
to linear models with convex loss which does not apply to the neural
networks we consider. Also, this metric requires inverting a Hessian
which is computationally expensive for large models. Similarly, an
information theoretic metric [37] can be used to compute an upper
bound on the privacy risks of the PATE framework [35]. Even so, it
cannot be used as a standalone metric for record-level membership
privacy risk.
10 DISCUSSION
We now discuss alternatives to SHAPr and potential limitations.
10.1 Comparison with Influence Functions
Influence functions [25, 36] were proposed for explaining model
predictions. Since these are independent of specific membership
attacks, they could potentially be used to design an alternative,
principled (Property P1) metric for measuring membership privacy
risk. We now explore the viability of such designs.
We implemented Koh et al.'s influence function [25], which assigns
an influence score to each individual training data record with
respect to each prediction class. We adapt it to estimate membership
privacy risk by averaging the scores for each training data record
across all classes to estimate the overall influence of that training
data record.
We additionally implemented TracIN [36] which estimates the
influence of a training data record by computing the dot product
of the gradient of the loss for training and testing data records
computed from intermediate models saved during training. Given
$|D_{tr}|$ training data records and $|D_{te}|$ testing data records, TracIN
computes a score for each training data record corresponding to
each testing data record, resulting in a $|D_{tr}| \times |D_{te}|$ matrix. To
estimate the scores of training data records across the entire test
dataset, we averaged the values across all the testing data records
for each training data record.
Table 6: Effectiveness of blackbox influence functions
(KIFS [25]) and TracIN [36] as a metric for membership privacy
risk scores. Compared to SHAPr in Table 3, orange
indicates comparable results, red indicates poor results,
and green indicates better results. Computations that
took an unreasonably long time were omitted, indicated by “-”.

                 KIFS [25]                      TracIN [36]
Dataset      Precision       Recall          Precision        Recall
SPRS Datasets
LOCATION     0.92 ± 0.005    0.48 ± 0.013    0.96 ± 1.1e-16   0.20 ± 0.00
PURCHASE     0.94 ± 0.002    0.51 ± 0.005    -                -
TEXAS        0.91 ± 0.005    0.51 ± 0.034    -                -
Additional Datasets
MNIST        0.98 ± 0.015    0.30 ± 0.180    -                -
FMNIST       0.81 ± 0.081    0.49 ± 0.098    -                -
USPS         0.82 ± 0.257    0.33 ± 0.097    0.75 ± 0.226     0.42 ± 0.028
FLOWER       0.97 ± 0.017    0.51 ± 0.066    0.95 ± 0.029     0.46 ± 0.095
MEPS         0.95 ± 0.004    0.62 ± 0.053    0.96 ± 0.00      0.85 ± 0.00
CREDIT       0.85 ± 0.037    0.79 ± 0.030    -                -
CENSUS       0.94 ± 0.012    0.72 ± 0.115    -                -
For evaluation, we compute precision and recall by thresholding
Koh et al.'s influence function scores (referred to as KIFS) and TracIN
scores, and comparing them with the attack success of the modified
prediction entropy attack (𝐼𝑚𝑒𝑛𝑡 ) as the ground truth. We compare these
results in Table 6 with the results of SHAPr in Table 3. We use orange
to indicate comparable results, red to indicate poor results and
green to indicate better results compared to SHAPr.
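The evaluation step can be sketched as follows; the thresholding rule shown (a median split) is our illustrative assumption, not necessarily the rule used in the paper:

import numpy as np
from sklearn.metrics import precision_score, recall_score

# scores: per-record KIFS/TracIN (or SHAPr) scores.
# attack_success: 1 if the I_ment attack succeeded on that record
# (the ground truth), 0 otherwise.
def evaluate_scores(scores, attack_success, threshold=None):
    scores = np.asarray(scores)
    if threshold is None:
        threshold = np.median(scores)  # illustrative choice
    flagged = (scores >= threshold).astype(int)  # flagged as at-risk
    return (precision_score(attack_success, flagged),
            recall_score(attack_success, flagged))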
Overall, both TracIN and KIFS have poor recall across all the
datasets compared to SHAPr scores. Both KIFS and TracIN approx-
imate the influence of training data records [25, 36]. KIFS is well
defined for convex functions but not for large neural networks with
non-convex optimization [9]. Hence, the approximated influence
scores are often erroneous. We conjecture that this is a potential
reason for the lower recall of KIFS compared to SHAPr.
For TracIN, the overhead of computing per-sample influence
was high for large datasets (>7500 training data records); all
the datasets which took more than a day of computation were
omitted, indicated by "-" in Table 6. TracIN stores the model at
different epochs during training. For each of the 𝑁𝑚𝑜𝑑𝑒𝑙𝑠 models
saved during training, the dot product of the loss gradients over
each pair of training and testing data records is computed, resulting
in a complexity of O(𝑁𝑚𝑜𝑑𝑒𝑙𝑠 · |𝐷𝑡𝑟 | · |𝐷𝑡𝑒 |). The computational
overhead for KIFS is also high, on the order of O(|𝐷𝑡𝑟 | · |𝐷𝑡𝑒 |).
The poor recall and high cost of KIFS and TracIN (compared to
SHAPr) imply that they are not good candidates to base member-
ship privacy risk metrics on.
10.2 Impact of Backdoors
A backdoor to a machine learning model is a set of inputs chosen
to manipulate decision boundaries of the model. Backdoors can
be used for malicious purposes such as poisoning (e.g. [10]), or to
embed watermarks that allow model owners to claim ownership of
their model in case it gets stolen [8, 43]. A backdoor is created by
changing the label of several training data records [43], by adding
artifacts to the training data records themselves (e.g. overlay text
or texture to images [48]), or by introducing out-of-distribution
data [8] to the training data. A successfully embedded backdoor is
memorised during training, alongside the primary task of the model.
During verification, the model builder M queries the model
and expects matching backdoor predictions.
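For concreteness, here is a minimal sketch of the label-flipping variant; the function, parameters and trigger-set size are illustrative assumptions, not the procedure of [43]:

import numpy as np

# Illustrative label-flipping backdoor: relabel a small trigger set so
# the model memorizes it; the builder later queries these records to
# verify ownership. All names and sizes here are our assumptions.
def embed_backdoor(labels: np.ndarray, n_classes: int,
                   n_trigger: int = 50, seed: int = 0):
    rng = np.random.default_rng(seed)
    trigger_idx = rng.choice(len(labels), size=n_trigger, replace=False)
    backdoored = labels.copy()
    # Shift each trigger label by a random non-zero offset (mod
    # n_classes), guaranteeing an incorrect label.
    offsets = rng.integers(1, n_classes, size=n_trigger)
    backdoored[trigger_idx] = (labels[trigger_idx] + offsets) % n_classes
    return backdoored, trigger_idx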
Backdoors have a negative influence on model utility as they intro-
duce noise and make training more difficult. Hence, their SHAPr
scores are low. However, memorization of backdoors is required
for successful verification. In other words, backdoors behave differ-
ently from other data records in the context of SHAPr: they are,
by definition, memorized but unlike other memorized data records,
they are likely to have low SHAPr scores. This is not a concern in
our setting becauseM is the entity that computes SHAPr scores. If
a backdoor is inserted intentionally by M (e.g., for watermarking),
then M will know what they are. If a backdoor was inserted mali-
ciously (e.g., by a training data provider), there is no need to provide
any guarantees regarding the SHAPr score for those records.
10.3 Comparing Privacy Risk Across Datasets
SHAPr scores can be used to compare the relative privacy risk of
individual data records within a dataset. However, they are not de-
signed to compare membership privacy risk across different datasets.
This is because the scores rely on the testing dataset, which differs
from one dataset to another. Similarly, none of the other privacy met-
rics (like SPRS [42] or Long et al. [30]) designed for individual
privacy risk scores can be used to compare the privacy risk of two
different datasets either.
11 CONCLUSIONS
As a membership privacy metric, SHAPr scores can potentially
be used for evaluating the effectiveness of defences. Several prior
works have indicated that specific defences, adversarial regularization
and MemGuard, are ineffective [42], and that simple regulariza-
tion can be used to mitigate membership privacy risk [27, 47]. In
Appendix A, we show how SHAPr scores can be used to assess
the effectiveness of a defence against MIAs, using L2 regularization
as a demonstration. We leave similar evaluations of other defences
as future work.
Differential privacy (DP) is closely associated with the notion
of leave-one-out stability, where the goal is to minimize the difference
in model behaviour with and without a training data record. While
this has been used to reduce membership privacy risk, Jia et al. [23]
show that when a model is trained using DP, the overall Shapley
values across the training data records decrease. Evaluating
whether SHAPr scores can correctly assess the impact of DP on
susceptibility to MIAs is also left as future work.
REFERENCES
[1] Adult income census dataset. https://archive.ics.uci.edu/ml/datasets/adult. Accessed: 2021-11-27.
[2] Credit dataset. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data). Accessed: 2021-11-27.
[3] Flowers dataset. https://www.kaggle.com/alxmamaev/flowers-recognition. Accessed: 2021-11-27.
[4] Location dataset. https://sites.google.com/site/yangdingqi/home/foursquare-dataset. Accessed: 2021-11-27.
[5] Purchase dataset. https://www.kaggle.com/c/acquire-valued-shoppers-challenge. Accessed: 2021-11-27.
[6] Texas dataset. https://www.dshs.texas.gov/THCIC/Hospitals/Download.shtm. Accessed: 2021-11-27.
[7] USPS dataset. https://www.kaggle.com/bistaumanga/usps-dataset. Accessed: 2021-11-27.
[8] Adi, Y., Baum, C., Cisse, M., Pinkas, B., and Keshet, J. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (2018), pp. 1615–1631.
[9] Basu, S., Pope, P., and Feizi, S. Influence functions in deep learning are fragile, 2021.
[10] Chen, X., Liu, C., Li, B., Lu, K., and Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526 (2017).
[11] Choquette-Choo, C. A., Tramer, F., Carlini, N., and Papernot, N. Label-only membership inference attacks. In Proceedings of the 38th International Conference on Machine Learning (18–24 Jul 2021), M. Meila and T. Zhang, Eds., vol. 139 of Proceedings of Machine Learning Research, PMLR, pp. 1964–1974.
[12] Duddu, V., Boutet, A., and Shejwalkar, V. Gecko: Reconciling privacy, accuracy and efficiency in embedded deep learning. arXiv preprint arXiv:2010.00912 (2021).
[13] Feldman, V. Does learning require memorization? A short tale about a long tail. In Symposium on Theory of Computing (New York, NY, USA, 2020), STOC 2020, Association for Computing Machinery, pp. 954–959.
[14] Ghorbani, A., Kim, M., and Zou, J. A distributional framework for data valuation. In International Conference on Machine Learning (13–18 Jul 2020), H. D. III and A. Singh, Eds., vol. 119 of Proceedings of Machine Learning Research, PMLR, pp. 3535–3544.
[15] Ghorbani, A., and Zou, J. Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning (09–15 Jun 2019), K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97 of Proceedings of Machine Learning Research, PMLR, pp. 2242–2251.
[16] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2015).
[17] Hannun, A., Guo, C., and van der Maaten, L. Measuring data leakage in machine-learning models with Fisher information. arXiv preprint arXiv:2102.11673 (2021).
[18] House, W. Guidance for regulation of artificial intelligence applications. In Memorandum For The Heads Of Executive Departments And Agencies (2020).
[19] Jayaraman, B., Wang, L., Evans, D. E., and Gu, Q. Revisiting membership inference under realistic assumptions. Proceedings on Privacy Enhancing Technologies 2021 (2021), 348–368.
[20] Jia, J., Salem, A., Backes, M., Zhang, Y., and Gong, N. Z. MemGuard: Defending against black-box membership inference attacks via adversarial examples. In Conference on Computer and Communications Security (New York, NY, USA, 2019), CCS '19, Association for Computing Machinery, pp. 259–274.
[21] Jia, R., Dao, D., Wang, B., Hubis, F. A., Gurel, N. M., Li, B., Zhang, C., Spanos, C., and Song, D. Efficient task-specific data valuation for nearest neighbor algorithms. Proc. VLDB Endow. 12, 11 (July 2019), 1610–1623.
[22] Jia, R., Dao, D., Wang, B., Hubis, F. A., Hynes, N., Gürel, N. M., Li, B., Zhang, C., Song, D., and Spanos, C. J. Towards efficient data valuation based on the Shapley value. In International Conference on Artificial Intelligence and Statistics (16–18 Apr 2019), K. Chaudhuri and M. Sugiyama, Eds., vol. 89 of Proceedings of Machine Learning Research, PMLR, pp. 1167–1176.
[23] Jia, R., Wu, F., Sun, X., Xu, J., Dao, D., Kailkhura, B., Zhang, C., Li, B., and Song, D. Scalability vs. utility: Do we have to sacrifice one for the other in data importance quantification? In Conference on Computer Vision and Pattern Recognition (2021).
[24] Kazim, E., Denny, D. M. T., and Koshiyama, A. AI auditing and impact assessment: according to the UK Information Commissioner's Office. AI and Ethics (Feb 2021).
[25] Koh, P. W., and Liang, P. Understanding black-box predictions via influence functions. In International Conference on Machine Learning (06–11 Aug 2017), D. Precup and Y. W. Teh, Eds., vol. 70 of Proceedings of Machine Learning Research, PMLR, pp. 1885–1894.
[26] Li, Z., and Zhang, Y. Membership leakage in label-only exposures, 2021.
[27] Liu, J., Oya, S., and Kerschbaum, F. Generalization techniques empirically outperform differential privacy against membership inference, 2021.
[28] Liu, X., Xu, Y., Tople, S., Mukherjee, S., and Ferres, J. L. MACE: A flexible framework for membership privacy estimation in generative models. arXiv preprint arXiv:2009.05683 (2021).
[29] Liu, Y., Wen, R., He, X., Salem, A., Zhang, Z., Backes, M., Cristofaro, E. D., Fritz, M., and Zhang, Y. ML-Doctor: Holistic risk assessment of inference attacks against machine learning models. arXiv preprint arXiv:2102.02551 (2021).
[30] Long, Y., Bindschaedler, V., and Gunter, C. A. Towards measuring membership privacy. arXiv preprint arXiv:1712.09136 (2017).
[31] Lundberg, S. M., and Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (2017), pp. 4768–4777.
[32] Murakonda, S. K., and Shokri, R. ML Privacy Meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning. In Workshop on Hot Topics in Privacy Enhancing Technologies (HotPETs) (2020).
[33] Nasr, M., Shokri, R., and Houmansadr, A. Machine learning with membership privacy using adversarial regularization. In Conference on Computer and Communications Security (New York, NY, USA, 2018), CCS '18, Association for Computing Machinery, pp. 634–646.
[34] Nasr, M., Shokri, R., and Houmansadr, A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In IEEE Symposium on Security and Privacy (SP) (2019), pp. 739–753.
[35] Papernot, N., Abadi, M., Úlfar Erlingsson, Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. In Proceedings of the International Conference on Learning Representations (2017).
[36] Pruthi, G., Liu, F., Kale, S., and Sundararajan, M. Estimating training data influence by tracing gradient descent. In Advances in Neural Information Processing Systems (2020), H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, Eds., vol. 33, Curran Associates, Inc., pp. 19920–19930.
[37] Saeidian, S., Cervia, G., Oechtering, T. J., and Skoglund, M. Quantifying membership privacy via information leakage. arXiv preprint arXiv:2010.05965 (2020).
[38] Salem, A., Zhang, Y., Humbert, M., Berrang, P., Fritz, M., and Backes, M. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. In Network and Distributed Systems Security (2018).
[39] Shapley, L. S. 17. A Value for n-Person Games. Princeton University Press, 2016, pp. 307–318.
[40] Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. In IEEE Symposium on Security and Privacy (SP) (2017), pp. 3–18.
[41] Song, C., Ristenpart, T., and Shmatikov, V. Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (New York, NY, USA, 2017), CCS '17, Association for Computing Machinery, pp. 587–601.
[42] Song, L., and Mittal, P. Systematic evaluation of privacy risks of machine learning models. In 30th USENIX Security Symposium (USENIX Security 21) (Aug. 2021), USENIX Association.
[43] Szyller, S., Atli, B. G., Marchal, S., and Asokan, N. DAWN: Dynamic adversarial watermarking of neural networks. CoRR (2019).
[44] Tabassi, E., Burns, K. J., Hadjimichael, M., Molina-Markham, A., and Sexton, J. A taxonomy and terminology of adversarial machine learning. In NIST Interagency/Internal Report (2019).
[45] Yaghini, M., Kulynych, B., Cherubin, G., and Troncoso, C. Disparate vulnerability: On the unfairness of privacy attacks against machine learning. arXiv preprint arXiv:1906.00389 (2019).
[46] Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. In IEEE 31st Computer Security Foundations Symposium (CSF) (2018), pp. 268–282.
[47] Ying, Z., Zhang, Y., and Liu, X. Privacy-preserving in defending against membership inference attacks. In Workshop on Privacy-Preserving Machine Learning in Practice (New York, NY, USA, 2020), PPMLP'20, Association for Computing Machinery, pp. 61–63.
[48] Zhang, J., Gu, Z., Jang, J., Wu, H., Stoecklin, M. P., Huang, H., and Molloy, I. Protecting intellectual property of deep neural networks with watermarking. In ACM Symposium on Information, Computer and Communications Security (2018), pp. 159–172.
A EVALUATING EFFECTIVENESS OF
DEFENCES
We evaluate whether SHAPr can be used to verify the effective-
ness of defences against MIAs. Specifically, the average SHAPr
scores across all training data records should decrease when a
defence is applied, reflecting the decrease in empirical MIA accuracy.
Prior defences such as adversarial regularization [33] and Mem-
Guard [20] have been shown to be ineffective against MIAs [42].
Hence, we focus on L2 regularization, previously shown to be a valid
defence against MIAs [47]. As in Song and Mittal [42], we con-
sider the three datasets vulnerable to MIAs to evaluate SHAPr:
LOCATION, PURCHASE and TEXAS.
[Figure 4 plots omitted: bar charts for panels (a) LOCATION, (b) PURCHASE and (c) TEXAS, showing SHAPr scores (left y-axis) and accuracy (right y-axis) for SHAPr, AttAcc and ModelAcc, each with and without regularization, against a 50% random-guess baseline.]
Figure 4: Effectiveness of L2 regularization. L2 regulariza-
tion lowers the susceptibility to MIAs, which is captured by
SHAPr. The gain comes at a significant cost to model accu-
racy. "SHAPr" on the x-axis indicates the average SHAPr scores
with values on the left y-axis. "AttAcc" refers to the MIA
accuracy and "ModelAcc" refers to the model utility on the
testing dataset, both with values on the right y-axis.
We choose the regularization hyperparameter for highest MIA
resistance (attack accuracy close to random guess 50%). In Figure 4,
“AttAcc” refers to the empirical MIA accuracy with a random guess
of 50%, “ModelAcc” refers to the model utility on unseen test data
while “Scores” indicate the average SHAPr scores across all data
records. The evaluation is done with and without regularization
indicated as “Reg” and “NoReg” respectively (Figure 4).
We can see that the model utility degradation is significant when
attack accuracy approaches random guessing, resulting in a poor
privacy-utility tradeoff (shown previously [33]). However, the average
SHAPr scores across all data records decrease with regularization,
which corresponds to a decrease in the empirical MIA accuracy.
Hence, SHAPr can measure the effectiveness of defences against MIAs.
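As a minimal sketch of how the defence is applied (the model, data, and weight-decay value below are illustrative placeholders, not the paper's configuration):

import torch
import torch.nn as nn

# Toy model and data standing in for the target model and training set.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))

# weight_decay implements L2 regularization; in the experiment above,
# this hyperparameter is swept and chosen so that MIA accuracy is
# closest to the 50% random-guess baseline.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
# Average SHAPr scores are then recomputed on the regularized model and
# compared against the unregularized baseline, as in Figure 4.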
B EVALUATING ADVERSARY’S
PERSPECTIVE
We additionally compare SHAPr and SPRS in the adversary model
used by Song and Mittal, i.e., there is no overlap between the target
model's training data and the adversary A's auxiliary data. This is
the practical setting, from A's perspective, for which SPRS was
specifically designed.
The results are shown in Table 7, where orange indicates com-
parable results (same mean and small standard deviation, or p-
value > 0.05), red indicates that SPRS performs significantly better
and green indicates that SHAPr is significantly better. We use
Student's t-test over 10 runs to assess the statistical significance
of the results.
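A sketch of the significance test follows; the recall values below are placeholders, not measurements from the paper:

from scipy import stats

# Per-run recall for each metric over the 10 runs (placeholder values).
sprs_recall  = [0.84, 0.86, 0.85, 0.83, 0.87, 0.84, 0.85, 0.86, 0.84, 0.85]
shapr_recall = [0.89, 0.90, 0.88, 0.89, 0.91, 0.90, 0.89, 0.88, 0.90, 0.89]

# Two-sample Student's t-test; p > 0.05 is read as "comparable results".
t_stat, p_value = stats.ttest_ind(sprs_recall, shapr_recall)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")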
The results for SHAPr are still comparable to SPRS across all ten
datasets: SPRS and SHAPr each perform better on two datasets in
precision and on five datasets in recall (Table 7).
Table 7: SHAPr is comparable to SPRS (evaluated in the ad-
versary model of [42], representing the adversary's perspec-
tive with no overlap between D𝑎𝑢𝑥 and the target model's train
data). Orange indicates comparable results, red indicates
SPRS outperforms SHAPr and green indicates SHAPr outperforms
SPRS.

Dataset    Approach  Precision       p-value  Recall          p-value
SPRS Datasets
LOCATION   SPRS      0.97 ± 1e-16    >0.05    0.95 ± 0.000    <0.01
           SHAPr     0.97 ± 1e-16             0.87 ± 1e-16
PURCHASE   SPRS      0.98 ± 1e-16    <0.01    0.83 ± 1e-16    <0.01
           SHAPr     0.98 ± 0.000             0.81 ± 0.000
TEXAS      SPRS      0.99 ± 0.000    >0.05    0.93 ± 1e-16    <0.01
           SHAPr     0.99 ± 0.000             0.70 ± 1e-16
Additional Datasets
MNIST      SPRS      0.95 ± 0.001    <0.01    0.53 ± 0.043    <0.01
           SHAPr     0.98 ± 0.002             0.96 ± 0.001
FMNIST     SPRS      0.99 ± 0.004    0.021    0.84 ± 0.061    0.015
           SHAPr     0.99 ± 0.008             0.89 ± 0.007
USPS       SPRS      0.99 ± 0.003    0.99     0.70 ± 0.014    <0.01
           SHAPr     0.99 ± 0.002             0.97 ± 0.001
FLOWER     SPRS      0.97 ± 0.021    0.68     0.78 ± 0.108    <0.01
           SHAPr     0.97 ± 0.022             0.95 ± 0.008
MEPS       SPRS      0.97 ± 1e-16    <0.01    0.97 ± 1e-16    <0.01
           SHAPr     0.98 ± 1e-16             0.89 ± 0.000
CREDIT     SPRS      0.91 ± 0.037    <0.01    0.87 ± 0.064    0.04
           SHAPr     0.84 ± 0.029             0.92 ± 0.015
CENSUS     SPRS      0.90 ± 0.000    <0.05    1.00 ± 0.000    <0.05
           SHAPr     0.87 ± 0.000             0.86 ± 0.000
C COMPARING DISTRIBUTIONS
We visually compare SHAPr with SPRS by plotting the distributions
of SHAPr scores (in green) and SPRS scores (in blue) in Figure 5.
For several datasets, we observe that SPRS is centered at 0.5,
indicating that the membership likelihood for a large number of
training data records is inconclusive. Further, we note that the
distribution of SPRS scores is uneven: some values correspond
to several records while neighboring values correspond to none.
We conjecture that this is due to assigning prior probabilities
(Section 2.4) and estimating the conditional probabilities using
shadow models optimized to give the same output for multiple
similar data records. Compared to SPRS, SHAPr follows a more
even distribution (due to the heterogeneity property P3).
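A sketch of how such a comparison can be plotted; the score arrays below are synthetic placeholders shaped to mimic the observed distributions:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sprs_scores = rng.beta(8, 8, size=2000)          # clusters around 0.5
shapr_scores = rng.normal(0.0, 1e-4, size=2000)  # spread around zero

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(sprs_scores, bins=50, color="tab:blue")
ax1.set(xlabel="SPRS Scores", ylabel="Frequency")
ax2.hist(shapr_scores, bins=50, color="tab:green")
ax2.set(xlabel="SHAPr Scores")
fig.tight_layout()
plt.show()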
D EFFECTIVENESS OF ADDING NOISE
To supplement the noise-addition results presented earlier, we now
provide the results for the remaining datasets. As before, SPRS
scores do not follow the pattern exhibited by the ground truth
(attack accuracy) plots (Figure 6), while SHAPr scores match their
“sideways-V” shapes.
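A sketch of the noise-addition step, under our assumption of uniform noise on images normalized to [0, 1] (the actual noise distribution may differ):

import numpy as np

# Add bounded noise of a given magnitude (e.g. 64/255 or 128/255) to the
# selected training records; the model is then retrained and rescored.
def add_noise(images: np.ndarray, magnitude: float, seed: int = 0):
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-magnitude, magnitude, size=images.shape)
    return np.clip(images + noise, 0.0, 1.0)

noisy = add_noise(np.random.rand(16, 28, 28), magnitude=64 / 255)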
[Figure 5 plots omitted: histograms for panels (a)–(p), pairing SPRS and SHAPr score distributions for CREDIT, FMNIST, MEPS, FLOWER, LOCATION, CENSUS, PURCHASE and TEXAS; x-axes show SPRS/SHAPr scores, y-axes show frequency.]
Figure 5: Distribution of membership privacy risk scores for different datasets. SHAPr (green) equitably assigns privacy risk
to data records based on the model's memorization. SPRS (blue) assigns the same score to large numbers of data records,
centered around 0.5, resulting in inconclusive membership privacy risk estimates for those training data records.
[Figure 6 plots omitted: line plots for panels (a)–(p), pairing SPRS and SHAPr for FMNIST, CREDIT, MEPS, FLOWER, LOCATION, CENSUS, PURCHASE and TEXAS; x-axes show noise magnitude (None, 64/255, 128/255), left y-axes show SPRS/SHAPr scores, right y-axes show attack accuracy for clean and noisy records.]
Figure 6: Impact of adding noise to training data records on membership privacy. We expect the black lines (privacy risk scores)
to follow a “sideways-V” shape to match the gray lines (attack accuracy). We observe this to be true for SHAPr but not for SPRS.