
Towards Face Encryption by Generating Adversarial Identity Masks

Xiao Yang¹, Yinpeng Dong¹,³, Tianyu Pang¹, Hang Su¹,²*, Jun Zhu¹,²,³, Yuefeng Chen⁴, Hui Xue⁴

¹ Dept. of Comp. Sci. and Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University, Beijing, 100084, China
² Pazhou Lab, Guangzhou, 510330, China  ³ RealAI  ⁴ Alibaba Group

{yangxiao19, dyp17, pty17}@mails.tsinghua.edu.cn, {suhangss, dcszj}@mail.tsinghua.edu.cn, {yuefeng.chenyf, hui.xueh}@alibaba-inc.com

Abstract

As billions of pieces of personal data are shared through social media and networks, data privacy and security have drawn increasing attention. Several attempts have been made to alleviate the leakage of identity information from face photos, with the aid of, e.g., image obfuscation techniques. However, most of the existing results are either perceptually unsatisfactory or ineffective against face recognition systems. Our goal in this paper is to develop a technique that can encrypt personal photos such that they protect users from unauthorized face recognition systems but remain visually identical to the original version for human beings. To achieve this, we propose a targeted identity-protection iterative method (TIP-IM) to generate adversarial identity masks which can be overlaid on facial images, such that the original identities can be concealed without sacrificing the visual quality. Extensive experiments demonstrate that TIP-IM provides a 95%+ protection success rate against various state-of-the-art face recognition models under practical test scenarios. Besides, we also show the practical and effective applicability of our method on a commercial API service.

1. Introduction

The blooming development of social media and networks has brought a huge amount of personal data (e.g., photos) shared publicly. With the growing ubiquity of deep neural networks, these techniques dramatically improve the capabilities of face recognition systems to deal with personal data [6, 26, 37, 46], but, as a byproduct, also increase the potential risks of privacy leakage of personal information. For example, an unauthorized third party may scrabble and identify the shared photos on social media (e.g., Twitter, Facebook, LinkedIn, etc.) without the permission of their owners, resulting in cybercasing [23].

*Corresponding author

Figure 1. An illustrative example of targeted identity protection. When users share a photo $x^r$ on social media (e.g., Twitter, Facebook, etc.), unauthorized applications could scrabble this identity $y_0$ based on face recognition systems, resulting in the privacy leakage of personal information. Thus we provide an effective identity mask tool to generate a protected image $x^p$, which can conceal the corresponding identity by misleading the malicious systems into predicting it as a wrong target identity $y_t$ in an authorized or virtual target set, which can be provided by the service providers.

Therefore, it is imperative to provide users with an effective way to protect their private information from being unconsciously identified and exposed by excessive unauthorized systems, without affecting users' experience.

The past years have witnessed progress on face encryption in both the security and computer vision communities. Among the existing techniques, obfuscation-based methods are widely studied. Conventional obfuscation techniques [48], such as blurring, pixelation, darkening, and occlusion, are either perceptually unsatisfactory or ineffective against recognition systems [28, 31, 35]. The recent advance in generative adversarial networks (GANs) [15] provides an appealing way to generate more realistic images for obfuscation [14, 43, 44, 49, 25]. However, the resultant obfuscated images have significantly different visual appearances compared with the original images due to the exaggeration and suppression of some discriminative features, and occasionally exhibit unnatural outputs with undesirable artifacts [44].

Recent research has found that adversarial examples can evade the recognition of a face recognition (FR) system [52, 45, 16, 40] by overlaying adversarial perturbations on the original images [1]. It thus becomes appealing to apply an adversarial perturbation to conceal one's identity, even under the stricter constraint of impersonating some authorized or generated face images when available (e.g., given by the social media services). This provides a possible solution to specify the output, which may avoid an invasion of the privacy of other persons that could occur if the resultant image were recognized as an arbitrary identity¹. It should nevertheless be noted that although the adversarial perturbations generated by existing methods (e.g., PGD [22] and MIM [8]) have a small intensity change (e.g., 12 or 16 for each pixel in [0, 255]), they may still sacrifice the visual quality for human perception due to the artifacts illustrated in Fig. 2; similar observations are also elaborately presented in [54, 38], namely that $\ell_p$-norm adversarial perturbations cannot fit human perception well. Moreover, the current adversarial attacks mainly depend on either white-box control of the target system [40, 32] or a tremendous number of model queries [10], which are impractical in real-world scenarios (e.g., unauthorized face recognition systems on social media) for identity protection.

¹ A practical FR system obtains similarity rankings from the candidate image library, so the protected face could be mistaken for someone else.

In this paper, we incorporate some valuable considerations from a general user's perspective and propose to alleviate the identity leakage of personal photos on real-world social media. We focus in particular on face identification, a typical sub-task in face recognition whose goal is to identify a real face image within an unknown gallery identity set (see Sec. 3), since it can be adopted by unauthorized applications for recognizing the identity information of users. As illustrated in Fig. 1, face encryption aims to block the ability of automatic inference by malicious applications, making them predict a wrong authorized or virtual target provided by the service providers. In general, little is known about the face recognition system and no direct query access is possible. Therefore, we need to generate adversarial masks against a known surrogate model with the purpose of deceiving a black-box face recognition system. Moreover, we try not to affect the user experience when users share the protected photos on social media, while simultaneously concealing their identities from unauthorized recognition systems. Thus, the protected images should also look visually natural compared with the corresponding original ones; otherwise, undesirable artifacts may be introduced.

To address the aforementioned challenges, we propose a targeted identity-protection iterative method (TIP-IM) for face encryption against black-box face recognition systems. The proposed method generates adversarial identity masks that are both transferable and imperceptible. Good transferability implies that the masks can effectively deceive other black-box face recognition systems, while imperceptibility means that a photo manipulated by an adversarial identity mask looks visually natural to human observers. Specifically, to ensure that the generated images are not arbitrarily misclassified as other identities, we randomly choose a set of face images collected from the Internet as the specified targets in our experiments². Our method obtains superior performance against white-box and black-box face systems with multiple target identities via a novel iterative optimization algorithm.

² We choose face images from the Internet only for experimental illustration, which simulates the authorized or generated target set of face images provided by the service providers.

Extensive experiments under practical and challenging open-set³ test scenarios [26] demonstrate that our algorithm provides a 95%+ protection success rate against white-box face systems, and outperforms previous methods by a large margin even against various state-of-the-art algorithms. Besides, we also demonstrate its effectiveness in a real-world experiment on a commercial API service. Our main contributions are summarized as follows:

  • We incorporate valuable considerations for protecting privacy against unauthorized identification systems from the user's perspective, including targeted protection, natural outputs, black-box face systems, and an unknown gallery set.

  • We propose a targeted identity-protection iterative method (TIP-IM) to generate an adversarial identity mask, in which we consider multi-target sets and introduce a novel optimization mechanism to guarantee effectiveness under various scenarios.

³ The testing identities are disjoint from the training sets, which is regarded as the opposite of the closed-set classification task.

2. Related Work

In this section, we review related work on face encryption. Typical information encryption [27, 41, 24] requires encoding messages to protect information from exposure to unauthorized parties, whereas face encryption aims to protect users' facial information from being unconsciously identified and exposed by unauthorized AI recognition systems. We provide a comprehensive comparison between the previous methods and ours in Tab. 1.

Obfuscation-based methods. Several works have been developed to protect private identity information in personal photos against face or person recognition systems. Earlier works [48, 34] study the performance of these systems under various simple image obfuscation methods, such as blurring, pixelation, darkening, occlusion, etc. These methods have been shown to be ineffective against current recognition systems [31, 28, 25], since the latter can adapt to the obfuscation patterns. More sophisticated techniques have been proposed thereafter. For example, generative adversarial networks (GANs) [15] provide a useful way to synthesize realistic images on the data distribution for image obfuscation [49]. In [43], the obfuscated images are generated by head in-painting conditioned on the detected face landmarks. However, these image obfuscation methods often change the visual appearances of face images and even lead to unnatural outputs, limiting their utility for users.

| | Evasion [48] | PP-GAN [49] | Inpainting [43] | Replacement [44] | Eyeglasses [40] | Evolutionary [10] | LOTS [36] | GAMAN [32] | Ours |
|---|---|---|---|---|---|---|---|---|---|
| Unknown gallery set | No | No | No | No | No | Yes | No | No | Yes |
| Target identity | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
| Black-box model | Yes | Yes | Yes | Yes | No | Yes (Queries) | No | No | Yes |
| Natural outputs | No | Yes | Yes | Yes | No | No | Yes | No | Yes |
| Same faces | Partially | Partially | No | No | Yes | Yes | Yes | Yes | Yes |

Table 1. A comparison among different methods w.r.t. the unknown gallery set, targeted misclassification of the output faces, black-box face models, natural outputs, and whether the output faces are recognized as the same identities as the original ones by human observers.

Adversarial methods. Deep neural networks are susceptible to adversarial examples [45, 16, 7, 33], and so are face recognition models [40, 10, 51]. Fawkes [39] fools unauthorized facial recognition models by introducing adversarial examples into the training data. A recent work [32] proposes to craft protected images from a game-theoretic perspective. However, our work differs from these previous works in three aspects. First, we focus on unknown face systems without changing the training data [39], while [32] assumes white-box access to the target model. Second, we consider the open-set face identification protocol with an unknown gallery set rather than a closed-set classification scenario; ours provides a better protection success rate against unknown recognition systems in more practical open-set scenarios. Third, we have the ability to control the naturalness of the protected images under the $\ell_p$ norm.

Differential privacy. As one of the popular definitions of privacy, differential privacy (DP) [11, 12] has been introduced in the context of machine learning and data statistics. It requires that the returned information about an underlying dataset is robust to any change of one individual, thus protecting the privacy of entities. Along this routine, many promising DP techniques [12, 29] and practical applications [3, 2] of DP have been developed. While DP withholds the existence of entities in a dataset, in this paper we focus on concealing the identity of a single one by exploiting the vulnerability of neural networks [45].

3. Adversarial Identity Mask

Let $f(x): \mathcal{X} \to \mathbb{R}^d$ denote a face recognition model that extracts a fixed-length feature representation in $\mathbb{R}^d$ for an input face image $x \in \mathcal{X} \subset \mathbb{R}^n$. Given the metric $D_f(x_1, x_2) = \|f(x_1) - f(x_2)\|_2^2$ that measures the feature distance between two face images, face recognition compares the distance between a probe image and a gallery set of face images $\mathcal{G} = \{x^g_1, \ldots, x^g_m\}$, and returns the identity whose face image has the nearest feature distance to the probe image.
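As a concrete illustration (not the paper's released code), the following minimal Python sketch simulates this open-set identification step on precomputed features; the array names are assumptions.

```python
import numpy as np

def identify(f_probe, gallery_feats, gallery_ids):
    """Open-set identification: return the identity whose gallery image has the
    smallest squared L2 feature distance D_f to the probe.

    f_probe:       (d,) feature f(x) of the probe image
    gallery_feats: (m, d) features f(x_1^g), ..., f(x_m^g) of the gallery images
    gallery_ids:   length-m list of identity labels for the gallery images
    """
    dists = np.sum((gallery_feats - f_probe) ** 2, axis=1)  # D_f to each gallery image
    return gallery_ids[int(np.argmin(dists))]
```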

In this paper, we incorporate some valuable considerations from the user's perspective to protect users' photos against unauthorized face recognition systems, as illustrated in Fig. 1. Specifically, to conceal the true identity $y$ of a user's image $x^r$, we aim to generate a protected image $x^p$ by adding an adversarial identity mask $m_a$ to $x^r$, denoted by $x^p = x^r + m_a$, to make the face recognition system predict $x^p$ as a different authorized identity or a virtual identity corresponding to a generated image. Rather than specifying a single target identity for generating the protected image, we choose an identity set $\mathcal{I} = \{y_1, \ldots, y_k\}$, i.e., we allow the face recognition system to recognize the protected image as an arbitrary one of the target identities in $\mathcal{I}$ rather than a single one, which makes identity protection easier to achieve due to the relaxed constraints.

Formally, let $\mathcal{G}_y = \{x \mid x \in \mathcal{G}, O(x) = y\}$ denote a subset of $\mathcal{G}$ containing all face images belonging to the true identity $y$ of $x^r$, with $O$ being an oracle that gives the ground-truth identity labels, and let $\mathcal{G}_{\mathcal{I}} = \bigcup_{1 \le i \le k} \mathcal{G}_{y_i}$ denote the face images belonging to the target identities of $\mathcal{I}$ in the gallery set $\mathcal{G}$. To conceal the identity of $x^r$, the protected image $x^p$ should satisfy the constraint

$$\exists x^t \in \mathcal{G}_{\mathcal{I}},\; \forall x \in \mathcal{G}_y: \; D_f(x^p, x) > D_f(x^p, x^t). \quad (1)$$

It ensures that the feature distance between the generated protected image $x^p$ and a target identity's image $x^t$ in $\mathcal{G}_{\mathcal{I}}$ is smaller than that between $x^p$ and any image belonging to the true identity $y$ in $\mathcal{G}_y$.
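A minimal sketch of checking constraint (1) on precomputed features, under the assumption that the relevant gallery features are available, might look as follows.

```python
import numpy as np

def is_concealed(f_p, feats_true, feats_target):
    """Check constraint (1) for one protected image.

    f_p:          (d,) feature of the protected image x^p
    feats_true:   (a, d) features of the gallery images of the true identity (G_y)
    feats_target: (b, d) features of the gallery images of the target identities (G_I)
    """
    d_true = np.sum((feats_true - f_p) ** 2, axis=1)
    d_target = np.sum((feats_target - f_p) ** 2, axis=1)
    # Some target image must be strictly closer than every true-identity image.
    return d_target.min() < d_true.min()
```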

We incorporate more practical considerations from a general user's perspective than the previously studied settings, in the following three aspects.

Naturalness. To make the protected image indistinguishable from the corresponding original one, a common practice is to restrict the $\ell_p$ ($p = 2, \infty$, etc.) norm between the protected and original examples, as $\|m_a\|_p \le \epsilon$. However, a perturbation bounded in the $\ell_p$ norm cannot naturally fit human perception well [54, 38], as also illustrated in Fig. 2. Therefore, we require that the protected image look natural besides satisfying the $\ell_p$ norm bound, so that it is constrained to the data manifold of real images [42], achieving imperceptibility for human eyes. We use an objective function to promote the naturalness of the protected image, which will be specified in the following section.


Figure 2. Illustration of different perturbations (PGD and MIM, with $\epsilon = 12$ and $\epsilon = 16$) under the $\ell_\infty$ norm. More examples are presented in Appendix D.

Unawareness of gallery set. For a real-world face recognition system, we have no knowledge of its gallery set $\mathcal{G}$, meaning that we are not able to solve Eq. (1) directly, while previous works assume the availability of the gallery set or a closed-set protocol (i.e., no gallery set). To address this issue, we use substitute face images for optimization. In particular, we collect an image set $\hat{\mathcal{G}}_{\mathcal{I}}$ containing face images that belong to the target identities of $\mathcal{I}$ as a surrogate for $\mathcal{G}_{\mathcal{I}}$, and use $\{x^r\}$ directly instead of $\mathcal{G}_y$. The rationale for using substitute images is that face representations of one identity are similar, and thus the representation of a protected image optimized to be similar to the substitutes can also be close to images belonging to the same target identity in the gallery set.

Unknown face systems. In practice, we are also unaware of the face recognition model, including its architecture, parameters, and gradients. Previous methods rely on white-box access to the target model, which is impractical in real-world identity-protection scenarios. Thus we adopt a surrogate white-box model against which the protected images are generated, with the purpose of improving the transferability of the adversarial masks to unknown face systems.

In summary, our considerations are designed to simulate real-world scenarios with minimal assumptions about the target face recognition system, which is also more challenging than previously studied settings.

4. Methodology

To achieve the above requirements, in this section we propose a targeted identity-protection iterative method (TIP-IM) to generate protected images.

4.1. Problem Formulation

To generate a protected image $x^p$ that is both effective for obfuscation against face recognition systems and visually natural for human eyes, we formalize the objective of the targeted privacy-protection function as

$$\min_{x^t, x^p} \; \mathcal{L}_{iden}(x^t, x^p) = D_f(x^p, x^t) - D_f(x^p, x^r), \quad \text{s.t.} \; \|x^p - x^r\|_p \le \epsilon, \; \mathcal{L}_{nat}(x^p) \le \eta, \quad (2)$$

where $x^t \in \hat{\mathcal{G}}_{\mathcal{I}}$, and $\mathcal{L}_{iden}$ is a relative identification loss that drives the generated $x^p$ to increase the distance gap between a targeted image $x^t$ and the original image $x^r$ in the feature space. $\mathcal{L}_{nat} \le \eta$ is a constraint that makes $x^p$ look natural. We also restrict the $\ell_p$ norm of the perturbation to be smaller than a constant $\epsilon$ such that the visual appearance does not change significantly. Note that to handle the unawareness of the gallery set, we use the substitute face images $\hat{\mathcal{G}}_{\mathcal{I}}$ in our objective (2); for the unknown model, we generate a protected image $x^p$ against a surrogate white-box model with the purpose of fooling the black-box model based on transferability. Thus the requirements of the proposed targeted identity-protection function can be fulfilled by solving Eq. (2).

Although the perturbation is somewhat small due to the $\ell_p$ norm constraint in Eq. (2), it can still be perceptible and unnatural to human eyes, as shown in Fig. 2. Therefore, we add a loss $\mathcal{L}_{nat}$ to our objective to explicitly encourage the naturalness of the generated protected image. In this paper, we adopt the maximum mean discrepancy (MMD) [4] as $\mathcal{L}_{nat}$, because it is an effective non-parametric and differentiable metric capable of comparing two data distributions and evaluating the imperceptibility of the generated images. In our case, given two sets of data $\mathcal{X}^p = \{x^p_1, \ldots, x^p_N\}$ and $\mathcal{X}^r = \{x^r_1, \ldots, x^r_N\}$ comprised of $N$ generated and $N$ real images, MMD calculates their discrepancy by

$$\mathrm{MMD}(\mathcal{X}^p, \mathcal{X}^r) = \bigg\| \frac{1}{N} \sum_{i=1}^{N} \phi(x^p_i) - \frac{1}{N} \sum_{j=1}^{N} \phi(x^r_j) \bigg\|_{\mathcal{H}}^2, \quad (3)$$

where $\phi(\cdot)$ maps the data to a reproducing kernel Hilbert space (RKHS) [4]. We adopt the same $\phi(\cdot)$ as in [4]. By minimizing the MMD between the samples $\mathcal{X}^p$ from the generated distribution and the samples $\mathcal{X}^r$ from the real data distribution, we constrain $\mathcal{X}^p$ to lie on the manifold of the real data distribution, meaning that the protected images in $\mathcal{X}^p$ will be as natural as real examples.

Since MMD is a differentiable metric defined on batches of images, we integrate MMD into Eq. (2) and rewrite our objective in a batch-based formulation⁴ as

$$\min_{\mathcal{X}^p} \; \mathcal{L}(\mathcal{X}^p) = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}_{iden}(x^t_i, x^p_i) + \gamma \cdot \mathrm{MMD}(\mathcal{X}^p, \mathcal{X}^r), \quad \text{s.t.} \; \|x^p_i - x^r_i\|_p \le \epsilon, \quad (4)$$

where $x^t_i \in \hat{\mathcal{G}}_{\mathcal{I}}$ and $\gamma$ is a hyperparameter to balance the two losses.

⁴ In case there is only a single image or a small number of images, we can augment the images with multiple transformations to make up a large batch, with results shown in Appendix B.
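To make the batch objective concrete, here is a minimal PyTorch sketch of Eq. (4); it assumes a single Gaussian kernel for a kernelized MMD estimate rather than the exact $\phi(\cdot)$ of [4], and `model` stands for any differentiable feature extractor $f$.

```python
import torch

def gaussian_kernel(a, b, sigma=10.0):
    # k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2)) on flattened images
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

def mmd2(xp, xr, sigma=10.0):
    """Kernelized squared MMD between two image batches (cf. Eq. 3)."""
    xp, xr = xp.flatten(1), xr.flatten(1)
    return (gaussian_kernel(xp, xp, sigma).mean()
            + gaussian_kernel(xr, xr, sigma).mean()
            - 2 * gaussian_kernel(xp, xr, sigma).mean())

def tip_im_loss(model, xp, xr, xt, gamma):
    """Batch objective of Eq. (4): mean relative identification loss + gamma * MMD."""
    fp, fr, ft = model(xp), model(xr), model(xt)
    d_target = ((fp - ft) ** 2).sum(dim=1)  # D_f(x^p_i, x^t_i)
    d_real = ((fp - fr) ** 2).sum(dim=1)    # D_f(x^p_i, x^r_i)
    return (d_target - d_real).mean() + gamma * mmd2(xp, xr)
```

As Sec. 5.3 shows, $\gamma$ trades off effectiveness (small $\gamma$) against naturalness (large $\gamma$).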

4.2. Targeted Identity-Protection Iterative Method

Given the overall loss $\mathcal{L}(\mathcal{X}^p)$ in Eq. (4), we can generate the batch of protected images $\mathcal{X}^p$ by minimizing $\mathcal{L}(\mathcal{X}^p)$. Given the definitions of $\mathcal{L}_{iden}$ and MMD in Eq. (2) and Eq. (3), $\mathcal{L}(\mathcal{X}^p)$ is a differentiable function w.r.t. $\mathcal{X}^p$, and thus we can iteratively apply the fast gradient method [22] multiple times with a small step size $\alpha$ to generate protected images by minimizing the loss $\mathcal{L}(\mathcal{X}^p)$. In particular, we optimize $\mathcal{X}^p$ via

$$\mathcal{X}^p_{t+1} = \Pi_{\{\mathcal{X}^r, \ell_p, \epsilon\}}\big( \mathcal{X}^p_t - \alpha \cdot \mathrm{Normalize}(\nabla_{\mathcal{X}} \mathcal{L}(\mathcal{X}^p_t)) \big), \quad (5)$$

where $\mathcal{X}^p_t$ is the batch of protected images at the $t$-th iteration, $\Pi$ is the projection function that projects the protected images onto the $\ell_p$ norm bound, and $\mathrm{Normalize}(\cdot)$ normalizes the gradient (e.g., a sign function under the $\ell_\infty$ norm bound or $\ell_2$ normalization under the $\ell_2$ norm bound). We perform the iterative process for a total of $T$ iterations and take the final protected images $\mathcal{X}^p_T$. To prevent the protected images from falling into local minima and to improve their transferability to other black-box face recognition models, we incorporate the momentum technique [8] into the iterative process.

4.3. Search Optimal $x^t$ via Greedy Insertion

When there is only one target face image in $\hat{\mathcal{G}}_{\mathcal{I}}$, we do not need to consider how to select $x^t$ in Eq. (4). When the target set $\hat{\mathcal{G}}_{\mathcal{I}}$ contains multiple target images, it offers more potential optimization directions and thus better performance. Therefore, we develop an optimization algorithm to search for the optimal target while generating protected images as in Eq. (5). Specifically, for the iterative procedure in Eq. (5) with $T$ iterations, we select a representative target in $\hat{\mathcal{G}}_{\mathcal{I}}$ for each protected image at each iteration, which is a subset selection problem.

Definition 1. Let $S_t$ denote the set of targets selected from $\hat{\mathcal{G}}_{\mathcal{I}}$ at each iteration up to the $t$-th iteration. Let $F$ denote a set mapping function that outputs a gain value (larger is better) in $\mathbb{R}$ for a set. For $x^t \in \hat{\mathcal{G}}_{\mathcal{I}}$, we define $\Delta(x^t \mid S_t) = F(S_t \cup \{x^t\}) - F(S_t)$ to be the marginal gain of $F$ at $S_t$ given $x^t$.

Formally, as the iterations proceed, if the marginal gain decreases monotonically, then $F$ belongs to the family of submodular functions [55]. For a submodular problem, a greedy algorithm can be used to find an approximate solution, and it has been shown that greedy maximization attains a $(1 - 1/e)$-approximation [30] for monotone submodular functions. Although our iterative identity-protection method is not guaranteed to be strictly submodular, greedy insertion still works well even when submodularity does not strictly hold [55, 13]; these works evaluate the theoretical results and justify that the greedy algorithm has a performance guarantee for maximizing approximately submodular functions. Therefore, we adopt greedy insertion as an approximately optimal solution to our multi-target problem.

Based on the above analysis, we perform approximate submodular optimization by a greedy insertion algorithm, which calculates the gain of every candidate from the target set at each iteration and integrates the candidate with the largest gain into the current subset $S_t$ by Definition 1:

$$S_{t+1} = S_t \cup \Big\{ \arg\max_{x^t \in \hat{\mathcal{G}}_{\mathcal{I}}} \Delta(x^t \mid S_t) \Big\}. \quad (6)$$

To achieve this, we need to define the above set mapping function $F$. In particular, we specify $F$ as first generating a protected image $x^p$ given the targets in $S_t$ for $t$ iterations via Eq. (5), and then using a function $G$ to compute a gain value by Definition 1. An appropriate gain function $G$ should choose examples that are effective for minimizing $\mathcal{L}_{iden}(x^t, x^p)$ at each iteration. Note that $G$ must also take positive values, with a larger value indicating better performance. Based on this analysis, we design a feature-based similarity gain function as

$$G(x^p) = \log\Big(1 + \max_{x^t \in \hat{\mathcal{G}}_{\mathcal{I}}} \exp\big(D_f(x^p, x^r) - D_f(x^p, x^t)\big)\Big), \quad (7)$$

so that the algorithm tends to select a target closer to the real image in the feature space at each iteration. The procedure is summarized in Algorithm 1.

Algorithm 1: Search Optimal $x^t$ via Greedy Insertion

Input: the privacy-protection objective $\mathcal{L}_{iden}$ from Eq. (2); a real face $x^r$ and a set of multi-identity face images $\hat{\mathcal{G}}_{\mathcal{I}}$; a feature representation function $f$; a gain function $G$; the protected image $x^p$ generated before the current iteration.
Output: the best target image $x^{t*}$ in $\hat{\mathcal{G}}_{\mathcal{I}}$.

1: $g_{best} \leftarrow 0$; $x^{t*} \leftarrow$ None
2: for $x^t$ in $\hat{\mathcal{G}}_{\mathcal{I}}$ do
3:   get the loss $\mathcal{L}_{iden}(x^t, x^p)$ via Eq. (2)
4:   compute the gradient $\nabla_x \mathcal{L}_{iden}(x^t, x^p)$
5:   generate the candidate protected image $\hat{x}^p = \Pi_{\{x^r, \ell_p, \epsilon\}}\big(x^p - \alpha \cdot \mathrm{Normalize}(\nabla_x \mathcal{L}_{iden}(x^t, x^p))\big)$
6:   calculate $g = G(\hat{x}^p)$
7:   if $g > g_{best}$ then $g_{best} \leftarrow g$; $x^{t*} \leftarrow x^t$
8: end for
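A minimal sketch of Algorithm 1 in the same PyTorch setting (single-image batches, names and shapes assumed as before) could look like this; for a single candidate, the max in Eq. (7) reduces to that candidate's own term.

```python
import torch

def greedy_select_target(model, xp, xr, targets, eps, alpha):
    """Algorithm 1: try one descent step toward each candidate target and keep
    the target whose tentative image maximizes the gain G of Eq. (7).
    `targets` is a list of target face batches from the substitute set."""
    fr = model(xr).detach()
    best_gain, best_xt = float('-inf'), None
    for xt in targets:
        ft = model(xt).detach()
        x = xp.clone().detach().requires_grad_(True)
        fp = model(x)
        l_iden = ((fp - ft) ** 2).sum() - ((fp - fr) ** 2).sum()  # loss of Eq. (2)
        grad = torch.autograd.grad(l_iden, x)[0]
        # Candidate update, projected onto the l_inf ball around x^r.
        cand = xr + torch.clamp(x.detach() - alpha * grad.sign() - xr, -eps, eps)
        with torch.no_grad():
            fc = model(cand)
            gain = torch.log1p(torch.exp(((fc - fr) ** 2).sum()
                                         - ((fc - ft) ** 2).sum()))  # Eq. (7)
        if gain.item() > best_gain:
            best_gain, best_xt = gain.item(), xt
    return best_xt
```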

5. Experiments

In this section, we conduct extensive experiments on identity protection to demonstrate the effectiveness of the proposed method. We thoroughly evaluate different properties of our method on various state-of-the-art face recognition models⁵.

⁵ Code at https://github.com/ShawnXYang/TIP-IM.

| Model | Backbone | Loss | Parameters (M) |
|---|---|---|---|
| FaceNet [37] | InceptionResNetV1 | Triplet | 27.91 |
| SphereFace [26] | Sphere20 | A-Softmax | 28.08 |
| CosFace [46] | Sphere20 | LMCL | 22.67 |
| ArcFace [6] | IR-SE50 | Arcface | 43.80 |
| MobileFace [5] | MobileFaceNet | Softmax | 1.20 |
| ResNet50 [18] | ResNet50 | Softmax | 40.29 |

Table 2. Chosen target models, covering various settings with different architectures and training objectives.

5.1. Experimental Settings

Datasets. The experiments are conducted on the Labeled Faces in the Wild (LFW) [19] and MegaFace [21] datasets. We incorporate additional considerations to approximate realistic testing scenarios: 1) practical gallery set: we first select 500 different identities as the protected identities. For each identity we randomly select one image as the probe image (500 images in total), and the remaining images of each identity are assembled into a gallery set, because the gallery for face encryption on social media includes multiple images per identity (more difficult yet practical, since multiple images per identity must be concealed simultaneously); 2) target identities: we randomly select another 10 identities as $\mathcal{I}$ from an Internet dataset, MS-Celeb-1M [17]. We select one image for each of these target identities to form $\hat{\mathcal{G}}_{\mathcal{I}}$, and the remaining images are integrated into the gallery set, which ensures the unawareness of the gallery set, i.e., the target images used in the optimization phase differ from those used in testing; 3) additional identities: we add 500 more identities to the gallery set, in accordance with a realistic test scenario. Thus we construct two challenging yet practical data scenarios (over 1K identities and 10K images in total).

Target models. We select models with diverse backbones and training losses, listed in Tab. 2, to fully demonstrate the ability to protect user privacy. In the experiments, we first use MTCNN [53] to detect the face in each image, then align the image and crop it to 112 × 112, meaning that the identity masks are applied only in the face area. Only one model is used as the known (surrogate) model to generate the identity masks, and the protection performance is tested on the other, unknown models.
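As an illustration of this crop-then-paste pipeline (our assumption of how the mask is composited, not the authors' released code), `detect_and_align` and `generate_mask` below are hypothetical stand-ins: the former could be built on MTCNN [53], the latter is the TIP-IM optimization.

```python
import cv2
import numpy as np

def protect_photo(image, detect_and_align, generate_mask):
    """Apply the adversarial identity mask only inside the detected face area.
    `image` is a float32 HxWx3 array in [0, 255]; `detect_and_align` (hypothetical)
    returns an aligned 112x112 face crop and its box; `generate_mask` (hypothetical)
    returns the mask m_a for the crop."""
    crop, (x0, y0, x1, y1) = detect_and_align(image)
    protected_crop = np.clip(crop + generate_mask(crop), 0, 255)  # x^p = x^r + m_a
    out = image.copy()
    # Paste back; a real pipeline would invert the alignment transform instead
    # of this axis-aligned resize.
    out[y0:y1, x0:x1] = cv2.resize(protected_crop, (x1 - x0, y1 - y0))
    return out
```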

Compared Methods. We investigate several adversarial face encryption methods [40, 32], which essentially depend on the single-target adversarial attack method of [22]. MIM [8] introduces momentum into the iterative process [22] to improve black-box transferability, and DIM and TIM [50, 9] aim to achieve better transferability via input or gradient diversity. Note that TIM focuses on evading defense models and experimentally achieves worse performance than MIM and DIM; thus MIM and DIM are regarded as the more effective single-target black-box baselines. As the original DIM only supports a single target in the iterative optimization, we derive a multi-target version of DIM via dynamic assignment from the same target set in the inner minimization, named MT-DIM. Besides, we study the influence of other multi-target optimization methods. We denote an additional gain function based on Eq. (7) as

$$G_1(x) = \log\Big(1 + \sum_{x^t \in \hat{\mathcal{G}}_{\mathcal{I}}} \exp\big(D_f(x, x^r) - D_f(x, x^t)\big)\Big),$$

named Center-Opt. Center-Opt promotes the protected images to be updated towards the mean center of the target identities in the feature space, which is similarly adopted in [36]. Note that the single-target methods report their optimal result over all targets attempted from the same target set. We set the number of iterations to $N = 50$, the learning rate to $\alpha = 1.5$, and the perturbation size to $\epsilon = 12$ under the $\ell_\infty$ norm bound, identically for all experiments.

Evaluation Metrics. To comprehensively evaluate the protection success rate, we report the Rank-N targeted identity success rate, named Rank-N-T, and the untargeted identity success rate, named Rank-N-UT (higher is better), which are consistent with the evaluation of face recognition [6, 46]. Specifically, we are given a probe image $x$ and a gallery set $\mathcal{G}$ that contains at least one image with the same identity as $x$ as well as images of the target identities. The face recognition algorithm ranks all gallery images by their distance $D_f$ to $x$. Rank-N-T means that at least one of the top $N$ images belongs to the target identities, whereas Rank-N-UT requires that none of the top $N$ images has the same identity as $x$. In this paper, we report Rank-1-T / Rank-1-UT and Rank-5-T / Rank-5-UT. Note that Rank-1-T / Rank-1-UT (accuracy / misclassification) is the most common evaluation metric in prior works, whereas Rank-5-T / Rank-5-UT provides a more comprehensive picture since an image may still reappear among the top-K candidates. All methods, including single-target ones, adopt the same target identities and evaluation criteria for a fair comparison.
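For one probe, the two indicators can be computed from the distance ranking as sketched below (names are illustrative); the per-probe indicators are then averaged over all probes.

```python
import numpy as np

def rank_n_success(dists, gallery_ids, true_id, target_ids, n=1):
    """Rank-N-T / Rank-N-UT indicators for one probe.

    dists:       (m,) distances D_f from the probe to every gallery image
    gallery_ids: length-m identity labels of the gallery images
    true_id:     the probe's real identity y
    target_ids:  the set of target identities I
    """
    top = [gallery_ids[i] for i in np.argsort(dists)[:n]]
    rank_n_t = any(i in target_ids for i in top)  # some target identity in top N
    rank_n_ut = all(i != true_id for i in top)    # true identity absent from top N
    return rank_n_t, rank_n_ut
```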

To test the imperceptibility of the generated protectedimages, we adopt the standard quantitative measures—PSNR (dB) and structural similarity (SSIM) [47], as wellas MMD in the face area. For SSIM and PSNR, a largervalue means better image quality, whereas a smaller MMDvalue indicates superior performance.
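For reference, PSNR and SSIM can be computed on the face crops with scikit-image as sketched below (assuming a recent scikit-image version that accepts `channel_axis`); MMD is computed as in the earlier sketch.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(x_real, x_protected):
    """PSNR (dB) and SSIM between an original face crop and its protected version;
    both are uint8 HxWx3 arrays. Larger is better for both metrics."""
    psnr = peak_signal_noise_ratio(x_real, x_protected)
    ssim = structural_similarity(x_real, x_protected, channel_axis=2)
    return psnr, ssim
```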

5.2. Effectiveness of Black-box Face Encryption

We first generate protected images against ArcFace, MobileFace, and ResNet50, respectively, with the proposed TIP-IM. We then feed the generated protected images to all face models and report the performance in Tab. 3 and Tab. 4. Our algorithm achieves nearly twice the success rate of the previous state-of-the-art method MT-DIM in terms of Rank-1-T and Rank-5-T, and outperforms the other methods by a large margin, whereas the SSIM values of the compared methods are very similar (Fig. 3).


| Surrogate | Method | ArcFace | MobileFace | ResNet50 | SphereFace | FaceNet | CosFace |
|---|---|---|---|---|---|---|---|
| ArcFace | MIM [8] | 94.0* / 96.9* | 14.3 / 45.8 | 8.2 / 32.4 | 3.1 / 14.5 | 3.1 / 17.9 | 1.7 / 10.1 |
| ArcFace | DIM [50] | 94.8* / 97.6* | 16.8 / 48.0 | 10.8 / 34.8 | 4.2 / 15.6 | 4.4 / 19.0 | 2.6 / 11.0 |
| ArcFace | MT-DIM [50] | 34.8* / 68.2* | 18.8 / 53.6 | 15.8 / 46.0 | 3.8 / 18.4 | 9.6 / 33.6 | 2.0 / 11.2 |
| ArcFace | Center-Opt | 59.4* / 84.6* | 36.8 / 66.0 | 28.8 / 57.6 | 6.6 / 21.4 | 11.8 / 35.4 | 3.8 / 13.0 |
| ArcFace | TIP-IM | 97.2* / 98.8* | 69.8 / 90.6 | 56.0 / 80.6 | 13.2 / 32.0 | 32.8 / 56.2 | 11.4 / 31.0 |
| MobileFace | MIM [8] | 8.1 / 27.9 | 96.1* / 98.3* | 26.7 / 61.5 | 4.5 / 18.0 | 3.7 / 19.4 | 0.3 / 4.1 |
| MobileFace | DIM [50] | 9.4 / 28.8 | 96.6* / 98.4* | 28.8 / 63.2 | 6.2 / 19.2 | 4.8 / 20.6 | 1.2 / 5.2 |
| MobileFace | MT-DIM [50] | 10.6 / 30.2 | 40.2* / 73.2* | 18.6 / 49.4 | 6.8 / 22.2 | 9.8 / 27.0 | 2.0 / 11.0 |
| MobileFace | Center-Opt | 14.8 / 41.8 | 53.0* / 83.4* | 21.8 / 53.6 | 5.8 / 25.6 | 12.2 / 29.6 | 3.6 / 12.4 |
| MobileFace | TIP-IM | 44.0 / 68.2 | 96.6* / 99.2* | 62.8 / 85.8 | 12.6 / 29.8 | 28.8 / 46.2 | 12.4 / 31.2 |
| ResNet50 | MIM [8] | 5.2 / 24.7 | 24.6 / 56.5 | 30.1* / 64.8* | 8.1 / 23.4 | 4.9 / 20.7 | 0.9 / 5.6 |
| ResNet50 | DIM [50] | 7.0 / 26.4 | 26.6 / 57.2 | 31.4* / 65.0* | 9.2 / 24.2 | 6.8 / 22.8 | 2.0 / 6.6 |
| ResNet50 | MT-DIM [50] | 14.2 / 37.6 | 22.4 / 52.6 | 31.4* / 65.0* | 5.2 / 16.8 | 9.4 / 29.2 | 1.8 / 8.8 |
| ResNet50 | Center-Opt | 13.2 / 37.14 | 26.8 / 58.6 | 41.6* / 73.4* | 6.8 / 21.0 | 9.2 / 27.2 | 2.4 / 12.0 |
| ResNet50 | TIP-IM | 34.2 / 56.8 | 62.4 / 83.4 | 95.6* / 98.2* | 11.4 / 25.6 | 23.2 / 40.0 | 10.8 / 26.2 |

Table 3. Rank-1-T and Rank-5-T (%) of black-box identity protection against different models on LFW; each cell shows Rank-1-T / Rank-5-T, and * indicates white-box results.

| Surrogate | Method | ArcFace | MobileFace | ResNet50 | SphereFace | FaceNet | CosFace |
|---|---|---|---|---|---|---|---|
| ArcFace | DIM [50] | 95.8* / 91.8* | 67.6 / 58.2 | 58.2 / 48.0 | 79.6 / 68.2 | 67.4 / 53.6 | 74.2 / 62.8 |
| ArcFace | MT-DIM [50] | 96.0* / 93.6* | 73.6 / 65.2 | 64.8 / 54.2 | 82.8 / 73.0 | 73.0 / 60.0 | 74.4 / 63.8 |
| ArcFace | TIP-IM | 97.4* / 96.4* | 79.4 / 68.8 | 70.4 / 56.8 | 85.2 / 76.6 | 73.4 / 63.4 | 84.0 / 73.8 |
| MobileFace | DIM [50] | 60.2 / 45.4 | 96.4* / 92.0* | 72.2 / 59.0 | 80.0 / 69.6 | 68.4 / 53.6 | 75.6 / 62.2 |
| MobileFace | MT-DIM [50] | 66.4 / 52.8 | 95.4* / 95.4* | 77.6 / 68.4 | 83.2 / 73.2 | 74.6 / 58.6 | 76.0 / 62.8 |
| MobileFace | TIP-IM | 68.6 / 58.8 | 96.6* / 94.8* | 81.4 / 71.2 | 84.4 / 74.0 | 77.6 / 60.4 | 79.4 / 68.2 |
| ResNet50 | DIM [50] | 77.6 / 50.4 | 80.6 / 72.2 | 95.4* / 91.6* | 80.4 / 65.8 | 69.0 / 53.2 | 64.2 / 61.8 |
| ResNet50 | MT-DIM [50] | 79.0 / 52.4 | 84.8 / 75.2 | 94.4* / 93.8* | 82.2 / 72.4 | 77.8 / 60.8 | 77.0 / 65.8 |
| ResNet50 | TIP-IM | 83.6 / 59.6 | 87.0 / 81.8 | 96.8* / 94.6* | 85.8 / 75.0 | 79.4 / 65.4 | 83.6 / 73.0 |

Table 4. Rank-1-UT and Rank-5-UT (%) of black-box identity protection against different models on LFW; each cell shows Rank-1-UT / Rank-5-UT, and * indicates white-box attacks.

| Model | Metric | γ = 0.0 | γ = 1.0 | γ = 2.0 | γ = 3.0 |
|---|---|---|---|---|---|
| ArcFace | PSNR (↑) | 25.26 | 25.59 | 26.08 | 27.63 |
| ArcFace | SSIM (↑) | 0.6520 | 0.6690 | 0.6986 | 0.7817 |
| ArcFace | MMD (↓) | 0.7567 | 0.7562 | 0.7554 | 0.7518 |
| MobileFace | PSNR (↑) | 25.24 | 25.18 | 25.72 | 27.19 |
| MobileFace | SSIM (↑) | 0.6490 | 0.6523 | 0.6828 | 0.7533 |
| MobileFace | MMD (↓) | 0.7567 | 0.7568 | 0.7559 | 0.7525 |
| ResNet50 | PSNR (↑) | 25.14 | 25.26 | 25.74 | 27.21 |
| ResNet50 | SSIM (↑) | 0.6507 | 0.6595 | 0.6897 | 0.760 |
| ResNet50 | MMD (↓) | 0.7570 | 0.7567 | 0.7558 | 0.7525 |

Table 5. The average PSNR (dB), SSIM, and MMD of the protected images generated by TIP-IM with different γ.

MT-DIM obtains more acceptable performance than the single-target method DIM, indicating that the multi-target setting yields better black-box transferability. It can also be observed that the choice of multi-target method influences performance, and the proposed TIP-IM gain defined in Eq. (7) performs better than Center-Opt. We also report Rank-1-UT and Rank-5-UT in Tab. 4, where TIP-IM still maintains the best performance, with an average Rank-1-UT over 80% for black-box models. Overall, TIP-IM provides a more promising multi-target optimization direction, making the generated protected images more effective against black-box models. Note that the protected images generated against ArcFace show excellent transferability to the other black-box models; thus in practical applications we give priority to selecting ArcFace, or an ensemble mechanism [8], as the surrogate model for better performance. Due to space limitations, we leave the MegaFace results to Appendix A.

Figure 3. Comparison of SSIM for different methods (MIM, DIM, MT-DIM, Center-Opt, and TIP-IM), with ArcFace and MobileFace as surrogate models.

Comparison experiments on target images. We test the performance with different numbers of targets in Appendix C. We experimentally find that the 10 target identities used in this paper are enough, which implies that a small increase in the number of targets can yield impressive performance at only a slight cost in time. We also specify some images generated by StyleGAN [20] as target images; the results show that our algorithm still achieves excellent black-box identity-protection performance. In practical applications, one can arbitrarily specify the available and authorized target identity set, or generated face images, and our algorithm is applicable to any target set.

5.3. Naturalness

To examine whether our algorithm is able to control the naturalness of protected images during generation, we perform experiments with different coefficients γ. Tab. 5 shows the evaluation results for different face recognition models (including ArcFace, MobileFace, and ResNet50) w.r.t. three metrics: PSNR, SSIM, and MMD. As γ increases, the visual quality of the generated images improves under all metrics, which is also consistent with the examples in Fig. 4. Therefore, conditioned on different coefficients γ, we can control the degree of naturalness of the generated protected images. Apart from quantitative measures, we also inspect naturalness for different γ in Fig. 4: the image looks more natural as γ increases, whereas beyond a certain point the targeted identity protection tends to fail. We further perform a more general evaluation on all given recognition models in Fig. 5. As γ increases, SSIM improves while the Rank-1-T score shows a general downward trend, meaning that an appropriate γ is crucial for balancing transferability and naturalness.

Figure 4. Experiments on how different γ (from 0 to 2.75, compared with the original image and DIM) affects the performance. Green check marks denote successful targeted identity protection while red ones denote failure, which also implies a trade-off between effectiveness and naturalness. Best viewed when zoomed in.

Figure 5. Rank-1-T score and SSIM of protected images generated by TIP-IM with different γ against different models (CosFace, ArcFace, MobileFace, ResNet50, SphereFace, and FaceNet), with ArcFace and MobileFace as surrogates.

Practicability. Face encryption focuses on generating effective and natural adversarial identity masks, which cannot be realized by most previous adversarial attacks w.r.t. effectiveness (Tab. 3) and naturalness (Fig. 3 and Fig. 4). In practical applications, users can adopt the proposed TIP-IM and adjust γ to trade off obfuscation strength (effectiveness) against visual quality (naturalness).

5.4. Effectiveness on a Real-World Application

In this section, we apply the proposed TIP-IM to test identity protection performance on a commercial face search API available at the Tencent AI Open Platform⁶. Its working mechanism and training data are completely unknown to us. To simulate the private-data scenario, we use the same gallery set described above. We choose 20 probe faces from the above probe set and run face search based on similarity ranking on this platform; all 20 probe faces are identified at Rank 1. We then generate the corresponding protected images from the probe faces and run face search again. In the returned rankings, target identities appear at Rank 1 for 6 probes and within Rank 5 for 16. Note that the faces with the same (true) identity also show similarity scores decreased to different degrees, which further illustrates the effectiveness against this black-box face system; two examples are shown in Fig. 6.

⁶ https://ai.qq.com/product/face.shtml

Figure 6. Examples of face encryption on the real-world face recognition API. We separately use real faces and the faces protected by TIP-IM as probes for face search, and show the top three results by similarity. Blue boxes mark faces with the same identities as the probe faces, and green boxes mark faces belonging to the targeted identities; similarity scores with the probe face are marked in yellow.

6. Conclusion

In this paper, we studied the problem of identity protection by simulating realistic identification systems on social media. Extensive experiments show that the proposed TIP-IM method enables users to protect their private information from being exposed by unauthorized identification systems while not affecting the user experience on social media.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (Nos. 2020AAA0104304, 2020AAA0106302), NSFC Projects (Nos. 61620106010, 62076147, U19A2081, U19B2034, U1811461), Beijing Academy of Artificial Intelligence (BAAI), Alibaba Group through the Alibaba Innovative Research Program, the Tsinghua-Huawei Joint Research Program, a grant from Tsinghua Institute for Guo Qiang, Tiangong Institute for Intelligent Computing, and the NVIDIA NVAIL Program with GPU/DGX Acceleration.


References

[1] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), 2018.
[2] Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 273–282, 2007.
[3] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: The SuLQ framework. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 128–138, 2005.
[4] Karsten M. Borgwardt, Arthur Gretton, Malte J. Rasch, Hans-Peter Kriegel, Bernhard Schölkopf, and Alex J. Smola. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14):e49–e57, 2006.
[5] Sheng Chen, Yang Liu, Xiang Gao, and Zhen Han. Mobilefacenets: Efficient CNNs for accurate real-time face verification on mobile devices. In Chinese Conference on Biometric Recognition, pages 428–438. Springer, 2018.
[6] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4690–4699, 2019.
[7] Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, and Jun Zhu. Benchmarking adversarial robustness. arXiv preprint arXiv:1912.11852, 2019.
[8] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[9] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[10] Yinpeng Dong, Hang Su, Baoyuan Wu, Zhifeng Li, Wei Liu, Tong Zhang, and Jun Zhu. Efficient decision-based black-box adversarial attacks on face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[11] Cynthia Dwork. Differential privacy: A survey of results. In International conference on theory and applications of models of computation, pages 1–19. Springer, 2008.
[12] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.
[13] Uriel Feige, Vahab S. Mirrokni, and Jan Vondrák. Maximizing non-monotone submodular functions. SIAM Journal on Computing, 40(4):1133–1153, 2011.
[14] Oran Gafni, Lior Wolf, and Yaniv Taigman. Live face de-identification in video. In Proceedings of the IEEE International Conference on Computer Vision, pages 9378–9387, 2019.
[15] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[16] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
[17] Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In European Conference on Computer Vision, pages 87–102. Springer, 2016.
[18] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), pages 630–645. Springer, 2016.
[19] Gary B. Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report, 2007.
[20] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
[21] Ira Kemelmacher-Shlizerman, Steven M. Seitz, Daniel Miller, and Evan Brossard. The megaface benchmark: 1 million faces for recognition at scale. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4873–4882, 2016.
[22] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations (ICLR) Workshops, 2017.
[23] Martha Larson, Zhuoran Liu, SFB Brugman, and Zhengyu Zhao. Pixel privacy: Increasing image appeal while blocking automatic inference of sensitive scene information. 2018.
[24] Fengjun Li, Bo Luo, and Peng Liu. Secure information aggregation for smart grids using homomorphic encryption. In 2010 First IEEE International Conference on Smart Grid Communications, pages 327–332. IEEE, 2010.
[25] Tao Li and Lei Lin. Anonymousnet: Natural face de-identification with measurable privacy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[26] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 212–220, 2017.
[27] Haiping Lu, Karl Martin, Francis Bui, Konstantinos N. Plataniotis, and Dimitris Hatzinakos. Face recognition with biometric encryption for privacy-enhancing self-exclusion. In 2009 16th International Conference on Digital Signal Processing, pages 1–8. IEEE, 2009.
[28] Richard McPherson, Reza Shokri, and Vitaly Shmatikov. Defeating image obfuscation with deep learning. arXiv preprint arXiv:1609.00408, 2016.
[29] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07), pages 94–103. IEEE, 2007.
[30] George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 14(1):265–294, 1978.
[31] Seong Joon Oh, Rodrigo Benenson, Mario Fritz, and Bernt Schiele. Faceless person recognition: Privacy implications in social media. In European Conference on Computer Vision, pages 19–35. Springer, 2016.
[32] Seong Joon Oh, Mario Fritz, and Bernt Schiele. Adversarial image perturbation for privacy protection: A game theory perspective. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 1491–1500. IEEE, 2017.
[33] Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Jun Zhu, and Hang Su. Boosting adversarial training with hypersphere embedding. arXiv preprint arXiv:2002.08619, 2020.
[34] Slobodan Ribaric, Aladdin Ariyaeeinia, and Nikola Pavesic. De-identification for privacy protection in multimedia content: A survey. Signal Processing: Image Communication, 47:131–151, 2016.
[35] Jose Marconi Rodrigues, William Puech, Peter Meuel, Jean-Claude Bajard, and Marc Chaumont. Face protection by fast selective encryption in a video. 2006.
[36] Andras Rozsa, Manuel Günther, and Terrance E. Boult. LOTS about attacking deep features. In 2017 IEEE International Joint Conference on Biometrics (IJCB), pages 168–176. IEEE, 2017.
[37] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.
[38] Ayon Sen, Xiaojin Zhu, Liam Marshall, and Robert Nowak. Should adversarial attacks use pixel p-norm? arXiv preprint arXiv:1906.02439, 2019.
[39] Shawn Shan, Emily Wenger, Jiayun Zhang, Huiying Li, Haitao Zheng, and Ben Y. Zhao. Fawkes: Protecting privacy against unauthorized deep learning models. In 29th USENIX Security Symposium (USENIX Security 20), pages 1589–1604, 2020.
[40] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540, 2016.
[41] Gurpreet Singh. A study of encryption algorithms (RSA, DES, 3DES and AES) for information security. International Journal of Computer Applications, 67(19), 2013.
[42] David Stutz, Matthias Hein, and Bernt Schiele. Disentangling adversarial robustness and generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6976–6987, 2019.
[43] Qianru Sun, Liqian Ma, Seong Joon Oh, Luc Van Gool, Bernt Schiele, and Mario Fritz. Natural and effective obfuscation by head inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5050–5059, 2018.
[44] Qianru Sun, Ayush Tewari, Weipeng Xu, Mario Fritz, Christian Theobalt, and Bernt Schiele. A hybrid model for identity obfuscation by face replacement. In Proceedings of the European Conference on Computer Vision (ECCV), pages 553–569, 2018.
[45] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[46] Hao Wang, Yitong Wang, Zheng Zhou, Xing Ji, Dihong Gong, Jingchao Zhou, Zhifeng Li, and Wei Liu. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5265–5274, 2018.
[47] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
[48] Michael J. Wilber, Vitaly Shmatikov, and Serge Belongie. Can we still avoid automatic face detection? In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1–9. IEEE, 2016.
[49] Yifan Wu, Fan Yang, and Haibin Ling. Privacy-protective-GAN for face de-identification. arXiv preprint arXiv:1806.08906, 2018.
[50] Cihang Xie, Zhishuai Zhang, Yuyin Zhou, Song Bai, Jianyu Wang, Zhou Ren, and Alan L. Yuille. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[51] Xiao Yang, Fangyun Wei, Hongyang Zhang, and Jun Zhu. Design and interpretation of universal adversarial patches in face detection. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVII, pages 174–191. Springer, 2020.
[52] Xiao Yang, Dingcheng Yang, Yinpeng Dong, Wenjian Yu, Hang Su, and Jun Zhu. Delving into the adversarial robustness on face recognition. arXiv preprint arXiv:2007.04118, 2020.
[53] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503, 2016.
[54] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. In International Conference on Learning Representations (ICLR), 2018.
[55] Yuxun Zhou and Costas J. Spanos. Causal meets submodular: Subset selection with directed information. In Advances in Neural Information Processing Systems, pages 2649–2657, 2016.