Universal adversarial perturbations
Seyed-Mohsen Moosavi-Dezfooli∗†
Alhussein Fawzi∗†
Omar Fawzi‡
Pascal Frossard†
Abstract
Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
1. Introduction
Can we find a single small image perturbation that fools a state-of-the-art deep neural network classifier on all natural images? We show in this paper the existence of such quasi-imperceptible universal perturbation vectors that lead to the misclassification of natural images with high probability. Specifically, by adding such a quasi-imperceptible perturbation to natural images, the label estimated by the deep neural network is changed with high probability (see Fig. 1). Such perturbations are dubbed universal, as they are image-agnostic. The existence of these perturbations is problematic when the classifier is deployed in real-world (and possibly hostile) environments, as such a single perturbation can be exploited by adversaries to break the classifier. Indeed, the perturbation process involves the mere addition of
∗ The first two authors contributed equally to this work.
† École Polytechnique Fédérale de Lausanne, Switzerland.
‡ ENS de Lyon, LIP, UMR 5668 ENS Lyon - CNRS - UCBL - INRIA, Université de Lyon, France.
Figure 1: When added to a natural image, a universal perturbation image causes the image to be misclassified by the deep neural network with high probability. Left images: original natural images, with their labels shown on top of each arrow. Central image: universal perturbation. Right images: perturbed images, with their estimated labels shown on top of each arrow.
arXiv:1610.08401v1 [cs.CV] 26 Oct 2016
one very small perturbation to all natural images, and can be relatively straightforward to implement by adversaries in real-world environments, while being relatively difficult to detect as such perturbations are very small and thus do not significantly affect data distributions. The surprising existence of universal perturbations further reveals new insights on the topology of the decision boundaries of deep neural networks. We summarize the main contributions of this paper as follows:
• We show the existence of universal image-agnostic perturbations for state-of-the-art deep neural networks.
• We propose an efficient algorithm for finding such perturbations. The algorithm seeks a universal perturbation for a set of training points, and proceeds by aggregating atomic perturbation vectors that send successive datapoints to the decision boundary of the classifier.
• We show that universal perturbations have a remarkable generalization property, as perturbations computed for a rather small set of training points fool new images with high probability.
• We show that such perturbations are not only universal across images, but also generalize well across deep neural networks. Such perturbations are therefore doubly universal, both with respect to the data and the network architectures.
• We explain and analyze the high vulnerability of deep neural networks to universal perturbations by examining the geometric correlation between different parts of the decision boundary.
The robustness of image classifiers to structured and unstructured perturbations has recently attracted a lot of attention [19, 16, 20, 3, 5, 13, 14, 4]. Despite the impressive performance of deep neural network architectures on challenging visual classification benchmarks [7, 10, 21, 11], these classifiers were shown to be highly vulnerable to perturbations. In [19], such networks are shown to be unstable to very small and often imperceptible additive adversarial perturbations. Such carefully crafted perturbations are either estimated by solving an optimization problem [19, 12, 1] or through one step of gradient ascent [6], and result in a perturbation that fools a specific data point. A fundamental property of these adversarial perturbations is their intrinsic dependence on datapoints: the perturbations are specifically crafted for each data point independently. As a result, the computation of an adversarial perturbation for a new data point requires solving a data-dependent optimization problem from scratch, which uses the full knowledge of the classification model. This is different from the universal perturbation considered in this paper, as we seek a single perturbation vector that fools the network on most natural images. Perturbing a new datapoint then only involves the mere addition of the universal perturbation to the image (and does not require solving an optimization problem or computing gradients). We emphasize that our notion of universal perturbation differs from the generalization of adversarial perturbations studied in [19], where perturbations computed on the MNIST task were shown to generalize well across different neural network architectures. Instead, we examine the existence of universal perturbations that are common to most data points belonging to the data distribution.
2. Universal perturbations

We formalize in this section the notion of universal perturbations, and propose a method for estimating such perturbations. Let µ denote a distribution of images in R^d, and let k define a classification function that outputs for each image x ∈ R^d an estimated label k(x). The main focus of this paper is to seek perturbation vectors v ∈ R^d that fool the classifier k on almost all datapoints sampled from µ. That is, we seek a vector v such that
k(x + v) ≠ k(x) for “most” x ∼ µ.
We coin such a perturbation universal, as it represents a fixed image-agnostic perturbation that causes label change for most images sampled from the data distribution µ. We focus here on the case where the distribution µ represents the set of natural images, hence containing a huge amount of variability. In that context, we examine the existence of very small universal perturbations (in terms of the ℓ_p norm with p ∈ [1, ∞)) that misclassify most images. The goal is therefore to find v that satisfies the following two constraints:

1. ‖v‖_p ≤ ξ,
2. P_{x∼µ}( k(x + v) ≠ k(x) ) ≥ 1 − δ.

The parameter ξ controls the magnitude of the perturbation vector v, and δ quantifies the desired fooling rate for all images sampled from the distribution µ.
Algorithm. Let X = {x_1, . . . , x_m} be a set of images sampled from the distribution µ. Our proposed algorithm seeks a universal perturbation v, such that ‖v‖_p ≤ ξ, while fooling most data points in X. The proposed algorithm proceeds iteratively over the data points in X and gradually builds the universal perturbation, as illustrated in Fig. 2. At each iteration, the minimal perturbation ∆v_i that sends the current perturbed point, x_i + v, to the decision boundary of the classifier is computed, and aggregated to the current instance of the universal perturbation. In more detail, provided the current universal perturbation v does not fool data point x_i, we seek the extra perturbation ∆v_i with minimal norm that fools data point x_i by solving the following optimization problem:
∆v_i ← arg min_r ‖r‖_2  s.t.  k(x_i + v + r) ≠ k(x_i).   (1)
To ensure that the constraint ‖v‖_p ≤ ξ is satisfied, the updated universal perturbation is further projected on the ℓ_p ball of radius ξ centered at 0. That is, let P_{p,ξ} be the projection operator defined as follows:

P_{p,ξ}(x_0) = arg min_x ‖x − x_0‖_2  subject to  ‖x‖_p ≤ ξ.
Then, our update rule is given by:

v ← P_{p,ξ}(v + ∆v_i).
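For p = 2 and p = ∞ this projection has a simple closed form (radial scaling onto the ℓ2 ball, and coordinate-wise clipping, respectively). The following is a minimal sketch, not the authors' implementation; the function name is ours.

```python
import numpy as np

def project_lp_ball(v, xi, p):
    """Project v onto the l_p ball of radius xi centered at 0.
    Closed forms exist for p = 2 (radial scaling) and p = inf (clipping)."""
    if p == 2:
        norm = np.linalg.norm(v.ravel(), 2)
        # Scale v down only if it lies outside the ball.
        return v * min(1.0, xi / norm) if norm > 0 else v
    elif p == np.inf:
        # Coordinate-wise clipping to [-xi, xi].
        return np.clip(v, -xi, xi)
    raise ValueError("only p = 2 and p = inf are handled in this sketch")
```

The update rule above then reads v = project_lp_ball(v + dv_i, xi, p).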
Several passes on the data set X are performed to improve the quality of the universal perturbation. The algorithm is terminated when the empirical “fooling rate” on the perturbed data set X_v := {x_1 + v, . . . , x_m + v} exceeds the target threshold 1 − δ. That is, we stop the algorithm whenever

Err(X_v) := (1/m) Σ_{i=1}^{m} 1_{k(x_i + v) ≠ k(x_i)} ≥ 1 − δ.
The detailed algorithm is provided in Algorithm 1. Note that, in practice, the number of data points m in X need not be large to compute a universal perturbation that is valid for the whole distribution µ. In particular, we can set m to be much smaller than the number of training points (see Section 3).
The proposed algorithm involves solving at most m instances of the optimization problem in Eq. (1) at each pass. While this optimization problem is not convex when k is a standard classifier (e.g., a deep neural network), several efficient approximate methods have been devised for solving this problem [19, 12, 8]. We use in the following the approach in [12] for its efficiency. It should further be noticed that the objective of Algorithm 1 is not to find the smallest universal perturbation that fools most data points sampled from the distribution, but rather to find one such perturbation with a sufficiently small norm. In particular, different random shufflings of the set X naturally lead to a diverse set of universal perturbations v satisfying the required constraints. The proposed algorithm can therefore be leveraged to generate multiple universal perturbations for a deep neural network (see next section for visual examples).
3. Universal perturbations for deep nets

We now analyze the robustness of state-of-the-art deep neural network classifiers to universal perturbations using Algorithm 1.
Figure 2: Schematic representation of the proposed algorithm used to compute universal perturbations. In this illustration, data points x_1, x_2 and x_3 are super-imposed, and the classification regions R_i are shown in different colors. Our algorithm proceeds by aggregating sequentially the minimal perturbations sending the current perturbed points x_i + v outside of the corresponding classification region R_i.
Algorithm 1 Computation of universal perturbations.

1: input: Data points X, classifier k, desired ℓ_p norm of the perturbation ξ, desired accuracy on perturbed samples δ.
2: output: Universal perturbation vector v.
3: Initialize v ← 0.
4: while Err(X_v) ≤ 1 − δ do
5:   for each datapoint x_i ∈ X do
6:     if k(x_i + v) = k(x_i) then
7:       Compute the minimal perturbation that sends x_i + v to the decision boundary:
           ∆v_i ← arg min_r ‖r‖_2  s.t.  k(x_i + v + r) ≠ k(x_i).
8:       Update the perturbation:
           v ← P_{p,ξ}(v + ∆v_i).
9:     end if
10:  end for
11: end while
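Algorithm 1 can be sketched in a few lines once the inner solver of Eq. (1) is abstracted away. In the sketch below, `classify`, `min_perturbation` (a DeepFool-style solver for Eq. (1)), and `project` (the operator P_{p,ξ}) are assumed to be supplied by the caller; the function name and the `max_passes` cap are ours, not the paper's.

```python
import numpy as np

def universal_perturbation(X, classify, min_perturbation, project,
                           delta, max_passes=10):
    """Sketch of Algorithm 1. `classify(x)` returns a label,
    `min_perturbation(x)` returns the smallest perturbation sending x
    across the decision boundary, and `project(v)` enforces ||v||_p <= xi."""
    v = np.zeros_like(X[0])
    for _ in range(max_passes):
        np.random.shuffle(X)                     # different shufflings give
        for x in X:                              # different perturbations
            if classify(x + v) == classify(x):   # v does not yet fool x
                dv = min_perturbation(x + v)     # minimal extra perturbation
                v = project(v + dv)              # keep ||v||_p <= xi
        # Empirical fooling rate Err(X_v) on the perturbed set.
        fooling_rate = np.mean([classify(x + v) != classify(x) for x in X])
        if fooling_rate >= 1 - delta:
            break
    return v
```

With a deep network, `min_perturbation` would be the DeepFool solver of [12]; any solver for Eq. (1) fits the same interface.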
            CaffeNet [9]  VGG-F [2]  VGG-16 [17]  VGG-19 [17]  GoogLeNet [18]  ResNet-152 [7]
ℓ2   X      85.4%         85.9%      90.7%        86.9%        82.9%           89.7%
     Val.   85.6%         87.0%      90.3%        84.5%        82.0%           88.5%
ℓ∞   X      93.1%         93.8%      78.5%        77.8%        80.8%           85.4%
     Val.   93.3%         93.7%      78.3%        77.8%        78.9%           84.0%

Table 1: Fooling ratios on the set X and on the validation set.
In a first experiment, we assess the estimated universal perturbations for different recent deep neural networks on the ILSVRC 2012 [15] validation set (50,000 images), and report the fooling ratio, that is, the proportion of images that change labels when perturbed by our universal perturbation. Results are reported for p = 2 and p = ∞, where we respectively set ξ = 2000 and ξ = 10. These numerical values were chosen in order to obtain a perturbation whose norm is significantly smaller than the image norms, such that the perturbation is quasi-imperceptible when added to natural images. Results are listed in Table 1. Each result is reported on the set X, which is used to compute the perturbation, as well as on the validation set (which is not used in the computation of the universal perturbation). Observe that for all networks, the universal perturbation achieves very high fooling rates on the validation set. Specifically, the universal perturbations computed for CaffeNet and VGG-F fool more than 90% of the validation set (when p = ∞). In other words, for any natural image in the validation set, the mere addition of our universal perturbation fools the classifier more than 9 times out of 10. This result is moreover not specific to these architectures, as we can also find universal perturbations that cause VGG, GoogLeNet and ResNet classifiers to be fooled on natural images with probability edging 80%. These results have an element of surprise, as they show the existence of single universal perturbation vectors that cause natural images to be misclassified with high probability, albeit being quasi-imperceptible to humans. To verify this latter claim, we show visual examples of perturbed images in Fig. 3, where the GoogLeNet architecture is used. These images are either taken from the ILSVRC 2012 validation set (rows 1 and 2), or taken by a mobile phone camera (row 3). Observe that in most cases, the universal perturbation is quasi-imperceptible, yet this powerful image-agnostic perturbation is able to misclassify any image with high probability for state-of-the-art classifiers. We refer to Appendix A for the original (unperturbed) images, as well as their ground truth labels. We visualize the universal perturbations corresponding to different networks in Fig. 4. It should be noted that such universal perturbations are not unique, as many different universal perturbations (all satisfying the two required constraints) can be generated for the same network. In Fig. 5, we visualize five different universal perturbations obtained by using different random shufflings of the set X. Observe that such universal perturbations are different, although they exhibit a similar pattern. This is moreover confirmed by computing the normalized inner products between pairs of perturbation images; these normalized inner products do not exceed 0.1, which shows that one can find diverse universal perturbations.
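The diversity check above amounts to computing cosine similarities between flattened perturbation images. A minimal sketch (the helper name is ours; the inputs would be the perturbation images produced by different shufflings):

```python
import numpy as np

def max_pairwise_cosine(perturbations):
    """Largest absolute normalized inner product over all pairs of
    flattened perturbation images."""
    V = np.stack([p.ravel() / np.linalg.norm(p.ravel())
                  for p in perturbations])
    G = np.abs(V @ V.T)          # Gram matrix of cosine similarities
    np.fill_diagonal(G, 0.0)     # ignore self-similarity
    return G.max()
```

A returned value below 0.1, as reported above, indicates near-orthogonal (hence diverse) perturbations.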
While the above universal perturbations are computed for a set X of 10,000 images from the training set (i.e., on average 10 images per class), we now examine the influence of the size of X on the quality of the universal perturbation. We show in Fig. 6 the fooling rates obtained on the validation set for different sizes of X for GoogLeNet. Note for example that with a set X containing only 500 images, we can fool more than 30% of the images in the validation set. This result is significant when compared to the number of classes in ImageNet (1,000), as it shows that we can fool a large set of unseen images, even when using a set X containing less than one image per class! The universal perturbations computed using Algorithm 1 therefore have a remarkable generalization power over unseen data points, and can be computed on a very small set of training images.
Cross-model universality. While the computed perturbations are universal across unseen data points, we now examine their cross-model universality. That is, we study to which extent universal perturbations computed for a specific architecture (e.g., VGG-19) are also valid for another architecture (e.g., GoogLeNet). Table 2 displays a matrix summarizing the universality of such perturbations across six different architectures. For each architecture, we compute a universal perturbation and report the fooling ratios on all other architectures; we report these in the rows of Table 2. Observe that, for some architectures, the universal perturbations generalize very well across other architectures. For example, universal perturbations computed for the VGG-19 network have a fooling ratio above 53% for all other tested architectures. This result shows that our universal perturbations are, to some extent, doubly-universal, as they generalize well across data points and very different architectures. It should be noted that, in [19], adversarial perturbations were previously shown to generalize well, to some extent, across different neural networks on the MNIST problem. Our results are however different, as we show the generalizability of universal perturbations across different architectures on the ImageNet data set. This result shows that such perturbations are of practical relevance, as they generalize well across data points and architectures. In particular, in order to fool a new image on an unknown neural network, a mere addition of a universal perturbation computed on the VGG-19 architecture is likely to misclassify the data point.

Figure 3: Examples of perturbed images and their corresponding labels (rows 1–3: wool, Indian elephant, Indian elephant, African grey; tabby, African grey, common newt, carousel; grey fox, macaw, three-toed sloth, macaw). The first two rows of images belong to the ILSVRC 2012 validation set, and the last row are random images taken by a mobile phone camera. We refer to the appendix for the original images.
Visualization of the effect of universal perturbations. To gain insights on the effect of universal perturbations on natural images, we now visualize the distribution of labels on the ImageNet validation set. Specifically, we build a directed graph G = (V, E), whose vertices denote the labels, and whose directed edges e = (i → j) indicate that the majority of images of class i are fooled into label j when applying the universal perturbation. The existence of an edge i → j therefore suggests that the preferred fooling label for images of class i is j. We construct this graph for GoogLeNet, and visualize the full graph in Appendix A due to space constraints. The visualization of this graph shows a very peculiar topology. In particular, the graph is a union of disjoint components, where all edges in one component mostly connect to one target label. See Fig. 7 for an illustration of two different connected components. This visualization clearly shows the existence of several dominant labels, and that universal perturbations mostly make natural images classified with such labels. We hypothesize that these dominant labels occupy large regions in the image space, and therefore represent good candidate labels for fooling most natural images. Note that such dominant labels are automatically found by the algorithm used to generate universal perturbations, and are not imposed a priori in the computation of perturbations.
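The construction of the graph G from the two prediction runs can be sketched as follows (a sketch under our own naming; inputs are parallel lists of clean and perturbed predictions):

```python
from collections import Counter, defaultdict

def fooling_graph(orig_labels, fooled_labels):
    """Build the directed edges i -> j where j is the label that the
    majority of fooled images of original class i are mapped to."""
    by_class = defaultdict(list)
    for i, j in zip(orig_labels, fooled_labels):
        if i != j:                        # only count fooled images
            by_class[i].append(j)
    edges = {}
    for i, targets in by_class.items():
        j, count = Counter(targets).most_common(1)[0]
        if count > len(targets) / 2:      # strict majority target
            edges[i] = j
    return edges
```

The dominant labels discussed above would then appear as vertices with high in-degree in the resulting edge map.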
4. Explaining the vulnerability to universal perturbations

The goal of this section is to analyze and explain the high vulnerability of deep neural network classifiers to universal perturbations. To understand the unique characteristics of universal perturbations, we first compare such perturbations
(a) CaffeNet (b) VGG-F (c) VGG-16 (d) VGG-19 (e) GoogLeNet (f) ResNet-152

Figure 4: Universal perturbations computed for different deep neural network architectures. Images generated with p = ∞, ξ = 10. The pixel values are scaled for visibility.

Figure 5: Diversity of universal perturbations for the GoogLeNet architecture. The five perturbations are generated using different random shufflings of the set X. Note that the normalized inner products between any pair of universal perturbations do not exceed 0.1, which highlights the diversity of such perturbations.
with other types of perturbations, namely i) a random perturbation, ii) the sum of adversarial perturbations over X, and iii) the mean of the images (or ImageNet bias). For each perturbation, we depict a phase-transition graph in Fig. 8 showing the fooling rate on the validation set with respect to the ℓ2 norm of the perturbation. Different perturbation norms are achieved by scaling each perturbation with a multiplicative factor to reach the target norm. Note that the universal perturbation is computed for ξ = 2,000, and also scaled accordingly.

Observe that the proposed universal perturbation quickly reaches very high fooling rates, even when the perturbation is constrained to be of small norm. For example, the universal perturbation computed using Algorithm 1 achieves a fooling rate of 85% when the ℓ2 norm is constrained to ξ = 2,000, while other perturbations achieve much smaller
Table 2: Generalizability of the universal perturbations across different networks. The percentages indicate the fooling rates. The rows indicate the architecture for which the universal perturbation is computed, and the columns indicate the architecture on which the fooling rate is reported.

             VGG-F   CaffeNet  GoogLeNet  VGG-16  VGG-19  ResNet-152
VGG-F        93.7%   71.8%     48.4%      42.1%   42.1%   47.4%
CaffeNet     74.0%   93.3%     47.7%      39.9%   39.9%   48.0%
GoogLeNet    46.2%   43.8%     78.9%      39.2%   39.8%   45.5%
VGG-16       63.4%   55.8%     56.5%      78.3%   73.1%   63.4%
VGG-19       64.0%   57.2%     53.6%      73.5%   77.8%   58.0%
ResNet-152   46.3%   46.3%     50.5%      47.0%   45.5%   84.0%
Figure 6: Fooling ratio on the validation set versus the cardinality of X (from 500 to 4,000 images). Note that even when the universal perturbation is computed on a very small set X (compared to the training and validation sets), the fooling ratio on the validation set is large.
ratios for comparable norms. In particular, random vectors sampled uniformly from the sphere of radius 2,000 only fool 10% of the validation set. The large difference between universal and random perturbations suggests that the universal perturbation exploits some geometric correlations between different parts of the decision boundary of the classifier. In fact, if the orientations of the decision boundary in the neighborhood of different data points were completely uncorrelated (and independent of the distance to the decision boundary), the norm of the best universal perturbation would be comparable to that of a random perturbation. Note that the latter quantity is well understood (see [5]): the norm of the random perturbation required to fool a specific data point behaves precisely as Θ(√d ‖r‖_2), where d is the dimension of the input space, and ‖r‖_2 is the distance between the data point and the decision boundary (or, equivalently, the norm of the smallest adversarial perturbation). For the considered ImageNet classification task, this quantity is equal to √d ‖r‖_2 ≈ 2 × 10^4 for most data points, which is at least one order of magnitude larger than the norm of the universal perturbation (ξ = 2,000). This substantial difference between random and universal perturbations thereby suggests redundancies in the geometry of the decision boundaries, which we now explore.
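A quick back-of-envelope check of the Θ(√d ‖r‖_2) estimate, assuming the usual 224×224×3 ImageNet input size (an assumption on our part; the paper does not state d explicitly):

```python
import math

d = 224 * 224 * 3        # assumed input dimension for ImageNet networks
sqrt_d = math.sqrt(d)    # approximately 388
# The paper reports sqrt(d) * ||r||_2 ~ 2e4 for most data points, which
# implies a typical minimal adversarial norm ||r||_2 of roughly:
typical_r = 2e4 / sqrt_d
print(round(sqrt_d), round(typical_r, 1))
```

So a typical per-image minimal adversarial perturbation of norm around 50 would be consistent with the reported 2 × 10^4 figure, an order of magnitude above ξ = 2,000.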
For each image x in the validation set, we compute the adversarial perturbation vector r(x) = arg min_r ‖r‖_2 s.t. k(x + r) ≠ k(x). It is easy to see that r(x) is normal to the decision boundary of the classifier (at x + r(x)). The vector r(x) hence captures the local geometry of the decision boundary in the region surrounding the data point x. To quantify the correlation between different regions of the decision boundary of the classifier, we define the matrix

N = [ r(x_1)/‖r(x_1)‖_2  · · ·  r(x_n)/‖r(x_n)‖_2 ]

of normal vectors to the decision boundary in the vicinity of data points in the validation set. For binary linear classifiers, the decision boundary is a hyperplane, and N is of rank 1, as all normal vectors are collinear. To capture more generally the correlations in the decision boundary of complex classifiers, we compute the singular values of the matrix N. The singular values of the matrix N, computed for the CaffeNet architecture, are shown in Fig. 9. We further show in the same figure the singular values obtained when the columns of N are sampled uniformly at random from the unit sphere. Observe that, while the latter singular values have a slow decay, the singular values of N decay quickly, which confirms the existence of large correlations and redundancies in the decision boundary of deep networks. More precisely, this shows the existence of a subspace S of low dimension d′ (with d′ ≪ d) that contains most normal vectors to the decision boundary in regions surrounding natural images. We hypothesize that the existence of universal perturbations fooling most natural images is partly due to the existence of such a low-dimensional subspace that captures the correlations among different regions
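The singular-value comparison above can be reproduced on synthetic data: columns concentrated near a low-dimensional subspace show fast singular-value decay, while columns drawn uniformly from the sphere do not. The sizes and noise scale below are toy values of our choosing, standing in for the ImageNet-scale matrix N.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 500                      # toy sizes, not the paper's dimensions

# Correlated normals: columns near a 10-dimensional subspace plus small
# isotropic noise (a stand-in for the matrix N of normal vectors).
basis = rng.standard_normal((d, 10))
N = basis @ rng.standard_normal((10, n)) + 0.1 * rng.standard_normal((d, n))
N /= np.linalg.norm(N, axis=0)       # unit-norm columns, as in N

# Baseline: columns drawn uniformly from the unit sphere.
R = rng.standard_normal((d, n))
R /= np.linalg.norm(R, axis=0)

s_corr = np.linalg.svd(N, compute_uv=False)
s_rand = np.linalg.svd(R, compute_uv=False)
# Fraction of singular-value mass captured by the top 10 singular values.
print(s_corr[:10].sum() / s_corr.sum(), s_rand[:10].sum() / s_rand.sum())
```

The correlated matrix concentrates nearly all of its singular-value mass in the first few singular values, mirroring the fast decay observed for N in Fig. 9.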
Figure 7: Two connected components of the graph G = (V, E), where the vertices are the set of labels, and directed edges i → j indicate that most images of class i are fooled into class j. (Vertex labels in the two components include great grey owl, platypus, nematode, dowitcher, Arctic fox, leopard, digital clock, fountain, slide rule, space shuttle; and cash machine, pillow, computer keyboard, dining table, envelope, medicine chest, microwave, mosquito net, pencil box, plate rack, quilt, refrigerator, television, tray, wardrobe, window shade.)
Figure 8: Comparison between the fooling rates of different perturbations (universal, random, sum of adversarial perturbations, and ImageNet bias) as a function of the ℓ2 norm of the perturbation.
of the decision boundary. In fact, this subspace “collects” normals to the decision boundary in different regions, and perturbations belonging to this subspace are therefore likely to fool datapoints. To verify this hypothesis, we choose a random vector of norm ξ = 2,000 belonging to the subspace S spanned by the first 100 singular vectors, and compute its fooling ratio on a different set of images (i.e., a set of images that has not been used to compute the SVD). Such a perturbation can fool nearly 38% of these images, thereby showing that a random direction in this well-sought subspace S significantly outperforms random perturbations (we recall that such perturbations can only fool 10% of the data). Fig. 10 illustrates the subspace S that captures the correlations in the decision boundary. It should further be
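Sampling a random perturbation of norm ξ inside the span of the first 100 singular vectors can be sketched as follows (the helper name is ours; U would be the left singular vectors of the normals matrix N, computed elsewhere):

```python
import numpy as np

def random_in_subspace(U, k, xi, rng=None):
    """Sample a random direction in the span of the first k left singular
    vectors U[:, :k] and scale it to l2 norm xi."""
    rng = rng or np.random.default_rng()
    coeffs = rng.standard_normal(k)
    v = U[:, :k] @ coeffs            # random combination of top-k directions
    return xi * v / np.linalg.norm(v)
```

The 38% fooling ratio quoted above corresponds to vectors produced this way with k = 100 and ξ = 2,000, versus 10% for unconstrained random vectors of the same norm.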
Figure 9: Singular values of the matrix N containing normal vectors to the decision boundary, compared with the singular values obtained for columns sampled uniformly at random from the unit sphere.
noted that the existence of this low-dimensional subspace further explains the surprising generalization properties of universal perturbations observed in Fig. 6, where one can build relatively generalizable universal perturbations with only 500 images (less than one image per class).
Unlike the above experiment, the proposed algorithm does not choose a random vector in this subspace, but rather chooses a specific direction in order to maximize the overall fooling rate. This explains the gap between the fooling rates obtained with the random vector strategy in S and Algorithm 1, respectively.
Figure 10: Illustration of the low-dimensional subspace S containing normal vectors to the decision boundary in regions surrounding natural images. For the purpose of this illustration, we super-impose three data points {x_i}_{i=1}^3, and show the adversarial perturbations {r_i}_{i=1}^3 that send the respective datapoints to the decision boundaries {B_i}_{i=1}^3. Note that {r_i}_{i=1}^3 all live in the subspace S.
5. Conclusions

We showed the existence of small universal perturbations that can fool state-of-the-art classifiers on natural images. We proposed an iterative algorithm to generate universal perturbations, and highlighted several properties of such perturbations. In particular, we showed that universal perturbations generalize well across different classification models, resulting in doubly-universal perturbations (image-agnostic, network-agnostic). We further explained the existence of such perturbations with the correlation between different regions of the decision boundary. This provides insights on the geometry of the decision boundaries of deep neural networks, and contributes to a better understanding of such systems. A theoretical analysis of the geometric correlations between different parts of the decision boundary will be the subject of future research.
References

[1] O. Bastani, Y. Ioannou, L. Lampropoulos, D. Vytiniotis, A. Nori, and A. Criminisi. Measuring neural net robustness with constraints. In Neural Information Processing Systems (NIPS), 2016.
[2] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference (BMVC), 2014.
[3] A. Fawzi, O. Fawzi, and P. Frossard. Analysis of classifiers' robustness to adversarial perturbations. CoRR, abs/1502.02590, 2015.
[4] A. Fawzi and P. Frossard. Manitest: Are classifiers really invariant? In British Machine Vision Conference (BMVC), pages 106.1–106.13, 2015.
[5] A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard. Robustness of classifiers: from adversarial to random noise. In Neural Information Processing Systems (NIPS), 2016.
[6] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvari. Learning with a strong adversary. CoRR, abs/1511.03034, 2015.
[9] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM International Conference on Multimedia (MM), pages 675–678, 2014.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), pages 1097–1105, 2012.
[11] Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3361–3368, 2011.
[12] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[13] A. Nguyen, J. Yosinski, and J. Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436, 2015.
[14] E. Rodner, M. Simon, R. Fisher, and J. Denzler. Fine-grained recognition in the noisy wild: Sensitivity analysis of convolutional neural networks approaches. In British Machine Vision Conference (BMVC), 2016.
[15] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
[16] S. Sabour, Y. Cao, F. Faghri, and D. J. Fleet. Adversarial manipulation of deep representations. In International Conference on Learning Representations (ICLR), 2016.
[17] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations (ICLR), 2014.
[18] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[19] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
[20] P. Tabacof and E. Valle. Exploring the space of adversarial images. In IEEE International Joint Conference on Neural Networks (IJCNN), 2016.
[21] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1701–1708, 2014.
A. Appendix
Fig. 11 shows the original images corresponding to the experiment in Fig. 3. Fig. 12 visualizes the graph showing relations between original and perturbed labels (see Section 3 for more details).
Figure 11: Original images. The first two rows are randomly chosen images from the validation set, and the last row of images are personal images taken from a mobile phone camera. Labels (left to right, top to bottom): Bouvier des Flandres, Christmas stocking, Scottish deerhound, ski mask; porcupine, killer whale, European fire salamander, toyshop; fox squirrel, pot, Arabian camel, coffeepot.
Figure 12: Graph representing the relation between original and perturbed labels. Note that "dominant labels" appear systematically. Please zoom in for readability. Isolated nodes are removed from this visualization.