


Towards Robust Neural Networks via Random Self-ensemble

    Xuanqing Liu1, Minhao Cheng1, Huan Zhang2, and Cho-Jui Hsieh1,3

1 Electrical and Computer Science, UC Davis, Davis CA 95616, USA {xqliu, mhcheng}@ucdavis.edu

2 Electrical and Computer Engineering, UC Davis, Davis CA 95616, USA [email protected]

3 Department of Statistics, UC Davis, Davis CA 95616, USA [email protected]

Abstract. Recent studies have revealed the vulnerability of deep neural networks: a small adversarial perturbation that is imperceptible to humans can easily make a well-trained deep neural network misclassify. This makes it unsafe to apply neural networks in security-critical applications. In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) by combining two important concepts: randomness and ensemble. To protect a targeted model, RSE adds random noise layers to the neural network to prevent strong gradient-based attacks, and ensembles the predictions over random noises to stabilize the performance. We show that our algorithm is equivalent to ensembling an infinite number of noisy models f_ϵ without any additional memory overhead, and that the proposed training procedure based on noisy stochastic gradient descent ensures the ensemble model has good predictive capability. Our algorithm significantly outperforms previous defense techniques on real data sets. For instance, on CIFAR-10 with the VGG network (which has 92% accuracy without any attack), under the strong C&W attack within a certain distortion tolerance, the accuracy of the unprotected model drops to less than 10% and the best previous defense technique has 48% accuracy, while our method still has 86% prediction accuracy under the same level of attack. Finally, our method is simple and easy to integrate into any neural network.

    1 Introduction

Deep neural networks have demonstrated their success in many machine learning and computer vision applications, including image classification [14, 7, 35, 9, 34], object recognition [30] and image captioning [38]. Despite having near-perfect prediction performance, recent studies have revealed the vulnerability of deep neural networks to adversarial examples: given a correctly classified image, a carefully designed perturbation to the image can make a well-trained neural network misclassify. Algorithms crafting these adversarial images, called attack algorithms, are designed to minimize the perturbation, thus making adversarial images hard to distinguish from natural images. This leads to security concerns, especially when applying deep neural networks to security-sensitive systems such as self-driving cars and medical imaging.

To make deep neural networks more robust to adversarial attacks, several defense algorithms have been proposed recently [23, 40, 17, 16, 39]. However, recent studies showed that these defense algorithms can only marginally improve the accuracy under adversarial attacks [4, 5].

In this paper, we propose a new defense algorithm: Random Self-Ensemble (RSE). More specifically, we introduce a new "noise layer" that fuses the input vector with randomly generated noise, and we insert this layer before each convolution layer of a deep network. In the training phase, the gradient is still computed by back-propagation, but it is perturbed by random noise when passing through the noise layers. In the inference phase, we perform several forward propagations, each time obtaining different prediction scores due to the noise layers, and then ensemble the results. We show that RSE makes the network more resistant to adversarial attacks, by virtue of the proposed training and testing scheme. Meanwhile, it only slightly affects test accuracy when no attack is performed on natural images. The algorithm is trivial to implement and can be applied to any deep neural network to enhance its robustness.

Intuitively, RSE works well because of two important concepts: ensemble and randomness. It is known that an ensemble of several trained models can improve robustness [29], but it also increases the model size by a factor of k. In contrast, without any additional memory overhead, RSE can construct an infinite number of models f_ϵ, where ϵ is generated randomly, and then ensemble the results to improve robustness. But how can we guarantee that the ensemble of these models achieves good accuracy? After all, if we train the original model without noise and only add noise layers at the inference stage, the algorithm performs poorly. This suggests that adding random noise to a pre-trained network will only degrade the performance. Instead, we show that if the noise layers are taken into account in the training phase, then the training procedure can be considered as minimizing an upper bound of the loss of the model ensemble, and thus our algorithm can achieve good prediction accuracy.

    The contributions of our paper can be summarized below:

– We propose the Random Self-Ensemble (RSE) approach for improving the robustness of deep neural networks. The main idea is to add a "noise layer" before each convolution layer in both the training and prediction phases. The algorithm is equivalent to ensembling an infinite number of random models to defend against attackers.

– We explain why RSE can significantly improve robustness against adversarial attacks and show that adding noise layers is equivalent to training the original network with an extra regularization of the Lipschitz constant.

– RSE significantly outperforms existing defense algorithms in all our experiments. For example, on CIFAR-10 data and the VGG network (which has 92% accuracy without any attack), under the C&W attack the accuracy of the unprotected model drops to less than 10%; the best previous defense technique has 48% accuracy; while RSE still has 86.1% prediction accuracy under the same strength of attack. Moreover, RSE is easy to implement and can be combined with any neural network.

    2 Related Work

Security of deep neural networks has been studied recently. Let us denote the neural network as f(w, x), where w is the model parameters and x is the input image. Given a correctly classified image x_0 (f(w, x_0) = y_0), an attacking algorithm seeks to find a slightly perturbed image x′ such that: (1) the neural network misclassifies this perturbed image; and (2) the distortion ∥x′ − x_0∥ is small so that the perturbation is hard for humans to notice. A defense algorithm is designed to improve the robustness of neural networks against attackers, usually by slightly changing the loss function or training procedure. In the following, we summarize some recent works along this line.

    2.1 White-box attack

In the white-box setting, attackers have all information about the targeted neural network, including the network structure and network weights (denoted by w). Using this information, attackers can compute the gradient with respect to the input data, ∇_x f(w, x), by back-propagation. Note that the gradient is very informative for attackers, since it characterizes the sensitivity of the prediction with respect to the input image.

To craft an adversarial example, [11] proposed the fast gradient sign method (FGSM), where the adversarial example is constructed by

    x′ = x0 − ϵ · sign(∇xf(w, x0)) (1)

with a small ϵ > 0. Based on that, several follow-up works were made to improve the efficiency and availability, such as Rand-FGSM [32] and I-FGSM [17]. Recently, Carlini & Wagner [5] showed that constructing an adversarial example can be formulated as solving the following optimization problem:

x′ = argmin_{x ∈ [0,1]^d}  c · g(x) + ∥x − x_0∥₂²,      (2)

where the first term is the loss function that characterizes the success of the attack and the second term enforces a small distortion. The parameter c > 0 is used to balance these two requirements. Several variants were proposed recently [6, 20], but most of them can be categorized in a similar framework. The C&W attack has been recognized as a strong attacking algorithm for testing defense methods.

For an untargeted attack, where the goal is to find an adversarial example that is close to the original example but yields a different class prediction, the loss function in (2) can be defined as

g(x) = max{ Z(x′)_t − max_{i≠t} Z(x′)_i, −κ },      (3)


where t is the correct label and Z(x) is the network's output before the softmax layer (the logits).

For a targeted attack, the loss function can be designed to force the classifier to return the target label. For attackers, targeted attacks are strictly harder than untargeted attacks (once a targeted attack succeeds, the same adversarial image can be used to perform an untargeted attack without any modification). On the contrary, for defenders, untargeted attacks are strictly harder to defend against than targeted attacks. Therefore, we focus on defending against untargeted attacks in our experiments.
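For concreteness, the untargeted loss (3) and the overall objective (2) can be written in a few lines of NumPy. This is our own illustrative sketch (the function names and the assumption that logits is a 1-D array are ours), not code from the paper:

import numpy as np

def cw_untargeted_loss(logits, true_label, kappa=0.0):
    # Margin loss g(x) of Eq. (3): it bottoms out at -kappa once some other
    # class's logit exceeds the true class's logit by at least kappa.
    z_t = logits[true_label]
    z_other = np.max(np.delete(logits, true_label))
    return max(z_t - z_other, -kappa)

def cw_objective(x, x0, logits, true_label, c=1.0, kappa=0.0):
    # Full objective of Eq. (2): c * g(x) + ||x - x0||_2^2
    return c * cw_untargeted_loss(logits, true_label, kappa) + np.sum((x - x0) ** 2)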

    2.2 Defense Algorithms

Because of the vulnerability of neural networks to adversarial examples [31], several methods have been proposed to improve the network's robustness against them. [24] proposed defensive distillation, which uses a modified softmax layer controlled by a temperature to train a "teacher" network, and then uses the prediction probabilities (soft labels) of the teacher network to train a student network with the same structure as the teacher. However, as stated in [5], this method does not work properly when dealing with the C&W attack. Moreover, [40] showed that by using a modified ReLU activation layer (called BReLU) and adding noise to the original images to augment the training dataset, the learned model gains some stability to adversarial images. Another popular defense approach is adversarial training [17, 16]. It generates adversarial examples found by an attack algorithm and appends them to the training set, which helps the network learn how to distinguish adversarial examples. By combining adversarial training with an enlarged model capacity, [20] was able to create an MNIST model that is robust to first-order attacks, but this approach does not work very well on more difficult datasets such as CIFAR-10.

It is worth mentioning that there are many defense algorithms against white-box attacks in the literature (cf. [3, 19, 13, 8, 36, 27, 25]). Unfortunately, as [2, 1] pointed out, these algorithms are not truly effective against white-box attacks. Recall that "white box" means the attacker knows everything about how the model makes decisions, including any potential defense mechanism. Under this assumption, white-box attacks can circumvent all of the defense algorithms listed above, and the accuracy under attack can still be nearly zero. In addition to methods that change the network structure, there are other methods [39, 21, 10, 12] that "detect" adversarial examples, which are beyond the scope of our paper.

There is another closely related work (cf. [18]) that adopts a very similar idea, except that it views the problem from the angle of differential privacy, while we believe that adversarial robustness is more closely related to regularization and ensemble learning. Furthermore, our work was publicly available on arXiv earlier than this concurrent work.


Fig. 1. Our proposed noisy VGG-style network: we add a noise layer (computing x_out = x_in + ϵ with ϵ ∼ N(0, σ²)) before each convolution layer. For simplicity, we call the noise layer before the first convolution layer the "init-noise", and all other noise layers "inner-noise". For these two kinds of layers we adopt different variances of Gaussian noise. Note that a similar design can be transplanted to other architectures such as ResNet.

    3 Proposed Algorithm: Random Self-Ensemble

In this section, we propose our self-ensemble algorithm to improve the robustness of neural networks. We first motivate and introduce our algorithm, and then discuss several theoretical reasons behind it.

It is known that an ensemble of several different models can improve robustness. However, an ensemble of a finite number k of models is not very practical because it increases the model size by a factor of k. For example, an AlexNet model on ImageNet requires 240MB of storage, and storing 100 of them would require 24GB of memory. Moreover, it is hard to find many heterogeneous models with similar accuracy. To improve the robustness of practical systems, we propose the following self-ensemble algorithm that can generate an infinite number of models on-the-fly without any additional memory cost.

Our main idea is to add randomness into the network structure. More specifically, we introduce a new "noise layer" that fuses the input vector with randomly generated noise, i.e. x → x + ϵ when passing through the noise layer. We then add this layer before each convolution layer, as shown in Fig. 1. Since most attacks require computing or estimating the gradient, the noise level in our model controls the success rate of those attacking algorithms. In fact, we can integrate this layer into any other neural network.
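The paper does not include reference code, but a minimal PyTorch-style sketch of such a noise layer, and of placing it before each convolution as in Fig. 1, might look as follows (the class name and channel sizes are illustrative; the σ values 0.2/0.1 follow Tab. 1):

import torch
import torch.nn as nn

class NoiseLayer(nn.Module):
    # Computes x -> x + eps with eps ~ N(0, sigma^2), drawn freshly on every
    # forward pass. Unlike dropout, the noise is kept at inference time too.
    def __init__(self, sigma):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return x + self.sigma * torch.randn_like(x)

# One VGG-style block with an "init-noise" and an "inner-noise" layer (Fig. 1)
block = nn.Sequential(
    NoiseLayer(0.2), nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    NoiseLayer(0.1), nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)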

If we denote the original neural network as f(w, x), where w ∈ R^{d_w} is the weights and x ∈ R^{d_x} is the input image, then with the random noise layers the network can be denoted as f_ϵ(w, x) with random ϵ ∈ R^{d_e}. Therefore, we have an infinite number of models in the pocket (with different ϵ) without any additional memory overhead.


Algorithm 1 Training and Testing of Random Self-Ensemble (RSE)

Training phase:
for iter = 1, 2, . . . do
    Randomly sample (x_i, y_i) from the dataset
    Randomly generate ϵ ∼ N(0, σ²) for each noise layer
    Compute Δw = ∇_w ℓ(f_ϵ(w, x_i), y_i)    (noisy gradient)
    Update weights: w ← w − Δw
end for

Testing phase:
Given a testing image x, initialize p = (0, 0, . . . , 0)
for j = 1, 2, . . . , #Ensemble do
    Randomly generate ϵ ∼ N(0, σ²) for each noise layer
    Forward propagate to compute the probability output p_j = f_ϵ(w, x)
    Update p: p ← p + p_j
end for
Predict the class with the maximum score: ŷ = argmax_k p_k
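Read as standard SGD on the noisy network, the training phase of Algorithm 1 reduces to an ordinary training loop. The following PyTorch-style sketch is ours, not the authors' code (a learning rate is added explicitly, and model is assumed to contain noise layers as sketched above):

import torch
import torch.nn.functional as F

def train_rse(model, loader, epochs=200, lr=0.1):
    # Fresh noise is drawn inside every NoiseLayer on each forward pass,
    # so each gradient is the "noisy gradient" of Algorithm 1.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            opt.step()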

However, adding randomness also affects the prediction accuracy of the model. How can we make sure that the ensemble of these random models has enough accuracy?

A critical observation is that we need to add this random layer in both the training and testing phases. The training and testing algorithms are listed in Algorithm 1. In the training phase, the gradient ∇_w ℓ(f_ϵ(w, x_i), y_i) is computed through the noise layers, and the noise is generated randomly for each stochastic gradient descent update. In the testing phase, we construct n random noises and ensemble their probability outputs by

p = Σ_{j=1}^{n} f_{ϵ_j}(w, x),   and predict   ŷ = argmax_k p_k.      (4)

If we do not care about the prediction time, n can be very large, but in practice we find that the accuracy saturates at n ≈ 10 (see Fig. 4).
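A minimal sketch of the self-ensemble prediction in (4), again ours rather than the authors' code (the helper name and the default n = 10 are assumptions based on the observation above):

import torch
import torch.nn.functional as F

@torch.no_grad()
def rse_predict(model, x, n=10):
    # The noise layers stay active at test time; eval() here only affects
    # batch-norm statistics. Average n noisy forward passes as in Eq. (4).
    model.eval()
    p = sum(F.softmax(model(x), dim=1) for _ in range(n))
    return p.argmax(dim=1)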

This approach is different from the Gaussian data augmentation in [40]: they only add Gaussian noise to images during training, while we add noise before each convolution layer at both training and inference time. During training, the noise helps the optimization algorithm find stable convolution filters that are robust to perturbed inputs. During testing, the noise plays two roles: it perturbs the gradient to fool gradient-based attacks, and it yields different outputs across multiple forward passes, so that a simple ensemble can improve the testing accuracy.


    3.1 Mathematical explanations

Training and testing of RSE. Here we explain our training and testing procedure. In the training phase, our algorithm solves the following optimization problem:

w* = argmin_w  (1 / |D_train|) Σ_{(x_i, y_i) ∈ D_train} E_{ϵ∼N(0,σ²)} ℓ(f_ϵ(w, x_i), y_i),      (5)

where ℓ(·, ·) is the loss function and D_train is the training dataset. Note that for simplicity we assume ϵ follows a zero-mean Gaussian, but in general our algorithm works for a large variety of noise distributions, such as Bernoulli-Gaussian: ϵ_i = b_i e_i, where e_i ~iid N(0, σ²) and b_i ~iid B(1, p).

At testing time, we ensemble the outputs of several forward propagations; specifically:

ŷ_i = argmax E_{ϵ∼N(0,σ²)} f_ϵ(w, x_i).      (6)

Here argmax returns the index of the maximum element of a vector. The reason that our RSE algorithm achieves prediction accuracy similar to the original network is that (5) minimizes an upper bound of the loss of (6). Similar to the idea of [22], if we choose the negative log-likelihood loss, then for all w ∈ R^{d_w}:

(1 / |D_train|) Σ_{(x_i, y_i) ∈ D_train} E_{ϵ∼N(0,σ²)} ℓ(f_ϵ(w, x_i), y_i)
  (a)≈ E_{(x_i, y_i)∼P_data} { −E_{ϵ∼N(0,σ²)} log f_ϵ(w, x_i)[y_i] }
  (b)≥ E_{(x_i, y_i)∼P_data} { −log E_{ϵ∼N(0,σ²)} f_ϵ(w, x_i)[y_i] }
  (c)≥ E_{(x_i, y_i)∼P_data} { −log E_{ϵ∼N(0,σ²)} f_ϵ(w, x_i)[ŷ_i] }
  (a)≈ (1 / |D_test|) Σ_{x_i ∈ D_test} −log E_{ϵ∼N(0,σ²)} f_ϵ(w, x_i)[ŷ_i].      (7)

where P_data is the data distribution and D_train/D_test are the training and test sets, respectively. Step (a) follows from a generalization bound (see [28] or the appendix for details), (b) comes from Jensen's inequality, and (c) follows from the inference rule (6). So by minimizing (5) we are actually minimizing an upper bound of the inference confidence term −log E_{ϵ∼N(0,σ²)} f_ϵ(w, x_i)[ŷ_i]; this validates our ensemble inference procedure.

RSE is equivalent to Lipschitz regularization. Another point of view is that perturbed training is equivalent to Lipschitz regularization, which further helps to defend against gradient-based attacks. If we fix the output label y, then the loss function ℓ(f_ϵ(w, x), y) can be simply denoted as ℓ ∘ f_ϵ. A Lipschitz constant of the function ℓ ∘ f_ϵ is a constant L_{ℓ∘f_ϵ} such that

|ℓ(f_ϵ(w, x), y) − ℓ(f_ϵ(w, x̃), y)| ≤ L_{ℓ∘f_ϵ} ∥x − x̃∥      (8)


for all x, x̃. In fact, it has been proven recently that the Lipschitz constant can be used to measure the robustness of a machine learning model [15, 33]. If L_{ℓ∘f_ϵ} is large enough, even a tiny change of the input x − x̃ can significantly change the loss and eventually lead to an incorrect prediction. On the contrary, by controlling L_{ℓ∘f} to be small, we obtain a more robust network.

    Next we show that our noisy network indeed controls the Lipschitz constant.Following the notation of (5), we can see that

E_{ϵ∼N(0,σ²)} ℓ(f_ϵ(w, x_i), y_i)
  (a)≈ E_{ϵ∼N(0,σ²)} [ ℓ(f_0(w, x_i), y_i) + ϵ⊺ ∇_ϵ ℓ(f_0(w, x_i), y_i) + ½ ϵ⊺ ∇²_ϵ ℓ(f_0(w, x_i), y_i) ϵ ]
  (b)= ℓ(f_0(w, x_i), y_i) + (σ²/2) Tr{ ∇²_ϵ ℓ(f_0(w, x_i), y_i) }.      (9)

For (a), we perform a Taylor expansion at ϵ = 0. Since we set the variance σ² of the noise very small, we only keep terms up to second order. For (b), we note that the Gaussian vector ϵ is i.i.d. with zero mean, so the linear term in ϵ has zero expectation, and the quadratic term depends directly on the variance of the noise and the trace of the Hessian. As a convex relaxation, if we assume ℓ ∘ f_0 is convex, then using d · ∥A∥_max ≥ Tr(A) ≥ ∥A∥_max for A ∈ S^{d×d}_+, we can rewrite (9) as

Loss(f_ϵ, {x_i}, {y_i}) ≃ Loss(f_0, {x_i}, {y_i}) + (σ²/2) L_{ℓ∘f_0},      (10)

which means that training the noisy network is equivalent to training the original model with an extra regularization of the Lipschitz constant, and by controlling the variance of the noise we can balance the robustness of the network against the training loss.
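As a quick numerical illustration of steps (a) and (b) in (9) (our example, not from the paper): for a toy quadratic loss g the identity E_ϵ[g(x + ϵ)] = g(x) + (σ²/2)·Tr(∇²g) holds exactly, and a Monte Carlo estimate reproduces it:

import numpy as np

rng = np.random.default_rng(0)
d, sigma = 5, 0.1
A = rng.standard_normal((d, d))
A = A @ A.T                               # positive semi-definite Hessian
g = lambda x: 0.5 * x @ A @ x             # toy loss with constant Hessian A
x0 = rng.standard_normal(d)

X = x0 + sigma * rng.standard_normal((200_000, d))   # samples of x0 + eps
monte_carlo = np.mean(0.5 * np.sum((X @ A) * X, axis=1))
second_order = g(x0) + 0.5 * sigma**2 * np.trace(A)
print(monte_carlo, second_order)          # the two values agree closely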

    3.2 Discussions

Here we show that both randomness and ensemble are important in our algorithm. Indeed, if we remove either component, the performance drops significantly.

First, as mentioned before, the main idea of our model is to have an infinite number of models f_ϵ, each with a different ϵ value, and then ensemble their results. A naive way to achieve this goal is to fix a pre-trained model f_0 and then generate many f_ϵ in the testing phase by adding different small noises to f_0. However, Fig. 2 shows that this approach (denoted as "Test noise only") results in much worse performance (20% accuracy without any attack). Therefore, it is non-trivial to guarantee that the model remains good after adding small random noise. In our random self-ensemble algorithm, in addition to adding noise in the testing phase, we also add noise layers in the training phase, and this is important for obtaining good performance.

Second, we find that adding noise in the testing phase and then ensembling the predictions is important. In Fig. 2, we compare the performance of RSE with a version that only adds the noise layers in the training phase but not in the testing phase (so the prediction is f_ϵ(w, x) instead of E_ϵ f_ϵ(w, x)). The results clearly show that the performance of this variant drops under smaller attacks. This shows that the ensemble in the testing phase is crucial.

Fig. 2. We test three models on CIFAR10 with the VGG16 network: in the first model noise is added at both training and testing time, in the second model noise is added only at training time, and in the last model noise is added only at testing time. As a comparison we also plot the baseline model, which is trained conventionally. For all models that are noisy at testing time, we automatically enable self-ensemble.

    4 Experiments

Datasets and network structure. We test our method on two datasets: CIFAR10 and STL10. We do not compare results on MNIST since it is a much easier dataset, and existing defense methods such as [23, 40, 17, 16] can effectively increase the image distortion under adversarial attacks. On the CIFAR10 data, we evaluate the performance on both VGG-16 [26] and ResNeXt [37]; on the STL10 data we copy and slightly modify a simple model⁴, which we name "Model A".

Defense algorithms. We include the following defense algorithms in the comparison (their parameter settings can be found in Tab. 1):

– Random Self-Ensemble (RSE): our proposed method.

– Defensive distillation [24]: first train a teacher network at temperature T, then use the teacher network to train a student network of the same architecture and the same temperature. The student network is called the distilled network.

– Robust optimization combined with BReLU activation [40]: we first replace all ReLU activations with BReLU activations, and then at the training phase we randomly perturb the training data with Gaussian noise with σ = 0.05, as suggested.

    4 Publicly available at https://github.com/aaron-xichen/pytorch-playground


– Adversarial retraining by FGSM attacks [17, 16]: we first pre-train a neural network without adversarial retraining. After that, at each step we select either an original data batch or an adversarial data batch with probability 1/2, and we continue training until convergence (a sketch is given below).
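A compressed sketch of this adversarial-retraining baseline (our code, not the authors'; the ϵ ∼ U(0.1, 0.3) range follows Tab. 1):

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # One-step FGSM adversary: x' = x + eps * sign(grad_x loss)
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def adv_retrain_step(model, opt, x, y):
    # With probability 1/2, replace the clean batch by an FGSM batch.
    if torch.rand(1).item() < 0.5:
        eps = 0.1 + 0.2 * torch.rand(1).item()   # eps ~ U(0.1, 0.3), as in Tab. 1
        x = fgsm(model, x, y, eps)
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()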

Attack models. We consider the white-box setting and choose the state-of-the-art C&W attack [5] to evaluate the above-mentioned defense methods. Moreover, we test our algorithm under untargeted attacks, since untargeted attacks are strictly harder to defend against than targeted attacks. In fact, the untargeted C&W attack is the most challenging attack for a defense algorithm.

Moreover, we assume the C&W attack knows the randomization procedure of RSE, so the C&W objective function changes accordingly (as proposed in [1] for attacking an ensemble model). The details can be found in the appendix.

Measure. Unlike attack models, which only need to operate on correctly classified images, a competitive defense model not only protects the model when attackers exist, but also keeps good performance on clean datasets. Based on this consideration, we compare the accuracy of guarded models under different strengths of the C&W attack; the strength can be measured by the L2 norm of the image distortion and is further controlled by the parameter c in (2). Note that an adversarial image is counted as correctly predicted under the C&W attack if and only if the original image is correctly classified and the C&W attack cannot find an adversarial example within a certain distortion level.
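In other words, the reported robust accuracy counts an image as correct only if it is both classified correctly and not successfully attacked within the distortion budget; a tiny sketch of this bookkeeping (variable names are ours) is:

def counted_correct(clean_pred, y_true, attack_succeeded, distortion, budget):
    # Correct under attack iff the clean image is classified correctly and the
    # C&W attack finds no adversarial example within the distortion budget.
    return clean_pred == y_true and not (attack_succeeded and distortion <= budget)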

Fig. 3. Left: the effect of the noise level on robustness and generalization ability. Clearly, random noise can improve the robustness of the model. Right: comparing RSE with the adversarial defense method of [20].

4.1 The effect of noise level

We first test the performance of RSE under different noise levels. We use Gaussian noise for all the noise layers in our network, and the standard deviation σ of the Gaussian controls the noise level.


Fig. 4. Left: comparing the accuracy under different levels of attack for the VGG16+CIFAR10 combination; the ensemble model achieves better accuracy under weak attacks. Right: testing accuracy (without attack) for different n (the number of random models used for the ensemble).

Table 1. Experiment settings for defense methods

Methods                      Settings
No defense                   Baseline model
RSE (CIFAR10 + VGG16)        Initial noise 0.2, inner noise 0.1, 50-ensemble
RSE (CIFAR10 + ResNeXt)      Initial noise 0.1, inner noise 0.1, 50-ensemble
RSE (STL10 + Model A)        Initial noise 0.2, inner noise 0.1, 50-ensemble
Defensive distillation       Temperature = 40
Adversarial training (I)     FGSM adversarial examples, ϵ ∼ U(0.1, 0.3)
Adversarial training (II)    Following [20], PGD adversary with ϵ∞ = 8.0/256
Robust Opt. + BReLU          Following [40]

Note that we call the noise layer before the first convolution layer the "init-noise", and all other noise layers the "inner-noise".

In this experiment, we apply different noise levels in both the training and testing phases to see how different variances change the robustness as well as the generalization ability of the networks. As an example, we choose

    (σinit, σinner) = {(0, 0), (0.05, 0.02), (0.1, 0.05), (0.2, 0.1)} (11)

on VGG16+CIFAR10. The result is shown in Fig. 3 (left). As we can see, both "init-noise" and "inner-noise" are beneficial to the robustness of the neural network, but at the same time, higher noise reduces the accuracy under weak attacks (c ≲ 0.01). From Fig. 3, we observe that if the input image is normalized to [0, 1], then choosing σ_init = 0.2 and σ_inner = 0.1 works well. Thus we fix these parameters for all the experiments.


Table 2. Prediction accuracy of defense methods under C&W attack with different c. We can clearly observe that RSE is the most robust model. Our accuracy level remains above 75% when other methods are below 30%.

Method                 c = 0.01   c = 0.03   c = 0.06   c = 0.1   c = 0.2
RSE (ours)             90.00%     86.06%     79.44%     67.19%    34.75%
Adv. retraining        27.00%      9.81%      4.13%      3.69%     1.44%
Robust Opt + BReLU     75.06%     47.93%     30.94%     20.69%    13.50%
Distillation           49.88%     17.69%      4.56%      3.13%     1.44%
No defense             30.38%      8.93%      5.06%      3.56%     2.19%

Fig. 5. Comparing the accuracy of CIFAR10+{VGG16, ResNeXt} and STL10+Model A. We show both the change of accuracy and the average distortion w.r.t. the attack strength parameter c (the parameter in the C&W attack). Our model (RSE) clearly outperforms all existing methods under strong attacks, in both accuracy and average distortion.

    4.2 Self-ensemble

Next we show that self-ensemble helps to improve the test accuracy of our noisy model. As an example, we choose the VGG16+CIFAR10 combination; the standard deviation of the initial noise layer is σ = 0.2 and of the other noise layers is σ = 0.1. We compare the 50-ensemble with the 1-ensemble (i.e., a single model), and the result can be found in Fig. 4.

We find that the 50-ensemble method outperforms the 1-ensemble method by ∼8% accuracy when c < 0.4. This is because when the attack is weak enough, the majority vote of the networks has lower variance and higher accuracy. On the other hand, we can see that if c > 1.0, or equivalently the average distortion is greater than 0.93, the ensemble model is worse. We conjecture that this is because when the attack is strong enough, the majority of random sub-models make a wrong prediction, while for any individual model the random effect might be superior to the group decision.


Fig. 6. Targeted adversarial image distortions; each column indicates a defense algorithm and each row an adversarial target (the original image is from the "ship" class, shown on the right side). Here we choose c = 1 for the targeted C&W attack. Visually, colored spots indicate the distortion of the images, so a successful defense method should lead to more spots.

Table 3. Image distortion required for targeted attacks.

Method                 bird    car    cat    deer   dog    frog   horse  plane  truck
No defense             1.94    0.31   0.74   4.72   7.99   3.66   9.22   0.75   1.32
Defensive distill      6.55    0.70  13.78   2.54  13.90   2.56  11.36   0.66   3.54
Adv. retraining        2.58    0.31   0.75   6.08   0.75   9.01   6.06   0.31   4.08
Robust Opt. + BReLU   17.11    1.02   4.07  13.50   7.09  15.34   7.15   2.08  17.57
RSE (ours)            12.87    2.61  12.47  21.47  31.90  19.09   9.45  10.21  22.15

In this situation, self-ensemble may have a negative effect on accuracy.

Practically, if running time is the primary concern, it is not necessary to calculate many ensemble models. In fact, we find that the accuracy saturates rapidly with respect to the number of models; moreover, if we inject smaller noise, then the ensemble benefit is weaker and the accuracy saturates earlier. Therefore, we find that a 10-ensemble is good enough for testing accuracy; see Fig. 4.

    4.3 Comparing defense methods

Finally, we compare our RSE method with other existing defense algorithms. Note that we test all of them using the untargeted C&W attack, which is the most difficult setting for defenders.


The comparison across different datasets and networks can be found in Tab. 2 and Fig. 5. As we can see, previous defense methods have little effect against C&W attacks. For example, Robust Opt+BReLU [40] is useful for CIFAR10+ResNeXt, but its accuracy is even worse than the undefended model for STL10+Model A. In contrast, our RSE method acts as a good defense across all cases. Specifically, RSE forces the attacker to find much more distorted adversarial images in order to launch a successful attack. As shown in Fig. 5, when we allow an average distortion of 0.21 on CIFAR10+VGG16, the C&W attack is able to conduct untargeted attacks with a success rate > 99%. On the contrary, when the network is defended via RSE, the C&W attack only yields a success rate of ∼20%. Recently, another version of adversarial training was proposed [20]. Different from "Adversarial training (I)" shown in Tab. 1, it trains the network with adversaries generated by multiple steps of gradient descent (therefore we call it "Adversarial training (II)" in Tab. 1). Compared with our method, its major weakness is that it takes ∼10 times longer to train a robust network, even though the result is only slightly better than our RSE; see Fig. 3 (right).

Apart from the accuracy under the C&W attack, we find that the distortion of the adversarial images also increases significantly; this can be seen in Fig. 5 (2nd row): once c is large enough (so that all defense algorithms no longer work), our RSE method achieves the largest distortion.

Although all the above experiments concern untargeted attacks, this does not mean targeted attacks are not covered; as noted above, a targeted attack is harder for the attacker and easier to defend against. As an example, we test all the defense algorithms on the CIFAR-10 dataset under targeted attacks. We randomly pick an image from CIFAR10 and plot the perturbation x_adv − x in Fig. 6 (the exact numbers are in Tab. 3); to make it easier to print, we subtract the RGB channels from 255 (so the majority of pixels are white and distortions can be noticed). One can easily see that the RSE method makes the adversarial images more distorted.

Lastly, apart from CIFAR-10, we also design an experiment on a much larger dataset to support the effectiveness of our method at larger scale. Due to the space limit, the result is deferred to the appendix.

5 Conclusion

In this paper, we propose a new defense algorithm called Random Self-Ensemble (RSE) to improve the robustness of deep neural networks against adversarial attacks. We show that our algorithm is equivalent to ensembling a huge number of noisy models together, and our proposed training process ensures that the ensemble model can generalize well. We further show that the algorithm is equivalent to adding a Lipschitz regularization and thus can improve the robustness of neural networks. Experimental results demonstrate that our method is very robust against strong white-box attacks. Moreover, our method is simple, easy to implement, and can be easily embedded into an existing network.

    Acknowledgement. The authors acknowledge the support of NSF via IIS-1719097 and the computing resources provided by Google cloud and Nvidia.


    References

1. Athalye, A., Carlini, N.: On the robustness of the CVPR 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286 (2018)

2. Athalye, A., Carlini, N., Wagner, D.: Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 35th International Conference on Machine Learning (ICML) (2018)

3. Buckman, J., Roy, A., Raffel, C., Goodfellow, I.: Thermometer encoding: One hot way to resist adversarial examples. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=S18Su--CW

4. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: Bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. pp. 3–14. AISec '17, ACM, New York, NY, USA (2017). https://doi.org/10.1145/3128572.3140444

5. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: Security and Privacy (SP), 2017 IEEE Symposium on. pp. 39–57. IEEE (2017)

6. Chen, P.Y., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.J.: EAD: Elastic-net attacks to deep neural networks via adversarial examples. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (2018)

7. Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., Le, Q.V., et al.: Large scale distributed deep networks. In: Advances in Neural Information Processing Systems. pp. 1223–1231 (2012)

8. Dhillon, G.S., Azizzadenesheli, K., Bernstein, J.D., Kossaifi, J., Khanna, A., Lipton, Z.C., Anandkumar, A.: Stochastic activation pruning for robust adversarial defense. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=H1uR4GZRZ

9. Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., Song, D.: Robust physical-world attacks on deep learning visual classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1625–1634 (2018)

10. Feinman, R., Curtin, R.R., Shintre, S., Gardner, A.B.: Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 (2017)

11. Goodfellow, I., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015), http://arxiv.org/abs/1412.6572

12. Grosse, K., Manoharan, P., Papernot, N., Backes, M., McDaniel, P.: On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280 (2017)

13. Guo, C., Rana, M., Cisse, M., van der Maaten, L.: Countering adversarial images using input transformations. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=SyJ7ClWCb

14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)

15. Hein, M., Andriushchenko, M.: Formal guarantees on the robustness of a classifier against adversarial manipulation. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. pp. 2263–2273 (2017)

16. Huang, R., Xu, B., Schuurmans, D., Szepesvári, C.: Learning with a strong adversary. arXiv preprint arXiv:1511.03034 (2015)


17. Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. In: International Conference on Learning Representations (ICLR) (2017)

18. Lecuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., Jana, S.: Certified robustness to adversarial examples with differential privacy. arXiv e-prints (Feb 2018)

19. Ma, X., Li, B., Wang, Y., Erfani, S.M., Wijewickrema, S., Schoenebeck, G., Houle, M.E., Song, D., Bailey, J.: Characterizing adversarial subspaces using local intrinsic dimensionality. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=B1gJ1L2aW

20. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. 6th International Conference on Learning Representations (ICLR) (2018)

21. Meng, D., Chen, H.: Magnet: A two-pronged defense against adversarial examples. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 135–147. CCS '17, ACM, New York, NY, USA (2017). https://doi.org/10.1145/3133956.3134057

22. Noh, H., You, T., Mun, J., Han, B.: Regularizing deep neural networks by noise: Its interpretation and optimization. In: Advances in Neural Information Processing Systems. pp. 5115–5124 (2017)

23. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against deep learning systems using adversarial examples. arXiv preprint arXiv:1602.02697 (2016)

24. Papernot, N., McDaniel, P., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. In: Security and Privacy (SP), 2016 IEEE Symposium on. pp. 582–597. IEEE (2016)

25. Samangouei, P., Kabkab, M., Chellappa, R.: Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=BkJ3ibb0-

26. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)

27. Song, Y., Kim, T., Nowozin, S., Ermon, S., Kushman, N.: Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=rJUYGxbCW

28. Steinhardt, J., Koh, P.W.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems. pp. 3520–3532 (2017)

29. Strauss, T., Hanselmann, M., Junginger, A., Ulmer, H.: Ensemble methods as a defense to adversarial perturbations against deep neural networks. arXiv preprint arXiv:1709.03423 (2017)

30. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–9 (2015)

31. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: International Conference on Learning Representations (2014)

32. Tramèr, F., Kurakin, A., Papernot, N., Boneh, D., McDaniel, P.: Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)


33. Weng, T.W., Zhang, H., Chen, P.Y., Yi, J., Su, D., Gao, Y., Hsieh, C.J., Daniel, L.: Evaluating the robustness of neural networks: An extreme value theory approach. 6th International Conference on Learning Representations (ICLR) (2018)

34. Xiao, C., Li, B., Zhu, J.Y., He, W., Liu, M., Song, D.: Generating adversarial examples with adversarial networks. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18. pp. 3905–3911. International Joint Conferences on Artificial Intelligence Organization (7 2018). https://doi.org/10.24963/ijcai.2018/543

35. Xiao, C., Zhu, J.Y., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. arXiv preprint arXiv:1801.02612 (2018)

36. Xie, C., Wang, J., Zhang, Z., Ren, Z., Yuille, A.: Mitigating adversarial effects through randomization. In: International Conference on Learning Representations (2018), https://openreview.net/forum?id=Sk9yuql0Z

37. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. pp. 5987–5995. IEEE (2017)

38. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning. pp. 2048–2057 (2015)

39. Xu, W., Evans, D., Qi, Y.: Feature squeezing: Detecting adversarial examples in deep neural networks. Network and Distributed System Security Symposium (2018)

40. Zantedeschi, V., Nicolae, M.I., Rawat, A.: Efficient defenses against adversarial attacks. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. pp. 39–49. ACM (2017)