Trust Region Based Adversarial Attack on Neural Networks
Zhewei Yao1 Amir Gholami1 Peng Xu2 Kurt Keutzer1
Michael W. Mahoney1
1University of California, Berkeley
2Stanford University
IEEE Conference on Computer Vision and Pattern Recognition 2019
Presented by Calvin Yong (UCF) Trust Region Attack CVPR 2019 1 / 51
Table of Contents
1 Introduction
2 Background
3 Contributions
4 Trust Region Optimization
5 Proposed Method
6 Results
7 Conclusion
Introduction
Neural networks are vulnerable to adversarial examples: inputs that are crafted to fool the network.
The perturbations are usually imperceptible, can cause a significant decrease in accuracy, and can transfer to other networks that the attacker has not seen.
Figure: An attack on an image using the FGSM method
Explaining and harnessing adversarial examples (Goodfellow et al.)
One Pixel Attack
One Pixel Attack for Fooling Deep Neural Networks (Su et al.)
3D adversarial objects
Synthesizing Robust Adversarial Examples (Athalye et al.)
Physical attacks on traffic signs
Robust Physical-World Attacks on Deep Learning Visual Classification (Eykholt et al.)
Adversarial Patches to Attack Person Detection
Adversarial Patches to Attack Person Detection (Thys et al.)
Semantic Segmentation and Object Detection
Adversarial Examples for Semantic Segmentation and Object Detection (Xie et al.)
LIDAR attack
Adversarial Objects Against LiDAR-Based Autonomous Driving Systems (Cao et al.)
Definitions
Untargeted attacks change the model's classification to any wrong label. Targeted attacks change the model's classification to a specific class.
One-shot (one-step) attacks require a single computational step to generate an adversarial example. Methods that require an iterative loop are called iterative attacks.
White-box attacks need complete information about the target network (network architecture, gradients, parameters, etc.), while black-box attacks do not need such information.
Problem Formulation
Szegedy et al. first formalized the problem of finding adversarial examples as the following optimization problem
$$\min_r \|r\|_2 \quad \text{subject to} \quad f(x + r) = l, \quad x + r \in [0, 1]^m$$
where $f$ is the classifier and $l$ is the target label.
However, this is often computationally infeasible to solve exactly. A common approach is to approximate it.
L-BFGS
Szegedy et al. solved the following alternative optimization problem using box-constrained L-BFGS
$$\min_r \; c\,|r| + \mathrm{loss}_f(x + r, l) \quad \text{subject to} \quad x + r \in [0, 1]^m$$
Intriguing properties of neural networks (Szegedy et al.)
Fast Gradient Sign Method
Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM), a single-step L∞ attack that requires only one backpropagation call. The adversarial example is given by
$$x^{adv} = x + \epsilon \, \mathrm{sign}(\nabla_x J(x, y))$$
where $J$ is the loss function. The adversarial example is then clipped to a specified range.
Explaining and harnessing adversarial examples (Goodfellow et al.)
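The FGSM update can be illustrated on a toy model. The sketch below assumes a logistic-regression classifier with hypothetical weights `w` and bias `b` (not from the slides), for which the input gradient of the cross-entropy loss has the closed form $(p - y)\,w$. It is an illustration of the formula, not the setup used by Goodfellow et al.

```python
import math

def logit(x, w, b):
    """Score w.x + b of the toy logistic-regression model (hypothetical)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss_grad_x(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps, lo=0.0, hi=1.0):
    """One FGSM step: x + eps * sign(grad), clipped to [lo, hi]."""
    g = loss_grad_x(x, y, w, b)
    return [min(hi, max(lo, xi + eps * sign(gi))) for xi, gi in zip(x, g)]

# Hypothetical toy weights and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.5
x, y = [0.8, 0.2, 0.5], 1
x_adv = fgsm(x, y, w, b, eps=0.1)
```

Every coordinate moves by exactly `eps` (before clipping), which is why FGSM is naturally an L∞ attack.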
Basic Iterative Method
Kurakin et al. extended the FGSM into an iterative method.
$$x^{adv}_0 = x$$
$$x^{adv}_{n+1} = \mathrm{clip}\big(x^{adv}_n + \alpha \, \mathrm{sign}(\nabla_x J(x^{adv}_n, y))\big)$$
Often more effective than FGSM
Adversarial examples in the physical world (Kurakin et al.)
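A minimal sketch of the iterative update, under stated assumptions: `clip` is taken here to mean projection onto both the valid pixel range and an `eps`-ball around the original input, and the toy logistic model with weights `w`, `b` is hypothetical and serves only to supply a gradient.

```python
import math

def bim(x, y, grad_fn, eps, alpha, steps, lo=0.0, hi=1.0):
    """Repeat small signed-gradient steps of size alpha, clipping after each one."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv, y)
        stepped = [xa + alpha * ((gi > 0) - (gi < 0)) for xa, gi in zip(x_adv, g)]
        # Assumed clip: stay within eps of the original x and inside [lo, hi].
        x_adv = [min(hi, xi + eps, max(lo, xi - eps, s)) for s, xi in zip(stepped, x)]
    return x_adv

# Hypothetical toy logistic-regression model used only to provide a gradient.
w, b = [2.0, -3.0, 1.0], 0.5

def grad_fn(x, y):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-s))
    return [(p - y) * wi for wi in w]

x = [0.8, 0.2, 0.5]
x_adv = bim(x, 1, grad_fn, eps=0.1, alpha=0.04, steps=5)
```

Note that `alpha` is fixed across iterations here; nothing in this loop adapts the step size to how well the last step worked.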
DeepFool
Fast iterative untargeted attack that produces adversarial examples by linearizing the network to an affine multiclass classifier.
DeepFool: a simple and accurate method to fool deep neural networks (Moosavi-Dezfooli et al.)
Carlini-Wagner Attack
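Strong iterative targeted L2 attack that uses gradient descent to minimize
$$\|\delta\|_2 + c \, f(x + \delta)$$
where
$$\delta_i = \tfrac{1}{2}(\tanh(w_i) + 1) - x_i$$
$$f(x) = \max\big(\max_{i \neq t} \{Z(x)_i\} - Z(x)_t, \; -\kappa\big)$$
Here $Z(x)$ denotes the logits, $t$ is the target class, and $\kappa \ge 0$ is a confidence parameter.
Has been shown to beat defensive distillation, which was believed to be a robust defense against adversarial examples.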
Very sensitive to hyper-parameter tuning.
Towards Evaluating the Robustness of Neural Networks (Carlini and Wagner)
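The two ingredients of this objective can be sketched in a few lines. The function names `to_box` and `cw_margin` are hypothetical; the code only evaluates the change of variables and the margin term $f$ on given logits, and does not run the full optimization.

```python
import math

def to_box(w):
    """Map an unconstrained vector w to [0, 1] via 0.5 * (tanh(w) + 1)."""
    return [0.5 * (math.tanh(wi) + 1.0) for wi in w]

def cw_margin(logits, t, kappa=0.0):
    """Margin term f: largest non-target logit minus the target logit, floored at -kappa."""
    other = max(z for i, z in enumerate(logits) if i != t)
    return max(other - logits[t], -kappa)

# Target class 0 is not yet the top class: positive margin.
m_fail = cw_margin([1.0, 3.0, 2.0], t=0)
# Target class 0 wins by more than kappa: the term saturates at -kappa.
m_ok = cw_margin([5.0, 1.0, 2.0], t=0, kappa=1.0)
pixels = to_box([-100.0, 0.0, 100.0])
```

Because `to_box` always returns values in $[0, 1]$, the box constraint is satisfied by construction and plain gradient descent on `w` can be used.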
Problems and Challenges
Understanding adversarial examples
  - Why are DNNs brittle?
Stronger attacks
  - Physical attacks
  - Black-box attacks
Defense
  - Denoising, affine transformations
  - Training a classifier to predict clean and adversarial examples
  - Training / fine-tuning on adversarial examples (adversarial training)
Detection
  - Classifying images as clean or adversarial
  - Locating adversarial patches
Other Problems
Adversarial training can be slow with stronger attacks
Iterative methods do not adjust the step size
In an attack setting, queries to a real-world classifier may be limited or costly
Contributions
The authors propose a white-box targeted attack based on trust region (TR) optimization.
Their method adaptively chooses the perturbation magnitude at each iteration, which removes the need for expensive hyperparameter tuning.
The TR attack produces perturbations faster than CW (up to 37.5×), and smaller in magnitude than DeepFool.
Their method can easily be extended to second-order TR attacks, which could be useful for nonlinear activation functions.
Trust Region Optimization
Trust region methods are a class of iterative nonlinear optimization algorithms.
They are based around trust regions, which are (usually) balls around the current point in which a quadratic model approximation is used to find a step.
https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
Trust Region Subproblem
Optimizing a function using a trust region method involves solving trust region subproblems
$$\min_p \; m_k(p) = f_k + g_k^T p + \tfrac{1}{2} p^T B_k p \quad \text{s.t.} \quad \|p\| \le \Delta_k$$
where $\Delta_k$ is the trust region radius, $g_k$ is the gradient at the current point, and $B_k$ is the Hessian.
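A cheap approximate solution to the subproblem is the Cauchy point: the minimizer of $m_k$ along the steepest-descent direction $-g_k$ within the trust region.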
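As a concrete illustration, the Cauchy point in its textbook form (the minimizer of the quadratic model along $-g_k$, restricted to the ball of radius $\Delta_k$) has a closed form. The sketch below assumes the Euclidean norm and a dense Hessian given as a list of rows; the function name is hypothetical.

```python
import math

def cauchy_point(g, B, delta):
    """Minimize g.p + 0.5 * p.B.p along -g subject to ||p||_2 <= delta."""
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        return [0.0] * len(g)
    # Curvature of the model along the gradient direction: g^T B g.
    Bg = [sum(bij * gj for bij, gj in zip(row, g)) for row in B]
    gBg = sum(gi * bgi for gi, bgi in zip(g, Bg))
    if gBg <= 0.0:
        tau = 1.0  # non-positive curvature: go all the way to the boundary
    else:
        tau = min(1.0, g_norm ** 3 / (delta * gBg))
    return [-tau * delta * gi / g_norm for gi in g]

B = [[2.0, 0.0], [0.0, 2.0]]
p_small = cauchy_point([3.0, 4.0], B, delta=1.0)   # boundary step
p_large = cauchy_point([3.0, 4.0], B, delta=10.0)  # interior step
```

With a small radius the step is cut off at the boundary; with a large radius the unconstrained line minimizer lies inside the region and is returned unchanged.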
Updating the Trust Region
To update the size of the trust region at each iteration, we compute the ratio of the actual reduction to the reduction predicted by the model
$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}$$
Based on the values of $\rho_k$ and $p_k$, we may choose to increase the radius, keep it the same, or decrease it.
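One common way to turn this ratio into a rule is sketched below. The exact thresholds and the decision to reject the step when $\rho_k < \eta_1$ are assumptions; the default values are the ones used in the worked example in these slides ($t_1 = 0.25$, $t_2 = 2$, $\eta_1 = 0.2$, $\eta_2 = 0.75$, $\Delta_M = 5$).

```python
def update_radius(rho, step_norm, delta, delta_max=5.0,
                  t1=0.25, t2=2.0, eta1=0.2, eta2=0.75, tol=1e-9):
    """Return (accept_step, new_delta) for one trust region iteration."""
    if rho < eta1:
        # Poor agreement between model and function: reject and shrink.
        return False, t1 * delta
    if rho > eta2 and abs(step_norm - delta) <= tol:
        # Very good agreement and the step hit the boundary: expand.
        return True, min(t2 * delta, delta_max)
    # Acceptable agreement: take the step, keep the radius.
    return True, delta

expand = update_radius(rho=0.99, step_norm=2.0, delta=2.0)
keep = update_radius(rho=0.578, step_norm=4.0, delta=4.0)
shrink = update_radius(rho=-0.16, step_norm=4.0, delta=4.0)
interior = update_radius(rho=0.989, step_norm=0.5, delta=1.0)
```

The last case shows why both $\rho_k$ and $p_k$ matter: a high ratio only triggers expansion when the step was actually limited by the radius.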
Trust region example - Initial Start
We will find a minimum of the Branin function. The Branin function has 3 global minima.
$$\min f(x_1, x_2) = (x_2 - 0.129 x_1^2 + 1.6 x_1 - 6)^2 + 6.07 \cos(x_1) + 10$$
The initial variables we define are
  - $x = (6, 14)$
  - $\Delta_0 = 2$, $\Delta_M = 5$
  - $t_1 = 0.25$, $t_2 = 2$
  - $\eta_1 = 0.2$, $\eta_2 = 0.75$
Figure: Contour of the Branin function
https://optimization.mccormick.northwestern.edu/index.php/Trust-region_methods
Iteration 1
The algorithm starts at the green point, and the trust region is defined as the area inside the circle centered at the starting point.
After computing the Cauchy point $p_k$, we evaluate the ratio $\rho_k$, which comes out to be 0.99. Since $\rho_k > \eta_2$, we take a full step $x_{k+1} = x_k + p_k = (5.767, 12.014)$, and we increase the radius of the trust region to $\Delta_{k+1} = \min(t_2 \Delta_k, \Delta_M)$.
Iteration 2
Starting with the new point and the larger trust region, compute the Cauchy point and $\rho_k$ again to get $\rho_k = 0.98$. Thus, we take a full step, and the trust region radius increases again.
Iteration 3
This time, $\rho_k = 0.578$, which is large enough to accept the step but not large enough to enlarge the trust region. So we step again, but keep the radius the same.
Iteration 4
The model predicts the function poorly at the new point, giving $\rho_k = -0.16$. We do not step, and we shrink the radius of the trust region by multiplying it by $t_1 = 0.25$.
Iteration 5
We get $\rho_k = 0.729$, which is not large enough to enlarge the trust region ($\rho_k < \eta_2$). So we step forward and keep the radius the same.
Iteration 6
In this case, $\rho_k = 0.989$. While this is high enough to enlarge the radius, $\|p_k\| \neq \Delta_k$: the step ends inside the trust region rather than on its boundary. So we take the step and keep the radius unchanged.
Final Trajectory after 20 iterations
The algorithm terminates when the norm of the gradient is close to 0, or when the difference between successive points is close to 0.
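The walk-through can be approximated with a short script. This is a sketch under assumptions: it uses the function and starting parameters stated for this example, takes only Cauchy-point steps with the exact Hessian, and stops on a small gradient norm or after 20 iterations. The first step lands at about (5.767, 12.014), as in Iteration 1; later iterates depend on how the subproblem is solved and need not match the $\rho_k$ values quoted on the slides.

```python
import math

def f(x):
    u = x[1] - 0.129 * x[0] ** 2 + 1.6 * x[0] - 6.0
    return u * u + 6.07 * math.cos(x[0]) + 10.0

def grad(x):
    u = x[1] - 0.129 * x[0] ** 2 + 1.6 * x[0] - 6.0
    du = -0.258 * x[0] + 1.6  # derivative of u with respect to x1
    return [2.0 * u * du - 6.07 * math.sin(x[0]), 2.0 * u]

def hess(x):
    u = x[1] - 0.129 * x[0] ** 2 + 1.6 * x[0] - 6.0
    du = -0.258 * x[0] + 1.6
    h11 = 2.0 * du * du - 0.516 * u - 6.07 * math.cos(x[0])
    return [[h11, 2.0 * du], [2.0 * du, 2.0]]

def quad(v, B):
    """Quadratic form v^T B v for a 2x2 matrix."""
    return sum(v[i] * B[i][j] * v[j] for i in range(2) for j in range(2))

def trust_region(x, delta=2.0, delta_max=5.0, t1=0.25, t2=2.0,
                 eta1=0.2, eta2=0.75, max_iter=20, gtol=1e-6):
    path = [tuple(x)]
    for _ in range(max_iter):
        g, B = grad(x), hess(x)
        g_norm = math.hypot(g[0], g[1])
        if g_norm < gtol:
            break
        # Cauchy point: model minimizer along -g inside the ball.
        gBg = quad(g, B)
        tau = 1.0 if gBg <= 0.0 else min(1.0, g_norm ** 3 / (delta * gBg))
        p = [-tau * delta * gi / g_norm for gi in g]
        predicted = -(g[0] * p[0] + g[1] * p[1]) - 0.5 * quad(p, B)
        x_new = [x[0] + p[0], x[1] + p[1]]
        rho = (f(x) - f(x_new)) / predicted
        if rho < eta1:
            delta *= t1                      # reject the step and shrink
        else:
            x = x_new                        # accept the step
            if rho > eta2 and tau == 1.0:    # step reached the boundary
                delta = min(t2 * delta, delta_max)
        path.append(tuple(x))
    return path

path = trust_region([6.0, 14.0])
```

Because a step is accepted only when $\rho_k \ge \eta_1 > 0$, the function value never increases along `path`.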
Proposed Method
To find adversarial perturbations within a trust region, the authors solve
$$\min_{\|\Delta x^j\|_p \le \epsilon^j} \; m^j(\Delta x^j) = \langle \Delta x^j, g^j_{t,i} \rangle + \tfrac{1}{2} \langle \Delta x^j, H^j_{t,i} \Delta x^j \rangle$$
where $\epsilon^j$ is the TR radius at the $j$th iteration, $m^j$ is the approximation of the kernel function $f(x^{j-1}) = z^{j-1}_t - z^{j-1}_i$, with $g^j_{t,i}$ and $H^j_{t,i}$ denoting the corresponding gradient and Hessian.
If a ReLU activation is used, the Hessian is zero almost everywhere, so they can omit the Hessian and use a first-order approximation instead.
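When the Hessian term is dropped, the subproblem reduces to minimizing a linear function over a norm ball, which has a closed-form solution for the L2 and L∞ norms. The sketch below shows only that single step; the function name is hypothetical, and this is not the authors' full algorithm, which also adapts the radius between iterations.

```python
import math

def first_order_tr_step(g, eps, norm="l2"):
    """Minimizer of <dx, g> subject to ||dx||_p <= eps, for p = 2 or infinity."""
    if norm == "l2":
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm == 0.0:
            return [0.0] * len(g)
        # Move against the gradient, scaled to length eps.
        return [-eps * gi / g_norm for gi in g]
    if norm == "linf":
        # Move every coordinate by eps against the sign of the gradient.
        return [-eps * ((gi > 0) - (gi < 0)) for gi in g]
    raise ValueError("norm must be 'l2' or 'linf'")

g = [3.0, -4.0]
dx_l2 = first_order_tr_step(g, eps=0.5, norm="l2")
dx_linf = first_order_tr_step(g, eps=0.5, norm="linf")
```

The L∞ case has the same form as an FGSM step, with the trust region radius playing the role of the step size.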
Metrics
The authors evaluated the speed of their method by recording the number of seconds it takes to find an adversarial image.
To measure perturbation size, the authors use relative perturbation. The relative perturbation of an image is defined as
$$\rho_p = \frac{\|\Delta x\|_p}{\|x\|_p}$$
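For a flattened image, the metric is a direct ratio of two norms. A minimal sketch, with hypothetical helper names and a two-element "image" used only to make the arithmetic easy to check:

```python
import math

def p_norm(v, p):
    """L_p norm of a flat list; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(vi) for vi in v)
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

def relative_perturbation(x, x_adv, p=2):
    """||x_adv - x||_p / ||x||_p."""
    dx = [a - b for a, b in zip(x_adv, x)]
    return p_norm(dx, p) / p_norm(x, p)

x = [3.0, 4.0]
x_adv = [3.3, 4.4]
rho_2 = relative_perturbation(x, x_adv, p=2)
rho_inf = relative_perturbation(x, x_adv, p=math.inf)
```

Normalizing by $\|x\|_p$ makes perturbation sizes comparable across images with different overall intensity.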
Types of attacks used
Two types of attacks are performed: best class attack and hardest class attack. Best class attack means we target the class with
$$\arg\min_j \frac{z_t - z_j}{\|\nabla_x (z_t - z_j)\|}$$
Similarly, for hardest class attack, we target the class with
$$\arg\max_j \frac{z_t - z_j}{\|\nabla_x (z_t - z_j)\|}$$
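The selection rule is easiest to see on a linear classifier $z = Wx + b$, where $\nabla_x (z_t - z_j) = W_t - W_j$ and the ratio is the distance from $x$ to the decision boundary between classes $t$ and $j$. The weights below are hypothetical and the linear model is an assumption made only so the gradient has a closed form.

```python
import math

def class_scores(W, b, x, t):
    """Return {j: (z_t - z_j) / ||W_t - W_j||_2} for every class j != t."""
    z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    scores = {}
    for j in range(len(W)):
        if j == t:
            continue
        diff = [a - c for a, c in zip(W[t], W[j])]
        scores[j] = (z[t] - z[j]) / math.sqrt(sum(d * d for d in diff))
    return scores

# Hypothetical 3-class linear model in 2 dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

scores = class_scores(W, b, x, t=0)
best = min(scores, key=scores.get)      # nearest boundary: easiest target
hardest = max(scores, key=scores.get)   # farthest boundary: hardest target
```

For a deep network the same ratio would be computed with the gradient from backpropagation, where it is only a first-order estimate of the distance.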
Summary of setup
Datasets
  - CIFAR10
  - ImageNet
Attacks
  - Iterative FGSM (only L∞)
  - DeepFool
  - Carlini-Wagner (only L2)
  - TR Non-Adap
  - TR Adap
Networks
  - CIFAR10
      - AlexLike
      - AlexLike-S (AlexLike with Swish activation)
      - ResNet
      - Wide ResNet
  - ImageNet
      - AlexNet
      - VGG16
      - ResNet50
      - DenseNet121
Time Performance on ImageNet
TR and TR Adap produce perturbations similar to those of CW, but in significantly less time (up to 37.5× faster).
Figure 4: Perturbation magnitude vs. time
Qualitative Results
Figure 2: Attacks on VGG-16 with L∞ norm. The TR perturbation is 1.9× smaller than the DeepFool (DF) perturbation.
CIFAR10 Results
For all tables and models, the perturbation is chosen such that the accuracy of the target model is reduced to less than 0.1%.
Table 1: Best class attack.
CIFAR10 Results
Table 2: Hardest class attack
ImageNet Results
Table 3: Best class attack
ImageNet Results
Table 4: Hardest class attack
Second order attack results
Conclusion
The authors propose a white-box targeted attack called TR attack, which is based on trust region methods.
Their method can produce smaller adversarial perturbations very quickly.
TR attack can adaptively choose the perturbation magnitude at each iteration, and can be easily extended to a second-order attack.
References
Yulong Cao, Chaowei Xiao, Dawei Yang, Jing Fang, Ruigang Yang, Mingyan Liu, and Bo Li. Adversarial objects against lidar-based autonomous driving systems. arXiv preprint arXiv:1907.05418, 2019.
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security - ASIA CCS '17, 2017.
Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 23(5):828–841, Oct 2019.
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
Simen Thys, Wiebe Van Ranst, and Toon Goedeme. Fooling automated surveillance cameras: Adversarial patches to attack person detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan Yuille. Adversarial examples for semantic segmentation and object detection. 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017.