
Page 1

Color check

If this is unreadable, we're in trouble.
If this is unreadable, whatever.

Page 2

Adversarial Example Defenses:
Ensembles of Weak Defenses are not Strong

Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song

UC Berkeley

Page 3

AlphaGo: Winning over World Champion

Source: David Silver

Page 4

Achieving Human-Level Performance on ImageNet Classification

Source: Kaiming He

Page 5

Deep Learning Powering Everyday Products

Sources: pcmag.com, theverge.com

Page 6

Deep Learning Systems Are Easily Fooled

[Image: perturbed images misclassified as "ostrich"]

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. Intriguing properties of neural networks. ICLR 2014.

Page 7

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors

Conclusion

Page 8

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors

Conclusion

Page 9

Background: Neural networks

Input: a vector of numbers, e.g., image pixels

[Diagram: input vector]

Page 10

Background: Neural networks

Linear combination (matrix multiply by W) and add bias b

[Diagram: input → W, b]

Page 11

Background: Neural networks

Nonlinearity, e.g., ReLU(x) = max(0, x)

Page 12

Background: Neural networks

Many layers of these (deep)

Page 13

Background: Neural networks

In image classification, a softmax function converts the output to probabilities

[Diagram: input → W, b → ... → % output]

Page 14

Background: Neural networks

Overall, a great big function F(x, θ) that takes an input x and parameters θ.

[Diagram: input x and parameters θ go into F(x, θ), producing the output]

Page 15

Background: Neural networks

Overall, a great big function F(x, θ) that takes an input x and parameters θ.

For training data x where we know what the output should be,
use gradient descent (gradients with respect to θ) to figure out the best θ.
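To make the "one big function F(x, θ)" view concrete, here is a minimal NumPy sketch of a two-layer network with ReLU and softmax (layer sizes and initialization are illustrative assumptions, not taken from the talk):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # nonlinearity: ReLU(x) = max(0, x)

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()                 # convert scores to probabilities

def F(x, theta):
    """Forward pass: linear layer + bias, ReLU, linear layer + bias, softmax."""
    (W1, b1), (W2, b2) = theta
    h = relu(W1 @ x + b1)
    return softmax(W2 @ h + b2)

# Hypothetical sizes: 784 input pixels (a flattened 28x28 image), 10 classes.
rng = np.random.default_rng(0)
theta = ((0.01 * rng.normal(size=(128, 784)), np.zeros(128)),
         (0.01 * rng.normal(size=(10, 128)), np.zeros(10)))
x = rng.random(784)
print(F(x, theta).shape)               # (10,) class probabilities
```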

Page 16

Background: Adversarial examples

Small change in input, wrong output.

The smallness of the change is referred to as distortion.

Measured in L2 distance: Euclidean distance, treating the image as a vector of pixel values.
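For reference (my own illustration, not from the slides), the L2 distortion between an original image x and a modified image x' is just the Euclidean norm of their difference as flattened pixel vectors:

```python
import numpy as np

def l2_distortion(x, x_adv):
    """Euclidean distance between two images viewed as flat pixel vectors."""
    return np.linalg.norm(x_adv.astype(np.float64).ravel() -
                          x.astype(np.float64).ravel())
```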

Page 17

Background: Adversarial examples

State of the art: vulnerable
● Image classification
● Caption generation
● Speech recognition
● Natural language processing
● Policies, reinforcement learning
● Self-driving cars

[Image: "Stop Sign" / "Yield Sign"]

Page 18

Background: Generating adversarial examples

How? Gradients again.

Differentiate with respect to the inputs rather than the parameters.
This gives: how to change each pixel to make the output a little more wrong.

[Diagram: gradients of F(x, θ) with respect to the input x]

Page 19

Background: Generating adversarial examples

We have gradients → we optimize.

Given the original input x and correct output y, search for some other input that is close to the original and where the output is wrong.
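One standard way to write this search, which the slide only sketches, is as a constrained problem and its usual relaxation (notation mine):

```latex
% Find a nearby input that is misclassified:
\min_{x'} \; \|x' - x\|_2
\quad \text{subject to} \quad
\arg\max_i F(x', \theta)_i \neq y

% Common relaxation, optimized by gradient descent on x':
\min_{x'} \; \|x' - x\|_2^2 \;+\; c \cdot \ell\bigl(F(x', \theta),\, y\bigr)
```

Here ℓ is chosen to be small when the prediction is wrong (for an untargeted attack, e.g., the negative of the training loss on the true label y), and c trades off distortion against misclassification.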

Page 20

Background: Other threat models

These were white-box attacks, where the attacker knows the model parameters.

Black-box scenarios have less information available.
There are techniques to use white-box attacks in black-box scenarios.

We focus on white-box attacks in this work.

Page 21

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors

Conclusion

Page 22

Background: Defenses

[Diagram: a taxonomy of defenses spanning prevention and detection approaches, with labels including ensemble, normalization, distributional detection, PCA detection, secondary classification, stochastic, generative, training process, architecture, retrain, and pre-process input]

Page 23

Background: Defenses

We evaluate defenses:
● Can we still algorithmically find adversarial examples?
● Do we need higher distortion?

Page 24

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors

Conclusion

Page 25

Data sets

MNIST

CIFAR-10

Page 26

Are ensemble defenses stronger?

[Chart: a scale from "Stronger!" to "Not much stronger"]

Page 27

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing — address two kinds of perturbations
● Specialists+1
● Unrelated detectors

Conclusion

Page 28

Ensemble defense: Feature squeezing

Run prediction on multiple versions of an input image.

Use "squeezing" algorithms to produce the different versions of the input.

If the predictions differ too much, the input is adversarial.

[Diagram: input → squeeze (several ways) → model → compare predictions (threshold) → result: prediction or adversarial]

Xu, W., Evans, D., & Qi, Y. (2017). Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. arXiv preprint arXiv:1704.01155.

Page 29

Ensemble defense: Feature squeezing

"Squeezing" an image removes some of its information.

It maps many images to the same image: ideally, it maps adversarial examples to something easier to classify.

Page 30

Feature squeezing algorithms and attacks

Two specific squeezing algorithms:
● Color depth reduction
● Spatial smoothing

Effectiveness when used in isolation

Page 31

Squeezing algorithms: Color depth reduction

Convert image colors to low bit depth.

Eliminates small changes on many pixels.
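A minimal sketch of bit-depth reduction (my own illustration; the defense's exact implementation may differ), assuming pixel values in [0, 1]:

```python
import numpy as np

def reduce_color_depth(x, bits):
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

# e.g., reduce an image to 1 bit per channel:
# x_squeezed = reduce_color_depth(x, bits=1)
```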

Page 32

Squeezing algorithms: Color depth reduction

Works well against the fast gradient sign method (FGSM).

Instead of optimizing, FGSM does one quick step.

Page 33

Squeezing algorithms: Color depth reduction

Works well against FGSM.

Take the gradient in the direction of a wrong prediction, as usual.

Page 34

Squeezing algorithms: Color depth reduction

Works well against FGSM.

Take the sign of that gradient: each pixel is only increased or decreased.

Page 35

Squeezing algorithms: Color depth reduction

Works well against FGSM.

Increase or decrease each pixel by ϵ.
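For reference, a compact FGSM sketch in PyTorch-style code (the standard formulation; the framework details are my assumption, not from the talk):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method: one step of size eps per pixel.

    x: input batch, y: true labels, eps: per-pixel step size.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # loss on the correct label
    loss.backward()
    # Move each pixel by +/- eps in the direction that increases the loss,
    # then clamp back into the valid pixel range [0, 1].
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```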

Page 36

Squeezing algorithms: Color depth reduction

Not fully differentiable.

[Diagram: input → depth reduction → model → result]

Page 37

Squeezing algorithms: Color depth reduction

Can be attacked using a substitute that excludes the non-differentiable part.

[Diagram: input → depth reduction → model → result]

Page 38

Squeezing algorithms: Color depth reduction

Can be attacked using a substitute that excludes the non-differentiable part.

[Diagram: substitute — input → model → result, with gradients]

Page 39

Squeezing algorithms: Color depth reduction

Can be attacked using a substitute that excludes the non-differentiable part.

Try candidates on the complete system; keep a candidate if it is still misclassified.

Page 40

Squeezing algorithms: Color depth reduction

Can be attacked using a substitute that excludes the non-differentiable part.

The attacker uses two versions of the system:
one differentiable version for generating candidates,
and the complete version for testing candidates.
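A sketch of that two-version loop (a pseudocode-level illustration of the slide's description; `substitute`, `full_defense`, and `attack_step` are hypothetical names):

```python
def attack_with_substitute(x, y, substitute, full_defense, attack_step, n_steps):
    """Generate candidates on a differentiable substitute (squeezing removed),
    and test each candidate on the complete, non-differentiable system."""
    best = None
    x_adv = x
    for _ in range(n_steps):
        # Gradient-based step uses only the substitute (differentiable part).
        x_adv = attack_step(substitute, x_adv, y)
        # Acceptance test uses the full pipeline (squeeze + model).
        if full_defense(x_adv) != y:
            best = x_adv            # keep the candidate if still misclassified
    return best
```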

Page 41

Squeezing algorithms: Color depth reduction — untargeted optimization attack

Can be attacked using a substitute that excludes the non-differentiable part.

[Images: original vs. adversarial examples. MNIST, reduction to 1 bit; CIFAR-10, reduction to 3 bits]

Page 42

Squeezing algorithms: Color depth reduction

99–100% attack success rate, but increases the average L2 distortion.

Page 43

Squeezing algorithms: Spatial smoothing

Median filter: replace each pixel with the median of its neighborhood.

Eliminates strong changes on a few pixels.
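For illustration (not the defense's own code), median smoothing can be written with SciPy; size=3 corresponds to the 3×3 window used on MNIST later in the talk:

```python
import numpy as np
from scipy.ndimage import median_filter

def spatial_smooth(x, size=3):
    """Replace each pixel with the median over a size x size neighborhood.

    x: 2-D grayscale image (H, W); for color images, filter each channel.
    """
    return median_filter(x, size=size)
```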

Page 44

Squeezing algorithms: Spatial smoothing

Can be attacked directly using existing techniques.

[Diagram: input → smooth → model → result, with gradients flowing through the whole pipeline]

Page 45

Squeezing algorithms: Spatial smoothing — untargeted optimization attack

Can be attacked directly using existing techniques.

[Images: original vs. adversarial examples. MNIST, 3×3 smoothing; CIFAR-10, 2×2 smoothing]

Page 46

Squeezing algorithms: Spatial smoothing

100% attack success rate, with about the same average L2 distortion.

Page 47

Feature squeezing

The full defense combines these squeezing algorithms in an ensemble.

If the predictions differ by too much (in L1 distance), the input is adversarial.

[Diagram: input → {model, depth reduction + model, smoothing + model} → compare predictions (threshold) → prediction or adversarial]
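A sketch of that detection rule (my reconstruction of the idea; the original defense's threshold choice and comparison details may differ), assuming each call to `model` returns a probability vector:

```python
import numpy as np

def feature_squeezing_detect(x, model, squeezers, threshold):
    """Flag the input as adversarial if any squeezed prediction differs too much
    (in L1 distance) from the prediction on the raw input."""
    p_raw = model(x)
    l1_gaps = [np.abs(model(squeeze(x)) - p_raw).sum() for squeeze in squeezers]
    if max(l1_gaps) > threshold:
        return "adversarial"
    return int(np.argmax(p_raw))       # otherwise, the usual class prediction
```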

Page 48

Feature squeezing: Attack

Loss function:
● Make the prediction wrong
● Make all predictions have low L1 distance from each other
● Stay close to the original image

[Diagram: input → {model, depth reduction + model, smoothing + model} → compare predictions (threshold)]
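A sketch of such a combined loss in PyTorch-style code (the term weights, signs, and which branches are included are my assumptions; the attack in the paper may differ in detail):

```python
import torch
import torch.nn.functional as F

def squeezing_attack_loss(x_adv, x_orig, y, models, c1=1.0, c2=1.0):
    """Three terms: misclassify, keep squeezed predictions close (low L1 gap),
    and stay close to the original image.

    models: differentiable branches, e.g. [model, smooth_then_model].
    """
    preds = [torch.softmax(m(x_adv), dim=-1) for m in models]
    wrong = -F.cross_entropy(models[0](x_adv), y)        # reward misclassification
    l1_gap = sum((p - preds[0]).abs().sum() for p in preds[1:])
    dist = ((x_adv - x_orig) ** 2).sum()                  # distortion term
    return wrong + c1 * l1_gap + c2 * dist                # minimize this over x_adv
```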

Page 49

Feature squeezing: Attack

Making the prediction wrong is fully differentiable.

[Diagram: gradients flow from the wrong-prediction term back to the input]

Page 50

Feature squeezing: Attack

The L1-distance term only gets gradients from the two differentiable branches.

The attacker tests candidates on the complete system.

[Diagram: approximate gradients from the differentiable branches]

Page 51

Feature squeezing: Attack — ensemble defense

Can be attacked using gradients from the differentiable branches and random initialization.

[Images: original vs. adversarial examples. MNIST, 1-bit depth with 3×3 smoothing; CIFAR-10, 3-bit depth with 2×2 smoothing]

Page 52

Feature squeezing: Attack — ensemble defense

100% success rate; the average "adversarial-ness" score is lower than that of the original images, and the average L2 distortion is not much higher than against the individual squeezing algorithms.

Page 53

Feature squeezing

The attacker doesn't have to completely fool the strongest component defense.

[Diagram: input → {model, depth reduction + model, smoothing + model} → compare predictions (threshold) → benign]

Be careful about how defenses are put together.

Page 54

Are ensemble defenses stronger?

[Chart: a scale from "Stronger!" to "Not much stronger"]

Page 55

Are ensemble defenses stronger?

[Chart: "Stronger!" to "Not much stronger"]
Feature squeezing (MNIST): +23%

Page 56

Are ensemble defenses stronger?

[Chart: "Stronger!" to "Not much stronger" to "Weaker?"]
Feature squeezing (MNIST): +23%
Feature squeezing (CIFAR-10): −36%

Page 57

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1 — multiple models, to cover common errors
● Unrelated detectors

Conclusion

Page 58

Ensemble defense: Specialists+1

Combine specialist classifiers that classify among sets of confusing classes.

Example: automobiles are more often confused with trucks than with dogs.

("Automobile" includes sedans, SUVs, and things of that sort; "truck" includes only big trucks; neither includes pickup trucks.)

Abbasi, M., & Gagné, C. (2017). Robustness to Adversarial Examples through an Ensemble of Specialists. ICLR 2017 Workshop Track.

Page 59

Ensemble defense: Specialists+1

Two sets corresponding to each class:
● The most commonly confused classes (top 80%)
● The rest of the classes

For "auto": {truck, ship, frog} and {airplane, auto, bird, cat, deer, dog, horse}

Additionally, a "generalist" set with all classes.

Page 60

Ensemble defense: Specialists+1

For each set, train a classifier to classify between those classes.

If all classifiers that can predict a class do predict that class, then only those classifiers vote; otherwise, all classifiers vote.

The class with the most votes is the prediction.

If the average confidence among the voting classifiers is low, then the input is adversarial.

[Diagram: example votes from the specialist and generalist classifiers over classes such as bird, frog, cat, auto, dog, and truck]
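A sketch of that voting-and-confidence rule as described on this slide (my own reconstruction; the data structures are assumptions):

```python
import numpy as np

def specialists_plus_one_predict(x, classifiers, class_sets, all_classes, threshold):
    """Sketch of the Specialists+1 decision rule.

    classifiers[i] returns a probability vector over the classes in class_sets[i].
    """
    probs = [clf(x) for clf in classifiers]
    votes = [class_sets[i][int(np.argmax(p))] for i, p in enumerate(probs)]

    # Look for a class that every classifier able to predict it does predict.
    voters = list(range(len(classifiers)))           # default: everyone votes
    for c in all_classes:
        eligible = [i for i, cs in enumerate(class_sets) if c in cs]
        if eligible and all(votes[i] == c for i in eligible):
            voters = eligible                         # only those classifiers vote
            break

    voter_votes = [votes[i] for i in voters]
    prediction = max(set(voter_votes), key=voter_votes.count)   # most votes wins

    # Low average confidence among the voting classifiers => flag as adversarial.
    if np.mean([float(np.max(probs[i])) for i in voters]) < threshold:
        return "adversarial"
    return prediction
```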

Page 61

Specialists+1: Attack

Targeted attack: figure out which classifiers would be needed to win with a unanimous vote.

[Diagram: every relevant classifier voting for the target class, "cat"]

Page 62

Specialists+1: Attack

Targeted attack: figure out which classifiers would be needed to win with a unanimous vote.

Optimize a loss function made from those classifiers' outputs: add up the loss functions we would use for the individual classifiers.

Favor high confidence, not just misclassification.

[Diagram: loss = Σ of the individual classifiers' losses, each targeting "cat"]
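A sketch of that summed, confidence-seeking targeted loss (illustrative; the exact per-classifier loss term is my assumption):

```python
import torch
import torch.nn.functional as F

def specialists_attack_loss(x_adv, needed_classifiers, target_idx):
    """Sum the targeted losses of every classifier needed for a unanimous vote.

    needed_classifiers: models whose class set contains the target class.
    target_idx[i]: index of the target class in classifier i's output.
    """
    total = 0.0
    for i, clf in enumerate(needed_classifiers):
        logits = clf(x_adv)                       # shape (1, num_classes_i)
        # Cross-entropy toward the target pushes for a high-confidence target
        # prediction, not just any misclassification.
        total = total + F.cross_entropy(logits, torch.tensor([target_idx[i]]))
    return total
```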

Page 63

Specialists+1: Attack — targeted optimization attack

[Images: original vs. adversarial examples. MNIST, high confidence]

Page 64

Specialists+1: Attack — targeted optimization attack

Randomly chosen targets, 99% success rate; average confidence higher than the average for benign images.

Page 65

Are ensemble defenses stronger?

[Chart: "Stronger!" to "Not much stronger" to "Weaker?"]
Feature squeezing (MNIST): +23%
Feature squeezing (CIFAR-10): −36%
Specialists+1 (MNIST): +6%

Page 66

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors — does it matter if defenses are designed to work well together?

Conclusion

Page 67

Three unrelated detectors

1. A separate network that distinguishes benign and adversarial images.
Gong, Z., Wang, W., & Ku, W.-S. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.

[Diagram: input → conv → conv → ... → FC → FC → output, with a detector branch producing "adversarial?"]

Page 68

Three unrelated detectors

1. A separate network that distinguishes benign and adversarial images.
Gong, Z., Wang, W., & Ku, W.-S. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.

2. The above, but using convolution-filtered images from within the model, instead of the input images.
Metzen, J. H., Genewein, T., Fischer, V., & Bischoff, B. On detecting adversarial perturbations. ICLR 2017.

Page 69

Three unrelated detectors

1. A separate network that distinguishes benign and adversarial images.
Gong, Z., Wang, W., & Ku, W.-S. Adversarial and clean data are not twins. arXiv preprint arXiv:1704.04960, 2017.

2. The above, but using convolution-filtered images from within the model, instead of the input images.
Metzen, J. H., Genewein, T., Fischer, V., & Bischoff, B. On detecting adversarial perturbations. ICLR 2017.

3. A density estimate using Gaussian kernels, on the final hidden layer of the model.
Feinman, R., Curtin, R. R., Shintre, S., & Gardner, A. B. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410, 2017.

Page 70

Ensemble: three unrelated detectors

If any of the three detects adversarial, the system outputs adversarial.

[Diagram: input → conv → conv → ... → FC → FC → output, with detectors 1–3 tapping different points, OR-ed into "adversarial?"]
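The OR-combination itself is a one-liner (illustrative only; the detector interfaces are assumptions):

```python
def ensemble_detect(x, detectors):
    """Flag the input as adversarial if any individual detector flags it."""
    return any(detector(x) for detector in detectors)   # logical OR
```

Because every detector here is differentiable, an attacker can fold all three into a single loss and attack this OR directly, as the next slide shows.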

Page 71

Unrelated detectors: Attack

Fully differentiable system. Again, previous approaches are directly applicable.

[Diagram: gradients flow through the whole system back to the input]

Page 72

Unrelated detectors: Attack

100% success rate, with imperceptible perturbations on CIFAR-10.

Page 73

Are ensemble defenses stronger?

[Chart: "Stronger!" to "Not much stronger" to "Weaker?"]
Feature squeezing (MNIST): +23%
Feature squeezing (CIFAR-10): −36%
Specialists+1 (MNIST): +6%
Unrelated detectors (CIFAR-10): +60%

Page 74

Outline

Background: neural networks and adversarial examples

Defenses against adversarial examples

Ensemble defenses case studies
● Feature squeezing
● Specialists+1
● Unrelated detectors

Conclusion

Page 75

Are ensemble defenses stronger?

[Chart: "Stronger!" to "Not much stronger" to "Weaker?"]
Feature squeezing (MNIST): +23%
Feature squeezing (CIFAR-10): −36%
Specialists+1 (MNIST): +6%
Unrelated detectors (CIFAR-10): +60%

Page 76

Are ensemble defenses stronger?

Not these ones:
● Ensembles with parts designed to work together
○ Feature squeezing
○ Specialists+1
● Unrelated detectors
○ Gong et al., Metzen et al., and Feinman et al.

Combining defenses does not guarantee that the ensemble will be a stronger defense.

Page 77

Conclusions

Combining defenses does not guarantee that the ensemble will be a stronger defense.

Lessons:

1. Evaluate proposed defenses against strong attacks.
FGSM is fast, but other methods may succeed where FGSM fails.

2. Evaluate proposed defenses against adaptive adversaries.
The security community commonly assumes that the attacker knows about the defense; adopting the same assumption would be useful in adversarial examples research.