On the Robustness of Deep k-Nearest Neighbor
Chawin Sitawarin, EECS, UC Berkeley ([email protected]) · David Wagner, EECS, UC Berkeley ([email protected])
2nd Deep Learning and Security Workshop (IEEE S&P 2019)



Page 1

Chawin Sitawarin, EECS, UC Berkeley ([email protected])

On the Robustness of Deep k-Nearest Neighbor

David Wagner, EECS, UC Berkeley ([email protected])

2nd Deep Learning and Security Workshop (IEEE S&P 2019)

Page 2

Motivation

“Adversarial examples for kNN and DkNN”

• No previous work attacks kNN directly

• Deep k-Nearest Neighbor (DkNN) shows promise for detecting adversarial examples, but it is difficult to evaluate

• kNN is not differentiable so most existing attacks don’t work

• To measure how robust they really are, we need a white-box attack (no security through obscurity)

Chawin Sitawarin · DLS '19 (IEEE S&P) · On the Robustness of Deep k-Nearest Neighbor · 2

Page 3

k-Nearest Neighbor

Threat model: white-box, untargeted, Lp norm-ball adversarial examples
o All training samples are known to the attacker


[Figure: kNN example with k = 3 — the query point 𝑧 and its three nearest training samples]
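For concreteness, a plain kNN classifier can be sketched as follows. This is an illustrative toy only: it uses Euclidean distance and a tiny k, whereas the talk's MNIST experiments use cosine distance with k = 75.

```python
import numpy as np

def knn_predict(x_train, y_train, z, k=3):
    # Distance from the query z to every training sample. Under the white-box
    # threat model, the attacker knows all of x_train.
    dists = np.linalg.norm(x_train - z, axis=1)
    nearest = np.argsort(dists)[:k]                # k closest training points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # majority vote
```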

Page 4

Attack on kNN

• Baseline: mean attack
o Move 𝑧 towards the mean of the nearest class
o Use binary search to determine the distance
• But this is not optimal

[Figure: Mean Attack — 𝑧 pushed toward the mean of the nearest other class]
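The baseline can be sketched like this (a hedged toy implementation; it assumes that what the binary search tunes is the interpolation factor t ∈ [0, 1] between 𝑧 and the nearest other-class mean):

```python
import numpy as np

def knn_label(x_train, y_train, z, k=3):
    idx = np.argsort(np.linalg.norm(x_train - z, axis=1))[:k]
    labels, counts = np.unique(y_train[idx], return_counts=True)
    return labels[np.argmax(counts)]

def mean_attack(x_train, y_train, z, k=3, steps=20):
    orig = knn_label(x_train, y_train, z, k)
    # Mean of each class other than the current prediction; pick the nearest one.
    others = [c for c in np.unique(y_train) if c != orig]
    target = min((x_train[y_train == c].mean(axis=0) for c in others),
                 key=lambda m: np.linalg.norm(m - z))
    lo, hi = 0.0, 1.0                 # t = 1 lands exactly on the class mean
    for _ in range(steps):
        # Binary search for the smallest step that flips the kNN prediction.
        t = (lo + hi) / 2
        if knn_label(x_train, y_train, z + t * (target - z), k) != orig:
            hi = t
        else:
            lo = t
    return z + hi * (target - z)
```

The invariant is that `hi` always flips the label (assuming the class mean itself does), so the returned point is adversarial with roughly the smallest perturbation along that one direction — which is why the attack is simple but not optimal.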

Page 5

Attack on kNN

• Our gradient-based attack
o Main idea: move 𝑧 towards a set of 𝑚 nearest neighbors from a different class, {𝑥ᵢ}

[Figure: 𝑧 moving toward a group of nearby other-class samples]

Page 6

Attack on kNN

• Our gradient-based attack
o Main idea: move 𝑧 towards a set of 𝑚 nearest neighbors from a different class, {𝑥ᵢ}

o Set up as a constrained optimization problem

*We use Euclidean distance here, but it can be directly substituted with cosine distance

[Figure: 𝑧 surrounded by nearby other-class samples 𝑥₁–𝑥₅; here 𝑚 = 5]

o Annotated objective (input 𝑧, perturbation 𝛿, nearby training samples 𝑥ᵢ):

minimize over 𝛿:  Σᵢ ‖(𝑧 + 𝛿) − 𝑥ᵢ‖₂²
subject to:  ‖𝛿‖ₚ ≤ 𝜖  (Lp-norm constraint),  𝑧 + 𝛿 ∈ [0, 1]ᵈ  (constraint on input domain)
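One plausible way to solve this kind of constrained problem is projected gradient descent. The slides specify only the objective and the two constraints, so the solver below is a sketch under assumptions: p = ∞ for the norm ball, and illustrative values for `eps`, `lr`, and `steps`.

```python
import numpy as np

def knn_grad_attack(z, targets, eps=0.3, lr=0.05, steps=200):
    delta = np.zeros_like(z)
    for _ in range(steps):
        # Gradient of sum_i ||(z + delta) - x_i||_2^2 with respect to delta.
        grad = 2.0 * ((z + delta) - targets).sum(axis=0)
        delta -= lr * grad
        delta = np.clip(delta, -eps, eps)           # Lp norm-ball constraint (here p = infinity)
        delta = np.clip(z + delta, 0.0, 1.0) - z    # input-domain constraint [0, 1]^d
    return z + delta
```

Each step descends on the summed squared distance to the guide samples, then projects back into the feasible set, so the returned perturbation always satisfies both constraints.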

Page 7

Attack on kNN

• Our gradient-based attack
o Main idea: move 𝑧 towards a set of 𝑚 nearest neighbors from a different class, {𝑥ᵢ}

o Set up as a constrained optimization problem

Nearby training sample PerturbationInput

Lp-norm constraint Constraint on input domain

• It’s sufficient to be close to only 𝑥₂ and 𝑥₃!
• But it is difficult to know which 𝑥ᵢ ahead of time

[Figure: 𝑧 and the other-class samples 𝑥₁–𝑥₅ again; here 𝑚 = 5]

Page 8

Attack on kNN

• Our gradient-based attack
o Main idea: move 𝑧 towards a set of 𝑚 nearest neighbors from a different class, {𝑥ᵢ}

o Set up as a constrained optimization problem

[Figure: 𝑧 and the other-class samples 𝑥₁–𝑥₅ with a distance threshold 𝜂; here 𝑚 = 5]

• We want to ignore samples that are too far away by setting a threshold 𝜂
• But a hard threshold is not differentiable

Page 9

Attack on kNN

• Our gradient-based attack
o Main idea: move 𝑧 towards a set of 𝑚 nearest neighbors from a different class, {𝑥ᵢ}

o Set up as a constrained optimization problem

o Use sigmoid as a soft threshold

o Choose 𝜂 to be mean distance to k-th neighbor

[Figure: 𝑧 and the other-class samples 𝑥₁–𝑥₅ within the threshold 𝜂; here 𝑚 = 5]

Sigmoid function as a soft threshold at 𝜂

Approximate hard threshold with a soft, differentiable one
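One way to realize this soft threshold is to weight each sample by a sigmoid of its distance, so that weights fall smoothly from ≈1 below 𝜂 to ≈0 above it. A minimal sketch, where the steepness constant `alpha` is an assumption (the slides do not give its value):

```python
import numpy as np

def soft_threshold(dists, eta, alpha=4.0):
    # Weight ~ 1 for d << eta and ~ 0 for d >> eta; unlike a hard cutoff,
    # this is differentiable everywhere. alpha (assumed) controls sharpness.
    return 1.0 / (1.0 + np.exp(alpha * (dists - eta)))
```

Multiplying each per-sample distance term in the attack objective by this weight smoothly ignores far-away samples while keeping gradients usable.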

Page 10

Results on kNN

• kNN uses cosine distance with k = 75 on the MNIST dataset

Most have perceptible / semantic perturbation


Attacks              | Accuracy (%) | Mean Perturbation (L2)
No Attack            | 95.74        | -
Mean Attack          | 5.89         | 8.611
Our Gradient Attack  | 9.89         | 6.565

[Figure: clean digits and their adversarial counterparts]

Page 11

Deep k-Nearest Neighbor

• Proposed by Papernot & McDaniel ’18

• Essentially, kNN on outputs of multiple layers of a neural network

• Simple scheme that offers some interpretability

• Can detect out-of-distribution samples and adversarial examples to some degree


[Figure: DkNN on a legitimate sample — the input passes through layers 1–4, and its neighbors at each layer support the prediction 6]

Page 12

Attack on DkNN

• Baseline: mean attack
o Same as for kNN
• Our gradient-based attack
o Similar to our gradient-based attack on kNN

Page 13

Attack on DkNN

• Our gradient-based attack
o Similar to our gradient-based attack on kNN
o Instead of distance in the pixel space, we consider distance in the representation space
o And sum over all the layers:

minimize over 𝛿:  Σ_𝜆 Σᵢ ‖𝑓_𝜆(𝑧 + 𝛿) − 𝑓_𝜆(𝑥ᵢ)‖₂²   (𝑓_𝜆 = representation at the 𝜆-th layer)
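The layer-summed loss can be sketched as below. This is illustrative, not the paper's implementation: `layer_fns` (the per-layer feature maps 𝑓_𝜆) and `layer_targets` (the guide samples' stored features at each layer) are assumed names, and a real attack would differentiate this loss through the network.

```python
import numpy as np

def dknn_attack_loss(z_adv, layer_fns, layer_targets):
    # Sum over layers lambda of the squared L2 distance between the candidate
    # adversarial input's representation f_lambda(z_adv) and the guide
    # samples' representations at that same layer.
    loss = 0.0
    for f, targets in zip(layer_fns, layer_targets):
        feats = f(z_adv)
        loss += ((feats - targets) ** 2).sum()
    return loss
```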

Page 14

Results on DkNN

• We use the same network and hyperparameters as suggested by Papernot & McDaniel ’18


Attacks              | Accuracy (%) | Mean Perturbation (L2)
No Attack            | 98.83        | -
Mean Attack          | 13.13        | 4.408
P&M’18 Attack        | 16.02        | 3.459
Our Gradient Attack  | 0.00         | 2.164

Page 15

Results on DkNN


[Figure: input, prediction, and neighbors at layers 1–4 for the Mean Attack and Our Gradient Attack; both adversarial inputs are predicted as 5]

Page 16

Results on DkNN

• Some perturbations have semantic meaning
• But some are imperceptible
• Suggests that the L2-norm is not always a good metric
• Suggests that there is some hope for the defense


[Figure: three clean/adversarial image pairs]

Page 17

Credibility

• DkNN can output a credibility score for a given input
• It can be used to filter out adversarial examples and out-of-distribution samples
• Promising but not very effective currently
o Some adversarial examples have a high credibility score
o Some clean samples have a low credibility score
• We refer to the paper for more details


Page 18

Conclusion

• We propose an attack on kNN and DkNN
• Nonetheless, they appear to be more robust than other algorithms out of the box
o They require larger perturbations
o Some perturbations also have semantic meaning
• Improving the DkNN
o Ongoing work: DkNN on representations of a robust network (e.g. adversarially trained networks)
o More robust variants of kNN (e.g. weighted voting)


Page 19

Extra Slides


Page 20

• Gradient-based attack
o Main idea: move 𝑧 towards a group of 𝑚 nearby samples {𝑥ᵢ} from a different class

o Set up as a constrained optimization problem

o Use sigmoid as a soft threshold

o Choose 𝜂 to be mean distance to k-th neighbor

Sigmoid function as a soft threshold at 𝜂:

𝜎(𝑑 − 𝜂) ≈ 0 if 𝑑 < 𝜂,  ≈ 1 if 𝑑 > 𝜂

where the sigmoid is 𝜎(𝑥) = 1 / (1 + 𝑒^(−𝛼𝑥))


Page 21

Credibility


[Figure: distribution of credibility scores]