
Semidefinite relaxations for certifying robustness to adversarial examples

Aditi Raghunathan, Jacob Steinhardt, Percy Liang

ML: Powerful But Fragile

• ML is successful on several tasks: object recognition, game playing, face recognition
• ML systems fail catastrophically in the presence of adversaries
• Different kinds of adversarial manipulations — data poisoning, manipulation of test inputs, model theft, membership inference, etc.
• Focus on adversarial examples — manipulation of test inputs

Adversarial Examples

Glasses → Impersonation [Sharif et al. 2016]
Banana + patch → Toaster [Brown et al. 2017]
Stop sign + sticker → Yield [Evtimov et al. 2017]

Adversarial Examples

3D Turtle → Rifle [Athalye et al. 2017]
Noise → "Ok Google" [Carlini et al. 2017]
Malware → Benign [Grosse et al. 2017]

What is an adversarial example?

[Figure: Panda → Gibbon, Szegedy et al. 2014]

The definition of the attack model is usually application-specific and complex.

We consider the well-studied $\ell_\infty$ attack model:

$|x_{\mathrm{adv}} - x|_i \le \epsilon$ for $i = 1, 2, \ldots, d$, i.e., $x_{\mathrm{adv}} \in B_\epsilon(x)$

History

Hard to defend even in this well-defined model…

• [Szegedy+ 2014]: First to discover adversarial examples
• [Goodfellow+ 2015]: Adversarial training (AT) against FGSM
• [Papernot+ 2015]: Defensive distillation
• [Carlini & Wagner 2016]: Distillation is not secure
• [Papernot+ 2017]: Better distillation
• [Carlini & Wagner 2017]: Ten detection strategies fail
• [Madry+ 2017]: AT against PGD, informal argument about optimality
• [Lu+, July 12 2017]: "No Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles"
• [Athalye & Sutskever, July 17 2017]: Break the above defense
• [Athalye, Carlini & Wagner 2018]: Break 6 out of 7 ICLR defenses

Provable robustness

Can we get robustness to all attacks?

Let $f(x)$ be the scoring function that the adversary wants to maximize.

Attacks generate points in $B_\epsilon(x)$:

$A_{\mathrm{fgsm}}(x) = x + \epsilon \, \mathrm{sign}(\nabla f(x))$

$A_{\mathrm{opt}}(x) = \mathrm{argmax}_{\tilde{x} \in B_\epsilon(x)} f(\tilde{x})$

The network is provably robust if the optimal attack fails:

$f^\star \equiv f(A_{\mathrm{opt}}(x)) < 0$
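To make the two attacks concrete, here is a minimal numpy sketch; `f` and `grad_f` are stand-ins for the network's scoring function and its gradient (obtained via autodiff in practice), and the brute-force grid search is only meant for toy low-dimensional settings:

```python
import numpy as np

def fgsm(x, grad_f, eps):
    # A_fgsm(x) = x + eps * sign(grad f(x)): one step to a corner of B_eps(x).
    return x + eps * np.sign(grad_f(x))

def opt_attack_grid(x, f, eps, steps=11):
    # Brute-force approximation of A_opt(x) = argmax over B_eps(x) of f,
    # by exhaustive grid search over the l_inf ball (toy dimensions only).
    axes = [np.linspace(xi - eps, xi + eps, steps) for xi in x]
    pts = np.stack([g.ravel() for g in np.meshgrid(*axes)], axis=1)
    return pts[np.argmax([f(p) for p in pts])]

# Example: for a linear scorer f(x) = w.x, the optimal attack matches FGSM.
w = np.array([1.0, -2.0])
x0 = np.array([0.3, 0.1])
print(fgsm(x0, lambda x: w, 0.1), opt_attack_grid(x0, lambda x: w @ x, 0.1))
```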

Provable robustness

The network is provably robust if $f^\star \equiv f(A_{\mathrm{opt}}(x)) < 0$.

Computing $f^\star$ is intractable in general.

• Combinatorial approaches to compute $f^\star$:
  • SMT-based Reluplex [Katz+ 2018]
  • MILP-based with specialized preprocessing [Tjeng+ 2018]
• Convex relaxations to compute an upper bound $f_{\mathrm{upper}}$ on $f^\star$:
  • Upper bound is negative $\Rightarrow$ optimal attack fails
  • Computationally efficient

Three possible outcomes:
• $0 \le f^\star \le f_{\mathrm{upper}}$: not robust
• $f^\star \le f_{\mathrm{upper}} \le 0$: robust and certified
• $f^\star \le 0 \le f_{\mathrm{upper}}$: robust but not certified

Two-layer networks

Key idea: Uniformly bound gradients:

$f(\tilde{x}) \le f(x) + \epsilon \max_{x'} \|\nabla f(x')\|_1$ for all $\tilde{x} \in B_\epsilon(x)$

Bound on the gradient:
• optimize over activations
• optimize over the signs of the perturbation

Final step: an SDP relaxation (similar to MAXCUT) leads to Grad-cert. (A brute-force version of the underlying quantity is sketched below.)
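For a two-layer network $f(x) = v^\top \mathrm{ReLU}(Wx)$, the gradient is $W^\top \mathrm{diag}(s)\, v$ with $s \in \{0,1\}^m$ the ReLU activation pattern, so the uniform gradient bound can be over-approximated by maximizing over all patterns. A minimal numpy sketch of that brute force (exponential in $m$, so toy sizes only; Grad-cert instead relaxes this joint maximization over activations and perturbation signs to an SDP):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d, m, eps = 4, 6, 0.1                    # toy sizes: all 2^m patterns enumerated
W = rng.standard_normal((m, d))
v = rng.standard_normal(m)
f = lambda x: v @ np.maximum(W @ x, 0)   # two-layer ReLU scoring function

# max_x ||grad f(x)||_1 <= max over patterns s of ||W^T diag(s) v||_1
# (the l_1 norm itself maximizes over perturbation signs t in {-1,+1}^d).
grad_bound = max(np.abs(W.T @ (np.array(s) * v)).sum()
                 for s in product([0, 1], repeat=m))

x0 = rng.standard_normal(d)
f_upper = f(x0) + eps * grad_bound       # certified: robust at x0 if f_upper < 0
print(f_upper)
```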

Relaxation Training

Training a neural network against the certificate.

Objective: differentiable, but with expensive gradients.

Duality to the rescue! Regularizer:

$D \cdot \lambda^{+}_{\max}\big(M(v, W) - \mathrm{diag}(c)\big) + \mathbf{1}^\top \max(c, 0)$

Just one maximum-eigenvalue computation for gradients.
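A minimal numpy sketch of evaluating that regularizer. The assembly of $M(v, W)$ follows the ICLR 2018 paper and is replaced here by a random symmetric placeholder, and the value of $D$ (a constant from the bound in the paper) is set arbitrarily; both are assumptions for illustration:

```python
import numpy as np

def dual_regularizer(M, c, D):
    # D * lambda_max^+(M - diag(c)) + 1^T max(c, 0); eigvalsh sorts ascending,
    # so [-1] is the largest eigenvalue, and max(., 0) gives lambda_max^+.
    lam_max = np.linalg.eigvalsh(M - np.diag(c))[-1]
    return D * max(lam_max, 0.0) + np.maximum(c, 0.0).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2          # placeholder for M(v, W)
c = rng.standard_normal(5)
print(dual_regularizer(M, c, D=5.0))
```

Since the dual bound holds for every $c$, $c$ can be optimized jointly with the network, and each training step then needs only this one maximum-eigenvalue computation.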

Results on MNIST

Attack: the Projected Gradient Descent (PGD) attack of Madry et al. 2018. Adversarial training minimizes this lower bound on the training set.

[Figure: PGD-attack error vs. Grad-cert bound for an adversarially trained network]

The gradient-based bound is quite loose.

Train with Grad-cert: our method instead minimizes the gradient-based upper bound.

[Figure: PGD-attack error vs. Grad-cert bound for the network trained with Grad-cert]

The gradient-based bound is much better! Training a network to minimize the gradient upper bound finds networks where the bound is tight.

Comparison with Wong and Kolter 2018 (LP-cert):

Network | PGD-attack | LP-cert | Grad-cert
LP-NN   | 22%        | 26%     | 93%
Grad-NN | 15%        | 97%     | 35%

Bounds are tight when you train against them, and only then.

Some networks are empirically robust but not certified (e.g., the adversarial training of Madry et al. 2018). Can we certify such "foreign" networks?

Summary so far…

• Certified robustness: relaxed optimization to bound the worst-case attack
• Grad-cert: upper bound on the worst-case attack using a uniform bound on the gradient
• Training against the bound makes it tight
• LP-cert and Grad-cert are tight only on the networks trained against them
• Goal: efficiently certify foreign multi-layer networks

New SDP-cert relaxation

Network: $x \to x_1 \to x_2 \to x_3 \equiv x_L$, with weights $W_0, W_1, W_2$.

Attack model constraints: $|\tilde{x} - x|_i \le \epsilon$ for $i = 1, 2, \ldots, d$

Neural net constraints: $x_i = \mathrm{ReLU}(W_{i-1} x_{i-1})$ for $i = 1, 2, \ldots, L$

Objective: $f^\star = \max_{\tilde{x}} \; (c_{\tilde{y}} - c_y)^\top x_L$

The source of non-convexity is the ReLU constraints.

Handling ReLU constraints

Consider a single ReLU constraint $z = \max(0, x)$.

Key insight: it can be replaced by linear + quadratic constraints:

• $z \ge x$ (linear)
• $z \ge 0$ (linear)   [z is greater than both x and 0]
• $z(z - x) = 0$ (quadratic)   [z is equal to one of x, 0]

Together, these exactly capture $z = \max(0, x)$. We can relax the quadratic constraints to get a semidefinite program.

SDP relaxation

Single ReLU constraint $z = \max(0, x)$ $\equiv$ linear + quadratic constraints.

Collect the variables into $v = (1, x, z)$ and form

$M = \begin{bmatrix} 1 & x & z \\ x & x^2 & xz \\ z & xz & z^2 \end{bmatrix}$

• $z \ge x$: linear in the entries of $M$
• $z \ge 0$: linear in the entries of $M$
• $z(z - x) = 0$, i.e., $z^2 = xz$: also linear in the entries of $M$

So the ReLU constraints become linear constraints on matrix entries.

Constraint on $M$:
• $M = vv^\top$: exact, but a non-convex set
• $M = VV^\top$, i.e., $M \succeq 0$: relaxed and convex set

Generalizes to multiple layers: one large matrix $M$ containing all activations. (A concrete one-hidden-layer instance is sketched below.)
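Putting the pieces together, here is a minimal cvxpy sketch of the one-hidden-layer case, bounding $f(x) = c^\top \mathrm{ReLU}(Wx)$ over an $\ell_\infty$ ball. The toy `W`, `c`, `x0` are placeholders, the input box is encoded by the quadratic constraint $(x_i - l_i)(x_i - u_i) \le 0$ (one standard choice, assumed here), and cvxpy's default SCS solver handles the PSD constraint:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
d, m, eps = 2, 3, 0.1                    # toy one-hidden-layer network
W = rng.standard_normal((m, d))
c = rng.standard_normal(m)
x0 = rng.standard_normal(d)
lo, hi = x0 - eps, x0 + eps              # l_inf ball around x0

n = 1 + d + m                            # v = (1, x_1..x_d, z_1..z_m)
M = cp.Variable((n, n), symmetric=True)
cons = [M >> 0, M[0, 0] == 1]            # relaxation: M = vv^T  ->  M PSD

for i in range(d):                       # input box: (x_i - lo_i)(x_i - hi_i) <= 0
    xi = 1 + i
    cons += [M[xi, xi] <= (lo[i] + hi[i]) * M[0, xi] - lo[i] * hi[i]]

for j in range(m):                       # ReLU: z >= 0, z >= Wx, z^2 = z * (Wx)
    zj = 1 + d + j
    Wx = sum(W[j, i] * M[0, 1 + i] for i in range(d))
    zWx = sum(W[j, i] * M[zj, 1 + i] for i in range(d))
    cons += [M[0, zj] >= 0, M[0, zj] >= Wx, M[zj, zj] == zWx]

f_upper = cp.Problem(
    cp.Maximize(sum(c[j] * M[0, 1 + d + j] for j in range(m))), cons
).solve()
print("SDP-cert upper bound:", f_upper)  # certified robust at x0 if < 0
```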

SDP relaxation: interaction between different hidden units

$x_1, x_2 \in [-\epsilon, \epsilon]$
$z_1 = \mathrm{ReLU}(x_1 + x_2)$
$z_2 = \mathrm{ReLU}(x_1 - x_2)$

[Figure: relaxation values on this example, compared with the unrelaxed value at $x_1 = x_2 = 0.5\epsilon$]

LP treats the units independently; the SDP reasons about them jointly.

Theorem: For a random two-layer network with $m$ hidden nodes and input dimension $d$, $\mathrm{opt}(\mathrm{LP}) = \Theta(md)$ and $\mathrm{opt}(\mathrm{SDP}) = \Theta(m\sqrt{d} + d\sqrt{m})$.
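A quick numeric illustration of that independence gap on the two-unit example, assuming the adversary maximizes $z_1 + z_2$ and assuming the standard per-unit LP envelope $z \le u(a - l)/(u - l)$ over the pre-activation interval $[l, u]$ (both choices are mine, for illustration):

```python
import numpy as np

eps = 1.0

# True worst case of z1 + z2 with z1 = ReLU(x1 + x2), z2 = ReLU(x1 - x2):
xs = np.linspace(-eps, eps, 201)
true_max = max(max(a + b, 0) + max(a - b, 0) for a in xs for b in xs)

# Per-unit LP envelope: each pre-activation lies in [l, u] = [-2eps, 2eps],
# so z <= u(a - l)/(u - l) = (a + 2eps)/2. Treating the units independently,
#   z1 + z2 <= (x1 + x2 + 2eps)/2 + (x1 - x2 + 2eps)/2 = x1 + 2eps <= 3eps.
lp_max = 3 * eps

print(true_max, lp_max)   # 2.0 vs 3.0: the LP ignores the coupling via x1, x2
```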

Results on MNIST

Three different robust networks:
• Grad-NN [Raghunathan et al. 2018]
• LP-NN [Wong and Kolter 2018]
• PGD-NN [Madry et al. 2018]

           | Grad-NN | LP-NN | PGD-NN
Grad-cert  | 35%     | 93%   | N/A
LP-cert    | 97%     | 22%   | 100%
SDP-cert   | 20%     | 20%   | 18%
PGD-attack | 15%     | 18%   | 9%

SDP provides good certificates on all three different networks.

Results on MNIST

PGD-NN [Madry et al. 2018]

[Figure: per-example certification vs. attack success]

Uncertified points are more vulnerable to attack.

Scaling up…

In general, CNNs are more robust than fully connected networks, but off-the-shelf SDP solvers do not exploit the CNN structure.

Ongoing work:
• First-order SDP solvers based on matrix-vector products (see the sketch below)
• Exploit efficient CNN implementations in Tensorflow

Concurrent work:
• MILP solving with efficient preprocessing [Tjeng+ 2018, Xiao+ 2018]
• Scaling up LP-based methods [Dvijotham+ 2018, Wong and Kolter 2018]
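To illustrate why matrix-vector products suffice for a first-order solver, here is a minimal power-iteration sketch for the leading eigenvalue, the core primitive such solvers repeat; the `matvec` closure is a stand-in for applying $M$ implicitly through network operations (e.g., convolutions) without ever materializing the matrix:

```python
import numpy as np

def lambda_max(matvec, dim, iters=200, seed=0):
    # Power iteration: estimates the eigenvalue of largest magnitude of a
    # symmetric operator using only matvec calls, never the full matrix.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    return v @ matvec(v)

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(lambda_max(lambda v: A @ v, 2))   # approx. (5 + sqrt(5))/2 = 3.618
```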

Summary

• Robustness for the $\ell_\infty$ attack model
• Certified evaluation to avoid an arms race
• Presented two different relaxations for certification
• Adversarial examples more broadly…
  • Does there exist a mathematically well-defined attack model?
  • Would the current techniques (deep learning + appropriate regularization) transfer to this attack model?
• Secure vs. better models?
  • Adversarial examples expose limitations of current systems
  • How do we get models to learn "the right thing"?

Thank you!

Jacob Steinhardt, Percy Liang

“Certified Defenses against Adversarial Examples” https://arxiv.org/abs/1801.09344 [ICLR 2018]

“Semidefinite Relaxations for Certifying Robustness to Adversarial Examples” https://arxiv.org/abs/1811.01057 [NeurIPS 2018]

Open Philanthropy Project · Google
