Sanity Checks for Saliency Maps

Julius Adebayo*+, Justin Gilmer#, Michael Muelly#, Ian Goodfellow#, Moritz Hardt^#, Been Kim#
*Work was done during the Google AI residency program, +MIT, ^UC Berkeley, #Google Brain.

Sanity Checks for Saliency Maps - NeurIPS05... · Sanity check2: Networks trained with true and random labels, Do explanations deliver different messages? No! Conclusion!14 Poster




Interpretability

To use machine learning more responsibly.


Investigating post-training interpretability methods.


Given a fixed model, find the evidence of prediction.


[Figure: a trained machine learning model (e.g., neural network) with the output label "Junco Bird-ness"]

Why was this a Junco bird?


One of the most popular techniques: saliency maps

Caaaaan do!


The promise: these pixels are the evidence of prediction.
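As a reference point, the simplest saliency map is the plain input gradient. The notation below (class score $S_c$, input $x$, explanation map $E_{\text{grad}}$) is assumed here for illustration and does not come from the slides:

```latex
E_{\text{grad}}(x) = \left| \frac{\partial S_c(x)}{\partial x} \right|
```

Each entry of $E_{\text{grad}}(x)$ indicates how strongly a small change in one pixel would change the score for class $c$.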


Sanity check question.


If so, when the prediction changes, the explanation should change.

Extreme case: If the prediction is random, the explanation should REALLY change.


Sanity check: When prediction changes, do explanations change?

Saliency map


Randomized weights! Network now makes garbage predictions.


!!!!!???!?


the evidence of prediction?????


[Figure: "Before" and "After" columns for two methods: Guided Backprop and Integrated Gradient]

Sanity check 1: When prediction changes, do explanations change?

No!
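One way to score this check, sketched under stated assumptions: compute a saliency map with the original weights, randomize the weights, recompute the map, and compare the two with a rank correlation. The toy linear scorer, the `edge_saliency` baseline, and the choice of Spearman rank correlation are illustrative and are not taken from the slides.

```python
import random


def ranks(values):
    """Rank of each entry (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def spearman(a, b):
    """Spearman rank correlation between two equal-length lists."""
    ra, rb = ranks(a), ranks(b)
    mean = (len(a) - 1) / 2
    cov = sum((p - mean) * (q - mean) for p, q in zip(ra, rb))
    var = sum((p - mean) ** 2 for p in ra)  # same for rb: both are permutations
    return cov / var


def gradient_saliency(weights, x):
    """|d score / d x_i| for the linear score sum(w_i * x_i), which is |w_i|."""
    return [abs(w) for w in weights]


def edge_saliency(weights, x):
    """Hypothetical model-independent baseline: circular neighbour differences."""
    return [abs(x[i] - x[i - 1]) for i in range(len(x))]


def randomization_test(saliency_fn, weights, x, rng):
    """Correlation between maps computed before and after weight randomization."""
    before = saliency_fn(weights, x)
    randomized = [rng.gauss(0.0, 1.0) for _ in weights]
    after = saliency_fn(randomized, x)
    return spearman(before, after)


rng = random.Random(0)
x = [rng.random() for _ in range(64)]               # stand-in for an input image
trained = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for trained weights

grad_corr = randomization_test(gradient_saliency, trained, x, rng)
edge_corr = randomization_test(edge_saliency, trained, x, rng)
print("gradient:", round(grad_corr, 3), "edge baseline:", round(edge_corr, 3))
```

The baseline never reads the weights, so its map is identical before and after randomization and its correlation is 1. That is the kind of behavior the check is meant to expose: a map can look plausible while carrying no information about the model.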


Sanity check 2: For networks trained with true labels and with random labels, do explanations deliver different messages?

No!
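A minimal sketch of how this second check could be set up, assuming a toy logistic regression in place of a neural network: train once on the true labels and once on shuffled labels, then compare what the two saliency maps highlight. The synthetic dataset, the `train_logistic` helper, and the `mass_on` summary are hypothetical and only illustrate the procedure.

```python
import math
import random


def train_logistic(xs, ys, steps=300, lr=0.5):
    """Full-batch gradient descent on the logistic loss; returns the weights."""
    d = len(xs[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - y) * x[j]
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def saliency(w):
    """Gradient saliency for a linear logit: |d logit / d x_j| = |w_j|."""
    return [abs(wj) for wj in w]


def mass_on(sal, features):
    """Fraction of total saliency that falls on the given features."""
    return sum(sal[j] for j in features) / sum(sal)


rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
# Assumed ground truth: only features 0 and 1 determine the label.
true_labels = [1 if x[0] + x[1] > 0 else 0 for x in xs]
random_labels = true_labels[:]
rng.shuffle(random_labels)  # breaks the link between inputs and labels

sal_true = saliency(train_logistic(xs, true_labels))
sal_rand = saliency(train_logistic(xs, random_labels))
print("true labels:  ", round(mass_on(sal_true, [0, 1]), 3))
print("random labels:", round(mass_on(sal_rand, [0, 1]), 3))
```

For a method that depends on what the model learned, the map from the true-label model should concentrate on features 0 and 1, and the map from the shuffled-label model should not. A method whose maps look the same in both cases fails the check.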


Conclusion

Poster #30, 10:45am - 12:45pm, Room 210

• Confirmation bias: Just because an explanation “makes sense” to humans doesn’t mean it reflects the evidence for the prediction.

• Do sanity checks for your interpretability methods! (e.g., TCAV [K. et al ’18])

• Others who independently reached the same conclusions: [Nie, Zhang, Patel ’18] [Ulyanov, Vedaldi, Lempitsky ’18]

• Some of these methods have been shown to be useful for humans. Why? More studies needed.