
Page 1: Explaining and Interpreting Deep Neural Networks

Klaus-Robert Müller, Wojciech Samek, Grégoire Montavon, Sebastian Lapuschkin, Kristof Schütt et al.

Page 2: Outline

• general remarks on ML, also on explaining and interpreting

• understanding single decisions of nonlinear learners

• Layer-wise Relevance Propagation (LRP)

• Applications in Neuroscience and Physics

Page 3: ML in a nutshell

Kernel Methods: SVM etc.

Deep Neural Networks

Page 4: Based on the ICASSP 2017 tutorial

Page 5: Acknowledgements

Page 6: Recent ML systems reach superhuman performance

ML in the sciences

Pages 7–8: From Data to Information

Page 9: Interpretable vs. powerful models?

Page 10: Interpretable vs. powerful models?!

Kernel machines

Page 11: Interpretable vs. powerful models?

Page 12: Different dimensions of interpretability

Pages 13–18: Why interpretability?

Pages 19–26: Techniques of Interpretation

Page 27: Interpreting models

Page 28: Interpreting with class prototypes

Page 29: Examples of Class Prototypes

Page 30: Building more natural prototypes (Montavon, Samek & Müller, arXiv 2017)

Pages 31–32: Building prototypes using a generator
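Class prototypes of this kind are usually obtained by activation maximization. The following is a minimal sketch, not the method of the slides: plain gradient ascent on a differentiable class score, where the callable grad_fc is a hypothetical placeholder for the gradient of the class score; the generator-based variant instead optimizes a latent code z and returns g(z), which tends to give more natural-looking prototypes.

```python
import numpy as np

def class_prototype(grad_fc, x0, steps=200, lr=0.1, l2=1e-3):
    """Activation maximization: gradient ascent on the class score f_c(x).

    grad_fc(x) must return the gradient of the class score w.r.t. the input x
    (hypothetical callable). The l2 penalty keeps the prototype from drifting
    far away from the data; a generator-based variant would optimize a code z
    and return g(z) instead.
    """
    x = x0.copy()
    for _ in range(steps):
        x += lr * (grad_fc(x) - l2 * x)   # ascend the class score, shrink towards zero
    return x
```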

Page 33: Types of Interpretation

Page 34: Approaches to interpretability

Page 35: Explaining models

Page 36: Explaining Neural Network Predictions

Layer-wise Relevance Propagation (LRP, Bach et al. 2015) was the first method to explain the predictions of nonlinear classifiers:

- based on a generic theory (related to Taylor decomposition; Deep Taylor Decomposition, Montavon et al. '16)

- applicable to any NN with monotonic activations, as well as BoW models, Fisher Vectors, SVMs, etc.

Explanation: "Which pixels contribute how much to the classification?" (Bach et al. 2015), i.e. what makes this image be classified as a car.

Sensitivity / Saliency: "Which pixels lead to an increase/decrease of the prediction score when changed?" (Baehrens et al. 2010, Simonyan et al. 2014), i.e. what makes this image be classified more/less as a car.

Cf. Deconvolution: "Matching input pattern for the classified object in the image" (Zeiler & Fergus 2014); the relation to f(x) is not specified. Cf. also Activation Maximization.

Each method solves a different problem!
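To make the propagation concrete, here is a minimal sketch of LRP for a small fully-connected ReLU network: a numpy illustration using the basic LRP-ε rule, not the authors' reference implementation, with a hypothetical toy network at the end.

```python
import numpy as np

def lrp_dense_relu(weights, biases, x, target, eps=1e-6):
    """Minimal LRP-epsilon backward pass for a fully-connected ReLU network.

    weights: list of (d_in, d_out) arrays; biases: list of (d_out,) arrays.
    Returns input relevances that sum approximately to the prediction score
    of the target class (small amounts are absorbed by biases and eps).
    """
    # Forward pass, keeping the input of every layer (linear output layer).
    activations = [x]
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b
        a = z if i == len(weights) - 1 else np.maximum(0.0, z)
        activations.append(a)

    # Relevance at the output: the score of the class to be explained.
    R = np.zeros_like(activations[-1])
    R[target] = activations[-1][target]

    # Redistribute relevance layer by layer (LRP-epsilon rule).
    for W, b, a in zip(weights[::-1], biases[::-1], activations[-2::-1]):
        z = a @ W + b                                # pre-activations of the upper layer
        z = z + eps * np.where(z >= 0, 1.0, -1.0)    # stabilizer against division by zero
        s = R / z                                    # relevance per unit of pre-activation
        R = a * (s @ W.T)                            # contribution of each lower-layer unit
    return R

# Hypothetical toy network: 4 inputs -> 3 hidden units -> 2 classes
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]
bs = [np.zeros(3), np.zeros(2)]
print(lrp_dense_relu(Ws, bs, x=rng.random(4), target=0))
```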

Page 37: Explaining Neural Network Predictions

Classification (figure): an input image is passed forward through the network (classes: cat, ladybug, dog); the predicted class receives a large activation.

Page 38: Explaining Neural Network Predictions

Explanation, initialization (figure): the relevance at the output layer is set equal to the prediction score of the class to be explained (cat, ladybug, dog).

Page 39: Explaining Neural Network Predictions

Explanation (figure): relevance is redistributed from layer to layer towards the input. Theoretical interpretation: Deep Taylor Decomposition; the redistribution depends on the activations and the weights.
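For ReLU networks, Deep Taylor Decomposition recovers propagation rules of the following form (the z+ rule; notation as in Bach et al. 2015 / Montavon et al.), which makes the dependence on activations $a_i$ and weights $w_{ij}$ explicit:

$$R_i = \sum_j \frac{a_i\, w_{ij}^{+}}{\sum_{i'} a_{i'}\, w_{i'j}^{+}}\; R_j, \qquad w_{ij}^{+} = \max(0, w_{ij}).$$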

Page 40: Explaining Neural Network Predictions

Explanation (figure): relevance is propagated down to the input pixels under the relevance conservation property; pixels that contributed most to the decision receive large relevance.
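The conservation property states that the total relevance is preserved from layer to layer, so the heatmap decomposes the prediction score (approximately, up to the small share absorbed by biases and stabilizers):

$$\sum_{i} R_i^{(1)} = \cdots = \sum_{j} R_j^{(l)} = \sum_{k} R_k^{(l+1)} = \cdots = f(x).$$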

Page 41: Advantages of LRP over Sensitivity

1. Global explanations: LRP answers what makes a car a car, not what makes a car more or less of a car.

2. No discontinuities: small variations of the input do not result in large changes of the relevance.
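For contrast, a sensitivity/saliency map only measures how the prediction score reacts to local changes of each pixel. A minimal, model-agnostic sketch using central finite differences (in practice one would use the analytic gradient of the network):

```python
import numpy as np

def sensitivity_map(f, x, eps=1e-4):
    """|df/dx_i| estimated by central finite differences for a scalar score f(x)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = eps
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.abs(grad)   # unsigned: sensitivity does not separate pro and contra evidence
```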

Page 42: Advantages of LRP over both Sensitivity and Deconvolution

Image-specific explanations: LRP takes the activations into account and therefore provides different explanations for different input images. For NNs without pooling layers, Sensitivity and Deconvolution provide the same explanation for different samples.

Page 43: Advantages of LRP over both Sensitivity and Deconvolution

Positive and negative evidence: LRP distinguishes between positive evidence, supporting the classification decision, and negative evidence, speaking against the prediction. For example, LRP indicates what speaks for class '3' and what speaks against class '9'. The sign of Sensitivity and Deconvolution does not have this interpretation; taking the norm gives unsigned visualizations.

Page 44: Male or Female?

Page 45: Advantages of LRP over both Sensitivity and Deconvolution

Aggregation of relevance: LRP explanations are normalized (conservation of relevance), which makes it meaningful to aggregate relevance over datasets or over regions in an image.
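Because the scores are conserved, relevance can be summed over a region or averaged over a dataset and the resulting numbers remain comparable across images. A small illustrative snippet with hypothetical stand-in data:

```python
import numpy as np

# Stand-in for an LRP heatmap and a region of interest (hypothetical data).
rng = np.random.default_rng(0)
R = rng.random((224, 224))                 # relevance map of one image
mask = np.zeros((224, 224), dtype=bool)
mask[60:160, 60:160] = True                # region of interest

region_share = R[mask].sum() / R.sum()     # fraction of total relevance inside the region
print(f"{region_share:.1%} of the total relevance falls inside the region")
```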

Page 46: Explaining Neural Network Predictions

Comparison (figure): Sensitivity vs. Deconvolution vs. LRP.

Page 47: Application: understanding different DNN architectures

GoogleNet focuses on the animal faces and only a few pixels; BVLC CaffeNet is less sparse.

Page 48: Explaining Predictions Pixel-wise

Comparison (figure): neural networks vs. kernel methods.

Page 49: Understanding learning models for complex gaming scenarios

Page 50: Analysing Breakout: LRP vs. Sensitivity (figure panels: LRP, Sensitivity)

Page 51: Perspectives

Page 52: Is the generalization error all we need?

Page 53: Application: Comparing Classifiers

Page 54: Machine Learning in the Sciences

Page 55: Machine Learning in Neuroscience

Page 56: BBCI set-up: let the machines learn

Artifact removal

[cf. Müller et al. 2001, 2007, 2008, Dornhege et al. 2003, 2007, Blankertz et al. 2004, 2005, 2006, 2007, 2008]

Page 57: Brain-Computer Interfacing: 'Brain Pong'

Leitmotiv: ›let the machines learn‹ (Berlin Brain-Computer Interface)

• ML reduces patient training from 300 h to 5 min

Applications:

• help/hope for patients (ALS, stroke, ...)

• neuroscience

• neurotechnology (video coding, gaming, monitoring driving)

Page 58: DNN Explanation for Motor Imagery BCI

Note: explanations are available for single trials (Sturm et al., submitted).

Page 59: Machine Learning in Chemistry, Physics and Materials

Matthias Rupp, Anatole von Lilienfeld, Alexandre Tkatchenko, Klaus-Robert Müller

Page 60: Machine Learning for chemical compound space

Ansatz (figure): a machine learning model instead of an explicit quantum-mechanical calculation. [from von Lilienfeld]

Page 61: Coulomb representation of molecules

Coulomb matrix (Rupp, Müller et al. 2012, PRL): a molecule with nuclear charges $Z_i$ and atomic positions $\mathbf{R}_i$ is encoded as

$$M_{ii} = \tfrac{1}{2} Z_i^{2.4}, \qquad M_{ij} = \frac{Z_i Z_j}{\lVert \mathbf{R}_i - \mathbf{R}_j \rVert} \quad (i \neq j),$$

with phantom atoms ($Z = 0$) added to pad all molecules to a common matrix size.
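As a concrete sketch (a small numpy example, not the original code; the water-like geometry at the end is a hypothetical illustration):

```python
import numpy as np

def coulomb_matrix(Z, R, n_max):
    """Coulomb matrix of a molecule, padded with phantom atoms (Z = 0) to size n_max.

    Z: (n,) nuclear charges; R: (n, 3) Cartesian coordinates.
    """
    n = len(Z)
    M = np.zeros((n_max, n_max))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = 0.5 * Z[i] ** 2.4                           # diagonal term
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])   # Coulomb repulsion
    return M

# Hypothetical example: a water-like geometry (charges O=8, H=1), padded to 5 atoms
Z = np.array([8, 1, 1])
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
print(coulomb_matrix(Z, R, n_max=5))
```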

Page 62: Kernel ridge regression

Distances between Coulomb matrices define a Gaussian kernel matrix K. The energy is predicted as a sum over weighted Gaussians, with weights chosen to minimize the error on the training set; this has an exact closed-form solution. The model has as many parameters as molecules, plus two global hyperparameters: the characteristic length scale, or 'kT of the system' (σ), and the noise level (λ). [from von Lilienfeld]
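In formulas (standard kernel ridge regression; σ and λ as on the slide, $d$ a distance between Coulomb matrices, $\mathbf{E}_{\mathrm{train}}$ the training energies):

$$\hat E(M) = \sum_{i=1}^{N} \alpha_i\, e^{-d(M, M_i)^2 / (2\sigma^2)}, \qquad \boldsymbol{\alpha} = (K + \lambda I)^{-1} \mathbf{E}_{\mathrm{train}}.$$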

Page 63: Predicting energies of small molecules: results

• March 2012, Rupp et al., PRL: 9.99 kcal/mol (kernels + eigenspectrum)

• December 2012, Montavon et al., NIPS: 3.51 kcal/mol (neural nets + Coulomb sets)

• 2015, Hansen et al.: 1.3 kcal/mol, while being 10 million times faster than the state of the art

A prediction is considered chemically accurate when the MAE is below 1 kcal/mol. Dataset available at http://quantum-machine.org

Page 64: Learning Atomistic Representations with Deep Tensor Neural Networks

Kristof Schütt, Farhad Arbabzadah, Stefan Chmiela, Alexandre Tkatchenko, Klaus-Robert Müller

Page 65: Input Representation

Page 66: Deep Tensor Neural Network

Pages 67–68: DTNN in detail

Page 69: DTNN in detail II
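The details of these slides are in the figures; as a sketch, the interaction-refinement step of a DTNN, as described in Schütt et al. (2017), can be written as follows (notation simplified; $c_i^{(t)}$ is the embedding of atom $i$ after $t$ interaction passes, $\hat d_{ij}$ the Gaussian-expanded interatomic distance, $\circ$ the element-wise product):

$$c_i^{(t+1)} = c_i^{(t)} + \sum_{j \neq i} \tanh\!\Big[ W^{fc} \Big( (W^{cf} c_j^{(t)} + b^{f_1}) \circ (W^{df} \hat d_{ij} + b^{f_2}) \Big) \Big].$$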

Page 70: Chemical Compound Space

Page 71: Molecular Dynamics Simulations

Page 72: Explaining and visualizing the learned interactions

Page 73: Local 'potentials' for various probes

Page 74: Quantum chemical insights: aromaticity

Page 75: Quantum chemical insights

Page 76: Conclusion

• explaining & interpreting nonlinear models is essential

• orthogonal to improving DNNs and other models

• need for opening the black box ...

• understanding nonlinear models is essential for the sciences & AI

• new theory: LRP is based on Deep Taylor Decomposition (see the Samek lecture)

Remark: ML4QC and explainable ML workshops @ NIPS 2017


Page 79: Further Reading I

Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7).

Bießmann, F., Meinecke, F. C., Gretton, A., Rauch, A., Rainer, G., Logothetis, N. K., & Müller, K. R. (2010). Temporal kernel CCA and its application in multimodal neuronal data analysis. Machine Learning, 79(1-2), 5-27.

Blum, L. C., & Reymond, J. L. (2009). 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. Journal of the American Chemical Society, 131(25), 8732-8733.

Braun, M. L., Buhmann, J. M., & Müller, K. R. (2008). On relevant dimensions in kernel feature spaces. Journal of Machine Learning Research, 9, 1875-1908.

Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., & Müller, K. R. (2017). Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5), e1603015.

Hansen, K., Montavon, G., Biegler, F., Fazli, S., Rupp, M., Scheffler, M., von Lilienfeld, O. A., Tkatchenko, A., & Müller, K.-R. (2013). Assessment and validation of machine learning methods for predicting molecular atomization energies. Journal of Chemical Theory and Computation, 9(8), 3404-3419.

Hansen, K., Biegler, F., Ramakrishnan, R., Pronobis, W., von Lilienfeld, O. A., Müller, K. R., & Tkatchenko, A. (2015). Machine learning predictions of molecular properties: accurate many-body potentials and nonlocality in chemical space. J. Phys. Chem. Lett., 6, 2326-2331.

Harmeling, S., Ziehe, A., Kawanabe, M., & Müller, K. R. (2003). Kernel-based nonlinear blind source separation. Neural Computation, 15(5), 1089-1124.

Mika, S., Rätsch, G., Weston, J., Schölkopf, B., & Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, 41-48.

Kloft, M., Brefeld, U., Laskov, P., Müller, K. R., Zien, A., & Sonnenburg, S. (2009). Efficient and accurate lp-norm multiple kernel learning. In Advances in Neural Information Processing Systems, 997-1005.

Page 80: Further Reading II

Laskov, P., Gehl, C., Krüger, S., & Müller, K. R. (2006). Incremental support vector learning: analysis, implementation and applications. Journal of Machine Learning Research, 7, 1909-1936.

Mika, S., Schölkopf, B., Smola, A. J., Müller, K. R., Scholz, M., & Rätsch, G. (1998). Kernel PCA and de-noising in feature spaces. In Advances in Neural Information Processing Systems (NIPS).

Müller, K. R., Mika, S., Rätsch, G., Tsuda, K., & Schölkopf, B. (2001). An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2), 181-201.

Montavon, G., Braun, M. L., & Müller, K. R. (2011). Kernel analysis of deep networks. Journal of Machine Learning Research, 12, 2563-2581.

Montavon, G., Hansen, K., Fazli, S., Rupp, M., Biegler, F., Ziehe, A., Tkatchenko, A., von Lilienfeld, A. V., & Müller, K.-R. (2012). Learning invariant representations of molecules for atomization energy prediction. In Advances in Neural Information Processing Systems, 440-448.

Montavon, G., Braun, M., Krueger, T., & Müller, K. R. (2013). Analyzing local structure in kernel-based learning: explanation, complexity, and reliability assessment. IEEE Signal Processing Magazine, 30(4), 62-74.

Montavon, G., Orr, G., & Müller, K. R. (2012). Neural Networks: Tricks of the Trade. Springer LNCS 7700, Berlin Heidelberg.

Montavon, G., Rupp, M., Gobre, V., Vazquez-Mayagoitia, A., Hansen, K., Tkatchenko, A., Müller, K.-R., & von Lilienfeld, O. A. (2013). Machine learning of molecular electronic properties in chemical compound space. New Journal of Physics, 15(9), 095003.

Snyder, J. C., Rupp, M., Hansen, K., Müller, K. R., & Burke, K. (2012). Finding density functionals with machine learning. Physical Review Letters, 108(25), 253002.

Page 81: Further Reading III

Pozun, Z. D., Hansen, K., Sheppard, D., Rupp, M., Müller, K. R., & Henkelman, G. (2012). Optimizing transition states via kernel-based machine learning. The Journal of Chemical Physics, 136(17), 174101.

Schütt, K. T., Glawe, H., Brockherde, F., Sanna, A., Müller, K. R., & Gross, E. K. U. (2014). How to represent crystal structures for machine learning: towards fast prediction of electronic properties. Physical Review B, 89, 205118.

Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., & Tkatchenko, A. (2017). Quantum-chemical insights from deep tensor neural networks. Nature Communications, 8, 13890.

Rätsch, G., Onoda, T., & Müller, K. R. (2001). Soft margins for AdaBoost. Machine Learning, 42(3), 287-320.

Rupp, M., Tkatchenko, A., Müller, K. R., & von Lilienfeld, O. A. (2012). Fast and accurate modeling of molecular atomization energies with machine learning. Physical Review Letters, 108(5), 058301.

Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319.

Smola, A. J., Schölkopf, B., & Müller, K. R. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11(4), 637-649.

Schölkopf, B., Mika, S., Burges, C. J., Knirsch, P., Müller, K. R., Rätsch, G., & Smola, A. J. (1999). Input space versus feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5), 1000-1017.

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., & Müller, K. R. (2002). A new discriminative kernel from probabilistic models. Neural Computation, 14(10), 2397-2414.

Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T., & Müller, K. R. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics, 16(9), 799-807.