
Interpretable, calibrated neural networks for analysis and understanding of inelastic neutron scattering data

Keith T. Butler1,2‡, Manh Duc Le3, Jeyan Thiyagalingam1,4, Toby G. Perring3

1 SciML, Scientific Computing Department, STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
2 Department of Materials Science and Engineering, University of Oxford, 21 Banbury Rd, Oxford OX2 6HT
3 ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, Harwell Campus, Didcot, OX11 0QX, UK
4 Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ

Abstract. Deep neural networks provide flexible frameworks for learning data representations and functions relating data to other properties, and are often claimed to achieve ‘super-human’ performance in inferring relationships between input data and a desired property. In the context of inelastic neutron scattering experiments, however, as in many other scientific scenarios, a number of issues arise: (i) scarcity of labelled experimental data, (ii) lack of uncertainty quantification on results, and (iii) lack of interpretability of the deep neural networks. In this work we examine approaches to all three issues. We use simulated data to train a deep neural network to distinguish between two possible magnetic exchange models of a half-doped manganite. We apply the recently developed deterministic uncertainty quantification method to provide error estimates for the classification, demonstrating in the process how important realistic representations of instrument resolution in the training data are for reliable estimates on experimental data. Finally we use class activation maps to determine which regions of the spectra are most important for the final classification result reached by the network.

‡ email: [email protected]

arXiv:2011.04584v2 [cond-mat.mtrl-sci] 20 Nov 2020


1. Introduction

Neural networks have witnessed a renaissance in the last decade, dramatically improving on the existing state-of-the-art performance in fields from image processing to automatic language translation. In the sciences, it has been proposed that we are on the cusp of a “fourth paradigm” of data-driven discovery [1, 2], an idea that is supported by the recent explosion in research using machine learning (ML), not least in materials science and condensed matter physics [3, 4, 5, 6].

Experiments at large-scale facilities such as neutron sources would seem to be a natural fit for the application of data-driven methods for analysis [7]. Indeed there have been a number of recent publications applying machine learning to many aspects of neutron science, from pulse discrimination in scintillator detectors [8], to enhancing resolution [9] and exploration of collected data [10]. ML techniques have been used for the analysis of diffuse neutron scattering in spin-ice systems [11], small angle neutron scattering [12, 13] and for constructing inter-atomic potentials to compare to inelastic neutron scattering data [14]. However, questions still remain to be answered before machine learning becomes a fully integrated and trusted part of the neutron data analysis workflow; for example, can we place uncertainty estimates on our predictions, and can we explain why a neural network predicted what it did? A particularly challenging case is the interpretation of inelastic neutron scattering (INS) data from single crystal samples.

INS is a powerful technique to investigate the wave vector and frequency (equivalently excitation energy) dependence of the spectrum of excitations in condensed matter. The neutron scattering cross-section is particularly simple: it directly yields two-particle correlation functions, for example the particle-particle correlation function, or in the case of magnetic systems the spin-spin correlation function, and by virtue of the fluctuation-dissipation theorem, the generalised response functions of the system [15, 16]. Consequently INS has played a central role in determining, for example, phonon and spin wave dispersion relations and the spectra of magnetic fluctuations in materials, which in turn can play central roles in the thermal and charge transport properties, and the mechanisms of exotic phenomena such as high temperature superconductivity and heavy fermion behaviour [17, 18, 19, 20].

The richest information comes from experiments with single crystal samples, where the spectra can be explored as a function of all four components of wave vector and frequency. The latest INS spectrometers for such experiments at pulsed neutron sources (e.g. [21, 22, 23]) and equivalent instruments at reactors (e.g. [24]) allow the incident neutron energy to be fixed, and typically have ≈3 steradians of position sensitive area detectors divided into ∼10^5 detector elements, in each of which the energy of every scattered neutron (and hence the energy transferred to excitations) is resolved into ∼200-500 channels. With just one crystal orientation, therefore, data are collected in >10^7 voxels in a three dimensional (3D) space (corresponding to the two coordinates on the area detector, and energy transfer). In the case of quasi-2D magnetic materials, where the magnetic coupling is significant only within planes of atoms but negligible between planes, these three coordinates can be transformed into those of wave vector within the plane and energy transfer, thereby in principle enabling the entire excitation spectrum to be measured in parallel. An extension of this approach is to combine ∼100-500 measurements between which the crystal is rotated by a few tenths of a degree. The crystal rotation provides another independent coordinate so that the full 4D wave vector - excitation energy space of excitations can be measured, in ∼10^9-10^10 voxels ([25] and references therein).

The sizes of the datasets collected on these instruments are challenging to analyse and interpret with conventional but well-established and well-understood methods and software. The motivation for this work has been to investigate to what extent ML methods can reduce the labour of inelastic neutron data analysis while still returning reliable and interpretable information. In this contribution we re-examine the data from a well-understood INS measurement of spin waves in a single crystal of Pr(Ca0.9Sr0.1)2Mn2O7 [26], a moderately complex magnetic material, using recent advances in deep learning. Different models of the exchange interactions in Pr(Ca,Sr)2Mn2O7 were proposed based on the observed magnetic and crystallographic structure, but INS measurements of the spin wave spectrum and theoretical modelling were key to distinguishing between them. We show that ML techniques can be used to distinguish between the models and can highlight the diagnostic regions of the spectrum responsible for this categorisation.

In Section 2 we briefly describe the interest in and theoretical background to the physics of the manganites, and Pr(Ca0.9Sr0.1)2Mn2O7 in particular, together with the different exchange models. In Section 3 we discuss the experimental data and how it was modelled using linear spin wave theory to create training datasets for our neural networks, taking account of the instrument resolution. In Section 4, the main part of the paper, we introduce the neural network classifier, incorporate uncertainty quantification into the network and show how class activation maps may be used to understand its inferences. Finally, we discuss future challenges and summarise our conclusions.

2. Exchange interaction models in half-doped manganites

Perovskite manganese oxides RE(1−x)AxMnO3 (RE = rare earth, A = Ca, Sr, Ba, Pb) and their layered analogues have been widely studied not just because of the colossal magnetoresistance (CMR) they can show - up to many orders of magnitude change in resistivity in applied magnetic fields of a few tesla - but also because of the rich physics that arises from the coupling of the charge, spin, lattice and orbital degrees of freedom [27]. Typically the most pronounced CMR occurs with light hole doping x ≈ 0.2-0.4, where the material is balanced between being in a ferromagnetic (FM) metallic phase and an antiferromagnetic (AFM) charge and orbital ordered (CO) insulating phase. One of the commonest AFM CO phases is found near half doping, x = 0.5, and the nature of this phase was the topic of the INS study in ref. [26].

The structure of the perovskite manganese oxides consists of manganese ions, each surrounded by six oxygen ions forming a distorted octahedron, and these octahedra are in turn connected at their vertices to create a (nearly) cubic network, with a rare earth or alkaline earth cation filling the gap between the octahedra. Pr(Ca,Sr)2Mn2O7 is a half-doped bilayer manganite consisting of pairs of MnO6 octahedral layers, which are separated by a layer of the interstitial atoms that reduce the magnetic exchange coupling between bilayers by two orders of magnitude compared to the intra-bilayer couplings.

The famous and long-standing Goodenough model for the AFM CO phase is shown in Figure 1(a). It is based on the Goodenough-Kanamori-Anderson rules [28, 29] for magnetic superexchange together with a checkerboard ordering of equal numbers of formally 3+ and 4+ valence Mn ions. The octahedra around the Mn3+ are elongated due to the singly occupied higher energy eg orbitals. Accommodating this elongation within the structure results in a herringbone pattern of the eg orbitals and the experimentally observed CE-type magnetic structure, consisting of ferromagnetic ordering along zig-zag chains with antiferromagnetic ordering between neighbouring zig-zags.

Figure 1. Two magnetic exchange models in a single sheet of Mn ions in a half-doped manganite showing the CE-type magnetic ordering. (a) Goodenough model; (b) dimer model, similar to Goodenough but neighbouring Mn3+/Mn4+ sites along the ferromagnetic zig-zag chains are loosely bound in a dimer.

A very different and somewhat controversial model came to prominence in 2002, following a seminal neutron diffraction experiment [30]. In this model, known as the Zener polaron model, adjacent Mn3+ and Mn4+ spins form dimers with strong ferromagnetic intra-dimer coupling, but much weaker inter-dimer coupling (Figure 1(b)). The two models have the same underlying periodicities and it is difficult to distinguish between them using diffraction techniques, as this relies on the presence or absence of (and the differences between the intensities of) weak superlattice Bragg peaks. However, measuring the spin wave spectrum offers a way to distinguish between the models. The limiting case of strong intra-dimer coupling was straightforwardly eliminated on the grounds of the periodicities of the spin wave dispersion relations [26] compared to the data. A more realistic scenario is that of weaker dimerisation. This dimer model was eventually eliminated in ref. [26], but it was far from straightforward, because with an appropriate set of magnetic exchange parameters the dimer model can closely reproduce the spin wave dispersion relations of the Goodenough model, with only subtle differences in the intensities throughout most of the Brillouin zone. Determining which of the Goodenough and dimer models correctly described the magnetic interactions was the primary result of ref. [26], which concluded that the Goodenough model best fits the INS data.

Both the Goodenough and dimer models can be described by a spin Hamiltonian taking the standard Heisenberg form

    H = \sum_{i,j} J_{ij} \mathbf{S}_i \cdot \mathbf{S}_j + D \sum_i (S_i^z)^2        (1)

where the atom pairs ij are not the same in the two models, so the topology of the exchange interactions Jij will differ. In this formulation, a positive Jij indicates antiferromagnetic interactions (this is opposite to the convention used in ref. [26]). The final term is a single-ion anisotropy term which, with D > 0, tends to keep the spins in the a-b plane, consistent with observations. Figure 1 shows the exchange topology of each model, and Table 1 summarises the published exchange interactions [26] deduced for the two models.

Table 1. Spin wave exchange parameters (meV) for the Goodenough and dimer models for Pr(Ca0.9Sr0.1)2Mn2O7, with standard errors in parentheses as determined by [26], and the range of values used to generate random datasets for training the neural network classifier in this work. Positive values indicate antiferromagnetic exchange. D denotes the single-ion anisotropy term, with positive values here indicating an easy-plane anisotropy.

        Goodenough model                      dimer model
        Ref (meV)    Range (meV)              Ref (meV)    Range (meV)
  JF1   -11.39(5)    [-20, 0]        JFS      -14.20(8)    [-20, 0]
  JA      1.50(2)    [0, 3]          JFW       -8.43(6)    [-20, 0]
  JF2    -1.35(7)    [-3, 3]         JA1        1.52(1)    [-3, 3]
  JF3     1.50(5)    [-3, 3]         JA2        1.52(1)    [-3, 3]
  J⊥      0.88(3)    [0, 3]          J⊥         0.92(3)    [-20, 0]
  D       0.074(1)   [0, 0.2]        D          0.073(1)   [0, 0.2]

In the Goodenough model the nearest-neighbour FM interaction JF1 along the zig-zags is supplemented by two next-nearest-neighbour interactions JF2 and JF3, which need not be FM but must be permitted in order to reproduce the experimentally observed periodicities in the data. In addition there is an AFM nearest-neighbour exchange JA between the zig-zags. In the dimer model, the nearest-neighbour FM interactions within a zig-zag chain are replaced by a stronger intra-dimer FM interaction JFS and a weaker inter-dimer FM interaction JFW. Between the zig-zag chains are additional inter-dimer interactions denoted JA1 and JA2, as they link antiparallel spins. The symmetry of the model does not require them to be identical, nor is it necessary for both of them to be AFM. Finally, in both models there is an inter-layer interaction coupling the two sheets of the bilayer along the crystallographic c direction, denoted J⊥. The magnetic coupling between bilayers is negligible; consequently the spin wave spectrum is dispersive only in the a-b plane and so can be treated as quasi-2D.
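To make the sign convention of equation (1) concrete, the short sketch below (illustrative only, written for this discussion and not taken from the analysis code of ref. [26] or the repository [34]; the spin lengths are arbitrary) evaluates the classical energy of a pair of spins for a given coupling. With a positive Jij, the antiparallel arrangement has the lower energy, i.e. the coupling is antiferromagnetic.

```python
import numpy as np

def heisenberg_energy(spins, bonds, D=0.0):
    """Classical energy of equation (1): sum_ij J_ij S_i.S_j + D sum_i (S_i^z)^2.

    spins : (N, 3) array of classical spin vectors S_i
    bonds : list of (i, j, J_ij) tuples; positive J_ij is antiferromagnetic
    D     : single-ion anisotropy; D > 0 penalises spin components along z (the c axis)
    """
    exchange = sum(J * np.dot(spins[i], spins[j]) for i, j, J in bonds)
    anisotropy = D * np.sum(spins[:, 2] ** 2)
    return exchange + anisotropy

# Two unit spins joined by a bond with J = +1.5 meV (antiferromagnetic, as for JA):
up = np.array([0.0, 1.0, 0.0])
down = -up
bonds = [(0, 1, 1.5)]
print(heisenberg_energy(np.array([up, down]), bonds))  # -1.5 meV: favoured
print(heisenberg_energy(np.array([up, up]), bonds))    # +1.5 meV: disfavoured
```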

3. Data

3.1. Experimental data

The experimental INS data were acquired using the MAPS spectrometer at the ISIS Neutron and Muon Source, and the conventional analysis of those data was previously published [26]. Datasets were collected with several different incident neutron energies and monochromating (Fermi) chopper speeds (Ei = 25, 35, 50, 70, 100, 140 meV, with corresponding speeds f = 300, 200, 200, 250, 300, 400 Hz) to span the full bandwidth of the spin wave spectrum in a series of energy transfer windows with appropriate wave vector and energy resolution. The co-aligned array of single crystals of Pr(Ca0.9Sr0.1)2Mn2O7 was mounted with the incident beam parallel to the c axis so that the a-b plane is imaged in the detectors. All measurements were carried out in a closed cycle refrigerator at 4 K. For each dataset, an estimate of the non-magnetic scattering and background was made from several cuts as a function of energy transfer (ℏω) at different wave vectors, from which sections were stitched together to construct a single cut that included no spin wave scattering.

To use with the convolutional neural networks described below, the background-subtracted data were arranged into a gray scale image, obtained by first vertically stacking ten constant-energy transfer slices per incident energy spectrum, each stack uniformly dividing the energy transfer range between 0.1Ei and 0.7Ei into slices with thickness 0.06Ei. We chose to exclude data close to the elastic line (i.e. zero energy transfer) as the signal there is dominated by non-magnetic scattering, and also to exclude the spectrum above 0.7Ei as the signal there has a significant non-magnetic background. The constant-energy transfer slices are gray scale 40×40 pixel images over the wave vector range -1 < h, k < 1, which covers the full periodicity of the spin waves in the reciprocal lattice. The ten slices for each of the six Ei form 40×400 pixel strips, which were then stacked horizontally producing a 240×400 pixel image, shown in Figure 2. This image was used as input to the trained neural network classifier in order to determine which exchange model (Goodenough or dimer) better describes the measured data.

Figure 2. 2D representation of the experimental data (left panel). The data are arranged column-wise in terms of incident neutron energy (Ei), with Ei in meV given at the top of each column. The data are arranged row-wise into bins of energy transfer ℏω = 0.10-0.16Ei, 0.16-0.22Ei, etc. The middle and right panels show the calculated spectra using the Goodenough (middle) and dimer (right) exchange models at the published parameters noted in Table 1.
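The arrangement of the slices into a single image can be sketched as follows (a schematic NumPy illustration with invented array names, not the repository's actual preprocessing code):

```python
import numpy as np

def assemble_image(slices_per_ei):
    """Stack constant-energy-transfer slices into one 240x400 grey-scale image.

    slices_per_ei : list over the six incident energies Ei; each element is a
                    (10, 40, 40) array of slices covering 0.1*Ei to 0.7*Ei in
                    steps of 0.06*Ei over the wave vector range -1 < h, k < 1.
    """
    # Stack the ten energy-transfer slices vertically -> one (400, 40) strip per Ei ...
    strips = [np.concatenate(list(slices), axis=0) for slices in slices_per_ei]
    # ... then stack the six Ei strips horizontally -> (400, 240):
    # 400 energy-transfer rows by 240 wave vector columns.
    return np.concatenate(strips, axis=1)

# Example with random stand-in data for the six incident energies:
fake_slices = [np.random.rand(10, 40, 40) for _ in range(6)]
print(assemble_image(fake_slices).shape)  # (400, 240)
```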

3.2. Training Data

The convolutional neural networks described in the next section were trained on simulated data, in the same 240×400 gray scale image format as the experimental data. For the training datasets we used linear spin wave theory, as implemented in the SpinW code [31], to simulate the spin wave spectrum corresponding to sets of distinct spin Hamiltonian parameters {Jij, D}. Each constant energy transfer slice (40×40 pixel sub-image) is computed independently by convolving the spin wave theory spectrum with the spectrometer resolution function for each detector element and energy transfer bin in the data, using a Monte Carlo integration method [32] (named “Tobyfit” after the first program to implement it), as implemented in the neutron scattering data analysis code Horace [25]. The resulting simulation is sliced to produce a stack of constant energy transfer slices in precisely the same way as the experimental data are sliced. This resolution function calculation properly accounts for the functional form of the contributions from individual instrument components. However, the resolution convolution is quite computationally intensive, requiring approximately 12 CPU-hr per parameter set (240×400 image). The largest amount of time is spent by the spin wave calculation itself, that is, within SpinW, because the resolution convolution requires the calculated structure factor at a large number of wave vector and energy transfer (Q, ℏω) positions, each of which requires the computation and diagonalisation of a Hamiltonian matrix and additional matrix multiplications to obtain the spin-spin correlation function and hence the neutron scattering intensity. In order to speed this up we used another application, Brille [33], to pre-compute these structure factors on a dense grid within the first Brillouin zone, and then to linearly interpolate within this grid for the (Q, ℏω) points needed by the resolution convolution. This reduced the calculation time to ≈1 CPU-hr per parameter set.
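The speed-up comes from replacing repeated spin wave evaluations with cheap lookups on a pre-computed grid. The sketch below illustrates only the precompute-and-interpolate idea; it does not use the real SpinW, Horace or Brille interfaces, and expensive_sqw is an invented stand-in for the linear spin wave calculation.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def expensive_sqw(h, k, energy):
    # Stand-in for an expensive S(Q, E) calculation (e.g. linear spin wave theory).
    return (np.cos(np.pi * h) ** 2 * np.cos(np.pi * k) ** 2
            * np.exp(-(energy - 20.0) ** 2 / 50.0))

# 1. Pre-compute S(Q, E) once on a dense grid covering the first Brillouin zone.
h_grid = np.linspace(-1.0, 1.0, 101)
k_grid = np.linspace(-1.0, 1.0, 101)
e_grid = np.linspace(0.0, 100.0, 201)
H, K, E = np.meshgrid(h_grid, k_grid, e_grid, indexing="ij")
sqw_grid = expensive_sqw(H, K, E)

# 2. Build a linear interpolator over the grid (the role played by Brille).
interp = RegularGridInterpolator((h_grid, k_grid, e_grid), sqw_grid)

# 3. The resolution convolution then queries many random (Q, E) points per
#    detector element / energy bin (Monte Carlo integration) via fast lookups.
mc_points = np.column_stack([np.random.uniform(-1, 1, 10000),
                             np.random.uniform(-1, 1, 10000),
                             np.random.uniform(0, 100, 10000)])
intensity = interp(mc_points).mean()   # stand-in for the resolution-weighted sum
print(intensity)
```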

We also explored an approximate and faster resolution convolution method, motivated by the general desire to reduce the number of CPU hours needed when using ML techniques. Instead of using expensive Monte Carlo integration, the resolution function covariance matrix is pre-computed for each incident energy spectrum on a grid in (Q, ℏω) space and then used to define a 4D Gaussian profile with which the spin wave theory dispersion is convolved. With an 80×80 grid per 40×40 pixel slice the calculation times were reduced to ≈10 CPU-min per parameter set. However, the resulting calculated spectra yielded larger classification errors, as described below. The coarseness of the grid resulted in an underestimate of the resolution broadening, and consequently the calculation is “sharper” than the data, and the tops (bottoms) of the spin wave dispersion bands were calculated to be too low (high) in energy transfer. The problem can be rectified by using finer grids, but this then results in similar calculation times to the “Tobyfit” method with Brille interpolation. Furthermore, the use of a pre-computed grid also results in aliasing effects for certain grid sizes, to which the Monte Carlo method is not susceptible. The important lesson from this experience is that an accurate account of the resolution function needs to be taken for robust interpretation of INS data, despite the concomitant cost in computing resources.
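The essence of the Gaussian approximation is to broaden the sharp spin wave theory signal with a kernel defined by a resolution covariance matrix. The following is a minimal 2D sketch with an invented covariance in pixel units; the actual method uses 4D covariance matrices pre-computed on a grid in (Q, ℏω).

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_kernel(cov, half_width=10):
    """Sampled 2D Gaussian kernel (pixel units) for a given resolution covariance matrix."""
    ax = np.arange(-half_width, half_width + 1, dtype=float)
    X, Y = np.meshgrid(ax, ax, indexing="ij")
    pts = np.stack([X.ravel(), Y.ravel()])           # shape (2, N)
    inv = np.linalg.inv(cov)
    q = np.einsum("an,ab,bn->n", pts, inv, pts)      # squared Mahalanobis distance
    kernel = np.exp(-0.5 * q).reshape(X.shape)
    return kernel / kernel.sum()

# A sharp "ideal" slice from spin wave theory (stand-in data) ...
ideal = np.zeros((40, 40))
ideal[20, 5:35] = 1.0
# ... broadened by a resolution ellipsoid with correlated wave vector/energy widths.
cov = np.array([[4.0, 1.5],
                [1.5, 2.0]])                         # illustrative covariance only
broadened = fftconvolve(ideal, gaussian_kernel(cov), mode="same")
print(broadened.shape)  # (40, 40)
```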

We generated 3322 images for each of the Goodenough and dimer models, split 6000:644 between training and validation datasets, which required a total of ∼7000 CPU-hr. The parameters for these datasets were randomly generated by independently selecting from a uniform distribution within the limits noted in Table 1 for each Jij and model. These limits bracket the likely maximum ranges of estimates of the parameters from the bandwidths of the spin wave branches and the values of the exchange constants determined in various other manganites. This mimics the procedure that would be followed if the training datasets were being generated for an unknown system.
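Generating the parameter sets amounts to independent uniform draws within the Table 1 ranges, for example (an illustrative sketch, not the generation scripts in the repository [34]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform sampling ranges (meV) from Table 1 for the Goodenough model; the dimer
# model is handled in the same way with its own parameter names and ranges.
goodenough_ranges = {
    "JF1":   (-20.0, 0.0),
    "JA":    (0.0, 3.0),
    "JF2":   (-3.0, 3.0),
    "JF3":   (-3.0, 3.0),
    "Jperp": (0.0, 3.0),
    "D":     (0.0, 0.2),
}

def sample_parameters(ranges, n_sets):
    """Draw n_sets independent parameter sets, each uniform within its Table 1 range."""
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
            for _ in range(n_sets)]

params = sample_parameters(goodenough_ranges, 5)
print(params[0])   # e.g. {'JF1': -12.7, 'JA': 0.81, ...}
```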

The data can be downloaded from an online repository [34]. Figure 2 shows the calculated spectra using the Monte Carlo resolution convolution method for both the Goodenough and dimer models alongside the measured data.

4. A network for phase discrimination

4.1. Can a neural network learn to distinguish phases?

We first want to establish that we can train a neural network (NN) capable of learning to distinguish between the Goodenough and dimer models based on the INS spectrum. This is a classic inverse problem; while the forward mapping from the spin Hamiltonian for each model to the INS spectrum is well known, the inverse is not.

Analysing the spin wave spectrum pixel-by-pixel using a multi-layer perceptron (MLP) type NN architecture would lead to a rapidly exploding number of parameters. In addition, as an MLP treats the input image as a vector of data, it can be very sensitive to small shifts in spectral feature positions which may not be characteristic of the different exchange models. While these shifts do not pose a problem for the training of the network, which is done on generated data, they may become an issue when confronted by experimental data, because slight miscalibrations of the spectrometer may produce data which are offset compared to the ideal simulations. Furthermore, due to instrumental resolution broadening, neighbouring pixels in a spectrum (image) are correlated, and an MLP would have to learn these correlations explicitly. Instead, we employ the popular convolutional neural network (CNN) architecture [35, 36], whereby the image (spectrum) is first passed through a series of filters, before feeding the values of these filters into an MLP architecture. During training the weights of the filters and the MLP are updated. The filters can be seen as learning a reduced representation of the spectrum (including implicit correlations between neighbouring pixels), while the MLP learns the function to map these features to the corresponding exchange interaction model.

There is a wide array of CNN architectures already trained and available for image recognition problems, for example the AlexNet [37] and ResNeXt [38] architectures, which are both previous winners of the ImageNet [39] data classification challenge. In this work we build a relatively simple CNN architecture, as the data that we are classifying and the number of potential classes are quite small. As we will demonstrate later, choosing simpler architectures can help with interpretation of the final CNN inference.

Figure 3. Network architectures. (a) The convolutional neural network (CNN) architecture; at each (green) convolution layer the number of filters is indicated below and the size of the convolution matrix above the blocks, and at each fully connected (orange) layer the number of input dimensions is indicated. After each block of convolution layers we perform a MaxPool (brown block with pool size indicated) and batch normalisation (see text). The large dense layer (344448-element orange block) is a vector obtained by flattening the outputs of the final layer of convolution filters; this layer is then connected to a 64-node layer and finally to the classification head. (b) The same CNN as (a), but rather than flattening the final layer of filters, a global pooling of each of the 32 filters is used to generate 32 values which are connected directly to the classification head; this architecture is used to construct class activation maps. (c) The variational autoencoder architecture; at each convolution (green) layer the number of filters is indicated, and we also perform (2×2) striding (the convolution filter skips across the input in steps of two, thereby halving the size of the output) at each convolutional layer; the central latent (orange) layer consists of 20 Gaussian distributions, each characterised by a mean and standard deviation.

The architecture we built is represented schematically in Figure 3(a). The convolution layers are coupled with a pooling layer so that the image is successively down-sized as it passes through the convolutions, and a batch normalisation is applied after pooling. In batch normalisation the outputs from a layer are normalised (to zero mean and unit variance) across a pre-defined batch of training samples. In MaxPool the size of a filter is reduced between layers by pooling sections of the filter and taking the maximum value of each pool. At the final convolutional layer the values of each filter F^j_{xy} (where x and y are spatial coordinates and j is the filter index) are flattened to produce a 117 × 92 × 32 = 344448-element vector§ which is fed into a fully-connected layer (MLP) with d = 64 output nodes. Classification is then performed by a weighted summation of these 64 values v_i based on trainable weights w^c_i and a trainable bias b^c, which serve as input to a sigmoid function φ. That is, the score for class c is

    Y^c = \phi\left( \sum_{i=1}^{d} w^c_i v_i + b^c \right)        (2)

with the class with the highest score selected by the network.

The simulated training data are split into training and validation sets as described in the data generation section. To save memory the training dataset is divided into batches of 32 images. In a single training iteration, the network is fed these 32 images to calculate the loss function (we use the mean binary cross-entropy loss of the classification of all images in the batch as the loss function). The gradient of each adjustable weight in the network with respect to the loss is then calculated using backpropagation and the weight is adjusted in the direction of steepest descent. This iteration is repeated for each batch of 32 images, until all batches have been passed through the network once, which is termed an epoch. At the end of each epoch the loss from the separate validation dataset is calculated. If this validation loss is much larger than the loss calculated from the training set after adjusting the weights, the network is overfitting the training data, such that it cannot generalise its inference to the similar data in the validation set. The training is set to terminate if the validation accuracy has not improved for 20 epochs, up to a maximum of 500 epochs. The full details of the network training and the code for the architecture are available in an online repository [34]. The final trained CNN achieves > 96% accuracy in predicting the correct exchange model for data in the validation set for both Goodenough and dimer datasets. The training curves are included as supporting information.

§ A convolutional filter with size n, padding p and stride s changes an image with input dimension d to (d − n + 2p)/s + 1; in our classification CNNs p = 0 and s = 1 for all cases. That is, after the first 5 × 5 convolutional layer, the image (initially 240 × 400) size is 236 × 396, and after the first 1 × 2 MaxPool its size is 234 × 197. The number of filters we used at each layer is specified in Figure 3. The final convolution layer has 32 filters and the image has been downsized to 117 × 92 pixels.
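A Keras sketch in the spirit of Figure 3(a) and the training protocol described above is given below. The layer sizes are simplified, and the framework, exact filter counts and kernel sizes of the network actually used may differ; the definitive implementation is in the repository [34]. The arrays x_train, y_train, x_val and y_val are assumed to hold the simulated images and one-hot model labels and are not defined here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

def build_classifier(input_shape=(400, 240, 1)):
    """Simplified CNN in the spirit of Figure 3(a): convolution/MaxPool/batch-norm
    blocks, a dense layer of 64 nodes and a two-class sigmoid head (equation 2)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (5, 5), activation="relu"),
        layers.MaxPooling2D(pool_size=(1, 2)),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.BatchNormalization(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="sigmoid"),        # [Y_dimer, Y_Goodenough]
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier()

# Mini-batches of 32 images; stop once the validation accuracy has not improved
# for 20 epochs, up to a maximum of 500 epochs, as described in the text.
early_stop = callbacks.EarlyStopping(monitor="val_accuracy", patience=20,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           batch_size=32, epochs=500, callbacks=[early_stop])
```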

4.2. How much can we trust the predictions?

When a network has been trained to separate data into two classes, it can be difficult to know how reliable a prediction on a new piece of data is. Binary classification (Equation 2) applies a sigmoid function, which intentionally favours output values close to either 0 or 1, forcing separation between classes; however, this means that an individual sample may be classified strongly into a class despite a degree of ambiguity in the result. For example, when we trained the previously described network on data simulated with the Monte Carlo resolution convolution and then fed it the experimental data spectrum, it gave class scores corresponding to [Y^dimer, Y^Goodenough] of [0.012, 0.988]. However, when the same network was trained using data simulated using the faster pre-computed grid of covariance matrix resolution convolution method, it gave [0.999, 0.001]. That is, it not only wrongly classifies the experimental data as being in the dimer class, it does so strongly. It is thus incorrect and misleading to interpret this output vector of the sigmoid classification as a measure of confidence in that classification. To address this concern we have adapted the recently published deterministic uncertainty quantification (DUQ) scheme to work with our classification network [40].

There are a range of methods available for uncertainty quantification in NN predictions, for example Bayesian neural nets (BNNs) [41]. Pure BNNs are intractable for exact inference, although a number of approximations have been proposed, including the simple-to-implement MC Dropout approach [42]. In practice most of these Bayesian approaches are outperformed by the Deep Ensembles (DE) approach [43], in which multiple networks with the same architecture are trained from different initial values of weights and biases and dataset orderings, resulting in a distribution of answers. The drawback of the DE approach is the requirement to train and run multiple networks and the associated linear growth of computational cost. We instead choose the DUQ approach, which can provide uncertainty estimates in a single forward pass of the network, but shows performance in out-of-distribution data detection (identifying cases where predictions would be an extrapolation from the training data, rather than an interpolation) on a par with DE approaches [40]. We note that several new approaches show good potential for uncertainty quantification in NNs, such as the stochastic differential equation approach [44] and radial BNNs [45], but a full comparison of all approaches is beyond the scope of the current study.

Figure 4. Schematic representation of the deterministic uncertainty quantification (DUQ) method. The input initially passes through a series of convolutions (orange block) to extract features. In standard logistic regression the outputs from the convolutions are classified by summing the weights connecting each filter f_i to the class c of interest, as in equation 2. DUQ instead clusters the examples based on the distances K_c, in a high dimensional space of the outputs from the convolutions, from the centres e_c of clusters of training examples, according to equation 3.

The principle of DUQ as applied here is as follows. Let M_Θ : x ∈ R^n → v ∈ R^d be the transformation learnt by the network, which maps an input vector x (the input image, size 240×400 pixels) with dimension n = 96000 to an output feature vector v = {v_1, ..., v_d} with dimension d = 64 (the number of output nodes connected to the classification head), where Θ are the parameters (weights and filters) of the network. In DUQ [40] (Figure 4), instead of the final dense layer (the classification head) which maps v to the class scores Y = {Y^1, ..., Y^c}, a different mapping is learnt which transforms v into a vector space with dimension m in which the samples of the same class c are clustered together. This is embodied by a per-class learnable weight matrix W_c ∈ R^{m×d}, with m = 64 in our case. The correlation K_c between this W_c v vector and the class centroid e_c, given by

    K_c(v, e_c) = \exp\left[ -\frac{\| W_c v - e_c \|_2^2}{2 n \sigma^2} \right],        (3)

is then a measure of the uncertainty in the classification, with the hyper-parameter σ being a characteristic length scale, also learned during the training. The network then assigns a given input sample x to the class with the largest correlation K_c. Thus, instead of classifying the sample based on the highest score Y^c from equation 2, it is classified as belonging to the centroid closest to it, with a confidence K_c that corresponds to the Euclidean distance between the new point and the centroid. In this way, data that are out of the training distribution are far from any of the trained centres of gravity and the confidence of the classification is low.
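As an illustration of equation (3), the following sketch computes the per-class correlations K_c for a single feature vector, given per-class weight matrices W_c, centroids e_c and a length scale σ. In the real network these quantities are learned during training; the values below are random stand-ins, and n is taken here as the feature dimension entering the exponent.

```python
import numpy as np

def duq_correlations(v, W, centroids, sigma):
    """Per-class DUQ correlations K_c of equation (3) for one feature vector v.

    v         : (d,) feature vector from the 64-node layer of the CNN
    W         : (n_classes, m, d) per-class weight matrices W_c
    centroids : (n_classes, m) class centroids e_c
    sigma     : characteristic length scale
    """
    n = v.shape[0]
    K = []
    for Wc, ec in zip(W, centroids):
        dist_sq = np.sum((Wc @ v - ec) ** 2)          # ||W_c v - e_c||_2^2
        K.append(np.exp(-dist_sq / (2.0 * n * sigma ** 2)))
    return np.array(K)

# Random stand-in values: the class-0 centroid is constructed to lie close to
# W_0 v and the class-1 centroid far from W_1 v, mimicking an in-distribution
# sample of class 0.
rng = np.random.default_rng(1)
d = m = 64
v = rng.normal(size=d)
W = rng.normal(size=(2, m, d)) * 0.1
centroids = np.stack([W[0] @ v + 0.05 * rng.normal(size=m),
                      W[1] @ v + 3.0 * rng.normal(size=m)])
print(duq_correlations(v, W, centroids, sigma=0.5))   # approximately [1.0, 0.0]
```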

The network architecture we now use to classify the experimental neutron scattering data is the same one demonstrated in the previous section to reliably predict the correct exchange model with the validation datasets, and is depicted in Figure 3(a). The only difference is that the vector at the 64-node layer is then used as the input to the DUQ analysis. This CNN was trained for 100 epochs on data prepared as described in the Training Data section.

Networks were trained on the simulated datasets for both the Goodenough and dimer models, generated by the resolution convolution methods described earlier: one using the fast but approximate method with a pre-computed grid of covariance matrices (M^pre_Θ), and the other using the more expensive but accurate Monte Carlo integration method (M^MC_Θ). We then use these networks to classify the background-subtracted experimental spectrum. The DUQ networks yield a correlation value between 0 and 1 to indicate the distance in hyperspace between the weight vector associated with the input and each of the output classes.

The network trained on pre-computed resolution data, M^pre_Θ, gives an output corresponding to [K_dimer, K_Goodenough] of [0.73, 0.49], showing that the network correlates this input more with the dimer model than with the Goodenough model. This classification is incorrect; moreover, the correlation values suggest that the network does not distinguish between the two models with great certainty and that the experimental data lie out of the distribution of the training data.

To demonstrate this point further we have calculated the correlation values K_Goodenough that the network assigns to the Goodenough category for every spectrum in the training set for M^pre_Θ. The correlation values are presented in Figure 5, demonstrating clearly that on data corresponding to the training distribution the network will overwhelmingly predict values very close to 1.0 for the (ground truth) Goodenough class and close to 0.0 for the dimer class. This further emphasises that the result from M^pre_Θ on the background-subtracted experimental data indicates that these data are not within the training distribution.

Figure 5. The correlation values K_c for the c = Goodenough category assigned by the M^pre_Θ DUQ classifier. The plot is a histogram of the values of the Goodenough class output given by the neural network on the full training datasets of spectra generated using the Goodenough (orange) and dimer (green) models (3322 spectra each).

In contrast, when the network trained on Monte Carlo resolution-convolved data, M^MC_Θ, was tested with the background-subtracted experimental spectrum, the output corresponding to the classes [K_dimer, K_Goodenough] was [0.05, 0.99]. The M^MC_Θ network not only predicts the correct result, but does so with a high degree of confidence, indicating that the training data generated with the Monte Carlo resolution convolution method cover a distribution encapsulating the background-subtracted experimental data.

This result highlights two important points. First, the inclusion of accurate representations of instrument and experiment resolutions is critical if one wishes to train a neural network on simulated data and later to apply it to analyse experimental results. Second, having a method to quantify the confidence of the machine learning is extremely important. In a standard CNN with logistic regression rather than DUQ, M^pre_Θ would have classified the spectrum as belonging to the dimer class, with no indication that there was uncertainty in this classification or that the data were out of distribution.

To further demonstrate the utility of the DUQ method for identifying out-of-distribution data we have tested the M^MC_Θ network on some “Franken-data”. The Franken-data are generated by stitching together spectra from several different materials measured with the same instrument settings (and hence instrument resolution) as the Pr(Ca0.9Sr0.1)2Mn2O7 (PCSMO) data. The resulting image, constructed to have the same dimensions as the PCSMO data, looks like a reasonable INS spectrum at first glance. The M^MC_Θ network gives correlation values of [0.35, 0.85], where the closeness of the two values is indicative of a sample outside of the training distribution, demonstrating the utility of DUQ for identifying cases where the classification should not be trusted.

Figure 6. Class activation map: the input initially passes through a series of convolutions (orange block) to extract features. The final layer of filters (F^j_{xy}, summed to give f^j) are fed into the classification head as the global average of each filter to do the classification (red block). To build an activation map, we take the cross product of the filter weights with the activation linking the average of that filter to the class of interest. For a single image we take the average of all of these cross products to build a map of why that particular class was activated, equation 5.

4.3. Why does the network predict what it does?

We have established that a CNN is capable of performing the inverse problem of inferring the magnetic exchange model from the simulated spectrum. However, a highly pertinent question is: can the network tell us what influenced this decision? If we can address this question we can gain confidence in the network, and can also use it in a predictive fashion, so that in advance of analysing data the CNN could be used to work out which are the most important regions for attention in the data. Moreover, knowing what in the data influenced the decision is what helps yield scientific insight.

To ask the network which are the important regions for making a distinction we employ class activation maps (CAMs) [46, 47]. CAMs relate regions of an input signal to the output classification. The mechanism for producing a CAM is outlined in Figure 6. In this example, the final convolutional layer is directly connected to the classification head (a dense fully-connected layer), so the extent to which a filter F^j_{xy} is responsible for a classification Y^c is determined by the weight w^c_j connecting the jth filter to the cth class. The activation map is then obtained by:

    A^c_{xy} = \frac{1}{n_f} \sum_{j=1}^{n_f} w^c_j F^j_{xy}.        (4)

That is, the image A^c_{xy} is the average (denoted ⟨...⟩ in Figure 6) of the n_f filters weighted by their contributions to the classification c.

This approach, however, cannot work with the classification architecture we introduced in section 4.1, where there is an intervening fully connected layer between the final convolutional layer and the classification head, because it requires weights w^c_j that directly connect the jth filter to the class c. Thus, we use the Grad-CAM [47] method, where instead of the weights w^c_j the gradients dY^c/df^j are used, where f^j = \sum_{xy} F^j_{xy} is the sum of all pixels (global pool) of a filter. These gradients can be calculated through any number of intermediate layers between the last convolutional layer and the classification head using backpropagation. Thus, the activation map we used is given by:

    A^c_{xy} = \frac{1}{n_f} \sum_{j=1}^{n_f} \frac{dY^c}{df^j} F^j_{xy}.        (5)
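Schematically, once the final-layer filter maps F^j_{xy} and the gradients dY^c/df^j have been extracted from the trained network (in practice via backpropagation with the deep learning framework's automatic differentiation), equation (5) reduces to a weighted average of the filter maps. A minimal NumPy sketch with stand-in values:

```python
import numpy as np

def grad_cam(filter_maps, grads):
    """Class activation map of equation (5).

    filter_maps : (n_f, H, W) final-layer filter outputs F^j_xy for one input image
    grads       : (n_f,) gradients dY^c/df^j of the class score with respect to the
                  global pool f^j of each filter
    """
    n_f = filter_maps.shape[0]
    return np.tensordot(grads, filter_maps, axes=1) / n_f   # (H, W) weighted average

# Stand-in values only: 32 filters of size 117 x 92, as in the classification CNN.
rng = np.random.default_rng(2)
maps = rng.random((32, 117, 92))
grads = rng.normal(size=32)
cam = grad_cam(maps, grads)
print(cam.shape)  # (117, 92)
```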

Applying this Grad-CAM to the original architecture, depicted in Figure 3(a), yielded a class activation map which does not reveal any activation patterns which correspond to physical intuition. Thus, whilst the extra complexity from the intermediate densely connected layer between the final convolution layer and the classification head allows the classification network to be flexible and to produce an accurate determination between the Goodenough and dimer models, it also obscures the relationships between the spatial inputs of the model and the outputs.

We next build a simpler network architecture (Figure 3(b)): rather than flattening the final convolutional layer, we take the global average of each of the filters to make an input vector of size 32 (one for each filter) and connect this directly to the classification head of the network. This means that the filters now directly contribute to the classification, and the more activated a filter is, the more it contributes. However, when presented with the experimental spectrum this network classifies with very little certainty, as the new network is less flexible and less robust to experimental noise.

To overcome the sensitivity to experimental noise we build a variational autoencoder (VAE) to clean the signal [48]. Previous work has shown the application of autoencoders for cleaning diffuse (i.e. energy integrated) neutron scattering signals [11]. We built a VAE with two convolutional layers in the encoder, with strided convolutions in order to compress the image size - the architecture is shown in Figure 3(c). At the “bottle-neck” or “latent” layer we have 20 units. The network is then trained on 3500 examples of simulated dimer and Goodenough datasets (1750 of each) with Monte Carlo resolution convolution; the training is run for 100 epochs to maximise the evidence lower bound (ELBO) [48]. For a VAE the network trains to reproduce the input, so no labels are required. The ELBO balances a loss on the reconstruction (how similar the output is to the input) against the divergence between the inferred distribution in the latent space and the prior distribution (a normal distribution with mean 0 and variance 1) of the latent space. Because the data pass through the reduced dimensionality of the latent layer the information content is necessarily reduced and only the stronger features of the initial signal are retained. In this way noise can be removed from the original; the result of cleaning the signal with the autoencoder is shown in Figure 7. The full neural network architecture and training routine are available in the repository associated with this paper [34].

Figure 7. Upper: the effect of cleaning the experimental spectrum with a variational autoencoder; left is the experimental spectrum, right is the same spectrum passed through the trained autoencoder. Lower: class activation maps. Left shows the class activation map for the autoencoder-cleaned signal; right, we have taken the mean of the activations per slice of the spectrum and plotted these to show which slices contributed most strongly to the classification as Goodenough.
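An illustrative Keras sketch of a convolutional VAE of the kind described above (two strided convolutions in the encoder, a 20-dimensional latent layer, and an ELBO consisting of a reconstruction term plus a KL divergence to a unit Gaussian prior) is shown below. The layer sizes and the mean-squared-error reconstruction term are simplifications; the authors' exact architecture and loss are in the repository [34].

```python
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 20

class Sampling(layers.Layer):
    """Reparameterisation trick: draw z ~ N(mean, exp(log_var)) and add the KL
    divergence to the unit Gaussian prior to the model's losses."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder: two strided convolutions compress the image, then predict the mean and
# log-variance of the 20 latent Gaussians.
enc_in = layers.Input(shape=(400, 240, 1))
x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(enc_in)
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
z = Sampling()([layers.Dense(latent_dim)(x), layers.Dense(latent_dim)(x)])

# Decoder: mirror of the encoder, reconstructing the input image from z.
x = layers.Dense(100 * 60 * 32, activation="relu")(z)
x = layers.Reshape((100, 60, 32))(x)
x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
dec_out = layers.Conv2DTranspose(1, 3, strides=2, padding="same",
                                 activation="sigmoid")(x)

vae = models.Model(enc_in, dec_out)
# Reconstruction term (here mean squared error) plus the KL term added above.
vae.compile(optimizer="adam", loss="mse")
# vae.fit(train_images, train_images, epochs=100, batch_size=32)  # no labels needed
```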

Figure 7 shows the experimental spectrum, with the background subtracted (top left) and passed through a cleaning autoencoder (top right), above a class activation map (bottom left), which highlights the regions of this spectrum that were important for the classification as a Goodenough structure. The neural network classifier now correctly infers that the input signal corresponds to the Goodenough model, with the class scores [0, 1] corresponding to [Y^dimer, Y^Goodenough]. Looking at the activation maps, in particular the averages per slice of the input (bottom right), the classification is most heavily influenced by the signal in the slice corresponding to 35.6 to 40.4 meV in the spectrum collected at Ei = 70 meV. This is the region at the top of the lower spin wave band and is precisely the same region as was used in the original paper to make the distinction. The activation map also has higher input from this same region at the top of the lower band in the spectra collected at Ei = 100 and 140 meV; however, this is not as strong as the contribution from the Ei = 70 meV spectrum.

These results demonstrate the importance of balancing complexity against interpretability when one is interested in understanding why a network provides the results that it does. In the case of the highly accurate network, which could take into account experimental noise, we are able to obtain the correct result; however, the layers of abstraction within the network make it difficult to explain these results in an interpretable way. When we construct a simpler network we have to remove some of the experimental noise in order to make it work properly, but the result can be interpreted in a way which corresponds to human understanding. This is an example of the well-known trade-off between interpretability and completeness of explanations [49].

We have implemented the CAM networks in an interactive Jupyter notebook [34]. In the notebook both of the networks described above can be loaded and tested on the data before and after cleaning with the autoencoder. The notebook also features a tool for zooming in on slices of energy transfer space to look at the difference between the dimer and Goodenough model spectra.

5. Future challenges

While we have addressed the classification problem in this work, another particular area of interest which we have not covered is the use of NNs for regression, to obtain a set of exchange parameters for a given model and given measured data. With conventional fitting it might take a researcher some months (work in practice sometimes spread over years) to obtain a set of exchange parameters for a given model and measured dataset, depending on the complexity of the data. This is due to the need first for a careful examination of the data for diagnostic regions and then an exhaustive search of the parameter space. Despite the required training time (perhaps a few weeks on a cluster), NNs offer a way to obtain these exchange parameter values on much shorter timescales.

The challenges in applying these functionalities (classification and regression) more generally are twofold. First, for users to take full advantage of the new methods, the workflows must be relatively easy to use, and this will involve chaining together several disparate software packages in several languages. At the ISIS Neutron and Muon Source this will be addressed by the PACE project, which aims to bring together data analysis tools for INS [50].

Second, as we noted in the introduction, one key feature of single-crystal INS data from modern spectrometers is their large size. The Pr(Ca,Sr)2Mn2O7 dataset explored in this work is two orders of magnitude smaller than those typical of full 4D wave vector - energy transfer space maps of excitations, and yet we had to reduce it further to a 240×400 pixel image for input to the neural network (the full data consist of 32 million (Q, ℏω) bins, although many of these will have zero counts). Thus INS data pose almost the opposite problem to that often tackled by machine learning: other applications can rely on a large number of datasets with each dataset being relatively small in size, whereas for INS there are typically few experimental datasets, necessitating the use of synthetic data for training, but each dataset is almost unmanageable in size.

New techniques will have to be developed to address this challenge, and it is likely that machine learning methods will play a role. For example, although the total number of data bins is large, in many cases a significant fraction of these contain scattering that is not of relevance, for example a slowly varying function of wave vector and energy transfer that forms a “background” underlying what is known a priori to be a sharply defined set of excitations of interest, such as the spin waves in the present work. If a reliable way can be found to identify, that is, segment, this “background”, then we can treat only the signal from the non-“background” data. This signal might be amenable to Fourier analysis or to treatment using sum rules, or may be transformed into a compressed representation which might be more appropriate for input to a neural network.

6. Conclusion

We have applied deep neural networks to analyse INS data. This is the first time (to our knowledge) that NNs have been applied directly for INS data analysis. We demonstrated that a CNN architecture could be trained to classify spin Hamiltonians based on simulated data. We then demonstrated how uncertainty quantification can be included in the classification using the DUQ method. DUQ highlights that an accurate calculation of the experimental resolution is required to train a network using synthetic data that then works well on real data. DUQ was also able to detect when data from a different material to the training data, but measured with the same spectrometer settings, was passed to the network.

We built class activation maps to determine which parts of the input spectrum are important for the network classification and found that, while more complex networks can classify correctly with noisy data, the results are not physically interpretable. Simpler (fewer layer) networks, however, yield more interpretable class activation maps at the cost of requiring cleaner data which had been passed through an autoencoder. In this case, the maps showed that the most diagnostic feature of the measured spectrum is the region at the top of the lower excitation band, which accords with previous work [26].

We believe that this work demonstrates how trustworthy, interpretable ML can be a powerful tool in the analysis of INS data. We have highlighted future directions that can be explored to further harness the potential of ML in this field. We hope that this work is an early step in using ML methods not just for processing, but for better understanding INS experiments.

Acknowledgements

Experiments at ISIS were supported by a beamtime allocation RB1310483 from the Science and Technology Facilities Council. TGP thanks co-authors A. T. Boothroyd and D. Prabhakaran of ref. [26] for permission to use the datasets from those experiments. This work was partially supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the “AI for Science” theme within that grant and The Alan Turing Institute. The simulated datasets were generated using computing resources provided by STFC Scientific Computing Department's SCARF cluster.

Data Access Statement

All of the training data, trained neural networks and code for generating the training data for this study are openly available at https://zenodo.org/record/4270057#.X65xr1BpHIU

A git repository containing the code used to build and train the neural networks, as well as notebooks to recreate the DUQ and CAM experiments, is available at https://github.com/keeeto/interpretable-ml-neutron-spectroscopy

The software environment required to run thecodes in the git repository is available in the formof docker images from https://hub.docker.com/u/

mducle with instructions in the github repository.Additionally there are Conda environment filesprovided in the repository.

Author Contributions

KTB, MDL and TGP jointly conceived, planned and steered the project. KTB built, trained and applied the neural networks; MDL produced the simulated training datasets; TGP provided the experimental data. KTB, MDL and TGP wrote the manuscript together in an iterative fashion to ensure the contents present neural networks and related methods in a manner that is accessible and useful to the condensed matter physics community. JT was involved in the conception and establishment of the project and facilitated the work in the paper.

References

[1] Hey T, Tansley S, Tolle K et al. 2009 The fourth paradigm: data-intensive scientific discovery vol 1 (Microsoft Research, Redmond, WA) ISBN 978-0-9825442-0-4 URL https://www.microsoft.com/en-us/research/publication/fourth-paradigm-data-intensive-scientific-discovery/
[2] Agrawal A and Choudhary A 2016 APL Materials 4 053208
[3] Carrasquilla J and Melko R G 2017 Nature Physics 13 431–434
[4] Butler K T, Davies D W, Cartwright H, Isayev O and Walsh A 2018 Nature 559 547–555
[5] Bednik G 2019 Physical Review B 100 184414
[6] Morita K, Davies D W, Butler K T and Walsh A 2020 arXiv preprint arXiv:2005.05831
[7] Hey T, Butler K, Jackson S and Thiyagalingam J 2020 Philosophical Transactions of the Royal Society A 378 20190054
[8] Doucet E, Brown T, Chowdhury P, Lister C, Morse C, Bender P and Rogers A 2020 Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 954 161201
[9] Islam F, Lin J Y Y, Archibald R, Abernathy D L, Al-Qasir I, Campbell A A, Stone M B and Granroth G E 2019 Rev. Sci. Instrum. 90 105109 (Preprint https://arxiv.org/abs/1906.09482) URL https://doi.org/10.1063/1.5116147
[10] Hui Y and Liu Y 2019 Science and Information Conference (Springer) pp 257–271 URL https://arxiv.org/abs/1710.05994
[11] Samarakoon A M, Barros K, Li Y W, Eisenbach M, Zhang Q, Ye F, Sharma V, Dun Z L, Zhou H, Grigera S A, Batista C D and Tennant D A 2020 Nature Communications 11(1) 892 URL https://doi.org/10.1038/s41467-020-14660-y
[12] Archibald R K, Doucet M, Johnston T, Young S R, Yang E and Heller W T 2020 Journal of Applied Crystallography 53
[13] Demerdash O N, Shrestha U, Petridis L, Mitchell J C, Smith J and Ramanathan A 2019 Frontiers in Molecular Biosciences 6 64
[14] Qian X and Yang R 2018 Physical Review B 98 224108
[15] Lovesey S 1984 Theory of neutron scattering from condensed matter (Clarendon Press, Oxford)
[16] Squires G 1978 Introduction to the theory of thermal neutron scattering (Cambridge University Press, Cambridge)
[17] Chen T, Chen Y, Kreisel A, Lu X, Schneidewind A, Qiu Y, Park J, Perring T G, Stewart J R, Cao H et al. 2019 Nature Materials 18 709
[18] Wang M, Zhang C, Lu X, Tan G, Luo H, Song Y, Wang M, Zhang X, Goremychkin E, Perring T et al. 2013 Nature Communications 4 1–10
[19] Kieslich G, Skelton J M, Armstrong J, Wu Y, Wei F, Svane K L, Walsh A and Butler K T 2018 Chemistry of Materials 30 8782–8788
[20] Li X, Liu P F, Zhao E, Zhang Z, Guidi T, Le M D, Avdeev M, Ikeda K, Otomo T, Kofu M et al. 2020 Nature Communications 11 1–9
[21] Bewley R, Taylor J and Bennington S 2011 Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 637 128–134 ISSN 0168-9002
[22] Abernathy D L, Stone M B, Loguillo M J, Lucas M S, Delaire O, Tang X, Lin J Y Y and Fultz B 2012 Review of Scientific Instruments 83 015114 URL https://doi.org/10.1063/1.3680104
[23] Kajimoto R, Nakamura M, Inamura Y, Mizuno F, Nakajima K, Ohira-Kawamura S, Yokoo T, Nakatani T, Maruyama R, Soyama K, Shibata K, Suzuya K, Sato S, Aizawa K, Arai M, Wakimoto S, Ishikado M, Shamoto S i, Fujita M, Hiraka H, Ohoyama K, Yamada K and Lee C H 2011 Journal of the Physical Society of Japan 80 SB025 URL https://doi.org/10.1143/JPSJS.80SB.SB025
[24] Ollivier J and Mutka H 2011 Journal of the Physical Society of Japan 80 SB025 ISSN 0031-9015
[25] Ewings R, Buts A, Le M, van Duijn J, Bustinduy I and Perring T 2016 Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 834 132–142 ISSN 0168-9002
[26] Johnstone G E, Perring T G, Sikora O, Prabhakaran D and Boothroyd A T 2012 Phys. Rev. Lett. 109(23) 237202 URL https://link.aps.org/doi/10.1103/PhysRevLett.109.237202
[27] Tokura Y 2006 Reports on Progress in Physics 69 797–851 URL https://doi.org/10.1088%2F0034-4885%2F69%2F3%2Fr06
[28] Goodenough J B 1955 Phys. Rev. 100(2) 564–573 URL https://link.aps.org/doi/10.1103/PhysRev.100.564
[29] Kanamori J 1959 Journal of Physics and Chemistry of Solids 10 87–98 ISSN 0022-3697 URL http://www.sciencedirect.com/science/article/pii/0022369759900617
[30] Daoud-Aladine A, Rodríguez-Carvajal J, Pinsard-Gaudart L, Fernández-Díaz M T and Revcolevschi A 2002 Phys. Rev. Lett. 89(9) 097205 URL https://link.aps.org/doi/10.1103/PhysRevLett.89.097205
[31] Toth S and Lake B 2015 Journal of Physics: Condensed Matter 27 166002
[32] Perring T G 1991 High energy magnetic excitations in hexagonal cobalt Ph.D. thesis University of Cambridge
[33] Tucker G S 2020 https://brille.github.io/stable/index.html
[34] Butler K T and Le M D 2020 https://github.com/keeeto/interpretable-ml-neutron-spectroscopy
[35] LeCun Y, Boser B, Denker J S, Henderson D, Howard R E, Hubbard W and Jackel L D 1989 Neural Computation 1 541–551
[36] Fukushima K and Miyake S 1982 Competition and Cooperation in Neural Nets (Springer) pp 267–285
[37] Krizhevsky A, Sutskever I and Hinton G E 2012 Advances in Neural Information Processing Systems pp 1097–1105
[38] Xie S, Girshick R, Dollár P, Tu Z and He K 2017 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition pp 1492–1500
[39] Deng J, Dong W, Socher R, Li L J, Li K and Fei-Fei L 2009 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE) pp 248–255
[40] van Amersfoort J, Smith L, Teh Y W and Gal Y 2020 International Conference on Machine Learning URL https://arxiv.org/abs/2003.02037
[41] Neal R M 2012 Bayesian learning for neural networks vol 118 (Springer Science & Business Media)
[42] Gal Y and Ghahramani Z 2016 International Conference on Machine Learning pp 1050–1059
[43] Lakshminarayanan B, Pritzel A and Blundell C 2017 Advances in Neural Information Processing Systems pp 6402–6413
[44] Kong L, Sun J and Zhang C 2020 arXiv preprint arXiv:2008.10546
[45] Farquhar S, Osborne M A and Gal Y 2020 stat 1050 7
[46] Zhou B, Khosla A, Lapedriza A, Oliva A and Torralba A 2016 Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) URL https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Zhou_Learning_Deep_Features_CVPR_2016_paper.html
[47] Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D 2017 Proceedings of the IEEE International Conference on Computer Vision pp 618–626
[48] Kingma D P and Welling M 2013 arXiv preprint arXiv:1312.6114 URL https://arxiv.org/abs/1312.6114
[49] Gilpin L H, Bau D, Yuan B Z, Bajwa A, Specter M and Kagal L 2018 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) (IEEE) pp 80–89
[50] Buts A, Battam N, Ewings R A, Fair R L, Jackson A, Le M D, Marooney C, Perring T G, Saunders H and Tucker G S 2020 PACE: Proper Analysis of Coherent Excitations https://github.com/pace-neutrons


Interpretable neural networks for analysis and understanding of neutron spectra

Keith T. Butler
SciML, Scientific Computing Department, Rutherford Appleton Laboratory, Harwell, OX11 0QX

Manh Duc Le and Toby G. Perring
ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell, OX11 0QX

(Dated: November 23, 2020)


MODEL TRAINING


FIG. 1. Upper: the training curve for the classification model trained on the data with pre-computed resolution functions. Lower: the training curve for the discrimination model trained on Monte Carlo calculated resolution functions.
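For completeness, the self-contained sketch below shows how training and validation curves of this kind can be recorded and plotted with Keras; the tiny CNN and random arrays are placeholders rather than the networks or simulated datasets used in this work, and a TensorFlow/Keras installation is assumed.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# Placeholder data: two-class "spectra" on a small (Q, energy) grid
x = np.random.rand(256, 32, 32, 1).astype("float32")
y = np.random.randint(0, 2, size=256)

model = keras.Sequential([
    keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 1)),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x, y, validation_split=0.2, epochs=10,
                    batch_size=32, verbose=0)

# Plot the loss curves recorded during training
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()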