


Nuclear Instruments and Methods in Physics Research A 598 (2009) 450–453


0168-9002/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.nima.2008.09.035

⁎ Corresponding author. Tel.: +86 10 62784 529.
E-mail address: [email protected] (L. Chen).

journal homepage: www.elsevier.com/locate/nima

Nuclide identification algorithm based on K–L transform and neural networks

Liang Chen ⁎, Yi-Xiang Wei

Key Laboratory of Particle & Radiation Imaging (Tsinghua University), Department of Engineering Physics, Tsinghua University, Ministry of Education, China

Article info

Article history:

Received 24 June 2008

Received in revised form 11 September 2008

Accepted 20 September 2008
Available online 14 October 2008

Keywords:

K–L transform

Neural network

Nuclide identification

Linear associative memory

ADALINE


Abstract

Traditional spectrum-analysis algorithms based on peak searching have difficulty dealing with complex overlapping peaks, especially under poor resolution and high background conditions. This paper describes a new nuclide identification method based on the Karhunen–Loeve transform (K–L transform) and artificial neural networks. The nuclide gamma spectrum was compacted by the K–L transform and feature extraction, and the K–L transform coefficients were used as the neural network's input. The linear associative memory and ADALINE networks are discussed. Extensive experiments and tests showed that the method is credible and practical, and especially suitable for fast nuclide identification.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Artificial neural networks are widely used in pattern recognition. Using artificial neural networks in gamma spectrum analysis was proposed in the early 1990s [1]. Unlike the classical methods, using artificial neural networks does not require searching, fitting and dealing with complex overlapped peaks. The spectrum is considered as a whole and the global shape is compared with stored patterns. Expert knowledge is not required, and human participation is not even necessary after the network is trained properly.

Several papers related to this subject have been published in the years since artificial neural networks were introduced. Some of them used a multilayer perceptron network and the back-propagation algorithm [2,3]. Because of its poor fault tolerance, lack of robustness, convergence to local minima and heavy computation, this algorithm is rarely used. Others have used optimal linear associative memory (OLAM) networks [1,2], which are still subject to computation and stability problems because the original data, or part of it, is used as the input. Current gamma spectrum analysis programs [4,5] still use the classical method based on peak searching and matching.

This research focuses on a nuclide identification algorithm used in portable radionuclide identification devices (RIDs) that meet the IAEA's requirements [6]. RIDs require the rapid and accurate identification of 27 kinds of radionuclides. HPGe detectors cannot be used in portable devices, so we chose a NaI detector.


The peaks in a NaI detector spectrum usually overlap because of the poor resolution, and peak searching is difficult under high background and large statistical fluctuation. We used the Karhunen–Loeve transform (K–L transform) to extract features from the gamma spectrum, followed by a neural network. A large amount of training and test data was obtained using a NaI detector (Hamamatsu Photonics K.K. CH201-03) and a multichannel analyzer (Canberra DAS-1000). The method has the advantages of speed, accuracy, wide tolerance and good robustness.

2. Feature extraction

Feature extraction is a key step in pattern recognition, and the classical peak-search algorithm is a specific kind of feature extraction. The features are used as the neural network's input, so the features of different nuclides should be as different as possible, and the number of features should be as small as possible. Here we used the K–L transform to extract the features.

The K–L transform is a special orthogonal transform, mainly used to compact data for 1D and 2D signals. The gamma spectrum can be treated as a wide-sense stationary random vector. The vectors in this study are all column vectors. Their covariance matrix is defined as

$$\mathbf{C}_x = E\{(\mathbf{x}-\boldsymbol{\mu}_x)(\mathbf{x}-\boldsymbol{\mu}_x)^{\mathrm{T}}\} =
\begin{bmatrix}
c_{0,0} & c_{0,1} & \cdots & c_{0,N-1} \\
c_{1,0} & c_{1,1} & \cdots & c_{1,N-1} \\
\vdots & \vdots & \ddots & \vdots \\
c_{N-1,0} & c_{N-1,1} & \cdots & c_{N-1,N-1}
\end{bmatrix} \quad (1)$$

ARTICLE IN PRESS

L. Chen, Y.-X. Wei / Nuclear Instruments and Methods in Physics Research A 598 (2009) 450–453 451

where E{·} is the expectation operator, μx = E{x} is the mean vector of the signal x, and the elements of Cx are given by

$$c_{i,j} = E\{(x(i)-\mu_x(i))(x(j)-\mu_x(j))\} = c_{j,i} \quad (2)$$

The eigenvalues and corresponding eigenvectors of Cx are λ0, λ1, …, λN−1 and A0, A1, …, AN−1. By normalizing the eigenvectors we get an orthonormal matrix A = [A0, A1, …, AN−1]. Finally, the K–L transform of the signal x can be expressed as y = ATx, where y is the K–L transform coefficient vector. If the eigenvalues are sorted in descending order and only the m largest eigenvalues and their corresponding eigenvectors are retained, we obtain y′, the compact version of y. By recovering the original signal from y′, we obtain an approximate version of x, x′ = Ay′. It has been proved that y′ retains the maximum energy of the original signal [7], and that the mean square error between x and x′, e = E{[x − x′]2}, is minimized and equals the sum of the rejected eigenvalues. The K–L transform removes the correlation of the original signal, retains the maximum energy and minimizes the mean square error, so it is also called the optimal transform. In our nuclide identification algorithm, the m-dimensional vector y′ can be seen as the m features of the gamma spectrum and is used as the input of the neural network.
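As a concrete illustration, the feature extraction above can be sketched in Python with NumPy (the function and variable names are our own; `spectra` holds one measured spectrum per row):

```python
import numpy as np

def kl_features(spectra, m):
    """K-L (Karhunen-Loeve) feature extraction.

    spectra: array of shape (samples, N), one spectrum per row.
    Returns the feature matrix A (N x m, one eigenvector per column)
    and the coefficient vectors y' = A^T (x - mu_x), one per row.
    """
    mu = spectra.mean(axis=0)                 # mean vector mu_x, Eq. (1)
    C = np.cov(spectra, rowvar=False)         # covariance matrix C_x
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:m]     # keep the m largest eigenvalues
    A = eigvecs[:, order]                     # feature matrix A
    y = (spectra - mu) @ A                    # compact K-L coefficients y'
    return A, y
```

Because `eigh` returns orthonormal eigenvectors, `A.T @ A` is the identity, and reconstructing with `x' = A y' + mu` minimizes the mean square error for the chosen m.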

3. Neural network model

3.1. Linear associative memory network

A typical neural network is composed of an input layer, an output layer and sometimes one or more hidden layers. Each layer includes neurons that are connected to all the neurons of the successive layer. There is a weight wij for each connection between two neurons i and j, and the weights are established by a training procedure. The output is calculated by summing all the inputs weighted by the corresponding elements of the weight matrix W; the sum is then processed with a transfer function such as a linear or sigmoid function.

Fig. 1 shows the linear associative memory, a typical neural network with a linear transfer function and no hidden layer, which is also the structure used in this paper.

The output of the network is given by

$$a_i = \mathrm{purelin}\left(\sum_{j=1}^{N} w_{ij} p_j\right) = \sum_{j=1}^{N} w_{ij} p_j \quad (3)$$

and the matrix form is given by

$$\mathbf{a} = \mathbf{W}\mathbf{p} \quad (4)$$

Fig. 1. Structure of the linear associative memory network.

3.2. Hebb’s rule and its variations

The weight matrix W can be established by various kinds of training procedures, and the Hebb rule is a basic one for a linear associative memory network. According to the supervised Hebb rule [8], the weight matrix W is given as

$$\mathbf{W} = \mathbf{t}_1\mathbf{p}_1^{\mathrm{T}} + \mathbf{t}_2\mathbf{p}_2^{\mathrm{T}} + \cdots + \mathbf{t}_Q\mathbf{p}_Q^{\mathrm{T}} = \mathbf{T}\mathbf{P}^{\mathrm{T}} \quad (5)$$

where p1, p2, …, pQ are the input vectors used for training and t1, t2, …, tQ are the corresponding target vectors. As shown in Eqs. (4) and (5), when pk is input into the network, the output can be computed as

$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \left(\sum_{q=1}^{Q} \mathbf{t}_q\mathbf{p}_q^{\mathrm{T}}\right)\mathbf{p}_k = \sum_{q=1}^{Q} \mathbf{t}_q(\mathbf{p}_q^{\mathrm{T}}\mathbf{p}_k) \quad (6)$$

If the input vectors are orthogonal and normalized, Eq. (6) can be rewritten as a = Wpk = tk: the output of the network equals the target.
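A quick numerical check of Eqs. (5) and (6): with orthonormal input vectors, the Hebb-rule network reproduces each target exactly. The prototype vectors below are arbitrary illustrations, not data from the paper:

```python
import numpy as np

# Two orthonormal input prototypes p1, p2 (columns of P) and targets t1, t2.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
T = np.array([[1.0, 0.0],
              [0.0, 1.0]])

W = T @ P.T            # supervised Hebb rule, Eq. (5): W = T P^T

a = W @ P[:, 0]        # present p1; since p_q^T p_1 = delta_q1, a = t1
```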

If the input vectors are not orthogonal, the Hebb rule produces some errors, and several procedures can be used to reduce them. The core idea of the variations of the Hebb rule is to minimize the difference between the network output and the target vector. The mean square error can be expressed as

$$F(\mathbf{W}) = \sum_{q=1}^{Q} \|\mathbf{t}_q - \mathbf{W}\mathbf{p}_q\|^2 \quad (7)$$

Usually the number of rows of P (the number of features) is greater than the number of columns (the number of training samples), so the solution of the least-squares problem is

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \mathbf{T}(\mathbf{P}^{\mathrm{T}}\mathbf{P})^{-1}\mathbf{P}^{\mathrm{T}} \quad (8)$$

where P+ is the pseudoinverse of the matrix P; because of this, Eq. (8) is also called the pseudoinverse rule.
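The pseudoinverse rule of Eq. (8) is a one-liner in NumPy; this sketch uses our own function name and assumes P has full column rank, as in the text:

```python
import numpy as np

def train_pseudoinverse(P, T):
    """Pseudoinverse rule, Eq. (8): W = T P+ minimizes sum_q ||t_q - W p_q||^2.

    P: (features x samples) matrix of training inputs, columns p_q.
    T: (outputs x samples) matrix of targets, columns t_q.
    """
    # For a tall, full-column-rank P, pinv(P) equals (P^T P)^{-1} P^T.
    return T @ np.linalg.pinv(P)
```

Because P+P = I when P has full column rank, the trained network satisfies WP = T exactly on the training set.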

Another way to solve the problem of non-orthogonal input vectors is to use the ADALINE network and the Least Mean Square (LMS) algorithm [8]. The ADALINE network is very similar to the linear associative memory, except that it has a bias vector. The output of the ADALINE is given as

$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b}) = \mathbf{W}\mathbf{p} + \mathbf{b} \quad (9)$$

Including the bias vector b as a column of the weight matrix W, and the bias input "1" as a component of the input vector, Eq. (9) can be rewritten as a = xz, where x = [W b] and z = [pT 1]T. The mean square error problem is

$$F(\mathbf{x}) = E[(\mathbf{t}-\mathbf{a})^2] = E[(\mathbf{t}-\mathbf{x}\mathbf{z})^2] = E[\mathbf{t}^2] - 2\mathbf{x}E[\mathbf{t}\mathbf{z}] + \mathbf{x}E[\mathbf{z}\mathbf{z}^{\mathrm{T}}]\mathbf{x}^{\mathrm{T}} = c - 2\mathbf{x}\mathbf{h} + \mathbf{x}\mathbf{R}\mathbf{x}^{\mathrm{T}} \quad (10)$$

The LMS algorithm uses an iterative method to search for the solution x that minimizes F(x). From the steepest descent algorithm with a constant learning rate, the iteration formula is

$$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha\, e(k)\,\mathbf{z}(k)^{\mathrm{T}} \quad (11)$$

where k is the iteration step, α is the learning rate constant and e(k) = t(k) − a(k) is the difference between the desired and the actual output of the network. This procedure is also referred to as the delta rule or the Widrow–Hoff learning algorithm. The maximum learning rate constant is α < 2/λmax, where λmax is the largest eigenvalue of the matrix R in Eq. (10).

If the input vectors are statistically independent, the solution converges to x* = R−1h. If we set the bias vector b and the bias input "1" to zero, the convergence solution x* is equivalent to Eq. (8), so the pseudoinverse rule and the delta rule are essentially the same. The advantage of the delta rule is that it can update the weights after each new input pattern is presented, whereas the pseudoinverse rule computes the weights in one step after all of the input/target pairs are known. Sequential updating allows the ADALINE to adapt to a changing environment.
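A minimal sketch of ADALINE training with the delta rule of Eq. (11), folding the bias into the weights as described above (function name and hyperparameters are illustrative, and convergence assumes α is below the 2/λmax bound):

```python
import numpy as np

def train_lms(P, T, lr=0.01, epochs=200):
    """ADALINE trained with the delta rule (Widrow-Hoff), Eq. (11).

    P: (features x samples) inputs; T: (outputs x samples) targets.
    The bias is absorbed by appending a constant '1' input, so the
    returned matrix is x = [W b].
    """
    Z = np.vstack([P, np.ones((1, P.shape[1]))])   # z = [p; 1]
    x = np.zeros((T.shape[0], Z.shape[0]))         # x = [W b], start at zero
    for _ in range(epochs):
        for k in range(Z.shape[1]):                # one pattern at a time
            e = T[:, k] - x @ Z[:, k]              # e(k) = t(k) - a(k)
            x += 2 * lr * np.outer(e, Z[:, k])     # x <- x + 2 alpha e z^T
    return x
```

Unlike the pseudoinverse rule, each pattern presentation updates the weights immediately, which is what lets the ADALINE track a changing environment.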

4. Nuclide identification

4.1. Train the networks

The selection of the training samples and the number of features has a great influence on the network performance. There were eight radioactive sources in our lab: 241Am, 133Ba, 60Co, 137Cs, 152Eu, 226Ra, 232Th and natural uranium (NU). The activities ranged from 0.3 to 10 μCi. We measured twelve spectra (1024 channels) for each source and divided them into three groups: group 1 included two spectra with a statistical fluctuation error of less than 1%, group 2 included five measured for 1 min and group 3 included five measured for 2 min. For training we chose one spectrum from group 1, two from group 2 and two from group 3. We applied the K–L transform and reserved the 512 largest coefficients. About 90% of the energy was contained in the first 50 coefficients, as shown in Fig. 2.

So we chose the first 64 features as the network input. The practical training steps were as follows:

(a) Choose the sample spectra as mentioned above; subtract the natural background, then smooth and normalize the spectra. A seven-point Gaussian window was used for smoothing.

(b) Calculate the covariance matrix of the sample spectra, and the eigenvalues and eigenvectors of the matrix.

(c) Reserve the 64 largest eigenvalues and their corresponding eigenvectors, and generate the feature matrix A (1024 × 64).

(d) Apply the K–L transform to the sample spectra, i.e. calculate the network training input p (64 × 1) = ATx, where x (1024 × 1) is the training spectrum.

(e) Generate the input/target patterns {pi, ti}. The patterns are defined as 241Am, 133Ba, 60Co, 137Cs, 152Eu, 226Ra, 232Th and NU. For example, if pi is the K–L coefficient vector of a 60Co spectrum, the third element of ti is "1" and the other elements are "0".

(f) Calculate the weight matrix W (8 × 64) and the bias vector b using the supervised pseudoinverse rule or the delta rule.

Fig. 2. Typical K–L transform coefficients (60Co and 226Ra).

4.2. Test the performance of the networks

We used all of the experimental spectra to test the network performance. The test steps were as follows:

(a) Subtract the natural background, then smooth and normalize the test spectra.

(b) Calculate the K–L transform of the test spectra, i.e. the network input p (64 × 1) = ATx, where x (1024 × 1) is the test data.

Table 1
Part of the nuclide identification results using the linear associative memory and pseudoinverse rule

Data | 241Am | 133Ba | 60Co | 137Cs | 152Eu | 226Ra | 232Th | NU
Am241_101 | 1 | 1.16E−09 | −2.97E−11 | 1.46E−11 | −1.30E−10 | 2.12E−10 | 7.55E−11 | −1.88E−10
Am241_203 | 1.0003 | −0.0005 | −0.0001 | 9.13E−05 | 0.0001 | 0.0004 | −5.96E−05 | −5.19E−05
Am241_303 | 0.9996 | 0.0004 | 6.06E−05 | 4.28E−05 | −0.0001 | −0.0002 | 0.0002 | −0.0002
Ba133_203 | 0.0061 | 1.0008 | 0.0004 | 0.0013 | −0.0015 | 0.0069 | −0.0114 | −0.0014
Ba133_303 | −0.0029 | 0.9955 | −0.0003 | 0.0002 | 0.0125 | −0.0044 | −0.0002 | −0.005
Co60_203 | −0.0144 | 0.0394 | 0.9972 | −0.0082 | −0.0157 | −0.0118 | −0.0194 | 0.0523
Co60_303 | 0.0206 | 0.0294 | 1.0055 | 0.0039 | −0.0258 | −0.0119 | −0.0136 | 0.0145
Cs137_203 | −0.016 | 0.0325 | 0.0031 | 1.003 | 0.0128 | −0.0416 | 0.0095 | 0.0057
Cs137_303 | 0.0094 | 0.0215 | −0.0024 | 0.9939 | −0.0192 | −0.0014 | 0.0138 | −0.0026
Eu152_203 | −0.007 | 0.0129 | −0.0016 | 0.0026 | 0.9915 | −0.0249 | 0.02028 | −0.0012
Eu152_303 | −0.0098 | 0.0081 | 0.0025 | 0.0039 | 0.9951 | −0.0112 | 0.0061 | −0.0005
Ra226_203 | 0.0111 | 0.0119 | −0.0042 | −0.027 | −0.0033 | 1.0127 | 0.0121 | −0.0111
Ra226_303 | 0.001 | 0.0105 | 8.97E−05 | −0.001 | −0.0095 | 0.9755 | 0.0298 | −0.0056
Th232_203 | −0.0051 | 0.03101 | 0.0013 | −0.0035 | −0.0151 | 0.0217 | 1.0088 | 0.0226
Th232_303 | 0.0187 | 0.0342 | −0.0016 | 0.013 | −0.0306 | −0.0152 | 0.9956 | 0.0023
NU_203 | 0.0066 | 0.0093 | 0.0006 | −0.0017 | −0.0018 | −0.01 | 0.011 | 0.9921
NU_303 | 0.0004 | 0.0097 | −0.0004 | 0.0002 | −0.004 | −0.0124 | 0.0123 | 0.9967
NU+Cs137 | −0.0043 | −0.0214 | −0.0022 | 0.0962 | −0.0077 | 0.0149 | 0.0234 | 0.9365
Ra226+Th232 | −0.0234 | −0.0123 | −0.0002 | −0.0054 | 0.044 | 0.3744 | 0.7276 | −0.0145


Fig. 3. Mixed nuclide spectra for test: NU+Cs137 (with the 662 keV peak from 137Cs) and Ra226+Th232; counts vs. channel.


(c) Calculate the identification result, namely the output of the network: R (8 × 1) = Wp = WATx, or R (8 × 1) = Wp + b = WATx + b. The elements of the column vector R can be seen as the confidence of the existence of the corresponding nuclide.

Note that WAT and b can be stored after the training process, so the main computation is just the product of the 8 × 1024 matrix WAT and the vector x (1024 × 1); the computation is simple enough for a portable device.
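The training and identification steps above can be sketched end to end as follows. This is a minimal illustration with our own helper names and a caller-supplied smoothing window; the array shapes follow Sections 4.1 and 4.2, and no mean subtraction is applied before projection, matching step (d), p = ATx:

```python
import numpy as np

def preprocess(spectrum, background, window):
    """Background-subtract, smooth with a given window, and normalize."""
    s = spectrum - background
    s = np.convolve(s, window / window.sum(), mode="same")
    return s / np.linalg.norm(s)

def train(spectra, targets, m=64):
    """K-L feature extraction plus pseudoinverse rule.

    spectra: (Q, N) preprocessed training spectra, one per row.
    targets: (nuclides, Q) one-hot target columns.
    Returns the stored matrix M = W A^T (nuclides x N).
    """
    C = np.cov(spectra, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)
    A = eigvecs[:, np.argsort(eigvals)[::-1][:m]]   # N x m feature matrix
    P = A.T @ spectra.T                             # m x Q training inputs
    W = targets @ np.linalg.pinv(P)                 # pseudoinverse rule
    return W @ A.T                                  # precomputed once

def identify(M, spectrum):
    """Per-spectrum cost is one matrix-vector product: R = (W A^T) x."""
    return M @ spectrum                             # confidence vector R
```

Storing `M = W @ A.T` is what keeps the on-device cost to a single matrix–vector product per measured spectrum.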

5. Test results

We used both the linear associative memory and ADALINE networks to test the nuclide identification performance. In addition to single nuclides, we also tested some mixed nuclide spectra. To keep this article short, Table 1 shows only part of the test results.

The first column of Table 1 is the file name of the experimental data. The file name is composed of the nuclide name and a number; the first digit of the number is the group number. The first row of Table 1 is the data of 241Am, which was also used as training data. It shows that the confidence of 241Am is exactly "1", and that of the other nuclides is almost "0". The other outputs of the training data give the same result. The outputs of the non-training data show that, for single nuclide data, the confidences of the matched patterns range from 0.97 to 1.02, and those of the mismatched patterns range from −0.05 to 0.06.

The last two rows are mixed nuclide spectra, and the confidences of the two matched patterns are also much greater than those of the mismatched patterns. The confidence of 137Cs in "NU+Cs137" is smaller because the activity of 137Cs is low compared with the NU activity, so the features of 137Cs are partly covered up. The mixed nuclide spectra are shown in Fig. 3.

6. Conclusion

Using the K–L transform coefficients of the original spectrum to train and test the neural networks, the linear associative memory and ADALINE networks give the same result. There is a clear distinction between the confidences of the matched and mismatched nuclides. A mixed nuclide spectrum is a linear superposition of the component spectra, and both networks have a linear structure, so they can address mixed nuclide problems.

We used different numbers of features and training samples to test the network performance. In an extreme case, we reserved only 32 features and used two spectra (from groups 1 and 2) of each nuclide for training, and there was little difference in the results. Another aspect of the network performance is the output for a never-trained pattern. We removed the 60Co and 152Eu spectra from the training samples and then used them as inputs. The output vectors were very different from those of the trained spectra: the smallest element was smaller than −0.3 and usually smaller than −1.0, while that of the trained spectra was about −0.05. From the test results it is easy to define a confidence threshold for untrained, single nuclide and mixed nuclide spectra. The single nuclide case is the usual condition in our application.

We can see that the neural networks have wide tolerance and good robustness, and this method is more stable than the traditional one. With feature extraction based on the K–L transform, the computation is kept at a very low level, so the method is especially suitable for fast nuclide identification in portable devices.

References

[1] P. Olmos, J.C. Diaz, J.M. Perez, P. Gomez, V. Rodellar, P. Aguayo, A. Bru, G. Garcia-Belmonte, J.L. de Pablos, IEEE Trans. Nucl. Sci. 38 (4) (1991).
[2] P.E. Keller, L.J. Kangas, G.L. Troyer, S. Hashem, R.T. Kouzes, IEEE Trans. Nucl. Sci. 42 (4) (1995).
[3] E. Yoshida, K. Shizuma, S. Endo, T. Oka, Nucl. Instr. and Meth. A 484 (2002) 557.
[4] Genie-2000 Spectroscopy System Customization Tools, pp. 206–336.
[5] SAMPO, Advanced Gamma Spectrum Analysis Software, Version 3.62, User's Guide, Version 1.1, pp. 51–61.
[6] IAEA Nuclear Security Series No. 1, Technical and Functional Specifications for Border Monitoring Equipment, Technical Guidance, ISBN 92-0-100206-8, pp. 39–76.
[7] G.-S. Hu, Digital Signal Processing, second ed., Tsinghua University Publication, 2003, pp. 368–371.
[8] M.T. Hagan, H.B. Demuth, M. Beale, Neural Network Design, PWS Publishing Company, 1996, pp. 7/1–7/14.