
Improved Calibration for Inductively Coupled Plasma-Atomic Emission Spectrometry Using Generalized Regression Neural Networks



MIGUEL CATASUS, WAYNE BRANAGH, and ERIC D. SALIN* Institute of Materials and Reagents for Electronics, University of Havana, Cuba (M.C.); and Department of Chemistry, McGill University, Montreal, Quebec, H3A 2K6, Canada (W.B., E.D.S.)

Artificial neural networks have recently been used in different fields of science in applications ranging from pattern recognition to semi-quantitative analysis. In this work, two types of neural networks were applied to the problems of spectral interferences, matrix effects, and measurement drift in ICP-AES. Their performance was compared to that of the more conventional technique of multiple linear regression (MLR). The two types of neural networks examined were "traditional" multilayer perceptron neural networks and generalized regression neural networks (GRNNs). The GRNN is comparable to, or better than, MLR for modeling spectral interferences and matrix effects covering several orders of magnitude. In the case of an Fe spectral interference on Zn, the GRNN reduced the error from 81% to 24%, while MLR reduced the average error to only 49%. For matrix effects caused by large backgrounds of Mg (0-10,000 ppm) on Zn, average error was reduced to 55% from 67%. In the case of combinations of spectral overlaps and matrix effects, the GRNN reduced average error by approximately 10%. MLR performed poorly on systems involving matrix effects. The GRNN is also a very promising tool for the correction of drift caused by fluctuations in power levels, reducing drift over a two-hour period from 2.3% to 0.6%. GRNNs, both by themselves and in multinetwork combinations, seem to be highly promising for the correction of nonlinear matrix effects and long-term signal drift in ICP-AES.

Index Headings: Calibration; Neural networks; Drift; ICP-AES.

INTRODUCTION

Artificial neural networks (ANNs), simple computer-based models of biological neurons, have recently been used for many purposes in different fields of science and technology.1 Neural networks have found applications in various aspects of chemistry,2 yet very little work has been published dealing with neural networks and analytical methods for atomic spectrometry. Glick and Hieftje3 utilized neural nets to classify 37 Ni-based alloys and 15 Fe-based alloys according to their elemental composition. In this case, a two-layer feed-forward ANN was trained with the standard back-propagation algorithm4 to distinguish between alloys using percent composition for seven main elemental components. Bos and Weber5 applied ANNs trained with back propagation for calibration in X-ray fluorescence spectrometry. Schierle et al.6 studied bi-directional associative memory7 for qualitative and semi-quantitative analysis in inductively coupled plasma-atomic emission spectroscopy (ICP-AES), while Schierle and Otto8 compared two-layer perceptron ANNs to multiple linear regression (MLR) in ICP-AES. Scanned spectra of two interfering lines were used as input data for both the neural network and the MLR. Good agreement between the two approaches was found. With the exception of the approach used in Ref. 3, the ANNs were used to model response functions in which more than one independent variable was present.

Received 25 July 1994; accepted 20 February 1995. * Author to whom correspondence should be sent.

In ICP-AES three effects are always of concern when analytical procedures are being developed: spectral interferences, matrix effects, and the drift of the measurements. Neural nets seem to be a good approach for overcoming these problems since these effects are highly variable and a robust and flexible technique is necessary. In this paper, ANNs were studied for overcoming the following:

1. Spectral interferences due to the overlapping of two spectral lines.

2. Matrix effects due to the presence of an alkaline-earth element (Mg).

3. The combination of the above two effects.

4. Drift in the analytical signal over time.

These problems were studied with the use of a type of neural network called a generalized regression neural network (GRNN). The type of data required by the network to train properly was also examined. The results generated by the network are compared to those generated by MLR and "standard" multilayer perceptron networks.

BACKGROUND

Multilayer Perceptron Neural Networks (MLPs). As stated earlier, ANNs are simple, computer-based models of biological neurons. The "atomic unit" of these networks is the neuron. In the standard ANN, each of these artificial neurons consists of a series of inputs, an internal activation function, and an output (Fig. 1). Along with each of the inputs to the neuron (Ii), there is an associated connection weight (Wi) representing the "strength of connection". Each input is multiplied by this weight on its way into the neuron. All these weighted inputs are summed and passed through the activation function. The activation function can be considered to be the processing that a neuron does, and it determines what the output of a neuron will be for a particular set of inputs.

While a sigmoid-type function (Fig. 2) is the most commonly used activation function, it is not the only one that can be used. Two other common firing functions are the linear and threshold functions (Fig. 2). The linear function can be used as a normalizing stage, and therefore it is often used in the input layer of the network, while the threshold function either "fires" or does not fire (output = 0 or 1), depending on the neuron's input values. For this reason, it is often used in pattern classification schemes in which one is trying to determine which discrete class a particular set of inputs corresponds to. The output of the activation function is the output of the neuron. In the case of multilayer networks, the neuron output can be used as input into other neurons.

FIG. 1. Artificial neuron (inputs I1...In, each multiplied by its connection weight W1...Wn, feeding a single output).

798 Volume 49, Number 6, 1995 0003-7028/95/4906-0798$2.00/0 APPLIED SPECTROSCOPY © 1995 Society for Applied Spectroscopy
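The three firing functions described above can be sketched in a few lines. This is purely illustrative: the paper supplies no code, the function names and parameter defaults are our assumptions, and the ±1 output convention follows Fig. 2 rather than the 0/1 variant mentioned in the text.

```python
import math

# Illustrative sketch only; names and defaults are assumptions.

def linear(x, x1=-1.0, x2=1.0, slope=1.0):
    # Normalizing stage: y = m*x between X1 and X2, clamped outside.
    if x <= x1:
        return -1.0
    if x >= x2:
        return 1.0
    return slope * x

def threshold(x, x1=0.0):
    # Fires (+1) for x > X1, otherwise does not fire (-1).
    return 1.0 if x > x1 else -1.0

def sigmoid(x):
    # y = 1 / (1 + exp(-x)), the most commonly used activation.
    return 1.0 / (1.0 + math.exp(-x))
```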

In part of our work, a network consisting of two layers of neurons is used. Inputs are presented at the input layer and their outputs are fed to "hidden"-layer neurons. These neurons' outputs in turn are passed to the output neurons. This network is shown in Fig. 3. The input layer is not actually a layer of neurons but simply serves to distribute the inputs to each of the hidden layer's nodes.

FIG. 2. Typical threshold functions: linear (y = mx for X1 < x < X2; y = -1 for x <= X1; y = 1 for x >= X2), threshold (y = -1 for x <= X1; y = 1 for x > X1), and sigmoid (y = 1/(1 + exp(-x))).

FIG. 3. Two-layer forward-feed multilayer perceptron (input, hidden, and output layers).

Initially ANNs do not "know" how to perform their task. They must be trained. During training, a set of examples is presented to the network, and the connection weights between neurons are changed progressively until the network is capable of correctly performing the desired task with the training set. At this point, the network is presented with a test set of different examples. If the network has been trained and is performing the desired task correctly, then the network will be able to perform its task on the until-now-unseen test set. The most common forward-feed networks are usually trained with the use of the back-propagation algorithm.4 The supervised training algorithm can be explained as follows: Initially all the connection weights are set to small random values (-0.3 to 0.3). One presents a training example to the network and observes the resulting outputs of the network. If the outputs are what they should be, i.e., the actual outputs of the net match the correct outputs for the training example, then the connection weights are not changed. Usually this is not the case, and the actual outputs differ greatly from what they should be. Then the connection weights are adjusted and "learning" takes place. One starts with the output-layer nodes and adjusts the connection weights between the output nodes and the hidden-layer nodes in order to minimize the difference between the actual and expected outputs.

w_ab(n + 1) = w_ab(n) + η δ_b x_a + α (w_ab(n) - w_ab(n - 1))    (1)

where w_ab(n) is the connection weight between a neuron in layer a and a neuron in layer b at time n; x_a is the output of node a; δ_b is the error associated with neuron b's output; η is the learning rate; and α is the momentum.

These differences are then propagated back through the network, with connection weights being changed layer by layer until the input layer is reached. This whole procedure is repeated again and again with each of the training examples until the desired level of training has been achieved. Computationally, this process is expensive for large networks.
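A single weight adjustment of the kind in Eq. 1 can be written as a one-line sketch. The function name is hypothetical, and the learning-rate and momentum defaults are merely taken from the typical ranges quoted in the text, not from the paper's experiments.

```python
def update_weight(w_now, w_prev, x_a, delta_b, eta=0.25, alpha=0.8):
    """One back-propagation update of a single connection weight:
    w(n+1) = w(n) + eta * delta_b * x_a + alpha * (w(n) - w(n-1)),
    where eta is the learning rate and alpha the momentum.
    Defaults sit inside the typical ranges (0.15-0.35 and 0.70-0.90)."""
    return w_now + eta * delta_b * x_a + alpha * (w_now - w_prev)
```

The momentum term reuses the previous change, w(n) - w(n-1), so a single anomalous example cannot swing the weight as far as a consistent trend can.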

With a forward-feed neural network, there are a number of parameters that can be adjusted in order to optimize the rate at which the network learns. If incorrect values for these parameters are chosen, the network may take a long time to train or may not train at all. The most important parameters include the size of the network and, for back-propagation training, the training momentum and the learning rate.

In the networks utilized here, the neurons were fully connected, i.e., the outputs of one layer were connected to each of the neurons in the layer in front of it. If there exist many more neurons than are needed, the network may never train properly.9 Similar problems exist for the momentum and the learning rate parameters. The learning rate is a measure of the degree to which each training example will affect the connection weights during each adjustment. The larger the learning rate, the more influence each example will have on the adjustment of the connection weights and the faster the network will learn. Typical values range from 0.15 to 0.35. If the learning rate is too large, the network will "bounce" around from example to example instead of training on the trends present in the data. The momentum term, as its name suggests, gives the network a bit of momentum as it learns. It is entirely possible that an erroneous value may be present in the training set, and anomalous examples should not have as great an influence on the network as valid examples. If the weights are being adjusted in a particular manner, then it is likely that in the near future they will be adjusted in a similar way. A momentum term helps to ensure this pattern and also helps to minimize some of the "bounce" of the network. Typical values for the momentum term, if it is included, are from 0.70 to 0.90.

FIG. 4. Generalized regression neural network (input, pattern, summation, and output layers).

Generalized Regression Neural Networks. While considerable work has been done with MLPs trained with the use of the back-propagation algorithm, their long training times can be a serious disadvantage. The generalized regression neural network10,11 is a novel type of neural network. First, a GRNN is an approximation network that performs a regression on the input data and will try to converge to the underlying regression surface.

In standard regression analysis, one must assume a form for the model used, for example, whether the data are linear or not. If the relationship in the data is not of the form assumed by the model, poor agreement between the actual values and the predicted values will occur. A GRNN makes no assumption about a model's form since it determines the appropriate form from the observed data.10 The operation of the network is similar to the ANN described above, with a training set, a validation set, and highly interconnected neurons each with inputs, outputs, and internal processing. The differences between GRNNs and MLPs lie in the training process, the interconnection weights, and the activation functions used.

The basic layout for a GRNN is shown in Fig. 4. There are four layers in each GRNN: the input layer, pattern layer, summation layer, and output layer. The input and pattern layers are fully connected, with one pattern neuron present for each example in the training set. The connections between each input node and each pattern-layer node are made as in most neural networks, to distribute the input data to each second-layer neuron. In an MLP, weights are changed through the training process. In a GRNN, the weights are set only once with the examples in the training set. The incoming connection weights for the pattern layer correspond to examples present in the training set: each pattern neuron's connection weights correspond to an example present in the training set.

TABLE I. Selected lines for interference studies.

Line     Wavelength (nm)   Interference line (nm)   Concentration
Zn(I)    213.856a          Fe(II) 213.859           0.05-5 ppm
Fe(I)    259.940           --                       10-1000 ppm
Mg(I)    279.553           --                       0-10 mg/mL

a The detection limit for Zn at this line, on the spectrometer used, is ~5 ppb.

TABLE II. Selected lines for drift correction study.

Line     Wavelength (nm)   Excitation potential (eV)   Ionization potential (eV)
Ni(II)   231.60            6.39                        7.63
Pb(II)   220.35            7.37                        7.41

The processing of each pattern neuron is also different from that of a traditional artificial neuron. The incoming "signal" or value is subtracted from the corresponding connection weight, and the squares of these differences are taken over all the connections:

D_j = Σ_i (W_ij - I_i)²    (2)

where W_ij is the connection weight between input neuron i and pattern neuron j, and I_i is the incoming signal to pattern neuron j from input neuron i. Then D_j is fed into the nonlinear activation function:

J(D_j) = exp(-D_j / (2σ_c²))    (3)

where σ_c is a smoothing constant. Finally, J(D_j) is output to the summation layer. The larger the value of σ_c, the smoother the resulting regression curve will be and the greater the generalization between the known points. Conversely, the smaller σ_c is, the more detailed the final curve will be and the more closely the net will match the training data. Here some compromise must occur. One must balance accuracy against the number of samples used.

As illustrated in Fig. 4, the summation and pattern layers are fully connected. The summation layer consists of two types of neurons, "A" and "B", with one "A" neuron present for each training example. In the work discussed here, all the "B" weights were set to 1.0, rendering them effectively unimportant for this exercise. The "B" weights can be used to cluster examples if there are too many examples in the training set to include a node for each one. In this case, each "A" neuron would represent a cluster, and the "B" neuron's weight is a measure of the number of examples that belong to the cluster.10 The weights between the summation-layer "A" units and the pattern layer are the desired output values for each sample in the training set. For example, the weight from the first pattern-layer neuron to the first summation-layer "A" neuron is the desired output associated with the first training example. The second weight on the "A" neuron is the desired output associated with the second training example, and so on.

Processing for summation-layer neurons is very simple. The dot product is performed between the weights and the output signals from the pattern-layer units and is passed on to the output layer, the final layer in the GRNN. Output neurons divide the dot product from an "A" neuron by the output of the "B" neuron. There is obviously one output neuron for each output desired. For a step-by-step implementation of the GRNN and assignment of the weights, refer to Caudill's article.11
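The complete forward pass just described (Eqs. 2 and 3 plus the summation and output layers) is compact enough to sketch. This is an illustrative implementation under the assumptions stated in the text (all "B" weights 1.0, one pattern node per training example), not the authors' code; the class and attribute names are ours.

```python
import math

class GRNN:
    """Minimal GRNN sketch: pattern-layer weights are the training
    inputs (Eq. 2), the activation is exp(-Dj / (2*sigma_c**2))
    (Eq. 3), and the output layer divides the "A" sum by the "B"
    sum, with all "B" weights set to 1.0 as in the text."""

    def __init__(self, train_inputs, train_outputs, sigma_c=0.5):
        self.w = [list(x) for x in train_inputs]  # one pattern node per example
        self.a = list(train_outputs)              # "A" weights: desired outputs
        self.sigma_c = sigma_c                    # smoothing constant

    def predict(self, x):
        num = 0.0  # summation-layer "A" neuron
        den = 0.0  # summation-layer "B" neuron (weights all 1.0)
        for wj, aj in zip(self.w, self.a):
            dj = sum((wi - xi) ** 2 for wi, xi in zip(wj, x))  # Eq. 2
            h = math.exp(-dj / (2.0 * self.sigma_c ** 2))      # Eq. 3
            num += aj * h
            den += h
        return num / den
```

With training points (0, 0) and (1, 1), the net interpolates smoothly between the two desired outputs; the smoothing constant controls how quickly the estimate moves from one observed value to the other.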

GRNNs have several attractive features. As stated above, in standard regression analysis, one must assume the form of the model. No such assumption needs to be made when this network is used. It is also simple to implement in software or hardware.10 They are also attractive when compared to other types of neural networks. Training is accomplished for the GRNN with a single pass through the training set, while MLPs may require many thousands of exposures to the training set in order to converge to a solution. In addition, for MLPs, many hundreds of training samples covering the response space are required for the net to "learn". GRNNs, on the other hand, will provide smooth transitions from one observed value to another, even with only sparse training data being supplied for a multidimensional space. Still, there is an advantage in including as many points as possible since, as the number of training examples increases, the optimal regression surface is obtained.10 MLPs may also converge to a local minimum in the process of their training, leading to a poor solution. GRNNs do not suffer from this problem.

TABLE III. Sample set: Zn/Fe system.

Standard   Zn conc.   Fe conc.
number     (ppm)      (ppm)      Zn signal   Fe signal
1          0.00       10         0.007       2.427
2          0.00       50         0.018       9.570
3          0.00       100        0.033       17.542
4          0.00       500        0.158       81.070
5          0.05       0          0.050       0.018
6          0.20       0          0.202       0.019
7          0.50       0          0.888       0.150
8          2.00       0          2.407       0.015
9          5.00       0          5.665       0.018
10         0.05       10         0.056       2.429
11         0.50       100        0.952       17.627
12         5.00       500        5.675       81.210
13         0.30       300        0.404       50.150
14         3.00       30         4.054       6.690
15         0.06       1000       0.379       148.000
16         0.10       60         0.128       10.980
17         0.10       80         0.142       13.790
18         0.60       400        0.777       64.919
19         1.00       30         1.464       6.530
20         4.00       500        5.003       79.680

EXPERIMENTAL

A Thermo Jarrell Ash ICAP-61 direct-reading spectrometer was used for all of the experimental measurements. Solutions were introduced by a peristaltic pump connected to a Légère V-groove nebulizer and spray chamber system.12 In the first two studies, the spectral interference effect of Fe on Zn and the matrix effect of Mg on Zn were examined. In the third study, a combination of the effects from the first two studies was investigated. The selected lines and concentration ranges are presented in Table I. Note that Mg is present here in very high concentrations. In order to measure the Mg concentration, the sensitivity of the Mg channel in the spectrometer was reduced by increasing the size of the integration capacitor to 0.1 µF.

Standard solutions for Zn and Mg were prepared with Spex Industries ICP Standards. Magnesium nitrate and ferric chloride (Analar grade) from BDH were used for Mg and Fe. All standards were prepared with 2% nitric acid. All the measurements were blank and background corrected (two-sided background correction) unless otherwise indicated. Integration times were 5 s with flush times of 30 s.

In the final study of ANNs and calibration, application of both GRNNs and MLPs to the correction of signal drift was attempted, and the results were compared to those of a more conventional method, the parameter related internal standard method (PRISM). The same spectrometer and nebulizer were used. The selected lines are presented in Table II. Integration and flush times were 5 and 30 s, respectively. Twenty-one measurements of the blank and standard solutions, separated by 4 min, were carried out to study the signal drift. The total duration of the experiment was 2 h after an initial full warm-up of 1 h. Standard solutions were prepared with ICP standards from Spex Industries. All measurements were background corrected.

All neural network software was written in our lab with the use of Borland International's C++ 3.1 for DOS and run on a 33-MHz 486 computer with 8 megabytes of memory.

RESULTS AND DISCUSSION

Training-Validation Strategy. In order to have a neural network generate a good solution to a given problem, training is an essential part of the process. Unfortunately, deciding on the size of the training set and which examples are required is still an empirical process. The size of the training set can vary dramatically, from 32,000 for the classification of mass spectra to only a few for simpler systems.2 Two considerations are of interest in this respect. First, if neural networks are to be of practical application in time-consuming experiments, a vast number of sample runs cannot be required to generate a training set, yet the network must still do its job. Second, it is also interesting to consider how the neural network learning process takes place so that the proper "type" of training examples can be chosen. In an attempt to approach both problems, two sets of standards were prepared. These are shown in Tables III and IV. The characteristics of the sets presented in Table III (Zn/Fe system) and Table IV (Zn/Mg system) are the same. In each of these data sets there are three "types" of standards present:

1. Type 1. Standards that contain only the analyte (Zn) or the interfering element (Fe or Mg): standards 1 to 9 for the Zn/Fe system (Table III) and standards 1 to 10 for the Zn/Mg system (Table IV).

2. Type 2. Standards that contain both analyte and interferent in random combinations but at the concentrations of Type 1 standards: standards 10 to 12 for the Zn/Fe system (Table III) and standards 11 to 16 for the Zn/Mg system (Table IV).

3. Type 3. Standards that contain both analyte and interferent in random combinations different from those of the Type 1 standards: standards 13 to 20 for the Zn/Fe system (Table III) and standards 17 to 24 for the Zn/Mg system (Table IV).

For the Zn/Fe/Mg system, because of its increased complexity, additional standards were added, consisting of random combinations of Zn/Fe and Zn/Mg in the same concentrations as the Type 1 standards. From these sets of standards (Table V), the training and validation sets were selected.

TABLE IV. Sample set: Zn/Mg system.

Standard   Zn conc.   Mg conc.
number     (ppm)      (mg/mL)    Zn signal   Mg signal
1          0.00       1.0        0.066       122.9
2          0.00       2.0        0.086       148.6
3          0.00       4.0        0.113       170.2
4          0.00       6.0        0.132       180.4
5          0.00       8.0        0.154       186.4
6          0.00       10.0       0.165       189.7
7          0.05       0.0        0.052       0.041
8          0.50       0.0        0.520       0.028
9          2.00       0.0        2.430       0.032
10         5.00       0.0        6.321       0.026
11         0.05       6.0        0.173       179.9
12         0.05       2.0        0.143       148.2
13         0.20       4.0        0.283       169.7
14         2.00       6.0        1.994       179.8
15         5.00       10.0       0.157       124.8
16         5.00       8.0        0.207       176.6
17         0.30       3.0        0.369       162.6
18         0.40       2.0        0.466       149.8
19         0.60       9.0        0.616       187.7
20         1.00       7.0        0.924       183.4
21         3.00       4.0        3.239       171.0
22         4.00       0.5        4.868       95.4
23         0.08       4.0        4.594       189.7
24         0.10       5.0        4.761       185.2

Spectral Interferences: Zn/Fe System. Description of the Effect. As a consequence of the close proximity in wavelength of the Zn 213.856-nm and Fe 213.859-nm lines, the presence of Fe in a sample produces an increase in the Zn line intensity that depends on the Fe concentration. In Fig. 5, the effect of the added Fe on the Zn signal is presented along with a pure Zn calibration graph.

Measurements and Results. The ability of the two types of neural networks to correct for this effect was investigated and compared to a more traditional multiple linear regression approach. For the GRNNs, the two inputs consisted of the Zn and the Fe signals and the output consisted of the corrected Zn concentration. For the MLP network, inputs consisted of normalized Zn and normalized Fe signals (i.e., the signals were divided by a constant so they would have a minimum value of 1.0).

FIG. 5. Effect of Fe on Zn signal for 1000 ppm Fe.

TABLE V. Sample set: Zn/Fe/Mg system.

Standard   Zn conc.   Fe conc.   Mg conc.     Zn        Fe        Mg
number     (ppm)      (ppm)      (mg/mL)      signal    signal    signal
 1          0.000        0         10          0.004      0.001   190.700
 2          0.000        0          8          0.006      0.001   187.900
 3          0.000        0          6          0.004      0.005   183.800
 4          0.000        0          4          0.003      0.001   175.500
 5          0.000        0          2          0.004      0.001   155.200
 6          0.000        0          1          0.003      0.001   127.600
 7          0.000     1000          0          0.385    175.000     0.005
 8          0.000      500          0          0.212     98.600     0.005
 9          0.000       10          0          0.009      2.040     0.004
10          0.000     1000         10          0.250    113.700   190.700
11          0.000      500          8          0.142     66.000   187.800
12          0.000      250          6          0.152     70.900   184.200
13          0.000      150          4          0.044     21.000   175.500
14          0.000       50          2          0.020      8.800   155.100
15          0.000       40          6          0.017      7.460   184.200
16          0.050        0          0          0.065      0.004     0.007
17          0.200        0          0          0.251      0.003     0.008
18          0.500        0          0          0.627      0.002     0.006
19          2.000        0          0          2.889      0.001     0.002
20          5.000        0          0          7.560      0.001     0.003
21          0.050        0          2          0.061      0.004   155.200
22          0.200        0          4          0.501      0.003   175.500
23          0.500        0          5          0.564      0.002   184.200
24          0.500        0          2          2.166      0.004   155.200
25          2.000        0          8          0.093      0.001   187.800
26          5.000        0         10          0.072      0.001   190.900
27          0.050      100          6          0.093     19.680   184.200
28          0.050       50          2          0.072      8.600   156.100
29          0.200       10          4          0.212      1.557   175.400
30          0.500      100          6          0.521     19.580   184.800
31          0.500      500          8          0.701     84.600   156.600
32          2.000      500          8          2.307     69.800   188.100
33          5.000      100          8          5.500     18.560   187.800
34          0.060     1000          7          0.376    130.000   187.200
35          0.100       80          5          0.127     12.060   182.600
36          0.100       60          1          0.137     10.940   131.500
37          0.900      800          3          1.168    116.400   170.600
38          1.000      300          3          1.518     58.300   172.100
39          3.000      900         10          3.560    110.600   191.300
40          3.000       30          5          3.660      4.600   182.700
41          4.000       40          9          4.290      5.370   190.400
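As a reference point, the GRNN described by Specht10 reduces to a Gaussian-kernel-weighted average of the training outputs. The following minimal Python sketch (class and parameter names are ours, not taken from the original work) reproduces that behavior for an arbitrary number of input signals:

```python
import numpy as np

class GRNN:
    """Generalized regression neural network (after Specht): the output for
    an input x is a Gaussian-kernel-weighted average of the training outputs."""

    def __init__(self, sigma=0.25):
        self.sigma = sigma  # smoothing constant (sigma_c in the text)

    def fit(self, X, y):
        # Store the training examples; a GRNN has no iterative training phase.
        self.X = np.atleast_2d(np.asarray(X, dtype=float))
        self.y = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        preds = []
        for x in X:
            d2 = np.sum((self.X - x) ** 2, axis=1)       # squared distances
            w = np.exp(-d2 / (2.0 * self.sigma ** 2))    # kernel weights
            preds.append(np.dot(w, self.y) / np.sum(w))  # weighted average
        return np.array(preds)
```

With the (Zn, Fe) signals as inputs and the known Zn concentration as the output, a small smoothing constant makes the network reproduce the training points almost exactly, while a larger one trades that fidelity for interpolation, matching the behavior reported below.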

In keeping with the aims of this work, the minimum number of training examples was used in the training set. Starting with one randomly chosen example, a training set was generated and used to train each network. If the network could not satisfactorily classify the test set (i.e., the rest of the examples) after training, then another random example was added to the training set. This process continued until the network could generate reasonable output values for the test set. Eleven examples eventually made up the training set. The test set is listed in Table VI. In terms of mathematical operations, the ANN must calculate the net intensity of the Zn spectral line and transform it to the correct concentration.
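The incremental procedure described above can be sketched generically; `fit` and `test_error` are hypothetical placeholders for whichever model and error measure are in use:

```python
import random

def grow_training_set(standards, fit, test_error, tolerance, seed=0):
    """Add randomly chosen standards to the training set until the fitted
    model handles the remaining (test) standards acceptably."""
    pool = list(standards)
    random.Random(seed).shuffle(pool)
    train = [pool.pop()]                      # start with one random example
    while pool and test_error(fit(train), pool) > tolerance:
        train.append(pool.pop())              # add another random example
    return train, pool                        # final training and test sets
```

The returned `pool` is the test set: everything that never had to be moved into the training set.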

802 Volume 49, Number 6, 1995

The uncorrected concentration values were based on a calibration graph generated by performing a simple linear regression on Zn standards with no Fe interference (standards 5 to 9), and directly calculating the apparent Zn concentration based on the Zn signal. The training set was also used to perform a multiple linear regression and apply it to the test set to obtain the MLR corrected concentration values for the test set. All data in the MLR were equally weighted. Attempts to find the optimal training set for the MLR were not made.
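The MLR baseline amounts to an ordinary least-squares fit of concentration against the raw signals plus an intercept. A minimal sketch (function names are ours):

```python
import numpy as np

def mlr_fit(signals, conc):
    """Least-squares coefficients for conc ~ signals + intercept."""
    A = np.column_stack([np.asarray(signals, dtype=float),
                         np.ones(len(signals))])   # append intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(conc, dtype=float), rcond=None)
    return coef

def mlr_predict(coef, signals):
    A = np.column_stack([np.asarray(signals, dtype=float),
                         np.ones(len(signals))])
    return A @ coef
```

Fitted on the training standards, the resulting plane is then applied to the test signals; unlike the GRNN, it cannot bend to follow a nonlinear interference, which is the behavior contrasted in the tables below.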

The correction ability of an MLP network with three hidden nodes is shown in Table VI. The MLP performed relatively poorly and overall did not interpolate all the values in the test set as accurately as the GRNN or the MLR, even with training sets containing almost all of the examples and long training times of 10,000 cycles through the training set. The MLP consisted of a single hidden layer with three to seven neurons, a momentum value of 0.8, and a learning rate of 0.3. A GRNN performed better and, overall, generated results comparable to or better than those generated by the MLR. A smoothing constant of 0.25 was used for the GRNN. If the smoothing constant was increased or decreased, the outputs generated were not as accurate. As with all neural networks, the more examples that are available for training, the better the network should perform. If a small smoothing constant is used (for example, σc = 0.05) and all the standards are used except for four (13, 16, 19, and 20), then perfectly corrected results can be obtained for these remaining four standards. However, with such a small value used for the smoothing constant, the amount of interpolation the network performs is negligible and would probably not model the effect accurately.

Investigations were carried out beyond the point of simply correcting the spectral overlap. We were also interested in how the GRNN trained and what sort of sample information was needed to train the network. The GRNN performed poorly when presented with only Type 1 examples. Similarly, if only the Type 2 or "mixed" samples consisting of various concentrations of Zn and Fe together were used as the training set, the network did not correctly identify the remaining standards. Satisfactory results were obtained only when the network was presented with a combination of these two types of examples. Furthermore, it seems that the network also required that the training set contain mixed standards with examples that contain concentrations of components different from the concentrations used in Types 1 and 2 (Type 3 standard). As well, if standards in the training set did not cover all the concentration ranges, then the GRNN localized on a particular concentration range, returning well-corrected values for test cases in that range and poorly corrected values for cases outside the range.

Adding training examples to the training set often had unusual effects on the network's output. The effect on the outputs of the network was not as smooth as would be expected. One would normally expect that, as standards were added to the training set, the corrected concentrations would more closely match the true concentrations and that this behavior would continue smoothly as the number of samples added to the training set increased. This was not the case. In some instances, adding a standard caused the output of the net to swing drastically, so that validation standards that had previously been interpolated correctly were now determined completely incorrectly.

It was also observed that some standards could not be classified by interpolation. Omitting them from the training set meant that they would not be classified correctly. They had to be in the training set in order to be classified

TABLE VI. Zn/Fe system: corrected results using the GRNN, MLP, and MLR.a

Standard   True conc.   Uncorrected   GRNN corrected       MLP corrected   MLR corrected
number     (ppm)        conc.         conc. (σc = 0.25)    conc.           conc.
 2         0.00         0.006         0.067                0.080           0.006
 6         0.20         0.176         0.093                0.071           0.179
 7         0.50         0.774         0.369                0.204           0.785
10         0.05         0.049         0.064                0.058           0.047
11         0.50         0.829         0.360                0.260           0.824
15         0.06         0.330         0.064                0.295           0.188
16         0.10         0.112         0.082                0.083           0.102
17         0.10         0.124         0.089                0.069           0.112
19         1.00         1.275         0.957                0.546           1.288

Average error --        81.0%         24.5%                84.1%           49.3%

a For examples where the true concentration for Zn was not 0.0, the error was calculated as follows:

% Error = |1.0 - (Corrected Zn Conc. / True Zn Conc.)| x 100.

The average error reported is simply the average error of each of the examples where the true Zn concentration is nonzero.
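The footnote's formula can be checked directly against the table. Applying it to the rows with nonzero true concentration (values transcribed from Table VI) reproduces the reported uncorrected and MLR averages to one decimal place:

```python
def avg_error(corrected, true):
    """Average of % Error = |1 - corrected/true| * 100 over all examples."""
    errs = [abs(1.0 - c / t) * 100.0 for c, t in zip(corrected, true)]
    return sum(errs) / len(errs)

# Nonzero-true rows of Table VI (standards 6, 7, 10, 11, 15, 16, 17, 19).
true_conc   = [0.20,  0.50,  0.05,  0.50,  0.06,  0.10,  0.10,  1.00]
uncorrected = [0.176, 0.774, 0.049, 0.829, 0.330, 0.112, 0.124, 1.275]
mlr_conc    = [0.179, 0.785, 0.047, 0.824, 0.188, 0.102, 0.112, 1.288]
```

Here `avg_error(uncorrected, true_conc)` gives 81.0% and `avg_error(mlr_conc, true_conc)` gives 49.3%, matching the table.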

correctly. Standards that were at the upper and lower extremes of the concentration ranges fell into this group, as did an occasional value in the middle of a range. Since the net interpolates between points, if sufficient information about the end points is not present, interpolating near these end points will not work.

On the basis of the hundreds of training sets examined, several comments can be made about the selection of training examples for GRNNs:

1. It is necessary to include all three types of standards (Types 1, 2, and 3) in the training set, over all of the working concentration ranges.

2. Some training values exist that change the average output o f the network.

3. Some standards cannot be interpolated with the train- ing set used. They are not correctly identified unless they are included in the training set, even though other standards are present which have similar concentra- tions.

The first two comments are straightforward. For the training set size used, there were some "important" examples that affected the entire network. Using this training set size makes determining a priori the standards that need to be included in the training set difficult. These cases will likely be different for different systems and will need to be manually determined and included for each system investigated. Having many examples covering the range of interest is a sensible alternative.

Applying the GRNN lowered the average error of analysis for the test set from 81.0% to 24.5%. This result was considerably better than an average error of 84.1% for correction with an MLP or 49.3% when MLR was used, indicating the potential of the GRNN.

Matrix Effect: Zn/Mg System. Description of the Effect. In Fig. 6, the variation of the Zn line intensity due to the Mg signal is illustrated.13 In this study, samples with up to 10 mg/mL of Mg were used. Even very low levels of impurities will have quite a considerable effect at this level. This system is also more complex than the Zn/Fe

APPLIED SPECTROSCOPY 803


FIG. 6. Effect of Mg on Zn signal for 10% Mg.

system because two different effects are present simultaneously when signals are used that are not background corrected. The first effect is an apparent increase of the Zn line intensity due to stray light. The second effect is a decrease of the line intensity due to cooling of the plasma by the Mg. The superposition of these two factors determines the observed Zn line intensity.

Measurements and Results. Again the performances of an MLP and the GRNN were examined for their ability to correct the above effects. For the GRNNs, the two inputs consisted of the Zn and the Mg signals and the output consisted of the corrected Zn concentration. For the MLPs, inputs consisted of normalized Zn and normalized Mg signals and the output was a normalized Zn concentration.

In order to test the ability of the G R N N in this situ- ation, the minimum number of examples was again in- cluded in the training set to enable the network to return a reasonable value for the Zn concentration. The MLP network was the same as that used in the Zn/Fe study with the same number of input and output nodes. The background-uncorrected concentrations were obtained as described previously. The corrected concentrations for the MLR, the MLP, and the G R N N are tabulated in Table

TABLE VII. Zn/Mg system: corrected results using GRNN, MLP, and MLR.

Standard   True conc.   Uncorrected         GRNN corrected      MLP corrected   MLR corrected
number     (ppm)        experimental conc.  conc. (σc = 0.1)    conc.           conc.
 2         0.00         0.068               0.127               0.195           0.216
 4         0.00         0.105               0.014               0.218           0.286
 5         0.00         0.123               0.012               0.229           0.311
11         0.05         0.138               0.030               0.236           0.321
12         0.05         0.114               0.112               0.218           0.265
13         0.20         0.225               0.123               0.286           0.405
23         0.08         0.125               0.010               0.220           0.254
24         0.10         0.165               0.028               0.251           0.347
18         0.40         0.371               0.295               0.384           0.544
16         5.00         3.792               4.999               4.450           4.266

Average error --        67.0%               55.5%               156.0%          227.1%

FIG. 7. Combined effect of Mg and Fe on Zn signal (10% Mg and 1000 ppm Fe).

VII. Again, the GRNN performed better than the MLR in correcting for the matrix effects. Given the wide concentration ranges that are being used here, the agreement is quite good. The example that contains a higher concentration of Zn, number 16, agrees extremely well with the true concentration. The MLP network performed worse than the GRNN in all but one case, but always better than the MLR.

Applying the GRNN lowered the average error of analysis for the test set from 67.0% to 55.5%. This result was better than an average error of 156.0% for correction with an MLP or 227.1% when MLR was used.

Combined Spectral Interference and Matrix Effect-- Zn/Fe/Mg System. Description of the Effect. Three factors affect the Zn line intensity in these experiments: Mg suppression and Fe and Mg spectral overlaps. There is a further complication because the Fe line intensity is also affected by the Mg matrix. The net effect on the Zn signal for the maximum concentration of Mg alone with a pure Zn signal is presented in Fig. 7. It is important to realize that the data presented in this figure include only those values acquired with a 10% Mg effect. The full data set is much larger (Table V).

FIG. 8. Dual neural net for correction of drift.



Measurements and Results. Inputs to the GRNNs consisted of Zn, Fe, and Mg signals. The estimated Zn concentration was generated at the output node. Inputs to the MLPs consisted of normalized values for each of the three signals, and a normalized Zn concentration was generated at the output.

With a relatively small training set of 18 examples, the GRNN and MLP networks outperformed MLR in the majority of test cases (19 out of 23) (Table VIII), and in those cases where MLR interpolated better than the neural networks (standards 17, 30-32), it did not interpolate much better. Applying the GRNN lowered the average error of analysis for the test set from 58.2% to 48.6%. This result was better than an average error of 91.8% for correction with an MLP or 201.6% when MLR was used. As with the previous studies, a combination of the three types of examples was needed to get reasonable interpolations. Agreement between the GRNN corrected concentrations and the true concentrations was again best at the upper part of the Zn concentration range. Because of the stronger influence of the effects at lower analyte concentrations, there is a high degree of complexity in the measured response surface, and more points are required to model the response surface correctly.

Correction of Drift. Description of the Effect. For ICP-AES, instrumental drift can cause serious degradation in long-term precision. Attempts to correct this limitation have been based on the concept of internal standardization. One such technique is the parameter-related internal standardization method (PRISM).14 The PRISM is based on the observation that the majority of the change in the instrumental signal for a given element over time can be traced to two parameters: the forward power and the sample introduction efficiency. Since each of these parameters will affect ionic and atomic lines differently,14 using one internal standard for each of these parameters and measuring the effect that different operating conditions have on these standards, one can obtain the functions that correlate the changes in the analytes with changes in these standards:

KIA = (rI/r'I - 1)/(rA/r'A - 1)    (4)

where r'A is the net response of internal standard A at the standard operating conditions; rA is the net response of the internal standard A at any moment during the analysis; r'I is the net response of the analyte I at the standard operating conditions; and rI is the net response of the analyte I at any moment during the process of measurements.

KIA is a measure of the change the analyte, I, and internal standard, A, are undergoing at some point in time with respect to standard conditions. If internal standard A follows the variations of the forward power and internal standard B follows the variations of the sample uptake rate, the drift-corrected net response (RIC) is given by:

RIC = RI/((1 + KIA(RA/R°A - 1))(1 + KIB(RB/R°B - 1)))    (5)

where RIC is the net corrected analyte response; RI is the potentially drifting analyte signal at time t; R°A is the response of the internal standard A at standard operating conditions; RA is the response of the internal standard A at time t; R°B is the response of the internal standard B

TABLE VIII. Zn/Mg/Fe system: corrected results using GRNN, MLP, and MLR.

Standard   True Zn conc.   Uncorrected Zn   GRNN corrected     MLP corrected   MLR corrected
number     (ppm)           conc. (ppm)      conc. (σc = 0.1)   conc.           conc.
 2         0.00            0.004            0.056              0.241           0.315
 4         0.00            0.002            0.072              0.224           0.292
 5         0.00            0.003            0.157              0.203           0.259
10         0.00            0.167            0.095              0.002           0.337
11         0.00            0.095            0.083              0.013           0.321
12         0.00            0.101            0.077              0.010           0.315
13         0.00            0.029            0.066              0.090           0.292
15         0.00            0.011            0.052              0.171           0.306
17         0.20            0.418            0.182              0.122           0.190
21         0.05            0.143            0.123              0.215           0.302
22         0.20            0.334            0.074              0.280           0.452
27         0.05            0.062            0.065              0.107           0.345
28         0.05            0.048            0.094              0.149           0.299
29         0.20            0.141            0.125              0.262           0.447
30         0.50            0.347            0.213              0.171           0.67
31         0.50            0.467            0.271              0.009           0.665
32         2.00            1.538            2.095              0.144           1.953
34         0.06            0.251            0.156              0.001           0.403
35         0.10            0.085            0.072              0.155           0.38
36         0.10            0.091            0.053              0.126           0.305
38         1.00            1.012            1.049              0.086           1.347
39         3.00            2.373            3.016              0.129           2.846
41         4.00            2.860            3.722              4.069           3.55

Average error --           58.2%            48.6%              91.8%           201.6%

at standard operating conditions; RB is the response of the internal standard B at time t; KIA is the factor determined by Eq. 4 for internal standard A; and KIB is the factor determined by Eq. 4 for internal standard B.
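Equations 4 and 5 translate directly into code; a sketch using the symbols defined above (function names are ours):

```python
def k_factor(r_i, r_i0, r_a, r_a0):
    """Eq. 4: KIA = (rI/r'I - 1) / (rA/r'A - 1)."""
    return (r_i / r_i0 - 1.0) / (r_a / r_a0 - 1.0)

def drift_corrected(r_i, k_ia, r_a, r_a0, k_ib, r_b, r_b0):
    """Eq. 5: RIC = RI / ((1 + KIA(RA/R°A - 1)) (1 + KIB(RB/R°B - 1)))."""
    return r_i / ((1.0 + k_ia * (r_a / r_a0 - 1.0)) *
                  (1.0 + k_ib * (r_b / r_b0 - 1.0)))
```

If the analyte drifts in lockstep with internal standard A (so KIA = 1) and standard B is stable, the correction simply divides out standard A's relative change, recovering the response at standard conditions.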

Measurements and Results. For the drift study, Ni was selected as the analyte while Pb was selected as an internal standard. The Pb and Ni lines selected are shown in Table II. Twenty-one measurements of the blank and the standard solutions separated by 4 min were carried out to study the drift of the signal. The total duration of the experiment was 2 h after a warm-up period of 1 h. Standard solutions were prepared with ICP standards from Spex Industries. All measurements were background corrected with the use of a two-sided background correction.

Ionic lines were found to be more sensitive than atomic lines to changes in the forward power. The sample intro- duction efficiency affected the ionic and atomic lines in a similar fashion, producing a constant ratio and leading to the conclusion that the nebulizer efficiency did not contribute considerably to the drift on this instrument.

Two independent mathematical operations were implemented with the use of ANNs: blank subtraction and drift correction of the net signal. This procedure was accomplished with a two-stage neural network (Fig. 8). For the blank-correction stage, a single-layer perceptron network consisting of two input nodes and a single output node was trained with the use of synthetic data to perform a subtraction. After training, when presented with the analytical signal at one input and the blank signal at the other, this stage output the blank-subtracted signal. The synthetic training data simply consisted of generated inputs in the appropriate range and their differences as outputs. A number of different networks were examined for possible use in the second stage of the drift correction.
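The blank-subtraction stage is just a linear node whose two weights should converge to (+1, -1). A sketch of such training on synthetic pairs (the learning rate and iteration count are our choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)   # weights for the (raw signal, blank signal) inputs
lr = 0.05         # learning rate

for _ in range(4000):
    x = rng.uniform(0.0, 1.0, size=2)   # synthetic (signal, blank) pair
    target = x[0] - x[1]                # desired blank-subtracted output
    y = w @ x                           # linear node output
    w += lr * (target - y) * x          # delta-rule weight update

# w should now be close to (+1, -1), so the node outputs signal - blank.
```

Because the target is exactly realizable by a linear node, the delta rule converges to the subtraction weights; the trained node can then feed the second (drift-correction) stage.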

APPLIED SPECTROSCOPY 805

Page 9: Improved Calibration for Inductively Coupled Plasma-Atomic Emission Spectrometry Using Generalized Regression Neural Networks

TABLE IX. Drift measurements and corrections.

Standard   Original      Corrected   Corrected with   Corrected with   Corrected
number     uncorrected   with        two-layer MLP    two-layer MLP    with GRNN
           signal        PRISM       (sigmoid)        (linear)         (σc = 0.3)
 1         0.702         0.703       0.703            0.704            0.706
 2         0.690         0.712       0.713            0.713            0.706
 3         0.696         0.720       0.721            0.721            0.706
 4         0.688         0.705       0.706            0.706            0.708
 5         0.681         0.706       0.707            0.707            0.706
 6         0.678         0.711       0.712            0.712            0.706
 7         0.683         0.705       0.710            0.710            0.706
 8         0.678         0.710       0.710            0.711            0.706
 9         0.678         0.707       0.708            0.708            0.706
10         0.678         0.707       0.707            0.708            0.706
11         0.675         0.719       0.720            0.720            0.706
12         0.671         0.717       0.718            0.718            0.707
13         0.663         0.711       0.712            0.712            0.708
14         0.675         0.723       0.725            0.723            0.707
15         0.671         0.727       0.728            0.726            0.710
16         0.658         0.717       0.717            0.717            0.715
17         0.662         0.724       0.725            0.724            0.716
18         0.661         0.730       0.731            0.729            0.717
19         0.645         0.718       0.718            0.717            0.717
20         0.654         0.726       0.726            0.725            0.717
21         0.640         0.717       0.717            0.716            0.717

RSD        2.3%          1.1%        1.1%             1.0%             0.6%

MLPs and the GRNN were both considered for this purpose. This goal was achieved with a training set of five examples for the MLPs (measurements 1-5) and six examples (measurements 1-5 and 21) for the GRNN. MLPs with two hidden nodes, a momentum value of 0.9, and a learning rate of 0.3 were used. Two types of activation functions were tested for these networks, the sigmoid function and the linear function. Both gave comparable results (Table IX). The two inputs for all of the networks consisted of the blank-corrected signal and the internal standard signal. Since the internal standard signal used to follow nebulizer efficiency did not change, it was not included. The output consisted of the drift-corrected signal.

Over the 2-h experiment the signal showed moderate drift (Fig. 9), with an overall RSD of 2.3%. When the PRISM model was applied, the %RSD decreased to 1.1%. The %RSD values for the GRNN, the MLP with sigmoidal activation functions, and the MLP with linear activation functions were 0.6%, 1.1%, and 1.0%, respectively. All of these were comparable or superior to the results for the original PRISM method, with the GRNN providing the best performance, a factor of four improvement.
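The figure of merit here is the relative standard deviation. Computing it as the population standard deviation over the mean is consistent with the values reported in Table IX (data transcribed from the uncorrected and GRNN-corrected columns):

```python
import numpy as np

def rsd_percent(signals):
    """%RSD = 100 * (population standard deviation) / mean."""
    x = np.asarray(signals, dtype=float)
    return 100.0 * x.std() / x.mean()   # np.std defaults to ddof=0

# Columns of Table IX:
uncorrected = [0.702, 0.690, 0.696, 0.688, 0.681, 0.678, 0.683, 0.678,
               0.678, 0.678, 0.675, 0.671, 0.663, 0.675, 0.671, 0.658,
               0.662, 0.661, 0.645, 0.654, 0.640]
grnn_corrected = [0.706, 0.706, 0.706, 0.708, 0.706, 0.706, 0.706, 0.706,
                  0.706, 0.706, 0.706, 0.707, 0.708, 0.707, 0.710, 0.715,
                  0.716, 0.717, 0.717, 0.717, 0.717]
```

With these data, `rsd_percent(uncorrected)` rounds to 2.3% and `rsd_percent(grnn_corrected)` rounds to 0.6%, the table's first and last summary entries.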

GENERAL CONCLUSIONS

According to our results, the generalized regression neural network is comparable to, or better than, multiple linear regression for modeling spectral interferences and matrix effects covering several orders of magnitude. It is also a very promising tool for the correction of drift due to fluctuations in the forward power levels. Applications of these networks could lead to considerable improve- ments in accuracy.

A general point must be made here. The application of neural networks, both in the type and the arrangement of the network, is very closely related to the particular system it is being applied to. In the present study, good

FIG. 9. Drift correction using GRNNs.

results were obtained for the systems examined. In all cases, the training set included the minimum number of points necessary to have good agreement with the selected validation set and still, in the case of GRNNs, provide some interpolation capability.

Considering the complicated characteristics of the sys- tems examined, especially in the Zn/Fe/Mg system, it was not surprising that it was necessary to include training examples from the entire range in order to get good re- sults. The GRNN's output could not accommodate be- havior that it had not seen before.

For dealing with spectral interferences, several mathematical methods have been developed. Methods have also been developed for dealing with the correction of matrix effects.14,15 It is beyond the scope of this paper to discuss each of these methods in detail, but it is important to underline two advantages of the present approach to solving these problems. First, it is simple to implement. Second, using small networks as the "building blocks" of a problem solution is an exciting possibility. When problems are separable, a solution could be built up by connecting separately trained neural networks, each handling a different, relatively simple aspect of the problem, into a large neural network system that could handle very complex systems. In the event that the effects are not completely separable, this "building block" approach could lead to difficulties. For instance, in the case where a component present in the background was correlated either with the measured analyte or with the drift, performing an initial background subtraction step could filter out data necessary for the network, leading to poorer results.

At this stage of the work, GRNNs both by themselves and in multinetwork combinations seem to be highly promising for the correction of nonlinear matrix effects and long-term signal drift in ICP-AES.

ACKNOWLEDGMENTS

The authors gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada and the Ontario Ministry of the Environment and Energy. M.C. would like to acknowl- edge support from a Seagrams/McGill Cuban Fellowship.

1. J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation (Addison-Wesley, Redwood City, California, 1991).
2. J. Zupan and J. Gasteiger, Anal. Chim. Acta 248, 1 (1991).
3. M. Glick and G. Hieftje, Appl. Spectrosc. 45, 1706 (1991).
4. J. L. McClelland and D. Rumelhart, Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises (MIT Press, Cambridge, 1988).
5. M. Bos and H. T. Weber, Anal. Chim. Acta 247, 97 (1991).
6. C. Schierle, M. Otto, and W. Wegscheider, Fresenius J. Anal. Chem. 343, 561 (1992).
7. B. Kosko, Appl. Opt. 26, 4947 (1987).
8. C. Schierle and M. Otto, Fresenius J. Anal. Chem. 344, 190 (1992).
9. P. Winston, Introduction to Artificial Intelligence (Addison-Wesley, Redwood City, California, 1992), 3rd ed.
10. D. Specht, IEEE Trans. Neural Networks 2, 568 (1991).
11. M. Caudill, AI Expert 5, 28 (1993).
12. G. Légère and P. Burgener, ICP Inform. Newsl. 11, 447 (1985).
13. N. Kovacic, B. Budic, and V. Mudnick, J. Anal. At. Spectrom. 4, 33 (1989).
14. M. Thompson and M. Ramsey, J. Anal. At. Spectrom. 5, 701 (1990).
15. M. Thompson and M. Ramsey, Analyst 110, 1413 (1985).
