Quantitative comparison of bidirectional and optimal associative memories for background prediction of spectra

Chemometrics and intelligent laboratory systems

Chemometrics and Intelligent Laboratory Systems 29 (1995) 51-61


Busolo Wa Wabuyele 1, Peter de B. Harrington *

Ohio University Center for Intelligent Chemical Instrumentation, Department of Chemistry, Clippinger Laboratories, Athens, OH 45701-2979, USA

Received 23 June 1994; accepted 20 January 1995

Abstract

Quantitative comparisons of a bidirectional associative memory (BAM), a modified BAM and an optimal associative memory (OAM) neural network are presented for background prediction of infrared (IR) spectra. These memories were evaluated using 2 cm⁻¹ resolution IR spectra. The efficacies of these methods were quantitatively evaluated using root mean square prediction errors of 100% transmittance lines. In all cases, the OAM outperformed the BAMs. The OAM has no retrieval error, because it stores patterns that are orthogonal. Binary encoding of spectra is advocated for BAMs, because the stored patterns are approximately orthogonal. Once the number of grids is large enough to differentiate stored spectra, the dependence on the number of resolution elements disappears. The OAM is a technique that can be applied to any type of data as long as two conditions are satisfied: the background spectra and the sample spectra must have points of intersection, and the signal variations in the sample need to be different from the background variations.

1. Introduction

Background correction is important in automated analysis of analytical data. Quantitative analysis may require that background variations are removed. Sloping or fluctuating baselines may prevent accurate determination of both peak intensity and location. Characteristic information contained in minor variances may be obscured by baseline errors. Background correction typically subtracts a reference measurement, that is a blank, from a sample measurement. In experiments for which background measurements are not readily available, baseline fitting methods are useful. For complex backgrounds such as those obtained in infrared spectrometry (IR), baseline fitting may be inappropriate. These techniques may identify, estimate, and remove unmodeled background constituents from the recorded data before quantitation.

One such method may use a bidirectional associative memory (BAM) [1-6] neural network for predicting background spectra. BAMs consist of interconnected processing elements (PEs) that simulate neurons. The PEs in a network may be considered as switches that turn their output connection on or off.

* Corresponding author.

1 Present address: Biocontrol Technology, Inc., 300 Indian Springs Rd., Indiana, PA 15701, USA.

0169-7439/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved

SSDI 0169-7439(95)00014-3


In the switch paradigm, a binary representation is used and the output will be either 0 or 1. Another artificial neural network paradigm is a series of interconnected true or false logic elements. Most BAMs use a bipolar representation and the PEs will output -1 or +1. Bipolar representations are advantageous because outputs of opposite sign may counteract each other, which is analogous to the inhibitory excitation that occurs in biological neural networks. The work in this study demonstrates that binary encoding is the better method for encoding spectra.

BAMs are composed of two layers that are interconnected and the strengths of the connections are represented by a weight matrix W. BAMs are content addressable memories (CAM) and are associative [7-12]. For IR, a single beam background spectrum may be stored in a memory and a single beam sample spectrum may be used to predict a background spectrum from a CAM. A single beam spectrum, which may have been acquired with multiple scans, will be referred to as a scan. If the logarithm of a sample scan is subtracted from the logarithm of the background scan, an absorbance spectrum is obtained.
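The absorbance calculation described above can be sketched in a few lines. This is a minimal illustration only: the function name `absorbance` and the toy intensities are hypothetical, and base-10 logarithms are assumed because the text does not state the base.

```python
import math

def absorbance(sample_scan, background_scan):
    """Return log10(background) - log10(sample) for each resolution element.

    Base-10 logarithms are an assumption; the text only says "logarithm".
    """
    return [math.log10(b) - math.log10(s)
            for s, b in zip(sample_scan, background_scan)]

# Toy single beam intensities (hypothetical values, arbitrary units).
sample = [50.0, 10.0, 100.0]
background = [100.0, 100.0, 100.0]
spectrum = absorbance(sample, background)
```

A resolution element where the sample transmits all of the background intensity gives an absorbance of zero, which is why an accurate background scan is needed for a flat baseline.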

A detailed description of the theory of an optimal associative memory (OAM) has been reported [13]. This paper presents a quantitative comparison of a BAM and an OAM network for background prediction of IR spectra.

2. BAM

The spectra presented to a BAM are first normalized to the largest intensity. This corrects for the total radiation throughput. During the encoding process the digitized spectra are converted to bipolar matrices using vectors of spectral intensities. This procedure involves overlaying a spectrum with a grid and assigning a value of +1 to a pixel containing an intensity value and -1 if it is empty. The vertical grids (v) represent the resolution elements and the horizontal grids (h) represent the spectral intensities.
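The grid overlay can be illustrated with a short sketch. The function name `encode`, the `empty` parameter, and the rule used to assign an intensity to a grid row (rounding up to the next grid line) are assumptions made for illustration; the text does not specify the binning rule. Rows are joined from the top (largest intensity) to the bottom.

```python
import math

def encode(scan, h, empty=-1):
    """Overlay a scan with an h-row grid and return the rows joined top to bottom.

    empty=-1 gives bipolar encoding; empty=0 gives binary encoding.
    """
    peak = max(scan)  # normalization to the largest intensity
    # Assumed binning rule: grid levels 1..h, rounding up to the next grid line.
    levels = [min(h, max(1, math.ceil(x * h / peak))) for x in scan]
    rows = []
    for level in range(h, 0, -1):  # the top row holds the largest intensity
        rows.append([1 if lv == level else empty for lv in levels])
    return [cell for row in rows for cell in row]

# A four-point scan with intensities 1, 3, 2, 1 on a grid with h = 3.
bipolar = encode([1, 3, 2, 1], h=3)
binary = encode([1, 3, 2, 1], h=3, empty=0)
```

Each column of the grid contains exactly one occupied cell, so an encoded scan with v resolution elements has v occupied cells out of v × h.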

When the input pattern is the same as the output pattern (i.e. a background spectrum) a BAM behaves as an autoassociative CAM. Information in autoassociative BAMs is stored in a weight matrix or correlation matrix of the input-output pairs in the training set. Training a BAM is achieved by constructing this matrix. The weight matrix (W) is obtained by

$$W = \sum_{i=1}^{n} y_i y_i^{T} \qquad (1)$$

for which n is the number of bipolar encoded column vectors ($y_i$) that are to be stored in the memory.
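As a small worked illustration of the weight matrix as a sum of outer products of the stored patterns, the sketch below builds W for two toy bipolar patterns. The function name `outer_sum` and the patterns are hypothetical.

```python
def outer_sum(patterns):
    """Return W as the sum of outer products y y^T over the stored patterns."""
    k = len(patterns[0])
    W = [[0] * k for _ in range(k)]
    for y in patterns:
        for r in range(k):
            for c in range(k):
                W[r][c] += y[r] * y[c]
    return W

# Two toy bipolar patterns of dimensionality k = 4.
patterns = [[1, -1, 1, -1], [1, 1, -1, -1]]
W = outer_sum(patterns)
```

The result is a symmetric k × k matrix, so its storage grows with the square of the encoded pattern length.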

After W has been constructed, the BAM network is capable of predicting a background by a recursive process referred to as resonance. The process is initiated by a sample scan spectrum $z_a$ that will recall a predicted background spectrum. For autoassociative memories the process is simply

$$z_{a+1} = f(W^{T} z_a) \qquad (2)$$

for which $z_a$ and $z_{a+1}$ are bipolarly encoded column vectors of input patterns at the ath and (a + 1)th states. This process terminates when $z_{a+1}$ converges to $z_a$. The function f in Eq. (2) is a nonlinear transfer function, namely a hard limiter, that is applied to the vector product $W^{T} z_a$. The hard limiter assigns positive product scores a value of +1 and negative product scores a value of -1. In the case of binary encoding the positive scores are assigned a value of +1 while the negative scores are assigned a value of zero. The nonlinear transfer function is required to compress the vector length of $z_a$ so that it does not diverge during recursion.
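The resonance loop can be sketched as follows. The names `hard_limit` and `resonate`, the iteration cap `max_iter`, and the treatment of a score of exactly zero (the element keeps its previous state) are assumptions; the text defines the limiter only for positive and negative scores.

```python
def hard_limit(score, previous, low=-1):
    """+1 for positive scores, `low` for negative scores.

    Keeping the previous state for a zero score is an assumption.
    Use low=0 for binary encoding.
    """
    if score > 0:
        return 1
    if score < 0:
        return low
    return previous

def resonate(W, z, max_iter=50):
    """Iterate z <- f(W^T z) until the state stops changing."""
    k = len(z)
    for _ in range(max_iter):
        scores = [sum(W[r][c] * z[r] for r in range(k)) for c in range(k)]
        z_next = [hard_limit(s, old) for s, old in zip(scores, z)]
        if z_next == z:
            return z_next
        z = z_next
    return z

# One stored bipolar pattern; W is its outer product with itself.
stored = [1, -1, 1, -1]
W = [[a * b for b in stored] for a in stored]

# Probe with the last element flipped; resonance restores the stored pattern.
recalled = resonate(W, [1, -1, 1, 1])
```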

For high resolution data, the size of W may exceed the storage capacity of many computers. By decomposing W into its factored form Y, the amount of memory may be reduced. Resonance is achieved by

$$z_{a+1} = f(Y Y^{T} z_a) \qquad (3)$$

for which $z_a$ and $z_{a+1}$ are the predicted backgrounds at the ath and (a + 1)th states, respectively. The matrix of encoded background scans (Y) has v × h rows and n columns. The factored weight matrix reduces the demand for memory at the cost of additional computation.
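To make the memory argument concrete, the sketch below evaluates the product of the factored form with a state vector without forming W, and counts matrix entries for dimensions quoted elsewhere in this paper (3950 resolution elements, 200 grids, 90 stored scans). The function name `recall_factored` and the toy vectors are hypothetical.

```python
def recall_factored(Y_columns, z):
    """Compute Y (Y^T z) from the stored columns, never forming the k-by-k W."""
    projections = [sum(a * b for a, b in zip(y, z)) for y in Y_columns]  # Y^T z
    return [sum(p * y[i] for p, y in zip(projections, Y_columns))
            for i in range(len(z))]

# Toy check against the explicit weight matrix W = Y Y^T.
Y_columns = [[1, -1, 1, -1], [1, 1, -1, -1]]
z = [1, -1, 1, 1]
scores = recall_factored(Y_columns, z)

# Entry counts for v = 3950 resolution elements, h = 200 grids, n = 90 scans.
k = 3950 * 200
w_entries = k * k    # full weight matrix
y_entries = k * 90   # factored form
```

With these dimensions the factored form holds roughly 7 × 10^7 entries, whereas the full weight matrix would hold roughly 6 × 10^11.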

Convergence is detected by the Hamming distance ($D_H$) given by

$$D_H = \sum_{i=1}^{k} \left| z_{a,i} - z_{a+1,i} \right| \qquad (4)$$

for which k is the dimensionality of $z_a$ and is equal to v × h. During resonance the network quickly evolves to a stable state that corresponds to a global minimum. Thus convergence is obtained when the Hamming distance is zero. The maximum number of different background scans that can be stored may not exceed k.
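A convergence check of this kind might look as follows. The sketch assumes the distance is the sum of absolute element-wise differences between successive states; the function name `hamming_distance` is hypothetical.

```python
def hamming_distance(z_a, z_next):
    """Sum of absolute element-wise differences between two encoded states."""
    return sum(abs(p - q) for p, q in zip(z_a, z_next))

# For bipolar vectors each differing element contributes 2, for binary vectors 1.
d_bipolar = hamming_distance([1, -1, 1, 1], [1, -1, 1, -1])
d_binary = hamming_distance([1, 0, 1, 1], [1, 0, 1, 0])
d_converged = hamming_distance([1, -1, 1, -1], [1, -1, 1, -1])
```

Under either encoding the distance is zero only when the two states are identical, which is the stopping condition for resonance.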

After the network has converged, the bipolar output vector $z_a$ must be decoded back into a scan. This process is the inverse of encoding. The vector components are rearranged into the original bipolar matrix. The columns and rows of the matrix correspond to resolution elements and intensity values, respectively. For each column, the intensity values of all the grids that have +1 values are averaged. Therefore, a spectrum is obtained when this step has been completed for all the resolution elements.
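The decoding step can be sketched as below. The function name `decode_average` is hypothetical; intensities are returned in grid units (levels 1 to h), rows are assumed to be joined from top to bottom, and a column with no +1 cell is assigned 0.0, which is an assumption since the text does not cover that case.

```python
def decode_average(z, v, h):
    """Rebuild a scan by averaging, per column, the grid levels whose cell is +1."""
    rows = [z[r * v:(r + 1) * v] for r in range(h)]  # rows[0] is the top row (level h)
    scan = []
    for col in range(v):
        levels = [h - r for r in range(h) if rows[r][col] == 1]
        # Assumption: an empty column decodes to 0.0.
        scan.append(sum(levels) / len(levels) if levels else 0.0)
    return scan

# Recalled bipolar state on a 3-row by 4-column grid; column 2 has two +1 cells.
z = [-1, 1, -1, -1,
     -1, 1, 1, -1,
     1, -1, -1, 1]
scan = decode_average(z, v=4, h=3)
```

When a recalled state has several +1 cells in one column, as in the second column here, the averaged intensity falls between grid levels.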

3. Modified BAM

A modified BAM procedure uses the same W factoring as the BAM, but has an additional data reduction step which is accomplished by removing grid units that have -1 values for all the stored patterns. Grid units with values of +1 for all training backgrounds do not occur, because only one intensity value exists for each resolution value, and the intensities vary with respect to the horizontal grids among the spectra. A second modification is that the vector obtained from $W^{T} z_f$ is used for decoding. The bipolar encoded vector that is obtained after resonance is $z_f$. By removing the hard limiter function, selectivity is increased. Therefore, the largest grid value for each column of the bipolar matrix is used to determine the scan intensity instead of the average of all the values that are greater than zero.
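The data reduction step can be illustrated with two toy patterns. The function names `active_units` and `reduce_pattern` are hypothetical; the patterns are bipolar encodings on a 3-row by 4-column grid of scans with grid levels 1, 3, 2, 1 and 1, 3, 3, 1.

```python
def active_units(patterns):
    """Indices of grid units that are +1 in at least one stored pattern."""
    k = len(patterns[0])
    return [i for i in range(k) if any(p[i] == 1 for p in patterns)]

def reduce_pattern(pattern, keep):
    """Drop the grid units that are -1 in every stored pattern."""
    return [pattern[i] for i in keep]

p1 = [-1, 1, -1, -1, -1, -1, 1, -1, 1, -1, -1, 1]   # levels 1, 3, 2, 1
p2 = [-1, 1, 1, -1, -1, -1, -1, -1, 1, -1, -1, 1]   # levels 1, 3, 3, 1
keep = active_units([p1, p2])
reduced = [reduce_pattern(p, keep) for p in (p1, p2)]
```

Only 5 of the 12 grid units survive in this toy case; the same index list would have to be retained so that a recalled vector can be mapped back onto the grid for decoding.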

4. OAM

The OAM network has already been reported [13]. This network is similar to the modified BAM discussed above. It differs in that Y is converted to an orthonormal basis, and the algorithm does not need the hard limiter function. A predicted background is reconstructed from the projections onto the basis. The prediction only requires a single step instead of the recursive procedures used by the BAMs. The prediction process for the OAM is given by

$$z_f = V(V^{T} z_i) \qquad (5)$$

for which $z_f$ and $z_i$ are the predicted background and sample scans, respectively. V is the basis of the encoded background scans.
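A minimal sketch of the one-step projection is given below. Gram-Schmidt orthonormalization is used here as one possible way to obtain an orthonormal basis; the text does not state which method was used, and the names `orthonormal_basis`, `oam_predict` and the tolerance `tol` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(patterns, tol=1e-10):
    """Modified Gram-Schmidt (an assumed method); near-dependent patterns are skipped."""
    basis = []
    for p in patterns:
        w = [float(x) for x in p]
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def oam_predict(basis, z_i):
    """One-step projection of z_i onto the span of the stored patterns."""
    z_f = [0.0] * len(z_i)
    for q in basis:
        coeff = dot(q, z_i)
        z_f = [f + coeff * qi for f, qi in zip(z_f, q)]
    return z_f

# Two stored patterns that are NOT mutually orthogonal (their dot product is 2).
stored = [[1, -1, 1, -1], [1, 1, 1, -1]]
V = orthonormal_basis(stored)
retrieved = [oam_predict(V, p) for p in stored]
probe = oam_predict(V, [1, 1, -1, -1])
```

Because every stored pattern lies in the span of the basis, projecting a stored pattern returns it unchanged up to rounding, even though the stored patterns themselves are correlated.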

The efficacies of these networks were evaluated by three types of errors for predicting a 100% transmittance line. The total error value is given by Eq. (6), for which $\sigma_{enc}$, $\sigma_{ret}$ and $\sigma_{rec}$ are errors due to encoding, retrieving and recalling processes, respectively. The first error is introduced by the encoding and decoding processes, and is referred to as encoding error ($\sigma_{enc}$). It is obtained by

$$\sigma_{enc} = \sqrt{\frac{\sum_{i=1}^{v} \left[ 100\,(x_i / \hat{x}_i) - 100 \right]^{2}}{v}} \qquad (7)$$

for which $x_i$ and $\hat{x}_i$ are the intensity values for a background scan and the scan after encoding and decoding, respectively. The number of resolution elements in the spectrum is given by v.

The second error is the retrieval error ($\sigma_{ret}$). This error is introduced by retrieving a stored background scan. For this case, Eq. (7) is used and $x_i$ is a scan that is contained in the memory. The third error is the prediction or recall error ($\sigma_{rec}$), which measures the ability of the memory to associate or fit patterns not stored in the memory. Eq. (7) is used and $x_i$ is a scan that is not contained in the memory. This error is introduced during the association process.
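The root mean square error of a 100% transmittance line can be computed as sketched below. The function name `rms_100_line_error` and the toy intensities are hypothetical, and the 100% line is assumed to be 100 times the ratio of the measured scan to the predicted scan.

```python
import math

def rms_100_line_error(measured, predicted):
    """RMS deviation of the 100% line (100 * measured / predicted, assumed) from 100."""
    line = [100.0 * m / p for m, p in zip(measured, predicted)]
    return math.sqrt(sum((t - 100.0) ** 2 for t in line) / len(line))

# Toy scans with three resolution elements; only the last one is mispredicted.
measured = [1.0, 2.0, 4.0]
predicted = [1.0, 2.0, 5.0]
error = rms_100_line_error(measured, predicted)
perfect = rms_100_line_error(measured, measured)
```

The same function serves for the encoding, retrieval and recall errors; only the choice of the predicted scan differs between the three cases.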

5. Experimental section

5.1. Spectroscopic measurements

Spectra were acquired from a Perkin Elmer Model 1600 FT-IR spectrophotometer equipped with a deuterated triglycine sulphate (DTGS) detector and a KBr beam splitter. Salad canola oil was supplied by Nabisco. Single-scan spectra were acquired from 450 cm⁻¹ to 4400 cm⁻¹ at 2 cm⁻¹ resolution and signal averaged over 64 scans. Ninety background scans were first collected using 32 × 3 mm KBr cell windows with variable pathlengths. The difference in instrumental throughput was evaluated by collecting the background scans over a twelve-month period. During this period the mirrors were realigned several times to optimize throughput. Single scan spectra of canola oil were collected at the end of this period.

Fig. 1. Schematic diagram of the background prediction process using associative memories. Diagram labels: Background Scans; Sample IR Scan; Predicted Background Scan; Transmittance Spectrum.

Fig. 2. Infrared spectrum of canola oil obtained with the standard conventional method (background collected just before the sample single scan spectrum as the reference) and with the OAM predicted background using 200 grids. Traces: Conventional Method; OAM Corrected. Horizontal axis: wavenumber (cm⁻¹).


5.2. Spectral processing

The canola oil scans were corrected by BAM, modified BAM and OAM predicted background scans. A canola oil spectrum obtained from consecutively acquired sample and background scans was used as a control.

The 100% transmittance lines for BAM, modified BAM and OAM were compared. The comparative evaluation of prediction errors (encoding, retrieval and recall) with number of horizontal grids for BAM, modified BAM and OAM used scans acquired at 2 cm⁻¹ resolution. The scans consisted of 3950 resolution elements per scan. The effect of the vertical grid number on the encoding and retrieval errors as a function of the horizontal grid number was evaluated. This evaluation used ten background scans that were stored in an associative memory. Each background was collected one month apart.

All spectral data were transferred to a 25 MHz Everex 80486 computer equipped with 8 Mbyte RAM and 128 kbyte RAM cache. The host operating system was MS-DOS 6.2. The BAM, modified BAM and OAM programs were developed and debugged using Borland C 3.1 and evaluated in 32 bit protected mode using Watcom C 9.5 version B.

6. Discussion of results

A standard absorbance spectrum of canola oil was obtained by collecting the reference scan immediately after the sample scan. An OAM predicted background scan was obtained from a memory that excluded the reference spectrum used above. Fig. 1 gives a schematic representation of predicting a background scan using associative memories. The IR sample scan is $z_i$ in Eq. (5). The standard and OAM corrected spectra are given in Fig. 2. The standard spectrum shows the presence of a sloping and fluctuating baseline. Note that the OAM furnishes a better baseline.

BAM and modified BAM corrected spectra are given in Fig. 3. The BAM corrected spectra were poor, because all the stored background scans are similar and not orthogonal. All the backgrounds contribute redundant information in predicting a background scan. Consequently, the predicted scan resembles the average scan of all backgrounds stored in the memory. This effect may be observed by the large carbon dioxide and water absorbance peaks in Fig. 3, because these bands vary the most among the background scans, and they are poorly characterized by an average scan.

Fig. 3. Infrared spectra of canola oil obtained with conventional and modified BAM predicted backgrounds. Horizontal axis: wavenumber (cm⁻¹).


Fig. 4. Infrared spectrum of canola oil obtained from the background from memory with the smallest Euclidean distance, and infrared spectrum of canola oil obtained with a bipolarly encoded background that yielded the largest score. Traces: Smallest Euclidean Distance; Largest Score Value (Bipolarly Encoded). Horizontal axis: wavenumber (cm⁻¹).

Fig. 5. Plot of the BAM corrected 100% line (A). The modified BAM corrected and the OAM corrected 100% lines (B). Horizontal axis: wavenumber (cm⁻¹).


The closest background scan from the training data to the canola oil scan was obtained using the Euclidean distance measure. This scan was used to correct the canola oil scan and is given in Fig. 4. This experiment shows that the OAM is accomplishing more than simply retrieving the most similar background scan. To further show that the OAM prediction process is using associative properties to generate an improved background scan, the bipolar encoded spectrum with the largest score value was determined using the canola oil spectrum from the modified BAM. This background scan was used for calculating the absorbance spectrum in Fig. 4.
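The nearest-scan control described above amounts to a simple search over the stored backgrounds. In the sketch below the function name `closest_background` and the three-point toy scans are hypothetical.

```python
import math

def closest_background(sample, backgrounds):
    """Index of the stored background with the smallest Euclidean distance to the sample."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(backgrounds)),
               key=lambda i: distance(sample, backgrounds[i]))

# Hypothetical normalized scans with three resolution elements each.
backgrounds = [[1.0, 0.9, 0.8], [0.7, 0.6, 0.5], [0.9, 0.9, 0.9]]
sample = [0.72, 0.58, 0.5]
best = closest_background(sample, backgrounds)
```

This control can only return one of the stored scans unchanged, whereas a projection onto the stored scans can combine several of them.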

To further evaluate the performance of these techniques, 100% transmittance lines between predicted and measured backgrounds were examined. Examples of 100% transmittance lines obtained from BAM, modified BAM and OAM are given in Fig. 5. The BAM and modified BAM yielded 100% transmittance lines that had a poor fit when compared to those furnished by the OAM. The BAM has a greater contribution from carbon dioxide and water absorption regions while the modified BAM has a significant baseline slope. The 100% transmittance line provides a useful measure for evaluating background correction efficacy.

The number of horizontal grids (h) or number of grids is very important for bipolar encoding and affects the selectivity of the retrieval process of the associative memories. Selectivity is the ability of a network to retrieve an input spectrum without associating other dissimilar spectra. Therefore selectivity is a parameter that requires investigation.

A quantitative study of number of grids (h = 200) based on the root mean square prediction error of 100% transmittance lines was conducted using the BAM, modified BAM and OAM. To characterize the accuracy of the encoding, retrieval and recalling processes, Eq. (7) was used to calculate the 100% line prediction errors associated with each process. Encoding, retrieval, and recall errors were quantitatively evaluated for 90 single scan spectra. These spectra were collected over a period of one year. The average error and standard deviation are plotted as a function of number of grids (h).

The first step in the associative memory process is bipolar encoding. Both BAMs and OAMs use this process and will have similar errors. The encoding error is introduced during the encoding and decoding steps. Fig. 6 shows that this error asymptotically approaches zero as the number of grids increases. For more than 150 grids the error becomes negligible. This error does not vary much by scan as shown by the small standard deviations.

Fig. 6. Root mean square prediction error for encoding (bipolar) and decoding processes as a function of number of grids for BAMs and OAM networks (A). Root mean square prediction error for the retrieval process (bipolar and binary) as a function of number of grids for the modified BAM (A). Root mean square prediction error due to the recall processes as a function of number of grids for the OAM (B). In all cases, 90 scans were in memory. Horizontal axis: number of grids.

The next study examined the retrieval error as a function of number of grids. Because the BAM performs very poorly compared to the modified BAM and OAM, it was not considered in this evaluation. Fig. 6A also gives the retrieval error for the modified BAM. This plot indicates that the modified BAM has a significant retrieval error, because of the high degree of cross talk or collinearity among the background scans. Note that when the number of grids is low (i.e. less than 50), the error increases with increases in number of grids. For the OAM network the retrieval error is zero, due to the implicit orthogonalization of the stored background scans. The plot of error is identical to the encoding error given in Fig. 6A. The retrieval error was also computed for the modified BAM with binary encoding instead of the bipolar encoding procedure. A comparison of these errors as shown in Fig. 6A indicates that the modified BAM with binary encoding has a significantly lower error than the one with bipolar encoding. Binary encoding makes the patterns approximately orthogonal. The OAM stores orthogonalized patterns, and the retrieval error is therefore independent of the encoding procedure. It should be noted that the OAM retrieval error is less than that of the binary encoded modified BAM.
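The difference between the two encodings can be made concrete by comparing the cosine of the angle between two encoded scans. The sketch below is illustrative only: the names `encode_levels` and `cosine` and the two toy scans, given directly as grid levels on a 200-row grid, are hypothetical.

```python
import math

def encode_levels(levels, h, empty):
    """Encode grid levels (1..h) as rows joined top to bottom; `empty` is -1 or 0."""
    return [1 if lv == level else empty
            for level in range(h, 0, -1) for lv in levels]

def cosine(a, b):
    """Cosine of the angle between two encoded vectors (0 means orthogonal)."""
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

h = 200
scan_a = [10, 50, 120, 200]   # grid level at each of four resolution elements
scan_b = [11, 49, 121, 199]   # differs from scan_a by one level everywhere

cos_binary = cosine(encode_levels(scan_a, h, 0), encode_levels(scan_b, h, 0))
cos_bipolar = cosine(encode_levels(scan_a, h, -1), encode_levels(scan_b, h, -1))
```

In this toy case the binary vectors share no occupied cell and are exactly orthogonal, while the bipolar vectors are almost collinear (cosine 0.98) because the many empty cells agree with each other.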

The recall error is introduced when a background scan is not present in the memory. The recall error associated with each single scan spectrum was computed for the OAM, but not for the modified BAM with bipolar encoding, because it had a substantial retrieval error. For the modified BAM with binary encoding, the recall error is expected to be similar to that of the OAM, because the patterns are approximately orthogonal. With the number of grids fixed at 200, the errors associated with each spectrum were calculated using the leave-one-out method of cross validation for the OAM. Each scan is removed from the memory, and then used for retrieval. The results are given in Fig. 6B. The large standard deviation values show that the errors are dependent on the scans. Therefore, to investigate this further, the error values for each scan are plotted as a function of scan number for 200 grids (see Fig. 7). Note that over 90% of the scans have low error values, and for the other 10% the OAM performs poorly. Using the spectra with the lowest and the highest error values, the 100% line for each was reconstructed. The 100% lines with the greatest (20.47) and smallest (0.98) errors from Fig. 7 are given in Fig. 8. Note in Fig. 8 that for lack of fit most of the errors occur in regions of lowest signal to noise. This effect is typical for poor predictions, so for large errors, the predicted background may still be used for some spectral regions. Increasing the scope (i.e. the breadth of possible background scans) of the backgrounds stored in the OAM should reduce the average recall error.

Fig. 7. Root mean square prediction error for recall process for each spectrum in the training set as a function of scan number with number of grids fixed at 200. Horizontal axis: scan number.

Fig. 8. The predicted 100% line using the spectrum with the maximum and minimum recall error values of RMS = 20.47 and RMS = 0.98, respectively. The number of grids was fixed at 200. Horizontal axis: wavenumber (cm⁻¹).

Fig. 9. A plot of root mean square prediction error for retrieval process as function of spectrum resolution for OAM using 10, 20, 50, 150 and 200 grids. Horizontal axis: resolution.
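The leave-one-out procedure used for the recall error can be sketched as a small harness. Everything here is hypothetical scaffolding: `leave_one_out` takes the predictor and the error measure as arguments, and the stand-in predictor `mean_scan` (an element-wise average of the memory) is a placeholder used only to keep the example short; it is not the OAM.

```python
def leave_one_out(scans, predict, error):
    """Remove each scan from the memory in turn, predict it, and record the error."""
    results = []
    for i, held_out in enumerate(scans):
        memory = scans[:i] + scans[i + 1:]
        results.append(error(held_out, predict(memory, held_out)))
    return results

def mean_scan(memory, _sample):
    """Placeholder predictor (NOT the OAM): element-wise mean of the stored scans."""
    return [sum(col) / len(col) for col in zip(*memory)]

def max_abs_error(measured, predicted):
    return max(abs(m - p) for m, p in zip(measured, predicted))

scans = [[1.0, 2.0], [1.0, 2.0], [2.0, 4.0]]
errors = leave_one_out(scans, mean_scan, max_abs_error)
```

The scan that is least like the others yields the largest error in this toy case, which mirrors the observation that recall errors depend strongly on the individual scan.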

An evaluation determined the effect of the number of resolution elements for IR spectra on retrieval error. Fig. 9 is a plot of retrieval error as a function of spectrum resolution with varying number of grids for the OAM. For the retrieval error there is a variation of error with resolution when the horizontal grid number is small. But as the number of grids increases, the error becomes independent of the resolution elements of the spectra.

7. Conclusions

Background correction of IR spectra yields a rigorous application for evaluating associative memories, because the background scans may contain significant variations (e.g., source intensity, water vapor, and carbon dioxide). Factoring the weight matrix allows high resolution data such as spectra to be used with associative memories on desktop computers. BAMs have a disadvantage of requiring orthogonal patterns for storage, a condition that is difficult to meet with chemical data. This work has shown that by using binary encoding for modified BAMs the input patterns are approximately orthogonal. An OAM was devised and used for background correction of high resolution data.

Both background retrieval and recall have important applications in chemistry. The OAM is associative, which improves background prediction. The application of an OAM for predicting background scans yields a rugged method for calculating absorbance spectra. For a large number of horizontal grids the retrieval error for the OAM is independent of the number of spectral resolution elements. The OAM is a technique that can be applied to any type of data (e.g., chromatographic or electrochemical). Two conditions must be satisfied: the background spectra and the sample spectra must have points of intersection, and the signal variations in the sample need to be different from the background variations. Future work in this area involves incorporating fuzzy logic and principal component analysis (PCA) in the preprocessing of encoded spectra of near-IR data.

Acknowledgements

We gratefully acknowledge Nabisco for supplying the oils, and the US Army ERDEC contract DAAD05-92-T-8659 for providing financial support.

Appendix 1

A. Example of the encoding process for BAMs and OAM

For $x_i$ input intensities, the bipolar values are given by $y_i$. (See example below.) Assume a digitized scan at four wavenumbers is given by

$$x^{T} = [1 \;\; 3 \;\; 2 \;\; 1]$$

Using a 3 by 4 grid, the bipolar matrix $Y_i$ is obtained.

$$Y_i = \begin{bmatrix} -1 & +1 & -1 & -1 \\ -1 & -1 & +1 & -1 \\ +1 & -1 & -1 & +1 \end{bmatrix}$$

A row vector is obtained by joining the rows from top to bottom.

$$y^{T} = [-1 \;\; +1 \;\; -1 \;\; -1 \;\; -1 \;\; -1 \;\; +1 \;\; -1 \;\; +1 \;\; -1 \;\; -1 \;\; +1]$$

B. Example of the decoding process for OAM

For a real-valued output vector $z_f$ given by

$$z_f^{T} = [+0.9 \;\; -1.0 \;\; -0.8 \;\; +1.0 \;\; -1.0 \;\; -0.6 \;\; +0.7 \;\; -0.4 \;\; -1.0 \;\; +0.5 \;\; -1.0 \;\; -1.0]$$

the matrix representation is shown below:

$$\begin{bmatrix} -1.0 & +0.5 & -1.0 & -1.0 \\ -1.0 & -0.6 & +0.7 & -0.4 \\ +0.9 & -1.0 & -0.8 & +1.0 \end{bmatrix}$$

For each column the largest value will correspond to the predicted intensity and the decoded single scan spectrum is given by

$$z_f^{T} = [1 \;\; 3 \;\; 2 \;\; 1]$$
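The decoding rule of part B can be checked with a few lines of code. The function name `decode_largest` is hypothetical; the matrix is passed as a list of rows with the top row (largest intensity level) first.

```python
def decode_largest(rows):
    """For each column pick the row with the largest score; the top row is level h."""
    h = len(rows)
    v = len(rows[0])
    scan = []
    for col in range(v):
        best_row = max(range(h), key=lambda r: rows[r][col])
        scan.append(h - best_row)  # convert row index to intensity level
    return scan

# Score matrix from the decoding example, top row first.
rows = [[-1.0, 0.5, -1.0, -1.0],
        [-1.0, -0.6, 0.7, -0.4],
        [0.9, -1.0, -0.8, 1.0]]
decoded = decode_largest(rows)
```

Taking the largest score in each column always yields exactly one intensity per resolution element, unlike averaging over all positive cells.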

References

[1] B. Kosko, Appl. Opt., 26 (1987) 4947-4960.

[2] B. Kosko, Byte, September (1987) 137-144.

[3] J. Zupan and J. Gasteiger, Neural Networks for Chemists, VCH, Weinheim, 1993, pp. 65-78.

[4] C. Schierle, M. Otto and W. Wegscheider, Fresenius J. Anal. Chem., 343 (1992) 561-565.

[5] A. Blum, Dr. Dobb’s Journal, April (1989) 16-20.

[6] A. Schneider and V.G. Sigillito, in O.M. Omidvar (Editor), Progress in Neural Networks, Ablex, Norwood, NJ, 1991, pp. 87-104.

[7] B. Kosko, IEEE Trans. Syst. Man Cybern., 18 (1988) 49-60.

[8] P.D. Wasserman, Neural Computing Theory and Practice, Van Nostrand Reinhold, New York, 1989, pp. 98-119.

[9] S. Haykin, Neural Networks, Macmillan College Publishing Company, New York, 1994, pp. 563-571.

[10] T. Kohonen, Self-Organization Associative Memories, Springer, Berlin, 1988.

[11] G. Hinton and R. Anderson, Parallel Models of Associative Memory, Lawrence Erlbaum, Hillsdale, NJ, 1990, pp. 78-90.

[12] T. Kohonen, IEEE Transactions on Computers, 21 (1972) 353-359.

[13] B.W. Wabuyele and P.B. Harrington, Anal. Chem., 66 (1994) 2047-2051.
