
778 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, VOL. 24, NO. 5, MAY 1994

Designing Bidirectional Associative Memories with Optimal Stability

Tao Wang, Xinhua Zhuang, and Xiaoliang Xing

Abstract-In this paper, a learning algorithm for bidirectional associative memories (BAM’s) with optimal stability is presented. According to an objective function that measures the stability and attraction of the BAM, we cast the learning procedure into a global minimization problem, solved by a gradient descent technique. This learning rule guarantees the storage of training patterns with basins of attraction as large as possible. We also investigate the storage capacity of the BAM, the convergence of the learning method, the asymptotic stability of each training pattern and its basin of attraction. To evaluate the performance of our learning strategy, a large number of simulations have been carried out.

I. INTRODUCTION

Artificial neural networks, characterized by massive parallelism, robustness, and learning capability, have recently attracted a great number of researchers from various fields. Being one kind of content addressable memories (CAM's), the BAM was extended from Hopfield associative memories (HAM's) [10] by Kosko [11]. Bidirectionality, or forward and backward information flow, is used to produce a two-way associative search for a pattern pair. An important attribute of the BAM is its ability to retrieve one stored pattern from its noisy or partial input.

The BAM has two major advantages over other associative memories. First, the BAM is unconditionally stable [11], [16]. In other words, any connection matrix is bidirectionally stable. But the HAM requires a symmetric connection matrix with zero diagonal elements to ensure global stability. Second, the BAM can converge to a stable state in a synchronous mode, where the HAM may have oscillatory states [15], [20]. The synchronous stability is very appealing when considering a massively parallel machine implementation. This means that the BAM can be used in real-time applications.

There are many applications of the BAM. For example, Bavarian [2] used the BAM to increase the reliability of a control system through fault isolation. Wang et al. [19] showed how the BAM can be used for pattern recognition. Dunning et al. [7] have applied the BAM to image processing. For a brief review of applications and implementations of the BAM, readers are referred to [16].

Manuscript received March 23, 1991; revised September 20, 1993 and June 11, 1993.

T. Wang and X. Xing are with the Department of Computer Science and Engineering, Zhejiang University, Hangzhou 310027, People's Republic of China.

X. Zhuang is with the Department of Electrical & Computer Engineering, University of Missouri-Columbia, Columbia, MO 65211 USA.

IEEE Log Number 9400639.

To encode the BAM, many learning methods have been proposed. Kosko [11] used an outer-product rule to determine each individual weight by levels of activity between connected units. It does not work well if the training patterns are not mutually orthogonal; there is no guarantee that correlated training patterns are accurately stored. Sequential multiple training (SMT) [19], [20] achieves recall of a single pattern. It was extended to linear programming/multiple training (LPMT) [20] by using a generalized correlation matrix. Dummy augmentation [19] guarantees the recall of all training patterns by the addition of some dummy elements.

However, none of these learning rules consider basins of attraction seriously. This will severely restrict the application of the BAM as a CAM. In fact, what we really need is the ability to recall a pattern from its imperfect input. To construct a well-defined BAM, one should take the following optimality criteria [13] into account:

(C1) Each training pattern is stable in the BAM;

(C2) Around each stored pattern there exists a nontrivial basin of attraction. The size of the basin of attraction controls the extent of attraction of a stable pattern;

(C3) The number of spurious states is minimal. By a spurious state [4], we mean a stable state that is not in the training set.

According to (C1) and (C2), we derive an objective function that measures stability and attraction of the BAM in a quantitative way. Obviously, (C1) is related to the storage capacity of the BAM, and (C2) to its noise correction ability. A larger basin of attraction always implies lower percentages of misclassification of the stored patterns and stronger tolerance to inputs of low signal-to-noise ratio. That the learning rule is optimal means that it is designed based on the optimality criteria (C1) and (C2).

In terms of the objective function, the learning rule is formed as a global minimization problem with simple constraints. It is solved by a gradient descent technique. This learning approach ensures that each training pattern is stored with a basin of attraction as large as possible, namely, optimal stability or attraction.

A large number of simulations have been implemented. By comparison with the outer-product method, the SMT, and the LPMT, the proposed learning method shows improved storage capacity and recall.

The rest of this paper is organized as follows. Section II briefly describes the architecture of the BAM. In Section III, we show how to estimate connection weights that ensure the storage of all training patterns, and discuss the storage

0018-9472/94$04.00 © 1994 IEEE


WANG et al.: DESIGNING BIDIRECTIONAL ASSOCIATIVE MEMORIES 779

capacity of the BAM. To establish large basins of attraction around the training patterns, we examine how to characterize basins of attraction using simple constraints in Section IV. Section V proposes an optimal learning algorithm based on an unconstrained minimization, and Section VI develops a learning method based on a constrained minimization. Section VII gives computer simulations. The final section is the conclusion and discussion.

II. BIDIRECTIONAL ASSOCIATIVE MEMORIES (BAM's)

The BAM, shown in Fig. 1, is a two-layer feedback neural network. It is composed of an X-layer of N units and a Y-layer of P units. The state of each unit X_i and Y_j takes on a bipolar value, -1 or +1, and thus the state of the network is a bipolar (N+P)-tuple. Between each pair of units X_i and Y_j there is a connection weight W_ij, which is specified during a learning phase and fixed during a recall phase.

Fig. 1. Structure of the BAM: the X-layer and the Y-layer.

The BAM defined above is sometimes called a discrete BAM [16], [19], to distinguish it from the continuous BAM [11] that operates with analog state values in continuous time.

It is currently acknowledged that constructing the BAM consists of the following two phases. The first phase involves a learning rule; it is, in essence, designed to calculate the weights W_ij. The second phase involves a recall procedure, which retrieves a stored pattern given an imperfect or corrupted pattern as input.

Let {A_1&B_1, A_2&B_2, ..., A_M&B_M} denote M training pattern pairs, where A_k = [A_k1, A_k2, ..., A_kN] is an N-dimensional bipolar vector and B_k = [B_k1, B_k2, ..., B_kP] is a P-dimensional bipolar vector, for k = 1, 2, ..., M. The conditions on M (the storage capacity limit) will be addressed in the next section.

The connection matrix W = [W_ij] is conventionally determined by using the outer-product rule [11],

W = Σ_{k=1}^{M} A_k^T B_k (1)

where T denotes the transpose of a vector or matrix.

In the recall procedure, the next states X_i' and Y_j' are specified by

X_i' = sgn(Σ_{j=1}^{P} W_ij Y_j), i = 1, 2, ..., N (2a)

Y_j' = sgn(Σ_{i=1}^{N} W_ij X_i), j = 1, 2, ..., P (2b)

where the threshold logic function sgn(x) is defined as

sgn(x) = +1 if x > 0, -1 if x < 0 (3)

It seems reasonable that the BAM should recall one of the nearest patterns A_k&B_k in the sense of the Hamming distance when any pattern X&Y is input to the system. Because the CAM permits the recall of information on the basis of partial knowledge of its content, one always wants to obtain the stored pattern that most strongly resembles the input.
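As a concrete illustration, the outer-product rule (1) amounts to a few lines of code. The sketch below is ours, not the paper's: the helper name `outer_product_W` and the list-of-pairs input format are assumptions.

```python
# Sketch of Kosko's outer-product rule (1): W = sum_k A_k^T B_k.
# `pairs` is a list of (A_k, B_k) with bipolar (+1/-1) entries.
def outer_product_W(pairs):
    N, P = len(pairs[0][0]), len(pairs[0][1])
    # W_ij accumulates A_ki * B_kj over all training pairs k
    return [[sum(A[i] * B[j] for A, B in pairs) for j in range(P)]
            for i in range(N)]
```

For a single pair A = [1, -1, 1], B = [1, -1], this yields the rank-one matrix W = A^T B = [[1, -1], [-1, 1], [1, -1]].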

Starting from any input X & Y, we can generate a finite sequence X & Y', X' & Y', X' & Y'', X'' & Y'', ..., using the recall rule (2), where

(4) Y’ = sgn(XW) and X’ = sgn(Y’WT)

until a stable state X * & Y* is reached,

Y* = sgn(X*W) and X* = sgn(Y*WT) (5)

It was shown [11] that every BAM matrix is bidirectionally stable. In other words, for any connection matrix W, starting anywhere, the network will converge to a stable state within a finite number of unit-state updates. This conclusion follows from the construction of an energy function (i.e., a Lyapunov function or Hamiltonian),

E ( X , Y ) = -XWYT (6)

whose value decreases or remains constant during the recall process. From this viewpoint, if A_k&B_k does not form a local minimum of E(X, Y), this pattern will never be retrieved, even if the initial condition is X & Y = A_k&B_k.
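The recall dynamics (2)-(5) can be sketched as follows. This is an illustrative sketch, not the paper's code; in particular, mapping sgn(0) to +1 is a tie-breaking choice that definition (3) leaves open.

```python
# Sketch of bidirectional recall (2)-(5): alternate Y = sgn(XW) and
# X = sgn(Y W^T) until a stable pair (X*, Y*) is reached.
def sgn(values):
    # threshold logic (3); mapping 0 to +1 is our assumption
    return [1 if v >= 0 else -1 for v in values]

def recall(W, X, max_sweeps=100):
    N, P = len(W), len(W[0])
    Y = sgn([sum(W[i][j] * X[i] for i in range(N)) for j in range(P)])
    for _ in range(max_sweeps):
        X_new = sgn([sum(W[i][j] * Y[j] for j in range(P)) for i in range(N)])
        Y_new = sgn([sum(W[i][j] * X_new[i] for i in range(N)) for j in range(P)])
        if X_new == X and Y_new == Y:  # stable state, as in (5)
            break
        X, Y = X_new, Y_new
    return X, Y
```

With W encoded from the single pair A = [1, -1, 1], B = [1, -1] by (1), recall(W, A) returns the stored pair, and a one-bit corrupted input such as [-1, -1, 1] is also attracted to it.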

Note that the outer-product approach (1) does not ensure that all training patterns are at local minima [19], [20], and the criterion (C1) is mandatory for the BAM to operate properly.

III. OPTIMAL LEARNING STRATEGY BASED ON OPTIMALITY CRITERION (C1)

In this section, we describe a learning algorithm that guarantees the storage of all training patterns based on the optimality criterion (C1), and study the storage capacity of the BAM.

According to the recall rule (2), a pattern X&Y is a stable state if and only if the following conditions hold

X_i = sgn(Σ_{j=1}^{P} W_ij Y_j), i = 1, 2, ..., N (7a)

Y_j = sgn(Σ_{i=1}^{N} W_ij X_i), j = 1, 2, ..., P (7b)


The matrices used in the matrix-vector equation (11) below are built as follows: s_k is the N × NP matrix whose i-th row carries B_k1, B_k2, ..., B_kP in the positions of W_i1, ..., W_iP (and zeros elsewhere); t_k is the P × NP matrix whose j-th row carries A_k1, A_k2, ..., A_kN in the positions of W_1j, ..., W_Nj; Z stacks s_1, ..., s_M, t_1, ..., t_M into an M(N+P) × NP matrix; and C = [(A_1)^T ... (A_M)^T (B_1)^T ... (B_M)^T]^T is the corresponding M(N+P) × 1 right-hand side.

Hence, the number of stable states in the BAM is equal to the number of pattern pairs that satisfy (7). From this necessary and sufficient condition, we propose the following lemma to define a stable state in the BAM.

Lemma: A sufficient condition for the pattern X&Y to be a stable state in the BAM is that

α_i = (Σ_{j=1}^{P} W_ij Y_j) X_i > 0, i = 1, 2, ..., N (8a)

β_j = (Σ_{i=1}^{N} W_ij X_i) Y_j > 0, j = 1, 2, ..., P (8b)

Proof: See Appendix A.

Combining the above lemma with the optimality criterion (C1), we get the following corollary.

Corollary: A sufficient condition for each of the M training patterns {A_1&B_1, A_2&B_2, ..., A_M&B_M} to be a stable state in the BAM is that

α_ki = (Σ_{j=1}^{P} W_ij B_kj) A_ki > 0, i = 1, 2, ..., N; k = 1, 2, ..., M (9a)

β_kj = (Σ_{i=1}^{N} W_ij A_ki) B_kj > 0, j = 1, 2, ..., P; k = 1, 2, ..., M (9b)

Therefore, to store all the patterns, the learning algorithm must find the connection matrix W that best fits the linear inequalities (9). Unfortunately, the outer-product learning rule does not really meet this fundamental requirement.
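The inequalities (9) are straightforward to check numerically. The helper below is our own sketch (name and interface assumed), returning the margins α_ki and β_kj for one training pair; the pair satisfies the sufficient condition iff every margin is positive.

```python
# Sketch: stability margins of (8)/(9) for one pair (A, B) under W.
def stability_margins(W, A, B):
    N, P = len(W), len(W[0])
    alpha = [sum(W[i][j] * B[j] for j in range(P)) * A[i] for i in range(N)]
    beta = [sum(W[i][j] * A[i] for i in range(N)) * B[j] for j in range(P)]
    return alpha, beta
```

For a single stored pair under the outer-product rule, every α_ki equals P and every β_kj equals N, so the pair is comfortably stable.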

To investigate the storage capacity of the BAM, we first convert the linear inequality problem (9) into a linear equality problem. Let α_ki = a_ki and β_kj = b_kj, where a_ki and b_kj are arbitrary positive values, for example, a_ki = b_kj = +1. We obtain the expanded form,

Σ_{j=1}^{P} W_ij B_kj = a_ki A_ki = A_ki,  Σ_{i=1}^{N} W_ij A_ki = b_kj B_kj = B_kj (10)

where the last step is based on the fact that A_ki, B_kj = -1 or +1.

Arranging the connection matrix W into vector form V, we can re-express (10) as the matrix-vector equation,

Z V = C (11)

where V = [W_11, ..., W_1P, W_21, ..., W_NP]^T is the NP × 1 weight vector, Z is the M(N+P) × NP coefficient matrix, and C is the M(N+P) × 1 right-hand side.

Theorem 1: If M ≤ NP/(N+P) and the row vectors of Z are linearly independent, there exists at least one solution to the matrix-vector equation (11). In this case, each training pattern is a stable state in the BAM.

Proof: See Appendix B.

The meaning of the theorem is that, under the linear independence condition, the BAM can learn as many as ⌊NP/(N+P)⌋ training patterns by solving (11), where ⌊x⌋ denotes the integer part of x. That is, the storage capacity can reach as high as ⌊NP/(N+P)⌋. As argued in the construction of the HAM [5], [14], the linear independence condition is weak. Note that theorem 1 is not restricted to a specific learning algorithm.

Because the matrix inverse is involved, solving the matrix-vector equation (11) is computationally expensive, especially when N, P and M are large. To reduce the computational burden, we have developed an efficient iterative approach that
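For concreteness, the capacity bound of theorem 1 is trivial to evaluate (the function name below is ours):

```python
# Sketch: storage-capacity bound of theorem 1, floor(NP / (N + P)).
def capacity_bound(N, P):
    return (N * P) // (N + P)
```

For example, with N = P = 100 the BAM can store up to 50 pattern pairs under the linear independence condition.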



minimizes an objective function. The objective function J(W) measures the stability of each training pattern,

J(W) = Σ_{(k,i)∈S_x} (-α_ki) + Σ_{(k,j)∈S_y} (-β_kj) (12)

where the error sets S_x and S_y are defined as follows,

S_x = {(k, i) | α_ki ≤ 0, k = 1, 2, ..., M; i = 1, 2, ..., N} (13a)

S_y = {(k, j) | β_kj ≤ 0, k = 1, 2, ..., M; j = 1, 2, ..., P} (13b)

Obviously, the sum in (12) is over all the terms that satisfy α_ki ≤ 0 or β_kj ≤ 0, i.e., that violate the inequalities (9) (this is what we mean by 'error').

The objective function J(W) has the following properties:
1) J(W) is a function of the connection matrix W and the training patterns;
2) If all α_ki > 0 and β_kj > 0, then J(W) = 0, because S_x and S_y are empty (no error exists);
3) If there exists some α_ki ≤ 0, for example, then J(W) ≥ -α_ki ≥ 0.

Hence, in order to memorize all the training patterns, i.e., to hold α_ki > 0 and β_kj > 0, it is natural to define the global minimization

W* = arg min_W J(W) (14)

We follow the gradient descent approach to estimate the connection matrix W. Starting from W_ij(t = 0) = 0, the iterative procedure is

W_ij(t+1) = W_ij(t) - ∂J(W)/∂W_ij (15)

where W_ij(t) is the t-th estimate, and the derivative of J(W) w.r.t. W_ij is given by

∂J(W)/∂W_ij = Σ_{(k,i)∈S_x} (-∂α_ki/∂W_ij) + Σ_{(k,j)∈S_y} (-∂β_kj/∂W_ij)
= Σ_{(k,i)∈S_x} (-A_ki B_kj) + Σ_{(k,j)∈S_y} (-A_ki B_kj) (16a)
= -Σ_{k=1}^{M} A_ki B_kj [S(α_ki) + S(β_kj)] (16b)

where S(x) = 0 if x > 0, and S(x) = 1 if x ≤ 0. The purpose of using S(x) is merely to simplify the expression; it restricts the sum in (16) to the terms that satisfy (k,i) ∈ S_x or (k,j) ∈ S_y. The procedure defined by (15) aims at reducing the error, since the summation in (16) is over the error sets S_x and S_y.

The objective function (12) is essentially a perceptron criterion function (PCF) [3], [6]. The main difference between the PCF and other error criterion functions, such as the minimum-squared-error (MSE) criterion, is that the former considers only the misclassified pattern samples while the latter considers all pattern samples. The major advantage of using the PCF is its convergence theorem [6], i.e., its gradient descent procedure can converge to the right solution if a solution exists. This gradient descent dynamics was successfully used for the HAM [9].

The iterative procedure (15) is an extension of the perceptron's gradient descent process to the BAM. It will be further generalized in later sections to consider basins of attraction in the BAM, where its convergence will be shown in theorem 4. Because the sets S_x and S_y change with each iteration, one has to calculate (16a) using the following two steps:
a) Given the t-th estimate W_ij(t), find the sets S_x and S_y based on (13);
b) Calculate (16a) by summing the terms in S_x and S_y.
This procedure can also be directly implemented using (16b) without establishing S_x and S_y.

The weakness of the iterative procedure (15) is that it has to sum the partial contributions of all training patterns before modifying the connection matrix W. This makes it impossible to carry out (15) in real time, especially when M is large. An alternate way is to update the connection matrix after each training pattern has been presented to the BAM. In this situation, a learning sequence consisting of all the training patterns can be arranged as follows:

A_1&B_1, A_2&B_2, ..., A_M&B_M, ..., A_2M&B_2M, A_{2M+1}&B_{2M+1}, ...

where the (mM+k)-th pattern A_{mM+k}&B_{mM+k} is equal to the k-th pattern A_k&B_k for m = 0, 1, ....

The learning phase involves sequentially presenting these training patterns to the BAM and then updating the connection matrix: For the currently presented pattern A_t&B_t, we update the weights using

W_ij(t+1) = W_ij(t) + A_ti B_tj [S(α_ti) + S(β_tj)] (17)

obtained by dropping the sum in (16b).

When the learning method (17) converges ((17) is a special case of (22), whose convergence is proven in theorem 4), there is no change of any weight W_ij after presenting every training pattern to the BAM. At this time the linear inequalities (9) hold. That is because, if there existed some α_ti ≤ 0, for example, then S(α_ti) = 1 and ΔW_ij = W_ij(t+1) - W_ij(t) = A_ti B_tj ≠ 0, which contradicts convergence. Thus, according to the corollary, each training pattern is stored in the BAM, and a global minimum where J(W) = 0 is reached, since the error sets S_x and S_y are empty.

There are three points to be emphasized. First, the convergence of the iterative learning algorithm (17) means that a global minimum of J(W) has been reached, and a solution to (9) has been found. Second, theorem 1 gives the conditions under which a solution to (9) exists; these conditions depend only on the training patterns. Hence, a good learning algorithm should be able to find such a solution whenever it exists, and the learning method proposed here possesses this property (see theorem 4). Third, because the learning process (17) is based on the optimality criterion (C1), that is, the storage of each training pattern, the
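A minimal sketch of the per-pattern rule (17) follows. The loop structure and the stopping test (a full sweep with no weight change) are our own reading of the text; margins are recomputed just before each pattern's update, as (17) requires.

```python
# Sketch of learning rule (17): for the presented pair (A_t, B_t), add
# A_ti * B_tj * [S(alpha_ti) + S(beta_tj)] to W_ij, with S(x) = 1 iff x <= 0.
def train_C1(pairs, max_sweeps=100):
    N, P = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * P for _ in range(N)]
    for _ in range(max_sweeps):
        changed = False
        for A, B in pairs:
            alpha = [sum(W[i][j] * B[j] for j in range(P)) * A[i] for i in range(N)]
            beta = [sum(W[i][j] * A[i] for i in range(N)) * B[j] for j in range(P)]
            for i in range(N):
                for j in range(P):
                    s = (alpha[i] <= 0) + (beta[j] <= 0)
                    if s:
                        W[i][j] += A[i] * B[j] * s
                        changed = True
        if not changed:  # all inequalities (9) hold: every pattern stored
            break
    return W
```

On a small set of weakly correlated pairs this typically converges in a few sweeps, after which every margin α_ki and β_kj is positive.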



system can only assure the output A_k&B_k if the initial state is X & Y = A_k&B_k.

In fact, what we really want from the BAM is its ability to recall a stored pattern from its noisy input, i.e., asymptotic stability. This requires that each training pattern have a nontrivial basin of attraction around it.

IV. ASYMPTOTIC STABILITY ANALYSIS

In this part, we discuss the asymptotic stability of each training pattern and its basin of attraction. The Hamming distance (HD) seems to be a good choice to measure the size of a basin of attraction. Let H(A_k, X) and H(B_k, Y) be the Hamming distances between A_k&B_k and X&Y.

Theorem 2: If

0 < 2H(A_k, X) W_m ≤ D_y ≤ β_kj, j = 1, 2, ..., P (18a)

and

0 < 2H(B_k, Y) W_m ≤ D_x ≤ α_ki, i = 1, 2, ..., N (18b)

where W_m = max_{i,j} |W_ij|, then X&Y will converge to A_k&B_k in one iteration, and the basin of attraction of A_k&B_k is H_x = D_y/(2W_m) in the X-layer and H_y = D_x/(2W_m) in the Y-layer, respectively.

Proof: See Appendix C.

The meaning of theorem 2 is that, under a normalization of W_ij to limit its magnitude, for example, |W_ij| ≤ W_m, larger values of D_x and D_y imply larger basins of attraction in the BAM.

Since any input X&Y that falls into the basin of attraction H_x and H_y of A_k&B_k converges to A_k&B_k in one iteration, the minimal number of input patterns attracted by A_k&B_k is

Σ_{i=0}^{H_x} C(N, i) · Σ_{l=0}^{H_y} C(P, l)

where C(n, r) denotes the binomial coefficient. Obviously, increasing D_x and D_y or decreasing W_m will improve the performance of the BAM in terms of the asymptotic stability and the reduction of spurious states. However, there exists an upper limit of the basin of attraction.

Theorem 3: The upper limit of the basin of attraction is P/2 for the Y-layer and N/2 for the X-layer.

Proof: See Appendix C.

Intuitively, this conclusion is reasonable because the complement (-A_k)&(-B_k) of A_k&B_k is also a stable state of the BAM [11] according to the lemma. If H(A_k, X) > N/2 and H(B_k, Y) > P/2, then H(-A_k, X) ≤ N/2 and H(-B_k, Y) ≤ P/2, and X&Y will converge to (-A_k)&(-B_k). When H(A_k, X) = N/2 and H(B_k, Y) = P/2, X&Y converges to either A_k&B_k or (-A_k)&(-B_k).

On the basis of the above analysis, the total number of input patterns that can be attracted by one of the M training patterns is lower bounded by M Σ_{i=0}^{H_x} C(N, i) Σ_{l=0}^{H_y} C(P, l). Since the total number of inputs is 2^{N+P}, it holds that

M Σ_{i=0}^{H_x} C(N, i) Σ_{l=0}^{H_y} C(P, l) ≤ 2^{N+P} (19)

Thus, a large number of training patterns implies a small basin of attraction around each stored pattern. Conversely, a small number of training patterns allows a large basin of attraction.

V. OPTIMAL LEARNING STRATEGY FORMED AS AN UNCONSTRAINED MINIMIZATION

This section presents a learning method based on an unconstrained minimization, and shows its convergence.

We require that the Euclidean norm of the connection matrix W be fixed, namely, √(Σ_i Σ_j W_ij²) = 1. This means of normalization is common [1], [12]. In this case, to realize a nontrivial basin of attraction around each training pattern, it is desired that α_ki ≥ D_x > 0 and β_kj ≥ D_y > 0 under this constraint. The objective function of the learning rule can be defined as

J(W) = Σ_{(k,i)∈T_x} (D_x - α_ki) + Σ_{(k,j)∈T_y} (D_y - β_kj) (20a)

subject to

Σ_i Σ_j W_ij² = 1 (20b)

where T_x and T_y are of the form

T_x = {(k, i) | α_ki < D_x, k = 1, 2, ..., M; i = 1, 2, ..., N} (21a)

T_y = {(k, j) | β_kj < D_y, k = 1, 2, ..., M; j = 1, 2, ..., P} (21b)

The learning process involves sequentially presenting the training patterns to the BAM and updating the connection matrix. The gradient descent equation can also be derived using the technique in Section III. For the presented pattern A_t&B_t, we update the connection matrix W as follows,

W_ij(t+1) = W_ij(t) + η A_ti B_tj [S(α_ti - D_x) + S(β_tj - D_y)] (22)

where η is a learning rate.

There is no constraint (20b) imposed upon the weights W_ij during the iteration (22); thus it is an unconstrained minimization. The normalization (20b) is performed only at the end of the iterative procedure.

If (22) converges at time t = L, the degree of attraction (i.e., the size of the basin of attraction) can be measured by the attained margins, denoted O_x(L) and O_y(L) below.
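The update (22) (equivalently, the two-step form (24)) and the final normalization (20b) can be sketched as below; the sweep loop, stopping rule, and the place where normalization happens are our own reading of the text.

```python
# Sketch of rule (22)/(24): enforce margins alpha_ki >= D_x and
# beta_kj >= D_y, then normalize W to unit Euclidean norm, as in (20b).
def train_with_margins(pairs, D_x, D_y, eta=1.0, max_sweeps=500):
    N, P = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * P for _ in range(N)]
    for _ in range(max_sweeps):
        changed = False
        for A, B in pairs:
            alpha = [sum(W[i][j] * B[j] for j in range(P)) * A[i] for i in range(N)]
            beta = [sum(W[i][j] * A[i] for i in range(N)) * B[j] for j in range(P)]
            for i in range(N):
                for j in range(P):
                    s = (alpha[i] < D_x) + (beta[j] < D_y)
                    if s:
                        W[i][j] += eta * A[i] * B[j] * s
                        changed = True
        if not changed:
            break
    norm = sum(w * w for row in W for w in row) ** 0.5  # normalization (20b)
    return [[w / norm for w in row] for row in W]
```

Normalization only rescales W, so the sign of every margin, and hence stability, is preserved; the margins themselves shrink by the factor 1/||W||.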



For the currently presented pattern A_t&B_t, the update (22) can be carried out in two steps:

(a) if α_ti < D_x, update the i-th row of the connection matrix W,

W_ij(t+1) = W_ij(t) + η A_ti B_tj, j = 1, 2, ..., P (24a)

because S(α_ti - D_x) = 1. Otherwise, leave the weights unchanged.

(b) if β_tj < D_y, update the j-th column of the connection matrix W,

W_ij(t+1) = W_ij(t) + η A_ti B_tj, i = 1, 2, ..., N (24b)

because S(β_tj - D_y) = 1. Otherwise, leave the weights unchanged.

Theorem 4: The iterative learning algorithm (24) converges within a finite number of steps whenever there exists a solution.

Proof: See Appendix D.

It is noticed that the learning rate η affects the upper limit of N_e (see Appendix D). The larger the value of η, the fewer the number of iterations required by (24).

When the learning algorithm (24) converges, α_ki ≥ D_x and β_kj ≥ D_y, and hence a global minimum of J(W) = 0 is reached, because the sets T_x and T_y are empty. From the viewpoint of optimization, (24) is assured to find a global minimum of J(W) if a solution exists.

Theorem 5: The degree of attraction that results from the optimal learning algorithm (24) is lower bounded,

O_x(L) > h(η) O_x(*) (25a)

O_y(L) > h(η) O_y(*) (25b)

where h(η) = D_min/(η N_max + 2D_max).

Proof: See Appendix D.

Since h(η) is a monotonically decreasing function with respect to η, a smaller value of η implies a larger basin of attraction. When η approaches zero, h(η) approaches D_min/(2D_max).

The choice of the learning rate η is important in the construction of the BAM. There is a tradeoff in selecting η between the convergence rate and the degree of attraction. A feasible selection is η = 1/N_max. The values of D_x and D_y are determined by computing

H_a = min_{k≠l} H(A_k, A_l)/2 (26a)

H_b = min_{k≠l} H(B_k, B_l)/2 (26b)

and setting D_x = 2H_b W_m, D_y = 2H_a W_m.

VI. OPTIMAL LEARNING STRATEGY FORMED AS A CONSTRAINED MINIMIZATION

In this section, we describe a learning algorithm that imposes a constraint on the connection weights during the iterative process. To build a basin of attraction as large as possible, the maximal values of D_x and D_y are automatically selected.

The learning rule is again formed as the minimization of

J(W) = Σ_{(k,i)∈T_x} (D_x - α_ki) + Σ_{(k,j)∈T_y} (D_y - β_kj) (27a)

subject to

|W_ij| ≤ W_m or, equivalently, W_ij² ≤ W_m² (27b)

where T_x and T_y are given in (21).

This is a constrained minimization problem, and the constraint (27b) defines a subspace of the multidimensional parameter space called the feasibility region. If D_x = D_y = 0 and W_m = +∞, we recover the objective function (12).

To solve this constrained minimization problem, we convert it into an equivalent unconstrained problem using a pseudocost function E(W) [17], [18],

E(W) = J(W) + γ P(W) (28)

where P(W) is a penalty function and γ is a real parameter called a penalty multiplier.

In order to form penalty dominance outside the feasibility region and to guarantee the minimum of the pseudocost function E(W) inside this region, the penalty multiplier γ should be chosen large enough. Moreover, a valid penalty function P(W) has to monotonically increase as the connection weights move away from the feasibility region. The square penalty function fulfills this requirement,

P(W) = Σ_i Σ_j (W_ij² - W_m²) T(W_ij² - W_m²) (29)

where T(z) = 1 if z > 0 and T(z) = 0 if z ≤ 0.

According to the gradient descent method, we obtain the iterative procedure,

W_ij(t+1) = W_ij(t) - η ∂E(W)/∂W_ij (30a)

∂E(W)/∂W_ij = ∂J(W)/∂W_ij + γ ∂P(W)/∂W_ij (30b)

∂P(W)/∂W_ij = 2W_ij T(W_ij² - W_m²) (30c)

where η is a learning rate.

Since P(W) is piecewise differentiable and is not differentiable at W_ij = ±W_m, we define ∂P(W)/∂W_ij = 0 for W_ij = ±W_m in (30c). Using T(z) imposes a penalty upon E(W) only when a weight W_ij is outside the feasibility region. In other words, when W_ij > W_m, the penalty (30c) tends to decrease W_ij, and when W_ij < -W_m, it tends to increase W_ij. This is exactly the effect of the penalty that one needs.



In fact, (27) is a linearly constrained minimization, since (27b) can be rewritten as

-W_m ≤ W_ij and -W_m ≤ -W_ij (31)

This can also be solved by the active set method [8].

To avoid the computation of the penalty function (30), we propose a very simple iterative learning rule as follows: For the currently presented pattern A_t&B_t,

(a) if α_ti < D_x, update the i-th row of the connection matrix W,

W_ij(t + 1) = Φ{W_ij(t) + ηA_ti B_tj},  j = 1, 2, ..., P   (32a)

Otherwise, leave the weights unchanged.

(b) if β_tj < D_y, update the j-th column of the connection matrix W,

W_ij(t + 1) = Φ{W_ij(t) + ηA_ti B_tj},  i = 1, 2, ..., N   (32b)

Otherwise, leave the weights unchanged. Φ(x) is a hard limiter-type function,

Φ(x) = −W_m if x < −W_m;  x if −W_m ≤ x ≤ W_m;  W_m if x > W_m   (33)

The use of Φ(x) is to exactly fulfill the constraint (27b) and to remove the calculation of the penalty function in (30). In fact, the iterative procedure (32) is simply a modification of (24) that adds Φ(x) to limit the magnitude of each weight.
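Under our reading of (32)-(33), one presentation of a pattern pair (A_t, B_t) can be sketched as follows; the helper names (`phi`, `update_pair`) are ours, not the paper's.

```python
import numpy as np

# Sketch of the clipped iterative rule (32)-(33): W is the N-by-P connection
# matrix, (A, B) the current +/-1 pattern pair, eta the learning rate, and
# w_m the weight bound. (Illustrative names; not the authors' code.)

def phi(x, w_m):
    """Hard limiter (33): clip into [-W_m, W_m]."""
    return np.clip(x, -w_m, w_m)

def update_pair(W, A, B, d_x, d_y, eta, w_m):
    """Present (A, B) once under rule (32); report whether W changed."""
    changed = False
    alpha = A * (W @ B)                  # alpha_i = A_i * sum_j W_ij B_j
    for i in np.where(alpha < d_x)[0]:   # (32a): update the i-th row
        W[i, :] = phi(W[i, :] + eta * A[i] * B, w_m)
        changed = True
    beta = B * (W.T @ A)                 # beta_j = B_j * sum_i W_ij A_i
    for j in np.where(beta < d_y)[0]:    # (32b): update the j-th column
        W[:, j] = phi(W[:, j] + eta * A * B[j], w_m)
        changed = True
    return W, changed
```

Repeatedly presenting all pairs until no weight changes gives the convergence test used by the algorithm summarized at the end of this section.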

If the constrained problem (27) does not admit any solution, the iterative procedure (32) will not converge. This property motivates making the basins of attraction as large as possible: we need to choose the maximal values of D_x and D_y with which (32) can still converge.

As can be imagined, an appropriate global choice for D_x and D_y may not always be possible, since the distribution and the number of the training patterns are unpredictable. Weakly correlated patterns allow a large basin of attraction around each stored pattern, while strongly correlated patterns allow only a small one. When D_x and D_y are selected too large, the constrained problem (27) does not admit a solution. Conversely, when D_x and D_y are selected too small, the basins of attraction attained by the optimal learning algorithm will be small.

An efficient way to solve this problem is to start from small D_x and D_y, and to gradually increase them until the learning rule fails to converge. The basin of attraction of each training pattern is gradually enlarged as D_x and D_y increase. This is suitable for any set of training patterns because there is no need to guess D_x and D_y.

Now, we summarize this optimal learning algorithm as follows:

Step 1: Initialization. Encode the training patterns, and set the initial weights W_ij(0) = 0, D_x = D_y = 0, and W_m.

Step 2: Iteration with D_x and D_y. Given the t-th estimates W_ij(t), calculate W_ij(t + 1) based on (32), and count the number of iterations n(D_x, D_y) for the current values of D_x and D_y.

Step 3: If the iteration (32) converges with the iteration count n(D_x, D_y) less than a prespecified number, then increase D_x by ΔD_x and D_y by ΔD_y, and go to Step 2. Otherwise, carry out a retrieval process with the maximal values D_x = D_x − ΔD_x and D_y = D_y − ΔD_y.

Fig. 2. Training patterns.
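As a sketch, Steps 1-3 can be put together as follows, with D_x = D_y increased by a fixed step until the iteration stops converging within a preset budget; all names (`train_epoch`, `optimal_learning`, `max_iters`) are illustrative choices of ours, not the paper's.

```python
import numpy as np

# Sketch of Steps 1-3: train with the clipped rule (32) and raise D_x = D_y
# until rule (32) no longer converges within max_iters epochs; the last
# convergent weights are kept. (Our illustrative reconstruction.)

def train_epoch(W, patterns, d_x, d_y, eta, w_m):
    """Present every pair once; return True if no weight changed."""
    stable = True
    for A, B in patterns:
        alpha = A * (W @ B)
        for i in np.where(alpha < d_x)[0]:            # (32a)
            W[i, :] = np.clip(W[i, :] + eta * A[i] * B, -w_m, w_m)
            stable = False
        beta = B * (W.T @ A)
        for j in np.where(beta < d_y)[0]:             # (32b)
            W[:, j] = np.clip(W[:, j] + eta * A * B[j], -w_m, w_m)
            stable = False
    return stable

def optimal_learning(patterns, n, p, eta=1.0, w_m=10.0, d_step=10.0,
                     max_iters=100):
    W = np.zeros((n, p))                  # Step 1: W_ij(0) = 0, D = 0
    d = 0.0
    best_W, best_d = W.copy(), d
    while True:
        d += d_step                       # Step 3: raise D_x = D_y
        it = 0                            # Step 2: iterate rule (32)
        while it < max_iters and not train_epoch(W, patterns, d, d, eta, w_m):
            it += 1
        if it >= max_iters:               # failed to converge: back off
            return best_W, best_d
        best_W, best_d = W.copy(), d
```

The returned `best_d` plays the role of the maximal D_x = D_x − ΔD_x of Step 3.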

VII. EXPERIMENTAL RESULTS

In this section, we describe a large number of simulation results. The optimal learning method has been compared with the outer-product rule, the SMT with the upper bound U = 100 [20], and the LP/MT method. We did not compare with the dummy augmentation since it has to add dummy elements (that is, units with specific states) that change the structure of the BAM.

In every experiment, we set W_m = 10 and ΔD_x = ΔD_y = W_m for the optimal learning algorithm.

A. Histogram of Successful Recalls for Weakly Correlated Patterns

In a pattern recognition problem, one wants to recognize airplanes, tanks, and helicopters. One needs to train the BAM in advance with these specific patterns, and then to recall them from noisy inputs. Fig. 2 shows the three training patterns for this application, which are similar to the data set used in [19].

We generate 1500 noisy inputs (500 for every pattern) by randomly inverting each bit from +1 to −1 and vice versa with probability less than or equal to R = 0.5. For a noisy input, if the corresponding stored pattern pair is recalled, it is a success; otherwise it is a failure.
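The noise model and the success test can be sketched as follows. The bidirectional update X = Sgn(WY), Y = Sgn(WᵀX) is our reading of the recall equation (2); the helper names are ours.

```python
import numpy as np

# Sketch of the recall test: flip each bit of a stored +/-1 pair with
# probability r, run the bidirectional search to a fixed point, and compare
# the result against the stored pair. (Illustrative names.)

def add_noise(pattern, r, rng):
    """Invert each bit independently with probability r."""
    flips = rng.random(pattern.shape) < r
    return np.where(flips, -pattern, pattern)

def recall(W, X, Y, max_steps=50):
    """Bidirectional search X -> Y -> X -> ... until (X, Y) stops changing."""
    sgn = lambda v: np.where(v >= 0, 1.0, -1.0)
    for _ in range(max_steps):
        X_new = sgn(W @ Y)
        Y_new = sgn(W.T @ X_new)
        if np.array_equal(X_new, X) and np.array_equal(Y_new, Y):
            break
        X, Y = X_new, Y_new
    return X, Y
```

A trial counts as a success when `recall` returns exactly the stored pair.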

The results of successful recalls are given in Fig. 3. The optimal learning algorithm terminates with the maximal values D_x = D_y = 160, and the total number of iterations is 36. By one iteration, we mean presenting each training pattern once to the BAM. It is noted that, if the outer-product method can store all the training patterns, it has the same performance as the SMT, since both learning rules obtain the same connection matrix W.

From the experimental results, one can find that our method has a stronger noise correction capability. More specifically,


Fig. 3. Percentages of successful recalls versus the error rate R. "○": the outer-product rule; "□": the LP/MT rule; "×": the optimal learning algorithm.

the outer-product method works well (more than 90 percent of inputs recall the stored patterns) if the error ratio R ≤ 0.05. This ratio is increased to R ≤ 0.1 by the LP/MT, and to R ≤ 0.35 by the optimal learning rule. Some recall results of the BAM trained by our learning rule are shown in Fig. 4.

B. Histogram of Successful Recalls for Strongly Correlated Patterns

Suppose that six training patterns ( M = 6) of size N = P = 35, shown in Fig. 5, need to be stored. A Hamming distance similarity matrix that shows the correlations of the training patterns is given in Table I. Compared with the number of the units ( N + P = 70), the minimal Hamming distance between two different patterns is small.
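Table I's entries can be computed directly: for ±1 vectors u, v of length L, the Hamming distance is (L − u·v)/2, and each element is H(Ck, Cl) = H(Ak, Al) + H(Bk, Bl). A minimal sketch (names ours):

```python
import numpy as np

# Sketch of the Hamming-distance similarity matrix of Table I: for +/-1
# vectors u and v of length L, H(u, v) = (L - u.v)/2, and each table
# element is H(Ck, Cl) = H(Ak, Al) + H(Bk, Bl). (Illustrative names.)

def hamming(u, v):
    """Hamming distance between two +/-1 vectors of equal length."""
    return int((len(u) - round(float(u @ v))) // 2)

def similarity_matrix(pairs):
    """pairs: list of (A_k, B_k) +/-1 arrays; returns the M-by-M table."""
    m = len(pairs)
    H = np.zeros((m, m), dtype=int)
    for k in range(m):
        for l in range(m):
            H[k, l] = (hamming(pairs[k][0], pairs[l][0])
                       + hamming(pairs[k][1], pairs[l][1]))
    return H
```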

The BAM trained by the outer-product method (1) is unable to store any of these patterns. Taking the training patterns as inputs, we reach the same output, which is a spurious state, shown in Fig. 6. Neither the SMT nor the LP/MT method can store all of these training patterns. This means that those methods are incapable of coping with strongly correlated patterns.

Using 16 iterations, the optimal learning approach converges with the maximal values D_x = D_y = 20. For 3000 noisy inputs, the percentages of successful recalls are given in Table II. This learning algorithm realizes correct recalls for more than 90 percent of inputs when the error rate R ≤ 0.15.

It is evident that our learning strategy not only stores all these highly correlated patterns, but also forms large basins of attraction. Some examples of the recall procedure are depicted in Fig. 7.

Compared with experiment A, we see that the optimal learning algorithm automatically chooses the maximal values of D_x and D_y. It assigns larger D_x and D_y to the weakly correlated training patterns than to the strongly correlated ones.

C. Storage Capacity Test

In this part, we demonstrate the storage capacity improvement achieved by the optimal learning algorithm. The experimental data are set up in the same way as Wang et al. [19].

In the simulations, we adopt randomly generated training patterns. With fixed N = P = 50, the number of training patterns M is varied from 1 to 25 in increments of 2. For each parameter setting (N, P, M), 100 sets of training data are generated randomly. For each set, if all the patterns can be retrieved from noise-free inputs, we declare a success; otherwise a failure. The percentages of successes are shown in Fig. 8, and the average number of iterations for the optimal learning rule is given in Table III.
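The protocol can be sketched with the outer-product rule (1) as the baseline encoder; the optimal rule of the previous section would simply replace `encode` here. All names are ours.

```python
import numpy as np

# Sketch of the capacity-test protocol: generate random +/-1 pattern sets
# and declare a set a success when every pair is recovered from a
# noise-free input. Here the encoder is Kosko's outer-product rule (1);
# any other learning rule could be dropped in instead. (Illustrative.)

def sgn_rand(rng, n):
    """Random +/-1 vector of length n."""
    return np.where(rng.random(n) < 0.5, -1.0, 1.0)

def encode(pairs):
    """Outer-product rule (1): W = sum_k A_k B_k^T."""
    return sum(np.outer(A, B) for A, B in pairs)

def all_recalled(W, pairs):
    """True if every stored pair is a fixed point of one-step recall."""
    sgn = lambda v: np.where(v >= 0, 1.0, -1.0)
    return all(np.array_equal(sgn(W @ B), A)
               and np.array_equal(sgn(W.T @ A), B) for A, B in pairs)

def success_rate(n, p, m, trials, seed=0):
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        pairs = [(sgn_rand(rng, n), sgn_rand(rng, p)) for _ in range(m)]
        wins += all_recalled(encode(pairs), pairs)
    return wins / trials
```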

Compared with the outer-product rule, the SMT, and the LP/MT, our learning approach offers higher storage capacity. For instance, the outer-product method realizes over 90 percent of successful recalls if M ≤ 5. The LP/MT extends this to M ≤ 7, and the optimal learning rule improves it to M ≤ 23 (Theorem 1 states that the storage capacity in the case of linear independence is NP/(N + P) = 25).

VIII. CONCLUSION AND DISCUSSION

A learning method that depends on several optimality criteria has been presented. These optimality criteria include not only storing all training patterns, but also storing them


Fig. 4. Recall results with respect to (a) R = 0.1; (b) R = 0.1; (c) R = 0.1; (d) R = 0.3; (e) R = 0.3 in the BAM trained by the optimal learning method.

TABLE I
THE HAMMING DISTANCE SIMILARITY MATRIX, WHERE EACH ELEMENT EQUALS H(Ck, Cl) = H(Ak, Al) + H(Bk, Bl)

      C1  C2  C3  C4  C5
C2     -
C3    22  24
C4    18  18  14
C5    23  19  23  23
C6    18  26  20  20  13

with optimal stability. In the global minimization, the gradient descent rule is employed. The theoretical development, which covers the storage capacity of the BAM, the asymptotic stability of each training pattern and its basin of attraction, and the convergence of the learning rule, has provided a substantial basis for our learning strategy.

Fig. 5. Training patterns.

Fig. 6. Recall results in the BAM trained by the outer-product learning rule.

TABLE II
PERCENTAGES OF SUCCESSFUL RECALLS IN THE BAM TRAINED BY THE OPTIMAL LEARNING RULE

R        0.00  0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  0.45  0.50
Recalls  100%  99%   94%   91%   82%   74%   55%   39%   21%   14%   4%

TABLE III
AVERAGE NUMBER OF ITERATIONS AS A FUNCTION OF M IN THE OPTIMAL LEARNING RULE

M           1    3    5    7    9    11   13   15   17   19   21   23   25
Iterations  1.0  1.4  1.9  2.0  2.0  2.1  2.2  2.6  3.1  3.7  4.4  6.5  12.4

The results of simulations have been reported in comparison with the outer-product rule, the SMT, and the LP/MT, in terms of the ability to deal with both weakly and strongly correlated patterns, the correction of noisy inputs, and the storage capacity. For these purposes, the probabilities of successful recall as a function of the Hamming distance between the initial condition and the corresponding training pattern were recorded, and the percentages of randomly generated training-pattern sets that could be stored completely were given as a function of their number, together with the average number of iterations. In all these simulations, the optimal learning method shows clear advantages over the other approaches.

The simplicity of the proposed learning strategy leaves ample room for improvements that can increase the global performance. One improvement involves assigning different values of D_x and D_y to the individual training patterns. This may establish a different size of basin of attraction around each pattern: training patterns sharing fewer common bits with others may have larger basins of attraction, whereas training patterns sharing more common bits with others may have smaller basins of attraction. Another improvement involves increasing D_x and D_y in sequence and independently, instead of at the same time. This is important for representing the respective degrees of correlation of the X-layer and Y-layer, especially when N and P are quite different.

It has already been argued in [15] that the BAM and the HAM are not very different. With little modification, the learning algorithm can be applied to the HAM.

Although finding large values of D_x and D_y helps reduce spurious states, minimizing the number of spurious states remains an open problem. Our further work is to include the criterion (C3) in the design of the learning algorithm.


Fig. 7. Recall results with respect to (a) R = 0.2; (b) R = 0.2; (c) R = 0.2; (d) R = 0.2; (e) R = 0.2; (f) R = 0.2 in the BAM trained by the optimal learning method.

APPENDIX A

Proof of Lemma: Based on the value of X_i, (8a) involves two cases:

(1) X_i = +1 and Σ_j W_ij Y_j > 0;
(2) X_i = −1 and Σ_j W_ij Y_j < 0.

In both cases, there holds X_i′ = Sgn(Σ_j W_ij Y_j) = X_i. This means that, when one starts from the initial state X&Y, there is no change of the state X_i. In the same way, we can show from (8b) that there is no change of the state Y_j. Hence, X&Y is a stable state in the BAM. Q.E.D.

APPENDIX B

Proof of Theorem 1: Based on the theorem hypotheses, the following rank condition holds,

Rank(Z) = min(M(N + P), NP) = M(N + P)   (B.1)

The expansion matrix of Z is U = [Z, C], which concatenates the matrix Z and the vector C. The rank of U satisfies

Rank(Z) ≤ Rank(U) ≤ M(N + P)   (B.2)

Combining (B.1) with (B.2), we get

Rank(Z) = Rank(U) = M(N + P)   (B.3)

In fact, if M(N + P) < NP, equation (11) represents an underdetermined system, and thus has an infinite number of solutions. If M(N + P) = NP, it has a unique solution. Thus, the matrix-vector equation (11) has at least one solution V (or W). According to the corollary, all training patterns are stored in the BAM as stated. Q.E.D.

APPENDIX C

Proof of Theorem 2: Let H1 = H(Ak, X) and H2 = H(Bk, Y); then

X_i = −A_ki if i ∈ I1 = {i_1, i_2, ..., i_H1}, and X_i = A_ki otherwise,
Y_j = −B_kj if j ∈ I2 = {j_1, j_2, ..., j_H2}, and Y_j = B_kj otherwise.

According to the recall equation (2), the next state X_i′ is

X_i′ = Sgn[Σ_j W_ij Y_j] = Sgn[Σ_j W_ij B_kj − 2 Σ_{j∈I2} W_ij B_kj]   (C.1)

for i = 1, 2, ..., N. The recall X_i′ = A_ki can be satisfied if

A_ki [Σ_j W_ij B_kj − 2 Σ_{j∈I2} W_ij B_kj] > 0   (C.2)

From the assumption (18b), we obtain the following two inequalities,

α_ki = A_ki Σ_j W_ij B_kj ≥ D_x   (C.3)

and

|2 A_ki Σ_{j∈I2} W_ij B_kj| ≤ 2 H2 W_m   (C.4)

where in the last step W_m = max |W_ij| is used. Obviously, (C.2) holds if

D_x ≥ 2 H2 W_m   (C.5)


Fig. 8. Results of the storage capacity test, as a function of the number of training patterns M. "○": the outer-product rule; "△": the SMT method; "□": the LP/MT rule; "×": the optimal learning algorithm.

which is given by (18b). The same proof can be made for (18a). Thus, after one iteration, X&Y converges to Ak&Bk. The stored pattern (Ak, Bk) is asymptotically stable, with the basin of attraction lower bounded by H_x and H_y. Q.E.D.

Proof of Theorem 3: From the conditions in Theorem 2, we obtain

D_x ≤ α_ki = A_ki Σ_{j=1}^{P} W_ij B_kj ≤ P W_m   (C.6)

This inequality relates the important values D_x and W_m, and the upper limit of H(Bk, Y) is then

H(Bk, Y) ≤ D_x/(2 W_m) ≤ P/2   (C.7)

The same proof is applicable for the X-layer. Q.E.D.

APPENDIX D

Proof of Theorem 4: Let W(*) = [W_ij(*)] be an optimal connection matrix with the degrees of attraction

σ_x(*) = D_x/√(Σ_i Σ_j W_ij²(*))   (D.1)

and

σ_y(*) = D_y/√(Σ_i Σ_j W_ij²(*))   (D.2)

such that α_ki(*) ≥ D_x and β_kj(*) ≥ D_y.

It is assumed that, up to the time t, the number of occurrences of α_τi < D_x is N_x(τ, i), and the number of occurrences of β_τj < D_y is N_y(τ, j). Therefore, the total number of updates of the connection matrix W is equal to

N_e = Σ_τ Σ_i N_x(τ, i) + Σ_τ Σ_j N_y(τ, j)   (D.3)

Obviously, to show this theorem, we need only prove that N_e is finite. If the initial weight W_ij(0) = 0, the connection matrix W_ij(t) at the time t equals

W_ij(t) = η Σ_{τ=1}^{M} {N_x(τ, i) + N_y(τ, j)} A_τi B_τj   (D.4)

Let us consider

H(t) = Σ_i Σ_j W_ij(t) W_ij(*)   (D.5)

which represents the overlap between W(t) and W(*). Substituting (D.4) into (D.5), we obtain the lower limit of H(t),

H(t) = η Σ_τ Σ_i N_x(τ, i) A_τi Σ_j W_ij(*) B_τj + η Σ_τ Σ_j N_y(τ, j) B_τj Σ_i W_ij(*) A_τi ≥ η N_e D_min   (D.6)

where D_min = min{D_x, D_y}.


Both the Schwarz inequality and (D.1) provide the upper limit of H(t),

H(t) ≤ √(G(t)) √(Σ_i Σ_j W_ij²(*)) = √(G(t)) D_x/σ_x(*)   (D.7)

where G(t) is defined as

G(t) = Σ_i Σ_j W_ij²(t)   (D.8)

According to (24a), if α_τi < D_x, the change of the weight W_ij equals ΔW_ij = η A_τi B_τj for j = 1, 2, ..., P. The corresponding change of G(t) with respect to ΔW_ij is

ΔG = Σ_j {(W_ij + ΔW_ij)² − W_ij²} = η²P + 2η α_τi < η²P + 2η D_x   (D.9)

The change of G(t) caused by β_τj < D_y is equal to

ΔG = Σ_i {(W_ij + ΔW_ij)² − W_ij²} = η²N + 2η β_τj < η²N + 2η D_y   (D.10)

Based on (D.3), G(t) is upper bounded by

G(t) < (η²P + 2η D_x) Σ_τ Σ_i N_x(τ, i) + (η²N + 2η D_y) Σ_τ Σ_j N_y(τ, j) ≤ (η²N_max + 2η D_max) N_e   (D.11)

where N_max = max{N, P} and D_max = max{D_x, D_y}.

Using (D.6), (D.7), and (D.11), we obtain

N_e < D_x²(η²N_max + 2η D_max)/[η D_min σ_x(*)]²   (D.12)

Therefore, the iterative procedure (24) converges within a finite number of updates of the connection matrix W. Q.E.D.

Proof of Theorem 5: From (D.11) and (D.12), we get

√(G(t)) < D_x(η N_max + 2 D_max)/{D_min σ_x(*)}   (D.13)

Then, according to (23a), there holds

σ_x(t) = D_x/√(Σ_i Σ_j W_ij²(t)) > D_min σ_x(*)/(η N_max + 2 D_max)   (D.14)

The same proof applies to (25b) by using (D.2) in place of (D.1) in (D.7) and using (23b) in place of (23a) in (D.14). Q.E.D.

ACKNOWLEDGMENT

The authors are grateful to the reviewers for their suggestions to improve this paper.

REFERENCES

[1] L. F. Abbott and T. B. Kepler, "Optimal learning in neural network memories," J. Phys. A: Math. Gen., vol. 22, L711-L717, 1989.

[2] B. Bavarian, "Introduction to neural networks for intelligent control," IEEE Control Systems Magazine, pp. 3-7, April 1988.

[3] S. Z. Bow, Pattern Recognition and Image Preprocessing. Marcel Dekker, Inc., 1992.

[4] J. Bruck and V. P. Roychowdhury, "On the number of spurious memories in the Hopfield model," IEEE Trans. Inform. Theory, vol. 36, no. 2, pp. 393-397, 1990.

[5] S. Diederich and M. Opper, "Learning of correlated patterns in spin-glass networks by local learning rules," Phys. Rev. Lett., vol. 58, pp. 949-952, 1987.

[6] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.

[7] G. Dunning, E. Marom, Y. Owechko, and B. Soffer, "Optical holographic associative memory using a phase conjugate resonator," Proc. of the SPIE, vol. 625, pp. 205-213, 1986.

[8] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization. Academic Press, 1981.

[9] M. Griniasty and H. Gutfreund, "Learning and retrieval in attractor neural networks above saturation," J. Phys. A: Math. Gen., vol. 24, pp. 715-734, 1991.

[10] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA, vol. 79, pp. 2554-2558, 1982.

[11] B. Kosko, "Bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 18, no. 1, pp. 49-60, 1988.

[12] W. Krauth and M. Mezard, "Learning algorithms with optimal stability in neural networks," J. Phys. A: Math. Gen., vol. 20, L745-L752, 1987.

[13] A. N. Michel and J. A. Farrell, "Associative memories via artificial neural networks," IEEE Control Systems Magazine, pp. 6-17, April 1990.

[14] L. Personnaz, I. Guyon, and G. Dreyfus, "Collective computational properties of neural networks: new learning mechanisms," Phys. Rev. A, vol. 34, no. 5, pp. 4217-4228, 1986.

[15] P. K. Simpson, "Higher-ordered and intraconnected bidirectional associative memories," IEEE Trans. Syst., Man, Cybern., vol. 20, no. 3, pp. 637-653, 1990.

[16] P. K. Simpson, Artificial Neural Systems: Foundations, Paradigms, Applications and Implementations. Elmsford: Pergamon Press, 1990.

[17] G. V. Vanderplaats, Numerical Optimization Techniques for Engineering Design: With Applications. New York: McGraw-Hill, 1984.

[18] A. R. Vazquez, R. D. Castro, A. Rueda, J. L. Huertas, and E. S. Sinencio, "Nonlinear switched-capacitor neural networks for optimization problems," IEEE Trans. Circuits and Syst., vol. 37, no. 3, pp. 384-397, 1990.

[19] Y. F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "Two coding strategies for bidirectional associative memory," IEEE Trans. Neural Networks, vol. 1, no. 1, pp. 81-92, 1990.

[20] Y. F. Wang, J. B. Cruz, Jr., and J. H. Mulligan, Jr., "Guaranteed recall of all training pairs for bidirectional associative memory," IEEE Trans. Neural Networks, vol. 2, no. 6, pp. 559-567, 1991.

Tao Wang was born in Hangzhou, China, on November 14, 1967. He received the B.S. degree in Computer Science & Engineering from Zhejiang University, Hangzhou, China, in 1989, where he was directly admitted to pursue the M.S. degree without taking admission exams.

From 1989 to 1991, he worked as a research assistant at the Computer Vision Laboratory, Department of Computer Science & Engineering, Zhejiang University. Since 1991, he has been a research assistant at the Artificial Intelligence Institute, Zhejiang University. He is now a Ph.D. candidate in Computer Science & Engineering. His research interests include computer and robot vision, artificial neural networks, signal and image processing, and pattern recognition.



Xiaoliang Xing received the Ph.D. degree in Computer Science from the University of Strasbourg I, France, in 1987, and the DEA d'Informatique (equivalent to an M.A.Sc. in computer science) in 1984 from the University of Rennes I, France. In 1982 he received the B.Sc. in computer science from Zhejiang University, China.

From 1988 to 1989 he was a Lecturer in the Department of Computer Science & Engineering, Zhejiang University, China. Since 1990, he has been an Associate Professor and the Director of the Computer Vision Laboratory, Department of Computer Science & Engineering, Zhejiang University. His main interests include image processing, computer vision, and neural networks.

Xinhua Zhuang graduated from Peking University, China, in 1963, after completing an undergraduate and a graduate program in mathematics.

Before 1983, he served as a senior research engineer at the Research Institute for Computer Technology, Hangzhou, China. He was a visiting scientist of Electrical Engineering at the Virginia Polytechnic Institute and State University, Blacksburg, VA, from 1983-1984, and a visiting scientist of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, MI, supported by Machine Vision International, Ann Arbor, MI, from 1984-1985. He was selected as a consultant to the Advisory Group for Aerospace Research and Development, NATO, in 1985. He was a visiting research professor at the Coordinated Science Laboratory and a visiting professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign from 1985-1986, and a full professor of Computer Science and Engineering, Optical Instrument Engineering, and the director of The Center For Intelligent Systems Research at Zhejiang University, Hangzhou, China, from 1987-1989. He was a visiting professor of Electrical Engineering at the University of Washington at Seattle in 1989. Currently, he is an associate professor of Electrical and Computer Engineering at the University of Missouri-Columbia.

His professional interests lie in artificial intelligence, pattern recognition, computer and robot vision, signal and image processing, computer architecture, neural net computing, and applied mathematics. He is a contributor to three books, all published in the United States.