
AI & NN Notes

Chapter 9 Feedback Neural Networks


§9.1 Basic Concepts

Attractor: a state toward which the system evolves in time starting from certain initial conditions.

Basin of attraction: the set of initial conditions from which the evolution terminates in the attractor.

Fixed point: an attractor consisting of a single unique point in state space.

Limit cycle: an attractor consisting of a periodic sequence of states.


Hopfield Network and its Basic Assumptions

[Figure: a single-layer feedback (Hopfield) network of n neurons with outputs $v_1, \ldots, v_n$ fed back through weights $w_{12}, w_{21}, \ldots, w_{1n}, w_{n1}$, thresholds $T_1, \ldots, T_n$, and external inputs $i_1, \ldots, i_n$.]

1. One layer, n neurons.
2. $T_i$ -- threshold of neuron i.
3. $w_{ij}$ -- weight from neuron j to neuron i.
4. $v_j$ -- output of neuron j.
5. $i_i$ -- external input to the i-th neuron.


The total input of the i-th neuron is

$$net_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} v_j + i_i - T_i = W_i^t V + i_i - T_i, \qquad i = 1, 2, \ldots, n$$

where

$$W_i = \begin{bmatrix} w_{i1} \\ w_{i2} \\ \vdots \\ w_{in} \end{bmatrix}, \qquad V = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$


The complete matrix description of the linear portion of the system shown in the figure is given by

$$net = WV + i - t$$

where

$$net = \begin{bmatrix} net_1 \\ net_2 \\ \vdots \\ net_n \end{bmatrix}, \qquad i = \begin{bmatrix} i_1 \\ i_2 \\ \vdots \\ i_n \end{bmatrix}, \qquad t = \begin{bmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{bmatrix}$$

are the vectors containing the activations, the external inputs to each neuron, and the thresholds, respectively.


W is an $n \times n$ matrix containing the network weights:

$$W = \begin{bmatrix} W_1^t \\ W_2^t \\ \vdots \\ W_n^t \end{bmatrix} = \begin{bmatrix} 0 & w_{12} & w_{13} & \cdots & w_{1n} \\ w_{21} & 0 & w_{23} & \cdots & w_{2n} \\ w_{31} & w_{32} & 0 & \cdots & w_{3n} \\ \vdots & & & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \cdots & 0 \end{bmatrix}$$

with $w_{ij} = w_{ji}$ and $w_{ii} = 0$.


§9.2 Discrete-Time Hopfield Network

Assuming that the neuron's activation function is sgn, the transition rule of the i-th neuron would be

$$v_i = \begin{cases} -1, & \text{if } net_i < 0 \quad \text{(inhibited state)} \\ +1, & \text{if } net_i > 0 \quad \text{(excitatory state)} \end{cases} \qquad (*)$$

If, for a given time, only a single neuron is allowed to update its output and only one entry in vector v is allowed to change, this is an asynchronous operation, under which each element of the output vector is updated separately while taking into account the most recent values for the elements that have already been updated and remain stable.


Based on (*), the update rule of a discrete-time recurrent network, for one value of i at a time, becomes

$$v_i^{k+1} = \mathrm{sgn}\big(W_i^t v^k + i_i - T_i\big), \qquad \text{for random } i,\ i = 1, 2, \ldots, n \ \text{and}\ k = 0, 1, 2, \ldots$$

where k denotes the index of the recursive update. This is referred to as the asynchronous stochastic recursion of the Hopfield network. This update process continues until all n entries of v have been updated. The recursive computation continues until the output node vector remains unchanged with further iterations. Similarly, for synchronous operation, we have

$$v^{k+1} = \mathrm{sgn}\big[W v^k + i - t\big], \qquad \text{for all neurons},\ k = 0, 1, \ldots$$

(sgn applied componentwise), where all neurons change their output simultaneously.
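A minimal sketch of the asynchronous recursion above, assuming a numpy representation of W, v, i and t; the function names and the convention of keeping the previous output when the activation is exactly zero are illustrative assumptions, not part of the notes.

```python
import numpy as np

def async_sweep(W, v, i_ext, t, rng):
    """One sweep of asynchronous updates: each neuron, in random order,
    recomputes v_i = sgn(W_i^t v + i_i - T_i) using the most recent
    values of the other outputs."""
    for idx in rng.permutation(len(v)):
        net = W[idx] @ v + i_ext[idx] - t[idx]
        if net != 0:                      # net == 0: keep the previous output
            v[idx] = 1 if net > 0 else -1
    return v

def recall(W, v0, i_ext, t, max_sweeps=100, seed=0):
    """Repeat sweeps until the output vector no longer changes."""
    rng = np.random.default_rng(seed)
    v = v0.astype(float).copy()
    for _ in range(max_sweeps):
        v_prev = v.copy()
        v = async_sweep(W, v, i_ext, t, rng)
        if np.array_equal(v, v_prev):
            break
    return v
```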


Geometrical Explanation

The output vector v is one of the vertices of the n-dimensional cube $[-1, 1]^n$ in $E^n$ space. The vector moves during recursions from vertex to vertex, until it stabilizes in one of the $2^n$ vertices available.

The movement is from a vertex to an adjacent vertex, since the asynchronous update mode allows for a single-component update of an n-tuple vector at a time.

The final position of v as $k \to \infty$ is determined by the weights, thresholds, inputs, and the initial vector $v^0$, as well as the order of transitions.


To evaluate the stability property of the dynamical system of interest, a computational energy function is defined in the n-dimensional output space $v^n$.

If the increments of a certain bounded, positive-valued computational energy function under the transition rule are found to be non-positive, then the function can be called a Lyapunov function, and the system would be asymptotically stable.

The scalar-valued energy function for the discussed system is the quadratic form

$$E = -\tfrac{1}{2}\, V^t W V - i^t V + t^t V$$


or

$$E = -\tfrac{1}{2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} v_i v_j - \sum_{i=1}^{n} i_i v_i + \sum_{i=1}^{n} T_i v_i$$
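As a quick sketch (not from the notes), the energy can be evaluated either from the matrix form or from the explicit double sum; for a symmetric W with zero diagonal the two agree:

```python
import numpy as np

def energy(W, v, i_ext, t):
    """E = -1/2 v^t W v - i^t v + t^t v (matrix form)."""
    return -0.5 * v @ W @ v - i_ext @ v + t @ v

def energy_sum(W, v, i_ext, t):
    """The same E written as the double sum over i and j != i."""
    n = len(v)
    E = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                E -= 0.5 * W[i, j] * v[i] * v[j]
        E += -i_ext[i] * v[i] + t[i] * v[i]
    return E
```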

The energy function in asynchronous mode: assume that output node i has been updated at the k-th instant, so that $v_i^{k+1} - v_i^k = \Delta v_i$. Computing the energy gradient vector:

$$\nabla E = -\tfrac{1}{2}\,(W^t + W)\, v - i + t = -W v - i + t, \qquad \text{since } W = W^t$$

The energy increment becomes

$$\Delta E = (\nabla E)^t\, \Delta v = \big(-W_i^t v - i_i + T_i\big)\, \Delta v_i$$

This is because only the i-th output is updated.


Therefore we have

$$(\Delta v)^t = [\,0\ \cdots\ \Delta v_i\ \cdots\ 0\,]$$

This can be rewritten as

$$\Delta E = -\Big(\sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} v_j + i_i - T_i\Big)\, \Delta v_i$$

or briefly

$$\Delta E = -net_i\, \Delta v_i$$

Note that when $net_i < 0$, then $\Delta v_i \le 0$; when $net_i > 0$, then $\Delta v_i \ge 0$. Thus $(net_i\, \Delta v_i)$ is always non-negative. In other words, any corresponding energy change $\Delta E$ is non-positive, provided that $w_{ij} = w_{ji}$.
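The identity $\Delta E = -net_i\, \Delta v_i$ is easy to check numerically; the following sketch (with randomly chosen symmetric, zero-diagonal weights as an assumption) flips a single component and compares the two sides:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.integers(-2, 3, size=(n, n)).astype(float)
W = (A + A.T) / 2.0                 # symmetric weights ...
np.fill_diagonal(W, 0.0)            # ... with zero diagonal
i_ext = rng.integers(-1, 2, size=n).astype(float)
t = np.zeros(n)
v = rng.choice([-1.0, 1.0], size=n)

E = lambda x: -0.5 * x @ W @ x - i_ext @ x + t @ x   # quadratic energy form

k = 2                               # update a single component k
net_k = W[k] @ v + i_ext[k] - t[k]
v_new = v.copy()
if net_k != 0:
    v_new[k] = 1.0 if net_k > 0 else -1.0

dE = E(v_new) - E(v)
dv_k = v_new[k] - v[k]
assert np.isclose(dE, -net_k * dv_k)   # Delta E = -net_k * Delta v_k
assert dE <= 0                         # the energy never increases
```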


Further, we can show that the non-increasing energy function has a minimum.

Since W is indefinite because of its zero diagonal, E has neither a minimum nor a maximum in unconstrained output space. However, E is obviously bounded on the space consisting of the $2^n$ vertices of the n-dimensional cube. Thus, E has to finally reach its minimum under the update algorithm.

Example of recursive asynchronous update of a corrupted digit 4:


[Figure: five bit maps of the digit 4, panels (a)-(e)]

where (a) k=0, (b) k=1, (c) k=2, (d) k=3, (e) k=4. The initial map is a destroyed digit 4 with 20% of the pixels randomly reversed. For k>4, no changes are produced at the network output, since the system has arrived at one of its stable states.


§9.3 Gradient-Type Hopfield Network

Consider the continuous-time single-layer feedback network. One of its models is given below.

[Figure: gradient-type Hopfield network with neuron input potentials $u_1, \ldots, u_n$, outputs $v_1, \ldots, v_n$, capacitances $C_1, \ldots, C_n$, conductances $g_1, \ldots, g_n$, external currents $i_1, \ldots, i_n$, and interconnection conductances $w_{ij}$.]


It consists of n neurons, each mapping its input $u_i$ into the output $v_i$ through the activation function $f(u_i)$. Conductance $w_{ij}$ connects the output of the j-th neuron to the input of the i-th neuron. It is also assumed that $w_{ij} = w_{ji}$ and $w_{ii} = 0$.

The KCL equation for the input node having potential $u_i$ can be obtained as

$$i_i + \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} v_j - u_i \Big(\sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} + g_i\Big) = C_i \frac{du_i}{dt}$$

Defining $G_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij} + g_i$, $\ C = \mathrm{diag}[C_1, \ldots, C_n]$, $\ G = \mathrm{diag}[G_1, \ldots, G_n]$,


then we have

$$C\, \frac{du(t)}{dt} = W v(t) - G u(t) + i, \qquad v(t) = f(u(t))$$

It can be shown that

$$\frac{dE}{dt} = -\Big(\frac{du}{dt}\Big)^t C\, \frac{dv}{dt} < 0$$

It thus follows that the changes of E in time are in the general direction toward lower values of the energy function in $v^n$ space -- the stability condition.
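A minimal sketch of these dynamics, integrated by forward Euler with a tanh activation; the gain, step size and initial condition are illustrative assumptions rather than values from the notes.

```python
import numpy as np

def simulate_gradient_hopfield(W, g, c, i_ext, u0, gain=1.0, dt=1e-3, steps=5000):
    """Forward-Euler integration of  C du/dt = W v - G u + i,  v = f(u),
    with f(u) = tanh(gain * u) and G_i = sum_{j != i} w_ij + g_i."""
    G = W.sum(axis=1) + g              # w_ii = 0, so the full row sum works
    u = u0.astype(float).copy()
    for _ in range(steps):
        v = np.tanh(gain * u)
        du = (W @ v - G * u + i_ext) / c
        u = u + dt * du
    return np.tanh(gain * u)           # final output vector v
```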


§9.4 Feedback Networks for Computational Applications

In principle, any optimization problem whose objective function can be expressed in the form of an energy function can be solved by the convergence of a feedback network.

Take the Traveling Salesman Problem as an example.

Given a set of n cities A, B, C, … with pairwise distances $d_{AB}, d_{AC}, \ldots$, try to find a closed tour which visits each city once, returns to the starting city, and has minimum total path length.

This is an NP-complete problem.


To map this problem onto the computational network,we require a representation scheme, in which the final location of any individual city is specified by the outputstates of a set of n neurons.

E.g., for n=5, the neuronal state ($5^2 = 25$ neurons) shown below would represent a tour:

City \ Order   1  2  3  4  5
A              0  1  0  0  0
B              0  0  0  1  0
C              1  0  0  0  0
D              0  0  0  0  1
E              0  0  1  0  0
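A small sketch (not from the notes) that decodes such a 0/1 city-by-order matrix into the visiting sequence and checks that it is a valid tour (exactly one "1" per row and per column):

```python
import numpy as np

def decode_tour(V, city_names):
    """Return the visiting order encoded by the 0/1 matrix V[city, position]."""
    assert (V.sum(axis=0) == 1).all() and (V.sum(axis=1) == 1).all(), "not a valid tour"
    return [city_names[np.argmax(V[:, pos])] for pos in range(V.shape[1])]

V = np.array([[0, 1, 0, 0, 0],    # A
              [0, 0, 0, 1, 0],    # B
              [1, 0, 0, 0, 0],    # C
              [0, 0, 0, 0, 1],    # D
              [0, 0, 1, 0, 0]])   # E
print(decode_tour(V, "ABCDE"))    # ['C', 'A', 'E', 'B', 'D']
```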


In the $n \times n$ square representation, this means that in an output state describing a valid tour there can be only one "1" in each row and each column, all other entries being "0".

In this scheme, the $n^2$ symbols v will be described by double indices, $v_{xj}$: x stands for the city name, j for the position of that city in the tour.

To enable the $N = n^2$ neurons to compute a solution to the problem, the network must be described by an energy function in which the lowest energy state corresponds to the best path of the tour.


An appropriate form for this function can be found by considering the high-gain limit, in which all final neuron outputs will be 0 or 1. The space over which the energy function is minimized in this limit is the $2^N$ corners of the N-dimensional hypercube defined by $v_i = 0$ or 1.

Consider those corners of this space which are the local minima (stable states) of the energy function

$$E_1 = \frac{A}{2} \sum_x \sum_i \sum_{j \neq i} v_{xi} v_{xj} + \frac{B}{2} \sum_i \sum_x \sum_{y \neq x} v_{xi} v_{yi} + \frac{C}{2} \Big( \sum_x \sum_i v_{xi} - n \Big)^2$$

where A, B, C are positive constants and $v_{xi} \in \{0, 1\}$.
-- The first term is 0 iff each city row x contains no more than one "1".


-- The second term is 0 iff each position-in-tour column i contains no more than one "1".
-- The third term is 0 iff there are exactly n entries of "1" in the entire matrix.

Thus, this energy function, evaluated on the domain of the corners of the hypercube, has minima with $E_1 = 0$ for all state matrices with one "1" in each row and column. All other states have higher energy.

Hence, including these terms in an energy function describing a TSP network strongly favors stable states which are at least valid tours of the TSP problem.


Another requirement, that E favors valid tours representing short paths, is fulfilled by adding one additional term to $E_1$. This term contains information about the length of the path corresponding to a given tour, and its form can be

$$E_2 = \frac{D}{2} \sum_x \sum_{y \neq x} \sum_i d_{xy}\, v_{xi} \big( v_{y,i+1} + v_{y,i-1} \big)$$

where subscripts are defined modulo n, in order to express easily "end effects" such as the fact that the n-th city on a tour is adjacent in the tour to both city (n-1) and city 1, i.e., $v_{y,n+j} = v_{y,j}$. Within the domain of states which characterize a valid tour, $E_2$ is numerically equal to the length of the path for that tour.


If A, B, and C are sufficiently large, all the really low energy states of a network described by this function will have the form of a valid tour. The total energy ofthat state will be the length of the tour, and the stateswith the shortest path will be the lowest energy states.
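As a sketch (with caller-supplied constants and distance matrix, not values from the notes), the total energy $E_1 + E_2$ can be evaluated directly from the sums above:

```python
import numpy as np

def tsp_energy(V, d, A, B, C, D):
    """E1 + E2 for a city-by-order state matrix V and distance matrix d."""
    n = V.shape[0]
    row = 0.5 * A * sum(V[x, i] * V[x, j]
                        for x in range(n) for i in range(n)
                        for j in range(n) if j != i)
    col = 0.5 * B * sum(V[x, i] * V[y, i]
                        for i in range(n) for x in range(n)
                        for y in range(n) if y != x)
    glob = 0.5 * C * (V.sum() - n) ** 2
    data = 0.5 * D * sum(d[x, y] * V[x, i] * (V[y, (i + 1) % n] + V[y, (i - 1) % n])
                         for x in range(n) for y in range(n) if y != x
                         for i in range(n))
    return row + col + glob + data
```

For a valid tour the first three terms vanish and only the data term, which grows with the tour length, remains.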

Using the row/column neuron labeling scheme described above for each of the two indices, the implicitly defined connection matrix is

$$\begin{aligned} T_{xi,yj} = {} & -A\, \delta_{xy}\,(1 - \delta_{ij}) && \text{"inhibitory connections within each row"} \\ & -B\, \delta_{ij}\,(1 - \delta_{xy}) && \text{"inhibitory connections within each column"} \\ & -C && \text{"global inhibition"} \\ & -D\, d_{xy}\,(\delta_{j,i+1} + \delta_{j,i-1}) && \text{"data term"} \end{aligned}$$

The external inputs are $I_{xi} = 2Cn$ (excitation bias).


The "data term" contribution, with D, to $T_{xi,yj}$ is the input which describes which TSP problem (i.e., where the cities actually are) is to be solved.

The terms with A, B, C provide the general constraints required for any TSP problem.

The "data term" contribution controls which one of the n! properly constrained final states is actually chosen as the "best" path.

The problem formulated as shown below has been solved numerically for the continuous activation function with an activation gain of 50, A = B = D = 250, and C = 100, for $10 \le n \le 30$. Quite satisfactory solutions have been found.


[Figure: 10-city example, cities A-J versus tour positions 1-10, showing the final network state.]

Path = D-H-I-F-G-E-A-J-C-B-D


§9.5 Associative Memory

By properly selecting the weights, it is possible to make the stable states of the network be exactly the set M of states we want to store.

Under this condition, the network's state should not change if the network starts in a state belonging to M; whereas if it does not start in M, it is expected that the network's final stable state will be the member of M closest to the initial state (in the sense of Hamming distance).

There are two categories of AM:
1) Auto-AM: if the input is $x' = x_k + v$ (a stored vector plus noise), where $x_k \in \{x_1, \ldots, x_m\}$, then the output is $y = x_k$.


2) Hetero-AM: if the input is $x' = x_k + v$, where the pairs $(x_1, y_1), \ldots, (x_m, y_m)$ are stored, then the output is $y = y_k$.

One of the tasks is to find suitable weights such that the network performs the function of an AM.

The most frequently used rule for this purpose is the Outer Product Rule. Assume that
-- an n-neuron network is considered;
-- each activity state $x_i \in \{-1, 1\}$;
-- the Hebbian rule is observed: $w_{ij} = \alpha\, x_i x_j$, with $\alpha > 0$ (below, $\alpha = 1$).


The outer product rule is as follows. For given vectors $M = \{U_1, \ldots, U_m\}$, where $U_k^t = (x_1^k, \ldots, x_n^k)$, write

$$W = \sum_{k=1}^{m} \big( U_k U_k^t - I \big) = \sum_{k=1}^{m} \begin{bmatrix} 0 & x_1^k x_2^k & \cdots & x_1^k x_n^k \\ x_2^k x_1^k & 0 & \cdots & x_2^k x_n^k \\ \vdots & & \ddots & \vdots \\ x_n^k x_1^k & x_n^k x_2^k & \cdots & 0 \end{bmatrix} = \begin{bmatrix} 0 & \cdots & \sum_{k=1}^{m} x_1^k x_n^k \\ \vdots & \ddots & \vdots \\ \sum_{k=1}^{m} x_n^k x_1^k & \cdots & 0 \end{bmatrix}$$


and this can be implemented by the following procedure:
(1) Set W = [0].
(2) For k = 1 to m, input $U_k$ and do $w_{ij} = w_{ij} + x_i^k x_j^k$ for all connected pairs (i, j).

Check to see if it is reasonable:
1) Suppose that $U_1, \ldots, U_m$ are orthogonal and m < n. Then

$$\begin{aligned} W U_1 &= \big( U_1 U_1^t - I \big) U_1 + \sum_{k=2}^{m} \big( U_k U_k^t - I \big) U_1 \\ &= U_1 U_1^t U_1 - I U_1 + \sum_{k=2}^{m} \big( U_k U_k^t U_1 - I U_1 \big) \\ &= U_1 n - U_1 + \sum_{k=2}^{m} (-I U_1) = U_1 (n-1) - (m-1) U_1 = (n-m)\, U_1 \end{aligned}$$

Hence $\mathrm{Sgn}(W U_1) = \mathrm{Sgn}[(n-m) U_1] = U_1$.
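A sketch of this storage procedure, assuming bipolar ±1 patterns held as rows of a numpy array (the helper name is illustrative):

```python
import numpy as np

def outer_product_weights(patterns):
    """W = sum_k (U_k U_k^t - I); equivalently, start from W = 0 and
    accumulate w_ij += x_i^k x_j^k for every pattern k and every i != j."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for U in patterns:
        W += np.outer(U, U) - np.eye(n)
    return W
```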


i.e., $U_1$ is exactly a stable state of the network, and the W thus determined is reasonable.

Example. Given n=3, m=3, $U_1^t = (1\ 1\ {-1})$, $U_2^t = (-1\ 1\ 1)$, $U_3^t = (1\ {-1}\ 1)$. Thus

$$U_1 U_1^t - I = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}, \qquad U_2 U_2^t - I = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}$$

$$U_3 U_3^t - I = \begin{bmatrix} 0 & -1 & 1 \\ -1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix}, \qquad W = \sum_{k=1}^{3} \big( U_k U_k^t - I \big) = \begin{bmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{bmatrix}$$

$$W U_1 = \begin{bmatrix} 0 \\ 0 \\ -2 \end{bmatrix}, \qquad \mathrm{Sgn}(W U_1) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1$$

(an entry with zero activation keeps its previous output value).


Similarly,

$$W U_2 = \begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathrm{Sgn}(W U_2) = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} = U_2$$

$$W U_3 = \begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix}, \qquad \mathrm{Sgn}(W U_3) = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = U_3$$

Clearly, $U_1$, $U_2$, and $U_3$ are stable memories. The structure of the network is as below:

[Figure: three neurons $u_1$, $u_2$, $u_3$, fully interconnected, with every connection weight equal to -1.]


Applications:

a) Classification. Given the input $U_x^t = (1\ 1\ 1)$, is $U_x \in \{U_1, U_2, U_3\}$?

$$W U_x = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix}, \qquad \mathrm{Sgn}(W U_x) = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \neq U_x$$

Hence $U_x$ does not belong to the set $\{U_1, U_2, U_3\}$.

b) Associative Memory. Given a noisy input $U_x^t = (0\ 1\ {-1})$, what is $U_x$?

$$W U_x = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \qquad \mathrm{Sgn}(W U_x) = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = U_1$$

so $U_x$ is recalled as $U_1$.
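The worked example can be reproduced in a few lines (a sketch; treating a zero activation as +1 here simply mirrors the result stated in the notes for the noisy input):

```python
import numpy as np

U1, U2, U3 = np.array([1, 1, -1]), np.array([-1, 1, 1]), np.array([1, -1, 1])
W = sum(np.outer(u, u) - np.eye(3) for u in (U1, U2, U3))
# W == [[0,-1,-1],[-1,0,-1],[-1,-1,0]], as computed above

sgn = lambda x: np.where(x >= 0, 1, -1)          # sgn(0) taken as +1

print(sgn(W @ np.array([1, 1, 1])))              # [-1 -1 -1]: not a stored vector
print(sgn(W @ np.array([0, 1, -1])))             # [ 1  1 -1]: recalled as U1
```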


In general, if the n-dimensional vectors $U_1, \ldots, U_m$ are orthogonal and n > m, then

$$\begin{aligned} W U_k &= \big( U_k U_k^t - I \big) U_k + \sum_{\substack{i=1 \\ i \neq k}}^{m} \big( U_i U_i^t - I \big) U_k \\ &= \big( U_k U_k^t U_k - I U_k \big) + \sum_{\substack{i=1 \\ i \neq k}}^{m} \big( U_i U_i^t U_k - I U_k \big) \\ &= U_k n - U_k + (m-1)(-I U_k) = (n-1) U_k - (m-1) U_k = (n-m)\, U_k \end{aligned}$$

$$\mathrm{Sgn}(W U_k) = \mathrm{Sgn}[(n-m) U_k] = U_k, \qquad k = 1, 2, \ldots, m$$

Hence $\{U_k\}$ are stable states.