Hopfield networks
Lluís A. Belanche ([email protected])
Soft Computing Research Group
Dept. de Llenguatges i Sistemes Informàtics (Software Department)
Universitat Politècnica de Catalunya
2010-2011
Hopfield networks
Introduction
Content-addressable autoassociative memory
Deals with incomplete/erroneous keys: “Julia xxxxazar: Rxxxela”
[Figure: energy function landscape, showing basins of attraction and attractors (the memories)]
A mechanical system tends to minimum-energy states (e.g. a pole)
Symmetric Hopfield networks have an analogous behaviour:
low-energy states ⇔ attractors ⇔ memories
Hopfield networks
Introduction
A Hopfield network is a single-layer recurrent neural network, where the only layer has n neurons (as many as the input dimension)
Hopfield networks
Introduction
The data sample is a list of keys (aka patterns) L = {ξi | 1 ≤ i ≤ l}, with ξi ∈ {0,1}^n, that we want to associate with themselves.
What is the meaning of autoassociativity? When a new pattern Ψ is shown to the network, the output is the key in L closest (in Hamming distance) to Ψ.
Evidently, this can be achieved by direct programming; yet in hardware it is quite complex (and the Hopfield net does it in a quite different way)
We therefore set up two goals:
1. Content-addressable autoassociative memory (not easy)
2. Tolerance to small errors, which will be corrected (rather hard)
Hopfield networks
Intuitive analysis
The state of the network is the vector of activations e of the n
neurons. The state space is {0,1}n:
[Figure: the state space as the hypercube {0,1}^3, with vertices 000, 001, 010, 011, 100, 101, 110, 111; the keys ξ1, ξ2, ξ3 sit at certain vertices]
When the network evolves, it “traverses” the hypercube
Hopfield networks
Formal analysis (I)
For convenience, we change from {0,1} to {−1,+1}, by s = 2e − 1.
Dynamics of a neuron:
$$s_i := \mathrm{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\,s_j - \theta_i\right), \qquad \text{with } \mathrm{sgn}(z) = \begin{cases} +1 & : z \ge 0 \\ -1 & : z < 0 \end{cases}$$
Let us simplify the analysis by using θi = 0, and uncorrelated keys ξi
(from a uniform distribution).
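As an illustration, here is a minimal sketch of this update rule in NumPy. The sgn(0) = +1 tie-break follows the slide; the function names and defaults are assumptions of the sketch:

```python
import numpy as np

def sgn(z):
    # sign function with the slide's convention sgn(0) = +1
    return np.where(z >= 0, 1, -1)

def update_neuron(s, W, i, theta=None):
    # s_i := sgn( sum_j w_ij s_j - theta_i ); theta defaults to 0 as in the slides
    th = 0.0 if theta is None else theta[i]
    s = s.copy()
    s[i] = sgn(W[i] @ s - th)
    return s
```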
Hopfield networks
Formal analysis (II)
How are neurons chosen to update their activations?
1. Synchronous: all at once
2. Asynchronous: there is a sequential order or a fair probability distribution
The memories do not change; the end attractor (the recovered memory) does
We will assume asynchronous updating
When does the process stop?
−→ When there are no more changes in the network state (we are in a stable state)
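A sketch of asynchronous recall until a stable state is reached; the random sweep order stands in for the "fair probability distribution", and all names and defaults are illustrative:

```python
import numpy as np

def recall(W, psi, max_sweeps=100, seed=None):
    # asynchronously update neurons in random order until a full sweep
    # produces no change (a stable state), or max_sweeps is exhausted
    rng = np.random.default_rng(seed)
    s = np.array(psi)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(s)):
            new = 1 if W[i] @ s >= 0 else -1   # sgn with sgn(0) = +1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            return s                            # stable state reached
    return s
```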
Hopfield networks
Formal analysis (III)
PROCEDURE:
1. Formula for setting the weights wij to store the keys ξi ∈ L
2. Sanity check that the keys ξi ∈ L are stable (they are autoassociated with themselves)
3. Check that small perturbations of the keys ξi ∈ L are corrected (they are associated with the closest pattern in L)
Hopfield networks
Formal analysis (IV)
Case l = 1 (only one pattern ξ):
Present ξ to the net: si := ξi, 1 ≤ i ≤ n
Stability:
$$s_i := \mathrm{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\,s_j\right) = \mathrm{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\,\xi_j\right) = \;\ldots\text{ should be }\ldots\; = \xi_i$$
One way to get this is: wij = αξiξj, with α > 0, α ∈ ℝ (called Hebbian learning). Then, since ξjξj = 1 for ξj ∈ {−1,+1}:
$$\mathrm{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\,\xi_j\right) = \mathrm{sgn}\!\left(\sum_{j=1}^{n} \alpha\,\xi_i\,\xi_j\,\xi_j\right) = \mathrm{sgn}\!\left(\sum_{j=1}^{n} \alpha\,\xi_i\right) = \mathrm{sgn}(\alpha n\,\xi_i) = \mathrm{sgn}(\xi_i) = \xi_i$$
Hopfield networks
Formal analysis (V)
Take α = 1/n. This has a normalizing effect: ‖wi‖₂ = 1/√n
Therefore W_{n×n} = (wij) = (1/n) ξ × ξ, i.e. wij = (1/n) ξiξj
If the perturbation consists of b flipped bits, where b ≤ ⌊(n − 1)/2⌋, then the sign of $\sum_{j=1}^{n} w_{ij}\,s_j$ does not change!
=⇒ if the number of correct bits exceeds that of incorrect bits, the
network is able to correct the error bits
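A quick numerical check of this claim for a single stored pattern; n, the seed and the number of flips are arbitrary choices of the sketch:

```python
import numpy as np

n = 101                                    # odd, so up to (n-1)/2 = 50 bits may flip
rng = np.random.default_rng(0)
xi = rng.choice([-1, 1], size=n)
W = np.outer(xi, xi) / n                   # w_ij = (1/n) xi_i xi_j

psi = xi.copy()
flipped = rng.choice(n, size=(n - 1) // 2, replace=False)
psi[flipped] *= -1                         # perturb b = (n-1)/2 bits
# one synchronous step suffices here: W @ psi = xi * (xi . psi)/n and xi . psi = n - 2b > 0
recovered = np.where(W @ psi >= 0, 1, -1)
assert np.array_equal(recovered, xi)
```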
Hopfield networks
Formal analysis (VI)
Case l ≥ 1:
Superposition: $w_{ij} = \frac{1}{n}\sum_{k=1}^{l} \xi_{ki}\,\xi_{kj}$ and $W_{n\times n} = \frac{1}{n}\sum_{k=1}^{l} \xi_k \times \xi_k$
Defining E_{n×l} = [ξ1, . . . , ξl], we get W_{n×n} = (1/n) E E^t
Note Wn×n is a symmetric matrix
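The superposition rule is a one-liner in matrix form; a sketch (the zero_diagonal option anticipates the wii = 0 variant discussed later under spurious states):

```python
import numpy as np

def hebbian_weights(patterns, zero_diagonal=False):
    # W = (1/n) E E^t, with E = [xi_1, ..., xi_l] of shape (n, l)
    E = np.column_stack(patterns)
    n = E.shape[0]
    W = E @ E.T / n
    if zero_diagonal:
        np.fill_diagonal(W, 0.0)   # gives W = (1/n)(E E^t - l I)
    return W
```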
Hopfield networks
Formal analysis (VII)
Stability of a pattern ξv ∈ L:
Initialize the net to si := ξvi, 1 ≤ i ≤ n
Stability: $s_i := \mathrm{sgn}\!\left(\sum_{j=1}^{n} w_{ij}\,\xi_{vj}\right) = \mathrm{sgn}(h_{vi})$

$$h_{vi} = \sum_{j=1}^{n} w_{ij}\,\xi_{vj} = \frac{1}{n}\sum_{j=1}^{n}\sum_{k=1}^{l} \xi_{ki}\,\xi_{kj}\,\xi_{vj} = \frac{1}{n}\sum_{j=1}^{n}\Bigg(\sum_{k=1,\,k\neq v}^{l} \xi_{ki}\,\xi_{kj}\,\xi_{vj} + \xi_{vi}\,\underbrace{\xi_{vj}\,\xi_{vj}}_{+1}\Bigg) = \xi_{vi} + \frac{1}{n}\sum_{j=1}^{n}\sum_{k=1,\,k\neq v}^{l} \xi_{ki}\,\xi_{kj}\,\xi_{vj} = \xi_{vi} + \mathrm{crosstalk}(v, i)$$

If |crosstalk(v, i)| < 1 then sgn(h_vi) = sgn(ξvi + crosstalk(v, i)) = sgn(ξvi) = ξvi
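The crosstalk term can be computed directly from its definition; a sketch (the function name is illustrative):

```python
import numpy as np

def crosstalk(patterns, v, i):
    # (1/n) * sum_j sum_{k != v} xi_ki xi_kj xi_vj  ==  h_vi - xi_vi
    E = np.column_stack(patterns)          # shape (n, l)
    n, l = E.shape
    return sum(E[i, k] * (E[:, k] @ E[:, v])
               for k in range(l) if k != v) / n
```

The pattern ξv is then stable bit for bit whenever |crosstalk(v, i)| < 1 for every i.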
Hopfield networks
Formal analysis (and VIII)
Analogously to the case l = 1, if the number of correct bits of a
pattern ξv ∈ L exceeds that of incorrect bits, the network is able to
correct the bits in error
(the pattern ξv ∈ L is indeed a memory)
When is |crosstalk(v, i)| < 1? If l ≪ n nothing goes wrong, but as l gets closer to n, the crosstalk term becomes quite strong compared to ξv: there is no stability
The million euro question: given a network of n neurons, what is the
maximum value for l? What is the capacity of the network?
Hopfield networks
Capacity analysis
For random uncorrelated patterns (for convenience):
$$l_{\max} = \begin{cases} \dfrac{n}{2\ln(n) + \ln\ln(n)} & : \text{ perfect recall} \\[2mm] \approx 0.138\,n & : \text{ small errors } (\approx 1.6\%) \end{cases}$$
Realistic patterns will not be random, though some coding can make
them more so
An alternative setting is to have negatively correlated patterns: ξv · ξu ≤ 0, ∀v ≠ u
This condition is equivalent to stating that the number of different bits is at least the number of equal bits
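The capacity figures can be probed empirically; a sketch of such an experiment with random patterns (the sizes, seed and one-sweep stability criterion are choices of this sketch, not from the slides):

```python
import numpy as np

def stored_bit_stability(n, l, trials=20, seed=0):
    # fraction of stored bits left unchanged by one synchronous update,
    # averaged over `trials` random pattern sets
    rng = np.random.default_rng(seed)
    frac = 0.0
    for _ in range(trials):
        E = rng.choice([-1, 1], size=(n, l))
        W = E @ E.T / n
        np.fill_diagonal(W, 0.0)
        frac += (np.where(W @ E >= 0, 1, -1) == E).mean()
    return frac / trials

for l in (10, 100, 140, 200):      # recall degrades as l/n grows past ~0.138
    print(l, round(stored_bit_stability(1000, l), 4))
```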
Hopfield networks
Spurious states
When some pattern ξ is stored, the pattern −ξ is also stored. In
consequence, all initial configurations of ξ where the majority of bits
are incorrect will end up in −ξ!
−→ Hardly surprising: from the point of view of −ξ, there are more correct bits than
incorrect bits
The network can be made more stable (fewer spurious states) if we set wii = 0 for all i:
$$W_{n\times n} = \frac{1}{n}\sum_{k=1}^{l} \xi_k \times \xi_k - \frac{l}{n}\,I_{n\times n} = \frac{1}{n}\left(E E^t - l\, I_{n\times n}\right)$$
Hopfield networks
Example
$$\xi_1 = (+1, -1, +1), \quad \xi_2 = (-1, +1, -1), \quad E_{3\times 2} = \begin{pmatrix} +1 & -1 \\ -1 & +1 \\ +1 & -1 \end{pmatrix}$$

$$\begin{pmatrix} +1 \\ -1 \\ +1 \end{pmatrix}\!(+1\;\; {-1}\;\; {+1}) = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}, \qquad \begin{pmatrix} -1 \\ +1 \\ -1 \end{pmatrix}\!(-1\;\; {+1}\;\; {-1}) = \begin{pmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{pmatrix}$$

Summing the two outer products, dividing by n = 3 and zeroing the diagonal (wii = 0, as above):

$$W = \frac{1}{3}\begin{pmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{pmatrix}$$

$$(+1,+1,+1),\;(-1,-1,+1),\;(+1,-1,-1) \;\longrightarrow\; (+1,-1,+1) = \xi_1$$
$$(+1,+1,-1),\;(-1,-1,-1),\;(-1,+1,+1) \;\longrightarrow\; (-1,+1,-1) = \xi_2$$
The network is able to correct 1 erroneous bit
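The whole example can be replayed in a few lines. This sketch keeps a neuron's state when its field is exactly zero (the "activation with memory" rule that appears later in the deck), so that ties do not flip bits spuriously:

```python
import numpy as np

xi1 = np.array([+1, -1, +1])
xi2 = np.array([-1, +1, -1])
W = (np.outer(xi1, xi1) + np.outer(xi2, xi2)) / 3.0
np.fill_diagonal(W, 0.0)                 # W = (1/3) [[0,-2,+2],[-2,0,-2],[+2,-2,0]]

def recall(psi):
    s = np.array(psi)
    while True:
        changed = False
        for i in range(len(s)):          # asynchronous sweep
            h = W[i] @ s
            new = s[i] if h == 0 else (1 if h > 0 else -1)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            return s

for psi in ([1, 1, 1], [-1, -1, 1], [1, -1, -1],
            [1, 1, -1], [-1, -1, -1], [-1, 1, 1]):
    print(psi, "->", recall(psi))        # first three -> xi1, last three -> xi2
```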
Hopfield networks
Practical applications (I)
[Figure: reproduced slides (in Spanish) from "Redes Neuronales Artificiales", Facultat d'Informàtica, UPV, 2001-2002, covering convergence and capacity of the Hopfield network (the energy or Lyapunov function, its properties, and convergence in a finite number of iterations) and bidirectional associative memories (BAM). The example shown is a network of 120 nodes (12×10) that stores a set of eight digit patterns; the output sequence for the distorted input "3" converges to the stored digit.]
A Hopfield network that stores and recovers several digits
Hopfield networks
Practical applications (II)
Ex: pattern recognition
– Handwriting recognition
– Recognition of military targets
A neural network that has learned to memorize shapes, or pieces of shapes, of Serbian aircraft discovers a military plane hidden under a civilian plane.
A Hopfield network that stores fighter aircraft shapes is able to reconstruct them. The
network operates on the video images supplied by a camera on board a USAF bomber
Hopfield networks
The energy function (I)
If the weights are symmetric, there exists a so-called energy function:
$$H(s) = -\sum_{i=1}^{n}\sum_{j>i}^{n} w_{ij}\,s_i\,s_j$$
which is a function of the state of the system.
The central property of H is that it always decreases (or remains
constant) as the system evolves
Therefore the attractors (the stored memories) have to be the local
minima of the energy function
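For symmetric W with zero diagonal, H can be written compactly; a sketch:

```python
import numpy as np

def energy(W, s):
    # H(s) = - sum_{i<j} w_ij s_i s_j; for symmetric W with w_ii = 0
    # this equals -(1/2) s^t W s
    return -0.5 * s @ W @ s
```

Logging energy(W, s) after every asynchronous flip during recall should show a non-increasing sequence.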
Hopfield networks
The energy function (II)
Proposition 1:
A change of state in one neuron i yields a change ∆H < 0
Let $h_i = \sum_{j=1}^{n} w_{ij}\,s_j$. If neuron i changes, it is because si ≠ sgn(hi)
Therefore ∆H = −(−si)hi − (−sihi) = 2sihi < 0
Proposition 2:
$H(s) \ge -\sum_{i=1}^{n}\sum_{j=1}^{n} |w_{ij}|$ (the energy function is lower-bounded)
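Proposition 1 can be checked numerically on random symmetric weights; the sizes and seed are arbitrary choices of the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                        # symmetric weights
np.fill_diagonal(W, 0.0)
s = rng.choice([-1, 1], size=n)

for _ in range(1000):
    i = rng.integers(n)
    h = W[i] @ s
    if s[i] * h < 0:                     # neuron i wants to change state (h != 0)
        H_before = -0.5 * s @ W @ s
        dH = 2 * s[i] * h                # Proposition 1: delta H = 2 s_i h_i < 0
        s[i] = -s[i]
        H_after = -0.5 * s @ W @ s
        assert np.isclose(H_after - H_before, dH) and dH < 0
```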
Hopfield networks
The energy function (III)
The energy function H is also useful to derive the proposed weights
Consider the case of only one pattern ξ: we want H to be minimum when the correlation between the current state s and ξ is maximum, i.e. when ⟨ξ, s⟩ = ⟨ξ, ξ⟩ = ‖ξ‖² = n
Therefore we choose H(s) = −K(1/n)⟨ξ, s⟩², where K > 0 is a suitable constant
For the many-pattern case, we can try to make each of the ξv into local minima of H:

$$H(s) = -K\frac{1}{n}\sum_{v=1}^{l} \langle \xi_v, s\rangle^2 = -K\frac{1}{n}\sum_{v=1}^{l}\left(\sum_{i=1}^{n}\xi_{vi}\,s_i\right)^2 = -K\frac{1}{n}\sum_{v=1}^{l}\left(\sum_{i=1}^{n}\xi_{vi}\,s_i\right)\!\left(\sum_{j=1}^{n}\xi_{vj}\,s_j\right)$$
$$= -K\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\frac{1}{n}\sum_{v=1}^{l}\xi_{vi}\,\xi_{vj}\right)s_i\,s_j = -K\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,s_i\,s_j$$

which, taking K = 1/2 and using the symmetry of W, equals $-\sum_{i=1}^{n}\sum_{j>i}^{n} w_{ij}\,s_i\,s_j$ up to an additive constant: exactly the energy function with the proposed weights
Hopfield networks
The energy function (and IV)
Exercises:
1. In the previous example,
a) Check that H(s) = (2/3)(s1s2 − s1s3 + s2s3)
b) Show that H decreases as the network evolves towards the stored patterns
2. Prove Proposition 2
Hopfield networks
Spurious states (continued)
We have not shown that the explicitly stored memories are the only
attractors of the system
We have seen that the reversed patterns −ξ are also stored (they have the same energy as ξ)
There are stable mixture states, linear combinations of an odd number of patterns, e.g.:
$$\xi^{(\mathrm{mix})}_i = \mathrm{sgn}\!\left(\sum_{k=1}^{\alpha} \pm\,\xi_{v_k,\,i}\right), \quad 1 \le i \le n, \ \alpha \text{ odd}$$
Hopfield networks
Dynamics (continued)
Let us change the dynamics of a neuron by using an activation with memory:
$$s_i(t+1) = \begin{cases} +1 & \text{if } h_i(t+1) > 0 \\ s_i(t) & \text{if } h_i(t+1) = 0 \\ -1 & \text{if } h_i(t+1) < 0 \end{cases} \qquad \text{where } h_i(t+1) = \sum_{j=1}^{n} w_{ij}\,s_j(t)$$
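In code, the only change is the zero-field case; a sketch:

```python
import numpy as np

def update_with_memory(s_prev, W, i):
    # s_i(t+1) = +1 if h > 0, s_i(t) if h = 0, -1 if h < 0,
    # where h_i(t+1) = sum_j w_ij s_j(t)
    h = W[i] @ np.asarray(s_prev)
    return s_prev[i] if h == 0 else (1 if h > 0 else -1)
```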
Hopfield networks
Application to optimization problems
Problem: find the optimum (or something close to it) of a scalar function subject to a set of constraints
e.g. the traveling-salesman problem (TSP) or graph bipartitioning (GBP); often we deal with NP-complete problems.
TSP(n) = tour n cities with the minimum total distance; TSP(60) ≈ 6.93 · 10^79 possible valid tours. In general, TSP(n) = n!/(2n).
GBP(n) = given a graph with an even number n of nodes, partition it into two subgraphs of n/2 nodes each such that the number of crossing edges is minimum
Idea: set up the problem as a function to be minimized, choosing the weights so that the obtained function is the energy function of a Hopfield network
Hopfield networks
Application to graph bipartitioning
Representation: n graph nodes = n neurons, with
$$s_i = \begin{cases} +1 & : \text{ node } i \text{ in the left part} \\ -1 & : \text{ node } i \text{ in the right part} \end{cases}$$
Connectivity matrix: cij = 1 indicates a connection between nodes i and j.
Hopfield networks
Application to graph bipartitioning
$$H(S) = -\underbrace{\sum_{i=1}^{n}\sum_{j>i}^{n} c_{ij}\,S_i\,S_j}_{\text{connections between parts}} \;+\; \alpha\,\underbrace{\left(\sum_{i=1}^{n} S_i\right)^{\!2}}_{\text{bipartition}}$$

$$\Longrightarrow\; H(S) = -\sum_{i=1}^{n}\sum_{j>i}^{n} (c_{ij} - \alpha)\,S_i\,S_j \;\equiv\; -\sum_{i=1}^{n}\sum_{j>i}^{n} w_{ij}\,S_i\,S_j + \sum_{i=1}^{n} \theta_i\,S_i$$

$$\Longrightarrow\; w_{ij} = c_{ij} - \alpha, \qquad \theta_i = 0$$
(general idea: wij is the coefficient of SiSj in the energy function)
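A sketch of the GBP setup; the example graph and the value of α are illustrative assumptions, and recall can then be run with any of the asynchronous update loops above:

```python
import numpy as np

def gbp_weights(C, alpha):
    # w_ij = c_ij - alpha for i != j, theta_i = 0
    W = C - alpha
    np.fill_diagonal(W, 0.0)
    return W

# toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3)
C = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    C[i, j] = C[j, i] = 1.0
W = gbp_weights(C, alpha=0.5)   # alpha trades cut size against balance
```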
Hopfield networks
Application to the traveling-salesman problem
Representation: 1 city −→ n neurons, coding for position; hence n cities = n² neurons.
S_{n×n} = (Sij) (network states); D_{n×n} (distances between cities, given), with Sij = +1 if city i is visited in position j and −1 otherwise.
Restrictions on S:
1. A city cannot be visited more than once: $\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k>j}^{n} S_{ij}\,S_{ik} = 0$
−→ every row i must have only one +1: R1(S)
2. A city can only be visited in sequence (no gift of ubiquity!)
−→ every column j must have only one +1: R1(S^t)
3. Every city must be visited: $\left(\sum_{i=1}^{n}\sum_{j=1}^{n} S_{ij} - n\right)^{\!2} = 0$
−→ there must be n +1s overall: R2(S)
Hopfield networks
Application to the traveling-salesman problem
4. The tour has to be the shortest possible: $\sum_{k=1}^{n}\sum_{i=1}^{n}\sum_{j>i}^{n} D_{ij}\,S_{ik}\,(S_{j,k-1} + S_{j,k+1})$ is the total sum of distances in the tour: R3(S, D)

$$H(S) = -\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k>i}^{n}\sum_{l>j}^{n} w_{ij,kl}\,S_{ij}\,S_{kl} + \sum_{i=1}^{n}\sum_{j>i}^{n} \theta_{ij}\,S_{ij} \;\equiv\; \alpha R1(S) + \beta R1(S^t) + \gamma R2(S) + \varepsilon R3(S, D)$$

$$\Longrightarrow\; w_{ij,kl} = -\alpha\,\delta_{ik}(1 - \delta_{jl}) - \beta\,\delta_{jl}(1 - \delta_{ik}) - \gamma - \varepsilon\,D_{ik}\,(\delta_{j,l+1} + \delta_{j,l-1}) \quad \text{and} \quad \theta_{ij} = -\gamma n$$

This is a continuous Hopfield network: Sij ∈ (−1, 1), since Sij := tanh χ(hij − θij)
Problem: set up good values for α, β, γ, ε > 0, χ > 0
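A sketch of the weight tensor w_{ij,kl} from the formula above. Positions are taken modulo n so the tour closes; that wrap-around, like all names here, is an assumption of the sketch:

```python
import numpy as np

def tsp_weights(D, alpha, beta, gamma, eps):
    # w_{ij,kl} = -alpha d_ik (1 - d_jl) - beta d_jl (1 - d_ik)
    #             - gamma - eps D_ik (d_{j,l+1} + d_{j,l-1});  theta_ij = -gamma * n
    n = D.shape[0]
    W = np.zeros((n, n, n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    d_ik = float(i == k)
                    d_jl = float(j == l)
                    ring = float(j == (l + 1) % n) + float(j == (l - 1) % n)
                    W[i, j, k, l] = (-alpha * d_ik * (1 - d_jl)
                                     - beta * d_jl * (1 - d_ik)
                                     - gamma - eps * D[i, k] * ring)
    theta = -gamma * n * np.ones((n, n))
    return W, theta
```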
Hopfield networks
Application to the traveling-salesman problem
Example for n = 5: 120 valid tours (12 different), 25 neurons
2^25 potential attractors; actual number of them: 1,740 (among them, the 120 valid ones) ⇒ 1,740 − 120 = 1,620 spurious attractors (invalid tours)
The valid tours correspond to the states of lower energy (correct design)
But ... we have a network with 13.5 times more invalid attractors than valid ones (1,620 vs. 120): a random initialization will end up in an invalid tour with probability 0.966
Moreover, we aim for the best tour! A possible workaround is to use stochastic units with simulated annealing