Chapter 5 Unsupervised learning 



Introduction

• Unsupervised learning
  – Training samples contain only input patterns
    • No desired output is given (teacher-less)
  – Learn to form classes/clusters of sample patterns according to similarities among them
    • Patterns in a cluster would have similar features
    • No prior knowledge as to what features are important for classification, or how many classes there are


Introduction

• NN models to be covered
  – Competitive networks and competitive learning
    • Winner-takes-all (WTA)
    • Maxnet
    • Hamming net
  – Counterpropagation nets
  – Adaptive Resonance Theory
  – Self-organizing map (SOM)
• Applications
  – Clustering
  – Vector quantization
  – Feature extraction
  – Dimensionality reduction
  – Optimization


NN Based on Competition

• Competition is important for NN
  – Competition between neurons has been observed in biological nervous systems
  – Competition is important in solving many problems
• To classify an input pattern into one of the m classes
  – ideal case: one class node has output 1, all others 0
  – often more than one class node has non-zero output
  – If these class nodes compete with each other, maybe only one will win eventually and all others will lose (winner-takes-all). The winner represents the computed classification of the input

[Figure: input nodes x_1 … x_n (INPUT) fully connected to class nodes C_1 … C_m (CLASSIFICATION)]


• Winner-takes-all (WTA):
  – Among all competing nodes, only one will win and all others will lose
  – We mainly deal with single-winner WTA, but multiple-winner WTA is possible (and useful in some applications)
  – Easiest way to realize WTA: have an external, central arbitrator (a program) decide the winner by comparing the current outputs of the competitors, breaking ties arbitrarily (see the sketch below)
  – This is biologically unsound (no such external arbitrator exists in biological nervous systems)
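A minimal sketch of such an external arbitrator (the function name and the random tie-breaking rule are illustrative choices, not from the slides):

```python
import random

def external_arbitrator(outputs):
    """Pick the index of the winning node by comparing the current outputs.

    Ties are broken arbitrarily (here: uniformly at random). The returned
    index is the winner; all other nodes are treated as losers.
    """
    best = max(outputs)
    candidates = [k for k, o in enumerate(outputs) if o == best]
    return random.choice(candidates)

print(external_arbitrator([0.5, 0.9, 1.0, 0.9]))  # -> 2 (node with the largest output)
```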


• Ways to realize competition in NN
  – Lateral inhibition (Maxnet, Mexican hat): the output of each node feeds to the others through inhibitory connections (with negative weights)
  – Resource competition
    • the output of node k is distributed to nodes i and j in proportion to w_ik and w_jk, as well as to x_i and x_j
    • self decay
    • biologically sound

[Figures: lateral inhibition between x_i and x_j through mutual weights w_ij, w_ji < 0; resource competition, where node x_k feeds x_i and x_j through w_ik and w_jk and each node has a self-decay weight w_ii, w_jj]


Fixed-weight Competitive Nets

• Maxnet – lateral inhibition between competitors
  – weights:  w_ij = θ if i = j,  w_ij = -ε otherwise
  – node function:  f(x) = x if x > 0,  f(x) = 0 otherwise
  – Notes:
    • Competition: iterative process until the net stabilizes (at most one node with positive activation)
    • 0 < ε ≤ 1/m, where m is the # of competitors
    • ε too small: takes too long to converge
    • ε too big: may suppress the entire network (no winner)


 Fixed-weight Competitive Nets

• Example: θ = 1, ε = 1/5 = 0.2
  x(0) = (0.5  0.9    1      0.9    0.9  )   initial input
  x(1) = (0    0.24   0.36   0.24   0.24 )
  x(2) = (0    0.072  0.216  0.072  0.072)
  x(3) = (0    0      0.1728 0      0    )
  x(4) = (0    0      0.1728 0      0    ) = x(3)   stabilized
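A minimal sketch of this iteration, assuming the usual Maxnet update x_j(t+1) = f(x_j(t) − ε Σ_{k≠j} x_k(t)) with θ = 1; run on the input above, it reproduces the sequence x(1)…x(3) up to floating-point rounding:

```python
def maxnet(x, eps=0.2, max_iter=100):
    """Iterate the Maxnet lateral-inhibition update until the activations stop changing."""
    f = lambda v: v if v > 0 else 0.0                        # node function: keep only the positive part
    for t in range(max_iter):
        total = sum(x)
        new_x = [f(xi - eps * (total - xi)) for xi in x]     # self-weight θ = 1, all others -ε
        if max(abs(a - b) for a, b in zip(new_x, x)) < 1e-12:
            break                                            # stabilized: at most one positive node left
        x = new_x
        print(f"x({t + 1}) = {[round(v, 4) for v in x]}")
    return x

maxnet([0.5, 0.9, 1.0, 0.9, 0.9])
# prints x(1) = [0.0, 0.24, 0.36, 0.24, 0.24], x(2) = [0.0, 0.072, 0.216, 0.072, 0.072],
# x(3) = [0.0, 0.0, 0.1728, 0.0, 0.0]: only the node with the largest initial input survives
```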


Mexican Hat

• Architecture: for a given node,
  – close neighbors: cooperative (mutually excitatory, w > 0)
  – farther-away neighbors: competitive (mutually inhibitory, w < 0)
  – too-far-away neighbors: irrelevant (w = 0)
• Need a definition of distance (neighborhood):
  – one dimensional: ordering by index (1, 2, …, n)
  – two dimensional: lattice


• Weights:
  w_ij = c_1 > 0   if distance(i, j) ≤ k_1          (close neighbors)
  w_ij = c_2 < 0   if k_1 < distance(i, j) ≤ k_2    (farther-away neighbors)
  w_ij = 0         if distance(i, j) > k_2          (too-far-away neighbors)
• Activation function (ramp function):
  f(x) = 0       if x ≤ 0
  f(x) = x       if 0 < x ≤ x_max
  f(x) = x_max   if x > x_max


• Equilibrium:
  – negative input = positive input for all nodes
  – the winner has the highest activation;
  – its cooperative neighbors also have positive activation;
  – its competitive neighbors have negative (or zero) activations.

  – example:
    x(0) = (0.0, 0.5,  0.8,  1.0,  0.8,  0.5,  0.0)
    x(1) = (0.0, 0.38, 1.06, 1.16, 1.06, 0.38, 0.0)
    x(2) = (0.0, 0.39, 1.14, 1.66, 1.14, 0.39, 0.0)
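A minimal sketch of this update under assumed parameters k_1 = 1, k_2 = 2, c_1 = 0.6, c_2 = -0.4, x_max = 2.0 (these constants are illustrative; the slide does not state them). Starting from x(0) above, it gives activations very close to x(1) and x(2); small differences can come from rounding in the original:

```python
def mexican_hat_step(x, k1=1, k2=2, c1=0.6, c2=-0.4, x_max=2.0):
    """One update of a 1-D Mexican Hat net: excitatory close neighbors,
    inhibitory farther neighbors, ramp activation function."""
    ramp = lambda v: min(max(v, 0.0), x_max)
    n = len(x)
    new_x = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            d = abs(i - j)
            if d <= k1:
                s += c1 * x[j]          # close neighbors (including self): w = c1 > 0
            elif d <= k2:
                s += c2 * x[j]          # farther-away neighbors: w = c2 < 0
        new_x.append(ramp(s))           # too-far-away neighbors contribute nothing (w = 0)
    return new_x

x = [0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0]
for _ in range(2):
    x = mexican_hat_step(x)
    print([round(v, 2) for v in x])
# [0.0, 0.38, 1.06, 1.16, 1.06, 0.38, 0.0]
# [0.0, 0.4, 1.14, 1.66, 1.14, 0.4, 0.0]
```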


Hamming Network 

• Hamming distance of two vectors x and y of dimension n:
  – Number of bits in disagreement
  – In bipolar representation:
      x · y = Σ_i x_i y_i = a − d
      where: a is the number of bits in agreement in x and y
             d is the number of bits differing in x and y
      a + d = n, so x · y = a − d = 2a − n = n − 2d
      hamming distance  d(x, y) = 0.5 (n − x · y)
      the (negative) distance between x and y can be determined by
      −d(x, y) = 0.5 (x · y − n) = 0.5 x · y − 0.5 n
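A quick numerical check of these relations; the two bipolar vectors below are arbitrary illustrative choices:

```python
x = (1, -1, 1, 1, -1)
y = (1, 1, -1, 1, -1)

n = len(x)
dot = sum(a * b for a, b in zip(x, y))         # x . y = a - d
d_direct = sum(a != b for a, b in zip(x, y))   # count the bits in disagreement directly
d_formula = 0.5 * (n - dot)                    # d = 0.5 (n - x . y)
print(dot, d_direct, d_formula)                # 1 2 2.0
```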


Hamming Network 

• Hamming network: the net computes −d between an input vector i and each of the P stored vectors i_1, …, i_P of dimension n
  – n input nodes, P output nodes, one for each stored vector i_p, whose output = −d(i, i_p)
  – Weights and bias:
      W = 0.5 (i_1, i_2, …, i_P)^T ,   b = −0.5 (n, n, …, n)^T
      (the k-th row of W is 0.5 i_k^T; every bias entry is −0.5 n)
  – Output of the net:
      o = W i + b ,   i.e.   o_k = 0.5 i_k^T i − 0.5 n = 0.5 (i_k^T i − n)
      where 0.5 (i_k^T i − n) is the negative distance between i and i_k


• Example:
  – Three stored bipolar vectors i_1, i_2, i_3 of dimension n = 5
  – Input vector i (bipolar, dimension 5)
  – Distance: (4, 3, 2)
  – Output vector:
      o_1 = 0.5 (i_1^T i − 5) = 0.5 (−3 − 5) = −4
      o_2 = 0.5 (i_2^T i − 5) = 0.5 (−1 − 5) = −3
      o_3 = 0.5 (i_3^T i − 5) = 0.5 ( 1 − 5) = −2

  – If we want the vector with the smallest distance to i to win, put a Maxnet on top of the Hamming net (for WTA)
• We then have an associative memory: an input pattern recalls the stored vector that is closest to it (more on AM later)
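A compact, self-contained sketch of this recall pipeline. The stored exemplars and the input below are illustrative (not the vectors from the example above), and a plain argmax stands in for the Maxnet layer:

```python
def hamming_net_outputs(stored, x):
    """o_k = 0.5 * (i_k . x - n), i.e. the negative Hamming distance, for bipolar vectors."""
    n = len(x)
    return [0.5 * (sum(a * b for a, b in zip(i_k, x)) - n) for i_k in stored]

stored = [(1, -1, -1, -1, 1),    # illustrative exemplars i_1, i_2, i_3
          (-1, 1, 1, -1, -1),
          (1, 1, 1, -1, 1)]
x = (1, 1, 1, 1, 1)              # illustrative input

o = hamming_net_outputs(stored, x)
print(o)                          # [-3.0, -3.0, -1.0]: negative Hamming distances

# WTA on top (a Maxnet would do the same job iteratively):
winner = max(range(len(o)), key=lambda k: o[k])
print("recalled stored vector:", stored[winner])   # the exemplar closest to x
```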


Simple Competitive Learning

• Unsupervised learning
• Goal:
  – Learn to form classes/clusters of exemplars/sample patterns according to similarities among these exemplars
  – Patterns in a cluster would have similar features
  – No prior knowledge as to what features are important for classification, or how many classes there are
• Architecture:
  – Output nodes: Y_1, …, Y_m, representing the m classes
  – They are competitors (WTA realized either by an external procedure or by lateral inhibition as in Maxnet)


• Training:
  – Train the network such that the weight vector w_j associated with the jth output node becomes the representative vector of a class of input patterns
  – Initially all weights are randomly assigned
  – Two-phase unsupervised learning
    • competing phase:
      – apply an input vector i_l randomly chosen from the sample set
      – compute output for all output nodes:  o_j = i_l · w_j
      – determine the winner j* among all output nodes (the winner is not given in the training samples, so this is unsupervised)
    • rewarding phase:
      – the winner j* is rewarded by updating its weights w_j* to be closer to i_l (weights associated with all other output nodes are not updated: a kind of WTA)
    • repeat the two phases many times (gradually reducing the learning rate) until all weights are stabilized


• Weight update:
  – Method 1:  Δw_j = η (i_l − w_j)        Method 2:  Δw_j = η i_l
    In each method, w_j is moved closer to i_l
  – Normalize the weight vector to unit length after it is updated:  w_j = w_j / ||w_j||
  – Sample input vectors are also normalized:  i_l = i_l / ||i_l||
  – Distance:  ||i_l − w_j||^2 = Σ_i (i_{l,i} − w_{j,i})^2
[Figure: geometric view of the two updates. Method 1 moves w_j along i_l − w_j to w_j + η(i_l − w_j); Method 2 adds η i_l to give w_j + η i_l]
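A minimal sketch of the two-phase procedure above, using Method 1 updates with unit-length normalization; the data set, the number of output nodes m, and the learning-rate schedule are illustrative assumptions:

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def competitive_learning(samples, m, eta=0.5, epochs=50):
    """Simple competitive learning: WTA competition plus Method-1 weight updates."""
    dim = len(samples[0])
    samples = [normalize(s) for s in samples]                       # normalize sample inputs
    w = [normalize([random.random() for _ in range(dim)]) for _ in range(m)]
    for _ in range(epochs):
        random.shuffle(samples)
        for i_l in samples:
            # competing phase: o_j = i_l . w_j, winner j* = argmax_j o_j
            o = [sum(a * b for a, b in zip(i_l, w_j)) for w_j in w]
            j = o.index(max(o))
            # rewarding phase: move only the winner toward i_l, then renormalize it
            w[j] = normalize([wx + eta * (ix - wx) for wx, ix in zip(w[j], i_l)])
        eta *= 0.95                                                 # slowly reduce the learning rate
    return w

# Tiny illustrative run: two obvious clusters in 2-D, m = 2 output nodes.
data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(competitive_learning(data, m=2))
```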


• w_j is moving to the center of a cluster of sample vectors after repeated weight updates
  – Node j wins for three training samples: i_1, i_2 and i_3
  – Initial weight vector: w_j(0)
  – After being successively trained by i_1, i_2 and i_3, the weight vector changes to w_j(1), w_j(2), and w_j(3)
[Figure: samples i_1, i_2, i_3 and the trajectory of the weight vector w_j(0) → w_j(1) → w_j(2) → w_j(3)]


Examples

• A simple example of competitive learning (pp. 168-170)
  – 6 vectors of dimension 3 in 3 classes (6 input nodes, 3 output nodes)
  – Weight matrices:
    Node A: for class {i2, i4, i5}
    Node B: for class {i3}
    Node C: for class {i1, i6}


Comments

1. Ideally, when learning stops, each w_j is close to the centroid of a group/cluster of sample input vectors.
2. To stabilize w_j, the learning rate may be reduced slowly toward zero during learning, e.g., η(t+1) ≤ η(t)
3. # of output nodes:
  – too few: several clusters may be combined into one class
  – too many: over-classification
  – ART model (later) allows dynamic add/remove of output nodes
4. Initial w_j:
  – learning results depend on initial weights (node positions)
  – initialize to training samples known to be in distinct classes, provided such info is available
  – random (bad choices may cause anomaly)
5. Results also depend on the sequence of sample presentation


Example

• w_1 will always win no matter which class the sample is from
• w_2 is stuck and will not participate in learning
• To unstick it:
  – let output nodes have some conscience
  – temporarily shut off nodes which have had a very high winning rate (hard to determine what rate should be considered "very high")
[Figure: w_1 lies among the sample clusters while w_2 lies far away from all of them]
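One simple way to give nodes such a conscience, sketched here as an illustration rather than a method from the slides: keep a win count per node and temporarily exclude any node whose winning rate exceeds a threshold (the 0.8 threshold is an arbitrary choice, echoing the slide's caveat that "very high" is hard to pin down):

```python
def winner_with_conscience(outputs, win_counts, total_wins, max_rate=0.8):
    """Pick the winner, skipping nodes whose winning rate already exceeds max_rate,
    then record the win."""
    eligible = [j for j in range(len(outputs))
                if total_wins == 0 or win_counts[j] / total_wins <= max_rate]
    if not eligible:                              # everyone is over the limit: fall back to all nodes
        eligible = list(range(len(outputs)))
    j = max(eligible, key=lambda k: outputs[k])
    win_counts[j] += 1
    return j

counts = [0, 0]
for o in ([0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.9, 0.4], [0.6, 0.5]):
    print(winner_with_conscience(o, counts, sum(counts)))
# prints 0, 1, 0, 0, 0: node 1 gets a win as soon as node 0's winning rate exceeds 0.8
```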


Example

• Results depend on the sequence of sample presentation
[Figure: w_1 and w_2 after training with one particular presentation sequence]
• Solution: initialize w_j to randomly selected input vectors i_l that are far away from each other (see the sketch below)
[Figure: w_1 and w_2 initialized far apart]
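A small sketch of one way to do this, using a greedy "pick the sample farthest from those already chosen" rule (the rule itself is an illustrative choice; the slide only asks for initial vectors that are far apart):

```python
import random

def far_apart_init(samples, m):
    """Choose m initial weight vectors from the samples, greedily maximizing
    the distance to the vectors chosen so far."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    chosen = [random.choice(samples)]
    while len(chosen) < m:
        # pick the sample whose nearest already-chosen vector is farthest away
        nxt = max(samples, key=lambda s: min(dist2(s, c) for c in chosen))
        chosen.append(nxt)
    return chosen

data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(far_apart_init(data, m=2))   # e.g. [[0.9, 0.2], [0.1, 1.0]]
```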