Chapter 5 Unsupervised learning 



Introduction

• Unsupervised learning
  – Training samples contain only input patterns
    • No desired output is given (teacher-less)
  – Learn to form classes/clusters of sample patterns according to similarities among them
    • Patterns in a cluster would have similar features
    • No prior knowledge as to what features are important for classification, or how many classes there are


Introduction

• NN models to be covered
  – Competitive networks and competitive learning
    • Winner-takes-all (WTA)
    • Maxnet
    • Hamming net
  – Counterpropagation nets
  – Adaptive Resonance Theory
  – Self-organizing map (SOM)
• Applications
  – Clustering
  – Vector quantization
  – Feature extraction
  – Dimensionality reduction
  – Optimization


NN Based on Competition

• Competition is important for NN
  – Competition between neurons has been observed in biological nervous systems
  – Competition is important in solving many problems
• To classify an input pattern into one of the m classes
  – ideal case: one class node has output 1, all others 0
  – often more than one class node has non-zero output
  – If these class nodes compete with each other, maybe only one will win eventually and all others will lose (winner-takes-all). The winner represents the computed classification of the input

[Figure: input nodes x_1 … x_n (INPUT) fully connected to class nodes C_1 … C_m (CLASSIFICATION)]


• Winner-takes-all (WTA):
  – Among all competing nodes, only one will win and all others will lose
  – We mainly deal with single-winner WTA, but multiple-winner WTA is possible (and useful in some applications)
  – Easiest way to realize WTA: have an external, central arbitrator (a program) decide the winner by comparing the current outputs of the competitors, breaking ties arbitrarily (see the sketch below)
  – This is biologically unsound (no such external arbitrator exists in biological nervous systems)
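A minimal sketch of such an external arbitrator (the function name and the random tie-breaking rule are illustrative choices, not from the slides):

```python
import random

def external_arbitrator(outputs):
    """Pick the index of the winning node by comparing the current outputs.

    Ties are broken arbitrarily (here: uniformly at random). The returned
    index is the winner; all other nodes are treated as losers.
    """
    best = max(outputs)
    candidates = [k for k, o in enumerate(outputs) if o == best]
    return random.choice(candidates)

print(external_arbitrator([0.5, 0.9, 1.0, 0.9]))  # -> 2 (node with the largest output)
```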


• Ways to realize competition in NN
  – Lateral inhibition (Maxnet, Mexican hat): the output of each node feeds to the others through inhibitory connections (with negative weights)
  – Resource competition
    • the output of node k is distributed to nodes i and j in proportion to w_ik and w_jk, as well as to x_i and x_j
    • self decay
    • biologically sound

[Figures: lateral inhibition between x_i and x_j through mutual weights w_ij, w_ji < 0; resource competition, where node x_k feeds x_i and x_j through w_ik and w_jk and each node has a self-decay weight w_ii, w_jj]


Fixed-weight Competitive Nets

• Maxnet – lateral inhibition between competitors
  – weights:  w_ij = θ if i = j,  w_ij = -ε otherwise
  – node function:  f(x) = x if x > 0,  f(x) = 0 otherwise
  – Notes:
    • Competition: iterative process until the net stabilizes (at most one node with positive activation)
    • 0 < ε ≤ 1/m, where m is the # of competitors
    • ε too small: takes too long to converge
    • ε too big: may suppress the entire network (no winner)


 Fixed-weight Competitive Nets

• Example: θ = 1, ε = 1/5 = 0.2
  x(0) = (0.5  0.9    1      0.9    0.9  )   initial input
  x(1) = (0    0.24   0.36   0.24   0.24 )
  x(2) = (0    0.072  0.216  0.072  0.072)
  x(3) = (0    0      0.1728 0      0    )
  x(4) = (0    0      0.1728 0      0    ) = x(3)   stabilized
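A minimal sketch of this iteration, assuming the usual Maxnet update x_j(t+1) = f(x_j(t) − ε Σ_{k≠j} x_k(t)) with θ = 1; run on the input above, it reproduces the sequence x(1)…x(3) up to floating-point rounding:

```python
def maxnet(x, eps=0.2, max_iter=100):
    """Iterate the Maxnet lateral-inhibition update until the activations stop changing."""
    f = lambda v: v if v > 0 else 0.0                        # node function: keep only the positive part
    for t in range(max_iter):
        total = sum(x)
        new_x = [f(xi - eps * (total - xi)) for xi in x]     # self-weight θ = 1, all others -ε
        if max(abs(a - b) for a, b in zip(new_x, x)) < 1e-12:
            break                                            # stabilized: at most one positive node left
        x = new_x
        print(f"x({t + 1}) = {[round(v, 4) for v in x]}")
    return x

maxnet([0.5, 0.9, 1.0, 0.9, 0.9])
# prints x(1) = [0.0, 0.24, 0.36, 0.24, 0.24], x(2) = [0.0, 0.072, 0.216, 0.072, 0.072],
# x(3) = [0.0, 0.0, 0.1728, 0.0, 0.0]: only the node with the largest initial input survives
```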


Mexican Hat

• Architecture: for a given node,
  – close neighbors: cooperative (mutually excitatory, w > 0)
  – farther-away neighbors: competitive (mutually inhibitory, w < 0)
  – too-far-away neighbors: irrelevant (w = 0)
• Need a definition of distance (neighborhood):
  – one dimensional: ordering by index (1, 2, …, n)
  – two dimensional: lattice


• Weights:
  w_ij = c_1 > 0   if distance(i, j) ≤ k_1          (close neighbors)
  w_ij = c_2 < 0   if k_1 < distance(i, j) ≤ k_2    (farther-away neighbors)
  w_ij = 0         if distance(i, j) > k_2          (too-far-away neighbors)
• Activation function (ramp function):
  f(x) = 0       if x ≤ 0
  f(x) = x       if 0 < x ≤ x_max
  f(x) = x_max   if x > x_max


• Equilibrium:
  – negative input = positive input for all nodes
  – the winner has the highest activation;
  – its cooperative neighbors also have positive activation;
  – its competitive neighbors have negative (or zero) activations.

  – example:
    x(0) = (0.0, 0.5,  0.8,  1.0,  0.8,  0.5,  0.0)
    x(1) = (0.0, 0.38, 1.06, 1.16, 1.06, 0.38, 0.0)
    x(2) = (0.0, 0.39, 1.14, 1.66, 1.14, 0.39, 0.0)
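A minimal sketch of this update under assumed parameters k_1 = 1, k_2 = 2, c_1 = 0.6, c_2 = -0.4, x_max = 2.0 (these constants are illustrative; the slide does not state them). Starting from x(0) above, it gives activations very close to x(1) and x(2); small differences can come from rounding in the original:

```python
def mexican_hat_step(x, k1=1, k2=2, c1=0.6, c2=-0.4, x_max=2.0):
    """One update of a 1-D Mexican Hat net: excitatory close neighbors,
    inhibitory farther neighbors, ramp activation function."""
    ramp = lambda v: min(max(v, 0.0), x_max)
    n = len(x)
    new_x = []
    for i in range(n):
        s = 0.0
        for j in range(n):
            d = abs(i - j)
            if d <= k1:
                s += c1 * x[j]          # close neighbors (including self): w = c1 > 0
            elif d <= k2:
                s += c2 * x[j]          # farther-away neighbors: w = c2 < 0
        new_x.append(ramp(s))           # too-far-away neighbors contribute nothing (w = 0)
    return new_x

x = [0.0, 0.5, 0.8, 1.0, 0.8, 0.5, 0.0]
for _ in range(2):
    x = mexican_hat_step(x)
    print([round(v, 2) for v in x])
# [0.0, 0.38, 1.06, 1.16, 1.06, 0.38, 0.0]
# [0.0, 0.4, 1.14, 1.66, 1.14, 0.4, 0.0]
```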


Hamming Network 

• Hamming distance of two vectors x and y of dimension n:
  – Number of bits in disagreement
  – In bipolar representation:
      x · y = Σ_i x_i y_i = a − d
      where: a is the number of bits in agreement in x and y
             d is the number of bits differing in x and y
      a + d = n, so x · y = a − d = 2a − n = n − 2d
      hamming distance  d(x, y) = 0.5 (n − x · y)
      the (negative) distance between x and y can be determined by
      −d(x, y) = 0.5 (x · y − n) = 0.5 x · y − 0.5 n
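A quick numerical check of these relations; the two bipolar vectors below are arbitrary illustrative choices:

```python
x = (1, -1, 1, 1, -1)
y = (1, 1, -1, 1, -1)

n = len(x)
dot = sum(a * b for a, b in zip(x, y))         # x . y = a - d
d_direct = sum(a != b for a, b in zip(x, y))   # count the bits in disagreement directly
d_formula = 0.5 * (n - dot)                    # d = 0.5 (n - x . y)
print(dot, d_direct, d_formula)                # 1 2 2.0
```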


Hamming Network 

• Hamming network: the net computes −d between an input vector i and each of the P stored vectors i_1, …, i_P of dimension n
  – n input nodes, P output nodes, one for each stored vector i_p, whose output = −d(i, i_p)
  – Weights and bias:
      W = 0.5 (i_1, i_2, …, i_P)^T ,   b = −0.5 (n, n, …, n)^T
      (the k-th row of W is 0.5 i_k^T; every bias entry is −0.5 n)
  – Output of the net:
      o = W i + b ,   i.e.   o_k = 0.5 i_k^T i − 0.5 n = 0.5 (i_k^T i − n)
      where 0.5 (i_k^T i − n) is the negative distance between i and i_k


• Example:
  – Three stored bipolar vectors i_1, i_2, i_3 of dimension n = 5
  – Input vector i (bipolar, dimension 5)
  – Distance: (4, 3, 2)
  – Output vector:
      o_1 = 0.5 (i_1^T i − 5) = 0.5 (−3 − 5) = −4
      o_2 = 0.5 (i_2^T i − 5) = 0.5 (−1 − 5) = −3
      o_3 = 0.5 (i_3^T i − 5) = 0.5 ( 1 − 5) = −2

  – If we want the vector with the smallest distance to i to win, put a Maxnet on top of the Hamming net (for WTA)
• We then have an associative memory: an input pattern recalls the stored vector that is closest to it (more on AM later)
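A compact, self-contained sketch of this recall pipeline. The stored exemplars and the input below are illustrative (not the vectors from the example above), and a plain argmax stands in for the Maxnet layer:

```python
def hamming_net_outputs(stored, x):
    """o_k = 0.5 * (i_k . x - n), i.e. the negative Hamming distance, for bipolar vectors."""
    n = len(x)
    return [0.5 * (sum(a * b for a, b in zip(i_k, x)) - n) for i_k in stored]

stored = [(1, -1, -1, -1, 1),    # illustrative exemplars i_1, i_2, i_3
          (-1, 1, 1, -1, -1),
          (1, 1, 1, -1, 1)]
x = (1, 1, 1, 1, 1)              # illustrative input

o = hamming_net_outputs(stored, x)
print(o)                          # [-3.0, -3.0, -1.0]: negative Hamming distances

# WTA on top (a Maxnet would do the same job iteratively):
winner = max(range(len(o)), key=lambda k: o[k])
print("recalled stored vector:", stored[winner])   # the exemplar closest to x
```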


Simple Competitive Learning

• Unsupervised learning
• Goal:
  – Learn to form classes/clusters of exemplars/sample patterns according to similarities among these exemplars
  – Patterns in a cluster would have similar features
  – No prior knowledge as to what features are important for classification, or how many classes there are
• Architecture:
  – Output nodes: Y_1, …, Y_m, representing the m classes
  – They are competitors (WTA realized either by an external procedure or by lateral inhibition as in Maxnet)


• Training:
  – Train the network such that the weight vector w_j associated with the jth output node becomes the representative vector of a class of input patterns
  – Initially all weights are randomly assigned
  – Two-phase unsupervised learning
    • competing phase:
      – apply an input vector i_l randomly chosen from the sample set
      – compute output for all output nodes:  o_j = i_l · w_j
      – determine the winner j* among all output nodes (the winner is not given in the training samples, so this is unsupervised)
    • rewarding phase:
      – the winner j* is rewarded by updating its weights w_j* to be closer to i_l (weights associated with all other output nodes are not updated: a kind of WTA)
    • repeat the two phases many times (gradually reducing the learning rate) until all weights are stabilized


• Weight update:
  – Method 1:  Δw_j = η (i_l − w_j)        Method 2:  Δw_j = η i_l
    In each method, w_j is moved closer to i_l
  – Normalize the weight vector to unit length after it is updated:  w_j = w_j / ||w_j||
  – Sample input vectors are also normalized:  i_l = i_l / ||i_l||
  – Distance:  ||i_l − w_j||^2 = Σ_i (i_{l,i} − w_{j,i})^2
[Figure: geometric view of the two updates. Method 1 moves w_j along i_l − w_j to w_j + η(i_l − w_j); Method 2 adds η i_l to give w_j + η i_l]
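A minimal sketch of the two-phase procedure above, using Method 1 updates with unit-length normalization; the data set, the number of output nodes m, and the learning-rate schedule are illustrative assumptions:

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def competitive_learning(samples, m, eta=0.5, epochs=50):
    """Simple competitive learning: WTA competition plus Method-1 weight updates."""
    dim = len(samples[0])
    samples = [normalize(s) for s in samples]                       # normalize sample inputs
    w = [normalize([random.random() for _ in range(dim)]) for _ in range(m)]
    for _ in range(epochs):
        random.shuffle(samples)
        for i_l in samples:
            # competing phase: o_j = i_l . w_j, winner j* = argmax_j o_j
            o = [sum(a * b for a, b in zip(i_l, w_j)) for w_j in w]
            j = o.index(max(o))
            # rewarding phase: move only the winner toward i_l, then renormalize it
            w[j] = normalize([wx + eta * (ix - wx) for wx, ix in zip(w[j], i_l)])
        eta *= 0.95                                                 # slowly reduce the learning rate
    return w

# Tiny illustrative run: two obvious clusters in 2-D, m = 2 output nodes.
data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(competitive_learning(data, m=2))
```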


• w_j is moving to the center of a cluster of sample vectors after repeated weight updates
  – Node j wins for three training samples: i_1, i_2 and i_3
  – Initial weight vector: w_j(0)
  – After being successively trained by i_1, i_2 and i_3, the weight vector changes to w_j(1), w_j(2), and w_j(3)
[Figure: samples i_1, i_2, i_3 and the trajectory of the weight vector w_j(0) → w_j(1) → w_j(2) → w_j(3)]


Examples

• A simple example of competitive learning (pp. 168-170)
  – 6 vectors of dimension 3 in 3 classes (6 input nodes, 3 output nodes)
  – Weight matrices:
    Node A: for class {i2, i4, i5}
    Node B: for class {i3}
    Node C: for class {i1, i6}


Comments

1. Ideally, when learning stops, each w_j is close to the centroid of a group/cluster of sample input vectors.
2. To stabilize w_j, the learning rate may be reduced slowly toward zero during learning, e.g., η(t+1) ≤ η(t)
3. # of output nodes:
  – too few: several clusters may be combined into one class
  – too many: over-classification
  – ART model (later) allows dynamic add/remove of output nodes
4. Initial w_j:
  – learning results depend on initial weights (node positions)
  – initialize to training samples known to be in distinct classes, provided such info is available
  – random (bad choices may cause anomaly)
5. Results also depend on the sequence of sample presentation


Example

• w_1 will always win no matter which class the sample is from
• w_2 is stuck and will not participate in learning
• To unstick it:
  – let output nodes have some conscience
  – temporarily shut off nodes which have had a very high winning rate (hard to determine what rate should be considered "very high")
[Figure: w_1 lies among the sample clusters while w_2 lies far away from all of them]
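One simple way to give nodes such a conscience, sketched here as an illustration rather than a method from the slides: keep a win count per node and temporarily exclude any node whose winning rate exceeds a threshold (the 0.8 threshold is an arbitrary choice, echoing the slide's caveat that "very high" is hard to pin down):

```python
def winner_with_conscience(outputs, win_counts, total_wins, max_rate=0.8):
    """Pick the winner, skipping nodes whose winning rate already exceeds max_rate,
    then record the win."""
    eligible = [j for j in range(len(outputs))
                if total_wins == 0 or win_counts[j] / total_wins <= max_rate]
    if not eligible:                              # everyone is over the limit: fall back to all nodes
        eligible = list(range(len(outputs)))
    j = max(eligible, key=lambda k: outputs[k])
    win_counts[j] += 1
    return j

counts = [0, 0]
for o in ([0.9, 0.2], [0.8, 0.3], [0.7, 0.1], [0.9, 0.4], [0.6, 0.5]):
    print(winner_with_conscience(o, counts, sum(counts)))
# prints 0, 1, 0, 0, 0: node 1 gets a win as soon as node 0's winning rate exceeds 0.8
```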


Example

• Results depend on the sequence of sample presentation
[Figure: w_1 and w_2 after training with one particular presentation sequence]
• Solution: initialize w_j to randomly selected input vectors i_l that are far away from each other (see the sketch below)
[Figure: w_1 and w_2 initialized far apart]
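A small sketch of one way to do this, using a greedy "pick the sample farthest from those already chosen" rule (the rule itself is an illustrative choice; the slide only asks for initial vectors that are far apart):

```python
import random

def far_apart_init(samples, m):
    """Choose m initial weight vectors from the samples, greedily maximizing
    the distance to the vectors chosen so far."""
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    chosen = [random.choice(samples)]
    while len(chosen) < m:
        # pick the sample whose nearest already-chosen vector is farthest away
        nxt = max(samples, key=lambda s: min(dist2(s, c) for c in chosen))
        chosen.append(nxt)
    return chosen

data = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(far_apart_init(data, m=2))   # e.g. [[0.9, 0.2], [0.1, 1.0]]
```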