Page 1

Project reminder

• Deadline: Monday 16. 5. 11:00

• Prepare a 10-minute presentation (in Czech/Slovak), which you’ll present on Wednesday 18. 5. 2011 during the Data mining lecture/exercise.

Page 2

Self-Organizing Map (SOM)

Page 3

• Unsupervised neural network, equivalent to clustering.

• Two layers – input and output
  – The input layer represents the input variables.
  – The output layer: neurons arranged in a single line (one-dimensional) or in a two-dimensional grid.
• Main feature – weights

Page 4

• Learning means adapting the weights.
• Each output neuron receives the inputs through the weights – the weight vector has the same dimensionality as the input vector.
• The output of each neuron is its activation – the weighted sum of its inputs (i.e. a linear activation function).

[Figure: one output neuron with two inputs x1, x2 connected through weights w11, w21]

$u = x_1 w_{11} + x_2 w_{21}$
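To make the activation concrete, here is a minimal NumPy sketch of the two-input example above (variable names such as `W` and `u` are illustrative, not from the slides):

```python
import numpy as np

# Input vector with two components x1, x2
x = np.array([0.6, 0.8])

# Weight matrix: one row per input, one column per output neuron
# (w11, w21 are the weights from x1, x2 to output neuron 1)
W = np.array([[0.3, 0.7],   # w11, w12
              [0.5, 0.2]])  # w21, w22

# Activation of each output neuron: weighted sum of the inputs,
# i.e. u_j = sum_i w_ij * x_i (a plain dot product)
u = x @ W
print(u[0])  # 0.6*0.3 + 0.8*0.5 = 0.58
```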

Page 5

• The objective of learning: to project high-dimensional data onto the 1D or 2D grid of output neurons.

• Each neuron incrementally learns to represent a cluster of data.

• The weights are adjusted incrementally – the weights of neurons are called codebook vectors (codebooks).

Page 6

Competitive learning

• The so-called competitive learning (winner-takes-all).
• Competitive learning will be demonstrated on a simple 1D network with two inputs.

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 7

• First, the number of output neurons (i.e. clusters) must be selected.
  – Not always known in advance; make a reasonable estimate. It is better to use more – unused neurons can be eliminated later.
• Then initialize the weights.
  – e.g. to small random values
  – or randomly choose some input vectors and use their values for the weights.
• Then competitive learning can begin.

Page 8

• The activation of each output neuron is calculated as the weighted sum of its inputs.
• E.g. for output neuron 1, the activation is $u_1 = w_{11} x_1 + w_{21} x_2$. Generally,

$u_j = \sum_{i=1}^{n} w_{ij} x_i$

• The activation is the dot product of the input vector x and the weight vector wj.

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 9

• The dot product is not only $u_j = \sum_{i=1}^{n} w_{ij} x_i$, but also $u_j = |\mathbf{x}|\,|\mathbf{w}_j| \cos\theta$.
• If $|\mathbf{x}| = |\mathbf{w}_j| = 1$, then $u_j = \cos\theta$.
• The closer these two vectors are (i.e. the smaller θ is), the bigger $u_j$ is (cos 0 = 1).

[Figure: vectors x and w separated by angle θ]

Page 10

• Say it again, and loudly: the closer the weight and input vectors are, the bigger the neuron activation is. Dan na Hrad.

• A simple measure of closeness – the Euclidean distance between x and wj:

$d_j = \|\mathbf{x} - \mathbf{w}_j\| = \sqrt{\sum_{i=1}^{n} (x_i - w_{ij})^2}$

Page 11

• Scale the input vector so that its length is equal to one: |x| = 1.
• An input is presented to the network.
• Scale the weight vectors of the individual output neurons to unit length: |w| = 1.
• Calculate how close the input vector x is to each weight vector wj (j = 1 … number of output neurons).
• The neuron whose codebook is closest to the input vector becomes the winner (BMU, Best Matching Unit).
• Its weights will be updated.
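A minimal sketch of the BMU search described above, assuming the inputs and codebook vectors are stored as NumPy arrays (the function name `find_bmu` is illustrative):

```python
import numpy as np

def find_bmu(x, codebooks):
    """Index of the codebook vector closest to x (Euclidean distance),
    i.e. the Best Matching Unit."""
    distances = np.linalg.norm(codebooks - x, axis=1)
    return int(np.argmin(distances))

# Example: 3 output neurons with 2 inputs, roughly unit-length vectors
codebooks = np.array([[1.0, 0.0],
                      [0.6, 0.8],
                      [0.0, 1.0]])
x = np.array([0.8, 0.6])
print(find_bmu(x, codebooks))  # -> 1 (the closest codebook vector)
```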

Page 12

Weight update

• The weight vector w is updated so that it moves closer to the input x.

[Figure: w moves towards x by Δw along the difference vector d]

$\Delta\mathbf{w}_j = \beta\,\mathbf{d}_j = \beta(\mathbf{x} - \mathbf{w}_j)$

β – learning rate
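The update rule Δw = β(x − w) in code, as a sketch (β = 0.5 is just an example value):

```python
import numpy as np

def update_winner(w, x, beta=0.5):
    """Move the winner's weight vector a fraction beta closer to x."""
    return w + beta * (x - w)

w = np.array([0.2, 0.9])
x = np.array([0.8, 0.6])
print(update_winner(w, x))  # [0.5  0.75] -- halfway towards x for beta = 0.5
```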

Page 13

Recursive vs. batch learning

• Conceptually similar to online/batch learning.
• Recursive learning:
  – update the weights of the winning neuron after each presentation of an input vector
• Batch learning:
  – the weight update for each input vector is noted
  – the average weight adjustment for each output neuron is applied after the whole epoch
• When to terminate learning?
  – the mean distance between the neurons and the inputs they represent is at a minimum
  – the distance stops changing
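A sketch of one recursive (online) epoch under these definitions; the batch variant would accumulate the per-input updates and apply the averages only after the whole epoch (all names are illustrative):

```python
import numpy as np

def competitive_epoch(data, codebooks, beta=0.1):
    """One recursive epoch: update the winning codebook after every input.
    Returns the updated codebooks and the mean input-to-winner distance
    (a possible termination criterion)."""
    total = 0.0
    for x in data:
        dists = np.linalg.norm(codebooks - x, axis=1)
        winner = int(np.argmin(dists))
        codebooks[winner] += beta * (x - codebooks[winner])
        total += dists[winner]
    return codebooks, total / len(data)

rng = np.random.default_rng(0)
data = rng.random((100, 2))
# initialize codebooks from randomly chosen input vectors
codebooks = data[rng.choice(len(data), size=3, replace=False)].copy()
for epoch in range(20):
    codebooks, mean_dist = competitive_epoch(data, codebooks)
print(mean_dist)
```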

Page 14

Example

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 15

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 16

[Figure: results per epoch]

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 17

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 18

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 19

Topology is not preserved.

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 20

Meet today’s hero

Teuvo Kohonen

Page 21

Self-Organizing Maps

• SOM, also Self-Organizing Feature Map (SOFM), Kohonen neural network.

• Inspired by the function of the brain:
  – Different brain regions correspond to specific aspects of human activity.
  – These regions are organized such that tasks of a similar nature (e.g. speech and vision) are controlled by regions in spatial proximity to each other.
  – This is called topology preservation.

Page 22

• In SOM learning, not only the winner, but also the neighboring neurons adapt their weights.

• Neurons closer to the winner adjust weights more than farther neurons.

• Thus we need
  1. to define the size of the neighborhood
  2. to define how much the neighboring neurons adapt their weights

Page 23

Neighborhood definition

neighborhood radius r

[Figure: neighborhoods of radius r = 1, 2, 3 around the winning neuron, shown for 1D and 2D layouts]

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 24

Training in SOM

• Follows the standard winner-takes-all competitive training in a similar manner.
• However, a new rule is used for the weight changes.
• Suppose that the BMU is at position {iwin, jwin} on the 2D map.
• Then the codebook vectors of the BMU and its neighbors are adjusted to w'j according to

$\mathbf{w}'_j = \mathbf{w}_j + \beta\,\mathrm{NS}\,(\mathbf{x} - \mathbf{w}_j)$

where NS is the neighbor strength, which varies with the distance to the BMU, and β is the learning rate.
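A sketch of this update for a 2D map stored as a (rows, cols, n_inputs) array, with a Gaussian neighbor strength in the grid distance (the function name and the Gaussian choice are illustrative; the NS options are discussed on the next slides):

```python
import numpy as np

def som_update(codebooks, x, beta, sigma):
    """Apply w_j' = w_j + beta * NS * (x - w_j) to every neuron,
    where NS decays with the grid distance to the BMU."""
    rows, cols, _ = codebooks.shape
    dists = np.linalg.norm(codebooks - x, axis=2)
    i_win, j_win = np.unravel_index(np.argmin(dists), (rows, cols))
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_dist2 = (ii - i_win) ** 2 + (jj - j_win) ** 2
    ns = np.exp(-grid_dist2 / (2 * sigma ** 2))     # neighbor strength
    codebooks += beta * ns[..., None] * (x - codebooks)
    return codebooks
```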

Page 25

Neighbor strength

• When using the neighborhood feature, all neighboring codebooks are shifted towards the input vector.

• However, the BMU updates the most; the farther away a neighboring neuron is, the less its weights are updated.

• The NS function tells us how the weight adjustment decays with distance from the winner.

Page 26

Page 27

Slide by Johan Everts

Page 28

Neighbor strength functions: linear, Gaussian, exponential

Gaussian: $\mathrm{NS} = \exp\!\left(-d_{ij}^2 / 2\sigma^2\right)$

Exponential: $\mathrm{NS} = \exp\!\left(-k\,d_{ij}\right)$
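The three shapes as simple functions of the map distance d_ij (a sketch; the exact linear form is not given on the slide, so 1 − d/r is used as one common choice, and σ, k, r are free parameters):

```python
import numpy as np

def ns_linear(d, radius):
    """Linearly decreasing strength, zero beyond the radius (one common choice)."""
    return np.maximum(0.0, 1.0 - d / radius)

def ns_gaussian(d, sigma):
    """NS = exp(-d^2 / (2 sigma^2))"""
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def ns_exponential(d, k):
    """NS = exp(-k d)"""
    return np.exp(-k * d)

d = np.arange(5)                   # map distances 0..4 from the winner
print(ns_gaussian(d, sigma=1.5))   # decays smoothly with distance
```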

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 29

2D Side Effects

source: http://www.cis.hut.fi/projects/somtoolbox/download/sc_shots2.shtml

Page 30

Shrinking neighborhood size

• Large neighborhood – proper placement of neurons in the initial stage to broadly represent the spatial organization of the input data.
• Further refinement – subsequent shrinking of the neighborhood.
• The size of the large starting neighborhood is reduced with iterations.

exponential decay: $\sigma_t = \sigma_0 \exp(-t/T)$        linear decay: $\sigma_t = \sigma_0 (1 - t/T)$

σ0 … initial neighborhood size
σt … neighborhood width at iteration t
T … total number of iterations, bringing the neighborhood to zero (i.e. only the winner remains)

Gaussian neighbor strength with the shrinking neighborhood:

$\mathrm{NS} = \exp\!\left(\frac{-d_{ij}^2}{2\sigma_t^2}\right) = \exp\!\left(\frac{-d_{ij}^2}{2\sigma_0^2 \exp(-2t/T)}\right)$
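The two decay schedules as a sketch (σ0 and T are arbitrary illustration values):

```python
import numpy as np

def sigma_exponential(t, sigma0, T):
    """sigma_t = sigma0 * exp(-t / T)"""
    return sigma0 * np.exp(-t / T)

def sigma_linear(t, sigma0, T):
    """sigma_t = sigma0 * (1 - t / T), clipped at zero (winner only)."""
    return sigma0 * np.maximum(0.0, 1.0 - t / T)

sigma0, T = 4.0, 1000
for t in (0, 250, 500, 1000):
    print(t, sigma_exponential(t, sigma0, T), sigma_linear(t, sigma0, T))
```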

Page 31

Learning rate decay

• The step length (the learning rate β) is also reduced with iterations.
• Two common forms: linear or exponential decay

exponential: $\beta_t = \beta_0 \exp(-t/T)$        linear: $\beta_t = \beta_0 (1 - t/T)$

T … constant bringing β to zero (or to a small value)

• Strategy: start with a relatively high β and decrease it gradually, but remain above 0.01.

Weight update incorporating learning rate and neighborhood decay:

$\mathbf{w}_j(t+1) = \mathbf{w}_j(t) + \beta(t)\,\mathrm{NS}(d_j, t)\,[\mathbf{x}(t) - \mathbf{w}_j(t)]$
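Putting the pieces together, a sketch of a single training step with both decays applied (Gaussian NS and exponential decays; all parameter values and names are illustrative):

```python
import numpy as np

def som_step(codebooks, x, t, T, beta0=0.5, sigma0=4.0):
    """w_j(t+1) = w_j(t) + beta(t) * NS(d_j, t) * (x - w_j(t)),
    with beta(t) = beta0*exp(-t/T) and sigma(t) = sigma0*exp(-t/T)."""
    beta_t = beta0 * np.exp(-t / T)
    sigma_t = max(sigma0 * np.exp(-t / T), 1e-3)   # avoid division by zero
    rows, cols, _ = codebooks.shape
    i_win, j_win = np.unravel_index(
        np.argmin(np.linalg.norm(codebooks - x, axis=2)), (rows, cols))
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    ns = np.exp(-((ii - i_win) ** 2 + (jj - j_win) ** 2) / (2 * sigma_t ** 2))
    codebooks += beta_t * ns[..., None] * (x - codebooks)
    return codebooks
```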

Page 32

Recursive/Batch Learning

• Batch mode with no neighborhood – equivalent to K-means.
• Incorporating neighbors – topology preservation:
  – Regions closer in input space are represented by neurons closer in the map.

Page 33

Two Phases of SOM Training

• Two phases:
  1. ordering
  2. convergence
• Ordering
  – the neighborhood and learning rate are reduced to small values
  – topological ordering takes place
  – start with a high β, decrease it gradually, remain above 0.01
  – the neighborhood initially covers the whole output layer
• Convergence
  – fine tuning with the shrunk neighborhood
  – small non-zero (~0.01) learning rate, NS covering no more than the 1st neighborhood
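A sketch of a two-phase schedule built on the decay formulas above; the 0.01 floor and the first-neighborhood limit follow the slide, while the concrete phase length and σ values are assumptions:

```python
import numpy as np

def schedule(t, t_order=1000, beta0=0.5, sigma0=4.0):
    """Return (beta, sigma) for iteration t.
    Ordering phase: both decay from their initial values.
    Convergence phase: small constant beta (~0.01), neighborhood <= 1."""
    if t < t_order:                                   # ordering
        frac = t / t_order
        beta = max(beta0 * np.exp(-3 * frac), 0.01)
        sigma = max(sigma0 * (1 - frac), 1.0)
    else:                                             # convergence
        beta, sigma = 0.01, 1.0
    return beta, sigma

for t in (0, 500, 999, 1000, 5000):
    print(t, schedule(t))
```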

Page 34

Example contd.

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 35

$\mathrm{NS} = \exp\!\left(-d_{ij}^2 / 2\sigma_t^2\right)$, with $\sigma_t = \sigma_0(1 - t/T)$, σ0 = 1, T = 3, β = 0.1

neighborhood drops to 0 after 3 iterations

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 36

After 3 iterations

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

topology preservation takes effect very quickly

Complete training
Converged after 40 epochs.

[Figure: convergence plot; x-axis: epochs]

Page 37

Complete training

• All vectors have found cluster centers

• Except one

• Solution: add one more neuron

[Figure: map with neurons 1–6 among the data clusters]

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 38

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

[Figure: map after adding a seventh neuron – neurons 1–7 at the cluster centers]

Page 39

2D output

• Play with http://www.neuroinformatik.ruhr-uni-bochum.de/VDM/research/gsn/DemoGNG/GNG.html

• Self-organizing map

$\mathrm{NS} = \exp\!\left(-d_{ij}^2 / 2\sigma_t^2\right)$

neighborhood size: $\sigma_t = \sigma_i \left(\sigma_f / \sigma_i\right)^{t/t_{max}}$

learning rate: $\beta_t = \beta_i \left(\beta_f / \beta_i\right)^{t/t_{max}}$

Page 40

Page 41

A self-organizing feature map from a square source space to a square (grid) target space.

Duda, Hart, Stork, Pattern Classification, 2000

Page 42

Some initial (random) weights and the particular sequence of patterns (randomly chosen) lead to kinks in the map; even extensive further training does not eliminate the kink. In such cases, learning should be re-started with randomized weights and possibly a wider window function and slower decay in learning.

Duda, Hart, Stork, Pattern Classification, 2000

Page 43

2D maps on Multidimensional data

• Iris data set
  – 150 patterns, 4 attributes, 3 classes (Set – 1, Vers – 2, Virg – 3)
  – more than 2 dimensions, so the data cannot all be visualized in a meaningful way
  – SOM can be used not only to cluster the input data, but also to explore the relationships between different attributes.
• SOM structure (see the sketch below)
  – 8×8, hexagonal, exponential decay of the learning rate β (βinit = 0.5, Tmax = 20×150 = 3000), NS: Gaussian
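A sketch of this setup on a rectangular grid (the slide’s map is hexagonal); it uses the stated βinit = 0.5 and Tmax = 20×150 = 3000 with a Gaussian NS, while the σ schedule, the data normalization and the use of scikit-learn’s Iris loader are assumptions:

```python
import numpy as np
from sklearn.datasets import load_iris

rng = np.random.default_rng(1)
X = load_iris().data                         # 150 patterns, 4 attributes
X = (X - X.mean(axis=0)) / X.std(axis=0)     # normalize the attributes

rows, cols, n_feat = 8, 8, X.shape[1]
codebooks = rng.normal(scale=0.1, size=(rows, cols, n_feat))

beta0, sigma0, T_max = 0.5, 4.0, 20 * len(X)            # 3000 iterations
ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")

for t in range(T_max):
    x = X[t % len(X)]
    beta = beta0 * np.exp(-t / T_max)                   # exponential decay of beta
    sigma = max(sigma0 * np.exp(-t / T_max), 0.5)       # shrinking neighborhood
    d = np.linalg.norm(codebooks - x, axis=2)
    i_w, j_w = np.unravel_index(np.argmin(d), (rows, cols))
    ns = np.exp(-((ii - i_w) ** 2 + (jj - j_w) ** 2) / (2 * sigma ** 2))
    codebooks += beta * ns[..., None] * (x - codebooks)
```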

Page 44

What can be learned?
• petal length and width have a similar structure to the class panel – low length correlates with low width, and these relate to class Versicolor
• sepal width – a very different pattern
• class panel – boundary between Virginica and Setosa – the classes overlap

[Figure: SOM component planes and class panel; class legend: setosa, versicolor, virginica]

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 45

• Since we have class labels, we can assess the classification accuracy of the map.

• So first we train the map using all 150 patterns.

• And then we present the input patterns individually again and note the winning neuron.
  – The class to which the input belongs is the class associated with this BMU codebook vector (see the previous slide, Class panel).
  – Only the winner decides the classification.

Page 46

Vers – 100% accuracy, Set – 86%, Virg – 88%; overall accuracy = 91.3%

Vers – 100% accuracy, Set – 90%, Virg – 94%; overall accuracy = 94.7%

Sandhya Samarasinghe, Neural Networks for Applied Sciences and Engineering, 2006

Page 47

U-matrix

• The distance between neighboring codebook vectors can highlight different cluster regions in the map and can be a useful visualization tool.
• Two neurons: w1 = {w11, w21, … wn1}, w2 = {w12, w22, … wn2}
• The Euclidean distance between them:

$d_{12} = \sqrt{(w_{11} - w_{12})^2 + (w_{21} - w_{22})^2 + \dots + (w_{n1} - w_{n2})^2}$

• The average of the distances to the nearest neighbors – the unified distance, U-matrix.
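A sketch of computing the U value for each neuron on a rectangular grid (average Euclidean distance to the 4-connected neighbors; a hexagonal map would average over 6 neighbors):

```python
import numpy as np

def u_matrix(codebooks):
    """For each neuron, the average Euclidean distance to its grid neighbors."""
    rows, cols, _ = codebooks.shape
    u = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    dists.append(np.linalg.norm(codebooks[i, j] - codebooks[ni, nj]))
            u[i, j] = np.mean(dists)
    return u
```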

Page 48

The larger the distance between neurons, the larger the U value and more separated the clusters. The lighter the color, the larger the U value.

Large distance between this cluster (Iris versicolor) and the middle cluster (Iris setosa). Large distances between codebook vectors indicate a sharp boundary between the clusters.

Page 49

Surface graph
The height represents the distance.

3rd row – large height = separation

Other two clusters are not separated.

Page 50

Quantization error

• A measure of the distance between the codebook vectors and the inputs.
• If for input vector x the winner is wc, then the distortion error e can be calculated as

$e = d(\mathbf{x}, \mathbf{w}_c)$    or, including the neighborhood,    $e = \sum_i \mathrm{NS}(c, i)\, d(\mathbf{x}, \mathbf{w}_i)$

• Compute e for all input vectors and take the average – the quantization error, or average map distortion error E:

$E = \frac{1}{N}\sum_n d(\mathbf{x}_n, \mathbf{w}_c)$    or    $E = \frac{1}{N}\sum_n \sum_i \mathrm{NS}(c, i)\, d(\mathbf{x}_n, \mathbf{w}_i)$
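A sketch of the plain (no-neighborhood) quantization error: the average distance between each input and its winning codebook vector:

```python
import numpy as np

def quantization_error(data, codebooks):
    """Average distance between each input and its BMU codebook vector."""
    flat = codebooks.reshape(-1, codebooks.shape[-1])
    errors = [np.linalg.norm(flat - x, axis=1).min() for x in data]
    return float(np.mean(errors))
```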

Page 51

Iris quantization error

High distortion error indicates areas where the codebook vector is relatively far from the inputs. Such information can be used to refine the map to obtain a more uniform distortion error measure if a more faithful reproduction of the input distribution from the map is desired.