Machine Learning
Neural Networks (3)
Understanding Supervised and Unsupervised Learning
[Figure: data points labeled A and B, with "Two possible Solutions…" showing two different boundaries that separate the two classes]
Supervised Learning
• It is based on a labeled training set.
• The class of each piece of data in the training set is known.
• Class labels are pre-determined and provided in the training phase.
[Figure: labeled training data; each point A or B is tagged with its class]
Unsupervised Learning
• In supervised learning, an external teacher improves network performance by comparing desired and actual outputs and modifying the synaptic weights accordingly.
• However, most of the learning that takes place in our brains is completely unsupervised.
• This type of learning is aimed at achieving the most efficient representation of the input space, regardless of any output space.
Supervised vs Unsupervised

Supervised:
• Tasks performed: Classification, Pattern Recognition
• NN models: Perceptron, Feed-forward NN
• "What is the class of this data point?"

Unsupervised:
• Task performed: Clustering
• NN model: Self-Organizing Maps
• "What groupings exist in this data?"
• "How is each data point related to the data set as a whole?"
Unsupervised Learning
• Input: a set of patterns P from an n-dimensional space S, but little/no information about their classification, evaluation, interesting features, etc. It must learn these by itself! :)
• Tasks:
  – Clustering: group patterns based on similarity
  – Vector Quantization: fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P
  – Feature Extraction: reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P)
Unsupervised learning
• The network must discover for itself patterns, features, regularities, correlations or categories in the input data and code them for the output.
• The units and connections must self-organize themselves based on the stimulus they receive.
• Note that unsupervised learning is useful when there is redundancy in the input data. Redundancy provides knowledge.
Unsupervised Neural Networks – Kohonen Learning
• Also known as the Self-Organizing Map
• Learns a categorization of the input space
• Neurons are connected into a 1-D or 2-D lattice
• Each neuron represents a point in N-dimensional pattern space, defined by N weights
• During training, the neurons move around to try to fit the data
• Changing the position of one neuron in data space influences the positions of its neighbours via the lattice connections
Unsupervised Learning
• Applications of unsupervised learning include:
  – Clustering
  – Vector quantization
  – Data compression
  – Feature extraction
Self Organizing Map – Network Structure
• All inputs are connected by weights to each neuron
• The size of the neighbourhood changes as the net learns
• The aim is to map similar inputs (sets of values) to similar neuron positions
• Data is clustered because it is mapped to the same node or group of nodes
Self-organising maps (SOMs)
• Inspiration from biology: in the auditory pathway, nerve cells are arranged in relation to frequency response (tonotopic organisation).
• Kohonen took inspiration from this to produce self-organising maps (SOMs).
• In a SOM, units located physically next to one another will respond to input vectors that are 'similar'.
SOMs
• Useful, as it is difficult for humans to visualise data with more than 3 dimensions.
• Large-dimensional input vectors are 'projected down' onto a 2-D map in a way that maintains natural order similarity.
• A SOM is a 2-D array of neurons, with all inputs arriving at all neurons.
SOMs
• Initially each neuron has its own set of (random) weights.
• When an input arrives, the neuron with the pattern of weights most similar to the input gives the largest response.
SOMs
• There is positive excitatory feedback between a SOM unit and its nearest neighbours.
• This causes all the units in the 'neighbourhood' of the winning unit to learn.
• As distance from the winning unit increases, the degree of excitation falls until it becomes inhibition.
• A bubble of activity (neighbourhood) forms around the unit with the largest net input (the Mexican-Hat function).
SOMs
• Initially each weight set to random number.
• Euclidean distance D is used to find the difference between input vectors and the weights of SOM units (D = square root of the sum of the squared differences):

  D = √( Σ_{i=1..n} (x_i – w_ij)² )
SOMs
• For a 2-dimensional problem, the distance calculated in each neuron is:

  D² = Σ_{i=1..2} (x_i – w_ij)² = (x_1 – w_1j)² + (x_2 – w_2j)²
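To make the winner-finding computation concrete, here is a minimal Python/NumPy sketch; the array shapes and values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

# Illustrative setup: 25 SOM units (e.g. a 5x5 lattice) with 2-D weight vectors.
weights = np.random.rand(25, 2)
x = np.array([0.3, 0.7])  # a 2-D input vector

# D_j = sqrt(sum_i (x_i - w_ij)^2) for every unit j, computed in one shot.
D = np.sqrt(((x - weights) ** 2).sum(axis=1))
winner = D.argmin()  # the unit with the lowest D is the winner
```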
SOM
• The input vector is simultaneously compared to all elements in the network; the one with the lowest D is the winner.
• The weights of all units in the neighbourhood around the winning unit are updated.
• If the winner is 'c', the neighbourhood is defined as a Mexican Hat function around 'c'.
SOMs
• Weights of units are adjusted using:

  ∆w_ij = k (x_i – w_ij) Y_j

  where Y_j comes from the Mexican Hat function
• k is a value which changes over time (high at the start of training, low later on).
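A hedged sketch of this update rule in Python, modelling the Mexican-Hat profile as a difference of Gaussians; the slides do not pin down the exact function, so the profile and all constants here are assumptions:

```python
import numpy as np

def mexican_hat(dist, sigma=1.0):
    # Difference of Gaussians: positive (excitatory) near the winner,
    # negative (inhibitory) at larger lattice distances.
    return np.exp(-dist**2 / (2 * sigma**2)) - 0.5 * np.exp(-dist**2 / (8 * sigma**2))

def update(weights, positions, x, winner, k=0.5):
    # Lattice distance of every unit from the winning unit 'c'.
    dist = np.linalg.norm(positions - positions[winner], axis=1)
    Y = mexican_hat(dist)                            # Y_j from the Mexican-Hat function
    return weights + k * Y[:, None] * (x - weights)  # dw_ij = k (x_i - w_ij) Y_j
```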
SOM – Algorithm
1. Initialization: weights are set to unique random values
2. Sampling: draw an input sample x and present it to the network
3. Similarity Matching: the winning neuron i is the neuron whose weight vector best matches the input vector:

  i = argmin_j ‖ x – w_j ‖
SOM – Algorithm
4. Updating: adjust the weights of the winning neuron so that they better match the input. Also adjust the weights of the neighbouring neurons.

  ∆w_j = η · h_ij · ( x – w_j )

  neighbourhood function: h_ij
• Over time the neighbourhood function gets smaller.
Result: the neurons provide a good approximation of the input space, and their arrangement comes to correspond to the topology of the input patterns.
Two distinct phases in training
• Initial ordering phase: units find the correct topological order (might take 1000 iterations, where k decreases from 0.9 to 0.01 and Nc decreases from ½ the diameter of the network to 1 unit).
• Final convergence phase: the accuracy of the weights improves (k may decrease from 0.01 to 0 while Nc stays at 1 unit; this phase could be 10 to 100 times longer, depending on the desired accuracy).
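Putting steps 1–4 and the two training phases together, here is a compact Python sketch. It uses a Gaussian neighbourhood h_ij that shrinks over time (a common simplification of the Mexican-Hat bubble), and all hyperparameter values are illustrative:

```python
import numpy as np

def train_som(data, rows=10, cols=10, iters=1000, k0=0.9, k1=0.01):
    # 1. Initialization: weights set to random values, one weight vector per unit.
    weights = np.random.rand(rows * cols, data.shape[1])
    # Lattice coordinates of each unit, used by the neighbourhood function.
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    nc0 = max(rows, cols) / 2  # initial neighbourhood radius Nc

    for t in range(iters):
        x = data[np.random.randint(len(data))]            # 2. Sampling: draw an input x
        i = np.linalg.norm(x - weights, axis=1).argmin()  # 3. Matching: i = argmin_j ||x - w_j||
        k = k0 * (k1 / k0) ** (t / iters)                 # learning rate decays (ordering -> convergence)
        nc = 1 + (nc0 - 1) * (1 - t / iters)              # Nc shrinks toward 1 unit
        h = np.exp(-np.linalg.norm(pos - pos[i], axis=1) ** 2 / (2 * nc ** 2))
        weights += k * h[:, None] * (x - weights)         # 4. Updating: dw_j = eta * h_ij * (x - w_j)
    return weights
```

For example, train_som(np.random.rand(500, 2)) orders a 10×10 lattice over the unit square.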
WEBSOM
• All words of a document are mapped onto the word-category map, and a histogram of "hits" on it is formed.
• Self-organizing map: the largest experiments have used:
  – word-category map: 315 neurons with 270 inputs each
  – document map: 104,040 neurons with 315 inputs each
• Self-organizing semantic map: 15×21 neurons. Interrelated words that have similar contexts appear close to each other on the map.
• Self-organizing maps of document collections.
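A minimal sketch of how a document's histogram of "hits" might be formed, assuming a precomputed word_to_unit mapping standing in for the word-category map; both that mapping and the final normalisation are hypothetical details, not from the slides:

```python
import numpy as np

def document_fingerprint(words, word_to_unit, n_units=315):
    # Map every word of the document onto the word-category map and
    # count the "hits" per unit; the histogram becomes the input
    # vector fed to the document map.
    hist = np.zeros(n_units)
    for w in words:
        if w in word_to_unit:          # words missing from the map are skipped
            hist[word_to_unit[w]] += 1
    return hist / max(hist.sum(), 1)   # normalise away document length
```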
Clustering Data
K-Means Clustering

K-Means(k, data):
• Randomly choose k cluster center locations (centroids).
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid.
  – Re-estimate the cluster centroids based on the data assigned to each.
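The pseudocode above translates almost line for line into Python/NumPy. This is a minimal sketch; the convergence test and the handling of empty clusters are my own assumptions:

```python
import numpy as np

def k_means(k, data, max_iters=100):
    # Randomly choose k cluster center locations (centroids).
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate the cluster centroids based on the data assigned to each;
        # an empty cluster simply keeps its old centroid.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels
```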
K-Means Animation

Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.
K-means Clustering
– Initialize the K weight vectors, e.g. to randomly chosen examples. Each weight vector represents a cluster.
– Assign each input example x to the cluster c(x) with the nearest corresponding weight vector:

  c(x) = argmin_j ‖ x – w_j(n) ‖

– Update the weights:

  w_j(n+1) = (1/n_j) Σ_{x such that c(x)=j} x, with n_j the number of examples assigned to cluster j

– Increment n by 1 and repeat until no noticeable changes of the weight vectors occur.
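As a quick worked example of these formulas (the numbers are illustrative): take 1-D data {1, 2, 9, 10}, K = 2, and initial weights w_1 = 1, w_2 = 2. The assignment step gives c(1) = 1 and c(2) = c(9) = c(10) = 2, so the update yields w_1 = (1/1)·1 = 1 and w_2 = (1/3)·(2 + 9 + 10) = 7. The next pass reassigns {1, 2} to cluster 1 and {9, 10} to cluster 2, giving w_1 = 1.5 and w_2 = 9.5, after which nothing changes and the algorithm stops.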
Simple competitive learning
Issues
• How many clusters?
  – User-given parameter K
  – Use model selection criteria (Bayesian Information Criterion) with a penalization term which considers model complexity. See e.g. X-means: http://www2.cs.cmu.edu/~dpelleg/kmeans.html
• What similarity measure?
  – Euclidean distance
  – Correlation coefficient
  – Ad-hoc similarity measure
• How to assess the quality of a clustering?
  – Compact and well-separated clusters are better; many different quality measures have been introduced. See e.g. C. H. Chou, M. C. Su and E. Lai, "A New Cluster Validity Measure and Its Application to Image Compression," Pattern Analysis and Applications, vol. 7, no. 2, pp. 205–220, 2004.