Machine Learning
Neural Networks (3)
Understanding Supervised and Unsupervised Learning
[Figure: data points labeled A and B, with "Two possible Solutions…" showing two different boundaries that separate the two classes]
Supervised Learning
• It is based on a labeled training set.
• The class of each piece of data in the training set is known.
• Class labels are pre-determined and provided in the training phase.
[Figure: labeled training data; each point A or B is tagged with its class]
Unsupervised Learning
• In supervised learning, an external teacher improves network performance by comparing desired and actual outputs and modifying the synaptic weights accordingly.
• However, most of the learning that takes place in our brains is completely unsupervised.
• This type of learning is aimed at achieving the most efficient representation of the input space, regardless of any output space.
Supervised vs Unsupervised

Supervised:
• Tasks performed: Classification, Pattern Recognition
• NN models: Perceptron, Feed-forward NN
• "What is the class of this data point?"

Unsupervised:
• Task performed: Clustering
• NN model: Self-Organizing Maps
• "What groupings exist in this data?"
• "How is each data point related to the data set as a whole?"
Unsupervised Learning
• Input: a set of patterns P from an n-dimensional space S, but little/no information about their classification, evaluation, interesting features, etc. It must learn these by itself! :)
• Tasks:
  – Clustering: group patterns based on similarity
  – Vector Quantization: fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P
  – Feature Extraction: reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P)
Unsupervised learning
• The network must discover for itself patterns, features, regularities, correlations or categories in the input data and code them for the output.
• The units and connections must self-organize themselves based on the stimulus they receive.
• Note that unsupervised learning is useful when there is redundancy in the input data. Redundancy provides knowledge.
Unsupervised Neural Networks – Kohonen Learning
• Also known as the Self-Organizing Map
• Learns a categorization of the input space
• Neurons are connected into a 1-D or 2-D lattice
• Each neuron represents a point in N-dimensional pattern space, defined by N weights
• During training, the neurons move around to try to fit the data
• Changing the position of one neuron in data space influences the positions of its neighbours via the lattice connections
Unsupervised Learning
• Applications of unsupervised learning include:
  – Clustering
  – Vector quantization
  – Data compression
  – Feature extraction
Self Organizing Map – Network Structure
• All inputs are connected by weights to each neuron
• The size of the neighbourhood changes as the net learns
• The aim is to map similar inputs (sets of values) to similar neuron positions
• Data is clustered because it is mapped to the same node or group of nodes
Self-organising maps (SOMs)
• Inspiration from biology: in the auditory pathway, nerve cells are arranged in relation to frequency response (tonotopic organisation).
• Kohonen took inspiration from this to produce self-organising maps (SOMs).
• In a SOM, units located physically next to one another will respond to input vectors that are 'similar'.
SOMs
• Useful, as it is difficult for humans to visualise data with more than 3 dimensions.
• Large-dimensional input vectors are 'projected down' onto a 2-D map in a way that maintains natural order similarity.
• A SOM is a 2-D array of neurons, with all inputs arriving at all neurons.
SOMs
• Initially each neuron has its own set of (random) weights.
• When an input arrives, the neuron with the pattern of weights most similar to the input gives the largest response.
SOMs
• There is positive excitatory feedback between a SOM unit and its nearest neighbours.
• This causes all the units in the 'neighbourhood' of the winning unit to learn.
• As distance from the winning unit increases, the degree of excitation falls until it becomes inhibition.
• A bubble of activity (neighbourhood) forms around the unit with the largest net input (the Mexican-Hat function).
SOMs
• Initially each weight set to random number.
• Euclidean distance D is used to find the difference between input vectors and the weights of SOM units (D = square root of the sum of the squared differences):

  D = √( Σ_{i=1..n} (x_i – w_ij)² )
SOMs
• For a 2-dimensional problem, the distance calculated in each neuron is:

  D² = Σ_{i=1..2} (x_i – w_ij)² = (x_1 – w_1j)² + (x_2 – w_2j)²
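To make the winner-finding computation concrete, here is a minimal Python/NumPy sketch; the array shapes and values are illustrative assumptions, not taken from the slides:

```python
import numpy as np

# Illustrative setup: 25 SOM units (e.g. a 5x5 lattice) with 2-D weight vectors.
weights = np.random.rand(25, 2)
x = np.array([0.3, 0.7])  # a 2-D input vector

# D_j = sqrt(sum_i (x_i - w_ij)^2) for every unit j, computed in one shot.
D = np.sqrt(((x - weights) ** 2).sum(axis=1))
winner = D.argmin()  # the unit with the lowest D is the winner
```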
SOM
• The input vector is simultaneously compared to all elements in the network; the one with the lowest D is the winner.
• The weights of all units in the neighbourhood around the winning unit are updated.
• If the winner is 'c', the neighbourhood is defined as a Mexican Hat function around 'c'.
SOMs
• Weights of units are adjusted using:

  ∆w_ij = k (x_i – w_ij) Y_j

  where Y_j comes from the Mexican Hat function
• k is a value which changes over time (high at the start of training, low later on).
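A hedged sketch of this update rule in Python, modelling the Mexican-Hat profile as a difference of Gaussians; the slides do not pin down the exact function, so the profile and all constants here are assumptions:

```python
import numpy as np

def mexican_hat(dist, sigma=1.0):
    # Difference of Gaussians: positive (excitatory) near the winner,
    # negative (inhibitory) at larger lattice distances.
    return np.exp(-dist**2 / (2 * sigma**2)) - 0.5 * np.exp(-dist**2 / (8 * sigma**2))

def update(weights, positions, x, winner, k=0.5):
    # Lattice distance of every unit from the winning unit 'c'.
    dist = np.linalg.norm(positions - positions[winner], axis=1)
    Y = mexican_hat(dist)                            # Y_j from the Mexican-Hat function
    return weights + k * Y[:, None] * (x - weights)  # dw_ij = k (x_i - w_ij) Y_j
```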
SOM – Algorithm
1. Initialization: weights are set to unique random values
2. Sampling: draw an input sample x and present it to the network
3. Similarity Matching: the winning neuron i is the neuron whose weight vector best matches the input vector:

  i = argmin_j ‖ x – w_j ‖
SOM – Algorithm
4. Updating: adjust the weights of the winning neuron so that they better match the input. Also adjust the weights of the neighbouring neurons.

  ∆w_j = η · h_ij · ( x – w_j )

  neighbourhood function: h_ij
• Over time the neighbourhood function gets smaller.
Result: the neurons provide a good approximation of the input space, and their arrangement comes to correspond to the topology of the input patterns.
Two distinct phases in training
• Initial ordering phase: units find the correct topological order (might take 1000 iterations, where k decreases from 0.9 to 0.01 and Nc decreases from ½ the diameter of the network to 1 unit).
• Final convergence phase: the accuracy of the weights improves (k may decrease from 0.01 to 0 while Nc stays at 1 unit; this phase could be 10 to 100 times longer, depending on the desired accuracy).
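Putting steps 1–4 and the two training phases together, here is a compact Python sketch. It uses a Gaussian neighbourhood h_ij that shrinks over time (a common simplification of the Mexican-Hat bubble), and all hyperparameter values are illustrative:

```python
import numpy as np

def train_som(data, rows=10, cols=10, iters=1000, k0=0.9, k1=0.01):
    # 1. Initialization: weights set to random values, one weight vector per unit.
    weights = np.random.rand(rows * cols, data.shape[1])
    # Lattice coordinates of each unit, used by the neighbourhood function.
    pos = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    nc0 = max(rows, cols) / 2  # initial neighbourhood radius Nc

    for t in range(iters):
        x = data[np.random.randint(len(data))]            # 2. Sampling: draw an input x
        i = np.linalg.norm(x - weights, axis=1).argmin()  # 3. Matching: i = argmin_j ||x - w_j||
        k = k0 * (k1 / k0) ** (t / iters)                 # learning rate decays (ordering -> convergence)
        nc = 1 + (nc0 - 1) * (1 - t / iters)              # Nc shrinks toward 1 unit
        h = np.exp(-np.linalg.norm(pos - pos[i], axis=1) ** 2 / (2 * nc ** 2))
        weights += k * h[:, None] * (x - weights)         # 4. Updating: dw_j = eta * h_ij * (x - w_j)
    return weights
```

For example, train_som(np.random.rand(500, 2)) orders a 10×10 lattice over the unit square.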
WEBSOM
• All words of a document are mapped onto the word-category map, and a histogram of "hits" on it is formed.
• Self-organizing map: the largest experiments have used:
  – word-category map: 315 neurons with 270 inputs each
  – document map: 104,040 neurons with 315 inputs each
• Self-organizing semantic map: 15×21 neurons. Interrelated words that have similar contexts appear close to each other on the map.
• Self-organizing maps of document collections.
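A minimal sketch of how a document's histogram of "hits" might be formed, assuming a precomputed word_to_unit mapping standing in for the word-category map; both that mapping and the final normalisation are hypothetical details, not from the slides:

```python
import numpy as np

def document_fingerprint(words, word_to_unit, n_units=315):
    # Map every word of the document onto the word-category map and
    # count the "hits" per unit; the histogram becomes the input
    # vector fed to the document map.
    hist = np.zeros(n_units)
    for w in words:
        if w in word_to_unit:          # words missing from the map are skipped
            hist[word_to_unit[w]] += 1
    return hist / max(hist.sum(), 1)   # normalise away document length
```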
Clustering Data
K-Means Clustering

K-Means(k, data):
• Randomly choose k cluster center locations (centroids).
• Loop until convergence:
  – Assign each point to the cluster of the closest centroid.
  – Re-estimate the cluster centroids based on the data assigned to each.
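The pseudocode above translates almost line for line into Python/NumPy. This is a minimal sketch; the convergence test and the handling of empty clusters are my own assumptions:

```python
import numpy as np

def k_means(k, data, max_iters=100):
    # Randomly choose k cluster center locations (centroids).
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate the cluster centroids based on the data assigned to each;
        # an empty cluster simply keeps its old centroid.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels
```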
K-Means Animation

Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.
K-means Clustering
– Initialize the K weight vectors, e.g. to randomly chosen examples. Each weight vector represents a cluster.
– Assign each input example x to the cluster c(x) with the nearest corresponding weight vector:

  c(x) = argmin_j ‖ x – w_j(n) ‖

– Update the weights:

  w_j(n+1) = (1/n_j) Σ_{x such that c(x)=j} x, with n_j the number of examples assigned to cluster j

– Increment n by 1 and repeat until no noticeable changes of the weight vectors occur.
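As a quick worked example of these formulas (the numbers are illustrative): take 1-D data {1, 2, 9, 10}, K = 2, and initial weights w_1 = 1, w_2 = 2. The assignment step gives c(1) = 1 and c(2) = c(9) = c(10) = 2, so the update yields w_1 = (1/1)·1 = 1 and w_2 = (1/3)·(2 + 9 + 10) = 7. The next pass reassigns {1, 2} to cluster 1 and {9, 10} to cluster 2, giving w_1 = 1.5 and w_2 = 9.5, after which nothing changes and the algorithm stops.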
Simple competitive learning
Issues
• How many clusters?
  – User-given parameter K
  – Use model selection criteria (Bayesian Information Criterion) with a penalization term which considers model complexity. See e.g. X-means: http://www2.cs.cmu.edu/~dpelleg/kmeans.html
• What similarity measure?
  – Euclidean distance
  – Correlation coefficient
  – Ad-hoc similarity measure
• How to assess the quality of a clustering?
  – Compact and well-separated clusters are better; many different quality measures have been introduced. See e.g. C. H. Chou, M. C. Su and E. Lai, "A New Cluster Validity Measure and Its Application to Image Compression," Pattern Analysis and Applications, vol. 7, no. 2, pp. 205–220, 2004.