Self-Organizing Maps (SOM)
COMP61021 Modelling and Visualization of High Dimensional Data
Additional reading can be found in the non-assessed exercises (week 9) on this course unit's teaching page.
Textbook: Ch. 9 in [3]
Outline
• Introduction
• Kohonen SOM
• Learning Algorithm
• Visualization Method
• Examples
• Relevant Issues
• Conclusions
Introduction
• Self-organizing maps (SOM)
– A SOM is a biologically inspired, unsupervised neural network that approximates an unlimited number of input data points by a finite set of nodes arranged in a low-dimensional grid, where neighboring nodes correspond to more similar input data.
– The model is produced by a learning algorithm that automatically orders the inputs on a one- or two-dimensional grid according to their mutual similarity.
– Useful for cluster analysis and data visualization.
[Figure: input space, initial weights, final weights]
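Kohonen SOM
Competition
• e.g., $N = 2$:
– Input vector: $\mathbf{x} = (x_1, x_2)^T$
– Weight vector: $\mathbf{w} = (w_1, w_2)^T$
– Euclidean distance: $d_E(\mathbf{x}, \mathbf{w}) = \|\mathbf{x} - \mathbf{w}\| = \sqrt{(\mathbf{x} - \mathbf{w})^T (\mathbf{x} - \mathbf{w})}$
• [Figure label: hard-wired connection]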
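The competition step can be sketched as follows: every neuron compares its weight vector with the input, and the neuron at the smallest Euclidean distance wins. This is a minimal sketch in plain Python; the function names `euclidean` and `best_matching_unit` and the sample values are illustrative assumptions, not taken from the slides.

```python
import math

def euclidean(x, w):
    """Euclidean distance d_E(x, w) = ||x - w||."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))

def best_matching_unit(x, weights):
    """Return the index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: euclidean(x, weights[j]))

# Illustrative data (N = 2): three neurons and one input vector.
weights = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.0)]
x = (0.9, 0.8)
winner = best_matching_unit(x, weights)  # index of the winning neuron
```

Here the second neuron (index 1) wins, because its weight vector (1.0, 1.0) lies closest to the input.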
Kohonen SOM
Cooperation
• "Radius": $d_{ik} = 2$
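In the cooperation step, the winning neuron defines a neighbourhood on the lattice, and only neurons within a given radius take part in the update. The sketch below lists the members of such a neighbourhood. It assumes a rectangular grid and a Euclidean lattice distance, neither of which the slide states. The Gaussian weighting in `gaussian_weight` is a common choice, also an assumption here.

```python
import math

def lattice_distance(a, b):
    """Euclidean distance between two grid positions (row, col)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def neighbourhood(winner, rows, cols, radius):
    """Grid positions whose lattice distance to the winner is <= radius."""
    return [(r, c) for r in range(rows) for c in range(cols)
            if lattice_distance((r, c), winner) <= radius]

def gaussian_weight(d, sigma):
    """Neighbourhood strength for lattice distance d (assumed Gaussian form)."""
    return math.exp(-d * d / (2.0 * sigma * sigma))

# Neighbourhood of radius 2 around the centre of a 5 x 5 lattice.
members = neighbourhood(winner=(2, 2), rows=5, cols=5, radius=2)
```

With these assumptions, the radius-2 neighbourhood of the centre node contains 13 grid positions. The corner nodes are excluded.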
Kohonen SOM
Adaptation
(see the algorithm on the following slides for details)
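As a rough illustration of adaptation, the commonly used Kohonen update moves a weight vector toward the input by a fraction given by the learning rate and the neighbourhood strength. The names `adapt`, `eta` and `h` are assumptions for this sketch, and the rule on the algorithm slides may differ in detail.

```python
def adapt(w, x, eta, h):
    """Move weight vector w toward input x by the fraction eta * h.

    eta: learning rate (assumed to lie in [0, 1])
    h:   neighbourhood strength of this neuron with respect to the winner
    """
    return [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]

# The winner (h = 1) moves halfway toward the input when eta = 0.5.
w_new = adapt([0.0, 0.0], [1.0, 2.0], eta=0.5, h=1.0)
```

A neuron outside the neighbourhood (h = 0) keeps its weight vector unchanged.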
Learning Algorithm
[Algorithm slide not recovered; surviving symbols: $\tau_1$, $\tau_2$]
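The formulas on this slide did not survive extraction. In the standard textbook formulation of the SOM, which may differ in detail from the slide, $\tau_1$ and $\tau_2$ are time constants that control how quickly the neighbourhood width $\sigma(n)$ and the learning rate $\eta(n)$ shrink with the iteration number $n$. The symbols $\sigma_0$, $\eta_0$, $d_{j,i}$ and $h_{j,i}$ below are assumptions taken from that formulation:

```latex
\sigma(n) = \sigma_0 \exp\!\left(-\frac{n}{\tau_1}\right), \qquad
\eta(n) = \eta_0 \exp\!\left(-\frac{n}{\tau_2}\right), \qquad
h_{j,i}(n) = \exp\!\left(-\frac{d_{j,i}^{2}}{2\,\sigma^{2}(n)}\right)
```

Here $d_{j,i}$ denotes the lattice distance between neuron $j$ and the winning neuron $i$.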
Learning Algorithm
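The three steps (competition, cooperation, adaptation) can be combined into a short training loop. The sketch below trains a 1-D chain of neurons in plain Python. It is an illustration under stated assumptions rather than the algorithm from the slides: the exponential decay schedules, the Gaussian neighbourhood, the choice of `tau1` and `tau2`, and all names are assumptions.

```python
import math
import random

def train_som(data, n_nodes, n_iter, eta0=0.5, seed=0):
    """Train a 1-D SOM (a chain of n_nodes neurons) on a list of input vectors."""
    rng = random.Random(seed)
    sigma0 = max(n_nodes / 2.0, 1.0)          # assumed initial neighbourhood width
    # Assumed time constants: sigma decays to about 1, eta to about eta0 / e.
    tau1 = n_iter / math.log(sigma0) if sigma0 > 1.0 else float(n_iter)
    tau2 = float(n_iter)
    # Initialise each weight vector with a randomly chosen training input.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for n in range(n_iter):
        x = rng.choice(data)
        sigma = sigma0 * math.exp(-n / tau1)
        eta = eta0 * math.exp(-n / tau2)
        # Competition: find the neuron closest to x.
        winner = min(range(n_nodes),
                     key=lambda j: sum((xi - wi) ** 2
                                       for xi, wi in zip(x, weights[j])))
        # Cooperation and adaptation: update every neuron, weighted by
        # its lattice distance |j - winner| to the winner.
        for j in range(n_nodes):
            h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
            weights[j] = [wi + eta * h * (xi - wi)
                          for wi, xi in zip(weights[j], x)]
    return weights

# Usage: 200 points drawn uniformly from the unit square.
rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(200)]
weights = train_som(data, n_nodes=10, n_iter=1000)
```

Because each update is a convex combination of the old weight and an input, the weights stay inside the bounding box of the training data.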
Visualization Method
• In 2-D/3-D space, neurons are visualized by their positions in the weight space, which change as learning takes place. Each neuron is described by its weight vector.
• Two neurons are connected by an edge if they are direct neighbors in the neural network lattice. For 2-D/3-D data, the lattice can therefore be drawn in the original data space, using the weight vectors as node coordinates.
• The locations specified by the weight vectors of the neurons in the grid mimic the distribution of the training data.
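The lattice drawing described above amounts to one line segment per pair of directly neighbouring neurons, with the weight vectors as endpoints. A minimal sketch follows, assuming a rectangular lattice with 4-connected neighbours (the slides do not specify the lattice type). The names `lattice_edges` and `weight_segments` are illustrative.

```python
def lattice_edges(rows, cols):
    """Pairs of grid positions that are direct (4-connected) neighbours."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                      # neighbour to the right
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:                      # neighbour below
                edges.append(((r, c), (r + 1, c)))
    return edges

def weight_segments(weights, rows, cols):
    """Line segments in data space, one per lattice edge.

    weights maps a grid position (r, c) to that neuron's weight vector.
    """
    return [(weights[a], weights[b]) for a, b in lattice_edges(rows, cols)]

# A 2 x 2 lattice whose weights happen to sit on the unit square.
weights = {(0, 0): (0.0, 0.0), (0, 1): (1.0, 0.0),
           (1, 0): (0.0, 1.0), (1, 1): (1.0, 1.0)}
segments = weight_segments(weights, rows=2, cols=2)
```

Passing the resulting segments to any plotting routine reproduces the familiar "net" picture of a SOM laid over the data.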
Visualization Method
Visualization Method
• Example: U-Matrix
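A U-matrix (unified distance matrix) is commonly computed by assigning to each neuron the average distance between its weight vector and those of its lattice neighbours. Large values then mark cluster boundaries. The sketch below assumes a rectangular lattice with 4-connected neighbours, and the function name `u_matrix` is illustrative. It requires Python 3.8+ for `math.dist`.

```python
import math

def u_matrix(weights, rows, cols):
    """Average distance from each neuron's weights to its lattice neighbours.

    weights[r][c] is the weight vector of the neuron at grid position (r, c).
    """
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(math.dist(weights[r][c], weights[rr][cc]))
            # A lattice with a single node has no neighbours.
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u

# A 1 x 3 lattice: the third neuron lies far from the other two.
weights = [[(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]]
u = u_matrix(weights, rows=1, cols=3)
```

In this toy lattice the values rise toward the third neuron, reflecting the gap between it and the other two.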
Examples
• Example 1: 1-D self-organizing map
Examples
• Example 2: 2-D self-organizing map
Examples
• Example 3: self-organizing map of synthetic data sets
After convergence of SOM learning, we obtain SOMs for different data distributions.
The grid mimics the data distribution!
Examples
• Example 4: Taxonomy of animals
A grouping according to similarity has emerged with the SOM.
[Figure: resulting map with regions labelled "birds", "peaceful" and "hunters"]
Animal names and their attributes

                   Dove  Hen   Duck  Goose Owl   Hawk  Eagle Fox   Dog   Wolf  Cat   Tiger Lion  Horse Zebra Cow
is       Small     1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
         Medium    0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
         Big       0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has      2 legs    1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
         4 legs    0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hair      0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hooves    0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
         Mane      0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
         Feathers  1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to Hunt      0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
         Run       0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
         Fly       1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
         Swim      0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0
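To see why similar animals end up close together on the map, it helps to compare their attribute vectors directly. The sketch below encodes the attribute table as 13-dimensional binary vectors and counts the attributes in which two animals differ (Hamming distance). The encoding as bit strings and the helper names `vector` and `hamming` are assumptions for illustration. A SOM would use these vectors as inputs rather than compare them pairwise.

```python
ANIMALS = ["Dove", "Hen", "Duck", "Goose", "Owl", "Hawk", "Eagle", "Fox",
           "Dog", "Wolf", "Cat", "Tiger", "Lion", "Horse", "Zebra", "Cow"]

# One string of 16 bits per attribute, in the column order of ANIMALS.
ATTRIBUTES = {
    "small":    "1111110000100000",
    "medium":   "0000001111000000",
    "big":      "0000000000011111",
    "2 legs":   "1111111000000000",
    "4 legs":   "0000000111111111",
    "hair":     "0000000111111111",
    "hooves":   "0000000000000111",
    "mane":     "0000000001001110",
    "feathers": "1111111000000000",
    "hunt":     "0000111101111000",
    "run":      "0000000011011110",
    "fly":      "1001111000000000",
    "swim":     "0011000000000000",
}

def vector(animal):
    """13-dimensional binary attribute vector of one animal."""
    i = ANIMALS.index(animal)
    return [int(bits[i]) for bits in ATTRIBUTES.values()]

def hamming(a, b):
    """Number of attributes in which animals a and b differ."""
    return sum(u != v for u, v in zip(vector(a), vector(b)))

d_birds = hamming("Dove", "Hen")      # two birds
d_mixed = hamming("Dove", "Lion")     # a bird and a big hunter
```

In this table, Dove and Hen differ in a single attribute (fly), while Dove and Lion differ in ten, and Horse and Zebra have identical vectors.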
Relevant Issues
Relevant Issues
• SOM extensions
– PSOM: continuous projection, i.e., interpolation between centroid locations via parameterisation
– disSOM: a SOM working on distances between objects; more general than distance Nonnegative Matrix Factorization
– Hierarchical SOM: extension from a single layer to multiple layers for multi-scale data analysis
– Generative topographic map (GTM): a probabilistic counterpart of the SOM that is provably convergent and requires neither a shrinking neighborhood nor a decreasing step size
– Kernel SOM: overcomes two major limitations of the Kohonen SOM
Conclusions
• The Kohonen SOM is a biologically inspired neural network for high-dimensional data clustering and visualisation.
• Its most important property is topology preservation.
• Learning involves two phases: ordering and convergence.
• There is no guarantee that the SOM always converges; hence parameter tuning is needed.
• There are several variants or extensions, which tend to overcome the limitations of the SOM.
• There are a number of successful applications of the SOM.