Multiple Spheres SOM
Songwen Zha
30 May 2012
A report submitted for the degree of Master of Computing of
the Australian National University
Supervisor: Prof. Tom Gedeon
Acknowledgements
I am sincerely and heartily thankful to my supervisor, Prof. Tom Gedeon, for his
guidance and encouragement throughout the project. I would also like to thank
Dr. Dingyun Zhu for his suggestions on the coding process.
Moreover, I thank Dr. Weifa Liang, who shared many presentation and writing
techniques for the project with me.
Finally, I would like to thank my family and my friends for their help and support.
Abstract
This report first gives an overview of neural networks and Self-Organizing Maps
(SOMs), and then introduces the Spherical Self-Organizing Map (SSOM), which avoids
some of the ill effects of the planar SOM. In order to explore a more effective method
for classification, a new type of self-organizing map is proposed: the Multiple
Spherical Self-Organizing Map (MSSOM).
In this report I present a new way to create the MSSOM, based on the SSOM. In
this kind of MSSOM the distance between spheres is no longer a constant but
depends on the number of spheres and the size of the spheres. Because of this change,
the neighborhood structure becomes more flexible. I also discuss the quantization
error (QE) and the topological error (TE), which are used to measure the
classification results.
Finally, I designed four experiments to demonstrate that the method I present is
viable. The results show that the method is feasible and worth further investigation.
List of Abbreviations
NN Neural Networks
SOM Self-Organizing Map
SSOM Spherical Self-Organizing Map
MSSOM Multiple Spherical Self-Organizing Map
QE Quantization Error
TE Topological Error
Table of Contents
Acknowledgements
Abstract
List of Abbreviations
1 Introduction
1.1 Motivation
1.2 Objective of project
1.3 Contribution
1.4 Report organization
2 Background and relevant techniques
2.1 Neural network
2.2 Kohonen's Self-organizing map
2.3 Spherical Self-Organizing Maps (S-SOMs)
3 Spherical SOM
3.1 The weight adaptation algorithm for SSOM
3.2 The neighborhood structure for the SSOM
4 Multiple Spheres SOM and relevant techniques
4.1 Description
4.2 Process of Multiple Spheres SOM
4.3 Neighborhood structure
4.4 Distance between spheres
4.5 Training methods
4.6 Visualization
4.7 Evaluation
5 Experiments on the Multiple Spheres SOM
5.1 Experiment 1: Comparison of different SOMs
5.1.1 Experiment description
5.1.2 Experiment process and data analysis
5.2 Experiment 2: Searching for a better Multiple Spheres SOM
5.2.1 Experiment description
5.2.2 Experiment process and data analysis
5.3 Experiment 3: In-depth experiment on MSSOM
5.4 Experiment 4: Another new method to modify the distance
6 Conclusion and future work
6.1 Conclusion
6.2 Future work
References
Figure 1 An example of a basic neuron
Figure 2 2D arrangement
Figure 3 Weight Adaptation Process
Figure 4 User interface of Multiple Spheres SOM
Figure 5 Data set list
Figure 6 Data structure list
Figure 7 Loading the data
Figure 8 After the training
Figure 9 The flow chart of Multiple Spheres SOM
Figure 10 2D output neuron map of Multiple Spheres SOM
Figure 11 4 spheres SOM
Figure 12 4 spheres chain glyph visualization
Figure 13 4 spheres equal glyph visualization
Figure 14 SSOM visual effects from different angles
Figure 15 4 spheres SOM visual effects
Figure 16 MSSOM (4s) visual effects
Table 1 Differences between von Neumann machines and neural networks
Table 2 Relation between N and the number of neurons in ICOSA(N)
Table 3 4 patterns input vector in parallel training
Table 4 4 patterns input vector in sequence training
Table 5 Hardware environment
Table 6 Attribute information for IRIS
Table 7 Summary of the IRIS dataset
Table 8 QE and TE of the different SOMs on the IRIS dataset
Table 9 Attribute information for ECSH
Table 10 Summary of the ECSH dataset
Table 11 Results of Huajie Wu's method
Table 12 Results of my method
Table 13 Results of Huajie Wu's method
Table 14 Results of my method
Table 15 Correlation between number of spheres and QE & TE
Table 16 Results of the new method
1 Introduction
Self-Organizing Feature Maps (SOMs) [1] are a type of artificial neural network that
can map high-dimensional data to a low-dimensional representation space through
unsupervised learning. The SOM uses a neighborhood function to preserve the
topological mapping from the high-dimensional data space to the map neurons [2].
The regular grid structure of the SOM makes it very useful for visualizing certain
characteristics of the model vectors and the cluster structure [3].
1.1 Motivation
Ideally, all the neurons in a SOM enjoy the same chance of updating their weights
and their neighbors' weights. However, the conventional SOM is a planar map, and
the grid units at the boundary of the SOM have fewer neighbors than the inside units.
In other words, the boundary units get less chance to update their weights and
their neighborhoods' weights during the training process, which leads to the
notorious "border effect" [4]. As a result, the spherical SOM was introduced to
perform better: because its boundary units are eliminated, all the units enjoy the
same priority to compete and update.
To improve the clustering results, this project changes the way the distance between
spheres is calculated, on the hypothesis that better clustering can be found by
modifying this distance.
Visualization is a great tool for helping the user analyze data interactively,
especially for discovering and analyzing patterns within large multi-dimensional
data sets. Visual elements such as color and shape make such patterns much easier
to identify. The Multiple Spheres SOM gives a three-dimensional (3D) view, which
means the observer may see or find more precise details of informative patterns.
1.2 Objective of project
This report aims to allow users to choose an arbitrary number of spheres and to
visualize the clustering results using a modified version of Sangole & Leontitsis's
SSOM code, and to explore the influence of the distance between spheres on the
clustering results.
1.3 Contribution
The contribution of this report is to extend previous work which constructed a
Multiple Spheres SOM by modifying Sangole & Leontitsis's SSOM code. The previous
work did not cluster well when using many spheres, so a new method of calculating
the distance between spheres was introduced and tested on a large dataset.
1.4 Report organization
Chapter 2 gives background and an overview of the relevant techniques, concepts and
algorithms, including neural networks, SOMs and SSOMs, which prepare the reader to
understand the Multiple Spheres SOM. Chapter 3 gives a further interpretation and
explanation of Sangole & Leontitsis's SSOM. Chapter 4 describes how Sangole &
Leontitsis's SSOM code is modified to change the distance between spheres, in order
to explore the influence that this distance has on the clustering results.
Chapter 5 presents experiments that examine the hypothesis that the distance
between spheres affects the clustering results. Chapter 6 concludes the report and
outlines future work.
2 Background and relevant techniques
2.1 Neural network
Neural networks, or artificial neural networks, are composed of interconnected
artificial neurons which simulate some properties and functions of biological
neural networks in order to solve specific artificial intelligence problems [5].
Compared to von Neumann machines, neural networks are a different paradigm for
computing. The von Neumann machine is based on a processor/memory abstraction
of human information processing, whereas neural networks are based on the
parallel multiple-processing structure of the human brain. Table 1 below
shows the main differences between the von Neumann and neural network models.
Von Neumann                          Neural Network
Single CPU                           Multiple processing units
CPU is complex                       Each neuron has a simple processing function
Processing is fast                   Processing might be slow
Data pathways are simple             Interconnections between neurons are complex
Outstanding at data processing       Good at pattern recognition
Outstanding at symbol processing     Good at sub-symbolic problems
Table 1 Differences between von Neumann machines and neural networks
Neural networks consist of many neurons, and each neuron can be described by a
simple model. Figure 1 shows a basic neuron. From the figure we can see that there
is an input layer and an output layer. The inputs X come either from the data set
or from other neurons, Y represents the output, and the weights W represent the
strength of the connections between the inputs and the neuron. Each input and
output can have multiple dimensions.
Figure 1 An example of a basic neuron (inputs X1 ... XN with weights W1 ... WN
feed the weighted sum ∑WX, which passes through an activation function to
produce the output Y)
Figure 1 shows a simple single-neuron adaptive network. When the neuron receives
its inputs, each input $x_i$ is first multiplied by its weight $w_i$. The neuron
then sums all the weighted inputs and applies an activation function to generate
the output:

$y = f\left(\sum_{i=1}^{n} w_i x_i\right)$

where $f$ is the activation function, e.g. the sigmoid function,
$y$ is the output of the neuron,
$n$ is the number of inputs,
$x_i$ is the $i$-th input value, and
$w_i$ is the weight of the $i$-th input value.
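As a concrete illustration of this computation, the following is a minimal,
self-contained MATLAB sketch of a single neuron with a sigmoid activation; the
input and weight values are illustrative only and not taken from the report.

x = [0.5; 0.2; 0.9];           % example inputs x_1..x_n
w = [0.4; -0.6; 0.1];          % example connection weights w_1..w_n
f = @(a) 1 ./ (1 + exp(-a));   % sigmoid activation function
y = f(w' * x)                  % weighted sum passed through the activation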
In general, neural networks are built from many such basic neurons. A typical
neural network contains three layers: an input layer, an output layer and a
hidden layer. Each layer plays a critical role during the training process.
There are three main learning paradigms for adapting the weights in neural
networks: supervised learning, unsupervised learning and reinforcement learning.
In supervised learning, the network is given a set of desired outputs; each
actual output is compared with its desired output and the weights are adapted
accordingly. In reinforcement learning, some information on the quality of the
outputs is required so that credit can be assigned. In unsupervised learning,
the network models the statistical structure of the input. In this report I use
unsupervised learning in the Multiple Spheres SOM, which means the network
groups the input data and finds suitable patterns.
2.2 Kohonen’s Self-organizing map
In fact, the term "self-organizing map" covers many different approaches.
Kohonen's SOM is called a topology-preserving map [6]: it uses a neighborhood
function to preserve the topological properties of the input space, and it
describes a mapping from a higher-dimensional input space to a lower-dimensional
map space.
In this report I use Kohonen's self-organizing map, so all references to the SOM
in this report mean Kohonen's self-organizing map. Teuvo Kohonen first introduced
the self-organizing map in 1982 [7]. Kohonen describes the SOM as a visualization
and analysis tool for high-dimensional data; it is also commonly used for
clustering, dimensionality reduction, classification, sampling, vector
quantization and data mining.
A significant difference between the SOM and the backpropagation neural network
is that the SOM contains only an input layer and an output layer, with no hidden
layer. In other words, all the input neurons are directly connected to the output
neurons. Figure 2 below shows the two-dimensional arrangement of a SOM.
Figure 2 2D arrangement
Next, I introduce the algorithm for Kohonen's SOM. It is assumed that the output
neurons are connected in an array and that the network is fully connected, meaning
all the neurons in the input layer are connected to all the neurons in the output
layer. The weight vectors are first initialized and the desired number of cycles
is set. Then a competitive learning algorithm is used to find the 'winning neuron'.
First of all, an input vector $X$ is randomly chosen. The following function is
then used to find the distance (Euclidean distance) between the input vector and
each of the $N$ weight vectors:

$d_n = b_n \, \lVert X - W_n \rVert, \quad n = 1, 2, \dots, N \qquad (2.1)$

where $b_n$ represents a non-decreasing function that depends on the winning
counts and is used to prevent cluster under-utilization [7].
Then the nearest neuron needs to be found: the neuron $n$ whose weight vector
yields the smallest distance $d_n$ to the input $X$ is selected as the winning
neuron [6].
After that, the weights associated with the winning neuron and all the neurons
residing within its specified neighborhood are updated as follows:

$W_n(t+1) = W_n(t) + \eta \, \Lambda(s) \, [X - W_n(t)] \qquad (2.2)$

$\Lambda(s) = \frac{R - s}{R}, \quad 0 \le s \le R \qquad (2.3)$

where $\eta$ is a predefined learning rate, $s$ is the neighborhood size
parameter (the distance of neuron $n$ from the winner), and $R$ is the size of
the neighborhood that is to cover a hemisphere; it is considered constant [6].
Finally, this process is repeated until the desired number of cycles is reached
or a predefined stopping criterion is met.
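To make the loop concrete, the following is a minimal MATLAB sketch of equations
(2.1)-(2.3) on a simple 1-D chain of output neurons, with the count-dependent
term $b_n$ omitted (set to 1) for simplicity; the data, sizes and learning rate
are illustrative and not taken from the report's code.

X = rand(100, 3);                     % 100 three-dimensional input vectors
N = 20;  W = rand(N, 3);              % N weight vectors, randomly initialized
eta = 0.1;  R = 5;                    % learning rate and neighborhood size
for epoch = 1:40
    for p = randperm(size(X, 1))
        d = sum(bsxfun(@minus, W, X(p, :)).^2, 2);  % distances, eq. (2.1)
        [dmin, win] = min(d);                       % winning neuron
        s = abs((1:N)' - win);                      % grid distance from winner
        lambda = max(0, (R - s) / R);               % neighborhood, eq. (2.3)
        W = W + eta * bsxfun(@times, lambda, ...
                             bsxfun(@minus, X(p, :), W));  % update, eq. (2.2)
    end
end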
2.3.Spherical Self-Organizing Maps (S-SOMs)
The Spherical Self-Organizing Maps(S-SOMs) is a kind of extension of the SOMs.
The normal SOM leads to the notorious “border effect” with neurons on the edges
of the grid having few neighbors, however, the S-SOM could eliminate these
effects because all the neurons in the S-SOM enjoys the same priority in
competition.
The S-SOM inherit the capability of visualizing high dimensional data which
means S-SOM not only has an overall symmetry and continuity in its structure
Page 14
but also gives a 3D framework for visualizing the data[8].
Actually, there are many different topologies for the S-SOM. In this report, I
would like focus on Sangole&Leontitsis’s S-SOM. it provides the index of the
neighborhood and the relationship between the neurons which would be
convenient to expand to the multiple S-SOM.
3 Spherical SOM
In the SSOM there is a non-linear mapping from the data space to the surface of a
sphere, so all the neurons have the same chance to update themselves and their
neighborhoods. Several approaches have been attempted to eliminate the "border
effect" [9]. Kohonen suggested a heuristic weighting-rule method. The problem can
also be avoided by implementing the map on a torus instead of a sphere; however,
the torus-based SOM cannot provide an intuitively readable map, and people find a
map on a sphere easier to understand. In addition, the spherical SOM is more
suitable for data with an underlying directional structure [10]. Therefore, the
spherical SOM is considered an effective way to map from a higher-dimensional
input space to a lower-dimensional map space.
3.1 The weight adaptation algorithm for SSOM
In this implementation, all the data vectors are mapped onto a spherical SOM. In
the following steps, the winning neuron closest to the input vector is selected
and updated. The data space is in Cartesian form. Figure 3 shows the weight
adaptation process.
In step 0, at the beginning of the training process, all the weight vectors are
initialized and the desired number of epochs is set.
In step 1, an input vector $X^p$ is selected randomly, where $p = 1, 2, \dots, P$
and $P$ is the total number of training patterns. After the input vector is
selected, the distance between the input vector and the weight vectors is
calculated. From step 0 we already know the input vector's Cartesian coordinates;
the weight vectors are indexed by $(i, j, k)$. The following formula computes the
distance between the input vector and the weight vectors:

$d_{ijk} = b_{ijk} \, \lVert X^p - W_{ijk} \rVert \qquad (3.1)$

where $b_{ijk}$ is a count-dependent non-decreasing function used to prevent
cluster under-utilization, and $W_{ijk}$ stands for the weight vector of neuron
$(i, j, k)$, with $i = 1, 2, \dots, I$, $j = 1, 2, \dots, J$, $k = 1, 2, \dots, K$.
In step 3, the winning neuron $(i, j, k)$ is selected by comparing its distance
with those of the other neurons. The winning neuron gets the chance to update
itself and its neighborhood [8].
In step 4, all the weights associated with the winning neuron and all the neurons
residing within the specified neighborhood are updated as follows:

$W_{ijk}(t+1) = W_{ijk}(t) + \eta \, \Lambda(s) \, [X^p - W_{ijk}(t)] \qquad (3.2)$

where

$\Lambda(s) = \frac{R(t) - s}{R(t)} \qquad (3.3)$

$\eta$ is a predefined learning rate, $s$ is the distance of a neuron from the
winner, and $R(t)$ is the current neighborhood size, starting from the initial
neighborhood size [3] and reduced gradually during the training process.
Figure 3 Weight Adaptation Process (flow chart: initialize the weight vectors and
set the desired number of epochs; randomly select an input from the dataset;
calculate the distance between the input vector and the weight vectors of all
neurons; select the winning neuron with the minimum distance; update the weights
of the winning neuron and its neighborhood; check the stopping criterion; check
whether all inputs have been selected; finish)
At the end of the process, only the winning neuron is updated. The number of
epochs is set beforehand, as mentioned in step 0. The neighborhood size is
reduced on a fixed schedule during training: during the first quarter of the
epochs, the current neighborhood size equals the initial neighborhood size;
during the second quarter it equals half of the initial size; during the third
quarter it equals 1; and during the last quarter it equals 0 [13].
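This quarter-wise schedule can be written as a small helper; the following is a
minimal MATLAB sketch, where the function name and arguments are illustrative
and not taken from the report's code.

function R = currentNeighborhoodSize(epoch, nEpochs, R0)
    % R0 is the initial neighborhood size; nEpochs the total number of epochs.
    q = epoch / nEpochs;        % fraction of training completed
    if q <= 0.25
        R = R0;                 % first quarter: initial neighborhood size
    elseif q <= 0.5
        R = R0 / 2;             % second quarter: half the initial size
    elseif q <= 0.75
        R = 1;                  % third quarter: size 1
    else
        R = 0;                  % last quarter: only the winner is updated
    end
end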
In step 5, the stopping criterion is checked. If it is not satisfied, the training
process returns to step 1 and repeats until the criterion is met.
In the final step, whether all the input patterns have been selected is checked,
to make sure the training process is correct.
3.2 The neighborhood structure for the SSOM
In Sangole’s spherical SOM, they defined the distance between two neurons on
the sphere as the length of the shortest path connecting them. Each neuron
should maintain a list of its immediate neighbors in a multiple dimensional
matrix. It is then easy to find the neighborhood neurons when the winning
neuron is selected.
There are five regular tessellations of sphere platonic polyhedral which are
tetrahedron, octahedron, cube, dodecahedron and icosahedrons [2]. The
icosahedrons-based geodesic dome is chose for the spherical SOM because the
icosahedrons is most similar to the sphere which means the change of the edge
length is smaller than the others. We call this arrangement as , where N
is the number of recursive subdivision. 2+10* neurons in the output space can
be arranged by . Table 1 shows the number of neurons for different
values of N.
N Number of neurons
0 12
1 42
2 162
3 642
4 2562
5 10242
Table 2 Relation between N and the number of neurons in ICOSA(N)
From the table we can see that a single sphere may not provide a structure whose
size accurately suits the dataset. With multiple spheres, however, we can choose
a more suitable total number of neurons.
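As a quick check of the neuron-count formula against Table 2, the following
one-liner in MATLAB reproduces the table's values:

N = 0:5;
neurons = 2 + 10 * 4.^N        % gives 12, 42, 162, 642, 2562, 10242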
4 Multiple Spheres SOM and relevant techniques
In this section I introduce the training process of the Multiple Spheres SOM,
then explain how the neighborhood structure and the distance between spheres are
defined. Some visualization results are also displayed. Moreover, the two training
methods implemented in the program are described. Finally, the evaluation criteria
used to analyze the classification results are introduced.
4.1 Description
In this section I introduce the main GUI window of the software and describe the
user interface.
From Figure 4 we can see that the user interface is user-friendly: users can
quickly find the critical information.
Figure 4 User interface of Multiple Spheres SOM
Before training, a dataset and a SOM data structure must be loaded. All the
datasets are saved in '.mat' files (shown in Figure 5), and all the data
structures are listed in Figure 6 (for more details refer to Table 2). First,
the data set to be classified is loaded. Then a suitable data structure file
from Figure 6 is loaded; the choice of data structure depends on the size of
the data set. Finally, the first column of training parameters on the right side
of Figure 4 needs to be set. "Epochs" represents the number of parallel training
cycles and defaults to 44. "Size" stands for the neighborhood size and defaults
to 0.5, which means the initial neighborhood is to cover a hemisphere; it is
considered constant. "Spheres" represents the number of spheres and defaults
to 3. "Times" counts the number of sequence training passes and defaults to 3.
Figure 5 Data set list
Figure 6 Data structure list
The parameters in the second column are used for display; when the training
process is finished, they are all filled with values.
Figure 7 Loading the data (the training buttons become visible)
From Figure 7 we can see that after loading the data set file and the data
structure, the two training buttons become visible. The first training button is
used for parallel training, which uses the parameter "Epochs" to count the
training cycles. The second training button is used for sequence training, which
uses the parameter "Times" to count the training passes. These two training
methods are described in detail in Section 4.5.
Figure 8 After the training (the display buttons become visible)
From Figure 8 we can see that the plot glyph buttons are now visible. The first
button, "Plot 'Chain' Glyph", is used to create the glyphs in a chain, and the
second button, "Plot 'Equal' Glyph", is used to generate glyphs of equal size.
The details are described in Section 4.6.
If users want more information about the software, the "Help" button provides it.
4.2 Process of Multiple Spheres SOM
In this project, Sangole&Leontitsis’s S-SOM would be used to generate the
Multiple Spherical Self-Organized Maps. The flow chart below shows the general
process of Multiple Spheres SOM. From the chart, it is can be seen that the
process of Multiple Spheres SOM consists of four critical parts which are
initialization, training process, visualization and evaluation.
In step 1, the data set file and data structure file are loaded, then the common
parameters such as size and spheres are set. After that, all the variables are saved
in the workspace. Compared to the SSOM, the Multiple Spheres SOM’s
neighborhoods structure is changed because the neighbors of neurons are not all
in the same sphere. The neighbors on the other spheres should be calculated, due
to which the neighborhood list should be reorganized. Also for the neurons on
Page 21
the spheres, the Cartesian coordinates are changed based on the spheres which
we set before. Therefore, we need to resize all the neurons’ coordinates as well.
Figure 9 The flow chart of Multiple Spheres SOM (Step 1, initialization: load the
data set and data structure; load the common parameters and save the variables;
reorganize the neurons and their neighbors; read the epochs for parallel training
or the times for sequence training. Step 2, training: parallel training or
sequence training. Step 3, visualization: Plot 'Chain' Glyph or Plot 'Equal'
Glyph, with distortions and colors. Step 4, evaluation: QE and TE.)
In step 2, one of two training methods is applied, depending on which button the
user chooses. More details are given in Section 4.5.
After the training process, the training result needs to be analyzed. Step 3
provides two ways to view the data: Plot 'Chain' Glyph and Plot 'Equal' Glyph.
Section 4.6 gives more detail about them.
In the final step, the classification result obtained from the training process
is evaluated. Section 4.7 gives a more specific explanation of the evaluation.
4.3 Neighborhood structure
One of the differences between the SSOM and the Multiple Spheres SOM is the
neighborhood structure. The neighborhood size becomes smaller and smaller during
training. At initialization, the neighborhood size defaults to 0.5, which means
the neighborhood is to cover a hemisphere. When a neuron's neighborhood tries to
cover a hemisphere, it may extend onto the other spheres. Therefore, the
neighborhood list must be modified: neighbors from the other spheres have to be
added to the list, which makes the training process more complex.
Figure 10 2D output neuron map of Multiple Spheres SOM (three spheres drawn in
different colors, each with neurons labelled a to m)
Figure 10 shows part of a Multiple Spheres SOM with three spheres, drawn in
different colors. Suppose the winning neuron is a on sphere 1, the distance
between adjacent neurons equals 1, and the distance between spheres also equals 1.
Compared to the S-SOM, the neighborhood is changed; we use nsize to represent the
neighborhood size. When nsize = 1, the red (left sphere) neurons b to g are all
neighbors of the winning neuron red a; because the distance between spheres also
equals 1, the blue (middle sphere) a is also a neighbor of red a. When nsize = 2,
the red neurons h to m are neighbors of red a on sphere 1, the blue neurons b to g
are neighbors of red a on sphere 2, and the green (right sphere) neuron a is a
neighbor of red a on sphere 3.
However, the discussion above assumes the distance between spheres is equal to 1,
and in fact there is no good justification for that value. In the next section I
provide a new way to set the distance between spheres in order to get better
classification.
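As a small illustration of the example above, the following MATLAB fragment
computes how many spheres away the neighborhood can reach for a given
neighborhood size, assuming a uniform distance between adjacent spheres; it is
an illustrative helper, not taken from the report's code.

nsize = 2;  interSphereDist = 1;                 % values from the example above
maxSphereHops = floor(nsize / interSphereDist)   % = 2: neighbors reach sphere 3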
4.4 Distance between spheres
In the initialization step, the maximum neighborhood size was set equal to the
number of steps a neuron needs to traverse the hemisphere. In the S-SOM this is
an effective choice, corresponding to a neighborhood size of 0.5.
For the Multiple Spheres SOM, however, the distance between spheres comes into
play. In the S-SOM all output neurons are on one sphere; if we still chose the
neighborhood only from the same sphere, it would not be fair to the neurons on
the other spheres. If we set the distance between spheres much shorter than the
distance between adjacent neurons, the training process becomes computationally
more complex. If we set it much longer than the distance between adjacent
neurons, the Multiple Spheres SOM becomes less effective, because the winning
neuron would only ever search among neurons on its own sphere. As a result, a
suitable distance between spheres needs to be found.
The distance between spheres can be tied to the neighborhood size of the Multiple
Spheres SOM. In other words, we aim for the number of steps by which a neuron
traverses the hemisphere to be exactly the number of steps needed to traverse
half of the spheres. Figure 11 below shows the example of a 4 spheres SOM, where
the second sphere is chosen as the starting sphere.
Figure 11 4 spheres SOM (neurons a, b, c shown on each of spheres 1 to 4, with
the distance between adjacent spheres marked)
Following the idea above, in Figure 11 we want a distance such that the number of
steps from a on sphere 1 to a on sphere 3 equals the number of steps for a to
traverse the hemisphere, which we denote rsize. From Figure 11 we can see that
with four spheres, twice the inter-sphere distance must equal rsize, and half the
number of spheres is exactly two. If there are six spheres, three times the
distance must equal rsize, and half the number of spheres is exactly three. We
can therefore conclude that the distance between spheres equals twice rsize
divided by the number of spheres:

$Distance = \frac{2 \times rsize}{spheres} \qquad (4.1)$

where rsize is the number of steps a neuron needs to traverse the hemisphere and
spheres is the number of spheres.
Moreover, the pseudocode for updating the neurons' neighborhood list and the
distance between spheres is shown below:

Algorithm: update the neurons' neighborhood list and the distance between spheres

% C is the neighborhood data structure of one sphere, X the neuron coordinates
% of one sphere, nsize the number of neurons per sphere, and spheres the number
% of spheres.
Cnew = C;  Xnew = X;

% Calculate the distance between spheres, following formula (4.1)
if spheres == 1
    Distance = 0;
else
    Distance = 2 * rsize / spheres;
end

% Replicate the coordinates and the neighborhood structure for the extra spheres
for i = 1:spheres-1
    Xnew = [Xnew; X];
    Cnew = [Cnew; C];
end

% Shift the neighbor indices so that each sphere refers to its own neurons
for i = 1:size(C, 2)
    for j = 1:spheres
        for k = 1:nsize
            Cnew{k + nsize*(j-1), i} = Cnew{k + nsize*(j-1), i} + nsize*(j-1);
        end
    end
end

% rsize is set to span all the spheres as far as possible; if rsize exceeds the
% default radius parameter, the excess entries are set to empty values to avoid
% indexing out of range.
% Finally, the neighborhood list is resized: first get the indices of the
% adjacent spheres, then, for each k, get the indices on the spheres whose
% distance from the winner equals the current rsize - k.
4.5 Training methods
Two different training methods are applied to the Multiple Spheres SOM: parallel
training and sequence training. Parallel training is the traditional method
described in Section 3.1. In parallel training, every sphere is presented with
the same pattern at the same time, starting from the first pattern. In sequence
training, the first pattern each sphere receives is randomly selected. The
following two tables show how the two training methods work on a 4 spheres SOM
whose input vector has 4 patterns.
Epochs\Spheres S1 S2 S3 S4
1 P1 P1 P1 P1
1 P2 P2 P2 P2
1 P3 P3 P3 P3
1 P4 P4 P4 P4
2 P1 P1 P1 P1
… … … … …
Table 3 4 patterns input vector in parallel training
Times\Spheres S1 S2 S3 S4
1 P4 P3 P2 P1
2 P3 P2 P1 P4
3 P2 P1 P4 P3
4 P1 P4 P3 P2
5 P2 P3 P4 P1
… … … … …
Table 4 4 patterns input vector in sequence training
where P1 stands for the first pattern of the input vector.
From the tables above, parallel training uses one epoch to present every pattern
to every sphere, while sequence training needs four 'times' to achieve the same
coverage. Therefore, four times of sequence training have the same effect as one
epoch of parallel training; the sketch below illustrates the two schedules.
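The following minimal MATLAB sketch builds the two presentation schedules for
4 patterns and 4 spheres; the exact randomization rule for sequence training is
an assumption based on reading Tables 3 and 4, not taken from the report's code.

nP = 4;  nS = 4;                       % number of patterns and spheres
% Parallel training: within one epoch, every sphere (column) receives the
% patterns P1..P4 in the same fixed order (rows are presentation steps).
parallelEpoch = repmat(1:nP, nS, 1)';
% Sequence training: in each 'time', the spheres receive a rotation of the
% patterns starting from a randomly chosen one (cf. time 1 in Table 4).
start = randi(nP);
sequenceTime = mod(start - (1:nS), nP) + 1   % one pattern per sphere this time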
4.6 Visualization
One advantage of the Multiple Spheres SOM is that it provides a 3D framework for
visualizing the data. This project offers two display methods, which allow the
user to analyze the data from different viewpoints.
"Plot 'Chain' Glyph" visualizes the spheres as a chain, so users can get a view
of all the spheres at the same time. The center sphere is the focus and is drawn
largest, while the sizes of the other spheres decrease with their distance from
the center sphere; the ends of the chain wrap around. Users can choose the center
sphere by adjusting the initialization settings. Figure 12 shows 4 spheres with
sphere 1 set as the center sphere.
Figure 12 4 spheres chain glyph visualization
However, sometimes the data set has many inputs, which usually means many spheres
are needed, and users may then have to view many spheres in the same graph. Even
though the user can set the center sphere, it is hard to recognize the features
of the smaller spheres. "Plot 'Equal' Glyph" avoids this problem by displaying an
arbitrary contiguous range of equal-size spheres. Figure 13 shows the same result
as Figure 12, with only sphere 1 shown on the graph.
Figure 13 4 spheres equal glyph visualization
4.7 Evaluation
There are many ways to evaluate the quality of a Multiple Spheres SOM. Here I
introduce the quantization error and topological error measures used in this
project.
The SOM compresses information while preserving the topological and metric
relationships of the primary data items. When the input dimension is higher than
the output dimension, vector quantization and topology preservation always
conflict.
However, no matter how long we train, some difference remains between the input
patterns and the neurons they are mapped to. These differences constitute the
quantization error [11]. The quantization error (QE) is defined as follows:

$QE = \frac{1}{P} \sum_{p=1}^{P} \lVert x_p - m_{c(x_p)} \rVert \qquad (4.2)$

where $m_{c(x_p)}$ is the weight vector of the winning neuron for each data point
$x_p$, and $P$ is the number of data points.
The topological error (TE) evaluates the complexity of the output space [12]. It
measures the average number of times the second-closest neuron of an input is not
adjacent to its closest neuron. The larger the topological error, the more
complex the output space [11]; a high topological error may indicate that the
training was not adequate. The topological error is defined as follows:

$TE = \frac{1}{P} \sum_{p=1}^{P} u(x_p) \qquad (4.3)$

where $u(x_p) = 1$ if the closest and second-closest neurons of $x_p$ are not
adjacent on the map, and $u(x_p) = 0$ otherwise.
From these definitions, TE can increase during training, because non-adjacent
neurons on the map may approach each other in the data space. In contrast, a high
QE means that the vectors associated with the map neurons do not represent the
data points well.
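The two measures can be computed together in one pass over the data. The
following is a minimal MATLAB sketch of equations (4.2) and (4.3); X (P-by-d
data), W (M-by-d weight vectors) and A (M-by-M logical adjacency matrix of the
map) are assumed inputs, and the function name is illustrative.

function [qe, te] = somErrors(X, W, A)
    P = size(X, 1);
    qe = 0;  te = 0;
    for p = 1:P
        d = sqrt(sum(bsxfun(@minus, W, X(p, :)).^2, 2)); % distance to each neuron
        [ds, idx] = sort(d);
        qe = qe + ds(1);                 % distance to the best-matching neuron
        if ~A(idx(1), idx(2))            % best and second-best not adjacent
            te = te + 1;
        end
    end
    qe = qe / P;                         % equation (4.2)
    te = te / P;                         % equation (4.3)
end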
5 Experiments on the Multiple Spheres SOM
All the experiments described here are based on the Multiple Spheres SOM model.
The network and test suites are implemented in MATLAB R2009a (7.8.0.347) for
Windows. Details of the hardware environment used are listed in the table below:
PC brand: ASUS K42JV laptop
CPU: Intel Core i5 M460 (2.54 GHz)
Memory: 2.67 GB
Operating System: Windows 7, 32-bit
Video Card: NVIDIA GeForce GT335M
Network: PCI Express Gigabit Ethernet Adapter
Table 5 Hardware environment
5.1 Experiment 1: Comparison of different SOMs
In this experiment I compare the quality of different SOMs by using the same
dataset for each. The quantization error and topological error are used to
evaluate the classification results of the different SOMs.
5.1.1 Experiment description
The dataset I chose is the IRIS dataset, which is perhaps the best known database
in the pattern recognition literature. It consists of 3 classes, each containing
50 instances. The attribute information for the IRIS dataset is listed below:
Attribute 1: sepal length in cm
Attribute 2: sepal width in cm
Attribute 3: petal length in cm
Attribute 4: petal width in cm
Attribute 5: class (Iris Setosa, Iris Versicolor, Iris Virginica)
Table 6 Attribute information for IRIS
The following table shows a summary of the IRIS dataset's properties:
Dataset name: IRIS
Input dimensions: 4
Number of instances: 150
Missing values: No
Data types: Multivariate
Table 7 Summary of the IRIS dataset
5.1.2 Experiment process and data analysis
As mentioned before, I am interested in comparing three different types of
self-organizing map: Kohonen's SOM, the SSOM and the Multiple Spheres SOM. This
part of the experiment looks for the better type of self-organizing map by
comparing quantization errors and topological errors among the three. First, all
the parameters are initialized. "Epochs", the number of training cycles of the
network, is set to 40; to get an optimal result this parameter should be neither
too large nor too small, which could cause overfitting or underfitting
respectively. The parameter "Size" is set to 0.5, which is the neighborhood size
needed to cover a hemisphere in a spherical SOM.
Then, to collect comparable results, I chose similar total numbers of neurons for
each type of self-organizing map: ICOSA(2), which contains 162 neurons, is
selected as the SSOM structure, and 4 spheres of ICOSA(1) with 42 neurons each,
168 neurons in total, is selected as the Multiple Spheres SOM.
Finally, I ran each structure 10 times; the following table shows the average
values:
Kohonen’s SOM SSOM 4 Spheres SOMs
Quantization
error(QE)
0.255 0.218 0.235
Topological
errors(TE)
0.028 0.021 0.019
Table 8Summary of dataset
From the table above we can see that the SSOM and the 4 spheres SOM are both
better than Kohonen's SOM in both quantization error (QE) and topological
error (TE).
In the introduction I mentioned that Kohonen's SOM suffers from border effects: a
neuron on the border of Kohonen's SOM has less chance to update itself and its
neighborhood. Both the SSOM and the Multiple Spheres SOM avoid this problem,
because all their neurons lie on the surface of a sphere rather than in a planar
arrangement. As a result, the SSOM and the 4 spheres SOM get better results than
Kohonen's SOM.
The following graphs show the best visual results using the "Equal Glyph" display
scheme for the SSOM and the 4 spheres SOM. Figure 14 shows the SSOM visual effect,
with a QE of 0.192 and a TE of 0.018, and Figure 15 shows the Multiple Spheres
SOM (4 spheres) visual effect, with a QE of 0.114 and a TE of 0.013.
Figure 14 SSOM visual effects from different angles
Figure 15 4 spheres SOM visual effects
From Table 8 we can see that there are only slight differences between the
results of the SSOM and the Multiple Spheres SOM. That is most likely because the
IRIS dataset contains only 150 instances and 4 input variables; the dataset is
too small to differentiate between these two models. I therefore chose the ECSH
dataset, which contains 3641 instances and 7 input variables, to get results on a
more complex dataset.
5.2 Experiment 2: Searching for a better Multiple Spheres SOM
In this experiment I again use the quantization error and topological error to
evaluate the classification results, this time across different spherical SOM
configurations. I chose a more complex dataset in order to get more general
results.
As described in Section 4.4, I introduced a new way to set the distance between
spheres. In Huajie Wu's method, the distance between spheres is 1. In my opinion
this is not suitable for the Multiple Spheres SOM: the distance between two
adjacent neurons is already set to 1, so if the distance between spheres is also
1, we have effectively just split one big sphere into several small ones, and the
neighborhood structure does not change much. Therefore a new way to set the
distance between spheres is needed. In my method the distance depends on the
number of spheres and the size of the spheres (the formula given in Section 4.4).
To demonstrate whether my method is better, I chose the same dataset that Wu used
in his experiment and compared the results of Wu's method and my method.
5.2.1 Experiment description
The ECSH dataset is used to judge whether a reader is reading text that is easy,
calm, stressful or hard. The attribute information for ECSH is listed below:
Attribute 1: xGaze
Attribute 2: yGaze
Attribute 3: pupiLdiam
Attribute 4: pupiRdiam
Attribute 5: ECG
Attribute 6: GSR
Attribute 7: BP
Table 9 Attribute information for ECSH
The following table shows the summary of the ECSH dataset:
Dataset name: ECSH
Input dimensions: 7
Number of instances: 3641
Missing values: Yes
Data types: Time sequence
Table 10 Summary of the ECSH dataset
5.2.2 Experiment process and data analysis
To make the results comparable with the previous work, I used the same parameter
values as Wu: "Epoch" is set to 20 and "Size" to 0.5. The total number of neurons
in each configuration is 2,562, the same as Wu's setting, and I used the same
experiment groups as Wu.
The following table shows Wu's results on ECSH; all values are averages over 10
runs:
Error   SSOM 2,562n   MSSOM(4s) 4s*642n   MSSOM(15s) 15s*162n   MSSOM(61s) 61s*42n   MSSOM(214s) 214s*12n
QE      193.77        381.71              188.75                587.73               426.14
TE      0.101         0.328               0.179                 0.092                0.105
Table 11 Results of Huajie Wu's method
In Table 11, SSOM is a spherical SOM with 2,562 neurons and MSSOM stands for
Multiple Spheres SOM: MSSOM (4s) represents 4 spheres of 642 neurons each,
MSSOM (15s) 15 spheres of 162 neurons, MSSOM (61s) 61 spheres of 42 neurons, and
MSSOM (214s) 214 spheres of 12 neurons.
The table below shows my results on ECSH; all values are again averages over 10
runs:
Error   SSOM 2,562n   MSSOM(4s) 4s*642n   MSSOM(15s) 15s*162n   MSSOM(61s) 61s*42n   MSSOM(214s) 214s*12n
QE      193.77        91.99               431.33                3196.75              3001.99
TE      0.101         0.135               0.136                 0.088                0.174
Table 12 Results of my method
From Tables 11 and 12 we can see that both methods find an MSSOM with lower
quantization error and topological error than the single SSOM. Both methods show
their smallest topological error in the same group, MSSOM (61s), with almost the
same values. Wu said in his report that "MSSOM (61s) has the largest average
number of neighbors per neuron". In my method the distance between spheres is
changed; when the number of spheres is big enough, the distance between spheres
becomes much smaller than 1, so more neurons get the chance to update themselves
during training. TE measures whether an input's closest neuron is adjacent to its
second-closest neuron, and as the number of neighbors increases, the chance of a
topological error is reduced. Therefore MSSOM (61s) has the smallest TE under
both methods.
However, we can also see that MSSOM (4s) under my method gets a much lower
quantization error (QE) than either the SSOM or Wu's method. In MSSOM (4s) the
distance between spheres is approximately 5.76, which is significantly larger
than 1 and much larger than the distance between neurons on the same sphere.
Therefore the input patterns have more chance to be mapped to a similar neuron.
The following graph (Figure 16) shows the best visual result of MSSOM (4s) using
the "Equal Glyph" display scheme.
Even though I get a much better result than Wu's for MSSOM (4s), this alone is
not completely convincing; to confirm it, we need to try further experiments with
other numbers of large spheres. Therefore, I designed the following experiment.
Figure 16 MSSOM (4s) visual effects
5.3 Experiment 3: In-depth experiment on MSSOM
In this experiment I use another two topologies of the SOM to compare with Wu's
method. To check the finding from experiment 2, I use the same kind of structure
as MSSOM (4s) but with different numbers of spheres.
In experiment 2, MSSOM (4s) contained 4 spheres of 642 neurons each. In this
experiment I use 3 spheres and 5 spheres, each sphere again containing 642
neurons, to determine whether the result from experiment 2 continues to hold.
The table 9 shows Wu’s method results. Each value is the average value of 10
run’s results:
Spheres number QE TE
3 416.71 0.383
4 381.71 0.328
5 305.45 0.459
Table 13Results of Huajie’s method
Table 14 shows the results of my method; each value is again the average of 10
runs:
Spheres   QE      TE
3         96.52   0.113
4         91.99   0.135
5         66.25   0.117
Table 14 Results of my method
From Tables 13 and 14 we can clearly see that both QE and TE in my method are
much lower than in Wu's method. Meanwhile, taking these results together with
experiment 2, QE grows as the number of spheres becomes large. The correlations
are summarized below:

Method         Correlation with QE    Correlation with TE
Zha's method   Positive correlation   No correlation
Wu's method    Positive correlation   No correlation
Table 15 Correlation between number of spheres and QE & TE

From the table above, QE is positively correlated with the number of spheres,
while TE shows no correlation with it. As a result, we consider that QE has the
greater impact on the results, and we reach the same conclusion as in
experiment 2.
5.4 Experiment 4: Another new method to modify the distance
The three experiments above show that the method I proposed is an effective way
to classify with a small number of large spheres. To look for an even more
effective way, I modified formula (4.1) and repeated the experiment with the same
dataset. The new formula is as follows:

$Distance = \frac{4 \times rsize}{spheres \times \sqrt{spheres}} \qquad (5.1)$

The reason for this modification is to keep the new method as effective on a few
large spheres as before, while also reducing the errors on many small spheres.
From formula (5.1) we can see that when spheres equals 4, its result is the same
as that of formula (4.1); the sketch below checks this.
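The following small MATLAB sketch compares formulas (4.1) and (5.1) for the
sphere counts used in Table 16; the rsize value is illustrative, not taken from
the report.

rsize = 11.5;                                     % illustrative value only
spheres = [4 15 61];
d_old = 2 * rsize ./ spheres;                     % formula (4.1)
d_new = 4 * rsize ./ (spheres .* sqrt(spheres));  % formula (5.1)
[spheres' d_old' d_new']                          % identical row for 4 spheres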
Spheres   Huajie's method (QE, TE)   Distance = 2*rsize/spheres (QE, TE)   Distance = 4*rsize/(spheres*sqrt(spheres)) (QE, TE)
4         381.71, 0.328              91.99, 0.135                          91.99, 0.135
15        188.75, 0.179              431.33, 0.136                         428.57, 0.076
61        587.73, 0.092              3196.75, 0.088                        2776.96, 0.073
Table 16 Results of the new method
Table 16 shows that the new method slightly reduces both QE and TE. Because the
distance under the new method changes less than before, the neighborhood
structure changes more slowly than before; as a result, the QE and TE decrease.
6 Conclusion and future work
6.1 Conclusion
I have proposed a new way to modify the structure of the Multiple Spheres SOM. In
the new method the neighborhood structure is changed: the distance between
spheres depends on the number of spheres and the size of the spheres. I then
designed experiments to demonstrate that this method is better than Wu's method.
In experiment 1, I showed that the Multiple Spheres SOM I proposed is better than
Kohonen's SOM. In experiment 2, I obtained a much better QE for MSSOM (4s),
almost half of the best result of Wu's method. Finally, I showed that not only
MSSOM (4s) but also MSSOM (3s) and MSSOM (5s) are better than Wu's results.
In conclusion, my modification of the way the distance between spheres is set is
an effective way to get better classification.
6.2 Future work
First of all, in order to compare with Wu's method, I only chose the datasets
used in Wu's experiments. More datasets should be tested on the different types
of MSSOM, and more experiment groups should be designed, to further establish the
effectiveness and limitations of my method.
Also, although I get a better result with a few large spheres, when the number of
spheres is large the distance between spheres becomes small and the neighborhood
structure changes. Usually, when the winning neuron searches its neighborhood,
the search distance increases by 1 each time, because the whole neighborhood
structure is stored in a matrix and matrix indices must be integers. The winning
neuron may therefore take more time to search the other spheres, which may
increase QE and TE; it may also take more time to search over more spheres when
the distance between the many spheres is too small. To avoid these problems,
formula (4.1) should be modified to cater for large numbers of spheres. It should
also be investigated whether the problem is really due to there being many
spheres, or due to the small size of the spheres used. One possibility to
investigate is whether there should be a maximum on how many spheres away a
neighborhood can stretch.
References
[1] Leontitsis, A & Sangole, AP 2006, 'Estimating an optimal neighborhood size in
the spherical self-organizing feature map', International Journal of
Computational Intelligence, vol. 18, no. 35, pp. 192-196.
[2] Nishio, H, Altaf-Ul-Amin, MD, Kurokawa, K, Minato, K & Kanaya, S 2005,
'Spherical SOM with arbitrary number of neurons and measure of suitability',
WSOM, pp. 323-330.
[3] Sangole, A & Knopf, GK 2002, 'Representing high-dimensional data sets as
closed surfaces', Information Visualization, vol. 1, no. 2, pp. 111-124.
[4] Tuoya, Suggi, Y, Satoh, H, Yu, D, Matsuura, Y, Tokutaka, H & Seno 2008,
'Spherical self-organizing map as a helpful tool to identify category-specific
cell surface markers', Biochemical and Biophysical Research Communications,
vol. 376, pp. 414-418.
[5] Lawrence, J 1994, Introduction to Neural Networks: Design, Theory and
Applications, 6th edn, California Scientific Software Press.
[6] Kohonen, T 1990, 'The self-organizing map', Proceedings of the IEEE, vol. 78,
no. 9, September, pp. 1464-1480.
[7] Kohonen, T 1982, 'Self-organized formation of topologically correct feature
maps', Biological Cybernetics, vol. 43, pp. 59-69.
[8] Sangole, A & Knopf, GK 2003, 'Visualization of randomly ordered numeric data
sets using spherical self-organizing feature maps', Computers & Graphics,
vol. 27, no. 6, pp. 963-976.
[9] Wu, Y & Takatsuka, M 2005, 'Fast spherical self-organizing map - use of
indexed geodesic data structure', WSOM.
[10] Wu, Y & Takatsuka, M 2006, 'Spherical self-organizing map using efficient
indexed geodesic data structure', Neural Networks, vol. 19, no. 6, pp. 900-910.
[11] Kirk, JS & Zurada, JM 1999, 'Algorithms for improved topology preservation
in self-organizing maps', IEEE, pp. 396-400.
[12] Uriarte, EA & Martin, FD 2005, 'Topology preservation in SOM', International
Journal of Mathematical and Computer Sciences, vol. 1, no. 1, pp. 1-4.
[13] Sangole, A & Knopf, GK 2003, 'Geometric representations for high-dimensional
data using a spherical SOFM', Smart Engineering System Design, vol. 5, pp. 11-20.