7/27/2019 Clustering Report
1/34
CLUSTERING
Clustering involves grouping data points together according to some measure of
similarity. One goal of clustering is to extract trends and information from raw data sets.
An alternative goal is to develop a compact representation of a data set by creating a set
of models that represent it [1].
There are two general types of clustering that are used: supervised and
unsupervised clustering. Supervised clustering uses a set of example data to classify the
rest of the data set. This is also called classification; here the task is to learn to
assign instances to pre-defined classes [2]. For example, consider a set of colored balls
(all colors) that you want to classify into three groups: red, green, and blue. A logical way
to do this is to pick out one example of each class--a red ball, a green ball, and a blue
ball--and set them each next to a bucket. Then go through the remaining balls, compare
each ball to the three examples and put each ball in the bucket whose example it matches
the best.
This example of supervised clustering is illustrative because there are two
potential problems. First, the result you get is going to be dependent upon the balls you
select as examples. If you were to select a red, an orange, and a blue ball, then it might be
difficult to classify a green ball. Second, unless you are careful about selecting examples,
you may select examples that don't represent the distribution of data. For example, you
might select red, green, and blue balls, only to discover that most of the colored balls
were cyan, purple, and magenta (which are in between the other 3 primary colors). This
shows the importance of selecting representative samples when you execute supervised
clustering.
Unsupervised clustering, on the other hand, tries to discover the natural groupings
inside a data set without any input from a trainer. The main input a typical unsupervised
clustering algorithm takes is the number of classes it should find. In the colored balls
case, this would be like dumping them into an automatic sorting machine and telling it to
create three piles. The goal of unsupervised clustering is to create three piles where the
balls within each pile are very similar, but the piles are different from one another. Here
no pre-defined classification is required. The task is to learn a classification from the
data.
One of the most important characteristics of any supervised or unsupervised
clustering process is how to measure the similarity of two data points. Clustering
algorithms divide a data set into natural groups (clusters). Instances in the same cluster
are similar to each other; they share certain properties.
Clustering algorithms can have different properties [2]:
Hierarchical: These methods include those techniques where the input data are
not partitioned into the desired number of classes in a single step. Instead, a series
of successive fusions of data are performed until the final number of clusters is
obtained [3].
Non-hierarchical or iterative: These methods include those techniques in which a
desired number of clusters is assumed at the start. Instances are reassigned to
clusters to improve them.
Hard and soft: Hard clustering assigns each instance to exactly one cluster. Soft
clustering assigns each instance a probability of belonging to each cluster.
Disjunctive: Instances can be part of more than one cluster.
The figure below illustrates these properties of clustering.
Figure 1 Illustration of properties of clustering
Un-Supervised Clustering:
One of the most commonly used unsupervised clustering algorithms is the K-means
algorithm. The algorithm is as follows:
Specify k, the number of clusters
Choose k points randomly as cluster centers
Assign each instance to its closest cluster center using Euclidean distance
Calculate the mean (centroid) of each cluster and use it as the new cluster center
Reassign all instances to the closest cluster center
Iterate until the cluster centers no longer change
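The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kmeans`, the `seed` parameter, and the toy data are hypothetical, and an empty cluster simply keeps its previous center.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Choose k distinct data points at random as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every instance to its closest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of the instances assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its center
        # Stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Made-up data: two well separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels = kmeans(data, k=2)
```

Changing `seed` changes the initial centers, which is the source of the run-to-run variation the algorithm is known for on less clearly separated data.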
The figure below explains the concept of K-means clustering
Figure 2: Illustration of K-means algorithm [4]
A demo of the K-means algorithm is shown below. The pictures depict the change of centers
for 4 clusters over 4 iterations.
[Figure panels, numbered by iteration, showing the positions of the cluster centers]
After the fourth iteration the centers hardly move, so they are fixed at this position.
The K-means algorithm has two disadvantages: the number of clusters must be specified in
advance, and different sets of initial random centers can lead to different final cluster
centers.
SUPERVISED CLUSTERING ALGORITHMS:
In this section four different types of supervised clustering algorithms are presented:
vector quantization, fuzzy clustering, artificial neural networks, and fuzzy-neural
algorithms. Although fuzzy and neural methods initially go through unsupervised clustering
to determine the cluster centers, only the supervised clustering algorithms are discussed
here.
VECTOR QUANTIZATION :
This algorithm originates in Shannon's source coding theory, which is used for the
transmission and encoding of data. A vector quantizer maps k-dimensional vectors in the
vector space $R^k$ into a finite set of vectors $Y = \{y_i : i = 1, 2, \ldots, N\}$ [5].
Each vector $y_i$ is called a code vector or a codeword, and the set of all the
codewords is called a codebook. Associated with each codeword $y_i$ is a nearest neighbor
region called its Voronoi region, defined by:
$$V_i = \{ x \in R^k : \|x - y_i\| \le \|x - y_j\| \ \text{for all } j \ne i \}$$
The set of Voronoi regions partitions the entire space $R^k$ such that:
$$\bigcup_{i=1}^{N} V_i = R^k, \qquad V_i \cap V_j = \emptyset \ \text{for all } i \ne j$$
As an example we take vectors in the two-dimensional case. Figure 3 shows some
vectors in space. Associated with each cluster of vectors is a representative codeword
(cluster center or cluster representative obtained by the k-means algorithm or similar
algorithms). Each codeword resides in its own Voronoi region. These regions are
separated by imaginary lines in Figure 3 for illustration. Given an input vector, the
codeword that is chosen to represent it is the one in the same Voronoi region.
Figure 3: Vector quantization illustration in 2-D space showing the Voronoi regions formed
by imaginary lines
The representative codeword (cluster center) is the one closest in Euclidean
distance to the input vector (instance). The Euclidean distance is defined by:
$$d(x, y_i) = \sqrt{\sum_{j=1}^{k} (x_j - y_{ij})^2}$$
where $x_j$ is the jth component of the input vector, and $y_{ij}$ is the jth component of the
codeword $y_i$.
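The nearest-codeword rule described above can be illustrated with a short sketch. The function name `quantize` and the three-entry codebook are made up for illustration; a real codebook would come from k-means or a similar training step.

```python
import math


def quantize(x, codebook):
    """Return (index, codeword) of the codeword nearest to x in Euclidean distance."""
    index = min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with N = 3 codewords.
codebook = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
index, codeword = quantize((3.0, 1.0), codebook)
print(index, codeword)  # 1 (4.0, 0.0)
```

The input (3, 1) lies in the Voronoi region of the second codeword, so it is represented by index 1; only that index would need to be stored or transmitted.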
FUZZY SUPERVISED CLUSTERING :
Fuzzy logic is becoming popular in the field of automatic control. Fuzzy logic
requires no analytical model of the system, and offers the chance to combine heuristic
knowledge with any model knowledge which may be available [6]. Fuzzy logic can also
deal with vague or imprecise data. In the field of fault diagnosis, fuzzy logic has been
used successfully in many applications, both as a means of residual generation, and to aid
in the decision making process of residual evaluation.
The idea behind fuzzy clustering is basically that of pattern recognition. Training
data is used off-line to determine relevant cluster centers for each of the faults of interest.
On-line, the degree to which the current data belongs to each of the pre-defined clusters is
determined, and this results in a degree-of-membership to each of the pre-determined
faults. This method is useful in cases where there are many residuals, or in which no
expert knowledge of the system is available. Fuzzy clustering differs from fuzzy
reasoning, which is also used in residual analysis. Fuzzy reasoning consists mainly of
IF-THEN rules based on the sign of the residual. An example of fuzzy reasoning:
IF residual1 is positive and residual2 is negative THEN fault1 is present
IF residual1 is zero and residual2 is zero THEN system is fault free
and so on.
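The two example rules can be evaluated with a small sketch. Everything beyond the rules themselves is an assumption made for illustration: the ramp-shaped membership functions, the `width` scaling parameter, and the use of `min` for the AND operator (a common but not the only choice in fuzzy inference).

```python
def memberships(r, width=1.0):
    """Degrees to which residual r is 'negative', 'zero' or 'positive'.

    Uses simple ramp-shaped fuzzy sets; `width` is a hypothetical scale at
    which a residual counts as fully positive or fully negative.
    """
    t = max(-1.0, min(1.0, r / width))
    return {"negative": max(0.0, -t), "zero": 1.0 - abs(t), "positive": max(0.0, t)}


def evaluate_rules(residual1, residual2):
    """Firing strength of each rule, with AND taken as the minimum."""
    m1, m2 = memberships(residual1), memberships(residual2)
    return {
        "fault1 present": min(m1["positive"], m2["negative"]),
        "fault free": min(m1["zero"], m2["zero"]),
    }


strengths = evaluate_rules(0.8, -0.6)
```

For residuals of 0.8 and -0.6 the first rule fires with strength of about 0.6 and the second with about 0.2, so the evidence points to fault1 without excluding the fault-free case entirely.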
Clustering is the allocation of data points to a certain number of classes. Each class is
represented by a cluster center, or prototype, which can be considered as the point which
best represents the data points in the cluster. The idea behind fuzzy clustering is that each
data point belongs to all classes with a certain degree of membership. The degree to
which a data point belongs to a certain class depends on the distance to all cluster
centers. For fault diagnosis, each class could correspond to a particular fault. The general
principle is shown for three inputs and three clusters in Fig. 4.
Figure 4: Fuzzy clustering concept showing the cluster centers and the membership
grade of a data point
The fuzzy clustering fault isolation procedure consists of the following two steps:
Off-line phase: this is a learning phase which consists of the determination of the
characteristics (i.e. cluster centers) of the classes. A learning data set is necessary
for this off-line phase, which must contain residuals for all known faults. (For
more details on origin of idea of fuzzy clustering refer to [7] )
On-line phase: This phase calculates the membership degree of the current
residuals to each of the known classes. In this way each data point does not
belong to only one cluster, but its membership is distributed among all clusters
according to the varying degree of resemblance of its features with respect to
those cluster centers [8].
It is important that the training data contains all faults of interest, otherwise they cannot
be isolated on-line - though unknown faults can in some cases be detected.
The fuzzy membership matrix and the cluster centers are computed by minimizing the
following partition formula:
$$J_f(C, m) = \sum_{i=1}^{C} \sum_{k=1}^{N} (u_{i,k})^m \, d_{i,k}^2 \quad \text{subject to} \quad \sum_{i=1}^{C} u_{i,k} = 1 \qquad (1)$$
where C denotes the number of clusters, N the number of data points, $u_{i,k}$ the
fuzzy membership of the k-th point to the i-th cluster, $d_{i,k}$ the Euclidean distance
between the data point and the cluster center, and $m \in (1, \infty)$ a fuzzy weighting factor
which defines the degree of fuzziness of the results. The data classes become fuzzier
and less discriminating with increasing m. In general, m = 2 is chosen (although this
value of m does not produce the optimal solution for all problems).
The constraint in eq. (1) implies that each point must entirely distribute its
membership among all the clusters. The cluster centers (centroids or prototypes) are
defined as the fuzzy weighted center of gravity of the data x,
$$v_i = \frac{\sum_{k=1}^{N} (u_{i,k})^m \, x_k}{\sum_{k=1}^{N} (u_{i,k})^m}, \qquad i = 1, 2, \ldots, C \qquad (2)$$
Since $u_{i,k}$ affects the computation of the cluster center $v_i$, data with a high
membership will influence the prototype location more than points with a low
membership. For the fuzzy C-means algorithm, the distance $d_{i,k}$ is defined as follows:
$$d_{i,k}^2 = (x_k - v_i)^2 \qquad (3)$$
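A minimal sketch of fuzzy C-means built around the center formula (2) and the distance (3) is shown below. The membership update used here is the standard fuzzy C-means update, which is not stated in the text above and is therefore an assumption; the function name, the random initialisation, the stopping tolerance and the toy data are likewise hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate the center update of Eq. (2) with the usual membership update.

    Returns (centers, u), where u[i][k] is the membership of point k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each point's memberships sum to 1.
    u = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= total
    centers = []
    for _ in range(n_iter):
        # Eq. (2): fuzzy weighted center of gravity of the data.
        centers = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            centers.append(tuple(
                sum(wk * p[j] for wk, p in zip(w, points)) / sum(w) for j in range(dim)
            ))
        # Eq. (3): squared Euclidean distances, floored to avoid division by zero.
        d2 = [[max(math.dist(points[k], centers[i]) ** 2, 1e-12) for k in range(n)]
              for i in range(c)]
        # Standard FCM membership update (assumed, not taken from the text).
        new_u = [[1.0 / sum((d2[i][k] / d2[j][k]) ** (1.0 / (m - 1.0)) for j in range(c))
                  for k in range(n)] for i in range(c)]
        change = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if change < tol:
            break
    return centers, u


# Made-up data: two groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, u = fuzzy_c_means(data, c=2)
```

Each column of `u` sums to 1, which is the constraint attached to the objective; on-line, the same membership formula can be applied to a new residual vector using the stored centers.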
Figure 5 :Matlab fuzzy-logic toolbox demo of Fuzzy C-means clustering for 4 clusters
ARTIFICIAL NEURAL NET CLUSTERING :
Before discussing the supervised clustering technique in neural nets, the basics of
artificial neural networks are reviewed.
An artificial neural network is a system loosely modeled on the human brain [9]. It
is an attempt to simulate, within specialized hardware or sophisticated software, the
multiple layers of simple processing elements called neurons. Each neuron is linked to
certain of its neighbors with varying coefficients of connectivity that represent the
strengths of these connections. Learning is accomplished by adjusting these strengths to
cause the overall network to output appropriate results. The most basic components of
neural networks are modeled after the structure of the brain. The most basic element of
the human brain is a specific type of cell, which provides us with the abilities to
remember, think, and apply previous experiences to our every action. These cells are
known as neurons, and each of them can connect with up to 200,000 other neurons.
The power of the brain comes from the number of these basic components and the
multiple connections between them.
All natural neurons have four basic components, which are dendrites, soma, axon,
and synapses. Basically, a biological neuron receives inputs from other sources,
combines them in some way, performs a generally nonlinear operation on the
result, and then outputs the final result. The figure below shows a simplified
biological neuron and the relationship of its four components.
Figure 6 : Four main parts of human nerve cells, based on which artificial neurons are
designed
The basic unit of a neural network, the artificial neuron, simulates the four basic
functions of natural neurons. Artificial neurons are much simpler than biological
neurons; the figure below shows the basics of an artificial neuron.
Figure 7 Structure of an artificial neuron with Hebbian learning ability.
(weights are adjustable)
D. Hebb postulated a principle for the learning process at the cellular level (Hebb, 1949):
if Neuron A is stimulated repeatedly by Neuron B at times when Neuron A is
active, then Neuron A will become more sensitive to stimuli from Neuron B (the
correlation principle) [10]. It implicitly involves adjustments of the strengths of the
synaptic inputs, which led to the incorporation of adjustable synaptic weights on the input
lines to excite or inhibit incoming signals.
The architecture for a network that consists of a layer of M perceptrons is shown in Figure 9. An
input feature vector $x = (x_1, \ldots, x_N)$ is input to the network via the set of N branching
nodes. The lines fan out at the branching nodes so that each perceptron receives an input from
each component of x. At each neuron, the lines fan in from all of the input (branching) nodes.
Each incoming line is weighted with a synaptic coefficient (weight parameter) from the set
$\{w_{nm}\}$, where $w_{nm}$ weights the line from the nth component $x_n$ coming into the mth perceptron.
Figure 9 : One layer of perceptrons network with N inputs and M perceptrons
The Perceptron as Hyperplane Separator:
Consider a perceptron as shown in Figure 7. The input vector $x = (x_1, \ldots, x_N)$ is linearly combined
with the weights to obtain
$$s = w_1 x_1 + \cdots + w_N x_N + b,$$
where b is the threshold. Then s is activated by a threshold function T(·) to produce the output y
= T(s) = 1 when s >= 0, else y = T(s) = -1. The set of all input vectors x such that
$$s = w_1 x_1 + \cdots + w_N x_N + b = 0$$
forms a hyperplane H in the input vector space. H partitions the feature vector space into right
and left half-spaces $H^+$ and $H^-$.
An example: consider a single perceptron with two inputs. Let w1 = 2, w2 = -1, and b = 0;
then 2x1 - x2 = 0 determines H. The points (0,0) and (1,2) belong to H.
The feature vector x = (x1,x2) = (2,3) is summed into
s = 2(2) - 1(3) = 1 > 0, so that the activated output is y = T(1) = 1
(corresponding to H+ in the plane, i.e. the right half).
(x1,x2) = (0,2) activates the output y = T(2(0) - 1(2)) = T(-2) = -1,
which indicates that (0,2) is in the left half-space H-. The figure below shows these
points.
Figure 10: Illustration of H+ and H- in the plane
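The worked example can be checked numerically with a short sketch. The function name `perceptron` is hypothetical; the weights, threshold and test points are the ones used in the example, for which the weighted sum of (0,2) evaluates to -2.

```python
def perceptron(x, w, b):
    """Threshold unit: returns (s, y) with s = w.x + b and y = 1 if s >= 0 else -1."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return s, (1 if s >= 0 else -1)


w, b = (2, -1), 0
print(perceptron((2, 3), w, b))  # (1, 1): the point lies in H+
print(perceptron((0, 2), w, b))  # (-2, -1): the point lies in H-
print(perceptron((1, 2), w, b))  # (0, 1): the point lies on H, where T(0) = 1
```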
The above example is a simple linear mapping between the input and the output. Now
consider another example, which illustrates how a non-linear relation between input and
output is implemented: the XOR logic function, or 2-bit parity problem. It has
N = 2 inputs, M = 1 output, Q = 4 sample vector (input/output)
pairs for training, and K = 2 clusters (even and odd).
The table below shows the mapping of input to output for this 2-bit parity data.
Table 1: Logic for 2-bit parity data
x1  x2  Output (XOR)  Parity
0   0   0             even
0   1   1             odd
1   0   1             odd
1   1   0             even
However, we see from Figure 11 below that a single hyperplane can not separate the four
feature vectors into the required 2 classes, no matter how it is oriented (rotated and
translated) by the weights.
Figure 11: Hyperplane diagram for 2-bit parity data, showing one hyperplane is not
sufficient to separate the data into two clusters
The power of a single neuron can be greatly amplified by using multiple neurons in a
network of layered connectionist architecture, as displayed in Figure 12 below. Such a
multiple layered perceptron (MLP) is also called a feed forward artificial neural network,
abbreviated to FANN. The modifier "feed forward" distinguishes these networks from
feedback (recursive) networks. On the left is the layer of inputs, or branching nodes,
which are not artificial neurons. The hidden layer (the middle layer here) contains neural
nodes, as does the output layer on the right. This is the architecture of a two-layered
NN (so called because there are two layers of neuronal units).
Figure 12 : A typical two layered network where the middle layer introduces the required
non-linearity between input and output layers
Neural networks may also have multiple hidden layers for the sake of extra power in
learning to separate nonlinearly separable classes. The Hornik-Stinchcombe-White
theorem states that a layered artificial neural network with two layers of neurons is
sufficient to approximate as closely as desired any piecewise continuous map of a closed
bounded subset of a finite dimensional space into another finite dimensional space,
provided there are sufficiently many neurons in the single hidden layer. There is no
theoretical need to use more than two layers of neurons, which would increase the
computational complexity and instability in training, and slow down the operation
because the extra layers cause delays in processing (the idea is that the neurons in a single
layer process in parallel, while the different layers process sequentially). But extra
layers can avoid the need for an excessive number of neurons in a single
hidden layer to achieve highly nonlinear classification.
Consider the same XOR implementation using the two layered network shown in
the figure below:
Figure 13 : A two layered network for XOR logic implementation
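Let [weights not recovered from the source]. The
result is two parallel hyperplanes that yield three convex regions. The hyperplanes are
determined by [equations not recovered from the source].
The threshold at the first neuron in the hidden layer yields [equation not recovered].
The threshold at the second hidden neuron yields [equation not recovered].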
This forces the results listed in Table 2, where we use 0.1 for 0 and 0.9 for 1 (this is the
usual procedure in using neural networks, because 0 and 1 have special properties that
inhibit gradient training). The four sets of above outputs yield the three unique vectors
(y1,y2) = (0,1), (y1,y2) = (1,1), and (y1,y2) = (0,0) that identify the three linearly
separable regions shown in Figure 14. We see from the figure that Regions 1 and 3 make
up the even parity (Class 1), while Region 2 is odd parity (Class 2). We saw in the
previous example that a network of a single layer cannot output the two correct classes,
no matter how we orient the hyperplanes via translation and rotation. In all cases of
non-coincident hyperplanes, we obtain three or four convex regions (the lower and upper
bounds, respectively).
Table 2: Hidden layer mapping for the 2-bit parity function
To show that the network with a second layer of perceptrons can learn the nonlinearly
separable classes of even and odd parity (XOR logic), we take the new weights at the
single output neuron to be those shown in Figure 13. These weight the lines on
which y1 and y2 enter the output neuron (perceptron). Using the resulting hyperplane,
we need to map y = (1,1) and y = (0,0) into the same class, Class 1, as shown in Figure 14
below.
Figure 14: The partitioning of the 2-bit parity feature space with two perceptron
layers
This is done by choosing the weights (u) as above together with a suitable threshold. The
result is shown in the table below.
Table 3: The 2-bit Parity Mapping by Two Layers of Perceptrons
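The two-layer construction can be reproduced with a small sketch. The weights and thresholds below are illustrative choices, not the values from the original figure; they were picked so that the hidden layer produces the same three unique vectors (0,1), (1,1) and (0,0) mentioned in the text. The function names are hypothetical, and a 0/1 threshold is used instead of -1/+1 for readability.

```python
def step(s):
    """Binary threshold: 1 when s >= 0, else 0."""
    return 1 if s >= 0 else 0


def xor_net(x1, x2):
    """Two-layer threshold network for 2-bit parity; returns ((y1, y2), output)."""
    # Hidden layer: two parallel hyperplanes x1 + x2 = 1.5 and x1 + x2 = 0.5
    # (assumed weights, chosen for illustration).
    y1 = step(x1 + x2 - 1.5)
    y2 = step(x1 + x2 - 0.5)
    # Output neuron: fires only for the middle region, where (y1, y2) = (0, 1).
    z = step(-y1 + y2 - 0.5)
    return (y1, y2), z


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hidden, out = xor_net(*x)
    print(x, hidden, out)
```

The hidden vectors (0,0) and (1,1) are mapped to output 0 (even parity) and (0,1) to output 1 (odd parity), which a single hyperplane in the (y1, y2) plane can now separate.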
There are many different kinds of learning rules used by neural networks. The most
common class of ANNs is called backpropagational neural networks (BPNNs) [11].
Backpropagation is an abbreviation for the backwards propagation of error. Here
learning is a supervised process that occurs with each cycle or epoch (i.e. each time
the network is presented with a new input pattern). It consists of a forward activation,
which results in a flow of inputs and outputs of the neurons through the network, and a
backward weight adjustment scheme based on the calculated error. More simply, when a
neural network is initially presented with a pattern it makes a random guess as to what it
might be. It then sees how far its answer was from the actual one and makes an
appropriate adjustment to its connection weights.
Backpropagation performs a gradient descent within the weight space towards a
global minimum. The global minimum is the theoretical solution with the lowest possible
error. In most problems, the solution space is quite irregular with numerous pits and hills
which may cause the network to settle down in a local minimum which is not the best
overall solution. This idea is depicted in figure below.
Figure 15 The weights versus error space.
Here for clarity this graph is drawn in two dimensions, however, often we have many
weights, say n, and this graph would be in n+1 dimensions.
Since the nature of the error versus weights space cannot be known a priori, one
has to run several neural network analyses with different parameters to determine the
best solution. The speed of learning can be controlled by the learning rate. Another
parameter, momentum, helps the network to overcome obstacles (local minima) in the
error surface and settle down at or near the global minimum. The issue of when to stop
the training is non-trivial. Training should not necessarily proceed to the global
minimum: this point is by definition optimal for the training set, but that may not be the
case for an independent data set.
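The roles of the learning rate and the momentum parameter can be illustrated with the classical momentum update, sketched below for a single scalar weight. The text does not give the update formula, so its exact form is an assumption; the function name `descend`, the parameter values and the toy error surface are hypothetical, and the toy surface has a single minimum, so it shows the mechanics only, not an escape from a local minimum.

```python
def descend(grad, w0, eta=0.1, momentum=0.9, steps=500):
    """Gradient descent with a momentum term on a scalar weight."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity accumulates past gradients; eta is the learning rate.
        velocity = momentum * velocity - eta * grad(w)
        w += velocity
    return w


# Toy error surface E(w) = (w - 3)^2 with gradient 2 (w - 3); its minimum is at w = 3.
w_final = descend(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Setting `momentum=0.0` reduces this to plain gradient descent, which makes it easy to compare the two behaviours on the same surface.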
The math and algorithm are as follows [12].
The main objective in neural model development is to find an optimal set of weight
parameters w, such that $y = y(x, w)$ closely represents (approximates) the original
problem behavior. This is achieved through a process called training (that is, optimization
in w-space). A set of training data is presented to the neural network. The data are
pairs $(x_k, d_k)$, $k = 1, 2, \ldots, P$, where $d_k$ is the desired output of the neural
model for input $x_k$ and P is the total number of training samples.
During training, the neural network performance is evaluated by computing the
difference between actual network outputs and desired outputs for all the training
samples. The difference, also known as the error, is quantified by
--------(1)
where $d_{jk}$ is the jth element of $d_k$, $y_j(x_k, w)$ is the jth neural network output for the
input $x_k$, and $T_r$ is an index set of training data. The weight parameters w are adjusted
during training, such that this error is minimized.
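One common way to write this error, consistent with the symbols defined above, is the sum-of-squares form below. The factor 1/2 and the use of M for the number of network outputs are assumptions, not taken from the source.

```latex
E_{T_r}(w) = \frac{1}{2} \sum_{k \in T_r} \sum_{j=1}^{M} \bigl( y_j(x_k, w) - d_{jk} \bigr)^2
```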
Training Process :
The first step in training is to initialize the weight parameters w; small random values
are usually suggested. During training, w is updated along the negative direction of the
gradient of E, as $w = w - \eta \, \partial E / \partial w$, until E becomes small enough. Here, the parameter
$\eta$ is called the learning rate. If we use just one training sample at a time to update w, then a
per-sample error function $E_k$ given by
----(2)
is used and w is updated as $w = w - \eta \, \partial E_k / \partial w$. The following sub-section describes how the
error back propagation process can be used to compute the gradient information $\partial E_k / \partial w$.
Error Back Propagation :
Using the definition of $E_k$ in (2), the derivative of $E_k$ with respect to the weight
parameters of the lth layer can be computed by simple differentiation as
------(3)
and
-------(4)
The gradient $\partial E_k / \partial z_i^L$ can be initialized at the output layer as
-----(5)
using the error between neural network outputs and desired outputs (training data).
Subsequent derivatives $\partial E_k / \partial z_i^l$ are computed by back-propagating this error from the (l+1)th
layer to the lth layer (see figure below) as
-------(6)
Figure 16: Relationship between ith neuron of the lth layer, with neurons of layer l-1 and
l+1
For example, if the MLP uses sigmoid (3.6) as hidden neuron activation function,
-------(7)
--------(8)
and
--------(9)
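For the same MLP network, let $\delta_i^l$ be defined as $\delta_i^l = \partial E_k / \partial \gamma_i^l$, representing the local
gradient at the ith neuron of the lth layer (with $\gamma_i^l$ the weighted input sum of that neuron).
The back propagation process is given by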
-------(10)
--(11)
and the derivatives with respect to the weights are
----(12)
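The back propagation procedure can be sketched for a very small network. The following is an illustration under stated assumptions rather than the formulation of [12]: a network with 2 inputs, 2 sigmoid hidden neurons and 1 linear output neuron, a per-sample error of the form 0.5 (y - d)^2, and hypothetical function names throughout.

```python
import math
import random


def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))


def forward(x, W1, b1, W2, b2):
    """Forward pass: sigmoid hidden layer, linear output neuron (an assumption)."""
    z = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(W1, b1)]
    y = sum(w * zi for w, zi in zip(W2, z)) + b2
    return z, y


def sample_error(x, d, params):
    """Per-sample error, assumed to be 0.5 * (y - d)^2."""
    _, y = forward(x, *params)
    return 0.5 * (y - d) ** 2


def backprop(x, d, params):
    """Gradient of the per-sample error with respect to every weight."""
    W1, b1, W2, b2 = params
    z, y = forward(x, *params)
    delta_out = y - d  # error derivative at the linear output neuron
    gW2 = [delta_out * zi for zi in z]
    gb2 = delta_out
    # Back-propagate to the hidden layer; the sigmoid derivative is z (1 - z).
    delta_hid = [delta_out * w * zi * (1.0 - zi) for w, zi in zip(W2, z)]
    gW1 = [[dh * xi for xi in x] for dh in delta_hid]
    gb1 = delta_hid
    return gW1, gb1, gW2, gb2


def sgd_step(x, d, params, eta):
    """One per-sample update w <- w - eta * gradient; returns the new parameters."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backprop(x, d, params)
    W1 = [[w - eta * g for w, g in zip(rw, rg)] for rw, rg in zip(W1, gW1)]
    b1 = [b - eta * g for b, g in zip(b1, gb1)]
    W2 = [w - eta * g for w, g in zip(W2, gW2)]
    return W1, b1, W2, b2 - eta * gb2


# Small random initial weights, then one update on a single training pair.
rng = random.Random(0)
params = ([[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)],
          [rng.uniform(-0.5, 0.5) for _ in range(2)],
          [rng.uniform(-0.5, 0.5) for _ in range(2)],
          rng.uniform(-0.5, 0.5))
x, d = (0.0, 1.0), 0.9
before = sample_error(x, d, params)
after = sample_error(x, d, sgd_step(x, d, params, eta=0.1))
```

A useful sanity check for any such implementation is to compare the back-propagated gradient with a finite-difference estimate of the same derivative; the two should agree to several decimal places.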
The algorithm in pictorial representation is given in the figure below.
Figure 17: Error back propagation algorithm steps
The Matlab Neural Network Toolbox has a demonstration of the error back propagation
algorithm, showing the change of error with respect to different combinations of weights
for a two layered network. It also shows how it is possible to get the weights
corresponding to local minima. The figures below show the Matlab demo.
Figure 18 : Variation of error with respect to layer one weights
Figure 19: Two arbitrarily chosen points on the graph depict the values of the weights that
will be obtained by the algorithm
Integration of Fuzzy Systems and Neural Networks:
Neural networks process numerical information and exhibit learning capability. Fuzzy
systems can process linguistic information and represent, say, experts' knowledge by
fuzzy rules. Thus, the fusion of these two technologies is the current research trend. The
aim is to be able to create machines with more intelligent behavior [13].
Some of the motivations for considering both fuzzy systems and Neural Networks:
(1) The Knowledge Base of a fuzzy system consists of a collection of "If... Then..." rules
in which linguistic labels are modeled by membership functions.
Neural Networks can be used to produce membership functions when available data are
numerical.
(2) Moreover, one can take advantage of the learning capability of neural networks to
adjust membership functions, say in control strategies, to enhance control precision.
(3) Neural Networks can be used to provide learning methods for fuzzy inference
procedures.
(4) In the opposite direction, one can use fuzzy reasoning architecture to construct new
Neural Networks.
(5) One can also fuzzify the Neural Networks architecture to enlarge the domain of
applications.
(6) The fusion of Neural Networks and Fuzzy Systems is essentially based upon the fact
that Neural Networks can learn experts' knowledge (through numerical data) and Fuzzy
Systems can represent experts' knowledge (through the representation of in-out relation
by fuzzy reasoning).
The literature describes basically two types of combination:
Neural-fuzzy systems: In this type of system the learning ability of neural networks is
utilized to realize the key components of a general fuzzy logic inference system. Neural
networks are used to realize fuzzy membership functions.
Fuzzy-neural network systems: These models incorporate fuzzy principles into
neural networks to create a more flexible and robust system. Neural network models and
algorithms can be fuzzified, giving for example fuzzy neurons, fuzzified neural models, and
neural networks with fuzzy training.
Developments in this field are still in progress. There are different proposals for
building these integrated systems, and the algorithms are at the proposal stage. For a more
detailed explanation of the different types of combinations and proposals, refer to [14].
REFERENCES
[1] http://www.palantir.swarthmore.edu/loicz/help/clustering.htm
[2] Clustering Connections and statistical language processing, Frank Keller,
University of Saarlandes
[3] http://cne.gmu.edu/modules/dau/stat/clustgalgs/clust3_frm.html
[4] Refining Initial Points for K-Means Clustering, P. S. Bradley, Computer Sciences
Department, University of Wisconsin, and Usama M. Fayyad, Microsoft Research, Redmond, WA
[5] http://www.geocities.com/mohamedqasem/vectorquantization/vq.html
[6] Fuzzy Logic in Fault Diagnosis, Dr. Tracy Dalton, University of Duisburg,
Germany
[7] Bezdek J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum
Press, New York, 1981.
[8] Adaptive Fuzzy Monitoring and Fault Detection, Stefano Marsili-Libelli,
[9] An individual project within MISB-420-0, Author: Daniel Klerfors, Professor:
Dr Terry L. Huston, St. Louis University (http://hem.hj.se/~de96klda/NeuralNetworks.htm)
[10] Posted notes of Prof. Carl G. Looney, Computer Science Department, University of Nevada
(http://ultima.cs.unr.edu/cs773b/CHAP3.pdf)
[11] http://www-binf.bio.uu.nl/BPA/NIntro.pdf
[12] http://www.ieee.cz/knihovna/Zhang/Zhang100-ch03.pdf
[13] Collection from various websites
[14] Chin-Teng Lin and C. S. George Lee, Neural Fuzzy Systems, Prentice Hall, NJ, 1996