Clustering Report

    CLUSTERING

    Clustering involves grouping data points together according to some measure of

    similarity. One goal of clustering is to extract trends and information from raw data sets.

    An alternative goal is to develop a compact representation of a data set by creating a set

    of models that represent it [1].

    There are two general types of clustering that are used: supervised and

    unsupervised clustering. Supervised clustering uses a set of example data to classify the

rest of the data set. This can be called classification: here the task is to learn to

    assign instances to pre-defined classes [2]. For example, consider a set of colored balls

    (all colors) that you want to classify into three groups: red, green, and blue. A logical way

    to do this is to pick out one example of each class--a red ball, a green ball, and a blue

    ball--and set them each next to a bucket. Then go through the remaining balls, compare

    each ball to the three examples and put each ball in the bucket whose example it matches

    the best.

This example of supervised clustering is illustrative because it highlights two potential problems. First, the result you get is going to depend on the balls you

    select as examples. If you were to select a red, an orange, and a blue ball, then it might be

    difficult to classify a green ball. Second, unless you are careful about selecting examples,

    you may select examples that don't represent the distribution of data. For example, you

    might select red, green, and blue balls, only to discover that most of the colored balls

were cyan, purple, and magenta (which lie in between the three primary colors). This


    shows the importance of selecting representative samples when you execute supervised

    clustering.

    Unsupervised clustering, on the other hand, tries to discover the natural groupings

    inside a data set without any input from a trainer. The main input a typical unsupervised

    clustering algorithm takes is the number of classes it should find. In the colored balls

    case, this would be like dumping them into an automatic sorting machine and telling it to

    create three piles. The goal of unsupervised clustering is to create three piles where the

    balls within each pile are very similar, but the piles are different from one another. Here

    no pre-defined classification is required. The task is to learn a classification from the

    data.

    One of the most important characteristics of any supervised or unsupervised

    clustering process is how to measure the similarity of two data points. Clustering

algorithms divide a data set into natural groups (clusters). Instances in the same cluster are similar to each other; they share certain properties.

    Clustering algorithms can have different properties [2]:

    Hierarchical: These methods include those techniques where the input data are

    not partitioned into the desired number of classes in a single step. Instead, a series

    of successive fusions of data are performed until the final number of clusters is

    obtained [3].

Non-hierarchical or iterative: These methods include those techniques in which a desired number of clusters is assumed at the start; instances are then reassigned between clusters to improve the grouping.


Hard and soft: Hard clustering assigns each instance to exactly one cluster. Soft clustering assigns each instance a probability of belonging to a cluster.

Disjunctive: Instances can be part of more than one cluster.

The figure below shows an illustration of the properties of clustering.

Figure 1: Illustration of the properties of clustering

Unsupervised Clustering:

One of the most commonly used unsupervised clustering algorithms is the K-means algorithm. The algorithm is as follows.

    Specify k, the number of clusters


    Choose k points randomly as cluster centers

Assign each instance to its closest cluster center using Euclidean distance

Calculate the mean of each cluster and use it as the new cluster center

    Reassign all instances to the closest cluster center

    Iterate until the cluster centers do not change any more
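For concreteness, a minimal Python sketch of these steps is given below; the NumPy usage, the convergence check, and the example data in the comments are illustrative assumptions rather than anything taken from this report.

```python
import numpy as np

def k_means(data, k, max_iter=100, seed=0):
    """Minimal K-means sketch: data is an (n_points, n_features) array."""
    rng = np.random.default_rng(seed)
    # Choose k points randomly as the initial cluster centers.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each instance to its closest center (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the instances assigned to it.
        new_centers = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Hypothetical usage with made-up 2-D data:
# data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
# centers, labels = k_means(data, k=2)
```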

    The figure below explains the concept of K-means clustering

    Figure 2: Illustration of K-means algorithm [4]

A demo of the K-means algorithm is shown below. The pictures depict the movement of the centers for 4 clusters over 4 iterations.

(Snapshots of the cluster centers at iterations 2, 3, and 4 of the K-means demo.)

After the fourth iteration the centers do not move much, and hence the centers are fixed at this position. The disadvantages of the K-means algorithm are that the number of clusters must be specified up front, and that a different set of initial random centers can yield different final cluster centers.


    SUPERVISED CLUSTERING ALGORITHMS:

In this section four different types of supervised clustering algorithms are presented: vector quantization, fuzzy clustering, artificial neural nets, and fuzzy-neural algorithms. Though the fuzzy and neural-net approaches initially go through an unsupervised step to determine the cluster centers, only the supervised clustering algorithms are discussed here.

VECTOR QUANTIZATION:

The origin of this algorithm is Shannon's source coding theory, which is used for the transmission and encoding of data. The algorithm is as follows. A vector quantizer maps k-dimensional vectors in the vector space R^k into a finite set of vectors Y = {y_i : i = 1, 2, ..., N} [5]. Each vector y_i is called a code vector or a codeword, and the set of all the codewords is called a codebook. Associated with each codeword y_i is a nearest-neighbor region called its Voronoi region, defined by

V_i = { x ∈ R^k : ||x − y_i|| ≤ ||x − y_j|| for all j ≠ i }

The set of Voronoi regions partitions the entire space R^k such that

∪_{i=1}^{N} V_i = R^k   and   V_i ∩ V_j = ∅ for all i ≠ j

As an example we take vectors in the two-dimensional case. Figure 3 shows some vectors in space. Associated with each cluster of vectors is a representative codeword (a cluster center or cluster representative obtained by the k-means algorithm or similar algorithms). Each codeword resides in its own Voronoi region. These regions are separated with imaginary lines in Figure 3 for illustration. Given an input vector, the codeword that is chosen to represent it is the one in the same Voronoi region.

Figure 3: Vector quantization illustration in 2-D space showing the Voronoi regions formed by imaginary lines

The representative codeword (cluster center) is determined to be the closest in Euclidean distance to the input vector (instance). The Euclidean distance is defined by

d(x, y_i) = sqrt( Σ_{j=1}^{k} (x_j − y_{ij})² )

where x_j is the jth component of the input vector, and y_{ij} is the jth component of the codeword y_i.
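To make the encoding step concrete, here is a minimal Python sketch that selects the nearest codeword by this Euclidean distance; the small codebook and input vector are hypothetical values chosen only for illustration.

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index of the codeword closest to x in Euclidean distance.
    x: (k,) input vector; codebook: (N, k) array of codewords y_i."""
    dists = np.sqrt(((codebook - x) ** 2).sum(axis=1))
    return int(dists.argmin())

# Hypothetical 2-D codebook with N = 3 codewords:
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
x = np.array([0.9, 1.2])
print(vq_encode(x, codebook))  # -> 1, the codeword whose Voronoi region contains x
```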

FUZZY SUPERVISED CLUSTERING:


    Fuzzy logic is becoming popular in the field of automatic control. Fuzzy logic

    requires no analytical model of the system, and offers the chance to combine heuristic

    knowledge with any model knowledge which may be available [6]. Fuzzy logic can also

    deal with vague or imprecise data. In the field of fault diagnosis, fuzzy logic has been

    used successfully in many applications, both as a means of residual generation, and to aid

    in the decision making process of residual evaluation.

    The idea behind fuzzy clustering is basically that of pattern recognition. Training

    data is used off-line to determine relevant cluster centers for each of the faults of interest.

    On-line, the degree to which the current data belongs to each of the pre-defined clusters is

    determined, and this results in a degree-of-membership to each of the pre-determined

    faults. This method is useful in cases where there are many residuals, or in which no

    expert knowledge of the system is available. Fuzzy clustering is different from fuzzy

reasoning, which is also used in residual analysis. Fuzzy reasoning mainly comprises IF-THEN reasoning based on the sign of the residual. Examples of fuzzy reasoning:

IF residual1 is positive AND residual2 is negative THEN fault1 is present

IF residual1 is zero AND residual2 is zero THEN the system is fault free

and so on.

    Clustering is the allocation of data points to a certain number of classes. Each class is

represented by a cluster center, or prototype, which can be considered as the point which

    best represents the data points in the cluster. The idea behind fuzzy clustering is that each

    data point belongs to all classes with a certain degree of membership. The degree to

which a data point belongs to a certain class is dependent upon the distance to all cluster


    centers. For fault diagnosis, each class could correspond to a particular fault. The general

principle is shown for three inputs and three clusters in Fig. 4.

    Figure 4: Fuzzy clustering concept showing the cluster centers and the membership

    grade of a data point

    The fuzzy clustering fault isolation procedure consists of the following two steps:

    Off-line phase: this is a learning phase which consists of the determination of the

    characteristics (i.e. cluster centers) of the classes. A learning data set is necessary

for this off-line phase, which must contain residuals for all known faults. (For more details on the origin of the idea of fuzzy clustering, refer to [7].)

    On-line phase: This phase calculates the membership degree of the current

    residuals to each of the known classes. In this way each data point does not

    belong to only one cluster, but its membership is distributed among all clusters

    according to the varying degree of resemblance of its features with respect to

    those cluster centers [8].


    It is important that the training data contains all faults of interest, otherwise they cannot

    be isolated on-line - though unknown faults can in some cases be detected.

    The fuzzy membership matrix and the cluster centers are computed by minimizing the

    following partition formula:

J_m(U, v) = Σ_{i=1}^{C} Σ_{k=1}^{N} (u_{i,k})^m (d_{i,k})²,   subject to   Σ_{i=1}^{C} u_{i,k} = 1        (1)

where C denotes the number of clusters, N the number of data points, u_{i,k} the fuzzy membership of the k-th point to the i-th cluster, d_{i,k} the Euclidean distance between the data point and the cluster center, and m ∈ (1, ∞) a fuzzy weighting factor which defines the degree of fuzziness of the results. The data classes become more fuzzy and less discriminating with increasing m. In general, m = 2 is chosen (it is mentioned that this value of m does not produce the optimal solution for all problems).

    The constraint in eq. (1) implies that each point must entirely distribute its

    membership among all the clusters. The cluster centers (centroids or prototypes) are

defined as the fuzzy weighted center of gravity of the data x_k:

v_i = Σ_{k=1}^{N} (u_{i,k})^m x_k / Σ_{k=1}^{N} (u_{i,k})^m,   i = 1, 2, ..., C        (2)

Since u_{i,k} affects the computation of the cluster center v_i, data with a high membership will influence the prototype location more than points with a low membership. For the fuzzy C-means algorithm, the distance d_{i,k} is defined as follows:

d_{i,k}² = ||x_k − v_i||²        (3)
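A minimal Python sketch of the resulting fuzzy C-means iteration is given below. The center update follows eq. (2) and the distance follows eq. (3); the membership update used here is the standard fuzzy C-means update obtained by minimizing (1) under the constraint, and the random initialization and fixed iteration count are assumptions made only for illustration.

```python
import numpy as np

def fuzzy_c_means(x, C, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means sketch. x: (N, d) data.
    Returns cluster centers v (C, d) and memberships u (C, N)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    # Random initial memberships, normalized so each point's memberships sum to 1.
    u = rng.random((C, N))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        # Eq. (2): centers are the fuzzy weighted centers of gravity of the data.
        v = (um @ x) / um.sum(axis=1, keepdims=True)
        # Eq. (3): squared Euclidean distances between data points and centers.
        d2 = ((x[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero
        # Standard membership update that minimizes (1) under the constraint.
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
    return v, u
```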


Figure 5: Matlab fuzzy-logic toolbox demo of fuzzy C-means clustering for 4 clusters

ARTIFICIAL NEURAL NET CLUSTERING:

Before discussing the supervised clustering technique in neural nets, the basics of artificial neural networks are discussed.

An artificial neural network is a system loosely modeled on the human brain [9]. It is an attempt to simulate, within specialized hardware or sophisticated software, multiple layers of simple processing elements called neurons. Each neuron is linked to

    certain of its neighbors with varying coefficients of connectivity that represent the


    strengths of these connections. Learning is accomplished by adjusting these strengths to

cause the overall network to output appropriate results. The most basic components of

    neural networks are modeled after the structure of the brain. The most basic element of

    the human brain is a specific type of cell, which provides us with the abilities to

    remember, think, and apply previous experiences to our every action. These cells are

known as neurons; each of these neurons can connect with up to 200,000 other neurons.

    The power of the brain comes from the numbers of these basic components and the

    multiple connections between them.

    All natural neurons have four basic components, which are dendrites, soma, axon,

    and synapses. Basically, a biological neuron receives inputs from other sources,

    combines them in some way, performs a generally nonlinear operation on the

result, and then outputs the final result. The figure below shows a simplified

    biological neuron and the relationship of its four components.


    Figure 6 : Four main parts of human nerve cells, based on which artificial neurons are

    designed

The basic unit of neural networks, the artificial neuron, simulates the four basic

    functions of natural neurons. Artificial neurons are much simpler than the biological

    neuron; the figure below shows the basics of an artificial neuron.

Figure 7: Structure of an artificial neuron with Hebbian learning ability (weights are adjustable)

    D. Hebb has postulated a principle for a learning process (Hebb, 1949) at the cellular

    level: if Neuron A is stimulated repeatedly by Neuron B at times when Neuron A is

active, then Neuron A will become more sensitive to stimuli from Neuron B (the correlation principle) [10]. It implicitly involves adjustments of the strengths of the synaptic inputs, which led to the incorporation of adjustable synaptic weights on the input

    lines to excite or inhibit incoming signals.
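As a rough sketch of this principle, a Hebbian adjustment strengthens a weight in proportion to the correlation between pre- and post-synaptic activity; the learning rate and the example values below are assumptions made purely for illustration.

```python
import numpy as np

def hebbian_update(w, x, y, lr=0.01):
    """Hebbian rule sketch: strengthen each weight in proportion to the
    correlation between input activity x and output activity y."""
    return w + lr * y * x

w = np.array([0.1, -0.2, 0.05])  # assumed initial synaptic weights
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic (input) activity
y = float(np.dot(w, x))          # post-synaptic (output) activity
w = hebbian_update(w, x, y)      # inputs that co-occur with activity get stronger weights
```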


The architecture for a network that consists of a layer of M perceptrons is shown in Figure 9. An input feature vector x = (x_1, ..., x_N) is input to the network via the set of N branching nodes. The lines fan out at the branching nodes so that each perceptron receives an input from each component of x. At each neuron, the lines fan in from all of the input (branching) nodes. Each incoming line is weighted with a synaptic coefficient (weight parameter) from the set {w_nm}, where w_nm weights the line from the nth component x_n coming into the mth perceptron.

    Figure 9 : One layer of perceptrons network with N inputs and M perceptrons
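In code, such a layer is simply a weighted sum per perceptron followed by a threshold; the sketch below assumes a small weight matrix and threshold vector chosen only for illustration.

```python
import numpy as np

def perceptron_layer(x, W, b):
    """x: (N,) input; W: (M, N) weights, W[m, n] = w_nm; b: (M,) thresholds.
    Returns the M thresholded outputs y_m = T(sum_n w_nm * x_n + b_m)."""
    s = W @ x + b
    return np.where(s >= 0, 1, -1)

# Assumed values: N = 3 inputs feeding M = 2 perceptrons.
W = np.array([[0.5, -1.0, 0.2],
              [1.0,  0.3, -0.7]])
b = np.array([0.0, 0.1])
print(perceptron_layer(np.array([1.0, 0.5, -1.0]), W, b))
```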

    The Perceptron as Hyperplane Separator:

Consider a perceptron as shown in Figure 7. The input vector x = (x_1, ..., x_N) is linearly combined with the weights to obtain

s = w_1 x_1 + ... + w_N x_N + b

where b is the threshold. Then s is activated by a threshold function T(·) to produce the output y = T(s) = 1 when s >= 0, else y = T(s) = -1. The set of all input vectors x such that

s = w_1 x_1 + ... + w_N x_N + b = 0

forms a hyperplane H in the input vector space. H partitions the feature vector space into right and left halfspaces H+ and H-.


An example: consider a single perceptron with two inputs. Let w1 = 2, w2 = -1, and b = 0; then 2x1 - x2 = 0 determines H. The points (0,0) and (1,2) belong to H.

The feature vector x = (x1, x2) = (2,3) is summed into

s = 2(2) - 1(3) = 1 > 0, so that the activated output is y = T(1) = 1

(which corresponds to H+ in the plane, i.e. the right half).

(x1, x2) = (0,2) activates the output y = T(2(0) - 1(2)) = T(-2) = -1,

which indicates that (0,2) is in the left halfspace H-. The figure below shows these points.

Figure 10: Illustration of H+ and H- relative to the hyperplane
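The arithmetic of this example can be checked with a few lines of Python (a sketch using the weights from the example above):

```python
import numpy as np

def perceptron(x, w, b):
    """Output T(s) = 1 if s = w.x + b >= 0, else -1."""
    s = float(np.dot(w, x)) + b
    return 1 if s >= 0 else -1

w, b = np.array([2.0, -1.0]), 0.0                # the weights from the example
print(perceptron(np.array([2.0, 3.0]), w, b))    # s = 1  -> +1, right halfspace H+
print(perceptron(np.array([0.0, 2.0]), w, b))    # s = -2 -> -1, left halfspace H-
```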

    The above example is a simple linear mapping between the input and the output. Now

consider another example which illustrates how a non-linear relation between input and output is implemented. Consider the XOR logic function, or 2-bit parity problem. There are N = 2 inputs, M = 1 output, Q = 4 sample (input/output) vector pairs for training, and K = 2 clusters (even and odd).


    Table below shows the mapping of input and output for this 2-bit parity data.

    Table 1: Logic for 2-bit parity data

However, we see from Figure 11 below that a single hyperplane cannot separate the four

    feature vectors into the required 2 classes, no matter how it is oriented (rotated and

    translated) by the weights.

    Figure 11: Hyperplane diagram for 2-bit parity data, showing one hyperplane is not

    sufficient to separate the data into two clusters

    The power of a single neuron can be greatly amplified by using multiple neurons in a

    network of layered connectionist architecture, as displayed in Figure 12 below. Such a

multiple layered perceptron (MLP) is also called a feed-forward artificial neural network, abbreviated FANN. The modifier "feed forward" distinguishes these from feedback (recursive) networks. On the left is the layer of input, or branching, nodes, which are not artificial neurons. The hidden layer (the middle layer here) contains neural nodes, as does the output layer on the right. This is the architecture of a two-layered NN (so called because there are two layers of neuronal units).

    Figure 12 : A typical two layered network where the middle layer introduces the required

    non-linearity between input and output layers

    Neural networks may also have multiple hidden layers for the sake of extra power in

    learning to separate nonlinearly separable classes. The Hornik-Stinchcombe-White

    theorem, states that a layered artificial neural network with two layers of neurons is

    sufficient to approximate as closely as desired any piecewise continuous map of a closed

    bounded subset of a finite dimensional space into another finite dimensional space,

    provided there are sufficiently many neurons in the single hidden layer. There is no

    theoretical need to use more than two layers of neurons, which would increase the

    computational complexity and instability in training, and slow down the operation

    because the extra layers cause delays in processing (the idea is that the neurons in a single

    layer are to process in parallel, while the different layers process sequentially). But extra


    layers can prevent the necessity of using an excessive number of neurons in a single

    hidden layer to achieve highly nonlinear classification.

    Consider the same XOR implementation using the two layered network shown in

    the figure below:

    Figure 13 : A two layered network for XOR logic implementation

Let the weights and thresholds of the hidden layer be chosen as indicated in Figure 13. The result is two parallel hyperplanes that yield three convex regions. The hyperplanes are determined by the weighted sums at the two hidden neurons: the threshold at the first neuron in the hidden layer yields one hyperplane, and the threshold at the second hidden neuron yields the other.

    This forces the results listed in Table 2, where we use 0.1 for 0 and 0.9 for 1 (this is the

    usual procedure in using neural networks, because 0 and 1 have special properties that


inhibit gradient training). The four sets of outputs above yield the three unique vectors (y1,y2) = (0,1), (y1,y2) = (1,1), and (y1,y2) = (0,0) that identify the three linearly separable regions shown in Figure 14. We see from the figure that Regions 1 and 3 make up the even parity (Class 1), while Region 2 is odd parity (Class 2). We saw in the previous example that a network of a single layer cannot output the two correct classes, no matter how we orient the hyperplanes via translation and rotation. In all cases of non-coincidental hyperplanes, we obtain three or four convex regions (the lower and upper bounds, respectively).

Table 2: Hidden layer mapping for the 2-bit parity function

    To show that the network with a second layer of perceptrons can learn the nonlinearly

separable classes of even and odd parity (XOR logic), we take the new weights at the single output neuron to be as shown in Figure 13. These weight the lines on which y1 and y2 enter the output neuron (perceptron). Using the hyperplane defined by this output perceptron,


    we need to map y = (1,1) and y = (0,0) into the same class, Class 1, as shown in Figure 14

    below.

    Figure 14 : The Partitioning of the 2-bit Parity Feature Space with Two Perceptron

    Layers

This is done by choosing the weights (u) and the threshold as above. The result is shown in the table below.

    Table 3: The 2-bit Parity Mapping by Two Layers of Perceptrons
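For illustration, the sketch below implements a two-layer threshold network of this kind for the 2-bit parity problem. The particular weights and thresholds are a common textbook choice assumed here for the example (they are not the values of Figure 13, which are not reproduced in this text), but they give the same qualitative behaviour: two parallel hyperplanes in the hidden layer and a combining output neuron.

```python
import numpy as np

def step(s):
    return np.where(s >= 0, 1, 0)

def xor_net(x):
    """Two-layer threshold network for 2-bit parity (XOR).
    Hidden layer: two parallel hyperplanes x1 + x2 = 0.5 and x1 + x2 = 1.5."""
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])
    y = step(W1 @ x + b1)                  # hidden outputs (y1, y2)
    # Output layer: fires only in the middle region, where (y1, y2) = (1, 0).
    u = np.array([1.0, -2.0])
    return int(step(np.array([u @ y - 0.5]))[0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))   # 0 for even parity, 1 for odd
```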

    There are many different kinds of learning rules used by neural networks. The most

common class of ANNs is called backpropagational neural networks (BPNNs) [11]. Backpropagation is an abbreviation for the backwards propagation of error. Here learning is a supervised process that occurs with each cycle or epoch (i.e. each time the network is presented with a new input pattern). It consists of a forward activation pass, in which the inputs and outputs of the neurons flow through the network, and the


    backward weight adjustment schema based on the error calculated. More simply, when a

    neural network is initially presented with a pattern it makes a random guess as to what it

    might be. It then sees how far its answer was from the actual one and makes an

    appropriate adjustment to its connection weights.

    Backpropagation performs a gradient descent within the weight space towards a

    global minimum. The global minimum is the theoretical solution with the lowest possible

    error. In most problems, the solution space is quite irregular with numerous pits and hills

    which may cause the network to settle down in a local minimum which is not the best

overall solution. This idea is depicted in the figure below.

Figure 15: The weights-versus-error space

Here, for clarity, the graph is drawn in two dimensions; in practice there are often many weights, say n, and the graph would be in n+1 dimensions.

Since the nature of the error-versus-weights space cannot be known a priori, one has to run several neural network analyses with different parameters to determine the best solution. The speed of learning can be controlled by the learning rate. Another parameter, momentum, helps the network to overcome obstacles (local minima) in the error surface and settle down at or near the global minimum. The issue of when to stop training is non-trivial. Training should not necessarily proceed to the global minimum: that point is by definition optimal for the training set, but may not be for an independent data set.
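As a sketch of how these two parameters enter the weight update, a gradient-descent step with a learning rate and a momentum term can be written as follows; the parameter values and the gradient function named in the usage comment are assumptions for illustration.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One gradient-descent step with momentum. The momentum term carries part
    of the previous update, helping the search roll over small local minima."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Hypothetical usage inside a training loop (grad_of_error is assumed):
# w, velocity = np.zeros(5), np.zeros(5)
# w, velocity = sgd_momentum_step(w, grad_of_error(w), velocity)
```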

The math and algorithm are as follows [12].

The main objective in neural model development is to find an optimal set of weight parameters w, such that y = y(x, w) closely represents (approximates) the original problem behavior. This is achieved through a process called training (that is, optimization in w-space). A set of training data is presented to the neural network. The data are pairs (x_k, d_k), k = 1, 2, ..., P, where d_k is the desired output of the neural model for input x_k, and P is the total number of training samples.

During training, the neural network performance is evaluated by computing the difference between the actual network outputs and the desired outputs for all the training samples. The difference, also known as the error, is quantified by

E(w) = (1/2) Σ_{k ∈ T_r} Σ_j ( d_{jk} − y_j(x_k, w) )²        (1)

where d_{jk} is the jth element of d_k, y_j(x_k, w) is the jth neural network output for the input x_k, and T_r is an index set of the training data. The weight parameters w are adjusted during training such that this error is minimized.


Training Process:

The first step in training is to initialize the weight parameters w; small random values are usually suggested. During training, w is updated along the negative direction of the gradient of E, as

w ← w − η (∂E/∂w)

until E becomes small enough. Here, the parameter η is called the learning rate. If we use just one training sample at a time to update w, then a per-sample error function E_k, given by

E_k(w) = (1/2) Σ_j ( d_{jk} − y_j(x_k, w) )²        (2)

is used and w is updated as w ← w − η (∂E_k/∂w). The following sub-section describes how the error back-propagation process can be used to compute the gradient information ∂E_k/∂w.

Error Back Propagation:

Using the definition of E_k in (2), the derivative of E_k with respect to the weight parameters of the lth layer can be computed by simple differentiation as

(3)

and

(4)

The gradient ∂E_k/∂z_i^L can be initialized at the output layer as

(5)

using the error between the neural network outputs and the desired outputs (training data). Subsequent derivatives ∂E_k/∂z_i^l are computed by back-propagating this error from the (l+1)th layer to the lth layer (see the figure below) as

(6)


Figure 16: Relationship between the ith neuron of the lth layer and the neurons of layers l-1 and l+1

For example, if the MLP uses the sigmoid as the hidden neuron activation function,

(7)

(8)

and

(9)

For the same MLP network, let δ_i^l be defined as the derivative of E_k with respect to the activity of the ith neuron of the lth layer, i.e. the local gradient at that neuron. The back-propagation process is then given by

(10)

(11)

and the derivative with respect to the weights is

(12)
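A compact Python sketch of per-sample training with error back-propagation for a one-hidden-layer sigmoid network is shown below. The network size, learning rate, initialization, and the use of the 2-bit parity data are assumptions made for illustration; this is not the Matlab demo and does not reproduce the source's exact notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, d, W1, b1, W2, b2, lr=0.5):
    """One per-sample update minimizing E_k = 0.5 * sum((d - y)^2)."""
    # Forward pass.
    z1 = sigmoid(W1 @ x + b1)            # hidden layer outputs
    y = sigmoid(W2 @ z1 + b2)            # network outputs
    # Backward pass: local gradients (deltas) at the output and hidden layers.
    delta2 = (y - d) * y * (1.0 - y)
    delta1 = (W2.T @ delta2) * z1 * (1.0 - z1)
    # Gradient-descent weight updates (in place).
    W2 -= lr * np.outer(delta2, z1); b2 -= lr * delta2
    W1 -= lr * np.outer(delta1, x);  b1 -= lr * delta1
    return 0.5 * float(((d - y) ** 2).sum())

# Example: train on the 2-bit parity (XOR) data, using 0.1 and 0.9 as targets.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
samples = [((0, 0), 0.1), ((0, 1), 0.9), ((1, 0), 0.9), ((1, 1), 0.1)]
for epoch in range(5000):
    for x, d in samples:
        backprop_step(np.array(x, dtype=float), np.array([d]), W1, b1, W2, b2)
```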

The algorithm in pictorial representation is given in the figure below.

Figure 17: Error back-propagation algorithm steps

The Matlab neural network toolbox has a demonstration of the error back-propagation algorithm, showing the change of the error with respect to different combinations of weights for a two-layered network. It also shows how it is possible to end up with the weights corresponding to a local minimum. The figures below show the Matlab demo.


    Figure 18 : Variation of error with respect to layer one weights


Figure 19: Two arbitrarily chosen points on the graph depict the weight values that will be obtained by the algorithm

Integration of Fuzzy Systems and Neural Networks:

    Neural networks process numerical information and exhibit learning capability. Fuzzy

    systems can process linguistic information and represent, say, experts' knowledge by

    fuzzy rules. Thus, the fusion of these two technologies is the current research trend. The

    aim is to be able to create machines with more intelligent behavior [13].


    Some of the motivations for considering both fuzzy systems and Neural Networks:

    (1) The Knowledge Base of a fuzzy system consists of a collection of "If... Then..." rules

    in which linguistic labels are modeled by membership functions.

    Neural Networks can be used to produce membership functions when available data are

    numerical.

    (2) Moreover, one can take advantage of the learning capability of neural networks to

    adjust membership functions, say in control strategies, to enhance control precision.

    (3) Neural Networks can be used to provide learning methods for fuzzy inference

    procedures.

(4) In the opposite direction, one can use fuzzy reasoning architectures to construct new neural networks.

    (5) One can also fuzzify the Neural Networks architecture to enlarge the domain of

    applications.

    (6) The fusion of Neural Networks and Fuzzy Systems is essentially based upon the fact

    that Neural Networks can learn experts' knowledge (through numerical data) and Fuzzy

    Systems can represent experts' knowledge (through the representation of in-out relation

    by fuzzy reasoning).

The literature basically describes two types of combination.

Neural-fuzzy systems: In this type of system the learning ability of neural networks is utilized to realize the key components of a general fuzzy logic inference system; for example, neural networks are used to realize fuzzy membership functions.


Fuzzy-neural network systems: These models incorporate fuzzy principles into neural networks to create more flexible and robust systems. The neural network model and algorithm can themselves be fuzzified, for example as fuzzy neurons, fuzzified neural models, and neural networks with fuzzy training.

Development in this field is still in progress; there are different proposals for building these integrated systems, and many of the algorithms are still at the proposal stage. For a more detailed explanation of the different types of combinations and proposals, refer to [14].

    REFERENCES

    [1] http://www.palantir.swarthmore.edu/loicz/help/clustering.htm

[2] Clustering, Connectionist and Statistical Language Processing, Frank Keller, Saarland University.

    [3] http://cne.gmu.edu/modules/dau/stat/clustgalgs/clust3_frm.html


[4] Refining Initial Points for K-Means Clustering, P. S. Bradley, Computer Sciences Department, University of Wisconsin, and Usama M. Fayyad, Microsoft Research, Redmond, WA.

    [5] http://www.geocities.com/mohamedqasem/vectorquantization/vq.html

    [6] Fuzzy Logic In Fault Diagnosis, Dr. Tracy Dalton, University of Duisburg,

    Germany

    [7] Bezdek J.C., Pattern recognition with fuzzy objective functions algorithms, Plenum

    Press, New York, 1991.

[8] Adaptive Fuzzy Monitoring and Fault Detection, Stefano Marsili-Libelli.

[9] An individual project within MISB-420-0, Author: Daniel Klerfors, Professor: Dr. Terry L. Huston, St. Louis University (http://hem.hj.se/~de96klda/NeuralNetworks.htm).

[10] Posted notes of Prof. Carl G. Looney, Computer Science Department, University of Nevada (http://ultima.cs.unr.edu/cs773b/CHAP3.pdf).

    [11] http://www-binf.bio.uu.nl/BPA/NIntro.pdf

    [12] http://www.ieee.cz/knihovna/Zhang/Zhang100-ch03.pdf

    [13] Collection from various websites

[14] Chin-Teng Lin and C. S. George Lee, Neural Fuzzy Systems, Prentice Hall, NJ, 1996.